Intuition for conditional expectation with respect to a $\sigma$-algebra

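Let $(\Omega,\mathscr{F},\mu)$ be a probability space. Given a random variable $\xi:\Omega \to \mathbb{R}$ and a $\sigma$-algebra $\mathscr{G}\subseteq \mathscr{F}$, we can construct a new random variable $E[\xi|\mathscr{G}]$, which is the conditional expectation.

What is the intuition for thinking about $E[\xi|\mathscr{G}]$? I understand the intuition for the following:

(i) $E[\xi|A]$ where $A$ is an event (with positive probability).

(ii) $E[\xi|\eta]$ where $\eta$ is a discrete random variable.

But I cannot visualize $E[\xi|\mathscr{G}]$. I understand the mathematics of it, and I understand that it is defined in such a way as to generalize the simpler cases that we can visualize. Nonetheless, I do not find this way of thinking useful. It remains a mysterious object to me.

For example, let $A$ be an event with $\mu(A)>0$. Form the $\sigma$-algebra $\mathscr{G} = \{ \emptyset, A, A^c, \Omega\}$, the one generated by $A$. Then $E[\xi|\mathscr{G}](\omega)$ is equal to $\frac{1}{\mu(A)} \int_A \xi$ if $\omega \in A$, and equal to $\frac{1}{\mu(A^c)} \int_{A^c} \xi$ if $\omega \not \in A$. In other words, $E[\xi|\mathscr{G}](\omega) = E[\xi|A]$ if $\omega\in A$, and $E[\xi|\mathscr{G}](\omega) = E[\xi|A^c]$ if $\omega \in A^c$.

The part that is confusing is that $\omega \in \Omega$, so why not just say $E[\xi|\mathscr{G}](\omega) = E[\xi|\Omega] = E[\xi]$? Why do we replace $E[\xi|\mathscr{G}]$ by $E[\xi| A\text{ or } A^c]$ depending on whether or not $\omega\in A$, but are not allowed to replace $E[\xi|\mathscr{G}]$ by $E[\xi]$?

Note. In responding to this question, please do not explain this by using the rigorous definition of conditional expectation. I understand that. What I want to understand is what conditional expectation is supposed to be calculating, and why we reject one thing in favor of another.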

— Nicolas Bourbaki

Answers:

One way to think about conditional expectation is as a projection onto the $\sigma$-algebra $\mathscr{G}$.

(Figure from Wikimedia Commons.)

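This is actually rigorously true when talking about square-integrable random variables. In this case, $\mathbb{E}[\xi|\mathscr{G}]$ is the orthogonal projection of the random variable $\xi$ onto the subspace of $L^2(\Omega)$ consisting of random variables measurable with respect to $\mathscr{G}$. In fact, this even turns out to be true in some sense for $L^1$ random variables, via approximation by $L^2$ random variables.

(See the comments for references.)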

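If one thinks of $\sigma$-algebras as representing how much information we have available (an interpretation that is de rigueur in the theory of stochastic processes), then a larger $\sigma$-algebra means more possible events and thus more information about possible outcomes, while a smaller $\sigma$-algebra means fewer possible events and thus less information about possible outcomes.

Therefore, projecting the $\mathscr{F}$-measurable random variable $\xi$ onto the smaller $\sigma$-algebra $\mathscr{G}$ means taking our best guess for the value of $\xi$ given the more limited information available from $\mathscr{G}$.

In other words, given only the information from $\mathscr{G}$, and not the whole of the information from $\mathscr{F}$, $\mathbb{E}[\xi|\mathscr{G}]$ is, in a rigorous sense, our best possible guess for what the random variable $\xi$ is.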
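To spell out what "best possible guess" can mean here, the following is a sketch of the standard $L^2$ argument, under the assumption that $\xi$ is square-integrable; the symbol $Z$ is introduced only for illustration and stands for an arbitrary $\mathscr{G}$-measurable random variable in $L^2(\Omega)$.

```latex
\begin{aligned}
\mathbb{E}\big[(\xi - Z)^2\big]
  &= \mathbb{E}\big[(\xi - \mathbb{E}[\xi|\mathscr{G}])^2\big]
   + \mathbb{E}\big[(\mathbb{E}[\xi|\mathscr{G}] - Z)^2\big]
   + 2\,\mathbb{E}\big[(\xi - \mathbb{E}[\xi|\mathscr{G}])(\mathbb{E}[\xi|\mathscr{G}] - Z)\big] \\
  &= \mathbb{E}\big[(\xi - \mathbb{E}[\xi|\mathscr{G}])^2\big]
   + \mathbb{E}\big[(\mathbb{E}[\xi|\mathscr{G}] - Z)^2\big] \\
  &\ge \mathbb{E}\big[(\xi - \mathbb{E}[\xi|\mathscr{G}])^2\big].
\end{aligned}
```

The cross term vanishes because $\mathbb{E}[\xi|\mathscr{G}] - Z$ is $\mathscr{G}$-measurable and the residual $\xi - \mathbb{E}[\xi|\mathscr{G}]$ is orthogonal to every such random variable. So no $\mathscr{G}$-measurable guess has a smaller mean squared error than $\mathbb{E}[\xi|\mathscr{G}]$.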

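With regard to your example, I think you may be confusing random variables with their values. A random variable $X$ is a function whose domain is the sample space; it is not a number. In other words, $X: \Omega \to \mathbb{R}$, $X \in \{f\ |\ f: \Omega \to \mathbb{R} \}$, whereas for an $\omega \in \Omega$, $X(\omega)\in\mathbb{R}$.

The notation for conditional expectation is, in my opinion, really bad, because a conditional expectation is itself a random variable, i.e. a function. In contrast, the (regular) expectation of a random variable is a number. The conditional expectation of a random variable is an entirely different kind of quantity from the expectation of the same random variable, i.e. $\mathbb{E}[\xi|\mathscr{G}]$ does not even "type-check" with $\mathbb{E}[\xi]$.

In other words, using the symbol $\mathbb{E}$ to denote both regular and conditional expectation is a very large abuse of notation, which leads to much unnecessary confusion.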

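All of that being said, note that $\mathbb{E}[\xi|\mathscr{G}](\omega)$ is a number (the value of the random variable $\mathbb{E}[\xi|\mathscr{G}]$ evaluated at $\omega$), whereas $\mathbb{E}[\xi|\Omega]$ is a random variable. It is a constant random variable, because the $\sigma$-algebra generated by $\Omega$, namely $\{ \emptyset, \Omega\}$, is trivial/degenerate. Technically speaking, the constant value of this constant random variable is $\mathbb{E}[\xi]$, where $\mathbb{E}$ here denotes regular expectation, not conditional expectation, and hence a number rather than a random variable.

Also, here is what $\mathbb{E}[\xi|A]$ means. Technically speaking, it is only possible to condition on $\sigma$-algebras, not on individual events, since probability measures are defined on entire $\sigma$-algebras, not on individual events. Thus $\mathbb{E}[\xi|A]$ stands for $\mathbb{E}[\xi|\sigma(A)]$, where $\sigma(A)$ is the $\sigma$-algebra generated by the event $A$, which is $\{ \emptyset, A, A^c, \Omega\}$. Note that $\sigma(A) = \mathscr{G} = \sigma(A^c)$. In other words, $\mathbb{E}[\xi|A]$, $\mathbb{E}[\xi|\mathscr{G}]$, and $\mathbb{E}[\xi|A^c]$ are all different ways of denoting the exact same object.

Finally, the constant value of the random variable $\mathbb{E}[\xi|\Omega]=\mathbb{E}[\xi|\sigma(\Omega)]= \mathbb{E}[\xi| \{ \emptyset, \Omega\}]$ is just the number $\mathbb{E}[\xi]$. The $\sigma$-algebra $\{ \emptyset, \Omega\}$ represents the least possible amount of information we could have, in fact essentially no information, so under this extreme circumstance the best possible guess we could have for what the random variable $\xi$ is, is the constant random variable whose constant value is $\mathbb{E}[\xi]$.

Note that all constant random variables are $L^2$ random variables, and they are all measurable with respect to the trivial $\sigma$-algebra $\{\emptyset, \Omega\}$. So indeed the constant random variable $\mathbb{E}[\xi]$ is the orthogonal projection of $\xi$ onto the subspace of $L^2(\Omega)$ consisting of random variables measurable with respect to $\{\emptyset, \Omega\}$, as was claimed.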
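The distinction between the function $\mathbb{E}[\xi|\mathscr{G}]$ and the number $\mathbb{E}[\xi]$ may be easier to see on a finite sample space. The sketch below is only an illustration under assumptions that are not in the original post: a fair six-sided die, $\xi$ equal to the face value, $A$ the event "even face", and a hypothetical helper `cond_exp` that averages $\xi$ over each atom of the $\sigma$-algebra.

```python
from fractions import Fraction

def cond_exp(xi, prob, atoms):
    """Return E[xi | sigma(atoms)] as a dict outcome -> value.

    `atoms` is a list of disjoint sets covering the sample space, each with
    positive probability (the atoms of a finite sigma-algebra).
    """
    result = {}
    for atom in atoms:
        mass = sum(prob[w] for w in atom)
        average = sum(xi[w] * prob[w] for w in atom) / mass
        for w in atom:
            result[w] = average  # constant on each atom, hence measurable
    return result

# Assumed toy example: fair die, xi = face value, A = "even face".
omega = {1, 2, 3, 4, 5, 6}
prob = {w: Fraction(1, 6) for w in omega}
xi = {w: Fraction(w) for w in omega}
A = {2, 4, 6}

given_G = cond_exp(xi, prob, [A, omega - A])  # G = {empty, A, A^c, Omega}
given_trivial = cond_exp(xi, prob, [omega])   # trivial sigma-algebra {empty, Omega}

def inner(f, g):
    """L^2 inner product E[f * g] on the finite sample space."""
    return sum(f[w] * g[w] * prob[w] for w in omega)

# The residual xi - E[xi | G] is orthogonal to the indicator of each atom,
# which is what "orthogonal projection" means here.
residual = {w: xi[w] - given_G[w] for w in omega}
for atom in (A, omega - A):
    indicator = {w: Fraction(int(w in atom)) for w in omega}
    assert inner(residual, indicator) == 0

print(given_G[2], given_G[1], given_trivial[1])  # 4 3 7/2
```

Both results are functions on $\Omega$. The one for the trivial $\sigma$-algebra happens to be constant with value $\mathbb{E}[\xi] = 7/2$, while the one for $\mathscr{G}$ takes two values, one on $A$ and one on $A^c$.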

— Chill2Macht

@William I disagree with you about the use of $E[\xi|A]$ as a ran var. Many books define $E[\xi|A]$ to be a number, not a ran var. It is the best possible estimate of $\xi|_A$. This is a useful notion and highly intuitive. Disregarding it completely, just because you have a generalized notion of cond exp as a ran var, is wrong from a pedagogical point of view. I am not confused about what a r.v. is, nor do I see how anything I wrote would lead you to think that.

— Nicolas Bourbaki

@William Thinking of cond exp as an estimate of the ran var, with $\mathscr{G}$ representing information, is something I have seen said before, but I never gave it much thought and tried to find a different way of visualizing cond exp. Using your suggestion, I am going to write up a simple example and post it as an answer, for myself and for other people. Perhaps some people can then elaborate on my example and give a more exotic one.

— Nicolas Bourbaki

@NicolasBourbaki I recommend that you look at p.221 of the 4th edition of Durrett's Probability - Theory and Examples. I can refer you to other sources discussing this as well. In any case, it is not really a matter of opinion -- in the most general case, a conditional expectation is a random variable, and conditioning is only done with respect to $\sigma$-algebras; conditioning with respect to an event is conditioning with respect to the $\sigma$-algebra generated by the event, and conditioning with respect to a random variable is conditioning w.r.t. the $\sigma$-algebra generated by the RV.

— Chill2Macht

@William And I can refer you to sources which do define the cond. exp. given an event to be a real number. I do not know why you are so stuck on this point. One can define it either way, as long as the notions are not mixed up. For pedagogical reasons, teaching a class on prob. theory and instantly jumping into the most general def. is not illuminating. In either case, it really does not matter in this discussion, and your complaint is about notation/semantics.

— Nicolas Bourbaki

@NicolasBourbaki Chapter 5 of Whittle's Probability via Expectation gives a very good account (in my opinion) of both characterizations of conditional expectation, and explains well how each definition relates to and is motivated by the other definition. You are right that the distinction is one more of semantics. My enthusiasm for the more general definition stems (I think) from reading this chapter (5 of Whittle's Probability via Expectation), which made (I believe) good arguments about how the more general definition is in some ways easier to understand.

— Chill2Macht

I am going to try to elaborate on what William suggested.

Let $\Omega$ be the sample space of tossing a coin twice. Define the ran. var. $\xi$ to be the num. of heads that occur in the experiment. Clearly, $E[\xi] = 1$ . One way of thinking of what $1$ , as an expec. value, represents is as the best possible estimate for $\xi$ . If we had to take a guess for what value $\xi$ would take, we would guess $1$ . This is because $E[(\xi - 1)^2] \leq E[(\xi - a)^2]$ for any real number $a$ .

Let $A = \{ HT, HH \}$ be the event that the first outcome is a head. Let $\mathscr{G} = \{ \emptyset, A, A^c, \Omega\}$ be the $\sigma$ -alg. gen. by $A$ . We think of $\mathscr{G}$ as representing what we know after the first toss. After the first toss, either heads occurred, or heads did not occur. Hence, we are either in the event $A$ or $A^c$ after the first toss.

If we are in the event $A$ , then the best possible estimate for $\xi$ would be $E[\xi|A] = 1.5$ , and if we are in the event $A^c$ , then the best possible estimate for $\xi$ would be $E[\xi|A^c] = 0.5$ .

Now define the ran. var. $\eta$ by $\eta(\omega) = 1.5$ if $\omega\in A$ and $\eta(\omega) = 0.5$ if $\omega \in A^c$ . This ran. var. $\eta$ is a better approximation than $1 = E[\xi]$ , since $E[(\xi - \eta)^2] \leq E[(\xi -1)^2]$ .
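The numbers in this example can be checked by brute-force enumeration. The following is only a sketch, assuming a fair coin so that the four outcomes are equally likely; the names `mean_on` and `mse` are illustrative helpers.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses of a fair coin; each outcome has probability 1/4.
outcomes = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT
prob = Fraction(1, 4)

xi = {w: w.count("H") for w in outcomes}   # number of heads
A = {w for w in outcomes if w[0] == "H"}   # first toss is a head
A_c = set(outcomes) - A

def mean_on(event):
    """E[xi | event] for equally likely outcomes."""
    return Fraction(sum(xi[w] for w in event), len(event))

# eta is a function of the outcome: 3/2 on A and 1/2 on A^c.
eta = {w: mean_on(A) if w in A else mean_on(A_c) for w in outcomes}

def mse(estimate):
    """E[(xi - estimate)^2], where `estimate` maps an outcome to a guess."""
    return sum((xi[w] - estimate(w)) ** 2 for w in outcomes) * prob

mse_const = mse(lambda w: 1)      # guessing E[xi] = 1 everywhere
mse_eta = mse(lambda w: eta[w])   # guessing eta

print(mse_const, mse_eta)  # 1/2 1/4
```

Knowing the first toss halves the mean squared error, from $1/2$ for the constant guess $1$ to $1/4$ for $\eta$.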

What $\eta$ is doing is providing the answer to the question: what is the best estimate of $\xi$ after the first toss? Since we do not yet know the result of the first toss, $\eta$ will depend on whether $A$ occurs. Once the information in $\mathscr{G}$ is revealed to us, i.e. once we learn after the first toss whether we are in $A$ or in $A^c$ , the value of $\eta$ is determined and provides the best possible estimate for $\xi$ .

The problem with using $\xi$ as its own estimate, i.e. $0=E[(\xi - \xi)^2] \leq E[(\xi - \eta)^2]$ is as follows. $\xi$ is not well-defined after the first toss. Say the outcome of the experiment is $\omega$ with first outcome being heads, we are in the event $A$ , but what is $\xi(\omega)=?$ We do not know from just the first toss, that value is ambiguous to us, and so $\xi$ is not well-defined. More formally, we say that $\xi$ is not $\mathscr{G}$ -measurable i.e. its value is not well-defined after the first toss. Thus, $\eta$ is the best possible estimate of $\xi$ after the first toss.

Perhaps, somebody here can come up with a more sophisticated example using the sample space $[0,1]$ , with $\xi (\omega) = \omega$ , and $\mathscr{G}$ some non-trivial $\sigma$ -algebra.

— Nicolas Bourbaki

Although you request not to use the formal definition, I think that the formal definition is probably the best way of explaining it.

Wikipedia - conditional expectation:

Then a conditional expectation of $X$ given $\mathcal{H}$ , denoted as $\operatorname{E}(X\mid \mathcal{H})$ , is any $\mathcal{H}$ -measurable function ( $\Omega \to \mathbb{R}^{n}$ ) which satisfies:

$\displaystyle \int_{H}\operatorname{E}(X\mid \mathcal{H})\;dP=\int_{H}X\;dP\qquad \text{for each}\quad H\in \mathcal{H}$

Firstly, it is an $\mathcal{H}$ -measurable function. Secondly, its integral has to match that of $X$ over every measurable (sub)set in $\mathcal{H}$ . So for an event $A$ , the sigma algebra is $\{A,A^c,\emptyset, \Omega\}$ , so clearly it is set as you specified in your question for $\omega \in A$ and for $\omega \in A^c$ . Similarly, for any discrete random variable (and combinations of them), we list out all primitive events and assign the expectation given that primitive event.

Now consider tossing a coin an infinite number of times, where at each toss $i$ you get $1/2^i$ if your coin is tails. Then your total winnings are $X=\sum _{i=1}^\infty \frac{1}{2^i}c_i$ , where $c_i = 1$ for tails and $0$ for heads. Then $X$ is a real random variable on $[0,1]$ . After $n$ coin tosses, you know the value of $X$ to precision $1/2^n$ ; e.g. after 2 coin tosses it is in $[0,1/4]$ , $[1/4,1/2]$ , $[1/2,3/4]$ or $[3/4,1]$ . After every coin toss, your associated sigma algebra gets finer and finer, and similarly the conditional expectation of $X$ gets more and more precise.
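A rough way to watch this happen numerically is to truncate the game to a finite number of tosses and enumerate every sequence. The sketch below assumes a fair coin and a truncation level `N = 8` (both are assumptions made here so that the sample space is finite; the infinite game itself is not simulated), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 8  # truncation level: an assumption so the sample space is finite

def winnings(tosses):
    """Total winnings for a tuple of 0/1 tosses (1 = tails pays 1/2**i)."""
    return sum(Fraction(c, 2 ** i) for i, c in enumerate(tosses, start=1))

sequences = list(product((0, 1), repeat=N))  # all equally likely (fair coin)

def cond_exp_after(n):
    """E[X | first n tosses] as a dict keyed by the full toss sequence."""
    groups = {}
    for seq in sequences:
        groups.setdefault(seq[:n], []).append(winnings(seq))
    averages = {prefix: sum(vals) / len(vals) for prefix, vals in groups.items()}
    return {seq: averages[seq[:n]] for seq in sequences}

def mse(n):
    """Mean squared error of E[X | first n tosses] as an estimate of X."""
    estimate = cond_exp_after(n)
    return sum((winnings(s) - estimate[s]) ** 2 for s in sequences) / len(sequences)

errors = [mse(n) for n in range(N + 1)]
for n, err in enumerate(errors):
    print(n, float(err))
```

With no tosses observed, the conditional expectation is the constant $E[X]$; each further toss refines the partition of the sample space and strictly lowers the error, until after all `N` tosses the estimate coincides with $X$ itself.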

Hopefully this example of a real-valued random variable with a sequence of sigma algebras getting finer and finer (a filtration) gets you away from the purely event-based intuition you are used to, and clarifies its purpose.

— seanv507

I apologize, but I downvoted this question. It does not answer what I originally asked. Nor does it provide any new information that I did not know before.

— Nicolas Bourbaki

What I am trying to suggest to you is you do not understand the formal definition as well as you think you do (as the other answer also suggested), so unless you work through what is unintuitive with the formal definition you will not progress.

— seanv507

I understand the formal definition just fine. The questions that I asked, I know how to answer them when working from the formal definitions. The 'other answer', was trying to explain my question without using the definition of con. exp.

— Nicolas Bourbaki