Linearity of variance

16

I think the following two formulas are true:

$\mathrm{Var}(aX)=a^2 \mathrm{Var}(X)$, where $a$ is a constant

$\mathrm{Var}(X + Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$, if $X$ and $Y$ are independent

However, I am not sure what is wrong with the following:

$\mathrm{Var}(2X) = \mathrm{Var}(X+X) = \mathrm{Var}(X) + \mathrm{Var}(X)$

which does not equal $2^2 \mathrm{Var}(X)$, i.e. $4\mathrm{Var}(X)$.

If $X$ is a sample taken from a population, I think we can always assume $X$ to be independent of the other $X$s.

So what is wrong with my reasoning?

variance linearity fallacy

— ランセリバイ

8

Variance is not linear, as your first statement shows: if it were, you would have $\mathrm{Var}(aX) = a\,\mathrm{Var}(X)$. Covariance, on the other hand, is bilinear.

— バットマン

33

$\DeclareMathOperator{\Cov}{Cov}$ $\DeclareMathOperator{\Corr}{Corr}$ $\DeclareMathOperator{\Var}{Var}$

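The problem with your line of reasoning is

"I think we can always assume $X$ to be independent of the other $X$s."

$X$ is not independent of $X$. The symbol $X$ refers to the same random variable each time it appears here. Once you know the value of the first $X$ in your formula, the value of the second $X$ is fixed as well. If you want to refer to distinct (and potentially independent) random variables, you need to denote them with different letters (e.g. $X$ and $Y$) or with subscripts (e.g. $X_1$ and $X_2$); the latter is often (but not always) used to denote variables drawn from the same distribution.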

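If two variables $X$ and $Y$ are independent, then $\Pr(X=a|Y=b)$ is the same as $\Pr(X=a)$: knowing the value of $Y$ gives us no additional information about the value of $X$. But $\Pr(X=a|X=b)$ is $1$ if $a=b$ and $0$ otherwise: knowing the value of $X$ gives you complete information about the value of $X$. [You can replace the probabilities in this paragraph with cumulative distribution functions or, where appropriate, probability density functions, to essentially the same effect.]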
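To make the contrast concrete, here is a minimal sketch that enumerates the outcomes exactly. It assumes, purely for illustration, that $X$ and $Y$ are two independent fair six-sided dice; `cond_prob` is a hypothetical helper, not something from the question.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: X and Y are two independent fair six-sided dice.
FACES = range(1, 7)
SAMPLE_SPACE = list(product(FACES, repeat=2))  # equally likely (x, y) pairs


def cond_prob(event, given):
    """Exact Pr(event | given) over the equally likely sample space."""
    given_outcomes = [w for w in SAMPLE_SPACE if given(w)]
    hits = [w for w in given_outcomes if event(w)]
    return Fraction(len(hits), len(given_outcomes))


# Knowing Y tells us nothing about X: Pr(X=3 | Y=5) equals Pr(X=3).
p_x3 = cond_prob(lambda w: w[0] == 3, lambda w: True)
p_x3_given_y5 = cond_prob(lambda w: w[0] == 3, lambda w: w[1] == 5)

# Knowing X tells us everything about X.
p_x3_given_x3 = cond_prob(lambda w: w[0] == 3, lambda w: w[0] == 3)
p_x3_given_x5 = cond_prob(lambda w: w[0] == 3, lambda w: w[0] == 5)

print(p_x3, p_x3_given_y5, p_x3_given_x3, p_x3_given_x5)  # 1/6 1/6 1 0
```

Conditioning on $Y$ leaves the probability at $1/6$, while conditioning on $X$ itself collapses it to $1$ or $0$.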

Another way of seeing things is that if two variables are independent then they have zero correlation (though zero correlation does not imply independence!), but $X$ is perfectly correlated with itself, $\Corr(X,X)=1$, so $X$ cannot be independent of itself. Note that since the covariance is given by $\Cov(X,Y)=\Corr(X,Y)\sqrt{\Var(X)\Var(Y)}$, we have

$\Cov(X,X)=1\sqrt{\Var(X)^2}=\Var(X)$

The more general formula for the variance of a sum of two random variables is

$\Var(X+Y) = \Var(X) + \Var(Y) + 2 \Cov(X,Y)$

In particular, $\Cov(X,X) = \Var(X)$, so

$\Var(X+X) = \Var(X) + \Var(X) + 2\Var(X) = 4\Var(X)$

This is the same result you would have deduced by applying the rule

$\Var(aX) = a^2 \Var(X) \implies \Var(2X) = 4\Var(X)$
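If you want to check the general sum formula numerically, the following sketch does so with exact fractions. It assumes a fair six-sided die as the distribution of $X$ (my choice for illustration), and the helper names `mean`, `cov` and `var_of_sum` are hypothetical. It compares the pair $(X, X)$, the same roll used twice, with $(X_1, X_2)$, two independent rolls.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # assumption: a fair six-sided die


def mean(values):
    """Exact mean of equally likely values."""
    return sum(values) / Fraction(len(values))


def cov(pairs):
    """Exact Cov(U, V) for a list of equally likely (u, v) pairs."""
    mu_u = mean([u for u, _ in pairs])
    mu_v = mean([v for _, v in pairs])
    return mean([(u - mu_u) * (v - mu_v) for u, v in pairs])


def var_of_sum(pairs):
    """Exact Var(U + V), computed directly from the summed outcomes."""
    return cov([(u + v, u + v) for u, v in pairs])


same_x = [(x, x) for x in FACES]             # (X, X): the same roll twice
two_rolls = list(product(FACES, repeat=2))   # (X1, X2): independent rolls

results = {}
for label, pairs in [("X + X", same_x), ("X1 + X2", two_rolls)]:
    var_u = cov([(u, u) for u, _ in pairs])
    var_v = cov([(v, v) for _, v in pairs])
    c = cov(pairs)
    # General rule: Var(U + V) = Var(U) + Var(V) + 2 Cov(U, V)
    assert var_of_sum(pairs) == var_u + var_v + 2 * c
    results[label] = (var_of_sum(pairs), c)
    print(label, "Var:", results[label][0], "Cov:", c)
```

For the die, $\Var(X) = 35/12$, so $X + X$ has variance $35/3 = 4\Var(X)$ with covariance $35/12$, whereas $X_1 + X_2$ has variance $35/6 = 2\Var(X)$ with covariance $0$.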

If you are interested in linearity, then you might be interested in the bilinearity of covariance. For random variables $W$, $X$, $Y$ and $Z$ (whether dependent or independent) and constants $a$, $b$, $c$ and $d$ we have

$\Cov(aW + bX, Y) = a \Cov(W,Y) + b \Cov(X,Y)$

$\Cov(X, cY + dZ) = c \Cov(X,Y) + d \Cov(X,Z)$

and overall,

$\Cov(aW + bX, cY + dZ) = ac \Cov(W,Y) + ad \Cov(W,Z) + bc \Cov(X,Y) + bd \Cov(X,Z)$
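As a sanity check on the bilinear expansion, here is a small sketch with exact arithmetic. The sample space (two fair dice), the deliberately dependent definitions of $W$, $X$, $Y$, $Z$, and the constants are all assumptions picked for illustration; `combine` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: the sample space is two fair dice (d1, d2),
# and W, X, Y, Z are deliberately dependent functions of the same rolls.
OMEGA = list(product(range(1, 7), repeat=2))
W = [d1 for d1, d2 in OMEGA]
X = [d2 for d1, d2 in OMEGA]
Y = [d1 + d2 for d1, d2 in OMEGA]
Z = [d1 * d2 for d1, d2 in OMEGA]


def mean(values):
    """Exact mean of equally likely values."""
    return sum(values) / Fraction(len(values))


def cov(u, v):
    """Exact covariance of two variables listed outcome by outcome."""
    mu_u, mu_v = mean(u), mean(v)
    return mean([(ui - mu_u) * (vi - mu_v) for ui, vi in zip(u, v)])


def combine(a, u, b, v):
    """The random variable a*U + b*V, outcome by outcome."""
    return [a * ui + b * vi for ui, vi in zip(u, v)]


a, b, c, d = 2, -3, 5, 7  # arbitrary constants
lhs = cov(combine(a, W, b, X), combine(c, Y, d, Z))
rhs = (a * c * cov(W, Y) + a * d * cov(W, Z)
       + b * c * cov(X, Y) + b * d * cov(X, Z))
print(lhs == rhs)  # True
```

Because the arithmetic is exact, the two sides agree identically rather than only up to rounding, and no independence between the variables is needed.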

You can then use this to prove the (non-linear) results for variance that you wrote in your post:

$\Var(aX) = \Cov(aX, aX) = a^2 \Cov(X,X) = a^2 \Var(X)$

$\begin{align} \Var(aX + bY) &= \Cov(aX + bY, aX + bY) \\ &= a^2 \Cov(X,X) + ab \Cov(X,Y) + ba \Cov (X,Y) + b^2 \Cov(Y,Y) \\ \Var(aX + bY) &= a^2 \Var(X) + b^2 \Var(Y) + 2ab \Cov(X,Y) \end{align}$

The latter gives, as a special case when $a=b=1$ ,

$\Var(X+Y) = \Var(X) + \Var(Y) + 2 \Cov(X,Y)$

When $X$ and $Y$ are uncorrelated (which includes the case where they are independent), then this reduces to $\Var(X+Y) = \Var(X) + \Var(Y)$ . So if you want to manipulate variances in a "linear" way (which is often a nice way to work algebraically), then work with the covariances instead, and exploit their bilinearity.
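To see that being uncorrelated really is enough (independence is not required), consider this sketch. The distribution is an assumption of mine for illustration: $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is fully determined by $X$ yet has zero covariance with it.

```python
from fractions import Fraction

# Assumption for illustration: X is uniform on {-1, 0, 1} and Y = X**2,
# so Y is completely determined by X (dependent) yet uncorrelated with it.
X = [-1, 0, 1]
Y = [x ** 2 for x in X]


def mean(values):
    """Exact mean of equally likely values."""
    return sum(values) / Fraction(len(values))


def cov(u, v):
    """Exact covariance of two variables listed outcome by outcome."""
    mu_u, mu_v = mean(u), mean(v)
    return mean([(ui - mu_u) * (vi - mu_v) for ui, vi in zip(u, v)])


cov_xy = cov(X, Y)
var_x, var_y = cov(X, X), cov(Y, Y)
total = [x + y for x, y in zip(X, Y)]
var_sum = cov(total, total)

print(cov_xy, var_x + var_y, var_sum)  # 0 8/9 8/9
```

Here $\Cov(X,Y) = 0$, so $\Var(X+Y) = \Var(X) + \Var(Y) = 8/9$ even though $X$ and $Y$ are clearly dependent.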

— Silverfish

1

Yes! I think you pinpointed at the beginning that the confusion was essentially a notational one. I found it very helpful when one book (very explicitly, some might say laboriously) explained the interpretation of and rules of evaluating a probabilistic statement (so that, e.g., even if you know what you mean by $\Pr (X+X=n)$ where $X \sim \text{Uniform}(1..6)$, it is technically incorrect if you're considering throwing a $n$ in craps (and $X+X=2X$ would never yield an odd roll); the event would be properly expressed using $X_1,X_2$ i.i.d.).

— Vandermonde

1

This is in contrast to (and I think my misapprehension might have stemmed from) how 2+PRNG(6)+PRNG(6) often is how you would toss dice as above and/or notation/conventions such as $2 \text{d}6 = \text{d}6 + \text{d}6$ in which different instances are genuinely intended to be independent.

— Vandermonde

@Vandermonde That's an interesting point. I initially considered mentioning the use of subscripts to distinguish between "different $X$s" but didn't bother - think I might edit it in now. The argument that "you'd never get an odd total score if the sum was $2X$" is very clear and convincing to someone who can't see the need to distinguish: thanks for sharing it.

— Silverfish

0

Another way of thinking about it is that with random variables $2X \neq X + X$ .

$2X$ would mean two times the value of the outcome of $X$ , while $X + X$ would mean two trials of $X$ . In other words, it's the difference between rolling a die once and doubling the result, vs rolling a die twice.
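A quick way to see the difference is to list the possible totals. This is only a sketch, assuming a fair six-sided die; the variable names are mine.

```python
from collections import Counter
from itertools import product

FACES = range(1, 7)  # assumption: a fair six-sided die

# 2X: roll once and double the result.
doubled = Counter(2 * x for x in FACES)
# X1 + X2: roll twice and add the results.
two_rolls = Counter(x1 + x2 for x1, x2 in product(FACES, repeat=2))

print(sorted(doubled))    # [2, 4, 6, 8, 10, 12]
print(sorted(two_rolls))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(two_rolls[7], "of", sum(two_rolls.values()), "outcomes give a total of 7")
```

Doubling a single roll can only give even totals, each equally likely, while two separate rolls can give any total from 2 to 12, with 7 the most common.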

— Benjamin

+1 This is a perfectly clear and correct answer. Welcome to our site!

— whuber

Thanks @whuber!

— Benjamin