How to derive the variance-covariance matrix of the coefficients in linear regression

36

I am reading a book on linear regression and have some trouble understanding the variance-covariance matrix of $\mathbf{b}$:

[Image: the book's expression for the variance-covariance matrix of $\mathbf{b}$]

The diagonal entries are easy enough, but the off-diagonal ones are a bit more difficult. What puzzles me is that

$\sigma(b_0, b_1) = E(b_0 b_1) - E(b_0)E(b_1) = E(b_0 b_1) - \beta_0 \beta_1$

but there is no trace of $\beta_0$ and $\beta_1$ here.

regression

— qed

3

Related question: stats.stackexchange.com/questions/44838/...

— ocram

2

Which book is it?

— Konstantinos

Neter et al., Applied Linear Regression Models, 1983, page 216. You can find the same material in Applied Linear Statistical Models, 5th ed., page 207.

— akavalar

53

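This is actually a cool question that challenges your basic understanding of regression.

First, let's clear up any initial confusion about notation. We are looking at the regression:

$y=b_0+b_1x+\hat{u}$

where $b_0$ and $b_1$ are the estimators of the true $\beta_0$ and $\beta_1$, and $\hat{u}$ are the residuals of the regression. Note that the underlying true, unobserved regression is thus denoted as: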

$y=\beta_0+\beta_1x+u$

with expectation $E[u]=0$ and variance $E[u^2]=\sigma^2$. Some books denote $b$ as $\hat{\beta}$, and we adopt this convention here. We also use matrix notation, where $b$ is the 2x1 vector holding the estimators of $\beta=[\beta_0, \beta_1]'$, namely $b=[b_0, b_1]'$. (For the sake of clarity, I treat X as fixed in the following calculations.)

Now to your question. Your formula for the covariance is indeed correct, that is:

$\sigma(b_0, b_1) = E(b_0 b_1) - E(b_0)E(b_1) = E(b_0 b_1) - \beta_0 \beta_1$

I think you want to know how the true unobserved coefficients $\beta_0, \beta_1$ end up in this formula. They actually cancel out if we take it a step further and expand the formula. To see this, note that the population variance of the estimator is given by:

$Var(\hat\beta)=\sigma^2(X'X)^{-1}$

This matrix holds the variances in its diagonal elements and the covariances in its off-diagonal elements.
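As a quick numerical illustration of how to read this matrix, the sketch below evaluates $\sigma^2(X'X)^{-1}$ for a small made-up data set. The x values and the value of `sigma2` are assumptions chosen only for the example; in practice $\sigma^2$ is unknown and has to be estimated.

```python
import numpy as np

# Hypothetical toy data: five x values and an assumed error variance sigma^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma2 = 2.0

# Design matrix with an intercept column, so that b = [b0, b1]'.
X = np.column_stack([np.ones_like(x), x])

# Var(b) = sigma^2 (X'X)^{-1}
cov_b = sigma2 * np.linalg.inv(X.T @ X)

var_b0, var_b1 = np.diag(cov_b)  # variances sit on the diagonal
cov_b0_b1 = cov_b[0, 1]          # covariance sits off the diagonal

print(cov_b)
```

For these values the intercept variance is 2.2, the slope variance is 0.2, and the covariance between them is -0.6 (up to floating-point rounding).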

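To arrive at the above formula, let's generalize your claim using matrix notation. Let us therefore denote the variance by $Var[\cdot]$ and the expectation by $E[\cdot]$.

$Var[b]=E[b^2]-E[b]E[b']$

Essentially we have the general variance formula, just in matrix notation, where the square of a vector is shorthand for the outer product, $b^2=bb'$. The equation resolves when we substitute in the standard expression for the estimator, $b=(X'X)^{-1}X'y$. Also assume that $b$ is an unbiased estimator, so $E[b]=\beta$. Hence, we obtain: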

$E[((X'X)^{-1}X'y)^2] - \underset{2 \times 2}{\beta^2}$

Note that on the right-hand side we have $\beta^2$, a 2x2 matrix, namely $\beta\beta'$, but at this point you may already guess what will happen to this term shortly.

Replacing $y$ with the expression for the true underlying data generating process above, we have:

$\begin{aligned} E\Big[\Big((X'X)^{-1}X'y\Big)^2\Big] - \beta^2 &= E\Big[\Big((X'X)^{-1}X'(X\beta+u)\Big)^2\Big]-\beta^2 \\ &= E\Big[\Big(\underbrace{(X'X)^{-1}X'X}_{=I}\beta+(X'X)^{-1}X'u\Big)^2\Big]-\beta^2 \\ &= E\Big[\Big(\beta+(X'X)^{-1}X'u\Big)^2\Big]-\beta^2 \\ &= \beta^2+E\Big[\Big((X'X)^{-1}X'u\Big)^2\Big]-\beta^2 \end{aligned}$

where the cross terms between $\beta$ and $(X'X)^{-1}X'u$ drop out in the last step since $E[u]=0$. Furthermore, the quadratic $\beta^2$ term cancels out as anticipated.

Thus we have:

$Var[b]=((X'X)^{-1}X')^2E[u^2]$

Here $E[u^2]=\sigma^2$ and $((X'X)^{-1}X')^2=(X'X)^{-1}X'X((X'X)')^{-1}=(X'X)^{-1}$, since $X'X$ is a $K\times K$ symmetric matrix and thus equal to its transpose. Finally we arrive at

$Var[b]=\sigma^2(X'X)^{-1}$
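The squared shorthand used in this derivation can be spelled out with outer products. As a sketch, the symbol $A$ is introduced here only for brevity, and the errors are assumed homoskedastic and uncorrelated, so that $E[uu']=\sigma^2 I$:

$\begin{aligned} A &= (X'X)^{-1}X', \qquad b-\beta = Au \\ Var[b] &= E[(b-\beta)(b-\beta)'] = A\,E[uu']\,A' = \sigma^2 AA' \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1} \end{aligned}$

The square therefore stands for $AA'$ rather than $AA$. Since $A$ is $K\times n$, the product $AA$ is not even defined unless $n=K$.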

We have now got rid of all $\beta$ terms. Intuitively, the variance of the estimator is independent of the value of the true underlying coefficient, as this is not a random variable per se. The result holds for every element of the variance-covariance matrix shown in the book, and thus also for the off-diagonal elements, where $\beta_0\beta_1$ cancels out. The only problem was that you had applied the general formula for the variance, which does not reflect this cancellation at first.
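A small simulation can make this independence from $\beta$ concrete. The sketch below is only an illustration under assumed settings (normal errors, a fixed made-up design, and the hypothetical helper `empirical_cov`): it re-estimates $b$ on many simulated samples for two very different choices of $\beta$ and compares the empirical covariance of the estimates with $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed fixed design and error standard deviation (illustrative values only).
x = np.linspace(0.0, 10.0, 20)
X = np.column_stack([np.ones_like(x), x])
sigma = 1.5

# Theoretical covariance matrix of b; note that beta does not appear in it.
theory = sigma**2 * np.linalg.inv(X.T @ X)


def empirical_cov(beta, n_sims=20000):
    """Simulate y = X beta + u repeatedly and return the sample covariance of the OLS estimates."""
    u = rng.normal(0.0, sigma, size=(n_sims, len(x)))  # one error vector per row
    y = X @ beta + u                                   # shape (n_sims, n)
    # Solve the normal equations (X'X) b = X'y for all simulated samples at once.
    b = np.linalg.solve(X.T @ X, X.T @ y.T).T          # shape (n_sims, 2)
    return np.cov(b, rowvar=False)


cov_small = empirical_cov(np.array([1.0, 2.0]))
cov_large = empirical_cov(np.array([-50.0, 300.0]))

print(theory)
print(cov_small)
print(cov_large)
```

Both empirical matrices should be close to `theory` up to simulation noise, even though the two $\beta$ vectors differ greatly.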

Ultimately, the variance of the coefficients reduces to $\sigma^2(X'X)^{-1}$ and is independent of $\beta$. But what does this mean? (I believe you also asked for a more general understanding of the covariance matrix.)

Look at the formula in the book. It simply asserts that the variance of the estimator increases when the true underlying error term is noisier ($\sigma^2$ increases), but decreases when the spread of X increases. Having more observations spread around the true value lets you, in general, build an estimator that is more accurate and thus closer to the true $\beta$. On the other hand, the covariance terms on the off-diagonal become practically relevant in tests of joint hypotheses such as $b_0=b_1=0$. Other than that they are a bit of a fudge, really. Hope this clarifies all questions.
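The effect of noise and of the spread of X on the slope variance can be checked directly from $\sigma^2(X'X)^{-1}$. The following is a minimal sketch with made-up x grids; the helper name `coef_cov` and all numeric values are assumptions for illustration.

```python
import numpy as np


def coef_cov(x, sigma2=1.0):
    """Return sigma^2 (X'X)^{-1} for a simple regression with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return sigma2 * np.linalg.inv(X.T @ X)


# Same number of observations and same mean, but different spread.
x_narrow = np.linspace(4.0, 6.0, 11)
x_wide = np.linspace(0.0, 10.0, 11)

var_slope_narrow = coef_cov(x_narrow)[1, 1]
var_slope_wide = coef_cov(x_wide)[1, 1]

# Same wide design, but four times the error variance.
var_slope_noisy = coef_cov(x_wide, sigma2=4.0)[1, 1]

print(var_slope_narrow, var_slope_wide, var_slope_noisy)
```

The wide design is five times as spread out as the narrow one, so its slope variance is 25 times smaller, while quadrupling $\sigma^2$ quadruples the variance.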

— Majte

And when you keep the spread constant and decrease the x's, the standard error of the intercept becomes smaller, which makes sense.

— Theta30

I don't follow the expansion of the square. Why is it not simplified to

$((X'X)^{-1}X')^2 = ((X'X)^{-1}X')((X'X)^{-1}X') = X^{-2}$?

— David

2

In your case we have

$X'X=\begin{bmatrix}n & \sum X_i\\\sum X_i & \sum X_i^2\end{bmatrix}$

Invert this matrix and you will get the desired result.
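As a sketch of that inversion, write $\bar{X}=\frac{1}{n}\sum X_i$ and $S_{xx}=\sum (X_i-\bar{X})^2$ (shorthand introduced here, not taken from the book). The determinant is $n\sum X_i^2-(\sum X_i)^2=nS_{xx}$, so

$\sigma^2(X'X)^{-1}=\frac{\sigma^2}{nS_{xx}}\begin{bmatrix}\sum X_i^2 & -\sum X_i\\ -\sum X_i & n\end{bmatrix}$

and the off-diagonal entry is $\sigma(b_0,b_1)=-\sigma^2\bar{X}/S_{xx}$, which contains neither $\beta_0$ nor $\beta_1$.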

— mpiktas

1

It appears that $\beta_0$ and $\beta_1$ are the predicted values (expected values). They make the switch between $E(b_0)=\beta_0$ and $E(b_1)=\beta_1$.

— Drew75

$\beta_0$ and $\beta_1$ are generally unknown, what can they switch to?

— qed

I think I understand the confusion, and I think they perhaps should have written $\beta_0^*$ rather than $\beta_0$. Here's another post that goes through the calculation: link

— Drew75

2

@qed: to sample estimates of the unknown quantities.

— Glen_b -Reinstate Monica