Why is the t-distribution used for hypothesis testing of linear regression coefficients?

16

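In practice, it is common to use a standard t-test to check the significance of a linear regression coefficient. The mechanics of the calculation make sense to me.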

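Why can the t-distribution be used to model the standard test statistic used in linear regression hypothesis testing? The standard test statistic I am referring to here is: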

$T_{0} = \frac{\widehat{\beta} - \beta_{0}}{SE(\widehat{\beta})}$

— Nate Park

I think a full and complete answer to this question would be quite long. So, while you wait for someone to tackle it, you can get a pretty good idea of why this is the case by looking at some notes I found online here: onlinecourses.science.psu.edu/stat501/node/297. Note in particular that

$t^2_{(n-p)} = F_{(1, n-p)}$
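The sample counterpart of this identity can be checked numerically: for a test of a single slope, the squared t statistic equals the F statistic that compares the full model with the intercept-only model. The sketch below is an illustration only. The data values and the helper name `t_and_f` are made up, and it uses the textbook closed-form formulas for simple linear regression (so $p = 2$), not anything taken from the linked notes.

```python
import math

def t_and_f(x, y):
    """Return (t, F) for testing slope = 0 in y = a + b*x + error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)

    slope = sxy / sxx
    rss_full = syy - sxy ** 2 / sxx      # RSS of the model with the slope
    rss_null = syy                       # RSS of the intercept-only model
    s2 = rss_full / (n - 2)              # p = 2 parameters

    t = slope / math.sqrt(s2 / sxx)      # (estimate - 0) / SE
    f = (rss_null - rss_full) / 1 / s2   # 1 numerator degree of freedom
    return t, f

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]       # made-up data
t, f = t_and_f(x, y)
print(t ** 2, f)                         # the two numbers agree up to rounding
```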

— StatsStudent

1

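I can't believe this isn't a duplicate, yet with all the upvotes (on both the question and the answers)... How about this? Or maybe it means... wow... that there is (or was, until today) no duplicate, and a super basic topic had gone uncovered in about 7 years of Cross Validated's existence.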

— Richard Hardy

@RichardHardy Hmm, that does sound like a duplicate. It is more verbose, but the question there is specific: "How do I prove that, for $\hat\beta_i$, $\frac{\hat{\beta}_i - \beta_i} {s_{\hat{\beta}_i}} \sim t_{n-k}$?"

— Firebug

25

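To understand why we use the t-distribution, you need to know the underlying distributions of $\widehat{\beta}$ and of the residual sum of squares ($RSS$), since these two put together give you the t-distribution.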

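The easier part is the distribution of $\widehat{\beta}$, which is a normal distribution. To see this, note that $\widehat{\beta} = (X^{T}X)^{-1}X^{T}Y$, so it is a linear function of $Y$, where $Y\sim N(X\beta, \sigma^{2}I_{n})$. As a result, it is also normally distributed: $\widehat{\beta} \sim N(\beta, \sigma^{2}(X^{T}X)^{-1})$. Let me know if you need help deriving the distribution of $\widehat{\beta}$.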
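A quick simulation can make the mean and variance in this claim concrete. This is only a sketch under assumptions that are not in the answer: a simple regression with an intercept and one slope, arbitrary true values `beta0`, `beta1`, `sigma`, and a fixed design `x`. For that design the slope entry of $(X^{T}X)^{-1}$ is $1/\sum_j (x_j - \bar x)^2$, so the simulated slopes should have mean close to `beta1` and variance close to `sigma**2 / sxx`.

```python
import random
import statistics

random.seed(0)
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
beta0, beta1, sigma = 1.0, 2.0, 1.5      # illustrative true values
xbar = sum(x) / len(x)
sxx = sum((xi - xbar) ** 2 for xi in x)

def slope_estimate():
    """Draw Y ~ N(X beta, sigma^2 I) once and return the OLS slope."""
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

slopes = [slope_estimate() for _ in range(20000)]
theory_var = sigma ** 2 / sxx            # sigma^2 * (X^T X)^{-1} slope entry

print(statistics.mean(slopes), beta1)            # empirical vs true mean
print(statistics.variance(slopes), theory_var)   # empirical vs theoretical variance
```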

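Additionally, $RSS \sim \sigma^{2}\chi^{2}_{n-p}$, where $n$ is the number of observations and $p$ is the number of parameters used in your regression. The proof of this is a little more involved, but also straightforward to derive (see the proof here: Why is RSS distributed chi square times n-p?).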
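The chi-squared claim can also be sanity-checked by simulation, since a $\chi^{2}_{n-p}$ variable has mean $n-p$ and variance $2(n-p)$. The sketch below again assumes a simple regression with made-up values for `beta0`, `beta1`, `sigma` and `x` (so $p = 2$); it checks two moments only and is not a substitute for the proof.

```python
import random
import statistics

random.seed(1)
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
n, p = len(x), 2                         # intercept and slope
beta0, beta1, sigma = 1.0, 2.0, 1.5      # illustrative true values
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

def scaled_rss():
    """Simulate one data set and return RSS / sigma^2."""
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    rss = syy - sxy ** 2 / sxx           # residual sum of squares of the OLS fit
    return rss / sigma ** 2

draws = [scaled_rss() for _ in range(20000)]
print(statistics.mean(draws), n - p)             # should be close to n - p
print(statistics.variance(draws), 2 * (n - p))   # should be close to 2(n - p)
```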

For a single coefficient $\widehat{\beta}_{i}$, the normal distribution above gives:

$\frac{\widehat{\beta}_{i}-\beta_{i}}{\sigma\sqrt{(X^{T}X)^{-1}_{ii}}} \sim N(0,1)$

Additionally, from the chi-squared distribution of $RSS$ we have:

$\frac{(n-p)s^{2}}{\sigma^{2}} \sim \chi^{2}_{n-p}$

This is simply a rearrangement of the first chi-squared expression, and it is independent of the $N(0,1)$ variable above. Here we define $s^{2}=\frac{RSS}{n-p}$, which is an unbiased estimator of $\sigma^{2}$. By the definition of the $t_{n-p}$ distribution, dividing a standard normal variable by the square root of an independent chi-squared variable over its degrees of freedom gives you a t-distribution (for the proof see: A normal divided by the $\sqrt{\chi^2(s)/s}$ gives you a t-distribution -- proof), so you get:

$\frac{\widehat{\beta}_{i}-\beta_{i}}{s\sqrt{(X^{T}X)^{-1}_{ii}}} \sim t_{n-p}$

Where $s\sqrt{(X^{T}X)^{-1}_{ii}}=SE(\widehat{\beta}_{i})$ .
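To see the result at work, the statistic can be simulated and its tail frequency compared with the $t_{n-p}$ tail. The sketch below is an illustration under assumptions of my own: a simple regression with $n = 5$ and $p = 2$ (so 3 degrees of freedom), made-up true values, and a helper `t3_cdf` that uses the closed-form CDF of the t-distribution with 3 degrees of freedom so that only the standard library is needed. With so few degrees of freedom, the fraction of statistics beyond 1.96 should sit near the $t_3$ tail and well above the 5% that a standard normal would give.

```python
import math
import random

random.seed(2)
x = [0.0, 1.0, 2.0, 3.0, 4.0]            # n = 5, p = 2, so n - p = 3
n, p = len(x), 2
beta0, beta1, sigma = 1.0, 2.0, 1.5      # illustrative true values
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

def t_statistic():
    """Simulate one data set and return (slope_hat - beta1) / SE(slope_hat)."""
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    slope = sxy / sxx
    s2 = (syy - sxy ** 2 / sxx) / (n - p)    # s^2 = RSS / (n - p)
    return (slope - beta1) / math.sqrt(s2 / sxx)

def t3_cdf(t):
    """Closed-form CDF of the t-distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

reps = 20000
tail = sum(abs(t_statistic()) > 1.96 for _ in range(reps)) / reps
t3_tail = 2.0 * (1.0 - t3_cdf(1.96))
print(tail, t3_tail)    # simulated tail frequency vs the t_3 tail probability
```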

Let me know if it makes sense.

— francium87d

what a great answer! could you please explain why

$\frac{\widehat{\beta}_{i}-\beta_{i}}{\sigma\sqrt{(X^{T}X)^{-1}_{ii}}} \sim N(0,1)$ ?

— KingDingeling

4

The answer is actually very simple: you use the t-distribution because it was pretty much designed for this purpose.

OK, the nuance here is that it wasn't designed specifically for linear regression. Gosset came up with the distribution while studying a sample drawn from a population. For instance, you draw a sample $x_1,x_2,\dots,x_n$ and calculate its mean $\bar x=\sum_{i=1}^n x_i/n$. What is the distribution of the sample mean $\bar x$?

If you knew the true (population) standard deviation $\sigma$, then you'd say that the variable $\xi=(\bar x-\mu)\sqrt n/\sigma$ follows the standard normal distribution $\mathcal N(0,1)$. The trouble is that you usually do not know $\sigma$ and can only estimate it by $\hat\sigma$. So Gosset figured out the distribution you get when you replace $\sigma$ with $\hat\sigma$ in the denominator, and that distribution is now named after his pseudonym, "Student t".
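The effect of replacing $\sigma$ with $\hat\sigma$ is easy to see in a small simulation. The sketch below assumes a normal population with made-up values for `mu` and `sigma` and a deliberately tiny sample size `n = 4`; it computes the statistic both ways and counts how often each exceeds 1.96 in absolute value. With the true $\sigma$ the frequency should be near 5%, while with the estimate it should be clearly larger, reflecting the heavier tails of the Student t distribution.

```python
import math
import random
import statistics

random.seed(3)
mu, sigma, n = 10.0, 2.0, 4              # illustrative population and sample size

def z_and_t():
    """Return the statistic computed with the true sigma and with sigma_hat."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    sigma_hat = statistics.stdev(sample)          # uses n - 1 in the denominator
    z = (xbar - mu) * math.sqrt(n) / sigma        # known sigma: standard normal
    t = (xbar - mu) * math.sqrt(n) / sigma_hat    # estimated sigma: Student t
    return z, t

reps = 20000
pairs = [z_and_t() for _ in range(reps)]
z_tail = sum(abs(z) > 1.96 for z, _ in pairs) / reps
t_tail = sum(abs(t) > 1.96 for _, t in pairs) / reps
print(z_tail, t_tail)    # the second frequency is noticeably larger
```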

The technicalities of linear regression lead to a situation where we can estimate the standard error $\hat\sigma_\beta$ of the coefficient estimate $\hat\beta$, but we do not know the true $\sigma$; therefore the Student t distribution applies here too.

— Aksakal