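To understand why we use the t-distribution, you need to know the underlying distributions of $\hat\beta$ and of the residual sum of squares ($RSS$), because these two put together give you the t-distribution.
The easier part is the distribution of $\hat\beta$, which is normal. To see this, note that $\hat\beta = (X^TX)^{-1}X^TY$, so it is a linear function of $Y$, where $Y \sim N(X\beta, \sigma^2 I_n)$. As a result it is also normally distributed, $\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1})$. Let me know if you need help deriving the distribution of $\hat\beta$.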
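For reference, the normality of $\hat\beta$ can be sketched in two lines. This assumes $X$ is a fixed design matrix with full column rank and $Y \sim N(X\beta, \sigma^2 I_n)$; since $\hat\beta$ is a linear transformation of a multivariate normal vector, it is itself normal, and only its mean and covariance need to be computed:

```latex
\begin{aligned}
\mathbb{E}[\hat\beta]
  &= (X^TX)^{-1}X^T\,\mathbb{E}[Y]
   = (X^TX)^{-1}X^TX\beta
   = \beta, \\
\operatorname{Var}(\hat\beta)
  &= (X^TX)^{-1}X^T\,(\sigma^2 I_n)\,X(X^TX)^{-1}
   = \sigma^2 (X^TX)^{-1}.
\end{aligned}
```

So each coefficient satisfies $\hat\beta_i \sim N\big(\beta_i, \sigma^2 (X^TX)^{-1}_{ii}\big)$.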
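For the residual sum of squares, $RSS/\sigma^2 \sim \chi^2_{n-p}$, where $n$ is the number of observations and $p$ is the number of estimated parameters, and $RSS$ is independent of $\hat\beta$. Standardizing the $i$-th coefficient gives $\frac{\hat\beta_i - \beta_i}{\sigma\sqrt{(X^TX)^{-1}_{ii}}} \sim N(0,1)$. Additionally, we define $s^2 = \frac{RSS}{n-p}$, which is an unbiased estimator for $\sigma^2$, so that $\frac{(n-p)s^2}{\sigma^2} \sim \chi^2_{n-p}$. This is simply a rearrangement of the chi-squared expression for $RSS$, and it is independent of the $N(0,1)$ variable. By the definition of the t-distribution, dividing a standard normal by the square root of an independent chi-squared over its degrees of freedom gives you a t-distribution (for the proof see: A normal divided by the $\sqrt{\chi^2(s)/s}$ gives you a t-distribution -- proof), so you get that:

$$\frac{\hat\beta_i - \beta_i}{s\sqrt{(X^TX)^{-1}_{ii}}} \sim t_{n-p}$$

where $s\sqrt{(X^TX)^{-1}_{ii}} = \widehat{SE}(\hat\beta_i)$ is the estimated standard error of $\hat\beta_i$.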
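If it helps to see this numerically, here is a minimal simulation sketch, not a definitive implementation. It assumes a simple linear regression with one predictor (so $p = 2$ and, for the slope, $(X^TX)^{-1}_{ii} = 1/S_{xx}$), five fixed design points, and arbitrary illustrative values for the intercept, slope and $\sigma$. The function names are made up for this example, and 3.182 is the tabulated 97.5% quantile of the t-distribution with $n - p = 3$ degrees of freedom.

```python
import math
import random


def slope_t_statistic(x, y, true_slope):
    """Return (b1_hat - b1) / SE(b1_hat) for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                     # OLS slope estimate
    b0 = y_bar - b1 * x_bar            # OLS intercept estimate
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))       # s^2 = RSS / (n - p) with p = 2
    se = s / math.sqrt(sxx)            # s * sqrt((X^T X)^{-1}_{ii}) for the slope
    return (b1 - true_slope) / se


def simulate_slope_statistics(n_sims=20000, seed=0):
    """Simulate the studentized slope under normal errors (assumed values)."""
    rng = random.Random(seed)
    x = [1.0, 2.0, 3.0, 4.0, 5.0]      # n = 5, so n - p = 3
    beta0, beta1, sigma = 1.0, 2.0, 1.5
    stats = []
    for _ in range(n_sims):
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        stats.append(slope_t_statistic(x, y, beta1))
    return stats


T_CRIT_DF3 = 3.182   # 97.5% quantile of t with 3 degrees of freedom
Z_CRIT = 1.960       # 97.5% quantile of the standard normal

stats = simulate_slope_statistics()
frac_beyond_t = sum(abs(t) > T_CRIT_DF3 for t in stats) / len(stats)
frac_beyond_z = sum(abs(t) > Z_CRIT for t in stats) / len(stats)

print("share beyond t critical value:", frac_beyond_t)
print("share beyond normal critical value:", frac_beyond_z)
```

If the statistic really follows $t_3$, about 5% of the simulated values should fall beyond the t critical value, while noticeably more than 5% fall beyond the normal one (theory gives roughly 14% for 3 degrees of freedom).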
Let me know if it makes sense.
The answer is actually very simple: you use the t-distribution because it was pretty much designed specifically for this purpose.
Ok, the nuance here is that it wasn't designed specifically for linear regression. Gosset studied the distribution of a statistic computed from a sample drawn from a population. For instance, you draw a sample $x_1, \dots, x_n$, and calculate its mean $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$. What is the distribution of the sample mean $\bar x$?
If you knew the true (population) standard deviation $\sigma$, then, for a normally distributed population with mean $\mu$, you'd say that the variable $\xi = \frac{\bar x - \mu}{\sigma/\sqrt{n}}$ is from the standard normal distribution $N(0,1)$. The trouble is that you usually do not know $\sigma$, and can only estimate it by $\hat\sigma$. So Gosset figured out the distribution you get when you substitute $\hat\sigma$ for $\sigma$ in the denominator, and that distribution is now named after his pseudonym, "Student t".
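The effect of swapping $\sigma$ for its estimate can be checked with a small simulation. This is only a sketch under assumed values: samples of size $n = 4$ from a normal population with arbitrarily chosen $\mu$ and $\sigma$, with the helper names invented for the example. It compares how often the two versions of the standardized mean fall beyond $\pm 1.96$, and how often the studentized version falls beyond $\pm 3.182$, the tabulated 97.5% quantile of the t-distribution with $n - 1 = 3$ degrees of freedom.

```python
import math
import random
import statistics


def standardized_means(n=4, n_sims=20000, mu=10.0, sigma=2.0, seed=1):
    """Return sample means standardized with the true and the estimated sd."""
    rng = random.Random(seed)
    z_vals, t_vals = [], []
    for _ in range(n_sims):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        sigma_hat = statistics.stdev(sample)   # sample sd, n - 1 denominator
        z_vals.append((x_bar - mu) / (sigma / math.sqrt(n)))      # known sigma
        t_vals.append((x_bar - mu) / (sigma_hat / math.sqrt(n)))  # estimated sigma
    return z_vals, t_vals


def tail_fraction(values, cutoff):
    """Share of values whose absolute value exceeds the cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)


z_vals, t_vals = standardized_means()
z_tail = tail_fraction(z_vals, 1.96)        # known sigma, normal cutoff
t_tail = tail_fraction(t_vals, 1.96)        # estimated sigma, normal cutoff
t_tail_crit = tail_fraction(t_vals, 3.182)  # estimated sigma, t cutoff (3 df)

print("known sigma, beyond 1.96:", z_tail)
print("estimated sigma, beyond 1.96:", t_tail)
print("estimated sigma, beyond 3.182:", t_tail_crit)
```

With the true $\sigma$, about 5% of values land beyond $\pm 1.96$. With the estimated $\hat\sigma$ the tails are heavier, and the 5% level is only recovered at the wider t cutoff.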
The technicalities of linear regression lead to a situation where we can estimate the standard error of the coefficient estimate, $\widehat{SE}(\hat\beta)$, but we do not know the true $\sigma$, therefore the Student t distribution is applied here too.