In light of this question: Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

I would like to understand why

$$F = \frac{(\text{TSS} - \text{SSR})/(p-1)}{\text{SSR}/(n-p)},$$

where $p$ is the number of model parameters, $n$ the number of observations, $\text{TSS}$ the total variance and $\text{SSR}$ the residual variance, follows an $F_{p-1,\,n-p}$ distribution.

I must admit I have not even tried to prove it, since I would not know where to start.
Answer:
Let us show the result for the general case, of which your formula for the test statistic is a special case. Generally, we need to verify that the statistic can, according to the characterization of the $F$ distribution, be written as the ratio of independent $\chi^2$ random variables divided by their degrees of freedom.
Let $H_0: R\beta = r$, with $R$ and $r$ known and nonrandom, and $R$ of full rank $q$. This represents $q$ linear restrictions on the $k$ regressors, which (unlike in the OP's notation) include the constant term. So, in @user1627466's example, $p-1$ corresponds to the $q = k-1$ restrictions of setting all slope coefficients to zero.
In view of $\operatorname{Var}\bigl(\hat\beta_{\text{OLS}}\bigr) = \sigma^2(X'X)^{-1}$ (with $X$ the $n\times k$ regressor matrix), we have

$$R(\hat\beta_{\text{OLS}} - \beta) \sim N\bigl(0,\ \sigma^2 R(X'X)^{-1}R'\bigr),$$

so that

$$z := \frac{1}{\sigma}\bigl\{R(X'X)^{-1}R'\bigr\}^{-1/2} R(\hat\beta_{\text{OLS}} - \beta) \sim N(0,\ I_q).$$

This, as shown in the answer that you link to (see also here), is independent of

$$d := \frac{(n-k)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-k},$$

where $\hat\sigma^2 = \hat\varepsilon'\hat\varepsilon/(n-k)$ is the usual unbiased estimate of the error variance based on the OLS residuals $\hat\varepsilon = y - X\hat\beta_{\text{OLS}}$. So, as $z'z$ is a quadratic form in normals,

$$z'z = \frac{(R\hat\beta_{\text{OLS}} - R\beta)'\bigl\{R(X'X)^{-1}R'\bigr\}^{-1}(R\hat\beta_{\text{OLS}} - R\beta)}{\sigma^2} \sim \chi^2_{q}.$$

Under $H_0$, $R\beta = r$, so $R(\hat\beta_{\text{OLS}} - \beta)$ may be replaced by $R\hat\beta_{\text{OLS}} - r$ throughout.
For illustration, consider the special case in which $r = 0$ and $R$ selects all the slope coefficients, so that the statistic reduces to the regression $F$ statistic from the question.
In case you prefer a little simulation (which is of course not a proof!), here is one in which the null is tested that none of the regressors matter - which indeed they do not, so that we simulate the null distribution.
We see very good agreement between the theoretical density and the histogram of the Monte Carlo test statistics.
library(lmtest)
n <- 100
reps <- 20000
sloperegs <- 5 # number of slope regressors, q or k-1 (minus the constant) in the above notation
critical.value <- qf(p = .95, df1 = sloperegs, df2 = n-sloperegs-1)
# for the null that none of the slope regressors matters
Fstat <- rep(NA,reps)
for (i in 1:reps){
y <- rnorm(n)
X <- matrix(rnorm(n*sloperegs), ncol=sloperegs)
reg <- lm(y~X)
Fstat[i] <- waldtest(reg, test="F")$F[2]
}
mean(Fstat>critical.value) # very close to 0.05
hist(Fstat, breaks = 60, col="lightblue", freq = F, xlim=c(0,4))
x <- seq(0,6,by=.1)
lines(x, df(x, df1 = sloperegs, df2 = n-sloperegs-1), lwd=2, col="purple")
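As a small, optional cross-check for a single draw (using the same simulation setup as above): the overall F statistic reported by summary.lm tests the same null that all slope coefficients are zero, so it should coincide with the Wald F used in the loop.

y <- rnorm(n)
X <- matrix(rnorm(n*sloperegs), ncol=sloperegs)
reg <- lm(y~X)
c(summary(reg)$fstatistic[1], waldtest(reg, test="F")$F[2])  # the two values agree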
To see that the two versions of the test statistic in the question and the answer are indeed equivalent, note that the null corresponds to the restrictions $R = (0\;\; I)$ and $r = 0$.
Let $X = (X_1\;\; X_2)$ be partitioned according to which coefficients are restricted to be zero under the null (in your case, all but the constant, but the derivation to follow is general). Also, let $\hat\beta = (\hat\beta_1',\ \hat\beta_2')'$ be the suitably partitioned OLS estimate.
Then,

$$R\hat\beta = \hat\beta_2 \qquad\text{and}\qquad R(X'X)^{-1}R' = \bigl(X_2'M_{X_1}X_2\bigr)^{-1},$$

where $M_{X_1} = I - X_1(X_1'X_1)^{-1}X_1'$ is the "residual maker" of $X_1$, and the second equality follows from the partitioned-inverse formula (equivalently, the Frisch-Waugh-Lovell theorem, FWL).
Thus, the numerator of the statistic becomes (without the division by $q$)

$$\hat\beta_2'\bigl\{R(X'X)^{-1}R'\bigr\}^{-1}\hat\beta_2 = \hat\beta_2'\bigl(X_2'M_{X_1}X_2\bigr)\hat\beta_2.$$
It remains to show that this numerator is identical to $SSR_r - SSR_u$, the difference between the restricted and the unrestricted sum of squared residuals.
Here,

$$SSR_r = y'M_{X_1}y$$

is the sum of squared residuals of the restricted regression, i.e. of the regression of $y$ on $X_1$ only.
Again using FWL (which also shows that the residuals of the two approaches are identical), we can write $SSR_u$ (SSR in your notation) as the SSR of the regression of $M_{X_1}y$ on $M_{X_1}X_2$.
That is,

$$SSR_u = y'M_{X_1}y - y'M_{X_1}X_2\bigl(X_2'M_{X_1}X_2\bigr)^{-1}X_2'M_{X_1}y,$$

using symmetry and idempotence of $M_{X_1}$.
Thus,

$$SSR_r - SSR_u = y'M_{X_1}X_2\bigl(X_2'M_{X_1}X_2\bigr)^{-1}X_2'M_{X_1}y = \hat\beta_2'\bigl(X_2'M_{X_1}X_2\bigr)\hat\beta_2 = \hat\beta_2'\bigl\{R(X'X)^{-1}R'\bigr\}^{-1}\hat\beta_2,$$

where the second equality uses $\hat\beta_2 = \bigl(X_2'M_{X_1}X_2\bigr)^{-1}X_2'M_{X_1}y$ (FWL once more), which is what we set out to show.
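If it helps, here is a quick numerical check of this identity (not part of the proof; the partition below, with two coefficients restricted to zero, is just an arbitrary illustration):

set.seed(1)
n <- 50
X1 <- cbind(1, rnorm(n))              # columns unrestricted under the null (constant + one regressor)
X2 <- matrix(rnorm(n * 2), ncol = 2)  # columns whose coefficients are zero under the null
X <- cbind(X1, X2)
y <- rnorm(n)
XtXinv <- solve(t(X) %*% X)
betahat <- XtXinv %*% t(X) %*% y
R <- cbind(matrix(0, 2, ncol(X1)), diag(2))   # R %*% beta picks out beta_2
numerator <- t(R %*% betahat) %*% solve(R %*% XtXinv %*% t(R)) %*% (R %*% betahat)
SSR_u <- sum(residuals(lm(y ~ X - 1))^2)      # unrestricted SSR
SSR_r <- sum(residuals(lm(y ~ X1 - 1))^2)     # restricted SSR (X2 dropped)
all.equal(as.numeric(numerator), SSR_r - SSR_u)   # TRUE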
Answer:

@ChristophHanck has provided a very comprehensive answer; here I will add a sketch of a proof for the special case the OP mentioned. Hopefully it is also easier to follow for beginners.
A random variable $Y \sim F_{d_1,d_2}$ if

$$Y = \frac{X_1/d_1}{X_2/d_2},$$

where $X_1 \sim \chi^2_{d_1}$ and $X_2 \sim \chi^2_{d_2}$ are independent.
In the OLS model we write

$$y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0,\ \sigma^2 I_n),$$

where $X$ is an $n\times p$ design matrix whose first column is the constant, and $H = X(X'X)^{-1}X'$ is the associated hat matrix. Under the null of the question, all slope coefficients are zero, so $y$ is just the constant term plus noise.
Let us denote the $n\times n$ matrix of all ones as $J$; the sums of squares can then be expressed as quadratic forms:

$$\text{TSS} = y'\Bigl(I - \tfrac{1}{n}J\Bigr)y, \qquad \text{SSR} = y'(I - H)y, \qquad \text{TSS} - \text{SSR} = y'\Bigl(H - \tfrac{1}{n}J\Bigr)y.$$
We can now set out to show that the $F$-statistic has an $F$-distribution (search Cochran's theorem for more). Here we need two facts:

1. If $A$ is symmetric and idempotent with rank $d$, and $\varepsilon \sim N(0,\ \sigma^2 I_n)$, then $\varepsilon'A\varepsilon/\sigma^2 \sim \chi^2_d$.
2. The quadratic forms $\varepsilon'A\varepsilon$ and $\varepsilon'B\varepsilon$ are independent whenever $AB = 0$.
Since $H - \tfrac{1}{n}J$ and $I - H$ are both symmetric and idempotent, with ranks $p-1$ and $n-p$ respectively, and since $\bigl(H - \tfrac{1}{n}J\bigr)(I - H) = 0$, we have (noting that under the null the constant part of $y$ is annihilated by both matrices, so $y$ may be replaced by $\varepsilon$ in the quadratic forms)

$$\frac{\text{TSS} - \text{SSR}}{\sigma^2} \sim \chi^2_{p-1} \quad\text{independently of}\quad \frac{\text{SSR}}{\sigma^2} \sim \chi^2_{n-p},$$

and therefore

$$F = \frac{(\text{TSS} - \text{SSR})/(p-1)}{\text{SSR}/(n-p)} \sim F_{p-1,\,n-p}.$$
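A small numerical illustration of these facts (again not a proof; the dimensions are arbitrary): the quadratic forms reproduce TSS, SSR and their difference, and the two projection matrices are orthogonal to each other, which is what delivers the independence.

set.seed(2)
n <- 30; p <- 4                                   # p includes the constant
X <- cbind(1, matrix(rnorm(n * (p - 1)), ncol = p - 1))
y <- rnorm(n)                                     # simulate under the null (no regressor matters)
H <- X %*% solve(t(X) %*% X) %*% t(X)             # hat matrix
J <- matrix(1, n, n)                              # matrix of all ones
In <- diag(n)
TSS <- drop(t(y) %*% (In - J/n) %*% y)
SSR <- drop(t(y) %*% (In - H) %*% y)
all.equal(TSS, sum((y - mean(y))^2))              # TRUE
all.equal(SSR, sum(residuals(lm(y ~ X - 1))^2))   # TRUE
all.equal(TSS - SSR, drop(t(y) %*% (H - J/n) %*% y))  # TRUE
max(abs((H - J/n) %*% (In - H)))                  # essentially zero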