Proof that the F statistic follows an F distribution


20

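In light of this question: Proof that the coefficients in an OLS model follow a t-distribution with (n-k) degrees of freedom

I would like to understand why

$$F = \frac{(\text{TSS}-\text{RSS})/(p-1)}{\text{RSS}/(n-p)},$$

where $p$ is the number of model parameters, $n$ the number of observations, $\text{TSS}$ the total variance and $\text{RSS}$ the residual variance, follows an $F_{p-1,n-p}$ distribution.

I must admit that I have not even attempted to prove it, as I would not know where to start.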


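Christoph Hanck and Francis have already given very good answers. If you still have difficulty understanding the proof of the F test for linear regression, try checking out teamdable.github.io/techblog/…. I wrote a blog post about the proof of the F test for linear regression. It is written in Korean, but that should not be a problem because almost all of it is mathematical formulas.
Taeho Oh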

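While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review
mkt - Reinstate Monica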

Answers:


19

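Let us show the result for the general case, of which your formula for the test statistic is a special case. In general, we need to verify that the statistic can, according to the characterization of the $F$ distribution, be written as the ratio of independent $\chi^2$ r.v.s, each divided by its degrees of freedom.

Let $H_0: R'\beta = r$ with $R$ and $r$ known and nonrandom, and $R: k\times q$ of full column rank $q$. This represents $q$ linear restrictions on (unlike in OP's notation) $k$ regressors including the constant term. So, in @user1627466's example, $p-1$ corresponds to the $q=k-1$ restrictions of setting all slope coefficients to zero.

In view of $Var(\hat\beta_{\text{ols}})=\sigma^2(X'X)^{-1}$, we have

$$R'(\hat\beta_{\text{ols}}-\beta)\sim N\left(0,\sigma^2R'(X'X)^{-1}R\right),$$
so that (with $B^{-1/2}=\{R'(X'X)^{-1}R\}^{-1/2}$ being a "matrix square root" of $B^{-1}=\{R'(X'X)^{-1}R\}^{-1}$, obtained via, e.g., a Cholesky decomposition)
$$n:=\frac{B^{-1/2}}{\sigma}R'(\hat\beta_{\text{ols}}-\beta)\sim N(0,I_q),$$
as
$$\begin{aligned}
Var(n)&=\frac{B^{-1/2}}{\sigma}R'\,Var(\hat\beta_{\text{ols}})\,R\,\frac{B^{-1/2}}{\sigma}\\
&=\frac{B^{-1/2}}{\sigma}\sigma^2B\frac{B^{-1/2}}{\sigma}=I
\end{aligned}$$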
where the second line uses the variance of the OLSE.
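A small numerical check may make the "matrix square root" step more concrete. The following is only a sketch under assumptions that are not in the answer: a simulated design matrix `X`, a restriction matrix `R` that picks the last `q` coefficients, and the variable name `B_inv_half`. It builds one square root of $B^{-1}$ from the Cholesky factor of $B$ and confirms that sandwiching $B$ between it and its transpose gives the identity.

```r
set.seed(1)
n <- 50; k <- 4; q <- 2

# Hypothetical design: a constant plus k - 1 simulated regressors
X <- cbind(1, matrix(rnorm(n * (k - 1)), ncol = k - 1))

# k x q restriction matrix: R'beta selects the last q coefficients
R <- rbind(matrix(0, k - q, q), diag(q))

# B = R'(X'X)^{-1}R, a q x q positive definite matrix
B <- t(R) %*% solve(crossprod(X)) %*% R

# chol() returns upper-triangular U with B = U'U,
# so S = (U')^{-1} satisfies S'S = B^{-1} and S B S' = I
U <- chol(B)
B_inv_half <- solve(t(U))

round(B_inv_half %*% B %*% t(B_inv_half), 10)  # identity matrix of size q
```

With a Cholesky-based root the factor is not symmetric, so the second factor in the sandwich is the transpose of the first; with a symmetric square root the two coincide.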

This, as shown in the answer that you link to (see also here), is independent of

$$d:=(n-k)\frac{\hat\sigma^2}{\sigma^2}\sim\chi^2_{n-k},$$
where $\hat\sigma^2=y'M_Xy/(n-k)$ is the usual unbiased error variance estimate, and $M_X=I-X(X'X)^{-1}X'$ is the "residual maker matrix" from regressing on $X$.

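So, as $n'n$ is a quadratic form in normals,

$$\frac{\overbrace{n'n}^{\sim\chi^2_q}/q}{d/(n-k)}=\frac{(\hat\beta_{\text{ols}}-\beta)'R\{R'(X'X)^{-1}R\}^{-1}R'(\hat\beta_{\text{ols}}-\beta)/q}{\hat\sigma^2}\sim F_{q,n-k}.$$
In particular, under $H_0: R'\beta=r$, this reduces to the statistic
$$F=\frac{(R'\hat\beta_{\text{ols}}-r)'\{R'(X'X)^{-1}R\}^{-1}(R'\hat\beta_{\text{ols}}-r)/q}{\hat\sigma^2}\sim F_{q,n-k}.$$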
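To connect the formula to software output, here is a minimal sketch that evaluates the statistic by hand for the null that all slopes are zero and compares it with the F statistic reported by `summary(lm(...))`. The simulated data and all variable names (`F_wald`, `F_lm`, and so on) are illustrative assumptions, not part of the answer.

```r
set.seed(42)
n <- 100; k <- 4                       # k regressors including the constant
X <- cbind(1, matrix(rnorm(n * (k - 1)), ncol = k - 1))
y <- rnorm(n)                          # y is unrelated to X, so the null holds

q <- k - 1
R <- rbind(0, diag(q))                 # k x q, so R'beta picks the q slopes
r <- rep(0, q)

XtX_inv    <- solve(crossprod(X))
beta_hat   <- XtX_inv %*% crossprod(X, y)          # OLS estimate
resid      <- y - X %*% beta_hat
sigma2_hat <- sum(resid^2) / (n - k)               # unbiased error variance estimate

# (R'b - r)' {R'(X'X)^{-1}R}^{-1} (R'b - r) / q / sigma2_hat
dev    <- t(R) %*% beta_hat - r
F_wald <- drop(t(dev) %*% solve(t(R) %*% XtX_inv %*% R) %*% dev) / q / sigma2_hat

# Overall F statistic as reported by lm (constant added by the formula)
F_lm <- unname(summary(lm(y ~ X[, -1]))$fstatistic["value"])

c(by_hand = F_wald, lm = F_lm)
p_value <- pf(F_wald, df1 = q, df2 = n - k, lower.tail = FALSE)
```

The two numbers should agree up to floating-point error, and `p_value` uses the $F_{q,n-k}$ reference distribution derived above.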

For illustration, consider the special case $R'=I$, $r=0$, $q=2$, $\hat\sigma^2=1$ and $X'X=I$. Then,

$$F=\hat\beta_{\text{ols}}'\hat\beta_{\text{ols}}/2=\frac{\hat\beta_{\text{ols},1}^2+\hat\beta_{\text{ols},2}^2}{2},$$
the squared Euclidean distance of the OLS estimate from the origin standardized by the number of elements - highlighting that, since $\hat\beta_{\text{ols},1}^2$ and $\hat\beta_{\text{ols},2}^2$ are squared standard normals and hence $\chi^2_1$, the $F$ distribution may be seen as an "average $\chi^2$" distribution.

In case you prefer a little simulation (which is of course not a proof!), here is one in which we test the null that none of the $k$ regressors matter - which they indeed do not, so that we simulate the null distribution.

[Figure: histogram of the simulated F statistics with the theoretical F density overlaid]

We see very good agreement between the theoretical density and the histogram of the Monte Carlo test statistics.

library(lmtest)  # provides waldtest()

n <- 100
reps <- 20000
sloperegs <- 5 # number of slope regressors, q or k-1 (minus the constant) in the above notation
# 5% critical value for the null that none of the slope regressors matter
critical.value <- qf(p = .95, df1 = sloperegs, df2 = n - sloperegs - 1)

Fstat <- numeric(reps)
for (i in seq_len(reps)) {
  y <- rnorm(n)                                        # y is unrelated to X, so the null holds
  X <- matrix(rnorm(n * sloperegs), ncol = sloperegs)
  reg <- lm(y ~ X)
  Fstat[i] <- waldtest(reg, test = "F")$F[2]           # F statistic against the intercept-only model
}

mean(Fstat > critical.value) # very close to 0.05

# histogram of the simulated statistics with the theoretical F density on top
hist(Fstat, breaks = 60, col = "lightblue", freq = FALSE, xlim = c(0, 4))
x <- seq(0, 6, by = .1)
lines(x, df(x, df1 = sloperegs, df2 = n - sloperegs - 1), lwd = 2, col = "purple")

To see that the versions of the test statistics in the question and the answer are indeed equivalent, note that the null corresponds to the restrictions $R'=[0\;\;I]$ and $r=0$.

Let $X=[X_1\;\;X_2]$ be partitioned according to which coefficients are restricted to be zero under the null (in your case, all but the constant, but the derivation to follow is general). Also, let $\hat\beta_{\text{ols}}=(\hat\beta_{\text{ols},1}',\hat\beta_{\text{ols},2}')'$ be the suitably partitioned OLS estimate.

Then,

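$$R'\hat\beta_{\text{ols}}=\hat\beta_{\text{ols},2}$$
and
$$R'(X'X)^{-1}R\equiv\tilde D,$$
the lower right block of
$$(X'X)^{-1}=\begin{pmatrix}X_1'X_1&X_1'X_2\\X_2'X_1&X_2'X_2\end{pmatrix}^{-1}\equiv\begin{pmatrix}\tilde A&\tilde B\\\tilde C&\tilde D\end{pmatrix}$$
Now, use results for partitioned inverses to obtain
$$\tilde D=\left(X_2'X_2-X_2'X_1(X_1'X_1)^{-1}X_1'X_2\right)^{-1}=(X_2'M_{X_1}X_2)^{-1}$$
where $M_{X_1}=I-X_1(X_1'X_1)^{-1}X_1'$.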
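The partitioned-inverse identity is easy to check numerically. The sketch below assumes a simulated design in which `X1` is just the constant and `X2` holds three regressors; these choices and the names `D_block` and `D_fwl` are hypothetical.

```r
set.seed(7)
n  <- 60
X1 <- cbind(rep(1, n))                  # unrestricted block: the constant
X2 <- matrix(rnorm(n * 3), ncol = 3)    # restricted block: three slopes
X  <- cbind(X1, X2)

# Residual maker for a regression on X1
M_X1 <- diag(n) - X1 %*% solve(crossprod(X1)) %*% t(X1)

D_block <- solve(crossprod(X))[-1, -1]  # lower-right block of (X'X)^{-1}
D_fwl   <- solve(t(X2) %*% M_X1 %*% X2) # (X2' M_X1 X2)^{-1}

max(abs(D_block - D_fwl))               # numerically zero
```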

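Thus, the numerator of the $F$ statistic becomes (without the division by $q$)

$$F_{num}=\hat\beta_{\text{ols},2}'(X_2'M_{X_1}X_2)\hat\beta_{\text{ols},2}$$
Next, recall that by the Frisch-Waugh-Lovell theorem we may write
$$\hat\beta_{\text{ols},2}=(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y$$
so that
$$\begin{aligned}
F_{num}&=y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}(X_2'M_{X_1}X_2)(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y\\
&=y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y
\end{aligned}$$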
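As an illustrative check of the Frisch-Waugh-Lovell step, the following sketch compares the slope estimates from the full regression with the partialled-out formula and evaluates both forms of the numerator. The simulated data and the names `beta2_full`, `beta2_fwl`, `F_num_beta` and `F_num_y` are assumptions made for this example.

```r
set.seed(11)
n  <- 80
X1 <- cbind(rep(1, n))                  # constant
X2 <- matrix(rnorm(n * 2), ncol = 2)    # two slope regressors
y  <- rnorm(n)

M_X1 <- diag(n) - X1 %*% solve(crossprod(X1)) %*% t(X1)
A    <- t(X2) %*% M_X1 %*% X2           # X2' M_X1 X2

# Slopes from the full regression vs. the FWL expression
beta2_full <- unname(coef(lm(y ~ X2))[-1])
beta2_fwl  <- drop(solve(A, t(X2) %*% M_X1 %*% y))

# Numerator written in terms of beta_2 and in terms of y
F_num_beta <- drop(t(beta2_fwl) %*% A %*% beta2_fwl)
F_num_y    <- drop(t(y) %*% M_X1 %*% X2 %*% solve(A) %*% t(X2) %*% M_X1 %*% y)

rbind(beta2_full, beta2_fwl)
c(F_num_beta, F_num_y)
```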

It remains to show that this numerator is identical to $\text{RSSR}-\text{USSR}$, the difference between the restricted and unrestricted sums of squared residuals.

Here,

$$\text{RSSR}=y'M_{X_1}y$$
is the residual sum of squares from regressing $y$ on $X_1$, i.e., with $H_0$ imposed. In your special case, this is just $TSS=\sum_i(y_i-\bar y)^2$, the residual sum of squares of a regression on a constant.

Again using FWL (which also shows that the residuals of the two approaches are identical), we can write $\text{USSR}$ ($\text{RSS}$ in your notation) as the SSR of the regression

$$M_{X_1}y\quad\text{on}\quad M_{X_1}X_2$$

That is,

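$$\begin{aligned}
\text{USSR}&=y'M_{X_1}'M_{M_{X_1}X_2}M_{X_1}y\\
&=y'M_{X_1}'(I-P_{M_{X_1}X_2})M_{X_1}y\\
&=y'M_{X_1}y-y'M_{X_1}M_{X_1}X_2\left((M_{X_1}X_2)'M_{X_1}X_2\right)^{-1}(M_{X_1}X_2)'M_{X_1}y\\
&=y'M_{X_1}y-y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y
\end{aligned}$$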

Thus,

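$$\begin{aligned}
\text{RSSR}-\text{USSR}&=y'M_{X_1}y-\left(y'M_{X_1}y-y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y\right)\\
&=y'M_{X_1}X_2(X_2'M_{X_1}X_2)^{-1}X_2'M_{X_1}y
\end{aligned}$$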
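The sum-of-squares form can be sketched in R as follows, assuming simulated data and the special case where the restricted model contains only the constant. The names `restricted`, `unrestricted`, `F_ssr` and `F_anova` are illustrative; `deviance()` returns the residual sum of squares of an `lm` fit.

```r
set.seed(3)
n  <- 80
X2 <- matrix(rnorm(n * 2), ncol = 2)
y  <- rnorm(n)

restricted   <- lm(y ~ 1)    # H0 imposed: constant only
unrestricted <- lm(y ~ X2)   # constant plus slopes

RSSR <- deviance(restricted)     # equals TSS in this special case
USSR <- deviance(unrestricted)

q <- ncol(X2)                    # number of restrictions
k <- q + 1                       # regressors including the constant
F_ssr <- ((RSSR - USSR) / q) / (USSR / (n - k))

# The same comparison via the built-in nested-model F test
F_anova <- anova(restricted, unrestricted)$F[2]

c(F_ssr = F_ssr, F_anova = F_anova, TSS = sum((y - mean(y))^2), RSSR = RSSR)
```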


Thanks. I don't know if it's considered hand holding at this point but how do you go from your sum of squared betas to an expression that contains sum of squares?
user1627466

1
@user1627466, I added a derivation of the equivalence of the two formulae.
Christoph Hanck

4

@ChristophHanck has provided a very comprehensive answer; here I will add a sketch of the proof for the special case OP mentioned. Hopefully it's also easier to follow for beginners.

A random variable $Y\sim F_{d_1,d_2}$ if

$$Y=\frac{X_1/d_1}{X_2/d_2},$$
where $X_1\sim\chi^2_{d_1}$ and $X_2\sim\chi^2_{d_2}$ are independent. Thus, to show that the $F$-statistic has an $F$-distribution, we may as well show that $c\,\text{ESS}\sim\chi^2_{p-1}$ and $c\,\text{RSS}\sim\chi^2_{n-p}$ for some constant $c$, and that they are independent.
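For readers who like to see the definition at work, here is a small Monte Carlo sketch (not a proof) that draws independent chi-squared variables, forms the ratio, and compares a few empirical quantiles with `qf()`. The degrees of freedom `d1 = 4` and `d2 = 95` and the number of replications are arbitrary assumptions.

```r
set.seed(123)
d1 <- 4; d2 <- 95; reps <- 1e5

# Ratio of independent chi-squared draws, each divided by its degrees of freedom
Y <- (rchisq(reps, df = d1) / d1) / (rchisq(reps, df = d2) / d2)

probs <- c(.5, .9, .95)
emp   <- quantile(Y, probs)             # empirical quantiles of the ratio
theo  <- qf(probs, df1 = d1, df2 = d2)  # theoretical F quantiles

rbind(empirical = emp, theoretical = theo)
```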

In OLS model we write

$$y=X\beta+\varepsilon,$$
where $X$ is an $n\times p$ matrix, and ideally $\varepsilon\sim N_n(0,\sigma^2I)$. For convenience we introduce the hat matrix $H=X(X^TX)^{-1}X^T$ (note $\hat y=Hy$), and the residual maker $M=I-H$. Important properties of $H$ and $M$ are that they are both symmetric and idempotent. In addition, we have $\operatorname{tr}(H)=p$ and $HX=X$; these will come in handy later.

Let us denote the matrix of all ones as $J$; the sums of squares can then be expressed as quadratic forms:

$$\text{TSS}=y^T\left(I-\frac1nJ\right)y,\quad\text{RSS}=y^TMy,\quad\text{ESS}=y^T\left(H-\frac1nJ\right)y.$$
Note that $M+(H-J/n)+J/n=I$. One can verify that $J/n$ is idempotent and $\operatorname{rank}(M)+\operatorname{rank}(H-J/n)+\operatorname{rank}(J/n)=n$. It follows from this that $H-J/n$ is also idempotent and $M(H-J/n)=0$.
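These matrix properties can be verified numerically. The sketch below assumes a simulated design whose first column is the constant (which is what makes $HJ=J$ hold); the data and the name `E` for $H-J/n$ are hypothetical.

```r
set.seed(5)
n <- 30; p <- 4
X <- cbind(1, matrix(rnorm(n * (p - 1)), ncol = p - 1))  # includes the constant
y <- rnorm(n)

H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix
M <- diag(n) - H                          # residual maker
J <- matrix(1, n, n)                      # matrix of all ones
E <- H - J / n                            # matrix of the ESS quadratic form

TSS <- drop(t(y) %*% (diag(n) - J / n) %*% y)
RSS <- drop(t(y) %*% M %*% y)
ESS <- drop(t(y) %*% E %*% y)

max(abs(E %*% E - E))                     # idempotent: numerically zero
max(abs(M %*% E))                         # M(H - J/n) = 0: numerically zero
c(rank_M = sum(diag(M)), rank_E = sum(diag(E)))  # traces give ranks n - p and p - 1
c(TSS = TSS, ESS_plus_RSS = ESS + RSS)    # decomposition TSS = ESS + RSS
```

For symmetric idempotent matrices the trace equals the rank, which is why the traces are used as ranks here.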

We can now set out to show that F-statistic has F-distribution (search Cochran's theorem for more). Here we need two facts:

  1. Let $x\sim N_n(\mu,\Sigma)$. Suppose $A$ is symmetric with rank $r$ and $A\Sigma$ is idempotent, then $x^TAx\sim\chi^2_r(\mu^TA\mu/2)$, i.e. non-central $\chi^2$ with d.f. $r$ and non-centrality $\mu^TA\mu/2$. This is a special case of Baldessari's result, a proof can also be found here.
  2. Let $x\sim N_n(\mu,\Sigma)$. If $A\Sigma B=0$, then $x^TAx$ and $x^TBx$ are independent. This is known as Craig's theorem.

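Since $y\sim N_n(X\beta,\sigma^2I)$, we have

$$\frac{\text{ESS}}{\sigma^2}=\left(\frac y\sigma\right)^T\left(H-\frac1nJ\right)\frac y\sigma\sim\chi^2_{p-1}\left(\frac{(X\beta)^T\left(H-\frac Jn\right)X\beta}{2\sigma^2}\right).$$
However, under the null hypothesis that all slope coefficients are zero, $X\beta$ is a multiple of the vector of ones $\mathbf 1$ (taking the first column of $X$ to be the constant), and $(H-J/n)\mathbf 1=0$, so the non-centrality vanishes and really $\text{ESS}/\sigma^2\sim\chi^2_{p-1}$. On the other hand, note that $y^TMy=\varepsilon^TM\varepsilon$ since $HX=X$ implies $MX=0$. Therefore $\text{RSS}/\sigma^2\sim\chi^2_{n-p}$. Since $M(H-J/n)=0$, $\text{ESS}/\sigma^2$ and $\text{RSS}/\sigma^2$ are also independent. It immediately follows then that
$$F=\frac{(\text{TSS}-\text{RSS})/(p-1)}{\text{RSS}/(n-p)}=\frac{\frac{\text{ESS}}{\sigma^2}/(p-1)}{\frac{\text{RSS}}{\sigma^2}/(n-p)}\sim F_{p-1,n-p}.$$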
Licensed under cc by-sa 3.0 with attribution required.