Prove the relation between Mahalanobis distance and leverage?

I've seen a formula on Wikipedia that relates Mahalanobis distance and leverage:

Mahalanobis distance is closely related to the leverage statistic $h$, but has a different scale:
$$D^2 = (N - 1)\left(h - \tfrac{1}{N}\right).$$

In a linked article, Wikipedia describes $h$ in these terms:

In the linear regression model, the leverage score for the $i^{th}$ data unit is defined as
$$h_{ii}=(H)_{ii},$$
the $i^{th}$ diagonal element of the hat matrix $H=X(X^{\top}X)^{-1}X^{\top}$, where $^{\top}$ denotes the matrix transpose.

I can't find a proof anywhere. I tried to start from the definitions, but I can't make any progress. Can anyone give a hint?

— dave2d

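My description of the Mahalanobis distance at "Bottom to top explanation of the Mahalanobis distance?" includes two key results:

1. By definition, it does not change when the regressors are uniformly shifted.
2. The squared Mahalanobis distance between vectors $x$ and $y$ is given by
   $$D^2(x,y) = (x-y)^\prime \Sigma^{-1}(x-y),$$
   where $\Sigma$ is the covariance of the data.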
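Both results can be checked numerically. The following is a minimal sketch, assuming three regressors taken from the built-in `mtcars` data and an arbitrary shift vector; neither choice comes from the original answer.

```r
# Illustrative data (an assumption): three regressors from the built-in mtcars data
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
center <- colMeans(x)
Sigma <- cov(x)

# Result 2: squared distance of one observation from the centroid, by the formula
diff1 <- x[1, ] - center
d2_formula <- as.numeric(t(diff1) %*% solve(Sigma) %*% diff1)
d2_builtin <- unname(mahalanobis(x, center, Sigma)[1])
print(all.equal(d2_formula, d2_builtin))

# Result 1: shift every regressor by an arbitrary constant and recompute
x_shifted <- sweep(x, 2, c(10, -50, 1000), "+")
d2 <- mahalanobis(x, center, Sigma)
d2_shifted <- mahalanobis(x_shifted, colMeans(x_shifted), cov(x_shifted))
print(all.equal(d2, d2_shifted))
```

The shift moves the centroid by the same amount and leaves the covariance unchanged, so the distances from the centroid should agree up to rounding.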

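The first result allows us to assume the means of the regressors are all zero. It remains to compute $h_i$. However, for the claim to be true, we need to add one more assumption:

The model must include an intercept.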
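To see why the intercept matters, one can compute the leverages with and without the column of ones and compare $(n-1)(h - 1/n)$ with the squared Mahalanobis distances in each case. This is only an illustrative sketch; the choice of `mtcars` columns is an assumption.

```r
# Illustrative data (an assumption): three regressors from the built-in mtcars data
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
n <- nrow(x)

# Squared Mahalanobis distances from the centroid
d2 <- unname(mahalanobis(x, colMeans(x), cov(x)))

# Leverages from a model with, and without, an intercept
h_with <- hat(x, intercept = TRUE)
h_without <- hat(x, intercept = FALSE)

# Does D^2 = (n - 1) * (h - 1/n) hold in each case?
holds_with <- isTRUE(all.equal(d2, (n - 1) * (h_with - 1 / n)))
holds_without <- isTRUE(all.equal(d2, (n - 1) * (h_without - 1 / n)))
print(c(with_intercept = holds_with, without_intercept = holds_without))
```

Without the intercept the leverages sum to $k$ rather than $k+1$, so the two sides cannot agree for every observation.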

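Allowing for this, let there be $k \ge 0$ regressors and $n$ observations, and write the value of regressor $j$ for observation $i$ as $x_{ij}$. Write the column vector of these $n$ values for regressor $j$ as $\mathbf{x}_{,j}$ and the row vector of these $k$ values for observation $i$ as $\mathbf{x}_i$. Then the model matrix is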

$$X = \pmatrix{ 1 &x_{11} &\cdots &x_{1k} \\ 1 &x_{21} &\cdots &x_{2k} \\ \vdots &\vdots &\vdots &\vdots \\ 1 &x_{n1} &\cdots &x_{nk}}$$

and, by definition, the hat matrix is

$$H = X(X^\prime X)^{-1} X^\prime,$$

Its entry $i$ along the diagonal is

$$h_i = h_{ii} = (1; \mathbf{x}_i) (X^\prime X)^{-1} (1; \mathbf{x}_i)^\prime.\tag{1}$$
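Formula $(1)$ can be evaluated directly and compared with R's built-in leverage function. The sketch below is illustrative only: the `mtcars` columns and the observation index `i` are assumptions, not part of the original answer.

```r
# Illustrative data (an assumption): three regressors from the built-in mtcars data
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
X <- cbind(1, x)                  # model matrix with a leading column of ones
XtX_inv <- solve(crossprod(X))    # (X'X)^{-1}

# Formula (1) for a single observation (index chosen arbitrarily)
i <- 5
row_i <- X[i, ]
h_i <- as.numeric(t(row_i) %*% XtX_inv %*% row_i)

# Diagonal of the full hat matrix, and the built-in computation
H <- X %*% XtX_inv %*% t(X)
h_diag <- unname(diag(H))
h_builtin <- hat(x, intercept = TRUE)

print(all.equal(h_i, h_diag[i]))
print(all.equal(h_diag, h_builtin))
```

Forming $H$ explicitly is fine for a small example like this; `hat()` works from a QR decomposition instead, which is the more stable route for larger or ill-conditioned problems.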

There's nothing for it but to work out that central matrix inverse, but by virtue of the first key result it's easy, especially when we write it in block-matrix form:

$$X^\prime X = n\pmatrix{1 & \mathbf{0}^\prime \\ \mathbf{0} & C}$$

where $\mathbf{0} = (0,0,\ldots,0)^\prime$ and

$$C_{jk} = \frac{1}{n} \sum_{i=1}^n x_{ij} x_{ik} = \frac{n-1}{n}\operatorname{Cov}(\mathbf{x}_j, \mathbf{x}_k) = \frac{n-1}{n}\Sigma_{jk}.$$

(I have written $\Sigma$ for the sample covariance matrix of the regressors.) Because this is block diagonal, its inverse can be found simply by inverting the blocks:

$$(X^\prime X)^{-1} = \frac{1}{n}\pmatrix{1 & \mathbf{0}^\prime \\ \mathbf{0} & C^{-1}} = \pmatrix{\frac{1}{n} & \mathbf{0}^\prime \\ \mathbf{0} & \frac{1}{n-1}\Sigma^{-1}}.$$
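The block-diagonal structure and its inverse can be confirmed numerically once the regressors are centered. This is a hedged sketch; the `mtcars` columns are an assumption used only for illustration.

```r
# Illustrative data (an assumption): three regressors from the built-in mtcars data
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
n <- nrow(x)
k <- ncol(x)

# Center the regressors so that their means are zero, then add the column of ones
xc <- scale(x, center = TRUE, scale = FALSE)
X <- cbind(1, xc)
XtX <- crossprod(X)

# The off-diagonal blocks of X'X vanish (up to rounding) after centering
print(max(abs(XtX[1, -1])))

# Build the claimed inverse: 1/n in the corner, Sigma^{-1} / (n - 1) in the lower block
Sigma <- cov(x)
claimed_inv <- matrix(0, k + 1, k + 1)
claimed_inv[1, 1] <- 1 / n
claimed_inv[-1, -1] <- solve(Sigma) / (n - 1)

print(all.equal(unname(solve(XtX)), claimed_inv))
```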

From the definition $(1)$ we obtain

$$\eqalign{ h_i &= (1; \mathbf{x}_i) \pmatrix{\frac{1}{n} & \mathbf{0}^\prime \\ \mathbf{0} & \frac{1}{n-1}\Sigma^{-1}}(1; \mathbf{x}_i)^\prime \\ &=\frac{1}{n} + \frac{1}{n-1}\mathbf{x}_i \Sigma^{-1}\mathbf{x}_i^\prime \\ &=\frac{1}{n} + \frac{1}{n-1} D^2(\mathbf{x}_i, \mathbf{0}). }$$

Solving for the squared Mahalanobis length $D_i^2 = D^2(\mathbf{x}_i, \mathbf{0})$ yields

$$D_i^2 = (n-1)\left(h_i - \frac{1}{n}\right),$$

QED.

Looking back, we may trace the additive term $1/n$ to the presence of an intercept, which introduced the column of ones into the model matrix $X$ . The multiplicative term $n-1$ appeared after assuming the Mahalanobis distance would be computed using the sample covariance estimate (which divides the sums of squares and products by $n-1$ ) rather than the covariance matrix of the data (which divides the sum of squares and products by $n$ ).

The chief value of this analysis is to impart a geometric interpretation to the leverage, which measures how much a unit change in the response at observation $i$ will change the fitted value at that observation: high-leverage observations are at large Mahalanobis distances from the centroid of the regressors, exactly as a mechanically efficient lever operates at a large distance from its fulcrum.
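The interpretation of leverage as the sensitivity of a fitted value to its own response can be illustrated by perturbing one response and refitting. The following is a minimal sketch; the model formula, the data set, and the observation index `i` are assumptions made for illustration.

```r
# Illustrative model (an assumption): mpg regressed on three mtcars variables
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
h <- hatvalues(fit)

# Increase the response at one observation by one unit and refit
i <- 17
bumped <- mtcars
bumped$mpg[i] <- bumped$mpg[i] + 1
fit_bumped <- lm(mpg ~ wt + hp + disp, data = bumped)

# The change in the fitted value at observation i should equal its leverage
change_i <- unname(fitted(fit_bumped)[i] - fitted(fit)[i])
print(all.equal(change_i, unname(h[i])))
```

Because the fitted values are $Hy$, a unit change in $y_i$ moves the fitted value at $i$ by exactly $h_{ii}$, which is what the comparison checks.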

R code to show that the relation indeed holds:

x <- mtcars

# Compute leverages (with an intercept) and squared Mahalanobis distances
h <- hat(x, intercept = TRUE); names(h) <- rownames(mtcars)
M <- mahalanobis(x, colMeans(x), cov(x))

# Compute D^2 of the question
n <- nrow(x); D2 <- (n-1)*(h - 1/n)

# Compare.
all.equal(M, D2)               # TRUE
print(signif(cbind(M, D2), 3))

— whuber

Excellent answer, very well rounded with rigor and intuition. Cheers!

— cgrudz

Thanks for the post @whuber! As a sanity check, here is R code to show that the relation indeed holds: `x <- mtcars; rownames(x) <- NULL; colnames(x) <- NULL; n <- nrow(x); h <- hat(x, T); mahalanobis(x, colMeans(x), cov(x)); (n-1)*(h - 1/n); all.equal(mahalanobis(x, colMeans(x), cov(x)), (n-1)*(h - 1/n))`

— Tal Galili

@Tal I didn't think I needed a sanity check--but thank you for the code. :-) I have made modifications to clarify it and its output a little.

— whuber

@whuber, I wanted an example that shows how to make the equality work (making clear to me that I got the assumptions right). I've also extended the relevant Wiki entry: en.wikipedia.org/wiki/… (feel free to also expand on it there, as you see fit :) )

— Tal Galili