Why is the sample covariance matrix singular when the sample size is smaller than the number of variables?

30

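Let's say I have a $p$-dimensional multivariate Gaussian distribution. I take $n$ observations (each of them a $p$-vector) from this distribution and compute the sample covariance matrix $S$. In this paper, the authors state that the sample covariance matrix computed with $p > n$ is singular.

How is this true, and how is it derived?
Are there any explanations?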

covariance-matrix linear-algebra

— user34790

4

Note that this is true regardless of the underlying distribution; it does not have to be Gaussian.

— amoeba says Reinstate Monica

22

Some facts about matrix rank, offered without proof (proofs of all or almost all of them should either be given in standard linear algebra texts, or in some cases be set as exercises after enough information has been given to do them):

If $A$ and $B$ are two conformable matrices, then:

(i) column rank of $A$ = row rank of $A$

(ii) $\text{rank}(A) = \text{rank}(A^T) = \text{rank}(A^TA) = \text{rank}(AA^T)$

(iii) $\text{rank}(AB)\leq \min(\text{rank}(A),\text{rank}(B))$

(iv) $\text{rank}(A+B) \leq \text{rank}(A) + \text{rank}(B)$

(v) if $B$ is a square matrix of full rank, then $\text{rank}(AB) = \text{rank}(A)$

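Consider the $n\times p$ matrix of sample data, $y$. From the above, the rank of $y$ is at most $\min(n,p)$.

Furthermore, from the above, the rank of $S$ clearly cannot be larger than the rank of $y$ (consider the computation of $S$ in matrix form, perhaps with some simplification).

Hence, if $n<p$ then $\text{rank}(y)<p$, in which case $\text{rank}(S)<p$.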
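The chain of rank facts above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the sizes `n = 5` and `p = 8`, the random seed, and the tolerance `tol` are illustrative choices, not part of the answer. It uses the matrix form $S = y_c^T y_c/(n-1)$, where $y_c$ is the column-centered data, so that fact (ii) gives $\text{rank}(S) = \text{rank}(y_c)$ and fact (iii) gives $\text{rank}(y_c) \le \text{rank}(y)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 8                      # illustrative sizes with n < p

y = rng.standard_normal((n, p))  # n x p data matrix, one observation per row
yc = y - y.mean(axis=0)          # column-centered data
S = yc.T @ yc / (n - 1)          # unbiased sample covariance, p x p

tol = 1e-10                      # assumed threshold for "numerically zero"
rank_y = np.linalg.matrix_rank(y, tol=tol)
rank_yc = np.linalg.matrix_rank(yc, tol=tol)
rank_S = np.linalg.matrix_rank(S, tol=tol)

# rank(S) equals rank(yc), which is bounded by rank(y) <= min(n, p) < p
print(rank_y, rank_yc, rank_S, p)
```

An explicit tolerance is passed to `np.linalg.matrix_rank` because the zero singular values of $S$ are only zero up to floating-point error.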

— Glen_b -Reinstate Monica

Nice answer! However, it is not entirely clear how y and S relate to A and B.

— Matifou

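S is calculated from y (the "x" in the original post). Using the facts about y and the operations performed on it (via the rules above), you can obtain a bound on the rank of S. The roles played by A and B change from step to step.

— Glen_b -Reinstate Monica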

14

The short answer to your question is that $\text{rank}(S) \le n - 1$. So if $p > n$, then $S$ is singular.

For a more detailed answer, recall that the (unbiased) sample covariance matrix can be written as

$S = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T.$

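Effectively, we are summing $n$ matrices, each of rank 1. Assuming the observations are linearly independent, in some sense each observation $x_i$ contributes 1 to $\text{rank}(S)$, and 1 is subtracted from the rank (if $p > n$) because we center each observation by $\bar{x}$. However, if multicollinearity is present in the observations, then $\text{rank}(S)$ may be reduced further, which explains why the rank might be less than $n - 1$.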
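The sum-of-rank-1 view can be illustrated numerically. This is a minimal sketch, assuming NumPy; the sizes, seed, and tolerance are arbitrary illustrative values. Each outer product $(x_i - \bar{x})(x_i - \bar{x})^T$ has rank 1, and because the centered observations sum to the zero vector, at most $n - 1$ of them can be linearly independent.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 4, 6                          # illustrative sizes with p > n

x = rng.standard_normal((n, p))      # rows are the observations x_i
d = x - x.mean(axis=0)               # centered observations x_i - xbar

# One rank-1 matrix per observation
terms = [np.outer(di, di) for di in d]
term_ranks = [np.linalg.matrix_rank(t, tol=1e-10) for t in terms]

S = sum(terms) / (n - 1)             # sample covariance as a sum of outer products

deviation_sum = d.sum(axis=0)        # numerically the zero vector
rank_S = np.linalg.matrix_rank(S, tol=1e-10)

print(term_ranks, rank_S, n - 1)
```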

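A large amount of work has gone into studying this problem. For instance, a colleague of mine and I wrote a paper on this same topic, in which we were interested in determining how to proceed if $S$ is singular when it is applied to linear discriminant analysis in the $p \gg n$ setting.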

— ramhiser

4

Could you please elaborate on why 1 is subtracted because each observation is centered by $\bar x$?

— avocado14

@loganecolss: See "Why is the rank of the covariance matrix at most $n-1$?" for an answer to your question.

— amoeba says Reinstate Monica

Nice answer! Maybe you could add an explanation or a link for the statement that we are summing $n$ matrices, each of rank 1? Thanks!

— Matifou

10

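When you look at the situation the right way, the conclusion is intuitively obvious and immediate.

This post offers two demonstrations. The first, immediately below, is in words. It is equivalent to a simple drawing that appears at the very end. In between is an explanation of what the words and the drawing mean.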

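The covariance matrix for $n$ $p$-variate observations is a $p\times p$ matrix computed by left-multiplying a matrix $\mathbb{X}_{np}$ (the recentered data) by its transpose $\mathbb{X}_{pn}^\prime$. This product of matrices sends vectors through a pipeline of vector spaces whose dimensions are $p$ and $n$. Consequently the covariance matrix, qua linear transformation, sends $\mathbb{R}^p$ into a subspace whose dimension is at most $\min(p,n)$. It is immediate that the rank of the covariance matrix is no greater than $\min(p,n)$. Consequently, if $p\gt n$ then the rank is at most $n$, which, being strictly smaller than $p$, means the covariance matrix is singular.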

All of this terminology is fully explained in the remainder of this post.

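(As Amoeba kindly pointed out in a now-deleted comment, and shows in an answer to a related question, the image of $\mathbb X$ actually lies in a codimension-one subspace of $\mathbb{R}^n$ (consisting of vectors whose components sum to zero) because its columns have all been recentered at zero. Therefore the rank of the sample covariance matrix $\frac{1}{n-1}\mathbb{X}^\prime \mathbb{X}$ cannot exceed $n-1$.)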
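The codimension-one claim can be spelled out in one line. The following is a sketch in which $\mathbf{1}_n$ (the $n$-vector of ones) and the row notation $x_i$, $\bar{x}$ for the observations and their mean are introduced here for illustration. Because the rows of $\mathbb{X}$ are the centered observations,

$$\mathbf{1}_n^\prime\, \mathbb{X} = \sum_{i=1}^{n} (x_i - \bar{x})^\prime = 0,$$

so every column of $\mathbb{X}$ is orthogonal to $\mathbf{1}_n$. The image of $\mathbb{X}$ is therefore contained in $\{v \in \mathbb{R}^n : \mathbf{1}_n^\prime v = 0\}$, which has dimension $n-1$, and

$$\text{rank}\left(\tfrac{1}{n-1}\mathbb{X}^\prime \mathbb{X}\right) = \text{rank}(\mathbb{X}) \le n-1.$$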

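Linear algebra is all about tracking the dimensions of vector spaces. You only need to appreciate a few fundamental concepts to have a deep intuition for assertions about rank and singularity: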

1. Matrix multiplication represents linear transformations of vectors. An $m\times n$ matrix $\mathbb{M}$ represents a linear transformation from an $n$-dimensional space $V^n$ to an $m$-dimensional space $V^m$. Specifically, it sends any $x\in V^n$ to $\mathbb{M}x = y \in V^m$. That this is a linear transformation follows immediately from the definition of linear transformation and basic arithmetical properties of matrix multiplication.

2. Linear transformations can never increase dimensions. This means that the image of the entire vector space $V^n$ under the transformation $\mathbb M$ (which is a sub-vector space of $V^m$) can have a dimension no greater than $n$. This is an (easy) theorem that follows from the definition of dimension.

3. The dimension of any sub-vector space cannot exceed that of the space in which it lies. This is a theorem, but again it is obvious and easy to prove.

4. The rank of a linear transformation is the dimension of its image. The rank of a matrix is the rank of the linear transformation it represents. These are definitions.

5. A singular matrix $\mathbb{M}_{mn}$ has rank strictly less than $n$ (the dimension of its domain). In other words, its image has a smaller dimension. This is a definition.

To develop intuition, it helps to see the dimensions. I will therefore write the dimensions of all vectors and matrices immediately after them, as in $\mathbb{M}_{mn}$ and $x_n$ . Thus the generic formula

$y_m = \mathbb{M}_{mn} x_n$

is intended to mean that the $m\times n$ matrix $\mathbb M$ , when applied to the $n$ -vector $x$ , produces an $m$ -vector $y$ .

Products of matrices can be thought of as a "pipeline" of linear transformations. Generically, suppose $y_a$ is an $a$ -dimensional vector resulting from the successive applications of the linear transformations $\mathbb{M}_{mn}, \mathbb{L}_{lm}, \ldots, \mathbb{B}_{bc},$ and $\mathbb{A}_{ab}$ to the $n$ -vector $x_n$ coming from the space $V^n$ . This takes the vector $x_n$ successively through a set of vector spaces of dimensions $m, l, \ldots, c, b,$ and finally $a$ .

Look for the bottleneck: because dimensions cannot increase (point 2) and subspaces cannot have dimensions larger than the spaces in which they lie (point 3), it follows that the dimension of the image of $V^n$ cannot exceed the smallest dimension $\min(a,b,c,\ldots,l,m,n)$ encountered in the pipeline.
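The bottleneck argument can be tried out on random matrices. This is a minimal sketch, assuming NumPy; the list `dims` of space dimensions is a hypothetical example, chosen so that the narrowest space in the pipeline has dimension 2. For generic random matrices one would expect the rank of the product to equal the bottleneck, but only the upper bound is guaranteed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical pipeline of spaces: V^7 -> V^5 -> V^2 -> V^6 -> V^4
dims = [7, 5, 2, 6, 4]

# mats[k] maps a dims[k]-vector to a dims[k+1]-vector
mats = [rng.standard_normal((dims[k + 1], dims[k]))
        for k in range(len(dims) - 1)]

# Compose the transformations: each new matrix multiplies on the left
product = np.eye(dims[0])
for M in mats:
    product = M @ product

rank = np.linalg.matrix_rank(product, tol=1e-10)
bottleneck = min(dims)

# The rank of the composite map cannot exceed the smallest dimension
print(product.shape, rank, bottleneck)
```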

This diagram of the pipeline, then, fully proves the result when it is applied to the product $\mathbb{X}^\prime \mathbb{X}$ :
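In symbols, a pipeline of this kind can be sketched as follows. This is a reconstruction from the dimensions used above, not the author's drawing, and it assumes the covariance matrix is read as a map from $\mathbb{R}^p$ to $\mathbb{R}^p$:

$$\mathbb{R}^p \xrightarrow{\ \mathbb{X}_{np}\ } \mathbb{R}^n \xrightarrow{\ \mathbb{X}^\prime_{pn}\ } \mathbb{R}^p$$

The narrowest space in the chain has dimension $\min(p,n)$, so when $p \gt n$ the image of $\mathbb{X}^\prime \mathbb{X}$ has dimension at most $n \lt p$.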

— whuber