
21

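In classical statistics, a statistic $T$ of a data set $y_1, \ldots, y_n$ is defined to be complete for a parameter $\theta$ if it is impossible to form an unbiased estimator of $0$ from it nontrivially. That is, the only way to have $E h(T (y )) = 0$ for all $\theta$ is to have $h$ be $0$ almost surely.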

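Is there an intuition behind this? It seems like a rather mechanical way of defining it. I am aware this has been asked before, but I was wondering whether there is a very easy to understand intuition that would make the material easier for introductory students to digest.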

— user1398057

2

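That is a very good question. I had to dig into it myself. The reason it is such a mechanical definition, and seems to make no intuitive sense to a standard practitioner like me, is that it is mainly used to prove fundamental results in mathematical statistics. In particular, my short search revealed that the Lehmann-Scheffé theorem and Basu's theorem require completeness of the statistic in order to hold. These are contributions from the mid-1950s. I cannot offer you an intuitive explanation, but if you really want to build one, perhaps the associated proofs would help.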

— ジェレミアズK

18

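I will try to add to the other answer. First, completeness is a technical condition which is mainly justified by the theorems that use it. So let us start with some related concepts and theorems.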

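Let $X=(X_1,X_2,\dotsc,X_n)$ represent a vector of iid data, which we model as having a distribution $f(x;\theta), \theta \in \Theta$, where the parameter $\theta$ governing the data is unknown. $T=T(X)$ is sufficient if the conditional distribution of $X \mid T$ does not depend on the parameter $\theta$. $V=V(X)$ is ancillary if the distribution of $V$ does not depend on $\theta$ (within the family $f(x;\theta)$). $U=U(X)$ is an unbiased estimator of zero if its expectation is zero, irrespective of $\theta$. $S=S(X)$ is a complete statistic if any unbiased estimator of zero based on $S$ is identically zero, that is, if $\DeclareMathOperator{\E}{\mathbb{E}} \E g(S)=0 (\text{for all $\theta$})$ then $g(S)=0$ a.e. (for all $\theta$).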
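To make the definition concrete, here is a small worked case that is not part of the original answer: a binomial model, chosen purely as an illustrative assumption. It sketches why the only unbiased estimator of zero based on the count $S$ is zero itself.

```latex
% Assumed example (not from the answer): S ~ Binomial(n, p) with 0 < p < 1.
\begin{align*}
\mathbb{E}_p\, g(S)
  &= \sum_{k=0}^{n} g(k)\binom{n}{k} p^{k}(1-p)^{n-k}
   = (1-p)^{n} \sum_{k=0}^{n} g(k)\binom{n}{k} r^{k},
  \qquad r = \frac{p}{1-p} \in (0,\infty).
\end{align*}
% If the left-hand side is 0 for every p, the polynomial in r on the right
% vanishes on all of (0, \infty), so each coefficient g(k)\binom{n}{k} is 0.
% Hence g(k) = 0 for k = 0, ..., n, and S is complete.
```

Note how the argument needs $p$ to range over a whole interval: if only finitely many values of $p$ were allowed (fewer than $n+1$ of them, say), a nonzero polynomial could vanish at all of them and completeness could fail.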

Now, suppose you have two different unbiased estimators of $\theta$ based on the sufficient statistic $T$, say $g_1(T), g_2(T)$. That is, in symbols, $\E g_1(T)=\theta$, $\E g_2(T)=\theta$ and $\DeclareMathOperator{\P}{\mathbb{P}} \P(g_1(T) \not= g_2(T) ) > 0$ (for all $\theta$). Then $g_1(T)-g_2(T)$ is an unbiased estimator of zero which is not identically zero, proving that $T$ is not complete. So, completeness of a sufficient statistic $T$ gives us that there exists only one unbiased estimator of $\theta$ based on $T$. That is already very close to the Lehmann-Scheffé theorem.

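Let us look at some examples. Suppose $X_1, \dotsc, X_n$ now are iid uniform on the interval $(\theta, \theta+1)$. We can show that (with $X_{(1)} < X_{(2)} < \dotsm < X_{(n)}$ the order statistics) the pair $(X_{(1)}, X_{(n)})$ is sufficient, but it is not complete: the difference $X_{(n)}-X_{(1)}$ is ancillary, so we can compute its expectation, call it $c$ (a function of $n$ only), and then $X_{(n)}-X_{(1)} -c$ is an unbiased estimator of zero which is not identically zero. So our sufficient statistic, in this case, is sufficient but not complete. And we can see what that means: there exist functions of the sufficient statistic which are not informative about $\theta$ (in the context of the model). This cannot happen with a complete sufficient statistic; it is in a sense maximally informative, in that no functions of it are uninformative. On the other hand, if some function of the minimal sufficient statistic has expectation zero, it could be seen as a noise term; disturbance/noise terms in models have expectation zero. So we could say that non-complete sufficient statistics do contain some noise.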
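A quick simulation can make the ancillarity of the range tangible. The sketch below is my own illustration, not part of the answer; the function name `simulate_mean_range` and the chosen values of $\theta$ are hypothetical. It relies on the standard fact that for $n$ iid Uniform$(\theta,\theta+1)$ draws the expected range is $c=(n-1)/(n+1)$.

```python
import random

def simulate_mean_range(theta, n, reps=20000, seed=0):
    """Monte Carlo estimate of E[X_(n) - X_(1)] for n iid Uniform(theta, theta+1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [theta + rng.random() for _ in range(n)]
        total += max(sample) - min(sample)
    return total / reps

n = 5
c = (n - 1) / (n + 1)  # exact expected range in this model; does not involve theta

# Different seeds per theta so the runs are genuinely separate experiments.
for seed, theta in enumerate((-3.0, 0.0, 10.0)):
    estimate = simulate_mean_range(theta, n, seed=seed)
    # The average of R - c should be close to zero whatever theta is.
    print(f"theta={theta:6.1f}  mean(R) - c = {estimate - c:+.4f}")
```

Whatever $\theta$ is used, the average of $R-c$ stays near zero, even though $R-c$ itself is clearly not the zero function. That is exactly the kind of nontrivial unbiased estimator of zero that completeness rules out.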

Look again at the range $R=X_{(n)}-X_{(1)}$ in this example. Since its distribution does not depend on $\theta$, it doesn't by itself contain any information about $\theta$. But, together with the sufficient statistic, it does! How? Look at the case where $R=1$ is observed. Then, in the context of our (known to be true) model, we have perfect knowledge of $\theta$! Namely, we can say with certainty that $\theta = X_{(1)}$. You can check that any other value for $\theta$ then leads to either $X_{(1)}$ or $X_{(n)}$ being an impossible observation under the assumed model. On the other hand, if we observe $R=0.1$, then the range of possible values for $\theta$ is rather large (exercise ...).
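One way to see this is to compute the set of $\theta$ values that are compatible with an observed sample. Every observation must lie in $(\theta,\theta+1)$, which forces $X_{(n)}-1 < \theta < X_{(1)}$, an interval of width $1-R$. The helper below is a minimal sketch of that bookkeeping; `feasible_theta_interval` is a name I made up for illustration.

```python
def feasible_theta_interval(sample):
    """Return (low, high): the theta values consistent with every observation
    lying in (theta, theta + 1). The interval has width 1 - R."""
    x_min, x_max = min(sample), max(sample)
    if x_max - x_min > 1:
        raise ValueError("a range above 1 is impossible under Uniform(theta, theta+1)")
    return x_max - 1, x_min

# A sample with a large range pins theta down tightly ...
wide = feasible_theta_interval([0.02, 0.40, 0.97])
# ... while a sample with a small range leaves theta almost free.
narrow = feasible_theta_interval([0.45, 0.50, 0.55])

print("large R :", wide, "width", wide[1] - wide[0])
print("small R :", narrow, "width", narrow[1] - narrow[0])
```

As $R$ approaches 1 the interval shrinks to the single point $\theta = X_{(1)}$, which is the "perfect knowledge" case described above; with $R=0.1$ the interval has width $0.9$.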

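In this sense, the ancillary statistic $R$ does contain some information about the precision with which we can estimate $\theta$ based on this data and model. In this example, and others, the ancillary statistic $R$ "takes over the role of the sample size". Usually, confidence intervals and such need the sample size $n$, but in this example we can make a conditional confidence interval which is computed using only $R$, not $n$ (exercise). This was an idea of Fisher: that inference should be conditional on some ancillary statistic.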
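Here is one possible way to carry out that exercise; treat it as my own sketch rather than the construction the answer had in mind. From the joint density of $(X_{(1)}, X_{(n)})$ one can check that, given $R=r$, the quantity $X_{(1)}-\theta$ is uniform on $(0, 1-r)$. An equal-tailed conditional interval is then $\left[X_{(1)}-(1-\alpha/2)(1-R),\; X_{(1)}-(\alpha/2)(1-R)\right]$, which involves $R$ but not $n$. The function names below are hypothetical.

```python
import random

def conditional_interval(sample, alpha=0.05):
    """Equal-tailed interval for theta, conditional on the observed range R.
    Assumes (as derived in the lead-in) X_(1) - theta | R ~ Uniform(0, 1 - R)."""
    x_min, x_max = min(sample), max(sample)
    slack = 1.0 - (x_max - x_min)  # width of the set of feasible theta values
    return x_min - (1 - alpha / 2) * slack, x_min - (alpha / 2) * slack

def coverage(theta, n, reps=20000, alpha=0.05, seed=1):
    """Fraction of simulated samples whose conditional interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [theta + rng.random() for _ in range(n)]
        low, high = conditional_interval(sample, alpha)
        hits += low <= theta <= high
    return hits / reps

# The same formula is used for every n; only R changes from sample to sample.
for n in (3, 10, 50):
    print(n, coverage(theta=2.0, n=n))
```

The interval's length is $(1-\alpha)(1-R)$, so a sample with a large range automatically yields a short interval, regardless of how many observations produced it.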

Now, Basu's theorem: If $T$ is complete sufficient, then it is independent of any ancillary statistic. That is, inference based on a complete sufficient statistic is simpler, in that we do not need to consider conditional inference. Conditioning on a statistic which is independent of $T$ does not change anything, of course.
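A familiar instance, which I am adding as an assumed example and which does not appear in the answer: for iid $N(\theta, 1)$ data the sample mean is complete sufficient and the sample variance is ancillary, so Basu's theorem says they are independent. The simulation below checks one necessary consequence of that, namely that their correlation is near zero. It is only a sanity check, since zero correlation does not by itself prove independence.

```python
import random

def mean_and_variance_pairs(theta, n, reps, seed=2):
    """Simulate (sample mean, sample variance) pairs for n iid N(theta, 1) draws."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(reps):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
        pairs.append((m, s2))
    return pairs

def correlation(pairs):
    """Plain Pearson correlation of a list of (x, y) pairs."""
    k = len(pairs)
    mx = sum(p[0] for p in pairs) / k
    my = sum(p[1] for p in pairs) / k
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / (sxx * syy) ** 0.5

pairs = mean_and_variance_pairs(theta=4.0, n=5, reps=20000)
print("corr(mean, variance) ~", round(correlation(pairs), 3))
```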

Then, a last example to give some more intuition. Change our uniform distribution example to a uniform distribution on the interval $(\theta_1, \theta_2)$ (with $\theta_1<\theta_2$ ). In this case the statistic $(X_{(1)}, X_{(n)})$ is complete and sufficient. What changed? We can see that completeness is really a property of the model. In the former case, we had a restricted parameter space. This restriction destroyed completeness by introducing relationships on the order statistics. By removing this restriction we got completeness! So, in a sense, lack of completeness means that the parameter space is not big enough, and by enlarging it we can hope to restore completeness (and thus, easier inference).
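To see concretely what changed, compare the expected range in the two models. The short computation below is my own addition and uses the standard expectations of uniform order statistics; it shows why the trick of subtracting a constant from $R$ stops working once both endpoints are free.

```latex
% Assumed setting: X_1, ..., X_n iid Uniform(\theta_1, \theta_2), R = X_{(n)} - X_{(1)}, n >= 2.
\begin{align*}
\mathbb{E}_{\theta_1,\theta_2}\, R
  &= (\theta_2 - \theta_1)\,\frac{n-1}{n+1}, \\
\mathbb{E}_{\theta_1,\theta_2}\,[R - c]
  &= (\theta_2 - \theta_1)\,\frac{n-1}{n+1} - c .
\end{align*}
% In the restricted model \theta_2 = \theta_1 + 1, the choice c = (n-1)/(n+1)
% makes the second line zero for every parameter value.
% In the full model \theta_2 - \theta_1 ranges over (0, \infty), so no constant c
% makes it zero for every parameter value: R - c is no longer an
% unbiased estimator of zero.
```

This only removes one particular counterexample, of course; it does not by itself prove completeness of $(X_{(1)}, X_{(n)})$ in the larger model.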

Some other examples where lack of completeness is caused by restrictions on the parameter space:

see my answer to: What kind of information is Fisher information?
Let $X_1, \dotsc, X_n$ be iid $\mathcal{Cauchy}(\theta,\sigma)$ (a location-scale model). Then the order statistics are sufficient but not complete. But now enlarge this model to a fully nonparametric model, still iid but from some completely unspecified distribution $F$. Then the order statistics are sufficient and complete.
For exponential families with canonical parameter space (that is, as large as possible) the minimal sufficient statistic is also complete. But in many cases, introducing restrictions on the parameter space, as with curved exponential families, destroys completeness.

A very relevant paper is An Interpretation of Completeness and Basu's Theorem.

— kjetil b halvorsen

7

Some intuition may be available from the theory of best (minimum variance) unbiased estimators.

If $E_\theta W=\tau(\theta)$ then $W$ is a best unbiased estimator of $\tau(\theta)$ iff $W$ is uncorrelated with all unbiased estimators of zero.

Proof: Let $W$ be an unbiased estimator uncorrelated with all unbiased estimators of zero. Let $W'$ be another estimator such that $E_\theta W'=E_\theta W=\tau(\theta)$. Write $W'=W+(W'-W)$. Since $W'-W$ is an unbiased estimator of zero, it is uncorrelated with $W$ by assumption, so $Var_\theta W'=Var_\theta W+Var_\theta (W'-W)$. Hence, for any $W'$, $Var_\theta W'\geq Var_\theta W$.

Now assume that $W$ is a best unbiased estimator. Let there be some other estimator $U$ with $E_\theta U=0$ . $\phi_a:=W+aU$ is also unbiased for $\tau(\theta)$ . We have

$Var_\theta \phi_a=Var_\theta W+2aCov_\theta(W,U)+a^2Var_\theta U.$ If there were a $\theta_0\in\Theta$ such that $Cov_{\theta_0}(W,U)<0$, we would obtain $Var_{\theta_0} \phi_a<Var_{\theta_0} W$ for $a\in(0,-2Cov_{\theta_0}(W,U)/Var_{\theta_0} U)$. $W$ could then not be the best unbiased estimator. (If the covariance were positive instead, the same argument works with a negative $a$.) QED

Intuitively, the result says that if an estimator is optimal, it must not be possible to improve it by just adding some noise to it, in the sense of combining it with an estimator that is just zero on average (being an unbiased estimator of zero).
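The Uniform$(\theta,\theta+1)$ model gives a concrete case where such an improvement is possible. This is my own illustration, with hypothetical names: take $W = X_{(1)} - 1/(n+1)$, which is unbiased for $\theta$, and $U = R - (n-1)/(n+1)$, an unbiased estimator of zero. These two are negatively correlated, and $W + U/2$ works out algebraically to the midrange minus $1/2$. The simulation compares the two variances.

```python
import random

def sample_variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def variances(theta=0.0, n=5, reps=20000, a=0.5, seed=3):
    """Return (Var W, Var(W + a*U)) estimated by simulation, where
    W = X_(1) - 1/(n+1) is unbiased for theta and
    U = (X_(n) - X_(1)) - (n-1)/(n+1) is an unbiased estimator of zero."""
    rng = random.Random(seed)
    c = (n - 1) / (n + 1)
    w_vals, phi_vals = [], []
    for _ in range(reps):
        xs = [theta + rng.random() for _ in range(n)]
        x_min, x_max = min(xs), max(xs)
        w = x_min - 1 / (n + 1)
        u = (x_max - x_min) - c
        w_vals.append(w)
        phi_vals.append(w + a * u)  # with a = 0.5 this is the midrange minus 1/2
    return sample_variance(w_vals), sample_variance(phi_vals)

var_w, var_phi = variances()
print("Var(W)         ~", round(var_w, 4))
print("Var(W + U / 2) ~", round(var_phi, 4))
```

Because an unbiased estimator of zero that is correlated with $W$ exists here, $W$ cannot be best unbiased. With a complete sufficient statistic no such $U$ based on it is available, so this route to improvement is closed.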

Unfortunately, it is difficult to characterize all unbiased estimators of zero. The situation becomes much simpler if zero itself is the only unbiased estimator of zero, as any statistic $W$ satisfies $Cov_\theta(W,0)=0$ . Completeness describes such a situation.

— Christoph Hanck