Values that increase the standard deviation


12

I am puzzled by the following statement.

"For the standard deviation of a set of numbers to increase, a value more than one standard deviation away from the mean must be added."

What is the proof of this? Of course I know how the standard deviation is defined, but somehow I seem to be missing this point. Any comments?


1
Have you tried working through the relevant algebra?
Alecos Papadopoulos

Yes, I have. I subtracted the sample variance of the n values from the variance of the n + 1 values and required the difference to be greater than zero. But I cannot make anything of it at all.
JohnK

3
One of the easiest ways is to differentiate Welford's algorithm with respect to the new value and then integrate, to show that if $x_n$ is introduced then the variance increases when $(x_n - \bar{x}_{n-1})^2 > \frac{n}{n-1}\,v_{n-1}$, where $\bar{x}_{n-1}$ is the mean of the first $n-1$ values and $v_{n-1}$ is their variance estimate.
whuber
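As a quick sanity check of this criterion (not from the original comment), here is a minimal Python sketch, assuming the standard library's statistics module (which uses the sample convention, divisor n − 1) and arbitrary random data:

```
# Check numerically that adding x_n to the first n-1 values increases the
# sample variance exactly when (x_n - xbar_{n-1})^2 > n/(n-1) * v_{n-1}.
import random
import statistics

random.seed(0)
matches = 0
for _ in range(1000):
    old = [random.gauss(0, 1) for _ in range(random.randint(2, 20))]
    x_n = random.gauss(0, 3)
    n = len(old) + 1

    v_old = statistics.variance(old)           # sample variance of the n-1 values
    v_new = statistics.variance(old + [x_n])   # sample variance after adding x_n
    predicted = (x_n - statistics.mean(old)) ** 2 > n / (n - 1) * v_old

    if predicted == (v_new > v_old):
        matches += 1

print(f"{matches} of 1000 trials agree with the brute-force comparison")
```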

I see, but can this perhaps be shown with simple algebra? My knowledge of statistics is not that advanced.
JohnK

@JohnK, could you tell us where the quote comes from?
Pe Dro

Answers:


20

For any $N$ numbers $y_1, y_2, \ldots, y_N$ with mean $\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$, the variance is given by

$$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y})^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i^2 - 2y_i\bar{y} + \bar{y}^2\right) = \frac{1}{N-1}\left[\left(\sum_{i=1}^{N} y_i^2\right) - 2N(\bar{y})^2 + N(\bar{y})^2\right]$$
$$\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i^2 - (\bar{y})^2\right). \tag{1}$$
Applying $(1)$ to the $n$ numbers $x_1, x_2, \ldots, x_n$, which we take for convenience in exposition to have mean $\bar{x} = 0$, we have that
$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i^2 - (\bar{x})^2\right) = \frac{1}{n-1}\sum_{i=1}^{n} x_i^2.$$
If we now add in a new observation $x_{n+1}$ to this data set, then the new mean of the data set is
$$\frac{1}{n+1}\sum_{i=1}^{n+1} x_i = \frac{n\bar{x} + x_{n+1}}{n+1} = \frac{x_{n+1}}{n+1},$$
while the new variance is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n+1}\left(x_i^2 - \frac{x_{n+1}^2}{(n+1)^2}\right) = \frac{1}{n}\left[\left((n-1)\sigma^2 + x_{n+1}^2\right) - \frac{x_{n+1}^2}{n+1}\right] = \frac{1}{n}\left[(n-1)\sigma^2 + \frac{n}{n+1}\,x_{n+1}^2\right] > \sigma^2$$
only if $x_{n+1}^2 > \frac{n+1}{n}\sigma^2$.
So $|x_{n+1}|$ needs to be larger than $\sigma\sqrt{1 + \frac{1}{n}}$ or, more generally, $x_{n+1}$ needs to differ from the mean $\bar{x}$ of the original data set by more than $\sigma\sqrt{1 + \frac{1}{n}}$, in order for the augmented data set to have larger variance than the original data set. See also Ray Koopman's answer, which points out that the new variance is larger than, equal to, or smaller than the original variance according as $x_{n+1}$ differs from the mean by more than, exactly, or less than $\sigma\sqrt{1 + \frac{1}{n}}$.
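A minimal numerical illustration of the threshold just derived (not from the original answer), assuming Python's standard statistics module and an arbitrary data set: a point offset from the mean by less than, exactly, or more than $\sigma\sqrt{1 + \frac{1}{n}}$ should leave the sample SD smaller, unchanged, or larger.

```
# Adding a point offset from the mean by less than, exactly, or more than
# s*sqrt(1 + 1/n) makes the new sample SD smaller, equal, or larger.
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # arbitrary example data
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)                        # sample SD, divisor n-1
threshold = s * math.sqrt(1 + 1 / n)

for offset in (0.5 * threshold, threshold, 1.5 * threshold):
    s_new = statistics.stdev(data + [xbar + offset])
    print(f"offset = {offset:.3f}: old SD = {s:.3f}, new SD = {s_new:.3f}")
```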

5
+1 Finally somebody gets it right... ;-) The statement to be proved is correct; it's just not tight. Incidentally, you may also pick your units of measurement to make $\sigma^2 = 1$, which further simplifies the calculation, reducing it to about two lines.
whuber

I suggest you use $s$ instead of $\sigma$ in the first set of equations. And thanks for the derivation, it was good to know :)
Theoden

3

The puzzling statement gives a necessary but insufficient condition for the standard deviation to increase. If the old sample size is $n$, the old mean is $m$, the old standard deviation is $s$, and a new point $x$ is added to the data, then the new standard deviation will be less than, equal to, or greater than $s$ according as $|x - m|$ is less than, equal to, or greater than $s\sqrt{1 + 1/n}$.
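A short sketch of the "necessary but insufficient" point (not from the original answer), again assuming Python's statistics module and arbitrary data: a new point that is more than one SD from the mean, but by less than $s\sqrt{1 + 1/n}$, still lowers the SD.

```
# A point lying between s and s*sqrt(1 + 1/n) away from the mean is more
# than one SD away, yet it still lowers the sample SD.
import statistics

data = [10.0, 12.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0]  # arbitrary example
m = statistics.mean(data)
s = statistics.stdev(data)       # sample SD, divisor n-1

x = m + 1.03 * s                 # 1.03 SDs above the mean, but 1.03 < sqrt(1 + 1/8) ~ 1.061
print(s, statistics.stdev(data + [x]))   # the second number is slightly smaller
```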


1
Do you have a proof at hand?
JohnK

2

Leaving aside the algebra (which also works) think about it this way: The standard deviation is square root of the variance. The variance is the average of the squared distances from the mean. If we add a value that is closer to the mean than this, the variance will shrink. If we add a value that is farther from the mean than this, it will grow.

This is true of any average of values that are non-negative. If you add a value that is higher than the mean, the mean increases. If you add a value that is less, it decreases.


I would love to see a rigorous proof as well. While I understand the principle, I am puzzled by the fact that the value has to be at least one standard deviation away from the mean. Why precisely one?
JohnK

I don't see what is confusing. The variance is the average. If you add something greater than the average (that is, more than 1 SD), it increases. But I am not one for formal proofs.
Peter Flom - Reinstate Monica

It could be greater than the average by 0.2 standard deviations. Why wouldn't it increase then?
JohnK

No, not greater than the mean of the data, greater than the variance, which is the mean of the squared distances.
Peter Flom - Reinstate Monica

4
It is confusing because including a new value changes the mean, so all the residuals change. It is conceivable that even when the new value is far from the old mean, its contribution to the SD could be compensated by reducing the sum of squares of the residuals of the other values. This is one of the many reasons why rigorous proofs are useful: they provide not only security in one's knowledge, but insight (and even new information) as well. For instance, the proof will show that you have to add a new value that is strictly further than one SD from the mean in order to increase the SD.
whuber

2

I'll get you started on the algebra, but won't take it quite all of the way. First, standardize your data by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{x - \mu}{\sigma}.$$
Note that if x is within one standard deviation of the mean, Z is between -1 and 1. Z would be 1 if x were exactly one sd away from the mean. Then look at your equation for standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} Z_i^2}{N-1}}$$
What happens to $\sigma$ if $Z_N$ is between $-1$ and $1$?
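To make this concrete (an illustration added here, not part of the original answer), a short Python sketch using the standard statistics module and arbitrary data: the standardized values have unit sample SD, appending a new $Z$ with $|Z| < 1$ pulls the recomputed SD below 1, and appending a $Z$ well past 1 pushes it above 1.

```
# Standardized values have unit sample SD; an appended |Z| < 1 shrinks it,
# while a Z well past 1 grows it (the exact cutoff, from the other answers,
# is sqrt(1 + 1/N), slightly above 1).
import statistics

data = [3.0, 7.0, 7.0, 19.0, 24.0, 25.0, 28.0, 29.0]  # arbitrary example data
mu = statistics.mean(data)
sigma = statistics.stdev(data)                        # sample SD, divisor N-1
z = [(x - mu) / sigma for x in data]

print(statistics.stdev(z))           # 1.0 (up to rounding)
print(statistics.stdev(z + [0.4]))   # below 1: the appended |Z| < 1 shrinks the SD
print(statistics.stdev(z + [2.5]))   # above 1: the appended Z is well past the cutoff
```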

A number whose absolute value is less than 1 will, when squared, also be less than 1 in absolute value. Yet what I do not understand is this: even if $Z_N$ falls into that category, we are adding a positive value to $\sigma$, so shouldn't it increase?
JohnK

Yes, you are adding a positive value, but it will be smaller than your average deviation from the mean and therefore reduce sigma. Maybe it would make more sense to consider the value as $Z_{N+1}$.
wcampbell

1
1) Don't forget, when you add that value, you are also increasing $N$ by 1. 2) You are not adding that value to $\sigma$, you are adding it to $\sum Z_i^2$.
jbowman

Exactly what I was trying to express!
wcampbell

It's not that simple: in this answer you have computed the SD as if the new value were already part of the dataset. Instead, the $Z_i$ have to be standardized with respect to the SD and mean of the first $N-1$ values only, not all of them.
whuber
Licensed under cc by-sa 3.0 with attribution required.