Is the median a type of mean, for some generalization of "mean"?



The concept of "mean" ranges far more widely than the traditional arithmetic mean. Does it extend as far as to include the median? By analogy,

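$$
\begin{array}{lcccccl}
\text{raw data} & \xrightarrow{\text{id}} & \text{raw data} & \xrightarrow{\text{mean}} & \text{raw mean} & \xrightarrow{\text{id}^{-1}} & \text{arithmetic mean} \\
\text{raw data} & \xrightarrow{\text{recip}} & \text{reciprocals} & \xrightarrow{\text{mean}} & \text{mean reciprocal} & \xrightarrow{\text{recip}^{-1}} & \text{harmonic mean} \\
\text{raw data} & \xrightarrow{\log} & \text{logs} & \xrightarrow{\text{mean}} & \text{mean log} & \xrightarrow{\log^{-1}} & \text{geometric mean} \\
\text{raw data} & \xrightarrow{\text{square}} & \text{squares} & \xrightarrow{\text{mean}} & \text{mean square} & \xrightarrow{\text{square}^{-1}} & \text{root mean square} \\
\text{raw data} & \xrightarrow{\text{rank}} & \text{ranks} & \xrightarrow{\text{mean}} & \text{mean rank} & \xrightarrow{\text{rank}^{-1}} & \text{median}
\end{array}
$$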

The analogy I am drawing is with the quasi-arithmetic mean, given by

$$M_f(x_1, \ldots, x_n) = f^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} f(x_i)\right)$$
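A short Python sketch of the quasi-arithmetic mean, using the four fixed transformations from the table. The function name and the sample data are illustrative choices, not from the question; note that each `f` here acts on one number at a time, which is the property the rank transformation lacks.

```python
import math

def quasi_arithmetic_mean(xs, f, f_inv):
    """M_f(x_1, ..., x_n) = f_inv(mean of f(x_i)), for a fixed invertible f."""
    n = len(xs)
    return f_inv(sum(f(x) for x in xs) / n)

data = [1.0, 2.0, 4.0, 8.0]

arithmetic = quasi_arithmetic_mean(data, lambda x: x, lambda y: y)        # f = id
harmonic = quasi_arithmetic_mean(data, lambda x: 1 / x, lambda y: 1 / y)  # f = recip
geometric = quasi_arithmetic_mean(data, math.log, math.exp)               # f = log
rms = quasi_arithmetic_mean(data, lambda x: x * x, math.sqrt)             # f = square (positive data)

print(arithmetic)  # 3.75
print(harmonic, geometric, rms)
```

For positive data the familiar ordering harmonic ≤ geometric ≤ arithmetic ≤ RMS shows up in the printed values.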

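For comparison, saying that the median of a 5-item data set equals the 3rd item in order is equivalent to ranking the data from 1 to 5 (a function we might denote $f$), taking the mean of the transformed data ($3$), and reading back the value of the data item with rank 3 (a kind of $f^{-1}$).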
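The rank-based recipe can be sketched in Python as follows (the function name is hypothetical). `sorted` needs the whole data set before any single value can be ranked, unlike log, reciprocal or square. Interpolating halfway between the two middle values for even n is an assumption that matches the usual median convention; tied values simply occupy consecutive ranks.

```python
def median_via_ranks(xs):
    """Rank the data, average the ranks, then read back the value at the mean rank."""
    ordered = sorted(xs)          # ordered[r - 1] is the item with rank r (needs all the data)
    n = len(ordered)
    mean_rank = (n + 1) / 2       # mean of the ranks 1, 2, ..., n
    r = int(mean_rank)            # integer part of the mean rank
    frac = mean_rank - r          # 0 for odd n, 0.5 for even n
    if frac == 0:
        return ordered[r - 1]
    # even n: the mean rank falls halfway between ranks r and r + 1, so interpolate
    return ordered[r - 1] + frac * (ordered[r] - ordered[r - 1])

print(median_via_ranks([7, 1, 5, 3, 9]))  # 5
print(median_via_ranks([4, 1, 3, 2]))     # 2.5
```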

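In the geometric, harmonic and RMS examples, $f$ is a fixed function that can be applied to any number in isolation. In contrast, assigning ranks, or going back from ranks to the original data (interpolating if necessary), requires knowledge of the whole data set. Moreover, in the definitions I have read of the quasi-arithmetic mean, $f$ is required to be continuous. Is the median ever considered a special case of the quasi-arithmetic mean? If so, how is $f$ defined? Or is the median ever described as an instance of some other wider notion of "mean"? The quasi-arithmetic mean is not the only generalization that could be used.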

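Part of the problem is terminology (what does "mean" mean anyway, particularly as opposed to "central tendency" or "average"?). For instance, in the literature on fuzzy control systems, an aggregation function $F:[a,b]\times[a,b]\to[a,b]$ with $F(a,a)=a$ and $F(b,b)=b$ is called a "mean" (in a general sense) if $\min(x,y)\le F(x,y)\le\max(x,y)$ for all $x,y\in[a,b]$. Needless to say, such a definition is incredibly broad! And in this context the median is indeed called a kind of mean.[1] But I am curious whether less sweeping characterizations of the mean can be stretched far enough to encompass the median. The so-called generalized mean (perhaps better described as the "power mean") and the Lehmer mean do not, but others might. For what it's worth, Wikipedia includes "median" in its list of "other means", but without further comment or citation.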

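[1] Such broad definitions of mean, suitably extended to more than two inputs, are standard in the field of fuzzy control, and instances of the median being described as a mean turned up many times during internet searches. For example, Fodor, J. C., & Rudas, I. J. (2009), "On some classes of aggregation functions that are migrative", IFSA/EUSFLAT Conf. (pp. 653-656). Note that this paper observes that one of the earliest users of the term "mean" (moyenne) was Cauchy, in Cours d'analyse de l'École royale polytechnique, 1ère partie: Analyse algébrique (1821). Aczél, Chisini, Kolmogorov and de Finetti later developed more general notions of "mean" than Cauchy's; see Fodor, J., & Roubens, M. (1995), "On meaningfulness of means", Journal of Computational and Applied Mathematics, 64(1), 103-115.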
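The fuzzy-control definition of a "mean" quoted in the question is easy to spot-check numerically. This is only a sketch: a finite grid can refute the conditions but cannot prove them, and the example F(x, y) = median of {x, y, α} is a hypothetical illustration chosen here, not taken from the cited papers. The product x·y also satisfies the boundary conditions, which shows that they alone do not make F a mean.

```python
import itertools

def is_mean_on_grid(F, a, b, steps=50):
    """Spot-check on a grid over [a, b] x [a, b]:
    F(a, a) = a, F(b, b) = b, and min(x, y) <= F(x, y) <= max(x, y)."""
    if F(a, a) != a or F(b, b) != b:
        return False
    grid = [a + (b - a) * k / steps for k in range(steps + 1)]
    return all(min(x, y) <= F(x, y) <= max(x, y)
               for x, y in itertools.product(grid, repeat=2))

def median_with_constant(alpha):
    """Hypothetical example: F(x, y) = median of {x, y, alpha}."""
    return lambda x, y: sorted((x, y, alpha))[1]

print(is_mean_on_grid(lambda x, y: (x + y) / 2, 0.0, 1.0))   # True
print(is_mean_on_grid(median_with_constant(0.3), 0.0, 1.0))  # True
print(is_mean_on_grid(lambda x, y: x * y, 0.0, 1.0))         # False: x*y < min(x, y) inside (0, 1)
```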


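The arithmetic mean, median and mode are often all commonly called "average", and this word is sometimes used in an ambiguous way. The book How to Lie with Statistics uses this as an example of "lying" with statistics. (I am posting this as a comment, since I understand that your question is more general.)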
Tim

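@Tim I have the unscientific impression that I rarely see the mode called an "average". But both "average" (sometimes used as a synonym for "arithmetic mean", sometimes including measures of central tendency that are not means at all) and "mean" (in general usage, "arithmetic mean" rather than any technical sense, though not exclusively) are used ambiguously. Incidentally, because of the other meanings of "mean", this is also a hard topic to search for on the internet.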
Silverfish

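Means (arithmetic, geometric, harmonic, power, exponential, combined, etc.) are "analytic means". The median, quantiles, etc. are "positional means". Ranking is a monotonic transformation of any variate into a uniform variate, with no path back to the untransformed values, so it is quite different from taking logs, squares, etc.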
ttnphns

The term "generalized mean" is used at en.wikipedia.org/wiki/Generalized_mean

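If you allow weights $w_i$ in the calculation $\sum_i w_i x_i$, with $\sum_i w_i = 1$, then the median can easily be regarded as a kind of mean. Similarly, though not identically, the idea of trimmed means includes the median as a limiting or courtesy special case. stata-journal.com/article.html?article=st0313 is a fairly recent review.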
Nick Cox

Answers:



One way to regard the median as a "general sort of mean" is as follows. First, carefully define the ordinary arithmetic mean in terms of order statistics:

$$\bar{x} = \sum_i w_i x_{(i)}, \qquad w_i = \frac{1}{n}.$$

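Then, by replacing that ordinary average of order statistics with some other weight function, we get a notion of "generalized mean" that accounts for order.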

In that case, a host of potential measures of center become "generalized sorts of means". In the case of the median, for odd $n$, $w_{(n+1)/2}=1$ and all others are $0$, and for even $n$, $w_{n/2}=w_{n/2+1}=\frac{1}{2}$.
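A sketch of the order-statistic weighting in Python, closely related to the ordered weighted average (OWA) mentioned in the comments. The helper names are illustrative; any non-negative weights summing to 1 give another member of the family.

```python
def order_weighted_mean(xs, weights):
    """sum_i w_i * x_(i), where x_(1) <= x_(2) <= ... <= x_(n) are the order statistics."""
    ordered = sorted(xs)
    if len(weights) != len(ordered):
        raise ValueError("need one weight per order statistic")
    return sum(w * x for w, x in zip(weights, ordered))

def mean_weights(n):
    return [1.0 / n] * n                 # w_i = 1/n: the ordinary arithmetic mean

def median_weights(n):
    w = [0.0] * n                        # Python index i - 1 holds w_i
    if n % 2 == 1:
        w[(n + 1) // 2 - 1] = 1.0        # odd n: w_{(n+1)/2} = 1
    else:
        w[n // 2 - 1] = 0.5              # even n: w_{n/2} = 1/2
        w[n // 2] = 0.5                  #         w_{n/2+1} = 1/2
    return w

data = [3, 1, 4, 1, 5]
print(order_weighted_mean(data, mean_weights(5)))    # about 2.8
print(order_weighted_mean(data, median_weights(5)))  # 3.0
```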

Similarly, if we look at M-estimation, location estimates might also be thought of as a generalization of the arithmetic mean (where for the mean, ρ is quadratic, ψ is linear, or the weight function is flat), and the median also falls into this class of generalizations. This is a somewhat different generalization from the previous one.

There are a variety of other ways we might extend the notion of 'mean' that could include median.


This is very nice. Closely related to this answer, and discussed in the papers cited in the question, is the ordered weighted average, or OWA.
Silverfish


If you think of the mean as the point minimizing the quadratic loss function SSE, then the median is the point minimizing the linear loss function MAD, and the mode is the point minimizing some 0-1 loss function. No transformations required.

So the median is an example of a Fréchet mean.
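To make the loss-function view concrete, here is a brute-force Python sketch that minimizes each loss over a grid of candidate centers; the grid search and the sample data are illustrative choices, not part of the answer. On the real line the median is also the Fréchet mean for the metric d(x, y) = √|x − y|, because the squared distance is then the absolute deviation.

```python
def minimizer(xs, loss, candidates):
    """Return the candidate c that minimizes sum_i loss(x_i - c)."""
    return min(candidates, key=lambda c: sum(loss(x - c) for x in xs))

data = [1.0, 2.0, 2.0, 3.0, 10.0]
grid = [k / 100 for k in range(1101)]          # candidate centers 0.00, 0.01, ..., 11.00

squared = minimizer(data, lambda d: d * d, grid)                # quadratic loss -> mean
absolute = minimizer(data, abs, grid)                           # absolute loss -> median
zero_one = minimizer(data, lambda d: 0 if d == 0 else 1, grid)  # 0-1 loss -> mode

print(squared, absolute, zero_one)  # 3.6 2.0 2.0
```

The outlier 10 pulls the squared-loss center to 3.6 but leaves the absolute-loss center at the median, 2.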


@Mike Anderson: Well, this shows that the median is a Fréchet mean (see the Wikipedia article): en.wikipedia.org/wiki/Fr%C3%A9chet_mean
kjetil b halvorsen

@Kjetil Excellent! The fact that the median is an example of a Fréchet mean is exactly an answer to my question "is the median ever described as an instance of some other wider notion of "mean"?" And +1 to Mike Anderson. I hope this information is edited into the answer.
Silverfish

I've added @Kjetil's comment to the answer so that it will show up in a site search for "Frechet mean". Thanks to both of you.
Silverfish


One easy but fruitful generalization is to weighted means, $\sum_{i=1}^n w_i x_i / \sum_{i=1}^n w_i$, where $\sum_{i=1}^n w_i = 1$. Clearly the common or garden mean is the simplest special case, with equal weights $w_i = 1/n$.

Letting the weights depend on the order of values in magnitude, from smallest to largest, points to various other special cases, notably the idea of a trimmed mean, which is known by other names too.

To avoid excessive use of notation where it is not needed or especially helpful, imagine for example ignoring the smallest and largest values and taking the (equally weighted) mean of the others. Or imagine ignoring the two smallest and two largest and taking the mean of the others; and so forth. The most vigorous trimming would ignore all but the one or two middle values in order, depending on whether the number of values was odd or even, which is naturally just the familiar median. Nothing in the idea of trimming commits you to ignoring equal numbers in each tail of a sample, but saying more about asymmetric trimming would take us further away from the main idea in this thread.

In short, means (unqualified) and medians are extreme limiting cases of the family of (symmetric) trimmed means. The overall idea is to allow compromises between one ideal of using all the information in the data and another ideal of protecting oneself from extreme data points, which may be unreliable outliers.
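A minimal sketch of symmetric trimming in Python, showing the mean at one end of the family and the median at the other (the function name and data are illustrative).

```python
def trimmed_mean(xs, k):
    """Drop the k smallest and k largest values, then take the equally weighted mean of the rest."""
    ordered = sorted(xs)
    n = len(ordered)
    if not 0 <= 2 * k < n:
        raise ValueError("need 0 <= 2k < n")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 100]
# k = 0 is the ordinary mean; the largest allowed k, (n - 1) // 2, keeps only
# the middle value (odd n) or the two middle values (even n), i.e. the median
for k in range(3):
    print(k, trimmed_mean(data, k))
# 0 22.0
# 1 3.0
# 2 3.0
```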

See the reference here for one fairly recent review.



The question invites us to characterize the concept of "mean" in a sufficiently broad sense to encompass all the usual means (power means, $L^p$ means, medians, trimmed means) but not so broadly that it becomes almost useless for data analysis. This reply discusses some of the axiomatic properties that any reasonably useful definition of "mean" should have.


Basic Axioms

A usefully broad definition of "mean" for the purpose of data analysis would be any sequence of well-defined, deterministic functions $f_n: A^n \to A$ for $A \subset \mathbb{R}$ and $n = 1, 2, \ldots$ such that

(1) $\min(\mathbf{x}) \le f_n(\mathbf{x}) \le \max(\mathbf{x})$ for all $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in A^n$ (a mean lies between the extremes),

(2) $f_n$ is invariant under permutations of its arguments (means do not care about the order of the data), and

(3) each $f_n$ is nondecreasing in each of its arguments (as the numbers increase, their mean cannot decrease).
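Axioms (1) to (3) can be spot-checked for any candidate mean with a small randomized test. This is only a sketch: random testing can expose a violation but never prove an axiom, and the data range, sample size and tolerance are arbitrary choices (`statistics.geometric_mean` needs Python 3.8 or later).

```python
import random
import statistics

def check_basic_axioms(f, trials=2000, n=6, seed=0):
    """Randomized spot check of axioms (1)-(3) on positive data.

    Returns False as soon as a violation is found, True if none was found.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(0.1, 10.0) for _ in range(n)]
        m = f(x)
        # (1) the mean lies between the extremes
        if not min(x) <= m <= max(x):
            return False
        # (2) permuting the data leaves the mean unchanged
        y = x[:]
        rng.shuffle(y)
        if abs(f(y) - m) > 1e-9:
            return False
        # (3) increasing one value cannot decrease the mean
        z = x[:]
        z[rng.randrange(n)] += rng.uniform(0.0, 5.0)
        if f(z) < m - 1e-9:
            return False
    return True

for name, f in [("arithmetic mean", statistics.fmean),
                ("median", statistics.median),
                ("geometric mean", statistics.geometric_mean),
                ("first value only", lambda v: v[0])]:
    print(name, check_basic_axioms(f))
```

The last candidate, which just returns the first data value, satisfies (1) and (3) but is caught by the permutation check (2).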

We must allow for A to be a proper subset of real numbers (such as all positive numbers) because plenty of means, such as geometric means, are defined only on such subsets.

We might also want to add that

(1') there exists at least some $\mathbf{x} \in A^n$ for which $\min(\mathbf{x}) < f_n(\mathbf{x}) < \max(\mathbf{x})$ (means are not extremes). (We cannot require that this always hold. For instance, the median of $(0, 0, \ldots, 0, 1)$ equals $0$, which is the minimum.)

These properties seem to capture the idea behind a "mean" being some kind of "middle value" of a set of (unordered) data.

Consistency axioms

I am further tempted to stipulate the rather less obvious consistency criterion

(4.a) The range of $f_{n+1}(t, x_1, x_2, \ldots, x_n)$ as $t$ varies throughout the interval $[\min(\mathbf{x}), \max(\mathbf{x})]$ includes $f_n(\mathbf{x})$. In other words, it is always possible to leave the mean unchanged by adjoining an appropriate value $t$ to a dataset. In conjunction with (3), it implies that adjoining extreme values to a dataset will pull the mean towards those extremes.

If we wish to apply the concept of mean to a distribution or "infinite population", then one way would be to obtain it in the limit of arbitrarily large random samples. Of course the limit might not always exist (it does not exist for the arithmetic mean when the distribution has no expectation, for instance). Therefore I do not want to impose any additional axioms to guarantee the existence of such limits, but the following seems natural and useful:

(4.b) Whenever $A$ is bounded and $\mathbf{x}_n$ is a sequence of samples from a distribution $F$ supported on $A$, the limit of $f_n(\mathbf{x}_n)$ almost surely exists. This prevents the mean from forever "bouncing around" within $A$ even as sample sizes get larger and larger.

Along the same lines, we could further narrow the idea of a mean to insist that it become a better estimator of "location" as sample sizes increase:

(4.c) Whenever $A$ is bounded, the variance of the sampling distribution of $f_n(\mathbf{X}^{(n)})$, for a random sample $\mathbf{X}^{(n)} = (X_1, X_2, \ldots, X_n)$ from $F$, is nonincreasing in $n$.
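Whether a candidate mean behaves as (4.c) asks can be explored by simulation. The Monte Carlo sketch below assumes $F$ is Uniform(0, 1), an arbitrary bounded choice; in that case the exact variance of the sample median for odd $n$ is $1/(4(n+2))$ (the median of $2m+1$ uniforms is Beta$(m+1, m+1)$) and that of the sample mean is $1/(12n)$, so the estimates can be compared with known values.

```python
import random
import statistics

def sampling_variance(estimator, n, reps=4000, seed=1):
    """Monte Carlo estimate of Var[estimator(X_1, ..., X_n)] with X_i ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    values = [estimator([rng.random() for _ in range(n)]) for _ in range(reps)]
    return statistics.variance(values)

for n in (5, 11, 51):
    v_median = sampling_variance(statistics.median, n)
    v_mean = sampling_variance(statistics.fmean, n)
    # exact values for comparison: 1/(4(n+2)) for the median (odd n), 1/(12n) for the mean
    print(n, round(v_median, 4), round(1 / (4 * (n + 2)), 4),
          round(v_mean, 4), round(1 / (12 * n), 4))
```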

Continuity axiom

We might consider asking means to vary "nicely" with the data:

(5) $f_n$ is separately continuous in each argument (a small change in the data values should not induce a sudden jump in their mean).

This requirement might eliminate some strange generalizations, but it does not rule out any well-known mean. It will rule out some aggregation functions.

An invariance axiom

We can conceive of means as applying to either interval or ratio data (in Stevens' well-known sense). We cannot demand they be invariant under shifts of location (the geometric mean is not), but we can require

(6) $f_n(\lambda\mathbf{x}) = \lambda f_n(\mathbf{x})$ for all $\mathbf{x} \in A^n$ and all $\lambda > 0$ for which $\lambda\mathbf{x} \in A^n$. This says only that we are free to compute $f_n$ using any units of measurement we like.

All the means mentioned in the question satisfy this axiom except for some aggregation functions.
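Axiom (6) and the shift property that the answer declines to require can be compared directly. A sketch using the standard library's `statistics` module; the test data and the values of λ and the shift are arbitrary.

```python
import math
import statistics

def equivariance(f, x, lam=3.0, shift=5.0):
    """Return (scale_ok, shift_ok): does f(lam*x) == lam*f(x), and f(x + shift) == f(x) + shift?"""
    scale_ok = math.isclose(f([lam * v for v in x]), lam * f(x))
    shift_ok = math.isclose(f([v + shift for v in x]), f(x) + shift)
    return scale_ok, shift_ok

x = [1.0, 2.0, 4.0, 8.0]
for name, f in [("arithmetic mean", statistics.fmean),
                ("median", statistics.median),
                ("geometric mean", statistics.geometric_mean)]:
    print(name, equivariance(f, x))
# arithmetic mean (True, True)
# median (True, True)
# geometric mean (True, False)
```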


Discussion

General aggregation functions $f_2$, as described in the question, do not necessarily satisfy axioms (1'), (2), (3), (5), or (6). Whether they satisfy any consistency axioms may depend on how they are extended to $n > 2$.

The usual sample median enjoys all these axiomatic properties.

We could augment the consistency axioms to include

(4.d) $f_{2n}(\mathbf{x}; \mathbf{x}) = f_n(\mathbf{x})$ for all $\mathbf{x} \in A^n$.

This implies that when all elements of a dataset are repeated equally often, the mean does not change. This may be too strong, though: the Winsorized mean does not have this property (except asymptotically). The purpose of Winsorizing at the $100\alpha\%$ level is to provide resistance against changes in at least $100\alpha\%$ of the data at either extreme. For instance, the 10% Winsorized mean of $(1,2,3,6)$ is the arithmetic mean of $(2,2,3,3)$, equal to $2.5$, but the 10% Winsorized mean of $(1,1,2,2,3,3,6,6)$ is $3$.
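A sketch of the Winsorized mean in Python, to check the example numerically. Rounding $\alpha n$ up to a whole number of values is an assumption made here (it is the convention under which the first example gives 2.5). With that convention the doubled dataset's extremes are clamped to their own duplicates, so its Winsorized mean equals its arithmetic mean, 3, and (4.d) fails.

```python
import math

def winsorized_mean(xs, alpha):
    """Clamp the k = ceil(alpha * n) smallest and k largest values to the nearest retained value, then average.

    Rounding k up is an assumption made here; other conventions exist.
    """
    ordered = sorted(xs)
    n = len(ordered)
    k = math.ceil(alpha * n)
    if 2 * k >= n:
        raise ValueError("alpha too large for this sample size")
    lo, hi = ordered[k], ordered[n - 1 - k]
    return sum(min(max(v, lo), hi) for v in ordered) / n

x = [1, 2, 3, 6]
print(winsorized_mean(x, 0.10))      # 2.5  (mean of 2, 2, 3, 3)
print(winsorized_mean(x + x, 0.10))  # 3.0  (each extreme is clamped to its own duplicate)
```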

I do not know which of the consistency axioms (4.a), (4.b), or (4.c) would be most desirable or useful. They appear to be independent: I don't think any two of them imply the third.


(+1) I think (1'), "means are not extremes", is an interesting point. Many otherwise natural definitions of mean happen to include the minimum and maximum as special or limiting cases: this is true of power means, Lehmer means, Fréchet mean, Chisini mean and Stolarsky mean. Though it does seem a bit odd to refer to them as "average"!
Silverfish

Yes, limiting cases are unavoidable. But for finite datasets we might want to insist that neither the max nor the min qualify as "means."
whuber

On the other hand, not only is it true that "the usual sample median enjoys all these axiomatic properties", but so does the usual sample quantile (unless I've missed something). It also feels a bit odd to refer to e.g. the upper quartile as a "mean" (though I've seen it used as a measure of central tendency on very skewed data). If we accept all other quantiles, it no longer feels quite so perverse to admit minima and maxima. But I can certainly see it may be desirable to at least retain the right to exclude them.
Silverfish

I am not perturbed by the admission of quantiles into the pantheon of means. After all, for given families of distributions, certain non-median quantiles will coincide with arithmetic means, so you could be in trouble if you tried to eliminate this possibility axiomatically. (Consider a family of lognormal distributions of constant geometric SD, for instance.) If the arithmetic mean cannot qualify as a mean, all is lost!
whuber

I have considered that approach and rejected it, as explained in my answer: if you apply such a criterion for n>2, you eliminate the median as a form of mean!
whuber


I think the median can be considered a type of generalization of the arithmetic mean. Specifically, the arithmetic mean and the median (among others) can be unified as special cases of the Chisini mean. If you are going to perform some operation over a set of values, the Chisini mean is a number that you can substitute for all of the original values in the set and still get the same result. For example, if you want to sum your values, replacing all the values with the arithmetic mean will yield the same sum. The idea is that a certain value is representative of the numbers in the set in the context of a certain operation over those numbers. (An interesting implication of this way of thinking is that a given value, such as the arithmetic mean, can only be considered representative under the assumption that you are doing certain things with those numbers.)

This is less obvious for the median (and I note that the median is not listed as one of the Chisini means on Wolfram or Wikipedia), but if you were to allow operations over ranks, the median could fit within the same idea.
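Chisini's idea can be sketched in Python by solving op(M, M, ..., M) = op(x_1, ..., x_n) numerically. The bisection approach and the list of operations are choices made for this illustration; the sketch does not supply an operation that yields the median, which is exactly the open point raised in the comments below.

```python
import math

def chisini_mean(xs, op, tol=1e-12):
    """Find M with op([M] * n) == op(xs) by bisection on [min(xs), max(xs)].

    Assumes m -> op([m] * n) is continuous and increasing on that interval.
    """
    n = len(xs)
    target = op(xs)
    lo, hi = min(xs), max(xs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if op([mid] * n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

data = [1.0, 2.0, 4.0, 8.0]
print(chisini_mean(data, sum))                               # preserves the sum -> arithmetic mean
print(chisini_mean(data, math.prod))                         # preserves the product -> geometric mean
print(chisini_mean(data, lambda v: sum(t * t for t in v)))   # sum of squares -> root mean square
print(chisini_mean(data, lambda v: -sum(1 / t for t in v)))  # sum of reciprocals (negated so it increases) -> harmonic mean
```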


This is a very interesting suggestion. Could you suggest a suitable operation, so that for a median $M$ we would have $f(M, M, \ldots, M) = f(x_1, x_2, \ldots, x_n)$?
Silverfish

That's a good question, @Silverfish, I've been thinking about that ;-). My thinking is more that, in your Q & the discussion in comments, the conceptual framework seems to be how to get the mean & how to get the data back from the mean; OTOH, my framing is what we use the mean for: viz as a compressed representation of the data w/ the minimum information loss.
gung - Reinstate Monica

I've added some citations to the question which show a wider range of conceptual frameworks, including this one. At the moment I can't see a better f than "take the median", which doesn't quite seem within the spirit of the piece!
Silverfish

@Silverfish, I grant that does seem like a somewhat problematic hole in my position.
gung - Reinstate Monica

While the insight from Chisini's set-up is that, for example, the arithmetic mean preserves the sum, while the geometric mean preserves the product, it's still true (just less interesting) that the arithmetic mean of $(\bar{x}, \bar{x}, \ldots, \bar{x})$ is also $\bar{x}$, and so on. So I'm not convinced it's a fatal blow.
Silverfish


The question is not well defined. If we agree on the common "street" definition of mean as the sum of $n$ numbers divided by $n$, then we have a stake in the ground. Further, if we look at measures of central tendency, we could say both mean and median are generalizations, but not of each other. Part of my background is in nonparametrics, so I like the median and the robustness it provides, its invariance to monotonic transformations, and more. But each measure has its place depending on the objective.


Welcome to our site, Bob. I believe that if you read to the end of the question (especially the long penultimate paragraph) you will discover that it is precise and well-defined. (If not, it would be a good idea to explain what you mean by "not well defined".) Your comments don't really seem to address what is being asked.
whuber

I actually sympathise with Bob's feeling that the question is not terribly well-defined, in the sense that the concept of "mean" does not have a single definition, but I have tried my best to make things as clear as possible. I hope my most recent edit helps clarify things.
Silverfish

The reason I feel the question has some value other than mere terminology (what does mean mean anyway, and is there a definition we can stretch as far as to include the median?) is that it may be instructive to see the median as just one member of a family of generalizations of the mean; Nick Cox's example of the median as a limiting case of the trimmed mean is particularly nice - it ties in neatly with the "robustness" property you like. In the family of trimmed means, the "street" arithmetic mean and the median lie at opposite ends with a spectrum between them.
Silverfish
Licensed under cc by-sa 3.0 with attribution required.