What is the hardest statistical concept to grasp?


32

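This is similar to the question here, but I think it is different enough to be worth asking.

As a starter, I thought I would put down the one I find hardest to grasp.

Mine is the difference between probability and frequency. One is at the level of "knowledge of reality" (probability), while the other is at the level of "reality itself" (frequency). I almost always get confused if I think about this too much.

Edwin Jaynes coined the term "mind projection fallacy" to describe getting these things mixed up.

Any thoughts on other concepts that are hard to grasp?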


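(I don't know enough to post this as an answer, so I'm adding a comment.) I have always found it strange that PI crops up in statistical equations. I mean, what has PI got to do with statistics? :)
Reinstate Monica - Goodbye SE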

2
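I agree (to my surprise) - I think π pops up in a lot of mathematical analysis. You can write π using the LaTeX command \pi enclosed in $ signs. I use the wiki page en.wikibooks.org/wiki/LaTeX/Mathematics to get the syntax. Another trick is to "right click" on a formula you see on this site and select "show source" to get the commands that were used.
probabilityislogic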

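@Wiki If you accept that π appears when you go from measuring the length of a straight line to measuring the length of a circle, I don't see why it would not appear when you go from measuring the probability of falling in a segment to measuring the probability of falling in a piece of a circle?
robin girard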

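@Wiki Whenever you have a trigonometric function (sine, cosine, tangent etc.), you risk having π pop up. And remember that whenever you differentiate a function you are actually finding a tangent. What is surprising is that π doesn't appear more often.
Carlos Accioly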

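@Carlos I suspect the prevalence of $2\pi$ is mostly due to the use of the $\ell_2$ metric, leading to n-spheres. In the same vein, I'd expect it is $e$ whose prevalence is due to analysis.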
sesqu

Answers:


31

For some reason, people have difficulty grasping what a p-value really is.
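To make the definition concrete: a p-value is the probability, computed assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed. Here is a minimal sketch for a coin-flip example. The numbers (15 heads in 20 flips), the function name, and the choice of "at least as extreme" as "no more probable under H0 than the observed count" are my own illustrative assumptions, not something from this thread.

```python
from math import comb

def binomial_two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair (p = 0.5)."""
    # Probability of every possible head count, computed under H0.
    pmf = [comb(flips, k) * 0.5 ** flips for k in range(flips + 1)]
    observed = pmf[heads]
    # Add up all outcomes that are no more probable under H0 than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in pmf if p <= observed * (1 + 1e-12))

# Hypothetical data: 15 heads in 20 flips.
p_value = binomial_two_sided_p_value(heads=15, flips=20)
print(f"p-value = {p_value:.4f}")
```

Note what the calculation never uses: any alternative hypothesis, or any prior chance that the coin is fair. That is why the result is not "the probability that H0 is true".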


3
@shabbychef: Most people grasp it in the worst possible way, i.e. as the probability of making a Type I error.
-suncoolsu

2
I think it is mostly related to how p-values are explained in classes (i.e. just by giving a quick definition, and without specifying what p-values are NOT).
NICO

I think this is mainly to do with how it is introduced. For me, it was an "add-on" to the classical hypothesis test, so it appears as though it's just another way to do a hypothesis test. The other problem is that it is usually only taught with respect to a normal distribution, where everything "works nice" (e.g. the p-value is a measure of evidence in testing a normal mean). Generalising the p-value is not easy, as there are no specific principles to guide the generalisation (e.g. there is no general agreement on how a p-value should vary with the sample size and multiple comparisons).
probabilityislogic

@shabbychef +1, though students often have difficulties with p-values (roughly because the concept in testing is a bit more subtle than a binary decision process, and because "inverting a function" is not easy to apprehend). When you say "for some reason", do you mean it is unclear to you why people have difficulties? PS: If I could, I would try to compute statistics on this site about the relation between "being a top answer" and "talking about p-values" :) . I also ask myself whether the hardest statistical concept to grasp can really have the most upvotes (if it is difficult to grasp ... :) )
robin girard

1
@eduardo - yes, a small enough p-value is sufficient to cast doubt on the null hypothesis, but it is calculated in complete isolation from any alternative. Using p-values alone, you can never formally "reject" H0, because no alternative has been specified. If you formally reject H0, then you must also reject the calculations that were based on the assumption of H0 being true, which means you must reject the calculation of the p-value that was derived under this assumption (it messes with your head, but it is the only way to reason consistently).
probabilityislogic

23

Similar to shabbychef's answer, it is difficult to understand the meaning of a confidence interval in frequentist statistics. I think the biggest obstacle is that a confidence interval doesn't answer the question that we would like to answer. We'd like to know, "what's the chance that the true value is inside this particular interval?" Instead, we can only answer, "what's the chance that a randomly chosen interval created in this way contains the true parameter?" The latter is obviously less satisfying.
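The "randomly chosen interval created in this way" reading can be checked by simulation. Below is a rough sketch under simplifying assumptions of my own: normally distributed data, a known standard deviation (so a z-interval is valid), and made-up values for the true mean, sigma, and sample size.

```python
import random
from statistics import NormalDist

def ci_coverage(true_mean=10.0, sigma=2.0, n=25, trials=10_000, level=0.95, seed=0):
    """Fraction of z-intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval is sample_mean +/- half_width; does it cover the truth?
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

rate = ci_coverage()
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The share should come out close to 0.95. That number describes the procedure over many repetitions. Any single interval either contains the fixed true mean or it does not, which is exactly the less satisfying answer described above.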


1
The more I think about confidence intervals, the harder it is for me to think of what kind of question they can answer at a conceptual level that cannot be answered by asking for "the chance a true value is within an interval, given one's state of knowledge". If I were to ask "what is the chance (conditional on my information) that the average income in 2010 was between 10,000 and 50,000?" I don't think the theory of confidence intervals can give an answer to this question.
probabilityislogic

21

What is the meaning of "degrees of freedom"? How about df that are not whole numbers?


13

Conditional probability probably leads to most mistakes in everyday experience. There are many harder concepts to grasp, of course, but people usually don't have to worry about them--this one they can't get away from & is a source of rampant misadventure.


+1; could you add an example or two, favourite or current?
denis

1
For starters: P(you have the disease|test is positive) != P(test is positive|you have the disease).
xmjx
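A quick numeric sketch of that inequality, using Bayes' rule. The prevalence, sensitivity, and specificity below are hypothetical values chosen only to show how far apart the two conditional probabilities can be.

```python
def p_disease_given_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_positives = sensitivity * prevalence                # P(positive and disease)
    false_positives = (1 - specificity) * (1 - prevalence)   # P(positive and no disease)
    return true_positives / (true_positives + false_positives)

# Hypothetical test: 1% prevalence, 95% sensitivity, 95% specificity.
sensitivity = 0.95  # this is P(test is positive | you have the disease)
posterior = p_disease_given_positive(prevalence=0.01, sensitivity=sensitivity, specificity=0.95)
print(f"P(positive | disease) = {sensitivity:.2f}")
print(f"P(disease | positive) = {posterior:.2f}")
```

With these made-up numbers, a positive result on a "95% accurate" test still leaves the chance of disease at roughly 16%, because the healthy group is so much larger that its false positives outnumber the true positives.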

9

I think that very few scientists understand this basic point: It is only possible to interpret results of statistical analyses at face value, if every step was planned in advance. Specifically:

  • Sample size has to be picked in advance. It is not ok to keep analyzing the data as more subjects are added, stopping when the results look good.
  • Any methods used to normalize the data or exclude outliers must also be decided in advance. It isn't ok to analyze various subsets of the data until you find results you like.
  • And finally, of course, the statistical methods must be decided in advance. It is not ok to analyze the data via parametric and nonparametric methods, and pick the results you like.

Exploratory methods can be useful to, well, explore. But then you can't turn around and run regular statistical tests and interpret the results in the usual way.
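The first point (sample size picked in advance) is easy to demonstrate by simulation. The sketch below is only an illustration under assumptions of my own: the null hypothesis is true, the data are standard normal with known variance so a z-test applies, and the "peeking" analyst tests after every new subject from n = 10 up to n = 100.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% cutoff, about 1.96

def false_positive_rate(peek: bool, trials=2000, n_min=10, n_max=100, seed=1) -> float:
    """Share of null experiments declared 'significant' at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)   # H0 is true: the mean really is 0
            z = total / n ** 0.5           # z statistic with known sigma = 1
            # Peeking: test at every n >= n_min. Planned: test once, at n_max.
            is_look = (n >= n_min) if peek else (n == n_max)
            if is_look and abs(z) > Z_CRIT:
                rejections += 1
                break                      # stop as soon as the result "looks good"
    return rejections / trials

planned = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"fixed sample size: {planned:.3f}")
print(f"stop when p < 0.05: {peeking:.3f}")
```

The planned analysis should reject close to 5% of the time, while the stop-when-significant analysis rejects far more often even though there is no effect at all.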


5
I think John Tukey might disagree en.wikipedia.org/wiki/Exploratory_data_analysis ;o)
Dikran Marsupial

3
I would partially disagree here. I think the caveat people miss is that the appropriate conditioning operations are easy to ignore for these kinds of issues. Each of these operations changes the conditions of the inference, and hence the conditions of its applicability (and therefore its generality). This is definitely only applicable to "confirmatory analysis", where a well-defined model and question have been constructed. In the exploratory phase, you are not looking to answer definite questions; you are looking to build a model and come up with hypotheses for the data.
probabilityislogic

I edited my answer a bit to take into account the comments of Dikran and probabilityislogic. Thanks.
Harvey Motulsky

1
For me, "excluding outliers" is not as clearly wrong as your answer implies. For example, you may only be interested in the relationships within a certain range of responses, and excluding outliers actually helps this kind of analysis. If you want to model "middle class" income, then excluding the super rich and impoverished outliers is a good idea. It is only to the outliers within your frame of inference (e.g. "strange" middle class observations) that your comments apply.
probabilityislogic

2
Ultimately the real problem with the issues raised in the initial answer is that they (at least partially) invalidate p-values. If you are interested in quantifying an observed effect, one should be able to do any and all of the above with impunity.
russellpierce

9

Tongue firmly in cheek: For frequentists, the Bayesian concept of probability; for Bayesians, the frequentist concept of probability. ;o)

Both have merit of course, but it can be very difficult to understand why one framework is interesting/useful/valid if your grasp of the other is too firm. Cross-validated is a good remedy as asking questions and listening to answers is a good way to learn.


2
A rule I use to remember: use probabilities to predict frequencies. Once the frequencies have been observed, use them to evaluate the probabilities you assigned. The unfortunately confusing thing is that the probability you assign is often equal to a frequency you have observed. One thing I have always found odd is why frequentists even use the word probability. Wouldn't it make their concepts easier to understand if the phrase "the frequency of an event" were used instead of "the probability of an event"?
probabilityislogic

Interestingly, cross validation can be seen as a Monte Carlo approximation to the integral of a loss function in Decision Theory. You have an integral $\int p(x) L(x_n, x)\,dx$ and you approximate it by $\sum_{i=1}^{i=n} L(x_{[n-i]}, x_i)$, where $x_n$ is the data vector and $x_{[n-i]}$ is the data vector with the $i$th observation $x_i$ removed.
probabilityislogic

8

From my personal experience the concept of likelihood can also cause quite a lot of stir, especially for non-statisticians. As wikipedia says, it is very often mixed up with the concept of probability, which is not exactly correct.
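One way to see the difference numerically: the same binomial formula is a probability distribution when read as a function of the outcome (parameter fixed), but not when read as a function of the parameter (outcome fixed). The sketch below uses hypothetical data (7 successes in 10 trials) and a simple midpoint rule for the integral.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial formula: probability of k successes in n trials with success chance p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, k_obs = 10, 7  # hypothetical data: 7 successes in 10 trials

# Probability: fix the parameter (p = 0.7) and sum over all possible outcomes.
total_over_outcomes = sum(binom_pmf(k, n, 0.7) for k in range(n + 1))

# Likelihood: fix the observed outcome and vary the parameter over (0, 1).
grid = 10_000
midpoints = [(i + 0.5) / grid for i in range(grid)]
area_over_parameter = sum(binom_pmf(k_obs, n, p) for p in midpoints) / grid
best_p = max(midpoints, key=lambda p: binom_pmf(k_obs, n, p))

print(f"sum over outcomes   = {total_over_outcomes:.4f}")
print(f"area over parameter = {area_over_parameter:.4f}")
print(f"likelihood peaks near p = {best_p:.3f}")
```

The sum over outcomes is 1, as it must be for a probability distribution. The area under the likelihood curve is 1/(n + 1), about 0.09 here, so the likelihood is not a probability distribution over p, even though it is still useful for comparing parameter values (it peaks at k/n = 0.7).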



6

What do the different distributions really represent, beyond how they are used?


3
This was the question I found most distracting after statistics 101. I would encounter many distributions with no motivation for them beyond "properties" that were relevant to topics at hand. It took unacceptably long to find out what any represented.
sesqu

1
Maximum entropy "thinking" is one method which helps understand what a distribution is, namely a state of knowledge (or a description of uncertainty about something). This is the only definition that has made sense to me in all situations
probabilityislogic

Ben Bolker provides a good overview of this in the "bestiary of distributions" section of Ecological Models and Data in R.
David LeBauer

5

I think the question is interpretable in two ways, which will give very different answers:

1) For people studying statistics, particularly at a relatively advanced level, what is the hardest concept to grasp?

2) Which statistical concept is misunderstood by the most people?

For 1) I don't know the answer at all. Something from measure theory, maybe? Some type of integration? I don't know.

For 2) p-value, hands down.


Measure theory is neither a field of statistics nor hard. Some types of integration are hard, but, once again, that isn't statistics.
pyon


5

I think people miss the boat on pretty much everything the first time around. I think what most students don't understand is that they're usually estimating parameters based on samples. They don't know the difference between a sample statistic and a population parameter. If you beat these ideas into their head, the other stuff should follow a little bit easier. I'm sure most students don't understand the crux of the CLT either.
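A small simulation can make the statistic-versus-parameter distinction tangible. This is only a sketch with choices of my own: an exponential population whose mean (the parameter) is set to 1, sample sizes of 25 and 100, and 2000 repeated samples of each size.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
POPULATION_MEAN = 1.0  # the parameter: one fixed number, unknown in real problems

def sample_mean(n: int) -> float:
    """A statistic: computed from one sample, so it changes from sample to sample."""
    # Exponential population (strongly skewed) with mean POPULATION_MEAN.
    return sum(rng.expovariate(1.0 / POPULATION_MEAN) for _ in range(n)) / n

means_25 = [sample_mean(25) for _ in range(2000)]
means_100 = [sample_mean(100) for _ in range(2000)]
sd_25, sd_100 = stdev(means_25), stdev(means_100)

print(f"n = 25:  average of sample means = {mean(means_25):.3f}, spread = {sd_25:.3f}")
print(f"n = 100: average of sample means = {mean(means_100):.3f}, spread = {sd_100:.3f}")
```

The parameter never moves, while the sample means scatter around it. Quadrupling the sample size roughly halves the spread (sigma / sqrt(n)), and although the population is skewed, a histogram of the sample means looks increasingly bell-shaped, which is the part of the CLT that students tend to miss.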

Licensed under cc by-sa 3.0 with attribution required.