Probability distribution for different probabilities

36

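If I wanted the probability of 9 successes in 16 trials, with each trial having a success probability of 0.6, I could use a binomial distribution. What could I use if each of the 16 trials has a different probability of success?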

distributions probability binomial

— Greg

1

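@whuber In your explanation of the normal approximation, the calculation of the mean and standard deviation differs from the description in Wikipedia. In Wiki, the mean is $np$ and the standard deviation is $np(1-p)$. So in this problem, for the normal approximation with varying success probabilities, the mean is $p_1 + p_2 + \cdots + p_i$ and the variance is $p_1(1-p_1) + p_2(1-p_2) + \cdots + p_i(1-p_i)$. Am I right?

— David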

1

See Wikipedia on the Poisson binomial distribution. It is also a search term that turns up a few hits here.

— Glen_b -Reinstate Monica

@David When all the $p_i$ are equal to a common value $p$, then $p_1+p_2+\cdots+p_n = np$ and $p_1(1-p_1)+\cdots+p_n(1-p_n)=np(1-p)$, which shows that the Wikipedia description you refer to is just a special case.

— whuber

stats.stackexchange.com/questions/160458/…

en.wikipedia.org/wiki/Poisson_binomial_distribution

— ジェシカ

22

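This is the sum of 16 (presumably independent) binomial trials. The assumption of independence allows us to multiply probabilities. Hence, after two trials with success probabilities $p_1$ and $p_2$, the chance of success on both trials is $p_1 p_2$, the chance of no successes is $(1-p_1)(1-p_2)$, and the chance of exactly one success is $p_1(1-p_2) + (1-p_1)p_2$. That last expression owes its validity to the fact that the two ways of getting exactly one success are mutually exclusive: at most one of them can actually happen. That means their probabilities add.

By means of these two rules (independent probabilities multiply, and mutually exclusive ones add) you can work out the answers for, say, 16 trials with probabilities $p_1, \ldots, p_{16}$. To do so, you need to account for all the ways of obtaining each given number of successes (such as 9). There are $\binom{16}{9} = 11440$ ways to achieve 9 successes. One of them, for example, occurs when trials 1, 2, 4, 5, 6, 11, 12, 14, and 15 are successes and the others are failures. The successes had probabilities $p_1, p_2, p_4, p_5, p_6, p_{11}, p_{12}, p_{14},$ and $p_{15}$, and the failures had probabilities $1-p_3, 1-p_7, \ldots, 1-p_{13}, 1-p_{16}$. Multiplying these 16 numbers gives the chance of this particular sequence of outcomes. Summing this number along with the 11,439 remaining such numbers gives the answer.

Of course, you would use a computer.

With many more than 16 trials, there is a need to approximate the distribution. Provided none of the probabilities $p_i$ and $1-p_i$ gets too small, a normal approximation tends to work well. With this method you note that the expectation of the sum of $n$ trials is $\mu = p_1 + p_2 + \cdots + p_n$ and (because the trials are independent) the variance is $\sigma^2 = p_1(1-p_1) + p_2(1-p_2) + \cdots + p_n(1-p_n)$. You then pretend the distribution of the sum is normal with mean $\mu$ and standard deviation $\sigma$. The answers tend to be good for probabilities corresponding to a number of successes that differs from $\mu$ by no more than a few multiples of $\sigma$. As $n$ grows large, this approximation gets ever more accurate and works for even larger multiples of $\sigma$ away from $\mu$.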
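As a rough illustration of the normal approximation described above, here is a minimal R sketch. It is not part of the original answer: the probabilities $p_i = i/17$ are borrowed from an example used later in this thread, and the half-unit continuity correction is an added assumption.

```r
# Illustrative probabilities (an assumption; any vector of values in (0, 1) works).
p <- (1:16) / 17

mu     <- sum(p)             # mean of the number of successes
sigma2 <- sum(p * (1 - p))   # variance, using independence of the trials
sigma  <- sqrt(sigma2)

# Normal approximation to P(S = 9), with a continuity correction
# (the correction is an addition, not something the answer specifies).
k <- 9
approx <- pnorm(k + 0.5, mean = mu, sd = sigma) -
          pnorm(k - 0.5, mean = mu, sd = sigma)

cat("mu =", mu, " sigma =", sigma, " approximate P(S = 9) =", approx, "\n")
```

For these inputs the mean is exactly 8 and the variance is 48/17, so the approximation can be compared directly with the exact values reported in the other answers.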

— whuber

9

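Computer scientists call these "Poisson trials," to distinguish them from Bernoulli trials. In addition to central limit theorem approximations, there are also good tail bounds available. Here is one. A Google search for "Chernoff bounds for Poisson trials" will turn up results you might find in a typical CS treatment.

— cardinal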

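@Cardinal That nomenclature is interesting. It would be valid for very small $p_i$, but otherwise it seems misleading, because the distribution is not well approximated by a Poisson distribution. (There is another discussion on CV about this question, in which "16" is replaced by 10,000 and tail probabilities are examined, but I have not been able to find it again.)

— whuber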

1

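Yes, I agree about the name. I found it a bit odd when I first encountered it. I gave it here more as a useful term for searching. It seems that computer scientists often consider these probabilities when dealing with certain algorithms. I would be interested in reading that other question if you happen to find it. Is it this one, maybe?

— cardinal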

2

@cardinal is right that we "CS folk" call them Poisson trials. In fact, in this case a standard Chernoff-Hoeffding bound gives exactly the bound the OP is asking for.

— Suresh Venkatasubramanian

1

As per the comment by @David yesterday, there is something wrong with your statement of the normal-approximation mean as $\mu = (p_1 + p_2 + \cdots + p_n)/n$. We are summing 16 Bernoulli random variables, each of which takes the value 0 or 1, so the sum has support from 0 to 16, not from 0 to 1. It is worth checking the SD too.

— wolfies

12

One alternative to @whuber's normal approximation is to use "mixing" probabilities, or a hierarchical model. This applies when the $p_i$ are similar in some way, so that you can model them by a probability distribution $p_i\sim Dist(\theta)$ with a density function $g(p|\theta)$ indexed by some parameter $\theta$. You get an integral equation:

$Pr(s=9|n=16,\theta)={16 \choose 9}\int_{0}^{1} p^{9}(1-p)^{7}g(p|\theta)dp$

The binomial probability comes from setting $g(p|\theta)=\delta(p-\theta)$; the normal approximation comes from (I think) setting $g(p|\theta)=g(p|\mu,\sigma)=\frac{1}{\sigma}\phi\left(\frac{p-\mu}{\sigma}\right)$ (with $\mu$ and $\sigma$ as defined in @whuber's answer) and then noting that the "tails" of this PDF fall off sharply around the peak.

You could also use a beta distribution, which leads to a simple analytic form and need not suffer from the "small $p$" problem that the normal approximation does, as the beta is quite flexible. Use a $beta(\alpha,\beta)$ distribution with $\alpha,\beta$ set by the solutions to the following equations (these are the "minimum KL divergence" estimates):

$\psi(\alpha)-\psi(\alpha+\beta)=\frac{1}{n}\sum_{i=1}^{n}\log[p_{i}]$

$\psi(\beta)-\psi(\alpha+\beta)=\frac{1}{n}\sum_{i=1}^{n}\log[1-p_{i}]$

Where $\psi(.)$ is the digamma function - closely related to harmonic series.

We get the "beta-binomial" compound distribution:

${16 \choose 9}\frac{1}{B(\alpha,\beta)}\int_{0}^{1} p^{9+\alpha-1}(1-p)^{7+\beta-1}dp ={16 \choose 9}\frac{B(\alpha+9,\beta+7)}{B(\alpha,\beta)}$

This distribution converges towards a normal distribution in the case that @whuber points out, and should give reasonable answers for small $n$ and skewed $p_i$, but not for multimodal $p_i$, as the beta distribution has only one peak. But you can easily fix this by using $M$ beta distributions for the $M$ modes. You break up the integral over $0<p<1$ into $M$ pieces so that each piece has a unique mode (and enough data to estimate parameters), and fit a beta distribution within each piece. Then add up the results, noting that under the change of variables $p=\frac{x-L}{U-L}$ for $L<x<U$ the beta integral transforms to:

$B(\alpha,\beta)=\int_{L}^{U}\frac{(x-L)^{\alpha-1}(U-x)^{\beta-1}}{(U-L)^{\alpha+\beta-1}}dx$
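To make the single-beta recipe concrete, here is a minimal R sketch, offered as an illustration rather than the answer's own implementation. It assumes the example probabilities $p_i = i/17$ raised in the comments, and it solves the two digamma equations numerically by minimising the squared residuals with `optim`; the names `resid_ss`, `a_hat` and `b_hat` are hypothetical.

```r
p <- (1:16) / 17                                  # assumed example probabilities
target <- c(mean(log(p)), mean(log(1 - p)))       # right-hand sides of the two equations

# Sum of squared residuals of the digamma equations; parameters are
# optimised on the log scale so that alpha and beta stay positive.
resid_ss <- function(par) {
  a <- exp(par[1]); b <- exp(par[2])
  r <- c(digamma(a) - digamma(a + b),
         digamma(b) - digamma(a + b)) - target
  sum(r^2)
}

fit   <- optim(c(0, 0), resid_ss, method = "BFGS", control = list(reltol = 1e-14))
a_hat <- exp(fit$par[1])
b_hat <- exp(fit$par[2])

# Beta-binomial probability of 9 successes in 16 trials, computed on the log scale.
prob9 <- choose(16, 9) * exp(lbeta(a_hat + 9, b_hat + 7) - lbeta(a_hat, b_hat))
cat("alpha =", a_hat, " beta =", b_hat, " P(s = 9) =", prob9, "\n")
```

Because these $p_i$ are symmetric about 1/2, the two fitted parameters should come out equal. The author's follow-up answer reports $\alpha=\beta=1.3206$ and a probability of about 0.068 for this input, which this sketch should roughly reproduce.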

— probabilityislogic

+1 This answer contains some interesting and clever suggestions. The last one looks particularly flexible and powerful.

— whuber

Just to take something very simple and concrete, suppose (i) $p_i = \frac{i}{17}$ and (ii) $p_i = \sqrt{i}/17$, for $i = 1$ to 16. What would be the solution to your $\alpha$ and $\beta$ estimates, and thus your estimates for $P(X=9)$ given $n= 16$, as per the OP's problem?

— wolfies

Great answer and proposal, especially the beta! It'd be cool to see this answer written in its general form with $n$ and $s$.

— pglpm

8

Let $X_i \sim \text{Bernoulli}(p_i)$ with probability generating function (pgf):

$\text{pgf} = E[t^{X_i}] = 1 - p_i (1-t)$

Let $S = \sum_{i=1}^n X_i$ denote the sum of $n$ such independent random variables. Then, the pgf for the sum $S$ of $n=16$ such variables is:

$\begin{aligned}\text{pgfS} &= E[t^S] \\ &= E[t^{X_1}] E[t^{X_2}] \cdots E[t^{X_{16}}] \quad \text{(by independence)} \\ &= \prod _{i=1}^{16} \left(1-p_i(1-t) \right)\end{aligned}$

We seek $P(S=9)$ , which is:

$\frac{1}{9!}\frac{d^9 \text{pgfS}}{dt^9}\Big|_{t=0}$

ALL DONE. This produces the exact symbolic solution as a function of the $p_i$ . The answer is rather long to print on screen, but it is entirely tractable, and takes less than $\frac{1}{100}$ th of a second to evaluate using Mathematica on my computer.
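As a small worked check of why the derivative formula picks out the right probability, consider the case $n = 2$ (an illustrative special case, not taken from the answer). Expanding the product gives a polynomial in $t$ whose coefficients are the probabilities of 0, 1 and 2 successes:

```latex
\begin{align*}
\text{pgfS}(t) &= \bigl(1 - p_1(1-t)\bigr)\bigl(1 - p_2(1-t)\bigr) \\
               &= \bigl((1-p_1) + p_1 t\bigr)\bigl((1-p_2) + p_2 t\bigr) \\
               &= (1-p_1)(1-p_2)
                  + \bigl[p_1(1-p_2) + (1-p_1)p_2\bigr]\, t
                  + p_1 p_2\, t^2 .
\end{align*}
```

Differentiating $s$ times, dividing by $s!$ and setting $t=0$ simply extracts the coefficient of $t^s$, which is $P(S=s)$. The three coefficients agree with the two-trial probabilities obtained by multiplying and adding in @whuber's answer.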

Examples

If $p_i = \frac{i}{17}, i= 1 \text{ to } 16$ , then: $P(S=9) = \frac{9647941854334808184}{48661191875666868481} = 0.198268 \dots$

If $p_i = \frac{\sqrt{i}}{17}, i= 1 \text{ to } 16$ , then: $P(S=9) = 0.000228613 \dots$

More than 16 trials?

With more than 16 trials, there is no need to approximate the distribution. The above exact method works just as easily for examples with, say, $n = 50$ or $n = 100$. For instance, when $n = 50$, it takes less than $\frac{1}{10}$ th of a second to evaluate the entire pmf (i.e. at every value $s = 0, 1, \dots, 50$) using the code below.

Mathematica code

Given a vector of $p_i$ values, say:

n = 16;   pvals = Table[Subscript[p, i] -> i/(n+1), {i, n}];

... here is some Mathematica code to do everything required:

pgfS = Expand[ Product[1-(1-t)Subscript[p,i], {i, n}] /. pvals];
D[pgfS, {t, 9}]/9! /. t -> 0  // N

0.198268

To derive the entire pmf:

Table[D[pgfS, {t,s}]/s! /. t -> 0 // N, {s, 0, n}]

... or use the even neater and faster (thanks to a suggestion from Ray Koopman below):

CoefficientList[pgfS, t] // N

For an example with $n = 1000$ , it takes just 1 second to calculate pgfS, and then 0.002 seconds to derive the entire pmf using CoefficientList, so it is extremely efficient.

— wolfies

1

It can be even simpler. With[{p = Range@16/17}, N@Coefficient[Times@@(1-p+p*t),t,9]] gives the probability of 9 successes, and With[{p = Range@16/17}, N@CoefficientList[Times@@(1-p+p*t),t]] gives the probabilities of 0,...,16 successes.

— Ray Koopman

@RayKoopman That is cool. The Table for the $p$-values is intentional to allow for more general forms not suitable with Range. Your use of CoefficientList is very nice! I've added an Expand to the code above which speeds the direct approach up enormously. Even so, CoefficientList is even faster than a ParallelTable. It does not make much difference for $n$ under 50 (both approaches take just a tiny fraction of a second either way to generate the entire pmf), but your CoefficientList will also be a real practical advantage when $n$ is really large.

— wolfies

5

@wolfies' comment, and my attempt at a response to it, revealed an important problem with my other answer, which I will discuss later.

Specific Case (n=16)

There is a fairly efficient way to code up the full distribution by using the "trick" of using base 2 (binary) numbers in the calculation. It only requires 4 lines of R code to get the full distribution of $Y=\sum_{i=1}^{n} Z_i$ where $Pr(Z_i=1)=p_i$ . Basically, there are a total of $2^n$ choices of the vector $z=(z_1,\dots,z_n)$ that the binary variables $Z_i$ could take. Now suppose we number each distinct choice from $1$ up to $2^n$ . This on its own is nothing special, but now suppose that we represent the "choice number" using base 2 arithmetic. Now take $n=3$ so I can write down all the choices so there are $2^3=8$ choices. Then $1,2,3,4,5,6,7,8$ in "ordinary numbers" becomes $1,10,11,100,101,110,111,1000$ in "binary numbers". Now suppose we write these as four digit numbers, then we have $0001,0010,0011,0100,0101,0110,0111,1000$ . Now look at the last $3$ digits of each number - $001$ can be thought of as $(Z_1=0,Z_2=0,Z_3=1)\implies Y=1$ , etc. Counting in binary form provides an efficient way to organise the summation. Fortunately, there is an R function which can do this binary conversion for us, called intToBits(x) and we convert the raw binary form into a numeric via as.numeric(intToBits(x)), then we will get a vector with $32$ elements, each element being the digit of the base 2 version of our number (read from right to left, not left to right). Using this trick combined with some other R vectorisations, we can calculate the probability that $y=9$ in 4 lines of R code:

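exact_calc <- function(y,p){
    n       <- length(p)
    # one row per outcome (2^n rows), one 0/1 column per trial;
    # dropping columns n+1,...,32 maps the integer 2^n to the all-failures row
    z       <- t(matrix(as.numeric(intToBits(1:2^n)),ncol=2^n))[,1:n]
    # log-probability of each outcome: sum of z_i*log(p_i) + (1-z_i)*log(1-p_i)
    pz      <- z%*%log(p/(1-p))+sum(log(1-p))
    # add up the outcome probabilities by number of successes (0,...,n)
    ydist   <- rowsum(exp(pz),rowSums(z))
    return(ydist[y+1])
}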
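To see what the binary trick produces, the following minimal R sketch builds the outcome matrix for the small case $n=3$ discussed above. It is only an illustration of the `intToBits` step, with `bits` and `z` as local illustrative names.

```r
n <- 3

# intToBits() returns 32 bits per integer, least significant bit first.
bits <- matrix(as.numeric(intToBits(1:2^n)), ncol = 2^n)  # 32 rows, 2^n columns

# Keep the first n bits: one row per outcome, one column per trial.
z <- t(bits)[, 1:n]

# Show each outcome alongside its number of successes.
print(cbind(z, successes = rowSums(z)))
```

Each of the $2^3 = 8$ rows is a distinct pattern of successes and failures. The integer 8 has its only set bit in position 4, so after truncation to three columns it supplies the all-failures row.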

Plugging in the uniform case $p_i^{(1)}=\frac{i}{17}$ and the square-root case $p_i^{(2)}=\frac{\sqrt{i}}{17}$ gives the full distribution for $y$ as:

$\begin{array}{c|cc}
y & Pr(Y=y|p_i=\frac{i}{17}) & Pr(Y=y|p_i=\frac{\sqrt{i}}{17})\\
\hline
0 & 0.0000 & 0.0558 \\
1 & 0.0000 & 0.1784 \\
2 & 0.0003 & 0.2652 \\
3 & 0.0026 & 0.2430 \\
4 & 0.0139 & 0.1536 \\
5 & 0.0491 & 0.0710 \\
6 & 0.1181 & 0.0248 \\
7 & 0.1983 & 0.0067 \\
8 & 0.2353 & 0.0014 \\
9 & 0.1983 & 0.0002 \\
10 & 0.1181 & 0.0000 \\
11 & 0.0491 & 0.0000 \\
12 & 0.0139 & 0.0000 \\
13 & 0.0026 & 0.0000 \\
14 & 0.0003 & 0.0000 \\
15 & 0.0000 & 0.0000 \\
16 & 0.0000 & 0.0000 \\
\end{array}$

So for the specific problem of $y$ successes in $16$ trials, the exact calculations are straight-forward. This also works for a number of probabilities up to about $n=20$ - beyond that you are likely to start to run into memory problems, and different computing tricks are needed.

Note that by applying my suggested "beta distribution" we get parameter estimates of $\alpha=\beta=1.3206$ and this gives a probability estimate that is nearly uniform in $y$ , giving an approximate value of $pr(y=9)=0.06799\approx\frac{1}{17}$ . This seems strange given that a density of a beta distribution with $\alpha=\beta=1.3206$ closely approximates the histogram of the $p_i$ values. What went wrong?

General Case

I will now discuss the more general case, and why my simple beta approximation failed. Basically, by writing $(y|n,p)\sim Binom(n,p)$ and then mixing over $p$ with another distribution $p\sim f(\theta)$ is actually making an important assumption - that we can approximate the actual probability with a single binomial probability - the only problem that remains is which value of $p$ to use. One way to see this is to use the mixing density which is discrete uniform over the actual $p_i$ . So we replace the beta distribution $p\sim Beta(a,b)$ with a discrete density of $p\sim \sum_{i=1}^{16}w_i\delta(p-p_i)$ . Then using the mixing approximation can be expressed in words as choose a $p_i$ value with probability $w_i$ , and assume all bernoulli trials have this probability. Clearly, for such an approximation to work well, most of the $p_i$ values should be similar to each other. This basically means that for @wolfies uniform distribution of values, $p_i=\frac{i}{17}$ results in a woefully bad approximation when using the beta mixing distribution. This also explains why the approximation is much better for $p_i=\frac{\sqrt{i}}{17}$ - they are less spread out.

The mixing then uses the observed $p_i$ to average over all possible choices of a single $p$ . Now because "mixing" is like a weighted average, it cannot possibly do any better than using the single best $p$ . So if the $p_i$ are sufficiently spread out, there can be no single $p$ that could provide a good approximation to all $p_i$ .

One thing I did say in my other answer was that it may be better to use a mixture of beta distributions over a restricted range, but this still won't help here because this is still mixing over a single $p$. What makes more sense is to split the interval $(0,1)$ up into pieces and have a binomial within each piece. For example, we could choose $(0,0.1,0.2,\dots,0.9,1)$ as our splits and fit ten binomials, one within each $0.1$ range of probability. Basically, within each split, we would fit a simple approximation, such as using a binomial with probability equal to the average of the $p_i$ in that range. If we make the intervals small enough, the approximation becomes arbitrarily good. But note that all this does is leave us with having to deal with a sum of independent binomial trials with different probabilities, instead of Bernoulli trials. However, the previous part of this answer showed that we can do the exact calculations provided that the number of binomials is sufficiently small, say 10-15 or so.

To extend the bernoulli-based answer to a binomial-based one, we simply "re-interpret" what the $Z_i$ variables are. We simply state that $Z_i=I(X_i>0)$ - this reduces to the original bernoulli-based $Z_i$ but now says which binomials the successes are coming from. So the case $(Z_1=0,Z_2=0,Z_3=1)$ now means that all the "successes" come from the third binomial, and none from the first two.

Note that this is still "exponential" in that the number of calculations is something like $k^g$ where $g$ is the number of binomials, and $k$ is the group size, so you have $Y\approx\sum_{j=1}^{g}X_j$ where $X_j\sim Bin(k,p_j)$. But this is better than the $2^{gk}$ that you'd be dealing with by using Bernoulli random variables. For example, suppose we split the $n=16$ probabilities into $g=4$ groups with $k=4$ probabilities in each group. This gives $4^4=256$ calculations, compared to $2^{16}=65536$.

By choosing $g=10$ groups, and noting that the limit was about $n=20$ which is about $10^7$ cells, we can effectively use this method to increase the maximum $n$ to $n=50$ .

If we make a cruder approximation, by lowering $g$ , we will increase the "feasible" size for $n$ . $g=5$ means that you can have an effective $n$ of about $125$ . Beyond this the normal approximation should be extremely accurate.
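A minimal R sketch of the grouping idea follows, under stated assumptions: the example probabilities $p_i = i/17$, $g=4$ groups of $k=4$ sorted values, and each group replaced by a binomial with the group's average probability. Note that it combines the group distributions by convolution, a swapped-in technique chosen for brevity, rather than by the enumeration over binomials described above; `conv` and `p_bar` are hypothetical names.

```r
p <- (1:16) / 17            # assumed example probabilities
g <- 4                      # number of groups
k <- length(p) / g          # trials per group (assumes g divides n)

# Average the sorted p_i within each block of k consecutive values.
p_bar <- colMeans(matrix(sort(p), nrow = k))

# pmf of the sum of two independent counts (polynomial multiplication).
conv <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

# Combine the g binomial pmfs; pmf_approx[y + 1] approximates P(Y = y).
pmf_approx <- Reduce(conv, lapply(p_bar, function(q) dbinom(0:k, k, q)))
cat("approximate P(Y = 9) =", pmf_approx[9 + 1], "\n")
```

Averaging within a group keeps the mean of $Y$ unchanged but slightly inflates its variance, so the approximate probabilities near the centre can be expected to fall a little below the exact ones in the table above.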

— probabilityislogic

@momo - I think this is ok, as my answers are two different ways to approach the problem. This answer is not an edited version of my first one - it is just a different answer

— probabilityislogic

1

For a solution in R that is extremely efficient and handles much, much larger values of $n$, please see stats.stackexchange.com/a/41263. For instance, it solved this problem for $n=10^4$, giving the full distribution, in under three seconds. (A comparable Mathematica 9 solution, see @wolfies' answer, also performs well for smaller $n$ but could not complete the execution with such a large value of $n$.)

— whuber

5

The (in general intractable) pmf is

$\Pr(S=k) = \sum_{\substack{A\subset\{1,\dots,n\}\\ |A|=k}} \left( \prod_{i\in A} p_i \right)\left(\prod_{j\in \{1,\dots,n\}\setminus A} (1-p_j) \right) \, .$

R code:

p <- seq(1, 16) / 17
cat(p, "\n")
n <- length(p)
k <- 9
S <- seq(1, n)
A <- combn(S, k)
pr <- 0
for (i in 1:choose(n, k)) {
    pr <- pr + exp(sum(log(p[A[,i]])) + sum(log(1 - p[setdiff(S, A[,i])])))
}
cat("Pr(S = ", k, ") = ", pr, "\n", sep = "")

For the $p_i$'s used in wolfies' answer, we have:

Pr(S = 9) = 0.1982677

When $n$ grows, use a convolution.
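One way the convolution could look is sketched below in R. This is an assumption about what is meant, not the author's code: it adds the trials one at a time, updating the pmf of the running sum, and the function name `poisson_binomial_pmf` is hypothetical.

```r
# Full pmf of S = X_1 + ... + X_n for independent Bernoulli(p_i) trials.
poisson_binomial_pmf <- function(p) {
  pmf <- 1                     # pmf of an empty sum: P(S = 0) = 1
  for (q in p) {
    # Adding one trial: it fails (count unchanged) or succeeds (count + 1).
    pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
  }
  pmf                          # pmf[s + 1] = P(S = s)
}

p   <- (1:16) / 17
pmf <- poisson_binomial_pmf(p)
cat("Pr(S = 9) =", pmf[9 + 1], "\n")
```

This takes on the order of $n^2$ multiplications instead of enumerating $\binom{n}{k}$ subsets, and returns the whole distribution at once. For the $p_i$ above it should agree with the value 0.1982677 obtained by the enumeration.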

— Zen

1

Doing that with R code was really helpful. Some of us are more concrete-thinkers and it greatly helps to have an operational version of the generating function.

— DWin

@DWin I provide efficient R code in the solution to the same problem (with different values of the $p_i$) at stats.stackexchange.com/a/41263. The problem here is solved in 0.00012 seconds total computation time (estimated by solving it 1000 times) compared to 0.53 seconds (estimated by solving it once) for this R code and 0.00058 seconds using Wolfies' Mathematica code (estimated by solving it 1000 times).

— whuber

So $P(S=k)$ would follow a Poisson-Binomial Distribution.

— fccoelho

+1 Very useful post in my attempt at answering this question. I was wondering if using logs is more of a cool mathematical formulation than a real need. I am not too concerned about running times...

— Antoni Parellada