Simulating draws from a uniform distribution using draws from a Normal distribution

15

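I recently purchased a data science interview resource in which one of the probability questions was as follows:

Given draws from a Normal distribution with known parameters, how can you simulate draws from a uniform distribution?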

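My first thought was that, for a discrete random variable, we could split the Normal distribution into $K$ distinct subsections, each with equal area under the Normal curve. We could then determine which of the $K$ values the variable takes by identifying which region of the Normal curve a draw falls into.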
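The equal-area idea described above can be sketched in R as follows. This is only an illustration under the stated assumption that the Normal parameters are known exactly; the helper name `discrete.uniform` and the arguments `K`, `mu`, and `sigma` are hypothetical names introduced here.

```r
# Map Normal draws to a discrete uniform on 1..K by cutting the Normal
# curve into K regions of equal area (hypothetical helper, for illustration).
discrete.uniform <- function(x, K, mu = 0, sigma = 1) {
  # Interior cut points: the 1/K, 2/K, ..., (K-1)/K quantiles of N(mu, sigma^2)
  cuts <- qnorm(seq_len(K - 1) / K, mean = mu, sd = sigma)
  # findInterval() counts the cut points at or below each x, giving 0..(K-1)
  findInterval(x, cuts) + 1L
}

set.seed(17)
k <- discrete.uniform(rnorm(1e5, mean = 3, sd = 2), K = 6, mu = 3, sigma = 2)
counts <- table(factor(k, levels = 1:6))  # each count should be near 1e5/6
print(counts)
```

Because the cut points are Normal quantiles, each of the `K` labels has probability exactly `1/K`.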

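But this would only work for discrete random variables. I did some research into how to do the same for continuous random variables, but unfortunately I could only find techniques such as inverse transform sampling, which take a uniform random variable as input and output random variables from some other distribution. Perhaps this process could be run in reverse to obtain uniform random variables?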

I also thought about using the Normal random variables as inputs to a linear congruential generator, but I'm not sure whether that would work.

Any thoughts on how I might approach this question?

— wellington

30

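In the spirit of using simple algebraic calculations that are unrelated to computing the Normal distribution, I would lean towards the following. They are ordered as I thought of them (and therefore needed to get more and more creative), but I have saved the best, and most surprising, for last.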

1. Reverse the Box-Mueller technique: from each pair of Normals $(X,Y)$, two independent uniforms can be constructed as $\text{atan2}(Y,X)$ (on the interval $[-\pi, \pi]$) and $\exp(-(X^2+Y^2)/2)$ (on the interval $[0,1]$).
2. Take the Normals in groups of two and sum their squares to obtain a sequence of $\chi^2_2$ variates $Y_1, Y_2, \ldots, Y_i, \ldots$. The expressions obtained from the pairs

$X_i = \frac{Y_{2i}}{Y_{2i-1}+Y_{2i}}$

will have a $\text{Beta}(1,1)$ distribution, which is uniform.

That this requires only basic, simple arithmetic should be clear.
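3. Because the exact distribution of the Pearson correlation coefficient of a four-pair sample from a standard bivariate Normal distribution is uniform on $[-1,1]$, we may simply take the Normals in groups of four pairs (that is, eight values in each set) and return the correlation coefficient of these pairs. (This involves simple arithmetic plus two square root operations.)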
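4. It has been known since ancient times that a cylindrical projection of the sphere (a surface in three-space) is equal-area. This implies that in the projection of a uniform distribution on the sphere, both the horizontal coordinate (corresponding to longitude) and the vertical coordinate (corresponding to latitude) will have uniform distributions. Because the trivariate standard Normal distribution is spherically symmetric, its projection onto the sphere is uniform. Obtaining the longitude is essentially the same calculation as the angle in the Box-Mueller method (q.v.), but the projected latitude is new. The projection onto the sphere merely normalizes a triple of coordinates $(x,y,z)$, at which point $z$ is the projected latitude. Thus, take the Normal variates in groups of three, $X_{3i-2}, X_{3i-1}, X_{3i}$, and compute

$\frac{X_{3i}}{\sqrt{X_{3i-2}^2 + X_{3i-1}^2 + X_{3i}^2}}$

for $i=1, 2, 3, \ldots$.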
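5. Because most computing systems represent numbers in binary, uniform number generation usually begins by producing uniformly distributed integers from $0$ to $2^{32}-1$ (or some high power of $2$ related to computer word length) and rescaling them as needed. Such integers are represented internally as strings of $32$ binary digits. We can obtain independent random bits by comparing a Normal variable to its median. Thus, it suffices to break the Normal variables into groups of size equal to the desired number of bits, compare each one to its mean, and assemble the resulting sequence of true/false results into a binary number. Writing $k$ for the number of bits and $H$ for the sign indicator (that is, $H(x)=1$ when $x\gt 0$ and $H(x)=0$ otherwise), we can express the resulting normalized uniform value in $[0, 1)$ with the formula

$\sum_{j=0}^{k-1} H(X_{ki - j})2^{-j-1}.$

The variates $X_n$ can be drawn from any continuous distribution whose median is $0$ (such as a standard Normal); they are processed in groups of $k$, with each group producing one such pseudo-uniform value.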
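6. Rejection sampling is a standard, flexible, powerful way to draw random variates from arbitrary distributions. Suppose the target distribution has PDF $f$. A value $Y$ is drawn according to another distribution with PDF $g$. In the rejection step, a uniform value $U$ lying between $0$ and $g(Y)$ is drawn independently of $Y$ and compared to $f(Y)$: if it is smaller, $Y$ is retained, but otherwise the process is repeated. This approach seems circular, though: how do we generate a uniform variate with a process that needs a uniform variate to begin with?

The answer is that we don't actually need a uniform variate in order to carry out the rejection step. Instead (assuming $g(Y)\ne 0$) we can flip a fair coin to obtain a $0$ or $1$ randomly. This will be interpreted as the first bit in the binary representation of a uniform variate $U$ in the interval $[0,1)$. When the outcome is $0$, that means $0 \le U \lt 1/2$; otherwise, $1/2\le U \lt 1$. Half of the time, this is enough to decide the rejection step: if $f(Y)/g(Y) \ge 1/2$ and the coin is $0$, $Y$ should be accepted; if $f(Y)/g(Y) \lt 1/2$ and the coin is $1$, $Y$ should be rejected; otherwise, we need to flip the coin again in order to obtain the next bit of $U$. Because, no matter what value $f(Y)/g(Y)$ has, there is a $1/2$ chance of stopping after each flip, the expected number of flips is only $1/2(1)+1/4(2)+1/8(3)+\cdots+2^{-n}(n)+\cdots=2$.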

Rejection sampling can be worthwhile (and efficient) provided the expected number of rejections is small. We can accomplish this by fitting the largest possible rectangle (representing a uniform distribution) beneath a standard Normal PDF.

Using calculus to optimize the rectangle's area, you will find that its endpoints should lie at $\pm 1$, where its height equals $\exp(-1/2)/\sqrt{2\pi}\approx 0.241971$, making its area a little greater than $0.48$. By using this standard Normal density as $g$ and rejecting all values outside the interval $[-1,1]$ automatically, and otherwise applying the rejection procedure, we will obtain uniform variates in $[-1,1]$ efficiently:
- In a fraction $2\Phi(-1) \approx 0.317$ of the time, the Normal variate lies beyond $[-1,1]$ and is immediately rejected. ( $\Phi$ is the standard Normal CDF.)
- In the remaining fraction of the time, the binary rejection procedure has to be followed, requiring two more Normal variates on average.
- The overall procedure requires an average of $1/(2\exp(-1/2)/\sqrt{2\pi}) \approx 2.07$ steps.
The expected number of Normal variates needed to produce each uniform result works out to

$\sqrt{2 e \pi}\,(1 - 2\Phi(-1)) \approx 2.82137.$

Although that is pretty efficient, note that (1) computation of the Normal PDF requires computing an exponential and (2) the value $\Phi(-1)$ must be precomputed once and for all. It's still a little less calculation than the Box-Mueller method (q.v.).
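The rectangle optimization used in this rejection method can be checked numerically. A rectangle with endpoints $\pm a$ beneath the standard Normal PDF $\phi$ has area $2a\phi(a)$, whose derivative $2\phi(a)(1-a^2)$ vanishes at $a=1$. The sketch below confirms this with R's `optimize`; the function name `rect.area` and the search interval are choices made here for illustration only.

```r
# Area of the rectangle with endpoints -a and +a that fits beneath the
# standard Normal PDF: width 2a times height dnorm(a).
rect.area <- function(a) 2 * a * dnorm(a)

# Numerically maximize the area over a plausible range of half-widths.
best <- optimize(rect.area, interval = c(0, 3), maximum = TRUE)

best$maximum        # optimal half-width, close to 1
best$objective      # maximal area, a little greater than 0.48
dnorm(1)            # rectangle height, exp(-1/2)/sqrt(2*pi)
1 / best$objective  # average number of steps per accepted value, about 2.07
```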
7. The order statistics of a uniform distribution have exponential gaps. Since the sum of squares of two Normals (of zero mean) is exponential, we may generate a realization of $n$ independent uniforms by summing the squares of pairs of such Normals, computing the cumulative sum of these, rescaling the results to fall in the interval $[0,1]$, and dropping the last one (which will always equal $1$). This is a pleasing approach because it requires only squaring, summing, and (at the end) a single division.

The $n$ values will automatically be in ascending order. If such a sorting is desired, this method is computationally superior to all the others insofar as it avoids the $O(n\log(n))$ cost of a sort. If a sequence of independent uniforms is needed, however, then sorting these $n$ values randomly will do the trick. Since (as seen in the Box-Mueller method, q.v.) the ratios of each pair of Normals are independent of the sum of squares of each pair, we already have the means to obtain that random permutation: order the cumulative sums by the corresponding ratios. (If $n$ is enormous, this process could be carried out in smaller groups of $k$ with little loss of efficiency, because each group needs only $2(k+1)$ Normals to create $k$ uniform values. For fixed $k$, the asymptotic computational cost is therefore $O(n\log(k)) = O(n)$, needing $2n(1+1/k)$ Normal variates to generate $n$ uniform values.)
8. To a superb approximation, any Normal variate with a large standard deviation looks uniform over ranges of much smaller values. Upon rolling this distribution into the range $[0,1]$ (by taking only the fractional parts of the values), we thereby obtain a distribution that is uniform for all practical purposes. This is extremely efficient, requiring one of the simplest arithmetic operations of all: simply round each Normal variate down to the nearest integer and retain the excess. The simplicity of this approach becomes compelling when we examine a practical R implementation:
```
rnorm(n, sd=10) %% 1
```
reliably produces n uniform values in the range $[0,1]$ at the cost of just n Normal variates and almost no computation.

(Even when the standard deviation is $1$ , the PDF of this approximation varies from a uniform PDF, as shown in the following figure, by less than one part in $10^8$ ! To detect it reliably would require a sample of $10^{16}$ values--that's already beyond the capability of any standard test of randomness. With a larger standard deviation the non-uniformity is so small it cannot even be calculated. For instance, with an SD of $10$ as shown in the code, the maximum deviation from a uniform PDF is only $10^{-857}$ .)
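The size of these deviations can be checked with a short calculation. The density of a Normal variate with standard deviation $\sigma$ reduced modulo $1$ has the Fourier expansion $1 + 2\sum_{m\ge 1} \exp(-2\pi^2\sigma^2 m^2)\cos(2\pi m u)$ (up to a phase shift that depends on the mean), so the largest deviation from $1$ is essentially the leading term $2\exp(-2\pi^2\sigma^2)$. The sketch below compares that term with a direct summation for $\sigma = 1$; the helper names `wrapped.pdf` and `log10.dev` are hypothetical, introduced only for this illustration.

```r
# Density of (X mod 1) at u for X ~ Normal(0, sd^2), obtained by summing the
# Normal density over all integer shifts that carry non-negligible mass.
wrapped.pdf <- function(u, sd = 1) {
  k <- seq(floor(-40 * sd), ceiling(40 * sd))
  sapply(u, function(v) sum(dnorm(v + k, sd = sd)))
}

u <- seq(0, 1, by = 0.01)
max.dev <- max(abs(wrapped.pdf(u, sd = 1) - 1))  # direct numerical check
lead    <- 2 * exp(-2 * pi^2)                    # leading Fourier term for sd = 1

# For sd = 10 the deviation underflows double precision, so work with logs.
log10.dev <- function(sd) (log(2) - 2 * pi^2 * sd^2) / log(10)

c(max.dev = max.dev, lead = lead, log10.sd10 = log10.dev(10))
```

The last entry is the base-10 logarithm of the maximum deviation for an SD of $10$, which is how a figure of the order of $10^{-857}$ arises.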

In every case Normal variables "with known parameters" can easily be recentered and rescaled into the Standard Normals assumed above. Afterwards, the resulting uniformly distributed values can be recentered and rescaled to cover any desired interval. These require only basic arithmetic operations.
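As a minimal sketch of that recentering and rescaling, assuming hypothetical known parameters `mu` and `sigma` and a hypothetical target interval from `a` to `b` (none of these values come from the original question):

```r
mu <- 3; sigma <- 2   # hypothetical known parameters of the Normal draws
a <- -5; b <- 10      # hypothetical target interval for the uniform values

set.seed(17)
x <- rnorm(1e5, mean = mu, sd = sigma)  # the draws we are given
z <- (x - mu) / sigma                   # recenter and rescale to standard Normal
u <- (10 * z) %% 1                      # any method above; here the modular one with SD 10
v <- a + (b - a) * u                    # recenter and rescale onto the target interval
range(v)
```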

The ease of these constructions is evidenced by the following R code, which uses only one or two lines for most of them. Their correctness is witnessed by the resulting near-uniform histograms based on $100,000$ independent values in each case (requiring around 12 seconds for all seven simulations). For reference--in case you are worried about the amount of variation appearing in any of these plots--a histogram of uniform values simulated with R's uniform random number generator is included at the end.

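[Figure: histograms of the simulated values for each method, with R's own uniform generator shown last for reference]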

All these simulations were tested for uniformity using a $\chi^2$ test based on $1000$ bins; none could be considered significantly non-uniform (the lowest p-value was $3\%$ --for the results generated by R's actual uniform number generator!).
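The code for the test itself is not shown, so the following is only a sketch of how such a check might be carried out, not necessarily the procedure actually used. The helper name `uniformity.test` and its arguments are hypothetical; `lo` and `hi` allow for methods whose output lies in $[-1,1]$ rather than $[0,1]$.

```r
# Chi-squared test of uniformity on [lo, hi] using equal-width bins
# (hypothetical helper; `bins = 1000` follows the text above).
uniformity.test <- function(x, lo = 0, hi = 1, bins = 1000) {
  breaks <- seq(lo, hi, length.out = bins + 1)
  counts <- table(cut(x, breaks, include.lowest = TRUE))
  chisq.test(counts)  # default null hypothesis: all bins equally likely
}

set.seed(17)
x <- rnorm(1e5, sd = 10) %% 1  # the "modular" method
uniformity.test(x)$p.value
```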

```
set.seed(17)
n <- 1e5
y <- matrix(rnorm(floor(n/2)*2), nrow=2)
x <- c(atan2(y[2,], y[1,])/(2*pi) + 1/2, exp(-(y[1,]^2+y[2,]^2)/2))
hist(x, main="Box-Mueller")

y <- apply(array(rnorm(4*n), c(2,2,n)), c(3,2), function(z) sum(z^2))
x <- y[,2] / (y[,1]+y[,2])
hist(x, main="Beta")

x <- apply(array(rnorm(8*n), c(4,2,n)), 3, function(y) cor(y[,1], y[,2]))
hist(x, main="Correlation")

n.bits <- 32
x <- (2^-(1:n.bits)) %*% matrix(rnorm(n*n.bits) > 0, n.bits)
hist(x, main="Binary")

y <- matrix(rnorm(n*3), 3)
x <- y[1, ] / sqrt(apply(y, 2, function(x) sum(x^2)))
hist(x, main="Equal area")

accept <- function(p) { # Using random normals, return TRUE with chance `p`
  p.bit <- x <- 0
  while(p.bit == x) {
    p.bit <- p >= 1/2
    x <- rnorm(1) >= 0
    p <- (2*p) %% 1
  }
  return(x == 0)
}
y <- rnorm(ceiling(n * sqrt(exp(1)*pi/2))) # This aims to produce `n` uniforms
y <- y[abs(y) < 1]
x <- y[sapply(y, function(x) accept(exp((x^2-1)/2)))]
hist(x, main="Rejection")

y <- matrix(rnorm(2*(n+1))^2, 2)
x <- cumsum(y)[seq(2, 2*(n+1), 2)]
x <- x[-(n+1)] / x[n+1]
x <- x[order(y[2,-(n+1)]/y[1,-(n+1)])]
hist(x, main="Ordered")

x <- rnorm(n) %% 1 # (Use SD of 5 or greater in practice)
hist(x, main="Modular")

x <- runif(n)      # Reference distribution
hist(x, main="Uniform")
```

— whuber

2

(+1) If I were asking this question in an interview, I'd modify it to ask about the case where the parameters are fixed, but unknown, which strikes me as more interesting. The Pearson correlation approach (#3) goes through unchanged, but is perhaps slightly esoteric. The Beta approach (#2) requires only slight modification by considering the squares of differences of disjoint pairs. Similarly, $Z = (X_1 - X_2)/(X_3-X_4)$ is standard Cauchy (regardless of the mean and variance of $X$), which has a nice cdf.
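A rough R sketch of these two unknown-parameter variants follows. The values of `mu` and `sigma` are hypothetical and are used only to simulate the input; neither appears in the calculations that produce the uniform values. The names `u.cauchy` and `u.beta` are introduced here for illustration.

```r
set.seed(17)
n <- 1e5
mu <- 42; sigma <- 7  # hypothetical values, used only to simulate the input
x <- matrix(rnorm(8 * n, mean = mu, sd = sigma), nrow = 8)

# Differences of disjoint pairs are Normal(0, 2*sigma^2): the mean cancels.
d <- x[c(1, 3, 5, 7), ] - x[c(2, 4, 6, 8), ]

# Cauchy: the ratio of two such differences is standard Cauchy (sigma cancels),
# and atan(z)/pi + 1/2 is the standard Cauchy CDF.
u.cauchy <- atan(d[1, ] / d[2, ]) / pi + 1/2

# Beta: sums of two squared differences are scaled chi-squared(2) variates,
# so the share of one in the total is Beta(1,1), which is uniform.
y1 <- d[1, ]^2 + d[2, ]^2
y2 <- d[3, ]^2 + d[4, ]^2
u.beta <- y2 / (y1 + y2)

rbind(cauchy = quantile(u.cauchy, c(0.1, 0.5, 0.9)),
      beta   = quantile(u.beta,   c(0.1, 0.5, 0.9)))
```

The Cauchy variant consumes four Normal values per uniform value, and the Beta variant as written consumes eight.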

— cardinal

1

More generally, the principle is to find a pivotal quantity from the sample with a computationally amenable cdf. This ties in nicely with constructing confidence intervals and hypothesis tests, with the twist that we might seek to optimize the number of elements used rather than the latter cases which focus more on maximizing the information from a fixed sample size.

— cardinal

@Cardinal Thank you for the interesting comments, as well as the ninth method (Cauchy). Even finding a pivotal quantity is unnecessary when only a good approximation is sought. For instance, (8) works perfectly well if you reserve a small number of initial results to establish a rough scale.

— whuber

8

You can use a trick very similar to what you mention. Let's say that $X \sim N(\mu, \sigma^2)$ is a normal random variable with known parameters. Then we know its distribution function, $\Phi_{\mu,\sigma^2}$ , and $\Phi_{\mu,\sigma^2}(X)$ will be uniformly distributed on $(0,1)$ . To prove this, note that for $d \in (0,1)$ we see that

$P(\Phi_{\mu,\sigma^2}(X) \leq d) = P(X \leq \Phi_{\mu,\sigma^2}^{-1}(d)) = d$ .

The above probability is clearly zero for non-positive $d$ and $1$ for $d \geq 1$. This is enough to show that $\Phi_{\mu,\sigma^2}(X)$ has a uniform distribution on $(0,1)$, as we have shown that the corresponding measures are equal for a generator of the Borel $\sigma$-algebra on $\mathbb{R}$. Thus, you can just transform the normally distributed data by the distribution function and you'll get uniformly distributed data.
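In R, this transformation is a single call to `pnorm`. The sketch below assumes hypothetical known parameters `mu` and `sigma`, chosen here only for illustration:

```r
set.seed(17)
mu <- 3; sigma <- 2                     # hypothetical known parameters
x <- rnorm(1e5, mean = mu, sd = sigma)  # the Normal draws we are given
u <- pnorm(x, mean = mu, sd = sigma)    # apply the Normal CDF to each draw
quantile(u, c(0.1, 0.5, 0.9))           # should be close to 0.1, 0.5, 0.9
```

Unlike the purely arithmetic constructions in the other answer, this one requires evaluating the Normal CDF.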

— swmo

4

It's the inverse of inverse transform sampling!

— Roger Fan

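Could you elaborate on the second sentence of the second paragraph? "This is enough to show that $\Phi_{0,\sigma^2}(X)$ has a uniform distribution on $(0,1)$ as we have shown that the corresponding measures are equal for a generator of the Borel $\sigma$-algebra on $\mathbb{R}$."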

— wellington

To show that some real random variable, $X$, has a uniform distribution, we should show that its corresponding measure, $X(P)$, equals that of the uniform distribution for all measurable sets of the real line. However, it's actually enough to consider some generator of the $\sigma$-algebra, due to a uniqueness-of-measures theorem. If they are equal on sets of the generator, they'll be equal for all measurable sets. This is just a measure-theoretic attachment to the answer.

— swmo