Probability that the Euclidean distance between uniformly random points in a rectangle is smaller than a given threshold

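Suppose I have $n$ points in a rectangle with boundaries $[0,a] \times [0,b]$, and these points are uniformly distributed over this plane. (I'm not familiar with statistics, so I don't know whether there is a difference between choosing nodes uniformly in the area $[0,a] \times [0,b]$ and choosing the $x$-coordinate uniformly in $[0,a]$ and the $y$-coordinate uniformly in $[0,b]$ independently.)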

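Given a distance threshold $d$, I'd like to know the probability that the Euclidean distance between two points is smaller than $d$ or, more precisely, how many pairs of nodes are within distance $d$.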

Perhaps the following formulation is less ambiguous.

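Let me state the problem precisely. We are given $n$ nodes and a threshold $d$. The $n$ points are uniformly distributed in the rectangle $[0,a] \times [0,b]$. Let the random variable $\xi$ denote the number of pairs of points within distance $d$ of each other. Find $E[\xi]$.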
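To make the target quantity concrete, here is a brute-force Monte Carlo sketch in Python (our own illustration; `mc_expected_pairs` and its parameters are hypothetical names, not from the thread). It draws each point's $x$ and $y$ coordinates independently. This gives the same distribution as drawing the point uniformly over the rectangle, because the uniform density $1/(ab)$ factors as $(1/a)(1/b)$. It then counts close pairs and averages over trials.

```python
import math
import random

def mc_expected_pairs(n, a, b, d, trials=200, seed=0):
    """Monte Carlo estimate of E[xi], the mean number of pairs among n
    uniform points in [0, a] x [0, b] that are closer than d."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Independent uniform x and y coordinates give a point that is
        # uniform over the rectangle (the density 1/(ab) factorizes).
        pts = [(rng.uniform(0, a), rng.uniform(0, b)) for _ in range(n)]
        # Count every unordered pair (i, j) with i < j whose distance is < d.
        total += sum(
            1
            for i in range(n)
            for j in range(i + 1, n)
            if math.dist(pts[i], pts[j]) < d
        )
    return total / trials
```

For example, `mc_expected_pairs(50, 5.0, 4.0, 1.0)` estimates $E[\xi]$ for 50 points in a $5 \times 4$ rectangle with $d = 1$. The cost grows like $n^2$ per trial, so this is a sanity check rather than a practical method for large $n$.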

probability distance

— zhouzhuojie

I recall a few related questions on math.SE, so you may want to look for them there as well. They are probably tagged probability.

— cardinal

Here are a few questions I recalled seeing on math.SE, though none of them asks exactly what you're after: (1) math.stackexchange.com/questions/64028 (2) math.stackexchange.com/questions/66777 (3) math.stackexchange.com/questions/101692 (4) math.stackexchange.com/questions/50775

— cardinal

This problem can be solved analytically using geometric intuition and arguments. Unfortunately, the answer is rather long and a bit messy.

Basic setup

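First, let's set up some notation. Suppose we draw two points uniformly at random from the rectangle $[0,a] \times [0,b]$. Without loss of generality, assume $0 < b < a$. Let $(X_1,Y_1)$ be the coordinates of the first point and $(X_2,Y_2)$ the coordinates of the second point. Then $X_1$, $X_2$, $Y_1$, and $Y_2$ are mutually independent, with each $X_i$ uniformly distributed on $[0,a]$ and each $Y_i$ uniformly distributed on $[0,b]$.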

Consider the Euclidean distance between the two points, which is
$$D = \sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2} =: \sqrt{Z_1^2 + Z_2^2},$$
where $Z_1 = |X_1 - X_2|$ and $Z_2 = |Y_1 - Y_2|$.

Triangular distribution

Since $X_1$ and $X_2$ are independent uniforms, $X_1 - X_2$ has a triangular distribution, and $Z_1 = |X_1 - X_2|$ has density function
$$f_a(z_1) = \frac{2}{a^2}(a - z_1), \quad 0 < z_1 < a.$$
The corresponding distribution function is $F_a(z_1) = 1 - (1 - z_1/a)^2$ for $0 \le z_1 \le a$. Similarly, $Z_2 = |Y_1 - Y_2|$ has density $f_b(z_2)$ and distribution function $F_b(z_2)$, obtained by replacing $a$ with $b$.
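As a quick check of these formulas, the sketch below (our own; the function names are hypothetical) codes $f_a$ and $F_a$ and compares $F_a$ with the fraction of simulated values of $|X_1 - X_2|$ below a cutoff. The same functions with `a` replaced by `b` give $f_b$ and $F_b$.

```python
import random

def tri_pdf(z, a):
    """Density f_a(z) of |X1 - X2| for X1, X2 iid Uniform[0, a]."""
    return 2.0 * (a - z) / a**2 if 0 <= z <= a else 0.0

def tri_cdf(z, a):
    """Distribution function F_a(z) = 1 - (1 - z/a)^2 on [0, a]."""
    if z <= 0:
        return 0.0
    if z >= a:
        return 1.0
    return 1.0 - (1.0 - z / a) ** 2

def empirical_abs_diff_cdf(z, a, n_samples=100_000, seed=1):
    """Fraction of simulated |X1 - X2| values that are <= z."""
    rng = random.Random(seed)
    hits = sum(
        abs(rng.uniform(0, a) - rng.uniform(0, a)) <= z
        for _ in range(n_samples)
    )
    return hits / n_samples
```

With `a = 5`, `tri_cdf(2.5, 5)` is exactly $0.75$, and `empirical_abs_diff_cdf(2.5, 5)` should land within about a percent of that.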

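Since $Z_1$ is a function only of the $X_i$ and $Z_2$ is a function only of the $Y_i$, $Z_1$ and $Z_2$ are independent. So the distance between the points is the Euclidean norm of two independent random variables (with different distributions).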

The left panel of the figure shows the density of $X_1 - X_2$, and the right panel shows the density of $Z_1 = |X_1 - X_2|$, with $a = 5$ in this example.

Triangular densities

Some geometric probability

So $Z_1$ and $Z_2$ are independent and supported on $[0,a]$ and $[0,b]$, respectively. For fixed $d$, the distribution function of the Euclidean distance is
$$\Pr(D \le d) = \iint_{\{z_1^2 + z_2^2 \le d^2\}} f_a(z_1) f_b(z_2)\,\mathrm{d}z_1\,\mathrm{d}z_2.$$
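The double integral can also be approximated directly, which is a useful cross-check on the closed forms derived later in the answer. This is a minimal midpoint-rule sketch of our own (`cdf_by_quadrature` and `n_grid` are hypothetical names). It is crude: the indicator of the quarter disc is discontinuous, so expect errors on the order of the grid spacing rather than the usual midpoint-rule accuracy.

```python
def cdf_by_quadrature(d, a, b, n_grid=400):
    """Approximate P(D <= d) by a midpoint rule on an n_grid x n_grid grid
    over [0, a] x [0, b], keeping cells whose midpoint lies in the quarter
    disc z1^2 + z2^2 <= d^2."""
    h1, h2 = a / n_grid, b / n_grid
    total = 0.0
    for i in range(n_grid):
        z1 = (i + 0.5) * h1
        f1 = 2.0 * (a - z1) / a**2                  # f_a(z1)
        for j in range(n_grid):
            z2 = (j + 0.5) * h2
            if z1 * z1 + z2 * z2 > d * d:           # outside the disc; z2 only grows
                break
            total += f1 * 2.0 * (b - z2) / b**2     # f_a(z1) * f_b(z2)
    return total * h1 * h2
```

For the unit square and $d = 0.5$, `cdf_by_quadrature(0.5, 1.0, 1.0)` should come out close to $0.483$.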

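Geometrically, we can think of this as having a distribution on the rectangle $[0,a] \times [0,b]$ and considering a quarter circle of radius $d$ centered at the origin. We want the probability of landing inside the intersection of these two regions. There are three different cases to consider: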

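Region 1 (orange): $0 \le d < b$. Here the quarter circle lies entirely inside the rectangle.

Region 2 (red): $b \le d \le a$. Here the quarter circle meets the rectangle along its top and bottom edges.

Region 3 (blue): $a < d \le \sqrt{a^2 + b^2}$. The quarter circle meets the rectangle along its top and right edges.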

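Here is a figure depicting an example radius of each of the three types. The rectangle is given by $a = 5$, $b = 4$. The grayscale heat map inside the rectangle shows the density $f_a(z_1) f_b(z_2)$: darker areas have higher density and lighter areas lower density.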

Ugly calculus

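To compute the probabilities, we need to do some calculus. Let's consider each region in turn. We'll see that a common integral arises. This integral has a closed form, though it isn't very pretty.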

Region 1: $0 \le d < b$.

$$\Pr(D \le d) = \int_0^d \int_0^{\sqrt{d^2 - y^2}} f_b(y) f_a(x)\,\mathrm{d}x\,\mathrm{d}y = \int_0^d f_b(y) \int_0^{\sqrt{d^2 - y^2}} f_a(x)\,\mathrm{d}x\,\mathrm{d}y.$$

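Now, the inner integral is $\frac{1}{a^2}\sqrt{d^2 - y^2}\,\big(2a - \sqrt{d^2 - y^2}\big)$, which is just $F_a\big(\sqrt{d^2 - y^2}\big)$. Since $f_b(y) = \frac{2}{b^2}(b - y)$, we therefore need to compute integrals of the form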

$$G(c) - G(0) = \int_0^c (b - y)\sqrt{d^2 - y^2}\,\big(2a - \sqrt{d^2 - y^2}\big)\,\mathrm{d}y,$$
where in this case we are interested in $c = d$. The antiderivative of the integrand is

$$\begin{aligned} G(y) &= \int (b - y)\sqrt{d^2 - y^2}\,\big(2a - \sqrt{d^2 - y^2}\big)\,\mathrm{d}y \\ &= \frac{a}{3}\sqrt{d^2 - y^2}\,\big(y(3b - 2y) + 2d^2\big) \\ &\quad + a b d^2 \tan^{-1}\!\Big(\frac{y}{\sqrt{d^2 - y^2}}\Big) - b d^2 y \\ &\quad + \frac{b y^3}{3} + \frac{(d y)^2}{2} - \frac{y^4}{4}. \end{aligned}$$
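The antiderivative is easy to get wrong by hand, so here is a sketch (ours, not from the answer) that codes $G$ and compares $G(c) - G(0)$ with a composite Simpson's rule approximation of the integral. One implementation detail: we use `math.atan2(y, r)` in place of $\tan^{-1}(y/r)$. The two agree for $r > 0$, and `atan2` returns $\pi/2$ at $y = d$, where $r = 0$ and the quotient would divide by zero.

```python
import math

def G(y, a, b, d):
    """Antiderivative of (b - y) r (2a - r), r = sqrt(d^2 - y^2), for 0 <= y <= d."""
    r = math.sqrt(max(d * d - y * y, 0.0))
    return (
        a / 3.0 * r * (y * (3 * b - 2 * y) + 2 * d * d)
        + a * b * d * d * math.atan2(y, r)   # arctan(y / r), finite at y = d
        - b * d * d * y
        + b * y**3 / 3.0
        + (d * y) ** 2 / 2.0
        - y**4 / 4.0
    )

def integrand(y, a, b, d):
    """(b - y) * r * (2a - r) with r = sqrt(d^2 - y^2)."""
    r = math.sqrt(max(d * d - y * y, 0.0))
    return (b - y) * r * (2 * a - r)

def simpson(f, lo, hi, m=2000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (hi - lo) / m
    s = f(lo) + f(hi)
    for k in range(1, m):
        s += (4 if k % 2 else 2) * f(lo + k * h)
    return s * h / 3.0
```

For example, with $a = 5$, $b = 4$, $d = 3$, the values `G(2, 5, 4, 3) - G(0, 5, 4, 3)` and `simpson(lambda y: integrand(y, 5, 4, 3), 0, 2)` should agree to many decimal places.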

Using this antiderivative, we get $\Pr(D \le d) = \frac{2}{a^2 b^2}\big(G(d) - G(0)\big)$.
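Evaluating $G$ at the endpoints gives a compact form for Region 1 (this simplification is ours, not part of the original answer). At $y = d$ the square root vanishes and $\tan^{-1}$ tends to $\pi/2$; at $y = 0$ only the first term survives:

$$\begin{aligned} G(d) &= \frac{\pi}{2} a b d^2 - \frac{2}{3} b d^3 + \frac{d^4}{4}, \qquad G(0) = \frac{2}{3} a d^3, \\ \Pr(D \le d) &= \frac{1}{a^2 b^2}\Big(\pi a b d^2 - \frac{4}{3}(a + b) d^3 + \frac{d^4}{2}\Big), \qquad 0 \le d < b. \end{aligned}$$

For the unit square ($a = b = 1$) this reduces to $\pi d^2 - \tfrac{8}{3} d^3 + \tfrac{1}{2} d^4$. In this range the result depends on the rectangle only through its area $ab$ and half-perimeter $a + b$.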

Region 2: $b \le d \le a$.

$$\Pr(D \le d) = \frac{2}{a^2 b^2}\big(G(b) - G(0)\big),$$
by the same reasoning as for Region 1, except that now we must integrate along the $y$-axis all the way up to $b$ instead of just $d$.

Region 3: $a < d \le \sqrt{a^2 + b^2}$.

$$\begin{aligned} \Pr(D \le d) &= \int_0^{\sqrt{d^2 - a^2}} f_b(y)\,\mathrm{d}y + \int_{\sqrt{d^2 - a^2}}^b f_b(y) \int_0^{\sqrt{d^2 - y^2}} f_a(x)\,\mathrm{d}x\,\mathrm{d}y \\ &= F_b\big(\sqrt{d^2 - a^2}\big) + \frac{2}{a^2 b^2}\Big(G(b) - G\big(\sqrt{d^2 - a^2}\big)\Big). \end{aligned}$$
The first term covers heights $y < \sqrt{d^2 - a^2}$, where the quarter circle spans the full width $[0,a]$ and the inner integral equals $1$.
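Putting the three regions together, here is a Python sketch of the full distribution function (our own assembly of the formulas above; `cdf_distance`, `G`, and `F_b` are hypothetical names, and `G` repeats the Region 1 antiderivative so the block runs on its own). It swaps `a` and `b` if needed so that the derivation's assumption $b \le a$ holds, and returns $1$ beyond the diagonal $\sqrt{a^2 + b^2}$.

```python
import math

def G(y, a, b, d):
    """Antiderivative G(y) from Region 1 (valid for 0 <= y <= d)."""
    r = math.sqrt(max(d * d - y * y, 0.0))
    return (a / 3.0 * r * (y * (3 * b - 2 * y) + 2 * d * d)
            + a * b * d * d * math.atan2(y, r)   # arctan(y / r), finite at y = d
            - b * d * d * y + b * y**3 / 3.0
            + (d * y) ** 2 / 2.0 - y**4 / 4.0)

def F_b(z, b):
    """CDF of |Y1 - Y2| for Y1, Y2 iid Uniform[0, b]."""
    return 1.0 - (1.0 - min(max(z, 0.0), b) / b) ** 2

def cdf_distance(d, a, b):
    """P(D <= d) for two iid uniform points in [0, a] x [0, b]."""
    if b > a:                          # the derivation assumes b <= a
        a, b = b, a
    if d <= 0:
        return 0.0
    c = 2.0 / (a * a * b * b)
    if d < b:                          # Region 1
        return c * (G(d, a, b, d) - G(0.0, a, b, d))
    if d <= a:                         # Region 2
        return c * (G(b, a, b, d) - G(0.0, a, b, d))
    if d <= math.hypot(a, b):          # Region 3
        s = math.sqrt(d * d - a * a)
        return F_b(s, b) + c * (G(b, a, b, d) - G(s, a, b, d))
    return 1.0                         # beyond the diagonal
```

As a spot check, `cdf_distance(1.0, 1.0, 1.0)` returns $\pi - 13/6 \approx 0.9749$ for the unit square, and the value at $d = \sqrt{a^2 + b^2}$ is $1$ up to rounding.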

Below is a simulation of 20000 points where we plot the empirical distribution as grey points and the theoretical distribution as a line, colored according to the particular region that applies.

Empirical cdf and theoretical
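The simulation code isn't shown in the answer. A plotting-free Python sketch of the empirical side might look like the following. It is our own; the names and the choice to draw 20000 independent pairs of points are assumptions.

```python
import bisect
import math
import random

def simulate_distances(a, b, n_pairs=20_000, seed=0):
    """Sorted distances between n_pairs independent pairs of uniform
    points in [0, a] x [0, b]."""
    rng = random.Random(seed)
    return sorted(
        math.hypot(rng.uniform(0, a) - rng.uniform(0, a),
                   rng.uniform(0, b) - rng.uniform(0, b))
        for _ in range(n_pairs)
    )

def empirical_cdf(sorted_ds, d):
    """Fraction of simulated distances that are <= d (the empirical CDF)."""
    return bisect.bisect_right(sorted_ds, d) / len(sorted_ds)
```

Evaluating `empirical_cdf` on a grid of $d$ values and overlaying the theoretical curve would reproduce the kind of comparison shown in the figure above.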

From the same simulation, below we plot the first 100 pairs of points and draw lines between them. Each is colored according to the distance between the pair of points and which region this distance falls into.

Random sample of points

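The expected number of pairs of points within distance $d$ is simply
$$\mathbb{E}[\xi] = \binom{n}{2}\Pr(D \le d),$$
by linearity of expectation: $\xi = \sum_{i<j} \mathbf{1}\{D_{ij} \le d\}$, where $D_{ij}$ is the distance between points $i$ and $j$, and each of the $\binom{n}{2}$ indicators has expectation $\Pr(D \le d)$.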
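For $0 \le d < b$, evaluating Region 1's $G(d) - G(0)$ gives the polynomial $\Pr(D \le d) = \big(\pi a b d^2 - \tfrac{4}{3}(a + b) d^3 + \tfrac{1}{2} d^4\big)/(a^2 b^2)$ (our own simplification), so $\mathbb{E}[\xi]$ takes one line to compute. The sketch below (hypothetical name `expected_close_pairs`) covers only that range. For larger $d$ one would substitute the Region 2 or Region 3 expressions.

```python
import math

def expected_close_pairs(n, a, b, d):
    """E[xi] = C(n, 2) * P(D <= d), valid only in Region 1 (0 <= d < min(a, b))."""
    if not 0 <= d < min(a, b):
        raise ValueError("this sketch only covers Region 1: 0 <= d < min(a, b)")
    # Region 1 closed form; it is symmetric in a and b.
    p = (math.pi * a * b * d**2 - 4.0 / 3.0 * (a + b) * d**3 + d**4 / 2.0) / (a * b) ** 2
    # Linearity of expectation: each of the C(n, 2) pairs is close with probability p.
    return math.comb(n, 2) * p
```

For instance, 100 points in the unit square with $d = 0.1$ give `expected_close_pairs(100, 1.0, 1.0, 0.1)` of roughly $142.6$ pairs.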

— cardinal

+1. Nice work! It would be wonderful to see the answer expressed in terms of intrinsic geometric properties of the rectangle: it ought to depend on things like its area, perimeter, and configuration of the four angles. (The literature--which I have seen referenced but have not had access to--appears to focus on domains with smooth boundaries.)

— whuber

Thanks. That's an excellent suggestion. I'll try to make such simplifications and reformulation.

— cardinal

@cardinal Very nice work! I was surprised that you thoroughly answered the problem even with the detailed cdf. Thanks.

— zhouzhuojie

If the points are truly uniformly distributed, i.e. in a fixed known pattern, then for any distance d, you can simply loop over all pairs and count the ones within the distance. Your probability is (that number / n).

If you have the additional freedom to pick how the n points are distributed/picked, then this is the rectangular version of the Bertrand paradox. That page shows a number of ways of answering this question based on how you distribute your points.

— cape1232

The question asks about the distribution for iid uniformly distributed points: these are random variables, not any "fixed known pattern," and one cannot just loop over pairs of them!

— whuber

I think you may have misunderstood the OP's question. Also, the desired distribution is unambiguously defined in the question. My comment to the OP hints that there is already a solution on the SE network to this question, hence this one can most likely be closed. :)

— cardinal

Are you sure there's a solution on math.SE, cardinal? This is a difficult problem due to the edge effects. Maybe there's a solution on the flat torus.

— whuber

@whuber: A solution? No. But, I'm almost positive this question appears. :) I'll see if I can find it. At any rate, I'm not sure this problem is so difficult, even in this case. I believe you can use translation invariance to simplify it somewhat. But, I haven't worked out the details.

— cardinal

@cardinal Thanks. Actually I went through all the questions on Math.SE, but I still could not find some close to this problem.

— zhouzhuojie