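Given a vector of correlated probabilities, how can I convert them to binary without destroying the correlation?

My ultimate goal is to have a way to generate a vector of size $N$ of correlated Bernoulli random variables. One way to do this is to use the Gaussian copula approach. However, the Gaussian copula approach just leaves me with a vector: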

$(p_1, \ldots, p_N) \in [0,1]^N$

Suppose that I have generated $(p_1, \ldots, p_N)$ such that the common correlation between them is $\rho$. Now, how can I transform these into a new vector of $0$s and $1$s? In other words, I would like:

$(X_1, \ldots, X_N) \in \{0,1\}^N$

but with the same correlation $\rho$.

One approach I thought of was to assign a hard cutoff rule such that if $p_i < 0.5$, then let $X_i = 0$, and if $p_i \geq 0.5$, then let $X_i = 1$.

This seems to work well in simulations in that it retains the correlation structure, but the choice of cutoff value seems arbitrary to me: I do not know what value other than $0.5$ should be used, or why.

Another way is to treat each $X_i$ as a Bernoulli random variable with success probability $p_i$ and sample from it. However, this approach seems to lose correlation: instead of $\rho$, I may get $\frac{\rho}{2}$ or $\frac{\rho}{3}$.
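Both rules can be compared in a small simulation. The sketch below assumes the $p_i$ come from a Gaussian copula with uniform marginals ($p_i = \Phi(z_i)$), which the question does not state; `rho_z`, the sample size and the variable names are illustrative. Under that assumption the Bernoulli-sampling rule gives $\operatorname{cor}(X_i, X_j) = \operatorname{cor}(p_i, p_j)/3$, because the covariance is unchanged while the variance grows from $1/12$ to $1/4$, which would account for the $\frac{\rho}{3}$ mentioned above.

```r
set.seed(1)
n <- 100000      # number of simulated pairs (illustrative)
rho_z <- 0.6     # latent normal correlation (assumed value)

# Two correlated standard normals, mapped to uniform "probabilities"
z1 <- rnorm(n)
z2 <- rho_z * z1 + sqrt(1 - rho_z^2) * rnorm(n)
p1 <- pnorm(z1)
p2 <- pnorm(z2)

# Rule 1: hard cutoff at 0.5
cut1 <- as.integer(p1 >= 0.5)
cut2 <- as.integer(p2 >= 0.5)

# Rule 2: independent Bernoulli draws with success probabilities p1, p2
draw1 <- rbinom(n, size = 1, prob = p1)
draw2 <- rbinom(n, size = 1, prob = p2)

cor_p    <- cor(p1, p2)
cor_cut  <- cor(cut1, cut2)
cor_draw <- cor(draw1, draw2)
print(c(p = cor_p, cutoff = cor_cut, bernoulli = cor_draw, p_over_3 = cor_p / 3))
```

Comparing `cor_cut` with `cor_p` for your own generator shows how closely the $0.5$ cutoff tracks the correlation of the probabilities.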

Does anyone have any thoughts or inputs into this? Thank you.

— user321627

You have $N$ variables. Why are you speaking of just a single rho and not of a matrix of rhos?

— ttnphns

See this mathoverflow question

— Jakub Bartczuk

I don't understand Gaussian copulas well enough to know what the problem is. But I found a way to generate correlated Bernoulli vectors.

Following https://mathoverflow.net/a/19436/105908, if we take a set of fixed vectors $v_1, \ldots, v_n$ and a random vector $u$ drawn uniformly on the unit sphere, we can transform $u$ into a binary vector $X$ where $X_i = (u \cdot v_i > 0)$. In this setup, $\operatorname{cor}(X_i, X_j) = \frac{\pi - 2\theta(i,j)}{\pi}$, where $\theta(i,j)$ is the angle between $v_i$ and $v_j$.

How do we find a suitable matrix $V$, with rows $v_1, \ldots, v_n$, that produces a desired correlation matrix $R$? For unit-length $v_i$, the angle condition translates to $VV^T = \cos(-\frac{\pi R - \pi}{2})$, with the cosine applied elementwise, and thus we can find $V$ with a Cholesky decomposition, provided that matrix is positive definite.

Example code in R follows:

# Get a simple correlation matrix
N <- 3
cor_matrix <- matrix(c(1, 0.5, 0.8,
                       0.5, 1, 0.3,
                       0.8, 0.3, 1), N, N)

# Calculate the vectors with the desired angles: chol() returns an upper
# triangular U with t(U) %*% U equal to its argument, so the columns of U
# are the vectors v_i
vector_matrix <- chol(cos((pi * cor_matrix - pi) * -0.5))

# You can generate random unit vectors by normalizing a vector
# of normally distributed variables. Note, however, that the normalization
# does not affect the sign of the dot product, so we skip it
num_samples <- 10000
normal_rand <- matrix(rnorm(num_samples * N), num_samples, N)

# Generate the target variables
B <- (normal_rand %*% vector_matrix) > 0

# See for yourself that it works
cor(B)
cor(B) - cor_matrix
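`chol()` only succeeds when the transformed matrix is positive definite, which is not guaranteed for every valid correlation matrix. A minimal sketch of a more defensive factorization follows; the helper name `angle_vectors`, the tolerance and the matrices `R_ok` and `R_bad` are hypothetical, and it swaps the Cholesky factor for an eigendecomposition so that singular (but still positive semidefinite) targets are handled too.

```r
# Hypothetical helper: returns a matrix whose columns are the vectors v_i,
# so it can be used in place of vector_matrix in the code above.
angle_vectors <- function(cor_matrix, tol = 1e-8) {
  target <- cos(pi * (1 - cor_matrix) / 2)   # elementwise cosines of the angles
  eig <- eigen(target, symmetric = TRUE)
  if (min(eig$values) < -tol) {
    stop("these correlations cannot be reached with this construction")
  }
  # target = Q diag(lambda) Q^T, so V = diag(sqrt(lambda)) Q^T gives V^T V = target
  sqrt(pmax(eig$values, 0)) * t(eig$vectors)
}

set.seed(1)
R_ok <- matrix(c(1, 0.5, 0.4,
                 0.5, 1, 0.3,
                 0.4, 0.3, 1), 3, 3)
V <- angle_vectors(R_ok)
Z <- matrix(rnorm(50000 * 3), ncol = 3)
B_ok <- ((Z %*% V) > 0) * 1          # numeric 0/1 matrix
print(round(cor(B_ok) - R_ok, 2))

# A valid correlation matrix whose implied angles (45, 18 and 72 degrees)
# violate the triangle inequality, so no such vectors exist
R_bad <- matrix(c(1, 0.5, 0.8,
                  0.5, 1, 0.2,
                  0.8, 0.2, 1), 3, 3)
bad_result <- try(angle_vectors(R_bad), silent = TRUE)
print(inherits(bad_result, "try-error"))
```

For the `cor_matrix` used above, the implied angles are 45°, 18° and 63°; since 45° + 18° = 63°, the three vectors lie in a plane and the target matrix is singular up to rounding, which is the boundary case that the `pmax(..., 0)` guard is meant to absorb.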

Thanks @jakub-bartczuk for linking to the MO question; I wouldn't have found that on my own.

The above code has one big limitation: the marginal distributions are fixed at $X_i \sim \text{Bernoulli}(0.5)$. I am currently unaware of how to extend this approach to fit both correlations and marginal distributions. Another answer has an approach for the general case, but it loses a lot of simplicity (it involves numerical integration). There is also a paper called "Generating Spike Trains with Specified Correlation Coefficients" and an accompanying Matlab package, where the sampling involves "only" numerically finding the unique zero of a monotonic function by bisection.
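One common way to get both the marginals and the correlations is to threshold correlated normals at different cutoffs and solve numerically for the latent correlation (often called a dichotomized Gaussian). The pairwise sketch below is not the code from the paper or its Matlab package; the function names `joint_prob` and `latent_cor` and the example numbers are hypothetical, and it only uses base R (`integrate`, `uniroot`).

```r
# P(X1 = 1, X2 = 1) when X_k = 1(Z_k > q_k) and (Z1, Z2) is standard
# bivariate normal with correlation r (obtained by conditioning on Z1 = z).
joint_prob <- function(r, q1, q2) {
  integrand <- function(z) {
    dnorm(z) * pnorm((q2 - r * z) / sqrt(1 - r^2), lower.tail = FALSE)
  }
  integrate(integrand, lower = q1, upper = Inf)$value
}

# Latent correlation that yields the target Bernoulli correlation.
# joint_prob is increasing in r, so a bracketing root search suffices;
# uniroot() stops with an error if the target is out of reach.
latent_cor <- function(p1, p2, target_cor) {
  q1 <- qnorm(1 - p1)
  q2 <- qnorm(1 - p2)
  target_joint <- p1 * p2 + target_cor * sqrt(p1 * (1 - p1) * p2 * (1 - p2))
  uniroot(function(r) joint_prob(r, q1, q2) - target_joint,
          interval = c(-0.99, 0.99), tol = 1e-8)$root
}

set.seed(1)
p1 <- 0.3; p2 <- 0.6; target_cor <- 0.25   # illustrative targets
r <- latent_cor(p1, p2, target_cor)

# Draw correlated normals with the latent correlation and threshold them
n <- 200000
z1 <- rnorm(n)
z2 <- r * z1 + sqrt(1 - r^2) * rnorm(n)
x1 <- as.integer(z1 > qnorm(1 - p1))
x2 <- as.integer(z2 > qnorm(1 - p2))
print(c(mean_x1 = mean(x1), mean_x2 = mean(x2), cor = cor(x1, x2)))
```

For $N$ variables the same root search would be run for each pair, and the resulting latent correlation matrix must itself be positive definite before it can be used to draw multivariate normals.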

— Martin Modrák

Thank you, this is great! Can I ask how you got that the angle condition is $VV^T = \cos(-\frac{\pi R - \pi}{2})$? Thanks!

— user321627

@user321627 You start with $R_{i,j} = \frac{\pi - 2\theta(i,j)}{\pi}$ and the relation of the dot product to the angle, $\theta(i,j) = \arccos(\frac{v_i \cdot v_j}{|v_i| |v_j|})$. From there it is relatively simple linear algebra that I am too lazy to write down on a computer :-)
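The omitted algebra can be sketched as follows, assuming the $v_i$ have unit length, that they form the rows of $V$, and that $\cos$ is applied elementwise to the matrix:

```latex
\begin{align*}
R_{i,j} = \frac{\pi - 2\,\theta(i,j)}{\pi}
  &\;\Longrightarrow\; \theta(i,j) = \frac{\pi - \pi R_{i,j}}{2} = -\frac{\pi R_{i,j} - \pi}{2}, \\
\theta(i,j) = \arccos\!\left(\frac{v_i \cdot v_j}{|v_i|\,|v_j|}\right),\quad |v_i| = |v_j| = 1
  &\;\Longrightarrow\; v_i \cdot v_j = \cos\theta(i,j), \\
(VV^T)_{i,j} = v_i \cdot v_j
  &= \cos\!\left(-\frac{\pi R_{i,j} - \pi}{2}\right) = \sin\!\left(\frac{\pi R_{i,j}}{2}\right).
\end{align*}
```

The last equality is just $\cos(\frac{\pi}{2} - x) = \sin(x)$, so the matrix passed to the Cholesky step can equivalently be written as $\sin(\frac{\pi R}{2})$.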

— Martin Modrák