Given a vector of correlated probabilities, how can I convert them to binary values without destroying the correlation?



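My ultimate goal is to have a way to generate a vector of size $N$ of correlated Bernoulli random variables. One way to do this is the Gaussian copula approach. However, the Gaussian copula approach leaves me with a vector of probabilities

$$(p_1, \ldots, p_N) \in [0,1]^N$$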

Suppose that I have generated $(p_1, \ldots, p_N)$ such that the common correlation between them is $\rho$. Now, how can I transform these into a new vector of 0s and 1s? In other words, I would like:

$$(X_1, \ldots, X_N) \in \{0,1\}^N$$

but with the same correlation $\rho$.

One approach I thought of was a hard cutoff rule: if $p_i < 0.5$, let $X_i = 0$, and if $p_i \ge 0.5$, let $X_i = 1$.

This seems to work well in simulations in that it retains the correlation structure, but the choice of cutoff seems arbitrary to me: I see no particular reason to choose 0.5 over any other value.
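A quick way to see what the 0.5 cutoff does, assuming the $p_i$ come from a Gaussian copula ($p_i = \Phi(Z_i)$ with latent normals $Z$ of common correlation $\rho$): since $\Phi(Z_i) \ge 0.5$ exactly when $Z_i \ge 0$, the rule simply takes the sign of the latent normal. For that case Sheppard's formula gives the binary correlation as $\frac{2}{\pi}\arcsin\rho$, which is close to but generally not equal to $\rho$ (for $\rho = 0.5$ it is $1/3$). The R sketch below checks this by simulation; the variable names and $\rho = 0.5$ are my own choices. A different cutoff would change the marginal $P(X_i = 1)$ and the correlation as well.

```r
# Sketch: what a 0.5 cutoff does to Gaussian-copula probabilities.
# Assumption: p_i = pnorm(Z_i), with Z multivariate normal and common correlation rho.
set.seed(1)
N <- 3
rho <- 0.5
num_samples <- 200000

# Equicorrelation matrix for the latent normals
Sigma <- matrix(rho, N, N)
diag(Sigma) <- 1

# Latent normals with covariance Sigma, then the copula probabilities
Z <- matrix(rnorm(num_samples * N), num_samples, N) %*% chol(Sigma)
p <- pnorm(Z)

# Hard cutoff at 0.5, which is the same as checking Z_i >= 0
X <- (p >= 0.5) * 1

# Average off-diagonal correlation vs. Sheppard's formula (2 / pi) * asin(rho)
off_diag <- upper.tri(diag(N))
empirical <- mean(cor(X)[off_diag])
theoretical <- 2 / pi * asin(rho)
print(c(empirical = empirical, theoretical = theoretical, rho = rho))
```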

Another way is to treat each $X_i$ as a Bernoulli random variable with success probability $p_i$ and sample from it. However, this approach seems to lose much of the correlation: instead of $\rho$, I may get $\rho/2$ or $\rho/3$.
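The loss of correlation has a simple explanation. If the $X_i$ are drawn independently given $p$, the law of total covariance gives $\operatorname{Cov}(X_i, X_j) = \operatorname{Cov}(p_i, p_j)$, while $\operatorname{Var}(X_i) = \mu(1 - \mu)$ with $\mu = E[p_i]$, which is larger than $\operatorname{Var}(p_i)$. Under a Gaussian copula each $p_i$ is uniform on $[0,1]$, so $\operatorname{Var}(p_i) = 1/12$ against $\operatorname{Var}(X_i) = 1/4$, and the correlation shrinks by a factor of 3. A minimal R simulation of that claim, under the same assumed copula setup with $\rho = 0.5$ (names are illustrative):

```r
# Sketch: independent Bernoulli draws given correlated copula probabilities.
# Assumed setup: p_i = pnorm(Z_i), common latent correlation rho.
set.seed(2)
N <- 3
rho <- 0.5
num_samples <- 200000

Sigma <- matrix(rho, N, N)
diag(Sigma) <- 1
Z <- matrix(rnorm(num_samples * N), num_samples, N) %*% chol(Sigma)
p <- pnorm(Z)  # each column is Uniform(0, 1); the columns are correlated

# X_i ~ Bernoulli(p_i), drawn independently given p (one draw per entry of p)
X <- matrix(rbinom(length(p), size = 1, prob = p), num_samples, N)

off_diag <- upper.tri(diag(N))
cor_p <- mean(cor(p)[off_diag])
cor_X <- mean(cor(X)[off_diag])

# Law of total covariance predicts cor_X / cor_p = Var(p_i) / (mu * (1 - mu)) = (1/12) / (1/4) = 1/3
print(c(cor_p = cor_p, cor_X = cor_X, ratio = cor_X / cor_p))
```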

Does anyone have any thoughts or input on this? Thank you.


You have $N$ variables. Why are you speaking of just a single rho and not of a matrix of rhos?
ttnphns

Answer:



I don't understand the Gaussian copula well enough to see what the problem is. But I found a way to generate correlated Bernoulli vectors.

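Following https://mathoverflow.net/a/19436/105908, if we take a set of fixed vectors $v_1, \ldots, v_n$ and a random vector $u$ uniformly distributed on the unit sphere, we can transform $u$ into a binary vector $X$ where $X_i = \mathbf{1}[u \cdot v_i > 0]$. In this setup, $\operatorname{cor}(X_i, X_j) = \frac{\pi - 2\theta_{(i,j)}}{\pi}$, where $\theta_{(i,j)}$ is the angle between $v_i$ and $v_j$.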
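A small numeric check of this formula, as an R sketch with two hand-picked vectors in the plane at angle $\theta = \pi/3$ (so the formula predicts a correlation of $1/3$). A standard normal vector stands in for a uniform direction on the sphere, which is valid because normalizing does not change the sign of $u \cdot v_i$:

```r
# Sketch: two fixed vectors at angle theta and a random direction u.
set.seed(3)
theta <- pi / 3
v1 <- c(1, 0)
v2 <- c(cos(theta), sin(theta))

# Standard normal rows give uniformly distributed directions; normalizing is
# skipped because it does not change the sign of u . v
num_samples <- 200000
u <- matrix(rnorm(num_samples * 2), num_samples, 2)

# X_i = 1 when u . v_i > 0
X1 <- as.vector(u %*% v1 > 0) * 1
X2 <- as.vector(u %*% v2 > 0) * 1

empirical <- cor(X1, X2)
predicted <- (pi - 2 * theta) / pi  # 1/3 for theta = pi / 3
print(c(empirical = empirical, predicted = predicted))
```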

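How do we find a suitable matrix $V$, whose rows are $v_1, \ldots, v_n$, that produces a desired correlation matrix $R$? Taking the $v_i$ to be unit vectors, the angle condition translates to $VV^T = \cos\left(\frac{\pi - \pi R}{2}\right)$, with the cosine applied elementwise, and thus we can find $V$ with a Cholesky decomposition (provided the right-hand side is positive definite).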

Here is example code in R:

```r
# Get a simple target correlation matrix
N <- 3
cor_matrix <- matrix(c(1, 0.5, 0.8, 0.5, 1, 0.3, 0.8, 0.3, 1), N, N)

# Calculate the vectors with the desired angles. chol() returns an upper-triangular U
# with t(U) %*% U equal to its (positive definite) argument, so the columns of U are the v_i
vector_matrix <- chol(cos((pi * cor_matrix - pi) * -0.5))

# You can generate random unit vectors by normalizing a vector of normally
# distributed variables; the normalization does not affect the sign of the
# dot product, so we skip it
num_samples <- 10000
normal_rand <- matrix(rnorm(num_samples * N), num_samples, N)

# Generate the binary target variables, coded as 0/1
B <- ((normal_rand %*% vector_matrix) > 0) * 1

# See for yourself that it works
cor(B)
cor(B) - cor_matrix
```
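One caveat: `chol()` needs $\cos\left(\frac{\pi - \pi R}{2}\right) = \sin\left(\frac{\pi R}{2}\right)$ to be positive definite, and not every valid correlation matrix $R$ satisfies this. The example above sits exactly on the boundary: its angles are $\theta_{(1,2)} = 45^\circ$, $\theta_{(1,3)} = 18^\circ$ and $\theta_{(2,3)} = 63^\circ = 45^\circ + 18^\circ$, so the three vectors are coplanar and the matrix is singular, and `chol()` may fail depending on rounding. The sketch below is a hypothetical helper (`rbinary_sign` is my own name, not an existing function) that checks feasibility first and swaps the Cholesky factor for an eigendecomposition, which also handles singular (positive semidefinite) targets:

```r
# Hypothetical helper: sign-of-projection sampler with a feasibility check.
# Uses an eigendecomposition instead of chol() so singular targets still work.
rbinary_sign <- function(num_samples, cor_matrix, tol = 1e-10) {
  N <- nrow(cor_matrix)
  # Gram matrix of the unit vectors v_i: cos((pi - pi * R) / 2) == sin(pi * R / 2)
  gram <- sin(pi * cor_matrix / 2)
  eig <- eigen(gram, symmetric = TRUE)
  if (min(eig$values) < -tol) {
    stop("these correlations cannot be reached with the angle construction")
  }
  # Rows of V are the v_i, so V %*% t(V) == gram; tiny negative eigenvalues
  # caused by rounding are clipped to zero
  V <- eig$vectors %*% diag(sqrt(pmax(eig$values, 0)), N)
  normal_rand <- matrix(rnorm(num_samples * N), num_samples, N)
  # Column i of the result is 1 when (random vector) . v_i > 0
  (normal_rand %*% t(V) > 0) * 1
}
```

With the `cor_matrix` above, `cor(rbinary_sign(10000, cor_matrix))` should be close to `cor_matrix`. A target with every off-diagonal entry equal to $-0.5$ is rejected, because it would require three vectors pairwise $135^\circ$ apart.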

Thanks @jakub-bartczuk for linking to the MO question; I wouldn't have found it on my own.


The above code has one big limitation: the marginal distributions are fixed at $X_i \sim \text{Bernoulli}(0.5)$. I am currently unaware of how to extend this approach to fit both correlations and marginal distributions. Another answer has an approach for the general case, but it loses a lot of simplicity (it involves numerical integration). There is also a paper called Generating Spike Trains with Specified Correlation Coefficients, with an accompanying Matlab package, where the sampling involves "only" numerically finding the unique zero of a monotonic function by bisection.
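For non-uniform marginals, one standard option, and to my understanding the dichotomized-Gaussian idea that the spike-train paper builds on, is to threshold a latent multivariate normal at $\Phi^{-1}(1 - p_i)$ so that $P(X_i = 1) = p_i$, and to choose each latent pairwise correlation so that the binary correlation hits its target. Because the bivariate normal orthant probability increases with the latent correlation, each pair is a one-dimensional monotonic root-finding problem. What follows is my own base-R sketch of that idea, not the paper's Matlab code: the helper names are hypothetical, `uniroot` stands in for plain bisection, the orthant probability comes from `integrate`, and it simply stops if a target is unreachable or the latent matrix is not positive definite.

```r
# Sketch of a dichotomized-Gaussian sampler (illustration, not the paper's code).
# X_i = 1 when latent Z_i > qnorm(1 - p_i), so P(X_i = 1) = p_i.

# P(Z1 > a, Z2 > b) for a standard bivariate normal with correlation r,
# using P(Z2 > b | Z1 = x) = pnorm((r * x - b) / sqrt(1 - r^2))
upper_orthant <- function(a, b, r) {
  integrand <- function(x) dnorm(x) * pnorm((r * x - b) / sqrt(1 - r^2))
  integrate(integrand, lower = a, upper = Inf, rel.tol = 1e-8)$value
}

# Latent correlation giving binary correlation target_cor for marginals p_i, p_j.
# The orthant probability increases with r, so the root is unique; the search is
# restricted to [-0.99, 0.99] and uniroot() stops if the target is unreachable there.
latent_cor <- function(p_i, p_j, target_cor) {
  a <- qnorm(1 - p_i)
  b <- qnorm(1 - p_j)
  p11 <- target_cor * sqrt(p_i * (1 - p_i) * p_j * (1 - p_j)) + p_i * p_j
  uniroot(function(r) upper_orthant(a, b, r) - p11,
          interval = c(-0.99, 0.99), tol = 1e-10)$root
}

rbinary_dg <- function(num_samples, marg, cor_matrix) {
  N <- length(marg)
  Sigma <- diag(N)
  for (i in seq_len(N - 1)) {
    for (j in (i + 1):N) {
      Sigma[i, j] <- Sigma[j, i] <- latent_cor(marg[i], marg[j], cor_matrix[i, j])
    }
  }
  # chol() stops if the latent matrix is not positive definite
  Z <- matrix(rnorm(num_samples * N), num_samples, N) %*% chol(Sigma)
  # Compare each row of Z with the thresholds qnorm(1 - marg)
  t((t(Z) > qnorm(1 - marg)) * 1)
}
```

For example, `rbinary_dg(10000, c(0.2, 0.5, 0.7), R)` with every off-diagonal entry of `R` equal to 0.2 should give column means near 0.2, 0.5 and 0.7 and pairwise correlations near 0.2. Note that the reachable binary correlations depend on the marginals.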


Thank you, this is great! Can I ask how you derived the angle condition $VV^T = \cos\left(\frac{\pi - \pi R}{2}\right)$? Thanks!
user321627

@user321627 You start with $R_{i,j} = \frac{\pi - 2\theta_{(i,j)}}{\pi}$ and the relation of the dot product to the angle, $\theta_{(i,j)} = \arccos\left(\frac{v_i \cdot v_j}{|v_i|\,|v_j|}\right)$. From there it is relatively simple linear algebra that I am too lazy to write down on a computer :-)
Martin Modrák
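Spelling out that linear algebra, assuming unit vectors $v_i$ (only the angles matter, so this loses nothing):

```latex
\begin{aligned}
R_{i,j} = \frac{\pi - 2\theta_{(i,j)}}{\pi}
  &\;\Longrightarrow\; \theta_{(i,j)} = \frac{\pi - \pi R_{i,j}}{2}, \\
v_i \cdot v_j = \cos\theta_{(i,j)}
  &= \cos\left(\frac{\pi - \pi R_{i,j}}{2}\right)
   = \sin\left(\frac{\pi R_{i,j}}{2}\right).
\end{aligned}
```

With the $v_i$ as the rows of $V$, the left-hand side is $(VV^T)_{ij}$, and $R_{ii} = 1$ gives $\cos 0 = 1$ on the diagonal, so $VV^T = \cos\left(\frac{\pi - \pi R}{2}\right)$ elementwise, which is the matrix the code factorizes.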
Licensed under cc by-sa 3.0 with attribution required.