Restricted maximum likelihood with less than full column rank



This question deals with restricted maximum likelihood (REML) estimation in a particular version of the linear model,

$$Y = X(\alpha)\beta + \epsilon, \qquad \epsilon \sim N_n(0, \Sigma(\alpha)),$$

where $X(\alpha)$ is an $n \times p$ matrix parameterized by $\alpha \in \mathbb{R}^k$, as is $\Sigma(\alpha)$, and $\beta$ is an unknown vector of nuisance parameters. Interest is in estimating $\alpha$, and we have $k \le p < n$. Estimating the model by maximum likelihood is no problem, but I want to use REML. It is well known, see e.g. LaMotte, that the likelihood of $A'Y$, where $A$ is any semi-orthogonal matrix such that $A'X = 0$, can be written

$$L_{REML}(\alpha \mid Y) \propto |X'X|^{1/2}\,|\Sigma|^{-1/2}\,|X'\Sigma^{-1}X|^{-1/2}\exp\left\{-\tfrac{1}{2}\,r'\Sigma^{-1}r\right\}, \qquad r = \left(I - X(X'\Sigma^{-1}X)^{+}X'\Sigma^{-1}\right)Y,$$

when $X$ is of full column rank.
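
For concreteness, here is a small numerical check of the identity above in the full column rank case. It is only a sketch with arbitrary toy choices (a Gaussian $X$, an AR(1)-type $\Sigma(\alpha)$, and a single scalar $\alpha$), none of which come from the actual model described further down; it simply confirms that the expression above and the log-density of $A'Y$ differ by a constant not depending on $\alpha$.

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.linalg import null_space

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.standard_normal((n, p))   # toy X, full column rank with probability one
Y = rng.standard_normal(n)        # toy data; only the alpha-dependence matters here

def Sigma(alpha):
    # toy covariance: AR(1)-type correlation parameterized by a scalar alpha
    return alpha ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def reml_formula(alpha):
    # log of the expression above, dropping constants that do not involve alpha
    S = Sigma(alpha)
    Si = np.linalg.inv(S)
    XtSiX = X.T @ Si @ X
    r = Y - X @ np.linalg.pinv(XtSiX) @ X.T @ Si @ Y
    return (0.5 * np.linalg.slogdet(X.T @ X)[1]
            - 0.5 * np.linalg.slogdet(S)[1]
            - 0.5 * np.linalg.slogdet(XtSiX)[1]
            - 0.5 * r @ Si @ r)

def reml_error_contrasts(alpha):
    # log-density of A'Y, where the columns of A are an orthonormal basis of C(X)^perp
    A = null_space(X.T)
    return multivariate_normal(mean=np.zeros(A.shape[1]),
                               cov=A.T @ Sigma(alpha) @ A).logpdf(A.T @ Y)

# The two log-likelihoods differ by a constant that does not depend on alpha
for a in (0.2, 0.5, 0.8):
    print(reml_formula(a) - reml_error_contrasts(a))
```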

My problem is that for some perfectly reasonable, and scientifically interesting, $\alpha$ the matrix $X(\alpha)$ is not of full column rank. All the derivations I have seen of the restricted likelihood above make use of determinant equalities that are not applicable when $|X'X| = 0$, i.e. they assume full column rank of $X$. This means that the above restricted likelihood is only correct for my setting on parts of the parameter space, and thus is not what I want to optimize.

Question: Are there more general restricted likelihoods, derived in the statistical literature or elsewhere, without the assumption that $X$ be of full column rank? If so, what do they look like?

Some observations:

  • Deriving the exponential part is no problem for any $X(\alpha)$, and it may be written in terms of the Moore-Penrose inverse as above
  • The columns of $A$ are an (any) orthonormal basis for $\mathcal{C}(X)^{\perp}$, the orthogonal complement of the column space of $X$
  • For known $A$, the likelihood for $A'Y$ can easily be written down for every $\alpha$, but of course the number of basis vectors, i.e. columns, in $A$ depends on the column rank of $X$ (see the sketch after this list)
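
To illustrate the last point, here is a minimal sketch (toy numbers, not the VAR model described below): it builds a rank-deficient $X$, takes the columns of $A$ as an orthonormal basis of the null space of $X'$, and writes down the log-likelihood of $A'Y$ directly; the number of columns of $A$ is $n - \operatorname{rank}(X)$.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
n, q = 12, 3
X1 = rng.standard_normal((n, q))
X = np.hstack([X1, X1])                            # deliberately rank deficient: rank q, 2q columns
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))   # toy covariance
Y = rng.standard_normal(n)

A = null_space(X.T)                                # orthonormal basis of C(X)^perp, so A'X = 0
print(np.allclose(A.T @ X, 0))                     # True
print(A.shape[1], n - np.linalg.matrix_rank(X))    # number of error contrasts = n - rank(X)

# Log-likelihood of the error contrasts A'Y ~ N(0, A' Sigma A), valid for any column rank of X
AtSA = A.T @ Sigma @ A
z = A.T @ Y
loglik = -0.5 * (A.shape[1] * np.log(2 * np.pi)
                 + np.linalg.slogdet(AtSA)[1]
                 + z @ np.linalg.solve(AtSA, z))
print(loglik)
```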

If anyone interested in this question believes the exact parameterization of $X, \Sigma$ would help, let me know and I'll write them down. At this point, though, I'm mostly interested in a REML for a general $X$ of the correct dimensions.


A more detailed description of the model follows here. Let $y_t = \mu + A y_{t-1} + v_t$, $t = 1, \dots, T$, be an $r$-dimensional first order Vector Autoregression [VAR(1)] where $v_t \overset{iid}{\sim} N(0, \Omega)$. Suppose the process is started at some fixed value $y_0$ at time $t = 0$.

Define $Y = [y_1', \dots, y_T']'$. The model may be written in the linear model form $Y = X\beta + \varepsilon$ using the following definitions and notation:

$$X = \left[\,1_T \otimes I_r,\; C^{-1}B\,\right], \qquad \beta = \left[\,\mu',\; y_0' - \mu'\,\right]', \qquad \operatorname{var}(\varepsilon)^{-1} = C'\,(I_T \otimes \Omega^{-1})\,C,$$
$$C = \begin{bmatrix} I_r & 0 & \cdots & 0 \\ -A & I_r & & \vdots \\ & \ddots & \ddots & 0 \\ 0 & & -A & I_r \end{bmatrix}, \qquad B = e_{1,T} \otimes A,$$

where $1_T$ denotes a $T$-dimensional vector of ones and $e_{1,T}$ the first standard basis vector of $\mathbb{R}^T$.

Denote $\alpha = \operatorname{vec}(A)$. Notice that if $A$ is not of full rank then $X(\alpha)$ is not of full column rank. This includes, for example, cases where one of the components of $y_t$ does not depend on the past.
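
A quick numeric illustration of this point, purely as a sketch (small $T$ and $r$, arbitrary coefficient matrices not taken from any real application): it assembles $X(\alpha) = [\,1_T \otimes I_r,\; C^{-1}B\,]$ as defined above and checks its column rank for a full-rank $A$ and for an $A$ whose second row is zero, i.e. where the second component does not depend on the past.

```python
import numpy as np

def build_X(A, T):
    """X(alpha) = [1_T kron I_r, C^{-1} B] with B = e_{1,T} kron A, as defined above."""
    r = A.shape[0]
    C = np.eye(T * r)
    for t in range(1, T):
        C[t*r:(t+1)*r, (t-1)*r:t*r] = -A           # -A on the block subdiagonal of C
    e1 = np.zeros((T, 1)); e1[0, 0] = 1.0
    B = np.kron(e1, A)                              # e_{1,T} kron A, a (Tr x r) matrix
    return np.hstack([np.kron(np.ones((T, 1)), np.eye(r)), np.linalg.solve(C, B)])

T, r = 5, 2
A_full = np.array([[0.5, 0.1],
                   [0.2, 0.4]])                     # full-rank coefficient matrix
A_defc = np.array([[0.5, 0.1],
                   [0.0, 0.0]])                     # second component ignores the past: rank 1
for A in (A_full, A_defc):
    X = build_X(A, T)
    # X has 2r columns; its column rank drops to r + rank(A) when A is rank deficient
    print(X.shape[1], np.linalg.matrix_rank(X))
```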

The idea of estimating VARs using REML is well known in, for example, the predictive regressions literature (see e.g. Phillips and Chen and the references therein.)

It may be worthwhile to clarify that the matrix $X$ is not a design matrix in the usual sense; it just falls out of the model, and unless there is a priori knowledge about $A$ there is, as far as I can tell, no way to reparameterize it to be of full rank.


I have posted a question on math.stackexchange that is related to this one in the sense that an answer to the math question may help in deriving a likelihood that would answer this question.


Maybe one way to address the question is to ask, what happens in linear mixed models when the model matrix is not full column rank?
Greenparker

Thanks for the bounty @Greenparker. And, yes, if a restricted likelihood could be written down for a linear mixed model, with less than full column rank fixed effects design matrix, that would help.
ekvall

Answers:



Deriving the exponential part is no problem for any $X(\alpha)$ and it may be written in terms of the Moore-Penrose inverse as above

I doubt that this observation is correct. The generalized inverse actually puts additional linear restrictions on your estimators [Rao&Mitra]; therefore we should consider the joint likelihood as a whole instead of guessing that "the Moore-Penrose inverse will work for the exponential part". This seems formally correct, yet you are probably not thinking about the mixed model correctly.

(1) How to think about mixed effect models correctly?

You have to think about the mixed effect model in a different way before you try to plug the g-inverse (or the Moore-Penrose inverse, which is a special kind of reflexive g-inverse [Rao&Mitra]) mechanically into the formula given by the RMLE (Restricted Maximum Likelihood Estimator, same below).

$$X = \begin{pmatrix} \text{fixed effect} & \text{random effect} \end{pmatrix}$$

A common way of thinking about mixed effects is that the random effect part of the design matrix is introduced by measurement error; it bears another name, "stochastic predictor", when we care more about prediction than estimation. This is also one historical motivation for the study of stochastic matrices in statistics.

My problem is that for some perfectly reasonable, and scientifically interesting, $\alpha$ the matrix $X(\alpha)$ is not of full column rank.

Given this way of thinking about the likelihood, the probability that $X(\alpha)$ is not of full rank is zero. This is because the determinant function is continuous in the entries of the matrix, and the normal distribution is a continuous distribution that assigns zero probability to a single point. The probability of a rank-deficient $X(\alpha)$ is positive if and only if you parameterize it in a pathological way like $\left(\begin{smallmatrix}\alpha & \alpha\\ \alpha & \alpha\end{smallmatrix}\ \big|\ \text{random effect}\right)$.

So the solution to your question is also rather straightforward: you simply perturb your design matrix, $X_{\epsilon}(\alpha) = X(\alpha) + \epsilon\begin{pmatrix} I & 0\\ 0 & 0\end{pmatrix}$ (perturbing the fixed effect part only), and use the perturbed matrix (which is of full rank) to carry out all derivations. Unless your model has complicated hierarchies or $X$ itself is nearly singular, I do not see a serious problem when you take $\epsilon \to 0$ in the final result, since the determinant function is continuous and we can take the limit inside it: $\lim_{\epsilon\to 0}|X_{\epsilon}| = |\lim_{\epsilon\to 0}X_{\epsilon}|$. In perturbation form the inverse of $X_{\epsilon}$ can be obtained from the Sherman-Morrison-Woodbury theorem, and the determinant of a matrix $I + X$ is given in standard linear algebra books like [Horn&Johnson]. Of course we can write the determinant in terms of each entry of the matrix, but perturbation is always preferred [Horn&Johnson].
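
A minimal numeric sketch of the two identities invoked here, on an arbitrary well-conditioned toy matrix (none of the specifics come from the model in the question): the Sherman-Morrison-Woodbury formula for the inverse of a low-rank perturbation, and the matrix determinant lemma for its determinant.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
M = rng.standard_normal((n, n)) + n * np.eye(n)    # well-conditioned, invertible base matrix
U = rng.standard_normal((n, k))
V = rng.standard_normal((n, k))
eps = 1e-3

# Sherman-Morrison-Woodbury:
#   (M + eps U V')^{-1} = M^{-1} - eps M^{-1} U (I_k + eps V' M^{-1} U)^{-1} V' M^{-1}
Mi = np.linalg.inv(M)
smw = Mi - eps * Mi @ U @ np.linalg.inv(np.eye(k) + eps * V.T @ Mi @ U) @ V.T @ Mi
direct = np.linalg.inv(M + eps * U @ V.T)
print(np.allclose(smw, direct))                     # True

# Matrix determinant lemma for the same perturbation:
#   det(M + eps U V') = det(M) det(I_k + eps V' M^{-1} U)
lhs = np.linalg.det(M + eps * U @ V.T)
rhs = np.linalg.det(M) * np.linalg.det(np.eye(k) + eps * V.T @ Mi @ U)
print(np.isclose(lhs, rhs))                         # True
```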

(2) How should we deal with nuisance parameters in a model?

As you can see, to deal with the random effect part of the model, we should regard it as a sort of "nuisance parameter". The problem is: is RMLE the most appropriate way of eliminating a nuisance parameter? Even in GLMs and mixed effect models, RMLE is far from the only choice. [Basu] pointed out many other ways of eliminating nuisance parameters in the setting of estimation. Today people tend to choose between RMLE and Bayesian modeling because they correspond to two popular computer-based solutions: EM and MCMC, respectively.

In my opinion it is definitely more suitable to introduce a prior in the situation of a rank-deficient fixed effect part. Alternatively, you can reparameterize your model to make it a full-rank one.

Further, in case your fixed effect part is not of full rank, you might worry about a mis-specified covariance structure, because the degrees of freedom in the fixed effects should have gone into the error part. To see this point more clearly, you may want to consider the MLE (also the LSE) for GLS (generalized least squares), $\hat{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$, where $\Sigma$ is the covariance structure of the error term, for the case where $X(\alpha)$ is not of full rank.
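
To make the rank-deficient GLS case concrete, here is a small sketch (toy $X$, $\Sigma$, and $y$, not the model in the question): when $X$ is not of full column rank, $X'\Sigma^{-1}X$ is singular, different generalized inverses give different "estimates" $\hat\beta$, but the fitted values $X\hat\beta$ are invariant to that choice, in line with [Rao&Mitra].

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
n, q = 15, 3
X1 = rng.standard_normal((n, q))
X = np.hstack([X1, X1[:, :2], np.zeros((n, 1))])   # 6 columns, column rank only 3
Sigma = 0.7 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Si = np.linalg.inv(Sigma)
y = rng.standard_normal(n)

S = X.T @ Si @ X                                   # singular since X is rank deficient
G1 = np.linalg.pinv(S)                             # Moore-Penrose inverse, one g-inverse of S
u = null_space(S)[:, :1]                           # a null vector of S (equivalently of X)
G2 = G1 + u @ rng.standard_normal((1, S.shape[0])) # another g-inverse: S @ G2 @ S = S still holds
print(np.allclose(S @ G2 @ S, S))                  # True

b1 = G1 @ X.T @ Si @ y
b2 = G2 @ X.T @ Si @ y
print(np.allclose(b1, b2))                         # False: beta-hat depends on the g-inverse chosen
print(np.allclose(X @ b1, X @ b2))                 # True: the fitted values do not
```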

(3) Further comments

The problem is not how you modify the RMLE to make it work in the case where the fixed effect part of the matrix is not of full rank; the problem is that in that case your model itself may be problematic, if the non-full-rank case has positive probability.

One relevant case I have encountered is that in the spatial setting people may want to reduce the rank of the fixed effect part for computational reasons [Wikle].

I have not seen any "scientifically interesting" case in such a situation; can you point out some literature where the non-full-rank case is of major concern? I would like to know and discuss further, thanks.

Reference

[Rao&Mitra] Rao, Calyampudi Radhakrishna, and Sujit Kumar Mitra. Generalized Inverse of Matrices and Its Applications. Vol. 7. New York: Wiley, 1971.

[Basu] Basu, Debabrata. "On the elimination of nuisance parameters." Journal of the American Statistical Association 72.358 (1977): 355-366.

[Horn&Johnson] Horn, Roger A., and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 2012.

[Wikle] Wikle, Christopher K. "Low-rank representations for spatial processes." Handbook of Spatial Statistics (2010): 107-118.


Thanks for your interest and very well-thought-through answer, +1 for effort. I will read it in more detail and come back with some clarifications. I think the first thing I have to clarify is that there are no random effects in this model, and the matrix X is not a design matrix at all, except perhaps in name for lack of a better word; it's a highly non-linear (deterministic) function of the parameter α, which consists of (the vectorization of) the coefficient matrix in a vector autoregressive process, so the concept of the probability of it being low rank is not meaningful.
ekvall

@Student001 Yes, feel free to make any clarification, since I also feel it is more like a GLM than a mixed model. I will try to answer again if I can :)
Henry.L

@Student001 If you can, do write down the whole model and I would like to study such a case, possibly AR(1) in a spatial setting, I guess.
Henry.L

"Given this way of thinking the likelihood, the probability that X(α) is not of full rank is zero." Right answer, wrong problem. The probability that it will be numerically not of full rank in finite precision is non-zero.
Mark L. Stone

@MarkL.Stone I already provided perturbation as a solution, if you read the lines carefully, which is a standard remedy for numerical singularity. And the OP said he will update the description, so I guess we will reach some consensus on the correctly formulated problem.
Henry.L