What is an example of perfect collinearity in terms of the design matrix $X$?
I would like an example where $\hat\beta = (X'X)^{-1}X'Y$ cannot be estimated because $X'X$ is not invertible.
Answers:
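Here is an example with three variables, $y$, $x_1$ and $x_2$, related by the equation
$$y = x_1 + x_2 + \varepsilon$$
where $\varepsilon \sim N(0, 1)$.
The particular data are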
         y x1 x2
1 4.520866  1  2
2 6.849811  2  4
3 6.539804  3  6
So it is evident that $x_2$ is a multiple of $x_1$, and hence we have perfect collinearity.
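We can write the model as
$$Y = X\beta + \varepsilon$$
where:
$$Y = \begin{bmatrix} 4.52 \\ 6.85 \\ 6.54 \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 4 \\ 1 & 3 & 6 \end{bmatrix}$$
So we have
$$XX' = \begin{bmatrix} 6 & 11 & 16 \\ 11 & 21 & 31 \\ 16 & 31 & 46 \end{bmatrix}$$
Now we calculate the determinant of $XX'$:
$$\det XX' = 6(21 \cdot 46 - 31 \cdot 31) - 11(11 \cdot 46 - 31 \cdot 16) + 16(11 \cdot 31 - 21 \cdot 16) = 30 - 110 + 80 = 0$$
(Because $X$ is square here, $\det XX' = \det X'X = (\det X)^2$, so $X'X$ is singular as well.)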
In R we can show this as follows.
> x1 <- c(1,2,3)
Create x2, a multiple of x1:
> x2 <- x1*2
Create y, a linear combination of x1, x2 and some randomness:
> y <- x1 + x2 + rnorm(3,0,1)
Observe that fitting the model fails to estimate a value for the x2 coefficient:
> summary(m0 <- lm(y~x1+x2))
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9512 1.6457 2.401 0.251
x1 1.0095 0.7618 1.325 0.412
x2 NA NA NA NA
Residual standard error: 0.02583 on 1 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 0.9999
F-statistic: 2.981e+04 on 1 and 1 DF, p-value: 0.003687
The model matrix is:
> (X <- model.matrix(m0))
(Intercept) x1 x2
1 1 1 2
2 1 2 4
3 1 3 6
So $XX'$ is
> (XXdash <- X %*% t(X))
1 2 3
1 6 11 16
2 11 21 31
3 16 31 46
which is not invertible, as shown by
> solve(XXdash)
Error in solve.default(XXdash) :
Lapack routine dgesv: system is exactly singular: U[3,3] = 0
Or:
> det(XXdash)
[1] 0
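The matrix that actually appears in $\hat\beta = (X'X)^{-1}X'Y$ is $X'X$ rather than $XX'$. For this square $X$ the two have the same determinant, but it may be worth checking $X'X$ and the rank of $X$ directly. A minimal sketch; the names `XtX` and `rank_X` are mine and not part of the answer above:

```r
# Rebuild the design matrix from the example: intercept, x1, and x2 = 2 * x1
x1 <- c(1, 2, 3)
x2 <- x1 * 2
X <- cbind(intercept = 1, x1 = x1, x2 = x2)

# X'X is the matrix that would have to be inverted for OLS
XtX <- crossprod(X)          # equivalent to t(X) %*% X
XtX

# The numerical rank of X counts its linearly independent columns
rank_X <- qr(X)$rank
rank_X < ncol(X)             # TRUE: fewer independent columns than columns
```

Checking the rank of $X$ avoids relying on a floating-point determinant being exactly zero.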
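Here are some fairly common scenarios that produce perfect multicollinearity, i.e. situations in which the columns of the design matrix are linearly dependent. Recall from linear algebra that this means there is a linear combination of the columns of the design matrix (with coefficients not all zero) that equals zero. I have included some practical examples to help explain why this pitfall strikes so often. I have encountered almost all of them!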
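(1) One variable is a multiple of another, regardless of whether there is an intercept term: perhaps because you have recorded the same variable twice using different units (e.g. "length in centimetres" is exactly 100 times "length in metres"), or because you have recorded a variable once as a raw number and once as a proportion or percentage when the denominator is fixed (e.g. "area of petri dish colonized" and "percentage of petri dish colonized" will be exact multiples of each other if the area of each petri dish is the same). We have collinearity because if $w_i = ax_i$, where $w$ and $x$ are variables (columns of your design matrix) and $a$ is a scalar constant, then $1(\vec{w}) - a(\vec{x})$ is a linear combination of variables that is equal to zero.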
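(2) There is an intercept term and one variable differs from another by a constant: this will happen if you center a variable ($w_i = x_i - \bar{x}$) and include both raw $x$ and centered $w$ in your regression. It will also happen if your variables are measured in different unit systems that differ by a constant, e.g. if $w$ is "temperature in kelvin" and $x$ is "temperature in °C", then $w_i = x_i + 273.15$. If we regard the intercept term as a variable that is always $1$ (represented as a column of ones, $\vec{1}_n$, in the design matrix), then having $w_i = x_i + k$ for some constant $k$ means that $1(\vec{w}) - 1(\vec{x}) - k(\vec{1}_n)$ is a linear combination of the $w$, $x$ and $1$ columns of the design matrix that equals zero.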
(3) There is an intercept term and one variable is given by an affine transformation of another: i.e. you have variables $w$ and $x$, related by $w_i = ax_i + b$ where $a$ and $b$ are constants. For instance this happens if you standardize a variable as $z_i = \frac{x_i - \bar{x}}{s_x}$ and include both raw $x$ and standardized $z$ variables in your regression. It also happens if you record $w$ as "temperature in °F" and $x$ as "temperature in °C", since those unit systems do not share a common zero but are related by $w_i = 1.8x_i + 32$. Or in a business context, suppose there is a fixed cost $b$ (e.g. covering delivery) for each order, as well as a cost $a$ per unit sold; then if $w_i$ is the cost of order $i$ and $x_i$ is the number of units ordered, we have $w_i = ax_i + b$. The linear combination of interest is $1(\vec{w}) - a(\vec{x}) - b(\vec{1}_n) = \vec{0}$. Note that if $a = 1$, then (3) includes (2) as a special case; if $b = 0$, then (3) includes (1) as a special case.
(4) There is an intercept term and the sum of several variables is fixed (e.g. in the famous "dummy variable trap"): for example if you have "percentage of satisfied customers", "percentage of dissatisfied customers" and "percentage of customers neither satisfied nor dissatisfied" then these three variables will always (barring rounding error) sum to 100. One of these variables (or alternatively, the intercept term) needs to be dropped from the regression to prevent collinearity. The "dummy variable trap" occurs when you use indicator variables (more commonly but less usefully called "dummies") for every possible level of a categorical variable. For instance, suppose vases are produced in red, green or blue color schemes. If you recorded the categorical variable "color" by three indicator variables (red, green and blue would be binary variables, stored as 1 for "yes" and 0 for "no") then for each vase only one of the variables would be a one, and hence red + green + blue = 1. Since there is a vector of ones for the intercept term, the linear combination 1(red) + 1(green) + 1(blue) - 1(1) = 0. The usual remedy here is either to drop the intercept, or to drop one of the indicators (e.g. leave out red), which becomes a baseline or reference level. In this case, the regression coefficient for green would indicate the change in the mean response associated with switching from a red vase to a green one, holding other explanatory variables constant.
(5) There are at least two subsets of variables, each having a fixed sum, regardless of whether there is an intercept term: suppose the vases in (4) were produced in three sizes, and the categorical variable for size was stored as three additional indicator variables. We would have large + medium + small = 1. Then we have the linear combination 1(large) + 1(medium) + 1(small) - 1(red) - 1(green) - 1(blue) = 0, even when there is no intercept term. The two subsets need not share the same sum, e.g. if we have explanatory variables $u, v, w, x$ such that every $u_i + v_i = k_1$ and $w_i + x_i = k_2$, then $k_2(\vec{u}) + k_2(\vec{v}) - k_1(\vec{w}) - k_1(\vec{x}) = \vec{0}$.
(6) One variable is defined as a linear combination of several other variables: for instance, if you record the length $l$, width $w$ and perimeter $p$ of each rectangle, then $p_i = 2l_i + 2w_i$, so we have the linear combination $1(\vec{p}) - 2(\vec{l}) - 2(\vec{w}) = \vec{0}$. An example with an intercept term: suppose a mail-order business has two product lines, and we record that order $i$ consisted of $u_i$ of the first product at unit cost $a$ and $v_i$ of the second at unit cost $b$, with fixed delivery charge $c$. If we also include the order cost $x$ as an explanatory variable, then $x_i = au_i + bv_i + c$ and so $1(\vec{x}) - a(\vec{u}) - b(\vec{v}) - c(\vec{1}_n) = \vec{0}$. This is an obvious generalization of (3). It also gives us a different way of thinking about (4): once we know all bar one of the subset of variables whose sum is fixed, then the remaining one is their complement, so it can be expressed as a linear combination of them and their sum. If we know 50% of customers were satisfied and 20% were dissatisfied, then 100% - 50% - 20% = 30% must be neither satisfied nor dissatisfied; if we know the vase is not red (red = 0) and it is green (green = 1) then we know it is not blue (blue = 1(1) - 1(red) - 1(green) = 1 - 0 - 1 = 0).
(7) One variable is constant and zero, regardless of whether there is an intercept term: in an observational study, a variable will be constant if your sample does not exhibit sufficient (any!) variation. There may be variation in the population that is not captured in your sample, e.g. if there is a very common modal value: perhaps your sample size is too small and was therefore unlikely to include any values that differed from the mode, or your measurements were insufficiently accurate to detect small variations from the mode. Alternatively, there may be theoretical reasons for the lack of variation, particularly if you are studying a sub-population. In a study of new-build properties in Los Angeles, it would not be surprising that every data point has AgeOfProperty = 0 and State = California! In an experimental study, you may have measured an independent variable that is under experimental control. Should one of your explanatory variables $x$ be both constant and zero, then we have immediately that the linear combination $1(\vec{x})$ (with coefficient zero for any other variables) is $\vec{0}$.
(8) There is an intercept term and at least one variable is constant: if $x$ is constant so that each $x_i = k \neq 0$, then the linear combination $1(\vec{x}) - k(\vec{1}_n) = \vec{0}$.
(9) At least two variables are constant, regardless of whether there is an intercept term: if each $w_i = k_1 \neq 0$ and $x_i = k_2 \neq 0$, then the linear combination $k_2(\vec{w}) - k_1(\vec{x}) = \vec{0}$.
(10) Number of columns of design matrix, $k$, exceeds number of rows, $n$: even when there is no conceptual relationship between your variables, it is mathematically necessitated that the columns of your design matrix will be linearly dependent when $k > n$. It simply isn't possible to have $k$ linearly independent vectors in a space with a number of dimensions lower than $k$: for instance, while you can draw two independent vectors on a sheet of paper (a two-dimensional plane, $\mathbb{R}^2$), any further vector drawn on the page must lie within their span, and hence be a linear combination of them. Note that an intercept term contributes a column of ones to the design matrix, so it counts as one of your $k$ columns. (This scenario is often called the "large $p$, small $n$" problem: see also this related CV question.)
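Each scenario above comes down to a non-zero coefficient vector $v$ with $Xv = \vec{0}$. As a rough sketch of how such a vector could be recovered numerically, the code below uses the singular value decomposition on the °C/°F case from (3). The temperature readings, the tolerance and the names `null_vecs` and `combo` are illustrative assumptions of mine, not part of the original answer.

```r
# Hypothetical data: temperatures in Celsius and the same readings in Fahrenheit
celsius    <- c(0, 10, 25, 37, 100)
fahrenheit <- 1.8 * celsius + 32
X <- cbind(intercept = 1, celsius = celsius, fahrenheit = fahrenheit)

# Right singular vectors whose singular values are (numerically) zero span the
# null space of X, i.e. the coefficient vectors v with X %*% v = 0
s <- svd(X)
tol <- 1e-8 * max(s$d)                       # assumed threshold for "zero"
null_vecs <- s$v[, s$d < tol, drop = FALSE]
ncol(null_vecs)                              # number of independent dependencies

# Rescale so the coefficient on fahrenheit is 1, matching 1(w) - a(x) - b(1_n)
combo <- null_vecs[, 1] / null_vecs[3, 1]
names(combo) <- colnames(X)
round(combo, 6)                              # intercept -32, celsius -1.8, fahrenheit 1
```

The recovered coefficients correspond to $1(\vec{w}) - 1.8(\vec{x}) - 32(\vec{1}_n) = \vec{0}$.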
Data examples with R code
Each example gives a design matrix $X$, the matrix $X'X$ (note this is always square and symmetrical) and $\det(X'X)$. Note that if $X'X$ is singular (zero determinant, hence not invertible) then we cannot estimate $\hat\beta = (X'X)^{-1}X'y$. The condition that $X'X$ be non-singular is equivalent to the condition that $X$ has full rank so its columns are linearly independent: see this Math SE question, or this one and its converse.
(1) One column is multiple of another
# x2 = 2 * x1
# Note no intercept term (column of 1s) is needed
X <- matrix(c(2, 4, 1, 2, 3, 6, 2, 4), ncol = 2, byrow=TRUE)
X
# [,1] [,2]
#[1,] 2 4
#[2,] 1 2
#[3,] 3 6
#[4,] 2 4
t(X) %*% X
# [,1] [,2]
#[1,] 18 36
#[2,] 36 72
round(det(t(X) %*% X), digits = 9)
#0
(2) Intercept term and one variable differs from another by constant
# x1 represents intercept term
# x3 = x2 + 2
X <- matrix(c(1, 2, 4, 1, 1, 3, 1, 3, 5, 1, 0, 2), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 2 4
#[2,] 1 1 3
#[3,] 1 3 5
#[4,] 1 0 2
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 4 6 14
#[2,] 6 14 26
#[3,] 14 26 54
round(det(t(X) %*% X), digits = 9)
#0
# NB if we drop the intercept, cols now linearly independent
# x2 = x1 + 2 with no intercept column
X <- matrix(c(2, 4, 1, 3, 3, 5, 0, 2), ncol = 2, byrow=TRUE)
X
# [,1] [,2]
#[1,] 2 4
#[2,] 1 3
#[3,] 3 5
#[4,] 0 2
t(X) %*% X
# [,1] [,2]
#[1,] 14 26
#[2,] 26 54
# Can you see how this matrix is related to the previous one, and why?
round(det(t(X) %*% X), digits = 9)
#80
# Non-zero determinant so X'X is invertible
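The comment in the code above asks how $X'X$ without the intercept relates to $X'X$ with it. One way to see it, sketched here with variable names of my own choosing: entry $(i, j)$ of $X'X$ is the dot product of columns $i$ and $j$ of $X$, so deleting a column of $X$ deletes the matching row and column of $X'X$.

```r
# Design matrix from example (2), with the intercept in the first column
X_full  <- matrix(c(1, 2, 4, 1, 1, 3, 1, 3, 5, 1, 0, 2), ncol = 3, byrow = TRUE)
X_noint <- X_full[, -1]                      # same data without the intercept

XtX_full  <- t(X_full) %*% X_full
XtX_noint <- t(X_noint) %*% X_noint

# Dropping row 1 and column 1 of the larger matrix gives the smaller one
all(XtX_full[-1, -1] == XtX_noint)           # TRUE
```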
(3) Intercept term and one variable is affine transformation of another
# x1 represents intercept term
# x3 = 2*x2 - 3
X <- matrix(c(1, 2, 1, 1, 1, -1, 1, 3, 3, 1, 0, -3), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 2 1
#[2,] 1 1 -1
#[3,] 1 3 3
#[4,] 1 0 -3
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 4 6 0
#[2,] 6 14 10
#[3,] 0 10 20
round(det(t(X) %*% X), digits = 9)
#0
# NB if we drop the intercept, cols now linearly independent
# x2 = 2*x1 - 3 with no intercept column
X <- matrix(c(2, 1, 1, -1, 3, 3, 0, -3), ncol = 2, byrow=TRUE)
X
# [,1] [,2]
#[1,] 2 1
#[2,] 1 -1
#[3,] 3 3
#[4,] 0 -3
t(X) %*% X
# [,1] [,2]
#[1,] 14 10
#[2,] 10 20
# Can you see how this matrix is related to the previous one, and why?
round(det(t(X) %*% X), digits = 9)
#180
# Non-zero determinant so X'X is invertible
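Standardizing a variable is one of the affine transformations mentioned in scenario (3). The sketch below reuses the x values from the example above and adds a standardized copy; the names `z`, `X_with_int` and `X_no_int` are mine. Because the standardized values are not whole numbers, it compares the numerical rank with the number of columns rather than testing a determinant against zero.

```r
x <- c(2, 1, 3, 0)
z <- (x - mean(x)) / sd(x)                   # standardized copy of x

X_with_int <- cbind(intercept = 1, x = x, z = z)
X_no_int   <- cbind(x = x, z = z)

qr(X_with_int)$rank    # 2, although there are 3 columns: collinear
qr(X_no_int)$rank      # 2, equal to the number of columns: full rank
```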
(4) Intercept term and sum of several variables is fixed
# x1 represents intercept term
# x2 + x3 = 10
X <- matrix(c(1, 2, 8, 1, 1, 9, 1, 3, 7, 1, 0, 10), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 2 8
#[2,] 1 1 9
#[3,] 1 3 7
#[4,] 1 0 10
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 4 6 34
#[2,] 6 14 46
#[3,] 34 46 294
round(det(t(X) %*% X), digits = 9)
#0
# NB if we drop the intercept, then columns now linearly independent
# x1 + x2 = 10 with no intercept column
X <- matrix(c(2, 8, 1, 9, 3, 7, 0, 10), ncol = 2, byrow=TRUE)
X
# [,1] [,2]
#[1,] 2 8
#[2,] 1 9
#[3,] 3 7
#[4,] 0 10
t(X) %*% X
# [,1] [,2]
#[1,] 14 46
#[2,] 46 294
# Can you see how this matrix is related to the previous one, and why?
round(det(t(X) %*% X), digits = 9)
#2000
# Non-zero determinant so X'X is invertible
(4a) Intercept term with dummy variable trap
# x1 represents intercept term
# x2 + x3 + x4 = 1
X <- matrix(c(1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0), ncol = 4, byrow=TRUE)
X
# [,1] [,2] [,3] [,4]
#[1,] 1 0 0 1
#[2,] 1 1 0 0
#[3,] 1 0 1 0
#[4,] 1 1 0 0
#[5,] 1 0 1 0
t(X) %*% X
# [,1] [,2] [,3] [,4]
#[1,] 5 2 2 1
#[2,] 2 2 0 0
#[3,] 2 0 2 0
#[4,] 1 0 0 1
# This matrix has a very natural interpretation - can you work it out?
round(det(t(X) %*% X), digits = 9)
#0
# NB if we drop the intercept, then columns now linearly independent
# x1 + x2 + x3 = 1 with no intercept column
X <- matrix(c(0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 0 0 1
#[2,] 1 0 0
#[3,] 0 1 0
#[4,] 1 0 0
#[5,] 0 1 0
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 2 0 0
#[2,] 0 2 0
#[3,] 0 0 1
# Can you see how this matrix is related to the previous one?
round(det(t(X) %*% X), digits = 9)
#4
# Non-zero determinant so X'X is invertible
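The "natural interpretation" hinted at in (4a) can be checked directly. The sketch below assumes, for illustration only, that x2, x3 and x4 are the red, green and blue indicators from the vase example; the diagonal of $X'X$ then holds the sample size followed by the count in each colour. It also shows that `model.matrix()` avoids the trap under R's default treatment contrasts by omitting the first level.

```r
# Hypothetical colours matching the rows of the design matrix in (4a)
color <- factor(c("blue", "red", "green", "red", "green"),
                levels = c("red", "green", "blue"))

# One 0/1 indicator column per level, plus an intercept: the dummy variable trap
indicators <- sapply(levels(color), function(lv) as.numeric(color == lv))
X <- cbind(intercept = 1, indicators)
XtX <- t(X) %*% X

diag(XtX)              # 5 2 2 1: sample size, then the count of each colour
table(color)           # the same counts: red 2, green 2, blue 1

# With default treatment contrasts, red is dropped and becomes the baseline
X_ok <- model.matrix(~ color)
colnames(X_ok)
qr(X_ok)$rank == ncol(X_ok)                  # TRUE: no collinearity
```

The off-diagonal zeros among the indicator columns reflect that no vase has two colours at once.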
(5) Two subsets of variables with fixed sum
# No intercept term needed
# x1 + x2 = 1
# x3 + x4 = 1
X <- matrix(c(0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,1,1,0,1,0,0,1,1,0), ncol = 4, byrow=TRUE)
X
# [,1] [,2] [,3] [,4]
#[1,] 0 1 0 1
#[2,] 1 0 0 1
#[3,] 0 1 1 0
#[4,] 1 0 0 1
#[5,] 1 0 1 0
#[6,] 0 1 1 0
t(X) %*% X
# [,1] [,2] [,3] [,4]
#[1,] 3 0 1 2
#[2,] 0 3 2 1
#[3,] 1 2 3 0
#[4,] 2 1 0 3
# This matrix has a very natural interpretation - can you work it out?
round(det(t(X) %*% X), digits = 9)
#0
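For example (5), the off-diagonal block of $X'X$ can be read as a contingency table of the two categorical variables. A small sketch, in which the level labels a1/a2 and b1/b2 are hypothetical names I have attached to the indicator columns:

```r
X <- matrix(c(0,1,0,1, 1,0,0,1, 0,1,1,0, 1,0,0,1, 1,0,1,0, 0,1,1,0),
            ncol = 4, byrow = TRUE)

# Recover the two categorical variables from their indicator columns
A <- ifelse(X[, 1] == 1, "a1", "a2")         # x1 and x2 code variable A
B <- ifelse(X[, 3] == 1, "b1", "b2")         # x3 and x4 code variable B

XtX <- t(X) %*% X
tab <- table(A, B)
tab

# The block of X'X linking A's indicators to B's indicators is the cross-tab
all(as.vector(XtX[1:2, 3:4]) == as.vector(tab))   # TRUE
```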
(6) One variable is linear combination of others
# No intercept term
# x3 = x1 + 2*x2
X <- matrix(c(1,1,3,0,2,4,2,1,4,3,1,5,1,2,5), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 1 3
#[2,] 0 2 4
#[3,] 2 1 4
#[4,] 3 1 5
#[5,] 1 2 5
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 15 8 31
#[2,] 8 11 30
#[3,] 31 30 91
round(det(t(X) %*% X), digits = 9)
#0
(7) One variable is constant and zero
# No intercept term
# x3 = 0
X <- matrix(c(1,1,0,0,2,0,2,1,0,3,1,0,1,2,0), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 1 0
#[2,] 0 2 0
#[3,] 2 1 0
#[4,] 3 1 0
#[5,] 1 2 0
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 15 8 0
#[2,] 8 11 0
#[3,] 0 0 0
round(det(t(X) %*% X), digits = 9)
#0
(8) Intercept term and one constant variable
# x1 is intercept term, x3 = 5
X <- matrix(c(1,1,5,1,2,5,1,1,5,1,1,5,1,2,5), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 1 5
#[2,] 1 2 5
#[3,] 1 1 5
#[4,] 1 1 5
#[5,] 1 2 5
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 5 7 25
#[2,] 7 11 35
#[3,] 25 35 125
round(det(t(X) %*% X), digits = 9)
#0
(9) Two constant variables
# No intercept term, x2 = 2, x3 = 5
X <- matrix(c(1,2,5,2,2,5,1,2,5,1,2,5,2,2,5), ncol = 3, byrow=TRUE)
X
# [,1] [,2] [,3]
#[1,] 1 2 5
#[2,] 2 2 5
#[3,] 1 2 5
#[4,] 1 2 5
#[5,] 2 2 5
t(X) %*% X
# [,1] [,2] [,3]
#[1,] 11 14 35
#[2,] 14 20 50
#[3,] 35 50 125
round(det(t(X) %*% X), digits = 9)
#0
(10) Number of columns exceeds number of rows
# Design matrix has 4 columns but only 3 rows
X <- matrix(c(1,1,1,1,1,2,4,8,1,3,9,27), ncol = 4, byrow=TRUE)
X
# [,1] [,2] [,3] [,4]
#[1,] 1 1 1 1
#[2,] 1 2 4 8
#[3,] 1 3 9 27
t(X) %*% X
# [,1] [,2] [,3] [,4]
#[1,] 3 6 14 36
#[2,] 6 14 36 98
#[3,] 14 36 98 276
#[4,] 36 98 276 794
round(det(t(X) %*% X), digits = 9)
#0
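With more columns than rows, the rank of $X$ can be at most the number of rows, so at least one column must be a linear combination of the others. The following sketch makes that combination explicit for the matrix in (10) by solving a 3 x 3 system; `coefs` is my own name for the result.

```r
X <- matrix(c(1,1,1,1, 1,2,4,8, 1,3,9,27), ncol = 4, byrow = TRUE)

qr(X)$rank                                   # 3: cannot exceed the 3 rows

# Write the 4th column in terms of the first three
coefs <- solve(X[, 1:3], X[, 4])
coefs                                        # 6 -11 6

# Check: 6 * col1 - 11 * col2 + 6 * col3 reproduces col4
all.equal(as.vector(X[, 1:3] %*% coefs), X[, 4])
```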
There are a multitude of ways in which one column of data can be a linear function of your other data. Some of them are obvious (e.g. meters vs. centimeters) while others can be more subtle (e.g. age and years of schooling for younger children).
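The age and schooling case can be sketched with made-up data. The assumption here, which is mine and not stated above, is that every child starts school at age 6, so years of schooling equals age minus 6 and is an affine function of age. All values, including the response `score`, are simulated purely for illustration.

```r
set.seed(1)                                  # arbitrary seed, for reproducibility
age       <- sample(7:12, 30, replace = TRUE)
schooling <- age - 6                         # assumed: school starts at age 6
score     <- 10 + 2 * age + rnorm(30)        # made-up response variable

# With an intercept in the model, schooling adds no new information
fit <- lm(score ~ age + schooling)
coef(fit)                                    # the schooling coefficient is NA
```

`lm()` reports NA for the aliased coefficient rather than failing outright.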
Notational notes: Let $x_1$ denote the first column of $X$, $x_2$ the second column, etc., and let $\mathbf{1}$ denote a vector of ones, which is what is included in the design matrix $X$ if you include a constant in your regression.