What is the difference between regression analysis and analysis of variance?


21

I am currently learning about regression analysis and analysis of variance.

In regression analysis, one variable is fixed, and you want to know how that variable relates to the other variable.

In analysis of variance, for example: whether this specific animal food influences the weight of the animals... so one fixed variable and its influence on the others...

Is that right or wrong? Please help me...

Answers:


25

Your data set consists of pairs $(x_i, y_i)$, $i = 1, \ldots, n$, and you want to see the dependence of $y$ on $x$.

If you find the values $\hat\alpha$ and $\hat\beta$ of $\alpha$ and $\beta$ that minimize the residual sum of squares

$$\sum_{i=1}^n \bigl(y_i - (\alpha + \beta x_i)\bigr)^2,$$

then you take $\hat y = \hat\alpha + \hat\beta x$ to be the predicted value of $y$ for any (not necessarily observed) value of $x$. That's linear regression.

Now consider decomposing the total sum of squares

$$\sum_{i=1}^n (y_i - \bar y)^2, \qquad \text{where } \bar y = \frac{y_1 + \cdots + y_n}{n},$$

which has $n-1$ degrees of freedom, into an "explained" part and an "unexplained" part:

$$\underbrace{\sum_{i=1}^n \bigl((\hat\alpha + \hat\beta x_i) - \bar y\bigr)^2}_{\text{explained}} \;+\; \underbrace{\sum_{i=1}^n \bigl(y_i - (\hat\alpha + \hat\beta x_i)\bigr)^2}_{\text{unexplained}},$$

with $1$ and $n-2$ degrees of freedom, respectively. That's analysis of variance, and one then considers things like the F-statistic

$$F = \frac{\sum_{i=1}^n \bigl((\hat\alpha + \hat\beta x_i) - \bar y\bigr)^2 \big/ 1}{\sum_{i=1}^n \bigl(y_i - (\hat\alpha + \hat\beta x_i)\bigr)^2 \big/ (n-2)}.$$

This F-statistic tests the null hypothesis $\beta = 0$.
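
For concreteness, here is a minimal R sketch (the simulated data and the variable names x and y are invented for this illustration) that computes the explained and unexplained sums of squares and the F-statistic directly from the formulas above, and compares them with the table produced by anova(lm(y ~ x)):

    set.seed(1)
    n <- 30
    x <- runif(n, 0, 10)             # continuous predictor
    y <- 2 + 0.5 * x + rnorm(n)      # response generated with a true slope of 0.5

    fit  <- lm(y ~ x)                # least-squares estimates alpha-hat and beta-hat
    yhat <- fitted(fit)              # alpha-hat + beta-hat * x_i for each observation

    ss_explained   <- sum((yhat - mean(y))^2)   # "explained" sum of squares, 1 df
    ss_unexplained <- sum((y - yhat)^2)         # "unexplained" sum of squares, n - 2 df

    F_stat <- (ss_explained / 1) / (ss_unexplained / (n - 2))
    F_stat
    anova(fit)   # same decomposition, degrees of freedom and F, laid out as a table

The hand-computed F value should match the one in the anova() table, which is exactly the decomposition described above.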

One often first encounters the term "analysis of variance" when the predictor is categorical, so that you're fitting the model

$$y = \alpha + \beta_i$$
where $i$ identifies which category is the value of the predictor. If there are $k$ categories, you'd get $k-1$ degrees of freedom in the numerator of the F-statistic, and usually $n-k$ degrees of freedom in the denominator. But the distinction between regression and analysis of variance is still the same for this kind of model.
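
For the categorical case, a similar R sketch (again with made-up data and a made-up factor g with three levels) shows the $k-1$ and $n-k$ degrees of freedom appearing in the same kind of table:

    set.seed(2)
    k <- 3; n_per <- 10; n <- k * n_per
    g <- factor(rep(c("A", "B", "C"), each = n_per))          # k = 3 categories
    y <- c(rnorm(n_per, 5), rnorm(n_per, 6), rnorm(n_per, 7)) # group means differ

    fit <- lm(y ~ g)   # same least-squares machinery, categorical predictor
    anova(fit)         # k - 1 = 2 and n - k = 27 degrees of freedom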

A couple of additional points:

  • To some mathematicians, the account above may make it appear that this is all there is to the field, so it may seem mysterious that both regression and analysis of variance are active research areas. There is much more that won't fit into an answer appropriate for posting here.
  • A popular and tempting mistake is to think it's called "linear" because the graph of $y = \alpha + \beta x$ is a line. That is false. One of my earlier answers explains why it's still called "linear regression" when you're fitting a polynomial via least squares (a short sketch follows this list).
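
To illustrate that last point, here is a small R sketch (made-up data) fitting a quadratic by least squares with lm(); the model is still "linear" because it is linear in the coefficients, even though its graph is a parabola rather than a line:

    set.seed(3)
    x <- seq(-2, 2, length.out = 40)
    y <- 1 + 2 * x - 0.5 * x^2 + rnorm(40, sd = 0.3)

    quad_fit <- lm(y ~ x + I(x^2))   # linear in the coefficients, quadratic in x
    coef(quad_fit)                   # least-squares estimates of the three coefficients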

5
@MichaelHardy While the decomposition of variance into components in regression is often referred to as an analysis-of-variance table, that is not what statisticians commonly mean by ANOVA. The methods 1) linear regression, 2) analysis of variance and 3) analysis of covariance are categories under the general heading of the general linear model: linear regression involves continuous covariates, ANOVA involves discrete groups only, and ANCOVA is a combination of continuous covariates and discrete groups.
Michael R. Chernick

1
Informally one sometimes speaks that way, and my answer didn't say that, but one should know that least-squares estimation of coefficients is done in either of the two problems (continuous or categorical predictors), and a decomposition of the sum of squares with its corresponding degrees of freedom---an ANOVA table---is also done in either of the two problems.
Michael Hardy

5
With that concession, you then have to concede that there is nothing wrong with my answer. Also, the terms ANOVA, ANCOVA and regression are not informal terms. They are very distinctly formal, and it is incorrect to tell the OP that ANOVA is the decomposition of variance in regression. The fact that a statistical procedure someone named "anova" can handle any linear model doesn't prove anything. In SAS, proc reg deals only with regression, proc anova deals only with the analysis of variance as I defined it, and proc glm is the one that does both.
Michael R. Chernick

1
....and in R, "lm(....)" gives regression coefficients in both situations, and "anova(lm(....))" gives the decomposition of the sum of squares and degrees of freedom, in both situations. As far as "have to concede" goes, I've put some further comments below your answer. Certainly if you're going to mention logistic regression, it would be clearer if you said that as soon as you're not talking about linear regression, the word "regression" is a very broad term that can include many things.
Michael Hardy

@MichaelHardy Feel free to comment on my question raised on the stats.SE site. I think that your answer and my answer to this question are both correct in a way. I certainly object to my answer being downvoted. I wanted to get the opinions of others in the statistics community about this.
Michael R. Chernick

5

The main difference is the response variable. While logistic regression deals with a binary response, in linear regression analysis (and also nonlinear regression) the response variable is continuous. You have one or more variables (also known as covariates) that have a functional relationship to the continuous response variable. In the analysis of variance the response is continuous, but the observations belong to a few different categories (e.g. a treatment group and a control group). In the analysis of variance you look for a difference in the mean response between groups. In linear regression you look at how the response changes as the covariates change. Another way to look at the difference is to say that in regression the covariates are continuous, whereas in analysis of variance they are a discrete set of groups.
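
To make the distinction described in this answer concrete, here is a minimal R sketch (the group labels, covariate and data are invented for illustration): a model with only a grouping factor is the ANOVA case, a model with only a continuous covariate is the regression case, and a model with both is the ANCOVA case:

    set.seed(4)
    n <- 60
    group <- factor(rep(c("control", "treatment"), each = n / 2))   # discrete groups
    x <- rnorm(n, mean = 50, sd = 10)                               # continuous covariate
    y <- 10 + 3 * (group == "treatment") + 0.2 * x + rnorm(n)       # continuous response

    lm(y ~ group)       # ANOVA-style model: discrete groups only
    lm(y ~ x)           # regression: continuous covariate only
    lm(y ~ group + x)   # ANCOVA: groups plus a continuous covariate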


6
I'd have taken the question to mean the difference between linear regression and analysis of variance; bringing in logistic regression seems to get away from the topic. However, your last sentence is wrong. Analysis of variance can be done regardless of whether the predictors are discrete or continuous.
Michael Hardy

1
There are indeed predictors in the analysis of variance. In your example, the predictor is categorical, but it need not be so. Analysis of variance does not only consider problems involving "discrete groups".
Michael Hardy

3
@MichaelHardy I am taking a step back, because when I check my statistical encyclopedias I find references to the analysis of variance in terms of the decomposition of variance in the general linear model. But the term has two meanings, and quite often ANOVA is distinguished from ANCOVA and regression in the way I described. So the OP should be aware of both: the term that refers to inference about variance components in the general linear model, and the term that refers to the subclass of linear models that involve only discrete groups.
Michael R. Chernick

2
I think of the usage you describe as informal. It seems strange to mention logistic regression without saying it's just one of a variety of "regressions", when that term is used in the broad sense of estimating an average or predicted value of one variable given another, and then distinguishing that from analysis of variance. The question of the difference between linear regression models and analysis of variance seems like a more sensible question, but there are often uncertainties about what the original poster intended.
Michael Hardy

7
Whatever your intentions might have been, I find the "I have a PhD in statistics,..." commentary to be inappropriate. First of all, it does nothing to resolve the issue at hand. Appealing to authority is an oft-used but very misguided approach to proving things. Appealing to your own authority is even more problematic. It can also be interpreted as showing (inadvertently or otherwise) a lack of respect for @MichaelHardy (the person you are addressing), who also happens to have a PhD in statistics from a very reputable program.
cardinal

2

The analysis of variance (ANOVA) is a body of statistical methods for analyzing observations assumed to be of the structure

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + e_i, \qquad i = 1(1)n,$$
which are constituted of linear combinations of $p$ unknown quantities $\beta_1, \beta_2, \ldots, \beta_p$ plus errors $e_1, e_2, \ldots, e_n$, where the $\{x_{ij}\}$ are known constant coefficients and the random variables $\{e_i\}$ are uncorrelated with the same mean $0$ and (unknown) variance $\sigma^2$.

i.e. $E(\mathbf{y}_{n \times 1}) = X\boldsymbol\beta$, $D(\mathbf{y}) = \sigma^2 I_n$, where $D$ is the dispersion matrix (variance-covariance matrix).

In the analysis of variance, the coefficients $\{x_{ij}\}$ are the values of counter variables or indicator variables which refer to the presence or absence of the effects $\{\beta_j\}$ in the conditions under which the observations are taken: $x_{ij}$ is the number of times $\beta_j$ occurs in the $i$-th observation, and this is usually 0 or 1. In general, in the analysis of variance all the factors are treated qualitatively.

If the $\{x_{ij}\}$ are values taken on in the observations not by counter variables but by continuous variables like $t$ (time), $T$ (temperature), $t^2$, $e^T$, etc., then we have a case of regression analysis. In general, in regression analysis all factors are quantitative and treated quantitatively.

Mainly, these are two different kinds of analysis.
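
As a rough illustration of the distinction drawn above, this R sketch (the factor levels and the time values are invented) shows the known coefficients $\{x_{ij}\}$ as a design matrix in both cases: columns of 0/1 indicator variables for the analysis of variance, and columns of continuous values for regression analysis:

    g  <- factor(c("A", "A", "B", "B", "C", "C"))   # qualitative factor (3 effects)
    tm <- c(1.0, 2.0, 1.5, 3.0, 2.5, 4.0)           # quantitative variable, e.g. time

    model.matrix(~ g)             # 0/1 indicator columns: the analysis-of-variance case
    model.matrix(~ tm + I(tm^2))  # continuous columns (t, t^2): the regression case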


What does the notation $i = 1(1)n$ mean?

1
$i = 1(1)n$ means $i = 1, 2, \ldots, n$
アルガ

-1

In regression analysis you have one variable fixed, and you want to know how that variable relates to the other variable.

In analysis of variance you want to know, for example: does this specific animal food influence the weight of the animals... so one fixed variable and its influence on the others.


1
Hello Aiza, welcome to SE. You need to edit this to give more context and make it clear what the question actually is.
Stop Closing Questions Fast