This implies that some of these variables are strongly correlated with each other. How, why, and in what context do we define them as independent variables?
Answers:
If we pull back from today's emphasis on machine learning and recall how much of statistical analysis was developed for controlled experimental studies, the phrase "independent variables" makes a good deal of sense.
In controlled experimental studies, the choices of a drug and its concentrations, or the choices of a fertilizer and its amounts per acre, are made independently by the investigator. The interest is in how a response variable of interest (e.g., blood pressure, crop yield) depends on these experimental manipulations. Ideally, the characteristics of the independent variables are tightly specified, with essentially no error in knowing their values. Standard linear regression, for example, then models the differences among values of the dependent variable in terms of the values of the independent variables plus residual error.
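As a minimal sketch of that setup (hypothetical: the dose levels, coefficients, and noise are all invented for illustration), the investigator sets the dose values, and ordinary least squares then models the response in terms of those choices plus residual error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dose levels chosen by the experimenter -- known exactly, not observed with error.
dose = np.repeat([0.0, 5.0, 10.0, 20.0], 5)

# Simulated response: blood pressure falls with dose, plus residual noise.
blood_pressure = 120.0 - 0.8 * dose + rng.normal(0, 3, dose.size)

# Fit blood_pressure = b0 + b1 * dose by ordinary least squares.
X = np.column_stack([np.ones_like(dose), dose])
b0, b1 = np.linalg.lstsq(X, blood_pressure, rcond=None)[0]
print(f"intercept = {b0:.2f}, slope per unit dose = {b1:.2f}")
```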
The same mathematical formalism used for regression in controlled experimental studies can also be applied to observational data sets with little to no experimental manipulation, so it's perhaps not surprising that the phrase "independent variables" has carried over to such studies. But, as others on this page note, that's probably an unfortunate choice, with "predictors" or "features" more appropriate in those contexts.
In many ways, "independent variable" is an unfortunate choice. The variables need not be independent of each other, and of course need not be independent of the dependent variable $Y$. In teaching and in my book Regression Modeling Strategies I use the word predictor. In some situations that word is not strong enough, but it works well on the average. A full description of the role of the (right-hand-side) variables in a statistical model might be too long to use each time: the set of variables or measurements upon which the distribution of $Y$ is conditioned. This is another way of saying the set of variables whose distributions we are currently not interested in, but whose values we treat as constants.
I agree with the other answers here that "independent" and "dependent" is poor terminology. As EdM explains, this terminology arose in the context of controlled experiments where the researcher could set the regressors independently of each other. There are many preferable terms that do not have this loaded causal connotation, and in my experience, statisticians tend to prefer the more neutral ones. Many other term pairs are in use, including predictor/response, explanatory/response, regressor/regressand, covariate/outcome, and feature/target.
Personally, I use the terms explanatory variables and response variable, since those terms have no connotation of statistical independence or control, etc. (One might argue that "response" has a causal connotation, but this is a fairly weak connotation, so I have not found it problematic.)
To add to Frank Harrell's and Peter Flom's answers:
I agree that calling a variable "independent" or "dependent" is often misleading. But some people still do that. I once heard an answer why:
In regression analysis we have one "special" variable (usually denoted by $y$) and many "not-so-special" variables (the $x$'s), and we want to see how changes in the $x$'s affect $y$. In other words, we want to see how $y$ depends on the $x$'s.
That is why $y$ is called "dependent". And if one variable is called "dependent", what would you call the others?
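To tie this back to the question: the $x$'s need not be statistically independent of each other for any of this to work. A small simulated sketch (all values invented) showing that strongly correlated $x$'s still yield a well-defined fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)  # x2 strongly correlated with x1
y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

print(f"corr(x1, x2) = {np.corrcoef(x1, x2)[0, 1]:.2f}")  # ~0.99, far from independent

X = np.column_stack([np.ones(n), x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print("estimated coefficients:", np.round(coef, 2))
# The fit is still well defined; the correlation merely inflates the variance
# of the individual coefficient estimates (the usual multicollinearity concern).
```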
"Dependent" and "independent" can be confusing terms. One sense is pseudo-causal or even causal and this is the one that is meant when saying "independent variable" and "dependent variable". We mean that the DV, in some sense, depends on the IV. So, for example, when modeling the relationship of height and weight in adult humans, we say weight is the DV and height is the IV.
This does capture something that "predictor" does not - namely, the direction of the relationship. Height predicts weight, but weight also predicts height. That is, if you were told to guess the height of people and were told their weights, that would be useful.
But we wouldn't say that height depends on weight.
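A quick sketch of that asymmetry (simulated data; the height-weight relationship below is invented): regressing in either direction gives a usable predictive slope, even though only one direction matches the "depends on" reading.

```python
import numpy as np

rng = np.random.default_rng(2)
height = rng.normal(170, 10, 300)                   # cm
weight = 0.9 * height - 85 + rng.normal(0, 8, 300)  # kg, invented relationship

# Slopes from simple linear fits in both directions: both are nonzero,
# so prediction works either way round.
w_on_h = np.polyfit(height, weight, 1)[0]
h_on_w = np.polyfit(weight, height, 1)[0]
print(f"weight ~ height slope: {w_on_h:.2f} kg/cm")
print(f"height ~ weight slope: {h_on_w:.2f} cm/kg")
```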
Based on the answers above, yes, I agree that "dependent" and "independent" are weak terminology. But I can explain the context in which many of us use it. In a general regression problem we have an output variable, say Y, whose value depends on other input variables, say x1, x2, x3. That is why it is called the "dependent variable". In the same context, and just to differentiate the output from the input variables, x1, x2, and x3 are termed "independent variables": unlike Y, they do not depend on any other variable in the model (though we are not talking here about their dependence on one another).
Independent variables are called independent because they do not depend on other variables. For example, consider the house-price prediction problem. Assume we have data on house_size, location, and house_price. Here, house_price is determined by house_size and location, but location and house_size can vary freely from house to house.
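Here is a toy sketch of that example (the data, column names, and prices are all invented): house_size and location enter as inputs, and house_price is the modeled output.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "house_size": [50, 80, 120, 60, 95, 150],       # square meters
    "location":   ["suburb", "suburb", "city", "city", "city", "suburb"],
    "house_price": [150, 230, 420, 250, 360, 410],  # thousands
})

# One-hot encode the categorical location column, then fit a linear model.
X = pd.get_dummies(df[["house_size", "location"]], drop_first=True)
model = LinearRegression().fit(X, df["house_price"])
print(dict(zip(X.columns, model.coef_.round(1))))
```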