In regression, it is often recommended to center the variables so that the predictors have mean 0. Then the intercept term is interpreted as the expected value of $Y_i$ when the predictor values are set to their means. Otherwise, the intercept is interpreted as the expected value of $Y_i$ when the predictors are set to 0, which may not be a realistic or interpretable situation (e.g. what if the predictors were height and weight?). Another practical reason for scaling in regression is when one variable has a very large scale, e.g. if you were using the population size of a country as a predictor. In that case, the regression coefficients may be on a very small order of magnitude (e.g. $10^{-6}$), which can be a little annoying when you're reading computer output, so you may convert the variable to, for example, population size in millions. The convention of standardizing predictors exists primarily so that the regression coefficients share the same units: each one is the expected change in $Y_i$ per one standard deviation of its predictor.
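A small simulation can make both points concrete. The sketch below uses made-up data (hypothetical "height", "weight" and "population" variables) and a plain least-squares fit via `np.linalg.lstsq`. It checks that centering the predictors leaves the slopes untouched while moving the intercept to the mean of $Y$ (the prediction at the average height and weight), and that expressing population in millions rather than persons multiplies its coefficient by $10^6$.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slope_1, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200

# Hypothetical predictors: nobody has height 0 cm or weight 0 kg,
# so the uncentered intercept describes an impossible person.
height = rng.normal(170, 10, n)
weight = rng.normal(70, 12, n)
y = 5 + 0.3 * height + 0.2 * weight + rng.normal(0, 2, n)

X = np.column_stack([height, weight])
raw = ols(X, y)
centered = ols(X - X.mean(axis=0), y)

# Same slopes; the centered intercept equals the sample mean of y.
print("raw intercept:", raw[0])
print("centered intercept:", centered[0], "mean of y:", y.mean())
print("slopes unchanged:", np.allclose(raw[1:], centered[1:]))

# Hypothetical population sizes: persons vs. millions of persons.
pop = rng.uniform(1e6, 3e8, n)
z = 2 + 4e-8 * pop + rng.normal(0, 1, n)
b_persons = ols(pop, z)[1]
b_millions = ols(pop / 1e6, z)[1]
print("per person:", b_persons, "per million:", b_millions)
```

The intercept identity is exact for least squares: the residuals sum to zero, so when every predictor has mean zero the fitted intercept must equal $\bar{Y}$.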
As @gung alludes to and @MånsT shows explicitly (+1 to both, btw), centering/scaling does not affect your statistical inference in regression models: the estimates are adjusted appropriately and the $p$-values for the predictors' coefficients will be the same.
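One way to check this numerically is to compute the usual OLS $t$ statistics by hand before and after standardizing the predictors. This is a sketch on simulated data (the variables `x1`, `x2` and their scales are invented for illustration). Each slope is multiplied by its predictor's standard deviation, but the slope $t$ statistics, and hence their $p$-values, do not change. The intercept's $t$ statistic generally does change, because after centering it tests a different hypothesis (mean response at the predictor means rather than at zero).

```python
import numpy as np

def ols_t(X, y):
    """OLS with an intercept; returns (coefficients, t statistics)."""
    X = np.column_stack([np.ones(len(y)), X])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard errors
    return beta, beta / se

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(50, 10, n)       # hypothetical predictor on a moderate scale
x2 = rng.normal(1000, 300, n)    # hypothetical predictor on a large scale
y = 1 + 0.05 * x1 + 0.001 * x2 + rng.normal(0, 1, n)
X = np.column_stack([x1, x2])

beta_raw, t_raw = ols_t(X, y)
sd = X.std(axis=0, ddof=1)
Z = (X - X.mean(axis=0)) / sd    # center and scale each column
beta_std, t_std = ols_t(Z, y)

# Slopes are rescaled by each predictor's SD ...
print("raw slopes:", beta_raw[1:], "standardized slopes:", beta_std[1:])
# ... but the slope t statistics are identical.
print("slope t (raw):", t_raw[1:], "slope t (std):", t_std[1:])
```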
Other situations where centering and/or scaling may be useful:
- When you're trying to sum or average variables that are on different scales, perhaps to create a composite score of some kind. Without scaling, one variable may have a larger impact on the sum purely because of its scale, which may be undesirable.
- To simplify calculations and notation. For example, the sample covariance matrix of a data matrix $X$ whose columns have been centered by their sample means is simply $X'X/(n-1)$, i.e. proportional to $X'X$. Similarly, if a univariate random variable $X$ has been mean centered, then ${\rm var}(X) = E(X^2)$, and the variance can be estimated from a sample by looking at the sample mean of the squares of the observed values.
- Related to the above, PCA can only be interpreted as the singular value decomposition of a data matrix when the columns have first been centered by their means.
Note that scaling is not necessary in the last two bullet points I mentioned, and centering may not be necessary in the first one, so the two do not need to go hand in hand at all times.
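To illustrate the composite-score bullet, and why centering is optional there, here is a sketch with two hypothetical test scores on very different scales (the means and SDs are invented). An unscaled sum is essentially just the large-scale score. Dividing each score by its standard deviation fixes that; subtracting the means as well only shifts every composite value by the same constant, so the ordering of individuals is identical either way.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
score_a = rng.normal(500, 100, n)   # hypothetical score on a roughly 200-800 scale
score_b = rng.normal(3.0, 0.5, n)   # hypothetical score on a roughly 0-4 scale

# Unscaled sum: almost perfectly correlated with score_a alone.
raw_sum = score_a + score_b
print("corr(raw sum, score_a):", np.corrcoef(raw_sum, score_a)[0, 1])

def zscore(v):
    """Center and scale to unit sample SD."""
    return (v - v.mean()) / v.std(ddof=1)

# Centered and scaled composite.
composite_z = (zscore(score_a) + zscore(score_b)) / 2

# Scaled but not centered composite.
composite_s = (score_a / score_a.std(ddof=1) + score_b / score_b.std(ddof=1)) / 2

# The two differ only by a constant shift, so the rankings agree.
shift = composite_s - composite_z
print("constant shift:", np.allclose(shift, shift[0]))
print("same ranking:", np.array_equal(np.argsort(composite_z), np.argsort(composite_s)))
```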