You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".
MLR and Collinearity
Collinear variables are a major problem with MLR modeling. Two variables
are said to be collinear if they are approximately (or exactly) linearly
dependent, or in other words, if there is a high correlation between the
two variables. If a model is based on highly correlated variables, the
estimated regression coefficients become unstable: small changes in the data
can lead to large changes in the estimates. This renders the coefficients
useless for causal interpretation.
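The instability described above can be illustrated with a small simulation.
The following Python sketch (using NumPy) is not part of the original text; the
sample size, the noise levels, and the "true" coefficients 1, 2, and 3 are
arbitrary assumptions chosen for illustration. It refits the same model on
repeatedly perturbed responses and compares the scatter of the individual
coefficients with the scatter of their sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two nearly identical predictors (assumed setup): x2 is x1 plus tiny noise.
x1 = rng.normal(size=n)
x2 = x1 + 0.001 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2

# Hypothetical noise-free response with coefficients 1 (intercept), 2, 3.
y_true = 1.0 + 2.0 * x1 + 3.0 * x2

# Refit the model on 200 responses that differ only in their noise.
coefs = []
for _ in range(200):
    y = y_true + 0.1 * rng.normal(size=n)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs.append(b)
coefs = np.array(coefs)

# Scatter of the coefficients of x1 and x2 across the refits ...
spread_individual = coefs[:, 1:].std(axis=0)
# ... versus the scatter of their sum.
spread_sum = (coefs[:, 1] + coefs[:, 2]).std()

print("std of b1, b2:", spread_individual)
print("std of b1+b2 :", spread_sum)
```

In a setup like this the individual coefficients of x1 and x2 scatter far more
than their sum, because the data determine only the combined effect of the two
nearly identical variables.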
There are at least three ways to determine collinearity:
- Looking at the cross correlation table. The cross correlation table, however,
  displays only collinearities between pairs of variables. If there is a linear
  relationship among three or more variables, the cross correlation table is
  only of limited use. In addition, the correlation coefficient is heavily
  affected by outliers.
- The variance inflation factor (VIF) measures the increase in the variance of
  an estimated regression coefficient compared to an orthogonal basis. The VIF
  is defined by VIF_k = 1/(1 - R_k^2), where R_k is the correlation coefficient
  between x_k and y_k, and y_k is a linear predictor of x_k based on the
  remaining x-variables.
- The condition index is defined as the square root of the ratio of the
  largest to the smallest eigenvalue of the scatter matrix X^T X. This value
  is large if there is collinearity between the variables.
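The three diagnostics listed above can be sketched in Python with NumPy. This
is a minimal illustration rather than a reference implementation: the function
names and the simulated data (x3 is almost exactly x1 + x2) are assumptions
made here, and the condition index is computed from X as passed in, without
the column scaling that is often applied beforehand.

```python
import numpy as np

def correlation_table(X):
    """Pairwise correlations between the columns of X."""
    return np.corrcoef(X, rowvar=False)

def variance_inflation_factors(X):
    """VIF_k = 1 / (1 - R_k^2) for every column k of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for k in range(p):
        target = X[:, k]
        others = np.delete(X, k, axis=1)
        # Linear predictor of x_k from the remaining variables (with intercept).
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        predicted = A @ coef
        # R_k: correlation between x_k and its linear predictor.
        r = np.corrcoef(target, predicted)[0, 1]
        # Diverges as r**2 approaches 1 (exact collinearity).
        vifs[k] = 1.0 / (1.0 - r**2)
    return vifs

def condition_index(X):
    """sqrt(largest / smallest eigenvalue) of the scatter matrix X^T X."""
    X = np.asarray(X, dtype=float)
    eigenvalues = np.linalg.eigvalsh(X.T @ X)  # symmetric matrix, real eigenvalues
    return np.sqrt(eigenvalues.max() / eigenvalues.min())

# Simulated example (assumed data): x3 is almost a linear combination of x1, x2.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.01 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr = correlation_table(X)
vifs = variance_inflation_factors(X)
cond = condition_index(X)

print("correlation table:\n", corr)
print("VIFs:", vifs)
print("condition index:", cond)
```

In this example no single pairwise correlation is close to 1, yet the VIFs and
the condition index are large. This is the situation in which the cross
correlation table is of limited use.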
Last Update: 2006-Jan-17