You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".
Scatter, Covariance, and Correlation Matrix
See also: matrix algebra, covariance, correlation
These three types of matrices often form the basis of a multivariate method. The correlation and the covariance matrix are also often used for a first inspection of relationships among the variables of a multivariate data set. Therefore it is crucial to understand the principles behind them and the pitfalls that can arise when a data set does not behave as expected.
Basically, all of these matrices are calculated using the same procedure:
$A^{T}A$, where $A$ is the data matrix.
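The only difference between them is how the data is scaled before the matrix multiplication is executed: the scatter matrix uses the raw data, the covariance matrix uses mean-centered data, and the correlation matrix uses standardized data (mean-centered and divided by the standard deviation of each variable). For the covariance and the correlation matrix the product is additionally divided by n-1, where n is the number of objects.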
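The common procedure can be sketched in a few lines of Python with NumPy. This is only an illustration under the usual conventions, which are assumptions here: objects in rows, variables in columns, no scaling for the scatter matrix, mean-centering for the covariance matrix, standardization for the correlation matrix, and division by n-1 for the latter two. The function names and the small data matrix are hypothetical.

```python
import numpy as np

def scatter_matrix(A):
    """Scatter matrix: A^T A on the raw, unscaled data."""
    A = np.asarray(A, dtype=float)
    return A.T @ A

def covariance_matrix(A):
    """Covariance matrix: A^T A on mean-centered data, divided by n-1."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)
    return centered.T @ centered / (A.shape[0] - 1)

def correlation_matrix(A):
    """Correlation matrix: A^T A on standardized data, divided by n-1."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)
    z = centered / A.std(axis=0, ddof=1)  # sample standard deviation per variable
    return z.T @ z / (A.shape[0] - 1)

# Hypothetical data matrix: 5 objects (rows), 3 variables (columns).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 5.0],
              [4.0, 3.0, 2.0],
              [3.0, 5.0, 4.0],
              [5.0, 4.0, 6.0]])

print(scatter_matrix(A))
print(covariance_matrix(A))
print(correlation_matrix(A))
```

With these conventions the last two results should agree with `np.cov(A, rowvar=False)` and `np.corrcoef(A, rowvar=False)`, which is a convenient cross-check.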
What is the effect of a single outlier on these matrices?
Suppose you have a data matrix which contains one object which is an outlier compared to the rest of the data. This single outlier can completely "corrupt" the matrices (especially the cross correlation matrix) and produce a spurious correlation that may mislead an unprepared operator. You may try this effect yourself by adding a single extreme object to an otherwise uncorrelated data set and recalculating the matrices.
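The outlier effect can be reproduced with a small, made-up data set. The following sketch is not the interactive example of the original page; the data values and the position of the outlier are assumptions chosen so that the two variables are exactly uncorrelated before the outlier is added.

```python
import numpy as np

# Hypothetical data: 20 objects placed on the corners of a square, so the
# two variables are uncorrelated by construction.
x = np.tile([-1.0, 1.0, -1.0, 1.0], 5)
y = np.tile([-1.0, -1.0, 1.0, 1.0], 5)
clean = np.column_stack([x, y])

# The same data plus one object far away from all the others.
with_outlier = np.vstack([clean, [20.0, 20.0]])

def off_diagonals(A):
    """Return the (0, 1) element of the scatter, covariance and correlation matrix."""
    scatter = (A.T @ A)[0, 1]
    covariance = np.cov(A, rowvar=False)[0, 1]
    correlation = np.corrcoef(A, rowvar=False)[0, 1]
    return scatter, covariance, correlation

s_clean, c_clean, r_clean = off_diagonals(clean)
s_out, c_out, r_out = off_diagonals(with_outlier)

print("without outlier:", s_clean, c_clean, r_clean)
print("with outlier:   ", s_out, c_out, r_out)
```

Without the outlier all three off-diagonal elements are zero. After adding the single object at (20, 20) the correlation coefficient rises to about 0.95, although the remaining 20 objects are unchanged.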
Be extremely careful when selecting variables by looking at the cross correlation table. A high correlation value may be due to a single outlier in the data matrix.
Last Update: 2006-Jan-17