You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".
Missing Values
Missing values are a major problem in any analysis of data.
The resulting, partially empty data matrices are hard to interpret, so
missing values should be avoided whenever possible. However, several
methods exist to deal with them.
Voice of an expert:
"Proper (i.e. versatile) missing value handling
is essential to any data analysis package worthy of the name"
Mark Myatt, Brixton Health, UK, newsgroup sci.stat.consult,
Dec 1996
Ways to deal with missing values:
- use only rows (or columns) of data that have no missing values
- fill in missing values with row (or column) averages or with values estimated
  by regression
- for each analysis, use only those data that are available for that
  particular case
- use your knowledge of the data source to impute missing values
- some packages do not offer any methods of imputation, but extend all interactive
  graphic tools to include missing values
- sometimes missing data may have a meaning of its own (e.g. in sociological
  studies, where no answer to a question may also be some kind of an answer)
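The first two options in the list above can be sketched in a few lines of Python. This is only a minimal illustration: the data matrix, the use of None as the marker for a missing value, and the function names complete_rows and impute_column_means are assumptions made for the example, not part of any particular package.

```python
from statistics import mean

# Hypothetical data matrix: rows are cases, columns are variables,
# None marks a missing value.
data = [
    [1.0, 2.0, 3.0],
    [4.0, None, 6.0],
    [7.0, 8.0, None],
    [10.0, 11.0, 12.0],
]

def complete_rows(matrix):
    """Keep only the rows that have no missing values."""
    return [row for row in matrix if None not in row]

def impute_column_means(matrix):
    """Replace each missing value by the mean of the observed values
    in its column. Assumes every column has at least one observed value."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in matrix
    ]

reduced = complete_rows(data)       # two of the four rows survive
filled = impute_column_means(data)  # all four rows kept, gaps filled
```

Deleting incomplete rows discards half of this small example, while mean imputation keeps every row but reduces the spread of the filled columns. Which trade-off is acceptable depends on the data at hand.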
The results of a model or analysis should always be checked both with
and without the cases that contain missing (or imputed) values. If the
results are markedly different, you should try to find an explanation
for this. More information on this topic is available in the book on
missing data by Rubin.
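One possible way to carry out such a check is to compute the same summary statistics once from the observed values only and once from the data after imputation. The sketch below assumes a single variable with None marking missing values and uses mean imputation. The data and the helper function summary are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical single variable; None marks a missing value.
values = [2.0, 4.0, None, 6.0, None, 8.0]

# Variant 1: drop the missing entries
observed = [v for v in values if v is not None]

# Variant 2: fill the missing entries with the mean of the observed ones
fill = mean(observed)
imputed = [fill if v is None else v for v in values]

def summary(xs):
    """Return a few basic statistics of a list of numbers."""
    return {"n": len(xs), "mean": mean(xs), "stdev": stdev(xs)}

without_missing = summary(observed)
with_imputed = summary(imputed)

# Relative change of the standard deviation caused by the imputation
rel_change = (
    with_imputed["stdev"] - without_missing["stdev"]
) / without_missing["stdev"]
```

Here the mean is unchanged by construction, but the standard deviation drops noticeably after imputation. A difference of this kind is exactly what should prompt a closer look before the results are interpreted.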
Be sure to always mark imputed data as such.
Otherwise you may confuse it with real data later on.
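A simple way to mark imputed data is to store a flag for every value alongside the data itself. The following sketch assumes mean imputation and None as the missing-value marker. The function name impute_with_flags is hypothetical.

```python
from statistics import mean

def impute_with_flags(values):
    """Fill missing entries (None) with the mean of the observed ones.

    Returns the filled list together with a parallel list of booleans
    that is True wherever a value was imputed rather than measured.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    is_imputed = [v is None for v in values]
    return filled, is_imputed

filled, is_imputed = impute_with_flags([1.5, None, 2.5, 3.5])

# The flags make it possible to recover the measured values later on
measured = [v for v, flag in zip(filled, is_imputed) if not flag]
```

In this example the imputed value happens to equal one of the measured values, so without the flags the two could no longer be told apart.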
Last Update: 2006-Jan-17