Sometimes a large number of independent variables, Xi, is available for a given modeling problem, and not all of these predictor variables contribute equally well to the explanation of the predicted variable Y; some may not contribute at all. We therefore have to select among these variables to obtain a model which contains as few variables as possible while still being the "best" model. In principle, all possible combinations of independent variables should be tried when calculating a suitable model. Since p candidate variables give rise to 2^p - 1 non-empty subsets, this turns out to be a formidable task even if high-performance computers are available.
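To make the size of this search concrete, the following sketch (a hypothetical Python illustration, not part of the original text) enumerates every non-empty subset of candidate variables, fits an ordinary least squares model to each, and keeps the subset with the lowest BIC. Using BIC rather than raw r² as the comparison criterion is our assumption, made because r² alone always favours the largest subset, as discussed in the list below.

    # A minimal sketch of exhaustive (best-subset) selection for a linear
    # regression model; all names, data, and the use of BIC are illustrative
    # assumptions, not taken from the text. With p candidate variables there
    # are 2**p - 1 non-empty subsets to fit.
    from itertools import combinations
    import numpy as np

    def best_subset(X, y):
        """Fit every column subset of X and return the one with lowest BIC."""
        n, p = X.shape
        best_bic, best_cols = np.inf, None
        for k in range(1, p + 1):
            for cols in combinations(range(p), k):
                A = np.column_stack([np.ones(n), X[:, cols]])  # intercept + subset
                coef, *_ = np.linalg.lstsq(A, y, rcond=None)
                rss = np.sum((y - A @ coef) ** 2)              # residual sum of squares
                bic = n * np.log(rss / n) + (k + 1) * np.log(n)
                if bic < best_bic:
                    best_bic, best_cols = bic, cols
        return best_cols

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 6))             # 6 candidates -> 63 subsets to try
    y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.1, size=30)
    print(best_subset(X, y))                 # typically recovers (0, 3)

Note that each additional candidate variable doubles the number of subsets to be fitted, which is why this exhaustive approach breaks down quickly as p grows.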
Apart from the practical feasibility of this approach, there are also several theoretical considerations which should be taken into account:
- the contribution of a single variable to the explanation of Y cannot easily be assessed if only a small number of observations is available
- a simple criterion, such as the goodness of fit, r², may lead to wrong conclusions if the number of selected variables approaches the number of observations (a small demonstration follows this list)
- for more complicated models (e.g. artificial neural networks) the calculation of a single model may be so time-consuming that it is practically impossible to find the "best" combination of independent variables
- the selection of combinations is guided by the available data; thus the resulting final selection reflects the "best" model for the given data set, and not necessarily the "best" subset for the population
- some of the selection methods are specifically tailored to linear (regression) models and cannot be used with non-linear methods such as neural networks
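The second point above can be made concrete with a small synthetic demonstration (the data and code are illustrative assumptions, not from the text): even when both the response and the predictors are pure random noise, r² climbs towards 1 as the number of fitted variables approaches the number of observations.

    # Demonstration (with synthetic data) of why r2 alone is a misleading
    # selection criterion: fitting pure noise to pure noise, r2 still
    # approaches 1 as the number of predictors approaches the number of
    # observations.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 20
    y = rng.normal(size=n)                   # response is pure noise
    X = rng.normal(size=(n, n - 1))          # predictors are pure noise, too

    for k in (1, 5, 10, 15, 19):
        A = np.column_stack([np.ones(n), X[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
        print(f"k = {k:2d} predictors: r2 = {r2:.3f}")

With k = 19 predictors plus an intercept, the fit to the n = 20 observations is exact and r² reaches 1, even though no predictor carries any information about y.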
Depending on the type of model being used, there are several strategies
to (partially) solve the problem: