Missing data and the preprocessing perceptron
Umeå University, Faculty of Science and Technology, Computing Science.
2004 (English) Report (Other academic)
Abstract [en]

In this paper, several ways to handle missing data, e.g. removing cases, mean imputation, and multiple imputation, are described and discussed. The Pima-Indians-Diabetes data set is used as a case study. This particular data set is interesting to use since it has not been obvious to all users that it actually contains a substantial amount of missing data. The data set is described in detail, and the methods for coping with missing data mentioned in the text are applied to the data set.
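
As a rough illustration of two of the methods named above, removing cases and mean imputation, the following minimal sketch may help. It is not taken from the report: the function names, the use of `None` as the missing-value marker, and the toy rows are all assumptions made for this example, and multiple imputation is not covered.

```python
from statistics import mean

MISSING = None  # assumed marker for a missing value in this sketch


def drop_incomplete(rows):
    """Case removal: keep only the rows that contain no missing value."""
    return [row for row in rows if MISSING not in row]


def mean_impute(rows):
    """Mean imputation: replace each missing value with the mean of the
    observed values in the same column.

    Assumes every column has at least one observed value; otherwise
    statistics.mean raises StatisticsError.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not MISSING) for col in columns]
    return [
        [col_means[j] if v is MISSING else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy rows with two features each (not from the data set).
rows = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 30.0]]

complete = drop_incomplete(rows)  # two of the four rows survive
imputed = mean_impute(rows)       # all four rows kept, gaps filled

print(complete)
print(imputed)
```

In this toy case, case removal discards half of the rows, while mean imputation keeps every row at the cost of pulling the filled-in values towards the column mean.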

The preprocessing perceptron is used to train decision support systems on the data sets. A sketch of a way to impute missing data using the preprocessing perceptron is also proposed and discussed. The accuracy of the trained decision support systems, at the optimal efficiency point, lay in the interval 76-82% for the different methods. The highest values were obtained when all cases with missing data were removed from both the test and the training sets. This is, however, not a good way to handle missing data, since the resulting decision support system is biased. Furthermore, it will not be able to handle missing data when used on real data in the future. The results of the remaining methods were surprisingly similar; one reason for this might be that the data set used is rather large. Differences between methods would probably be larger in a smaller data set with a larger amount of missing data.

Place, publisher, year, edition, pages
Dept. of Computing Science, Umeå University, 2004. 30 p.
UMINF, 04.02
URN: urn:nbn:se:umu:diva-8399
OAI: diva2:148070
Available from: 2008-01-21 Created: 2008-01-21 Bibliographically approved

By author/editor
Kallin Westin, Lena
By organisation
Computing Science
