Missing data and the preprocessing perceptron
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2004 (English) Report (Other academic)
Abstract [en]

In this paper, several ways to handle missing data, e.g. removing cases, mean imputation, and multiple imputation, are described and discussed. The Pima-Indians-Diabetes data set is used as a case study. This particular data set is interesting because it has not been obvious to all users that it actually contains a substantial amount of missing data. The data set is described in detail, and the methods for coping with missing data mentioned in the text are applied to it.
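
As a rough illustration of two of the methods named above, the following minimal sketch contrasts case removal with mean imputation. It is not taken from the report: the toy rows, the column meanings (glucose, BMI), the use of None as the missing-value marker, and the helper names remove_cases and mean_impute are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy rows (glucose, bmi); None marks a missing value.
# These numbers are illustrative only, not taken from the Pima data set.
rows = [
    (148.0, 33.6),
    (85.0, None),
    (None, 23.3),
    (89.0, 28.1),
]


def remove_cases(rows):
    """Case removal: keep only the rows that have no missing values."""
    return [r for r in rows if None not in r]


def mean_impute(rows):
    """Mean imputation: replace each missing value with the mean of the
    observed values in the same column."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not None) for col in columns]
    return [
        tuple(m if v is None else v for v, m in zip(r, col_means))
        for r in rows
    ]


complete = remove_cases(rows)   # fewer rows, all fully observed
imputed = mean_impute(rows)     # same number of rows, gaps filled
```

Case removal shrinks the data set, while mean imputation keeps every case but replaces the gaps with a single column-wide value; multiple imputation, also mentioned above, is not shown here.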

The preprocessing perceptron is used to train decision support systems on the data sets. A sketch of a way to impute missing data using the preprocessing perceptron is also proposed and discussed. The accuracy of the trained decision support systems, at the optimal efficiency point, lay in the interval 76-82% for the different methods. The highest values were obtained when all cases with missing data were removed from both the test and the training set. This is, however, not a good way to handle missing data, since the resulting decision support system is biased. Furthermore, it will not be able to handle missing data when used on real data in the future. The results of the remaining methods were surprisingly similar; a reason for this might be that the data set used is rather large. Differences between methods would probably be larger in a smaller data set with a larger amount of missing data.

Place, publisher, year, edition, pages
Dept. of Computing Science, Umeå University, 2004. 30 p.
Series
UMINF, 04.02
National Category
Computer Systems
Research subject
Computing Science
Identifiers
URN: urn:nbn:se:umu:diva-8399
OAI: oai:DiVA.org:umu-8399
DiVA: diva2:148070
Available from: 2008-01-21 Created: 2008-01-21 Last updated: 2017-09-25

Other links

http://www.cs.umu.se/research/reports/2004/002/part1.pdf

Author
Kallin Westin, Lena