Missing data and the preprocessing perceptron
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2004 (English). Report (Other academic)
Abstract [en]

In this paper, several ways to handle missing data, e.g. removing cases, mean imputation, and multiple imputation, are described and discussed. The Pima-Indians-Diabetes data set is used as a case study. This data set is particularly interesting because it has not been obvious to all users that it actually contains a substantial amount of missing data. The data set is described in detail, and the methods for coping with missing data mentioned in the text are applied to it.
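The three strategies named above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the report's code: the column names are hypothetical, and it assumes (as is commonly noted for this data set) that a zero in attributes such as plasma glucose, blood pressure, skin-fold thickness, insulin, or BMI is physiologically impossible and therefore encodes a missing value. The "multiple imputation" function is a simplified stand-in that fills each gap with a random observed value from the same column (a hot-deck draw); a full multiple-imputation procedure would draw from a fitted predictive model and pool the results with Rubin's rules.

```python
import random
from statistics import mean

# Hypothetical column names. ASSUMPTION: a zero in these columns is
# physiologically impossible and therefore encodes "missing".
ZERO_MEANS_MISSING = ("glucose", "blood_pressure", "skin_thickness", "insulin", "bmi")


def recode_zeros(rows):
    """Return copies of the rows with impossible zeros replaced by None."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the caller's data is left untouched
        for col in ZERO_MEANS_MISSING:
            if new.get(col) == 0:
                new[col] = None
        out.append(new)
    return out


def remove_cases(rows):
    """Complete-case analysis: drop every row with at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    cols = list(rows[0].keys())
    means = {}
    for c in cols:
        observed = [r[c] for r in rows if r[c] is not None]
        means[c] = mean(observed) if observed else None
    return [{c: (means[c] if r[c] is None else r[c]) for c in cols} for r in rows]


def multiple_impute(rows, m=5, seed=0):
    """Create m completed copies of the data. Each gap is filled with a random
    observed value from the same column (simplified hot-deck draw, not a full
    model-based multiple imputation). Assumes every column has an observed value."""
    rng = random.Random(seed)
    cols = list(rows[0].keys())
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in cols}
    datasets = []
    for _ in range(m):
        datasets.append([
            {c: (rng.choice(observed[c]) if r[c] is None else r[c]) for c in cols}
            for r in rows
        ])
    return datasets
```

Used on a few made-up rows, `remove_cases` keeps only the fully observed row, `mean_impute` fills the missing insulin value with the mean of the observed insulin values, and `multiple_impute` yields `m` completed copies; each copy would then be used to train a separate classifier and the results combined.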

The preprocessing perceptron is used to train decision support systems on the data sets. A sketch of a way to impute missing data using the preprocessing perceptron is also proposed and discussed. The accuracy of the trained decision support systems at the optimal efficiency point lay in the interval 76–82% for the different methods. The highest values were obtained when all cases with missing data were removed from both the test set and the training set. This is, however, not a good way to handle missing data, since the resulting decision support system is biased. Furthermore, it will not be able to handle missing data when it is later used on real data. The results of the remaining methods were surprisingly similar; one reason for this might be that the data set used is rather large. Differences between the methods would probably be larger in a smaller data set with a larger amount of missing data.
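Reporting accuracy "at the optimal efficiency point" means choosing a decision threshold on the classifier's output first and then measuring accuracy there. The sketch below illustrates that procedure for any scoring classifier (for example, the output of a trained preprocessing perceptron). The definition of efficiency used here, the mean of sensitivity and specificity, is an ASSUMPTION for illustration; the report's exact definition may differ.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when predicting class 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_at_best_efficiency(scores, labels):
    """Return (threshold, efficiency, accuracy) at the threshold that maximises
    efficiency. ASSUMPTION: efficiency = (sensitivity + specificity) / 2."""
    best = None
    for t in sorted(set(scores)):  # every observed score is a candidate threshold
        tp, fp, tn, fn = confusion(scores, labels, t)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        eff = (sens + spec) / 2
        acc = (tp + tn) / len(labels)
        if best is None or eff > best[1]:  # keep the first threshold on ties
            best = (t, eff, acc)
    return best
```

With scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, the best threshold is 0.35, giving efficiency 0.75 and accuracy 0.75.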

Place, publisher, year, edition, pages
Dept. of Computing Science, Umeå University, 2004, p. 30
Series
UMINF ; 04.02
National subject category
Computer Systems
Research subject
Administrative Data Processing
Identifiers
URN: urn:nbn:se:umu:diva-8399
OAI: oai:DiVA.org:umu-8399
DiVA id: diva2:148070
Available from: 2008-01-21 Created: 2008-01-21 Last updated: 2018-06-09

Open Access in DiVA

Full text not available in DiVA

Other links

http://www.cs.umu.se/research/reports/2004/002/part1.pdf

Person records

Kallin Westin, Lena
