Study on Record Linkage regarding Accuracy and Scalability
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2018 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

Record linkage aims to find records that refer to the same entity across different data sources. It is also known as data matching, entity resolution, entity disambiguation, or deduplication. Record linkage is useful in many practices, including data cleaning, data management, and business intelligence. Machine learning methods, both unsupervised and supervised, have been applied to the record linkage problem. The rise of the big data era has presented new challenges: the trade-off between accuracy and scalability raises several critical issues for the linkage process. The objective of this study is to present an overview of state-of-the-art machine learning algorithms for record linkage, to compare them, and to explore how these algorithms can be optimized using different similarity functions. The optimization is evaluated in terms of accuracy and scalability. The results show that supervised classification algorithms, even with a relatively small training set, classified data sets in less time than their unsupervised counterparts while achieving approximately the same accuracy.
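To make the idea of similarity-based linkage concrete, the following is a minimal sketch and not the method evaluated in the thesis. It assumes records are dictionaries of strings, uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity function, and classifies a pair as a match with a fixed threshold. The function names, the toy records, and the threshold value are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def field_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a, rec_b, fields):
    """Unweighted mean of the per-field similarities (an illustrative choice)."""
    return sum(field_similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)


def link_records(source_a, source_b, fields, threshold=0.8):
    """Return (i, j, score) for every cross-source pair scoring at or above threshold.

    Every record in source_a is compared with every record in source_b,
    so the cost grows with len(source_a) * len(source_b).
    """
    matches = []
    for (i, a), (j, b) in product(enumerate(source_a), enumerate(source_b)):
        score = record_similarity(a, b, fields)
        if score >= threshold:
            matches.append((i, j, score))
    return matches


# Hypothetical toy data: two sources describing overlapping people.
customers = [
    {"name": "Anna Svensson", "city": "Umeå"},
    {"name": "Erik Johansson", "city": "Stockholm"},
]
orders = [
    {"name": "anna svenson", "city": "Umea"},
    {"name": "Lars Nilsson", "city": "Göteborg"},
]

links = link_records(customers, orders, ["name", "city"])
for i, j, score in links:
    print(customers[i]["name"], "<->", orders[j]["name"], round(score, 3))
```

The exhaustive pairwise comparison in `link_records` illustrates why scalability becomes a concern as the sources grow, and the fixed threshold could be replaced by a trained classifier over the per-field similarities in a supervised setting.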

Place, publisher, year, edition, pages
2018, p. 28
Series
UMNAD ; 1168
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:umu:diva-155357
OAI: oai:DiVA.org:umu-155357
DiVA, id: diva2:1278358
Educational program
Bachelor of Science Programme in Computing Science
Supervisors
Examiners
Available from: 2019-01-14 Created: 2019-01-14 Last updated: 2019-01-14 Bibliographically approved

Open Access in DiVA

fulltext (328 kB), 61 downloads
File information
File name: FULLTEXT01.pdf
File size: 328 kB
Checksum (SHA-512): d30bbb5cc9267495c8d0751d848daa3df2642f2ea6e6534e5901ec6ad0cb8970c654857ee3bd270704927fd9e83866ee63e8389e2b229d367874d59b1833d871
Type: fulltext
Mimetype: application/pdf
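A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The sketch below uses only Python's standard `hashlib`; the function names and the assumption that the file was saved locally under some path are illustrative and not part of the record.

```python
import hashlib

# Checksum copied from the file information in this record.
EXPECTED_SHA512 = (
    "d30bbb5cc9267495c8d0751d848daa3df2642f2ea6e6534e5901ec6ad0cb8970"
    "c654857ee3bd270704927fd9e83866ee63e8389e2b229d367874d59b1833d871"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True if the file at `path` has the checksum listed in this record."""
    return sha512_of_file(path) == EXPECTED_SHA512
```

For example, `matches_record("FULLTEXT01.pdf")` would return `True` only if the local file is byte-for-byte identical to the one the checksum was computed from.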
