umu.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Efficient fuzzy type-ahead search on big data using a ranked trie data structure
Umeå University, Faculty of Science and Technology, Department of Physics.
2018 (English)Independent thesis Advanced level (professional degree), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

The efficiency of modern search engines depends on how well they present typo-corrected results to a user while typing. So-called fuzzy type-ahead search combines fuzzy string matching and search-as-you-type functionality, and creates a powerful tool for exploring indexed data. Current fuzzy type-ahead search algorithms work well on small data sets, but for big data of social networking services such as Facebook, e-commerce sites such as Amazon, or media streaming services such as YouTube, responsive fuzzy type-ahead search remains a great challenge.

This thesis describes a method that enables responsive type-ahead search combined with fuzzy string matching on big data by keeping the search time optimal for human interaction at the expense of lower accuracy for less popular records when a query contains typos. This makes the method effective for e-commerce and media services where the popularity of search terms is a result of human behaviour and thus often follow a power-law distribution.

Abstract [sv]

Effektiviteten hos moderna sökmotorer beror på hur väl de presenterar rättstavade resultat för en användare medan en sökning skrivs. Så kallad fuzzy type-ahead sök kombinerar approximativ strängmatchning och sök-medan-du-skriver funktionalitet, vilket skapar ett kraftfullt verktyg för att utforska data. Dagens algoritmer för fuzzy type-ahead sök fungerar väl för små mängder data, men för data i storleksordningen “big data” från t.ex sociala nätverkstjänster så som Facebook, e-handelssidor så som Amazon, eller media tjänster så som YouTube, är en responsiv fuzzy type-ahead sök ännu en stor utmaning.

Denna avhandling beskriver en metod som möjliggör responsiv type-ahead sök kombinerat med approximativ strängmatchning för big data genom att hålla söktiden optimal för mänsklig interaktion på bekostnad av lägre precision för mindre populär information när en sök-förfrågan innehåller felstavningar. Detta gör metoden effektiv för e-handel och mediatjänster där populariteten av sök-termer är ett resultat av mänskligt beteende vilket ofta följer en potens-lag distribution.

Place, publisher, year, edition, pages
2018. , p. 42
Keywords [en]
Approximate string matching, Fuzzy search, Type-ahead search, String similarity
National Category
Computer Sciences Information Systems
Identifiers
URN: urn:nbn:se:umu:diva-145029OAI: oai:DiVA.org:umu-145029DiVA, id: diva2:1183576
External cooperation
Infobaleen
Subject / course
Examensarbete i teknisk fysik
Educational program
Master of Science Programme in Engineering Physics
Supervisors
Examiners
Available from: 2018-02-20 Created: 2018-02-18 Last updated: 2018-02-20Bibliographically approved

Open Access in DiVA

efficient_fuzzy_typeahead_search_on_bigdata(1583 kB)141 downloads
File information
File name FULLTEXT01.pdfFile size 1583 kBChecksum SHA-512
977c0e1a8a879acc6cd257d76e67b9f6b82dda85c2df02b521b30f7cc61016f5f2bf642325eb2d0fba896b0a87c346dcb682b0cf3c42b17bf50dd3e65e51681a
Type fulltextMimetype application/pdf

By organisation
Department of Physics
Computer SciencesInformation Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 141 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 501 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf