Textual information retrieval: An approach based on language modeling and neural networks
Umeå University, Faculty of Science and Technology, Applied Physics and Electronics.
2004 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis covers topics relevant to information organization and retrieval. The main objective of the work is to provide algorithms that improve the recall and precision of retrieval tasks across a range of applications: document organization and retrieval, web-document pre-fetching, and clustering of documents based on novel encoding techniques.

The first part of the thesis deals with document organization and retrieval using unsupervised neural networks, namely the self-organizing map, together with statistical encoding methods that represent the available documents as numerical vectors. The objective of this part is to introduce a set of novel variants of the self-organizing map algorithm that address certain shortcomings of the original algorithm.

In the second part of the thesis, the latencies perceived by users surfing the Internet are shortened by a novel transparent and speculative pre-fetching algorithm. The proposed algorithm relies on a model of the user's browsing behaviour and predicts the user's future actions. In modeling the user's behaviour, the algorithm relies on the contextual statistics of the web pages visited by the user.

Finally, the last chapter of the thesis provides preliminary theoretical results along with a general framework for current and future scientific work. The chapter describes the use of the Zipf distribution for document organization and the use of the AdaBoost algorithm to improve the performance of pre-fetching algorithms.

Place, publisher, year, edition, pages
Umeå: Tillämpad fysik och elektronik, 2004. 176 p.
Keyword [en]
Informatics, computer and systems science, Language modeling
Keyword [sv]
Informatik, data- och systemvetenskap
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:umu:diva-252
ISBN: 91-7305-623-5 (print)
OAI: oai:DiVA.org:umu-252
DiVA: diva2:142818
Public defence
2004-04-15, Umeå, 13:00
Supervisors
Available from: 2004-04-29. Created: 2004-04-29. Bibliographically approved.

Open Access in DiVA

fulltext (5301 kB), 4458 downloads
File information
File name: FULLTEXT01.pdf
File size: 5301 kB
Checksum (SHA-1): c9ea12dddb74501d84a01606543a3a84dfc7ac68196fe590f038686dd336f5583e822fba
Type: fulltext
Mimetype: application/pdf
