Textual content, cited references, similarity order, and clustering: an experimental study in the context of science mapping
Högskolan i Jönköping, Högskolebiblioteket. ORCID iD: 0000-0002-7653-4004
Department of e-Resources, University Library, Stockholm University.
2009 (English). In: Proceedings of the 12th International Conference on Scientometrics and Informetrics, 2009. Conference paper, Published paper (Refereed)
Abstract [en]

This paper deals with document-document similarity approaches, the issue of similarity order, and clustering methods in the context of science mapping. Using two data sets of bibliographic records, associated with the fields of information retrieval and scientometrics, we investigate how well two document-document similarity approaches, a text-based approach and bibliographic coupling, agree with ground truth classifications (obtained by subject experts) under first-order and second-order similarities and under four different clustering methods. The clustering methods are average linkage, complete linkage, Ward’s method, and consensus clustering. The performance of first-order and second-order similarities is compared within each of the two document-document similarity approaches and under each clustering method. We also compare the performance of the clustering methods. The results show that the text-based approach consistently outperformed bibliographic coupling on the information retrieval data set, but consistently performed worse than bibliographic coupling on the scientometrics data set. Regarding similarity order, second-order similarities performed better than first-order similarities in 12 out of 16 cases. Average linkage had the best overall performance among the clustering methods, followed by consensus clustering. The main conclusion of the study is that second-order similarities seem to be a better choice than first-order similarities in the science mapping context.

Place, publisher, year, edition, pages
2009.
Keyword [en]
Bibliometrics, Citation data, Text mining, Similarity order, Consensus clustering
Identifiers
URN: urn:nbn:se:umu:diva-37583
OAI: oai:DiVA.org:umu-37583
DiVA: diva2:369083
Available from: 2010-11-09 Created: 2010-11-09 Last updated: 2015-04-01 Bibliographically approved

Open Access in DiVA

No full text

Authority records BETA

Colliander, Cristian
