Textual content, cited references, similarity order, and clustering: an experimental study in the context of science mapping
Högskolan i Jönköping, Högskolebiblioteket. ORCID iD: 0000-0002-7653-4004
Department of e-Resources, University Library, Stockholm University.
2009 (English). In: Proceedings of the 12th International Conference on Scientometrics and Informetrics, 2009. Conference paper (Refereed)
Abstract [en]

This paper deals with document-document similarity approaches, the issue of similarity order, and clustering methods, in the context of science mapping. Using two data sets of bibliographic records, associated with the fields of information retrieval and scientometrics, we investigate how well two document-document similarity approaches, a text-based approach and bibliographic coupling, agree with ground truth classifications (obtained by subject experts), under first-order and second-order similarities, and under four different clustering methods. The clustering methods are average linkage, complete linkage, Ward’s method and consensus clustering. The performance of first-order and second-order similarities is compared within the two document-document similarity approaches, and under each clustering method. We also compare the performance of the clustering methods. The results show that the text-based approach consistently outperformed bibliographic coupling with regard to the information retrieval data set, but performed consistently worse than the latter approach regarding the scientometrics data set. For the similarity order issue, second-order similarities performed better than first-order in 12 out of 16 cases. Average linkage had the best overall performance among the clustering methods, followed by consensus clustering. The main conclusion of the study is that second-order similarities seem to be a better choice than first-order in the science mapping context.
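The distinction between first-order and second-order similarities can be illustrated with a minimal sketch. It assumes a common definition that the abstract itself does not spell out: first-order similarity is the cosine between two documents' own reference vectors (bibliographic coupling), and second-order similarity is the cosine between the two documents' rows in the first-order similarity matrix. The toy data and the names doc_refs, cosine and similarity_matrix are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a document with an empty profile is similar to nothing
    return dot / (norm_u * norm_v)

def similarity_matrix(rows):
    """All pairwise cosine similarities between the given row vectors."""
    return [[cosine(r, s) for s in rows] for r in rows]

# Hypothetical toy data: 4 documents x 5 cited references (1 = document cites reference).
doc_refs = [
    [1, 1, 0, 0, 0],  # doc 0
    [0, 1, 1, 0, 0],  # doc 1 shares one reference with doc 0 and one with doc 2
    [0, 0, 1, 1, 0],  # doc 2 shares no reference with doc 0
    [0, 0, 0, 0, 1],  # doc 3 shares no reference with any other document
]

# First-order: compare the documents' reference lists directly.
first_order = similarity_matrix(doc_refs)

# Second-order (assumed definition): compare the documents' first-order
# similarity profiles, i.e. the rows of the first-order matrix.
second_order = similarity_matrix(first_order)

print(first_order[0][2])   # docs 0 and 2 have no shared references
print(second_order[0][2])  # but both resemble doc 1, so this is positive
```

In this toy case documents 0 and 2 have a first-order similarity of zero, yet their second-order similarity is positive because both are similar to document 1. Whether the diagonal of the first-order matrix is kept when comparing rows is a design choice; this sketch keeps it.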

Keyword [en]
Bibliometrics, Citation data, Text mining, Similarity order, Consensus clustering
URN: urn:nbn:se:umu:diva-37583
OAI: diva2:369083
Available from: 2010-11-09. Created: 2010-11-09. Last updated: 2015-04-01. Bibliographically approved.

By author/editor
Colliander, Cristian
