Textual content, cited references, similarity order, and clustering: an experimental study in the context of science mapping
2009 (English). In: Proceedings of the 12th International Conference on Scientometrics and Informetrics, 2009. Conference paper (Refereed)
This paper deals with document-document similarity approaches, the issue of similarity order, and clustering methods, in the context of science mapping. Using two data sets of bibliographic records, associated with the fields of information retrieval and scientometrics, we investigate how well two document-document similarity approaches, a text-based approach and bibliographic coupling, agree with ground truth classifications (obtained by subject experts), under first-order and second-order similarities, and under four different clustering methods. The clustering methods are average linkage, complete linkage, Ward’s method and consensus clustering. The performance of first-order and second-order similarities is compared within the two document-document similarity approaches, and under each clustering method. We also compare the performance of the clustering methods. The results show that the text-based approach consistently outperformed bibliographic coupling with regard to the information retrieval data set, but performed consistently worse than the latter approach regarding the scientometrics data set. For the similarity order issue, second-order similarities performed better than first-order in 12 out of 16 cases. Average linkage had the best overall performance among the clustering methods, followed by consensus clustering. The main conclusion of the study is that second-order similarities seem to be a better choice than first-order in the science mapping context.
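The difference between first-order and second-order similarities can be illustrated with a minimal sketch. The abstract does not state the exact definitions used in the paper, so everything below is an assumption made for illustration: the toy data, the function names, the cosine normalization of bibliographic coupling counts, and the choice to compute second-order similarity as the cosine between rows of the first-order matrix (diagonal included).

```python
from math import sqrt

# Toy data (hypothetical): each document is represented by its set of cited references.
cited_refs = {
    "d1": {"r1", "r2", "r3"},
    "d2": {"r2", "r3", "r4"},
    "d3": {"r4", "r5"},
    "d4": {"r6", "r7"},
}


def coupling_similarity(refs_a, refs_b):
    """First-order similarity: number of shared references,
    cosine-normalized (the normalization is an assumption)."""
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / sqrt(len(refs_a) * len(refs_b))


def cosine(u, v):
    """Cosine between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_order(docs):
    """Return document ids and the document-document coupling matrix."""
    ids = sorted(docs)
    matrix = [[coupling_similarity(docs[a], docs[b]) for b in ids] for a in ids]
    return ids, matrix


def second_order(matrix):
    """Second-order similarity (assumed definition): compare two documents
    by how similar their first-order similarity profiles are."""
    n = len(matrix)
    return [[cosine(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


ids, s1 = first_order(cited_refs)
s2 = second_order(s1)

# d1 and d3 share no references, but both are coupled to d2.
print("first-order  d1-d3:", s1[0][2])
print("second-order d1-d3:", round(s2[0][2], 3))
```

In this toy example d1 and d3 have no cited references in common, so their first-order similarity is zero, yet their second-order similarity is positive because both are coupled to d2. Either matrix could then be converted to distances and passed to a hierarchical clustering routine such as average linkage.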
Keywords: Bibliometrics, Citation data, Text mining, Similarity order, Consensus clustering
Identifiers: URN: urn:nbn:se:umu:diva-37583; OAI: oai:DiVA.org:umu-37583; DiVA: diva2:369083