Document-document similarity approaches and science mapping: experimental comparison of five approaches
Department of e-Resources, University Library, Stockholm University.
Högskolan i Jönköping, Högskolebiblioteket.
ORCID iD: 0000-0002-7653-4004
2009 (English). In: Journal of Informetrics, ISSN 1751-1577, Vol. 3, no 1, 49-63 p.
Article in journal (Refereed), Published
Abstract [en]

This paper treats document-document similarity approaches in the context of science mapping. Five approaches, involving nine methods, are compared experimentally. We compare text-based approaches, the citation-based bibliographic coupling approach, and approaches that combine text-based approaches and bibliographic coupling. Forty-three articles, published in the journal Information Retrieval, are used as test documents. We investigate how well the approaches agree with a ground truth subject classification of the test documents, when the complete linkage method is used, and under two types of similarities, first-order and second-order. The results show that it is possible to achieve a very good approximation of the classification by means of automatic grouping of articles. One text-only method and one combination method, under second-order similarities in both cases, give rise to cluster solutions that to a large extent agree with the classification.
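The abstract relies on three technical notions: bibliographic coupling, first-order versus second-order similarity, and complete-linkage clustering. The following is a minimal sketch of how these notions fit together. It is not the paper's implementation, and it makes several assumptions. Each document is represented as a set of cited references. First-order similarity is Salton's cosine on those sets. Second-order similarity is the cosine between two documents' rows of the first-order matrix, which is one common definition. Clustering stops at a fixed number of clusters k. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def first_order(refs):
    """Bibliographic-coupling similarity: Salton's cosine on cited-reference sets."""
    n = len(refs)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(len(refs[i]) * len(refs[j]))
            sim[i][j] = len(refs[i] & refs[j]) / denom if denom else 0.0
    return sim


def second_order(sim):
    """Compare documents by the cosine of their rows in the first-order matrix,
    i.e. by how similar their similarity profiles to all documents are."""
    n = len(sim)
    return [[cosine(sim[i], sim[j]) for j in range(n)] for i in range(n)]


def complete_linkage(sim, k):
    """Agglomerate until k clusters remain; the similarity of two clusters is
    the minimum pairwise similarity between their members (complete linkage)."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = min(sim[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link > best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Four toy documents: 0 and 1 share references, as do 2 and 3.
refs = [{"r1", "r2", "r3"}, {"r1", "r2"}, {"r4", "r5"}, {"r4", "r5", "r6"}]
s1 = first_order(refs)
s2 = second_order(s1)
print(complete_linkage(s1, 2))  # → [[0, 1], [2, 3]]
```

With real data, the two similarity types can yield different cluster solutions. The paper's comparison against the ground-truth classification is made under both first-order and second-order similarities.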

Place, publisher, year, edition, pages
Elsevier BV, 2009. Vol. 3, no 1, 49-63 p.
Keyword [en]
Bibliometrics, Citation data, Text mining, Cluster analysis, Data source combination, Science mapping
National Category
Computer and Information Science
URN: urn:nbn:se:umu:diva-37580
DOI: 10.1016/j.joi.2008.11.003
ISI: 000262496700005
OAI: diva2:369081
Available from: 2010-11-09. Created: 2010-11-09. Last updated: 2015-04-01. Bibliographically approved
In thesis
1. Science mapping and research evaluation: a novel methodology for creating normalized citation indicators and estimating their stability
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The purpose of this thesis is to contribute to the methodology at the intersection of relational and evaluative bibliometrics. It presents experimental investigations into how we can most successfully estimate the subject similarity between documents. The results of these investigations are then explored in the context of citation-based research evaluation. The aim is to enhance existing citation normalization methods, which are used to compare subject-disparate documents with respect to their relative impact or perceived utility. This thesis also suggests and explores an approach for revealing the uncertainty and stability (or lack thereof) associated with different kinds of citation indicators. This suggestion is motivated by the specific nature of the bibliographic data and of the data collection process used in citation-based evaluation studies.
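One way to control for subject matter is to normalize each document against the documents most similar to it, rather than against a fixed field. The following is a sketch of that idea under stated assumptions; the abstract does not spell out the thesis's actual normalization procedure. The sketch assumes the similarity matrix comes from some document-document similarity method. Each document's expected citation rate is taken as the mean citation count of the document itself and its k most similar other documents. The function name `neighbour_normalized_scores` is hypothetical.

```python
def neighbour_normalized_scores(citations, sim, k):
    """Normalize each document's citation count by a baseline built from the
    document itself plus its k most similar other documents (hypothetical scheme)."""
    n = len(citations)
    scores = []
    for i in range(n):
        # Rank the other documents by similarity to document i (most similar first).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        baseline = [i] + others[:k]
        expected = sum(citations[j] for j in baseline) / len(baseline)
        # A score above 1.0 means more citations than the subject-matched baseline.
        scores.append(citations[i] / expected if expected else float("nan"))
    return scores


# Two pairs of subject-similar documents with very different citation levels.
citations = [10, 2, 40, 20]
sim = [[1.0, 0.8, 0.0, 0.0],
       [0.8, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.8],
       [0.0, 0.0, 0.8, 1.0]]
print(neighbour_normalized_scores(citations, sim, k=1))
```

In this toy case, document 2 has four times as many citations as document 0 but receives a lower score. The reason is that its subject-matched baseline is more highly cited. The parameter k sets the baseline size. Because the neighbourhoods of different documents overlap, one document can contribute to several baselines.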

The results of these investigations suggest that similarity-detection methods that take a global view of the problem of identifying similar documents are more successful than conventional methods that are more local in scope. These results are important for all applications that require estimates of subject similarity between documents. Here, these insights are used to create a novel citation normalization approach. Compared to current best practice, this approach is more in tune with the idea of controlling for subject matter when thematically different documents are assessed with respect to impact or perceived utility. The normalization approach is flexible with respect to the size of the normalization baseline and enables a fuzzy partition of the scientific literature. It is shown that this approach is more successful than currently applied normalization approaches at reducing the variability in the observed citation distribution that stems from variability in the subject matter the articles address. In addition, the suggested approach can enhance the interpretability of normalized citation counts. Finally, the proposed method for assessing the stability of citation indicators shows that small alterations, which could be artifacts of the data collection and preparation steps, can significantly influence the picture painted by a citation indicator. Therefore, providing stability intervals around derived indicators prevents unfounded conclusions that could otherwise have unwanted policy implications.
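A percentile bootstrap is one common way to attach a stability interval to an indicator. This abstract does not state whether the thesis uses this exact procedure, so the following should be read as an assumption-laden sketch. It resamples a unit's normalized citation scores with replacement and reports the mean together with a percentile interval. The name `bootstrap_interval`, its default settings, and the fixed seed are illustrative choices.

```python
import random
import statistics


def bootstrap_interval(scores, n_resamples=1000, level=0.95, seed=0):
    """Return the mean of `scores` and a percentile-bootstrap interval for it."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    # Resample the scores with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lo = means[round(alpha * (n_resamples - 1))]
    hi = means[round((1 - alpha) * (n_resamples - 1))]
    return statistics.fmean(scores), (lo, hi)


# Hypothetical normalized citation scores for one unit's publications.
scores = [0.5, 1.0, 1.5, 2.0, 3.0]
mean, (lo, hi) = bootstrap_interval(scores)
print(f"mean = {mean:.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
```

A wide interval signals that the indicator is sensitive to which publications happen to be included. That is the kind of fragility the stability analysis is meant to expose.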

Together, the new normalization approach and the method for assessing the stability of citation indicators have the potential to enable fairer bibliometric evaluative exercises and more cautious interpretations of citation indicators.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2014. 37 p.
Akademiska avhandlingar vid Sociologiska institutionen, Umeå universitet, ISSN 1104-2508; 76
Keyword [en]
document-document similarity, science mapping, citation analysis, citation normalization, stability analysis, citation impact, research evaluation
National Category
Social Sciences Interdisciplinary
Information Studies
Research subject
Library and Information Science
urn:nbn:se:umu:diva-94189 (URN)
978-91-7601-134-8 (ISBN)
Public defence
2014-10-31, Hörsal 1031, Norra Beteendevetarhuset, Umeå universitet, Umeå, 13:15 (English)
Available from: 2014-10-10. Created: 2014-10-06. Last updated: 2015-04-01. Bibliographically approved

Author/editor: Colliander, Cristian