Analyzing polytomous test data: a comparison between an information-based IRT model and the generalized partial credit model
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE). ORCID iD: 0000-0001-7573-0671
McGill University, Canada.
Ottawa Hospital Research Institute, Canada.
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE). ORCID iD: 0000-0001-5549-8262
2024 (English) In: Journal of Educational and Behavioral Statistics, ISSN 1076-9986, E-ISSN 1935-1054, Vol. 49, no 5, p. 753-779. Article in journal (Refereed). Published
Abstract [en]

Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker’s attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
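
For context, the two key quantities named in the abstract can be summarized as follows; these are standard formulations from the IRT literature, not quoted from the article itself. The generalized partial credit model gives, for item i with score categories k = 0, ..., m_i, discrimination a_i, and step parameters b_{i1}, ..., b_{im_i}, the item response function

P_{ik}(\theta) = \frac{\exp\left(\sum_{v=0}^{k} a_i(\theta - b_{iv})\right)}{\sum_{c=0}^{m_i} \exp\left(\sum_{v=0}^{c} a_i(\theta - b_{iv})\right)}, \qquad \sum_{v=0}^{0} a_i(\theta - b_{iv}) \equiv 0,

whereas the OS model estimates the P_{ik}(\theta) nonparametrically. Surprisal arc length is commonly defined from the surprisal curves S_{ik}(\theta) = -\log_M P_{ik}(\theta), with M the number of score categories, as

s(\theta) = \int_{\theta_0}^{\theta} \sqrt{\sum_{i} \sum_{k} \left( \frac{d S_{ik}(t)}{dt} \right)^{2}} \, dt,

which is unchanged under smooth monotone transformations of \theta and therefore provides the scale invariant ability metric referred to in the abstract.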

Place, publisher, year, edition, pages
Sage Publications, 2024. Vol. 49, no 5, p. 753-779
Keywords [en]
item characteristic curves, item response theory, nonparametric IRT, simulation
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:umu:diva-216867; DOI: 10.3102/10769986231207879; ISI: 001098507600001; Scopus ID: 2-s2.0-85176273777; OAI: oai:DiVA.org:umu-216867; DiVA id: diva2:1818929
Funder
Marianne and Marcus Wallenberg Foundation, 2019.0129
Available from: 2023-12-12; Created: 2023-12-12; Last updated: 2024-12-31; Bibliographically approved
In thesis
1. Extensions and applications of item response theory
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Vidareutveckling och tillämpningar av item response theory
Abstract [en]

This doctoral thesis focuses on Item Response Theory (IRT), a statistical method widely used in fields such as education and psychology to analyze response patterns on tests and surveys. In practice, IRT models are estimated using collected test data, which allows researchers to assess both how effectively each item measures the underlying trait—such as subject knowledge or personality characteristics—that the test aims to evaluate, and to estimate each individual's level of that trait. Unlike traditional methods that simply sum predetermined item scores, IRT accounts for the difficulty of each item and its ability to measure the intended trait.

The thesis consists of four research articles, each addressing different aspects of IRT and its applications. The first article focuses on test equating, ensuring that scores from different versions of a test are comparable. Equating methods with and without IRT are compared using simulations to explore the advantages and disadvantages of incorporating IRT into the kernel equating framework. The second and third articles introduce and compare different types of IRT models. Through simulations and real test data examples, these studies demonstrate that more flexible models can better capture the true relationships between test responses and the underlying traits being measured.
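
As background for the equating terminology above (a standard formulation, not taken from the thesis itself): in kernel equipercentile equating, a score x on test form X is mapped to the form Y score with the same percentile rank,

e_Y(x) = G_{h_Y}^{-1}\left( F_{h_X}(x) \right),

where F_{h_X} and G_{h_Y} are the score distribution functions of the two forms, continuized with Gaussian kernels using bandwidths h_X and h_Y. IRT-based variants of the framework replace the observed-score distributions with distributions implied by a fitted IRT model.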

Finally, the IRTorch Python package is presented in the fourth study. IRTorch supports various IRT models and estimation methods and can be used to analyze data from different types of tests and surveys. In summary, the thesis demonstrates how IRT-based equating methods can serve as an alternative to traditional equating methods, how more flexible IRT models can improve the precision of test results, and how user-friendly software can make advanced statistical models accessible to a wider audience.
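
To make the GPC model referenced above concrete, the following is a minimal, self-contained Python sketch of its category probabilities. It is a generic textbook-style implementation of the formula given earlier, written here for illustration only; it is not code from, and does not reflect the API of, the IRTorch package.

import numpy as np

def gpc_probabilities(theta, a, b):
    """Category probabilities under the generalized partial credit model.

    theta: ability value(s); a: item discrimination; b: step parameters
    b_1, ..., b_m (category 0 has no step). Returns an array of shape
    (len(theta), m + 1) whose rows sum to one.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Cumulative sums of a * (theta - b_v) form the category numerators on the
    # log scale; category 0 corresponds to an empty sum, i.e. exp(0) = 1.
    steps = a * (theta[:, None] - b[None, :])
    cum = np.concatenate([np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)], axis=1)
    cum -= cum.max(axis=1, keepdims=True)  # stabilize before exponentiating
    expc = np.exp(cum)
    return expc / expc.sum(axis=1, keepdims=True)

# Example: a four-category item evaluated at three ability values
probs = gpc_probabilities(theta=[-1.0, 0.0, 1.0], a=1.2, b=[-0.5, 0.3, 1.1])
print(probs.round(3))  # each row sums to 1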

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 25
Series
Statistical studies, ISSN 1100-8989; 60
Keywords
Machine learning, Autoencoders, Item response theory, psychometrics, Test equating, Statistical software, Educational assessment, Latent variable modelling
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-233351 (URN); 978-91-8070-572-1 (ISBN); 978-91-8070-571-4 (ISBN)
Public defence
2025-02-07, HUM.D.220 (Hjortronlandet), Humanisthuset, Umeå University, Umeå, 09:00 (English)
Available from: 2025-01-08; Created: 2024-12-31; Last updated: 2025-01-08; Bibliographically approved

Open Access in DiVA

fulltext (849 kB), 66 downloads
File information
File name: FULLTEXT02.pdf; File size: 849 kB; Checksum: SHA-512
934e71976d175055a36b349e35afa0bf46448c8652b55c714dfe6f35e496a601a26463158899a5fc3c45ca3532884be33be5d4f48f75e13a67cd34c58ff4acd1
Type: fulltext; Mimetype: application/pdf

Other links

Publisher's full text
Scopus

Authority records

Wallmark, Joakim; Wiberg, Marie
