Extensions and applications of item response theory
Wallmark, Joakim. Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics. ORCID iD: 0000-0001-7573-0671
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Vidareutveckling och tillämpningar av item response theory (Swedish) [Further development and applications of item response theory]
Abstract [en]

This doctoral thesis focuses on Item Response Theory (IRT), a statistical method widely used in fields such as education and psychology to analyze response patterns on tests and surveys. In practice, IRT models are estimated from collected test data, allowing researchers both to assess how effectively each item measures the underlying trait the test aims to evaluate, such as subject knowledge or a personality characteristic, and to estimate each individual's level of that trait. Unlike traditional methods that simply sum predetermined item scores, IRT accounts for the difficulty of each item and its ability to measure the intended trait.
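To make the contrast with sum scoring concrete, here is a minimal sketch of the two-parameter logistic (2PL) model, a standard IRT model chosen purely for illustration (the abstract does not single out a specific model); the parameter values are made up:

```python
import numpy as np

def irf_2pl(theta, a, b):
    """Two-parameter logistic (2PL) item response function.

    Probability of a correct response given latent trait theta,
    item discrimination a, and item difficulty b:
        P(X = 1 | theta) = 1 / (1 + exp(-a * (theta - b)))
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Two items a plain sum score would treat identically: one easy and
# highly discriminating, one hard and weakly discriminating.
theta = np.linspace(-3, 3, 7)
print(irf_2pl(theta, a=2.0, b=-1.0))  # easy item: probability rises sharply
print(irf_2pl(theta, a=0.5, b=1.5))   # hard item: probability rises slowly
```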

The thesis consists of four research articles, each addressing different aspects of IRT and its applications. The first article focuses on test equating, the process of ensuring that scores from different versions of a test are comparable. Equating methods with and without IRT are compared using simulations to explore the advantages and disadvantages of incorporating IRT into the kernel equating framework. The second and third articles introduce and compare different types of IRT models. Through simulations and real test data examples, these studies demonstrate that more flexible models can better capture the true relationships between test responses and the underlying traits being measured.

Finally, the IRTorch Python package is presented in the fourth study. IRTorch supports various IRT models and estimation methods and can be used to analyze data from different types of tests and surveys. In summary, the thesis demonstrates how IRT-based equating methods can serve as an alternative to traditional equating methods, how more flexible IRT models can improve the precision of test results, and how user-friendly software can make advanced statistical models accessible to a wider audience.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025, p. 25
Series
Statistical studies, ISSN 1100-8989 ; 60
Keywords [en]
Machine learning, Autoencoders, Item response theory, Psychometrics, Test equating, Statistical software, Educational assessment, Latent variable modelling
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
URN: urn:nbn:se:umu:diva-233351
ISBN: 978-91-8070-572-1 (electronic)
ISBN: 978-91-8070-571-4 (print)
OAI: oai:DiVA.org:umu-233351
DiVA, id: diva2:1923843
Public defence
2025-02-07, HUM.D.220 (Hjortronlandet), Humanisthuset, Umeå University, Umeå, 09:00 (English)
Available from: 2025-01-08 Created: 2024-12-31 Last updated: 2025-01-08. Bibliographically approved
List of papers
1. Efficiency analysis of item response theory kernel equating for mixed-format tests
2023 (English) In: Applied Psychological Measurement, ISSN 0146-6216, E-ISSN 1552-3497, Vol. 47, no. 7-8, p. 496-512. Article in journal (Refereed) Published
Abstract [en]

This study aims to evaluate the performance of Item Response Theory (IRT) kernel equating in the context of mixed-format tests by comparing it to IRT observed score equating and kernel equating with log-linear presmoothing. Comparisons were made through both simulations and real data applications, under both equivalent groups (EG) and non-equivalent groups with anchor test (NEAT) sampling designs. To prevent bias towards IRT methods, data were simulated both with and without the use of IRT models. The results suggest that the difference between IRT kernel equating and IRT observed score equating is minimal, both in terms of the equated scores and their standard errors. The application of IRT models for presmoothing yielded smaller standard errors of equating than the log-linear presmoothing approach. When test data were generated using IRT models, IRT-based methods proved less biased than log-linear kernel equating. However, when data were simulated without IRT models, log-linear kernel equating showed less bias. Overall, IRT kernel equating shows great promise for equating mixed-format tests.
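For readers unfamiliar with the framework, the sketch below illustrates the two core steps of kernel equating on made-up score distributions: Gaussian-kernel continuization of a discrete score distribution, followed by an equipercentile mapping between two test forms. It is a simplified illustration only; it omits the presmoothing step and the mean- and variance-preserving adjustment of the full kernel equating method.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def continuize(scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    Simplified: the full kernel equating framework adds a linear
    adjustment that preserves the mean and variance of the scores.
    """
    def cdf(x):
        return float(np.sum(probs * norm.cdf((x - scores) / h)))
    return cdf

def equate(x, cdf_x, cdf_y, lo=-10.0, hi=60.0):
    """Equipercentile equating: map score x on form X to form Y's scale."""
    p = cdf_x(x)
    return brentq(lambda y: cdf_y(y) - p, lo, hi)

# Toy score distributions on two 40-item forms (made-up probabilities).
scores = np.arange(41)
probs_x = np.exp(-0.5 * ((scores - 22) / 6) ** 2); probs_x /= probs_x.sum()
probs_y = np.exp(-0.5 * ((scores - 25) / 6) ** 2); probs_y /= probs_y.sum()
F = continuize(scores, probs_x, h=0.6)
G = continuize(scores, probs_y, h=0.6)
print(equate(22, F, G))  # about 25: form Y is easier, so scores shift up
```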

Place, publisher, year, edition, pages
Sage Publications, 2023
Keywords
item response theory, kernel equating, log-linear models, presmoothing, simulation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-215929 (URN)
10.1177/01466216231209757 (DOI)
2-s2.0-85174542085 (Scopus ID)
Funder
Marianne and Marcus Wallenberg Foundation, 2019.0129
Available from: 2023-11-02 Created: 2023-11-02 Last updated: 2024-12-31. Bibliographically approved
2. Analyzing polytomous test data: a comparison between an information-based IRT model and the generalized partial credit model
2024 (English) In: Journal of Educational and Behavioral Statistics, ISSN 1076-9986, E-ISSN 1935-1054, Vol. 49, no. 5, p. 753-779. Article in journal (Refereed) Published
Abstract [en]

Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale-invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
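As background on the parametric baseline in this comparison, here is a minimal sketch of the GPC model's category probabilities, with made-up parameter values (the OS model, being nonparametric, has no comparably compact closed form):

```python
import numpy as np

def gpc_probs(theta, a, b):
    """Generalized partial credit model category probabilities.

    P(X = k | theta) is proportional to exp(sum_{v<=k} a * (theta - b_v)),
    with the empty sum for k = 0 taken as 0. `b` holds the step
    (threshold) parameters b_1, ..., b_m of an item with m + 1 categories.
    """
    steps = np.concatenate(([0.0], np.cumsum(a * (theta - np.asarray(b)))))
    expo = np.exp(steps - steps.max())  # subtract max for numerical stability
    return expo / expo.sum()

# A 4-category item (scores 0-3) with increasing step difficulties.
for theta in (-1.0, 0.0, 1.5):
    print(theta, np.round(gpc_probs(theta, a=1.2, b=[-0.5, 0.3, 1.0]), 3))
```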

Place, publisher, year, edition, pages
Sage Publications, 2024
Keywords
item characteristic curves, item response theory, nonparametric IRT, simulation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-216867 (URN)
10.3102/10769986231207879 (DOI)
001098507600001 (ISI)
2-s2.0-85176273777 (Scopus ID)
Funder
Marianne and Marcus Wallenberg Foundation, 2019.0129
Available from: 2023-12-12 Created: 2023-12-12 Last updated: 2024-12-31. Bibliographically approved
3. Introducing flexible monotone multiple choice item response theory models and bit scales
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Item Response Theory (IRT) is a powerful statistical approach for evaluating test items and determining test taker abilities through response analysis. An IRT model that better fits the data leads to more accurate latent trait estimates. In this study, we present a new model for multiple choice data, the monotone multiple choice (MMC) model, which we fit using autoencoders. Using both simulated scenarios and real data from the Swedish Scholastic Aptitude Test, we demonstrate empirically that the MMC model outperforms the traditional nominal response IRT model in terms of fit. Furthermore, we illustrate how the latent trait scale from any fitted IRT model can be transformed into a ratio scale, aiding in score interpretation and making it easier to compare different types of IRT models. We refer to these new scales as bit scales. Bit scales are especially useful for models that make minimal or no assumptions about the latent trait distribution, such as the autoencoder-fitted models in this study.
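To convey the autoencoder idea in this setting, here is a generic sketch for dichotomous items: the encoder compresses a response pattern into a latent trait estimate, and the decoder plays the role of the item response functions. This illustrates only the general approach, not the MMC model itself, which constrains the decoder to be monotone and handles the full multiple choice response format; the data below are random.

```python
import torch
import torch.nn as nn

class IRTAutoencoder(nn.Module):
    """Generic autoencoder for dichotomous IRT (illustration only).

    The encoder maps a response pattern to a latent trait value theta;
    the decoder maps theta back to per-item response logits, acting as
    a set of (unconstrained) item response functions.
    """
    def __init__(self, n_items, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_items, hidden), nn.ELU(), nn.Linear(hidden, 1)
        )
        self.decoder = nn.Sequential(
            nn.Linear(1, hidden), nn.ELU(), nn.Linear(hidden, n_items)
        )

    def forward(self, responses):
        theta = self.encoder(responses)
        return self.decoder(theta)  # logits for each item

# Toy training loop on random 0/1 response data.
torch.manual_seed(0)
data = (torch.rand(500, 20) > 0.5).float()
model = IRTAutoencoder(n_items=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(data), data)  # reconstruct responses via theta
    loss.backward()
    opt.step()
```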

Keywords
Item response theory, Neural networks, autoencoders
National Category
Probability Theory and Statistics
Research subject
Statistics; Data Science
Identifiers
urn:nbn:se:umu:diva-233350 (URN)
10.48550/arXiv.2410.01480 (DOI)
Funder
Wallenberg Foundations, 2022-02046
Available from: 2024-12-31 Created: 2024-12-31 Last updated: 2025-01-02. Bibliographically approved
4. IRTorch: an Item Response Theory Python package
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Item Response Theory (IRT) is a statistical framework used to model the relationship between latent traits (such as abilities or personality traits) and responses to items meant to assess those traits. In this article, we introduce the IRTorch Python package for fitting and evaluating IRT models. The package utilizes PyTorch for parameter optimization and GPU support. It supports a diverse range of unidimensional and multidimensional IRT models, both parametric and semiparametric. IRTorch also emphasizes the arbitrary nature of the latent variable scale, which is implicitly assumed and often ignored in other IRT software. The package provides a flexible framework to implement custom models, scale transformations, and fitting algorithms. We illustrate some of the package's features through several examples, including fitting traditional IRT models, using autoencoders for fitting IRT models, and using the bit scale transformation to give a unit of measurement to the latent trait scale.
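As a flavor of the underlying approach, the sketch below fits a 2PL model by joint maximum likelihood using plain PyTorch autograd, the kind of gradient-based optimization IRTorch builds on. It deliberately avoids IRTorch's own API, which this abstract does not show; consult the package documentation for the actual interface.

```python
import torch

# Simulate 2PL responses: 1000 test takers, 10 items.
torch.manual_seed(0)
n_people, n_items = 1000, 10
true_theta = torch.randn(n_people, 1)
true_a = torch.rand(n_items) + 0.5   # discriminations in (0.5, 1.5)
true_b = torch.randn(n_items)        # difficulties
responses = torch.bernoulli(torch.sigmoid(true_a * (true_theta - true_b)))

# Joint maximum likelihood: abilities and item parameters are all free
# tensors optimized together by gradient descent.
theta = torch.zeros(n_people, 1, requires_grad=True)
log_a = torch.zeros(n_items, requires_grad=True)  # log keeps a positive
b = torch.zeros(n_items, requires_grad=True)
opt = torch.optim.Adam([theta, log_a, b], lr=0.05)
loss_fn = torch.nn.BCEWithLogitsLoss()
for _ in range(300):
    opt.zero_grad()
    logits = torch.exp(log_a) * (theta - b)
    loss = loss_fn(logits, responses)
    loss.backward()
    opt.step()
print(b.detach()[:3], true_b[:3])  # recovered vs. true difficulties
```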

Keywords
IRT, Python, PyTorch, model estimation, autoencoders
National Category
Probability Theory and Statistics
Research subject
Statistics; Education; Psychology
Identifiers
urn:nbn:se:umu:diva-233349 (URN)
Funder
Swedish Research Council, 2022-02046
Available from: 2024-12-31 Created: 2024-12-31 Last updated: 2025-01-02. Bibliographically approved

Open Access in DiVA

fulltext: FULLTEXT01.pdf (699 kB, application/pdf)
spikblad: SPIKBLAD01.pdf (134 kB, application/pdf)
