Publications (10 of 147)
Ramsay, J. O., Li, J., Wallmark, J. & Wiberg, M. (2025). An information manifold perspective for analyzing test data. Applied psychological measurement, 49(3), 90-108
2025 (English) In: Applied psychological measurement, ISSN 0146-6216, E-ISSN 1552-3497, Vol. 49, no 3, p. 90-108. Article in journal (Refereed), Published
Abstract [en]

Modifications of current psychometric models for analyzing test data are proposed that produce an additive scale measure of information. This information measure is a one-dimensional space curve or curved surface manifold that is invariant across varying manifold indexing systems. The arc length along a curve manifold is used as it is an additive metric having a defined zero and a version of the bit as a unit. This property, referred to here as the scope of the test or an item, facilitates the evaluation of graphs and numerical summaries. The measurement power of the test is defined by the length of the manifold, and the performance or experiential level of a person by a position along the curve. In this study, we also use all information from the items including the information from the distractors. Test data from a large-scale college admissions test are used to illustrate the test information manifold perspective and to compare it with the well-known item response theory nominal model. It is illustrated that the use of information theory opens a vista of new ways of assessing item performance and inter-item dependency, as well as test takers' knowledge.
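The arc-length idea lends itself to a compact numerical illustration. Below is a minimal R sketch assuming hypothetical 2PL-type binary items (the paper itself builds surprisal curves from spline-smoothed item functions in TestGardener): the surprisal vector has components -log_M P_m(theta), and the "scope" of the test is the total arc length of that curve, approximated here by finite differences.

```r
## Minimal numeric sketch of surprisal arc length (hypothetical item set,
## not the authors' TestGardener implementation).
## Surprisal for category m of an item: W_m(theta) = -log_M P_m(theta),
## where M is the number of response categories (base-M "M-bits").

# Hypothetical binary items with 2PL-type response curves (a = slope, b = location).
a <- c(1.2, 0.8, 1.5); b <- c(-0.5, 0.0, 0.7)
M <- 2  # two categories per item: correct / incorrect

p_correct <- function(theta) sapply(seq_along(a),
                                    function(i) plogis(a[i] * (theta - b[i])))

# Surprisal vector over all item categories at a given theta.
surprisal <- function(theta) {
  p <- p_correct(theta)
  -log(c(p, 1 - p), base = M)
}

# Arc length s(theta): integral of ||dW/dtheta|| over theta,
# approximated on a fine grid with finite differences.
theta_grid <- seq(-3, 3, length.out = 601)
W  <- t(sapply(theta_grid, surprisal))   # grid points x item categories
dW <- diff(W) / diff(theta_grid)         # derivative by finite differences
speed <- sqrt(rowSums(dW^2))             # ||dW/dtheta|| between grid points
arc_length <- c(0, cumsum(speed * diff(theta_grid)))

# Total "scope" of this mini test: manifold length in bits (M = 2).
max(arc_length)
```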

Place, publisher, year, edition, pages
Sage Publications, 2025
Keywords
entropy, expected sum score, nominal model, scope, score index, spline functions, surprisal, test information, TestGardener
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-233717 (URN), 10.1177/01466216241310600 (DOI), 001380542200001 (), 39713764 (PubMedID), 2-s2.0-105001648191 (Scopus ID)
Funder
Wallenberg Foundations, MMW 2019.012
Available from: 2025-01-09 Created: 2025-01-09 Last updated: 2025-04-29. Bibliographically approved
Wiberg, M. & Laukaityte, I. (2025). Calculating bias in test score equating in a NEAT design. Applied psychological measurement
2025 (English) In: Applied psychological measurement, ISSN 0146-6216, E-ISSN 1552-3497. Article in journal (Refereed), Epub ahead of print
Abstract [en]

Test score equating is used to make scores from different test forms comparable, even when groups differ in ability. In practice, the non-equivalent groups with anchor test (NEAT) design is commonly used. The overall aim was to compare the amount of bias under different conditions when using either chained equating or frequency estimation with five different criterion functions: the identity function, linear equating, equipercentile equating, chained equating, and frequency estimation. We used real test data from a multiple-choice, binary-scored college admissions test to illustrate that the choice of criterion function matters. Further, we simulated data in line with the empirical data to examine differences in ability between groups, in item difficulty, in anchor test form and regular test form length, in correlations between the anchor test form and the regular test forms, and in sample size. The results indicate that how bias is defined heavily affects the conclusions we draw about which equating method is to be preferred in different scenarios. Practical implications for standardized tests are given, together with recommendations on how to calculate bias when evaluating equating transformations.
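As a rough illustration of the bias computation discussed here, the following R sketch compares the mean of replicated equated values against a chosen criterion function; the function names and toy data are ours, not the study's simulation design.

```r
## Hedged sketch: bias of an estimated equating transformation relative to a
## chosen criterion function (illustrative only).

# phi_hat: matrix of equated values over simulation replicates
# (rows = replicates, columns = score points x = 0..K);
# phi_star: criterion equating function evaluated at the same score points.
bias_per_score <- function(phi_hat, phi_star) colMeans(phi_hat) - phi_star

# Weighted summary bias, weighting score points by a score distribution r.
weighted_bias <- function(phi_hat, phi_star, r) {
  sum(r * bias_per_score(phi_hat, phi_star))
}

# Toy illustration: identity vs. a linear criterion for scores 0..10.
x <- 0:10
phi_hat <- matrix(x + rnorm(50 * 11, sd = 0.3), nrow = 50, byrow = TRUE)
r <- dbinom(x, size = 10, prob = 0.6)       # toy score distribution
weighted_bias(phi_hat, x, r)                # identity criterion: phi*(x) = x
weighted_bias(phi_hat, 1.05 * x - 0.2, r)   # linear criterion
```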

Place, publisher, year, edition, pages
Sage Publications, 2025
Keywords
criterion function, frequency estimation, chained equating
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-236964 (URN), 10.1177/01466216251330305 (DOI), 001450757600001 (), 40162326 (PubMedID), 2-s2.0-105001869754 (Scopus ID)
Funder
Marianne and Marcus Wallenberg Foundation, 2019.0129
Available from: 2025-03-26 Created: 2025-03-26 Last updated: 2025-04-28
Wiberg, M., González, J. & von Davier, A. A. (2025). Generalized kernel equating with applications in R (1 ed.). Boca Raton: CRC Press
2025 (English) Book (Refereed)
Abstract [en]

Generalized Kernel Equating is a comprehensive guide for statisticians, psychometricians, and educational researchers aiming to master test score equating. This book introduces the Generalized Kernel Equating (GKE) framework, providing the necessary tools and methodologies for accurate and fair score comparisons.

The book presents test score equating as a statistical problem and covers all commonly used data collection designs. It details the five steps of the GKE framework: presmoothing, estimating score probabilities, continuization, equating transformation, and evaluating the equating transformation. Various presmoothing strategies are explored, including log-linear models, item response theory models, beta4 models, and discrete kernel estimators. The estimation of score probabilities when using IRT models is described, and Gaussian kernel continuization is extended to other kernels such as uniform, logistic, Epanechnikov, and adaptive kernels. Several bandwidth selection methods are described. The kernel equating transformation and variants of it are defined, and both equating-specific and statistical measures for evaluating equating transformations are included. Real data examples are provided, guiding readers through the GKE steps with detailed R code and explanations. Readers are equipped with advanced knowledge and practical skills for implementing test score equating methods.
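The continuization step admits a compact sketch. The R code below implements the standard Gaussian kernel continuization of a presmoothed discrete score distribution, using the usual kernel-equating shrinkage constant that preserves the mean and variance; the variable names are ours and the code is not taken from the book.

```r
## Sketch of the Gaussian kernel continuization step of kernel equating
## (standard KE formulas; not code from the book).

# Discrete score points with presmoothed probabilities r.
kernel_cdf <- function(x, scores, r, h) {
  mu     <- sum(scores * r)
  sigma2 <- sum((scores - mu)^2 * r)
  a      <- sqrt(sigma2 / (sigma2 + h^2))  # preserves mean and variance
  sapply(x, function(xx)
    sum(r * pnorm((xx - a * scores - (1 - a) * mu) / (a * h))))
}

# Toy example: 0..20 score scale, binomial-shaped presmoothed distribution.
scores <- 0:20
r <- dbinom(scores, size = 20, prob = 0.55)
kernel_cdf(c(5.5, 10.5, 15.5), scores, r, h = 0.6)

# The equating transformation then maps x to the y-scale via
# phi(x) = G^{-1}(F(x)), with G the continuized CDF on the other form;
# the inverse can be found numerically, e.g. with uniroot().
```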

Place, publisher, year, edition, pages
Boca Raton: CRC Press, 2025. p. 235 Edition: 1
Series
Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-231970 (URN), 10.1201/9781315283777 (DOI), 2-s2.0-85208298899 (Scopus ID), 9781138196988 (ISBN), 9781032904955 (ISBN), 9781315283777 (ISBN)
Available from: 2024-11-19 Created: 2024-11-19 Last updated: 2024-11-22. Bibliographically approved
González, J. & Wiberg, M. (2024). A family of discrete kernels for presmoothing test score distributions. In: Heungsun Hwang; Hao Wu; Tracy Sweet (Ed.), Quantitative psychology: the 88th annual meeting of the psychometric society, Maryland, USA, 2023. Paper presented at 88th Annual Meeting of the Psychometric Society, IMPS 2023, Maryland, USA, July 25-28, 2023 (pp. 1-12). Cham: Springer
2024 (English) In: Quantitative psychology: the 88th annual meeting of the psychometric society, Maryland, USA, 2023 / [ed] Heungsun Hwang; Hao Wu; Tracy Sweet, Cham: Springer, 2024, p. 1-12. Conference paper, Published paper (Refereed)
Abstract [en]

In the fields of educational measurement and testing, score distributions are often estimated by the sample relative frequency distribution. As many score distributions are discrete and may have irregularities, it has been common practice to use presmoothing techniques to correct for such irregularities of the score distributions. A common way to conduct presmoothing has been to use log-linear models. In this chapter, we introduce a novel class of discrete kernels that can effectively estimate the probability mass function of scores, providing a presmoothing solution. The chapter includes an empirical illustration demonstrating that the proposed discrete kernel estimates perform as well as or better than the existing methods like log-linear models in presmoothing score distributions. The practical implications of this finding are discussed, highlighting the potential benefits of using discrete kernels in educational measurement contexts. Additionally, the chapter identifies several areas for further research, indicating opportunities for advancing the field’s methodology and practices.
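To make the idea of presmoothing a discrete score distribution concrete, here is a hedged R sketch using a simple symmetric discrete (triangular) kernel. This is a generic choice for illustration only, not the family of discrete kernels introduced in the chapter.

```r
## Hedged illustration: presmoothing a discrete score PMF with a simple
## symmetric discrete kernel (generic; not the chapter's kernel family).

presmooth_pmf <- function(counts, scores, h = 2) {
  p_obs <- counts / sum(counts)
  smoothed <- sapply(scores, function(x) {
    w <- pmax(1 - abs(scores - x) / (h + 1), 0)  # discrete triangular weights
    sum(w * p_obs) / sum(w)                      # locally renormalized average
  })
  smoothed / sum(smoothed)                       # renormalize to a proper PMF
}

# Toy data: irregular observed frequencies on a 0..10 score scale.
scores <- 0:10
counts <- c(2, 5, 0, 14, 9, 22, 7, 18, 3, 6, 1)
round(presmooth_pmf(counts, scores), 3)
```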

Place, publisher, year, edition, pages
Cham: Springer, 2024
Series
Springer Proceedings in Mathematics & Statistics, ISSN 2194-1009, E-ISSN 2194-1017 ; 452
Keywords
Discrete kernels, Irregularities, Presmoothing, Score distributions
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-228139 (URN), 10.1007/978-3-031-55548-0_1 (DOI), 001290462200001 (), 2-s2.0-85199519263 (Scopus ID), 9783031555473 (ISBN), 9783031555503 (ISBN), 9783031555480 (ISBN)
Conference
88th Annual Meeting of the Psychometric Society, IMPS 2023, Maryland, USA, July 25-28, 2023
Available from: 2024-08-05 Created: 2024-08-05 Last updated: 2025-04-24. Bibliographically approved
Wallmark, J., Ramsay, J. O., Li, J. & Wiberg, M. (2024). Analyzing polytomous test data: a comparison between an information-based IRT model and the generalized partial credit model. Journal of educational and behavioral statistics, 49(5), 753-779
2024 (English) In: Journal of educational and behavioral statistics, ISSN 1076-9986, E-ISSN 1935-1054, Vol. 49, no 5, p. 753-779. Article in journal (Refereed), Published
Abstract [en]

Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker’s attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
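For reference, the parametric side of this comparison has a closed form. The sketch below computes standard GPC category probabilities for one hypothetical item; the parameter values are made up, and this is not the authors' estimation code.

```r
## Sketch of generalized partial credit (GPC) category probabilities for one
## item (standard GPC formula; illustrative parameter values).

gpc_probs <- function(theta, a, b) {
  # b: step parameters b_1..b_m for an item with score categories 0..m
  z <- c(0, cumsum(a * (theta - b)))   # cumulative logits, category 0 fixed at 0
  exp(z) / sum(exp(z))
}

# Hypothetical item: discrimination 1.1, three score categories (0, 1, 2).
gpc_probs(theta = 0.5, a = 1.1, b = c(-0.8, 0.4))
```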

Place, publisher, year, edition, pages
Sage Publications, 2024
Keywords
item characteristic curves, item response theory, nonparametric IRT, simulation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-216867 (URN), 10.3102/10769986231207879 (DOI), 001098507600001 (), 2-s2.0-85176273777 (Scopus ID)
Funder
Marianne and Marcus Wallenberg Foundation, 2019.0129
Available from: 2023-12-12 Created: 2023-12-12 Last updated: 2024-12-31. Bibliographically approved
Boman, B. & Wiberg, M. (2024). Cognitive ability, gender, and well-being in school contexts: longitudinal evidence from Sweden. Frontiers in Psychology, 15, Article ID 1396682.
2024 (English) In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 15, article id 1396682. Article in journal (Refereed), Published
Abstract [en]

While well-being does generally constitute a moderate predictor of school achievement, research on the predictive validity of cognitive ability for well-being in school contexts remains scant. The current study analyzed longitudinal relations between cognitive ability measured at age 13 (Grade 6) and well-being measured at age 18 (Grade 12, valid N = 2,705) in a Swedish sample, using several multivariate model techniques. The results indicate that cognitive ability was not a statistically significant predictor when several predictors were entered in a multiple regression model. However, gender was a significant covariate as girls and young women have a substantially lower degree of self-reported well-being. This casts light on the limitations of cognitive ability as a construct for some non-cognitive outcomes, at least in shorter and narrower spatial–temporal contexts.

Place, publisher, year, edition, pages
Frontiers Media S.A., 2024
Keywords
cognitive ability, gender, intelligence, longitudinal analysis, well-being
National Category
Psychology (excluding Applied Psychology)
Identifiers
urn:nbn:se:umu:diva-231041 (URN), 10.3389/fpsyg.2024.1396682 (DOI), 001329616800001 (), 39391843 (PubMedID), 2-s2.0-85206355088 (Scopus ID)
Available from: 2024-10-23 Created: 2024-10-23 Last updated: 2024-10-23. Bibliographically approved
Franco, V. R., Laros, J. A., Wiberg, M. & Bastos, R. V. (2024). How to think straight about psychometrics: improving measurement by identifying its assumptions. Trends in Psychology, 32, 786-806
2024 (English) In: Trends in Psychology, ISSN 2358-1883, Vol. 32, p. 786-806. Article in journal (Refereed), Published
Abstract [en]

The aim of the current study is to introduce three assumptions common to psychometric theory and psychometric practice, and to show how alternatives to traditional psychometric approaches can be used to improve psychological measurement. These alternatives are developed by adapting each of these three assumptions. The assumption of structural validity relates to the implementation of mathematical models. The process assumption concerns the underlying process generating the observed data. The construct assumption implies that the observed data on their own do not constitute a measurement; what is measured is the latent variable that gives rise to the observed data. Nonparametric item response modeling and cognitive psychometric modeling are presented as alternatives for relaxing the first two assumptions, respectively. Network psychometrics is the alternative for relaxing the third assumption. Final remarks sum up the most important conclusions of the study.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Item response theory, Network psychometrics, Psychological measurement, Psychometrics
National Category
Psychology (excluding Applied Psychology); Probability Theory and Statistics
Identifiers
urn:nbn:se:umu:diva-194851 (URN), 10.1007/s43076-022-00183-6 (DOI), 2-s2.0-85129639278 (Scopus ID)
Available from: 2022-06-07 Created: 2022-06-07 Last updated: 2024-12-05. Bibliographically approved
Laukaityte, I. & Wiberg, M. (2024). Impacts of differences in group abilities and anchor test features on three non-IRT test equating methods. Practical Assessment, Research, and Evaluation, 29(5), Article ID 5.
2024 (English) In: Practical Assessment, Research, and Evaluation, E-ISSN 1531-7714, Vol. 29, no 5, article id 5. Article in journal (Refereed), Published
Abstract [en]

The overall aim was to examine the effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE), using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different anchor test forms administered at three test administrations was used. The simulation study examined differences in the ability of the test groups, and differences in the anchor test form with respect to item difficulty and discrimination. In the empirical study, the equated values from the three methods differed only slightly. The simulation study indicated that an easier anchor test form and/or an easier regular test form, and anchor items with a wider spread in difficulty, negatively affected the SEE and bias. The ability level of the groups was also important: equating with only less or more capable groups resulted in high SEEs at higher and lower test scores, respectively. The discussion includes practical recommendations on to whom an anchor test should be given, if there is a choice, and on how to select an anchor test form that has equating as its primary purpose.
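The SEE quantifies the sampling variability of the equated scores. As a generic illustration (not necessarily the estimator used in this study), a bootstrap over test takers can approximate it; all names here are ours.

```r
## Hedged sketch: standard error of equating (SEE) by a generic bootstrap.

see_bootstrap <- function(x_scores, y_scores, equate_fn, B = 500) {
  K <- max(x_scores)
  boot_eq <- replicate(B, {
    xb <- sample(x_scores, replace = TRUE)
    yb <- sample(y_scores, replace = TRUE)
    equate_fn(xb, yb, 0:K)
  })
  apply(boot_eq, 1, sd)   # SEE at each score point 0..K
}

# Toy equipercentile-style equating via empirical quantiles.
equate_fn <- function(xb, yb, grid)
  quantile(yb, ecdf(xb)(grid), type = 8, names = FALSE)

set.seed(1)
x_scores <- rbinom(1000, 20, 0.55)
y_scores <- rbinom(1200, 20, 0.50)
round(see_bootstrap(x_scores, y_scores, equate_fn, B = 200), 2)
```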

Place, publisher, year, edition, pages
University of Massachusetts Press, 2024
Keywords
NEAT, chained kernel equating, Poststratification kernel equating, Circle-arc equating, admission test, high stakes assessment
National Category
Probability Theory and Statistics; Educational Sciences
Identifiers
urn:nbn:se:umu:diva-221929 (URN), 10.7275/pare.2020 (DOI), 2-s2.0-85208327525 (Scopus ID)
Funder
Wallenberg Foundations, 2019.0129
Available from: 2024-03-11 Created: 2024-03-11 Last updated: 2025-02-24. Bibliographically approved
Heister, H. H., Casper, A. J., Wiberg, M. & Timmerman, M. E. (2024). Item response theory-based continuous norming. Psychological methods
2024 (English) In: Psychological methods, ISSN 1082-989X, E-ISSN 1939-1463. Article in journal (Refereed), Epub ahead of print
Abstract [en]

In norm-referenced psychological testing, an individual’s performance is expressed in relation to a reference population using a standardized score, like an intelligence quotient score. The reference population can depend on a continuous variable, like age. Current continuous norming methods transform the raw score into an age-dependent standardized score. Such methods have the shortcoming of relying solely on the raw test scores, ignoring valuable information from individual item responses. Instead of modeling the raw test scores, we propose modeling the item scores with a Bayesian two-parameter logistic (2PL) item response theory model with age-dependent mean and variance of the latent trait distribution, 2PL-norm for short. Norms are then derived using the estimated latent trait score and the age-dependent distribution parameters. Simulations show that 2PL-norms are overall more accurate than those from the most popular raw score-based norming methods cNORM and generalized additive models for location, scale, and shape (GAMLSS). Furthermore, the credible intervals of 2PL-norm exhibit clearly superior coverage over the confidence intervals of the raw score-based methods. The only issue of 2PL-norm is its slightly lower performance at the tails of the norms. Among the raw score-based norming methods, GAMLSS outperforms cNORM. For empirical practice this suggests the use of 2PL-norm, if the model assumptions hold. If not, or if the interest is solely in the point estimates of the extreme trait positions, GAMLSS-based norming is a better alternative. The use of the 2PL-norm is illustrated and compared with GAMLSS and cNORM using empirical data, and code is provided, so that users can readily apply 2PL-norm to their normative data.
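The norming step itself is simple once the age-dependent trait distribution has been estimated. A minimal R sketch, assuming hypothetical linear age trends for the latent trait mean and SD (the paper estimates these within a Bayesian 2PL model; the functional forms below are ours):

```r
## Minimal sketch of the norming step: converting an estimated latent trait
## score to an age-conditional standardized score, given age-dependent mean
## and SD of the trait distribution (hypothetical functional forms).

mu_age    <- function(age) 0.15 * (age - 10)   # assumed mean trend over age
sigma_age <- function(age) 0.9 + 0.01 * age    # assumed SD trend over age

# IQ-style norm score (mean 100, SD 15) for a trait estimate at a given age.
norm_score <- function(theta_hat, age) {
  z <- (theta_hat - mu_age(age)) / sigma_age(age)
  100 + 15 * z
}

norm_score(theta_hat = 0.4, age = 12)   # same theta at different ages
norm_score(theta_hat = 0.4, age = 16)   # yields different norm scores
```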

Place, publisher, year, edition, pages
American Psychological Association (APA), 2024
Keywords
two-parameter item response theory model, psychological tests, norm-referenced tests
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-231255 (URN), 10.1037/met0000686 (DOI), 001369790600001 (), 2-s2.0-85206668111 (Scopus ID)
Funder
Swedish Research Council, 2022-02046
Available from: 2024-10-29 Created: 2024-10-29 Last updated: 2025-04-24. Bibliographically approved
Wiberg, M., Kim, J.-S., Hwang, H., Wu, H. & Sweet, T. (2024). Preface. In: Heungsun Hwang; Hao Wu; Tracy Sweet (Ed.), Quantitative psychology (pp. v-v). Springer Nature
2024 (English) In: Quantitative psychology / [ed] Heungsun Hwang; Hao Wu; Tracy Sweet, Springer Nature, 2024, p. v-v. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Springer Proceedings in Mathematics & Statistics, ISSN 2194-1009, E-ISSN 2194-1017 ; 452
National Category
Mathematics
Identifiers
urn:nbn:se:umu:diva-228113 (URN), 2-s2.0-85199500199 (Scopus ID), 9783031555473 (ISBN), 9783031555480 (ISBN)
Note

This book includes presentations given at the 88th annual meeting of the Psychometric Society, held in Maryland, USA on July 24–28, 2023.

Available from: 2024-08-01 Created: 2024-08-01 Last updated: 2024-08-01. Bibliographically approved
Projects
Analysis and modelling of Swedish students' performance in TIMSS and PISA in an international perspective [2008-04027_VR]; Umeå University
New statistical methods to ascertain high quality over time in standardized achievement tests [2014-00578_VR]; Umeå University
Innovative methods to compare standardized achievement tests over time [2019-03493_VR]; Umeå University
Identifiers
ORCID iD: orcid.org/0000-0001-5549-8262
