umu.se Publications
  • 1.
    Adolfsson, Lena
    et al.
    Umeå University, Faculty of Science and Technology, Department of Science and Mathematics Education.
    Benckert, Sylvia
    Umeå University, Faculty of Science and Technology, Department of Physics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Gapet har minskat: skillnader mellan hög- och lågpresterande flickors och pojkars attityder till biologi, fysik och kemi 1995 och 2007. 2011. In: NorDiNa: Nordic Studies in Science Education, ISSN 1504-4556, E-ISSN 1894-1257, Vol. 7, no 1, p. 3-16. Article in journal (Refereed)
    Abstract [en]

    This article explores how boys’ and girls’ attitudes towards biology, physics and chemistry have changed over time. We use data from the TIMSS studies for grade 8 in Sweden to investigate how the attitudes of high- and low-performing pupils changed between 1995 and 2007. Attitude is measured by four questions from the student questionnaire in the TIMSS study. The results indicate that there have been some changes in attitudes between 1995 and 2007. High-achieving pupils, especially boys, had a more negative attitude towards all three subjects (biology, physics and chemistry) in 2007 than in 1995. The low-achieving students think that they are performing better in all three subjects in 2007 compared to 1995. The difference between the group that is most positive towards physics and chemistry and the least positive group diminished between the two years. The results are discussed in relation to the changes in Swedish schools during the period.

  • 2.
    Andersson, Björn
    et al.
    Uppsala universitet.
    Bränberg, Kenny
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    kequate: The kernel method of test equating. R package version 1.1.0. 2012. Other (Other academic)
    Abstract [en]

    Implements the kernel method of test equating using the CB, EG, SG, NEAT CE/PSE and NEC designs, supporting Gaussian, logistic and uniform kernels and unsmoothed and pre-smoothed input data.

  • 3.
    Andersson, Björn
    et al.
    Uppsala universitet.
    Bränberg, Kenny
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Performing the Kernel Method of Test Equating with the Package kequate. 2013. In: Journal of Statistical Software, ISSN 1548-7660, E-ISSN 1548-7660, Vol. 55, no 6, p. 1-25. Article in journal (Refereed)
    Abstract [en]

    In standardized testing it is important to equate tests in order to ensure that the test takers, regardless of the test version given, obtain a fair test. Recently, the kernel method of test equating, which is a conjoint framework of test equating, has gained popularity. The kernel method of test equating includes five steps: (1) pre-smoothing, (2) estimation of the score probabilities, (3) continuization, (4) equating, and (5) computing the standard error of equating and the standard error of equating difference. Here, an implementation has been made for six different equating designs: equivalent groups, single group, counterbalanced, non-equivalent groups with anchor test using either chain equating or post-stratification equating, and non-equivalent groups using covariates. An R package for the kernel method of test equating called kequate is presented. The package also includes diagnostic tools that aid in the search for a proper log-linear model in the pre-smoothing step, for use in conjunction with the R function glm.

  • 4.
    Andersson, Björn
    et al.
    Beijing Normal University.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Item response theory observed-score kernel equating. 2017. In: Psychometrika, ISSN 0033-3123, E-ISSN 1860-0980, Vol. 82, no 1, p. 48-66. Article in journal (Refereed)
    Abstract [en]

    Item response theory (IRT) observed-score kernel equating is introduced for the non-equivalent groups with anchor test equating design using either chain equating or post-stratification equating. The equating function is treated in a multivariate setting and the asymptotic covariance matrices of IRT observed-score kernel equating functions are derived. Equating is conducted using the two-parameter and three-parameter logistic models with simulated data and data from a standardized achievement test. The results show that IRT observed-score kernel equating offers small standard errors and low equating bias under most settings considered.

  • 5.
    Bränberg, Kenny
    et al.
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Observed score linear equating with covariates. 2011. In: Journal of educational measurement, ISSN 0022-0655, E-ISSN 1745-3984, Vol. 48, no 4, p. 419-440. Article in journal (Refereed)
    Abstract [en]

    This paper examined observed score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and mean squared error) on the estimators of including this additional information. First, a model for observed-score linear equating with covariates was suggested. Second, the model was used in a simulation study to show that the use of covariates such as gender and education can increase the accuracy of an equating by reducing the mean squared error of the estimators. Finally, data from two administrations of the Swedish Scholastic Assessment Test were used to illustrate the use of the model.

  • 6.
    Bränberg, Kenny
    et al.
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    The effect on equating of using background variables. Manuscript (preprint) (Other academic)
    Abstract [en]

    In this paper observed score linear equating with two different data collection designs, the equivalent groups design and the non-equivalent groups design, is examined when including information from background variables. The purpose of the study is to examine the effect (i.e., bias, variance and mean squared error) on the estimators of including this additional information. In a simulation study, we show that the use of background variables, such as gender and education, can increase the accuracy of an equating by reducing the mean squared error (MSE) of the estimators. 

  • 7.
    Carelli, Grazia
    et al.
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    Wiberg, Britt
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Development and Construct Validation of the Swedish Zimbardo Time Perspective Inventory (S-ZTPI). 2011. In: European Journal of Psychological Assessment, ISSN 1015-5759, E-ISSN 2151-2426, Vol. 27, no 4, p. 220-227. Article in journal (Refereed)
    Abstract [en]

    In this study, we developed and evaluated a Swedish version of the Zimbardo Time Perspective Inventory (ZTPI; Zimbardo & Boyd, 1999). The original version of the ZTPI was extended by including a Future Negative scale, and the psychometric properties of both versions were examined in a sample of 419 adults aged between 18 and 80 years. Confirmatory factor analysis (CFA) provided support both for the original five-factor solution proposed by Zimbardo and Boyd (1999) in a Swedish sample and for a six-factor solution with the Future Negative scale as an independent factor. These findings extend the original ZTPI and suggest that negative feelings about the future constitute a central dimension of the temporal perspective. The Swedish Zimbardo Time Perspective Inventory (S-ZTPI) provides a reliable and valid instrument for measuring time perspective in the context of Swedish research and may be beneficial in multiple areas of psychology and related disciplines.

  • 8.
    Fahlén, Jessica
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    KU-dag som möjlighet till erfarenhetsutbyte. 2016. Conference paper (Refereed)
    Abstract [en, translated from sv]

    Many teachers at university colleges and universities have competence development (KU, from the Swedish "kompetensutveckling") included in their positions, which is regulated in local agreements. Under Umeå University's agreement, all senior lecturers are to be allotted at least 20 percent for this over a three-year period. The agreement also states that completed tasks are to be reported in an appropriate manner. At most departments/units, the teacher's competence development plan is usually followed up and revised at recurring performance appraisals. At our unit, the usual practice has been that teachers carry out their KU according to their plan and report once a year at the performance appraisal, where the plan is also revised. Unfortunately, this involves no exchange of experience between colleagues, only a report to the immediate manager. To change this, three years ago we introduced a KU day on a trial basis. The inspiration came from the unit's doctoral student day, which is held once a year and at which all doctoral students present what they have done during the past year and look ahead to what they will do in the coming year. The idea of a KU day is that everyone who has KU time in their staffing plan gives a 10-15 minute presentation of what they have done with that time, together with a brief look ahead. To engage more people, we also encouraged all project leaders with external funding to present what they do in their research time. Our main aim with the KU day was to make visible what everyone does and to open up for learning from each other, but also to raise awareness of what everyone does at the unit. The experiences after three years with an annual KU day have been many. In the first year we sensed some uncertainty among the participants. What does introducing a KU day actually lead to? Is it there to monitor one's work, or can one learn something? By the second year, several participants said that they wanted to take part, and the day was talked about in positive terms both before and after it was held.
This is probably because we have worked actively to ensure that it does not feel like monitoring but rather serves as a way to share experiences. Now, after a third year, it is seen as a natural part of our activities. Lessons we have learned during these years include the importance of how the KU day is launched, when in the term it is held so that as many as possible can take part, and how valuable it is to be aware of what others are doing in order to create new projects and an inclusive working climate at the unit.

  • 9. González, Jorge
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    The use of Poisson-Binomial distribution in equating test scores. 2015. Conference paper (Refereed)
  • 10. González, Jorge
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Applying test equating methods using R. 2017. Book (Refereed)
    Abstract [en]

    This book describes how to use test equating methods in practice. The non-commercial software R is used throughout the book to illustrate how to perform different equating methods when score data are collected under different data collection designs, such as the equivalent groups design, single group design, counterbalanced design and non-equivalent groups with anchor test design. The R packages equate, kequate and SNSequate, among others, are used to illustrate the different methods in practice, while simulated and real data sets illustrate how the methods are conducted in R. The book covers traditional equating methods, including mean and linear equating, frequency estimation equating and chain equating, as well as modern equating methods such as kernel equating, local equating and combinations of these. It also offers chapters on observed and true score item response theory equating and discusses recent developments within the equating field. More specifically, it covers the issue of including covariates within the equating process, the use of different kernels and ways of selecting bandwidths in kernel equating, and the Bayesian nonparametric estimation of equating functions. It also illustrates how to evaluate equating in practice using simulation and different equating-specific measures such as the standard error of equating, percent relative error, difference that matters and others.

  • 11. González, Jorge
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    von Davier, Alina A.
    A note on the Poisson's binomial distribution in Item Response Theory. 2016. In: Applied psychological measurement, ISSN 0146-6216, E-ISSN 1552-3497, Vol. 40, no 2, p. 302-310. Article in journal (Refereed)
    Abstract [en]

    The Poisson's binomial (PB) is the probability distribution of the number of successes in independent but not necessarily identically distributed binary trials. The independent non-identically distributed case emerges naturally in the field of item response theory, where answers to a set of binary items are conditionally independent given the level of ability, but with different probabilities of success. In many applications, the number of successes represents the score obtained by individuals, and the compound binomial (CB) distribution has been used to obtain score probabilities. It is shown here that the PB and the CB distributions lead to equivalent probabilities. Furthermore, one of the proposed algorithms to calculate the PB probabilities coincides exactly with the well-known Lord and Wingersky (LW) algorithm for CBs. Surprisingly, we could not find any reference in the psychometric literature pointing to this equivalence. In a simulation study, different methods to calculate the PB distribution are compared with the LW algorithm. Providing an exact alternative to the traditional LW approximation for obtaining score distributions is a contribution to the field.

  • 12.
    Henriksson, Widar
    et al.
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Stenlund, Tova
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Sundström, Anna
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Social Sciences, Department of Statistics.
    Proceedings from the conference: The GDE-model as a guide in driver training and testing: Umeå, May 7-8, 2007. 2007. Conference proceedings (editor) (Other (popular science, discussion, etc.))
  • 13.
    Henriksson, Widar
    et al.
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Stenlund, Tova
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Sundström, Anna
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Social Sciences, Department of Statistics.
    The GDE-model as a guide in driver training and testing: Proceedings from the conference, Umeå, May 7-8, 2007. 2007. Report (Other academic)
  • 14.
    Henriksson, Widar
    et al.
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Sundström, Anna
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Social Sciences, Department of Statistics.
    The Swedish driving-license test: A summary of studies from the department of educational measurement. 2004. Report (Other academic)
    Abstract [en]

    Since 1990, the Department of Educational Measurement at Umeå University has been commissioned to study the Swedish driving-license test by the Swedish National Road Administration, SNRA. Over the past few years several studies have been conducted in order to develop and improve the Swedish driving-license test. The focus of the majority of the studies has been the theory test.

    The aims of this paper were threefold: firstly to describe the development of the driver education and the driving-license test in Sweden during the past century; secondly, to summarize the findings of our research, which is related to important issues in test development; and finally, to make some suggestions for further research.

  • 15.
    Häggström, Jenny
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Optimal Bandwidth Selection in Observed-Score Kernel Equating. 2014. In: Journal of educational measurement, ISSN 0022-0655, E-ISSN 1745-3984, Vol. 51, no 2, p. 201-211. Article in journal (Refereed)
    Abstract [en]

    The selection of bandwidth in kernel equating is important because it has a direct impact on the equated test scores. The aim of this article is to examine the use of double smoothing when selecting bandwidths in kernel equating and to compare double smoothing with the commonly used penalty method. This comparison was made using both an equivalent groups design and a nonequivalent group with anchor test design. The performance of the methods was evaluated through simulation studies using both symmetric and skewed score distributions. In addition, the bandwidth selection methods were applied to real data from a college admissions test. The results show that the traditional penalty method works well although double smoothing is a viable alternative because it performs reasonably well compared to the traditional method.

  • 16. L. van der Ark, Andries
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Culpepper, Steven A.
    Douglas, Jeffrey A.
    Wang, Wen-Chung
    Quantitative psychology. The 81st annual meeting of the Psychometric society, Asheville, North Carolina, 2016. 2017. Conference proceedings (editor) (Refereed)
  • 17.
    Laukaityte, Inga
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Importance of sampling weights in multilevel modeling of international large-scale assessment data. 2018. In: Communications in Statistics - Theory and Methods, ISSN 0361-0926, E-ISSN 1532-415X, Vol. 47, no 20, p. 4991-5012. Article in journal (Refereed)
    Abstract [en]

    Multilevel modeling is an important tool for analyzing large-scale assessment data. However, standard multilevel modeling will typically give biased results for such complex survey data. This bias can be eliminated by introducing design weights, which must be used carefully as they can affect the results. The aim of this paper is to examine different approaches and to give recommendations concerning the handling of design weights in multilevel models when analyzing large-scale assessments such as TIMSS (Trends in International Mathematics and Science Study). To this end, we examined real data from two countries and included a simulation study. The analyses in the empirical study showed that using no weights or only level 1 weights could sometimes lead to misleading conclusions. The simulation study showed only small differences in estimation between the weighted and unweighted models when informative design weights were used. The use of unscaled or non-rescaled weights, however, caused significant differences in some parameter estimates.

  • 18.
    Laukaityte, Inga
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Using plausible values in secondary analysis in large–scale assessments. 2017. In: Communications in Statistics - Theory and Methods, ISSN 0361-0926, E-ISSN 1532-415X, Vol. 46, no 22, p. 11341-11357. Article in journal (Refereed)
    Abstract [en]

    Plausible values are typically used in large–scale assessment studies, in particular in the Trends in International Mathematics and Science Study and the Programme for International Student Assessment. Despite their widespread use, there are still some questions regarding the use of plausible values and how such use affects statistical analyses. The aim of this paper is to demonstrate the role of plausible values in large–scale assessment surveys when multilevel modelling is used. Different user strategies concerning plausible values for multilevel models as well as means and variances are examined. The results show that some commonly used user strategies give incorrect results while others give reasonable estimates but incorrect standard errors. These findings are important for anyone wishing to make secondary analyses of large–scale assessment data, especially those interested in using multilevel models to analyze the data.

  • 19. Leôncio, Waldir
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Evaluating equating transformations from different frameworks. 2018. In: Quantitative psychology: the 82nd annual meeting of the Psychometric Society, Zurich, Switzerland, 2017 / [ed] Marie Wiberg, Steven Culpepper, Rianne Janssen, Jorge González, Dylan Molenaar, Cham, Switzerland: Springer, 2018, p. 101-110. Chapter in book (Refereed)
    Abstract [en]

    Test equating is used to ensure that test scores from different test forms can be used interchangeably. This paper aims to compare the statistical and computational properties of three equating frameworks: item response theory observed-score equating (IRTOSE), kernel equating and kernel IRTOSE. The real data applications suggest that IRT-based frameworks tend to provide more stable and accurate results than kernel equating. Nonetheless, kernel equating can provide satisfactory results if we can find a good model for the data, while also being much faster than the IRT-based frameworks. Our general recommendation is to try all methods and examine how much the equated scores change, always ensuring that the assumptions are met and that a good model for the data can be found.

  • 20. Li, Juan
    et al.
    Ramsay, James O.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    TestGardener: a program for optimal scoring and graphical analysis. 2019. In: Quantitative psychology: 83rd annual meeting of the psychometric society, New York, NY 2018 / [ed] Marie Wiberg, Steven Culpepper, Rianne Janssen, Jorge González, Dylan Molenaar, New York: Springer, 2019, p. 87-94. Chapter in book (Refereed)
    Abstract [en]

    The aim of this paper is to demonstrate how to use TestGardener to analyze testing data with various item types and to explain some of its main displays. TestGardener is software designed to aid the development, evaluation, and use of multiple choice examinations, psychological scales, questionnaires, and similar types of data. This software implements the optimal scoring of binary and multi-option items, and uses spline smoothing to obtain item characteristic curves (ICCs) that better fit the real data. Using TestGardener does not require any programming skill or formal statistical knowledge, which will make optimal scoring and item response theory more approachable for test analysts, test developers, researchers, and the general public.

  • 21. Lindvall, Jannika
    et al.
    Helenius, Ola
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Critical features of professional development programs: Comparing content focus and impact of two large-scale programs. 2018. In: Teaching and Teacher Education: An International Journal of Research and Studies, ISSN 0742-051X, E-ISSN 1879-2480, Vol. 70, p. 121-131. Article in journal (Refereed)
    Abstract [en]

    By comparing two large-scale professional development programs' content and impact on student achievement, we contribute to research on critical features of high quality professional development, especially content focus. Even though the programs are conducted in the same context and are highly similar if characterized according to established research frameworks, our results suggest that they differ in their impact on student achievement. We therefore develop an analytical framework that allows us to characterize the programs' content and delivery in detail. Through this approach, we identify important differences between the programs that provide explanatory value in discussing reasons for their differing impacts.

  • 22.
    Pettersson, Lennart
    et al.
    Umeå University, Faculty of Arts, History and Theory of Art.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Sålt och lottat: en studie av privatinköp och lotterier i Norrlands konstförening 1883 och 1884. 2006. Report (Other academic)
  • 23.
    Pettersson, Lennart
    et al.
    Umeå University, Faculty of Arts, History and Theory of Art.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Sålt och lottat: en studie av privatinköp och lotterier i Norrlands konstförening 1883 och 1884. 2005. Book (Other academic)
  • 24. Ramsay, James O.
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    A Strategy for Replacing Sum Scoring. 2017. In: Journal of educational and behavioral statistics, ISSN 1076-9986, E-ISSN 1935-1054, Vol. 42, no 3, p. 282-307. Article in journal (Refereed)
    Abstract [en]

    This article promotes the use of modern test theory in testing situations where sum scores for binary responses are now used. It directly compares the efficiencies and biases of classical and modern test analyses and finds an improvement in the root mean squared error of ability estimates of about 5% for two designed multiple-choice tests and about 12% for a classroom test. A new parametric density function for ability estimates, the tilted scaled β, is used to resolve the nonidentifiability of the univariate test theory model. Item characteristic curves (ICCs) are represented as basis function expansions of their log-odds transforms. A parameter cascading method along with roughness penalties is used to estimate the corresponding log odds of the ICCs and is demonstrated to be sufficiently computationally efficient that it can support the analysis of large data sets.

  • 25. Ramsay, James O.
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Breaking through the sum scoring barrier. 2017. In: Quantitative Psychology: The 81st annual meeting of the psychometric society, Asheville, North Carolina, 2016 / [ed] L. Andries van der Ark, Marie Wiberg, Steven A. Culpepper, Jeffrey A. Douglas, Wen-Chung Wang, Cham: Springer, 2017, p. 151-158. Conference paper (Refereed)
    Abstract [en]

    The aim of this paper is to reflect on what would be needed in order to replace sum scoring, including technical advances, communication with both test constructors and examinees, and organizational strategy. Sum scoring is proposed to be replaced by smart scoring. A brief description of smart scoring, some theoretical support for it, and methods for achieving it are given, together with an example from a large-scale assessment test.

  • 26.
    Rolfsman, Ewa
    et al.
    Umeå University, Faculty of Social Sciences, Department of applied educational science.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Laukaityte, Inga
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    School effectiveness in the Nordic countries in relation to PISA and TIMSS. 2013. Conference paper (Refereed)
    Abstract [en]

    In a Nordic perspective, the Finnish students overall achieve the highest scores on PISA (Programme for International Student Assessment), while the Swedish students exhibit declining results. The results of the Swedish students have drawn attention to the quality of education, the role of the educational professionals and the efficiency of the school. It is therefore of vital importance to investigate whether these results can be related to school level factors in a Nordic perspective. However, TIMSS (Trends in International Mathematics and Science Study) and PISA exhibit similarities as well as differences as they target different subjects. In addition, the results on TIMSS and PISA differ between countries. The aim of this study is to investigate whether school level factors can contribute to the explanation of the results for the Nordic countries participating in PISA 2009 and, if so, to identify factors that can be influenced in order to enhance students’ achievement. We focus on school effectiveness in relation to PISA, since all Nordic countries participate in PISA. However, the results are contrasted with results from TIMSS for Sweden and Norway. In order to separate the effect of school level variables from the effect of students’ home environment and to account for the sampling design used in TIMSS and PISA, multilevel analysis was used. The results show that only a few school level factors were significant, and only in Sweden and Finland. Furthermore, school level factors in Sweden and Norway on PISA differ from school level factors based on TIMSS data.

  • 27. Sansivieri, Valentina
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    IRT observed-score equating with the non-equivalent groups with covariates design2017In: Quantitative Psychology: The 81st Annual Meeting of the Psychometric Society, Asheville, North Carolina, 2016 / [ed] L. Andries van der Ark, Marie Wiberg, Steven A. Culpepper, Jeffrey A. Douglas, Wen-Chung Wang, Cham: Springer, 2017, p. 275-285Conference paper (Refereed)
    Abstract [en]

    Nonequivalent groups with anchor test (NEAT) design is typically preferred in test score equating, but there are tests which do not administer an anchor test. If the groups are nonequivalent, an equivalent groups (EG) design cannot be recommended. Instead, one can use a nonequivalent groups with covariates (NEC) design. The overall aim of this work was to propose the use of item response theory (IRT) with a NEC design by incorporating the mixed-measurement IRT with covariates model within IRT observed-score equating in order to model both test scores and covariates. Both simulations and a real test example are used to examine the proposed test equating method in comparison with traditional IRT observed-score equating methods with an EG design and a NEAT design. The results show that the proposed method can be used in practice, and the simulations show that the standard errors of the equating are lower with the proposed method as compared with traditional methods.

  • 28. Sansivieri, Valentina
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Linking Scales in Item Response Theory with Covariates2018In: Journal of Research in Education, Science and Technology, ISSN 2548-0286, Vol. 3, no 2, p. 12-32Article in journal (Refereed)
    Abstract [en]

    When test forms are administered to different non-equivalent groups of examinees and are scored by item response theory (IRT), it is necessary to put item parameters estimated separately on two groups on the same scale. In the IRT models which include covariates about the examinees, we have two parameters which model uniform and non-uniform differential item functioning (DIF) and that have to be put on the same scale. The aim of this study is to propose conversion equations, which are used to put the uniform and non-uniform DIF parameters on the same scale. To estimate the coefficients of the conversion equations we will use four methods: mean/mean, mean/sigma, Haebara and Stocking-Lord. We give a simulation study and an empirical example. The results of the simulation study show that the coefficients of the conversion equations are substantially equal for the Haebara and Stocking-Lord methods, while they are different for the other methods. The empirical example shows that IRT with covariates produces a more informative test than IRT without covariates at high ability values and that, when the mean/mean and mean/sigma methods are used, we obtain more informative tests than when using concurrent calibration.

  • 29. Sansivieri, Valentina
    et al.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Matteucci, Mariagiulia
    A review of test equating methods with a special focus on IRT-based approaches2017In: Statistica, ISSN 1973-2201, Vol. 77, no 4, p. 329-352Article, review/survey (Refereed)
  • 30.
    Sundström, Anna
    et al.
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Henriksson, Widar
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Social Sciences, Department of Statistics.
    Alger, Susanne
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    "Ett givet inslag i förarutbildningen": Umeå universitet försvarar självvärderingen2008In: Mitt i trafiken, no 1, p. 33-33Article in journal (Other (popular science, discussion, etc.))
    Abstract [en]

    In summary, it appears self-evident that self-assessment should be a part of both driver education and the driving test, since it constitutes an essential part of the new curriculum. It is also clear that many gains, both for learning and for traffic safety, can be made by introducing self-assessment in the education. The next step is to discuss how this should be implemented in practice by finding ways in which self-assessment can be used both in education and in the driving test.

  • 31.
    Sundström, Anna
    et al.
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics. Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    Den svenska förarprövningens resultat: sambandet mellan kunskapsprovet och körprovet för underkända och godkända provtagare2005Report (Other academic)
    Abstract [en]

    The Swedish driving-license test consists of a theory test and a practical test. The purpose of these tests is to investigate if the learner driver has the knowledge and abilities stated in the curriculum. The purpose of this study was threefold. Firstly, the purpose was to examine the relationship between the theory test and the practical test in a sample that included test-takers who passed as well as test-takers who failed the theory test. Secondly, the purpose was to study the structure of the test score and the performance of the test-takers with regard to age, gender and driver education. Thirdly, the purpose was to investigate the relationship between self-assessed performance and test performance.

    The results on the theory test were similar to those of previous studies, which indicated that the sample used was representative of the population of test-takers. However, many test-takers repeat the test several times and the percentage of test-takers taking the test for the first time has decreased. The results on the practical test in this study showed that pass rates have decreased compared to previous studies. One possible explanation for this is that the pass rates are affected by the fact that test-takers who failed the theory test are included in the sample, and thus the test-takers' limited theoretical knowledge is reflected in the decrease in pass rates. When the relationship between the tests was examined the results indicated that the correlation was stronger than in previous studies. Moreover, the results showed that students from traffic school performed better on the theory test compared to private learners. The results on the practical test showed that students from traffic school, and those who combined professional education with private driver training, performed better than private learners. With regard to self-assessed performance, results indicated a relationship between performance on the theory test and self-evaluation. Test-takers who performed well on the test rated their performance high and vice versa.

    The main conclusion of the study was that there is a relationship between theory and practice, in the sense that those performing well on the theory test perform better on the practical test than those performing less well on the theory test. Moreover, students from traffic school perform better on both the theory test and the practical test than private learners.

  • 32. van der Ark, Andries
    et al.
    Bolt, Daniel M
    Wang, Wen-Chung
    Douglas, Jeffrey A.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Quantitative Psychology Research: The 80th Annual Meeting of the Psychometric Society, Beijing, 20152016Conference proceedings (editor) (Refereed)
  • 33.
    van der Linden, Wim J.
    et al.
    CTB/McGraw-Hill, Monterey, California.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Local observed-score equating with anchor-test designs2010In: Applied psychological measurement, ISSN 0146-6216, E-ISSN 1552-3497, Vol. 34, no 8, p. 620-640Article in journal (Refereed)
    Abstract [en]

    For traditional methods of observed-score equating with anchor-test designs, such as chain and poststratification equating, it is difficult to satisfy the criteria of equity and population invariance. Their equatings are therefore likely to be biased. The bias in these methods was evaluated against a simple local equating method in which the anchor-test score was used as a proxy of the proficiency measured by the test and the equating was conditional on this score. The results showed substantial bias for the two traditional methods under a variety of conditions but much smaller bias for the local method. In addition, unlike the traditional methods, the local method appeared to be quite robust with respect to changes in the difficulty and accuracy of the two tests that were equated. But like these methods, it appeared to be sensitive to a decrease in the accuracy of the anchor test as a proxy of the ability measured by the tests.

  • 34.
    Wallin, Gabriel
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Häggström, Jenny
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    How to select the bandwidth in kernel equating: an evaluation of five different methods2018In: Quantitative psychology: the 82nd annual meeting of the Psychometric Society, Zurich, Switzerland, 2017 / [ed] Marie Wiberg, Steven Culpepper, Rianne Janssen, Jorge González, Dylan Molenaar, Cham, Switzerland: Springer, 2018, p. 91-100Chapter in book (Refereed)
    Abstract [en]

    When using kernel equating to equate two test forms, a bandwidth needs to be selected. The bandwidth parameter determines the smoothness of the continuized score distributions and has been shown to have a large effect on the kernel density estimate. There are a number of suggested criteria for selecting the bandwidth, and currently four of them have been implemented in kernel equating. In this paper, all four of the existing bandwidth selectors suggested for kernel equating are evaluated and compared against each other using real test data together with a new criterion that implements leave-one-out cross-validation. Although the bandwidth methods generally were similar in terms of equated scores, there were potentially important differences in the upper part of the score scale where critical admission decisions are typically made.

  • 35.
    Wallin, Gabriel
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Kernel Equating Using Propensity Scores for Nonequivalent Groups2019In: Journal of educational and behavioral statistics, ISSN 1076-9986, E-ISSN 1935-1054Article in journal (Refereed)
    Abstract [en]

    When equating two test forms, the equated scores will be biased if the test groups differ in ability. To adjust for the ability imbalance between nonequivalent groups, a set of common items is often used. When no common items are available, it has been suggested to use covariates correlated with the test scores instead. In this article, we reduce the covariates to a propensity score and equate the test forms with respect to this score. The propensity score is incorporated within the kernel equating framework using poststratification and chained equating. The methods are evaluated using real college admissions test data and through a simulation study. The results show that propensity scores give an increased equating precision in comparison with the equivalent groups design and a smaller mean squared error than by using the covariates directly. Practical implications are also discussed.

  • 36.
    Wallin, Gabriel
    et al.
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Nonequivalent groups with covariates design using propensity scores for kernel equating2017In: Quantitative Psychology: The 81st Annual Meeting of the Psychometric Society, Asheville, North Carolina, 2016 / [ed] van der Ark L., Wiberg M., Culpepper S., Douglas J., Wang WC., Cham: Springer, 2017, p. 309-319Conference paper (Refereed)
    Abstract [en]

    In test score equating, the non-equivalent groups with covariates (NEC) design uses covariates with high correlation to the test scores as a substitute for an anchor test when the latter is lacking. However, as the number of covariates increases, the number of observations for each covariate combination decreases. We suggest using propensity scores instead, which we include in the kernel equating framework using both post-stratification and chained equating. The two approaches are illustrated with data from a large scale assessment, and the results show an increased precision in comparison with the equivalent groups design, and great similarities in comparison with the results when using an anchor test.

  • 37.
    Wiberg, Britt
    et al.
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    Sircova, Anna
    DIS Copenhagen.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Carelli, Maria Grazia
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    Balanced time perspective: developing empirical profile and exploring its stability over time2017In: Time perspective: theory and practice / [ed] Aleksandra Kostić, Derek Chadee, London: Palgrave Macmillan, 2017, p. 63-95Chapter in book (Refereed)
    Abstract [en]

    Balanced time perspective (BTP) is characterized by flexible switching between a person's past, present and future time orientations, depending on situational demands, personal resources, experiences, and social evaluations. The present study aimed to explore the psychological characteristics of people with a BTP profile and attain a deeper understanding of the BTP construct. Seven people with BTP profiles were investigated using in-depth interviews, self-report instruments, and a projective test. By testing the participants on two occasions within an 18-month interval, we investigated the stability of BTP. Analyses showed that participants were aware of the "now" and had a synchronicity between the present and the past, and also between the present and the future. Results indicated a degree of temporal stability in the BTP profile and that people's interpretations and interactions within the surrounding context of events influence their time perspectives.

  • 38.
    Wiberg, Britt
    et al.
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Carelli, Maria Grazia
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    Sircova, Anna
    Umeå University, Faculty of Social Sciences, Department of Psychology.
    A qualitative and quantitative study of seven persons with balanced time perspective (BTP) according to S-ZTPI2012In: 1st international conference on time perspective and research: converging paths in psychology time theory and research / [ed] Maria Paula Paixao, Victor E.C. Ortuno, Pedro Cordeiro, Rute David, ESPACOBRANCO, 2012, p. 120-120Conference paper (Refereed)
    Abstract [en]

    The theoretical notion of balanced time perspective (BTP) has been suggested by a number of authors, and some attempts to operationalize BTP have been made (Boniwell, 2005; Drake, Duncan, Sutherland, Abernethy & Henry, 2008; Sircova & Mitina, 2008; Boniwell, Osin, Linley & Ivanchenko, 2010). The aim of the present study was to get a deeper understanding of the BTP concept by studying seven BTP persons with both interviews and self-rating scales (e.g. SCL-90, Life Events scale, Scales of Psychological Well-Being (C. Ryff) and Satisfaction with Life Scale (E. Diener)) on two occasions. The Swedish Zimbardo Time Perspective Inventory (S-ZTPI) (Carelli, Wiberg & Wiberg, 2011) was administered in order to study the stability and change in BTP level. The results showed a great stability in the BTP level (Wiberg, Sircova, Wiberg & Carelli, in press), although a small change was observed. The 14 interviews were analyzed according to Interpretative Phenomenological Analysis (IPA). The results show a consciousness about the "now" among the participants and a synchronicity between the present and the past and also between the present and the future. The results lend support to a holistic present scale (Zimbardo & Boyd, 2008) and an "extended now".

  • 39.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    A note on equating test scores with covariates2015In: Festschrift in honor of Hans Nyquist on the occasion of his 65th birthday / [ed] Ellinor Fackle-Fornius, Stockholm: Stockholm University, 2015, p. 96-99Chapter in book (Refereed)
  • 40.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Alternative linear item response theory observed-score equating methods2016In: Applied psychological measurement, ISSN 0146-6216, E-ISSN 1552-3497, Vol. 40, no 3, p. 180-199Article in journal (Refereed)
    Abstract [en]

    Item response theory observed-score equating (IRTOSE) is widely used in many testing programs. The aim of this study was to empirically examine three alternative linear IRTOSE methods compared with the traditional IRTOSE method and to discuss these methods in light of previously suggested alternatives. This contribution is both conceptual, by exploring three alternative methods that fit into the current observed-score equating framework, and empirical by comparing the methods through simulations and with real data. The results show that the local linear (kernel) IRTOSE methods yield low bias and low values on loss measures. However, using only a linear IRTOSE method results in excessive bias and cannot be recommended because of the ease with which IRTOSE with full distributions can be performed. An example using real data showed considerable differences in the equated scores with the alternative methods as well as in comparison with the traditional IRTOSE method. Practical considerations are given in the concluding remarks.

  • 41.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics. Umeå University, Faculty of Social Sciences, Department of Educational Measurement.
    An optimal design approach to criterion-referenced computerized testing2003In: Journal of educational and behavioral statistics, ISSN 1076-9986, E-ISSN 1935-1054, Vol. 28, no 2, p. 97-110Article in journal (Refereed)
    Abstract [en]

    A criterion-referenced computerized test is expressed as a statistical hypothesis problem. This admits that it can be studied by using the theory of optimal design. The power function of the statistical test is used as a criterion function when designing the test. A formal proof is provided showing that all items should have the same item characteristics, i.e. items that have high discrimination, low guessing and difficulty near the cut-off score give the most powerful statistical test. An efficiency study shows how many times more items are needed if nonoptimal items are used instead of optimal items in order to get the same power in the test.

  • 42.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Analys och modellering av svenska elevers prestationer i TIMSS och PISA i ett internationellt perspektiv2015Report (Other academic)
  • 43.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Book review of Alina A. von Davier (Ed.) (2011) Statistical Models for Test Equating, Scaling, and Linking2013In: Psychometrika, ISSN 0033-3123, E-ISSN 1860-0980, Vol. 78, no 1, p. 185-187Article, book review (Refereed)
  • 44.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Can a multidimensional test be evaluated with unidimensional item response theory?2012In: Educational Research and Evaluation, ISSN 1380-3611, E-ISSN 1744-4187, Vol. 18, no 4, p. 307-320Article in journal (Refereed)
    Abstract [en]

    The aim of this study was to evaluate possible consequences of using unidimensional item response theory (UIRT) on a multidimensional college admission test. The test consists of 5 subscales and can be divided into two sections, that is, it can be considered both as a unidimensional and a multidimensional test. The test was examined with both UIRT and multidimensional IRT (MIRT). Simulations were used to examine item and ability parameter recovery when UIRT and MIRT models were used. The results obtained from the college admission test showed that although we get a better model fit when using MIRT instead of UIRT, the difference is small if we compare it with using a consecutive UIRT approach. The results from the simulations indicate that if the test only has between-item multidimensionality, it is probably not harmful to use UIRT instead of MIRT models.

  • 45.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Educational Measurement. Statistics.
    Changes in the Swedish driving-license test?: Using the GDE framework2007In: Proceedings from the conference: The GDE-model as a guide in driver training and testing: Umeå, May 7-8, 2007, 2007Conference paper (Other academic)
  • 46.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Statistics. Educational Measurement.
    Classical test theory vs. item response theory: An evaluation of the theory test in the Swedish driving-license test2004Report (Other academic)
    Abstract [en]

    The Swedish driving-license test consists of a theory test and a practical road test. The aim of this paper is to evaluate which of the one- (1PL), two- (2PL) and three-parameter (3PL) logistic Item Response Theory (IRT) models is the most suitable to use when evaluating the theory test in the Swedish driving-license test. A further aim is to compare the chosen IRT model with the indices in Classical Test Theory (CTT). The theory test has 65 multiple-choice items and is criterion-referenced. The evaluation of the models was made by verifying the assumptions that IRT models rely on, examining the expected model features and evaluating how well the models predict actual test results. The overall conclusion from this evaluation is that the 3PL model is preferable to use when evaluating the theory test. By comparing the indices from CTT and IRT it was concluded that both give valuable information and should be included in an analysis of the theory test in the Swedish driving-license test.

  • 47.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Social Sciences, Department of Statistics.
    Datoriseringen av teoriprovet: En beskrivning av effekter utifrån ett antal statistiska indikatorer1999Report (Other academic)
  • 48.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Department of Statistics.
    Differential Item Functioning in Mastery tests: A comparison of three methods using real data2009In: International Journal of Testing, ISSN 1530-5058, E-ISSN 1532-7574, Vol. 9, no 1, p. 41-59Article in journal (Refereed)
    Abstract [en]

    The aim of this study was to examine log linear modelling (LLM) compared with logistic regression (LR) and Mantel-Haenszel (MH) test for detecting Differential Item Functioning (DIF) in a mastery test. The three methods were chosen because they have similar components. The results showed fairly high matching percentages together with high correlations among the methods regarding size of DIF. The MH approach yielded more conservative results than both LR and LLM. LLM and LR were fairly consistent with each other. The LLM has the advantage of dividing the test scores into certain intervals, which is of special interest in mastery tests. This partition of test scores was also tried with LR and MH with different results.

  • 49.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    Ensuring test quality over time by monitoring the equating transformations2017In: Quantitative Psychology: The 81st Annual Meeting of the Psychometric Society, Asheville, North Carolina, 2016 / [ed] L. Andries van der Ark, Marie Wiberg, Steven A. Culpepper, Jeffrey A. Douglas, Wen-Chung Wang, Cham: Springer, 2017, p. 239-251Conference paper (Refereed)
    Abstract [en]

    One important part of ensuring test quality over consecutive test administrations is to make sure that the equating procedure works as intended, especially when the composition of the test taker groups might change over the administrations. The aim of this study was to examine the equating transformations obtained using one or two previous administrations of a college admissions test that is given twice a year. The test has an external anchor, and thus a nonequivalent group with anchor test design is typically used to equate the test, although other data collection designs are possible. This study examined the use of different equating methods with different data collection designs and different braiding plans. The methods included traditional equating methods and (item response theory) Kernel equating methods. We found that different equating methods and different braiding strategies gave somewhat different results, and some reflections on how to proceed in the future are given.

  • 50.
    Wiberg, Marie
    Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
    equateIRT Package in R2018In: Measurement, ISSN 1536-6367, E-ISSN 1536-6359, Vol. 16, no 3, p. 195-202Article in journal (Refereed)
    Abstract [en]

    Equating test scores between different achievement test versions is important to assure comparability between test takers’ scores. As many items are modelled with item response theory (IRT), it makes sense to also equate the test scores with IRT equating methods. The equateIRT package in R provides a set of functions which implements IRT equating methods including newer extensions. This paper summarizes some of the advances in equating with IRT, reviews the equateIRT package, and demonstrates, through two illustrative examples, some of the key features of the package.
