Theory and validity evidence for a large-scale test for selection to higher education
Umeå universitet, Samhällsvetenskapliga fakulteten, Institutionen för tillämpad utbildningsvetenskap, Beteendevetenskapliga mätningar (BVM). ORCID iD: 0000-0002-8479-9117
2017 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Validity is a crucial part of all forms of measurement, especially for instruments that are high-stakes for test takers. The aim of this thesis was to examine theory and validity evidence for a recently revised large-scale instrument used for selection to higher education in Sweden, the Swedish Scholastic Assessment Test (SweSAT), and to identify threats to its validity. Previous versions of the SweSAT have been studied extensively, but when the test was revised in 2011, further research was needed to strengthen the validity arguments for it. The validity approach suggested in the most recent version of the Standards for Educational and Psychological Testing, in which the theoretical basis and five sources of validity evidence are the key aspects of validity, was adopted in this thesis.

The four studies that are presented in this thesis focus on different aspects of the SweSAT, including theory, score reporting, item functioning and linking of test forms. These studies examine validity evidence from four of the five sources of validity: evidence based on test content, response processes, internal structure and consequences of testing.

The results from the thesis as a whole show that there is validity evidence supporting some of the validity arguments for the intended interpretations and uses of SweSAT scores, but also that there are potential threats to validity that require further attention. Empirical evidence supports the two-dimensional structure of the construct scholastic proficiency, but the construct requires a more thorough definition in order to better examine validity evidence based on content and on consequences for test takers. Section scores provide more information about test takers' strengths and weaknesses than is already provided by the total score and can therefore be reported, but subtest scores do not provide additional information and should not be reported. All four quantitative subtests, as well as the Swedish reading comprehension subtest, are essentially free of differential item functioning (DIF), but two of the four verbal subtests show moderate DIF that could constitute bias. Finally, the equating procedure, although it appears to be appropriate, needs to be examined further in order to determine whether it is the best available practice for the SweSAT.

Some of the results in this thesis are specific to the SweSAT because only SweSAT data were used, but the design of the studies and the methods that were applied serve as practical examples of validating a test and are therefore likely to be useful to those involved in test development, test use and psychometric research.

Suggestions for further research include: (1) a study to create a clearer and more elaborate definition of the construct scholastic proficiency; (2) a large and empirically focused study of subscore value in the SweSAT using repeat test takers and applying Haberman's method along with recently proposed effect size measures; (3) a cross-validation DIF study using more recently administered test forms; (4) a study that examines the causes of the recurring score differences between women and men on the SweSAT; and (5) a study that re-examines the best practice for equating the current version of the SweSAT, using simulated data in addition to empirical data.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2017. 51 p.
Series
Academic dissertations at the department of Educational Measurement, ISSN 1652-9650 ; 10
Keywords [en]
SweSAT, validity, theoretical model, score reporting, subscores, DIF, equating, linking
Keywords [sv]
Högskoleprovet, validitet, teoretisk modell, rapportering av provpoäng, ekvivalering, länkning
National subject category
Educational Sciences
Research subject
behavioural measurements
Identifiers
URN: urn:nbn:se:umu:diva-138492
ISBN: 978-91-7601-732-6 (print)
OAI: oai:DiVA.org:umu-138492
DiVA, id: diva2:1135845
Public defence
2017-09-22, Hörsal 1031, Norra beteendevetarhuset, Umeå, 10:00 (English)
Opponent
Supervisors
Available from: 2017-09-01 Created: 2017-08-24 Last updated: 2018-06-09 Bibliographically approved
List of papers
1. From aptitude to proficiency: The theory behind the Swedish Scholastic Assessment Test
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Validity arguments for tests should include both theory and empirical evidence, but no theoretical framework for the Swedish Scholastic Assessment Test (SweSAT) has yet been suggested. The purpose of this study was to, for the first time, formulate and present theoretical models for the original and current SweSAT versions, using a synthesis of information from reports and scientific studies. The study also follows the development of the SweSAT's construct scholastic aptitude, which later became scholastic proficiency. The findings were that the 1977 model was theoretically elaborate but had little empirical support for the construct domains. In contrast, the 2011 model had much empirical support but a less precisely defined construct. Both models share the purpose of measuring what is required to succeed in higher education. Suggestions for future research include more precisely defining the contents of the SweSAT's current construct scholastic proficiency.

National subject category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-138812 (URN)
Available from: 2017-08-31 Created: 2017-08-31 Last updated: 2018-06-09
2. Methods for Examining the Psychometric Quality of Subscores: A Review and Application
2015 (English) In: Practical Assessment, Research, and Evaluation, E-ISSN 1531-7714, Vol. 20, article id 21. Article in journal (Refereed) Published
Abstract [en]

When subscores on a test are reported to the test taker, the appropriateness of reporting them depends on whether they provide useful information beyond what is provided by the total score. Subscores that fail to do so lack adequate psychometric quality and should not be reported. There are several methods for examining the quality of subscores, and in this study seven such methods, four based on classical test theory and three based on item response theory, were reviewed and applied to empirical data. The data consisted of test takers' scores on four test forms (two administrations of a first version of a college admission test and two administrations of a second version), and the analyses were carried out at the subtest and section levels. The two section scores were found to have adequate psychometric quality with all methods used, whereas the results for the subtest scores ranged from almost all having adequate psychometric quality to none having it. The authors recommend using Haberman's method and the related utility index because of their solid theoretical foundation and because of various issues with the other subscore quality methods.
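
As an illustration of the criterion the paper recommends, the sketch below implements Haberman's classical test theory check for subscore added value: a subscore is worth reporting only if the observed subscore predicts the true subscore better (a higher proportional reduction in mean squared error, PRMSE) than the observed total score does. This is a minimal Python sketch under simplifying assumptions, not the analysis code used in the study; the function names are hypothetical and subscore reliability is approximated with Cronbach's alpha.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_persons, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1.0)) * (1.0 - item_vars.sum() / total_var)

def haberman_prmse(item_matrix, subtest_of_item):
    """For each subtest, compare the PRMSE of predicting the true subscore from
    (a) the observed subscore and (b) the observed total score.
    item_matrix: (n_persons, n_items) item scores; subtest_of_item: length n_items
    labels assigning each item to a subtest.
    Returns {subtest: (prmse_subscore, prmse_total, has_added_value)}."""
    item_matrix = np.asarray(item_matrix, dtype=float)
    subtest_of_item = np.asarray(subtest_of_item)
    total = item_matrix.sum(axis=1)
    var_x = total.var(ddof=1)
    results = {}
    for sub in np.unique(subtest_of_item):
        items_s = item_matrix[:, subtest_of_item == sub]
        s = items_s.sum(axis=1)
        rel_s = cronbach_alpha(items_s)      # reliability of the observed subscore
        var_s = s.var(ddof=1)
        cov_sx = np.cov(s, total)[0, 1]
        # PRMSE when the observed subscore predicts the true subscore
        prmse_s = rel_s
        # Cov(true subscore, total) = Cov(S, X) minus the subscore's error variance,
        # assuming measurement errors are uncorrelated across subtests
        cov_ts_x = cov_sx - (1.0 - rel_s) * var_s
        # PRMSE when the observed total score predicts the true subscore
        prmse_x = cov_ts_x ** 2 / (rel_s * var_s * var_x)
        results[sub] = (prmse_s, prmse_x, prmse_s > prmse_x)
    return results
```

Given a matrix of dichotomous item scores and a vector of subtest labels, a call such as haberman_prmse(responses, subtest_labels) returns the two PRMSE values per subtest and a flag indicating whether the subscore adds value over the total score.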

Keywords
subscores, score reporting, mean squared error, factor analysis, IRT, college admissions tests
National subject category
Pedagogy, Psychology
Research subject
behavioural measurements
Identifiers
urn:nbn:se:umu:diva-112181 (URN)
Available from: 2015-12-03 Created: 2015-12-03 Last updated: 2024-01-16 Bibliographically approved
3. Reasons for gender-related differential item functioning in a college admissions test
2018 (English) In: Scandinavian Journal of Educational Research, ISSN 0031-3831, E-ISSN 1470-1170, Vol. 62, no. 6, pp. 959-970. Article in journal (Refereed) Published
Abstract [en]

Gender fairness in testing can be impeded by the presence of differential item functioning (DIF), which potentially causes test bias. In this study, the presence and causes of gender-related DIF were investigated with real data from 800 items answered by 250,000 test takers. DIF was examined using the Mantel-Haenszel and logistic regression procedures. Little DIF was found in the quantitative items and a moderate amount was found in the verbal items. Vocabulary items sampled from traditionally female domains tended to favor women, whereas items sampled from traditionally male domains did not, in general, favor men. The sentence completion item format in the English reading comprehension subtest favored men regardless of content. The findings, if supported in a cross-validation study, can potentially lead to changes in how vocabulary items are sampled and in the use of the sentence completion format in English reading comprehension, thereby increasing gender fairness in the examined test.
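
To make the Mantel-Haenszel procedure referred to above concrete, the following is a minimal Python sketch, not the study's own analysis code, that computes the common odds ratio and the ETS delta for a single dichotomous item, stratifying on the total test score as the matching variable; the function and variable names are hypothetical.

```python
import numpy as np

def mantel_haenszel_dif(item, group, matching_score):
    """Mantel-Haenszel DIF statistic for one dichotomous item.
    item: 0/1 item responses; group: 0 = reference, 1 = focal;
    matching_score: ability proxy used for stratification (e.g. total score).
    Returns (common odds ratio, ETS delta); negative delta values indicate
    that the item favours the reference group at matched ability."""
    item = np.asarray(item)
    group = np.asarray(group)
    score = np.asarray(matching_score)
    num, den = 0.0, 0.0
    for k in np.unique(score):                         # one 2x2 table per score level
        m = score == k
        a = np.sum((group[m] == 0) & (item[m] == 1))   # reference, correct
        b = np.sum((group[m] == 0) & (item[m] == 0))   # reference, incorrect
        c = np.sum((group[m] == 1) & (item[m] == 1))   # focal, correct
        d = np.sum((group[m] == 1) & (item[m] == 0))   # focal, incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    if num == 0.0 or den == 0.0:
        return np.nan, np.nan
    alpha_mh = num / den
    delta_mh = -2.35 * np.log(alpha_mh)                # ETS delta metric
    return alpha_mh, delta_mh
```

In the commonly used ETS classification, items with an absolute delta below 1 are treated as showing negligible DIF, values between roughly 1 and 1.5 as moderate DIF, and values of 1.5 or more as large DIF, with statistical significance also taken into account.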

Place, publisher, year, edition, pages
Routledge, 2018
Keywords
DIF, Mantel-Haenszel, logistic regression, SweSAT, fairness
National subject category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-138816 (URN)
10.1080/00313831.2017.1402365 (DOI)
000445081100010 ()
2-s2.0-85035812910 (Scopus ID)
Available from: 2017-08-31 Created: 2017-08-31 Last updated: 2022-03-09 Bibliographically approved
4. Equating challenges when revising large-scale tests: A comparison of different frameworks, methods and designs
(English) Manuscript (preprint) (Other academic)
Abstract [en]

This study compared the performance of kernel and traditional equipercentile observed-score equating methods when linking a revised test to an old version of that test, and when equating two test forms of the revised test. Several equating designs were included for both methods, and R, in particular the packages equate and kequate, was used to perform the equatings. The evaluation criteria were the standard error of equating, the percent relative error and the difference that matters. The results show that kernel equating is superior to traditional equating when linking a revised test to an old test under the single group design. Kernel equating was not found to be preferable to traditional equating when equating the revised test. Although the percent relative error was low for all designs when using kernel equating, many score differences between kernel and traditional equating were larger than a difference that matters. The recommendation is therefore to continue to equate with the traditional equating method and to further investigate kernel equating as a future alternative.
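
For readers unfamiliar with the traditional method referred to above, the sketch below shows the core of equipercentile equating for two integer-scored forms: each form X score is mapped to the form Y score with the same percentile rank, using a piecewise-linear continuization of the discrete score distributions. It is an illustrative Python sketch under simplifying assumptions (single group or equivalent groups design, no presmoothing, no standard errors), not the equate/kequate workflow used in the study, and the function names are hypothetical.

```python
import numpy as np

def score_distribution(scores, max_score):
    """Relative frequencies and cumulative proportions for integer scores 0..max_score."""
    scores = np.asarray(scores)
    freqs = np.array([(scores == x).mean() for x in range(max_score + 1)])
    return freqs, np.cumsum(freqs)

def equipercentile_equate(scores_x, scores_y, max_x, max_y):
    """Map every integer score on form X to its equipercentile equivalent on form Y.
    Returns an array of length max_x + 1 with the equated (possibly fractional) scores."""
    freq_x, cum_x = score_distribution(scores_x, max_x)
    _, cum_y = score_distribution(scores_y, max_y)
    # Percentile rank (as a proportion) of each X score, mid-point convention:
    # P(x) = F(x - 1) + f(x) / 2
    pr_x = np.concatenate(([0.0], cum_x[:-1])) + freq_x / 2.0
    # Continuize Y by spreading each score y uniformly over [y - 0.5, y + 0.5]:
    # the CDF is 0 at -0.5 and equals cum_y[y] at y + 0.5. Inverting this
    # piecewise-linear CDF by interpolation yields the equated scores.
    knots_p = np.concatenate(([0.0], cum_y))
    knots_y = np.concatenate(([-0.5], np.arange(max_y + 1) + 0.5))
    return np.interp(pr_x, knots_p, knots_y)
```

A call such as equipercentile_equate(scores_x, scores_y, 40, 40) returns, for each raw score on form X, the equivalent score on the form Y scale; the study additionally evaluated the standard error of equating and whether equated-score differences exceeded a difference that matters, often taken to be about half a score unit.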

National subject category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-138818 (URN)
Available from: 2017-08-31 Created: 2017-08-31 Last updated: 2018-06-09

Open Access in DiVA

fulltext (618 kB), 2039 downloads, application/pdf (FULLTEXT02.pdf)
spikblad / defence announcement (120 kB), 84 downloads, application/pdf (FULLTEXT03.pdf)
omslag / cover (8112 kB), 0 downloads, image/jpeg (COVER02.jpg)

Person

Wedman, Jonathan

