Is This Reliable Enough?: Examining Classification Consistency and Accuracy in a Criterion-Referenced Test
Umeå University, Faculty of Social Sciences, Department of Applied Educational Science.
2016 (English). In: International Journal of Assessment Tools in Education, ISSN 2148-7456, Vol. 3, no. 2, pp. 137-150. Journal article (peer-reviewed), published.
Abstract [en]

One important step in assessing the quality of a test is to examine the reliability of test score interpretation. Which aspect of reliability is most relevant depends on the type of test and how the scores are to be used. For criterion-referenced tests, and in particular certification tests where students are classified into performance categories, the primary focus need not be on the size of the error but on the impact of this error on classification. This impact can be described in terms of classification consistency and classification accuracy. In this article, selected methods from classical test theory for estimating classification consistency and classification accuracy were applied to the theory part of the Swedish driving licence test, a high-stakes criterion-referenced test that is rarely studied in terms of reliability of classification. The results for this particular test indicated a level of classification consistency that falls slightly short of the recommended level, which is why lengthening the test should be considered. More evidence should also be gathered as to whether the placement of the cut-off score is appropriate, since this has implications for the validity of classifications.
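The two concepts can be illustrated with a small simulation (not code from the article) under a simple binomial test model; the test length, cut-score, and true-score distribution below are hypothetical. Classification consistency is the probability of the same pass/fail decision on two parallel forms; classification accuracy is the probability that the observed decision matches the decision implied by the true score.

```python
# Illustrative simulation of classification consistency vs. accuracy
# under a binomial error model. All parameter values are hypothetical.
import random

random.seed(1)
N_ITEMS, CUT = 65, 52          # hypothetical test length and cut-off score

def simulate(n_examinees=10_000):
    consistent = accurate = 0
    for _ in range(n_examinees):
        tau = random.betavariate(8, 2)        # examinee's true proportion correct
        true_pass = tau * N_ITEMS >= CUT      # decision implied by the true score
        # Two parallel administrations with binomial measurement error
        x1 = sum(random.random() < tau for _ in range(N_ITEMS))
        x2 = sum(random.random() < tau for _ in range(N_ITEMS))
        consistent += (x1 >= CUT) == (x2 >= CUT)
        accurate += (x1 >= CUT) == true_pass
    return consistent / n_examinees, accurate / n_examinees
```

Because many simulated examinees sit near the cut-score here, both proportions stay well below 1, which mirrors why tests with means close to the cut-score show lower consistency.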

Place, publisher, year, edition, pages
International Journal of Assessment Tools in Education (IJATE), 2016. Vol. 3, no. 2, pp. 137-150
Keywords [en]
reliability, criterion-referenced test, driving licence test, classification consistency, decision consistency, single administration
National subject category
Probability Theory and Statistics; Educational Sciences
Identifiers
URN: urn:nbn:se:umu:diva-139826
DOI: 10.21449/ijate.245198
ISI: 000409392100003
OAI: oai:DiVA.org:umu-139826
DiVA id: diva2:1143724
Available from: 2017-09-22. Created: 2017-09-22. Last updated: 2019-10-14. Bibliographically reviewed.
Part of thesis
1. Licence to drive: the importance of reliability for the validity of the Swedish driving licence test
2019 (English). Licentiate thesis, compilation (other academic).
Abstract [en]

Background: The Swedish driving licence test is a criterion-referenced test resulting in a pass or fail. It currently consists of two parts: a theory test with 65 multiple-choice items and a practical driving test in which at least 25 minutes are spent driving in traffic. It is a high-stakes test in the sense that the results are used to determine whether the test-taker should be allowed to drive a car without supervision. As the only other requirements for obtaining a licence are a few hours of hazard education (and a short introduction if you intend to drive with a lay instructor), it is important that the test result, in terms of pass or fail, is reliable and valid. If this is not the case, it could have detrimental effects on traffic safety. Examining all relevant aspects is beyond the scope of this licentiate thesis, so I have focused on reliability.

Methods: Reliability of both the theoretical and practical test results was examined. As these are very different types of tests, the types of reliability examined also differed. To examine the inter-rater reliability of the driving test, 83 examiners were accompanied by one of five selected supervising examiners for a day of tests. In all, 535 tests were conducted with two examiners assessing the same performance. At the end of the day the examiners compared notes and tried to determine the reason for any inconsistencies. Both examiners and students also filled in questionnaires with questions about background and preparation. To study the decision consistency and decision accuracy of the theory test, three test versions (a total of around 12,000 tests) were examined using methods devised by Subkoviak (Subkoviak, 1976, 1988) and Hanson & Brennan (Brennan, 2004; Hanson & Brennan, 1990).
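Subkoviak's single-administration agreement index can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes a binomial error model, observed raw scores, a cut-score, and an externally supplied reliability estimate used for the Kelley regression step; the example values in the usage note are hypothetical.

```python
# Sketch of Subkoviak's (1976) single-administration agreement index
# under a binomial error model (illustrative, not the thesis code).
import math

def agreement_index(scores, n_items, cut, reliability):
    """Estimate classification consistency from one administration.

    For each examinee, regress the observed proportion correct toward the
    group mean (Kelley's formula), compute the binomial probability of
    passing a parallel administration, and average the probability of a
    consistent pass-pass or fail-fail decision across examinees.
    """
    mean_p = sum(scores) / (len(scores) * n_items)
    total = 0.0
    for x in scores:
        # Kelley regression toward the mean gives an estimated true proportion
        tau = reliability * (x / n_items) + (1 - reliability) * mean_p
        # Probability of scoring at or above the cut on a parallel form
        p_pass = sum(math.comb(n_items, k) * tau**k * (1 - tau)**(n_items - k)
                     for k in range(cut, n_items + 1))
        # Consistent classification: pass both times or fail both times
        total += p_pass**2 + (1 - p_pass)**2
    return total / len(scores)
```

With scores far from the cut-score the index approaches 1; with a score distribution centred on the cut-score it approaches its lower bound of .5, which is why a mean near the cut-score (as found in Study II) depresses consistency.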

Results: The results from two research studies concerning reliability were presented. Study I focused on inter-rater reliability in the driving test; in 93 per cent of cases the examiners made the same assessment. For the tests where their opinions differed there was no correlation with any of the background or other variables examined, except for three, which had logical explanations and did not constitute a problem. Although there were cases where the differences were due to different stances on matters of interpretation, the most commonly suggested cause was the placement in the car (back seat vs. front seat). Although the supervising examiners gave both praise and criticism as to how the test was carried out, the study does not answer the question of whether the tests were equal in terms of composition and difficulty.

In Study II the focus was on decision consistency and decision accuracy in the theory test. Three versions of the theory test were examined and, on the whole, found to be fairly similar in terms of item difficulty and score distribution, but the mean was so close to the cut-score (i.e. the score required to pass) that the pass rate differed somewhat between versions. Agreement coefficients were around .80 for all test versions (between .79 and .82 depending on method). Classification accuracy indicated a .87 probability of a correct classification.

Conclusion: It is important to examine the reliability and validity of the driving licence test, since a misclassification can have serious consequences in terms of traffic safety. In the studies included here the rate of agreement between examiners is deemed satisfactory. Given its importance, it would be preferable if the classification consistency and classification accuracy of the theory test, as estimated by the methods used, were higher.

While reliability in terms of agreement between raters/examiners, or consistency and accuracy of classification, is routinely examined in other contexts, such as large-scale educational testing, this is not often done for driving licence tests. At the same time, the methods used here can be transferred to contexts where such properties are generally not examined. Collecting information about test-takers and examiners, as in Study I, can provide evidence concerning possible bias.

Examining to what extent decisions are consistent is one important aspect of collecting evidence that shows that test results can be used to draw conclusions about driver competence. Still, regardless of outcome, validation is a process that never ends. There is always reason to examine various aspects and make further improvements. There are also many other relevant aspects to examine. A prerequisite for the validity of the score interpretation of a criterion-referenced test like this one is that the cut-score is appropriate and the content relevant. This should therefore be the subject of further research as the validation process continues.

Place, publisher, year, edition, pages
Umeå: Department of Applied Educational Science, Educational Measurement, Umeå University, 2019. 56 pp.
Series
Academic dissertations at the department of Educational Measurement, ISSN 1652-9650; 12
Keywords
Driving licence tests, driver's licence, driving test, theory test, licensing test, interrater reliability, classification consistency, examiner agreement, classification accuracy, förarprov, körprov, kunskapsprov, reliabilitet, validitet, bedömare
National subject category
Educational Sciences
Research subject
Behavioural science measurements
Identifiers
URN: urn:nbn:se:umu:diva-163949
ISBN: 9789178551156
Presentation
2019-10-25, Aulan, Vårdvetarhuset, Umeå, 10:00 (Swedish)
Available from: 2019-10-14. Created: 2019-10-12. Last updated: 2019-10-14. Bibliographically reviewed.

Open Access in DiVA

Full text not available in DiVA

Person records

Alger, Susanne
