Dimensions of validity: studies of the Swedish national tests in mathematics
Umeå University, Faculty of Social Sciences, Department of Applied Educational Science, Educational Measurement (Beteendevetenskapliga mätningar, BVM). ORCID iD: 0000-0003-3028-1299
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title: Aspekter av validitet : studier av de Svenska nationella proven i matematik (Swedish)
Abstract [en]

The main purpose of the Swedish national tests was, from the beginning, to provide exemplary assessments in a subject and to support teachers when interpreting the syllabus. Today, their main purpose is to provide an important basis for teachers when grading their students. Although the test results do not entirely decide a student’s grade, they are to be taken into special account in the grading process. Given the increasing importance of the tests and the rising stakes, quality issues in terms of validity and reliability are attracting greater attention. The main purpose of this thesis is to examine evidence demonstrating the validity of the Swedish national tests in upper secondary school mathematics and thereby identify potential threats to validity that may affect the interpretations of the test results and lead to invalid conclusions. The validation is made in relation to the purpose that the national tests should support fair and equal assessment and grading. More specifically, the focus was to investigate how differences connected to digital tools, different scorers and the standard setting process affect the results, and also to investigate whether subscores can be used when interpreting the results. A model visualized as a chain containing links associated with various aspects of validity, ranging from administration and scoring to interpretation and decision-making, is used as a framework for the validation.

The thesis consists of four empirical studies presented in the form of papers and an introduction with summaries of the papers. Different parts of the validation chain are examined in the studies. The focus of the first study is the administration and impact of using advanced calculators when answering test items. These calculators are able to solve equations algebraically and therefore reduce the risk of a student making mistakes. Because the use of such calculators is allowed but not required, and because they are quite expensive, there is an obvious threat to validity, as the national tests are supposed to be fair and equal for all test takers. The results show that the advanced calculators were not used to a great extent and that it was mainly students who were high-achieving in mathematics who benefited the most. Therefore, the conclusion was that the calculators did not affect the results.

The second study was an inter-rater reliability study. In Sweden, teachers are responsible for scoring their own students’ national tests, without any training, monitoring or moderation. Therefore it was interesting to investigate the reliability of the scoring since there is a potential risk of bias against one’s own students. The analyses showed that the agreement between different raters, analyzed with percent-agreement and kappa, is rather high but some items have lower agreement. In general, items with several correct answers or items where different solution strategies are available are more difficult to score reliably.
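As a rough illustration of the two agreement measures named above, the following Python sketch computes percent agreement and kappa for two raters scoring the same set of answers. It assumes Cohen's kappa for a pair of raters (the abstract does not say which kappa variant was used), and the function names and score lists are hypothetical, not data from the thesis.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of answers on which the two raters awarded the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of answers")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the raters' marginal proportions per score.
    p_expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one and the same score throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical item scores (0-2 points) from two teachers on ten answers.
    teacher_1 = [0, 1, 1, 2, 0, 1, 2, 2, 1, 0]
    teacher_2 = [0, 1, 2, 2, 0, 1, 2, 1, 1, 0]
    print(percent_agreement(teacher_1, teacher_2))  # 0.8
    print(round(cohens_kappa(teacher_1, teacher_2), 3))  # 0.697
```

Kappa is lower than percent agreement here because part of the raw agreement would be expected even if the two raters scored at random with the same score distributions.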

In study three, the cut scores set by a judgmental Angoff standard setting, the method used to define the cut scores for the national tests in mathematics, were compared with a statistical linking procedure using an anchor test design in order to investigate whether the cut scores for two test forms were equally demanding. The results indicate that there were no large differences between the test forms. However, one of the test taker groups was rather small, which restricts the power of the analysis. The national tests do not include any anchor items, and the study highlights the challenges of introducing equating, that is, comparing the difficulty of different test forms, on a regular basis.

In study four, the focus was on subscores and whether there was any value in reporting them in addition to the total score. The syllabus in mathematics has been competence-based since 2011 and the items in the national tests are categorized in relation to these competencies. The test grades are only connected to the total score via the cut scores but the result for each student is consolidated in a result profile based on those competencies. The subscore analysis shows that none of the subscores have added value and the tests would have to be up to four times longer in order to achieve any significant value.
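The relationship between test length and reliability behind a statement such as "up to four times longer" can be sketched with the classical Spearman-Brown prophecy formula. This is only an assumption for illustration: the abstract does not state which method the study used, the reliability values below are hypothetical, and an added-value analysis of subscores normally involves more than reliability alone.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    with comparable items (classical Spearman-Brown prophecy formula)."""
    if not 0 < reliability < 1:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def required_length_factor(reliability, target):
    """Length factor needed to raise `reliability` to `target`
    (the Spearman-Brown formula solved for the length factor)."""
    if not 0 < reliability < 1 or not 0 < target < 1:
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - reliability) / (reliability * (1 - target))


if __name__ == "__main__":
    # Hypothetical subscore with reliability 0.50 and a target of 0.80.
    print(required_length_factor(0.50, 0.80))
    print(spearman_brown(0.50, 4))
```

With these made-up numbers, a subscore with reliability 0.50 would need about four times as many items to reach 0.80, which shows how quickly the required test length grows when a subscore is based on few items.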

In conclusion, the studies indicate that several of the potential threats do not appear to be significant and the evidence suggests that the interpretations made and decisions taken have the potential to be valid. However, there is a need for further studies. In particular, there is a need to develop a procedure for equating that can be implemented on a regular basis.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2018. p. 61
Series
Academic dissertations at the department of Educational Measurement, ISSN 1652-9650 ; 11
Keywords [en]
national tests, validity, interrater reliability, standard setting, linking, subscores, test development
National Category
Pedagogical Work
Research subject
educational measurement
Identifiers
URN: urn:nbn:se:umu:diva-153056; ISBN: 978-91-7601-936-8 (print); OAI: oai:DiVA.org:umu-153056; DiVA, id: diva2:1260770
Public defence
2018-11-30, KBE303, Stora hörsalen i KBC-huset, Umeå, 10:00 (English)
Opponent
Supervisors
Available from: 2018-11-09 Created: 2018-11-05 Last updated: 2018-11-20. Bibliographically approved
List of papers
1. Students’ use of CAS calculators: effects on the trustworthiness and fairness of mathematics assessments
2012 (English). In: International journal of mathematical education in science and technology, ISSN 0020-739X, E-ISSN 1464-5211, Vol. 43, no. 7, p. 843-861. Article in journal (Refereed). Published
Abstract [en]

Calculators with computer algebra systems (CAS) are powerful tools when working with equations and algebraic expressions in mathematics. When the calculators are allowed to be used during assessments but are not available or provided to every student, they may cause bias. The CAS calculators may also have an impact on the trustworthiness of results.

In this study, students’ use of the CAS calculator in their work with released assessment items from TIMSS Advanced 2008 is studied using two approaches. Eight students familiar with CAS, from two mathematics classes in the 12th form, were video recorded while being encouraged to think aloud during their work with the items. In addition, a questionnaire was distributed to all 33 students in the two classes who had been working with a CAS.

The main finding is that even if the students are used to working with the CAS calculator, they are not using the calculator to a large extent. The analysis indicates that the difference in performance between the high- and low-achieving students has slightly increased due to the use of the calculator. From a validity perspective one could therefore argue that the CAS calculator is no major threat to the trustworthiness of the assessment. Nevertheless, the result indicates that those students in the study, mainly high achieving, who know how to use the CAS calculator, get an additional advantage. The advantage brings an amount of unfairness into the assessment and could be a threat to the trustworthiness and fairness.

Place, publisher, year, edition, pages
Taylor & Francis Group, 2012
Keywords
computer algebra system; assessment; trustworthiness; fairness; validity
National Category
Pedagogical Work
Research subject
educational measurement
Identifiers
urn:nbn:se:umu:diva-54359 (URN); 10.1080/0020739X.2012.662289 (DOI); 2-s2.0-84867250871 (Scopus ID)
Available from: 2012-04-25 Created: 2012-04-24 Last updated: 2024-07-02. Bibliographically approved
2. Interrater reliability in large-scale assessments: can teachers score national tests reliably without external controls?
2015 (English). In: Practical Assessment, Research, and Evaluation, E-ISSN 1531-7714, Vol. 20, no. 9. Article in journal (Refereed). Published
Abstract [en]

In most large-scale assessment systems, a set of rather expensive external quality controls is implemented in order to guarantee interrater reliability. This study empirically examines whether teachers’ ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality assurance. A sample of 99 booklets of students’ answers to a national test in mathematics was scored by five teachers independently. The interrater reliability was analyzed using consensus and consistency estimates, with the focus on the test as a whole, as well as on individual items. The results show that the estimates are acceptable and in many cases fairly high, irrespective of the reliability measure used. Some plausible explanations for lower interrater reliability in individual items are discussed, and some suggestions are made for further improving reliability without imposing any system of control.

National Category
Pedagogical Work
Identifiers
urn:nbn:se:umu:diva-101511 (URN)
Available from: 2015-03-31 Created: 2015-03-31 Last updated: 2024-07-02. Bibliographically approved
3. Validating standard setting: comparing judgmental and statistical linking
2017 (English). In: Standard setting in education: the Nordic countries in an international perspective / [ed] Sigrid Blömeke; Jan-Eric Gustafsson, Cham: Springer, 2017, 1, p. 143-160. Chapter in book (Refereed)
Abstract [en]

This study presents a validation of the proposed cut scores for two test forms in mathematics that were developed from the same syllabus and blueprint. The external validity was analyzed by comparing the cut scores set by an Angoff procedure with the results provided by mean and linear observed score equating procedures. A non-equivalent group anchor test (NEAT) design was also used. The results provide evidence that the cut scores obtained through both judgmental and statistical linking are equivalent. However, the equating procedure revealed several methodological and practical challenges.
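The two equating methods named in the abstract can be sketched in their textbook form. The Python example below assumes that the two groups of test takers are equivalent, so the observed means and standard deviations can be used directly. Under a NEAT design these moments would first be adjusted using the anchor test (for example with Tucker or Levine estimates), a step left out here. The function names, scores and cut score are hypothetical.

```python
from statistics import mean, pstdev


def mean_equate(x, scores_x, scores_y):
    """Mean equating: shift a form X score by the difference in form means."""
    return x - mean(scores_x) + mean(scores_y)


def linear_equate(x, scores_x, scores_y):
    """Linear equating: match both the mean and the standard deviation,
    so equal z-scores on the two forms are treated as equivalent."""
    sd_x = pstdev(scores_x)
    sd_y = pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("form X scores have no variance")
    return mean(scores_y) + (sd_y / sd_x) * (x - mean(scores_x))


if __name__ == "__main__":
    # Hypothetical total scores on two test forms.
    form_x = [10, 12, 14, 16, 18]
    form_y = [13, 16, 19, 22, 25]
    cut_score_x = 12  # hypothetical cut score set on form X
    print(mean_equate(cut_score_x, form_x, form_y))
    print(linear_equate(cut_score_x, form_x, form_y))
```

Comparing the equated value of a form X cut score with the cut score set judgmentally on form Y is one way to check whether the two cut scores are equally demanding.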

Place, publisher, year, edition, pages
Cham: Springer, 2017. Edition: 1
Series
Methodology of Educational Measurement and Assessment, ISSN 2367-170X, E-ISSN 2367-1718 ; 1
Keywords
Standard setting, Equating, National testing, Equivalence, Fairness
National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-132736 (URN); 10.1007/978-3-319-50856-6_9 (DOI); 2-s2.0-85151490015 (Scopus ID); 978-3-319-50855-9 (ISBN); 978-3-319-50856-6 (ISBN)
Available from: 2017-03-22 Created: 2017-03-22 Last updated: 2023-04-13. Bibliographically approved
4. Using summative tests for formative purposes: an analysis of the added value of subscores
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Knowledge tests, both standardized and teacher developed, are central in teachers’ daily work when forming decisions on student achievement. Although it is recommended that a test should be used only for its intended purpose, tests that were designed for summative purposes are nevertheless used for giving feedback or making formative decisions. The purpose of this paper is to investigate whether a summative test within the Swedish national test framework can provide meaningful information for formative use by testing its reliability on the subscore level. The study also aims to analyze whether a Swedish national test can be used to provide guidance for practitioners who wish to use the information on the subscore level for planning instruction as well as for other formative purposes, as sometimes implied in the information to teachers.

Keywords
Subscores, test development, classroom assessments
National Category
Pedagogical Work
Research subject
educational measurement
Identifiers
urn:nbn:se:umu:diva-152962 (URN)
Available from: 2018-11-05 Created: 2018-11-05 Last updated: 2018-11-05

Open Access in DiVA

fulltext (772 kB), 1958 downloads
File information
File name: FULLTEXT01.pdf; File size: 772 kB; Checksum: SHA-512
b1e419dc5c58bb7780eac640dda7315228f423de8c5e7be3be2d8ae07c14e6231e49cb51d84f1a8fcccb1c05cca28fa61bd9abf6cba204fd7bbe6bbcf65668cf
Type: fulltext; Mimetype: application/pdf
spikblad (65 kB), 136 downloads
File information
File name: SPIKBLAD01.pdf; File size: 65 kB; Checksum: SHA-512
bf58c6d7d339075e36f97977ceb7865ae8f3d65295c38486a99a783253a1ef9f1072afd05ef40d0c595b51dc7b6954f2d10c8f088ff9f7735485551bf56acd4a
Type: spikblad; Mimetype: application/pdf

Person

Lind Pantzare, Anna
