Umeå University

Publications (10 of 16)
Lind Pantzare, A. (2023). Towards a fairer and more equitable national test system: focusing standard setting and equating. In: AEA '23: Assessment reform journeys: intentions, enactment and evaluation: book of abstracts. Paper presented at AEA-Europe 2023, "Assessment reform journeys: intentions, enactment and evaluation", Sliema, Malta, November 1-4, 2023 (p. 169).
2023 (English). In: AEA '23: Assessment reform journeys: intentions, enactment and evaluation: book of abstracts, 2023, p. 169. Conference paper, Oral presentation with published abstract (Refereed)
National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-222712 (URN)
Conference
AEA-Europe 2023, "Assessment reform journeys: intentions, enactment and evaluation", Sliema, Malta, November 1-4, 2023
Available from: 2024-03-26. Created: 2024-03-26. Last updated: 2024-03-26. Bibliographically approved
Lind Pantzare, A. (2022). The future of national tests: Comparing paper-based and digital assessments in upper secondary school mathematics. In: AEA '22: Book of Abstracts. Paper presented at AEA - Europe 2022, "New visions for assessment in uncertain times", Dublin, Ireland, November 9-12, 2022 (p. 142).
2022 (English). In: AEA '22: Book of Abstracts, 2022, p. 142. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

This presentation reports results from a study in which two separate test forms in mathematics, one paper-based and one digital, were administered to two groups of students. The aim of the study was to learn more about differences in item difficulty depending on whether mathematics items are presented on paper and answered with pencil, or administered digitally with the answers also given digitally. A second issue investigated was cognitive workload and the use of scratch paper. The hypothesis was that when solving short-answer items digitally, students rely only on mental arithmetic rather than scratch paper, and that a possible lack of scratch-paper use might affect the proportion of students solving the item correctly. A third issue concerns writing down solutions to mathematical problems using an equation editor, which will be necessary when the national tests are digitalised in a couple of years.

National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-222713 (URN)
Conference
AEA - Europe 2022, "New visions for assessment in uncertain times", Dublin, Ireland, November 9-12, 2022
Available from: 2024-03-26. Created: 2024-03-26. Last updated: 2024-03-26. Bibliographically approved
Wiberg, M., Lyrén, P.-E. & Lind Pantzare, A. (2021). Schools, Universities and Large-Scale Assessment Responses to COVID-19: The Swedish Example. Education Sciences, 11(4), Article ID 175.
2021 (English). In: Education Sciences, E-ISSN 2227-7102, Vol. 11, no 4, article id 175. Article in journal (Refereed), Published
Abstract [en]

The aim of this paper is to describe, analyze, and discuss how Swedish schools and the national tests in schools, university teaching and examination, and the college admissions test, the Swedish Scholastic Aptitude Test (SweSAT), have been affected by the COVID-19 situation. A further aim is to discuss the challenges that the COVID-19 situation has created for schools, universities and the admissions test process in Sweden. Unlike in many other countries, Swedish schools remained open, except for upper secondary schools and universities, where teaching moved online. However, the spring administrations of the national tests and of the high-stakes college admissions test, SweSAT, were cancelled, which affected university admissions in the fall. The challenges are discussed using documentation from the news and from school and university authorities, as well as government reports on the events and a student survey. The novelty of this study lies in its discussion of the events and the challenges they raise. A discussion of what could be learned and what to expect in the near future is included, as well as conclusions that can be drawn from this situation.

Place, publisher, year, edition, pages
MDPI, 2021
Keywords
COVID-19, school response, testing challenges, testing situations, university response
National Category
Pedagogy
Identifiers
urn:nbn:se:umu:diva-183569 (URN); 10.3390/educsci11040175 (DOI); 000642971200001; 2-s2.0-85105214686 (Scopus ID)
Available from: 2021-06-14. Created: 2021-06-14. Last updated: 2023-12-01. Bibliographically approved
Lind Pantzare, A. (2018). Dimensions of validity: studies of the Swedish national tests in mathematics. (Doctoral dissertation). Umeå: Umeå universitet
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Aspekter av validitet : studier av de Svenska nationella proven i matematik (Aspects of validity: studies of the Swedish national tests in mathematics)
Abstract [en]

Originally, the main purpose of the Swedish national tests was to provide exemplary assessments in a subject and to support teachers in interpreting the syllabus. Today, their main purpose is to provide an important basis for teachers when grading their students. Although test results do not entirely decide a student's grade, they are to be taken into special account in the grading process. Given their increasing importance and the rise in stakes, quality issues in terms of validity and reliability are attracting greater attention. The main purpose of this thesis is to examine evidence of the validity of the Swedish national tests in upper secondary school mathematics and thereby identify potential threats to validity that may affect the interpretation of test results and lead to invalid conclusions. The validation is carried out in relation to the purpose that the national tests should support fair and equal assessment and grading. More specifically, the focus was on investigating how differences connected to digital tools, different scorers and the standard-setting process affect the results, and on whether subscores can be used when interpreting the results. A model visualized as a chain of links associated with various aspects of validity, ranging from administration and scoring to interpretation and decision-making, is used as the framework for the validation.

The thesis consists of four empirical studies, presented as papers, and an introduction with summaries of the papers. The studies examine different parts of the validation chain. The first study focuses on administration and on the impact of using advanced calculators when answering test items. These calculators can solve equations algebraically and therefore reduce the risk of a student making mistakes. Because such calculators are allowed but not required, and are quite expensive, they pose an obvious threat to validity, since the national tests are supposed to be fair and equal for all test takers. The results show that the advanced calculators were not used to a great extent and that it was mainly students who were high-achieving in mathematics who benefited the most. The conclusion was therefore that the calculators did not affect the results.

The second study was an inter-rater reliability study. In Sweden, teachers are responsible for scoring their own students' national tests, without any training, monitoring or moderation. It was therefore of interest to investigate the reliability of the scoring, since there is a potential risk of bias when teachers score their own students. The analyses, using percent agreement and kappa, showed that agreement between raters was rather high, although some items had lower agreement. In general, items with several correct answers, or items that allow different solution strategies, are more difficult to score reliably.
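
The two agreement statistics named above can be sketched for a single pair of raters. This is a minimal illustration, assuming Cohen's kappa for two raters and integer item scores; the abstract does not say which kappa variant the study used (it involved more than two raters), and the function names and ratings below are hypothetical.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Share of responses on which two raters gave identical scores."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohens_kappa(r1, r2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(r1)
    p_obs = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the raters' marginal proportions, summed over score categories.
    p_chance = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_obs - p_chance) / (1 - p_chance)


# Hypothetical item scores (0-2 points) given by two teachers to eight responses.
r1 = [0, 1, 2, 2, 1, 0, 1, 2]
r2 = [0, 1, 2, 1, 1, 0, 2, 2]
print(percent_agreement(r1, r2))       # 0.75
print(round(cohens_kappa(r1, r2), 3))  # 0.619
```

Kappa comes out lower than raw agreement because part of the 75 % agreement would be expected by chance given how often each rater uses each score.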

In study three, the cut scores set by a judgmental Angoff standard-setting procedure, the method used to define the cut scores for the national tests in mathematics, were compared with a statistical linking procedure using an anchor test design, in order to investigate whether the cut scores for two test forms were equally demanding. The results indicate that there were no large differences between the test forms. However, one of the test-taker groups was rather small, which restricts the power of the analysis. The national tests do not include any anchor items, and the study highlights the challenges of introducing equating, that is, comparing the difficulty of different test forms, on a regular basis.
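
The arithmetic behind a judgmental Angoff cut score can be sketched as follows. This assumes the common modified-Angoff variant in which each judge estimates the expected score of a borderline (minimally qualified) student on every item; how the national tests handle several grade cut scores, rounding, or multiple rating rounds is not described here, so the function name and ratings are hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Modified Angoff cut score (sketch).

    ratings[j][i] is judge j's estimate of the expected score a borderline
    student would obtain on item i: a probability for a 1-point item, or
    expected points for a multi-point item.
    """
    if not ratings or not ratings[0] or any(len(r) != len(ratings[0]) for r in ratings):
        raise ValueError("each judge must rate the same, non-empty set of items")
    # Each judge's implied cut score is the sum of their item-level estimates.
    per_judge = [sum(r) for r in ratings]
    # The panel's recommended cut score is the mean across judges.
    return mean(per_judge)


# Hypothetical panel: three judges, three items (the third item is worth 2 points).
ratings = [
    [0.6, 0.5, 1.2],
    [0.7, 0.4, 1.0],
    [0.5, 0.6, 1.1],
]
print(round(angoff_cut_score(ratings), 2))  # 2.2
```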

Study four focused on subscores and whether there is any value in reporting them in addition to the total score. The mathematics syllabus has been competence-based since 2011, and the items in the national tests are categorized in relation to these competencies. Test grades are linked only to the total score, via the cut scores, but each student's results are also summarized in a result profile based on those competencies. The subscore analysis shows that none of the subscores have added value and that the tests would have to be up to four times longer for the subscores to achieve any significant added value.
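
The abstract does not say how the "up to four times longer" figure was derived. One standard way to reason about test length and reliability is the Spearman-Brown prophecy formula, sketched below under the assumption that added items are parallel to the existing ones; it is offered as an illustration of the length argument, not as the thesis's added-value analysis, and the reliabilities used are made up.

```python
def spearman_brown(reliability, k):
    """Predicted reliability after lengthening a (sub)test by factor k with parallel items."""
    if not 0 < reliability < 1 or k <= 0:
        raise ValueError("reliability must be in (0, 1) and k positive")
    return k * reliability / (1 + (k - 1) * reliability)


def length_factor(reliability, target):
    """Lengthening factor k needed to raise `reliability` to `target`."""
    if not 0 < reliability < 1 or not 0 < target < 1:
        raise ValueError("reliabilities must be in (0, 1)")
    # Solve target = k*r / (1 + (k-1)*r) for k.
    return target * (1 - reliability) / (reliability * (1 - target))


# Hypothetical subscore with reliability 0.6.
print(round(spearman_brown(0.6, 4), 3))   # 0.857
print(round(length_factor(0.6, 0.8), 2))  # 2.67
```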

In conclusion, the studies indicate that several of the potential threats do not appear to be significant, and the evidence suggests that the interpretations made and decisions taken have the potential to be valid. However, further studies are needed. In particular, a procedure for equating that can be implemented on a regular basis needs to be developed.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2018. p. 61
Series
Academic dissertations at the department of Educational Measurement, ISSN 1652-9650 ; 11
Keywords
national tests, validity, interrater reliability, standard setting, linking, subscores, test development
National Category
Pedagogical Work
Research subject
didactics of educational measurement
Identifiers
urn:nbn:se:umu:diva-153056 (URN); 978-91-7601-936-8 (ISBN)
Public defence
2018-11-30, KBE303, Stora hörsalen i KBC-huset, Umeå, 10:00 (English)
Available from: 2018-11-09. Created: 2018-11-05. Last updated: 2018-11-20. Bibliographically approved
Wikström, C. & Lind Pantzare, A. (2018). Standard setting in Sweden: school grades and national tests. In: Jo-Anne Baird, Tina Isaacs, Dennis Opposs, Lena Gray (Eds.), Examination standards: how measures and meanings differ around the world (pp. 235-251). London, UK: UCL Press
2018 (English). In: Examination standards: how measures and meanings differ around the world / [ed] Jo-Anne Baird, Tina Isaacs, Dennis Opposs, Lena Gray, London, UK: UCL Press, 2018, p. 235-251. Chapter in book (Other academic)
Place, publisher, year, edition, pages
London, UK: UCL Press, 2018
National Category
Educational Sciences
Research subject
education
Identifiers
urn:nbn:se:umu:diva-152945 (URN); 978-1-78277-260-6 (ISBN); 978-1-78277-261-3 (ISBN); 978-1-78277-262-0 (ISBN); 978-1-78277-263-7 (ISBN)
Available from: 2018-10-30. Created: 2018-10-30. Last updated: 2019-05-29. Bibliographically approved
Lind Pantzare, A. (2017). Validating standard setting: comparing judgmental and statistical linking (1st ed.). In: Sigrid Blömeke, Jan-Eric Gustafsson (Eds.), Standard setting in education: the Nordic countries in an international perspective (pp. 143-160). Cham: Springer
2017 (English). In: Standard setting in education: the Nordic countries in an international perspective / [ed] Sigrid Blömeke; Jan-Eric Gustafsson, Cham: Springer, 2017, 1st ed., p. 143-160. Chapter in book (Refereed)
Abstract [en]

This study presents a validation of the proposed cut scores for two test forms in mathematics that were developed from the same syllabus and blueprint. External validity was analyzed by comparing the cut scores set by an Angoff procedure with the results of mean and linear observed-score equating procedures, using a non-equivalent groups anchor test (NEAT) design. The results provide evidence that the cut scores obtained through judgmental and statistical linking are equivalent. However, the equating procedure revealed several methodological and practical challenges.
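
Mean and linear observed-score equating under a NEAT design can be sketched with the chained approach: form X is linked to the anchor V in the group that took X, and V is then linked to form Y in the group that took Y. The abstract does not say which NEAT variant the chapter used (chained, Tucker, Levine, and so on), so chained equating is an assumption here, and the scores below are invented.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores, mean_only=False):
    """Map the from_scores scale onto the to_scores scale, estimated in one group.

    Mean equating keeps a slope of 1; linear equating also matches standard deviations.
    """
    mu_from, mu_to = mean(from_scores), mean(to_scores)
    slope = 1.0 if mean_only else pstdev(to_scores) / pstdev(from_scores)
    return lambda s: mu_to + slope * (s - mu_from)


def chained_equating(x_p, v_p, y_q, v_q, mean_only=False):
    """Chained mean/linear equating of form X onto form Y under a NEAT design.

    Group P took form X and anchor V; group Q took form Y and anchor V.
    """
    x_to_v = linear_link(x_p, v_p, mean_only)  # X -> V, estimated in group P
    v_to_y = linear_link(v_q, y_q, mean_only)  # V -> Y, estimated in group Q
    return lambda x: v_to_y(x_to_v(x))


# Invented total and anchor scores for two small groups.
x_p, v_p = [10, 20, 30], [5, 10, 15]
y_q, v_q = [16, 22, 28], [6, 10, 14]
linear = chained_equating(x_p, v_p, y_q, v_q)
mean_eq = chained_equating(x_p, v_p, y_q, v_q, mean_only=True)
print(round(linear(30), 6))   # 29.5
print(round(mean_eq(30), 6))  # 32.0
```

Passing a form-X cut score through the equating function gives the statistically equivalent form-Y score, which can then be set against the Angoff cut score on form Y; with very small groups, as noted in the thesis summary, these estimates are unstable.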

Place, publisher, year, edition, pages
Cham: Springer, 2017 Edition: 1
Series
Methodology of Educational Measurement and Assessment, ISSN 2367-170X, E-ISSN 2367-1718 ; 1
Keywords
Standard setting, Equating, National testing, Equivalence, Fairness
National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-132736 (URN); 10.1007/978-3-319-50856-6_9 (DOI); 2-s2.0-85151490015 (Scopus ID); 978-3-319-50855-9 (ISBN); 978-3-319-50856-6 (ISBN)
Available from: 2017-03-22. Created: 2017-03-22. Last updated: 2023-04-13. Bibliographically approved
Lind Pantzare, A. (2017). Validity issues in educational measurement - should subscores in national tests be reported or not? In: Assessment cultures in a globalised world: The 18th Annual AEA-Europe Conference - Programme. Paper presented at AEA-Europe 2017; 18th annual meeting of the Association for Educational Assessment - Europe, Prague, Czech Republic, November 9-11, 2017 (pp. 46-47).
2017 (English). In: Assessment cultures in a globalised world: The 18th Annual AEA-Europe Conference - Programme, 2017, p. 46-47. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

In the Swedish criterion-referenced school system, teachers are trusted to teach, assess and grade their students. The grading is high-stakes, since the grades are used for admission to higher education. There are national tests for some of the courses. However, the national tests are not final examinations, and their results are not decisive in the grading. The main aim of the national tests is to support fairness and equality when assessing and grading students.

In 2011, new syllabuses for upper secondary school were introduced. In mathematics, the most obvious and visible change was an ambition to place an even greater focus on competencies rather than content.

Results on the Swedish national tests are reported as a test grade. It has also been taken for granted that the only reasonable approach is to report the total result and nothing else. However, there has been increasing demand to report not only results based on the total score but also subscores connected to the competencies. The question is whether the national tests are developed in such a way that reporting subscores based on the competencies is possible and relevant.

National Category
Educational Sciences
Research subject
didactics of educational measurement
Identifiers
urn:nbn:se:umu:diva-142158 (URN)
Conference
AEA-Europe 2017; 18th annual meeting of the Association for Educational Assessment - Europe, Prague, Czech Republic, November 9-11, 2017
Available from: 2017-11-24. Created: 2017-11-24. Last updated: 2020-11-25. Bibliographically approved
Lind Pantzare, A. & Almarlind, P. (2016). Are the influences from social and political agents beneficial and a necessity in the development and validation of educational assessments? In: Social and Political underpinnings of educational assessment: Past, present and future: the 17th Annual AEA-Europe Conference: programme. Paper presented at the 17th annual AEA-Europe Conference, 2-5 November 2016, Limassol, Cyprus (pp. 149-150). AEA-Europe
2016 (English). In: Social and Political underpinnings of educational assessment: Past, present and future: the 17th Annual AEA-Europe Conference: programme, AEA-Europe, 2016, p. 149-150. Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

In order to produce sound and valid tests with respect to the Standards (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014), many issues must be considered and taken into account, especially when developing tests at the national level. In addition, several stakeholders often seek to influence the tests in different directions. Sometimes these stakeholders agree, but often their requests are diametrically opposed, and it is not unusual for the requests to conflict with good measurement practice. Caught in the middle are the test development organisations, commissioned to develop products that are valid in relation to the aim or aims of the test, and therefore developed within sound measurement practice, but that are also accepted by all users.

As described in the theme of the conference, this external influence from stakeholders on the tests is immense and is sometimes described as purely negative. Politicians often use educational assessments, such as national tests or exams, on the one hand to control the school system and on the other hand to implement changes. At the same time, politicians are sensitive to reactions from teachers, parents and other stakeholders, since these are important groups of voters. In Sweden, the debate is currently focused on the large number of national tests and the workload they entail for teachers and also for students. A recently published government-appointed inquiry (SOU 2016:25, 2016) suggests that the number of national tests should be reduced, that the remaining tests should be less extensive, and that the tests should be easier to administer and mark, which will probably affect the validity of the tests.

From a test development perspective, these external influences could be seen as problematic, since they often introduce (rapid) changes to the tests. On the other hand, one could argue that these external influences are necessary prerequisites for an ongoing process of developing the tests so that they become even more cost-effective, valid and valued by users.

We think it would be interesting to discuss this complex system of, on the one hand, social and political agents trying to influence and change the national assessment systems and, on the other hand, test development organisations aiming to develop valid assessments. At the same time, these organisations depend on receiving resources from the agents to fulfil their commission, which might affect which changes are implemented and which are not.

This is a proposal for a discussion group based on the broad question posed in the title. Below we list some themes that would be interesting to discuss, drawing on perspectives from different countries and testing systems.

  • How are the products, i.e. the tests, and the processes developing the tests affected by the influences from different social and political agents?
  • Are there stakeholders with greater impact, and if so, is that a necessity or a risk? Why, and how?
  • Finally, is it maybe necessary to have this continuous external validation of the tests in order to develop, strengthen and legitimise them or does it “ruin the work”?
Place, publisher, year, edition, pages
AEA-Europe, 2016
Keywords
Swedish national tests in science
National Category
Educational Sciences
Identifiers
urn:nbn:se:umu:diva-146025 (URN)
Conference
The 17th annual AEA-Europe Conference, 2-5 November 2016, Limassol, Cyprus.
Available from: 2018-03-26. Created: 2018-03-26. Last updated: 2020-08-12. Bibliographically approved
Lind Pantzare, A. (2015). Interrater reliability in large-scale assessments: can teachers score national tests reliably without external controls?. Practical Assessment, Research, and Evaluation, 20(9)
2015 (English). In: Practical Assessment, Research, and Evaluation, E-ISSN 1531-7714, Vol. 20, no 9. Article in journal (Refereed), Published
Abstract [en]

In most large-scale assessment systems, a set of rather expensive external quality controls is implemented to guarantee high interrater reliability. This study empirically examines whether teachers' ratings of national tests in mathematics can be reliable without monitoring, training, or other methods of external quality assurance. A sample of 99 booklets of students' answers to a national test in mathematics was scored independently by five teachers. Interrater reliability was analyzed using consensus and consistency estimates, focusing both on the test as a whole and on individual items. The results show that the estimates are acceptable and in many cases fairly high, irrespective of the reliability measure used. Some plausible explanations for lower interrater reliability on individual items are discussed, and some suggestions are made for further improving reliability without imposing any system of control.
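
The distinction between consensus and consistency estimates can be illustrated with several raters scoring the same booklets. This sketch assumes exact agreement as the consensus estimate and Pearson correlation as the consistency estimate, each averaged over all rater pairs; the article may have used other estimates (for example kappa or intraclass correlations), and the scores are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(va * vb)


def interrater_summary(scores):
    """scores[r][b] is the total score rater r gave booklet b.

    Returns (mean pairwise exact agreement, mean pairwise Pearson correlation).
    """
    if len(scores) < 2:
        raise ValueError("need at least two raters")
    pairs = list(combinations(range(len(scores)), 2))
    exact = [sum(x == y for x, y in zip(scores[i], scores[j])) / len(scores[i])
             for i, j in pairs]
    corr = [pearson(scores[i], scores[j]) for i, j in pairs]
    return sum(exact) / len(exact), sum(corr) / len(corr)


# Three invented raters; the third is systematically one point more lenient.
scores = [
    [10, 12, 15, 20],
    [10, 12, 15, 20],
    [11, 13, 16, 21],
]
consensus, consistency = interrater_summary(scores)
print(round(consensus, 3), round(consistency, 3))  # 0.333 1.0
```

The example shows why both kinds of estimate are reported: a rater who is consistently more lenient lowers consensus but leaves consistency untouched.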

National Category
Pedagogical Work
Identifiers
urn:nbn:se:umu:diva-101511 (URN)
Available from: 2015-03-31. Created: 2015-03-31. Last updated: 2024-07-02. Bibliographically approved
Lind Pantzare, A. (2015). Validating cut scores set by Angoff procedures with results from equating procedures. Paper presented at Standard-setting: International state of research and practices in the Nordic countries, Oslo, September 21-23, 2015.
2015 (English). Conference paper, Oral presentation only (Other academic)
Abstract [en]

In Sweden, the cut scores for each new form of the national tests in mathematics are set before test administration. This requirement has existed ever since the transition to the current criterion-referenced system in 1994. One argument given for it is to ensure that teachers no longer score and interpret test scores in a relative manner. The cut scores are set with a judgemental Angoff procedure, without item field-test data and with no regular equating or linking procedure. A relevant question is therefore whether it is naïve to assume that the cut scores are equivalent across years. In these studies, the equivalence of the cut scores for two different and separate pairs of tests is investigated by comparing cut scores set by Angoff procedures with the results of equating procedures. In both cases, a non-equivalent groups anchor test (NEAT) design was used. The cut scores were compared with the results of linear and equipercentile equating. The results show that there are validity arguments supporting that the Angoff procedure works. However, the equating procedures reveal several methodological and practical challenges.
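
The core step of equipercentile equating, finding the form-Y score with the same percentile rank as a given form-X score, can be sketched for integer scores. This simplified version assumes a single group or randomly equivalent groups and no smoothing; under the NEAT design used in these studies the score distributions would first have to be estimated through the anchor (for example by chained or frequency-estimation methods), which the sketch omits. Percentile ranks follow a common textbook convention that spreads each integer score uniformly over the interval from half a point below to half a point above it; the frequencies are invented.

```python
import math


def cumulative_proportions(freqs):
    """freqs[s] is the number of test takers with integer score s (s = 0..K)."""
    total = sum(freqs)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    cum, running = [], 0
    for f in freqs:
        running += f
        cum.append(running / total)
    return cum


def percentile_rank(x, F):
    """Percentile rank of x, treating score s as uniform on [s - 0.5, s + 0.5)."""
    K = len(F) - 1
    if x <= -0.5:
        return 0.0
    if x >= K + 0.5:
        return 100.0
    s = math.floor(x + 0.5)  # integer score whose interval contains x
    F_below = F[s - 1] if s > 0 else 0.0
    return 100 * (F_below + (x - (s - 0.5)) * (F[s] - F_below))


def inverse_percentile_rank(p, G):
    """Score on the target scale whose percentile rank equals p."""
    K = len(G) - 1
    if p <= 0:
        return -0.5
    if p >= 100:
        return K + 0.5
    target = p / 100
    for s, g in enumerate(G):
        if g > target:  # first score whose cumulative proportion exceeds the target
            g_below = G[s - 1] if s > 0 else 0.0
            return (target - g_below) / (g - g_below) + (s - 0.5)
    return K + 0.5


def equipercentile_equate(x, freq_x, freq_y):
    """Form-Y equivalent of form-X score x (single group / random groups sketch)."""
    return inverse_percentile_rank(
        percentile_rank(x, cumulative_proportions(freq_x)),
        cumulative_proportions(freq_y),
    )


# Invented frequencies: form Y scores run one point higher than form X.
freq_x = [1, 2, 3, 2, 1, 0]
freq_y = [0, 1, 2, 3, 2, 1]
print(round(equipercentile_equate(2, freq_x, freq_y), 6))  # 3.0
```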

National Category
Educational Sciences
Research subject
didactics of educational measurement
Identifiers
urn:nbn:se:umu:diva-111086 (URN)
Conference
Standard-setting: International state of research and practices in the Nordic countries, Oslo, September 21-23, 2015
Available from: 2015-11-04. Created: 2015-11-04. Last updated: 2021-09-03. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3028-1299