OECD Programme for International Student Assessment (PISA)

Publication Date

4-12-2024

Subjects

automated test assembly, multi-stage adaptive testing, PISA, student assessment, testlets

Comments

Presented as part of the symposium "Furthering research underpinning OECD Programme for International Student Assessment - PISA", held in conjunction with the 2024 NCME Annual Meeting, whose theme was "Reconceptualizing Measurement Theory and Practice to Reduce Inequities". Session Organizer: Alejandra Osses-Vargas. Chair: Goran Lazendic, ACER.

Abstract

In large-scale assessments, the manual assembly of tests to meet many competing criteria may become an intractable task. Automated test assembly (ATA) can use optimisation algorithms to achieve optimal assembly outcomes. Since 2018, PISA has been transitioning from fixed test forms to adaptive testing. This transition is driven by the aim of improved test targeting, which increases reliability. Multi-stage adaptive testing (MSAT) has been preferred to item-level computer adaptive testing (CAT) because it can better address the sometimes-competing requirements of an assessment like PISA, where requirements for item exposure, position effects, construct coverage, and cross-sectional trend must all be met in addition to minimising uncertainty in an individual student’s assessment. Over time, the MSAT design in PISA has been adapted and strengthened, in particular by moving to a hybrid design with some proportion of fixed paths and by better leveraging technology. Overall, the use of the hybrid MSAT design plus ATA with mixed-integer linear programming (MILP) has become the preferred way to balance PISA’s competing requirements. This paper describes aspects of the continuing effort to strengthen the MSAT design for PISA in 2025. We report research undertaken to compare (1) item-based and unit-based approaches to ATA; (2) different ATA optimisation algorithms and implementations (including open-source solvers such as GLPK, lp_solve and Symphony, commercial solvers such as Gurobi, and software packages that provide direct support for test assembly such as eatATA); and (3) different options for the treatment of ‘enemy’ units in ATA.
Results of this research show that unit-based ATA can yield testlets that are well optimised to the Test Information Functions (TIFs) for the assessment; that open-source implementations can provide feasible solutions to ATA problems but are typically slower than commercial implementations; and that definitions of pairs of ‘enemy’ units can be built into ATA with little impact on the target Test Characteristic Curves (TCCs), provided the number of pairs is small. The discussion focuses on how ATA can be combined with systematic and qualitative evaluation of test-assembly outcomes by content-area experts without introducing excessive manual intervention. Future opportunities to continue to research and improve the MSAT design for PISA are also proposed.
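
As a rough illustration of the kind of problem described above, the following sketch selects whole units for a testlet so that the summed information stays close to a target TIF, while never placing two ‘enemy’ units together. It is not the paper's method: the unit pool, the 2PL item parameters, the ability points, the target values and the function names are all hypothetical, and a plain exhaustive search stands in for the MILP solvers named in the abstract, which is only workable for a toy pool of this size.

```python
from itertools import combinations
from math import exp


def item_information(a, b, theta):
    """Fisher information of a 2PL item (discrimination a, difficulty b)."""
    p = 1.0 / (1.0 + exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def unit_information(unit, theta):
    """A unit contributes the summed information of all its items."""
    return sum(item_information(a, b, theta) for a, b in unit)


def assemble_testlet(units, n_units, thetas, targets, enemies=()):
    """Pick n_units units minimising the largest gap to the target TIF.

    Exhaustive search over unit combinations (a stand-in for a MILP solver).
    Any combination containing both members of an enemy pair is skipped.
    """
    enemy_sets = [frozenset(pair) for pair in enemies]
    best_choice, best_gap = None, float("inf")
    for choice in combinations(sorted(units), n_units):
        chosen = set(choice)
        if any(pair <= chosen for pair in enemy_sets):
            continue  # enemy constraint violated
        gap = max(
            abs(sum(unit_information(units[u], t) for u in choice) - target)
            for t, target in zip(thetas, targets)
        )
        if gap < best_gap:
            best_choice, best_gap = choice, gap
    return best_choice, best_gap


# Hypothetical pool: unit name -> list of (a, b) item parameters.
pool = {
    "U1": [(1.2, -1.0), (0.9, -0.5)],
    "U2": [(1.0, 0.0), (1.1, 0.3), (0.8, -0.2)],
    "U3": [(1.4, 1.0), (1.0, 0.8)],
    "U4": [(0.7, -1.5), (1.3, -0.8)],
    "U5": [(1.1, 0.5), (0.9, 1.2)],
    "U6": [(1.0, 0.1), (1.2, -0.1)],
}
thetas = [-1.0, 0.0, 1.0]    # ability points where the TIF is evaluated
targets = [1.0, 1.5, 1.0]    # hypothetical target information values

free_choice, free_gap = assemble_testlet(pool, 3, thetas, targets)

# Declare two of the freely chosen units to be enemies and re-assemble.
enemy_pair = (free_choice[0], free_choice[1])
constrained_choice, constrained_gap = assemble_testlet(
    pool, 3, thetas, targets, enemies=[enemy_pair]
)
print(free_choice, round(free_gap, 3))
print(constrained_choice, round(constrained_gap, 3))
```

Because the enemy constraint only removes candidate combinations, the constrained gap can never be smaller than the unconstrained one; how much larger it becomes depends on the pool and on how many pairs are declared.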

Place of Publication

Camberwell, Australia

Publisher

Australian Council for Educational Research

DOI

https://doi.org/10.37517/2024041114-01
