Assessment of practical tasks, as opposed to theoretical tasks, has long been considered problematic, mainly because it is usually resource intensive and the scoring is subjective. Most practical tasks need to be assessed on site or involve products that need to be collected, stored, or transported. Moreover, because practical tasks are generally open-ended, and scoring them therefore relies on subjective judgement, there is concern over the reliability of the scores. In high-stakes assessment, these problems are even more challenging. There is a need for an assessment method that can overcome these problems. This study investigated one such method, referred to here as the Comparative Pairs judgements method. This scoring method was applied to samples from the practical examination in two secondary courses in Western Australia: Design and Visual Arts.

This study was conducted within the first phase of an Australian Research Council (ARC) Linkage Project titled the Authentic Digital Representation of Creative Works in Education. This main project was a collaboration between the Centre for Schooling and Learning Technologies (CSaLT) at Edith Cowan University and the Curriculum Council of Western Australia. The purpose of the present study was to investigate the suitability of the Comparative Pairs judgements method as an alternative method for assessing high-stakes practical production tasks. The overarching research question was: how representative are the Comparative Pairs judgement scores of the quality of the student practical production work in the Visual Arts and Design courses? In the present study, student work that was submitted for the practical examination was digitised for online scoring. The digital representation of student work enabled online access for judging, regardless of the location of the assessors. Both the Comparative Pairs judgements method and an Analytical marking method were used to score these digital representations.

An interpretive research paradigm was employed, using an explanatory sequential mixed methods design. Data collected for the present study were part of the data collected in the main project. While the data for the main project were quite extensive, only the scoring data, the assessor interviews, and the online notes were considered relevant to this study, and therefore only these data were analysed and discussed in this thesis. A total of 157 students studying Design and Visual Arts participated in the first phase of the main project and the present study. A total of 25 assessors participated in the Comparative Pairs judgements and the Analytical marking processes.

Scoring data analysed in this study were obtained from three scoring processes: the official practical examination scoring, the online Analytical marking, and the Comparative Pairs judgements. Data analysis included descriptive statistics, correlation analysis, Rasch dichotomous modelling, fit statistics, and reliability analysis. A further discrepancy analysis was conducted on student works that showed scoring inconsistency, either between methods of scoring or between assessors. Data from the assessor interviews and judgement notes from the scoring processes were triangulated with the scoring data to examine the validity of the Comparative Pairs judgements method as an alternative scoring method. Data from the scoring of the digital representations of the student work in Design and Visual Arts were analysed separately to examine the suitability of the Comparative Pairs judgements method in each course, and were subsequently compared to examine the influence of the different assessment tasks in the two subjects on the scoring results.
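
As an illustration only, the idea behind Rasch dichotomous modelling of paired judgements can be sketched as follows. This is a minimal sketch under stated assumptions and not the software or procedure used in the study: the function name `fit_pairwise_rasch`, the gradient-ascent fitting, the learning rate, and the toy judgement data are all hypothetical. It assumes that the probability of one work being preferred over another depends only on the difference between their quality parameters.

```python
import math


def fit_pairwise_rasch(n_items, judgements, iterations=500, learning_rate=0.1):
    """Estimate one quality parameter per item from (winner, loser) index pairs.

    Hypothetical helper: maximises the pairwise Rasch (logistic) likelihood
    with plain gradient ascent and centres the estimates on zero.
    """
    quality = [0.0] * n_items
    for _ in range(iterations):
        gradient = [0.0] * n_items
        for winner, loser in judgements:
            # Modelled probability that `winner` is preferred over `loser`.
            p_win = 1.0 / (1.0 + math.exp(-(quality[winner] - quality[loser])))
            # Gradient of the log-likelihood for this single judgement.
            gradient[winner] += 1.0 - p_win
            gradient[loser] -= 1.0 - p_win
        quality = [q + learning_rate * g for q, g in zip(quality, gradient)]
        # Only differences are identified, so fix the mean at zero.
        mean = sum(quality) / n_items
        quality = [q - mean for q in quality]
    return quality


# Toy data (invented): three works, each pair judged three times.
judgements = [
    (0, 1), (0, 1), (1, 0),
    (1, 2), (1, 2), (2, 1),
    (0, 2), (0, 2), (0, 2),
]
estimates = fit_pairwise_rasch(3, judgements)
ranking = sorted(range(3), key=lambda i: estimates[i], reverse=True)
```

In this toy example an occasional judgement that runs against the majority shifts the estimates only slightly, because each work's parameter is informed by many judgements at once.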

Findings for both the Design and Visual Arts courses suggested that the scoring resulting from the Comparative Pairs judgements was reliable. This was mainly due to the numerous judgements and the pairing algorithm, through which inconsistencies in individual judgements were cancelled out, creating scoring results that could be more reliable than those from the more commonly used Analytical marking. The validity analysis, which considered both the evidence for and the threats against validity, suggested that this assessment method could be a valid method for high-stakes practical assessment in these two courses. The present study found that the reliability of the scores and the validity of the Comparative Pairs judgements as an assessment method make this method suitable for assessing high-stakes practical production. Findings from the present study suggested that this method should be applied and further investigated in different educational settings and for different practical assessment tasks. This method of judgement should be considered potentially valuable for formative and summative assessment alike, as well as for teacher professional learning and moderation practices.