Effects of rating criteria order on the halo effect in second language writing assessment

Date
2019-01-01
Authors
Kim, Hyunwoo
Major Professor
Gary J. Ockey
Department
English
Abstract

The halo effect is raters’ undesirable tendency to assign more similar ratings across rating criteria than they should, thereby compromising the validity of the ratings. The impact of the halo effect on ratings has been studied in rater-mediated performance assessment; little is known, however, about the extent to which the order of rating criteria in analytic rating scales is associated with the magnitude of the halo effect. The aim of this study was therefore to examine the extent to which the magnitude of the halo effect exhibited by trained novice Korean raters is associated with rating criteria order in analytic rating scales in the context of second language writing assessment. To select essays that display appropriately uneven profiles across four rating criteria, a single-trait rating method was implemented with four expert raters. Eleven trained novice Korean raters then rated the same 30 screened essays under three rating criteria orders: standard, reverse, and random. In the standard-order rubric, the rating criteria were presented as content, organization, vocabulary, and language use; this order was exactly reversed in the reverse-order rubric, and in the random-order rubric the criteria were displayed to raters in random order. Along with a preliminary inspection of a multitrait-multimethod (MTMM) matrix, a three-facet rating scale model within the many-facet Rasch measurement (MFRM) framework was fitted to estimate the magnitude of the halo effect, and a think-aloud verbal protocol analysis was conducted to examine how rating criteria order affects the rating process. The overall results showed that a similar magnitude of group-level halo effect was detected with the standard- and reverse-order rubrics, whereas the random presentation of rating criteria decreased the group-level halo effect. The think-aloud verbal protocol analysis indicated that the standard- and reverse-order rubrics affected the rating process. When rating criterion difficulty was anchored, rater fit statistics were effective in flagging individual raters who exhibited a halo effect. A major implication of the study is the need to consider rating criteria order as a source of construct-irrelevant difficulty when developing analytic rating scales.
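For readers unfamiliar with how a group-level halo effect can be quantified, the following is a minimal, illustrative Python sketch; it is not the dissertation's analysis, which inspected an MTMM matrix and fitted a three-facet rating scale model within the MFRM framework. The sketch simulates a hypothetical raters-by-essays-by-criteria array and reports the mean inter-criterion correlation, a rough halo indicator: correlations well above those among the intended uneven score profiles suggest the criteria are not being rated distinctly. All dimensions, values, and the mean_intercriterion_r helper are assumptions made for illustration only.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 11 raters x 30 essays x 4 criteria
# (content, organization, vocabulary, language use), scored 1-5.
n_raters, n_essays, n_criteria = 11, 30, 4
true_profiles = rng.integers(1, 6, size=(n_essays, n_criteria))      # intended uneven profiles
noise = rng.normal(0.0, 0.7, size=(n_raters, n_essays, n_criteria))  # simulated rater error
ratings = np.clip(np.rint(true_profiles + noise), 1, 5)

def mean_intercriterion_r(scores):
    # Average pairwise correlation between criterion scores across essays,
    # pooled over raters: a rough group-level halo indicator. Values far
    # above the correlations among the intended profiles suggest the
    # criteria are not being rated distinctly.
    per_rater = []
    for r in range(scores.shape[0]):
        corr = np.corrcoef(scores[r].T)                  # 4 x 4 criterion correlation matrix
        upper = corr[np.triu_indices_from(corr, k=1)]    # unique criterion pairs
        per_rater.append(upper.mean())
    return float(np.mean(per_rater))

print(f"Mean inter-criterion correlation: {mean_intercriterion_r(ratings):.2f}")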

Copyright
2019