Effects of rating criteria order on the halo effect in second language writing assessment

Date
2019-01-01
Authors
Kim, Hyunwoo
Major Professor
Gary J. Ockey
Department
English
Abstract

The halo effect is raters’ undesirable tendency to assign more similar ratings across rating criteria than they should, thereby compromising the validity of the ratings. The impact of the halo effect on ratings has been studied in rater-mediated performance assessment; little is known, however, about the extent to which the order of rating criteria in analytic rating scales is associated with the magnitude of the halo effect. The aim of this study was therefore to examine the extent to which the magnitude of the halo effect exhibited by trained novice Korean raters is associated with rating criteria order in analytic rating scales in the context of second language writing assessment. To select essays that display appropriately uneven profiles across four rating criteria, a single-trait rating method was implemented with four expert raters. Eleven trained novice Korean raters then rated the same 30 screened essays under three rating criteria orders: standard, reverse, and random. In the standard-order rubric, the rating criteria were presented as content, organization, vocabulary, and language use; this order was exactly reversed in the reverse-order rubric, and in the random-order rubric the criteria were displayed to raters in random order. Along with a preliminary inspection of a multitrait-multimethod (MTMM) matrix, a three-facet rating scale model within the many-facet Rasch measurement (MFRM) framework was fitted to estimate the magnitude of the halo effect, and a think-aloud verbal protocol analysis was conducted to examine how rating criteria order affects the rating process. The overall results showed that a similar magnitude of group-level halo effect was detected with the standard- and reverse-order rubrics, whereas the random presentation of rating criteria decreased the group-level halo effect. The think-aloud verbal protocol analysis indicated that the standard- and reverse-order rubrics affected the rating process. When rating criterion difficulty was anchored, rater fit statistics were effective in flagging individual raters who exhibited a halo effect. A major implication of the study is the need to consider rating criteria order as a source of construct-irrelevant difficulty when developing analytic rating scales.
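For readers unfamiliar with how a group-level halo effect can be quantified, the following is a minimal, illustrative Python sketch; it is not the dissertation's analysis, which inspected an MTMM matrix and fitted a three-facet rating scale model within the MFRM framework. The sketch simulates a hypothetical raters-by-essays-by-criteria array and reports the mean inter-criterion correlation, a rough halo indicator: correlations well above those among the intended uneven score profiles suggest the criteria are not being rated distinctly. All dimensions, values, and the mean_intercriterion_r helper are assumptions made for illustration only.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 11 raters x 30 essays x 4 criteria
# (content, organization, vocabulary, language use), scored 1-5.
n_raters, n_essays, n_criteria = 11, 30, 4
true_profiles = rng.integers(1, 6, size=(n_essays, n_criteria))      # intended uneven profiles
noise = rng.normal(0.0, 0.7, size=(n_raters, n_essays, n_criteria))  # simulated rater error
ratings = np.clip(np.rint(true_profiles + noise), 1, 5)

def mean_intercriterion_r(scores):
    # Average pairwise correlation between criterion scores across essays,
    # pooled over raters: a rough group-level halo indicator. Values far
    # above the correlations among the intended profiles suggest the
    # criteria are not being rated distinctly.
    per_rater = []
    for r in range(scores.shape[0]):
        corr = np.corrcoef(scores[r].T)                  # 4 x 4 criterion correlation matrix
        upper = corr[np.triu_indices_from(corr, k=1)]    # unique criterion pairs
        per_rater.append(upper.mean())
    return float(np.mean(per_rater))

print(f"Mean inter-criterion correlation: {mean_intercriterion_r(ratings):.2f}")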

Copyright
2019