The effect of task complexity on rater severity in an adaptive performance-based second language oral communication test

Won, Yongkook

File

Won_iastate_0097E_17834.pdf (2.52 MB)

Date

2019-01-01

Authors

Won, Yongkook

Advisor

Gary J. Ockey

Department

English

Abstract

Despite the benefits of performance-based oral communication tests, many variables, as illustrated in Ockey and Li’s (2015) model of oral communication assessment, can introduce construct-irrelevant variance into test scores. With respect to the human participants in oral communication tests, previous studies have mostly focused on the direct effect of the rater group variable on test scores. Little attention has been paid to the interaction of raters with interviewers. The present study investigates how raters evaluate test takers’ performance in performance-based oral communication tests in which interviewers can adaptively choose the task complexity of their questions in response to test takers’ performance.

An explanatory sequential mixed-methods design was used to investigate the effect of task complexity on rater severity. For the initial quantitative analysis, two data sets from the Oral English Certification Test (OECT) were analyzed with multilevel ordinal logistic regression, a paired samples t-test, and many-facet Rasch measurement (MFRM): operational rating data from 1,689 test takers whose native languages are not English, scored by 24 certified raters, and 162 audio recordings of 81 international graduate students. To further investigate the effect of task complexity on rater severity, nine newly trained raters judged 80 speech samples from 40 OECT test takers. A partial credit model of MFRM was used to analyze raters’ use of the scoring rubric depending on task complexity.

In the initial quantitative analysis, low-complexity prompts were estimated to be the most difficult items. The results of paired samples t-tests showed that only a few fluency measures differed significantly by task complexity. An analysis, using Welch’s t-test, of the interaction between task complexity and rating context among the nine newly trained raters showed that the difficulty of high-complexity tasks decreased when raters became aware of the task complexity. This change in task difficulty suggests that raters in this adaptive performance-based oral communication test may have changed their rating severity depending on their understanding of the task complexity. Follow-up verbal reports and interviews supported the findings of the quantitative analysis.

Copyright

2019-08-01

Collections

Theses and Dissertations
