Predictive modeling of human placement decisions in an English Writing Placement Test

Date
2016-01-01
Authors
Vu, Ngan
Major Professor
Volker H. Hegelheimer
Department
English
Abstract

Writing is an important component of standardized tests used for admission decisions, class placement, and academic or professional development. Placement results of the English Writing Placement Test (the EPT Writing Test) at Iowa State University are used at the undergraduate level to determine whether international students meet the English requirements for writing skills (i.e., Pass) and to direct students to appropriate ESL writing classes (i.e., 101B or 101C). Practical constraints in the test's evaluation processes, such as rater disagreement, rater turnover, and a heavy administrative workload, have demonstrated the need to develop valid scoring models for an automated writing evaluation tool; statistical scoring algorithms are essential for such a tool to predict human raters' quality judgments of future EPT essays. Furthermore, in measuring L2 writing performance, previous research has focused heavily on writer-oriented text features in students' writing, rather than on the reader-oriented linguistic features that influence human raters' quality judgments. To address the practical concerns of the EPT Writing Test and this gap in the literature, the current project aimed to develop a predictive model that best defines human placement decisions in the EPT Writing Test. The study adopted a multistage mixed-methods design consisting of two interconnected phases: model specification and model construction.

In the model-specification phase, results of a Multifaceted Rasch Measurement (MFRM) analysis allowed for the selection of five expert EPT raters who represented a range of rating severity levels. Concurrent think-aloud protocols produced by the five participants while evaluating sample EPT essays were analyzed qualitatively to identify the text features to which raters attended. Based on the qualitative findings, 52 evaluative variables and metrics were generated, of which 36 were chosen for analysis across the whole EPT essay corpus. A corpus-based analysis of 297 EPT essays was then conducted to obtain quantitative data on these 36 text features for the model-construction phase. Principal Component Analysis (PCA) was used to extract seven principal components (PCs). Results of MANOVA and one-way ANOVA tests revealed 17 original variables and six PCs that significantly differentiated the three EPT placement levels (i.e., 101B, 101C, and Pass). A profile analysis suggested that the lowest level (101B) and the highest level (Pass) had distinct text-feature profiles, whereas test takers placed in 101C classes appeared to form an intermediate group: like 101B students, they showed some linguistic problems, but like students who passed the test, they demonstrated an ability to develop an essay.
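
To illustrate the kind of quantitative analysis described above, the following Python sketch extracts principal components from a matrix of essay text-feature measures and runs one-way ANOVAs across the three placement levels. The file name (essays.csv), column names, and number of components are illustrative assumptions, not the study's actual data or code.

# Hypothetical sketch of the quantitative step: extract principal
# components from essay text-feature measures and test which features
# separate the three placement levels. Data layout is assumed.
import pandas as pd
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# essays.csv is assumed to hold one row per essay: the text-feature
# measures plus a 'placement' column with values 101B, 101C, or Pass.
df = pd.read_csv("essays.csv")
features = df.drop(columns=["placement"])
levels = df["placement"]

# Standardize features, then extract seven principal components,
# mirroring the seven PCs reported in the study.
X = StandardScaler().fit_transform(features)
pca = PCA(n_components=7)
components = pca.fit_transform(X)
print("variance explained:", pca.explained_variance_ratio_.round(2))

# One-way ANOVA per original feature: does its mean differ across
# the three placement levels?
for name in features.columns:
    groups = [df.loc[levels == lvl, name] for lvl in ["101B", "101C", "Pass"]]
    f_stat, p_val = stats.f_oneway(*groups)
    if p_val < 0.05:
        print(f"{name}: F = {f_stat:.2f}, p = {p_val:.4f}")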

In the model-construction phase, random forests (Breiman, 2001) were employed as a data mining technique to build predictive models of human raters' placement decisions across different task types. Results of the random forests indicated that fragments, part-of-speech-related errors, and PC2 (clear organization but limited paragraph development) were significant predictors of the 101B level, and PC6 (academic word use) of the Pass level. The generic classifier trained on the 17 original variables appeared to be the best model: it classified the training set perfectly (0% error) and predicted the test set successfully (8% error). Differences in prediction performance between the generic and task-specific models were negligible. Overall, the results provided little evidence that the predictive models would generalize to classifying new EPT essays. However, within-class examinations showed that the best classifier could recognize essays at the highest and lowest levels, although crossover cases occurred between adjacent levels. Implications of the project for placement assessment, pedagogical practices in ESL writing courses, and automated essay scoring (AES) development for the EPT Writing Test are discussed.
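
A minimal sketch of how such a random forest classifier might be trained and evaluated is shown below. The file name, train/test split, and hyperparameters are illustrative assumptions rather than the configuration used in the study.

# Hypothetical sketch of the model-construction step: a random forest
# (Breiman, 2001) trained on selected text-feature variables to predict
# placement level, then evaluated on a held-out test set.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# essays.csv and its columns are assumptions (see the sketch above);
# 'placement' holds the 101B / 101C / Pass labels and the remaining
# columns are the selected text-feature variables.
df = pd.read_csv("essays.csv")
X = df.drop(columns=["placement"])
y = df["placement"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

forest = RandomForestClassifier(n_estimators=500, random_state=42)
forest.fit(X_train, y_train)

# Training and test error rates, analogous to the 0% / 8% figures
# reported for the generic classifier.
print(f"training error: {1 - forest.score(X_train, y_train):.0%}")
print(f"test error: {1 - forest.score(X_test, y_test):.0%}")

# Feature importances show which variables drive placement predictions
# (e.g., sentence fragments, part-of-speech-related errors).
for name, importance in zip(X.columns, forest.feature_importances_):
    print(name, round(importance, 3))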

Copyright
2016