This dissertation study aimed, first and foremost, to address the current scarcity of assessments of productive and contextualized academic vocabulary in English, and thus to contribute to the fields of applied linguistics and language assessment. To that end, it developed what seems to be the first IRT-based computer-adaptive test of productive and contextualized breadth of academic vocabulary in English (henceforth, CAT-PAV). The test will be made freely available online to researchers, ESL instructors, and any other interested parties who believe it may be useful in their specific contexts. A second aim of the study was to validate the interpretation of CAT-PAV scores and their use in two specific scenarios at Iowa State University (ISU): as a diagnostic tool and as an ESL placement aid for Iowa State’s English Placement Test (EPT).

The development of the CAT-PAV was informed by evidence-centered design (Mislevy & Riconscente, 2006), a test development framework that allows test development decisions to be closely aligned with interpretations of precisely what is being measured. The validation of the intended interpretation and specific uses of the CAT-PAV presented here, in turn, was informed by an argument-based approach to validation (Chapelle, Enright, & Jamieson, 2008; Kane, 2013). This approach specifies, through an interpretive argument, a chain of inferences for the interpretation and uses of test scores, along with a detailed and explicit definition of the kind of support required for each of those inferences to be warranted in a subsequent validity argument. Five inferences were employed in the validation of the interpretation and uses of CAT-PAV scores: Domain Description, Evaluation, Generalization, Explanation, and Utilization. Analysis of the extent to which each of the five inferences could be supported was based primarily on the responses of more than 900 test takers (including ESL instructors) to test items and to pre- and post-test questionnaires, collected over a period of four months.

Results indicated that the first four inferences in the validity argument for CAT-PAV score interpretation and use were fully warranted, whereas the Utilization inference could only be partially warranted at this time. The task type used in the CAT-PAV was shown to be present in materials employed in ESL classes at ISU, and the great majority of ESL instructors polled indicated that the knowledge and abilities required to achieve a good score on the CAT-PAV are also necessary when using academic vocabulary in ESL classes at ISU (support for the Domain Description inference). CAT-PAV items were shown to be monotonically increasing and essentially unidimensional, and all versions of the test showed reliabilities above 0.91 (support for the Evaluation inference), while alternate-form reliability for the test was also high (support for the Generalization inference). Correlations between CAT-PAV scores and scores on other tests requiring substantial knowledge of academic vocabulary, such as the TOEFL iBT or Laufer and Nation’s (1999) Vocabulary Levels Test Academic, were positive (support for the Explanation inference). Finally, the majority of ESL test takers indicated that taking the test had a positive effect on their academic English, and the majority of ESL instructors believed that use of the CAT-PAV as a diagnostic or placement-aid tool at Iowa State could positively impact ESL learners and their development of academic English (partial support for the Utilization inference).

Limitations of the present study pertaining to item development, data collection, and test administration will be discussed, along with validation issues that require further investigation. Suggestions for future research into the CAT-PAV will also be provided, with a focus on possible ways to quickly expand the test's current item bank.

