Degree Name

Doctor of Philosophy




Applied Linguistics and Technology

First Advisor

Carol Chapelle


Paired/group oral tasks have been shown to be effective in assessing aspects of second language (L2) oral communication ability (OCA) (e.g., Bonk & Ockey, 2003; Taylor, 2011), especially interactional competence (IC) (e.g., Galaczi, 2014; May, Nakatsuhara, Lam, & Galaczi, 2020). However, their use in large-scale assessments is limited because they are impractical to administer and because interlocutor characteristics can affect oral performance (e.g., Ockey, 2009; O’Sullivan, 2002). There is therefore a need for a practical, standardized way to administer such tasks. One possible way to address this need is to design Spoken Dialog Systems (SDSs) that deliver versions of paired/group oral tasks in which computers act as conversation partners and engage in discussions with test takers. Given the lack of research on this potential solution, this dissertation study aimed to (1) develop a prototype SDS-mediated paired oral task (SDS-POT) and (2) evaluate the appropriateness of the task for assessing L2 oral communication. The study was guided by two methodological frameworks: design-based research (McKenney & Reeves, 2012; Wang & Hannafin, 2005) and argument-based validity (Chapelle, 2021; Kane, 2006, 2013).

Drawing on research on SDS design and L2 oral assessment task design, the prototype SDS-POT was developed in five steps: (1) designing a paired oral task, (2) building an SDS architecture, (3) creating and analyzing a seed corpus for algorithm development, (4) developing task-specific algorithms, and (5) testing and refining the algorithms. A 4-point holistic rating scale was also devised using information from two sources: a literature review on the construct of L2 OCA and a preliminary analysis of test taker discourse. Following the principles of design-based research (McKenney & Reeves, 2012; Wang & Hannafin, 2005), the entire task and rating scale development process was documented to guide other test developers and researchers in building SDSs for paired oral tasks.

Using the conceptual tools of the argument-based validity framework (Chapelle, 2021; Kane, 2006, 2013), an interpretation/use argument for the SDS-POT was constructed to specify the evidence needed to support the interpretations and uses of the task scores. Through the development and evaluation of the task, various types of validity evidence were sought to justify four inferences: construct/domain definition, evaluation, generalization, and explanation. Backing for the construct/domain definition and generalization inferences came mainly from the task and rating scale development documentation. Backing for the evaluation and explanation inferences was obtained through an empirical evaluation study. Using a mixed methods convergent design, qualitative and quantitative data were collected from 30 test takers (English as a Second Language students at a Midwestern university in the U.S.) and three experienced raters. The qualitative data consisted of test takers' oral task responses, stimulated recalls, and semi-structured interviews, as well as semi-structured interviews with the raters. The quantitative data were limited to test takers' task scores.

The evaluation and explanation inferences were generally well supported by the evidence from the qualitative and quantitative analyses. The evaluation inference was backed by findings showing that (a) most test takers found the design of the SDS-POT appropriate for L2 oral communication assessment, (b) the raters viewed the rating scale and rater training as appropriate for scoring performance on the task, and (c) the raters could rate task responses reliably using the rating scale (intra-class correlation coefficient = .93). The explanation inference was supported by results indicating that (a) most IC features were consistently observable in the task responses; (b) the frequency of use of observable IC features generally varied with score levels as expected; (c) discourse-analytic measures of fluency, pronunciation, and grammar/vocabulary differed across score levels in keeping with expectations; (d) test takers used a wide range of construct-relevant strategies to complete the task; and (e) the use of these strategies generally varied with score levels as expected.
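The abstract reports rater reliability as an intra-class correlation coefficient of .93 but does not state which ICC form was computed. As a hedged illustration only, the sketch below computes the two-way random-effects, absolute-agreement forms described by Shrout and Fleiss (1979) from a test-taker × rater score matrix. It assumes a fully crossed design in which every rater scored every response with no missing scores; the function name `icc_two_way` and the toy scores are hypothetical and are not the study's data or analysis code.

```python
from statistics import fmean


def icc_two_way(scores):
    """Return (ICC(2,1), ICC(2,k)) for an n-test-takers x k-raters matrix.

    Hypothetical helper. Assumes a fully crossed design: every rater
    scored every test taker, with no missing values.
    """
    n = len(scores)      # number of test takers (rows)
    k = len(scores[0])   # number of raters (columns)
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]                        # per test taker
    col_means = [fmean(row[j] for row in scores) for j in range(k)]   # per rater

    # Mean squares from the two-way ANOVA decomposition.
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between test takers
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # between raters
    sse = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ems = sse / ((n - 1) * (k - 1))  # residual mean square

    # Absolute agreement: systematic rater differences (jms) lower the ICC.
    icc_single = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc_average = (bms - ems) / (bms + (jms - ems) / n)
    return icc_single, icc_average
```

For example, with the toy matrix `[[1, 2], [2, 3], [3, 4]]`, in which the second rater is always one point more lenient, the function returns about 0.67 for a single rater and 0.80 for the average of both raters, showing how an absolute-agreement ICC penalizes a systematic rater offset even when the raters rank test takers identically.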

This study makes an original contribution to the field of language assessment by investigating the development and evaluation of an SDS-mediated paired oral task for L2 oral communication assessment. The finding that such a task, even at the prototype stage, can measure five aspects of L2 OCA (i.e., IC, fluency, pronunciation, grammar/vocabulary, and strategic competence) is encouraging. The study has important implications for the design and validation of SDSs for paired oral tasks.


Copyright Owner

Nazlinur Gokturk Tuney



File Size

375 pages

Available for download on Saturday, January 07, 2023