Date of Award
Doctor of Philosophy
Applied Linguistics and Technology
John M. Levis
Segmental perception training is important because many phonemic errors are common in second language pronunciation and the perception of foreign phonemic contrasts is often difficult to acquire without instruction (Best & Tyler, 2007; Birdsong, 1992, 2006; Flege, 1988, 1995). Numerous computer-assisted programs exist that provide training for segmental perception, but few of them have made effective use of already-existing language resources. There has been a call for the creation of a computer-assisted pronunciation teaching (CAPT) program that provides individualized, needs-based training based on first language and learner proficiency (Levis, 2007; Munro, Derwing, & Thomson, 2015). A perception training model has yet to be developed that takes into account the major components important to intelligibility, the use of technology, and state-of-the-art research findings on perception training. Specifically, the ideal training model first needs to account for learners’ L1 backgrounds, since L2 segmental errors are often L1-specific (Swan & Smith, 2002). Second, the training model should be tailored to individual needs, as not everyone sharing the same L1 will necessarily make the L1-predicted errors (Munro, 2018; Munro, Derwing, & Thomson, 2015). Third, functional load theory (King, 1967) suggests that not all phonemic errors affect intelligibility equally and that perception training should not target all errors as if they had an equal impact on intelligibility. Fourth, the training model should leverage a high-variability phonetic training design, defined as a technique of using multiple voice models for perception training (Pisoni & Lively, 1995), which has been found to be efficacious in improving perception (Thomson, 2012; Wang & Munro, 2004) as well as production (Thomson, 2011).
This study introduces an innovative online perception training system that uses computational approaches to deliver high-variability phonetic training designed to improve learners’ ability to discriminate and identify segmental contrasts. The system was designed with five major features. First, the system was developed with intelligibility-driven goals by focusing only on high functional load segmental errors. Second, the system offered training customized to individual learners’ pre-training diagnostic performance and then adapted the training content and intensity based on individual learners’ errors during real-time learning. Third, in recognition of the efficacy of multi-voice models for perception acquisition (Thomson, 2011, 2012; Wang & Munro, 2004), the system utilized high-variability phonetic training exercises developed using two North American text-to-speech voices. Fourth, the training system was self-contained and could be accessed and used by learners flexibly and independently, at their own pace and with little teacher guidance. Fifth, immediate individualized feedback was available on every item during training. In addition, the stimuli used for the training system were automatically extracted from a phonetically transcribed dictionary with word frequency controlled. Specifically, only words among the top 5,000 lemmas in the Corpus of Contemporary American English were selected by the system. This ensured that all the training and test stimuli were likely to be familiar to the participants, so that they would be able to recognize the stimuli aurally during perception tests and training without seeing the words spelled out.
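The adaptive feature described above (adjusting training content and intensity to each learner's errors) could be sketched as weighted sampling over contrasts. This is a minimal illustration and not the system's actual algorithm: the function names, the smoothing rule, the `floor` parameter, and the contrast labels are all assumptions introduced here.

```python
import random


def contrast_weights(error_counts, attempt_counts, floor=0.05):
    """Return one sampling weight per contrast from a smoothed error rate.

    Hypothetical rule: add-one smoothing keeps rates away from 0 and 1,
    and `floor` keeps well-learned contrasts in occasional rotation.
    """
    weights = {}
    for contrast, attempts in attempt_counts.items():
        errors = error_counts.get(contrast, 0)
        rate = (errors + 1) / (attempts + 2)
        weights[contrast] = max(rate, floor)
    return weights


def next_contrast(weights, rng=random):
    """Draw the next contrast to train, favoring error-prone contrasts."""
    contrasts = list(weights)
    return rng.choices(contrasts, weights=[weights[c] for c in contrasts], k=1)[0]


# Illustrative learner record: 6 errors in 8 attempts on one contrast, none on the other.
attempts = {"v-w": 8, "l-n": 8}
errors = {"v-w": 6}
weights = contrast_weights(errors, attempts)
print(weights, next_contrast(weights))
```

Under this rule a contrast the learner keeps missing is presented more often, while a mastered contrast is still revisited occasionally.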
Four types of exercises, created with text-to-speech minimal pairs automatically extracted from the Illinois Speech and Language Engineering Dictionary, were used for training: same-different discrimination, oddity discrimination, simple identification, and yes/no identification. The voices and words of the training stimuli were controlled in order to examine the learners’ potential transfer of perception gains to three novel conditions: to trained words spoken with untrained voices, to untrained words spoken with trained voices, and to trained items in sentences. The training system was used for approximately three months by 266 Chinese-L1 English majors from three universities located in three cities (Harbin, Soochow, and Guangzhou). The learners were placed into either an experimental group or a control group based on their institution, and used the system for perception training on nine English consonant and vowel contrasts that were predicted to be challenging for the learners.
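The automatic extraction of frequency-controlled minimal pairs mentioned above can be illustrated with a short sketch. It assumes a toy pronunciation dictionary mapping each word to a tuple of phoneme symbols and a set of high-frequency words. The symbols, the sample words, and the function name are hypothetical and do not reflect the actual dictionary format or the study's stimuli.

```python
from collections import defaultdict


def find_minimal_pairs(pron_dict, frequent_words, contrast):
    """Find word pairs that differ only in one phoneme of the given contrast.

    pron_dict: word -> tuple of phoneme symbols (toy format, an assumption)
    frequent_words: words allowed as stimuli (e.g. a top-N frequency list)
    contrast: (a, b) pair of phoneme symbols
    """
    a, b = contrast
    # Index the frequent words by pronunciation for direct lookup.
    by_pron = defaultdict(list)
    for word, phones in pron_dict.items():
        if word in frequent_words:
            by_pron[tuple(phones)].append(word)

    pairs = []
    for phones, words in by_pron.items():
        for i, phone in enumerate(phones):
            if phone != a:
                continue
            # Swap a single occurrence of `a` for `b` and look for a match.
            swapped = phones[:i] + (b,) + phones[i + 1:]
            for first in words:
                for second in by_pron.get(swapped, []):
                    pairs.append((first, second))
    return sorted(pairs)


toy_dict = {
    "light": ("l", "ay", "t"), "night": ("n", "ay", "t"),
    "low": ("l", "ow"), "no": ("n", "ow"),
    "lull": ("l", "ah", "l"), "null": ("n", "ah", "l"),
}
frequent = {"light", "night", "low", "no"}  # "lull" and "null" are filtered out
print(find_minimal_pairs(toy_dict, frequent, ("l", "n")))
# [('light', 'night'), ('low', 'no')]
```

Filtering by frequency before indexing means that both members of every returned pair come from the familiar-word list.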
An analysis of the participants’ diagnostic and training performance revealed substantial variation among the learners’ actual segmental errors and pace of learning. This suggests that L2 phonemic acquisition is not merely L1-specific or dialect-specific but is a process distinctive to individual learners, and one that was not correlated with time on training, highlighting the importance of incorporating adaptability in the design and delivery of pronunciation training materials. Descriptive and inferential statistics on training effect, retention, and transfer of test gains showed that an average of 143 minutes of focused effort led to robust improvement and retention of phonemic perception for most of the segmental contrasts under investigation. L2 segmental acquisition was sensitive to the linguistic context of a segment, and the training in the study helped the learners transfer perception gains to untrained contexts (new voices, new words, and untrained sentence contexts). The results showed that high-variability input materials and text-to-speech technology can be effectively used to develop perception training materials. The study also showed that exercises designed specifically to sharpen aural sensitivity to contrasting phonemes may facilitate learners’ ability to self-correct phonemic issues even without explicit training on those issues. The findings were discussed in light of exemplar theory (Bybee, 2000), analogical modeling theory (Skousen, 1989), the TRACE model within the connectionist framework (Joanisse & McClelland, 2015), item versus system learning theory (Cruttenden, 1981), U-shaped learning theory (Gass & Selinker, 2008), and the Speech Learning Model (Flege, 1995). Future research is encouraged to investigate the effect of adaptive perception training on learner response latency and productive performance, both of which are essential to real-life pronunciation and communication competence.
Qian, Manman, "An adaptive computational system for automated, learner-customized segmental perception training in words and sentences: Design, implementation, assessment" (2018). Graduate Theses and Dissertations. 17293.