OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification
This proceeding is published as Vajjala, Sowmya, and Ivana Lucic. "OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification." In Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications (2018): 297-304.
This paper describes the collection and compilation of the OneStopEnglish corpus of texts written at three reading levels, and demonstrates its usefulness for through two applications - automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total). The corpus is now freely available under a CC by-SA 4.0 license1 and we hope that it would foster further research on the topics of readability assessment and text simplification.