THE SPECIFIC FEATURES OF CREATING A CORPUS TO ANALYZE TEXT COMPLEXITY IN THE ENGLISH LANGUAGE
Keywords:
Text complexity; corpus linguistics; English language; readability; lexical analysis; syntactic annotation; discourse analysis; computational linguistics
Abstract: The study of text complexity in English has gained growing importance within applied linguistics, language pedagogy, readability studies, and computational linguistics. As educational systems seek to align reading materials with learners’ proficiency levels, and as computational models increasingly rely on large datasets for natural language understanding, the need for systematically structured corpora becomes more apparent. Text complexity encompasses lexical, syntactic, semantic, and discourse-level dimensions, which are best examined through well-constructed corpora that represent authentic language use. This article provides an in-depth examination of the specific features required to create a corpus for analyzing text complexity in English. It outlines methodological principles, text selection strategies, annotation procedures, computational tools, and challenges inherent in corpus construction. By synthesizing current research and practical approaches, the study demonstrates how a reliably designed corpus can significantly contribute to understanding linguistic difficulty, improving educational practices, and advancing computational text analysis.