USING WEB CORPORA FOR LINGUISTIC RESEARCH
Keywords:
Language changes rapidly, particularly through digital communication. Web-based corpora capture emerging slang, new grammatical constructions, neologisms, and current discourse trends. Researchers can observe linguistic innovation almost in real time, which is impossible with printed corpora that may be decades old.
Abstract
Web corpora, large collections of linguistic data gathered from the internet, have become essential tools in modern linguistic research. This article examines how web-derived corpora contribute to the study of vocabulary, grammar, discourse, and language variation. It analyzes the methodological advantages and limitations of using online data, highlights examples from widely used corpora such as the Corpus of Contemporary American English (COCA), the iWeb Corpus, and the TenTen corpora, and discusses how researchers employ web-harvested data for quantitative and qualitative analysis. The article argues that web corpora, despite their challenges, provide unparalleled access to vast, up-to-date linguistic data and have therefore transformed empirical language study.