PART-OF-SPEECH TAGGING AND ITS APPLICATIONS
Keywords:
part-of-speech tagging, corpus linguistics, natural language processing, linguistic annotation, morphologically rich languages, text analysis, machine learning, computational linguistics, Kazakh language, Uzbek language.
Abstract
This article examines the theoretical foundations, methods, and applications of part-of-speech (POS) tagging within corpus linguistics. As a fundamental stage of natural language processing (NLP), POS tagging lets computational systems assign each word to a grammatical category, which provides a structural basis for syntactic and semantic analysis. The study reviews the evolution of tagging approaches, from rule-based through stochastic and hybrid to deep learning models, and emphasizes their role in linguistic annotation and empirical language research. Drawing on corpus-based examples, it explores how accurate POS tagging supports downstream tasks such as parsing, information retrieval, text classification, and machine translation. The discussion highlights challenges specific to morphologically rich and low-resource languages, including Kazakh and Uzbek, and outlines strategies for building effective language resources for them. The study concludes that POS tagging is not only a technical procedure but also a methodological instrument for linguistic inquiry, one that links computational technology with linguistic theory.
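To make the notion of assigning words to grammatical categories concrete, the following is a minimal sketch of a most-frequent-tag baseline, one of the simplest stochastic taggers. It is an illustration only and not the method of this study: the toy corpus, the Universal Dependencies-style tag names, and the function names `train_unigram_tagger` and `tag` are all assumptions introduced here.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical data, UD-style tag names assumed).
TAGGED_SENTENCES = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]


def train_unigram_tagger(tagged_sentences):
    """Learn, for every word form, the tag it carries most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    # Keep only the single most frequent tag per word form.
    return {
        word: tag_counts.most_common(1)[0][0]
        for word, tag_counts in counts.items()
    }


def tag(tokens, model, default_tag="NOUN"):
    """Tag tokens by lookup; unseen word forms fall back to default_tag."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


model = train_unigram_tagger(TAGGED_SENTENCES)
print(tag(["The", "cat", "barks"], model))
# → [('The', 'DET'), ('cat', 'NOUN'), ('barks', 'VERB')]
```

A lookup over whole word forms of this kind is likely to degrade on agglutinative languages such as Kazakh and Uzbek, where a single stem can surface in many suffixed forms that never occur in the training corpus. This is one reason morphological analysis and larger annotated resources matter for such languages.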