AI DAVRIDA MAʼLUMOT SERIYALASHNING YANGI BOSQICHI: TOONNING JSONGA NISBATAN TOKEN TEJAMKORLIGI VA PARSING ANIQLIGI
Keywords:
sunʼiy intellekt, maʼlumot seriyalash, TOON formati, JSON formati, tokenizatsiya, katta til modellari, LLM, byte samaradorligi, strukturaviy optimallashtirish, parsing aniqligi, iqtisodiy samaradorlik, rekursiv funksiyalar, UTF 8 kodlash.Abstract
Ushbu maqolada sunʼiy intellekt tizimlarida maʼlumot seriyalash
formatlarining rivojlanishi, xususan, Token-Oriented Object Notation (TOON)
formatining JavaScript Object Notation (JSON) formatiga nisbatan ustunliklari
matematik jihatdan chuqur tahlil qilingan. Tadqiqot davomida Large Language
Models (LLM) tizimlari uchun tokenizatsiya xarajatlarini kamaytirish maqsadida,
TOON formatining strukturaviy ortiqchalikni qanday bartaraf etishi va byte
samaradorligini oshirish mexanizmlari qatʼiy matematik formulalar orqali
isbotlangan. Maqolada ikki format oʻrtasida rekursiv byte uzunligi funksiyalari (Ljson
va Ltoon) asosida formal matematik taqqoslash natijalari berilgan. Tadqiqot natijalari
TOON formatining massivlar massivi tuzilmasidan tashqari barcha hollarda 28.6%
dan 71.4% gacha token sonini kamaytirishni taʼminlashini koʻrsatadi,
References
1.
Grand View Research. (2024). Large Language Model Market Size,
Share & Trends Analysis Report. https://www.grandviewresearch.com/industry
analysis/large-language-model-market
2.
Bray, T. (Ed.). (2017). The JavaScript Object Notation (JSON) Data
Interchange Format (RFC 8259). Internet Engineering Task Force.
https://doi.org/10.17487/RFC8259
3.
Sennrich, R., Haddow, B., & Birch, A. (2016). Neural Machine
Translation of Rare Words with Subword Units. Proceedings of the 54th Annual
Meeting
of
the
Association
https://doi.org/10.18653/v1/P16-1162
4.
for
Computational
Linguistics.
Lafalce, M. (2024). TOON vs. JSON: A Mathematical Evaluation
of Byte Efficiency in Structured Data. UTN-FRLP. https://toonformat.dev/
5.
6.
Generative
OpenAI. (2024). API Pricing. https://openai.com/pricing
McKinsey & Company. (2023). The Economic Potential of
AI:
The
Next
Productivity
Frontier.
https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the
economic-potential-of-generative-ai-the-next-productivity-frontier
7.
(Standard
Ecma International. (2017). The JSON Data Interchange Syntax
No.
ECMA-404,
2nd
ed.).
international.org/publications-and-standards/standards/ecma-404/
8.
W3C. (2008). Extensible Markup Language (XML) 1.0 (Fifth
Edition). https://www.w3.org/TR/xml/