A. Jalili, H. Tabrizchi, A. Mosavi, A. R. Várkonyi-Kóczy: Enhancing Language Model Performance with a Novel Text Preprocessing Method. ACTA PHYSICA POLONICA A Vol. 146, No. 4, 2024. pp. 542–552. ISSN 0587-4246
Abstract: Advances in natural language processing highlight the importance of text data preparation for machine learning. Traditional preprocessing methods have been reported to handle the complexity of language poorly, which degrades model performance. This paper therefore proposes an approach that combines tokenization, noise reduction, and normalization to improve text quality.
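The three steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' method, whose details the abstract does not give; the function names (`reduce_noise`, `normalize`, `tokenize`, `preprocess`), the choice of what counts as noise (HTML tags and URLs), and the NFKC-plus-lowercase normalization are all assumptions made for illustration.

```python
import re
import unicodedata
from typing import List


def reduce_noise(text: str) -> str:
    """Assumed noise model: strip HTML tags and URLs, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML-like tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace


def normalize(text: str) -> str:
    """Assumed normalization: Unicode NFKC folding plus lowercasing."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text: str) -> List[str]:
    """Assumed tokenizer: maximal runs of word characters."""
    return re.findall(r"\w+", text)


def preprocess(text: str) -> List[str]:
    """Hypothetical pipeline: noise reduction, then normalization, then tokenization."""
    return tokenize(normalize(reduce_noise(text)))


if __name__ == "__main__":
    print(preprocess("<p>Visit https://example.com NOW!</p>"))
```

Under these assumptions, the sample input reduces to the tokens `visit` and `now`. Running noise reduction before tokenization keeps markup and URL fragments from turning into spurious tokens; a real pipeline would likely use a tokenizer matched to the downstream language model.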