| Nurzhan Amantay, Yasin Ortakcı, Oğuz Fındık. Primary and Secondary Language Identification in Multilingual Texts Using a Multi-Layer Perceptron and Kolmogorov-Arnold Network |
|---|
| Abstract. Language identification in multilingual texts is a critical task for Natural Language Processing (NLP) applications, particularly when a text combines a primary and a secondary language. This study introduces a novel approach to identifying both the primary and the secondary language, using a dataset of 11,000 multilingual sentences that covers 27 primary and 58 secondary languages, including Indian languages such as Hindi, Tamil, and Telugu. We use Multi-Layer Perceptron (MLP) and Kolmogorov-Arnold Network (KAN) classifiers, relying on their pattern recognition capabilities to capture distinguishing linguistic features. The MLP model achieves an accuracy of 99% for primary language identification and 85% for secondary language identification. The KAN model matches the 99% primary language accuracy while raising secondary language accuracy to 87%. Both models were evaluated on a held-out subset of the dataset reserved for testing. This work highlights the potential of KAN-based models for language identification and provides a foundation for future research in multilingual NLP applications. |
| Keywords: Language Identification, Multilingual Text, Multi-Layer Perceptron, Kolmogorov-Arnold Network |
| DOI: |
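
The abstract contrasts MLP and KAN classifiers but does not describe the authors' architecture, input features, or training setup, so the following is only a minimal sketch of what a single KAN layer computes, not the authors' implementation. In an MLP layer, a weight matrix is followed by a fixed activation; in a KAN layer, every input-to-output edge carries its own learnable univariate function, and each output is the plain sum of those edge functions. As an assumption, the edge functions here use Gaussian radial basis functions plus a SiLU residual term instead of the B-splines of the original KAN formulation. The names `KANLayer`, `silu`, and `softmax`, the grid settings, and the 16-dimensional feature vector are all hypothetical; only the count of 27 primary-language classes comes from the abstract. The sketch covers the forward pass only, with no training.

```python
import math
import random
from typing import List


def silu(x: float) -> float:
    """SiLU activation x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class KANLayer:
    """Hypothetical KAN layer: output j = sum_i phi_ji(x_i).

    Assumption: phi_ji(x) = w_base * silu(x) + sum_k c_k * exp(-((x - g_k) / h)^2),
    i.e. Gaussian radial basis functions stand in for B-splines.
    """

    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8,
                 grid_min: float = -2.0, grid_max: float = 2.0, seed: int = 0):
        if num_basis < 2:
            raise ValueError("num_basis must be at least 2")
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        step = (grid_max - grid_min) / (num_basis - 1)
        self.grid = [grid_min + k * step for k in range(num_basis)]
        self.h = step  # basis width equals the grid spacing
        # coeffs[j][i][k]: weight of basis k on the edge from input i to output j
        self.coeffs = [[[rng.gauss(0.0, 0.1) for _ in range(num_basis)]
                        for _ in range(in_dim)] for _ in range(out_dim)]
        # w_base[j][i]: weight of the SiLU residual term on edge (i -> j)
        self.w_base = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                       for _ in range(out_dim)]

    def edge(self, j: int, i: int, x: float) -> float:
        """Evaluate the learnable univariate function phi_ji at x."""
        basis_sum = sum(c * math.exp(-((x - g) / self.h) ** 2)
                        for c, g in zip(self.coeffs[j][i], self.grid))
        return self.w_base[j][i] * silu(x) + basis_sum

    def forward(self, x: List[float]) -> List[float]:
        """Sum the edge functions into each output; no matrix multiply."""
        if len(x) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} inputs, got {len(x)}")
        return [sum(self.edge(j, i, x[i]) for i in range(self.in_dim))
                for j in range(self.out_dim)]


# Hypothetical usage: a 2-layer KAN mapping a 16-dim sentence feature vector
# to 27 primary-language probabilities (random, untrained parameters).
hidden = KANLayer(16, 8, seed=1)
head = KANLayer(8, 27, seed=2)
features = [0.1 * (k % 5) for k in range(16)]
probs = softmax(head.forward(hidden.forward(features)))
print(len(probs), round(sum(probs), 6))  # prints: 27 1.0
```

Because each edge function is fitted independently, a KAN layer with the same input and output sizes has roughly `num_basis + 1` times as many parameters per edge as an MLP layer's single weight, which is one reason KAN layers are usually kept narrow.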
