Authors
Bade Girma Yohannis
Kolesnikova Olga
Oropeza Rodríguez José Luis
Sidorov Grigori
Title Lexicon-based Language Relatedness Analysis
Type Conference
Subtype Proceedings
Description 6th International Conference on AI in Computational Linguistics, ACLing 2024
Abstract The field of computational linguistics has been impacting various issues in language disciplines. The enormous growth of machine learning algorithms and Natural Language Processing (NLP) empowers its advancement and brings huge benefits to societies. For instance, machine translation, text summarization, sentence auto-completion, and sentiment analysis are a few of its benefits. However, leveraging this opportunity for low-resourced languages is challenging due to the lack of available electronic datasets. This paper presents a lexicon-based language relatedness analysis on Ethiopian low-resourced languages. The languages Wolaita, Dawuro, Gamo, and Gofa belong to the Ethiopian Omotic language family and share rich linguistic cultures and similarities. However, the extent of their inter-relatedness remains unknown. To address this gap, we collected and prepared novel corpora from the Bible and academic texts. We employed the TF-IDF technique for feature extraction and used the cosine similarity method to measure the similarities among these languages. In addition to cosine similarity, we used Euclidean distance to measure the spatial distances between the languages. The experiment results showed that Wolaita and Gofa exhibited high relatedness (33.4%), while Dawuro and Gamo demonstrated low relatedness (12.1%). © 2024 The Authors. Published by Elsevier B.V.
Notes 10.1016/j.procs.2024.10.200 Procedia Computer Science, v. 244
Venue Hybrid, Dubai
Country United Arab Emirates
Pages 268-277
Vol. / Ch.
Start 2024-09-21
End 2024-09-22
ISBN/ISSN