SABER

Authors
Bade Girma Yohannis
Kolesnikova Olga
Oropeza Rodríguez José Luis
Sidorov Grigori

Title	Lexicon-based Language Relatedness Analysis
Type	Conference
Sub-type	Proceedings
Description	6th International Conference on AI in Computational Linguistics, ACLing 2024
Abstract	The field of computational linguistics has been impacting various issues in language disciplines. The enormous growth of machine learning algorithms and Natural Language Processing (NLP) empowers its advancement and brings huge benefits to societies. For instance, machine translation, text summarization, sentence auto-completion, and sentiment analysis are a few of its benefits. However, leveraging this opportunity for low-resourced languages is challenging due to the lack of available electronic datasets. This paper presents a lexicon-based language relatedness analysis on Ethiopian low-resourced languages. The languages Wolaita, Dawuro, Gamo, and Gofa belong to the Ethiopian Omotic language family and share rich linguistic cultures and similarities. However, the extent of their inter-relatedness remains unknown. To address this gap, we collected and prepared novel corpora from the Bible and academic texts. We employed the TF-IDF technique for feature extraction and used the cosine similarity method to measure the similarities among these languages. In addition to cosine similarity, we used Euclidean distance to measure the spatial distances between the languages. The experiment results showed that Wolaita and Gofa exhibited high relatedness (33.4%), while Dawuro and Gamo demonstrated low relatedness (12.1%). © 2024 The Authors. Published by Elsevier B.V.
Notes	DOI: 10.1016/j.procs.2024.10.200. Procedia Computer Science, vol. 244
Place	Hybrid, Dubai
Country	United Arab Emirates
Pages	268-277
Vol. / Ch.
Start	2024-09-21
End	2024-09-22
ISBN/ISSN