Authors
Alcántara Medina Tania Gisela
García Vázquez Omar
Hernández Laureano Mayte
Calvo Castro Francisco Hiram
Title LyricScraper: A Dataset of Spanish Song Lyrics Created via Web Scraping and Dual-labeling for LLM Classification
Type Journal
Subtype CONACYT
Description Computación y Sistemas
Abstract Songs are a powerful means of expressing emotions through melody and lyrics. This study focuses on understanding and classifying the emotions present in songs as positive, negative, or neutral. Such classification requires data, which was gathered with a proprietary web scraping algorithm that collects lyrics online. A BERT-based pseudo-labeling approach was then used to assign sentiment labels to these lyrics, leveraging BERT’s ability to capture context and semantic relationships in language. This process enhanced the dataset’s quality and contributed to the success of sentiment analysis in songs. The new dataset addresses challenges related to sentence length by providing song lyrics of varying lengths, which facilitates more effective model training. Data imbalance was addressed through careful sample selection, so that a wide range of emotions in songs is represented. The dataset was then classified using large language models, with promising results: DistilBERT reached an accuracy of 97.66%, with an F1 score of 97.83%, highlighting the effectiveness of this approach for song sentiment analysis. This study underscores the importance of understanding emotions in songs and offers practical solutions to enhance the capabilities of language models in this task. © 2024 Instituto Politécnico Nacional. All rights reserved.
Notes DOI 10.13053/CyS-28-4-5292
Place Mexico City
Country Mexico
Pages 2251-2260
Vol. / Ch. v. 28 no. 4
Start 2024-10-01
End
ISBN/ISSN