| Title |
Using Transformers on Noisy vs. Clean Data for Paraphrase Identification in Mexican Spanish |
| Type |
Conference
| Sub-type |
Proceedings paper
| Description |
2022 Iberian Languages Evaluation Forum, IberLEF 2022 |
| Abstract |
Paraphrase identification is relevant for plagiarism detection, question answering, and machine translation, among other applications. In this work, we report a transfer learning approach using transformers to tackle paraphrase identification on noisy vs. clean data in Spanish as our contribution to the PAR-MEX 2022 shared task. We carried out fine-tuning as well as hyperparameter tuning on BERTIN, a model pre-trained on the Spanish portion of a massive multilingual web corpus. We achieved the best performance in the competition (F1 = 0.94) by fine-tuning BERTIN on noisy data and using it to identify paraphrases in clean data. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
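A minimal sketch, not the authors' released code, of how such a transfer-learning setup could look with the Hugging Face Transformers library: fine-tuning a BERTIN checkpoint for sentence-pair (paraphrase) classification. The checkpoint ID, column names, toy sentence pairs, and hyperparameters are illustrative assumptions, not values reported in the paper.

    # Sketch: fine-tune a BERTIN checkpoint for binary paraphrase classification.
    # Model ID, data columns, and hyperparameters are assumptions for illustration.
    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "bertin-project/bertin-roberta-base-spanish"  # assumed checkpoint ID
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Toy sentence pairs standing in for the (noisy) training split of the shared task.
    train = Dataset.from_dict({
        "sentence1": ["El gato duerme en el sofá.", "Compré un coche nuevo."],
        "sentence2": ["El felino descansa sobre el sillón.", "La película fue muy larga."],
        "label": [1, 0],  # 1 = paraphrase, 0 = not a paraphrase
    })

    def tokenize(batch):
        # Encode each pair jointly so the model attends to both sentences.
        return tokenizer(batch["sentence1"], batch["sentence2"],
                         truncation=True, padding="max_length", max_length=128)

    train = train.map(tokenize, batched=True)

    args = TrainingArguments(output_dir="parmex-bertin", num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    Trainer(model=model, args=args, train_dataset=train).train()

After training on the noisy split, the same model would be applied unchanged to the clean evaluation pairs, which is the cross-condition setup the abstract describes.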
| Notes |
CEUR Workshop Proceedings, v. 3202 |
| Location |
Coruña |
| Country |
Spain
| No. of pages |
|
| Vol. / Chap. |
|
| Start date |
2022-09-20 |
| End date |
|
| ISBN/ISSN |
|