| Title |
Author identification using latent Dirichlet allocation |
| Type |
Conference |
| Subtype |
Proceedings paper |
| Description |
18th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2017 |
| Abstract |
We tackle the author identification task of PAN 2015 with a Latent Dirichlet Allocation (LDA) model. This method takes both the vocabulary and the context of words into account, and through a statistical inference process determines to what extent relations between words occur in each document. Processing a collection of documents with LDA yields one topic distribution per document. Each distribution can be viewed as a feature vector, a fingerprint of its document within the collection. We then applied a Naïve Bayes classifier to the resulting patterns. We obtained state-of-the-art performance for English, surpassing the best final score (FS) reported at PAN 2015, while obtaining mixed results for the other languages. © Springer Nature Switzerland AG 2018. |
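The pipeline summarized in the abstract (per-document LDA topic distributions used as feature vectors, followed by a Naïve Bayes classifier) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: collapsed Gibbs sampling as the LDA inference method, the Gaussian variant of Naïve Bayes, the hyperparameters (`alpha`, `beta`, `iters`), the toy two-author corpus, and the helper names `lda_topic_vectors`, `fit_gaussian_nb` and `predict_nb` are all hypothetical choices made for this example.

```python
import math
import random


def lda_topic_vectors(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """LDA via collapsed Gibbs sampling (an assumed inference method).

    docs is a list of token lists. Returns one topic distribution (theta)
    per document; each distribution sums to 1 and serves as its fingerprint.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # tokens of topic k in document d
    nkw = [[0] * V for _ in range(n_topics)]    # occurrences of word v in topic k
    nk = [0] * n_topics                         # total tokens assigned to topic k

    def update(d, k, v, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            update(d, z[d][i], vocab[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                update(d, z[d][i], v, -1)  # remove the token's current assignment
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                z[d][i] = rng.choices(range(n_topics), weights=weights)[0]
                update(d, z[d][i], v, +1)

    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
            for d, doc in enumerate(docs)]


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Per-class log prior, feature means and variances (Gaussian Naive Bayes)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(rows) for c in cols]
        variances = [max(sum((v - m) ** 2 for v in c) / len(rows), var_floor)
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_nb(model, x):
    """Return the label with the highest log posterior, assuming independent features."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                               for xi, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


if __name__ == "__main__":
    # Hypothetical toy corpus: two "authors" with disjoint vocabularies.
    train = [("sea ship wave sail harbor sea wave ship", "A"),
             ("harbor sail sea ship wave sail", "A"),
             ("code compile bug test deploy code bug", "B"),
             ("test deploy compile code bug test", "B")]
    unknown = "wave sea sail ship harbor"
    docs = [text.split() for text, _ in train] + [unknown.split()]
    thetas = lda_topic_vectors(docs, n_topics=2)
    model = fit_gaussian_nb(thetas[:-1], [label for _, label in train])
    print(predict_nb(model, thetas[-1]))
```

In this sketch LDA is fitted on the whole collection, including the unknown document, which matches the abstract's description of processing a set of documents together. A system that must handle unseen documents might instead infer their topic vectors against a model trained only on the known texts.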
| Notes |
DOI 10.1007/978-3-319-77116-8_22, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), V. 10762 |
| Location |
Budapest |
| Country |
Hungary |
| Pages |
303-312 |
| Vol. / Ch. |
10762 LNCS |
| Start date |
2017-04-17 |
| End date |
2017-04-23 |
| ISBN/ISSN |
9783319771151 |