Authors
Calvo Castro Francisco Hiram
Ávila Argüelles Ricardo
Gelbukh Alexander
Godoy Calderón Salvador
Title Assigning Library of Congress Classification Codes to Books Based Only on their Titles
Type Journal
Sub-type SCOPUS
Description Informatica (Ljubljana)
Abstract Many publishers follow the Library of Congress Classification (LCC) scheme to indicate a classification code on the first pages of their books. This is useful for many libraries worldwide because it makes it possible to search and retrieve books by content type, and this scheme has become a de facto standard. However, not every book has been pre-classified by the publisher; in particular, in many universities, new dissertations have to be classified manually. Although there are many systems available for automatic text classification, all of them use extensive information which is not always available, such as the index, abstract, or even the whole content of the work. In this work, we present our experiments on supervised classification of books by using only their title, which would allow massive automatic indexing. We propose a new text comparison measure, which mixes two well-known text classification techniques: the Lesk voting scheme and the Term Frequency (TF). In addition, we experiment with different weighting schemes as well as logical-combinatorial methods such as ALVOT in order to determine the contribution of the title to the correct classification. We found this contribution to be approximately one third, as we correctly classified 36% (on average by each branch) of 122,431 previously unseen titles (in total) upon training with 489,726 samples (in total) of one major branch (Q) of the LCC catalogue.
Remarks Received: February 4, 2009
Place
Country Slovenia
Pages 77-84
Vol. / Chap. Vol. 34, Issue 1
Start 2010-01-01
End
ISBN/ISSN