Authors
Sidorov Grigori
Viveros Jiménez Francisco
Title One Sense per Discourse Heuristic for Improving Precision of WSD Methods based on Lexical Intersections with the Context
Type Journal
Subtype CONACYT
Description Polibits
Abstract Word sense disambiguation (WSD) is the task of choosing a sense for a target word in a given text using some words from the text and, in some cases, hand-tagged samples or dictionary definitions. The sense list is usually taken from an explanatory dictionary for the given language. Since the word is part of the text, we rely on the context words to make the decision. Methods that use information from words in the (near) context are very simple: they consider lexical intersections of the word with the context words and/or their definitions or usage samples. These methods reach a precision of up to 70%. Other methods perform better, but they are much more sophisticated: they use expensive, usually hand-crafted resources and rely on complex algorithms. In this paper, we show how to increase the precision of these simple methods, for certain word classes, to a level comparable with that of the most sophisticated ones. Namely, we observed that these methods usually correctly disambiguate the words that conform to the One Sense per Discourse heuristic (OSD words). We used Semcor and Wikipedia to find the OSD words and left non-OSD words without disambiguation, thus improving precision at the expense of recall. Our motivation for this trade-off of more precision for less recall is twofold: (1) if we need high-quality disambiguation and use human evaluators, then we can reduce the cost by asking them to disambiguate only the words that are really difficult for the algorithms; (2) in an automatic system, we can apply this method to the corresponding words and use another, more sophisticated method for the remaining words, i.e., combine different disambiguation methods (meta-disambiguation). We experimented with the complete and simplified Lesk algorithms, a graph-based algorithm, and the first sense heuristic. The precision of all algorithms increases, and some algorithms reach the level of inter-annotator agreement.
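The idea in the abstract, disambiguating only OSD words with a simple lexical-intersection method, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sense inventory `SENSES`, the stopword list, and the function names are hypothetical, and the OSD word set is simply passed in rather than derived from Semcor and Wikipedia.

```python
from typing import Optional, Set

# Hypothetical toy sense inventory: word -> {sense id: dictionary gloss}.
SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a river or lake",
    },
}

# Hypothetical minimal stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on"}


def tokens(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word: str, context: str) -> str:
    """Pick the sense whose gloss shares the most words with the context.

    Ties go to the sense listed first in the inventory.
    """
    ctx = tokens(context) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in SENSES[word].items():
        overlap = len(tokens(gloss) & ctx)
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def disambiguate_osd_only(word: str, context: str, osd_words: Set[str]) -> Optional[str]:
    """Disambiguate only OSD words; return None (no answer) for the rest."""
    if word not in osd_words or word not in SENSES:
        return None
    return simplified_lesk(word, context)
```

Returning `None` for non-OSD words is what trades recall for precision in this sketch: those words are left for a human annotator or a more sophisticated method.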
Notes DOI 10.17562/PB-57-4
Place Ciudad de México
Country Mexico
Pages 45-50
Vol. / Ch. v. 57
Start 2018-01-01
End 2018-06-30
ISBN/ISSN