| Abstract |
Monitoring online sources for pests and diseases is a crucial component of the early warning system: it identifies and extracts documents to generate timely information about potential threats that could spread within the national territory. The National Service for Agrifood Health, Safety, and Quality (SENASICA) is the Mexican government agency responsible for safeguarding agricultural, aquaculture, and livestock resources from pests and diseases of quarantine significance. To support this effort, a news extractor was developed that uses web scraping with targeted keywords to collect articles about pests that may pose a risk to Mexico's food sector. Natural language processing is used to extract relevant data from these news articles, including the title, country, pest mentioned, event date, and other key details. Furthermore, by leveraging transformers, specifically the RoBERTa model, together with named entity recognition (NER), key information was extracted from the news text, improving the identification of relevant entities and their relationships. Additionally, a scoring formula was applied to measure the relevance of each article, ensuring that the most critical reports are prioritized. To evaluate the effectiveness of the extraction, a separate system presents the collected information visually with statistical insights, allowing SENASICA analysts to determine whether the extracted pest data are sufficient and, supported by additional indicators, to assess the potential risk of spread within Mexican territory. © 2025 SPIE. |
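
The abstract does not state the scoring formula or the keyword list, so the following is only a minimal sketch of how keyword-based relevance scoring of news articles could work. The keywords, their weights, the `title_boost` parameter, and the names `Article`, `count_keywords`, `relevance_score`, and `rank` are all hypothetical and are not taken from the SENASICA system.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical keyword weights: the source does not publish its
# keyword list or its scoring formula.
KEYWORD_WEIGHTS: Dict[str, float] = {
    "outbreak": 3.0,
    "quarantine": 2.0,
    "pest": 1.0,
}


@dataclass
class Article:
    title: str
    body: str


def count_keywords(text: str) -> Dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each keyword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {kw: tokens.count(kw) for kw in KEYWORD_WEIGHTS}


def relevance_score(article: Article, title_boost: float = 2.0) -> float:
    """Weighted keyword count; a match in the title counts title_boost times."""
    title_counts = count_keywords(article.title)
    body_counts = count_keywords(article.body)
    return sum(
        weight * (title_boost * title_counts[kw] + body_counts[kw])
        for kw, weight in KEYWORD_WEIGHTS.items()
    )


def rank(articles: List[Article]) -> List[Article]:
    """Return articles ordered from most to least relevant."""
    return sorted(articles, key=relevance_score, reverse=True)
```

In this sketch an article whose title mentions an outbreak outranks one that only mentions a pest in passing. A production system would presumably combine such a score with the entities extracted by the NER model, which is not shown here.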