Authors
Zamir Muhammad Tayyab
Gelbukh Alexander
Felipe Riverón Edgardo Manuel
Title Explainable AI-Driven Analysis of Radiology Reports Using Text and Image Data: Experimental Study
Type Journal
Sub-type CONACYT
Description JMIR Formative Research
Abstract Background: Artificial intelligence (AI) is increasingly being integrated into clinical diagnostics; yet, its lack of transparency hinders trust and adoption among health care professionals. Explainable artificial intelligence (XAI) has the potential to improve the interpretability and reliability of AI-based decisions in clinical practice. Objective: This study evaluates the use of XAI for interpreting radiology reports to improve health care practitioners’ confidence in and comprehension of AI-assisted diagnostics. Methods: This study used the Indiana University chest x-ray dataset containing 3169 textual reports and 6471 images. Textual data were classified as either normal or abnormal using a range of machine learning approaches, including traditional machine learning models and ensemble methods, a deep learning model (long short-term memory network), and advanced transformer-based language models (GPT-2, T5, LLaMA-2, and LLaMA-3.1). For image-based classification, convolutional neural networks, including DenseNet121 and DenseNet169, were used. Top-performing models were interpreted using the XAI methods SHAP (Shapley Additive Explanations) and Local Interpretable Model-Agnostic Explanations (LIME) to support clinical decision making by enhancing transparency and trust in model predictions. Results: The LLaMA-3.1 model achieved the highest accuracy of 98% in classifying the textual radiology reports. Statistical analysis confirmed the model’s robustness, with Cohen κ (κ=0.981) indicating near-perfect agreement beyond chance. Both the chi-square and Fisher exact tests revealed a highly significant association between the actual and predicted labels (P<.001), while the McNemar test yielded a nonsignificant result (P=.25), suggesting balanced class performance. For the imaging data, the highest accuracy of 84% was achieved using the DenseNet169 and DenseNet121 models. To assess explainability, LIME and SHAP were applied to the best-performing models. These methods consistently highlighted medical terms such as “opacity,” “consolidation,” and “pleural” as clear indicators of abnormal findings in textual reports. Conclusions: The research underscores that explainability is an essential component of AI systems used in diagnostics and informs the design and implementation of AI in the health care sector. Such an approach improves diagnostic accuracy and builds confidence among the health workers who will use XAI in clinical settings, particularly in applications of AI explainability for medical purposes. ©Muhammad Tayyab Zamir, Safir Ullah Khan, Alexander Gelbukh, Edgardo Manuel Felipe Riverón, Irina Gelbukh.
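As a minimal sketch of the kind of explanation workflow the abstract describes, the snippet below applies LIME to a normal/abnormal text classifier and prints per-token contributions (terms such as “opacity” or “consolidation” pushing toward the abnormal class). It is not the authors’ pipeline: it assumes a simple TF-IDF plus logistic regression stand-in rather than the paper’s LLaMA-3.1 model, and the example reports are illustrative placeholders, not records from the Indiana University dataset.

```python
# Sketch: explaining a "normal"/"abnormal" radiology-report classifier with LIME.
# Stand-in model (TF-IDF + logistic regression), not the paper's LLaMA-3.1;
# the texts and labels below are illustrative placeholders only.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (placeholder reports).
reports = [
    "Heart size normal. Lungs are clear. No pleural effusion.",
    "Patchy opacity in the right lower lobe consistent with consolidation.",
    "No acute cardiopulmonary abnormality.",
    "Small left pleural effusion with adjacent airspace opacity.",
]
labels = [0, 1, 0, 1]  # 0 = normal, 1 = abnormal

# The pipeline exposes predict_proba(list_of_texts) -> (n_samples, n_classes),
# which is the interface LIME's explain_instance expects.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reports, labels)

explainer = LimeTextExplainer(class_names=["normal", "abnormal"])
exp = explainer.explain_instance(
    "Dense opacity and consolidation in the left lung with pleural thickening.",
    clf.predict_proba,
    num_features=6,
)

# Each (token, weight) pair shows how strongly that word pushed the prediction
# toward "abnormal" (positive weight) or "normal" (negative weight).
for token, weight in exp.as_list():
    print(f"{token}: {weight:+.3f}")
```

The same idea carries over to the paper’s image branch, where saliency-style explanations would be computed over DenseNet feature maps instead of tokens.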
Notes DOI 10.2196/77482
Place Toronto
Country Canada
No. of pages Article number e77482
Vol. / Chap. v. 9
Start 2025-10-14
End
ISBN/ISSN