Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset

Authors: Yujie Dai, Brian Sullivan, Axel Montout, Amy Dillon, Chris Waller, Peter Acs, Rachel Denholm, Philip Williams, Alastair D Hay, Raul Santos-Rodriguez, Andrew Dowsey

Abstract: The use of machine learning and AI on electronic health records (EHRs) holds
substantial potential for clinical insight. However, this approach faces
significant challenges due to data heterogeneity, sparsity, temporal
misalignment, and limited labeled outcomes. In this context, we leverage a
linked EHR dataset of approximately one million de-identified individuals from
Bristol, North Somerset, and South Gloucestershire, UK, to characterize urinary
tract infections (UTIs) and develop predictive models focused on data quality,
fairness and transparency. A comprehensive data pre-processing and curation
pipeline transforms the raw EHR data into a structured format suitable for AI
modeling. Given the limited availability and biases of ground truth UTI
outcomes, we introduce a UTI risk estimation framework informed by clinical
expertise to estimate UTI risk across individual patient timelines. Using this
framework, we built pairwise XGBoost models to differentiate UTI risk
categories with explainable AI techniques to identify key predictors while
ensuring interpretability. Our findings reveal differences in clinical and
demographic factors across risk groups, offering insights into UTI risk
stratification and progression. This study demonstrates the added value of
AI-driven insights into UTI clinical decision-making while prioritizing
interpretability, transparency, and fairness, underscoring the importance of
sound data practices in advancing health outcomes.

Source: http://arxiv.org/abs/2411.17645v1

About the Author

Leave a Reply

Your email address will not be published. Required fields are marked *

You may also like these