Authors: Xiaoman Zhang, Hong-Yu Zhou, Xiaoli Yang, Oishi Banerjee, Julián N. Acosta, Josh Miller, Ouwen Huang, Pranav Rajpurkar
Abstract: AI-driven models have demonstrated significant potential in automating
radiology report generation for chest X-rays. However, there is no standardized
benchmark for objectively evaluating their performance. To address this, we
present ReXrank (https://rexrank.ai), a public leaderboard and challenge for
assessing AI-powered radiology report generation. Our framework incorporates
ReXGradient, the largest test dataset, comprising 10,000 studies, together with
three public datasets (MIMIC-CXR, IU-Xray, CheXpert Plus), for report generation
assessment. ReXrank employs 8 evaluation metrics and separately assesses models
that generate only the findings section and those that produce both the findings
and impression sections. By providing this standardized evaluation framework,
ReXrank enables meaningful comparisons of model performance and offers crucial
insights into model robustness across diverse clinical settings. Beyond its
current focus on chest X-rays, ReXrank’s framework sets the stage for
comprehensive evaluation of automated reporting across the full spectrum of
medical imaging.
Source: http://arxiv.org/abs/2411.15122v1
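To make the two evaluation tracks described in the abstract concrete, the sketch below scores a model only on the report sections it actually produces (findings only, or findings plus impression). The abstract does not enumerate ReXrank's 8 metrics, so sentence-level BLEU stands in here as one representative text-overlap metric; the section keys and the `evaluate_model` helper are hypothetical illustrations, not ReXrank's actual API.

```python
# Minimal sketch of a section-aware, multi-track evaluation loop.
# Assumptions: reports are dicts keyed by section name, and BLEU is a
# stand-in for whichever metrics ReXrank actually computes.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def bleu(reference: str, candidate: str) -> float:
    """Sentence-level BLEU between a reference report section and a generated one."""
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=smooth)


def evaluate_model(predictions, references, sections=("findings",)):
    """Average a metric per section, mirroring the idea of separate tracks for
    findings-only models and findings+impression models."""
    scores = {}
    for section in sections:
        pairs = [(ref[section], pred[section])
                 for ref, pred in zip(references, predictions)
                 if section in pred]
        scores[section] = (sum(bleu(r, p) for r, p in pairs) / len(pairs)
                           if pairs else None)
    return scores


# A findings-only model is scored on the findings track alone.
preds = [{"findings": "No acute cardiopulmonary abnormality."}]
refs = [{"findings": "No acute cardiopulmonary process."}]
print(evaluate_model(preds, refs, sections=("findings",)))
```

A full harness would aggregate several such metrics per track across all four test datasets; the two-track split simply keeps models that omit the impression section from being penalized on text they never generate.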