Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang
Abstract: Data-driven deep learning models have shown great capability in assisting
radiologists with breast ultrasound (US) diagnoses. However, their effectiveness
is limited by the long-tail distribution of training data, which leads to
inaccuracies in rare cases. In this study, we address the long-standing challenge
of improving diagnostic model performance on rare cases when training on
long-tailed data. Specifically, we introduce a pipeline, TAILOR, that builds a
knowledge-driven generative model to produce tailored synthetic data. The
generative model, using 3,749 lesions as source data, can generate millions of
breast-US images, especially for error-prone rare cases. The generated data can
be further used to build a diagnostic model for accurate and interpretable
diagnoses. In the prospective external evaluation, our diagnostic model
outperforms the average performance of nine radiologists by 33.5% in
specificity at the same sensitivity, and it improves the radiologists'
performance by providing predictions with an interpretable
decision-making process. Moreover, on ductal
carcinoma in situ (DCIS), our diagnostic model outperforms all radiologists by
a large margin, with only 34 DCIS lesions in the source data. We believe that
TAILOR can potentially be extended to various diseases and imaging modalities.
Source: http://arxiv.org/abs/2407.16634v1