Evaluating The Performance of Using Large Language Models to Automate Summarization of CT Simulation Orders in Radiation Oncology

Authors: Meiyun Cao, Shaw Hu, Jason Sharp, Edward Clouser, Jason Holmes, Linda L. Lam, Xiaoning Ding, Diego Santos Toesca, Wendy S. Lindholm, Samir H. Patel, Sujay A. Vora, Peilong Wang, Wei Liu

Abstract: Purpose: This study aims to use a large language model (LLM) to automate the
generation of summaries from the CT simulation orders and evaluate its
performance.
Materials and Methods: A total of 607 CT simulation orders for patients were
collected from the Aria database at our institution. A locally hosted Llama 3.1
405B model, accessed via the Application Programming Interface (API) service,
was used to extract keywords from the CT simulation orders and generate
summaries. The downloaded CT simulation orders were categorized into seven
groups based on treatment modalities and disease sites. For each group, a
customized instruction prompt was developed collaboratively with therapists to
guide the Llama 3.1 405B model in generating summaries. The ground truth for
the corresponding summaries was manually derived by carefully reviewing each CT
simulation order and subsequently verified by therapists. The accuracy of the
LLM-generated summaries was evaluated by therapists using the verified ground
truth as a reference.
Results: About 98% of the LLM-generated summaries aligned with the manually
generated ground truth in terms of accuracy. Our evaluations showed an improved
consistency in format and enhanced readability of the LLM-generated summaries
compared to the corresponding therapists-generated summaries. This automated
approach demonstrated a consistent performance across all groups, regardless of
modality or disease site.
Conclusions: This study demonstrated the high precision and consistency of
the Llama 3.1 405B model in extracting keywords and summarizing CT simulation
orders, suggesting that LLMs have great potential to help with this task,
reduce the workload of therapists and improve workflow efficiency.