Evaluation of Large Language Models in Breast Cancer Clinical Scenarios
Practical Solutions and Value
- Large language models (LLMs) such as GPT-3.5, GPT-4.0, and Claude2 have shown potential in breast cancer diagnosis, treatment, and care.
- A study compared the performance of these LLMs across clinical scenarios related to breast cancer.
- Clinical scenarios were categorized into assessment and diagnosis, treatment decision-making, postoperative care, psychosocial support, and prognosis and rehabilitation.
- Breast cancer specialists rated the feedback from each LLM for quality, relevance, and applicability.
Results
- There was a moderate level of agreement among the raters.
- GPT-4.0 and GPT-3.5 provided longer feedback than Claude2.
- GPT-4.0 outperformed the other models in average quality, relevance, and applicability.
- Across the clinical areas, GPT-4.0 surpassed GPT-3.5 in quality and scored higher than Claude2 in tasks related to psychosocial support and treatment decision-making.
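
The summary does not say which statistic was used to measure agreement among the raters. As a minimal sketch of how such agreement could be quantified, the Python below computes Fleiss' kappa, one common choice for several raters; that choice is an assumption, not the study's stated method. The function names, the 1-5 scale, and the ratings are hypothetical and made up for illustration only; they are not data from the study.

```python
def to_counts(ratings, categories=range(1, 6)):
    """Turn per-response rater scores into per-category counts.

    ratings: one list per response, holding one score per rater.
    The 1-5 scale is an assumption made for this illustration.
    """
    return [[sum(1 for r in row if r == c) for c in categories]
            for row in ratings]


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts[i][j] is the number of raters who put response i in category j.
    Every row must sum to the same number of raters (at least 2).
    Undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all ratings that fall in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_categories)]

    # Observed agreement for each response, then averaged.
    p_item = [(sum(c * c for c in row) - n_raters)
              / (n_raters * (n_raters - 1)) for row in counts]
    p_observed = sum(p_item) / n_items

    # Agreement expected by chance.
    p_expected = sum(p * p for p in p_cat)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical scores: 4 responses, each rated by 3 raters (not study data).
example_ratings = [[4, 4, 5], [3, 3, 3], [5, 4, 5], [2, 3, 2]]
example_kappa = fleiss_kappa(to_counts(example_ratings))
```

A kappa of 1 means the raters agree perfectly and 0 means agreement no better than chance. Under the widely used Landis and Koch convention, values between 0.41 and 0.60 are described as "moderate" agreement.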
Conclusion
- GPT-4.0 shows superiority in quality, relevance, and applicability in clinical applications for breast cancer compared to GPT-3.5.
- It also holds advantages over Claude2 in specific domains.
- Further optimization and accuracy assessment are crucial as the use of LLMs in the clinical field expands.