Artificial intelligence as a surgical advisor before a DIEP breast reconstruction. A blinded comparative study of three large language models.
Abstract
[INTRODUCTION] Large language models (LLMs) are increasingly used in clinical communication, but their accuracy and readability in patient education remain unclear. This study compared three LLMs for preoperative counseling before a DIEP breast reconstruction.
[METHODS] A total of 40 frequently asked preoperative questions regarding DIEP breast reconstruction were collected and categorized using the BREAST-Q framework. Each question was submitted in English to three LLMs: ChatGPT, Gemini and Copilot, and the responses were anonymized (as Models A-C). An expert panel of eight board-certified plastic surgeons from Europe and the USA rated the responses on a 5-point Likert scale for accuracy, informativeness and readability. The panel also completed a general evaluation (easiness, problematic content, incorrectness) and an evaluation specific to information material (relevance and lowest reading level).
[RESULTS] Significant differences were found between models across all domains. ChatGPT achieved the highest accuracy (p = 0.019), Copilot was the most informative (p = 0.041), and both ChatGPT and Copilot produced more readable responses than Gemini (p < 0.001). Copilot had fewer problematic statements, while Gemini generated text at the simplest reading level but with lower accuracy. Agreement among raters was strong for accuracy (κ = 0.96) but weak for qualitative domains.
[CONCLUSION] Each LLM showed distinct strengths: ChatGPT produced the most accurate answers, Copilot the most informative, and Gemini the simplest language. No model was uniformly superior. These findings support supervised, task-specific use of LLMs in patient education for breast reconstruction.
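The abstract reports inter-rater agreement as a κ value but does not say which kappa statistic was used. As an illustration only, the sketch below computes Fleiss' kappa, one common choice when several raters score the same items on a categorical scale. The function name `fleiss_kappa`, the 1-5 category set, and the example ratings are all hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=(1, 2, 3, 4, 5)):
    """Fleiss' kappa for a list of items, each a list with one score per rater.

    Assumes every item was scored by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    pairs = n_raters * (n_raters - 1)
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / pairs
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        # Every rating fell in one category; kappa is undefined here.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical accuracy scores: 3 responses, 8 raters each (not study data).
    example = [
        [5, 5, 5, 5, 5, 5, 5, 4],
        [4, 4, 4, 4, 4, 4, 4, 4],
        [3, 3, 3, 3, 2, 3, 3, 3],
    ]
    print(f"Fleiss' kappa: {fleiss_kappa(example):.3f}")
```

Fleiss' kappa treats the Likert points as unordered categories, so a 4-versus-5 disagreement counts the same as a 1-versus-5 disagreement. A weighted statistic may be more appropriate for ordinal ratings.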