Are large language models consistent with the ASPS and AAPS guidelines? A comparison of AI chatbot recommendations and plastic surgery clinical guidance.
Abstract
[INTRODUCTION] Assessing the ability of AI chatbots to provide information consistent with clinical guidelines is essential for evaluating the accuracy of the information that patients may receive. We evaluated the ability of three widely used chatbots to reference and respond to clinical questions in alignment with the American Society of Plastic Surgeons' (ASPS) clinical guidelines.
[METHODS] Evidence-based clinical practice guidelines from ASPS and the American Association of Plastic Surgeons (AAPS) were used to develop prompts for ChatGPT-4, Meta Llama 3.1, and Microsoft Copilot. Reviewers determined whether each chatbot's answer aligned with the ASPS guidelines. Any reference to ASPS by the chatbots was recorded. Descriptive statistics were used for data analysis.
[RESULTS] Forty-nine total recommendations from five clinical guidelines were included: reduction mammoplasty, autologous breast reconstruction, breast-implant associated anaplastic large cell lymphoma, eyelid surgery, and reconstruction after skin cancer. Copilot cited ASPS recommendations most frequently (Copilot: 67.3%, Llama: 34.7%, ChatGPT: 16.3%; p<0.0001) and had the highest rate of ASPS- and AAPS-aligned responses (Copilot: 79.6%, Llama: 73.5%, ChatGPT: 69.4%; p>0.05). Among the misaligned responses, neutral responses were most common with no significant differences among the chatbots (Copilot: 60%, Llama: 69.2%, ChatGPT: 40%; p=0.62).
[CONCLUSION] In our study, up to 30% of chatbot responses did not align with ASPS and AAPS guidance. These results indicate a need for advocacy from plastic surgery societies regarding patient reliance on AI chatbots and training AI models specific to the specialty.
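The reported percentages can be checked against the 49 recommendations with a little arithmetic. The sketch below recovers the integer counts behind each percentage and then runs a Pearson chi-square test on the resulting 3 x 2 tables. The choice of test is an assumption: the abstract does not name the test behind its p-values, so this is a plausibility check and not a reproduction of the authors' analysis. The helper names (`pct_to_count`, `chi_square`, `compare`) are illustrative.

```python
import math

N_RECOMMENDATIONS = 49  # total recommendations, from the abstract

# Percentages as reported in the abstract.
cited_pct = {"Copilot": 67.3, "Llama": 34.7, "ChatGPT": 16.3}
aligned_pct = {"Copilot": 79.6, "Llama": 73.5, "ChatGPT": 69.4}


def pct_to_count(pct, n=N_RECOMMENDATIONS):
    """Recover the integer count behind a reported percentage of n."""
    return round(pct / 100 * n)


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def compare(pcts):
    """Return (counts per chatbot, chi-square p-value) for yes/no outcomes."""
    counts = {name: pct_to_count(p) for name, p in pcts.items()}
    table = [[k, N_RECOMMENDATIONS - k] for k in counts.values()]
    stat, dof = chi_square(table)
    # Three chatbots x two outcomes gives 2 degrees of freedom, where the
    # chi-square survival function has the closed form exp(-x / 2).
    assert dof == 2
    return counts, math.exp(-stat / 2)


cited_counts, cited_p = compare(cited_pct)
aligned_counts, aligned_p = compare(aligned_pct)

print("ASPS cited:", cited_counts, f"p = {cited_p:.2g}")
print("Aligned:   ", aligned_counts, f"p = {aligned_p:.2g}")

# Worst-case misalignment rate, behind the "up to 30%" in the conclusion.
worst = max(100 - p for p in aligned_pct.values())
print(f"Highest misalignment rate: {worst:.1f}%")
```

Under this assumed test, the citation-rate difference falls below 0.0001 and the alignment-rate difference stays above 0.05, in line with the thresholds the abstract reports.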
Extracted medical entities (NER)
| Type | English term | Korean term (gloss) | UMLS CUI | Source | Occurrences |
|---|---|---|---|---|---|
| Anatomy | breast | 유방 (breast) | | dict | 2 |
| Procedure | eyelid surgery | 안검성형술 (blepharoplasty) | | dict | 1 |
| Procedure | reduction mammoplasty | 유방성형술 (mammoplasty) | | dict | 1 |
| Anatomy | eyelid | 눈꺼풀 (eyelid) | | dict | 1 |
| Complication | anaplastic large cell lymphoma | 보형물연관 역형성대세포림프종 (implant-associated anaplastic large cell lymphoma) | | dict | 1 |
| Drug | [INTRODUCTION] | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Disease | ASPS → American Society of Plastic Surgeons' | | | scispacy | 1 |
| Disease | skin cancer | | C0007114 (Malignant neoplasm of skin) | scispacy | 1 |
| Other | patients | | | scispacy | 1 |
| Other | Llama | | | scispacy | 1 |
| Other | ASPS- | | | scispacy | 1 |
| Other | Copilot | | | scispacy | 1 |
| Other | patient | | | scispacy | 1 |
Related papers
- The impact of three-dimensional simulation and virtual reality technologies on surgical decision-making and postoperative satisfaction in aesthetic surgery: a preliminary study.
- Cutaneous fistula of the breast: A complication of cosmetic autologous fat transfer.
- Epidermal inclusion cyst after breast reduction mammoplasty.
- Penetrating globe injury following periocular hyaluronic acid filler injection: A case report.
- Implications of Dermatologic Disorders in Facial Cosmetic Surgery: A Systematic Review.