Evaluating the Insights of ChatGPT, Gemini and Expert Surgeons in Revision Rhinoplasty Consultation.
Abstract
[OBJECTIVES] This study aims to evaluate and compare the responses of two large language model (LLM) AI chatbots, ChatGPT and Gemini, with those of expert surgeons in revision rhinoplasty consultations. Given the emotional complexity and relatively low satisfaction rates of revision cases, it is essential to assess how effectively AI provides empathetic and accurate information.
[MATERIALS AND METHODS] A set of fifteen hypothetical questions reflecting patient concerns was presented to ChatGPT, Gemini, and two expert surgeons. Four academic otolaryngologists rated each response for empathy, precision, perfectness, and communication skills on a 5-point Likert scale. The ratings were analyzed using one-way ANOVA and Bonferroni tests to determine statistical significance.
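The statistical comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: it assumes that the ratings given to one respondent (ChatGPT, Gemini, or a surgeon) for one criterion form one group, computes the one-way ANOVA F statistic by hand, and applies a Bonferroni adjustment to raw pairwise p-values. The function names and the example scores are hypothetical and are not study data. A p-value for the F statistic could be obtained with a statistics library, for example `scipy.stats.f.sf(F, df_between, df_within)`; statistical packages' Bonferroni post hoc procedures may also use the pooled ANOVA error term, which this sketch does not reproduce.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: spread of each group mean around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each score around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical 5-point Likert scores for a single criterion (illustrative only)
    scores = {
        "ChatGPT": [5, 5, 4, 5],
        "Gemini": [4, 5, 4, 4],
        "Surgeon 1": [4, 3, 4, 4],
        "Surgeon 2": [3, 3, 4, 3],
    }
    f_stat, df1, df2 = one_way_anova_f(*scores.values())
    print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

With four respondents there are six pairwise comparisons, so the Bonferroni adjustment multiplies each pairwise p-value by six.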
[RESULTS] ChatGPT achieved the highest mean scores across all categories and significantly outperformed both expert surgeons in empathy, precision, perfectness, and communication skills (p < 0.01). Gemini also outperformed the expert surgeons in these categories. Notably, ChatGPT outperformed Gemini in perfectness, while expert surgeon 1 demonstrated superior precision. Ratings were consistent across evaluators for precision, perfectness, and communication skills but differed significantly for empathy (p < 0.01).
[CONCLUSION] ChatGPT and Gemini performed remarkably well in revision rhinoplasty consultation. However, LLM chatbots have known weaknesses, so their role in facial plastic surgery and the healthcare system should remain under control.
[LEVEL OF EVIDENCE IV] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .