Evaluation of Rhinoplasty Information from ChatGPT, Gemini, and Claude for Readability and Accuracy.
Abstract
[OBJECTIVE] Assessment of the readability, accuracy, quality, and completeness of ChatGPT (Open AI, San Francisco, CA), Gemini (Google, Mountain View, CA), and Claude (Anthropic, San Francisco, CA) responses to common questions about rhinoplasty.
[METHODS] Ten questions commonly encountered in the senior author's (SPM) rhinoplasty practice were presented to ChatGPT-4, Gemini and Claude. Seven Facial Plastic and Reconstructive Surgeons with experience in rhinoplasty were asked to evaluate these responses for accuracy, quality, completeness, relevance, and use of medical jargon on a Likert scale. The responses were also evaluated using several readability indices.
[RESULTS] ChatGPT achieved significantly higher evaluator scores for accuracy, and overall quality but scored significantly lower on completeness compared to Gemini and Claude. All three chatbot responses to the ten questions were rated as neutral to incomplete. All three chatbots were found to use medical jargon and scored at a college reading level for readability scores.
[CONCLUSIONS] Rhinoplasty surgeons should be aware that the medical information found on chatbot platforms is incomplete and still needs to be scrutinized for accuracy. However, the technology does have potential for use in healthcare education by training it on evidence-based recommendations and improving readability.
[LEVEL OF EVIDENCE V] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
[METHODS] Ten questions commonly encountered in the senior author's (SPM) rhinoplasty practice were presented to ChatGPT-4, Gemini and Claude. Seven Facial Plastic and Reconstructive Surgeons with experience in rhinoplasty were asked to evaluate these responses for accuracy, quality, completeness, relevance, and use of medical jargon on a Likert scale. The responses were also evaluated using several readability indices.
[RESULTS] ChatGPT achieved significantly higher evaluator scores for accuracy, and overall quality but scored significantly lower on completeness compared to Gemini and Claude. All three chatbot responses to the ten questions were rated as neutral to incomplete. All three chatbots were found to use medical jargon and scored at a college reading level for readability scores.
[CONCLUSIONS] Rhinoplasty surgeons should be aware that the medical information found on chatbot platforms is incomplete and still needs to be scrutinized for accuracy. However, the technology does have potential for use in healthcare education by training it on evidence-based recommendations and improving readability.
[LEVEL OF EVIDENCE V] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266 .
추출된 의학 개체 (NER)
| 유형 | 영어 표현 | 한국어 / 풀이 | UMLS CUI | 출처 | 등장 |
|---|---|---|---|---|---|
| 시술 | rhinoplasty
|
코성형술 | dict | 5 | |
| 약물 | [OBJECTIVE]
|
scispacy | 1 | ||
| 약물 | ChatGPT
|
scispacy | 1 | ||
| 약물 | Gemini
→ Google, Mountain View, CA
|
scispacy | 1 | ||
| 약물 | [CONCLUSIONS] Rhinoplasty
|
scispacy | 1 | ||
| 기타 | ChatGPT
|
scispacy | 1 | ||
| 기타 | Gemini
→ Google, Mountain View, CA
|
scispacy | 1 |
MeSH Terms
Rhinoplasty; Humans; Comprehension; Female; Male; Surveys and Questionnaires; Internet; Generative Artificial Intelligence
🔗 함께 등장하는 도메인
이 논문이 속한 카테고리와 같은 논문에서 자주 함께 다뤄지는 카테고리들
관련 논문
- The impact of three-dimensional simulation and virtual reality technologies on surgical decision-making and postoperative satisfaction in aesthetic surgery: a preliminary study.
- Aesthetically ideal noses created using a single artificial intelligence model: Validating literature and exploring ethnic differences.
- Septocolumellar strut technique: Tip stability and aesthetic outcomes in rhinoplasty.
- Implications of Dermatologic Disorders in Facial Cosmetic Surgery: A Systematic Review.
- Factors on Quality of Life Improvement in Septorhinoplasty: Prospective Evaluation Using the Functional Rhinoplasty Outcome Inventory 17 and Its Minimally Important Difference.