Evaluating ChatGPT-4o's Quality and Readability in Preoperative Facial Plastic Surgery Counseling: A Vignette-Based Analysis.
Abstract
[BACKGROUND] Generative artificial intelligence (AI) models such as ChatGPT have demonstrated potential in medical education and patient communication. However, their utility for delivering personalized preoperative information in plastic surgery remains unclear.
[METHODS] This study evaluated ChatGPT-4o's ability to generate preoperative counseling for the 5 most common aesthetic procedures, using vignette-style prompts featuring a standardized patient profile. Responses were assessed for quality using the DISCERN instrument, specificity via a Likert scale, and readability through the Flesch-Kincaid Grade Level score.
[RESULTS] ChatGPT's responses achieved moderate overall quality. DISCERN total scores ranged from 40 to 44.7 out of 75 across procedures. Reliability was high for clarity and relevance but low for source transparency and encouragement of shared decision-making. Specificity scores were low to moderate, with the highest specificity observed for abdominoplasty-related responses (mean Likert score 2.6). Readability remained within the recommended range, with an average Flesch-Kincaid Grade Level of 8.4, suitable for most patient populations.
[CONCLUSION] ChatGPT-4o can generate preoperative educational materials at an appropriate reading level with moderate quality and specificity. However, its lack of source disclosure, minimal emphasis on alternative treatments, and insufficient promotion of physician consultation highlight its limitations. ChatGPT may serve as an adjunct to physician counseling but is not a substitute for personalized medical consultation by a plastic surgeon.
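The readability metric named in the methods, the Flesch-Kincaid Grade Level, is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The following is a minimal sketch of that formula, not the tooling used in the study: the function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic assumed here for illustration, so its scores can differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, an assumption).

    A trailing silent 'e' is dropped unless the word ends in 'le' or 'ee'.
    Every word is counted as having at least one syllable.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Return the Flesch-Kincaid Grade Level of an English text."""
    # Sentences are split on runs of '.', '!' or '?'; empty pieces are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
```

Applied to each generated response, a function like this yields one grade-level score per response, which can then be averaged across procedures in the way the results report a mean score.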
Extracted medical entities (NER)
| Type | English term | Korean / gloss | UMLS CUI | Source | Occurrences |
|---|---|---|---|---|---|
| Procedure | abdominoplasty | 복부성형술 | | dict | 1 |
| Drug | [BACKGROUND] | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Other | ChatGPT-4o | | | scispacy | 1 |
| Other | patient | | | scispacy | 1 |
| Other | ChatGPT | | | scispacy | 1 |
Related papers
- Case report of a rare soft tissue tuberculosis in a patient undergoing lipoabdominoplasty.
- What is the potential role of the nonopioid suzetrigine in pain management?
- Ex Vivo and In Vivo Histological Evaluation of a 3-μm Wavelength, 40-μm Spot Size Fractional Laser System for Dermatology.
- Correspondence on "Lymphatic pathway remodeling in the supraumbilical region after abdominoplasty: A prospective cohort study".
- Sculpting Success-The TULUANHA: Modified TULUA Lipo-Abdominoplasty in Post-Bariatric Body Contouring.