Evaluation of Artificial Intelligence Chatbots for Facial Injection Planning: Comparative Performance and Safety Limitations.
Abstract
[BACKGROUND] This study evaluated the performance of artificial intelligence (AI)-powered chatbots in generating treatment plans for facial aesthetic injections, focusing on their accuracy, safety, and clinical applicability.
[METHODS] A comparative observational study was conducted in an otolaryngology tertiary care department according to STROBE guidelines. Patients seeking facial injections were recruited from July to October 2024. Forty patients (85% female; mean age: 45.8 years) underwent photographic documentation and received AI-generated treatment plans for botulinum toxin and hyaluronic acid injections. Six AI chatbots and three generative vision models were evaluated based on five criteria: product selection, injection strategy, facial analysis, alignment with patient preferences, and safety. Likert scale ratings, each ranging from -2 to +2, were analyzed using Friedman and Durbin-Conover pairwise tests to identify significant differences (p < 0.05). The sum of the five Likert scales provided an overall score ranging from -10 to +10.
[RESULTS] ChatGPTo1 and ChatGPT4o achieved higher scores than other chatbots across most evaluation criteria, with mean total scores of 7.87 ± 0.29 and 7.85 ± 0.44, respectively (p = 0.295). Both chatbots were statistically superior (p < 0.05) to Claude, CopilotPro, and Llama in product selection (ChatGPT4o = 1.92 ± 0.05), injection strategy precision (ChatGPTo1 = 1.67 ± 0.08), alignment with patient preferences (ChatGPTo1 = 1.95 ± 0.03) and safety (ChatGPTo1 = 1.30 ± 0.17). Claude provided relevant facial analysis (1.50 ± 0.16) without significant difference compared to ChatGPT models (all p > 0.05). Generative vision models failed to produce relevant visual annotations.
[CONCLUSION] Among the AI systems tested, ChatGPT-based chatbots demonstrated relatively superior performance in generating treatment plans for facial injections. However, safety limitations remain and preclude unsupervised clinical use.
[LEVEL OF EVIDENCE IV] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.
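The scoring and omnibus test described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the criterion keys, the function names, and the toy ratings are assumptions made for illustration. It sums five Likert ratings (-2 to +2) into the overall score (-10 to +10) and computes a tie-corrected Friedman chi-square statistic, treating each patient as a block and each chatbot as a treatment. The p-value and the Durbin-Conover pairwise comparisons are left out; a statistics package such as SciPy (`scipy.stats.friedmanchisquare`) would normally supply the former.

```python
from collections import Counter

# Hypothetical keys for the five criteria named in the Methods.
CRITERIA = (
    "product_selection",
    "injection_strategy",
    "facial_analysis",
    "patient_preferences",
    "safety",
)


def total_score(ratings):
    """Sum five Likert ratings (each -2..+2) into an overall score (-10..+10)."""
    for name in CRITERIA:
        if ratings[name] not in (-2, -1, 0, 1, 2):
            raise ValueError(f"{name} must be an integer from -2 to +2")
    return sum(ratings[name] for name in CRITERIA)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Tie-corrected Friedman chi-square for n blocks (rows) x k treatments."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in blocks:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        # Each group of t tied values within a block contributes t^3 - t.
        tie_term += sum(t**3 - t for t in Counter(row).values())
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all values are tied within every block")
    sum_sq = sum(r * r for r in rank_sums)
    chi_sq = 12.0 * sum_sq / (n * k * (k + 1)) - 3 * n * (k + 1)
    return chi_sq / correction


# Toy data, not study data: one plan's ratings and three patients x three chatbots.
example_plan = {
    "product_selection": 2,
    "injection_strategy": 2,
    "facial_analysis": 1,
    "patient_preferences": 2,
    "safety": 1,
}
toy_totals = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(total_score(example_plan), friedman_statistic(toy_totals))
```

With k chatbots, the statistic is compared against a chi-square distribution with k - 1 degrees of freedom; a significant result would then justify pairwise post hoc comparisons of the kind the study reports.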
Extracted medical entities (NER)
| Type | English term | Gloss | UMLS CUI | Source | Occurrences |
|---|---|---|---|---|---|
| Procedure | botulinum toxin | botulinum toxin injection | | dict | 1 |
| Material | hyaluronic acid | hyaluronic acid | | dict | 1 |
| Drug | [BACKGROUND] | | | scispacy | 1 |
| Drug | [RESULTS] ChatGPTo1 | | | scispacy | 1 |
| Drug | ChatGPT4o | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Other | STROBE | | | scispacy | 1 |
| Other | Patients | | | scispacy | 1 |
| Other | female | | | scispacy | 1 |
| Other | patient | | | scispacy | 1 |
| Other | Llama | | | scispacy | 1 |
| Other | ChatGPT4o | | | scispacy | 1 |
| Other | ChatGPTo1 | | | scispacy | 1 |
MeSH Terms
Humans; Female; Middle Aged; Male; Artificial Intelligence; Hyaluronic Acid; Adult; Cosmetic Techniques; Face; Skin Aging; Dermal Fillers; Patient Care Planning; Generative Artificial Intelligence