Evaluation of Artificial Intelligence Chatbots for Facial Injection Planning: Comparative Performance and Safety Limitations.

Aesthetic plastic surgery 2025 Vol.49(21) p. 5866-5876

Radulesco T, Ebode D, Maniaci A, Gargula S, Saibene AM, Chiesa-Estomba C, Gengler I, Vaira L, Vishnumurthy P, Lechien JR, Michel J

Abstract

[BACKGROUND] To evaluate the performance of artificial intelligence (AI)-powered chatbots in generating treatment plans for facial aesthetic injections, focusing on their accuracy, safety, and clinical applicability.

[METHODS] A comparative observational study was conducted in an otolaryngology tertiary care department according to STROBE guidelines. Patients seeking facial injections were recruited from July to October 2024. Forty patients (85% female; mean age: 45.8 years) underwent photographic documentation and received AI-generated treatment plans for botulinum toxin and hyaluronic acid injections. Six AI chatbots and three generative vision models were evaluated on five criteria: product selection, injection strategy, facial analysis, alignment with patient preferences, and safety. Likert scale ratings, each ranging from -2 to +2, were analyzed using Friedman and Durbin-Conover pairwise tests to identify significant differences (p < 0.05). The sum of the five Likert ratings provided an overall score ranging from -10 to +10.
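
The scoring scheme described above can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the criterion keys, the `total_score` helper, and the example ratings are hypothetical and serve only to show how five ratings in the range -2 to +2 combine into a total between -10 and +10.

```python
from statistics import mean

# The five evaluation criteria named in the abstract (key names are assumed).
CRITERIA = (
    "product_selection",
    "injection_strategy",
    "facial_analysis",
    "patient_preferences",
    "safety",
)


def total_score(ratings):
    """Sum five Likert ratings (each -2..+2) into a total in -10..+10."""
    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if not -2 <= ratings[name] <= 2:
            raise ValueError(f"{name} rating {ratings[name]} is outside -2..+2")
    return sum(ratings[name] for name in CRITERIA)


# Hypothetical ratings for one chatbot on two patients (not study data).
example_ratings = [
    {"product_selection": 2, "injection_strategy": 2, "facial_analysis": 1,
     "patient_preferences": 2, "safety": 1},
    {"product_selection": 2, "injection_strategy": 1, "facial_analysis": 2,
     "patient_preferences": 2, "safety": 0},
]

totals = [total_score(r) for r in example_ratings]
print(totals, mean(totals))
```

Per-patient totals computed this way for each chatbot would then be the input to a repeated-measures comparison such as the Friedman test mentioned in the methods.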

[RESULTS] ChatGPTo1 and ChatGPT4o achieved higher scores than other chatbots across most evaluation criteria, with mean total scores of 7.87 ± 0.29 and 7.85 ± 0.44, respectively (p = 0.295). Both chatbots were statistically superior (p < 0.05) to Claude, CopilotPro, and Llama in product selection (ChatGPT4o = 1.92 ± 0.05), injection strategy precision (ChatGPTo1 = 1.67 ± 0.08), alignment with patient preferences (ChatGPTo1 = 1.95 ± 0.03) and safety (ChatGPTo1 = 1.30 ± 0.17). Claude provided relevant facial analysis (1.50 ± 0.16) without significant difference compared to ChatGPT models (all p > 0.05). Generative vision models failed to produce relevant visual annotations.

[CONCLUSION] Among the AI systems tested, ChatGPT-based chatbots demonstrated relatively superior performance in generating treatment plans for facial injections. However, safety limitations remain and preclude unsupervised clinical use.

[LEVEL OF EVIDENCE IV] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.

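Extracted medical entities (NER)

Type | English expression | Gloss | UMLS CUI | Source | Occurrences
Procedure | botulinum toxin | botulinum toxin injection | | dict | 1
Material | hyaluronic acid | hyaluronic acid | | dict | 1
Drug | [BACKGROUND] | | | scispacy | 1
Drug | [RESULTS] ChatGPTo1 | | | scispacy | 1
Drug | ChatGPT4o | | | scispacy | 1
Drug | ChatGPT | | | scispacy | 1
Other | STROBE | | | scispacy | 1
Other | Patients | | | scispacy | 1
Other | female | | | scispacy | 1
Other | patient | | | scispacy | 1
Other | Llama | | | scispacy | 1
Other | ChatGPT4o | | | scispacy | 1
Other | ChatGPTo1 | | | scispacy | 1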

MeSH Terms

Humans; Female; Middle Aged; Male; Artificial Intelligence; Hyaluronic Acid; Adult; Cosmetic Techniques; Face; Skin Aging; Dermal Fillers; Patient Care Planning; Generative Artificial Intelligence
