Evaluating Large Language Models as Medical Consultation Tools for Double Eyelid Surgery: A Cross-Language Study in English and Chinese.

Aesthetic plastic surgery 2026 Vol.50(5) p. 1706-1716

Xin J, He X

Abstract

[BACKGROUND] Double eyelid surgery is a common cosmetic procedure that creates a crease in the upper eyelid. Because many prospective patients have an insufficient understanding of the procedure, consultation requests are numerous and place a heavy burden on plastic surgeons. The rise of large language models (LLMs) offers a potential solution to this issue.

[METHODS] Using an online questionnaire, this study collected sixteen questions of common concern to individuals seeking the surgery and assessed how well fifteen popular LLMs answered them with both English and Chinese inputs. Three expert eyelid plastic surgeons scored every response on dimensions including professionalism, patient friendliness, informativeness, practicality, and logical clarity. The scores were analyzed statistically using the Friedman test and the Nemenyi post-hoc test.
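The statistical step named above can be illustrated with a minimal sketch. It is not the authors' analysis code: the scores are made-up placeholder values for three hypothetical models on eight questions (the study used fifteen models and sixteen questions), the function names are invented for illustration, and tie correction of the Friedman statistic is omitted. The two critical values are standard tabulated constants for three groups at the 0.05 level; a statistics library (for example SciPy's `friedmanchisquare` for the omnibus test) would normally be used instead of hand-coded tables.

```python
from itertools import combinations
from math import sqrt

def rank_row(row):
    """Rank one question's scores (highest score = rank 1), averaging ties."""
    order = sorted(range(len(row)), key=lambda i: -row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman_nemenyi(scores, q_alpha):
    """Return mean ranks, Friedman chi-square, Nemenyi critical difference,
    and a dict flagging which model pairs differ by more than that distance."""
    n, k = len(scores), len(scores[0])  # n questions (blocks), k models
    ranks = [rank_row(row) for row in scores]
    mean_ranks = [sum(r[j] for r in ranks) / n for j in range(k)]
    # Friedman statistic without tie correction.
    chi2 = 12 * n / (k * (k + 1)) * sum(m * m for m in mean_ranks) - 3 * n * (k + 1)
    # Nemenyi critical difference between two mean ranks.
    cd = q_alpha * sqrt(k * (k + 1) / (6 * n))
    differs = {
        (a, b): abs(mean_ranks[a] - mean_ranks[b]) > cd
        for a, b in combinations(range(k), 2)
    }
    return mean_ranks, chi2, cd, differs

# Hypothetical averaged expert scores: rows = questions, columns = models 0..2.
scores = [
    [4.6, 4.1, 3.2],
    [4.8, 4.0, 3.5],
    [4.4, 4.2, 3.0],
    [4.9, 3.9, 3.3],
    [4.5, 4.3, 2.9],
    [4.7, 4.1, 3.4],
    [4.6, 3.8, 3.1],
    [4.8, 4.2, 3.6],
]

Q_ALPHA_K3 = 2.343    # Nemenyi critical value for k = 3 groups, alpha = 0.05
CHI2_CRIT_DF2 = 5.991  # chi-square critical value for df = k - 1 = 2, alpha = 0.05

mean_ranks, chi2, cd, differs = friedman_nemenyi(scores, Q_ALPHA_K3)
omnibus_significant = chi2 > CHI2_CRIT_DF2

print("mean ranks:", mean_ranks)
print("Friedman chi-square:", chi2, "significant:", omnibus_significant)
print("critical difference:", cd)
print("pairs that differ:", [pair for pair, flag in differs.items() if flag])
```

With these placeholder scores, only the best-ranked and worst-ranked models are separated by more than the critical difference, which shows why a post-hoc test is needed after a significant omnibus result: the Friedman test says that some models differ, and the Nemenyi comparison identifies which pairs.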

[RESULTS] With English input, ERNIE-Bot, ChatGPT-4o, and Gemini-2.0-Flash consistently ranked among the top three across most evaluation dimensions. In contrast, Claude-3.7-Sonnet, HuatuoGPT, ZoeGPT, CompliantGPT, and BastionGPT ranked lower across all dimensions, with performance significantly lagging behind the top performers. For Chinese input, DeepSeek-R1 maintained a leading position across all dimensions, forming the first tier alongside DeepSeek-V3, Gemini-2.0-Flash, and ERNIE-Bot. Meanwhile, Claude-3.5-Haiku, ZoeGPT, Llama3.3-70B-Instruct, CompliantGPT, HuatuoGPT, and BastionGPT ranked lower in multiple dimensions, with a significant gap relative to first-tier models.

[CONCLUSION] This study demonstrated LLMs' potential as medical consultation tools for double eyelid surgery, providing useful guidance for both English and Chinese users. Future research should focus on fine-tuning LLMs with more specialized medical data and exploring workflows for surgeon-LLM collaboration to validate their clinical utility.

[LEVEL OF EVIDENCE V] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.

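Extracted medical entities (NER)

Type | English term | Korean term (gloss) | UMLS CUI | Source | Occurrences
Anatomy | eyelid | 눈꺼풀 (eyelid) | - | dict | 4
Procedure | double eyelid | 안검성형술 (blepharoplasty) | - | dict | 3
Anatomy | upper eyelid | 눈꺼풀 (eyelid) | - | dict | 1
Anatomy | crease | - | - | scispacy | 1
Anatomy | ERNIE-Bot | - | - | scispacy | 1
Complication | eyelid plastic | - | - | scispacy | 1
Drug | [BACKGROUND] Double | - | - | scispacy | 1
Drug | BastionGPT | - | - | scispacy | 1
Disease | Language | - | - | scispacy | 1
Other | patient | - | - | scispacy | 1
Other | CompliantGPT | - | - | scispacy | 1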

MeSH Terms

Humans; Blepharoplasty; Language; Referral and Consultation; Surveys and Questionnaires; Female; Eyelids; Male; China; Adult; Middle Aged; Surgery, Plastic; Large Language Models; East Asian People
