ChatGPT-5 Matches Surgeon-Level Assessment of Facelift Candidacy: A Pilot Proof-of-Concept Study.
Abstract
[BACKGROUND] The use of multimodal artificial intelligence (AI) in plastic surgery is steadily increasing. Whether a general-purpose multimodal AI tool can, from photographs alone, assess facial aging and facelift candidacy at a level comparable to board-certified specialist plastic surgeons remains unknown.
[OBJECTIVES] To determine whether ChatGPT-5 (OpenAI, San Francisco, CA, USA) can identify facial aging features, stratify their severity, and judge facelift candidacy from photographs alone, compared with board-certified plastic surgeons.
[METHODS] Two-center observational pilot study. Twenty-two volunteers (mean age, 42.0 ± 16.8 years; median, 34 years; range, 24-80 years) provided standardized four-view facial composite photographs. Five board-certified plastic surgeons independently completed an eight-item questionnaire for each case. ChatGPT-5 assessed the same images using the identically worded questionnaire. Assessments were image-only and blinded (no demographic data or medical history). Surgeon consensus was defined by plurality vote. Primary outcomes were percent agreement and Cohen's κ. For ordinal items, weighted κ, Spearman's ρ, and mean absolute error (MAE) were also reported. McNemar's test was used to assess discordance on binary items.
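The binary-item statistics named in the Methods (plurality consensus, percent agreement, Cohen's κ, McNemar's test) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code. The function names, the tie rule in `plurality`, and the use of the exact (binomial) form of McNemar's test are assumptions. The 22-case labels below are invented: they have exactly one disagreement, which matches the reported 21/22, but they are not the study's data.

```python
from collections import Counter
from math import comb


def plurality(votes):
    """Most frequent label among the surgeons' votes; None on a tie.

    Tie handling is an assumption: the abstract does not say how ties were
    broken. Five raters cannot tie on a binary item, but can on ordinal ones.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def percent_agreement(a, b):
    """Share of cases (in %) where both raters gave the same label."""
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    if p_e == 1:
        return float("nan")  # undefined: both raters used one single label
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact_p(a, b, positive=True):
    """Two-sided exact McNemar test using only the discordant pairs."""
    n01 = sum(x == positive and y != positive for x, y in zip(a, b))
    n10 = sum(x != positive and y == positive for x, y in zip(a, b))
    n = n01 + n10
    if n == 0:
        return 1.0
    # Binomial(n, 0.5) lower tail up to the smaller discordant count.
    tail = sum(comb(n, i) for i in range(min(n01, n10) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical 22-case split with one disagreement (case index 10).
    consensus = [True] * 11 + [False] * 11
    model = [True] * 10 + [False] * 12
    print(round(percent_agreement(model, consensus), 1))  # 95.5
    print(round(cohens_kappa(model, consensus), 2))       # 0.91
    print(mcnemar_exact_p(model, consensus))              # 1.0
```

With this made-up, evenly split table, chance agreement is 0.5 and κ works out to 0.91. A single discordant pair always yields an exact McNemar P of 1.00, because one disagreement cannot indicate a systematic direction.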
[RESULTS] For facelift candidacy, agreement was 95.5% (21/22; Cohen's κ = 0.91; McNemar P = 1.00). For binary aging features, agreement ranged from 81.8% to 90.9% (κ ≈ 0.61 to 0.81). For ordinal severity grading (lower face and midface), exact agreement was 77.3%, and all disagreements were between adjacent grades; weighted κ ranged from 0.74 to 0.86, and Spearman's ρ was 0.84 (P < .001). Inter-surgeon agreement on ordinal items was fair to moderate. For the adjunct-procedure recommendation, Top-1 accuracy was 70.6% (12/17; κ = 0.58), and Top-2 agreement was 77.3% (17/22).
[CONCLUSIONS] In a blinded, standardized-photograph setting, ChatGPT-5 matched surgeons in binary facelift candidacy assessment and closely tracked their severity grading, differing by at most one level. These findings suggest a possible role for the model as a decision-support tool (e.g., triage, patient education), while surgeons retain responsibility for physical examination and individualized planning. Larger, multicenter studies with more diverse image datasets are needed to confirm generalizability and to define standards for deployment.
[LEVEL OF EVIDENCE IV] This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.