Prompt engineering to increase GPT-3.5's performance on the Plastic Surgery In-Service Exams.
Abstract
This study assesses ChatGPT's (GPT-3.5) performance on the 2021 ASPS Plastic Surgery In-Service Examination using prompt modifications and Retrieval Augmented Generation (RAG). ChatGPT was instructed to act as a "resident," "attending," or "medical student," and RAG supplied context retrieved from a curated vector database. Neither intervention produced a significant improvement: the "resident" prompt yielded the highest accuracy at 54%, and RAG failed to enhance performance, with accuracy remaining at 54.3%. Although ChatGPT reasoned appropriately when it answered correctly, its overall performance fell in the 10th percentile, indicating that fine-tuning and more sophisticated approaches are needed to improve AI's utility in complex medical tasks.
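The abstract describes two interventions: role-based prompt modification and RAG over a curated vector database. The sketch below illustrates how such an experiment could be wired up, assuming the OpenAI Python SDK (>=1.0) with numpy; the role wording, model names (`gpt-3.5-turbo`, `text-embedding-3-small`), and the cosine-similarity retrieval scheme are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch of (1) role-based prompting and (2) RAG context injection.
# Assumptions: OpenAI SDK >= 1.0, OPENAI_API_KEY in the environment; prompt
# wording, model choices, and retrieval are hypothetical, not the study's own.
import numpy as np
from openai import OpenAI

client = OpenAI()

ROLE_PROMPTS = {
    "resident": "You are a plastic surgery resident taking an in-service exam.",
    "attending": "You are an attending plastic surgeon taking an in-service exam.",
    "medical student": "You are a medical student taking an in-service exam.",
}

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of passages (or a question) for similarity search."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(question: str, passages: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k passages most similar to the question by cosine similarity."""
    q = embed([question])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [passages[i] for i in np.argsort(sims)[::-1][:k]]

def answer(question: str, role: str, context: list[str] | None = None) -> str:
    """Pose an exam question under a given role, optionally with RAG context."""
    user = question
    if context:
        user = "Context:\n" + "\n".join(context) + "\n\nQuestion:\n" + question
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": ROLE_PROMPTS[role]},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content
```

In a setup along these lines, each multiple-choice item would be posed once per role, with and without retrieved context, and the selected answers scored against the exam key to produce the accuracy comparison reported above.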
Extracted Medical Entities (NER)
| Type | English Term | Gloss | UMLS CUI | Source | Mentions |
|---|---|---|---|---|---|
| Drug | RAG → Retrieval Augmented Generation | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Disease | ASPS | Asp snake | C0206293 | scispacy | 1 |
| Other | GPT3.5 | | | scispacy | 1 |
| Other | RAG → Retrieval Augmented Generation | | | scispacy | 1 |
| Other | ChatGPT | | | scispacy | 1 |
MeSH Terms
Humans; Surgery, Plastic; Educational Measurement; Clinical Competence; Internship and Residency