Prompt engineering to increase GPT-3.5's performance on the Plastic Surgery In-Service Exams.
Abstract
This study assesses ChatGPT's (GPT-3.5) performance on the 2021 ASPS Plastic Surgery In-Service Examination using prompt modifications and Retrieval Augmented Generation (RAG). ChatGPT was instructed to act as a "resident," "attending," or "medical student," and RAG supplied context retrieved from a curated vector database. Neither intervention produced a significant improvement: the "resident" prompt yielded the highest accuracy at 54%, and RAG failed to enhance performance, with accuracy remaining at 54.3%. Although ChatGPT reasoned appropriately when it answered correctly, its overall performance fell in the 10th percentile, indicating that fine-tuning and more sophisticated approaches are needed to improve AI's utility in complex medical tasks.
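The abstract describes two interventions: role-based prompt modification and RAG over a curated vector database. The sketch below illustrates how such an experiment could be wired up, assuming the OpenAI Python SDK (>=1.0) with numpy; the role wording, model names (`gpt-3.5-turbo`, `text-embedding-3-small`), and the cosine-similarity retrieval scheme are illustrative assumptions, not the study's actual pipeline.

```python
# Illustrative sketch of (1) role-based prompting and (2) RAG context injection.
# Assumptions: OpenAI SDK >= 1.0, OPENAI_API_KEY in the environment; prompt
# wording, model choices, and retrieval are hypothetical, not the study's own.
import numpy as np
from openai import OpenAI

client = OpenAI()

ROLE_PROMPTS = {
    "resident": "You are a plastic surgery resident taking an in-service exam.",
    "attending": "You are an attending plastic surgeon taking an in-service exam.",
    "medical student": "You are a medical student taking an in-service exam.",
}

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of passages (or a question) for similarity search."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(question: str, passages: list[str], doc_vecs: np.ndarray, k: int = 3) -> list[str]:
    """Return the k passages most similar to the question by cosine similarity."""
    q = embed([question])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [passages[i] for i in np.argsort(sims)[::-1][:k]]

def answer(question: str, role: str, context: list[str] | None = None) -> str:
    """Pose an exam question under a given role, optionally with RAG context."""
    user = question
    if context:
        user = "Context:\n" + "\n".join(context) + "\n\nQuestion:\n" + question
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": ROLE_PROMPTS[role]},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content
```

In a setup along these lines, each multiple-choice item would be posed once per role, with and without retrieved context, and the selected answers scored against the exam key to produce the accuracy comparison reported above.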
Extracted Medical Entities (NER)
| Type | English Term | Gloss | UMLS CUI | Source | Mentions |
|---|---|---|---|---|---|
| Drug | RAG → Retrieval Augmented Generation | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Disease | ASPS | Asp snake | C0206293 | scispacy | 1 |
| Other | GPT3.5 | | | scispacy | 1 |
| Other | RAG → Retrieval Augmented Generation | | | scispacy | 1 |
| Other | ChatGPT | | | scispacy | 1 |
MeSH Terms
Humans; Surgery, Plastic; Educational Measurement; Clinical Competence; Internship and Residency