Comparing the Efficacy of Large Language Models ChatGPT, BARD, and Bing AI in Providing Information on Rhinoplasty: An Observational Study.
Abstract
[BACKGROUND] Large language models (LLMs) are emerging artificial intelligence (AI) technologies that are refining research and healthcare. However, their impact on presurgical planning and education remains underexplored.
[OBJECTIVES] This study aims to assess how well 3 prominent LLMs, BARD (Google, Mountain View, CA), Bing AI (Microsoft, Redmond, WA), and ChatGPT-3.5 (OpenAI, San Francisco, CA), provide safe medical information on rhinoplasty.
[METHODS] Six questions regarding rhinoplasty were posed to ChatGPT, BARD, and Bing AI. A panel of Specialist Plastic and Reconstructive Surgeons with extensive experience in rhinoplasty rated the responses on a Likert scale. Readability was measured with the Flesch Reading Ease Score, the Flesch-Kincaid Grade Level, and the Coleman-Liau Index. The modified DISCERN score was used as the criterion for assessing suitability and reliability. A t test was performed to compare the LLMs, and a two-sided P value < .05 was considered statistically significant.
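As a minimal sketch, the three readability indices named in the Methods can be computed from word, sentence, syllable, and letter counts using their standard published coefficients. The abstract does not state which software the authors used; the regex tokenizer, the vowel-group syllable counter, and the function names below are illustrative assumptions, not the study's method.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate the syllable count of one English word.

    Heuristic (an assumption, not the study's method): count runs of vowels,
    then drop one for a trailing silent 'e' unless the word ends in '-le'.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int, int]:
    """Return (sentences, words, syllables, letters) for a block of English text."""
    # Each run of terminal punctuation counts as one sentence boundary; at least one sentence.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    # Words are alphabetic runs, optionally with one internal apostrophe ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(ch.isalpha() for w in words for ch in w)
    return sentences, len(words), syllables, letters


def readability(text: str) -> dict[str, float]:
    """Compute Flesch Reading Ease, Flesch-Kincaid Grade Level, and Coleman-Liau Index."""
    sentences, words, syllables, letters = text_stats(text)
    if words == 0:
        raise ValueError("text contains no words")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    # Coleman-Liau uses letters and sentences per 100 words instead of syllables.
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return {
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "coleman_liau_index": 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8,
    }


if __name__ == "__main__":
    sample = "Rhinoplasty reshapes the nose. Swelling usually settles over several months."
    for name, score in readability(sample).items():
        print(f"{name}: {score:.2f}")
```

Higher Flesch Reading Ease values indicate easier text, whereas the Flesch-Kincaid Grade Level and Coleman-Liau Index approximate a U.S. school grade, so lower values are easier. Because syllable-counting heuristics differ between tools, scores from this sketch will not exactly match those of dedicated readability calculators.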
[RESULTS] In terms of readability, BARD and ChatGPT scored significantly (P < .05) better than Bing AI, with a Flesch Reading Ease Score of 47.47 (±15.32) and 37.68 (±12.96), a Flesch-Kincaid Grade Level of 9.7 (±3.12) and 10.15 (±1.84), and a Coleman-Liau Index of 10.83 (±2.14) and 12.17 (±1.17), respectively. In terms of suitability, BARD (46.3 ± 2.8) achieved a significantly higher DISCERN score than ChatGPT and Bing AI. ChatGPT and BARD received similar Likert scores, both higher than those of Bing AI.
[CONCLUSIONS] BARD delivered the most succinct and comprehensible information, followed by ChatGPT and then Bing AI. Although these models show potential, limitations in the depth and specificity of their responses remain. Future research should therefore aim to improve LLM performance by integrating specialized databases and expert knowledge and by refining the underlying algorithms.
Extracted medical entities (NER)
| Type | English term | Korean term / gloss | UMLS CUI | Source | Occurrences |
|---|---|---|---|---|---|
| Procedure | rhinoplasty | 코성형술 | | dict | 4 |
| Drug | [BACKGROUND] Large | | | scispacy | 1 |
| Drug | [OBJECTIVES] | | | scispacy | 1 |
| Drug | ChatGPT | | | scispacy | 1 |
| Drug | [RESULTS] | | | scispacy | 1 |
| Drug | [CONCLUSIONS] | | | scispacy | 1 |
| Disease | Language | | | scispacy | 1 |
| Disease | LLM | | | scispacy | 1 |
| Other | LLMs-Google | | | scispacy | 1 |
| Other | ChatGPT-3.5 | | | scispacy | 1 |
Related papers
- The impact of three-dimensional simulation and virtual reality technologies on surgical decision-making and postoperative satisfaction in aesthetic surgery: a preliminary study.
- Aesthetically ideal noses created using a single artificial intelligence model: Validating literature and exploring ethnic differences.
- Septocolumellar strut technique: Tip stability and aesthetic outcomes in rhinoplasty.
- Implications of Dermatologic Disorders in Facial Cosmetic Surgery: A Systematic Review.
- Factors on Quality of Life Improvement in Septorhinoplasty: Prospective Evaluation Using the Functional Rhinoplasty Outcome Inventory 17 and Its Minimally Important Difference.