A recent research paper titled “Evaluating the usefulness of ChatGPT across the clinical workflow: A development and usability study” published in the Journal of Medical Internet Research, evaluated the utility of ChatGPT in clinical decision making. ChatGPT, a large language model (LLM) based on OpenAI’s Generative Pre-trained Transformer-3.5, was tested using 36 clinical vignettes from the Merck Sharpe & Dohme (MSD) clinical guideline. The study aimed to evaluate its performance in providing clinical decision support including differential diagnoses, diagnostic tests, definitive diagnosis and management based on patient demographics and case specificity.
Findings show that ChatGPT achieved an overall accuracy of 71.7% across all vignettes, outperforming final diagnoses with an accuracy rate of 76.9%. However, it had lower performance in generating initial differential diagnoses with a 60.3% accuracy rate. Accuracy was consistent across patient age and gender, indicating broad applicability in a variety of clinical contexts. This performance was measured without ChatGPT’s internet access, relying solely on training data up to 2021.
The usefulness of ChatGPT was assessed by representing each component of the clinical workflow as a sequential prompt, allowing the model to integrate information from earlier parts of the conversation into later responses. This approach reflects the iterative nature of clinical medicine, where new information continually updates previous hypotheses.
The study is significant as it provides first-of-its-kind evidence of the potential use of AI tools such as ChatGPT across the clinical workflow. It highlights the model’s ability to adapt and respond to changing clinical scenarios, a crucial aspect of patient care. This research opens up new possibilities for supporting AI in healthcare, potentially improving decision-making, treatment and care in a variety of medical settings.
Image source: Shutterstock