Development of a Preliminary Patient Safety Classification System for Generative AI
  1. Bat-Zion Hose1,2
  2. Jessica L Handley1
  3. Joshua Biro1
  4. Sahithi Reddy2
  5. Seth Krevat1,2
  6. Aaron Zachary Hettinger2,3
  7. Raj M Ratwani1,2

  1. National Center for Human Factors in Healthcare, MedStar Health Research Institute, Washington, District of Columbia, USA
  2. Georgetown University Medical Center, Washington, District of Columbia, USA
  3. Center for Biostatistics, Informatics and Data Science, MedStar Health Research Institute, Washington, District of Columbia, USA

Correspondence to Dr Bat-Zion Hose; bat-zion.hose@medstar.net

Abstract

Generative artificial intelligence (AI) technologies have the potential to revolutionise healthcare delivery, but their patient safety risks require classification and monitoring. To address this need, we developed and evaluated a preliminary classification system for categorising generative AI patient safety errors. Our classification system is organised around two AI system stages (input and output) with specific error types by stage. We applied our classification system to two generative AI applications to assess its effectiveness in categorising safety issues: patient-facing conversational large language models (LLMs) and an ambient digital scribe (ADS) system for clinical documentation. In the LLM analysis, we identified 45 errors across 27 patient medical queries, with omission being the most common (42% of errors). Of the identified errors, 50% were categorised as low clinical significance, 25% as moderate clinical significance and 25% as high clinical significance. Similarly, in the ADS simulation, we identified 66 errors across 11 patient visits, with omission being the most common (83% of errors). Of the identified errors, 55% were categorised as low clinical significance and 45% as moderate clinical significance. These findings demonstrate the classification system's utility in categorising output errors from two different AI healthcare applications, providing a starting point for developing a robust process to better understand AI-enabled errors.

  • Patient safety
  • Human factors
  • Audit and feedback
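
To make the structure of the classification system concrete, the Python sketch below encodes the two AI system stages, the three clinical significance levels and a simple tally of categorised errors, mirroring the percentage summaries reported in the abstract. The Stage and Significance values come directly from the abstract; the AIError record and the summarise helper are hypothetical illustrations for exposition, not the authors' implementation, and error type names other than "omission" are not specified here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The two AI system stages around which the classification is organised."""
    INPUT = "input"
    OUTPUT = "output"


class Significance(Enum):
    """Clinical significance levels reported for both use cases."""
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass
class AIError:
    """One identified error: its stage, its type and its clinical significance."""
    stage: Stage
    error_type: str  # "omission" appears in the abstract; other type names vary by stage
    significance: Significance


def summarise(errors: list[AIError]) -> dict:
    """Tally error types and clinical significance as percentages of all errors."""
    if not errors:
        return {"total": 0, "type_pct": {}, "significance_pct": {}}
    n = len(errors)
    type_counts = Counter(e.error_type for e in errors)
    sig_counts = Counter(e.significance.value for e in errors)
    return {
        "total": n,
        "type_pct": {t: round(100 * c / n) for t, c in type_counts.items()},
        "significance_pct": {s: round(100 * c / n) for s, c in sig_counts.items()},
    }
```

Applied to the 45 LLM errors described above, a summary of this form would report omission at 42% and the 50/25/25 split across low, moderate and high clinical significance.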


Footnotes

  • Contributors BZH provided expert input on the error classification development, analysed data for use case 1 (LLM study) and, as first author, is responsible for the manuscript. BZH is the guarantor. JLH provided expert input on the error classification development, analysed data for use cases 1 (LLM study) and 2 (ADS simulation study of errors) and reviewed/revised the manuscript. JB designed use case 2 (ADS simulation study of errors) and analysed the data. SR participated in data analysis for use cases 1 (LLM study) and 2 (ADS simulation study of errors). SK provided expert input on the error classification development, participated in data analysis for use cases 1 (LLM study) and 2 (ADS simulation study of errors) and reviewed the manuscript. AZH provided expert input on the error classification development. RMR provided expert input on the error classification development, designed use case 1 (LLM study) and reviewed the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.