Abstract
Generative artificial intelligence (AI) technologies have the potential to revolutionise healthcare delivery, but their patient safety risks require classification and monitoring. To address this need, we developed and evaluated a preliminary classification system for categorising generative AI patient safety errors. Our classification system is organised around two AI system stages (input and output), with specific error types defined for each stage. We applied the classification system to two generative AI applications to assess its effectiveness in categorising safety issues: patient-facing conversational large language models (LLMs) and an ambient digital scribe (ADS) system for clinical documentation. In the LLM analysis, we identified 45 errors across 27 patient medical queries, with omission being the most common (42% of errors). Of these errors, 50% were categorised as low clinical significance, 25% as moderate clinical significance and 25% as high clinical significance. Similarly, in the ADS simulation, we identified 66 errors across 11 patient visits, with omission again the most common (83% of errors). Of these errors, 55% were categorised as low clinical significance and 45% as moderate clinical significance. These findings demonstrate the classification system’s utility in categorising output errors from two different AI healthcare applications, providing a starting point for developing a robust process to better understand AI-enabled errors.
Keywords
- Patient safety
- Human factors
- Audit and feedback
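To make the structure described in the abstract concrete, the sketch below shows one possible way to represent coded errors and tally them by type and clinical significance. Only the input/output stages, the omission error type and the three-level significance scale come from the abstract; the class and function names (ErrorRecord, summarise) and the "fabrication" example type are illustrative assumptions, not the paper's actual coding scheme.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # The two AI system stages named in the abstract
    INPUT = "input"
    OUTPUT = "output"


class Significance(Enum):
    # Three-level clinical significance scale from the abstract
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


@dataclass
class ErrorRecord:
    """One coded error (illustrative structure, not the published taxonomy)."""
    stage: Stage
    error_type: str          # e.g. "omission"; other types are placeholders
    significance: Significance


def summarise(errors: list[ErrorRecord]) -> dict:
    """Tally error types and clinical significance, mirroring the abstract's summaries."""
    return {
        "n_errors": len(errors),
        "by_type": Counter(e.error_type for e in errors),
        "by_significance": Counter(e.significance.value for e in errors),
    }


# Example with two hypothetical records
records = [
    ErrorRecord(Stage.OUTPUT, "omission", Significance.LOW),
    ErrorRecord(Stage.OUTPUT, "fabrication", Significance.MODERATE),
]
print(summarise(records))
```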