Generative AI tools are already helping doctors transcribe visits and summarize patient records. The technology behind ChatGPT, trained on vast amounts of data from the internet, made headlines when it correctly answered more than 80% of board exam questions. In July, a team at Beth Israel saw promising results when using GPT-4 during a diagnosis workshop for medical residents.
But the tool is by no means ready for prime time. When Stanford researchers posed questions about real-life medical scenarios, GPT frequently disagreed with human clinicians or offered irrelevant information. The models are also prone to “hallucinating,” or confidently making up information that sounds plausible, a tendency that could cause immeasurable harm if let loose on patients. AI leaders and policymakers alike have called for more regulation.
STAT decided to put ChatGPT to the test at its annual summit on Wednesday, pitting the tool against Ann Woolley, an infectious disease specialist at Brigham and Women’s Hospital. Marc Succi, the associate innovation chair at Mass General Brigham, presented two patient scenarios to the tool while Woolley walked through her own diagnoses.