Expert Warns of Problems with Large Language Models in Dermatology

FROM PDA 2024

HUNTINGTON BEACH, CALIF. — When Roxana Daneshjou, MD, PhD, began reviewing responses to an exploratory survey she and her colleagues created on dermatologists’ use of large language models (LLMs) such as ChatGPT in clinical practice, she was both surprised and alarmed.

Of the 134 respondents who completed the survey, 87 (65%) reported using LLMs in a clinical setting. Of those 87 respondents, 17 (20%) used LLMs daily, 28 (32%) weekly, 5 (6%) monthly, and 37 (43%) rarely. That represents “pretty significant usage,” Dr. Daneshjou, assistant professor of biomedical data science and dermatology at Stanford University, Palo Alto, California, said at the annual meeting of the Pacific Dermatologic Association.

Most of the respondents reported using LLMs for patient care (79%), followed by administrative tasks (74%), medical records (43%), and education (18%), “which can be problematic,” she said. “These models are not appropriate to use for patient care.”

When asked about their thoughts on the accuracy of LLMs, 58% of respondents deemed them to be “somewhat accurate” and 7% viewed them as “extremely accurate.”

The overall survey responses raise concern because LLMs “are not trained for accuracy; they are trained initially as a next-word predictor on large bodies of text data,” Dr. Daneshjou said. “LLMs are already being implemented but have the potential to cause harm and bias, and I believe they will if we implement them the way things are rolling out right now. I don’t understand why we’re implementing something without any clinical trial or showing that it improves care before we throw untested technology into our healthcare system.”

Meanwhile, Epic and Microsoft are collaborating to bring AI technology to electronic health records, she said, and Epic is building more than 100 new AI features for physicians and patients. “I think it’s important for every physician and trainee to understand what is going on in the realm of AI,” said Dr. Daneshjou, who is an associate editor for the monthly journal NEJM AI. “Be involved in the conversation because we are the clinical experts, and a lot of people making decisions and building tools do not have the clinical expertise.”

To further illustrate her concerns, Dr. Daneshjou referenced a red teaming event she and her colleagues held with computer scientists, biomedical data scientists, engineers, and physicians across multiple specialties to identify issues related to safety, bias, factual errors, and security in GPT-3.5, GPT-4, and GPT-4 with internet access. The goal was to mimic clinical health scenarios, ask the LLM to respond, and have the team members review the accuracy of the responses.

The participants found that nearly 20% of LLM responses were inappropriate. For example, in one task, an LLM was asked to calculate a RegiSCAR score for Drug Reaction With Eosinophilia and Systemic Symptoms for a patient, but the response included an incorrect score for eosinophilia. “That’s why these tools can be so dangerous because you’re reading along and everything seems right, but there might be something so minor that can impact patient care and you might miss it,” Dr. Daneshjou said.

On a related note, she advised against dermatologists uploading images into GPT-4 Vision, an LLM that can analyze images and provide textual responses to questions about them, and she recommends not using GPT-4 Vision for any diagnostic support. At this time, “GPT-4 Vision overcalls malignancies, and the specificity and sensitivity are not very good,” she explained.

Dr. Daneshjou disclosed that she has served as an adviser to MDalgorithms and Revea and has received consulting fees from Pfizer, L’Oréal, Frazier Healthcare Partners, and DWA and research funding from UCB.

A version of this article first appeared on Medscape.com.
