ChatGPT, the artificial intelligence chatbot from OpenAI, could one day rival Google as an online health resource, some say. But how reliable are its responses right now?
Researchers from the University of Maryland School of Medicine (UMSOM) were eager to find out.
In February, they created a list of 25 questions related to breast cancer screening guidelines — then asked ChatGPT to answer each of the questions three times.
The researchers found that 22 of the chatbot’s 25 responses were accurate. However, two of the questions produced significantly different answers each of the three times they were asked.
Also, ChatGPT gave outdated information in one of its responses, according to a press release announcing the findings.
Overall, the researchers said that ChatGPT answered questions correctly about 88% of the time.
The findings of the study were published this month in the journal Radiology. Researchers from Massachusetts General Hospital and the Johns Hopkins University School of Medicine also participated.
“ChatGPT has tremendous potential to provide medical information, as we showed in our study,” study co-author Paul Yi, M.D., assistant professor of diagnostic radiology and nuclear medicine at UMSOM, told Fox News Digital in an email.
“However, it is not ready for the real world,” he also said. “Although it often provides correct information, the wrong information it does present could have negative consequences.”
The questions focused on breast cancer symptoms, individual risk factors and recommendations for mammogram screenings.
Although the responses had a high accuracy rate, the researchers pointed out that they were not as in-depth as what a Google search might provide.
“ChatGPT provided only one set of recommendations on breast cancer screening, issued from the American Cancer Society, but did not mention differing recommendations put out by the Centers for Disease Control and Prevention (CDC) or the U.S. Preventive Services Task Force (USPSTF),” said study lead author Hana Haver, M.D., a radiology resident at University of Maryland Medical Center, in the press release.
The single “inappropriate” response was given to the question, “Do I need to plan my mammogram around my COVID vaccination?”
ChatGPT responded that women should wait four to six weeks after the vaccine to schedule a mammogram — but that guidance changed in February 2022. The chatbot was basing its responses on outdated information.
The chatbot also gave inconsistent responses to the questions “How can I prevent breast cancer?” and “Where can I get screened for breast cancer?”
“It can provide wrong information that can sound very convincing — but there is no mechanism currently available to indicate if it is unsure about its answers,” Yi told Fox News Digital.
“This is important to solve before these chatbots can be used safely in real-world medical education.”
Why does ChatGPT give different answers to the same question?
Those who ask ChatGPT the same question multiple times will likely receive different responses. Dr. Harvey Castro, a Dallas, Texas-based board-certified emergency medicine physician and national speaker on artificial intelligence in health care, said there are a few reasons for this.
(Castro was not involved in the UMSOM study.)
“ChatGPT is always learning new things from the data it gets,” he explained to Fox News Digital. “Each generation of this software will get better because of the data it can access. If a human corrects the data, ChatGPT will update its reply based on others’ responses.”
He went on, “So if you ask the same question tomorrow, it might have learned further information [by then] that could change its answer. This makes the program better at giving helpful and up-to-date responses.”
The chatbot also has a wealth of knowledge at its disposal, so it can “think” of many different ways to answer a question, Castro explained.
Additionally, ChatGPT varies its word choice for any given response.
“ChatGPT works by thinking about which words should come next in a sentence,” Castro said. “It looks at the chances of different words fitting well. Because of this, there is always a bit of randomness in its answers.”
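The idea Castro describes can be illustrated with a toy sketch of "temperature" sampling, the standard way language models pick a next word from a probability distribution. This is a simplified illustration, not OpenAI's actual implementation; the candidate words and their scores are made up for the example.

```python
import math
import random

def sample_next_word(logits, temperature=1.0, rng=None):
    """Pick a next word by sampling from a softmax distribution over scores.

    Low temperature -> nearly always the most likely word;
    high temperature -> more randomness in the choice. (Toy illustration.)
    """
    rng = rng or random.Random()
    # Scale scores by temperature, then convert to probabilities (softmax).
    scaled = {w: s / temperature for w, s in logits.items()}
    m = max(scaled.values())
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    total = sum(exps.values())
    probs = {w: e / total for w, e in exps.items()}
    # Draw one word according to those probabilities.
    r = rng.random()
    cumulative = 0.0
    for word, p in probs.items():
        cumulative += p
        if r < cumulative:
            return word
    return word  # fallback for floating-point rounding at the boundary

# Made-up scores for candidate next words after "Breast cancer screening is ..."
logits = {"recommended": 2.0, "important": 1.5, "optional": 0.2}

# Very low temperature: the same word is chosen essentially every time.
picks_cold = {sample_next_word(logits, temperature=0.01, rng=random.Random(i))
              for i in range(50)}
# High temperature: the wording varies from run to run.
picks_hot = {sample_next_word(logits, temperature=5.0, rng=random.Random(i))
             for i in range(50)}
```

Because each word is drawn probabilistically, two identical prompts can take different paths through the sentence, which is one reason the UMSOM researchers saw inconsistent answers.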
ChatGPT also remembers conversations — so if someone asks the same question a few times in one talk, the chatbot might change its answer based on what was said earlier, noted Castro.
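Castro's point about conversation memory can also be sketched in a few lines. Chat models are stateless between requests; the client re-sends the accumulated message history each turn, so earlier exchanges shape later answers. The `fake_model` below is a hypothetical stand-in, not a real API call.

```python
# Illustrative sketch: the full conversation history is included with every
# request, so a repeated question can get a different, context-aware answer.
history = []

def ask(question, model):
    history.append({"role": "user", "content": question})
    # The model sees the ENTIRE history, not just the newest question.
    answer = model(history)
    history.append({"role": "assistant", "content": answer})
    return answer

def fake_model(messages):
    # Toy stand-in for a chat model: it notices repeated questions.
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    q = user_turns[-1]
    if user_turns.count(q) > 1:
        return f"As I mentioned earlier regarding '{q}' ..."
    return f"Here is my first answer about '{q}'."

first = ask("How can I prevent breast cancer?", fake_model)
second = ask("How can I prevent breast cancer?", fake_model)
```

Asking the same question twice in one session yields two different replies, because the second request carries the first exchange along with it.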
As AI shows promise, experts urge caution
While ChatGPT can be a helpful resource, the experts agree that the responses should be vetted by the appropriate doctor.
Sanjeev Agrawal, president and chief operating officer of California-based LeanTaaS, which develops AI solutions for hospitals across the country, was impressed by the results of the study — although he noted that 88% is not nearly as high a score as patients would like to see when they’re being screened for cancer.
“While I don’t see this as replacing the last mile of needing a qualified, trained doctor just yet, I can very much see the value to both the patient and the doctor in getting an AI-assisted synthesis of their screening test as a starting point,” he told Fox News Digital.
Added Agrawal, “For less sophisticated and more routine advice and screening, this could enable patients to get reliable and accurate advice sooner and take some of the burden off the health care system.”