ChatGPT bested Google in a blinded comparison of their answers to common oculoplastic questions.
Google has long held the position of being the source for answers to people’s medical questions, earning the search engine the nickname “Dr. Google.”
But thanks to the advent of large language models (LLMs), there may be a new online go-to medical expert: ChatGPT, M.D.
When answers to frequently asked questions from ChatGPT and Gemini, Google's generative AI chatbot, were evaluated by six oculoplastic surgeons, 79% of their grades on 30 questions indicated that ChatGPT gave the better answer, according to research presented at the 2024 annual meeting of the American Academy of Ophthalmology held in Chicago this weekend. ChatGPT, M.D., also edged out Dr. Google when it came to accuracy in the judgment of the surgeons.
"Given that patients are increasingly turning to these LLMs to learn more about their health, we wanted to compare kind of the old school way that people look for health information online, which is Google, and this new school way, which is the AI chatbots, and see how accurate they were at responding," said Samuel Cohen, M.D., a first-year ophthalmology resident at University of California, Los Angeles,Stein Eye Institute.
The study was designed to assess both the readability of the answers provided by each platform, using five validated indices, and the quality of the answers.
Cohen walked the audience through a sample process for how the answers were produced by ChatGPT, which involved typing a query into the increasingly popular artificial intelligence-powered chatbot. Three separate sets of questions were asked across 10 common oculoplastic conditions: thyroid eye disease, orbital cellulitis, orbital tumors, proptosis, ptosis, entropion, blepharospasm, chalazion, nasolacrimal duct obstruction and epiphora. The responses for each from both platforms were blinded and then presented side by side to the panel.
Overall, for readability, ChatGPT defaulted to a grade 15.6 reading level and Google to a grade 10 level. Across the questions, each graded separately for favorability by each panelist, ChatGPT was ranked as the better answer 79% of the time, Google was favored 12% of the time, and 9% of the grades indicated no difference between the two.
ChatGPT also edged out Google on accuracy. The panel deemed it accurate 93% of the time compared with 78% of the time for Google.
"We think that these findings are important because we think that LLMs really do have the potential to revolutionize how patients learn about their health and also how physicians educate patients," said Cohen. "Patients have long now gone to Google but what we've shown here is there could be a superior alternative in terms of high-quality information preferred by experts in the field."
Cohen and his colleagues also tested ChatGPT's ability to generate patient education materials. They supplied OpenAI's household-name chatbot with a default answer written at a grade 15.7 level along with a prompt to provide an answer at a grade 7 level. Despite the prompt, the chatbot returned an answer at a grade 11.7 reading level.
Both the default response and the requested grade 7 reading level response were judged by the group of surgeons. The default response was favored over the simplified version (40% vs. 25%) in their grading, with the remainder of the grades indicating that either response would work. Both received high marks for accuracy.
"We've seen that physicians can really use AI to create these patient education materials that can be informative for patients," said Cohen. "Although we did not get to that 7th grade level, we did see improvements that could make these education materials more accessible for more patients."
Members of the audience said that in their experience, asking ChatGPT for a grade 5 reading level response yielded one at a grade 8 level. Cohen noted that his study used version 3.5 of ChatGPT and that newer versions would likely produce better results.
"It seems like it's gotten a little bit better, but I think really specifying in the request the education level that you want back is going to become more important,” he said.