Move Over, Dr. Google. ChatGPT, M.D., Has Better Answers | AAO 2024

ChatGPT bested Google in a blinded comparison of their answers to common oculoplastic questions.

Google has long held the position of being the source for answers to people’s medical questions, earning the search engine the nickname “Dr. Google.”

But thanks to the advent of large language models (LLMs), there may be a new online go-to medical expert: ChatGPT, M.D.

When answers to frequently asked questions from ChatGPT and Gemini, Google's generative AI chatbot, were evaluated by six oculoplastic surgeons, 79% of their grades across 30 questions indicated that ChatGPT gave the better answer, according to research presented at the 2024 annual meeting of the American Academy of Ophthalmology, held in Chicago this weekend. ChatGPT, M.D., also edged out Dr. Google on accuracy in the judgment of the surgeons.

"Given that patients are increasingly turning to these LLMs to learn more about their health, we wanted to compare kind of the old school way that people look for health information online, which is Google, and this new school way, which is the AI chatbots, and see how accurate they were at responding," said Samuel Cohen, M.D., a first-year ophthalmology resident at University of California, Los Angeles,Stein Eye Institute.

The study was designed to assess both the readability of the answers each platform provided, using five validated readability indices, and the quality of the responses.
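The article does not list the five indices, but the Flesch-Kincaid Grade Level is among the most widely used and illustrates how such grade-level scores are computed. A minimal Python sketch, with a naive syllable counter standing in for a dictionary-based one:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minus a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

On a scale like this, ChatGPT's default grade 15.6 output corresponds to college-level prose, while Google's grade 10 output sits at a high-school level.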


Cohen walked the audience through a sample process for how the answers were produced by ChatGPT, which involved typing a query into the increasingly popular artificial intelligence-powered chatbot. Three separate sets of questions were asked across 10 common oculoplastic conditions: thyroid eye disease, orbital cellulitis, orbital tumors, proptosis, ptosis, entropion, blepharospasm, chalazion, nasolacrimal duct obstruction and epiphora. The responses for each from both platforms were blinded and then presented side by side to the panel.
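The researchers typed each query into the chat interface by hand; for anyone who wants to reproduce the comparison at scale, the same questions could be scripted against OpenAI's API. A sketch assuming the gpt-3.5-turbo model (the question wording below is illustrative, not the study's):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTIONS = [
    "What is thyroid eye disease?",     # illustrative phrasing; the
    "What causes orbital cellulitis?",  # study's exact wording was
    "How is ptosis treated?",           # not published here
]

def ask(question: str) -> str:
    """Send one patient-style question and return the chatbot's answer."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study used ChatGPT 3.5
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

for q in QUESTIONS:
    print(q, "->", ask(q)[:80], "...")
```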

Overall, for readability, ChatGPT's answers defaulted to a grade 15.6 reading level and Google's to a grade 10 level. Across the questions, each graded for favorability separately by each panelist, ChatGPT was ranked as the better answer 79% of the time, with Google favored 12% of the time and no difference reported 9% of the time.

ChatGPT also edged out Google on accuracy. The panel deemed it accurate 93% of the time compared with 78% of the time for Google.

"We think that these findings are important because we think that LLMs really do have the potential to revolutionize how patients learn about their health and also how physicians educate patients," said Cohen. "Patients have long now gone to Google but what we've shown here is there could be a superior alternative in terms of high-quality information preferred by experts in the field."

Cohen and his colleagues also tested ChatGPT’s ability to generate patient education materials. They supplied OpenAI’s household-name chatbot with a default answer written at a grade 15.7 level along with a prompt to rewrite it at a grade 7 reading level. Despite the prompt, the chatbot returned an answer at a grade 11.7 reading level.
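The exact rewrite prompt was not published; a hedged sketch of what such a grade-level request might look like against OpenAI's API (the prompt wording and helper name are assumptions, not the study's):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rewrite_at_grade(answer: str, grade: int = 7) -> str:
    """Ask the model to rewrite an answer at a target reading level."""
    prompt = (
        f"Rewrite the following patient education text at a grade {grade} "
        f"reading level, keeping it medically accurate:\n\n{answer}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# In the study, a grade 15.7 default answer came back at grade 11.7,
# well above the requested grade 7; the returned level can be checked
# with a readability function like the one sketched earlier.
```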

Both the default and the requested grade 7 reading level responses were judged by the group of surgeons. The default response was favored over the simplified version (40% vs. 25%), with the remaining grades indicating that either response would work. Both received high marks for accuracy.

"We've seen that physicians can really use AI to create these patient education materials that can be informative for patients," said Cohen. "Although we did not get to that 7th grade level, we did see improvements that could make these education materials more accessible for more patients."

Members of the audience said that, in their experience, asking ChatGPT for a grade 5 reading level response yielded one at a grade 8 level. Cohen noted that his study used version 3.5 of ChatGPT and that newer versions would likely produce better results.

"It seems like it's gotten a little bit better, but I think really specifying in the request the education level that you want back is going to become more important,” he said.
