Fact-based news without bias awaits. Make 1440 your choice today.

Overwhelmed by biased news? Cut through the clutter and get straight facts with your daily 1440 digest. From politics to sports, join millions who start their day informed.

Hi!

Today is LLM’s day.

This week OpenEvidence showed the results of their system responding to questions from the United States Medical Licensing Examination (USMLE). The results? a score of 100% (Link).

The system answers the questions correctly and also teaches the reasoning behind each question, which is very useful for students.

Also, this week OpenEvidence launched “Visits” a Digital Assistant that transcribes medical visits, and gives you (in real-time) the latest guidelines, research, and clinical recommendations. It is currently free from US healthcare providers.

I use OpenEvidence almost every day. Have you tested it?

Now, let’s dive into today’s issue.

LLMs

This study examined how LLMs behave under adversarial attacks, for example, when the prompt contains false clinical data (laboratory test, a physical or radiological sign, or a medical condition), causing "hallucinations".

🔬 Methods

They tested on:

  • Default (standard settings).

  • Mitigating prompt (designed to reduce hallucinations)

  • Temperature 0 (getting the same response to the same question)

Attack types:

  • Input perturbation (modified clinical text to cause hallucinations).

  • Prompt injection (misleading instructions).

  • Decision steering (forcing biased outcomes).

They evaluated:

  • Accuracy of the responses

  • Consistency across models

  • Susceptibility

📊 Results

  • LLMs showed high vulnerability to adversarial hallucinations.

  • Hallucination rates ranged from 50% to 82% across models and prompting methods.

  • Use of a mitigation prompt reduced the hallucination rate from 66% to 44%.

  • The best model was GPT-4o, hallucinations declined from 53 % to 23 %.

🔑 Key Takeaways

  • LLMs are not yet reliable for unsupervised clinical use under adversarial conditions.

  • Multi-model assurance and human supervision are critical.

  • Keep in mind that although LLMs fail under deliberate attacks, they can still be useful when presented with the correct information and validated by a professional.

🔗 Omar M, Sorin V. Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support. Nat Mach Intell. 2025; doi:10.1038/s43856-025-01021-3

🦾TechTools

Today, we have 3 more LLMs. 2 of them are Open Source, where the code is public, free, and anyone can modify it. If you want to test these LLMs, click on the name.

  • Better for research. If you want to find information, this is the right LLM.

  • Cites the source of information, like medical references.

  • It has access to current medical information.

  • Can be integrated with Claude and GPT-4o.

Llama (Meta):

  • It is Open Source.

  • Can process text and images.

  • Can manage extensive documents.

  • Cost-effective for healthcare institutions.

  • Multilingual (12 languages).

  • It is Open source, used by researchers and developers.

  • Lower cost

  • Can manage large amounts of data.

  • Low hallucinations.

  • Strong for coding.

We’re done for today.

If you know people in healthcare who would like to get updates on LLM news, feel free to share it.

You can:

↪️ Forward this email or 📲copy this link and send it on your phone. 

Thank you! Enjoy the weekend ☀️.

Until next Wednesday.

Itzel Fer, MD PM&R

Follow me on LinkedIn | Substack | X | Instagram

Forwarded this email? Sign up here

How did you like today's newsletter?

Login or Subscribe to participate