Fact-based news without bias awaits. Make 1440 your choice today.

Overwhelmed by biased news? Cut through the clutter and get straight facts with your daily 1440 digest. From politics to sports, join millions who start their day informed.

Hi!

Today is LLM’s day.

This week OpenEvidence showed the results of their system responding to questions from the United States Medical Licensing Examination (USMLE). The results? a score of 100% (Link).

The system answers the questions correctly and also teaches the reasoning behind each question, which is very useful for students.

Also, this week OpenEvidence launched “Visits” a Digital Assistant that transcribes medical visits, and gives you (in real-time) the latest guidelines, research, and clinical recommendations. It is currently free from US healthcare providers.

I use OpenEvidence almost every day. Have you tested it?

Now, let’s dive into today’s issue.

✨LLMs

LLMs Fail Under Adversarial Attacks in Clinical Cases

This study examined how LLMs behave under adversarial attacks, for example, when the prompt contains false clinical data (laboratory test, a physical or radiological sign, or a medical condition), causing "hallucinations".

🔬 Methods

LLMs tested: GPT-4, GPT-4o, Llama-3, Claude 3, Gemini 1.5, and Meditron.

They tested on:

Default (standard settings).
Mitigating prompt (designed to reduce hallucinations)
Temperature 0 (getting the same response to the same question)

Attack types:

Input perturbation (modified clinical text to cause hallucinations).
Prompt injection (misleading instructions).
Decision steering (forcing biased outcomes).

They evaluated:

Accuracy of the responses
Consistency across models
Susceptibility

📊 Results

LLMs showed high vulnerability to adversarial hallucinations.
Hallucination rates ranged from 50% to 82% across models and prompting methods.
Use of a mitigation prompt reduced the hallucination rate from 66% to 44%.
The best model was GPT-4o, hallucinations declined from 53 % to 23 %.

🔑 Key Takeaways

LLMs are not yet reliable for unsupervised clinical use under adversarial conditions.
Multi-model assurance and human supervision are critical.
Keep in mind that although LLMs fail under deliberate attacks, they can still be useful when presented with the correct information and validated by a professional.

🔗 Omar M, Sorin V. Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support. Nat Mach Intell. 2025; doi:10.1038/s43856-025-01021-3

🦾TechTools

Today, we have 3 more LLMs. 2 of them are Open Source, where the code is public, free, and anyone can modify it. If you want to test these LLMs, click on the name.

Perplexity: