Hi! Welcome back to AIMedily.
How’s the first week of the new year going?
I don’t usually do New Year’s resolutions. Instead, I focus on stacking habits and staying consistent.
This year, I’m clear on my priorities:
Health
Exercising 5 days a week, getting 7–8 hours of sleep, and keeping a daily meditation practice.
Personal life
More quality time with my family, one small adventure each month, and two meaningful trips.
Professional
Growing this newsletter, contributing more at the Neurobionics lab, and attending 4–6 conferences.
Do you set goals—or approach the new year differently?
Before we dive in, today OpenAI introduced ChatGPT Health — and honestly, I’m not sure what to think. Here are my thoughts.
Let’s dive in.
🤖 AIBytes
Researchers from Stanford trained a large AI foundation model on overnight sleep studies to predict long-term disease risk from a single night of sleep.
🔬 Methods
Study type: Multicenter model development and validation study
Data: Polysomnography (PSG) recordings
Participants: 65,000 individuals
Total sleep data: >585,000 hours
Signals used:
Brain activity (EEG/EOG)
Heart activity (ECG)
Muscle activity (EMG)
Respiratory signals
Model: SleepFM.
Prediction window: Up to 6 years after a single sleep study.
Outcomes evaluated: 1,041 disease phenotypes mapped from Electronic Health Record data.
📊 Results
130 diseases predicted with strong accuracy from one night of sleep.
C-Index ≥0.75 for all 130 conditions
Examples of diseases predicted years in advance:
All-cause mortality: 0.84
Dementia: 0.85
Myocardial infarction: 0.81
Heart failure: 0.80
Chronic kidney disease: 0.79
Stroke: 0.78
Atrial fibrillation: 0.78
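For readers less familiar with the metric: the C-index (concordance index) measures how well predicted risk scores rank patients, i.e., whether the patient who develops the disease sooner was assigned the higher risk. 0.5 is chance; 1.0 is perfect ranking. A minimal, self-contained Python sketch of the idea (illustrative only — this is not the paper's implementation, and the toy data are made up):

```python
def concordance_index(event_times, risk_scores, events):
    """Fraction of comparable patient pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    before patient j's follow-up time. The pair is concordant when the
    earlier-event patient also received the higher predicted risk.
    Ties in risk count as half-concordant. 0.5 = chance, 1.0 = perfect.
    """
    concordant, comparable = 0.0, 0
    n = len(event_times)
    for i in range(n):
        if not events[i]:        # censored patients can't anchor a pair
            continue
        for j in range(n):
            if i == j or event_times[i] >= event_times[j]:
                continue
            comparable += 1
            if risk_scores[i] > risk_scores[j]:
                concordant += 1
            elif risk_scores[i] == risk_scores[j]:
                concordant += 0.5
    return concordant / comparable

# Toy cohort: earlier events were given higher risk, last patient censored
times  = [2, 4, 6, 8]
events = [1, 1, 1, 0]
risks  = [0.9, 0.7, 0.4, 0.1]
print(round(concordance_index(times, risks, events), 2))  # 1.0
```

In practice, libraries such as lifelines or scikit-survival compute this (with more careful tie and censoring handling), but the ranking intuition is the same.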
The model reliably identified sleep stages and could detect sleep apnea.

🔑 Key Takeaways
A single night of sleep contains strong signals about future disease risk.
The model generalized across cohorts and recording setups.
This is risk prediction—not diagnosis—and clinical use is not established.
Multimodal sleep data outperformed demographics alone for prediction.
🔗 Thapa R, Kjaer MR, He B, et al. A multimodal sleep foundation model for disease prediction. Nature Medicine. 2025. doi:10.1038/s41591-025-04133-4
Researchers evaluated whether LLM agents can safely and reliably complete multi-step clinical tasks, and introduced a memory component that enables the agent to learn from prior failures.
🔬 Methods
Benchmark: MedAgentBench v2.
Test: 300 clinical tasks based on real-world clinical processes.
Tasks evaluated:
Multi-step: Completing actions in the correct sequence.
Tool-using: Agents interacted with tools, similar to using an electronic health record (EHR).
Workflow-dependent: Following a realistic clinical workflow.
Agent capabilities tested:
Planning: Deciding what steps to take and in what order.
EHR-like tool use: Interacting with a simulated clinical system.
Memory vs no memory: Whether the agent could retain information from earlier steps.
Base model: GPT-4.1
Generalization test: An additional 300 unseen tasks designed by a physician to test performance on new scenarios.
📊 Results
Performance on original tasks:
91.0% success without memory
98.0% success with memory
Performance on new tasks:
88.67% overall success
266 out of 300 tasks completed correctly
Errors observed:
Most failures were due to skipped or missed steps.
Errors occurred even when the medical knowledge itself was correct.
Design impact:
Memory reduced repeated mistakes.
Structured agent design improved consistency.
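The memory idea above can be sketched in a few lines: the agent records which step of a task type failed and why, then surfaces those notes as reminders before retrying that task type. This is a toy illustration with made-up names, not the MedAgentBench v2 implementation:

```python
class FailureMemory:
    """Toy sketch of an agent failure memory.

    Records failed task steps and surfaces them as reminders on later
    attempts at the same task type. All names here are illustrative.
    """

    def __init__(self):
        self._failures = {}  # task_type -> list of failure notes

    def record_failure(self, task_type, step, reason):
        note = f"step '{step}' failed: {reason}"
        self._failures.setdefault(task_type, []).append(note)

    def reminders(self, task_type):
        # In an agent loop, these notes would be injected into the
        # prompt before the next attempt at this task type.
        return self._failures.get(task_type, [])

mem = FailureMemory()
mem.record_failure("renew_prescription", "check_allergies", "step skipped")
print(mem.reminders("renew_prescription"))
```

The design point mirrors the paper's finding: most failures were skipped steps rather than knowledge gaps, so even a simple record-and-remind loop can cut repeated mistakes.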

🔑 Key Takeaways
Evaluating AI by workflow completion is more clinically meaningful than testing question accuracy alone.
Memory improves task performance but raises safety and oversight concerns.
High success rates still leave room for clinically important failures.
These agents are not ready for autonomous clinical use, but they offer a better way to test future clinical AI systems.
🔗 Chen E, Postelnik S, Black K, et al. MedAgentBench v2: Improving medical LLM agent design. Pac Symp Biocomput. 2026;354–371.
🦾TechTool
Enables real-time remote access to the operating room.
It scales training, education, and collaboration to improve surgical outcomes.
Relevant for complex surgeries where access to specialized expertise is limited.
Uses AI to analyze coronary CT angiography and quantify plaque characteristics and stenosis.
Provides trackable insights to personalize prevention.
Promotes treatment at earlier disease stages.
Visualizes how research papers connect, showing citation pathways and emerging themes.
Useful for clinicians and researchers who want to understand a field quickly without missing key studies.
🧬AIMedily Snaps
Utah becomes the first state to evaluate autonomous AI for prescription renewals in chronic disease (Link).
Stanford AI Experts Predict What Will Happen in 2026: A “ChatGPT Moment” for AI in Medicine (Link).
FDA paves way for more consumer wearables, AI-enabled devices to hit the market (Link).
Executives discuss AI reshaping the healthcare workforce (Link).
OpenAI selected b.well to connect health data for ChatGPT Health (Link).
AI liability: A framework for health systems (Link).
🧩TriviaRX
Time to test your general knowledge.
Which medical specialty accounts for nearly half of FDA-cleared AI/ML medical devices?
A) Cardiology
B) Radiology
C) Neurology
D) Pathology
That’s it for today.
As always, thank you for taking the time to read.
You’re already ahead of the curve in medical AI — don’t keep it to yourself. Share AIMedily with your friends.
See you around next Wednesday.
Itzel Fer, MD PM&R
Forwarded this email? Sign up here
P.S. Are you enjoying AIMedily? Leave a review 👉 Here.