Hi! Welcome back to AIMedily.
How’s the first week of the new year going?
I don’t usually do New Year’s resolutions. Instead, I focus on stacking habits and staying consistent.
This year, I’m clear on my priorities:
Health
Exercising 5 days a week, getting 7–8 hours of sleep, and keeping a daily meditation practice.
Personal life
More quality time with my family, one small adventure each month, and two meaningful trips.
Professional
Growing this newsletter, contributing more at the Neurobionics lab, and attending 4–6 conferences.
Do you set goals—or approach the new year differently?
Before we dive in, today OpenAI introduced ChatGPT Health — and honestly, I’m not sure what to think. Here are my thoughts.
Let’s dive in.
🤖 AIBytes
Researchers from Stanford trained a large AI foundation model on overnight sleep studies to predict long-term disease risk from a single night of sleep.
🔬 Methods
Study type: Multicenter model development and validation study
Data: Polysomnography (PSG) recordings
Participants: 65,000 individuals
Total sleep data: >585,000 hours
Signals used:
Brain activity (EEG/EOG)
Heart activity (ECG)
Muscle activity (EMG)
Respiratory signals
Model: SleepFM.
Prediction window: Up to 6 years after a single sleep study.
Outcomes evaluated: 1,041 disease phenotypes mapped from Electronic Health Record data.
📊 Results
130 diseases predicted with strong accuracy from one night of sleep.
C-Index ≥0.75 for all 130 conditions
Examples of diseases predicted years in advance:
All-cause mortality: 0.84
Dementia: 0.85
Myocardial infarction: 0.81
Heart failure: 0.80
Chronic kidney disease: 0.79
Stroke: 0.78
Atrial fibrillation: 0.78
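For readers less familiar with the metric: the C-index (concordance index) measures how well predicted risk scores rank patients, i.e., whether the patient who develops the disease sooner was assigned the higher risk. 0.5 is chance; 1.0 is perfect ranking. A minimal, self-contained Python sketch of the idea (illustrative only — this is not the paper's implementation, and the toy data are made up):

```python
def concordance_index(event_times, risk_scores, events):
    """Fraction of comparable patient pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    before patient j's follow-up time. The pair is concordant when the
    earlier-event patient also received the higher predicted risk.
    Ties in risk count as half-concordant. 0.5 = chance, 1.0 = perfect.
    """
    concordant, comparable = 0.0, 0
    n = len(event_times)
    for i in range(n):
        if not events[i]:        # censored patients can't anchor a pair
            continue
        for j in range(n):
            if i == j or event_times[i] >= event_times[j]:
                continue
            comparable += 1
            if risk_scores[i] > risk_scores[j]:
                concordant += 1
            elif risk_scores[i] == risk_scores[j]:
                concordant += 0.5
    return concordant / comparable

# Toy cohort: earlier events were given higher risk, last patient censored
times  = [2, 4, 6, 8]
events = [1, 1, 1, 0]
risks  = [0.9, 0.7, 0.4, 0.1]
print(round(concordance_index(times, risks, events), 2))  # 1.0
```

In practice, libraries such as lifelines or scikit-survival compute this (with more careful tie and censoring handling), but the ranking intuition is the same.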
The model reliably identified sleep stages and could detect sleep apnea.

🔑 Key Takeaways
A single night of sleep contains strong signals about future disease risk.
The model generalized across cohorts and recording setups.
This is risk prediction—not diagnosis—and clinical use is not established.
Multimodal sleep data outperformed demographics alone for prediction.
🔗 Thapa R, Kjaer MR, He B, et al. A multimodal sleep foundation model for disease prediction. Nature Medicine. 2025. doi:10.1038/s41591-025-04133-4
Researchers evaluated whether LLM agents can safely and reliably complete multi-step clinical tasks, and introduced a memory component that enables the agent to learn from prior failures.
🔬 Methods
Benchmark: MedAgentBench v2.
Test: 300 clinical tasks based on real-world clinical processes.
Tasks evaluated:
Multi-step: Completing actions in the correct sequence.
Tool-using: Agents interacted with tools, similar to using an electronic health record (EHR).
Workflow-dependent: Following a realistic clinical workflow.
Agent capabilities tested:
Planning: Deciding what steps to take and in what order.
EHR-like tool use: Interacting with a simulated clinical system.
Memory vs no memory: Whether the agent could retain information from earlier steps.
Base model: GPT-4.1
Generalization test: An additional 300 unseen tasks designed by a physician to test performance on new scenarios.
📊 Results
Performance on original tasks:
91.0% success without memory
98.0% success with memory
Performance on new tasks:
88.67% overall success
266 out of 300 tasks completed correctly
Errors observed:
Most failures were due to skipped or missed steps.
Errors occurred even when the medical knowledge itself was correct.
Design impact:
Memory reduced repeated mistakes.
Structured agent design improved consistency.
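The memory idea above can be sketched in a few lines: the agent records which step of a task type failed and why, then surfaces those notes as reminders before retrying that task type. This is a toy illustration with made-up names, not the MedAgentBench v2 implementation:

```python
class FailureMemory:
    """Toy sketch of an agent failure memory.

    Records failed task steps and surfaces them as reminders on later
    attempts at the same task type. All names here are illustrative.
    """

    def __init__(self):
        self._failures = {}  # task_type -> list of failure notes

    def record_failure(self, task_type, step, reason):
        note = f"step '{step}' failed: {reason}"
        self._failures.setdefault(task_type, []).append(note)

    def reminders(self, task_type):
        # In an agent loop, these notes would be injected into the
        # prompt before the next attempt at this task type.
        return self._failures.get(task_type, [])

mem = FailureMemory()
mem.record_failure("renew_prescription", "check_allergies", "step skipped")
print(mem.reminders("renew_prescription"))
```

The design point mirrors the paper's finding: most failures were skipped steps rather than knowledge gaps, so even a simple record-and-remind loop can cut repeated mistakes.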

🔑 Key Takeaways
Evaluating AI by workflow completion is more clinically meaningful than testing question accuracy alone.
Memory improves task performance but raises safety and oversight concerns.
High success rates still leave room for clinically important failures.
These agents are not ready for autonomous clinical use, but they offer a better way to test future clinical AI systems.
🔗 Chen E, Postelnik S, Black K, et al. MedAgentBench v2: Improving medical LLM agent design. Pac Symp Biocomput. 2026;354–371.
🦾TechTool
Enables real-time remote access to the operating room.
It scales training, education, and collaboration to improve surgical outcomes.
Relevant for complex surgeries where access to specialized expertise is limited.
Uses AI to analyze coronary CT angiography and quantify plaque characteristics and stenosis.
Provides trackable insights to personalize prevention.
Promotes treatment at earlier disease stages.
Visualizes how research papers connect, showing citation pathways and emerging themes.
Useful for clinicians and researchers who want to understand a field quickly without missing key studies.
🧬AIMedily Snaps
Utah becomes the first state to evaluate autonomous AI for prescription renewals in chronic disease (Link).
Stanford AI Experts Predict What Will Happen in 2026: A “ChatGPT Moment” for AI in Medicine (Link).
FDA paves way for more consumer wearables, AI-enabled devices to hit the market (Link).
Executives discuss AI reshaping the healthcare workforce (Link).
OpenAI selected b.well to connect health data for ChatGPT Health (Link).
AI liability: A framework for health systems (Link).
🧩TriviaRX
Time to test your general knowledge.
Which medical specialty accounts for nearly half of FDA-cleared AI/ML medical devices?
A) Cardiology
B) Radiology
C) Neurology
D) Pathology
That’s it for today.
As always, thank you for taking the time to read.
You’re already ahead of the curve in medical AI — don’t keep it to yourself. Share AIMedily with your friends.
See you around next Wednesday.
Itzel Fer, MD PM&R
Forwarded this email? Sign up here
P.S. Are you enjoying AIMedily? Leave a review 👉 Here.