Week 27, 2025

ERNIE 4.5

Baidu's ERNIE 4.5 VLMs and LLMs beat DeepSeek V3 and Qwen 235B, and are competitive with OpenAI o1 (for the VLM). Apache 2.0 licensed.

The 21B A3B model is about 30% smaller than Qwen3 30B A3B (21B vs. 30B total parameters) and better on most benchmarks.

MAI-DxO

SDBench is a new benchmark that transforms 304 NEJM cases into interactive diagnostic simulations. The AI must ask questions, order tests, and weigh costs, mirroring the complexity of real clinical decision-making.
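
To make the ask / order / weigh-costs loop concrete, here is a minimal Python sketch of one interactive episode. Everything in it is an assumption for illustration: the `ToyCase` and `Episode` names, the case contents, and the dollar figures are made up, and this is not SDBench's actual interface or data format.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCase:
    """Hypothetical stand-in for one case; not SDBench's real format."""
    diagnosis: str
    findings: dict      # question -> answer revealed only when asked
    test_results: dict  # test name -> result revealed only when ordered
    test_costs: dict    # test name -> illustrative cost (made-up numbers)


@dataclass
class Episode:
    """Tracks what the agent has learned and how much it has spent."""
    case: ToyCase
    total_cost: float = 0.0
    log: list = field(default_factory=list)

    def ask(self, question):
        # In this toy model, questions are free and reveal one finding.
        answer = self.case.findings.get(question, "not available")
        self.log.append(("ask", question, answer))
        return answer

    def order(self, test):
        # Ordering a known test adds its cost to the running total.
        self.total_cost += self.case.test_costs.get(test, 0.0)
        result = self.case.test_results.get(test, "not available")
        self.log.append(("order", test, result))
        return result

    def diagnose(self, guess):
        # Exact string match; a real grader would be far more lenient.
        return guess.strip().lower() == self.case.diagnosis.lower()


def run_toy_episode():
    """Scripted agent: ask one question, order two tests, then commit."""
    case = ToyCase(
        diagnosis="pneumonia",
        findings={"fever?": "yes, for three days"},
        test_results={"chest x-ray": "right lower lobe opacity",
                      "cbc": "elevated white cell count"},
        test_costs={"chest x-ray": 100.0, "cbc": 50.0},
    )
    episode = Episode(case)
    episode.ask("fever?")
    episode.order("chest x-ray")
    episode.order("cbc")
    return episode.diagnose("Pneumonia"), episode.total_cost


if __name__ == "__main__":
    correct, cost = run_toy_episode()
    print(f"correct={correct}, cost={cost}")
```

The point of the sketch is the trade-off it exposes: each extra test may improve the final answer but raises the episode's cost, so accuracy and spend have to be reported together.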

MAI-DxO is a model-agnostic orchestrator that simulates a panel of virtual physicians. It achieves 85.5% diagnostic accuracy, four times that of experienced doctors, while cutting diagnostic costs.

Together, these advances offer a blueprint for how AI can help deliver precision and efficiency in healthcare. We're looking forward to working with healthcare partners and the entire ecosystem to help these advances make a difference.