Nvidia Drops Nemotron 3 Ultra: The New King of Open Weights
Nvidia just dropped Nemotron 3 Ultra, and the open-source world is trying to act casual while scrambling to adjust its benchmarks. It is impressive, shiny, and makes everything else feel like a calculator from the nineties.
The model is built on a massive 550-billion-parameter mixture-of-experts (MoE) architecture, though it keeps things lean by activating only 55 billion parameters for any given token. This selective activation lets the system punch well above its weight class, outperforming both Gemma 4 and gpt-oss-120b in recent evaluations.
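To put the two parameter counts side by side, here is a quick back-of-the-envelope sketch. Only the 550 and 55 billion figures come from the article; the function and variable names are illustrative.

```python
# Parameter counts quoted in the article, in billions.
TOTAL_PARAMS_B = 550
ACTIVE_PARAMS_B = 55


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of the model's parameters that are active at once."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share of parameters: {fraction:.0%}")
```

In other words, roughly one parameter in ten does any work at a given moment, which is where the "lean" part comes from.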
Speed is where the hardware giant really flexes, hitting over 300 tokens per second when pushed through DeepInfra. For comparison, competitive offerings from DeepSeek and Moonshot typically churn out content at a sluggish 50–100 tokens per second. The model is scheduled to hit public repositories like Hugging Face and OpenRouter on June 4.
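For a sense of what those throughput numbers mean in practice, here is a rough sketch of how long a response would take to stream at each rate. The 1,000-token response length is a hypothetical chosen for illustration, and since the article says "over 300", the Nemotron figure is treated as a floor rather than a measured value.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` at a steady throughput, ignoring latency."""
    return tokens / tokens_per_second


RESPONSE_TOKENS = 1_000  # hypothetical response length, not from the article

nemotron_s = seconds_for(RESPONSE_TOKENS, 300)    # quoted figure via DeepInfra
rival_slow_s = seconds_for(RESPONSE_TOKENS, 50)   # low end of the quoted range
rival_fast_s = seconds_for(RESPONSE_TOKENS, 100)  # high end of the quoted range

print(f"Nemotron 3 Ultra: {nemotron_s:.1f} s")
print(f"Competitors: {rival_fast_s:.1f} to {rival_slow_s:.1f} s")
print(f"Speedup: {rival_fast_s / nemotron_s:.0f}x to {rival_slow_s / nemotron_s:.0f}x")
```

Taken at face value, the quoted rates work out to a threefold to sixfold gap, though real-world throughput depends heavily on the provider, batch load, and prompt length.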
While the open-source community celebrates this "gift" from a trillion-dollar hardware juggernaut, the reliance on one vendor for the most powerful tools remains a strange paradox. Giving away the crown jewels for free while simultaneously selling the shovels required to dig the gold mine is perhaps the most brilliant marketing maneuver of the decade.
Source: Nvidia