
DeepSeek Didn't Just Train Better—They Changed How Transformers Think

architecture · open-source · china

OpenAI declared “code red” over DeepSeek’s V3.2 release. Not because the benchmarks scared them (though they should have) but because the how behind those benchmarks threatens the entire “scale is all you need” thesis.

The mHC (manifold-based Hierarchical Cognition) architecture lets the model maintain multiple parallel reasoning streams without the training instability that usually kills such attempts. Think of it as the difference between a single-threaded processor and a genuinely parallel one. Same transistor count, fundamentally different capability.
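To make the "parallel streams" idea concrete, here is a deliberately toy NumPy sketch. It is not DeepSeek's mHC implementation, whose details aren't reproduced in this post; it only illustrates the general family of designs that carry several residual streams through a stack of layers and let them exchange information via a small mixing matrix, rather than funneling everything through one stream. All names (`parallel_block`, `mix`, the stream count `K`) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4  # hidden width, number of parallel streams (illustrative sizes)

def sublayer(x):
    """Stand-in for one transformer sub-layer (attention/MLP details omitted)."""
    W = rng.standard_normal((D, D)) / np.sqrt(D)
    return np.tanh(x @ W)

def parallel_block(streams, mix):
    """One block over K parallel residual streams.

    A shared sub-layer output is added residually to every stream, then
    the K streams exchange information through a small K x K mixing
    matrix, so each stream can evolve a distinct line of computation
    while still seeing the others.
    """
    h = sublayer(streams.mean(axis=0))  # shared computation over a pooled view
    streams = streams + h               # residual update applied to each stream
    return mix @ streams                # cross-stream mixing

# K copies of one input vector, plus a near-identity mixing matrix so
# streams start identical but are free to diverge across blocks.
streams = np.tile(rng.standard_normal((1, D)), (K, 1))
mix = np.eye(K) + 0.01 * rng.standard_normal((K, K))

for _ in range(3):
    streams = parallel_block(streams, mix)

print(streams.shape)  # K separate streams survive the whole stack
```

The design choice the sketch highlights is the one the stability claim hinges on: keeping `mix` close to the identity is what stops the streams from collapsing into each other or blowing up as depth grows, which is the kind of training instability the paragraph above alludes to.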

This matters because it’s a direct challenge to the scaling playbook. The American AI giants have been in an arms race of compute—bigger clusters, more GPUs, longer training runs. DeepSeek is suggesting that smarter architecture might matter more than brute force.

The open-source angle makes this worse for the incumbents. V3.2 matches or beats closed-weight leaders through better training recipes and efficiency engineering. If you can achieve frontier performance without frontier budgets, the moat around proprietary models gets a lot shallower.

I’m not predicting OpenAI’s demise here. But “throw more compute at it” has been the default answer to most AI capability questions for years. DeepSeek just demonstrated that it’s not the only answer—and maybe not even the best one.

The companies betting everything on scale should be nervous. The researchers who’ve been quietly working on architecture innovations should feel vindicated.