
Four Chinese Labs Just Hit Frontier-Class Coding in 12 Days

DeepSeek V4, GLM-5.1, MiniMax M2.7, and Kimi K2.6 all shipped open-weights coding models within a 12-day window, at frontier capability and at lower inference cost than Claude or GPT.

DeepSeek V4, Z.ai's GLM-5.1, MiniMax M2.7, and Moonshot's Kimi K2.6 all shipped open-weights coding models within a 12-day window in late April and early May, per llm-stats.com. All four hit roughly the same capability ceiling on agentic engineering benchmarks, and all four ship at meaningfully lower inference cost than the Western frontier: DeepSeek-V4-Flash, for instance, activates only 13 billion parameters per token from a 284-billion-parameter Mixture-of-Experts architecture.

What's actually new here isn't the capability; Western labs hit similar levels last year. It's the cost per token. The MoE architecture means inference compute scales with active parameters, not total parameters. At 13B active per token with frontier-class output, DeepSeek-V4 delivers roughly 5–10× lower inference cost than dense models in the same capability tier, and the other three Chinese releases use similar techniques. For self-hosting builders, this is the moment open-weights becomes economically competitive with API access to Claude or GPT-5.5 on coding workloads, assuming you have the GPUs to run it. The strategic question for the Western labs isn't whether to match capability; they already have it. It's whether to match the cost structure.
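To make the 5–10× figure concrete, here is a minimal sketch using the standard ~2·N FLOPs-per-token approximation for a decoder forward pass. The 13B active-parameter figure comes from the release specs cited above; the 70B and 130B dense comparison sizes are assumptions for illustration, not the published size of any particular model.

```python
# Rough per-token inference cost comparison: MoE vs. dense.
# Uses the standard ~2 * N_active FLOPs-per-token approximation for a
# decoder forward pass. Real serving cost also depends on attention, KV
# cache, batching, and memory bandwidth, so treat this as an
# order-of-magnitude sketch only.

def flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * active_params_b * 1e9

# DeepSeek-V4-Flash: 284B total parameters, ~13B active per token.
moe_active_b = 13

# Hypothetical dense models in the same capability tier; 70B and 130B
# are assumed sizes for illustration, not published figures.
for dense_b in (70, 130):
    ratio = flops_per_token(dense_b) / flops_per_token(moe_active_b)
    print(f"dense {dense_b}B vs. 13B-active MoE: ~{ratio:.1f}x more compute per token")
```

Running it gives ~5.4× and 10.0×, which is where the 5–10× range in the paragraph above comes from.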

If you're choosing an inference provider for coding workflows, get pricing quotes for self-hosted DeepSeek V4 inference before signing any annual API contract. The negotiating leverage is real for the next quarter.
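A quick way to sanity-check any quote before that negotiation: convert GPU-hour pricing into cost per million tokens and compare it against the API list price. A back-of-envelope sketch follows; every number in it is a placeholder assumption, to be replaced with figures from your own quotes.

```python
# Break-even sketch: self-hosted inference vs. API pricing.
# All values below are placeholder assumptions, not real quotes.

gpu_hour_rate = 2.50    # $/GPU-hour, assumed cloud rate
gpus_in_cluster = 8     # assumed node size needed to serve the model
tokens_per_sec = 2500   # assumed aggregate batched throughput

# Dollars per hour divided by millions of tokens generated per hour.
cluster_cost_per_hour = gpu_hour_rate * gpus_in_cluster
m_tokens_per_hour = tokens_per_sec * 3600 / 1e6
self_hosted_cost_per_m = cluster_cost_per_hour / m_tokens_per_hour
print(f"self-hosted: ~${self_hosted_cost_per_m:.2f} per million tokens")

api_cost_per_m = 15.00  # assumed API output-token list price
print(f"API quote:   ${api_cost_per_m:.2f} per million tokens")
print(f"ratio: {api_cost_per_m / self_hosted_cost_per_m:.1f}x")
```

With these placeholder numbers the self-hosted cluster lands around $2.22 per million tokens against a $15 API price; the gap narrows fast if your real throughput or utilization is lower, which is exactly what the quotes will tell you.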