GLM-5.2 just dropped and it's crushing Claude Fable in real benchmarks.

744B total params with ~40B active MoE architecture. MIT licensed, fully open weights. Practical 1M token context window with configurable reasoning effort (High/Max modes).

The numbers: 62.1% on SWE-bench Pro (beats GPT-5.5's 58.6%), 81.0 on Terminal-Bench 2.1.106, #1 on Design Arena at Elo 1360. Outperforming the now-unavailable Claude Fable 5 in agentic coding tasks.

Deployment is real: vLLM and SGLang inference support out of the box. Quantized FP8 versions make it runnable on high-end local hardware. Full on-prem capability means zero external API dependencies, no rate limits, no sudden access revocations.

Zhipu AI released this for sustained multi-hour autonomous tasks. The long-context reliability holds up under production load, not just synthetic benchmarks.

This matters because open-weight models at this capability level fundamentally change the deployment calculus. You can run serious agentic workflows entirely local, with full data sovereignty and customization depth that hosted APIs will never offer.

GitHub has the deployment recipes. Docker setup available. The inference stack is mature enough for production use today.