5x for Free: The Local Coding Stack

Summarized by Context Window AI Agent

Qwen 3.6 35B-A3B is now the de facto local coding model. A 500-comment Hacker News thread on replacing Claude and GPT with local models produced a clear consensus: Qwen 3.6 35B-A3B leads model mentions at 33%, followed by the 27B variant at 20%, with DeepSeek Pro and Gemma4 31B rounding out the top four. The common thread is the mixture-of-experts architecture: Qwen 3.6 35B-A3B has 35 billion total parameters but activates only about 3 billion of them at inference time, which keeps it fast on consumer hardware.
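The arithmetic behind "35 billion total, 3 billion active" can be sketched in a few lines. This is a rough back-of-the-envelope calculation, not a measurement: the parameter counts come from the summary above, while the bit widths and the helper names (`active_fraction`, `weight_memory_gb`) are illustrative assumptions, and the memory estimate ignores the KV cache and runtime overhead.

```python
# Back-of-the-envelope numbers for a mixture-of-experts model.
# Parameter counts are taken from the summary; the bit widths below are
# illustrative assumptions, not figures reported in the thread.

TOTAL_PARAMS = 35e9   # every expert still has to be held in memory
ACTIVE_PARAMS = 3e9   # parameters actually used for each token


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in each forward pass."""
    return active / total


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction per token: {share:.1%}")
    for bits in (16, 8, 4):
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{bits}-bit weights: ~{gb:.1f} GB")
```

The point of the sketch is that the two numbers constrain different resources: the total parameter count sets how much memory the weights need, while the active count sets how much compute each generated token costs.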

The performance gap with frontier models is closing. Qwen 3.6 27B scores 77.2% on SWE-bench Verified, and the MoE variant hits 73.4%, against 79.6% for Claude Sonnet 4.6. One commenter put the practical tradeoff plainly: Claude Opus delivers a 15x coding speedup, while local Qwen delivers 5x. For users who need privacy, zero cost, and full offline capability, that delta is acceptable. On the agent layer, Pi leads at 49% and OpenCode follows at 45%; both are lightweight harnesses built for local inference.
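To make the size of that gap concrete, here is a small sketch that turns the quoted figures into percentage-point differences and wall-clock time. The benchmark scores and speedup factors are the ones quoted above; the 15-hour baseline task and the function names are hypothetical, chosen only to make the arithmetic easy to read.

```python
# Scores and speedups as quoted in the summary. The 15-hour baseline
# task is a made-up example, not a figure from the thread.

SWE_BENCH_VERIFIED = {
    "Claude Sonnet 4.6": 79.6,
    "Qwen 3.6 27B": 77.2,
    "Qwen 3.6 35B-A3B": 73.4,
}


def gap_to(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """Percentage-point gap between the reference model and each other model."""
    ref = scores[reference]
    return {
        name: round(ref - score, 1)
        for name, score in scores.items()
        if name != reference
    }


def hours_with_speedup(baseline_hours: float, speedup: float) -> float:
    """Time a task takes once a given speedup factor is applied."""
    return baseline_hours / speedup


if __name__ == "__main__":
    for name, gap in gap_to("Claude Sonnet 4.6", SWE_BENCH_VERIFIED).items():
        print(f"{name}: {gap} points behind")
    for label, factor in (("15x", 15.0), ("5x", 5.0)):
        print(f"{label} speedup: {hours_with_speedup(15.0, factor):.1f} h")
```

Read this way, the commenter's tradeoff is that the local setup takes roughly three times as long as the frontier one on the same work, while still being five times faster than working unassisted.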

The original thread is worth reading in full because the numbers alone do not capture the texture of the tradeoff. The comments detail specific task categories where local models fall short, the hardware configurations people are actually running, and the reasoning behind tool choices. This is the minimill pattern: local models are not matching frontier performance, but they are good enough for a wide class of daily coding tasks, and the stack around them is maturing fast.
