What's the best way to run a 30B coding model on a 24 GB Mac?

7 min read

We explored ways to beat an existing open coding model's stock quantizations. In the end, the bottleneck turned out to be something other than quantization.

Quick summary: Our goal was to find the best coding model that fits in 24 GB on an Apple Silicon Mac. We started from North-Mini-Code, a 30-billion-parameter open model (Apache-2.0), and tried roughly a dozen ways to improve on its stock 4-bit quantization: cheaper formats, more expensive ones, custom per-part bit budgets, and a calibrated method we had to build new tooling just to run. None of them beat plain round-to-nearest 4-bit on coding tests. The 4-bit model already fits in 20.81 GB at a realistic working context and decodes at about 40 tokens per second, so memory and speed were never the real limit. The real limit only showed up when we ran the model as an agent: it often falls into repetition loops and never finishes the task. That is a decoding problem, not a quantization problem.
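For readers unfamiliar with the baseline, here is a minimal sketch of what "plain round-to-nearest 4-bit" means in general. It is not the exact stock format used for the model: the function names, the asymmetric min/max scheme, and the group size are illustrative assumptions, and real formats also pack the codes into bytes.

```python
from typing import List, Tuple

# One quantized group: (scale, minimum value, list of 4-bit integer codes).
Group = Tuple[float, float, List[int]]


def quantize_rtn_4bit(weights: List[float], group_size: int = 32) -> List[Group]:
    """Round-to-nearest 4-bit quantization with one scale and offset per group.

    Hypothetical helper for illustration; group_size=32 is an assumed default.
    """
    levels = 15  # 4 bits give integer codes 0..15
    groups: List[Group] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # A constant group would give a zero scale, so fall back to 1.0.
        scale = (hi - lo) / levels if hi > lo else 1.0
        # Map each weight to the nearest code and clamp to the 4-bit range.
        codes = [min(levels, max(0, round((w - lo) / scale))) for w in group]
        groups.append((scale, lo, codes))
    return groups


def dequantize(groups: List[Group]) -> List[float]:
    """Reconstruct approximate weights from the per-group scale, offset, and codes."""
    return [lo + scale * c for scale, lo, codes in groups for c in codes]
```

No calibration data is involved: each weight is simply snapped to the nearest of 16 levels within its group, so the reconstruction error per weight is at most half the group's scale. That simplicity is what makes it the baseline the other methods had to beat.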