2026 OPENROUTER CODE_ RANK_ MAC_ ROUTER.

Open openrouter.ai/rankings?category=programming. As of 2026-05-26, the Programming sub-chart no longer matches the SWE-bench Verified leaderboard. On usage, DeepSeek V4 Flash leads with 4.02T tokens/week, Tencent Hy3 preview enters at #2 with 3.48T, and Claude Opus 4.7 and Sonnet 4.6 sit at #3 and #4. On SWE-bench Verified the order is different: GPT-5.5 (88.7%), Opus 4.7 (87.6%), Opus 4.6 (80.8%), Gemini 3.1 Pro and DeepSeek V4 Pro (80.6% each), then MiniMax M2.5 and Kimi K2.6 (80.2% each). The #1 usage model, V4 Flash, scores about 79%, while the #1 benchmark model, GPT-5.5, is nowhere in the usage Top 10.

So the question: on Apple Silicon Macs, should Cursor, Cline, Continue, and Zed pick models by the usage chart or by the benchmark chart? Which models run locally, which need a remote Mac, and which must go through the OpenRouter API?

This piece delivers a leaderboard snapshot, a usage-vs-benchmark contrast table, a Mac local-fit matrix, an IDE multi-route playbook, a three-lane decision grid, an acceptance checklist, and an FAQ. Related pieces: May OpenRouter overall chart matrix, Cursor + local LLM three paths, and macMLX OpenAI-compatible API.

1. Pain points: usage chart is not benchmark, benchmark is not a router

First, usage volume does not equal capability. DeepSeek V4 Flash tops the Programming chart at 4.02T tokens/week because it ships a free tier, 1M context, $0.14/$0.28 pricing, and default IDE integration. Its SWE-bench Verified score is around 79%, so on the hardest bugs it visibly trails Opus 4.7 (87.6%).

Second, benchmark scores do not equal what you pay. GPT-5.5 sits at the top of SWE-bench Verified at $5/$30 per million tokens; a single Cursor Composer task with 60K input and 20K output tokens costs roughly $0.90 ($0.30 for input plus $0.60 for output). The same workload on V4 Flash costs about $0.014, a 64× difference.

Third, local capacity is brutal on Macs. DeepSeek V4 Flash is a 284B-parameter, 13B-active MoE model, and the memory footprint follows the 284B total because every expert has to stay resident. Even at 4-bit quantization the weights need roughly 150GB (FP8 would be closer to 284GB), which is beyond the 128GB ceiling of an M4 Max. Kimi K2.6 with its 128K context window scores well on SWE-bench Verified, but the full model also exceeds typical Apple Silicon footprints.

Fourth, IDE routing strategies drift. Teams that set Sonnet 4.6 ($3/$15) as the global default pay roughly 21× the input price and 54× the output price of V4 Flash on completions. Teams that point Composer at V4 Flash see multi-file patches miss edge cases that Sonnet handles cleanly.

Fifth, the chart itself moves fast. Hy3 preview was not in the Top 10 a week ago and is now #2. Owl Alpha is a stealth newcomer. Gemini 3 Flash Preview entered the Top 7 inside seven days. Routing configured against last quarter's chart is routing against last quarter's cost structure.
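The per-task figures above follow directly from the quoted per-million prices. A minimal sketch of that arithmetic, assuming the 60K-input / 20K-output task size used in the text (the `PRICES` table and the `task_cost` helper are illustrative names, not part of any tool):

```python
# USD per million tokens (input, output), as quoted in this article.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted per-million prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    gpt = task_cost("GPT-5.5", 60_000, 20_000)
    flash = task_cost("DeepSeek V4 Flash", 60_000, 20_000)
    # Prints: GPT-5.5 $0.900 vs V4 Flash $0.014 (64x)
    print(f"GPT-5.5 ${gpt:.3f} vs V4 Flash ${flash:.3f} ({gpt / flash:.0f}x)")
```

Swapping in your own token counts per request is the quickest way to see whether a route stays inside its budget.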

2. End-May 2026 OpenRouter Programming snapshot (Programming view, this week)

| # | Model | Vendor | Weekly tokens (coding) | $/M (in/out) | Context | Week change |
|---|---|---|---|---|---|---|
| 1 | DeepSeek V4 Flash | DeepSeek | ~4.02T | $0.14 / $0.28 | 1M | Holds |
| 2 | Hy3 preview | Tencent | ~3.48T | paid tier | 200K | ↑ New #2 |
| 3 | Claude Opus 4.7 | Anthropic | ~2.26T | $5.00 / $25.00 | 1M | ↓ 1 |
| 4 | Claude Sonnet 4.6 | Anthropic | ~2.15T | $3.00 / $15.00 | 1M | Flat |
| 5 | Owl Alpha | Stealth | ~1.6T | free preview | 1M | ↑ New |
| 6 | DeepSeek V4 Pro | DeepSeek | ~1.4T | $0.435 / $0.87 | 1M | ↑ 1 |
| 7 | Gemini 3 Flash Preview | Google | ~1.2T | $0.30 / $2.50 | 1.05M | ↑ New |
| 8 | DeepSeek V3.2 | DeepSeek | ~900B | $0.25 / $0.38 | 1M | ↓ 2 |
| 9 | Kimi K2.6 | MoonshotAI | ~750B | $0.75 / $3.50 | 128K | ↑ 1 |
| 10 | Gemini 2.5 Flash Lite | Google | ~600B | $0.10 / $0.40 | 1M | ↓ 1 |

3. Usage vs SWE-bench Verified contrast table

| Model | Usage chart | SWE-bench Verified | Output $/M | Usage-vs-capability gap |
|---|---|---|---|---|
| GPT-5.5 | Not in coding Top 10 | 88.7% | $30 | Top capability, priced out |
| Claude Opus 4.7 | #3 (2.26T) | 87.6% | $25 | High usage and high score, expensive |
| Claude Opus 4.6 | Not in Top 10 | 80.8% | $25 | Replaced by 4.7 |
| Gemini 3.1 Pro | Not in Top 10 | 80.6% | $12 | Strong but routing affinity weak |
| DeepSeek V4 Pro | #6 (1.4T) | 80.6% | $0.87 | Best value |
| MiniMax M2.5 | Not in Top 10 | 80.2% | $1.20 | Score up, usage flat |
| Kimi K2.6 | #9 (750B) | 80.2% | $3.50 | Agent-heavy workloads |
| GPT-5.4 | Not in Top 10 | 78.2% | $15 | Eaten by 5.5 |
| MiMo-V2-Pro | Out of coding chart (overall #1) | 78.0% | $3 | General strong, coding mid |
| DeepSeek V4 Flash | #1 (4.02T) | ~79% | $0.28 | Usage king, mid capability |

The conclusion is clean: the usage chart measures the price-performance sweet spot for the bulk of day-to-day coding tasks, while the benchmark chart measures the capability ceiling for the hardest 10% of bugs. Around 80% of Cursor and Cline traffic—line completion, single-file refactors, unit-test generation—runs faster and cheaper on DeepSeek V4 Flash. The other 20%—architectural rewrites, cross-module refactors, hard debugging—is where Opus 4.7 or GPT-5.5 earns its price. Collapsing both curves onto a single default model gets you expensive, slow, or wrong.
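To put a rough number on the 80/20 split, here is a sketch of the blended per-request cost. It assumes every request has the same 60K-input / 20K-output shape, which real traffic will not have, so treat the result as an order-of-magnitude illustration only. The helper names are hypothetical.

```python
def request_cost(price_in: float, price_out: float,
                 input_tokens: int = 60_000, output_tokens: int = 20_000) -> float:
    """USD per request at the given per-million prices (assumed task shape)."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


FLASH = request_cost(0.14, 0.28)    # DeepSeek V4 Flash, prices from this article
OPUS = request_cost(5.00, 25.00)    # Claude Opus 4.7, prices from this article


def blended_cost(split: list[tuple[float, float]]) -> float:
    """split: (traffic share, cost per request) pairs; shares must sum to 1."""
    if abs(sum(share for share, _ in split) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * cost for share, cost in split)


all_opus = blended_cost([(1.0, OPUS)])
routed = blended_cost([(0.8, FLASH), (0.2, OPUS)])
# Prints: single default $0.800/request, 80/20 routing $0.171/request
print(f"single default ${all_opus:.3f}/request, 80/20 routing ${routed:.3f}/request")
```

Under these assumptions the split costs a bit over a fifth of an all-Opus default, and almost all of what remains is the 20% of traffic that genuinely needs the stronger model.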

4. Mac Apple Silicon local-fit matrix

| Bucket | Representative coding models | Mac local strategy | Unified memory floor |
|---|---|---|---|
| A. Strong local | Qwen3 Coder 30B, DeepSeek Coder V2 Lite, Kimi K2 Mini | MLX 4-bit at 32K–64K, IDE points at 127.0.0.1:8081 | ≥ 32GB (M2 Pro+) |
| B. Local with high-end specs | Qwen3 Coder 72B, Kimi K2.6 128K, DeepSeek V3.2 distill | MLX 4-bit at 64K, leave swap headroom, IDE over LAN /v1 | ≥ 64GB (M3/M4 Max) |
| C. Remote Mac required | Distilled V4 Pro, mid-size Owl Alpha, open Hy3 variants (if any) | Will not fit on a laptop; deploy on rented 128GB+ Apple Silicon | Local viable only at 128GB+ |
| D. API only | DeepSeek V4 Flash (284B/13B MoE), Hy3 preview, Claude Opus 4.7, GPT-5.5, Gemini 3 Flash Preview | Closed or oversized; must use OpenRouter or vendor API | n/a |
| E. Agent long-chain | Kimi K2.6 (agent swarm), Claude Sonnet 4.6 (Cursor Composer) | Sonnet via API; Kimi 32B distill viable locally | ≥ 64GB (distill) |

Heads-up on naming: DeepSeek V4 Flash sounds small but it is a 284B-parameter, 13B-active MoE. Even at 4-bit quantization the weights need roughly 150GB of memory, and FP8 would be closer to 284GB. A 128GB M4 Max, the largest M4 Max configuration, cannot host the full weights; locally you swap in Coder V2 Lite or Qwen3 Coder 30B instead. Hy3 preview is Tencent Hunyuan's preview endpoint with no open weights, which keeps it firmly in bucket D.
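A quick way to sanity-check which bucket a model belongs in is a weights-only estimate: parameter count times bits per weight. The sketch below is a lower bound only; KV cache, activations, and the OS all come on top, and the helper name is made up for illustration. For an MoE the total parameter count is what matters, since all experts have to be resident even though only a few are active per token.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime overhead, so real
    usage will be higher than this figure.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    models = [
        ("DeepSeek V4 Flash (284B total)", 284),
        ("Qwen3 Coder 72B", 72),
        ("Qwen3 Coder 30B", 30),
    ]
    for name, params in models:
        print(f"{name}: ~{weight_memory_gb(params, 4):.0f} GB at 4-bit")
```

At 4 bits per weight, 284B parameters come to about 142GB before any overhead, the same ballpark as the roughly 150GB figure quoted above, while a 30B model lands near 15GB and fits comfortably in bucket A.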

5. Six-step rollout: turn the Programming chart into your IDE router

Step 1 — Snapshot Programming chart and SWE-bench together

Every Monday pull openrouter.ai/rankings?category=programming&view=week plus /api/v1/models (pricing, context, providers). Manually align the week's SWE-bench Verified table. Persist into local SQLite with a unified usage / capability / price / Mac-fit view.
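One possible shape for that SQLite view is sketched below. It assumes the `/api/v1/models` payload is a `data` list whose entries carry `id`, `context_length`, and a `pricing` object with per-token `prompt` and `completion` strings; verify that against the live response before relying on it. The table name, column names, and the `manual` dictionary for hand-aligned fields are all hypothetical.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE IF NOT EXISTS model_snapshot (
    snapshot_date  TEXT NOT NULL,
    model_id       TEXT NOT NULL,
    usd_per_m_in   REAL,
    usd_per_m_out  REAL,
    context_length INTEGER,
    weekly_tokens  REAL,   -- copied by hand from the rankings page
    swe_bench      REAL,   -- SWE-bench Verified %, aligned by hand
    mac_bucket     TEXT,   -- A-E from the local-fit matrix
    PRIMARY KEY (snapshot_date, model_id)
)
"""


def store_snapshot(conn, models_payload, manual, snapshot_date=None):
    """Merge API pricing with hand-entered fields and persist one row per model."""
    snapshot_date = snapshot_date or date.today().isoformat()
    conn.execute(SCHEMA)
    rows = []
    for model in models_payload["data"]:
        pricing = model.get("pricing", {})
        extra = manual.get(model["id"], {})
        rows.append((
            snapshot_date,
            model["id"],
            float(pricing.get("prompt", 0)) * 1_000_000,      # per-token -> per-million
            float(pricing.get("completion", 0)) * 1_000_000,
            model.get("context_length"),
            extra.get("weekly_tokens"),
            extra.get("swe_bench"),
            extra.get("mac_bucket"),
        ))
    conn.executemany(
        "INSERT OR REPLACE INTO model_snapshot VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Keying on `(snapshot_date, model_id)` keeps one row per model per week, so week-over-week price or rank changes become a simple self-join.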

Step 2 — Bucket your coding workloads

Four buckets: inline completion, single-file refactor, multi-file Composer-agent, complex debugging and architectural change. For each, pick two candidates (primary + standby) constrained by latency, tool-call support, and per-request budget.
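The bucket table can live in code so it is reviewable alongside everything else. The sketch below is illustrative: the primary models and budgets are borrowed from this article's tables, while the standby pairings and the 8s ceiling for debugging are assumptions made for the example, and the `Route` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    primary: str
    standby: str
    p95_latency_s: float        # latency ceiling for the bucket
    needs_tool_calls: bool
    max_usd_per_request: float


# Standby choices and some limits are illustrative assumptions.
ROUTES = {
    "inline_completion": Route("Qwen3 Coder 30B 4-bit (local MLX)", "DeepSeek V4 Flash", 2.5, False, 0.01),
    "single_file_refactor": Route("DeepSeek V4 Flash", "Hy3 preview", 4.0, False, 0.01),
    "multi_file_composer": Route("Claude Sonnet 4.6", "Gemini 3 Flash Preview", 8.0, True, 0.40),
    "complex_debugging": Route("Claude Opus 4.7", "GPT-5.5", 8.0, True, 1.50),
}

REQUIRED_BUCKETS = set(ROUTES)


def validate(routes: dict[str, Route]) -> None:
    """Fail fast if a bucket is missing or a route cannot actually fail over."""
    missing = REQUIRED_BUCKETS - routes.keys()
    if missing:
        raise ValueError(f"missing buckets: {sorted(missing)}")
    for name, route in routes.items():
        if route.primary == route.standby:
            raise ValueError(f"{name}: standby must differ from primary")
        if route.max_usd_per_request <= 0 or route.p95_latency_s <= 0:
            raise ValueError(f"{name}: budget and latency must be positive")


def choose(bucket: str, primary_healthy: bool = True) -> str:
    """Return the model to use for a bucket, falling back to the standby."""
    route = ROUTES[bucket]
    return route.primary if primary_healthy else route.standby


validate(ROUTES)
```

Keeping the standby explicit per bucket is the design point: a fallback chosen in advance has already been checked against the same latency and budget constraints as the primary.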

Step 3 — Local MLX for coding models

For bucket A (completion + single file) run mlx_lm.server --model mlx-community/Qwen3-Coder-30B-Instruct-4bit --port 8081. In Cursor add a Custom OpenAI provider pointing at http://127.0.0.1:8081/v1. Run five canonical prompts and record TTFT, decode tok/s, and unified memory peak as the baseline.
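For the baseline, TTFT and decode rate can be derived from the arrival times of streamed chunks. The sketch below measures any iterable of text chunks, so it can wrap whatever streaming client you point at http://127.0.0.1:8081/v1. It assumes one chunk is roughly one token, which is an approximation, and it does not capture unified memory peak, which has to be read separately. The function name and the injectable `clock` are illustrative.

```python
import time
from typing import Callable, Iterable


def measure_stream(chunks: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> dict:
    """Consume streamed chunks and report time-to-first-token and decode rate.

    Start the timer immediately before the request is sent, i.e. call this
    with a lazy iterator that issues the request on first iteration.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in chunks:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no chunks")
    decode_time = last - first
    # Rate over the chunks that followed the first one.
    rate = (count - 1) / decode_time if count > 1 and decode_time > 0 else 0.0
    return {"ttft_s": first - start, "decode_tok_s": rate, "chunks": count}


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a real streaming response: 0.2s to first chunk, then 50 more.
        time.sleep(0.2)
        yield "def"
        for _ in range(50):
            time.sleep(0.01)
            yield " tok"

    print(measure_stream(fake_stream()))
```

Running the same five prompts through this before and after a model or quantization change gives a like-for-like comparison.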

Step 4 — Multi-route across Cursor / Cline / Continue / Zed

Configure primary + fallback + per-task routing in each IDE. Cursor: Settings → Models → add OpenRouter as Custom OpenAI. Cline: edit ~/.cline/config.json with provider: openrouter and a fallback array. Continue: in ~/.continue/config.json assign distinct models per role (autocomplete, chat, edit). Zed: set language_models in settings.json to OpenRouter.
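Whatever each IDE calls it, the primary-plus-fallback behaviour reduces to the same loop: try models in priority order and move on when a call fails. The sketch below shows that logic in isolation; `send` is a placeholder for your own client call, and none of this reflects how Cursor, Cline, Continue, or Zed implement fallback internally.

```python
from typing import Callable, Sequence


class AllRoutesFailed(RuntimeError):
    """Raised when every model in the chain errored."""


def complete_with_fallback(prompt: str,
                           models: Sequence[str],
                           send: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in order; return (model_used, completion_text)."""
    errors = []
    for model in models:
        try:
            return model, send(model, prompt)
        except Exception as exc:  # timeouts, rate limits, 5xx responses, ...
            errors.append(f"{model}: {exc}")
    raise AllRoutesFailed("; ".join(errors))


if __name__ == "__main__":
    chain = [
        "deepseek/deepseek-v4-flash",
        "google/gemini-3-flash-preview",
        "deepseek/deepseek-v3.2",
    ]

    def flaky_send(model: str, prompt: str) -> str:
        # Simulated client: the primary times out, the fallbacks answer.
        if model == "deepseek/deepseek-v4-flash":
            raise TimeoutError("simulated timeout")
        return f"[{model}] ok"

    print(complete_with_fallback("rename this variable", chain, flaky_send))
```

Collecting the per-model errors rather than discarding them makes the weekly review easier, because it shows which routes are failing over and why.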

Step 5 — Remote Mac takes over buckets C and E

For models that must run on Apple Silicon but exceed local memory (Qwen3 Coder 72B, Kimi K2.6 distill, larger DeepSeek distills), rent an M4 Max 128GB Mac. Run macMLX or mlx-batch-server on /v1. Connect through an SSH tunnel from the laptop IDE.
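Before pointing the IDE at the tunnelled endpoint, it helps to confirm that the forwarded port is actually listening. A minimal check, assuming the tunnel forwards to local port 8088 as in this article's SSH example (the function name is illustrative):

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if port_open("127.0.0.1", 8088):
        print("tunnel is up: use http://127.0.0.1:8088/v1 in the IDE")
    else:
        print("nothing listening on 8088: is the ssh -L tunnel running?")
```

This only proves the TCP hop; a follow-up request to the server's model listing is still needed to confirm the model behind it has loaded.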

Step 6 — Thirty-minute probe + weekly review

Every new model first passes a 30-minute mixed-prompt probe: error rate below 1%, p95 TTFT under 2.5s for completion or 8s for Composer, and per-request cost inside budget. Weekly, review OpenRouter cost / token / error dashboards and reorder route priorities.
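The probe thresholds translate into a small pass/fail gate. The sketch below uses a nearest-rank p95 and the limits stated above (error rate below 1%, p95 TTFT under 2.5s for completion or 8s for Composer); the function names and the shape of the inputs are assumptions.

```python
TTFT_LIMIT_S = {"completion": 2.5, "composer": 8.0}


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def probe_result(ttfts_s: list[float], errors: int, requests: int,
                 usd_per_request: float, budget_usd: float,
                 kind: str = "completion") -> dict:
    """Evaluate one 30-minute probe against the three acceptance criteria."""
    if requests <= 0:
        raise ValueError("probe made no requests")
    checks = {
        "error_rate_ok": errors / requests < 0.01,
        "p95_ttft_ok": p95(ttfts_s) < TTFT_LIMIT_S[kind],
        "cost_ok": usd_per_request <= budget_usd,
    }
    checks["accepted"] = all(checks.values())
    return checks


if __name__ == "__main__":
    samples = [0.4] * 95 + [3.1] * 5          # 5% slow tail
    print(probe_result(samples, errors=2, requests=400,
                       usd_per_request=0.008, budget_usd=0.01))
```

Returning the individual checks alongside the overall verdict shows at a glance whether a rejected model failed on latency, reliability, or cost.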

    # 1. Snapshot the coding chart
    curl -s "https://openrouter.ai/api/v1/models" \
      | jq '.data[] | select(.id|test("coder|code|deepseek-v4|hy3|opus|sonnet|gemini.*flash|kimi")) | {id, pricing, context_length}' \
      > /tmp/or-coding-$(date +%Y%m%d).json

    # 2. Local Qwen3 Coder via MLX on port 8081
    mlx_lm.server --model mlx-community/Qwen3-Coder-30B-Instruct-4bit \
      --host 127.0.0.1 --port 8081

    # 3. Cursor → OpenRouter (Settings → Models → Custom OpenAI)
    # Base URL: https://openrouter.ai/api/v1
    # Models:
    #   deepseek/deepseek-v4-flash      ← completion / single-file default
    #   tencent/hy3-preview             ← high-throughput cheap fallback
    #   anthropic/claude-sonnet-4.6     ← Composer multi-file
    #   anthropic/claude-opus-4.7       ← deep debugging / architecture
    #   google/gemini-3-flash-preview   ← Fallback

    # 4. Cline config snippet (~/.cline/config.json)
    {
      "providers": [{
        "id": "openrouter",
        "apiKey": "$OPENROUTER_KEY",
        "models": [
          {"id": "deepseek/deepseek-v4-flash", "role": "default"},
          {"id": "anthropic/claude-sonnet-4.6", "role": "composer"},
          {"id": "anthropic/claude-opus-4.7", "role": "deep-debug"}
        ],
        "fallback": ["google/gemini-3-flash-preview", "deepseek/deepseek-v3.2"]
      }]
    }

    # 5. Remote Mac SSH tunnel (map remote 8081 to local 8088)
    ssh -N -L 8088:127.0.0.1:8081 user@your-remote-mac.macgpu.com

6. Three-lane decision matrix: local / remote Mac / OpenRouter API

| Coding task | Suggested lane | Reference model | Typical $/task | Key acceptance |
|---|---|---|---|---|
| Inline completion | Local MLX (bucket A) | Qwen3 Coder 30B 4-bit | $0 (marginal) | TTFT < 200ms, first-token rate > 99% |
| Single-file refactor | OpenRouter (low D) | DeepSeek V4 Flash | $0.003–0.01 | p95 < 4s, diff consistency > 95% |
| Multi-file Composer | OpenRouter (mid D) | Claude Sonnet 4.6 | $0.10–0.40 | Multi-file patch pass-rate > 85% |
| Complex debugging / architecture | OpenRouter (high D) | Claude Opus 4.7 / GPT-5.5 | $0.40–1.50 | SWE-bench Verified self-test > 80% |
| Nightly batch refactor | Remote Mac (bucket C) | Qwen3 Coder 72B 4-bit / Kimi K2 distill | $0 (node monthly) | Batch success > 95%, 6h run no OOM |
| Agent long-chain / tool calls | OpenRouter (bucket E) | Kimi K2.6 | $0.05–0.20 | Tool-call first-try success > 90% |

7. Case study: an 8-person backend team cuts $3,200 to $980 by re-routing

"An 8-person Go and Python backend team ran Cursor with Claude Opus 4.7 as the global default. Month-start invoice landed at $3,200 and was tracking toward $5K. The Tech Lead rebuilt routing against the end-May Programming chart: inline completion to local Qwen3 Coder 30B 4-bit on an M3 Max ($0 marginal); single-file edits to DeepSeek V4 Flash on OpenRouter at $0.14/$0.28; Cursor Composer to Sonnet 4.6; only production bug fixes and cross-module architectural changes routed to Opus 4.7. A week later the monthly run-rate fell to $1,250. They added a MACGPU M4 Max 128GB Mac for nightly batch lint fixes and unit-test generation on Qwen3 Coder 72B 4-bit. Day 30: $980/month total, a 69% saving, with the internal SWE-bench regression set still at 82% pass@1."

The lesson is not "switch to a cheaper model." It is split routing across three axes: usage chart for default value, benchmark chart for the capability ceiling, and Mac local-fit for what gets brought back in-house. The Tech Lead wrote on the team wiki: "The Programming chart tells you who to use day to day. SWE-bench tells you who to call when production is on fire. Unified memory tells you who can move into your laptop." More important, the remote Mac is not a cost trick. It is the engineering pivot that lets you locally host open coding weights you cannot get on OpenRouter while keeping the laptop free for foreground work.

8. Industry insight: the Programming chart ends the single-default-model era

From late 2026 onward, the "one default model in Cursor" era is effectively over. Frontline teams build multi-route architectures aligned against both the OpenRouter Programming chart and SWE-bench Verified. The usage chart decides the day-to-day default, the benchmark chart decides the fire-drill backup, and the price sheet caps per-request spend on every route. Three structural facts underwrite this. First, capability among the leading coding models has converged into a 78%–89% SWE-bench Verified band, a spread of about 11 points that most daily tasks cannot feel. Second, 1M context is now the norm (8 of the 10 models in the snapshot above offer roughly 1M), so long-repo RAG is far less of an architectural concern. Third, every major IDE now ships role-based routing (autocomplete / chat / edit / agent) out of the box, so multi-route carries little configuration overhead.

Mac occupies a distinct slot in this architecture. Apple Silicon's unified memory, Metal stack, and round-the-clock stability turn a 30B-to-72B coding model into a viable local inference endpoint. macMLX, mlx-batch-server, and the Ollama MLX backend expose OpenAI-compatible APIs that any IDE can consume. NVIDIA still leads on raw 70B+ training, but when you need to run completion in Cursor during the day, batch lint fixes at night, render UI mockups in ComfyUI, and transcribe a requirements call with Whisper all at the same time, the unified memory model is the engineering pivot. If your laptop runs out of headroom and you do not want to send every completion request to the cloud, the cleanest path is to rent a remote Apple Silicon Mac. MACGPU rents hourly M3 and M4 Max nodes preloaded with macMLX and mlx-batch-server. Connect over SSH and the open coding models the chart promised but your laptop cannot host are suddenly local again.

9. Quotable numbers

1) DeepSeek V4 Flash weekly coding volume: ~4.02T tokens.
2) Hy3 preview weekly coding volume: ~3.48T tokens, new #2.
3) Claude Opus 4.7 SWE-bench Verified: 87.6%; GPT-5.5: 88.7%.
4) Qwen3 Coder 30B 4-bit on M3 Max 64GB at 32K context: peak unified memory ~24GB, decode ~38 tok/s.
5) DeepSeek V4 Flash pricing: $0.14 / $0.28 per million (input/output).
6) Case team monthly cost after routing: $3,200 → $980, 69% saving.

10. FAQ

Is the Programming chart very different from the overall chart?
Yes. The overall #1 MiMo-V2-Pro is not even on the Programming chart, while the Programming #1 is DeepSeek V4 Flash. Overall and Programming Top 10s overlap on fewer than half their slots.

Can I run DeepSeek V4 Flash locally?
No. The 284B/13B MoE needs roughly 150GB of memory even quantized. Substitute Coder V2 Lite or Qwen3 Coder 30B for local use.

Can Cursor Composer use V4 Flash?
For single-file edits, yes. Multi-file patch quality drops measurably versus Sonnet 4.6, so keep Sonnet 4.6 for Composer.

What coding models suit a remote Mac?
Qwen3 Coder 30B/72B, Kimi K2 distill, and DeepSeek Coder V2 variants: open weights too large for a laptop but comfortable inside 64–128GB unified memory at 4-bit.

What does MACGPU solve here?
Hosting the open coding models that exceed your laptop's memory, running nightly batches, and giving your IDE LAN-style latency over an hourly-rented Apple Silicon node.