2026 Google Gemini CLI: enterprise wall, кризис доверия к open source и survival guide для Mac

В июне 2025 Google выпустил Gemini CLI под Apache 2.0; за год — 100 000+ stars и 6 000+ merged PR. 19 мая 2026 на Google I/O: с 18 июня 2026 Free, Google AI Pro/Ultra и Individuals перестают обслуживаться через Gemini CLI — миграция в closed-source Antigravity (agy), free quota с ~1 000 до 20 вызовов/день (~98 %). На Mac это ~/.gemini/, Agent Skills и terminal pipeline упрётся в auth wall за дни. Вывод по throughput: open license ≠ open service; до cutoff — paid API Key, альтернативные CLI, Metal/MLX offload на remote Mac с измеримым tok/s и p95 latency.

1. Pain points: bait-and-switch в цифрах

(1) Community пишет код, vendor владеет runtime. Репо Apache 2.0, OAuth для физлиц умирает 18.06. Andrea Alberti (27-commit PR в день policy): «работаем бесплатно на код только для enterprise?» (2) «Техническая необходимость» в двух весах. Enterprise Standard/Enterprise держат Gemini CLI + Antigravity; individuals — только миграция. (3) Antigravity и throughput. Closed Go binary; Reddit: Pro лимит после 6–7 prompts — не тянет длинные agent loops. (4) Mac blast radius. Связка с полное руководство Cursor Agent Skills и OpenClaw → перепрошивка auth/base URL за две недели.

Для Metal/MLX: unified memory на M2 Pro 32 GB при Qwen3 32B 4-bit @ 32K уже даёт peak ~22 GB — добавьте Gemini CLI stream + ComfyUI и вы получите thermal throttle раньше, чем исчерпаете «1000 calls/day». Remote Mac M4 Max 128 GB с mlx_lm.server снимает decode с ноутбука; API Key оставляет open-source клиент живым после cutoff.

2. Timeline 2025–2026

Дата	Событие	Метрика
2025-06	Apache 2.0 launch	open contributions
2025-06 – 2026-05	рост	100k+ stars, 6k+ PR
2026-05-19	Google I/O	Antigravity; лимиты Gemini CLI
2026-05-23	backlash	Discussion #27274
2026-05-29	Linux Foundation	isitopen.ai
2026-06-18	cutoff	OAuth path stop

3. Кто теряет доступ

Сегмент	После 18.06	Mac escape
Google AI Free	❌ blocked	свой key / Antigravity
Pro / Ultra	❌ blocked	то же
Code Assist Individuals	❌ blocked	IDE paths
Code Assist GitHub personal	❌ then stop	enterprise exempt
Standard / Enterprise	✅ unchanged	Gemini CLI OK
Paid Gemini / Enterprise API Key	✅ Gemini CLI	primary escape

Christine Hall (FOSS Force): «Google не менял лицензию — отключил инфраструктуру.» Fork без model API = hollow shell.

4. Gemini CLI vs Antigravity — decision matrix

Измерение	Gemini CLI	Antigravity
License	Apache 2.0	Closed
Free quota/day	~1000	~20
Multi-model / ACP	зрелая экосистема	gaps
Project memory	Markdown context	early missing
Enterprise parallel	да	coexist
Agent loop throughput	выше на free tier	упирается в 20/day

5. Пятиступенчатый runbook Mac (до 18.06)

Шаг 1 — Auth inventory

gemini auth status, ~/.gemini/. OAuth умирает; API Key держит open client.

Шаг 2 — API Key + budget cap

AI Studio / Cloud paid key; логируйте $/1M tokens; сравните Claude Code и матрица OpenRouter Mac.

Шаг 3 — Backup configs

~/.gemini/, .agents/skills/, MCP — не удалять до валидации Antigravity import.

Шаг 4 — Parallel CLI

Claude Code, Codex, Cursor models; MLX/Ollama offline baseline — 5 канонических prompt, error rate <1%.

Шаг 5 — Infrastructure register

license, auth owner, quota owner, fork viability — isitopen.ai.

# Auth + smoke (macOS)
gemini auth status
ls -la ~/.gemini/
export GEMINI_API_KEY="***"
gemini -p "ping" --model gemini-2.5-pro

# MLX local baseline
mlx_lm.server --model mlx-community/Qwen3-Coder-30B-4bit \
  --host 127.0.0.1 --port 8081

6. Throughput / cost table

Path	Daily cap	$/M tokens	Metal/remote
OAuth (ends)	1000→0	subscription	—
Antigravity free	~20	$0	binary only
Gemini API Key	your cap	metered	client stays
OpenRouter	dashboard	multi-vendor	CI routing
MLX local	RAM-bound	power	Metal decode
Remote Mac 128GB	queue depth	rental	batch tok/s

7. Open source vs open service + Metal reality

Code on GitHub, brain on vendor API. Partners (Dynatrace, Elastic, Figma, Shopify, Stripe) тоже мигрируют. Hard numbers: 100k+ stars; 6k+ PRs; quota 1000→20; discussion comment 31 downvotes.

На Mac измеряйте: TTFT, decode tok/s, unified memory peak, thermal sustained watts. Bucket A (Qwen3 Coder 30B 4-bit local) — для diff preview; Bucket C (billion-class) — только API или remote 128 GB node.

8. Metal / MLX / remote routing matrix

Workload	Local MLX	API	Remote Mac
Offline diff preview	✅ 30B 4-bit	opt	✅ 128GB
Heavy refactor agent	⚠️ thermal	✅ key	✅ nightly batch
OpenClaw 7×24	❌ laptop	✅ caps	✅ launchd rack
ComfyUI + Agent parallel	⚠️ UM pressure	—	✅ split GPU queue

9. Case study: dual-stack week

«Indie dev, MacBook Pro, 3 SaaS repos: D1 Cloud API key cap $15/day; D2 heavy refactors → Claude Code; D3 CI gemini → OpenRouter; D4 remote Mac Ollama+MLX offline diff, measured decode ~45 tok/s 4-bit; D6 Antigravity only GCP scaffold. 17.06 OAuth dead — 0h terminal downtime, ~9h migration; dropped Antigravity D5 (20 calls/day ceiling).»

Official migration ≠ only path. Sustainable: API Key + multi-CLI + remote Metal throughput.

10. Acceptance, FAQ, KPIs

KPIs: ① 100k+ stars ② 6k+ PRs ③ 98% quota drop ④ ~9h migration ⑤ 20 Antigravity calls/day free.

11. Terminal OAuth → Mac compute node

Long agent sessions saturate unified memory before CLI branding. Windows/Linux mount API keys fine; Xcode, FCP, ComfyUI, launchd OpenClaw still favor macOS. Cursor on laptop; overnight regression + gateway on remote Mac — avoid «OAuth died + thermal throttle» same week.

Policy won't revert fast; dependency graph can. Community guards code; rented Apple Silicon guards 24/7 throughput. Stable remote Agent hosts: MACGPU nodes — light CLI local, heavy queue in rack.

12. Industry shift: stars ≠ sustainable throughput

Gemini CLI is a data point in a larger pattern: OSS repos as talent and marketing funnels, margin on closed runtime and enterprise contracts. For Mac teams, architecture reviews must separate four layers — source license, authentication, model hosting, quota policy. After 18.06., only layer 1 stays stable for individuals if you move layer 2 to paid API keys.

Compare to traditional devtools: formulas remain usable when SaaS ends. Here the client is swappable; billing identity is the lock-in. Measure resilience: can you switch auth owner within 24 h at same error rate and p95 latency? Mature target: <4 h documented failover drill before cutoff.

Partners (Dynatrace, Elastic, Figma, Shopify, Stripe) migrate in parallel — confirms multi-CLI / multi-provider is median architecture in 2026, not exception. Single-vendor OAuth-only paths are concentration risk entries in vendor registers.

13. Metal / MLX deep dive: measurable offload

On M2 Pro 32 GB, Qwen3 Coder 30B 4-bit @ 32K often peaks ~22 GB unified memory — add Gemini stream + ComfyUI and you throttle before quota matters. Split workloads: laptop runs Cursor orchestration; remote M4 Max 128 GB runs mlx_lm.server or macMLX with OpenAI-compatible /v1 over SSH tunnel.

Log every baseline: TTFT (p50/p95), decode tok/s, UM peak GB, sustained power. Acceptance gates for MLX path: pass@1 ≥90 % of current default on 30 canonical tasks; 24 h mixed load error rate <1 %; weekly cost within 110 % of prior API chain at comparable p95. Fail any gate → rollback routing documented in wiki.

API Key path keeps Apache client alive post-cutoff; MLX path reduces egress and variable $/M tokens for diff preview and lint-heavy loops. Use матрица OpenRouter Mac for second-hop provider failover when Gemini rate-limits — primary → fallback chain in Agent config, same pattern as OpenClaw 429 runbooks.

14. Extended FAQ & SRE checklist

Success metrics post-migration? (G1) CLI error rate <1 % / 24 h mixed load; (G2) weekly token spend ±10 % budget; (G3) auth failover drill <4 h. Install Antigravity? Sandbox only — 20 calls/day kills agent loop throughput. GitHub Actions? Rotate to GEMINI_API_KEY, remove OAuth secrets, dry-run before 18.06.

launchd / OpenClaw: night gateways on remote Mac with stable thermals, not sleeping MacBook. MLX vs API: MLX for offline diff and high-frequency short tasks; API for heavy reasoning. Report to leads: median tokens/successful agent run; % runs hitting daily cap (target 0); active CLI paths (target ≥2); days since last failover drill (target <30).

Controversy won't revert quickly; your dependency graph can. Community guards code; rented Apple Silicon guards 24/7 tok/s. MACGPU nodes for rack queue — laptop stays light CLI + Skills.

15. Antigravity traps & 17.06. OAuth expiry (field notes)

Three recurring failures in Discussion #27274: (A) delete ~/.gemini/ before validated import — lose Skills/MCP IDs. (B) Antigravity free for CI — 20 calls/day cannot finish one nightly agent retry loop. (C) Pro/Ultra users expect unlimited OAuth — consumer path ends; Cloud billing with API key continues.

Risk	Early signal	Countermeasure
Auth wall 18.06.	`401` after UTC midnight	Test API key by 17.06. evening
Quota shock	<10 useful prompts on agy	Primary = key + OpenRouter
Memory gap	No Markdown project memory	Archive context in Skills
UM throttle	Fan spike + decode drop	MLX on remote 128 GB node

Google sunset history (Reader,+, Stadia) is a prior for risk scoring, not a prediction. Combine isitopen.ai with internal vendor score: contract term, lock-in depth, failover path count. With полное руководство Cursor Agent Skills, decouple endpoints from secrets — agentskills.io stays vendor-neutral. Treat 18.06. as infra incident: change freeze window, failover drill one week prior, OAuth read-only eve, post-mortem 19.06. on token $, error %, MLX tok/s — then Antigravity as secondary only.

SRE note: log gemini auth status output daily until cutoff; alert on OAuth expiry T-7/T-1; keep mlx_lm.server health on remote node separate from cloud API monitors. Throughput target after migration: maintain ≥95 % of pre-cutoff successful agent runs/week with ≤110 % cost — if not, re-open routing matrix.

16. Primary sources & pre-cutoff validation matrix

Traceability: Google I/O 2026 policy post, GitHub Discussion #27274, Linux Foundation isitopen.ai (29 May 2026 summit), FOSS Force (Christine Hall infrastructure quote). Internal wiki links: полное руководство Cursor Agent Skills, матрица OpenRouter Mac, MACGPU remote node docs for 7×24 gateway load.

Canonical prompt	Pass criteria	Log fields
Lint fix	clean build	tokens, p95 TTFT
Multi-file refactor	tests green	tok/s decode, UM GB
Test generation	coverage +2 %	error code
Docs sync	diff acceptable	$/run
Security scan	no false block	provider id

Only CLI paths with <1 % error rate over 24 h mixed load graduate to «production» before 18.06.; others stay «experimental». Run the matrix on API key path, Claude Code, OpenRouter hop, and MLX local — four rows minimum in your routing spreadsheet. Post-cutoff: weekly review of tok/s on remote node vs $/M on cloud; rebalance when thermal logs show sustained >85 °C on laptop during parallel ComfyUI + Agent.

Small teams (≤5): paid Gemini key + daily cap, OpenRouter fallback, local MLX smoke, documented remote Mac for nightly — four wiki lines, monthly review. Larger teams add vendor scorecard (license, auth, quota, fork, subprocessors) and calendarize 18.06. drill like any infra change. Metal lesson: unified memory is the real quota — plan UM before cloud quota; offload batches to 128 GB remote when M2/M3 peaks exceed 80 % UM during agent+ComfyUI overlap.

17. Benchmark discipline after cutoff

Every Sunday until Q3 2026: rerun the five canonical prompts on API key path, record p50/p95 TTFT, decode tok/s, UM peak, and $/run. Compare against pre-cutoff baseline stored in git-tracked CSV. If cloud p95 TTFT regresses >15 % while MLX remote holds steady, shift batch queues to SSH-tunneled /v1 on MACGPU node. If API costs exceed 110 % budget two weeks running, promote OpenRouter fallback in openclaw.json and demote Antigravity to sandbox-only. Throughput engineering beats policy outrage — measure first, tweet later.

Target after migration: ≥95 % successful agent runs per week at ≤110 % pre-cutoff cost. Miss the target → reopen the Metal/MLX vs API matrix, re-run SSH tunnel health checks, and re-benchmark mlx_lm.server on the remote 128 GB node before the next sprint. Policy threads on GitHub will not restore OAuth; your routing spreadsheet will. Log fan RPM and decode tok/s alongside token bills — thermal throttling is the silent quota on Apple Silicon.