2026 Hermes: Skills, GEPA, Evolve
In early 2026, Nous Research released Hermes Agent. Within two months it passed 160k GitHub stars, making it one of the fastest-growing open-source agent projects. Its core pitch is not a bigger model but "the agent that grows with you": an agent that understands you better over time. The foundation is the Skills system, a standardized, evolvable, cross-session procedural memory.
This guide skips the basics and goes straight to advanced topics: Skills vs Memory vs Prompt, the SKILL.md format and its three-tier progressive disclosure, Skill Bundles, conditional activation, GEPA + DSPy self-evolution with its five-stage pipeline, Tap publishing, and the open ecosystem. It also covers authoring tips, a blog workflow case study, an FAQ, and resource links. Bottom line: write "how to do it" as portable SKILL.md files, load whole workflows with one Bundle command, and let GEPA improve skills over time without touching model weights.
1. Why Skills Deserve Deep Study
1) Prompts are one-shot: repeating an 800-word deployment runbook every session wastes tokens and drops steps.
2) Memory is facts, not process: knowing you prefer TypeScript does not teach the agent how to open a PR per team policy.
3) Token costs spiral: stuffing every runbook into the system prompt can consume tens of thousands of tokens before any task starts, whereas Level 0 loading (section 3.2) needs about 3K for all skills combined.
4) No cross-platform reuse: each agent stack invents its own config, so teams cannot share.
Hermes Skills follow the open agentskills.io standard and port across Hermes, Claude Code, Cursor, and OpenCode, which makes Skills the agent infrastructure bet worth taking in 2026.
2. Core Concepts: Skills ≠ Prompts, Skills ≠ Memory
| Dimension | Plain Prompt | Memory | Skills |
|---|---|---|---|
| Persistence | Current conversation | Cross-session, permanent | Cross-session, permanent |
| Load timing | Always in context | Auto-injected each session | On demand (key difference) |
| Token cost | Every turn | Small and stable | Zero until activated |
| Content type | Any intent description | User preferences / facts | Procedural steps (how to do something) |
| Maintenance | Manual by user | Automatic by agent | User + agent |
| Shareability | Awkward | Private | Publishable as community Tap |
Mnemonic: Prompt = sticky note (valid this turn); Memory = notebook (permanent notes, always nearby); Skill = SOP manual (step-by-step process, opened when needed).
3. SKILL.md Format Deep Dive (agentskills.io Open Standard)
All Hermes Skills follow the agentskills.io spec, which is what makes them portable across agents.
3.1 Skill Directory Structure (Modular Design)
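As a rough illustration of the layout this guide refers to (a SKILL.md file plus references/ and scripts/ folders), the sketch below scaffolds a skill directory. The `scaffold_skill` helper and the template are hypothetical, and the front matter layout with `name` and `description` is an assumption about the spec rather than a copy of it. The Procedure and Pitfalls section names are the ones this guide mentions later.

```python
from pathlib import Path
import tempfile

# Assumed template: '---' delimited front matter with name and description.
SKILL_TEMPLATE = """---
name: {name}
description: {description}
---
## Procedure

## Pitfalls
"""


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create <root>/<name>/ with SKILL.md, references/ and scripts/."""
    skill_dir = root / name
    (skill_dir / "references").mkdir(parents=True, exist_ok=True)
    (skill_dir / "scripts").mkdir(exist_ok=True)
    (skill_dir / "SKILL.md").write_text(
        SKILL_TEMPLATE.format(name=name, description=description),
        encoding="utf-8",
    )
    return skill_dir


# Demo in a throwaway directory so nothing under ~/.hermes is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_skill(Path(tmp), "my-skill", "Use when drafting release notes.")
    for path in sorted(created.rglob("*")):
        print(path.relative_to(created))
```

Keeping long reference material and scripts in their own folders is what lets the loader defer them until they are needed.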
3.2 Progressive Disclosure: Three Loading Levels
| Level | Content | Trigger | Token cost |
|---|---|---|---|
| Level 0 | name + description | Session start, all skills | ~3K (all skills combined) |
| Level 1 | Full SKILL.md body | User /skill-name or LLM decides needed | Depends on file length |
| Level 2 | references/ scripts/ files | LLM decides during execution | On demand, per file |
Writing tip: the description field is what the LLM reads at Level 0 to decide whether to load the full skill. Spend more words on when to use the skill than on what it is. Validate with skills-ref validate ./my-skill.
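The three loading levels can be pictured with a small parser. This is a minimal sketch, not Hermes's loader: it assumes SKILL.md starts with a '---' delimited front matter block containing flat `name` and `description` lines, and the `SkillFile`, `parse_skill_md`, `level0_entry`, and `level1_text` names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SkillFile:
    name: str
    description: str
    body: str


def parse_skill_md(text: str) -> SkillFile:
    """Split SKILL.md text into front matter fields and the body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing front matter")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated front matter")
    meta = {}
    for line in lines[1:end]:
        # Only flat top-level 'key: value' lines are handled in this sketch.
        if ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return SkillFile(meta.get("name", ""), meta.get("description", ""), body)


def level0_entry(skill: SkillFile) -> str:
    """Level 0: the short line that is present at session start."""
    return f"{skill.name}: {skill.description}"


def level1_text(skill: SkillFile) -> str:
    """Level 1: the full body, loaded only when the skill is activated."""
    return skill.body


EXAMPLE = """---
name: code-review
description: Use when reviewing a pull request. Do NOT use for writing new code.
---
## Procedure
1. Fetch the diff.
2. Check for security issues.
"""

skill = parse_skill_md(EXAMPLE)
print(level0_entry(skill))
```

Because only the Level 0 line is always in context, the body can be as detailed as needed without raising the per-session baseline.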
4. Skill Bundles: One Command, Full Workflow
Skill Bundles are a major Hermes 2026 feature. A Bundle is a lightweight YAML file that packs related skills into a single slash command. Running /bundle-name loads every listed skill at once—no per-skill triggers. Location: ~/.hermes/skill-bundles/<slug>.yaml
Advanced examples: An AI researcher workflow can bundle arxiv + deep-research + plan + excalidraw; an MLOps deploy pipeline can bundle vllm + llama-cpp + github-pr-workflow + systematic-debugging.
Bundle priority rules: when a Bundle and a single Skill share a name, the Bundle wins; Skills listed in a Bundle but not installed are skipped without an error, and a missing-skill notice is shown; Bundles do not modify the system prompt, so the prompt cache stays valid (token-friendly). For quick creation from the CLI, use hermes bundles create.
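The priority rules above can be sketched in a few lines. This is an illustrative model only: `resolve_bundle` and `resolve_command` are hypothetical names, and the skill names come from the researcher example earlier in this section.

```python
def resolve_bundle(bundle_skills, installed):
    """Return (loaded, missing), preserving the order listed in the bundle."""
    loaded = [s for s in bundle_skills if s in installed]
    missing = [s for s in bundle_skills if s not in installed]
    return loaded, missing


def resolve_command(name, bundles, skills):
    """A bundle wins over a single skill that shares its name."""
    if name in bundles:
        return "bundle"
    if name in skills:
        return "skill"
    return None


researcher = ["arxiv", "deep-research", "plan", "excalidraw"]
loaded, missing = resolve_bundle(researcher, installed={"arxiv", "plan"})
print("loaded:", loaded)
if missing:
    # Missing skills produce a notice, not an error.
    print("notice: skipped missing skills:", missing)
```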
5. Conditional Activation: Environment-Aware Skills
Skills can auto-show or hide based on tool availability in the current session. Configure under metadata.hermes in SKILL.md:
| Field | Behavior |
|---|---|
| requires_toolsets | Hide skill when listed toolsets are missing |
| requires_tools | Hide skill when listed tools are missing |
| fallback_for_toolsets | Hide skill when listed toolsets exist (fallback path) |
| fallback_for_tools | Hide skill when listed tools exist (fallback path) |
Classic scenario—free vs paid tool switching: When FIRECRAWL_KEY / BRAVE_SEARCH_KEY are set, paid web_search activates and the DuckDuckGo skill (fallback_for_tools: [web_search]) drops from the prompt, saving tokens. When the API is unavailable, the fallback reappears. Via the hermes skills TUI, you can also toggle individual skills per platform (CLI, Telegram, Discord).
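The four fields can be read as a visibility predicate. The sketch below is an interpretation rather than Hermes's actual logic: it assumes a skill is hidden if any required item is missing, or if any item it is a fallback for is available. The `Activation` class and `is_visible` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Activation:
    requires_toolsets: list = field(default_factory=list)
    requires_tools: list = field(default_factory=list)
    fallback_for_toolsets: list = field(default_factory=list)
    fallback_for_tools: list = field(default_factory=list)


def is_visible(rule, tools, toolsets):
    """Decide whether a skill is shown, given the session's tools and toolsets."""
    # requires_*: hide when any listed item is missing (assumed "any" semantics).
    if any(t not in toolsets for t in rule.requires_toolsets):
        return False
    if any(t not in tools for t in rule.requires_tools):
        return False
    # fallback_for_*: hide when any listed item is available.
    if any(t in toolsets for t in rule.fallback_for_toolsets):
        return False
    if any(t in tools for t in rule.fallback_for_tools):
        return False
    return True


duckduckgo = Activation(fallback_for_tools=["web_search"])
print(is_visible(duckduckgo, tools={"web_search"}, toolsets=set()))  # paid search present
print(is_visible(duckduckgo, tools=set(), toolsets=set()))           # paid search absent
```

The first call models the session where paid web_search is configured, so the fallback skill is hidden; the second models the session without it, where the fallback is shown.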
6. Skills Hub and Open Community Ecosystem
| Repository | Description | Highlights |
|---|---|---|
| awesome-hermes-skills | Curated production-grade skills | Deep Research, MLOps, Apple integration; 23 skills with GitHub Copilot |
| hermeshub | Community skill registry | Security scanning, API/marketplace, prompt-injection detection |
| ai-agent-skills | 191 skills, 28 categories | One-click install for Hermes / Claude Code / Cursor |
| hermes-agent | Official main repo | Authoritative source with skill authoring specs |
7. Publish Your Own Skill Tap: Team and Community Sharing
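Team deployment flow: create a Tap repo for your skills, then have each team member add it with hermes skills tap add github:your-org/your-skills-tap.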
Version control tip: put ~/.hermes/skills/ under Git; across devices run git pull && hermes skills reset to sync and rebuild built-in skills.
8. Self-Evolving Skills: GEPA + DSPy Automatic Improvement
GEPA (Genetic-Pareto Prompt Evolution) is an ICLR 2026 Oral result integrated into hermes-agent-self-evolution. The core idea is to skip model fine-tuning entirely: analyze execution traces, generate variants, and apply multi-objective Pareto optimization to improve the skill text itself. A run costs roughly $2–10 (API calls only, no GPU).
GEPA five-stage evolution pipeline:
Stage 1: Execution trace collection (SQLite DB, full reasoning traces).
Stage 2: Reflective failure analysis (the LLM generates actionable "why it failed" side information).
Stage 3: Targeted mutation (10–20 SKILL.md variants per failure mode).
Stage 4: Multi-objective Pareto evaluation (optimize success rate × token efficiency × speed).
Stage 5: Human PR review (the best variant opens a PR; ship after approval).
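Stage 4 is the least intuitive step, so here is a minimal sketch of plain Pareto dominance over the three objectives named above. It is not GEPA's selection code, and the `Variant` type and all numbers are invented for illustration. A variant stays on the front if no other variant is at least as good on every objective and strictly better on one.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    success_rate: float  # higher is better
    tokens: int          # lower is better
    seconds: float       # lower is better


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = (a.success_rate >= b.success_rate
                and a.tokens <= b.tokens
                and a.seconds <= b.seconds)
    better = (a.success_rate > b.success_rate
              or a.tokens < b.tokens
              or a.seconds < b.seconds)
    return no_worse and better


def pareto_front(variants):
    """Keep every variant that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


# Made-up scores for a baseline skill and three mutated variants.
VARIANTS = [
    Variant("baseline", 0.70, 4000, 30.0),
    Variant("v1", 0.90, 3300, 28.0),
    Variant("v2", 0.85, 2500, 25.0),
    Variant("v3", 0.80, 3500, 35.0),
]

for v in pareto_front(VARIANTS):
    print(v.name)
```

Here v1 and v2 both survive because neither beats the other on every axis: v1 succeeds more often, v2 is cheaper and faster. Choosing between such trade-offs is what the human review in Stage 5 is for.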
Four safety guardrails: ① Full test suite: pytest tests/ -q must pass 100%; ② Size limits: skills ≤ 15KB, tool descriptions ≤ 500 chars; ③ Prompt cache compatibility; ④ Semantic preservation check (the variant must not drift from the original skill's purpose).
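Of the four guardrails, the size limits are the easiest to check mechanically. The sketch below shows one way to do it. The `check_size_guardrails` helper is hypothetical, and it assumes 1KB means 1024 bytes of UTF-8 text, which the source does not specify.

```python
MAX_SKILL_BYTES = 15 * 1024        # assumption: 1KB = 1024 bytes, UTF-8 encoded
MAX_TOOL_DESCRIPTION_CHARS = 500


def check_size_guardrails(skill_text, tool_descriptions):
    """Return a list of human-readable violations; empty means both limits hold."""
    problems = []
    size = len(skill_text.encode("utf-8"))
    if size > MAX_SKILL_BYTES:
        problems.append(f"skill is {size} bytes, limit is {MAX_SKILL_BYTES}")
    for tool, description in tool_descriptions.items():
        if len(description) > MAX_TOOL_DESCRIPTION_CHARS:
            problems.append(
                f"{tool}: description is {len(description)} chars, "
                f"limit is {MAX_TOOL_DESCRIPTION_CHARS}"
            )
    return problems


print(check_size_guardrails("## Procedure\n1. Run the tests.\n", {"web_search": "Search the web."}))
```

Measuring bytes rather than characters matters for skills written in Chinese or other non-ASCII text, where one character can take several bytes.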
| Phase | Optimization target | Engine | Status |
|---|---|---|---|
| Phase 1 | Skill files (SKILL.md) | DSPy + GEPA | Implemented |
| Phase 2 | Tool descriptions | DSPy + GEPA | Planned |
| Phase 3 | System prompt fragments | DSPy + GEPA | Planned |
| Phase 4 | Tool implementation code | Darwinian Evolver | Planned |
| Phase 5 | Continuous improvement loop (fully automated) | Automation pipeline | Planned |
9. Plugin Skills: Extending Hermes Boundaries
Plugins pack skills into namespaces (plugin:skill). Namespaced skills do not appear in the default skills_list (less system-prompt noise), activate only on an explicit user call (opt-in), and can cross-reference each other within the plugin. Loading one also shows its sibling skills from the same plugin.
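The namespace behavior can be modeled with simple string handling. This is a sketch under the assumption that a qualified name is exactly plugin:skill with a single colon; the function names and the blog plugin are invented for illustration.

```python
def split_qualified(name):
    """Split 'plugin:skill' into (plugin, skill); plain names get plugin None."""
    plugin, sep, skill = name.partition(":")
    return (plugin, skill) if sep else (None, name)


def default_skills_list(names):
    """Namespaced plugin skills stay out of the default listing."""
    return [n for n in names if split_qualified(n)[0] is None]


def siblings(name, names):
    """Other skills from the same plugin as the one being loaded."""
    plugin, _ = split_qualified(name)
    if plugin is None:
        return []
    return [n for n in names if n != name and split_qualified(n)[0] == plugin]


installed = ["arxiv", "blog:outline", "blog:seo", "plan"]
print(default_skills_list(installed))
print(siblings("blog:outline", installed))
```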
10. Advanced Skill Authoring Tips (Engineer's View)
10.1 description drives activation precision: Bad: "Helps with code." Good: "Use when reviewing a pull request, checking for code quality issues, security vulnerabilities... Do NOT use for writing new code."
10.2 Pitfalls separate good from great: Include concrete failure modes, root-cause analysis, and actionable fixes (fragile CSS selectors, GitHub API rate limits, large diff token overflow, etc.).
10.3 Script when possible: in the Procedure section, tell the agent to run scripts/extract_schema.py --input $FILE and, if the script fails, to load references/manual-extract.md instead.
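The script-first, reference-on-failure pattern looks roughly like this when written out. It is a sketch of the control flow only, not how Hermes executes skills. `run_with_fallback` is a hypothetical helper, and data.json is a placeholder input file.

```python
import subprocess
import sys


def run_with_fallback(command, fallback_doc, timeout=60):
    """Run a script; on any failure, report which reference file to load instead."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # The script could not start or did not finish in time.
        return {"ok": False, "load": fallback_doc}
    if result.returncode != 0:
        return {"ok": False, "load": fallback_doc, "stderr": result.stderr}
    return {"ok": True, "output": result.stdout}


outcome = run_with_fallback(
    [sys.executable, "scripts/extract_schema.py", "--input", "data.json"],
    fallback_doc="references/manual-extract.md",
)
if outcome["ok"]:
    print(outcome["output"])
else:
    print("script failed, load:", outcome["load"])
```

The script does the deterministic work at near-zero token cost, and the longer manual instructions are only read when the script cannot do the job.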
10.4 Size control: under 500 lines, keep everything in SKILL.md; at 500–1000 lines, move detail to references/; over 1000 lines, strongly consider splitting the skill; over 15KB, the skill exceeds the GEPA limit and must be split.
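These thresholds are easy to turn into a pre-commit style check. The sketch below is a hypothetical helper, not part of Hermes. It assumes 15KB means 15 × 1024 bytes of UTF-8 text and checks that hard limit first, since it overrides the line-count advice.

```python
GEPA_LIMIT_BYTES = 15 * 1024  # assumption: 1KB = 1024 bytes


def size_advice(skill_text):
    """Map a SKILL.md body to the size-control tier described above."""
    if len(skill_text.encode("utf-8")) > GEPA_LIMIT_BYTES:
        return "must split: over the 15KB GEPA limit"
    line_count = len(skill_text.splitlines())
    if line_count < 500:
        return "keep everything in SKILL.md"
    if line_count <= 1000:
        return "move detail to references/"
    return "strongly consider splitting"


print(size_advice("step\n" * 120))
```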
10.5 skill_manage lets the agent maintain its own skills.
11. Case Study: Tech Blog Workflow Skills
Custom seo-keyword-research skill: at blog session start, search Chinese long-tail ("X 怎么用", "X 教程") and English long-tail ("X tutorial", "how to X", "X vs Y"), cross-reference Juejin hot lists, Dev.to trending, and HN; output 3–5 primary keywords plus a 10–15 term long-tail matrix. Note that Chinese and English audiences search the same concept differently (e.g. "Agent" vs "智能体" vs "代理").
12. Five-Step Implementation Checklist
Step 1: Install Hermes Agent and browse the official Skills, for example hermes skills install official/research/arxiv.
Step 2: Create your first SKILL.md in ~/.hermes/skills/ with a description that clearly states when to trigger it.
Step 3: Build a Bundle YAML for each common workflow; use hermes bundles create for quick setup.
Step 4: Configure conditional activation (free/paid tool fallback) to cut token noise.
Step 5: Share with the team: create a Tap repo, then run hermes skills tap add github:your-org/your-skills-tap. Advanced users can clone hermes-agent-self-evolution and run GEPA optimization.
13. FAQ and Citable Numbers
Q: How do Skills differ from MCP? Skills are procedural knowledge documents (teaching the agent how to act); MCP is a tool interface (giving the agent extra tool calls). They complement each other.
Q: Why does the agent use an old Skill after I edited it? Changes do not apply mid-session; run /reset for a new session, or install with --now (invalidates prompt cache).
Q: Are GEPA-evolved skills safe? Four guardrails plus human PR review; semantic drift detection keeps the original purpose intact.
Q: How do I reuse Hermes Skills in Claude Code? Copy SKILL.md to ~/.claude/skills/, or use ai-agent-skills for multi-platform install.
Q: Does Chinese content hurt token efficiency? Chinese chars run ~1–1.5 tokens each, similar to English; keep description in English for sharper LLM matching.
Citable numbers: ① Hermes Agent 160k+ GitHub stars (early 2026, within two months). ② Level 0 for all skills combined ~3K tokens. ③ GEPA single-run cost $2–10 (no GPU). ④ GEPA Skills size limit ≤15KB. ⑤ ai-agent-skills repo 191 cross-platform skills.
14. Further Reading and Resources
Official: Hermes Agent docs · Chinese docs · Skills system · agentskills.io
Open source: hermes-agent-self-evolution · gepa-ai/gepa · stanfordnlp/dspy
Community: SegmentFault Chinese practical guides · Dev.to Self-Improving Agent deep dives · YouTube GEPA & Skill Bundles tutorials
15. Deep Case: Hermes Skills + Remote Mac 24/7 Evolution Loop
"A tech media team wrapped blog writing into a blog-workflow Bundle. Local Hermes handles SEO research and outline generation (Level 0 costs only ~3K tokens), while GEPA optimizes the outline-generator Pitfalls weekly from real session traces: success rate rose from 72% to 91%, and average token use dropped 18%. Heavy scripts (code validation, multilingual publish) run on a remote Mac node over SSH so local unified memory is not consumed by overnight queues. A Tap repo lets the 8-person team sync in one step with hermes skills tap add; private skills are accessed with a GitHub token."
This complements our OpenRouter Hermes usage guide and Cursor Agent Skills guide: Cursor Skills handle IDE on-demand loading; Hermes Skills + GEPA close the "gets better with use" loop. Windows and Linux run Hermes CLI fine, but macOS still fits best for parallel Xcode/FCP/ComfyUI work, launchd-resident Gateway, and Metal sidecar inference. When GEPA evolution or Bundle heavy scripts need long test runs or batch renders, laptop unified memory fills fast—Skills define how; a remote Mac defines where.
If you already organize workflows with Hermes Skills and need stable, rentable Apple Silicon for GEPA evaluation, scripts, and 24/7 agents, consider a MACGPU remote Mac node. Run evolution evals and batch jobs on dedicated hardware, and keep Hermes orchestration and Skill authoring on your laptop: unified memory for thinking, remote compute for overnight queues.