Hermes Agent — Context Engineering & On-Disk Storage

Date: 2026-06-19 Subject: Nous Research's Hermes Agent (github.com/NousResearch/hermes-agent), analyzed for how it engineers the agent's context ("context engineering") and how/where it persists data on disk. Why: Hermes is a long-lived, self-improving agent runtime with a markdown-and-files memory model that is a close structural cousin to Yggdrasil — a useful reference as we rethink Yggdrasil's structure and current-plan.md.

Source fidelity: Built from a shallow git clone of the repo (174 MB) at /tmp/research/hermes-agent, read via three parallel deep-read passes over the website/docs/ developer + user guides and the core Python (hermes_constants.py, hermes_state.py, run_agent.py, agent/*.py). Fine-grained claims (exact SQL DDL, the fuller ~/.hermes file inventory) were drawn from code/doc reads and then run through a fresh-eyes gap-check pass against the clone — see the Fidelity & corrections section at the end for what was verified, softened, or struck.

One-line orientation. Hermes is "the agent that grows with you": a persistent runtime (CLI + messaging gateways) whose state lives under a single ~/.hermes/ home, whose context each turn is assembled from a layered stable → context → volatile prompt, and which writes its own skills from experience via a background self-improvement fork, then prunes them with a curator.


1. What Hermes is (the shape, briefly)

  • A long-lived runtime, not a one-shot CLI. It runs as a persistent process with a gateway that fans out across messaging platforms (Telegram, Discord, Slack, email, …), plus a local CLI/TUI. It remembers across sessions and "gets more capable the longer it runs."
  • Released Feb 2026; past our Jan 2026 knowledge cutoff — hence researched from source rather than recalled. The cloned source is version 0.16.0 (pyproject.toml). (Release-date/version/star-count trivia from web search — e.g. a "v0.9 everywhere release" and a ~27k-star count — is external recall I could not verify against the clone, and the v0.9 figure is contradicted by the 0.16.0 source; treat that trivia as unconfirmed.)
  • Three pillars relevant to us: (1) a layered context-assembly pipeline; (2) a single on-disk home with a SQLite session spine + markdown memory/skills; (3) a self-improvement learning loop that authors and maintains skills autonomously.

2. Context engineering — how Hermes assembles the agent's context

Hermes builds the system prompt each session as three tiers joined with \n\n, ordered stable → context → volatile. The ordering is deliberate and cache-aware: the stable tier is a long-lived cacheable prefix; the volatile tier is what gets rebuilt on compression. (Assembly entry point: build_system_prompt_parts() in agent/system_prompt.py, returning a dict keyed stable / context / volatile; called from AIAgent._build_system_prompt() in run_agent.py.)

2.1 Tier 1 — STABLE (identity + behavioral guidance; cached, rarely mutated)

In assembly order:

  1. Agent identity~/.hermes/SOUL.md (loaded via load_soul_md() in agent/prompt_builder.py), falling back to a hardcoded DEFAULT_AGENT_IDENTITY if absent. This is slot #1 of the prompt.
  2. Tool-aware behavior guidance — a set of named constant blocks: memory guidance, session-search guidance, parallel-tool-call guidance, skills guidance, a Hermes-help pointer.
  3. Task-completion guidance — anti-fabrication / use-the-tools directives (config-gated).
  4. Model-specific operational guidance — different nudges for Gemini vs. GPT/Codex vs. Grok (e.g., "execute, don't describe" for models prone to narrating tool use).
  5. Skills index — a structured list of available skills (name + category + short description), built by build_skills_system_prompt(). This is the menu; full skill bodies load on demand (see §3.4 progressive disclosure).
  6. Environment hints — WSL/Termux detection, Python toolchain probe, etc.
  7. Coding posture + git-status snapshot — cached once per session (agent/coding_context.py).
  8. Active-profile hint and platform hint (per-channel style: e.g. CLI "avoid Markdown, use simple text", Telegram "lean into rich Markdown", etc., overridable in config).

2.2 Tier 2 — CONTEXT (project instructions; loaded at startup, held for the session)

  • One project context file, first match wins (priority order): .hermes.md / HERMES.md (walks up to git root) → AGENTS.md (cwd) → CLAUDE.md (cwd) → .cursorrules / .cursor/rules/*.mdc (cwd). Injected under a # Project Context header. Hermes natively reads AGENTS.md and CLAUDE.md — the same convention files we use.
  • Progressive subdirectory discovery. Only the root context file enters the cached system prompt at start. As the agent navigates into subdirectories, a SubdirectoryHintTracker (explicitly modeled on Block/goose's tracker of the same name) discovers nested convention files (here it checks AGENTS.md/CLAUDE.md/.cursorrules — note: not .hermes.md — and loads all it finds in a dir, once per directory) and injects that content into the tool result, not the system prompt — specifically to avoid breaking the prompt cache. (This is a clever pattern worth noting for us: lazy, navigation-triggered context loading that keeps the cacheable prefix stable.)
  • All context files are security-scanned for prompt injection and truncated (~20K chars, head/tail with a marker) before inclusion.

2.3 Tier 3 — VOLATILE (memory snapshots + timestamp; rebuilt on compression)

  1. Memory snapshot~/.hermes/memories/MEMORY.md, a frozen snapshot injected under ## Persistent Memory. Frozen = loaded once at session start; mid-session writes hit disk immediately but don't re-enter the live prompt until the next session, which preserves the prompt cache. The prompt shows a usage gauge (e.g., MEMORY [67% — 1,474/2,200 chars]).
  2. User-profile snapshot~/.hermes/memories/USER.md, injected under ## User Profile.
  3. External memory-provider block — if a provider plugin (e.g., Honcho) is active.
  4. Timestamp + session metadata (current time w/ tz, session id, model/provider).

2.4 Identity: SOUL.md vs. personality vs. AGENTS.md

A clean separation worth stealing conceptually:

Layer File / mechanism Scope Holds
Identity ~/.hermes/SOUL.md Global to this Hermes instance Tone, voice, directness, how to handle ambiguity — who it is
Personality /personality slash command Session-temporary overlay Built-in presets (concise, technical, teacher, pirate…)
Project context AGENTS.md / CLAUDE.md Per-project Conventions, paths, architecture — how this repo works

Rule of thumb in their docs: applies everywhere → SOUL.md; belongs to one project → AGENTS.md. This is directly analogous to our personal layer (CLAUDE.md) vs. project .claude/ split — Hermes draws the same line, with SOUL.md ≈ the personal identity layer and AGENTS.md ≈ the project layer.

2.5 Keeping a long-lived agent inside the context window — compression & caching

Two independent compression layers:

  • Agent-loop ContextCompressor (primary; agent/context_compressor.py) fires at a configurable threshold (default 0.50 of context) using accurate API-reported token counts. It is an implementation of a pluggable ContextEngine ABC (agent/context_engine.py) — only one engine is active at a time; alternatives can be dropped in under plugins/context_engine/<name>/.
  • Gateway session hygiene (safety net; gateway/run.py) fires at a fixed 85% before the agent processes an inbound message — catches sessions that accumulated overnight on a chat platform.

The compression algorithm (the part most relevant to our "summarize-the-middle, protect-the-ends" instincts):

  1. Prune old tool results (cheap, no LLM): tool outputs >200 chars outside the protected tail are replaced with [Old tool output cleared to save context space].
  2. Boundaries: protect the first N (default 3) and a token-budgeted tail (≥ protect_last_n, default 20); the middle is summarized. Boundaries are aligned so a tool_call/tool_result pair is never split.
  3. Structured summary via an auxiliary model, against a fixed template. The literal headings are: Active Task · Goal · Constraints & Preferences · Completed Actions · Active State · In Progress · Blocked · Key Decisions · Resolved Questions · Pending Asks · Relevant Files · Remaining Work · Critical Context. Crucially, re-compression is iterative — the previous summary is fed back in with update instructions so detail survives repeated compactions.
  4. Reassemble: head + summary + untouched tail; orphaned tool pairs sanitized.

Note the resonance with Yggdrasil's checkpoint. That summary template is almost exactly the shape of our ## Current Checkpoint (Just finished / In progress / Next step / References). Hermes generates it automatically on context pressure; we hand-author it at session end via /save-progress. Same artifact, different trigger.

Prompt caching (Anthropic): a "system_and_3" strategy — cache the system prompt + a rolling 3-message window (4 breakpoints max), TTL configurable (default 5m). The whole tiering exists to keep that cacheable prefix stable; compression appends its note only to the first message so the system prompt cache survives.

2.6 The agent loop (one turn)

append user msg → build/reuse cached system prompt → preflight compression check → convert history to provider format (OpenAI/Anthropic/Codex adapters) → inject ephemeral overlays → apply cache markers → interruptible API call → if tool_calls: execute (concurrent via thread pool, order preserved) and loop; else persist session + flush memory + return. Six tools (todo, session_search, memory, clarify, read_terminal, delegate_task) are intercepted agent-side before normal tool dispatch. Iteration budget defaults to 90 turns; subagents (delegate_task) get independent, smaller budgets (default 50).


3. On-disk storage — how/where Hermes persists

3.1 The home root and the profile model

  • Root: ~/.hermes on POSIX (%LOCALAPPDATA%\hermes on native Windows), overridable by the HERMES_HOME env var. Resolution lives in hermes_constants.py via get_hermes_home() — called pervasively so that one env var re-scopes the entire install.
  • Profiles: ~/.hermes/profiles/<name>/ is a fully isolated instance (its own config.yaml, .env, SOUL.md, state.db, memories/, skills/, logs/, …). An active_profile marker file tracks the current one. HERMES_HOME pointed at a profile dir makes everything scope to it. This is essentially our "layered configuration" idea expressed as on-disk directory scoping.

3.2 Annotated ~/.hermes layout (the parts that matter to us)

~/.hermes/
├── config.yaml          # main config (YAML; ${ENV} substitution)
├── .env                 # API keys / secrets (KEY=VALUE)
├── SOUL.md              # agent identity (prompt slot #1)
├── state.db (+ -wal/-shm)   # SQLite session spine (WAL mode)
├── memories/
│   ├── MEMORY.md        # agent memory — §-delimited entries, ~2,200 char cap
│   └── USER.md          # user profile — §-delimited entries, ~1,375 char cap
├── skills/              # bundled + hub-installed + agent-authored skills (see §3.4)
│   ├── .usage.json      # curator telemetry (per-skill use/view/patch counts, state)
│   ├── .bundled_manifest    # content hashes of bundled skills (user-edit detection)
│   ├── .archive/        # curator-archived skills (recoverable)
│   └── .hub/            # hub-installed skill registry + audit log
├── checkpoints/         # opt-in file-rollback store (a shared *bare git repo*, content-addressed)
├── cron/                # scheduled jobs (jobs.json + per-job output/)
├── logs/                # rotated logs, incl. logs/curator/<run>/REPORT.md
├── plugins/             # memory- and context-engine plugins
└── profiles/<name>/     # isolated per-profile copies of all of the above

(The clone contains many more leaf files — OAuth tokens, per-platform pairing JSON, media/audio caches, a kanban DB. Catalogued but elided here as not relevant to the Yggdrasil comparison.)

3.3 Session storage — SQLite spine with full-text search

This is the biggest mechanical contrast with Yggdrasil (we keep everything in flat markdown; Hermes keeps conversation history in a database):

  • ~/.hermes/state.db — SQLite in WAL mode, schema-versioned, with short lock timeouts + application-level retry/backoff and periodic checkpointing (it's a multi-platform gateway, so concurrent writers are real).
  • sessions table — one row per session: source (cli/telegram/discord/…), model, a snapshot of the system prompt at session start, token/cost accounting, a unique human-readable title, and a parent_session_id for lineage.
  • messages table — full per-message history (role, content, tool_calls, reasoning, token counts).
  • messages_fts (FTS5) + a trigram variant — full-text + substring/CJK search over history, exposed to the agent as a session_search tool. The agent can grep its own past.
  • Session lineage: when compression splits a session, the old one is closed (end_reason="compression") and a child is created with parent_session_id set; titles chain "my project""my project #2"#3, and resuming by title jumps to the newest in the chain.

Yggdrasil parallel: their session lineage is exactly our "rename current-plan.mdYYYY-MM-DD-handoff.md and start fresh" versioning note — but automated, with parent pointers and searchable history, instead of a manual file rename.

3.4 Skills on disk — the self-authored knowledge store

  • Format: a skill is a directory skills/<category>/<skill-name>/ containing SKILL.md (YAML frontmatter + markdown body) plus optional references/, templates/, scripts/, assets/. The body convention is When to Use · Quick Reference · Procedure · Pitfalls · Verification — i.e., the same "procedure + pitfalls + verification" shape our own skills trend toward.
  • Frontmatter carries name/description/version, plus a rich metadata.hermes block: tags, related_skills, conditional activation (requires_toolsets / requires_tools / fallback_for_* — show/hide a skill based on what tools are present), declared config keys, and a blueprint (turn a skill into a scheduled automation). There's also a platforms: OS gate.
  • Search path / shadowing: local ~/.hermes/skills/ is always first; configured external_dirs follow; a local skill shadows an external one of the same name. Bundled skills resolve from a separate path (env override → wheel-install → source checkout → ~/.hermes/skills fallback).
  • Progressive disclosure (3 levels): skills_list() (names+descriptions; cheap index) → skill_view(name) (full SKILL.md) → skill_view(name, file_path) (a specific reference file). The agent pays tokens only for what it opens. Every skill is also auto-bound to a /skill-name slash command, and skills can be grouped into bundles.

3.5 Memory store — markdown with a hard cap and no auto-compact

  • Two files, MEMORY.md (agent) and USER.md (user), entries delimited by §, with character caps (not token caps — model-independent).
  • No auto-compaction. When a write would exceed the cap, the memory tool returns an error with the current entries, and the agent must consolidate/remove in the same turn before retrying. This is a deliberate "force the agent to curate rather than silently drop" design — the same anti-rot instinct behind our hygiene system, enforced at the tool boundary.
  • Entries are security-scanned at load (injection/exfil patterns, invisible-Unicode stripping).

3.6 Checkpoints / rollback & config

  • checkpoints/ is an opt-in (off by default) file-level undo: a single shared bare git repo at checkpoints/store/ with per-project history kept as refs (refs/hermes/<project-hash>, the hash derived from the working-dir path), snapshotting before destructive terminal commands (rm/mv/cp/sed -i/dd/output redirects >/git reset|clean|checkout…). /rollback restores. (Conceptually our worktree discipline, but at file-snapshot granularity and git-backed under the hood.)
  • config.yaml centralizes everything: model/provider, terminal backend, compression thresholds, memory caps, skill write_approval, checkpoint and session retention, curator settings.

4. The self-improvement learning loop (the "growing" part)

This is the piece with no direct Yggdrasil analog yet, and the most interesting to study.

  • Background review fork. After turns complete, Hermes can spawn a daemon-thread fork of the agent that replays the conversation snapshot with a whitelist of just the memory + skills toolsets (i.e. the memory tool plus skill_manage/skills_list/skill_view — it can read its skills, not only write them; everything else is denied) and decides whether to capture a skill or memory. It runs on the parent's live model/credentials/cache and never touches the main session's prompt cache. Writes are tagged with a background-review origin.
  • What warrants a skill (their explicit signals): the user corrected your style/tone/verbosity (treated as first-class — "stop being so verbose" becomes a durable skill edit); the user corrected your workflow; a non-trivial technique/fix; or a loaded skill was wrong/outdated (patch it now).
  • Action preference order (apply the earliest that fits): patch a currently-loaded skill → update an existing umbrella skill → add a references//templates//scripts/ support file → only then create a new class-level skill. New-skill names must be class-level — never a PR number, error string, or "fix-X-today" artifact. There's an explicit do-not-capture list (environment- dependent failures, negative tool claims, transient errors) so the agent doesn't ossify one-off breakage into permanent self-imposed constraints.
  • skill_manage tool gives the agent a CRUD API over its own skills (create/patch/edit/write_file/ remove_file/delete), with an optional write_approval staging mode (~/.hermes/pending/skills/, reviewed via /skills approve|reject).

4.1 The curator — background pruning of self-authored skills

  • An inactivity-triggered background task (no cron daemon): runs when enough time has passed (interval_hours, default 168 = 7d) and the agent's been idle (min_idle_hours, default 2).
  • Phase 1 (deterministic, always): age-based lifecycle — active (0–30d unused) → stale (30–90d) → archived (90+d, moved to .archive/, never deleted, recoverable). Pinned skills are exempt.
  • Phase 2 (LLM, opt-in consolidate: true): a fork surveys agent-created skills and may patch drift or merge overlapping ones into class-level umbrellas.
  • Scope guard: the curator only touches skills marked created_by: "agent" in .usage.json — hand-written and bundled/hub skills are left alone. Every real pass takes a tar.gz backup first (.curator_backups/) and writes a logs/curator/<run>/REPORT.md.

This is a near-perfect mirror of our /hygiene-check + ledger + archive/ system — cadence-gated, age-banded, archive-not-delete, backup-before-prune, report-after — but Hermes applies it to machine-authored skills and triggers it on idle time, where ours applies to bookmarks/backburner and triggers on session end / on demand. The convergence is striking and validates our design.


5. Yggdrasil-relevant observations

Pulling the threads together — where Hermes is a useful mirror, and where it diverges:

Strong parallels (independent convergence on the same ideas):

  1. Layered identity vs. project context — SOUL.md/AGENTS.md ≈ our personal-CLAUDE.md/project-.claude.
  2. Single scoped home, env-var re-rootedHERMES_HOME + profiles ≈ our layered config + ~/.claude wiring; profiles are isolation we achieve with separate repos.
  3. Cadence-gated, archive-not-delete hygiene — the curator ≈ /hygiene-check + ledger + archive/.
  4. The checkpoint summary template — their compression summary (Goal/Progress/Decisions/Next Steps) ≈ our ## Current Checkpoint. They auto-generate; we hand-author.
  5. Session lineage / handoff versioningparent_session_id chains ≈ our handoff-file rename note.
  6. Skill body shape — When-to-Use / Procedure / Pitfalls / Verification ≈ our SKILL.md conventions.
  7. Read-only-by-default + staged writeswrite_approval staging ≈ our prompt-on-write posture and puppet read-only gate.

Where Hermes goes further (candidate ideas, not endorsements):

  • A database spine for history (state.db + FTS5 + session_search) instead of flat markdown — the agent can full-text-search its own past sessions. Our equivalent is grepping markdown; a searchable history is a different scale.
  • Automated context assembly with explicit tiers + cache-aware ordering — we assemble context implicitly via what's in CLAUDE.md/current-plan.md; Hermes has a named, ordered, cache-conscious pipeline. The subdirectory-hint-into-tool-result trick (lazy context that doesn't bust the cache) is a genuinely novel pattern.
  • Self-authored skills + the background review fork — the agent writing/patching its own skills from conversational feedback. This is the headline capability we don't have; whether we'd want it is a values question (it trades dogfooded human-in-the-loop authorship for autonomy).

Where Yggdrasil deliberately diverges (and why that's fine):

  • Hermes is built for autonomy + always-on multi-platform operation; Yggdrasil is built for human-in-the-loop, save-and-resume, low-demand-character work. Many Hermes mechanisms (auto-skill- creation, goals-that-loop-until-done, cron self-direction) sit on the autonomy side of a line we've intentionally drawn. The lesson is in the structures (tiered context, the curator, the checkpoint template, the home-scoping), not the autonomy.

Most directly applicable to the current-plan.md refactor specifically:

  • Hermes proves out the idea of splitting one giant living doc into a small always-loaded core + on-demand detail. Their analog: a tiny cached system prompt + a skills index (names only) + progressive skill_view on demand; a frozen memory snapshot with a visible size gauge; searchable history in a DB rather than inline. Our 133 KB current-plan.md loaded every session is exactly the anti-pattern their progressive-disclosure + frozen-snapshot + searchable-history design avoids. The refactor could borrow: (a) a small core "checkpoint" that's always loaded, (b) a menu of detail loaded on demand, (c) history moved out of the always-loaded path into a searchable archive.

6. Fidelity & corrections (gap-check pass)

A fresh-eyes subagent independently re-verified every claim above against the cloned source (hermes_constants.py, hermes_state.py, agent/*.py, and the website/docs/ guides), citing files and line numbers. The report held up well — the three-tier context model, SOUL.md as slot #1, the context-file priority order, the subdir-hint-into-tool-result pattern, compression thresholds (0.50 agent / 0.85 gateway) and protect-first-3/last-20, the [Old tool output cleared…] placeholder, iterative re-compression, system_and_3 caching, the full SQLite schema (sessions/messages/FTS5/trigram/ session_search/lineage), memory caps + § delimiter + error-on-overflow, the skills layout and body convention, the skill action-preference order, and the entire curator spec (168h interval / 2h idle / 30d stale / 90d archive / created_by:"agent" scope / backup-before-pass / REPORT.md) all verified against source.

Corrections already folded into the body above (logged here for the trail):

  1. Session-split end_reason — was written as "compression_split"; the source value is plain "compression" (compression_split does not exist; a separate "orphaned_compression" covers an error-recovery case). Fixed in §3.3.
  2. Background-review tool whitelist — was "only two tools (memory + skill_manage)"; it's actually the memory + skills toolsets, and skills includes skills_list + skill_view, so the fork can read skills too. Fixed in §4.
  3. Version/release trivia — the "v0.9 / Apr 2026 / ~27k stars" web-recall is unverifiable from the clone and contradicted by the source version 0.16.0. Softened/flagged in §1.
  4. Checkpoints path & triggers — the bare repo is at checkpoints/store/ (not checkpoints/), and the documented snapshot triggers are destructive terminal commands (rm/mv/sed -i/redirects/ git reset…), not specifically write_file/patch. Fixed in §3.6.
  5. Compression summary template — the report's "Progress (Done/In Progress/Blocked) / Next Steps" was a paraphrase; the literal headings (Completed Actions / Active State / In Progress / Blocked / … / Remaining Work) are now used. Fixed in §2.5.
  6. Minor: the agent-side intercepted-tool list was missing clarify and read_terminal (6 total, now listed); the Telegram platform-hint example was backwards ("lean into Markdown," not "short messages"); the "~3k tokens" figure for skills_list was an unsourced estimate (removed). Fixed in §2.1/2.2/2.6/3.4.

Residual caveats: the SQLite DDL in §3.3 is described in prose (paraphrased, not reproduced verbatim) — accurate per the gap-check, but if exact column names/types are ever needed, read hermes_state.py:514-570 from source. The ContextEngine ABC base-class defaults (0.75 / 3 / 6) differ from the active ContextCompressor (0.50 / 3 / 20) cited above — §2.5 reflects the active engine, which is what runs.