Hermes Agent — Context Engineering & On-Disk Storage

Date: 2026-06-19 Subject: Nous Research's Hermes Agent (github.com/NousResearch/hermes-agent), analyzed for how it engineers the agent's context ("context engineering") and how/where it persists data on disk. Why: Hermes is a long-lived, self-improving agent runtime with a markdown-and-files memory model that is a close structural cousin to Yggdrasil — a useful reference as we rethink Yggdrasil's structure and current-plan.md.

Source fidelity: Built from a shallow git clone of the repo (174 MB) at /tmp/research/hermes-agent, read via three parallel deep-read passes over the website/docs/ developer + user guides and the core Python (hermes_constants.py, hermes_state.py, run_agent.py, agent/*.py). Fine-grained claims (exact SQL DDL, the fuller ~/.hermes file inventory) were drawn from code/doc reads and then run through a fresh-eyes gap-check pass against the clone — see the Fidelity & corrections section at the end for what was verified, softened, or struck.

One-line orientation. Hermes is "the agent that grows with you": a persistent runtime (CLI + messaging gateways) whose state lives under a single ~/.hermes/ home, whose context each turn is assembled from a layered stable → context → volatile prompt, and which writes its own skills from experience via a background self-improvement fork, then prunes them with a curator.

1. What Hermes is (the shape, briefly)

A long-lived runtime, not a one-shot CLI. It runs as a persistent process with a gateway that fans out across messaging platforms (Telegram, Discord, Slack, email, …), plus a local CLI/TUI. It remembers across sessions and "gets more capable the longer it runs."
Released Feb 2026; past our Jan 2026 knowledge cutoff — hence researched from source rather than recalled. The cloned source is version 0.16.0 (pyproject.toml). (Release-date/version/star-count trivia from web search — e.g. a "v0.9 everywhere release" and a ~27k-star count — is external recall I could not verify against the clone, and the v0.9 figure is contradicted by the 0.16.0 source; treat that trivia as unconfirmed.)
Three pillars relevant to us: (1) a layered context-assembly pipeline; (2) a single on-disk home with a SQLite session spine + markdown memory/skills; (3) a self-improvement learning loop that authors and maintains skills autonomously.

2. Context engineering — how Hermes assembles the agent's context

Hermes builds the system prompt each session as three tiers joined with \n\n, ordered stable → context → volatile. The ordering is deliberate and cache-aware: the stable tier is a long-lived cacheable prefix; the volatile tier is what gets rebuilt on compression. (Assembly entry point: build_system_prompt_parts() in agent/system_prompt.py, returning a dict keyed stable / context / volatile; called from AIAgent._build_system_prompt() in run_agent.py.)

2.1 Tier 1 — STABLE (identity + behavioral guidance; cached, rarely mutated)

In assembly order:

Agent identity — ~/.hermes/SOUL.md (loaded via load_soul_md() in agent/prompt_builder.py), falling back to a hardcoded DEFAULT_AGENT_IDENTITY if absent. This is slot #1 of the prompt.
Tool-aware behavior guidance — a set of named constant blocks: memory guidance, session-search guidance, parallel-tool-call guidance, skills guidance, a Hermes-help pointer.
Task-completion guidance — anti-fabrication / use-the-tools directives (config-gated).
Model-specific operational guidance — different nudges for Gemini vs. GPT/Codex vs. Grok (e.g., "execute, don't describe" for models prone to narrating tool use).
Skills index — a structured list of available skills (name + category + short description), built by build_skills_system_prompt(). This is the menu; full skill bodies load on demand (see §3.4 progressive disclosure).
Environment hints — WSL/Termux detection, Python toolchain probe, etc.
Coding posture + git-status snapshot — cached once per session (agent/coding_context.py).
Active-profile hint and platform hint (per-channel style: e.g. CLI "avoid Markdown, use simple text", Telegram "lean into rich Markdown", etc., overridable in config).

2.2 Tier 2 — CONTEXT (project instructions; loaded at startup, held for the session)

One project context file, first match wins (priority order): .hermes.md / HERMES.md (walks up to git root) → AGENTS.md (cwd) → CLAUDE.md (cwd) → .cursorrules / .cursor/rules/*.mdc (cwd). Injected under a # Project Context header. Hermes natively reads AGENTS.md and CLAUDE.md — the same convention files we use.
Progressive subdirectory discovery. Only the root context file enters the cached system prompt at start. As the agent navigates into subdirectories, a SubdirectoryHintTracker (explicitly modeled on Block/goose's tracker of the same name) discovers nested convention files (here it checks AGENTS.md/CLAUDE.md/.cursorrules — note: not .hermes.md — and loads all it finds in a dir, once per directory) and injects that content into the tool result, not the system prompt — specifically to avoid breaking the prompt cache. (This is a clever pattern worth noting for us: lazy, navigation-triggered context loading that keeps the cacheable prefix stable.)
All context files are security-scanned for prompt injection and truncated (~20K chars, head/tail with a marker) before inclusion.

2.3 Tier 3 — VOLATILE (memory snapshots + timestamp; rebuilt on compression)

Memory snapshot — ~/.hermes/memories/MEMORY.md, a frozen snapshot injected under ## Persistent Memory. Frozen = loaded once at session start; mid-session writes hit disk immediately but don't re-enter the live prompt until the next session, which preserves the prompt cache. The prompt shows a usage gauge (e.g., MEMORY [67% — 1,474/2,200 chars]).
User-profile snapshot — ~/.hermes/memories/USER.md, injected under ## User Profile.
External memory-provider block — if a provider plugin (e.g., Honcho) is active.
Timestamp + session metadata (current time w/ tz, session id, model/provider).

2.4 Identity: SOUL.md vs. personality vs. AGENTS.md

A clean separation worth stealing conceptually:

Layer	File / mechanism	Scope	Holds
Identity	`~/.hermes/SOUL.md`	Global to this Hermes instance	Tone, voice, directness, how to handle ambiguity — who it is
Personality	`/personality` slash command	Session-temporary overlay	Built-in presets (concise, technical, teacher, pirate…)
Project context	`AGENTS.md` / `CLAUDE.md`	Per-project	Conventions, paths, architecture — how this repo works

Rule of thumb in their docs: applies everywhere → SOUL.md; belongs to one project → AGENTS.md. This is directly analogous to our personal layer (CLAUDE.md) vs. project .claude/ split — Hermes draws the same line, with SOUL.md ≈ the personal identity layer and AGENTS.md ≈ the project layer.

2.5 Keeping a long-lived agent inside the context window — compression & caching

Two independent compression layers:

Agent-loop ContextCompressor (primary; agent/context_compressor.py) fires at a configurable threshold (default 0.50 of context) using accurate API-reported token counts. It is an implementation of a pluggable ContextEngine ABC (agent/context_engine.py) — only one engine is active at a time; alternatives can be dropped in under plugins/context_engine/<name>/.
Gateway session hygiene (safety net; gateway/run.py) fires at a fixed 85% before the agent processes an inbound message — catches sessions that accumulated overnight on a chat platform.

The compression algorithm (the part most relevant to our "summarize-the-middle, protect-the-ends" instincts):

Prune old tool results (cheap, no LLM): tool outputs >200 chars outside the protected tail are replaced with [Old tool output cleared to save context space].
Boundaries: protect the first N (default 3) and a token-budgeted tail (≥ protect_last_n, default 20); the middle is summarized. Boundaries are aligned so a tool_call/tool_result pair is never split.
Structured summary via an auxiliary model, against a fixed template. The literal headings are: Active Task · Goal · Constraints & Preferences · Completed Actions · Active State · In Progress · Blocked · Key Decisions · Resolved Questions · Pending Asks · Relevant Files · Remaining Work · Critical Context. Crucially, re-compression is iterative — the previous summary is fed back in with update instructions so detail survives repeated compactions.
Reassemble: head + summary + untouched tail; orphaned tool pairs sanitized.

Note the resonance with Yggdrasil's checkpoint. That summary template is almost exactly the shape of our ## Current Checkpoint (Just finished / In progress / Next step / References). Hermes generates it automatically on context pressure; we hand-author it at session end via /save-progress. Same artifact, different trigger.

Prompt caching (Anthropic): a "system_and_3" strategy — cache the system prompt + a rolling 3-message window (4 breakpoints max), TTL configurable (default 5m). The whole tiering exists to keep that cacheable prefix stable; compression appends its note only to the first message so the system prompt cache survives.

2.6 The agent loop (one turn)

append user msg → build/reuse cached system prompt → preflight compression check → convert history to provider format (OpenAI/Anthropic/Codex adapters) → inject ephemeral overlays → apply cache markers → interruptible API call → if tool_calls: execute (concurrent via thread pool, order preserved) and loop; else persist session + flush memory + return. Six tools (todo, session_search, memory, clarify, read_terminal, delegate_task) are intercepted agent-side before normal tool dispatch. Iteration budget defaults to 90 turns; subagents (delegate_task) get independent, smaller budgets (default 50).

3. On-disk storage — how/where Hermes persists

3.1 The home root and the profile model

Root: ~/.hermes on POSIX (%LOCALAPPDATA%\hermes on native Windows), overridable by the HERMES_HOME env var. Resolution lives in hermes_constants.py via get_hermes_home() — called pervasively so that one env var re-scopes the entire install.
Profiles: ~/.hermes/profiles/<name>/ is a fully isolated instance (its own config.yaml, .env, SOUL.md, state.db, memories/, skills/, logs/, …). An active_profile marker file tracks the current one. HERMES_HOME pointed at a profile dir makes everything scope to it. This is essentially our "layered configuration" idea expressed as on-disk directory scoping.

3.2 Annotated `~/.hermes` layout (the parts that matter to us)

~/.hermes/
├── config.yaml          # main config (YAML; ${ENV} substitution)
├── .env                 # API keys / secrets (KEY=VALUE)
├── SOUL.md              # agent identity (prompt slot #1)
├── state.db (+ -wal/-shm)   # SQLite session spine (WAL mode)
├── memories/
│   ├── MEMORY.md        # agent memory — §-delimited entries, ~2,200 char cap
│   └── USER.md          # user profile — §-delimited entries, ~1,375 char cap
├── skills/              # bundled + hub-installed + agent-authored skills (see §3.4)
│   ├── .usage.json      # curator telemetry (per-skill use/view/patch counts, state)
│   ├── .bundled_manifest    # content hashes of bundled skills (user-edit detection)
│   ├── .archive/        # curator-archived skills (recoverable)
│   └── .hub/            # hub-installed skill registry + audit log
├── checkpoints/         # opt-in file-rollback store (a shared *bare git repo*, content-addressed)
├── cron/                # scheduled jobs (jobs.json + per-job output/)
├── logs/                # rotated logs, incl. logs/curator/<run>/REPORT.md
├── plugins/             # memory- and context-engine plugins
└── profiles/<name>/     # isolated per-profile copies of all of the above

(The clone contains many more leaf files — OAuth tokens, per-platform pairing JSON, media/audio caches, a kanban DB. Catalogued but elided here as not relevant to the Yggdrasil comparison.)

3.3 Session storage — SQLite spine with full-text search

This is the biggest mechanical contrast with Yggdrasil (we keep everything in flat markdown; Hermes keeps conversation history in a database):

~/.hermes/state.db — SQLite in WAL mode, schema-versioned, with short lock timeouts + application-level retry/backoff and periodic checkpointing (it's a multi-platform gateway, so concurrent writers are real).
sessions table — one row per session: source (cli/telegram/discord/…), model, a snapshot of the system prompt at session start, token/cost accounting, a unique human-readable title, and a parent_session_id for lineage.
messages table — full per-message history (role, content, tool_calls, reasoning, token counts).
messages_fts (FTS5) + a trigram variant — full-text + substring/CJK search over history, exposed to the agent as a session_search tool. The agent can grep its own past.
Session lineage: when compression splits a session, the old one is closed (end_reason="compression") and a child is created with parent_session_id set; titles chain "my project" → "my project #2" → #3, and resuming by title jumps to the newest in the chain.

Yggdrasil parallel: their session lineage is exactly our "rename current-plan.md → YYYY-MM-DD-handoff.md and start fresh" versioning note — but automated, with parent pointers and searchable history, instead of a manual file rename.

3.4 Skills on disk — the self-authored knowledge store

Format: a skill is a directory skills/<category>/<skill-name>/ containing SKILL.md (YAML frontmatter + markdown body) plus optional references/, templates/, scripts/, assets/. The body convention is When to Use · Quick Reference · Procedure · Pitfalls · Verification — i.e., the same "procedure + pitfalls + verification" shape our own skills trend toward.
Frontmatter carries name/description/version, plus a rich metadata.hermes block: tags, related_skills, conditional activation (requires_toolsets / requires_tools / fallback_for_* — show/hide a skill based on what tools are present), declared config keys, and a blueprint (turn a skill into a scheduled automation). There's also a platforms: OS gate.
Search path / shadowing: local ~/.hermes/skills/ is always first; configured external_dirs follow; a local skill shadows an external one of the same name. Bundled skills resolve from a separate path (env override → wheel-install → source checkout → ~/.hermes/skills fallback).
Progressive disclosure (3 levels): skills_list() (names+descriptions; cheap index) → skill_view(name) (full SKILL.md) → skill_view(name, file_path) (a specific reference file). The agent pays tokens only for what it opens. Every skill is also auto-bound to a /skill-name slash command, and skills can be grouped into bundles.

3.5 Memory store — markdown with a hard cap and no auto-compact

Two files, MEMORY.md (agent) and USER.md (user), entries delimited by §, with character caps (not token caps — model-independent).
No auto-compaction. When a write would exceed the cap, the memory tool returns an error with the current entries, and the agent must consolidate/remove in the same turn before retrying. This is a deliberate "force the agent to curate rather than silently drop" design — the same anti-rot instinct behind our hygiene system, enforced at the tool boundary.
Entries are security-scanned at load (injection/exfil patterns, invisible-Unicode stripping).

3.6 Checkpoints / rollback & config

checkpoints/ is an opt-in (off by default) file-level undo: a single shared bare git repo at checkpoints/store/ with per-project history kept as refs (refs/hermes/<project-hash>, the hash derived from the working-dir path), snapshotting before destructive terminal commands (rm/mv/cp/sed -i/dd/output redirects >/git reset|clean|checkout…). /rollback restores. (Conceptually our worktree discipline, but at file-snapshot granularity and git-backed under the hood.)
config.yaml centralizes everything: model/provider, terminal backend, compression thresholds, memory caps, skill write_approval, checkpoint and session retention, curator settings.

4. The self-improvement learning loop (the "growing" part)

This is the piece with no direct Yggdrasil analog yet, and the most interesting to study.

Background review fork. After turns complete, Hermes can spawn a daemon-thread fork of the agent that replays the conversation snapshot with a whitelist of just the memory + skills toolsets (i.e. the memory tool plus skill_manage/skills_list/skill_view — it can read its skills, not only write them; everything else is denied) and decides whether to capture a skill or memory. It runs on the parent's live model/credentials/cache and never touches the main session's prompt cache. Writes are tagged with a background-review origin.
What warrants a skill (their explicit signals): the user corrected your style/tone/verbosity (treated as first-class — "stop being so verbose" becomes a durable skill edit); the user corrected your workflow; a non-trivial technique/fix; or a loaded skill was wrong/outdated (patch it now).
Action preference order (apply the earliest that fits): patch a currently-loaded skill → update an existing umbrella skill → add a references//templates//scripts/ support file → only then create a new class-level skill. New-skill names must be class-level — never a PR number, error string, or "fix-X-today" artifact. There's an explicit do-not-capture list (environment- dependent failures, negative tool claims, transient errors) so the agent doesn't ossify one-off breakage into permanent self-imposed constraints.
skill_manage tool gives the agent a CRUD API over its own skills (create/patch/edit/write_file/ remove_file/delete), with an optional write_approval staging mode (~/.hermes/pending/skills/, reviewed via /skills approve|reject).

4.1 The curator — background pruning of self-authored skills

An inactivity-triggered background task (no cron daemon): runs when enough time has passed (interval_hours, default 168 = 7d) and the agent's been idle (min_idle_hours, default 2).
Phase 1 (deterministic, always): age-based lifecycle — active (0–30d unused) → stale (30–90d) → archived (90+d, moved to .archive/, never deleted, recoverable). Pinned skills are exempt.
Phase 2 (LLM, opt-in consolidate: true): a fork surveys agent-created skills and may patch drift or merge overlapping ones into class-level umbrellas.
Scope guard: the curator only touches skills marked created_by: "agent" in .usage.json — hand-written and bundled/hub skills are left alone. Every real pass takes a tar.gz backup first (.curator_backups/) and writes a logs/curator/<run>/REPORT.md.

This is a near-perfect mirror of our /hygiene-check + ledger + archive/ system — cadence-gated, age-banded, archive-not-delete, backup-before-prune, report-after — but Hermes applies it to machine-authored skills and triggers it on idle time, where ours applies to bookmarks/backburner and triggers on session end / on demand. The convergence is striking and validates our design.

5. Yggdrasil-relevant observations

Pulling the threads together — where Hermes is a useful mirror, and where it diverges:

Strong parallels (independent convergence on the same ideas):

Layered identity vs. project context — SOUL.md/AGENTS.md ≈ our personal-CLAUDE.md/project-.claude.
Single scoped home, env-var re-rooted — HERMES_HOME + profiles ≈ our layered config + ~/.claude wiring; profiles are isolation we achieve with separate repos.
Cadence-gated, archive-not-delete hygiene — the curator ≈ /hygiene-check + ledger + archive/.
The checkpoint summary template — their compression summary (Goal/Progress/Decisions/Next Steps) ≈ our ## Current Checkpoint. They auto-generate; we hand-author.
Session lineage / handoff versioning — parent_session_id chains ≈ our handoff-file rename note.
Skill body shape — When-to-Use / Procedure / Pitfalls / Verification ≈ our SKILL.md conventions.
Read-only-by-default + staged writes — write_approval staging ≈ our prompt-on-write posture and puppet read-only gate.

Where Hermes goes further (candidate ideas, not endorsements):

A database spine for history (state.db + FTS5 + session_search) instead of flat markdown — the agent can full-text-search its own past sessions. Our equivalent is grepping markdown; a searchable history is a different scale.
Automated context assembly with explicit tiers + cache-aware ordering — we assemble context implicitly via what's in CLAUDE.md/current-plan.md; Hermes has a named, ordered, cache-conscious pipeline. The subdirectory-hint-into-tool-result trick (lazy context that doesn't bust the cache) is a genuinely novel pattern.
Self-authored skills + the background review fork — the agent writing/patching its own skills from conversational feedback. This is the headline capability we don't have; whether we'd want it is a values question (it trades dogfooded human-in-the-loop authorship for autonomy).

Where Yggdrasil deliberately diverges (and why that's fine):

Hermes is built for autonomy + always-on multi-platform operation; Yggdrasil is built for human-in-the-loop, save-and-resume, low-demand-character work. Many Hermes mechanisms (auto-skill- creation, goals-that-loop-until-done, cron self-direction) sit on the autonomy side of a line we've intentionally drawn. The lesson is in the structures (tiered context, the curator, the checkpoint template, the home-scoping), not the autonomy.

Most directly applicable to the current-plan.md refactor specifically:

Hermes proves out the idea of splitting one giant living doc into a small always-loaded core + on-demand detail. Their analog: a tiny cached system prompt + a skills index (names only) + progressive skill_view on demand; a frozen memory snapshot with a visible size gauge; searchable history in a DB rather than inline. Our 133 KB current-plan.md loaded every session is exactly the anti-pattern their progressive-disclosure + frozen-snapshot + searchable-history design avoids. The refactor could borrow: (a) a small core "checkpoint" that's always loaded, (b) a menu of detail loaded on demand, (c) history moved out of the always-loaded path into a searchable archive.

6. Fidelity & corrections (gap-check pass)

A fresh-eyes subagent independently re-verified every claim above against the cloned source (hermes_constants.py, hermes_state.py, agent/*.py, and the website/docs/ guides), citing files and line numbers. The report held up well — the three-tier context model, SOUL.md as slot #1, the context-file priority order, the subdir-hint-into-tool-result pattern, compression thresholds (0.50 agent / 0.85 gateway) and protect-first-3/last-20, the [Old tool output cleared…] placeholder, iterative re-compression, system_and_3 caching, the full SQLite schema (sessions/messages/FTS5/trigram/ session_search/lineage), memory caps + § delimiter + error-on-overflow, the skills layout and body convention, the skill action-preference order, and the entire curator spec (168h interval / 2h idle / 30d stale / 90d archive / created_by:"agent" scope / backup-before-pass / REPORT.md) all verified against source.

Corrections already folded into the body above (logged here for the trail):

Session-split end_reason — was written as "compression_split"; the source value is plain "compression" (compression_split does not exist; a separate "orphaned_compression" covers an error-recovery case). Fixed in §3.3.
Background-review tool whitelist — was "only two tools (memory + skill_manage)"; it's actually the memory + skills toolsets, and skills includes skills_list + skill_view, so the fork can read skills too. Fixed in §4.
Version/release trivia — the "v0.9 / Apr 2026 / ~27k stars" web-recall is unverifiable from the clone and contradicted by the source version 0.16.0. Softened/flagged in §1.
Checkpoints path & triggers — the bare repo is at checkpoints/store/ (not checkpoints/), and the documented snapshot triggers are destructive terminal commands (rm/mv/sed -i/redirects/ git reset…), not specifically write_file/patch. Fixed in §3.6.
Compression summary template — the report's "Progress (Done/In Progress/Blocked) / Next Steps" was a paraphrase; the literal headings (Completed Actions / Active State / In Progress / Blocked / … / Remaining Work) are now used. Fixed in §2.5.
Minor: the agent-side intercepted-tool list was missing clarify and read_terminal (6 total, now listed); the Telegram platform-hint example was backwards ("lean into Markdown," not "short messages"); the "~3k tokens" figure for skills_list was an unsourced estimate (removed). Fixed in §2.1/2.2/2.6/3.4.

Residual caveats: the SQLite DDL in §3.3 is described in prose (paraphrased, not reproduced verbatim) — accurate per the gap-check, but if exact column names/types are ever needed, read hermes_state.py:514-570 from source. The ContextEngine ABC base-class defaults (0.75 / 3 / 6) differ from the active ContextCompressor (0.50 / 3 / 20) cited above — §2.5 reflects the active engine, which is what runs.