Hermes Agent — Context Engineering & On-Disk Storage
Date: 2026-06-19
Subject: Nous Research's Hermes Agent (github.com/NousResearch/hermes-agent), analyzed for how it
engineers the agent's context ("context engineering") and how/where it persists data on disk.
Why: Hermes is a long-lived, self-improving agent runtime with a markdown-and-files memory model
that is a close structural cousin to Yggdrasil — a useful reference as we rethink Yggdrasil's structure
and current-plan.md.
Source fidelity: Built from a shallow git clone of the repo (174 MB) at /tmp/research/hermes-agent,
read via three parallel deep-read passes over the website/docs/ developer + user guides and the core
Python (hermes_constants.py, hermes_state.py, run_agent.py, agent/*.py). Fine-grained claims
(exact SQL DDL, the fuller ~/.hermes file inventory) were drawn from code/doc reads and then run
through a fresh-eyes gap-check pass against the clone — see the Fidelity & corrections section at the
end for what was verified, softened, or struck.
One-line orientation. Hermes is "the agent that grows with you": a persistent runtime (CLI + messaging gateways) whose state lives under a single
~/.hermes/home, whose context each turn is assembled from a layered stable → context → volatile prompt, and which writes its own skills from experience via a background self-improvement fork, then prunes them with a curator.
1. What Hermes is (the shape, briefly)
- A long-lived runtime, not a one-shot CLI. It runs as a persistent process with a gateway that fans out across messaging platforms (Telegram, Discord, Slack, email, …), plus a local CLI/TUI. It remembers across sessions and "gets more capable the longer it runs."
- Released Feb 2026; past our Jan 2026 knowledge cutoff — hence researched from source rather than
recalled. The cloned source is version 0.16.0 (
pyproject.toml). (Release-date/version/star-count trivia from web search — e.g. a "v0.9 everywhere release" and a ~27k-star count — is external recall I could not verify against the clone, and the v0.9 figure is contradicted by the 0.16.0 source; treat that trivia as unconfirmed.) - Three pillars relevant to us: (1) a layered context-assembly pipeline; (2) a single on-disk home with a SQLite session spine + markdown memory/skills; (3) a self-improvement learning loop that authors and maintains skills autonomously.
2. Context engineering — how Hermes assembles the agent's context
Hermes builds the system prompt each session as three tiers joined with \n\n, ordered
stable → context → volatile. The ordering is deliberate and cache-aware: the stable tier is a
long-lived cacheable prefix; the volatile tier is what gets rebuilt on compression. (Assembly entry
point: build_system_prompt_parts() in agent/system_prompt.py, returning a dict keyed
stable / context / volatile; called from AIAgent._build_system_prompt() in run_agent.py.)
2.1 Tier 1 — STABLE (identity + behavioral guidance; cached, rarely mutated)
In assembly order:
- Agent identity —
~/.hermes/SOUL.md(loaded viaload_soul_md()inagent/prompt_builder.py), falling back to a hardcodedDEFAULT_AGENT_IDENTITYif absent. This is slot #1 of the prompt. - Tool-aware behavior guidance — a set of named constant blocks: memory guidance, session-search guidance, parallel-tool-call guidance, skills guidance, a Hermes-help pointer.
- Task-completion guidance — anti-fabrication / use-the-tools directives (config-gated).
- Model-specific operational guidance — different nudges for Gemini vs. GPT/Codex vs. Grok (e.g., "execute, don't describe" for models prone to narrating tool use).
- Skills index — a structured list of available skills (name + category + short description),
built by
build_skills_system_prompt(). This is the menu; full skill bodies load on demand (see §3.4 progressive disclosure). - Environment hints — WSL/Termux detection, Python toolchain probe, etc.
- Coding posture + git-status snapshot — cached once per session (
agent/coding_context.py). - Active-profile hint and platform hint (per-channel style: e.g. CLI "avoid Markdown, use simple text", Telegram "lean into rich Markdown", etc., overridable in config).
2.2 Tier 2 — CONTEXT (project instructions; loaded at startup, held for the session)
- One project context file, first match wins (priority order):
.hermes.md/HERMES.md(walks up to git root) →AGENTS.md(cwd) →CLAUDE.md(cwd) →.cursorrules/.cursor/rules/*.mdc(cwd). Injected under a# Project Contextheader. Hermes natively readsAGENTS.mdandCLAUDE.md— the same convention files we use. - Progressive subdirectory discovery. Only the root context file enters the cached system prompt
at start. As the agent navigates into subdirectories, a
SubdirectoryHintTracker(explicitly modeled on Block/goose's tracker of the same name) discovers nested convention files (here it checksAGENTS.md/CLAUDE.md/.cursorrules— note: not.hermes.md— and loads all it finds in a dir, once per directory) and injects that content into the tool result, not the system prompt — specifically to avoid breaking the prompt cache. (This is a clever pattern worth noting for us: lazy, navigation-triggered context loading that keeps the cacheable prefix stable.) - All context files are security-scanned for prompt injection and truncated (~20K chars, head/tail with a marker) before inclusion.
2.3 Tier 3 — VOLATILE (memory snapshots + timestamp; rebuilt on compression)
- Memory snapshot —
~/.hermes/memories/MEMORY.md, a frozen snapshot injected under## Persistent Memory. Frozen = loaded once at session start; mid-session writes hit disk immediately but don't re-enter the live prompt until the next session, which preserves the prompt cache. The prompt shows a usage gauge (e.g.,MEMORY [67% — 1,474/2,200 chars]). - User-profile snapshot —
~/.hermes/memories/USER.md, injected under## User Profile. - External memory-provider block — if a provider plugin (e.g., Honcho) is active.
- Timestamp + session metadata (current time w/ tz, session id, model/provider).
2.4 Identity: SOUL.md vs. personality vs. AGENTS.md
A clean separation worth stealing conceptually:
| Layer | File / mechanism | Scope | Holds |
|---|---|---|---|
| Identity | ~/.hermes/SOUL.md |
Global to this Hermes instance | Tone, voice, directness, how to handle ambiguity — who it is |
| Personality | /personality slash command |
Session-temporary overlay | Built-in presets (concise, technical, teacher, pirate…) |
| Project context | AGENTS.md / CLAUDE.md |
Per-project | Conventions, paths, architecture — how this repo works |
Rule of thumb in their docs: applies everywhere → SOUL.md; belongs to one project → AGENTS.md. This is
directly analogous to our personal layer (CLAUDE.md) vs. project .claude/ split — Hermes draws the
same line, with SOUL.md ≈ the personal identity layer and AGENTS.md ≈ the project layer.
2.5 Keeping a long-lived agent inside the context window — compression & caching
Two independent compression layers:
- Agent-loop
ContextCompressor(primary;agent/context_compressor.py) fires at a configurable threshold (default 0.50 of context) using accurate API-reported token counts. It is an implementation of a pluggableContextEngineABC (agent/context_engine.py) — only one engine is active at a time; alternatives can be dropped in underplugins/context_engine/<name>/. - Gateway session hygiene (safety net;
gateway/run.py) fires at a fixed 85% before the agent processes an inbound message — catches sessions that accumulated overnight on a chat platform.
The compression algorithm (the part most relevant to our "summarize-the-middle, protect-the-ends" instincts):
- Prune old tool results (cheap, no LLM): tool outputs >200 chars outside the protected tail are
replaced with
[Old tool output cleared to save context space]. - Boundaries: protect the first N (default 3) and a token-budgeted tail (≥
protect_last_n, default 20); the middle is summarized. Boundaries are aligned so atool_call/tool_resultpair is never split. - Structured summary via an auxiliary model, against a fixed template. The literal headings are: Active Task · Goal · Constraints & Preferences · Completed Actions · Active State · In Progress · Blocked · Key Decisions · Resolved Questions · Pending Asks · Relevant Files · Remaining Work · Critical Context. Crucially, re-compression is iterative — the previous summary is fed back in with update instructions so detail survives repeated compactions.
- Reassemble: head + summary + untouched tail; orphaned tool pairs sanitized.
Note the resonance with Yggdrasil's checkpoint. That summary template is almost exactly the shape of our
## Current Checkpoint(Just finished / In progress / Next step / References). Hermes generates it automatically on context pressure; we hand-author it at session end via/save-progress. Same artifact, different trigger.
Prompt caching (Anthropic): a "system_and_3" strategy — cache the system prompt + a rolling
3-message window (4 breakpoints max), TTL configurable (default 5m). The whole tiering exists to keep
that cacheable prefix stable; compression appends its note only to the first message so the system
prompt cache survives.
2.6 The agent loop (one turn)
append user msg → build/reuse cached system prompt → preflight compression check → convert history to provider format (OpenAI/Anthropic/Codex adapters) → inject ephemeral overlays → apply cache markers → interruptible API call → if tool_calls: execute (concurrent via thread pool, order preserved) and loop; else persist session + flush memory + return. Six tools (todo, session_search, memory, clarify,
read_terminal, delegate_task) are intercepted agent-side before normal tool dispatch. Iteration
budget defaults to 90 turns; subagents (delegate_task) get independent, smaller budgets (default 50).
3. On-disk storage — how/where Hermes persists
3.1 The home root and the profile model
- Root:
~/.hermeson POSIX (%LOCALAPPDATA%\hermeson native Windows), overridable by theHERMES_HOMEenv var. Resolution lives inhermes_constants.pyviaget_hermes_home()— called pervasively so that one env var re-scopes the entire install. - Profiles:
~/.hermes/profiles/<name>/is a fully isolated instance (its ownconfig.yaml,.env,SOUL.md,state.db,memories/,skills/,logs/, …). Anactive_profilemarker file tracks the current one.HERMES_HOMEpointed at a profile dir makes everything scope to it. This is essentially our "layered configuration" idea expressed as on-disk directory scoping.
3.2 Annotated ~/.hermes layout (the parts that matter to us)
~/.hermes/
├── config.yaml # main config (YAML; ${ENV} substitution)
├── .env # API keys / secrets (KEY=VALUE)
├── SOUL.md # agent identity (prompt slot #1)
├── state.db (+ -wal/-shm) # SQLite session spine (WAL mode)
├── memories/
│ ├── MEMORY.md # agent memory — §-delimited entries, ~2,200 char cap
│ └── USER.md # user profile — §-delimited entries, ~1,375 char cap
├── skills/ # bundled + hub-installed + agent-authored skills (see §3.4)
│ ├── .usage.json # curator telemetry (per-skill use/view/patch counts, state)
│ ├── .bundled_manifest # content hashes of bundled skills (user-edit detection)
│ ├── .archive/ # curator-archived skills (recoverable)
│ └── .hub/ # hub-installed skill registry + audit log
├── checkpoints/ # opt-in file-rollback store (a shared *bare git repo*, content-addressed)
├── cron/ # scheduled jobs (jobs.json + per-job output/)
├── logs/ # rotated logs, incl. logs/curator/<run>/REPORT.md
├── plugins/ # memory- and context-engine plugins
└── profiles/<name>/ # isolated per-profile copies of all of the above
(The clone contains many more leaf files — OAuth tokens, per-platform pairing JSON, media/audio caches, a kanban DB. Catalogued but elided here as not relevant to the Yggdrasil comparison.)
3.3 Session storage — SQLite spine with full-text search
This is the biggest mechanical contrast with Yggdrasil (we keep everything in flat markdown; Hermes keeps conversation history in a database):
~/.hermes/state.db— SQLite in WAL mode, schema-versioned, with short lock timeouts + application-level retry/backoff and periodic checkpointing (it's a multi-platform gateway, so concurrent writers are real).sessionstable — one row per session: source (cli/telegram/discord/…), model, a snapshot of the system prompt at session start, token/cost accounting, a unique human-readabletitle, and aparent_session_idfor lineage.messagestable — full per-message history (role, content, tool_calls, reasoning, token counts).messages_fts(FTS5) + a trigram variant — full-text + substring/CJK search over history, exposed to the agent as asession_searchtool. The agent can grep its own past.- Session lineage: when compression splits a session, the old one is closed
(
end_reason="compression") and a child is created withparent_session_idset; titles chain"my project"→"my project #2"→#3, and resuming by title jumps to the newest in the chain.
Yggdrasil parallel: their session lineage is exactly our "rename
current-plan.md→YYYY-MM-DD-handoff.mdand start fresh" versioning note — but automated, with parent pointers and searchable history, instead of a manual file rename.
3.4 Skills on disk — the self-authored knowledge store
- Format: a skill is a directory
skills/<category>/<skill-name>/containingSKILL.md(YAML frontmatter + markdown body) plus optionalreferences/,templates/,scripts/,assets/. The body convention is When to Use · Quick Reference · Procedure · Pitfalls · Verification — i.e., the same "procedure + pitfalls + verification" shape our own skills trend toward. - Frontmatter carries
name/description/version, plus a richmetadata.hermesblock:tags,related_skills, conditional activation (requires_toolsets/requires_tools/fallback_for_*— show/hide a skill based on what tools are present), declaredconfigkeys, and ablueprint(turn a skill into a scheduled automation). There's also aplatforms:OS gate. - Search path / shadowing: local
~/.hermes/skills/is always first; configuredexternal_dirsfollow; a local skill shadows an external one of the same name. Bundled skills resolve from a separate path (env override → wheel-install → source checkout →~/.hermes/skillsfallback). - Progressive disclosure (3 levels):
skills_list()(names+descriptions; cheap index) →skill_view(name)(full SKILL.md) →skill_view(name, file_path)(a specific reference file). The agent pays tokens only for what it opens. Every skill is also auto-bound to a/skill-nameslash command, and skills can be grouped into bundles.
3.5 Memory store — markdown with a hard cap and no auto-compact
- Two files,
MEMORY.md(agent) andUSER.md(user), entries delimited by§, with character caps (not token caps — model-independent). - No auto-compaction. When a write would exceed the cap, the
memorytool returns an error with the current entries, and the agent must consolidate/remove in the same turn before retrying. This is a deliberate "force the agent to curate rather than silently drop" design — the same anti-rot instinct behind our hygiene system, enforced at the tool boundary. - Entries are security-scanned at load (injection/exfil patterns, invisible-Unicode stripping).
3.6 Checkpoints / rollback & config
checkpoints/is an opt-in (off by default) file-level undo: a single shared bare git repo atcheckpoints/store/with per-project history kept as refs (refs/hermes/<project-hash>, the hash derived from the working-dir path), snapshotting before destructive terminal commands (rm/mv/cp/sed -i/dd/output redirects>/git reset|clean|checkout…)./rollbackrestores. (Conceptually our worktree discipline, but at file-snapshot granularity and git-backed under the hood.)config.yamlcentralizes everything: model/provider, terminal backend, compression thresholds, memory caps, skillwrite_approval, checkpoint and session retention, curator settings.
4. The self-improvement learning loop (the "growing" part)
This is the piece with no direct Yggdrasil analog yet, and the most interesting to study.
- Background review fork. After turns complete, Hermes can spawn a daemon-thread fork of the
agent that replays the conversation snapshot with a whitelist of just the
memory+skillstoolsets (i.e. the memory tool plusskill_manage/skills_list/skill_view— it can read its skills, not only write them; everything else is denied) and decides whether to capture a skill or memory. It runs on the parent's live model/credentials/cache and never touches the main session's prompt cache. Writes are tagged with a background-review origin. - What warrants a skill (their explicit signals): the user corrected your style/tone/verbosity (treated as first-class — "stop being so verbose" becomes a durable skill edit); the user corrected your workflow; a non-trivial technique/fix; or a loaded skill was wrong/outdated (patch it now).
- Action preference order (apply the earliest that fits): patch a currently-loaded skill →
update an existing umbrella skill → add a
references//templates//scripts/support file → only then create a new class-level skill. New-skill names must be class-level — never a PR number, error string, or "fix-X-today" artifact. There's an explicit do-not-capture list (environment- dependent failures, negative tool claims, transient errors) so the agent doesn't ossify one-off breakage into permanent self-imposed constraints. skill_managetool gives the agent a CRUD API over its own skills (create/patch/edit/write_file/ remove_file/delete), with an optionalwrite_approvalstaging mode (~/.hermes/pending/skills/, reviewed via/skills approve|reject).
4.1 The curator — background pruning of self-authored skills
- An inactivity-triggered background task (no cron daemon): runs when enough time has passed
(
interval_hours, default 168 = 7d) and the agent's been idle (min_idle_hours, default 2). - Phase 1 (deterministic, always): age-based lifecycle —
active(0–30d unused) →stale(30–90d) →archived(90+d, moved to.archive/, never deleted, recoverable). Pinned skills are exempt. - Phase 2 (LLM, opt-in
consolidate: true): a fork surveys agent-created skills and may patch drift or merge overlapping ones into class-level umbrellas. - Scope guard: the curator only touches skills marked
created_by: "agent"in.usage.json— hand-written and bundled/hub skills are left alone. Every real pass takes a tar.gz backup first (.curator_backups/) and writes alogs/curator/<run>/REPORT.md.
This is a near-perfect mirror of our
/hygiene-check+ ledger + archive/ system — cadence-gated, age-banded, archive-not-delete, backup-before-prune, report-after — but Hermes applies it to machine-authored skills and triggers it on idle time, where ours applies to bookmarks/backburner and triggers on session end / on demand. The convergence is striking and validates our design.
5. Yggdrasil-relevant observations
Pulling the threads together — where Hermes is a useful mirror, and where it diverges:
Strong parallels (independent convergence on the same ideas):
- Layered identity vs. project context — SOUL.md/AGENTS.md ≈ our personal-CLAUDE.md/project-
.claude. - Single scoped home, env-var re-rooted —
HERMES_HOME+ profiles ≈ our layered config +~/.claudewiring; profiles are isolation we achieve with separate repos. - Cadence-gated, archive-not-delete hygiene — the curator ≈
/hygiene-check+ ledger +archive/. - The checkpoint summary template — their compression summary (Goal/Progress/Decisions/Next Steps)
≈ our
## Current Checkpoint. They auto-generate; we hand-author. - Session lineage / handoff versioning —
parent_session_idchains ≈ our handoff-file rename note. - Skill body shape — When-to-Use / Procedure / Pitfalls / Verification ≈ our SKILL.md conventions.
- Read-only-by-default + staged writes —
write_approvalstaging ≈ our prompt-on-write posture and puppet read-only gate.
Where Hermes goes further (candidate ideas, not endorsements):
- A database spine for history (
state.db+ FTS5 +session_search) instead of flat markdown — the agent can full-text-search its own past sessions. Our equivalent is grepping markdown; a searchable history is a different scale. - Automated context assembly with explicit tiers + cache-aware ordering — we assemble context implicitly via what's in CLAUDE.md/current-plan.md; Hermes has a named, ordered, cache-conscious pipeline. The subdirectory-hint-into-tool-result trick (lazy context that doesn't bust the cache) is a genuinely novel pattern.
- Self-authored skills + the background review fork — the agent writing/patching its own skills from conversational feedback. This is the headline capability we don't have; whether we'd want it is a values question (it trades dogfooded human-in-the-loop authorship for autonomy).
Where Yggdrasil deliberately diverges (and why that's fine):
- Hermes is built for autonomy + always-on multi-platform operation; Yggdrasil is built for human-in-the-loop, save-and-resume, low-demand-character work. Many Hermes mechanisms (auto-skill- creation, goals-that-loop-until-done, cron self-direction) sit on the autonomy side of a line we've intentionally drawn. The lesson is in the structures (tiered context, the curator, the checkpoint template, the home-scoping), not the autonomy.
Most directly applicable to the current-plan.md refactor specifically:
- Hermes proves out the idea of splitting one giant living doc into a small always-loaded core +
on-demand detail. Their analog: a tiny cached system prompt + a skills index (names only) +
progressive
skill_viewon demand; a frozen memory snapshot with a visible size gauge; searchable history in a DB rather than inline. Our 133 KBcurrent-plan.mdloaded every session is exactly the anti-pattern their progressive-disclosure + frozen-snapshot + searchable-history design avoids. The refactor could borrow: (a) a small core "checkpoint" that's always loaded, (b) a menu of detail loaded on demand, (c) history moved out of the always-loaded path into a searchable archive.
6. Fidelity & corrections (gap-check pass)
A fresh-eyes subagent independently re-verified every claim above against the cloned source
(hermes_constants.py, hermes_state.py, agent/*.py, and the website/docs/ guides), citing files
and line numbers. The report held up well — the three-tier context model, SOUL.md as slot #1, the
context-file priority order, the subdir-hint-into-tool-result pattern, compression thresholds
(0.50 agent / 0.85 gateway) and protect-first-3/last-20, the [Old tool output cleared…] placeholder,
iterative re-compression, system_and_3 caching, the full SQLite schema (sessions/messages/FTS5/trigram/
session_search/lineage), memory caps + § delimiter + error-on-overflow, the skills layout and body
convention, the skill action-preference order, and the entire curator spec (168h interval / 2h idle /
30d stale / 90d archive / created_by:"agent" scope / backup-before-pass / REPORT.md) all verified
against source.
Corrections already folded into the body above (logged here for the trail):
- Session-split
end_reason— was written as"compression_split"; the source value is plain"compression"(compression_splitdoes not exist; a separate"orphaned_compression"covers an error-recovery case). Fixed in §3.3. - Background-review tool whitelist — was "only two tools (
memory+skill_manage)"; it's actually thememory+skillstoolsets, andskillsincludesskills_list+skill_view, so the fork can read skills too. Fixed in §4. - Version/release trivia — the "v0.9 / Apr 2026 / ~27k stars" web-recall is unverifiable from the clone and contradicted by the source version 0.16.0. Softened/flagged in §1.
- Checkpoints path & triggers — the bare repo is at
checkpoints/store/(notcheckpoints/), and the documented snapshot triggers are destructive terminal commands (rm/mv/sed -i/redirects/ git reset…), not specificallywrite_file/patch. Fixed in §3.6. - Compression summary template — the report's "Progress (Done/In Progress/Blocked) / Next Steps" was a paraphrase; the literal headings (Completed Actions / Active State / In Progress / Blocked / … / Remaining Work) are now used. Fixed in §2.5.
- Minor: the agent-side intercepted-tool list was missing
clarifyandread_terminal(6 total, now listed); the Telegram platform-hint example was backwards ("lean into Markdown," not "short messages"); the "~3k tokens" figure forskills_listwas an unsourced estimate (removed). Fixed in §2.1/2.2/2.6/3.4.
Residual caveats: the SQLite DDL in §3.3 is described in prose (paraphrased, not reproduced verbatim) —
accurate per the gap-check, but if exact column names/types are ever needed, read hermes_state.py:514-570
from source. The ContextEngine ABC base-class defaults (0.75 / 3 / 6) differ from the active
ContextCompressor (0.50 / 3 / 20) cited above — §2.5 reflects the active engine, which is what runs.