Skip to main content

Prompt Assembly

Hermes deliberately separates:

  • cached system prompt state
  • ephemeral API-call-time additions

This is one of the most important design choices in the project because it affects:

  • token usage
  • prompt caching effectiveness
  • session continuity
  • memory correctness

Primary files:

  • run_agent.py
  • agent/prompt_builder.py
  • tools/memory_tool.py

Cached system prompt layers

The cached system prompt is assembled in roughly this order:

  1. default agent identity
  2. tool-aware behavior guidance
  3. Honcho static block (when active)
  4. optional system message
  5. frozen MEMORY snapshot
  6. frozen USER profile snapshot
  7. skills index
  8. context files (AGENTS.md, SOUL.md, .cursorrules, .cursor/rules/*.mdc)
  9. timestamp / optional session ID
  10. platform hint

API-call-time-only layers

These are intentionally not persisted as part of the cached system prompt:

  • ephemeral_system_prompt
  • prefill messages
  • gateway-derived session context overlays
  • later-turn Honcho recall injected into the current-turn user message

This separation keeps the stable prefix stable for caching.

Memory snapshots

Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.

Context files

agent/prompt_builder.py scans and sanitizes:

  • AGENTS.md
  • SOUL.md
  • .cursorrules
  • .cursor/rules/*.mdc

Long files are truncated before injection.

Skills index

The skills system contributes a compact skills index to the prompt when skills tooling is available.

Why prompt assembly is split this way

The architecture is intentionally optimized to:

  • preserve provider-side prompt caching
  • avoid mutating history unnecessarily
  • keep memory semantics understandable
  • let gateway/ACP/CLI add context without poisoning persistent prompt state