Prompt Assembly

Hermes deliberately separates:

cached system prompt state
ephemeral API-call-time additions

This is one of the most important design choices in the project because it affects:

token usage
prompt caching effectiveness
session continuity
memory correctness

Primary files:

run_agent.py
agent/prompt_builder.py
tools/memory_tool.py

Cached system prompt layers

The cached system prompt is assembled in roughly this order:

agent identity — SOUL.md from HERMES_HOME when available, otherwise falls back to DEFAULT_AGENT_IDENTITY in prompt_builder.py
tool-aware behavior guidance
Honcho static block (when active)
optional system message
frozen MEMORY snapshot
frozen USER profile snapshot
skills index
context files (AGENTS.md, .cursorrules, .cursor/rules/*.mdc) — SOUL.md is not included here when it was already loaded as the identity in step 1
timestamp / optional session ID
platform hint

When skip_context_files is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded DEFAULT_AGENT_IDENTITY is used instead.

Concrete example: assembled system prompt

Here is a simplified view of what the final system prompt looks like when all layers are present (comments show the source of each section):

# Layer 1: Agent Identity (from ~/.hermes/SOUL.md)
You are Hermes, an AI assistant created by Nous Research.
You are an expert software engineer and researcher.
You value correctness, clarity, and efficiency.
...

# Layer 2: Tool-aware behavior guidance
You have persistent memory across sessions. Save durable facts using
the memory tool: user preferences, environment details, tool quirks,
and stable conventions. Memory is injected into every turn, so keep
it compact and focused on facts that will still matter later.
...
When the user references something from a past conversation or you
suspect relevant cross-session context exists, use session_search
to recall it before asking them to repeat themselves.

# Tool-use enforcement (for GPT/Codex models only)
You MUST use your tools to take action — do not describe what you
would do or plan to do without actually doing it.
...

# Layer 3: Honcho static block (when active)
[Honcho personality/context data]

# Layer 4: Optional system message (from config or API)
[User-configured system message override]

# Layer 5: Frozen MEMORY snapshot
## Persistent Memory
- User prefers Python 3.12, uses pyproject.toml
- Default editor is nvim
- Working on project "atlas" in ~/code/atlas
- Timezone: US/Pacific

# Layer 6: Frozen USER profile snapshot
## User Profile
- Name: Alice
- GitHub: alice-dev

# Layer 7: Skills index
## Skills (mandatory)
Before replying, scan the skills below. If one clearly matches
your task, load it with skill_view(name) and follow its instructions.
...
<available_skills>
  software-development:
    - code-review: Structured code review workflow
    - test-driven-development: TDD methodology
  research:
    - arxiv: Search and summarize arXiv papers
</available_skills>

# Layer 8: Context files (from project directory)
# Project Context
The following project context files have been loaded and should be followed:

## AGENTS.md
This is the atlas project. Use pytest for testing. The main
entry point is src/atlas/main.py. Always run `make lint` before
committing.

# Layer 9: Timestamp + session
Current time: 2026-03-30T14:30:00-07:00
Session: abc123

# Layer 10: Platform hint
You are a CLI AI Agent. Try not to use markdown but simple text
renderable inside a terminal.

How SOUL.md appears in the prompt

SOUL.md lives at ~/.hermes/SOUL.md and serves as the agent's identity — the very first section of the system prompt. The loading logic in prompt_builder.py works as follows:

# From agent/prompt_builder.py (simplified)
def load_soul_md() -> Optional[str]:
    soul_path = get_hermes_home() / "SOUL.md"
    if not soul_path.exists():
        return None
    content = soul_path.read_text(encoding="utf-8").strip()
    content = _scan_context_content(content, "SOUL.md")  # Security scan
    content = _truncate_content(content, "SOUL.md")       # Cap at 20k chars
    return content

When load_soul_md() returns content, it replaces the hardcoded DEFAULT_AGENT_IDENTITY. The build_context_files_prompt() function is then called with skip_soul=True to prevent SOUL.md from appearing twice (once as identity, once as a context file).

If SOUL.md doesn't exist, the system falls back to:

You are Hermes Agent, an intelligent AI assistant created by Nous Research.
You are helpful, knowledgeable, and direct. You assist users with a wide
range of tasks including answering questions, writing and editing code,
analyzing information, creative work, and executing actions via your tools.
You communicate clearly, admit uncertainty when appropriate, and prioritize
being genuinely useful over being verbose unless otherwise directed below.
Be targeted and efficient in your exploration and investigations.

How context files are injected

build_context_files_prompt() uses a priority system — only one project context type is loaded (first match wins):

# From agent/prompt_builder.py (simplified)
def build_context_files_prompt(cwd=None, skip_soul=False):
    cwd_path = Path(cwd).resolve()

    # Priority: first match wins — only ONE project context loaded
    project_context = (
        _load_hermes_md(cwd_path)       # 1. .hermes.md / HERMES.md (walks to git root)
        or _load_agents_md(cwd_path)    # 2. AGENTS.md (cwd only)
        or _load_claude_md(cwd_path)    # 3. CLAUDE.md (cwd only)
        or _load_cursorrules(cwd_path)  # 4. .cursorrules / .cursor/rules/*.mdc
    )

    sections = []
    if project_context:
        sections.append(project_context)

    # SOUL.md from HERMES_HOME (independent of project context)
    if not skip_soul:
        soul_content = load_soul_md()
        if soul_content:
            sections.append(soul_content)

    if not sections:
        return ""

    return (
        "# Project Context\n\n"
        "The following project context files have been loaded "
        "and should be followed:\n\n"
        + "\n".join(sections)
    )

Context file discovery details

Priority	Files	Search scope	Notes
1	`.hermes.md`, `HERMES.md`	CWD up to git root	Hermes-native project config
2	`AGENTS.md`	CWD only	Common agent instruction file
3	`CLAUDE.md`	CWD only	Claude Code compatibility
4	`.cursorrules`, `.cursor/rules/*.mdc`	CWD only	Cursor compatibility

All context files are:

Security scanned — checked for prompt injection patterns (invisible unicode, "ignore previous instructions", credential exfiltration attempts)
Truncated — capped at 20,000 characters using 70/20 head/tail ratio with a truncation marker
YAML frontmatter stripped — .hermes.md frontmatter is removed (reserved for future config overrides)

API-call-time-only layers

These are intentionally not persisted as part of the cached system prompt:

ephemeral_system_prompt
prefill messages
gateway-derived session context overlays
later-turn Honcho recall injected into the current-turn user message

This separation keeps the stable prefix stable for caching.

Memory snapshots

Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.

Context files

agent/prompt_builder.py scans and sanitizes project context files using a priority system — only one type is loaded (first match wins):

.hermes.md / HERMES.md (walks to git root)
AGENTS.md (CWD at startup; subdirectories discovered progressively during the session via agent/subdirectory_hints.py)
CLAUDE.md (CWD only)
.cursorrules / .cursor/rules/*.mdc (CWD only)

SOUL.md is loaded separately via load_soul_md() for the identity slot. When it loads successfully, build_context_files_prompt(skip_soul=True) prevents it from appearing twice.

Long files are truncated before injection.

Skills index

The skills system contributes a compact skills index to the prompt when skills tooling is available.

Why prompt assembly is split this way

The architecture is intentionally optimized to:

preserve provider-side prompt caching
avoid mutating history unnecessarily
keep memory semantics understandable
let gateway/ACP/CLI add context without poisoning persistent prompt state

Cached system prompt layers​

Concrete example: assembled system prompt​

How SOUL.md appears in the prompt​

How context files are injected​

Context file discovery details​

API-call-time-only layers​

Memory snapshots​

Context files​

Skills index​

Why prompt assembly is split this way​

Related docs​