# Hermes Agent — Full Documentation
This file is the entire Hermes Agent documentation concatenated for LLM context ingestion. Section order reflects docs-site navigation: Getting Started, Using Hermes, Features, Messaging, Integrations, Guides, Developer Guide, Reference, then everything else.
Canonical site: https://hermes-agent.nousresearch.com/docs
Short index: https://hermes-agent.nousresearch.com/docs/llms.txt
---
# Installation
# Installation
Get Hermes Agent up and running in under two minutes with the one-line installer.
## Quick Install
### Linux / macOS / WSL2
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
### Android / Termux
Hermes now ships a Termux-aware installer path too:
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
The installer detects Termux automatically and switches to a tested Android flow:
- uses Termux `pkg` for system dependencies (`git`, `python`, `nodejs`, `ripgrep`, `ffmpeg`, build tools)
- creates the virtualenv with `python -m venv`
- exports `ANDROID_API_LEVEL` automatically for Android wheel builds
- installs a curated `.[termux]` extra with `pip`
- skips the untested browser / WhatsApp bootstrap by default
If you want the fully explicit path, follow the dedicated [Termux guide](./termux.md).
:::warning Windows
Native Windows is **not supported**. Please install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) and run Hermes Agent from there. The install command above works inside WSL2.
:::
### What the Installer Does
The installer handles everything automatically — all dependencies (Python, Node.js, ripgrep, ffmpeg), the repo clone, virtual environment, global `hermes` command setup, and LLM provider configuration. By the end, you're ready to chat.
#### Install Layout
Where the installer puts things depends on whether you're installing as a normal user or as root:
| Installer | Code lives at | `hermes` binary | Data directory |
|---|---|---|---|
| Per-user (normal) | `~/.hermes/hermes-agent/` | `~/.local/bin/hermes` (symlink) | `~/.hermes/` |
| Root-mode (`sudo curl … \| sudo bash`) | `/usr/local/lib/hermes-agent/` | `/usr/local/bin/hermes` | `/root/.hermes/` (or `$HERMES_HOME`) |
The root-mode **FHS layout** (`/usr/local/lib/…`, `/usr/local/bin/hermes`) matches where other system-wide developer tools land on Linux. It's useful for shared-machine deployments where one system install should serve every user. Per-user config (auth, skills, sessions) still lives under each user's `~/.hermes/` or explicit `HERMES_HOME`.
### After Installation
Reload your shell and start chatting:
```bash
source ~/.bashrc # or: source ~/.zshrc
hermes # Start chatting!
```
To reconfigure individual settings later, use the dedicated commands:
```bash
hermes model # Choose your LLM provider and model
hermes tools # Configure which tools are enabled
hermes gateway setup # Set up messaging platforms
hermes config set # Set individual config values
hermes setup # Or run the full setup wizard to configure everything at once
```
---
## Prerequisites
The only prerequisite is **Git**. The installer automatically handles everything else:
- **uv** (fast Python package manager)
- **Python 3.11** (via uv, no sudo needed)
- **Node.js v22** (for browser automation and WhatsApp bridge)
- **ripgrep** (fast file search)
- **ffmpeg** (audio format conversion for TTS)
:::info
You do **not** need to install Python, Node.js, ripgrep, or ffmpeg manually. The installer detects what's missing and installs it for you. Just make sure `git` is available (`git --version`).
:::
:::tip Nix users
If you use Nix (on NixOS, macOS, or Linux), there's a dedicated setup path with a Nix flake, declarative NixOS module, and optional container mode. See the **[Nix & NixOS Setup](./nix-setup.md)** guide.
:::
---
## Manual / Developer Installation
If you want to clone the repo and install from source — for contributing, running from a specific branch, or having full control over the virtual environment — see the [Development Setup](../developer-guide/contributing.md#development-setup) section in the Contributing guide.
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| `hermes: command not found` | Reload your shell (`source ~/.bashrc`) or check PATH |
| `API key not set` | Run `hermes model` to configure your provider, or `hermes config set OPENROUTER_API_KEY your_key` |
| Missing config after update | Run `hermes config check` then `hermes config migrate` |
For more diagnostics, run `hermes doctor` — it will tell you exactly what's missing and how to fix it.
---
# Quickstart
# Quickstart
This guide gets you from zero to a working Hermes setup that survives real use. Install, choose a provider, verify a working chat, and know exactly what to do when something breaks.
## Prefer to watch?
**Onchain AI Garage** put together a Masterclass walkthrough of installation, setup, and basic commands — a good companion to this page if you'd rather follow along on video. For more, see the full [Hermes Agent Tutorials & Use Cases](https://www.youtube.com/channel/UCqB1bhMwGsW-yefBxYwFCCg) playlist.
## Who this is for
- Brand new and want the shortest path to a working setup
- Switching providers and don't want to lose time to config mistakes
- Setting up Hermes for a team, bot, or always-on workflow
- Tired of "it installed, but it still does nothing"
## The fastest path
Pick the row that matches your goal:
| Goal | Do this first | Then do this |
|---|---|---|
| I just want Hermes working on my machine | `hermes setup` | Run a real chat and verify it responds |
| I already know my provider | `hermes model` | Save the config, then start chatting |
| I want a bot or always-on setup | `hermes gateway setup` after CLI works | Connect Telegram, Discord, Slack, or another platform |
| I want a local or self-hosted model | `hermes model` → custom endpoint | Verify the endpoint, model name, and context length |
| I want multi-provider fallback | `hermes model` first | Add routing and fallback only after the base chat works |
**Rule of thumb:** if Hermes cannot complete a normal chat, do not add more features yet. Get one clean conversation working first, then layer on gateway, cron, skills, voice, or routing.
---
## 1. Install Hermes Agent
Run the one-line installer:
```bash
# Linux / macOS / WSL2 / Android (Termux)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
:::tip Android / Termux
If you're installing on a phone, see the dedicated [Termux guide](./termux.md) for the tested manual path, supported extras, and current Android-specific limitations.
:::
:::tip Windows Users
Install [WSL2](https://learn.microsoft.com/en-us/windows/wsl/install) first, then run the command above inside your WSL2 terminal.
:::
After it finishes, reload your shell:
```bash
source ~/.bashrc # or source ~/.zshrc
```
For detailed installation options, prerequisites, and troubleshooting, see the [Installation guide](./installation.md).
## 2. Choose a Provider
The single most important setup step. Use `hermes model` to walk through the choice interactively:
```bash
hermes model
```
Good defaults:
| Provider | What it is | How to set up |
|----------|-----------|---------------|
| **Nous Portal** | Subscription-based, zero-config | OAuth login via `hermes model` |
| **OpenAI Codex** | ChatGPT OAuth, uses Codex models | Device code auth via `hermes model` |
| **Anthropic** | Claude models directly — Max plan + extra usage credits (OAuth), or API key for pay-per-token | `hermes model` → OAuth login (requires Max + extra credits), or an Anthropic API key |
| **OpenRouter** | Multi-provider routing across many models | Enter your API key |
| **Z.AI** | GLM / Zhipu-hosted models | Set `GLM_API_KEY` / `ZAI_API_KEY` |
| **Kimi / Moonshot** | Moonshot-hosted coding and chat models | Set `KIMI_API_KEY` |
| **Kimi / Moonshot China** | China-region Moonshot endpoint | Set `KIMI_CN_API_KEY` |
| **Arcee AI** | Trinity models | Set `ARCEEAI_API_KEY` |
| **GMI Cloud** | Multi-model direct API | Set `GMI_API_KEY` |
| **MiniMax (OAuth)** | MiniMax-M2.7 via browser OAuth — no API key needed | `hermes model` → MiniMax (OAuth) |
| **MiniMax** | International MiniMax endpoint | Set `MINIMAX_API_KEY` |
| **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` |
| **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` |
| **Hugging Face** | 20+ open models via unified router (Qwen, DeepSeek, Kimi, etc.) | Set `HF_TOKEN` |
| **AWS Bedrock** | Claude, Nova, Llama, DeepSeek via native Converse API | IAM role or `aws configure` ([guide](../guides/aws-bedrock.md)) |
| **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
| **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` |
| **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` |
| **DeepSeek** | Direct DeepSeek API access | Set `DEEPSEEK_API_KEY` |
| **NVIDIA NIM** | Nemotron models via build.nvidia.com or local NIM | Set `NVIDIA_API_KEY` (optional: `NVIDIA_BASE_URL`) |
| **GitHub Copilot** | GitHub Copilot subscription (GPT-5.x, Claude, Gemini, etc.) | OAuth via `hermes model`, or `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` |
| **GitHub Copilot ACP** | Copilot ACP agent backend (spawns local `copilot` CLI) | `hermes model` (requires `copilot` CLI + `copilot login`) |
| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
| **Custom Endpoint** | VLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key |
For most first-time users: choose a provider, accept the defaults unless you know why you're changing them. The full provider catalog with env vars and setup steps lives on the [Providers](../integrations/providers.md) page.
:::caution Minimum context: 64K tokens
Hermes Agent requires a model with at least **64,000 tokens** of context. Models with smaller windows cannot maintain enough working memory for multi-step tool-calling workflows and will be rejected at startup. Most hosted models (Claude, GPT, Gemini, Qwen, DeepSeek) meet this easily. If you're running a local model, set its context size to at least 64K (e.g. `--ctx-size 65536` for llama.cpp or `-c 65536` for Ollama).
:::
:::tip
You can switch providers at any time with `hermes model` — no lock-in. For a full list of all supported providers and setup details, see [AI Providers](../integrations/providers.md).
:::
### How settings are stored
Hermes separates secrets from normal config:
- **Secrets and tokens** → `~/.hermes/.env`
- **Non-secret settings** → `~/.hermes/config.yaml`
The easiest way to set values correctly is through the CLI:
```bash
hermes config set model anthropic/claude-opus-4.6
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-...
```
The right value goes to the right file automatically.
## 3. Run Your First Chat
```bash
hermes # classic CLI
hermes --tui # modern TUI (recommended)
```
You'll see a welcome banner with your model, available tools, and skills. Use a prompt that's specific and easy to verify:
:::tip Pick your interface
Hermes ships with two terminal interfaces: the classic `prompt_toolkit` CLI and a newer [TUI](../user-guide/tui.md) with modal overlays, mouse selection, and non-blocking input. Both share the same sessions, slash commands, and config — try each with `hermes` vs `hermes --tui`.
:::
```
Summarize this repo in 5 bullets and tell me what the main entrypoint is.
```
```
Check my current directory and tell me what looks like the main project file.
```
```
Help me set up a clean GitHub PR workflow for this codebase.
```
**What success looks like:**
- The banner shows your chosen model/provider
- Hermes replies without error
- It can use a tool if needed (terminal, file read, web search)
- The conversation continues normally for more than one turn
If that works, you're past the hardest part.
## 4. Verify Sessions Work
Before moving on, make sure resume works:
```bash
hermes --continue # Resume the most recent session
hermes -c # Short form
```
That should bring you back to the session you just had. If it doesn't, check whether you're in the same profile and whether the session actually saved. This matters later when you're juggling multiple setups or machines.
## 5. Try Key Features
### Use the terminal
```
❯ What's my disk usage? Show the top 5 largest directories.
```
The agent runs terminal commands on your behalf and shows results.
### Slash commands
Type `/` to see an autocomplete dropdown of all commands:
| Command | What it does |
|---------|-------------|
| `/help` | Show all available commands |
| `/tools` | List available tools |
| `/model` | Switch models interactively |
| `/personality pirate` | Try a fun personality |
| `/save` | Save the conversation |
### Multi-line input
Press `Alt+Enter` or `Ctrl+J` to add a new line. Great for pasting code or writing detailed prompts.
### Interrupt the agent
If the agent is taking too long, type a new message and press Enter — it interrupts the current task and switches to your new instructions. `Ctrl+C` also works.
## 6. Add the Next Layer
Only after the base chat works. Pick what you need:
### Bot or shared assistant
```bash
hermes gateway setup # Interactive platform configuration
```
Connect [Telegram](/docs/user-guide/messaging/telegram), [Discord](/docs/user-guide/messaging/discord), [Slack](/docs/user-guide/messaging/slack), [WhatsApp](/docs/user-guide/messaging/whatsapp), [Signal](/docs/user-guide/messaging/signal), [Email](/docs/user-guide/messaging/email), or [Home Assistant](/docs/user-guide/messaging/homeassistant), or [Microsoft Teams](/docs/user-guide/messaging/teams).
### Automation and tools
- `hermes tools` — tune tool access per platform
- `hermes skills` — browse and install reusable workflows
- Cron — only after your bot or CLI setup is stable
### Sandboxed terminal
For safety, run the agent in a Docker container or on a remote server:
```bash
hermes config set terminal.backend docker # Docker isolation
hermes config set terminal.backend ssh # Remote server
```
### Voice mode
```bash
pip install "hermes-agent[voice]"
# Includes faster-whisper for free local speech-to-text
```
Then in the CLI: `/voice on`. Press `Ctrl+B` to record. See [Voice Mode](../user-guide/features/voice-mode.md).
### Skills
```bash
hermes skills search kubernetes
hermes skills install openai/skills/k8s
```
Or use `/skills` inside a chat session.
### MCP servers
```yaml
# Add to ~/.hermes/config.yaml
mcp_servers:
github:
command: npx
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxx"
```
### Editor integration (ACP)
```bash
pip install -e '.[acp]'
hermes acp
```
See [ACP Editor Integration](../user-guide/features/acp.md).
---
## Common Failure Modes
These are the problems that waste the most time:
| Symptom | Likely cause | Fix |
|---|---|---|
| Hermes opens but gives empty or broken replies | Provider auth or model selection is wrong | Run `hermes model` again and confirm provider, model, and auth |
| Custom endpoint "works" but returns garbage | Wrong base URL, model name, or not actually OpenAI-compatible | Verify the endpoint in a separate client first |
| Gateway starts but nobody can message it | Bot token, allowlist, or platform setup is incomplete | Re-run `hermes gateway setup` and check `hermes gateway status` |
| `hermes --continue` can't find old session | Switched profiles or session never saved | Check `hermes sessions list` and confirm you're in the right profile |
| Model unavailable or odd fallback behavior | Provider routing or fallback settings are too aggressive | Keep routing off until the base provider is stable |
| `hermes doctor` flags config problems | Config values are missing or stale | Fix the config, retest a plain chat before adding features |
## Recovery Toolkit
When something feels off, use this order:
1. `hermes doctor`
2. `hermes model`
3. `hermes setup`
4. `hermes sessions list`
5. `hermes --continue`
6. `hermes gateway status`
That sequence gets you from "broken vibes" back to a known state fast.
---
## Quick Reference
| Command | Description |
|---------|-------------|
| `hermes` | Start chatting |
| `hermes model` | Choose your LLM provider and model |
| `hermes tools` | Configure which tools are enabled per platform |
| `hermes setup` | Full setup wizard (configures everything at once) |
| `hermes doctor` | Diagnose issues |
| `hermes update` | Update to latest version |
| `hermes gateway` | Start the messaging gateway |
| `hermes --continue` | Resume last session |
## Next Steps
- **[CLI Guide](../user-guide/cli.md)** — Master the terminal interface
- **[Configuration](../user-guide/configuration.md)** — Customize your setup
- **[Messaging Gateway](../user-guide/messaging/index.md)** — Connect Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant, Teams, and more
- **[Tools & Toolsets](../user-guide/features/tools.md)** — Explore available capabilities
- **[AI Providers](../integrations/providers.md)** — Full provider list and setup details
- **[Skills System](../user-guide/features/skills.md)** — Reusable workflows and knowledge
- **[Tips & Best Practices](../guides/tips.md)** — Power user tips
---
# Learning Path
# Learning Path
Hermes Agent can do a lot — CLI assistant, Telegram/Discord bot, task automation, RL training, and more. This page helps you figure out where to start and what to read based on your experience level and what you're trying to accomplish.
:::tip Start Here
If you haven't installed Hermes Agent yet, begin with the [Installation guide](/docs/getting-started/installation) and then run through the [Quickstart](/docs/getting-started/quickstart). Everything below assumes you have a working installation.
:::
## How to Use This Page
- **Know your level?** Jump to the [experience-level table](#by-experience-level) and follow the reading order for your tier.
- **Have a specific goal?** Skip to [By Use Case](#by-use-case) and find the scenario that matches.
- **Just browsing?** Check the [Key Features](#key-features-at-a-glance) table for a quick overview of everything Hermes Agent can do.
## By Experience Level
| Level | Goal | Recommended Reading | Time Estimate |
|---|---|---|---|
| **Beginner** | Get up and running, have basic conversations, use built-in tools | [Installation](/docs/getting-started/installation) → [Quickstart](/docs/getting-started/quickstart) → [CLI Usage](/docs/user-guide/cli) → [Configuration](/docs/user-guide/configuration) | ~1 hour |
| **Intermediate** | Set up messaging bots, use advanced features like memory, cron jobs, and skills | [Sessions](/docs/user-guide/sessions) → [Messaging](/docs/user-guide/messaging) → [Tools](/docs/user-guide/features/tools) → [Skills](/docs/user-guide/features/skills) → [Memory](/docs/user-guide/features/memory) → [Cron](/docs/user-guide/features/cron) | ~2–3 hours |
| **Advanced** | Build custom tools, create skills, train models with RL, contribute to the project | [Architecture](/docs/developer-guide/architecture) → [Adding Tools](/docs/developer-guide/adding-tools) → [Creating Skills](/docs/developer-guide/creating-skills) → [RL Training](/docs/user-guide/features/rl-training) → [Contributing](/docs/developer-guide/contributing) | ~4–6 hours |
## By Use Case
Pick the scenario that matches what you want to do. Each one links you to the relevant docs in the order you should read them.
### "I want a CLI coding assistant"
Use Hermes Agent as an interactive terminal assistant for writing, reviewing, and running code.
1. [Installation](/docs/getting-started/installation)
2. [Quickstart](/docs/getting-started/quickstart)
3. [CLI Usage](/docs/user-guide/cli)
4. [Code Execution](/docs/user-guide/features/code-execution)
5. [Context Files](/docs/user-guide/features/context-files)
6. [Tips & Tricks](/docs/guides/tips)
:::tip
Pass files directly into your conversation with context files. Hermes Agent can read, edit, and run code in your projects.
:::
### "I want a Telegram/Discord bot"
Deploy Hermes Agent as a bot on your favorite messaging platform.
1. [Installation](/docs/getting-started/installation)
2. [Configuration](/docs/user-guide/configuration)
3. [Messaging Overview](/docs/user-guide/messaging)
4. [Telegram Setup](/docs/user-guide/messaging/telegram)
5. [Discord Setup](/docs/user-guide/messaging/discord)
6. [Voice Mode](/docs/user-guide/features/voice-mode)
7. [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
8. [Security](/docs/user-guide/security)
For full project examples, see:
- [Daily Briefing Bot](/docs/guides/daily-briefing-bot)
- [Team Telegram Assistant](/docs/guides/team-telegram-assistant)
### "I want to automate tasks"
Schedule recurring tasks, run batch jobs, or chain agent actions together.
1. [Quickstart](/docs/getting-started/quickstart)
2. [Cron Scheduling](/docs/user-guide/features/cron)
3. [Batch Processing](/docs/user-guide/features/batch-processing)
4. [Delegation](/docs/user-guide/features/delegation)
5. [Hooks](/docs/user-guide/features/hooks)
:::tip
Cron jobs let Hermes Agent run tasks on a schedule — daily summaries, periodic checks, automated reports — without you being present.
:::
### "I want to build custom tools/skills"
Extend Hermes Agent with your own tools and reusable skill packages.
1. [Plugins](/docs/user-guide/features/plugins)
2. [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin)
3. [Tools Overview](/docs/user-guide/features/tools)
4. [Skills Overview](/docs/user-guide/features/skills)
5. [MCP (Model Context Protocol)](/docs/user-guide/features/mcp)
6. [Architecture](/docs/developer-guide/architecture)
7. [Adding Tools](/docs/developer-guide/adding-tools)
8. [Creating Skills](/docs/developer-guide/creating-skills)
:::tip
For most custom tool creation, start with plugins. The [Adding Tools](/docs/developer-guide/adding-tools)
page is for built-in Hermes core development, not the usual user/custom-tool path.
:::
### "I want to train models"
Use reinforcement learning to fine-tune model behavior with Hermes Agent's built-in RL training pipeline.
1. [Quickstart](/docs/getting-started/quickstart)
2. [Configuration](/docs/user-guide/configuration)
3. [RL Training](/docs/user-guide/features/rl-training)
4. [Provider Routing](/docs/user-guide/features/provider-routing)
5. [Architecture](/docs/developer-guide/architecture)
:::tip
RL training works best when you already understand the basics of how Hermes Agent handles conversations and tool calls. Run through the Beginner path first if you're new.
:::
### "I want to use it as a Python library"
Integrate Hermes Agent into your own Python applications programmatically.
1. [Installation](/docs/getting-started/installation)
2. [Quickstart](/docs/getting-started/quickstart)
3. [Python Library Guide](/docs/guides/python-library)
4. [Architecture](/docs/developer-guide/architecture)
5. [Tools](/docs/user-guide/features/tools)
6. [Sessions](/docs/user-guide/sessions)
## Key Features at a Glance
Not sure what's available? Here's a quick directory of major features:
| Feature | What It Does | Link |
|---|---|---|
| **Tools** | Built-in tools the agent can call (file I/O, search, shell, etc.) | [Tools](/docs/user-guide/features/tools) |
| **Skills** | Installable plugin packages that add new capabilities | [Skills](/docs/user-guide/features/skills) |
| **Memory** | Persistent memory across sessions | [Memory](/docs/user-guide/features/memory) |
| **Context Files** | Feed files and directories into conversations | [Context Files](/docs/user-guide/features/context-files) |
| **MCP** | Connect to external tool servers via Model Context Protocol | [MCP](/docs/user-guide/features/mcp) |
| **Cron** | Schedule recurring agent tasks | [Cron](/docs/user-guide/features/cron) |
| **Delegation** | Spawn sub-agents for parallel work | [Delegation](/docs/user-guide/features/delegation) |
| **Code Execution** | Run Python scripts that call Hermes tools programmatically | [Code Execution](/docs/user-guide/features/code-execution) |
| **Browser** | Web browsing and scraping | [Browser](/docs/user-guide/features/browser) |
| **Hooks** | Event-driven callbacks and middleware | [Hooks](/docs/user-guide/features/hooks) |
| **Batch Processing** | Process multiple inputs in bulk | [Batch Processing](/docs/user-guide/features/batch-processing) |
| **RL Training** | Fine-tune models with reinforcement learning | [RL Training](/docs/user-guide/features/rl-training) |
| **Provider Routing** | Route requests across multiple LLM providers | [Provider Routing](/docs/user-guide/features/provider-routing) |
## What to Read Next
Based on where you are right now:
- **Just finished installing?** → Head to the [Quickstart](/docs/getting-started/quickstart) to run your first conversation.
- **Completed the Quickstart?** → Read [CLI Usage](/docs/user-guide/cli) and [Configuration](/docs/user-guide/configuration) to customize your setup.
- **Comfortable with the basics?** → Explore [Tools](/docs/user-guide/features/tools), [Skills](/docs/user-guide/features/skills), and [Memory](/docs/user-guide/features/memory) to unlock the full power of the agent.
- **Setting up for a team?** → Read [Security](/docs/user-guide/security) and [Sessions](/docs/user-guide/sessions) to understand access control and conversation management.
- **Ready to build?** → Jump into the [Developer Guide](/docs/developer-guide/architecture) to understand the internals and start contributing.
- **Want practical examples?** → Check out the [Guides](/docs/guides/tips) section for real-world projects and tips.
:::tip
You don't need to read everything. Pick the path that matches your goal, follow the links in order, and you'll be productive quickly. You can always come back to this page to find your next step.
:::
---
# Updating & Uninstalling
# Updating & Uninstalling
## Updating
Update to the latest version with a single command:
```bash
hermes update
```
This pulls the latest code, updates dependencies, and prompts you to configure any new options that were added since your last update.
:::tip
`hermes update` automatically detects new configuration options and prompts you to add them. If you skipped that prompt, you can manually run `hermes config check` to see missing options, then `hermes config migrate` to interactively add them.
:::
### What happens during an update
When you run `hermes update`, the following steps occur:
1. **Pairing-data snapshot** — a lightweight pre-update state snapshot is saved (covers `~/.hermes/pairing/`, Feishu comment rules, and other state files that get modified at runtime). Rollbackable via `hermes backup restore --state pre-update`.
2. **Git pull** — pulls the latest code from the `main` branch and updates submodules
3. **Dependency install** — runs `uv pip install -e ".[all]"` to pick up new or changed dependencies
4. **Config migration** — detects new config options added since your version and prompts you to set them
5. **Gateway auto-restart** — running gateways are refreshed after the update completes so the new code takes effect immediately. Service-managed gateways (systemd on Linux, launchd on macOS) are restarted through the service manager. Manual gateways are relaunched automatically when Hermes can map the running PID back to a profile.
### Preview-only: `hermes update --check`
Want to know if you're behind `origin/main` before actually pulling? Run `hermes update --check` — it fetches, prints your local commit and the latest remote commit side-by-side, and exits `0` if in sync or `1` if behind. No files are modified, no gateway is restarted. Useful in scripts and cron jobs that gate on "is there an update".
### Full pre-update backup: `--backup`
For high-value profiles (production gateways, shared team installs) you can opt into a full pre-pull backup of `HERMES_HOME` (config, auth, sessions, skills, pairing):
```bash
hermes update --backup
```
Or make it the default for every run:
```yaml
# ~/.hermes/config.yaml
update:
backup: true
```
`--backup` was the always-on behavior in earlier builds, but it was adding minutes to every update on large homes, so it's now opt-in. The lightweight pairing-data snapshot above still runs unconditionally.
Expected output looks like:
```
$ hermes update
Updating Hermes Agent...
📥 Pulling latest code...
Already up to date. (or: Updating abc1234..def5678)
📦 Updating dependencies...
✅ Dependencies updated
🔍 Checking for new config options...
✅ Config is up to date (or: Found 2 new options — running migration...)
🔄 Restarting gateways...
✅ Gateway restarted
✅ Hermes Agent updated successfully!
```
### Recommended Post-Update Validation
`hermes update` handles the main update path, but a quick validation confirms everything landed cleanly:
1. `git status --short` — if the tree is unexpectedly dirty, inspect before continuing
2. `hermes doctor` — checks config, dependencies, and service health
3. `hermes --version` — confirm the version bumped as expected
4. If you use the gateway: `hermes gateway status`
5. If `doctor` reports npm audit issues: run `npm audit fix` in the flagged directory
:::warning Dirty working tree after update
If `git status --short` shows unexpected changes after `hermes update`, stop and inspect them before continuing. This usually means local modifications were reapplied on top of the updated code, or a dependency step refreshed lockfiles.
:::
### If your terminal disconnects mid-update
`hermes update` protects itself against accidental terminal loss:
- The update ignores `SIGHUP`, so closing your SSH session or terminal window no longer kills it mid-install. `pip` and `git` child processes inherit this protection, so the Python environment cannot be left half-installed by a dropped connection.
- All output is mirrored to `~/.hermes/logs/update.log` while the update runs. If your terminal disappears, reconnect and inspect the log to see whether the update finished and whether the gateway restart succeeded:
```bash
tail -f ~/.hermes/logs/update.log
```
- `Ctrl-C` (SIGINT) and system shutdown (SIGTERM) are still honored — those are deliberate cancellations, not accidents.
You no longer need to wrap `hermes update` in `screen` or `tmux` to survive a terminal drop.
### Checking your current version
```bash
hermes version
```
Compare against the latest release at the [GitHub releases page](https://github.com/NousResearch/hermes-agent/releases).
### Updating from Messaging Platforms
You can also update directly from Telegram, Discord, Slack, WhatsApp, or Teams by sending:
```
/update
```
This pulls the latest code, updates dependencies, and restarts running gateways. The bot will briefly go offline during the restart (typically 5–15 seconds) and then resume.
### Manual Update
If you installed manually (not via the quick installer):
```bash
cd /path/to/hermes-agent
export VIRTUAL_ENV="$(pwd)/venv"
# Pull latest code and submodules
git pull origin main
git submodule update --init --recursive
# Reinstall (picks up new dependencies)
uv pip install -e ".[all]"
uv pip install -e "./tinker-atropos"
# Check for new config options
hermes config check
hermes config migrate # Interactively add any missing options
```
### Rollback instructions
If an update introduces a problem, you can roll back to a previous version:
```bash
cd /path/to/hermes-agent
# List recent versions
git log --oneline -10
# Roll back to a specific commit
git checkout
git submodule update --init --recursive
uv pip install -e ".[all]"
# Restart the gateway if running
hermes gateway restart
```
To roll back to a specific release tag:
```bash
git checkout v0.6.0
git submodule update --init --recursive
uv pip install -e ".[all]"
```
:::warning
Rolling back may cause config incompatibilities if new options were added. Run `hermes config check` after rolling back and remove any unrecognized options from `config.yaml` if you encounter errors.
:::
### Note for Nix users
If you installed via Nix flake, updates are managed through the Nix package manager:
```bash
# Update the flake input
nix flake update hermes-agent
# Or rebuild with the latest
nix profile upgrade hermes-agent
```
Nix installations are immutable — rollback is handled by Nix's generation system:
```bash
nix profile rollback
```
See [Nix Setup](./nix-setup.md) for more details.
---
## Uninstalling
```bash
hermes uninstall
```
The uninstaller gives you the option to keep your configuration files (`~/.hermes/`) for a future reinstall.
### Manual Uninstall
```bash
rm -f ~/.local/bin/hermes
rm -rf /path/to/hermes-agent
rm -rf ~/.hermes # Optional — keep if you plan to reinstall
```
:::info
If you installed the gateway as a system service, stop and disable it first:
```bash
hermes gateway stop
# Linux: systemctl --user disable hermes-gateway
# macOS: launchctl remove ai.hermes.gateway
```
:::
---
# Android / Termux
# Hermes on Android with Termux
This is the tested path for running Hermes Agent directly on an Android phone through [Termux](https://termux.dev/).
It gives you a working local CLI on the phone, plus the core extras that are currently known to install cleanly on Android.
## What is supported in the tested path?
The tested Termux bundle installs:
- the Hermes CLI
- cron support
- PTY/background terminal support
- Telegram gateway support (manual / best-effort background runs)
- MCP support
- Honcho memory support
- ACP support
Concretely, it maps to:
```bash
python -m pip install -e '.[termux]' -c constraints-termux.txt
```
## What is not part of the tested path yet?
A few features still need desktop/server-style dependencies that are not published for Android, or have not been validated on phones yet:
- `.[all]` is not supported on Android today
- the `voice` extra is blocked by `faster-whisper -> ctranslate2`, and `ctranslate2` does not publish Android wheels
- automatic browser / Playwright bootstrap is skipped in the Termux installer
- Docker-based terminal isolation is not available inside Termux
- Android may still suspend Termux background jobs, so gateway persistence is best-effort rather than a normal managed service
That does not stop Hermes from working well as a phone-native CLI agent — it just means the recommended mobile install is intentionally narrower than the desktop/server install.
---
## Option 1: One-line installer
Hermes now ships a Termux-aware installer path:
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
On Termux, the installer automatically:
- uses `pkg` for system packages
- creates the venv with `python -m venv`
- installs `.[termux]` with `pip`
- links `hermes` into `$PREFIX/bin` so it stays on your Termux PATH
- skips the untested browser / WhatsApp bootstrap
If you want the explicit commands or need to debug a failed install, use the manual path below.
---
## Option 2: Manual install (fully explicit)
### 1. Update Termux and install system packages
```bash
pkg update
pkg install -y git python clang rust make pkg-config libffi openssl nodejs ripgrep ffmpeg
```
Why these packages?
- `python` — runtime + venv support
- `git` — clone/update the repo
- `clang`, `rust`, `make`, `pkg-config`, `libffi`, `openssl` — needed to build a few Python dependencies on Android
- `nodejs` — optional Node runtime for experiments beyond the tested core path
- `ripgrep` — fast file search
- `ffmpeg` — media / TTS conversions
### 2. Clone Hermes
```bash
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
```
If you already cloned without submodules:
```bash
git submodule update --init --recursive
```
### 3. Create a virtual environment
```bash
python -m venv venv
source venv/bin/activate
export ANDROID_API_LEVEL="$(getprop ro.build.version.sdk)"
python -m pip install --upgrade pip setuptools wheel
```
`ANDROID_API_LEVEL` is important for Rust / maturin-based packages such as `jiter`.
### 4. Install the tested Termux bundle
```bash
python -m pip install -e '.[termux]' -c constraints-termux.txt
```
If you only want the minimal core agent, this also works:
```bash
python -m pip install -e '.' -c constraints-termux.txt
```
### 5. Put `hermes` on your Termux PATH
```bash
ln -sf "$PWD/venv/bin/hermes" "$PREFIX/bin/hermes"
```
`$PREFIX/bin` is already on PATH in Termux, so this makes the `hermes` command persist across new shells without re-activating the venv every time.
### 6. Verify the install
```bash
hermes version
hermes doctor
```
### 7. Start Hermes
```bash
hermes
```
---
## Recommended follow-up setup
### Configure a model
```bash
hermes model
```
Or set keys directly in `~/.hermes/.env`.
### Re-run the full interactive setup wizard later
```bash
hermes setup
```
### Install optional Node dependencies manually
The tested Termux path skips Node/browser bootstrap on purpose. If you want to experiment with browser tooling later:
```bash
pkg install nodejs-lts
npm install
```
The browser tool automatically includes Termux directories (`/data/data/com.termux/files/usr/bin`) in its PATH search, so `agent-browser` and `npx` are discovered without any extra PATH configuration.
Treat browser / WhatsApp tooling on Android as experimental until documented otherwise.
---
## Troubleshooting
### `No solution found` when installing `.[all]`
Use the tested Termux bundle instead:
```bash
python -m pip install -e '.[termux]' -c constraints-termux.txt
```
The blocker is currently the `voice` extra:
- `voice` pulls `faster-whisper`
- `faster-whisper` depends on `ctranslate2`
- `ctranslate2` does not publish Android wheels
### `uv pip install` fails on Android
Use the Termux path with the stdlib venv + `pip` instead:
```bash
python -m venv venv
source venv/bin/activate
export ANDROID_API_LEVEL="$(getprop ro.build.version.sdk)"
python -m pip install --upgrade pip setuptools wheel
python -m pip install -e '.[termux]' -c constraints-termux.txt
```
### `jiter` / `maturin` complains about `ANDROID_API_LEVEL`
Set the API level explicitly before installing:
```bash
export ANDROID_API_LEVEL="$(getprop ro.build.version.sdk)"
python -m pip install -e '.[termux]' -c constraints-termux.txt
```
### `hermes doctor` says ripgrep or Node is missing
Install them with Termux packages:
```bash
pkg install ripgrep nodejs
```
### Build failures while installing Python packages
Make sure the build toolchain is installed:
```bash
pkg install clang rust make pkg-config libffi openssl
```
Then retry:
```bash
python -m pip install -e '.[termux]' -c constraints-termux.txt
```
---
## Known limitations on phones
- Docker backend is unavailable
- local voice transcription via `faster-whisper` is unavailable in the tested path
- browser automation setup is intentionally skipped by the installer
- some optional extras may work, but only `.[termux]` is currently documented as the tested Android bundle
If you hit a new Android-specific issue, please open a GitHub issue with:
- your Android version
- `termux-info`
- `python --version`
- `hermes doctor`
- the exact install command and full error output
---
# Nix & NixOS Setup
# Nix & NixOS Setup
Hermes Agent ships a Nix flake with three levels of integration:
| Level | Who it's for | What you get |
|-------|-------------|--------------|
| **`nix run` / `nix profile install`** | Any Nix user (macOS, Linux) | Pre-built binary with all deps — then use the standard CLI workflow |
| **NixOS module (native)** | NixOS server deployments | Declarative config, hardened systemd service, managed secrets |
| **NixOS module (container)** | Agents that need self-modification | Everything above, plus a persistent Ubuntu container where the agent can `apt`/`pip`/`npm install` |
:::info What's different from the standard install
The `curl | bash` installer manages Python, Node, and dependencies itself. The Nix flake replaces all of that — every Python dependency is a Nix derivation built by [uv2nix](https://github.com/pyproject-nix/uv2nix), and runtime tools (Node.js, git, ripgrep, ffmpeg) are wrapped into the binary's PATH. There is no runtime pip, no venv activation, no `npm install`.
**For non-NixOS users**, this only changes the install step. Everything after (`hermes setup`, `hermes gateway install`, config editing) works identically to the standard install.
**For NixOS module users**, the entire lifecycle is different: configuration lives in `configuration.nix`, secrets go through sops-nix/agenix, the service is a systemd unit, and CLI config commands are blocked. You manage hermes the same way you manage any other NixOS service.
:::
## Prerequisites
- **Nix with flakes enabled** — [Determinate Nix](https://install.determinate.systems) recommended (enables flakes by default)
- **API keys** for the services you want to use (at minimum: an OpenRouter or Anthropic key)
---
## Quick Start (Any Nix User)
No clone needed. Nix fetches, builds, and runs everything:
```bash
# Run directly (builds on first use, cached after)
nix run github:NousResearch/hermes-agent -- setup
nix run github:NousResearch/hermes-agent -- chat
# Or install persistently
nix profile install github:NousResearch/hermes-agent
hermes setup
hermes chat
```
After `nix profile install`, `hermes`, `hermes-agent`, and `hermes-acp` are on your PATH. From here, the workflow is identical to the [standard installation](./installation.md) — `hermes setup` walks you through provider selection, `hermes gateway install` sets up a launchd (macOS) or systemd user service, and config lives in `~/.hermes/`.
Building from a local clone
```bash
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
nix build
./result/bin/hermes setup
```
---
## NixOS Module
The flake exports `nixosModules.default` — a full NixOS service module that declaratively manages user creation, directories, config generation, secrets, documents, and service lifecycle.
:::note
This module requires NixOS. For non-NixOS systems (macOS, other Linux distros), use `nix profile install` and the standard CLI workflow above.
:::
### Add the Flake Input
```nix
# /etc/nixos/flake.nix (or your system flake)
{
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
hermes-agent.url = "github:NousResearch/hermes-agent";
};
outputs = { nixpkgs, hermes-agent, ... }: {
nixosConfigurations.your-host = nixpkgs.lib.nixosSystem {
system = "x86_64-linux";
modules = [
hermes-agent.nixosModules.default
./configuration.nix
];
};
};
}
```
### Minimal Configuration
```nix
# configuration.nix
{ config, ... }: {
services.hermes-agent = {
enable = true;
settings.model.default = "anthropic/claude-sonnet-4";
environmentFiles = [ config.sops.secrets."hermes-env".path ];
addToSystemPackages = true;
};
}
```
That's it. `nixos-rebuild switch` creates the `hermes` user, generates `config.yaml`, wires up secrets, and starts the gateway — a long-running service that connects the agent to messaging platforms (Telegram, Discord, etc.) and listens for incoming messages.
:::warning Secrets are required
The `environmentFiles` line above assumes you have [sops-nix](https://github.com/Mic92/sops-nix) or [agenix](https://github.com/ryantm/agenix) configured. The file should contain at least one LLM provider key (e.g., `OPENROUTER_API_KEY=sk-or-...`). See [Secrets Management](#secrets-management) for full setup. If you don't have a secrets manager yet, you can use a plain file as a starting point — just ensure it's not world-readable:
```bash
echo "OPENROUTER_API_KEY=sk-or-your-key" | sudo install -m 0600 -o hermes /dev/stdin /var/lib/hermes/env
```
```nix
services.hermes-agent.environmentFiles = [ "/var/lib/hermes/env" ];
```
:::
:::tip addToSystemPackages
Setting `addToSystemPackages = true` does two things: puts the `hermes` CLI on your system PATH **and** sets `HERMES_HOME` system-wide so the interactive CLI shares state (sessions, skills, cron) with the gateway service. Without it, running `hermes` in your shell creates a separate `~/.hermes/` directory.
:::
### Container-aware CLI
:::info
When `container.enable = true` and `addToSystemPackages = true`, **every** `hermes` command on the host automatically routes into the managed container. This means your interactive CLI session runs inside the same environment as the gateway service — with access to all container-installed packages and tools.
- The routing is transparent: `hermes chat`, `hermes sessions list`, `hermes version`, etc. all exec into the container under the hood
- All CLI flags are forwarded as-is
- If the container isn't running, the CLI retries briefly (5s with a spinner for interactive use, 10s silently for scripts) then fails with a clear error — no silent fallback
- For developers working on the hermes codebase, set `HERMES_DEV=1` to bypass container routing and run the local checkout directly
Set `container.hostUsers` to create a `~/.hermes` symlink to the service state directory, so the host CLI and the container share sessions, config, and memories:
```nix
services.hermes-agent = {
container.enable = true;
container.hostUsers = [ "your-username" ];
addToSystemPackages = true;
};
```
Users listed in `hostUsers` are automatically added to the `hermes` group for file permission access.
**Podman users:** The NixOS service runs the container as root. Docker users get access via the `docker` group socket, but Podman's rootful containers require sudo. Grant passwordless sudo for your container runtime:
```nix
security.sudo.extraRules = [{
users = [ "your-username" ];
commands = [{
command = "/run/current-system/sw/bin/podman";
options = [ "NOPASSWD" ];
}];
}];
```
The CLI auto-detects when sudo is needed and uses it transparently. Without this, you'll need to run `sudo hermes chat` manually.
:::
### Verify It Works
After `nixos-rebuild switch`, check that the service is running:
```bash
# Check service status
systemctl status hermes-agent
# Watch logs (Ctrl+C to stop)
journalctl -u hermes-agent -f
# If addToSystemPackages is true, test the CLI
hermes version
hermes config # shows the generated config
```
### Choosing a Deployment Mode
The module supports two modes, controlled by `container.enable`:
| | **Native** (default) | **Container** |
|---|---|---|
| How it runs | Hardened systemd service on the host | Persistent Ubuntu container with `/nix/store` bind-mounted |
| Security | `NoNewPrivileges`, `ProtectSystem=strict`, `PrivateTmp` | Container isolation, runs as unprivileged user inside |
| Agent can self-install packages | No — only tools on the Nix-provided PATH | Yes — `apt`, `pip`, `npm` installs persist across restarts |
| Config surface | Same | Same |
| When to choose | Standard deployments, maximum security, reproducibility | Agent needs runtime package installation, mutable environment, experimental tools |
To enable container mode, add one line:
```nix
{
services.hermes-agent = {
enable = true;
container.enable = true;
# ... rest of config is identical
};
}
```
:::info
Container mode auto-enables `virtualisation.docker.enable` via `mkDefault`. If you use Podman instead, set `container.backend = "podman"` and `virtualisation.docker.enable = false`.
:::
---
## Configuration
### Declarative Settings
The `settings` option accepts an arbitrary attrset that is rendered as `config.yaml`. It supports deep merging across multiple module definitions (via `lib.recursiveUpdate`), so you can split config across files:
```nix
# base.nix
services.hermes-agent.settings = {
model.default = "anthropic/claude-sonnet-4";
toolsets = [ "all" ];
terminal = { backend = "local"; timeout = 180; };
};
# personality.nix
services.hermes-agent.settings = {
display = { compact = false; personality = "kawaii"; };
memory = { memory_enabled = true; user_profile_enabled = true; };
};
```
Both are deep-merged at evaluation time. Nix-declared keys always win over keys in an existing `config.yaml` on disk, but **user-added keys that Nix doesn't touch are preserved**. This means if the agent or a manual edit adds keys like `skills.disabled` or `streaming.enabled`, they survive `nixos-rebuild switch`.
:::note Model naming
`settings.model.default` uses the model identifier your provider expects. With [OpenRouter](https://openrouter.ai) (the default), these look like `"anthropic/claude-sonnet-4"` or `"google/gemini-3-flash"`. If you're using a provider directly (Anthropic, OpenAI), set `settings.model.base_url` to point at their API and use their native model IDs (e.g., `"claude-sonnet-4-20250514"`). When no `base_url` is set, Hermes defaults to OpenRouter.
:::
:::tip Discovering available config keys
Run `nix build .#configKeys && cat result` to see every leaf config key extracted from Python's `DEFAULT_CONFIG`. You can paste your existing `config.yaml` into the `settings` attrset — the structure maps 1:1.
:::
Full example: all commonly customized settings
```nix
{ config, ... }: {
services.hermes-agent = {
enable = true;
container.enable = true;
# ── Model ──────────────────────────────────────────────────────────
settings = {
model = {
base_url = "https://openrouter.ai/api/v1";
default = "anthropic/claude-opus-4.6";
};
toolsets = [ "all" ];
max_turns = 100;
terminal = { backend = "local"; cwd = "."; timeout = 180; };
compression = {
enabled = true;
threshold = 0.85;
summary_model = "google/gemini-3-flash-preview";
};
memory = { memory_enabled = true; user_profile_enabled = true; };
display = { compact = false; personality = "kawaii"; };
agent = { max_turns = 60; verbose = false; };
};
# ── Secrets ────────────────────────────────────────────────────────
environmentFiles = [ config.sops.secrets."hermes-env".path ];
# ── Documents ──────────────────────────────────────────────────────
documents = {
"USER.md" = ./documents/USER.md;
};
# ── MCP Servers ────────────────────────────────────────────────────
mcpServers.filesystem = {
command = "npx";
args = [ "-y" "@modelcontextprotocol/server-filesystem" "/data/workspace" ];
};
# ── Container options ──────────────────────────────────────────────
container = {
image = "ubuntu:24.04";
backend = "docker";
hostUsers = [ "your-username" ];
extraVolumes = [ "/home/user/projects:/projects:rw" ];
extraOptions = [ "--gpus" "all" ];
};
# ── Service tuning ─────────────────────────────────────────────────
addToSystemPackages = true;
extraArgs = [ "--verbose" ];
restart = "always";
restartSec = 5;
};
}
```
### Escape Hatch: Bring Your Own Config
If you'd rather manage `config.yaml` entirely outside Nix, use `configFile`:
```nix
services.hermes-agent.configFile = /etc/hermes/config.yaml;
```
This bypasses `settings` entirely — no merge, no generation. The file is copied as-is to `$HERMES_HOME/config.yaml` on each activation.
### Customization Cheatsheet
Quick reference for the most common things Nix users want to customize:
| I want to... | Option | Example |
|---|---|---|
| Change the LLM model | `settings.model.default` | `"anthropic/claude-sonnet-4"` |
| Use a different provider endpoint | `settings.model.base_url` | `"https://openrouter.ai/api/v1"` |
| Add API keys | `environmentFiles` | `[ config.sops.secrets."hermes-env".path ]` |
| Give the agent a personality | `${services.hermes-agent.stateDir}/.hermes/SOUL.md` | manage the file directly |
| Add MCP tool servers | `mcpServers.` | See [MCP Servers](#mcp-servers) |
| Mount host directories into container | `container.extraVolumes` | `[ "/data:/data:rw" ]` |
| Pass GPU access to container | `container.extraOptions` | `[ "--gpus" "all" ]` |
| Use Podman instead of Docker | `container.backend` | `"podman"` |
| Share state between host CLI and container | `container.hostUsers` | `[ "sidbin" ]` |
| Make extra tools available to the agent | `extraPackages` | `[ pkgs.pandoc pkgs.imagemagick ]` |
| Use a custom base image | `container.image` | `"ubuntu:24.04"` |
| Override the hermes package | `package` | `inputs.hermes-agent.packages.${system}.default.override { ... }` |
| Change state directory | `stateDir` | `"/opt/hermes"` |
| Set the agent's working directory | `workingDirectory` | `"/home/user/projects"` |
---
## Secrets Management
:::danger Never put API keys in `settings` or `environment`
Values in Nix expressions end up in `/nix/store`, which is world-readable. Always use `environmentFiles` with a secrets manager.
:::
Both `environment` (non-secret vars) and `environmentFiles` (secret files) are merged into `$HERMES_HOME/.env` at activation time (`nixos-rebuild switch`). Hermes reads this file on every startup, so changes take effect with a `systemctl restart hermes-agent` — no container recreation needed.
### sops-nix
```nix
{
sops = {
defaultSopsFile = ./secrets/hermes.yaml;
age.keyFile = "/home/user/.config/sops/age/keys.txt";
secrets."hermes-env" = { format = "yaml"; };
};
services.hermes-agent.environmentFiles = [
config.sops.secrets."hermes-env".path
];
}
```
The secrets file contains key-value pairs:
```yaml
# secrets/hermes.yaml (encrypted with sops)
hermes-env: |
OPENROUTER_API_KEY=sk-or-...
TELEGRAM_BOT_TOKEN=123456:ABC...
ANTHROPIC_API_KEY=sk-ant-...
```
### agenix
```nix
{
age.secrets.hermes-env.file = ./secrets/hermes-env.age;
services.hermes-agent.environmentFiles = [
config.age.secrets.hermes-env.path
];
}
```
### OAuth / Auth Seeding
For platforms requiring OAuth (e.g., Discord), use `authFile` to seed credentials on first deploy:
```nix
{
services.hermes-agent = {
authFile = config.sops.secrets."hermes/auth.json".path;
# authFileForceOverwrite = true; # overwrite on every activation
};
}
```
The file is only copied if `auth.json` doesn't already exist (unless `authFileForceOverwrite = true`). Runtime OAuth token refreshes are written to the state directory and preserved across rebuilds.
---
## Documents
The `documents` option installs files into the agent's working directory (the `workingDirectory`, which the agent reads as its workspace). Hermes looks for specific filenames by convention:
- **`USER.md`** — context about the user the agent is interacting with.
- Any other files you place here are visible to the agent as workspace files.
The agent identity file is separate: Hermes loads its primary `SOUL.md` from `$HERMES_HOME/SOUL.md`, which in the NixOS module is `${services.hermes-agent.stateDir}/.hermes/SOUL.md`. Putting `SOUL.md` in `documents` only creates a workspace file and will not replace the main persona file.
```nix
{
services.hermes-agent.documents = {
"USER.md" = ./documents/USER.md; # path reference, copied from Nix store
};
}
```
Values can be inline strings or path references. Files are installed on every `nixos-rebuild switch`.
---
## MCP Servers
The `mcpServers` option declaratively configures [MCP (Model Context Protocol)](https://modelcontextprotocol.io) servers. Each server uses either **stdio** (local command) or **HTTP** (remote URL) transport.
### Stdio Transport (Local Servers)
```nix
{
services.hermes-agent.mcpServers = {
filesystem = {
command = "npx";
args = [ "-y" "@modelcontextprotocol/server-filesystem" "/data/workspace" ];
};
github = {
command = "npx";
args = [ "-y" "@modelcontextprotocol/server-github" ];
env.GITHUB_PERSONAL_ACCESS_TOKEN = "\${GITHUB_TOKEN}"; # resolved from .env
};
};
}
```
:::tip
Environment variables in `env` values are resolved from `$HERMES_HOME/.env` at runtime. Use `environmentFiles` to inject secrets — never put tokens directly in Nix config.
:::
### HTTP Transport (Remote Servers)
```nix
{
services.hermes-agent.mcpServers.remote-api = {
url = "https://mcp.example.com/v1/mcp";
headers.Authorization = "Bearer \${MCP_REMOTE_API_KEY}";
timeout = 180;
};
}
```
### HTTP Transport with OAuth
Set `auth = "oauth"` for servers using OAuth 2.1. Hermes implements the full PKCE flow — metadata discovery, dynamic client registration, token exchange, and automatic refresh.
```nix
{
services.hermes-agent.mcpServers.my-oauth-server = {
url = "https://mcp.example.com/mcp";
auth = "oauth";
};
}
```
Tokens are stored in `$HERMES_HOME/mcp-tokens/.json` and persist across restarts and rebuilds.
Initial OAuth authorization on headless servers
The first OAuth authorization requires a browser-based consent flow. In a headless deployment, Hermes prints the authorization URL to stdout/logs instead of opening a browser.
**Option A: Interactive bootstrap** — run the flow once via `docker exec` (container) or `sudo -u hermes` (native):
```bash
# Container mode
docker exec -it hermes-agent \
hermes mcp add my-oauth-server --url https://mcp.example.com/mcp --auth oauth
# Native mode
sudo -u hermes HERMES_HOME=/var/lib/hermes/.hermes \
hermes mcp add my-oauth-server --url https://mcp.example.com/mcp --auth oauth
```
The container uses `--network=host`, so the OAuth callback listener on `127.0.0.1` is reachable from the host browser.
**Option B: Pre-seed tokens** — complete the flow on a workstation, then copy tokens:
```bash
hermes mcp add my-oauth-server --url https://mcp.example.com/mcp --auth oauth
scp ~/.hermes/mcp-tokens/my-oauth-server{,.client}.json \
server:/var/lib/hermes/.hermes/mcp-tokens/
# Ensure: chown hermes:hermes, chmod 0600
```
### Sampling (Server-Initiated LLM Requests)
Some MCP servers can request LLM completions from the agent:
```nix
{
services.hermes-agent.mcpServers.analysis = {
command = "npx";
args = [ "-y" "analysis-server" ];
sampling = {
enabled = true;
model = "google/gemini-3-flash";
max_tokens_cap = 4096;
timeout = 30;
max_rpm = 10;
};
};
}
```
---
## Managed Mode
When hermes runs via the NixOS module, the following CLI commands are **blocked** with a descriptive error pointing you to `configuration.nix`:
| Blocked command | Why |
|---|---|
| `hermes setup` | Config is declarative — edit `settings` in your Nix config |
| `hermes config edit` | Config is generated from `settings` |
| `hermes config set ` | Config is generated from `settings` |
| `hermes gateway install` | The systemd service is managed by NixOS |
| `hermes gateway uninstall` | The systemd service is managed by NixOS |
This prevents drift between what Nix declares and what's on disk. Detection uses two signals:
1. **`HERMES_MANAGED=true`** environment variable — set by the systemd service, visible to the gateway process
2. **`.managed` marker file** in `HERMES_HOME` — set by the activation script, visible to interactive shells (e.g., `docker exec -it hermes-agent hermes config set ...` is also blocked)
To change configuration, edit your Nix config and run `sudo nixos-rebuild switch`.
---
## Container Architecture
:::info
This section is only relevant if you're using `container.enable = true`. Skip it for native mode deployments.
:::
When container mode is enabled, hermes runs inside a persistent Ubuntu container with the Nix-built binary bind-mounted read-only from the host:
```
Host Container
──── ─────────
/nix/store/...-hermes-agent-0.1.0 ──► /nix/store/... (ro)
~/.hermes -> /var/lib/hermes/.hermes (symlink bridge, per hostUsers)
/var/lib/hermes/ ──► /data/ (rw)
├── current-package -> /nix/store/... (symlink, updated each rebuild)
├── .gc-root -> /nix/store/... (prevents nix-collect-garbage)
├── .container-identity (sha256 hash, triggers recreation)
├── .hermes/ (HERMES_HOME)
│ ├── .env (merged from environment + environmentFiles)
│ ├── config.yaml (Nix-generated, deep-merged by activation)
│ ├── .managed (marker file)
│ ├── .container-mode (routing metadata: backend, exec_user, etc.)
│ ├── state.db, sessions/, memories/ (runtime state)
│ └── mcp-tokens/ (OAuth tokens for MCP servers)
├── home/ ──► /home/hermes (rw)
└── workspace/ (MESSAGING_CWD)
├── SOUL.md (from documents option)
└── (agent-created files)
Container writable layer (apt/pip/npm): /usr, /usr/local, /tmp
```
The Nix-built binary works inside the Ubuntu container because `/nix/store` is bind-mounted — it brings its own interpreter and all dependencies, so there's no reliance on the container's system libraries. The container entrypoint resolves through a `current-package` symlink: `/data/current-package/bin/hermes gateway run --replace`. On `nixos-rebuild switch`, only the symlink is updated — the container keeps running.
### What Persists Across What
| Event | Container recreated? | `/data` (state) | `/home/hermes` | Writable layer (`apt`/`pip`/`npm`) |
|---|---|---|---|---|
| `systemctl restart hermes-agent` | No | Persists | Persists | Persists |
| `nixos-rebuild switch` (code change) | No (symlink updated) | Persists | Persists | Persists |
| Host reboot | No | Persists | Persists | Persists |
| `nix-collect-garbage` | No (GC root) | Persists | Persists | Persists |
| Image change (`container.image`) | **Yes** | Persists | Persists | **Lost** |
| Volume/options change | **Yes** | Persists | Persists | **Lost** |
| `environment`/`environmentFiles` change | No | Persists | Persists | Persists |
The container is only recreated when its **identity hash** changes. The hash covers: schema version, image, `extraVolumes`, `extraOptions`, and the entrypoint script. Changes to environment variables, settings, documents, or the hermes package itself do **not** trigger recreation.
:::warning Writable layer loss
When the identity hash changes (image upgrade, new volumes, new container options), the container is destroyed and recreated from a fresh pull of `container.image`. Any `apt install`, `pip install`, or `npm install` packages in the writable layer are lost. State in `/data` and `/home/hermes` is preserved (these are bind mounts).
If the agent relies on specific packages, consider baking them into a custom image (`container.image = "my-registry/hermes-base:latest"`) or scripting their installation in the agent's SOUL.md.
:::
### GC Root Protection
The `preStart` script creates a GC root at `${stateDir}/.gc-root` pointing to the current hermes package. This prevents `nix-collect-garbage` from removing the running binary. If the GC root somehow breaks, restarting the service recreates it.
---
## Plugins
The NixOS module supports declarative plugin installation — no imperative `hermes plugins install` needed.
### Directory Plugins (`extraPlugins`)
For plugins that are just a source tree with `plugin.yaml` + `__init__.py` (e.g., [hermes-lcm](https://github.com/stephenschoettler/hermes-lcm)):
```nix
services.hermes-agent.extraPlugins = [
(pkgs.fetchFromGitHub {
owner = "stephenschoettler";
repo = "hermes-lcm";
rev = "v0.7.0";
hash = "sha256-...";
})
];
```
Plugins are symlinked into `$HERMES_HOME/plugins/` at activation time. Hermes discovers them via its normal directory scan. Removing a plugin from the list and running `nixos-rebuild switch` removes the symlink.
### Entry-Point Plugins (`extraPythonPackages`)
For pip-packaged plugins that register via `[project.entry-points."hermes_agent.plugins"]` (e.g., [rtk-hermes](https://github.com/ogallotti/rtk-hermes)):
```nix
services.hermes-agent.extraPythonPackages = [
(pkgs.python312Packages.buildPythonPackage {
pname = "rtk-hermes";
version = "1.0.0";
src = pkgs.fetchFromGitHub {
owner = "ogallotti";
repo = "rtk-hermes";
rev = "v1.0.0";
hash = "sha256-...";
};
format = "pyproject";
build-system = [ pkgs.python312Packages.setuptools ];
})
];
```
The package's `site-packages` is added to PYTHONPATH in the hermes wrapper. `importlib.metadata` discovers the entry point at session start.
### Combining Both
A directory plugin with third-party Python dependencies needs both options:
```nix
services.hermes-agent = {
extraPlugins = [ my-plugin-src ]; # plugin source
extraPythonPackages = [ pkgs.python312Packages.redis ]; # its Python dep
extraPackages = [ pkgs.redis ]; # system binary it needs
};
```
### Using the Overlay
External flakes can override the package directly:
```nix
{
inputs.hermes-agent.url = "github:NousResearch/hermes-agent";
outputs = { hermes-agent, nixpkgs, ... }: {
nixpkgs.overlays = [ hermes-agent.overlays.default ];
# Then: pkgs.hermes-agent.override { extraPythonPackages = [...]; }
};
}
```
### Plugin Configuration
Plugins still need to be enabled in `config.yaml`. Add them via the declarative settings:
```nix
services.hermes-agent.settings.plugins.enabled = [
"hermes-lcm"
"rtk-rewrite"
];
```
:::note
A build-time collision check prevents plugin packages from shadowing core hermes dependencies. If a plugin provides a package already in the sealed venv, `nixos-rebuild` fails with a clear error.
:::
---
## Development
### Dev Shell
The flake provides a development shell with Python 3.11, uv, Node.js, and all runtime tools:
```bash
cd hermes-agent
nix develop
# Shell provides:
# - Python 3.11 + uv (deps installed into .venv on first entry)
# - Node.js 20, ripgrep, git, openssh, ffmpeg on PATH
# - Stamp-file optimization: re-entry is near-instant if deps haven't changed
hermes setup
hermes chat
```
### direnv (Recommended)
The included `.envrc` activates the dev shell automatically:
```bash
cd hermes-agent
direnv allow # one-time
# Subsequent entries are near-instant (stamp file skips dep install)
```
### Flake Checks
The flake includes build-time verification that runs in CI and locally:
```bash
# Run all checks
nix flake check
# Individual checks
nix build .#checks.x86_64-linux.package-contents # binaries exist + version
nix build .#checks.x86_64-linux.entry-points-sync # pyproject.toml ↔ Nix package sync
nix build .#checks.x86_64-linux.cli-commands # gateway/config subcommands
nix build .#checks.x86_64-linux.managed-guard # HERMES_MANAGED blocks mutation
nix build .#checks.x86_64-linux.bundled-skills # skills present in package
nix build .#checks.x86_64-linux.config-roundtrip # merge script preserves user keys
```
What each check verifies
| Check | What it tests |
|---|---|
| `package-contents` | `hermes` and `hermes-agent` binaries exist and `hermes version` runs |
| `entry-points-sync` | Every `[project.scripts]` entry in `pyproject.toml` has a wrapped binary in the Nix package |
| `cli-commands` | `hermes --help` exposes `gateway` and `config` subcommands |
| `managed-guard` | `HERMES_MANAGED=true hermes config set ...` prints the NixOS error |
| `bundled-skills` | Skills directory exists, contains SKILL.md files, `HERMES_BUNDLED_SKILLS` is set in wrapper |
| `config-roundtrip` | 7 merge scenarios: fresh install, Nix override, user key preservation, mixed merge, MCP additive merge, nested deep merge, idempotency |
---
## Options Reference
### Core
| Option | Type | Default | Description |
|---|---|---|---|
| `enable` | `bool` | `false` | Enable the hermes-agent service |
| `package` | `package` | `hermes-agent` | The hermes-agent package to use |
| `user` | `str` | `"hermes"` | System user |
| `group` | `str` | `"hermes"` | System group |
| `createUser` | `bool` | `true` | Auto-create user/group |
| `stateDir` | `str` | `"/var/lib/hermes"` | State directory (`HERMES_HOME` parent) |
| `workingDirectory` | `str` | `"${stateDir}/workspace"` | Agent working directory (`MESSAGING_CWD`) |
| `addToSystemPackages` | `bool` | `false` | Add `hermes` CLI to system PATH and set `HERMES_HOME` system-wide |
### Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `settings` | `attrs` (deep-merged) | `{}` | Declarative config rendered as `config.yaml`. Supports arbitrary nesting; multiple definitions are merged via `lib.recursiveUpdate` |
| `configFile` | `null` or `path` | `null` | Path to an existing `config.yaml`. Overrides `settings` entirely if set |
### Secrets & Environment
| Option | Type | Default | Description |
|---|---|---|---|
| `environmentFiles` | `listOf str` | `[]` | Paths to env files with secrets. Merged into `$HERMES_HOME/.env` at activation time |
| `environment` | `attrsOf str` | `{}` | Non-secret env vars. **Visible in Nix store** — do not put secrets here |
| `authFile` | `null` or `path` | `null` | OAuth credentials seed. Only copied on first deploy |
| `authFileForceOverwrite` | `bool` | `false` | Always overwrite `auth.json` from `authFile` on activation |
### Documents
| Option | Type | Default | Description |
|---|---|---|---|
| `documents` | `attrsOf (either str path)` | `{}` | Workspace files. Keys are filenames, values are inline strings or paths. Installed into `workingDirectory` on activation |
### MCP Servers
| Option | Type | Default | Description |
|---|---|---|---|
| `mcpServers` | `attrsOf submodule` | `{}` | MCP server definitions, merged into `settings.mcp_servers` |
| `mcpServers..command` | `null` or `str` | `null` | Server command (stdio transport) |
| `mcpServers..args` | `listOf str` | `[]` | Command arguments |
| `mcpServers..env` | `attrsOf str` | `{}` | Environment variables for the server process |
| `mcpServers..url` | `null` or `str` | `null` | Server endpoint URL (HTTP/StreamableHTTP transport) |
| `mcpServers..headers` | `attrsOf str` | `{}` | HTTP headers, e.g. `Authorization` |
| `mcpServers..auth` | `null` or `"oauth"` | `null` | Authentication method. `"oauth"` enables OAuth 2.1 PKCE |
| `mcpServers..enabled` | `bool` | `true` | Enable or disable this server |
| `mcpServers..timeout` | `null` or `int` | `null` | Tool call timeout in seconds (default: 120) |
| `mcpServers..connect_timeout` | `null` or `int` | `null` | Connection timeout in seconds (default: 60) |
| `mcpServers..tools` | `null` or `submodule` | `null` | Tool filtering (`include`/`exclude` lists) |
| `mcpServers..sampling` | `null` or `submodule` | `null` | Sampling config for server-initiated LLM requests |
### Service Behavior
| Option | Type | Default | Description |
|---|---|---|---|
| `extraArgs` | `listOf str` | `[]` | Extra args for `hermes gateway` |
| `extraPackages` | `listOf package` | `[]` | Extra packages available to the agent. Added to the hermes user's per-user profile so terminal commands, skills, and cron jobs all see them |
| `extraPlugins` | `listOf package` | `[]` | Directory plugin packages to symlink into `$HERMES_HOME/plugins/`. Each must contain `plugin.yaml` |
| `extraPythonPackages` | `listOf package` | `[]` | Python packages added to PYTHONPATH for entry-point plugin discovery. Build with `python312Packages` |
| `restart` | `str` | `"always"` | systemd `Restart=` policy |
| `restartSec` | `int` | `5` | systemd `RestartSec=` value |
### Container
| Option | Type | Default | Description |
|---|---|---|---|
| `container.enable` | `bool` | `false` | Enable OCI container mode |
| `container.backend` | `enum ["docker" "podman"]` | `"docker"` | Container runtime |
| `container.image` | `str` | `"ubuntu:24.04"` | Base image (pulled at runtime) |
| `container.extraVolumes` | `listOf str` | `[]` | Extra volume mounts (`host:container:mode`) |
| `container.extraOptions` | `listOf str` | `[]` | Extra args passed to `docker create` |
| `container.hostUsers` | `listOf str` | `[]` | Interactive users who get a `~/.hermes` symlink to the service stateDir and are auto-added to the `hermes` group |
---
## Directory Layout
### Native Mode
```
/var/lib/hermes/ # stateDir (owned by hermes:hermes, 0750)
├── .hermes/ # HERMES_HOME
│ ├── config.yaml # Nix-generated (deep-merged each rebuild)
│ ├── .managed # Marker: CLI config mutation blocked
│ ├── .env # Merged from environment + environmentFiles
│ ├── auth.json # OAuth credentials (seeded, then self-managed)
│ ├── gateway.pid
│ ├── state.db
│ ├── mcp-tokens/ # OAuth tokens for MCP servers
│ ├── sessions/
│ ├── memories/
│ ├── skills/
│ ├── cron/
│ └── logs/
├── home/ # Agent HOME
└── workspace/ # MESSAGING_CWD
├── SOUL.md # From documents option
└── (agent-created files)
```
### Container Mode
Same layout, mounted into the container:
| Container path | Host path | Mode | Notes |
|---|---|---|---|
| `/nix/store` | `/nix/store` | `ro` | Hermes binary + all Nix deps |
| `/data` | `/var/lib/hermes` | `rw` | All state, config, workspace |
| `/home/hermes` | `${stateDir}/home` | `rw` | Persistent agent home — `pip install --user`, tool caches |
| `/usr`, `/usr/local`, `/tmp` | (writable layer) | `rw` | `apt`/`pip`/`npm` installs — persists across restarts, lost on recreation |
---
## Updating
```bash
# Update the flake input
nix flake update hermes-agent --flake /etc/nixos
# Rebuild
sudo nixos-rebuild switch
```
In container mode, the `current-package` symlink is updated and the agent picks up the new binary on restart. No container recreation, no loss of installed packages.
---
## Troubleshooting
:::tip Podman users
All `docker` commands below work the same with `podman`. Substitute accordingly if you set `container.backend = "podman"`.
:::
### Service Logs
```bash
# Both modes use the same systemd unit
journalctl -u hermes-agent -f
# Container mode: also available directly
docker logs -f hermes-agent
```
### Container Inspection
```bash
systemctl status hermes-agent
docker ps -a --filter name=hermes-agent
docker inspect hermes-agent --format='{{.State.Status}}'
docker exec -it hermes-agent bash
docker exec hermes-agent readlink /data/current-package
docker exec hermes-agent cat /data/.container-identity
```
### Force Container Recreation
If you need to reset the writable layer (fresh Ubuntu):
```bash
sudo systemctl stop hermes-agent
docker rm -f hermes-agent
sudo rm /var/lib/hermes/.container-identity
sudo systemctl start hermes-agent
```
### Verify Secrets Are Loaded
If the agent starts but can't authenticate with the LLM provider, check that the `.env` file was merged correctly:
```bash
# Native mode
sudo -u hermes cat /var/lib/hermes/.hermes/.env
# Container mode
docker exec hermes-agent cat /data/.hermes/.env
```
### GC Root Verification
```bash
nix-store --query --roots $(docker exec hermes-agent readlink /data/current-package)
```
### Common Issues
| Symptom | Cause | Fix |
|---|---|---|
| `Cannot save configuration: managed by NixOS` | CLI guards active | Edit `configuration.nix` and `nixos-rebuild switch` |
| Container recreated unexpectedly | `extraVolumes`, `extraOptions`, or `image` changed | Expected — writable layer resets. Reinstall packages or use a custom image |
| `hermes version` shows old version | Container not restarted | `systemctl restart hermes-agent` |
| Permission denied on `/var/lib/hermes` | State dir is `0750 hermes:hermes` | Use `docker exec` or `sudo -u hermes` |
| `nix-collect-garbage` removed hermes | GC root missing | Restart the service (preStart recreates the GC root) |
| `no container with name or ID "hermes-agent"` (Podman) | Podman rootful container not visible to regular user | Add passwordless sudo for podman (see [Container Mode](#container-mode) section) |
| `unable to find user hermes` | Container still starting (entrypoint hasn't created user yet) | Wait a few seconds and retry — the CLI retries automatically |
| Tool added via `extraPackages` not found in terminal | Requires `nixos-rebuild switch` to update the per-user profile | Rebuild and restart: `nixos-rebuild switch && systemctl restart hermes-agent` |
---
# CLI Interface
# CLI Interface
Hermes Agent's CLI is a full terminal user interface (TUI) — not a web UI. It features multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output. Built for people who live in the terminal.
:::tip
Hermes also ships a modern TUI with modal overlays, mouse selection, and non-blocking input. Launch it with `hermes --tui` — see the [TUI](tui.md) guide.
:::
## Running the CLI
```bash
# Start an interactive session (default)
hermes
# Single query mode (non-interactive)
hermes chat -q "Hello"
# With a specific model
hermes chat --model "anthropic/claude-sonnet-4"
# With a specific provider
hermes chat --provider nous # Use Nous Portal
hermes chat --provider openrouter # Force OpenRouter
# With specific toolsets
hermes chat --toolsets "web,terminal,skills"
# Start with one or more skills preloaded
hermes -s hermes-agent-dev,github-auth
hermes chat -s github-pr-workflow -q "open a draft PR"
# Resume previous sessions
hermes --continue # Resume the most recent CLI session (-c)
hermes --resume # Resume a specific session by ID (-r)
# Verbose mode (debug output)
hermes chat --verbose
# Isolated git worktree (for running multiple agents in parallel)
hermes -w # Interactive mode in worktree
hermes -w -q "Fix issue #123" # Single query in worktree
```
## Interface Layout
The Hermes CLI banner, conversation stream, and fixed input prompt rendered as a stable docs figure instead of fragile text art.
The welcome banner shows your model, terminal backend, working directory, available tools, and installed skills at a glance.
### Status Bar
A persistent status bar sits above the input area, updating in real time:
```
⚕ claude-sonnet-4-20250514 │ 12.4K/200K │ [██████░░░░] 6% │ $0.06 │ 15m
```
| Element | Description |
|---------|-------------|
| Model name | Current model (truncated if longer than 26 chars) |
| Token count | Context tokens used / max context window |
| Context bar | Visual fill indicator with color-coded thresholds |
| Cost | Estimated session cost (or `n/a` for unknown/zero-priced models) |
| Duration | Elapsed session time |
The bar adapts to terminal width — full layout at ≥ 76 columns, compact at 52–75, minimal (model + duration only) below 52.
**Context color coding:**
| Color | Threshold | Meaning |
|-------|-----------|---------|
| Green | < 50% | Plenty of room |
| Yellow | 50–80% | Getting full |
| Orange | 80–95% | Approaching limit |
| Red | ≥ 95% | Near overflow — consider `/compress` |
Use `/usage` for a detailed breakdown including per-category costs (input vs output tokens).
### Session Resume Display
When resuming a previous session (`hermes -c` or `hermes --resume `), a "Previous Conversation" panel appears between the banner and the input prompt, showing a compact recap of the conversation history. See [Sessions — Conversation Recap on Resume](sessions.md#conversation-recap-on-resume) for details and configuration.
## Keybindings
| Key | Action |
|-----|--------|
| `Enter` | Send message |
| `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
| `Alt+V` | Paste an image from the clipboard when supported by the terminal |
| `Ctrl+V` | Paste text and opportunistically attach clipboard images |
| `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
| `Ctrl+G` | Open the current input buffer in `$EDITOR` (vim/nvim/nano/VS Code/etc.). Save and quit to send the edited text as the next prompt — ideal for long, multi-paragraph prompts. |
| `Ctrl+X Ctrl+E` | Emacs-style alternate binding for the external editor (same behavior as `Ctrl+G`). |
| `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
| `Ctrl+D` | Exit |
| `Ctrl+Z` | Suspend Hermes to background (Unix only). Run `fg` in the shell to resume. |
| `Tab` | Accept auto-suggestion (ghost text) or autocomplete slash commands |
**Multiline paste preview.** When you paste a multi-line block, the CLI echoes a compact single-line preview (`[pasted: 47 lines, 1,842 chars — press Enter to send]`) instead of dumping the whole payload into the scrollback. The full content is still what gets sent; this is just display polish.
**Markdown stripping in final responses.** The CLI strips the most verbose markdown fences and `**bold**` / `*italic*` wrappers from *final* agent replies so they render as readable terminal prose rather than raw source. Code blocks and lists are preserved. This does not affect gateway platforms or tool results — they keep their markdown for native rendering.
## Slash Commands
Type `/` to see the autocomplete dropdown. Hermes supports a large set of CLI slash commands, dynamic skill commands, and user-defined quick commands.
Common examples:
| Command | Description |
|---------|-------------|
| `/help` | Show command help |
| `/model` | Show or change the current model |
| `/tools` | List currently available tools |
| `/skills browse` | Browse the skills hub and official optional skills |
| `/background ` | Run a prompt in a separate background session |
| `/skin` | Show or switch the active CLI skin |
| `/voice on` | Enable CLI voice mode (press `Ctrl+B` to record) |
| `/voice tts` | Toggle spoken playback for Hermes replies |
| `/reasoning high` | Increase reasoning effort |
| `/title My Session` | Name the current session |
For the full built-in CLI and messaging lists, see [Slash Commands Reference](../reference/slash-commands.md).
For setup, providers, silence tuning, and messaging/Discord voice usage, see [Voice Mode](features/voice-mode.md).
:::tip
Commands are case-insensitive — `/HELP` works the same as `/help`. Installed skills also become slash commands automatically.
:::
## Quick Commands
You can define custom commands that run shell commands instantly without invoking the LLM. These work in both the CLI and messaging platforms (Telegram, Discord, etc.).
```yaml
# ~/.hermes/config.yaml
quick_commands:
status:
type: exec
command: systemctl status hermes-agent
gpu:
type: exec
command: nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
restart:
type: alias
target: /gateway restart
```
Then type `/status`, `/gpu`, or `/restart` in any chat. See the [Configuration guide](/docs/user-guide/configuration#quick-commands) for more examples.
## Preloading Skills at Launch
If you already know which skills you want active for the session, pass them at launch time:
```bash
hermes -s hermes-agent-dev,github-auth
hermes chat -s github-pr-workflow -s github-auth
```
Hermes loads each named skill into the session prompt before the first turn. The same flag works in interactive mode and single-query mode.
## Skill Slash Commands
Every installed skill in `~/.hermes/skills/` is automatically registered as a slash command. The skill name becomes the command:
```
/gif-search funny cats
/axolotl help me fine-tune Llama 3 on my dataset
/github-pr-workflow create a PR for the auth refactor
# Just the skill name loads it and lets the agent ask what you need:
/excalidraw
```
## Personalities
Set a predefined personality to change the agent's tone:
```
/personality pirate
/personality kawaii
/personality concise
```
Built-in personalities include: `helpful`, `concise`, `technical`, `creative`, `teacher`, `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`, `noir`, `uwu`, `philosopher`, `hype`.
You can also define custom personalities in `~/.hermes/config.yaml`:
```yaml
personalities:
helpful: "You are a helpful, friendly AI assistant."
kawaii: "You are a kawaii assistant! Use cute expressions..."
pirate: "Arrr! Ye be talkin' to Captain Hermes..."
# Add your own!
```
## Multi-line Input
There are two ways to enter multi-line messages:
1. **`Alt+Enter` or `Ctrl+J`** — inserts a new line
2. **Backslash continuation** — end a line with `\` to continue:
```
❯ Write a function that:\
1. Takes a list of numbers\
2. Returns the sum
```
:::info
Pasting multi-line text is supported — use `Alt+Enter` or `Ctrl+J` to insert newlines, or simply paste content directly.
:::
## Interrupting the Agent
You can interrupt the agent at any point:
- **Type a new message + Enter** while the agent is working — it interrupts and processes your new instructions
- **`Ctrl+C`** — interrupt the current operation (press twice within 2s to force exit)
- In-progress terminal commands are killed immediately (SIGTERM, then SIGKILL after 1s)
- Multiple messages typed during interrupt are combined into one prompt
### Busy Input Mode
The `display.busy_input_mode` config key controls what happens when you press Enter while the agent is working:
| Mode | Behavior |
|------|----------|
| `"interrupt"` (default) | Your message interrupts the current operation and is processed immediately |
| `"queue"` | Your message is silently queued and sent as the next turn after the agent finishes |
| `"steer"` | Your message is injected into the current run via `/steer`, arriving at the agent after the next tool call — no interrupt, no new turn |
```yaml
# ~/.hermes/config.yaml
display:
busy_input_mode: "steer" # or "queue" or "interrupt" (default)
```
`"queue"` mode is useful when you want to prepare follow-up messages without accidentally canceling in-flight work. `"steer"` mode is useful when you want to redirect the agent mid-task without interrupting — e.g. "actually, also check the tests" while it's still editing code. Unknown values fall back to `"interrupt"`.
`"steer"` has two automatic fallbacks: if the agent hasn't started yet, or if images are attached, the message falls back to `"queue"` behavior so nothing is lost.
You can also change it inside the CLI:
```text
/busy queue
/busy steer
/busy interrupt
/busy status
```
:::tip First-touch hint
The very first time you press Enter while Hermes is working, Hermes prints a one-line reminder explaining the `/busy` knob (`"(tip) Your message interrupted the current run…"`). It only fires once per install — a flag in `config.yaml` under `onboarding.seen.busy_input_prompt` latches it. Delete that key to see the tip again.
:::
### Suspending to Background
On Unix systems, press **`Ctrl+Z`** to suspend Hermes to the background — just like any terminal process. The shell prints a confirmation:
```
Hermes Agent has been suspended. Run `fg` to bring Hermes Agent back.
```
Type `fg` in your shell to resume the session exactly where you left off. This is not supported on Windows.
## Tool Progress Display
The CLI shows animated feedback as the agent works:
**Thinking animation** (during API calls):
```
◜ (。•́︿•̀。) pondering... (1.2s)
◠ (⊙_⊙) contemplating... (2.4s)
✧٩(ˊᗜˋ*)و✧ got it! (3.1s)
```
**Tool execution feed:**
```
┊ 💻 terminal `ls -la` (0.3s)
┊ 🔍 web_search (1.2s)
┊ 📄 web_extract (2.1s)
```
Cycle through display modes with `/verbose`: `off → new → all → verbose`. This command can also be enabled for messaging platforms — see [configuration](/docs/user-guide/configuration#display-settings).
### Tool Preview Length
The `display.tool_preview_length` config key controls the maximum number of characters shown in tool call preview lines (e.g. file paths, terminal commands). The default is `0`, which means no limit — full paths and commands are shown.
```yaml
# ~/.hermes/config.yaml
display:
tool_preview_length: 80 # Truncate tool previews to 80 chars (0 = no limit)
```
This is useful on narrow terminals or when tool arguments contain very long file paths.
## Session Management
### Resuming Sessions
When you exit a CLI session, a resume command is printed:
```
Resume this session with:
hermes --resume 20260225_143052_a1b2c3
Session: 20260225_143052_a1b2c3
Duration: 12m 34s
Messages: 28 (5 user, 18 tool calls)
```
Resume options:
```bash
hermes --continue # Resume the most recent CLI session
hermes -c # Short form
hermes -c "my project" # Resume a named session (latest in lineage)
hermes --resume 20260225_143052_a1b2c3 # Resume a specific session by ID
hermes --resume "refactoring auth" # Resume by title
hermes -r 20260225_143052_a1b2c3 # Short form
```
Resuming restores the full conversation history from SQLite. The agent sees all previous messages, tool calls, and responses — just as if you never left.
Use `/title My Session Name` inside a chat to name the current session, or `hermes sessions rename ` from the command line. Use `hermes sessions list` to browse past sessions.
### Session Storage
CLI sessions are stored in Hermes's SQLite state database under `~/.hermes/state.db`. The database keeps:
- session metadata (ID, title, timestamps, token counters)
- message history
- lineage across compressed/resumed sessions
- full-text search indexes used by `session_search`
Some messaging adapters also keep per-platform transcript files alongside the database, but the CLI itself resumes from the SQLite session store.
### Context Compression
Long conversations are automatically summarized when approaching context limits:
```yaml
# In ~/.hermes/config.yaml
compression:
enabled: true
threshold: 0.50 # Compress at 50% of context limit by default
# Summarization model configured under auxiliary:
auxiliary:
compression:
model: "google/gemini-3-flash-preview" # Model used for summarization
```
When compression triggers, middle turns are summarized while the first 3 and last 20 turns are always preserved.
## Background Sessions
Run a prompt in a separate background session while continuing to use the CLI for other work:
```
/background Analyze the logs in /var/log and summarize any errors from today
```
Hermes immediately confirms the task and gives you back the prompt:
```
🔄 Background task #1 started: "Analyze the logs in /var/log and summarize..."
Task ID: bg_143022_a1b2c3
```
### How It Works
Each `/background` prompt spawns a **completely separate agent session** in a daemon thread:
- **Isolated conversation** — the background agent has no knowledge of your current session's history. It receives only the prompt you provide.
- **Same configuration** — the background agent inherits your model, provider, toolsets, reasoning settings, and fallback model from the current session.
- **Non-blocking** — your foreground session stays fully interactive. You can chat, run commands, or even start more background tasks.
- **Multiple tasks** — you can run several background tasks simultaneously. Each gets a numbered ID.
### Results
When a background task finishes, the result appears as a panel in your terminal:
```
╭─ ⚕ Hermes (background #1) ──────────────────────────────────╮
│ Found 3 errors in syslog from today: │
│ 1. OOM killer invoked at 03:22 — killed process nginx │
│ 2. Disk I/O error on /dev/sda1 at 07:15 │
│ 3. Failed SSH login attempts from 192.168.1.50 at 14:30 │
╰──────────────────────────────────────────────────────────────╯
```
If the task fails, you'll see an error notification instead. If `display.bell_on_complete` is enabled in your config, the terminal bell rings when the task finishes.
### Use Cases
- **Long-running research** — "/background research the latest developments in quantum error correction" while you work on code
- **File processing** — "/background analyze all Python files in this repo and list any security issues" while you continue a conversation
- **Parallel investigations** — start multiple background tasks to explore different angles simultaneously
:::info
Background sessions do not appear in your main conversation history. They are standalone sessions with their own task ID (e.g., `bg_143022_a1b2c3`).
:::
## Quiet Mode
By default, the CLI runs in quiet mode which:
- Suppresses verbose logging from tools
- Enables kawaii-style animated feedback
- Keeps output clean and user-friendly
For debug output:
```bash
hermes chat --verbose
```
---
# TUI
# TUI
The TUI is the modern front-end for Hermes — a terminal UI backed by the same Python runtime as the [Classic CLI](cli.md). Same agent, same sessions, same slash commands; a cleaner, more responsive surface for interacting with them.
It's the recommended way to run Hermes interactively.
## Launch
```bash
# Launch the TUI
hermes --tui
# Resume the latest TUI session (falls back to the latest classic session)
hermes --tui -c
hermes --tui --continue
# Resume a specific session by ID or title
hermes --tui -r 20260409_000000_aa11bb
hermes --tui --resume "my t0p session"
# Run source directly — skips the prebuild step (for TUI contributors)
hermes --tui --dev
```
You can also enable it via env var:
```bash
export HERMES_TUI=1
hermes # now uses the TUI
hermes chat # same
```
The classic CLI remains available as the default. Anything documented in [CLI Interface](cli.md) — slash commands, quick commands, skill preloading, personalities, multi-line input, interrupts — works in the TUI identically.
## Why the TUI
- **Instant first frame** — the banner paints before the app finishes loading, so the terminal never feels frozen while Hermes is starting.
- **Non-blocking input** — type and queue messages before the session is ready. Your first prompt sends the moment the agent comes online.
- **Rich overlays** — model picker, session picker, approval and clarification prompts all render as modal panels rather than inline flows.
- **Live session panel** — tools and skills fill in progressively as they initialize.
- **Mouse-friendly selection** — drag to highlight with a uniform background instead of SGR inverse. Copy with your terminal's normal copy gesture.
- **Alternate-screen rendering** — differential updates mean no flicker when streaming, no scrollback clutter after you quit.
- **Composer affordances** — inline paste-collapse for long snippets, `Cmd+V` / `Ctrl+V` text paste with clipboard-image fallback, bracketed-paste safety, and image/file-path attachment normalization.
Same [skins](features/skins.md) and [personalities](features/personality.md) apply. Switch mid-session with `/skin ares`, `/personality pirate`, and the UI repaints live. See [Skins & Themes](features/skins.md) for the full list of customizable keys and which ones apply to classic vs TUI — the TUI honors the banner palette, UI colors, prompt glyph/color, session display, completion menu, selection bg, `tool_prefix`, and `help_header`.
## Requirements
- **Node.js** ≥ 20 — the TUI runs as a subprocess launched from the Python CLI. `hermes doctor` verifies this.
- **TTY** — like the classic CLI, piping stdin or running in non-interactive environments falls back to single-query mode.
On first launch Hermes installs the TUI's Node dependencies into `ui-tui/node_modules` (one-time, a few seconds). Subsequent launches are fast. If you pull a new Hermes version, the TUI bundle is rebuilt automatically when sources are newer than the dist.
### External prebuild
Distributions that ship a prebuilt bundle (Nix, system packages) can point Hermes at it:
```bash
export HERMES_TUI_DIR=/path/to/prebuilt/ui-tui
hermes --tui
```
The directory must contain `dist/entry.js` and an up-to-date `node_modules`.
## Keybindings
Keybindings match the [Classic CLI](cli.md#keybindings) exactly. The only behavioral differences:
- **Mouse drag** highlights text with a uniform selection background.
- **`Cmd+V` / `Ctrl+V`** first tries normal text paste, then falls back to OSC52/native clipboard reads, and finally image attach when the clipboard or pasted payload resolves to an image.
- **`/terminal-setup`** installs local VS Code / Cursor / Windsurf terminal bindings for better `Cmd+Enter` and undo/redo parity on macOS.
- **Slash autocompletion** opens as a floating panel with descriptions, not an inline dropdown.
- **`Ctrl+X`** — when a queued message is highlighted (sent while the agent was still running), delete it from the queue. **`Esc`** cancels editing and unhighlights without deleting.
- **`Ctrl+G` / `Ctrl+X Ctrl+E`** — open the current input buffer in `$EDITOR` for multi-line / long-prompt composition; save-and-exit sends the contents back as the prompt.
## Slash commands
All slash commands work unchanged. A few are TUI-owned — they produce richer output or render as overlays rather than inline panels:
| Command | TUI behavior |
|---------|--------------|
| `/help` | Overlay with categorized commands, arrow-key navigable |
| `/sessions` | Modal session picker — preview, title, token totals, resume inline |
| `/model` | Modal model picker grouped by provider, with cost hints |
| `/skin` | Live preview — theme change applies as you browse |
| `/details` | Toggle verbose tool-call details (global or per-section) |
| `/usage` | Rich token / cost / context panel |
| `/agents` (alias `/tasks`) | Observability overlay — live subagent tree with kill/pause controls, per-branch cost / token / file rollups, turn-by-turn history |
| `/reload` | Re-reads `~/.hermes/.env` into the running TUI process so newly added API keys take effect without a restart |
| `/mouse` | Toggle mouse tracking on/off at runtime (also persists to `display.mouse_tracking` in `config.yaml`) |
Every other slash command (including installed skills, quick commands, and personality toggles) works identically to the classic CLI. See [Slash Commands Reference](../reference/slash-commands.md).
## LaTeX math rendering
The TUI's markdown pipeline renders LaTeX math inline: `$E = mc^2$` and `$$\frac{a}{b}$$` render as Unicode-formatted math instead of the raw TeX source. Works for inline and block math; unsupported syntax falls back to showing the literal TeX wrapped in a code span so it remains copyable.
This is always-on — nothing to configure. Classic CLI keeps the raw TeX.
## Light-terminal detection
The TUI auto-detects light terminals and swaps to the light theme accordingly. Detection works in three layers:
1. `HERMES_TUI_THEME` env var — highest priority. Values: `light`, `dark`, or a raw 6-char background hex (e.g. `ffffff`, `1a1a2e`).
2. `COLORFGBG` env var — the classic "what's my background color?" hint used by xterm-derived terminals.
3. Terminal background probe via OSC 11 — works on modern terminals (Ghostty, Warp, iTerm2, WezTerm, Kitty) that don't set `COLORFGBG`.
If you want the light theme permanently regardless of terminal:
```bash
export HERMES_TUI_THEME=light
```
## Busy indicator styles
The status-bar FaceTicker is pluggable — the default rotates Hermes' kawaii face palette every 2.5 seconds during agent work. Pick a different style (or `none` for a minimal dot) via config:
```yaml
display:
busy_indicator:
style: kawaii # kawaii | minimal | dots | wings | none
```
Styles ship with matched glyph widths so the rest of the status bar doesn't jitter on rotation.
## Auto-resume
By default, `hermes --tui` starts a fresh session each launch. To re-attach to the most recent TUI session automatically (useful when your terminal or SSH connection drops unexpectedly), opt in:
```bash
export HERMES_TUI_RESUME=1 # most-recent TUI session
# or:
export HERMES_TUI_RESUME= # specific session
```
Unset the variable or pass `--resume ` explicitly to override on a per-launch basis.
## Status line
The TUI's status line tracks agent state in real time:
| Status | Meaning |
|--------|---------|
| `starting agent…` | Session ID is live; tools and skills still coming online. You can type — messages queue and send when ready. |
| `ready` | Agent is idle, accepting input. |
| `thinking…` / `running…` | Agent is reasoning or running a tool. |
| `interrupted` | Current turn was cancelled; press Enter to send again. |
| `forging session…` / `resuming…` | Initial connect or `--resume` handshake. |
The per-skin status-bar colors and thresholds are shared with the classic CLI — see [Skins](features/skins.md) for customization.
The status line also shows:
- **Working directory with git branch** — `~/projects/hermes-agent (docs/two-week-gap-sweep)`. The branch suffix updates when you `git checkout` in a side terminal (mtime-cached) so the TUI reflects your actual active branch, not whatever it was at launch.
- **Per-prompt elapsed time** — `⏱ 12s/3m 45s` while the turn is running (live), frozen to `⏲ 32s / 3m 45s` after the turn completes. First number is time since last user message; second is total session duration. Resets on every new prompt.
## Configuration
The TUI respects all standard Hermes config: `~/.hermes/config.yaml`, profiles, personalities, skins, quick commands, credential pools, memory providers, tool/skill enablement. No TUI-specific config file exists.
A handful of keys tune the TUI surface specifically:
```yaml
display:
skin: default # any built-in or custom skin
personality: helpful
details_mode: collapsed # hidden | collapsed | expanded — global accordion default
sections: # optional: per-section overrides (any subset)
thinking: expanded # always open
tools: expanded # always open
activity: collapsed # opt back IN to the activity panel (hidden by default)
mouse_tracking: true # disable if your terminal conflicts with mouse reporting
```
Runtime toggles:
- `/details [hidden|collapsed|expanded|cycle]` — set the global mode
- `/details [hidden|collapsed|expanded|reset]` — override one section
(sections: `thinking`, `tools`, `subagents`, `activity`)
**Default visibility**
The TUI ships with opinionated per-section defaults that stream the turn as
a live transcript instead of a wall of chevrons:
- `thinking` — **expanded**. Reasoning streams inline as the model emits it.
- `tools` — **expanded**. Tool calls and their results render open.
- `subagents` — falls through to the global `details_mode` (collapsed under
chevron by default — stays quiet until a delegation actually happens).
- `activity` — **hidden**. Ambient meta (gateway hints, terminal-parity
nudges, background notifications) is noise for most day-to-day use. Tool
failures still render inline on the failing tool row; ambient
errors/warnings surface via a floating-alert backstop when every panel
is hidden.
Per-section overrides take precedence over both the section default and the
global `details_mode`. To reshape the layout:
- `display.sections.thinking: collapsed` — put thinking back under a chevron
- `display.sections.tools: collapsed` — put tool calls back under a chevron
- `display.sections.activity: collapsed` — opt the activity panel back in
- `/details ` at runtime
Anything set explicitly in `display.sections` wins over the defaults, so
existing configs keep working unchanged.
## Sessions
Sessions are shared between the TUI and the classic CLI — both write to the same `~/.hermes/state.db`. You can start a session in one, resume in the other. The session picker surfaces sessions from both sources, with a source tag.
See [Sessions](sessions.md) for lifecycle, search, compression, and export.
## Reverting to the classic CLI
Launching `hermes` (without `--tui`) stays on the classic CLI. To make a machine prefer the TUI, set `HERMES_TUI=1` in your shell profile. To go back, unset it.
If the TUI fails to launch (no Node, missing bundle, TTY issue), Hermes prints a diagnostic and falls back — rather than leaving you stuck.
## See also
- [CLI Interface](cli.md) — full slash command and keybinding reference (shared)
- [Sessions](sessions.md) — resume, branch, and history
- [Skins & Themes](features/skins.md) — theme the banner, status bar, and overlays
- [Voice Mode](features/voice-mode.md) — works in both interfaces
- [Configuration](configuration.md) — all config keys
---
# Configuration
# Configuration
All settings are stored in the `~/.hermes/` directory for easy access.
## Directory Structure
```text
~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md # Primary agent identity (slot #1 in system prompt)
├── memories/ # Persistent memory (MEMORY.md, USER.md)
├── skills/ # Agent-created skills (managed via skill_manage tool)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs (errors.log, gateway.log — secrets auto-redacted)
```
## Managing Configuration
```bash
hermes config # View current configuration
hermes config edit # Open config.yaml in your editor
hermes config set KEY VAL # Set a specific value
hermes config check # Check for missing options (after updates)
hermes config migrate # Interactively add missing options
# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-... # Saves to .env
```
:::tip
The `hermes config set` command automatically routes values to the right file — API keys are saved to `.env`, everything else to `config.yaml`.
:::
## Configuration Precedence
Settings are resolved in this order (highest priority first):
1. **CLI arguments** — e.g., `hermes chat --model anthropic/claude-sonnet-4` (per-invocation override)
2. **`~/.hermes/config.yaml`** — the primary config file for all non-secret settings
3. **`~/.hermes/.env`** — fallback for env vars; **required** for secrets (API keys, tokens, passwords)
4. **Built-in defaults** — hardcoded safe defaults when nothing else is set
:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in `.env`. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml`. When both are set, `config.yaml` wins for non-secret settings.
:::
## Environment Variable Substitution
You can reference environment variables in `config.yaml` using `${VAR_NAME}` syntax:
```yaml
auxiliary:
vision:
api_key: ${GOOGLE_API_KEY}
base_url: ${CUSTOM_VISION_URL}
delegation:
api_key: ${DELEGATION_KEY}
```
Multiple references in a single value work: `url: "${HOST}:${PORT}"`. If a referenced variable is not set, the placeholder is kept verbatim (`${UNDEFINED_VAR}` stays as-is). Only the `${VAR}` syntax is supported — bare `$VAR` is not expanded.
For AI provider setup (OpenRouter, Anthropic, Copilot, custom endpoints, self-hosted LLMs, fallback models, etc.), see [AI Providers](/docs/integrations/providers).
### Provider Timeouts
You can set `providers..request_timeout_seconds` for a provider-wide request timeout, plus `providers..models..timeout_seconds` for a model-specific override. Applies to the primary turn client on every transport (OpenAI-wire, native Anthropic, Anthropic-compatible), the fallback chain, rebuilds after credential rotation, and (for OpenAI-wire) the per-request timeout kwarg — so the configured value wins over the legacy `HERMES_API_TIMEOUT` env var.
You can also set `providers..stale_timeout_seconds` for the non-streaming stale-call detector, plus `providers..models..stale_timeout_seconds` for a model-specific override. This wins over the legacy `HERMES_API_CALL_STALE_TIMEOUT` env var.
Leaving these unset keeps the legacy defaults (`HERMES_API_TIMEOUT=1800`s, `HERMES_API_CALL_STALE_TIMEOUT=300`s, native Anthropic 900s). Not currently wired for AWS Bedrock (both `bedrock_converse` and AnthropicBedrock SDK paths use boto3 with its own timeout configuration). See the commented example in [`cli-config.yaml.example`](https://github.com/NousResearch/hermes-agent/blob/main/cli-config.yaml.example).
## Terminal Backend Configuration
Hermes supports seven terminal backends. Each determines where the agent's shell commands actually execute — your local machine, a Docker container, a remote server via SSH, a Modal cloud sandbox (direct or via the Nous-managed gateway), a Daytona workspace, a Vercel Sandbox, or a Singularity/Apptainer container.
```yaml
terminal:
backend: local # local | docker | ssh | modal | daytona | vercel_sandbox | singularity
cwd: "." # Gateway/cron working directory (CLI always uses launch dir)
timeout: 180 # Per-command timeout in seconds
env_passthrough: [] # Env var names to forward to sandboxed execution (terminal + execute_code)
singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20" # Container image for Singularity backend
modal_image: "nikolaik/python-nodejs:python3.11-nodejs20" # Container image for Modal backend
daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20" # Container image for Daytona backend
```
For cloud sandboxes such as Modal, Daytona, and Vercel Sandbox, `container_persistent: true` means Hermes will try to preserve filesystem state across sandbox recreation. It does not promise that the same live sandbox, PID space, or background processes will still be running later.
### Backend Overview
| Backend | Where commands run | Isolation | Best for |
|---------|-------------------|-----------|----------|
| **local** | Your machine directly | None | Development, personal use |
| **docker** | Single persistent Docker container (shared across session, `/new`, subagents) | Full (namespaces, cap-drop) | Safe sandboxing, CI/CD |
| **ssh** | Remote server via SSH | Network boundary | Remote dev, powerful hardware |
| **modal** | Modal cloud sandbox | Full (cloud VM) | Ephemeral cloud compute, evals |
| **daytona** | Daytona workspace | Full (cloud container) | Managed cloud dev environments |
| **vercel_sandbox** | Vercel Sandbox | Full (cloud microVM) | Cloud execution with snapshot-backed filesystem persistence |
| **singularity** | Singularity/Apptainer container | Namespaces (--containall) | HPC clusters, shared machines |
### Local Backend
The default. Commands run directly on your machine with no isolation. No special setup required.
```yaml
terminal:
backend: local
```
:::warning
The agent has the same filesystem access as your user account. Use `hermes tools` to disable tools you don't want, or switch to Docker for sandboxing.
:::
### Docker Backend
Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).
**Single persistent container, not per-command.** Hermes starts ONE long-lived container on first use and routes every terminal, file, and `execute_code` call through `docker exec` into that same container — across sessions, `/new`, `/reset`, and `delegate_task` subagents — for the lifetime of the Hermes process. Working-directory changes, installed packages, and files in `/workspace` carry over from one tool call to the next, just like a local shell. The container is stopped and removed on shutdown. See **Container lifecycle** below for details.
```yaml
terminal:
backend: docker
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_mount_cwd_to_workspace: false # Mount launch dir into /workspace
docker_run_as_host_user: false # See "Running container as host user" below
docker_forward_env: # Env vars to forward into container
- "GITHUB_TOKEN"
docker_volumes: # Host directory mounts
- "/home/user/projects:/workspace/projects"
- "/home/user/data:/data:ro" # :ro for read-only
# Resource limits
container_cpu: 1 # CPU cores (0 = unlimited)
container_memory: 5120 # MB (0 = unlimited)
container_disk: 51200 # MB (requires overlay2 on XFS+pquota)
container_persistent: true # Persist /workspace and /root across sessions
```
**Requirements:** Docker Desktop or Docker Engine installed and running. Hermes probes `$PATH` plus common macOS install locations (`/usr/local/bin/docker`, `/opt/homebrew/bin/docker`, Docker Desktop app bundle). Podman is supported out of the box: set `HERMES_DOCKER_BINARY=podman` (or the full path) to force it when both are installed.
**Container lifecycle:** Hermes reuses a single long-lived container (`docker run -d ... sleep 2h`) for every terminal and file-tool call, across sessions, `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. Commands run via `docker exec` with a login shell, so working-directory changes, installed packages, and files in `/workspace` all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).
Parallel subagents spawned via `delegate_task(tasks=[...])` share this one container — concurrent `cd`, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via `register_task_env_overrides()`, which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.
**Security hardening:**
- `--cap-drop ALL` with only `DAC_OVERRIDE`, `CHOWN`, `FOWNER` added back
- `--security-opt no-new-privileges`
- `--pids-limit 256`
- Size-limited tmpfs for `/tmp` (512MB), `/var/tmp` (256MB), `/run` (64MB)
**Credential forwarding:** Env vars listed in `docker_forward_env` are resolved from your shell environment first, then `~/.hermes/.env`. Skills can also declare `required_environment_variables` which are merged automatically.
### SSH Backend
Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.
```yaml
terminal:
backend: ssh
persistent_shell: true # Keep a long-lived bash session (default: true)
```
**Required environment variables:**
```bash
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=ubuntu
```
**Optional:**
| Variable | Default | Description |
|----------|---------|-------------|
| `TERMINAL_SSH_PORT` | `22` | SSH port |
| `TERMINAL_SSH_KEY` | (system default) | Path to SSH private key |
| `TERMINAL_SSH_PERSISTENT` | `true` | Enable persistent shell |
**How it works:** Connects at init time with `BatchMode=yes` and `StrictHostKeyChecking=accept-new`. Persistent shell keeps a single `bash -l` process alive on the remote host, communicating via temporary files. Commands that need `stdin_data` or `sudo` automatically fall back to one-shot mode.
### Modal Backend
Runs commands in a [Modal](https://modal.com) cloud sandbox. Each task gets an isolated VM with configurable CPU, memory, and disk. Filesystem can be snapshot/restored across sessions.
```yaml
terminal:
backend: modal
container_cpu: 1 # CPU cores
container_memory: 5120 # MB (5GB)
container_disk: 51200 # MB (50GB)
container_persistent: true # Snapshot/restore filesystem
```
**Required:** Either `MODAL_TOKEN_ID` + `MODAL_TOKEN_SECRET` environment variables, or a `~/.modal.toml` config file.
**Persistence:** When enabled, the sandbox filesystem is snapshotted on cleanup and restored on next session. Snapshots are tracked in `~/.hermes/modal_snapshots.json`. This preserves filesystem state, not live processes, PID space, or background jobs.
**Credential files:** Automatically mounted from `~/.hermes/` (OAuth tokens, etc.) and synced before each command.
### Daytona Backend
Runs commands in a [Daytona](https://daytona.io) managed workspace. Supports stop/resume for persistence.
```yaml
terminal:
backend: daytona
container_cpu: 1 # CPU cores
container_memory: 5120 # MB → converted to GiB
container_disk: 10240 # MB → converted to GiB (max 10 GiB)
container_persistent: true # Stop/resume instead of delete
```
**Required:** `DAYTONA_API_KEY` environment variable.
**Persistence:** When enabled, sandboxes are stopped (not deleted) on cleanup and resumed on next session. Sandbox names follow the pattern `hermes-{task_id}`.
**Disk limit:** Daytona enforces a 10 GiB maximum. Requests above this are capped with a warning.
### Vercel Sandbox Backend
Runs commands in a [Vercel Sandbox](https://vercel.com/docs/vercel-sandbox) cloud microVM. Hermes uses the normal terminal and file tool surfaces; there are no Vercel-specific model-facing tools.
```yaml
terminal:
backend: vercel_sandbox
vercel_runtime: node24 # node24 | node22 | python3.13
cwd: /vercel/sandbox # default workspace root
container_persistent: true # Snapshot/restore filesystem
container_disk: 51200 # Shared default only; custom disk is unsupported
```
**Required install:** Install the optional SDK extra:
```bash
pip install 'hermes-agent[vercel]'
```
**Required authentication:** Configure access-token auth with all three of `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, and `VERCEL_TEAM_ID`. This is the supported setup for deployments and normal long-running Hermes processes on Render, Railway, Docker, and similar hosts.
For one-off local development, Hermes also accepts short-lived Vercel OIDC tokens:
```bash
VERCEL_OIDC_TOKEN="$(vc project token )" hermes chat
```
From a linked Vercel project directory, you can omit the project name:
```bash
VERCEL_OIDC_TOKEN="$(vc project token)" hermes chat
```
OIDC tokens are short-lived and should not be used as the documented deployment path.
**Runtime:** `terminal.vercel_runtime` supports `node24`, `node22`, and `python3.13`. If unset, Hermes defaults to `node24`.
**Persistence:** When `container_persistent: true`, Hermes snapshots the sandbox filesystem during cleanup and restores a later sandbox for the same task from that snapshot. Snapshot contents can include Hermes-synced credentials, skills, and cache files that were copied into the sandbox. This preserves filesystem state only; it does not preserve live sandbox identity, PID space, shell state, or running background processes.
**Background commands:** `terminal(background=true)` uses Hermes' generic non-local background process flow. You can spawn, poll, wait, view logs, and kill processes through the normal process tool while the sandbox is alive. Hermes does not provide native Vercel detached-process recovery after cleanup or restart.
**Disk sizing:** Vercel Sandbox does not currently support Hermes' `container_disk` resource knob. Leave `container_disk` unset or at the shared default `51200`; non-default values fail diagnostics and backend creation instead of being silently ignored.
### Singularity/Apptainer Backend
Runs commands in a [Singularity/Apptainer](https://apptainer.org) container. Designed for HPC clusters and shared machines where Docker isn't available.
```yaml
terminal:
backend: singularity
singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
container_cpu: 1 # CPU cores
container_memory: 5120 # MB
container_persistent: true # Writable overlay persists across sessions
```
**Requirements:** `apptainer` or `singularity` binary in `$PATH`.
**Image handling:** Docker URLs (`docker://...`) are automatically converted to SIF files and cached. Existing `.sif` files are used directly.
**Scratch directory:** Resolved in order: `TERMINAL_SCRATCH_DIR` → `TERMINAL_SANDBOX_DIR/singularity` → `/scratch/$USER/hermes-agent` (HPC convention) → `~/.hermes/sandboxes/singularity`.
**Isolation:** Uses `--containall --no-home` for full namespace isolation without mounting the host home directory.
### Common Terminal Backend Issues
If terminal commands fail immediately or the terminal tool is reported as disabled:
- **Local** — No special requirements. The safest default when getting started.
- **Docker** — Run `docker version` to verify Docker is working. If it fails, fix Docker or `hermes config set terminal.backend local`.
- **SSH** — Both `TERMINAL_SSH_HOST` and `TERMINAL_SSH_USER` must be set. Hermes logs a clear error if either is missing.
- **Modal** — Needs `MODAL_TOKEN_ID` env var or `~/.modal.toml`. Run `hermes doctor` to check.
- **Daytona** — Needs `DAYTONA_API_KEY`. The Daytona SDK handles server URL configuration.
- **Singularity** — Needs `apptainer` or `singularity` in `$PATH`. Common on HPC clusters.
When in doubt, set `terminal.backend` back to `local` and verify that commands run there first.
### Remote-to-Host File Sync on Teardown
For the **SSH**, **Modal**, and **Daytona** backends (anywhere the agent's working tree lives on a different machine than the host running Hermes), Hermes tracks files the agent touched inside the remote sandbox and, on session teardown / sandbox cleanup, **syncs the modified files back to the host** under `~/.hermes/cache/remote-syncs//`.
- Triggers on: session close, `/new`, `/reset`, gateway message timeout, `delegate_task` subagent completion when the child used a remote backend.
- Covers the whole tree the agent modified, not just files it explicitly opened. Additions, edits, and deletions are all captured.
- The remote sandbox may have been torn down by the time you go looking; the local `~/.hermes/cache/remote-syncs/…` copy is the authoritative record of what the agent changed.
- Large binary outputs (model checkpoints, raw datasets) are capped by size — the sync skips files over `file_sync_max_mb` (default `100`). Bump that if you expect bigger artifacts to come back.
```yaml
terminal:
file_sync_max_mb: 100 # default — sync files up to 100 MB each
file_sync_enabled: true # default — set false to skip the sync entirely
```
This is how you recover results from ephemeral cloud sandboxes that get destroyed after the session ends, without having to tell the agent to explicitly `scp` or `modal volume put` every artifact.
### Docker Volume Mounts
When using the Docker backend, `docker_volumes` lets you share host directories with the container. Each entry uses standard Docker `-v` syntax: `host_path:container_path[:options]`.
```yaml
terminal:
backend: docker
docker_volumes:
- "/home/user/projects:/workspace/projects" # Read-write (default)
- "/home/user/datasets:/data:ro" # Read-only
- "/home/user/.hermes/cache/documents:/output" # Gateway-visible exports
```
This is useful for:
- **Providing files** to the agent (datasets, configs, reference code)
- **Receiving files** from the agent (generated code, reports, exports)
- **Shared workspaces** where both you and the agent access the same files
If you use a messaging gateway and want the agent to send generated files via
`MEDIA:/...`, prefer a dedicated host-visible export mount such as
`/home/user/.hermes/cache/documents:/output`.
- Write files inside Docker to `/output/...`
- Emit the **host path** in `MEDIA:`, for example:
`MEDIA:/home/user/.hermes/cache/documents/report.txt`
- Do **not** emit `/workspace/...` or `/output/...` unless that exact path also
exists for the gateway process on the host
:::warning
YAML duplicate keys silently override earlier ones. If you already have a
`docker_volumes:` block, merge new mounts into the same list instead of adding
another `docker_volumes:` key later in the file.
:::
Can also be set via environment variable: `TERMINAL_DOCKER_VOLUMES='["/host:/container"]'` (JSON array).
### Docker Credential Forwarding
By default, Docker terminal sessions do not inherit arbitrary host credentials. If you need a specific token inside the container, add it to `terminal.docker_forward_env`.
```yaml
terminal:
backend: docker
docker_forward_env:
- "GITHUB_TOKEN"
- "NPM_TOKEN"
```
Hermes resolves each listed variable from your current shell first, then falls back to `~/.hermes/.env` if it was saved with `hermes config set`.
:::warning
Anything listed in `docker_forward_env` becomes visible to commands run inside the container. Only forward credentials you are comfortable exposing to the terminal session.
:::
### Running the Container as Your Host User
By default Docker containers run as `root` (UID 0). Files created inside `/workspace` or other bind-mounts end up owned by root on the host, so after a session you have to `sudo chown` them before you can edit them from your host editor. The `terminal.docker_run_as_host_user` flag fixes this:
```yaml
terminal:
backend: docker
docker_run_as_host_user: true # default: false
```
When enabled, Hermes appends `--user $(id -u):$(id -g)` to the `docker run` command so files written into bind-mounted directories (`/workspace`, `/root`, anything in `docker_volumes`) are owned by your host user, not root. The trade-off: the container can no longer `apt install` or write to root-owned paths like `/root/.npm` — use a base image whose `HOME` is owned by a non-root user (or add your required tooling at image build time) if you need both.
Leave this `false` (the default) for backwards-compatible behavior. Turn it on when your workflow is mostly "edit mounted host files" and you're tired of `sudo chown -R`.
### Optional: Mount the Launch Directory into `/workspace`
Docker sandboxes stay isolated by default. Hermes does **not** pass your current host working directory into the container unless you explicitly opt in.
Enable it in `config.yaml`:
```yaml
terminal:
backend: docker
docker_mount_cwd_to_workspace: true
```
When enabled:
- if you launch Hermes from `~/projects/my-app`, that host directory is bind-mounted to `/workspace`
- the Docker backend starts in `/workspace`
- file tools and terminal commands both see the same mounted project
When disabled, `/workspace` stays sandbox-owned unless you explicitly mount something via `docker_volumes`.
Security tradeoff:
- `false` preserves the sandbox boundary
- `true` gives the sandbox direct access to the directory you launched Hermes from
Use the opt-in only when you intentionally want the container to work on live host files.
### Persistent Shell
By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When **persistent shell** is enabled, a single long-lived bash process is kept alive across `execute()` calls so that state survives between commands.
This is most useful for the **SSH backend**, where it also eliminates per-command connection overhead. Persistent shell is **enabled by default for SSH** and disabled for the local backend.
```yaml
terminal:
persistent_shell: true # default — enables persistent shell for SSH
```
To disable:
```bash
hermes config set terminal.persistent_shell false
```
**What persists across commands:**
- Working directory (`cd /tmp` sticks for the next command)
- Exported environment variables (`export FOO=bar`)
- Shell variables (`MY_VAR=hello`)
**Precedence:**
| Level | Variable | Default |
|-------|----------|---------|
| Config | `terminal.persistent_shell` | `true` |
| SSH override | `TERMINAL_SSH_PERSISTENT` | follows config |
| Local override | `TERMINAL_LOCAL_PERSISTENT` | `false` |
Per-backend environment variables take highest precedence. If you want persistent shell on the local backend too:
```bash
export TERMINAL_LOCAL_PERSISTENT=true
```
:::note
Commands that require `stdin_data` or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol.
:::
See [Code Execution](features/code-execution.md) and the [Terminal section of the README](features/tools.md) for details on each backend.
## Skill Settings
Skills can declare their own configuration settings via their SKILL.md frontmatter. These are non-secret values (paths, preferences, domain settings) stored under the `skills.config` namespace in `config.yaml`.
```yaml
skills:
config:
myplugin:
path: ~/myplugin-data # Example — each skill defines its own keys
```
**How skill settings work:**
- `hermes config migrate` scans all enabled skills, finds unconfigured settings, and offers to prompt you
- `hermes config show` displays all skill settings under "Skill Settings" with the skill they belong to
- When a skill loads, its resolved config values are injected into the skill context automatically
**Setting values manually:**
```bash
hermes config set skills.config.myplugin.path ~/myplugin-data
```
For details on declaring config settings in your own skills, see [Creating Skills — Config Settings](/docs/developer-guide/creating-skills#config-settings-configyaml).
### Guard on agent-created skill writes
When the agent uses `skill_manage` to create, edit, patch, or delete a skill, Hermes can optionally scan the new/updated content for dangerous keyword patterns (credential harvesting, obvious prompt injection, exfil instructions). The scanner is **off by default** — real agent workflows that legitimately touch `~/.ssh/` or mention `$OPENAI_API_KEY` were tripping the heuristic too often. Turn it back on if you want the scanner to prompt you before the agent's skill writes land:
```yaml
skills:
guard_agent_created: true # default: false
```
When on, any flagged `skill_manage` write surfaces as an approval prompt with the scanner's rationale. Accepted writes land; denied writes return an explanatory error to the agent.
## Memory Configuration
```yaml
memory:
memory_enabled: true
user_profile_enabled: true
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
```
## File Read Safety
Controls how much content a single `read_file` call can return. Reads that exceed the limit are rejected with an error telling the agent to use `offset` and `limit` for a smaller range. This prevents a single read of a minified JS bundle or large data file from flooding the context window.
```yaml
file_read_max_chars: 100000 # default — ~25-35K tokens
```
Raise it if you're on a model with a large context window and frequently read big files. Lower it for small-context models to keep reads efficient:
```yaml
# Large context model (200K+)
file_read_max_chars: 200000
# Small local model (16K context)
file_read_max_chars: 30000
```
The agent also deduplicates file reads automatically — if the same file region is read twice and the file hasn't changed, a lightweight stub is returned instead of re-sending the content. This resets on context compression so the agent can re-read files after their content is summarized away.
## Tool Output Truncation Limits
Three related caps control how much raw output a tool can return before Hermes truncates it:
```yaml
tool_output:
max_bytes: 50000 # terminal output cap (chars)
max_lines: 2000 # read_file pagination cap
max_line_length: 2000 # per-line cap in read_file's line-numbered view
```
- **`max_bytes`** — When a `terminal` command produces more than this many characters of combined stdout/stderr, Hermes keeps the first 40% and last 60% and inserts a `[OUTPUT TRUNCATED]` notice between them. Default `50000` (≈12-15K tokens across typical tokenisers).
- **`max_lines`** — Upper bound on the `limit` parameter of a single `read_file` call. Requests above this are clamped so a single read can't flood the context window. Default `2000`.
- **`max_line_length`** — Per-line cap applied when `read_file` emits the line-numbered view. Lines longer than this are truncated to this many chars followed by `... [truncated]`. Default `2000`.
Raise the limits on models with large context windows that can afford more raw output per call. Lower them for small-context models to keep tool results compact:
```yaml
# Large context model (200K+)
tool_output:
max_bytes: 150000
max_lines: 5000
# Small local model (16K context)
tool_output:
max_bytes: 20000
max_lines: 500
```
## Global Toolset Disable
To suppress specific toolsets across the CLI and every gateway platform in one
place, list their names under `agent.disabled_toolsets`:
```yaml
agent:
disabled_toolsets:
- memory # hide memory tools + MEMORY_GUIDANCE injection
- web # no web_search / web_extract anywhere
```
This applies **after** per-platform tool config (`platform_toolsets` written by
`hermes tools`), so a toolset listed here is always removed — even if a
platform's saved config still lists it. Use this when you want a single
switch for "turn X off everywhere" rather than editing 15+ platform rows in
the `hermes tools` UI.
Leaving the list empty, or omitting the key, is a no-op.
## Git Worktree Isolation
Enable isolated git worktrees for running multiple agents in parallel on the same repo:
```yaml
worktree: true # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed
```
When enabled, each CLI session creates a fresh worktree under `.worktrees/` with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.
You can also list gitignored files to copy into worktrees via `.worktreeinclude` in your repo root:
```
# .worktreeinclude
.env
.venv/
node_modules/
```
## Context Compression
Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint.
All compression settings live in `config.yaml` (no environment variables).
### Full reference
```yaml
compression:
enabled: true # Toggle compression on/off
threshold: 0.50 # Compress at this % of context limit
target_ratio: 0.20 # Fraction of threshold to preserve as recent tail
protect_last_n: 20 # Min recent messages to keep uncompressed
hygiene_hard_message_limit: 400 # Gateway safety valve — see below
# The summarization model/provider is configured under auxiliary:
auxiliary:
compression:
model: "google/gemini-3-flash-preview" # Model for summarization
provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
```
:::info Legacy config migration
Older configs with `compression.summary_model`, `compression.summary_provider`, and `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17). No manual action needed.
:::
`hygiene_hard_message_limit` is a gateway-only **pre-compression safety valve**. Runaway sessions with thousands of messages can hit model context limits before the normal percent-of-context threshold fires; when message count crosses this ceiling, Hermes forces compression regardless of token usage. Default `400` — raise it for platforms where very long sessions are normal, lower it to force more aggressive compression. Editing this value on a running gateway takes effect on the next message (see below).
:::tip Gateway hot-reload of compression and context length
As of recent releases, editing `model.context_length` or any `compression.*` key in `config.yaml` on a running gateway takes effect on the next message — no gateway restart, no `/reset`, no session rotation required. The cached-agent signature includes these keys, so the gateway transparently rebuilds the agent when it sees a change. API keys and tool/skill config still require the usual reload paths.
:::
### Common setups
**Default (auto-detect) — no configuration needed:**
```yaml
compression:
enabled: true
threshold: 0.50
```
Uses your main provider and main model. Override per-task (e.g. `auxiliary.compression.provider: openrouter` + `model: google/gemini-2.5-flash`) if you want compression on a cheaper model than your main chat model.
**Force a specific provider** (OAuth or API-key based):
```yaml
auxiliary:
compression:
provider: nous
model: gemini-3-flash
```
Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.
**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):
```yaml
auxiliary:
compression:
model: glm-4.7
base_url: https://api.z.ai/api/coding/paas/v4
```
Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.
### How the three knobs interact
| `auxiliary.compression.provider` | `auxiliary.compression.base_url` | Result |
|---------------------|---------------------|--------|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |
:::warning Summary model context length requirement
The summary model **must** have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are **dropped without a summary**, losing conversation context silently. If you override the model, verify its context length meets or exceeds your main model's.
:::
## Context Engine
The context engine controls how conversations are managed when approaching the model's token limit. The built-in `compressor` engine uses lossy summarization (see [Context Compression](/docs/developer-guide/context-compression-and-caching)). Plugin engines can replace it with alternative strategies.
```yaml
context:
engine: "compressor" # default — built-in lossy summarization
```
To use a plugin engine (e.g., LCM for lossless context management):
```yaml
context:
engine: "lcm" # must match the plugin's name
```
Plugin engines are **never auto-activated** — you must explicitly set `context.engine` to the plugin name. Available engines can be browsed and selected via `hermes plugins` → Provider Plugins → Context Engine.
See [Memory Providers](/docs/user-guide/features/memory-providers) for the analogous single-select system for memory plugins.
## Iteration Budget Pressure
When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:
| Threshold | Level | What the model sees |
|-----------|-------|---------------------|
| **70%** | Caution | `[BUDGET: 63/90. 27 iterations left. Start consolidating.]` |
| **90%** | Warning | `[BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]` |
Warnings are injected into the last tool result's JSON (as a `_budget_warning` field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.
```yaml
agent:
max_turns: 90 # Max iterations per conversation turn (default: 90)
api_max_retries: 2 # Retries per provider before fallback engages (default: 2)
```
Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.
When the iteration budget is fully exhausted, the CLI shows a notification to the user: `⚠ Iteration budget reached (90/90) — response may be incomplete`. If the budget runs out during active work, the agent generates a summary of what was accomplished before stopping.
`agent.api_max_retries` controls how many times Hermes retries a provider API call on transient errors (rate limits, connection drops, 5xx) **before** fallback-provider switching engages. The default is `2` — three attempts total, matching the OpenAI SDK default. If you have [fallback providers](/docs/user-guide/features/fallback-providers) configured and want to fail over faster, drop this to `0` so the first transient error on your primary immediately hands off to the fallback instead of churning retries against the flaky endpoint.
### API Timeouts
Hermes has separate timeout layers for streaming, plus a stale detector for non-streaming calls. The stale detectors auto-adjust for local providers only when you leave them at their implicit defaults.
| Timeout | Default | Local providers | Config / env |
|---------|---------|----------------|--------------|
| Socket read timeout | 120s | Auto-raised to 1800s | `HERMES_STREAM_READ_TIMEOUT` |
| Stale stream detection | 180s | Auto-disabled | `HERMES_STREAM_STALE_TIMEOUT` |
| Stale non-stream detection | 300s | Auto-disabled when left implicit | `providers..stale_timeout_seconds` or `HERMES_API_CALL_STALE_TIMEOUT` |
| API call (non-streaming) | 1800s | Unchanged | `providers..request_timeout_seconds` / `timeout_seconds` or `HERMES_API_TIMEOUT` |
The **socket read timeout** controls how long httpx waits for the next chunk of data from the provider. Local LLMs can take minutes for prefill on large contexts before producing the first token, so Hermes raises this to 30 minutes when it detects a local endpoint. If you explicitly set `HERMES_STREAM_READ_TIMEOUT`, that value is always used regardless of endpoint detection.
The **stale stream detection** kills connections that receive SSE keep-alive pings but no actual content. This is disabled entirely for local providers since they don't send keep-alive pings during prefill.
The **stale non-stream detection** kills non-streaming calls that produce no response for too long. By default Hermes disables this on local endpoints to avoid false positives during long prefills. If you explicitly set `providers..stale_timeout_seconds`, `providers..models..stale_timeout_seconds`, or `HERMES_API_CALL_STALE_TIMEOUT`, that explicit value is honored even on local endpoints.
## Context Pressure Warnings
Separate from iteration budget pressure, context pressure tracks how close the conversation is to the **compaction threshold** — the point where context compression fires to summarize older messages. This helps both you and the agent understand when the conversation is getting long.
| Progress | Level | What happens |
|----------|-------|-------------|
| **≥ 60%** to threshold | Info | CLI shows a cyan progress bar; gateway sends an informational notice |
| **≥ 85%** to threshold | Warning | CLI shows a bold yellow bar; gateway warns compaction is imminent |
In the CLI, context pressure appears as a progress bar in the tool output feed:
```
◐ context ████████████░░░░░░░░ 62% to compaction 48k threshold (50%) · approaching compaction
```
On messaging platforms, a plain-text notification is sent:
```
◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).
```
If auto-compression is disabled, the warning tells you context may be truncated instead.
Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.
## Credential Pool Strategies
When you have multiple API keys or OAuth tokens for the same provider, configure the rotation strategy:
```yaml
credential_pool_strategies:
openrouter: round_robin # cycle through keys evenly
anthropic: least_used # always pick the least-used key
```
Options: `fill_first` (default), `round_robin`, `least_used`, `random`. See [Credential Pools](/docs/user-guide/features/credential-pools) for full documentation.
## Auxiliary Models
Hermes uses "auxiliary" models for side tasks like image analysis, web page summarization, browser screenshot analysis, session-title generation, and context compression. By default (`auxiliary.*.provider: "auto"`), Hermes routes every auxiliary task to your **main chat model** — the same provider/model you picked in `hermes model`. You don't need to configure anything to get started, but be aware that on expensive reasoning models (Opus, MiniMax M2.7, etc.) auxiliary tasks add meaningful cost. If you want cheap-and-fast side tasks regardless of your main model, set `auxiliary..provider` and `auxiliary..model` explicitly (for example, Gemini Flash on OpenRouter for vision and web extraction).
:::note Why "auto" uses your main model
Earlier builds split aggregator users (OpenRouter, Nous Portal) onto a cheap provider-side default. That was surprising — users who paid for an aggregator subscription would see a different model handling their auxiliary traffic. `auto` now uses the main model for everyone, and per-task overrides in `config.yaml` still win (see [Full auxiliary config reference](#full-auxiliary-config-reference) below).
:::
### Configuring auxiliary models interactively
Instead of hand-editing YAML, run `hermes model` and pick **"Configure auxiliary models"** from the menu. You'll get an interactive per-task picker:
```
$ hermes model
→ Configure auxiliary models
[ ] vision currently: auto / main model
[ ] web_extract currently: auto / main model
[ ] session_search currently: openrouter / google/gemini-2.5-flash
[ ] title_generation currently: openrouter / google/gemini-3-flash-preview
[ ] compression currently: auto / main model
[ ] approval currently: auto / main model
```
Select a task, pick a provider (OAuth flows open a browser; API-key providers prompt), pick a model. The change persists to `auxiliary..*` in `config.yaml`. Same machinery as the main-model picker — no extra syntax to learn.
### Video Tutorial
### The universal config pattern
Every model slot in Hermes — auxiliary tasks, compression, fallback — uses the same three knobs:
| Key | What it does | Default |
|-----|-------------|---------|
| `provider` | Which provider to use for auth and routing | `"auto"` |
| `model` | Which model to request | provider's default |
| `base_url` | Custom OpenAI-compatible endpoint (overrides provider) | not set |
When `base_url` is set, Hermes ignores the provider and calls that endpoint directly (using `api_key` or `OPENAI_API_KEY` for auth). When only `provider` is set, Hermes uses that provider's built-in auth and base URL.
Available providers for auxiliary tasks: `auto`, `main`, plus any provider in the [provider registry](/docs/reference/environment-variables) — `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `gemini`, `google-gemini-cli`, `qwen-oauth`, `zai`, `kimi-coding`, `kimi-coding-cn`, `minimax`, `minimax-cn`, `minimax-oauth`, `deepseek`, `nvidia`, `xai`, `ollama-cloud`, `alibaba`, `bedrock`, `huggingface`, `arcee`, `xiaomi`, `kilocode`, `opencode-zen`, `opencode-go`, `ai-gateway`, `azure-foundry` — or any named custom provider from your `custom_providers` list (e.g. `provider: "beans"`).
:::tip MiniMax OAuth
`minimax-oauth` logs in via browser OAuth (no API key needed). Run `hermes model` and select **MiniMax (OAuth)** to authenticate. Auxiliary tasks use `MiniMax-M2.7-highspeed` automatically. See the [MiniMax OAuth guide](../guides/minimax-oauth.md).
:::
:::warning `"main"` is for auxiliary tasks only
The `"main"` provider option means "use whatever provider my main agent uses" — it's only valid inside `auxiliary:`, `compression:`, and `fallback_model:` configs. It is **not** a valid value for your top-level `model.provider` setting. If you use a custom OpenAI-compatible endpoint, set `provider: custom` in your `model:` section. See [AI Providers](/docs/integrations/providers) for all main model provider options.
:::
### Full auxiliary config reference
```yaml
auxiliary:
# Image analysis (vision_analyze tool + browser screenshots)
vision:
provider: "auto" # "auto", "openrouter", "nous", "codex", "main", etc.
model: "" # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
base_url: "" # Custom OpenAI-compatible endpoint (overrides provider)
api_key: "" # API key for base_url (falls back to OPENAI_API_KEY)
timeout: 120 # seconds — LLM API call timeout; vision payloads need generous timeout
download_timeout: 30 # seconds — image HTTP download; increase for slow connections
# Web page summarization + browser page text extraction
web_extract:
provider: "auto"
model: "" # e.g. "google/gemini-2.5-flash"
base_url: ""
api_key: ""
timeout: 360 # seconds (6min) — per-attempt LLM summarization
# Dangerous command approval classifier
approval:
provider: "auto"
model: ""
base_url: ""
api_key: ""
timeout: 30 # seconds
# Context compression timeout (separate from compression.* config)
compression:
timeout: 120 # seconds — compression summarizes long conversations, needs more time
# Session search — summarizes past session matches
session_search:
provider: "auto"
model: ""
base_url: ""
api_key: ""
timeout: 30
max_concurrency: 3 # Limit parallel summaries to reduce request-burst 429s
extra_body: {} # Provider-specific OpenAI-compatible request fields
# Skills hub — skill matching and search
skills_hub:
provider: "auto"
model: ""
base_url: ""
api_key: ""
timeout: 30
# MCP tool dispatch
mcp:
provider: "auto"
model: ""
base_url: ""
api_key: ""
timeout: 30
```
:::tip
Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision 120s, web_extract 360s, approval 30s, compression 120s. Increase these if you use slow local models for auxiliary tasks. Vision also has a separate `download_timeout` (default 30s) for the HTTP image download — increase this for slow connections or self-hosted image servers.
:::
:::info
Context compression has its own `compression:` block for thresholds and an `auxiliary.compression:` block for model/provider settings — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](/docs/integrations/providers#fallback-model). All three follow the same provider/model/base_url pattern.
:::
### Session Search Tuning
If you use a reasoning-heavy model for `auxiliary.session_search`, Hermes now gives you two built-in controls:
- `auxiliary.session_search.max_concurrency`: limits how many matched sessions Hermes summarizes at once
- `auxiliary.session_search.extra_body`: forwards provider-specific OpenAI-compatible request fields on the summarization calls
Example:
```yaml
auxiliary:
session_search:
provider: "main"
model: "glm-4.5-air"
timeout: 60
max_concurrency: 2
extra_body:
enable_thinking: false
```
Use `max_concurrency` when your provider rate-limits request bursts and you want `session_search` to trade some parallelism for stability.
Use `extra_body` only when your provider documents OpenAI-compatible request-body fields you want Hermes to pass through for that task. Hermes forwards the object as-is.
:::warning
`extra_body` is only effective when your provider actually supports the field you send. If the provider does not expose a native OpenAI-compatible reasoning-off flag, Hermes cannot synthesize one on its behalf.
:::
### Changing the Vision Model
To use GPT-4o instead of Gemini Flash for image analysis:
```yaml
auxiliary:
vision:
model: "openai/gpt-4o"
```
Or via environment variable (in `~/.hermes/.env`):
```bash
AUXILIARY_VISION_MODEL=openai/gpt-4o
```
### Provider Options
These options apply to **auxiliary task configs** (`auxiliary:`, `compression:`, `fallback_model:`), not to your main `model.provider` setting.
| Provider | Description | Requirements |
|----------|-------------|-------------|
| `"auto"` | Best available (default). Vision tries OpenRouter → Nous → Codex. | — |
| `"openrouter"` | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | `OPENROUTER_API_KEY` |
| `"nous"` | Force Nous Portal | `hermes auth` |
| `"codex"` | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | `hermes model` → Codex |
| `"minimax-oauth"` | Force MiniMax OAuth (browser login, no API key). Uses MiniMax-M2.7-highspeed for auxiliary tasks. | `hermes model` → MiniMax (OAuth) |
| `"main"` | Use your active custom/main endpoint. This can come from `OPENAI_BASE_URL` + `OPENAI_API_KEY` or from a custom endpoint saved via `hermes model` / `config.yaml`. Works with OpenAI, local models, or any OpenAI-compatible API. **Auxiliary tasks only — not valid for `model.provider`.** | Custom endpoint credentials + base URL |
Direct API-key providers from the main provider catalog also work here when you want side tasks to bypass your default router. `gmi` is valid once `GMI_API_KEY` is configured:
```yaml
auxiliary:
compression:
provider: "gmi"
model: "anthropic/claude-opus-4.6"
```
For GMI auxiliary routing, use the exact model ID returned by GMI's `/v1/models` endpoint.
### Common Setups
**Using a direct custom endpoint** (clearer than `provider: "main"` for local/self-hosted APIs):
```yaml
auxiliary:
vision:
base_url: "http://localhost:1234/v1"
api_key: "local-key"
model: "qwen2.5-vl"
```
`base_url` takes precedence over `provider`, so this is the most explicit way to route an auxiliary task to a specific endpoint. For direct endpoint overrides, Hermes uses the configured `api_key` or falls back to `OPENAI_API_KEY`; it does not reuse `OPENROUTER_API_KEY` for that custom endpoint.
**Using OpenAI API key for vision:**
```yaml
# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...
auxiliary:
vision:
provider: "main"
model: "gpt-4o" # or "gpt-4o-mini" for cheaper
```
**Using OpenRouter for vision** (route to any model):
```yaml
auxiliary:
vision:
provider: "openrouter"
model: "openai/gpt-4o" # or "google/gemini-2.5-flash", etc.
```
**Using Codex OAuth** (ChatGPT Pro/Plus account — no API key needed):
```yaml
auxiliary:
vision:
provider: "codex" # uses your ChatGPT OAuth token
# model defaults to gpt-5.3-codex (supports vision)
```
**Using MiniMax OAuth** (browser login, no API key needed):
```yaml
model:
default: MiniMax-M2.7
provider: minimax-oauth
base_url: https://api.minimax.io/anthropic
```
Run `hermes model` and select **MiniMax (OAuth)** to log in and set this automatically. For the China region, the base URL will be `https://api.minimaxi.com/anthropic`. See the [MiniMax OAuth guide](../guides/minimax-oauth.md) for the full walkthrough.
**Using a local/self-hosted model:**
```yaml
auxiliary:
vision:
provider: "main" # uses your active custom endpoint
model: "my-local-model"
```
`provider: "main"` uses whatever provider Hermes uses for normal chat — whether that's a named custom provider (e.g. `beans`), a built-in provider like `openrouter`, or a legacy `OPENAI_BASE_URL` endpoint.
:::tip
If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision.
:::
:::warning
**Vision requires a multimodal model.** If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
:::
### Environment Variables (legacy)
Auxiliary models can also be configured via environment variables. However, `config.yaml` is the preferred method — it's easier to manage and supports all options including `base_url` and `api_key`.
| Setting | Environment Variable |
|---------|---------------------|
| Vision provider | `AUXILIARY_VISION_PROVIDER` |
| Vision model | `AUXILIARY_VISION_MODEL` |
| Vision endpoint | `AUXILIARY_VISION_BASE_URL` |
| Vision API key | `AUXILIARY_VISION_API_KEY` |
| Web extract provider | `AUXILIARY_WEB_EXTRACT_PROVIDER` |
| Web extract model | `AUXILIARY_WEB_EXTRACT_MODEL` |
| Web extract endpoint | `AUXILIARY_WEB_EXTRACT_BASE_URL` |
| Web extract API key | `AUXILIARY_WEB_EXTRACT_API_KEY` |
Compression and fallback model settings are config.yaml-only.
:::tip
Run `hermes config` to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
:::
## Reasoning Effort
Control how much "thinking" the model does before responding:
```yaml
agent:
reasoning_effort: "" # empty = medium (default). Options: none, minimal, low, medium, high, xhigh (max)
```
When unset (default), reasoning effort defaults to "medium" — a balanced level that works well for most tasks. Setting a value overrides it — higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.
You can also change the reasoning effort at runtime with the `/reasoning` command:
```
/reasoning # Show current effort level and display state
/reasoning high # Set reasoning effort to high
/reasoning none # Disable reasoning
/reasoning show # Show model thinking above each response
/reasoning hide # Hide model thinking
```
## Tool-Use Enforcement
Some models occasionally describe intended actions as text instead of making tool calls ("I would run the tests..." instead of actually calling the terminal). Tool-use enforcement injects system prompt guidance that steers the model back to actually calling tools.
```yaml
agent:
tool_use_enforcement: "auto" # "auto" | true | false | ["model-substring", ...]
```
| Value | Behavior |
|-------|----------|
| `"auto"` (default) | Enabled for models matching: `gpt`, `codex`, `gemini`, `gemma`, `grok`. Disabled for all others (Claude, DeepSeek, Qwen, etc.). |
| `true` | Always enabled, regardless of model. Useful if you notice your current model describing actions instead of performing them. |
| `false` | Always disabled, regardless of model. |
| `["gpt", "codex", "qwen", "llama"]` | Enabled only when the model name contains one of the listed substrings (case-insensitive). |
### What it injects
When enabled, three layers of guidance may be added to the system prompt:
1. **General tool-use enforcement** (all matched models) — instructs the model to make tool calls immediately instead of describing intentions, keep working until the task is complete, and never end a turn with a promise of future action.
2. **OpenAI execution discipline** (GPT and Codex models only) — additional guidance addressing GPT-specific failure modes: abandoning work on partial results, skipping prerequisite lookups, hallucinating instead of using tools, and declaring "done" without verification.
3. **Google operational guidance** (Gemini and Gemma models only) — conciseness, absolute paths, parallel tool calls, and verify-before-edit patterns.
These are transparent to the user and only affect the system prompt. Models that already use tools reliably (like Claude) don't need this guidance, which is why `"auto"` excludes them.
### When to turn it on
If you're using a model not in the default auto list and notice it frequently describes what it *would* do instead of doing it, set `tool_use_enforcement: true` or add the model substring to the list:
```yaml
agent:
tool_use_enforcement: ["gpt", "codex", "gemini", "grok", "my-custom-model"]
```
## TTS Configuration
```yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts"
speed: 1.0 # Global speed multiplier (fallback for all providers)
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
speed: 1.0 # Speed multiplier (converted to rate percentage, e.g. 1.5 → +50%)
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB"
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
speed: 1.0 # Speed multiplier (clamped to 0.25–4.0 by the API)
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
minimax:
speed: 1.0 # Speech speed multiplier
# base_url: "" # Optional: override for OpenAI-compatible TTS endpoints
mistral:
model: "voxtral-mini-tts-2603"
voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8" # Paul - Neutral (default)
gemini:
model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts
voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, etc.
xai:
voice_id: "eve" # xAI TTS voice
language: "en" # ISO 639-1
sample_rate: 24000
bit_rate: 128000 # MP3 bitrate
# base_url: "https://api.x.ai/v1"
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
```
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
**Speed fallback hierarchy:** provider-specific speed (e.g. `tts.edge.speed`) → global `tts.speed` → `1.0` default. Set the global `tts.speed` to apply a uniform speed across all providers, or override per-provider for fine-grained control.
## Display Settings
```yaml
display:
tool_progress: all # off | new | all | verbose
tool_progress_command: false # Enable /verbose slash command in messaging gateway
platforms: {} # Per-platform display overrides (see below)
tool_progress_overrides: {} # DEPRECATED — use display.platforms instead
interim_assistant_messages: true # Gateway: send natural mid-turn assistant updates as separate messages
skin: default # Built-in or custom CLI skin (see user-guide/features/skins)
personality: "kawaii" # Legacy cosmetic field still surfaced in some summaries
compact: false # Compact output mode (less whitespace)
resume_display: full # full (show previous messages on resume) | minimal (one-liner only)
bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
show_reasoning: false # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
streaming: false # Stream tokens to terminal as they arrive (real-time output)
show_cost: false # Show estimated $ cost in the CLI status bar
tool_preview_length: 0 # Max chars for tool call previews (0 = no limit, show full paths/commands)
runtime_metadata_footer: false # Gateway: append a runtime-context footer to final replies
language: en # UI language for static messages (approval prompts, some gateway replies). en | zh | ja | de | es | fr | tr | uk
```
### UI language for static messages
The `display.language` setting translates a small set of static user-facing messages — the CLI approval prompt, a handful of gateway slash-command replies (e.g. restart-drain notices, "approval expired", "goal cleared"). It does **not** translate agent responses, log lines, tool output, error tracebacks, or slash-command descriptions — those stay in English. If you want the agent itself to reply in another language, just tell it in your prompt or system message.
Supported values: `en` (default), `zh` (Simplified Chinese), `ja` (Japanese), `de` (German), `es` (Spanish), `fr` (French), `tr` (Turkish), `uk` (Ukrainian). Unknown values fall back to English.
You can also set this per-session with the `HERMES_LANGUAGE` env var, which overrides the config value.
```yaml
display:
language: zh # CLI approval prompts appear in Chinese
```
| Mode | What you see |
|------|-------------|
| `off` | Silent — just the final response |
| `new` | Tool indicator only when the tool changes |
| `all` | Every tool call with a short preview (default) |
| `verbose` | Full args, results, and debug logs |
In the CLI, cycle through these modes with `/verbose`. To use `/verbose` in messaging platforms (Telegram, Discord, Slack, etc.), set `tool_progress_command: true` in the `display` section above. The command will then cycle the mode and save to config.
### Runtime-metadata footer (gateway only)
When `display.runtime_metadata_footer: true`, Hermes appends a small runtime-context footer to the **final** message of each gateway turn — same info the CLI shows in its status bar (model, session duration, tokens, cost). Off by default; opt in per-gateway if your team wants every reply to include the provenance.
```yaml
display:
runtime_metadata_footer: true
```
Example footer appended to a Telegram/Discord/Slack reply:
```
— claude-opus-4.7 · 12 tool calls · 2m 14s · $0.042
```
Only the **final** message of a turn gets the footer; interim updates stay clean.
### Per-platform progress overrides
Different platforms have different verbosity needs. For example, Signal can't edit messages, so each progress update becomes a separate message — noisy. Use `display.platforms` to set per-platform modes:
```yaml
display:
tool_progress: all # global default
platforms:
signal:
tool_progress: 'off' # silence progress on Signal
telegram:
tool_progress: verbose # detailed progress on Telegram
slack:
tool_progress: 'off' # quiet in shared Slack workspace
```
Platforms without an override fall back to the global `tool_progress` value. Valid platform keys: `telegram`, `discord`, `slack`, `signal`, `whatsapp`, `matrix`, `mattermost`, `email`, `sms`, `homeassistant`, `dingtalk`, `feishu`, `wecom`, `weixin`, `bluebubbles`, `qqbot`. The legacy `display.tool_progress_overrides` key still loads for backward compatibility but is deprecated and migrated into `display.platforms` on first load.
`interim_assistant_messages` is gateway-only. When enabled, Hermes sends completed mid-turn assistant updates as separate chat messages. This is independent from `tool_progress` and does not require gateway streaming.
## Privacy
```yaml
privacy:
redact_pii: false # Strip PII from LLM context (gateway only)
```
When `redact_pii` is `true`, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM on supported platforms:
| Field | Treatment |
|-------|-----------|
| Phone numbers (user ID on WhatsApp/Signal) | Hashed to `user_<12-char-sha256>` |
| User IDs | Hashed to `user_<12-char-sha256>` |
| Chat IDs | Numeric portion hashed, platform prefix preserved (`telegram:`) |
| Home channel IDs | Numeric portion hashed |
| User names / usernames | **Not affected** (user-chosen, publicly visible) |
**Platform support:** Redaction applies to WhatsApp, Signal, and Telegram. Discord and Slack are excluded because their mention systems (`<@user_id>`) require the real ID in the LLM context.
Hashes are deterministic — the same user always maps to the same hash, so the model can still distinguish between users in group chats. Routing and delivery use the original values internally.
## Speech-to-Text (STT)
```yaml
stt:
provider: "local" # "local" | "groq" | "openai" | "mistral"
local:
model: "base" # tiny, base, small, medium, large-v3
openai:
model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
# model: "whisper-1" # Legacy fallback key still respected
```
Provider behavior:
- `local` uses `faster-whisper` running on your machine. Install it separately with `pip install faster-whisper`.
- `groq` uses Groq's Whisper-compatible endpoint and reads `GROQ_API_KEY`.
- `openai` uses the OpenAI speech API and reads `VOICE_TOOLS_OPENAI_KEY`.
If the requested provider is unavailable, Hermes falls back automatically in this order: `local` → `groq` → `openai`.
Groq and OpenAI model overrides are environment-driven:
```bash
STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1
```
## Voice Mode (CLI)
```yaml
voice:
record_key: "ctrl+b" # Push-to-talk key inside the CLI
max_recording_seconds: 120 # Hard stop for long recordings
auto_tts: false # Enable spoken replies automatically when /voice on
beep_enabled: true # Play record start/stop beeps in CLI voice mode
silence_threshold: 200 # RMS threshold for speech detection
silence_duration: 3.0 # Seconds of silence before auto-stop
```
Use `/voice on` in the CLI to enable microphone mode, `record_key` to start/stop recording, and `/voice tts` to toggle spoken replies. See [Voice Mode](/docs/user-guide/features/voice-mode) for end-to-end setup and platform-specific behavior.
## Streaming
Stream tokens to the terminal or messaging platforms as they arrive, instead of waiting for the full response.
### CLI Streaming
```yaml
display:
streaming: true # Stream tokens to terminal in real-time
show_reasoning: true # Also stream reasoning/thinking tokens (optional)
```
When enabled, responses appear token-by-token inside a streaming box. Tool calls are still captured silently. If the provider doesn't support streaming, it falls back to the normal display automatically.
### Gateway Streaming (Telegram, Discord, Slack)
```yaml
streaming:
enabled: true # Enable progressive message editing
transport: edit # "edit" (progressive message editing) or "off"
edit_interval: 0.3 # Seconds between message edits
buffer_threshold: 40 # Characters before forcing an edit flush
cursor: " ▉" # Cursor shown during streaming
fresh_final_after_seconds: 60 # Send fresh final (Telegram) when preview is this old; 0 = always edit in place
```
When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
For separate natural mid-turn assistant updates without progressive token editing, set `display.interim_assistant_messages: true`.
**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
**Fresh final (Telegram):** Telegram's `editMessageText` preserves the original message timestamp, so a long-running streamed reply would keep the first-token timestamp even after completion. When `fresh_final_after_seconds > 0` (default `60`), the completed reply is delivered as a brand-new message (with the stale preview best-effort deleted) so Telegram's visible timestamp reflects completion time. Short previews still finalize in place. Set to `0` to always edit in place.
:::note
Streaming is disabled by default. Enable it in `~/.hermes/config.yaml` to try the streaming UX.
:::
## Group Chat Session Isolation
Control whether shared chats keep one conversation per room or one conversation per participant:
```yaml
group_sessions_per_user: true # true = per-user isolation in groups/channels, false = one shared session per chat
```
- `true` is the default and recommended setting. In Discord channels, Telegram groups, Slack channels, and similar shared contexts, each sender gets their own session when the platform provides a user ID.
- `false` reverts to the old shared-room behavior. That can be useful if you explicitly want Hermes to treat a channel like one collaborative conversation, but it also means users share context, token costs, and interrupt state.
- Direct messages are unaffected. Hermes still keys DMs by chat/DM ID as usual.
- Threads stay isolated from their parent channel either way; with `true`, each participant also gets their own session inside the thread.
For the behavior details and examples, see [Sessions](/docs/user-guide/sessions) and the [Discord guide](/docs/user-guide/messaging/discord).
## Unauthorized DM Behavior
Control what Hermes does when an unknown user sends a direct message:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `pair` is the default. Hermes denies access, but replies with a one-time pairing code in DMs.
- `ignore` silently drops unauthorized DMs.
- Platform sections override the global default, so you can keep pairing enabled broadly while making one platform quieter.
## Quick Commands
Define custom commands that either run shell commands without invoking the LLM, or alias one slash command to another. Exec quick commands are zero-token and useful from messaging platforms (Telegram, Discord, etc.) for quick server checks or utility scripts.
```yaml
quick_commands:
status:
type: exec
command: systemctl status hermes-agent
disk:
type: exec
command: df -h /
update:
type: exec
command: cd ~/.hermes/hermes-agent && git pull && pip install -e .
gpu:
type: exec
command: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader
restart:
type: alias
target: /gateway restart
```
Usage: type `/status`, `/disk`, `/update`, `/gpu`, or `/restart` in the CLI or any messaging platform. `exec` commands run locally on the host and return the output directly — no LLM call, no tokens consumed. `alias` commands rewrite to the configured slash command target.
- **30-second timeout** — long-running commands are killed with an error message
- **Priority** — quick commands are checked before skill commands, so you can override skill names
- **Autocomplete** — quick commands are resolved at dispatch time and are not shown in the built-in slash-command autocomplete tables
- **Type** — supported types are `exec` and `alias`; other types show an error
- **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant
String-only prompt shortcuts are not valid quick commands. For reusable prompt workflows, create a skill or alias to an existing slash command.
## Human Delay
Simulate human-like response pacing in messaging platforms:
```yaml
human_delay:
mode: "off" # off | natural | custom
min_ms: 800 # Minimum delay (custom mode)
max_ms: 2500 # Maximum delay (custom mode)
```
## Code Execution
Configure the `execute_code` tool:
```yaml
code_execution:
mode: project # project (default) | strict
timeout: 300 # Max execution time in seconds
max_tool_calls: 50 # Max tool calls within code execution
```
**`mode`** controls the working directory and Python interpreter for scripts:
- **`project`** (default) — scripts run in the session's working directory with the active virtualenv/conda env's python. Project deps (`pandas`, `torch`, project packages) and relative paths (`.env`, `./data.csv`) resolve naturally, matching what `terminal()` sees.
- **`strict`** — scripts run in a temp staging directory with `sys.executable` (Hermes's own python). Maximum reproducibility, but project deps and relative paths won't resolve.
Environment scrubbing (strips `*_API_KEY`, `*_TOKEN`, `*_SECRET`, `*_PASSWORD`, `*_CREDENTIAL`, `*_PASSWD`, `*_AUTH`) and the tool whitelist apply identically in both modes — switching mode does not change the security posture.
## Web Search Backends
The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
```yaml
web:
backend: firecrawl # firecrawl | parallel | tavily | exa
```
| Backend | Env Var | Search | Extract | Crawl |
|---------|---------|--------|---------|-------|
| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |
**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `EXA_API_KEY` is set, Exa is used. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).
**Parallel search modes:** Set `PARALLEL_SEARCH_MODE` to control search behavior — `fast`, `one-shot`, or `agentic` (default: `agentic`).
**Exa:** Set `EXA_API_KEY` in `~/.hermes/.env`. Supports `category` filtering (`company`, `research paper`, `news`, `people`, `personal site`, `pdf`) and domain/date filters.
## Browser
Configure browser automation behavior:
```yaml
browser:
inactivity_timeout: 120 # Seconds before auto-closing idle sessions
command_timeout: 30 # Timeout in seconds for browser commands (screenshot, navigate, etc.)
record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
# Optional CDP override — when set, Hermes attaches directly to your own
# Chrome (via /browser connect) rather than starting a headless browser.
cdp_url: ""
# Dialog supervisor — controls how native JS dialogs (alert / confirm / prompt)
# are handled when a CDP backend is attached (Browserbase, local Chrome via
# /browser connect). Ignored on Camofox and default local agent-browser mode.
dialog_policy: must_respond # must_respond | auto_dismiss | auto_accept
dialog_timeout_s: 300 # Safety auto-dismiss under must_respond (seconds)
camofox:
managed_persistence: false # When true, Camofox sessions persist cookies/logins across restarts
```
**Dialog policies:**
- `must_respond` (default) — capture the dialog, surface it in `browser_snapshot.pending_dialogs`, and wait for the agent to call `browser_dialog(action=...)`. After `dialog_timeout_s` seconds with no response, the dialog is auto-dismissed to prevent the page's JS thread from stalling forever.
- `auto_dismiss` — capture, dismiss immediately. The agent still sees the dialog record in `browser_snapshot.recent_dialogs` with `closed_by="auto_policy"` after the fact.
- `auto_accept` — capture, accept immediately. Useful for pages with aggressive `beforeunload` prompts.
See the [browser feature page](./features/browser.md#browser_dialog) for the full dialog workflow.
The browser toolset supports multiple providers. See the [Browser feature page](/docs/user-guide/features/browser) for details on Browserbase, Browser Use, and local Chrome CDP setup.
## Timezone
Override the server-local timezone with an IANA timezone string. Affects timestamps in logs, cron scheduling, and system prompt time injection.
```yaml
timezone: "America/New_York" # IANA timezone (default: "" = server-local time)
```
Supported values: any IANA timezone identifier (e.g. `America/New_York`, `Europe/London`, `Asia/Kolkata`, `UTC`). Leave empty or omit for server-local time.
## Discord
Configure Discord-specific behavior for the messaging gateway:
```yaml
discord:
require_mention: true # Require @mention to respond in server channels
free_response_channels: "" # Comma-separated channel IDs where bot responds without @mention
auto_thread: true # Auto-create threads on @mention in channels
```
- `require_mention` — when `true` (default), the bot only responds in server channels when mentioned with `@BotName`. DMs always work without mention.
- `free_response_channels` — comma-separated list of channel IDs where the bot responds to every message without requiring a mention.
- `auto_thread` — when `true` (default), mentions in channels automatically create a thread for the conversation, keeping channels clean (similar to Slack threading).
## Security
Pre-execution security scanning and secret redaction:
```yaml
security:
redact_secrets: false # Redact API key patterns in tool output and logs (off by default)
tirith_enabled: true # Enable Tirith security scanning for terminal commands
tirith_path: "tirith" # Path to tirith binary (default: "tirith" in $PATH)
tirith_timeout: 5 # Seconds to wait for tirith scan before timing out
tirith_fail_open: true # Allow command execution if tirith is unavailable
website_blocklist: # See Website Blocklist section below
enabled: false
domains: []
shared_files: []
```
- `redact_secrets` — when `true`, automatically detects and redacts patterns that look like API keys, tokens, and passwords in tool output before it enters the conversation context and logs. **Off by default** — enable if you commonly work with real credentials in tool output and want a safety net. Set to `true` explicitly to turn on.
- `tirith_enabled` — when `true`, terminal commands are scanned by [Tirith](https://github.com/StackGuardian/tirith) before execution to detect potentially dangerous operations.
- `tirith_path` — path to the tirith binary. Set this if tirith is installed in a non-standard location.
- `tirith_timeout` — maximum seconds to wait for a tirith scan. Commands proceed if the scan times out.
- `tirith_fail_open` — when `true` (default), commands are allowed to execute if tirith is unavailable or fails. Set to `false` to block commands when tirith cannot verify them.
## Website Blocklist
Block specific domains from being accessed by the agent's web and browser tools:
```yaml
security:
website_blocklist:
enabled: false # Enable URL blocking (default: false)
domains: # List of blocked domain patterns
- "*.internal.company.com"
- "admin.example.com"
- "*.local"
shared_files: # Load additional rules from external files
- "/etc/hermes/blocked-sites.txt"
```
When enabled, any URL matching a blocked domain pattern is rejected before the web or browser tool executes. This applies to `web_search`, `web_extract`, `browser_navigate`, and any tool that accesses URLs.
Domain rules support:
- Exact domains: `admin.example.com`
- Wildcard subdomains: `*.internal.company.com` (blocks all subdomains)
- TLD wildcards: `*.local`
Shared files contain one domain rule per line (blank lines and `#` comments are ignored). Missing or unreadable files log a warning but don't disable other web tools.
The policy is cached for 30 seconds, so config changes take effect quickly without restart.
## Smart Approvals
Control how Hermes handles potentially dangerous commands:
```yaml
approvals:
mode: manual # manual | smart | off
```
| Mode | Behavior |
|------|----------|
| `manual` (default) | Prompt the user before executing any flagged command. In the CLI, shows an interactive approval dialog. In messaging, queues a pending approval request. |
| `smart` | Use an auxiliary LLM to assess whether a flagged command is actually dangerous. Low-risk commands are auto-approved with session-level persistence. Genuinely risky commands are escalated to the user. |
| `off` | Skip all approval checks. Equivalent to `HERMES_YOLO_MODE=true`. **Use with caution.** |
Smart mode is particularly useful for reducing approval fatigue — it lets the agent work more autonomously on safe operations while still catching genuinely destructive commands.
:::warning
Setting `approvals.mode: off` disables all safety checks for terminal commands. Only use this in trusted, sandboxed environments.
:::
## Checkpoints
Automatic filesystem snapshots before destructive file operations. See the [Checkpoints & Rollback](/docs/user-guide/checkpoints-and-rollback) for details.
```yaml
checkpoints:
enabled: true # Enable automatic checkpoints (also: hermes --checkpoints)
max_snapshots: 50 # Max checkpoints to keep per directory
```
## Delegation
Configure subagent behavior for the delegate tool:
```yaml
delegation:
# model: "google/gemini-3-flash-preview" # Override model (empty = inherit parent)
# provider: "openrouter" # Override provider (empty = inherit parent)
# base_url: "http://localhost:1234/v1" # Direct OpenAI-compatible endpoint (takes precedence over provider)
# api_key: "local-key" # API key for base_url (falls back to OPENAI_API_KEY)
max_concurrent_children: 3 # Parallel children per batch (floor 1, no ceiling). Also via DELEGATION_MAX_CONCURRENT_CHILDREN env var.
max_spawn_depth: 1 # Delegation tree depth cap (1-3, clamped). 1 = flat (default): parent spawns leaves that cannot delegate. 2 = orchestrator children can spawn leaf grandchildren. 3 = three levels.
orchestrator_enabled: true # Global kill switch. When false, role="orchestrator" is ignored and every child is forced to leaf regardless of max_spawn_depth.
```
**Subagent provider:model override:** By default, subagents inherit the parent agent's provider and model. Set `delegation.provider` and `delegation.model` to route subagents to a different provider:model pair — e.g., use a cheap/fast model for narrowly-scoped subtasks while your primary agent runs an expensive reasoning model.
**Direct endpoint override:** If you want the obvious custom-endpoint path, set `delegation.base_url`, `delegation.api_key`, and `delegation.model`. That sends subagents directly to that OpenAI-compatible endpoint and takes precedence over `delegation.provider`. If `delegation.api_key` is omitted, Hermes falls back to `OPENAI_API_KEY` only.
The delegation provider uses the same credential resolution as CLI/gateway startup. All configured providers are supported: `openrouter`, `nous`, `copilot`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`. When a provider is set, the system automatically resolves the correct base URL, API key, and API mode — no manual credential wiring needed.
**Precedence:** `delegation.base_url` in config → `delegation.provider` in config → parent provider (inherited). `delegation.model` in config → parent model (inherited). Setting just `model` without `provider` changes only the model name while keeping the parent's credentials (useful for switching models within the same provider like OpenRouter).
**Width and depth:** `max_concurrent_children` caps how many subagents run in parallel per batch (default `3`, floor of 1, no ceiling). Can also be set via the `DELEGATION_MAX_CONCURRENT_CHILDREN` env var. When the model submits a `tasks` array longer than the cap, `delegate_task` returns a tool error explaining the limit rather than silently truncating. `max_spawn_depth` controls the delegation tree depth (clamped to 1-3). At the default `1`, delegation is flat: children cannot spawn grandchildren, and passing `role="orchestrator"` silently degrades to `leaf`. Raise to `2` so orchestrator children can spawn leaf grandchildren; `3` for three-level trees. The agent opts into orchestration per call via `role="orchestrator"`; `orchestrator_enabled: false` forces every child back to leaf regardless. Cost scales multiplicatively — at `max_spawn_depth: 3` with `max_concurrent_children: 3`, the tree can reach 3×3×3 = 27 concurrent leaf agents. See [Subagent Delegation → Depth Limit and Nested Orchestration](features/delegation.md#depth-limit-and-nested-orchestration) for usage patterns.
## Clarify
Configure the clarification prompt behavior:
```yaml
clarify:
timeout: 120 # Seconds to wait for user clarification response
```
## Context Files (SOUL.md, AGENTS.md)
Hermes uses two different context scopes:
| File | Purpose | Scope |
|------|---------|-------|
| `SOUL.md` | **Primary agent identity** — defines who the agent is (slot #1 in the system prompt) | `~/.hermes/SOUL.md` or `$HERMES_HOME/SOUL.md` |
| `.hermes.md` / `HERMES.md` | Project-specific instructions (highest priority) | Walks to git root |
| `AGENTS.md` | Project-specific instructions, coding conventions | Recursive directory walk |
| `CLAUDE.md` | Claude Code context files (also detected) | Working directory only |
| `.cursorrules` | Cursor IDE rules (also detected) | Working directory only |
| `.cursor/rules/*.mdc` | Cursor rule files (also detected) | Working directory only |
- **SOUL.md** is the agent's primary identity. It occupies slot #1 in the system prompt, completely replacing the built-in default identity. Edit it to fully customize who the agent is.
- If SOUL.md is missing, empty, or cannot be loaded, Hermes falls back to a built-in default identity.
- **Project context files use a priority system** — only ONE type is loaded (first match wins): `.hermes.md` → `AGENTS.md` → `CLAUDE.md` → `.cursorrules`. SOUL.md is always loaded independently.
- **AGENTS.md** is hierarchical: if subdirectories also have AGENTS.md, all are combined.
- Hermes automatically seeds a default `SOUL.md` if one does not already exist.
- All loaded context files are capped at 20,000 characters with smart truncation.
See also:
- [Personality & SOUL.md](/docs/user-guide/features/personality)
- [Context Files](/docs/user-guide/features/context-files)
## Working Directory
| Context | Default |
|---------|---------|
| **CLI (`hermes`)** | Current directory where you run the command |
| **Messaging gateway** | Home directory `~` (override with `MESSAGING_CWD`) |
| **Docker / Singularity / Modal / SSH** | User's home directory inside the container or remote machine |
Override the working directory:
```bash
# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects # Gateway sessions
TERMINAL_CWD=/workspace # All terminal sessions
```
---
# user-guide/configuring-models
# Configuring Models
Hermes uses two kinds of model slots:
- **Main model** — what the agent thinks with. Every user message, every tool-call loop, every streamed response goes through this model.
- **Auxiliary models** — smaller side-jobs the agent offloads. Context compression, vision (image analysis), web-page summarization, session search, approval scoring, MCP tool routing, session-title generation, and skill search. Each has its own slot and can be overridden independently.
This page covers configuring both from the dashboard. If you prefer config files or the CLI, jump to [Alternative methods](#alternative-methods) at the bottom.
## The Models page
Open the dashboard and click **Models** in the sidebar. You get two sections:
1. **Model Settings** — the top panel, where you assign models to slots.
2. **Usage analytics** — ranked cards showing every model that ran a session in the selected period, with token counts, cost, and capability badges.

The top card is the **Model Settings** panel. The main row always shows what the agent will spin up for new sessions. Click **Change** to open the picker.
## Setting the main model
Click **Change** on the Main model row:

The picker has two columns:
- **Left** — authenticated providers. Only providers you've set up (API key set, OAuth'd, or defined as a custom endpoint) show up here. If a provider is missing, head to **Keys** and add its credential.
- **Right** — the curated model list for the selected provider. These are the agentic models Hermes recommends for that provider, not the raw `/models` dump (which on OpenRouter includes 400+ models including TTS, image generators, and rerankers).
Type in the filter box to narrow by provider name, slug, or model ID.
Pick a model, hit **Switch**, and Hermes writes it to `~/.hermes/config.yaml` under the `model` section. **This applies to new sessions only** — any chat tab you already have open keeps running whatever model it started with. To hot-swap the current chat, use the `/model` slash command inside it.
## Setting auxiliary models
Click **Show auxiliary** to reveal the eight task slots:

Every auxiliary task defaults to `auto` — meaning Hermes uses your main model for that job too. Override a specific task when you want a cheaper or faster model for a side-job.
### Common override patterns
| Task | When to override |
|---|---|
| **Title Gen** | Almost always. A $0.10/M flash model writes session titles as well as Opus. Default config sets this to `google/gemini-3-flash-preview` on OpenRouter. |
| **Vision** | When your main model is a coding model without vision (e.g. Kimi, DeepSeek). Point it at `google/gemini-2.5-flash` or `gpt-4o-mini`. |
| **Compression** | When you're burning reasoning tokens on Opus/M2.7 just to summarize context. A fast chat model does the job at 1/50th the cost. |
| **Session Search** | When recall queries fan out — default max_concurrency is 3. A cheap model keeps the bill predictable. |
| **Approval** | For `approval_mode: smart` — a fast/cheap model (haiku, flash, gpt-5-mini) decides whether to auto-approve low-risk commands. Expensive models here are waste. |
| **Web Extract** | When you use `web_extract` heavily. Same logic as compression — summarization doesn't need reasoning. |
| **Skills Hub** | `hermes skills search` uses this. Usually fine at `auto`. |
| **MCP** | MCP tool routing. Usually fine at `auto`. |
### Per-task override
Click **Change** on any auxiliary row. Same picker opens, same behavior — pick provider + model, hit Switch. The row updates to show `provider · model` instead of `auto (use main model)`.
### Reset all to auto
If you've over-tuned and want to start over, click **Reset all to auto** at the top of the auxiliary section. Every slot goes back to using your main model.
## The "Use as" shortcut
Every model card on the page has a **Use as** dropdown. This is the fast path — pick a model you see in your analytics, click **Use as**, and assign it to the main slot or any specific auxiliary task in one click:

The dropdown has:
- **Main model** — same as clicking Change on the main row.
- **All auxiliary tasks** — assigns this model to all 8 aux slots at once. Useful when you just want every side-job on a cheap flash model.
- **Individual task options** — Vision, Web Extract, Compression, etc. The currently-assigned model for each task is marked `current`.
Cards are badged with `main` or `aux · ` when they're currently assigned to something — so you can see at a glance which of your historical models are wired in where.
## What gets written to `config.yaml`
When you save via the dashboard, Hermes writes to `~/.hermes/config.yaml`:
**Main model:**
```yaml
model:
provider: openrouter
default: anthropic/claude-opus-4.7
base_url: '' # cleared on provider switch
api_mode: chat_completions
```
**Auxiliary override (example — vision on gemini-flash):**
```yaml
auxiliary:
vision:
provider: openrouter
model: google/gemini-2.5-flash
base_url: ''
api_key: ''
timeout: 120
extra_body: {}
download_timeout: 30
```
**Auxiliary on auto (default):**
```yaml
auxiliary:
compression:
provider: auto
model: ''
base_url: ''
# ... other fields unchanged
```
`provider: auto` with `model: ''` tells Hermes to use the main model for that task.
## When does it take effect?
- **CLI** (`hermes chat`): next `hermes chat` invocation.
- **Gateway** (Telegram, Discord, Slack, etc.): next *new* session. Existing sessions keep their model. Restart the gateway (`hermes gateway restart`) if you want to force all sessions to pick up the change.
- **Dashboard chat tab** (`/chat`): next new PTY. The currently-open chat keeps its model — use `/model` inside it to hot-swap.
Changes never invalidate prompt caches on running sessions. That's deliberate: swapping the main model inside a session requires a cache reset (the system prompt contains model-specific content), and we reserve that for the explicit `/model` slash command inside chat.
## Troubleshooting
### "No authenticated providers" in the picker
Hermes lists a provider only if it has a working credential. Check **Keys** in the sidebar — you should see one of: an API key, a successful OAuth, or a custom endpoint URL. If the provider you want isn't there, run `hermes setup` to wire it up, or go to **Keys** and add the env var.
### Main model didn't change in my running chat
Expected. The dashboard writes `config.yaml`, which new sessions read. The currently-open chat is a live agent process — it keeps whatever model it was spawned with. Use `/model ` inside the chat to hot-swap that specific session.
### Auxiliary override "didn't take effect"
Three things to check:
1. **Did you start a new session?** Existing chats don't re-read config.
2. **Is `provider` set to something other than `auto`?** If the field shows `auto`, the task is still using your main model. Click **Change** and pick a real provider.
3. **Is the provider authenticated?** If you assigned `minimax` to a task but don't have a MiniMax API key, that task falls back to the openrouter default and logs a warning in `agent.log`.
### I picked a model but Hermes switched providers on me
On OpenRouter (or any aggregator), bare model names resolve *within* the aggregator first. So `claude-sonnet-4` on OpenRouter becomes `anthropic/claude-sonnet-4.6`, staying on your OpenRouter auth. But if you typed `claude-sonnet-4` on a native Anthropic auth, it would stay as `claude-sonnet-4-6`. If you see an unexpected provider switch, check that your current provider is what you expect — the picker always shows the current main at the top of the dialog.
## Alternative methods
### CLI slash command
Inside any `hermes chat` session:
```
/model gpt-5.4 --provider openrouter # session-only
/model gpt-5.4 --provider openrouter --global # also persists to config.yaml
```
`--global` does the same thing the dashboard's **Change** button does, plus it switches the running session in-place.
### Custom aliases
Define your own short names for models you reach for often, then use `/model ` in the CLI or any messaging platform:
```yaml
# ~/.hermes/config.yaml
model_aliases:
fav:
model: claude-sonnet-4.6
provider: anthropic
grok:
model: grok-4
provider: x-ai
```
Or from the shell (short form, `provider/model`):
```bash
hermes config set model.aliases.fav anthropic/claude-opus-4.6
hermes config set model.aliases.grok x-ai/grok-4
```
Then `/model fav` or `/model grok` in chat. User aliases shadow built-in short names (`sonnet`, `kimi`, `opus`, etc.). See [Custom model aliases](/docs/reference/slash-commands#custom-model-aliases) for the full reference.
### `hermes model` subcommand
```bash
hermes model list # list authenticated providers + models
hermes model set anthropic/claude-opus-4.7 --provider openrouter
```
### Direct config edit
Edit `~/.hermes/config.yaml` and restart whatever reads it. See the [Configuration reference](./configuration.md) for the full schema.
### REST API
The dashboard uses three endpoints. Useful for scripting:
```bash
# List authenticated providers + curated model lists
curl -H "X-Hermes-Session-Token: $TOKEN" http://localhost:PORT/api/model/options
# Read current main + auxiliary assignments
curl -H "X-Hermes-Session-Token: $TOKEN" http://localhost:PORT/api/model/auxiliary
# Set the main model
curl -X POST -H "Content-Type: application/json" -H "X-Hermes-Session-Token: $TOKEN" \
-d '{"scope":"main","provider":"openrouter","model":"anthropic/claude-opus-4.7"}' \
http://localhost:PORT/api/model/set
# Override a single auxiliary task
curl -X POST -H "Content-Type: application/json" -H "X-Hermes-Session-Token: $TOKEN" \
-d '{"scope":"auxiliary","task":"vision","provider":"openrouter","model":"google/gemini-2.5-flash"}' \
http://localhost:PORT/api/model/set
# Assign one model to every auxiliary task
curl -X POST -H "Content-Type: application/json" -H "X-Hermes-Session-Token: $TOKEN" \
-d '{"scope":"auxiliary","task":"","provider":"openrouter","model":"google/gemini-2.5-flash"}' \
http://localhost:PORT/api/model/set
# Reset all auxiliary tasks to auto
curl -X POST -H "Content-Type: application/json" -H "X-Hermes-Session-Token: $TOKEN" \
-d '{"scope":"auxiliary","task":"__reset__","provider":"","model":""}' \
http://localhost:PORT/api/model/set
```
The session token is injected into the dashboard HTML at startup and rotates on every server restart. Grab it from the browser devtools (`window.__HERMES_SESSION_TOKEN__`) if you're scripting against a running dashboard.
---
# Sessions
# Sessions
Hermes Agent automatically saves every conversation as a session. Sessions enable conversation resume, cross-session search, and full conversation history management.
## How Sessions Work
Every conversation — whether from the CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Teams, or any other messaging platform — is stored as a session with full message history. Sessions are tracked in two complementary systems:
1. **SQLite database** (`~/.hermes/state.db`) — structured session metadata with FTS5 full-text search
2. **JSONL transcripts** (`~/.hermes/sessions/`) — raw conversation transcripts including tool calls (gateway)
The SQLite database stores:
- Session ID, source platform, user ID
- **Session title** (unique, human-readable name)
- Model name and configuration
- System prompt snapshot
- Full message history (role, content, tool calls, tool results)
- Token counts (input/output)
- Timestamps (started_at, ended_at)
- Parent session ID (for compression-triggered session splitting)
### Session Sources
Each session is tagged with its source platform:
| Source | Description |
|--------|-------------|
| `cli` | Interactive CLI (`hermes` or `hermes chat`) |
| `telegram` | Telegram messenger |
| `discord` | Discord server/DM |
| `slack` | Slack workspace |
| `whatsapp` | WhatsApp messenger |
| `signal` | Signal messenger |
| `matrix` | Matrix rooms and DMs |
| `mattermost` | Mattermost channels |
| `email` | Email (IMAP/SMTP) |
| `sms` | SMS via Twilio |
| `dingtalk` | DingTalk messenger |
| `feishu` | Feishu/Lark messenger |
| `wecom` | WeCom (WeChat Work) |
| `weixin` | Weixin (personal WeChat) |
| `bluebubbles` | Apple iMessage via BlueBubbles macOS server |
| `qqbot` | QQ Bot (Tencent QQ) via Official API v2 |
| `homeassistant` | Home Assistant conversation |
| `webhook` | Incoming webhooks |
| `api-server` | API server requests |
| `acp` | ACP editor integration |
| `cron` | Scheduled cron jobs |
| `batch` | Batch processing runs |
## CLI Session Resume
Resume previous conversations from the CLI using `--continue` or `--resume`:
### Continue Last Session
```bash
# Resume the most recent CLI session
hermes --continue
hermes -c
# Or with the chat subcommand
hermes chat --continue
hermes chat -c
```
This looks up the most recent `cli` session from the SQLite database and loads its full conversation history.
### Resume by Name
If you've given a session a title (see [Session Naming](#session-naming) below), you can resume it by name:
```bash
# Resume a named session
hermes -c "my project"
# If there are lineage variants (my project, my project #2, my project #3),
# this automatically resumes the most recent one
hermes -c "my project" # → resumes "my project #3"
```
### Resume Specific Session
```bash
# Resume a specific session by ID
hermes --resume 20250305_091523_a1b2c3d4
hermes -r 20250305_091523_a1b2c3d4
# Resume by title
hermes --resume "refactoring auth"
# Or with the chat subcommand
hermes chat --resume 20250305_091523_a1b2c3d4
```
Session IDs are shown when you exit a CLI session, and can be found with `hermes sessions list`.
### Conversation Recap on Resume
When you resume a session, Hermes displays a compact recap of the previous conversation in a styled panel before the input prompt:
Resume mode shows a compact recap panel with recent user and assistant turns before returning you to the live prompt.
The recap:
- Shows **user messages** (gold `●`) and **assistant responses** (green `◆`)
- **Truncates** long messages (300 chars for user, 200 chars / 3 lines for assistant)
- **Collapses tool calls** to a count with tool names (e.g., `[3 tool calls: terminal, web_search]`)
- **Hides** system messages, tool results, and internal reasoning
- **Caps** at the last 10 exchanges with a "... N earlier messages ..." indicator
- Uses **dim styling** to distinguish from the active conversation
To disable the recap and keep the minimal one-liner behavior, set in `~/.hermes/config.yaml`:
```yaml
display:
resume_display: minimal # default: full
```
:::tip
Session IDs follow the format `YYYYMMDD_HHMMSS_` — CLI/TUI sessions use a 6-char hex suffix (e.g. `20250305_091523_a1b2c3`), gateway sessions use an 8-char suffix (e.g. `20250305_091523_a1b2c3d4`). You can resume by ID (full or unique prefix) or by title — both work with `-c` and `-r`.
:::
## Session Naming
Give sessions human-readable titles so you can find and resume them easily.
### Auto-Generated Titles
Hermes automatically generates a short descriptive title (3–7 words) for each session after the first exchange. This runs in a background thread using a fast auxiliary model, so it adds no latency. You'll see auto-generated titles when browsing sessions with `hermes sessions list` or `hermes sessions browse`.
Auto-titling only fires once per session and is skipped if you've already set a title manually.
### Setting a Title Manually
Use the `/title` slash command inside any chat session (CLI or gateway):
```
/title my research project
```
The title is applied immediately. If the session hasn't been created in the database yet (e.g., you run `/title` before sending your first message), it's queued and applied once the session starts.
You can also rename existing sessions from the command line:
```bash
hermes sessions rename 20250305_091523_a1b2c3d4 "refactoring auth module"
```
### Title Rules
- **Unique** — no two sessions can share the same title
- **Max 100 characters** — keeps listing output clean
- **Sanitized** — control characters, zero-width chars, and RTL overrides are stripped automatically
- **Normal Unicode is fine** — emoji, CJK, accented characters all work
### Auto-Lineage on Compression
When a session's context is compressed (manually via `/compress` or automatically), Hermes creates a new continuation session. If the original had a title, the new session automatically gets a numbered title:
```
"my project" → "my project #2" → "my project #3"
```
When you resume by name (`hermes -c "my project"`), it automatically picks the most recent session in the lineage.
### /title in Messaging Platforms
The `/title` command works in all gateway platforms (Telegram, Discord, Slack, WhatsApp):
- `/title My Research` — set the session title
- `/title` — show the current title
## Session Management Commands
Hermes provides a full set of session management commands via `hermes sessions`:
### List Sessions
```bash
# List recent sessions (default: last 20)
hermes sessions list
# Filter by platform
hermes sessions list --source telegram
# Show more sessions
hermes sessions list --limit 50
```
When sessions have titles, the output shows titles, previews, and relative timestamps:
```
Title Preview Last Active ID
────────────────────────────────────────────────────────────────────────────────────────────────
refactoring auth Help me refactor the auth module please 2h ago 20250305_091523_a
my project #3 Can you check the test failures? yesterday 20250304_143022_e
— What's the weather in Las Vegas? 3d ago 20250303_101500_f
```
When no sessions have titles, a simpler format is used:
```
Preview Last Active Src ID
──────────────────────────────────────────────────────────────────────────────────────
Help me refactor the auth module please 2h ago cli 20250305_091523_a
What's the weather in Las Vegas? 3d ago tele 20250303_101500_f
```
### Export Sessions
```bash
# Export all sessions to a JSONL file
hermes sessions export backup.jsonl
# Export sessions from a specific platform
hermes sessions export telegram-history.jsonl --source telegram
# Export a single session
hermes sessions export session.jsonl --session-id 20250305_091523_a1b2c3d4
```
Exported files contain one JSON object per line with full session metadata and all messages.
### Delete a Session
```bash
# Delete a specific session (with confirmation)
hermes sessions delete 20250305_091523_a1b2c3d4
# Delete without confirmation
hermes sessions delete 20250305_091523_a1b2c3d4 --yes
```
### Rename a Session
```bash
# Set or change a session's title
hermes sessions rename 20250305_091523_a1b2c3d4 "debugging auth flow"
# Multi-word titles don't need quotes in the CLI
hermes sessions rename 20250305_091523_a1b2c3d4 debugging auth flow
```
If the title is already in use by another session, an error is shown.
### Prune Old Sessions
```bash
# Delete ended sessions older than 90 days (default)
hermes sessions prune
# Custom age threshold
hermes sessions prune --older-than 30
# Only prune sessions from a specific platform
hermes sessions prune --source telegram --older-than 60
# Skip confirmation
hermes sessions prune --older-than 30 --yes
```
:::info
Pruning only deletes **ended** sessions (sessions that have been explicitly ended or auto-reset). Active sessions are never pruned.
:::
### Session Statistics
```bash
hermes sessions stats
```
Output:
```
Total sessions: 142
Total messages: 3847
cli: 89 sessions
telegram: 38 sessions
discord: 15 sessions
Database size: 12.4 MB
```
For deeper analytics — token usage, cost estimates, tool breakdown, and activity patterns — use [`hermes insights`](/docs/reference/cli-commands#hermes-insights).
## Session Search Tool
The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine.
### How It Works
1. FTS5 searches matching messages ranked by relevance
2. Groups results by session, takes the top N unique sessions (default 3)
3. Loads each session's conversation, truncates to ~100K chars centered on matches
4. Sends to a fast summarization model for focused summaries
5. Returns per-session summaries with metadata and surrounding context
### FTS5 Query Syntax
The search supports standard FTS5 query syntax:
- Simple keywords: `docker deployment`
- Phrases: `"exact phrase"`
- Boolean: `docker OR kubernetes`, `python NOT java`
- Prefix: `deploy*`
### When It's Used
The agent is prompted to use session search automatically:
> *"When the user references something from a past conversation or you suspect relevant prior context exists, use session_search to recall it before asking them to repeat themselves."*
## Per-Platform Session Tracking
### Gateway Sessions
On messaging platforms, sessions are keyed by a deterministic session key built from the message source:
| Chat Type | Default Key Format | Behavior |
|-----------|--------------------|----------|
| Telegram DM | `agent:main:telegram:dm:` | One session per DM chat |
| Discord DM | `agent:main:discord:dm:` | One session per DM chat |
| WhatsApp DM | `agent:main:whatsapp:dm:` | One session per DM user (LID/phone aliases collapse to one identity when mapping exists) |
| Group chat | `agent:main::group::` | Per-user inside the group when the platform exposes a user ID |
| Group thread/topic | `agent:main::group::` | Shared session for all thread participants (default). Per-user with `thread_sessions_per_user: true`. |
| Channel | `agent:main::channel::` | Per-user inside the channel when the platform exposes a user ID |
When Hermes cannot get a participant identifier for a shared chat, it falls back to one shared session for that room.
### Shared vs Isolated Group Sessions
By default, Hermes uses `group_sessions_per_user: true` in `config.yaml`. That means:
- Alice and Bob can both talk to Hermes in the same Discord channel without sharing transcript history
- one user's long tool-heavy task does not pollute another user's context window
- interrupt handling also stays per-user because the running-agent key matches the isolated session key
If you want one shared "room brain" instead, set:
```yaml
group_sessions_per_user: false
```
That reverts groups/channels to a single shared session per room, which preserves shared conversational context but also shares token costs, interrupt state, and context growth.
### Session Reset Policies
Gateway sessions are automatically reset based on configurable policies:
- **idle** — reset after N minutes of inactivity
- **daily** — reset at a specific hour each day
- **both** — reset on whichever comes first (idle or daily)
- **none** — never auto-reset
Before a session is auto-reset, the agent is given a turn to save any important memories or skills from the conversation.
Sessions with **active background processes** are never auto-reset, regardless of policy.
## Storage Locations
| What | Path | Description |
|------|------|-------------|
| SQLite database | `~/.hermes/state.db` | All session metadata + messages with FTS5 |
| Gateway transcripts | `~/.hermes/sessions/` | JSONL transcripts per session + sessions.json index |
| Gateway index | `~/.hermes/sessions/sessions.json` | Maps session keys to active session IDs |
The SQLite database uses WAL mode for concurrent readers and a single writer, which suits the gateway's multi-platform architecture well.
### Database Schema
Key tables in `state.db`:
- **sessions** — session metadata (id, source, user_id, model, title, timestamps, token counts). Titles have a unique index (NULL titles allowed, only non-NULL must be unique).
- **messages** — full message history (role, content, tool_calls, tool_name, token_count)
- **messages_fts** — FTS5 virtual table for full-text search across message content
## Session Expiry and Cleanup
### Automatic Cleanup
- Gateway sessions auto-reset based on the configured reset policy
- Before reset, the agent saves memories and skills from the expiring session
- Opt-in auto-pruning: when `sessions.auto_prune` is `true`, ended sessions older than `sessions.retention_days` (default 90) are pruned at CLI/gateway startup
- After a prune that actually removed rows, `state.db` is `VACUUM`ed to reclaim disk space (SQLite does not shrink the file on plain DELETE)
- Pruning runs at most once per `sessions.min_interval_hours` (default 24); the last-run timestamp is tracked inside `state.db` itself so it's shared across every Hermes process in the same `HERMES_HOME`
Default is **off** — session history is valuable for `session_search` recall, and silently deleting it could surprise users. Enable in `~/.hermes/config.yaml`:
```yaml
sessions:
auto_prune: true # opt in — default is false
retention_days: 90 # keep ended sessions this many days
vacuum_after_prune: true # reclaim disk space after a pruning sweep
min_interval_hours: 24 # don't re-run the sweep more often than this
```
Active sessions are never auto-pruned, regardless of age.
### Manual Cleanup
```bash
# Prune sessions older than 90 days
hermes sessions prune
# Delete a specific session
hermes sessions delete
# Export before pruning (backup)
hermes sessions export backup.jsonl
hermes sessions prune --older-than 30 --yes
```
:::tip
The database grows slowly (typical: 10-15 MB for hundreds of sessions) and session history powers `session_search` recall across past conversations, so auto-prune ships disabled. Enable it if you're running a heavy gateway/cron workload where `state.db` is meaningfully affecting performance (observed failure mode: 384 MB state.db with ~1000 sessions slowing down FTS5 inserts and `/resume` listing). Use `hermes sessions prune` for one-off cleanup without turning on the automatic sweep.
:::
---
# user-guide/profiles
# Profiles: Running Multiple Agents
Run multiple independent Hermes agents on the same machine — each with its own config, API keys, memory, sessions, skills, and gateway state.
## What are profiles?
A profile is a separate Hermes home directory. Each profile gets its own directory containing its own `config.yaml`, `.env`, `SOUL.md`, memories, sessions, skills, cron jobs, and state database. Profiles let you run separate agents for different purposes — a coding assistant, a personal bot, a research agent — without mixing up Hermes state.
When you create a profile, it automatically becomes its own command. Create a profile called `coder` and you immediately have `coder chat`, `coder setup`, `coder gateway start`, etc.
## Quick start
```bash
hermes profile create coder # creates profile + "coder" command alias
coder setup # configure API keys and model
coder chat # start chatting
```
That's it. `coder` is now its own Hermes profile with its own config, memory, and state.
## Creating a profile
### Blank profile
```bash
hermes profile create mybot
```
Creates a fresh profile with bundled skills seeded. Run `mybot setup` to configure API keys, model, and gateway tokens.
### Clone config only (`--clone`)
```bash
hermes profile create work --clone
```
Copies your current profile's `config.yaml`, `.env`, and `SOUL.md` into the new profile. Same API keys and model, but fresh sessions and memory. Edit `~/.hermes/profiles/work/.env` for different API keys, or `~/.hermes/profiles/work/SOUL.md` for a different personality.
### Clone everything (`--clone-all`)
```bash
hermes profile create backup --clone-all
```
Copies **everything** — config, API keys, personality, all memories, full session history, skills, cron jobs, plugins. A complete snapshot. Useful for backups or forking an agent that already has context.
### Clone from a specific profile
```bash
hermes profile create work --clone --clone-from coder
```
:::tip Honcho memory + profiles
When Honcho is enabled, `--clone` automatically creates a dedicated AI peer for the new profile while sharing the same user workspace. Each profile builds its own observations and identity. See [Honcho -- Multi-agent / Profiles](./features/memory-providers.md#honcho) for details.
:::
## Using profiles
### Command aliases
Every profile automatically gets a command alias at `~/.local/bin/`:
```bash
coder chat # chat with the coder agent
coder setup # configure coder's settings
coder gateway start # start coder's gateway
coder doctor # check coder's health
coder skills list # list coder's skills
coder config set model.default anthropic/claude-sonnet-4
```
The alias works with every hermes subcommand — it's just `hermes -p ` under the hood.
### The `-p` flag
You can also target a profile explicitly with any command:
```bash
hermes -p coder chat
hermes --profile=coder doctor
hermes chat -p coder -q "hello" # works in any position
```
### Sticky default (`hermes profile use`)
```bash
hermes profile use coder
hermes chat # now targets coder
hermes tools # configures coder's tools
hermes profile use default # switch back
```
Sets a default so plain `hermes` commands target that profile. Like `kubectl config use-context`.
### Knowing where you are
The CLI always shows which profile is active:
- **Prompt**: `coder ❯` instead of `❯`
- **Banner**: Shows `Profile: coder` on startup
- **`hermes profile`**: Shows current profile name, path, model, gateway status
## Profiles vs workspaces vs sandboxing
Profiles are often confused with workspaces or sandboxes, but they are different things:
- A **profile** gives Hermes its own state directory: `config.yaml`, `.env`, `SOUL.md`, sessions, memory, logs, cron jobs, and gateway state.
- A **workspace** or **working directory** is where terminal commands start. That is controlled separately by `terminal.cwd`.
- A **sandbox** is what limits filesystem access. Profiles do **not** sandbox the agent.
On the default `local` terminal backend, the agent still has the same filesystem access as your user account. A profile does not stop it from accessing folders outside the profile directory.
If you want a profile to start in a specific project folder, set an explicit absolute `terminal.cwd` in that profile's `config.yaml`:
```yaml
terminal:
backend: local
cwd: /absolute/path/to/project
```
Using `cwd: "."` on the local backend means "the directory Hermes was launched from", not "the profile directory".
Also note:
- `SOUL.md` can guide the model, but it does not enforce a workspace boundary.
- Changes to `SOUL.md` take effect cleanly on a new session. Existing sessions may still be using the old prompt state.
- Asking the model "what directory are you in?" is not a reliable isolation test. If you need a predictable starting directory for tools, set `terminal.cwd` explicitly.
## Running gateways
Each profile runs its own gateway as a separate process with its own bot token:
```bash
coder gateway start # starts coder's gateway
assistant gateway start # starts assistant's gateway (separate process)
```
### Different bot tokens
Each profile has its own `.env` file. Configure a different Telegram/Discord/Slack bot token in each:
```bash
# Edit coder's tokens
nano ~/.hermes/profiles/coder/.env
# Edit assistant's tokens
nano ~/.hermes/profiles/assistant/.env
```
### Safety: token locks
If two profiles accidentally use the same bot token, the second gateway will be blocked with a clear error naming the conflicting profile. Supported for Telegram, Discord, Slack, WhatsApp, and Signal.
### Persistent services
```bash
coder gateway install # creates hermes-gateway-coder systemd/launchd service
assistant gateway install # creates hermes-gateway-assistant service
```
Each profile gets its own service name. They run independently.
## Configuring profiles
Each profile has its own:
- **`config.yaml`** — model, provider, toolsets, all settings
- **`.env`** — API keys, bot tokens
- **`SOUL.md`** — personality and instructions
```bash
coder config set model.default anthropic/claude-sonnet-4
echo "You are a focused coding assistant." > ~/.hermes/profiles/coder/SOUL.md
```
If you want this profile to work in a specific project by default, also set its own `terminal.cwd`:
```bash
coder config set terminal.cwd /absolute/path/to/project
```
## Updating
`hermes update` pulls code once (shared) and syncs new bundled skills to **all** profiles automatically:
```bash
hermes update
# → Code updated (12 commits)
# → Skills synced: default (up to date), coder (+2 new), assistant (+2 new)
```
User-modified skills are never overwritten.
## Managing profiles
```bash
hermes profile list # show all profiles with status
hermes profile show coder # detailed info for one profile
hermes profile rename coder dev-bot # rename (updates alias + service)
hermes profile export coder # export to coder.tar.gz
hermes profile import coder.tar.gz # import from archive
```
## Deleting a profile
```bash
hermes profile delete coder
```
This stops the gateway, removes the systemd/launchd service, removes the command alias, and deletes all profile data. You'll be asked to type the profile name to confirm.
Use `--yes` to skip confirmation: `hermes profile delete coder --yes`
:::note
You cannot delete the default profile (`~/.hermes`). To remove everything, use `hermes uninstall`.
:::
## Tab completion
```bash
# Bash
eval "$(hermes completion bash)"
# Zsh
eval "$(hermes completion zsh)"
```
Add the line to your `~/.bashrc` or `~/.zshrc` for persistent completion. Completes profile names after `-p`, profile subcommands, and top-level commands.
## How it works
Profiles use the `HERMES_HOME` environment variable. When you run `coder chat`, the wrapper script sets `HERMES_HOME=~/.hermes/profiles/coder` before launching hermes. Since 119+ files in the codebase resolve paths via `get_hermes_home()`, Hermes state automatically scopes to the profile's directory — config, sessions, memory, skills, state database, gateway PID, logs, and cron jobs.
This is separate from terminal working directory. Tool execution starts from `terminal.cwd` (or the launch directory when `cwd: "."` on the local backend), not automatically from `HERMES_HOME`.
The default profile is simply `~/.hermes` itself. No migration needed — existing installs work identically.
---
# Git Worktrees
# Git Worktrees
Hermes Agent is often used on large, long‑lived repositories. When you want to:
- Run **multiple agents in parallel** on the same project, or
- Keep experimental refactors isolated from your main branch,
Git **worktrees** are the safest way to give each agent its own checkout without duplicating the entire repository.
This page shows how to combine worktrees with Hermes so each session has a clean, isolated working directory.
## Why Use Worktrees with Hermes?
Hermes treats the **current working directory** as the project root:
- CLI: the directory where you run `hermes` or `hermes chat`
- Messaging gateways: the directory set by `MESSAGING_CWD`
If you run multiple agents in the **same checkout**, their changes can interfere with each other:
- One agent may delete or rewrite files the other is using.
- It becomes harder to understand which changes belong to which experiment.
With worktrees, each agent gets:
- Its **own branch and working directory**
- Its **own Checkpoint Manager history** for `/rollback`
See also: [Checkpoints and /rollback](./checkpoints-and-rollback.md).
## Quick Start: Creating a Worktree
From your main repository (containing `.git/`), create a new worktree for a feature branch:
```bash
# From the main repo root
cd /path/to/your/repo
# Create a new branch and worktree in ../repo-feature
git worktree add ../repo-feature feature/hermes-experiment
```
This creates:
- A new directory: `../repo-feature`
- A new branch: `feature/hermes-experiment` checked out in that directory
Now you can `cd` into the new worktree and run Hermes there:
```bash
cd ../repo-feature
# Start Hermes in the worktree
hermes
```
Hermes will:
- See `../repo-feature` as the project root.
- Use that directory for context files, code edits, and tools.
- Use a **separate checkpoint history** for `/rollback` scoped to this worktree.
## Running Multiple Agents in Parallel
You can create multiple worktrees, each with its own branch:
```bash
cd /path/to/your/repo
git worktree add ../repo-experiment-a feature/hermes-a
git worktree add ../repo-experiment-b feature/hermes-b
```
In separate terminals:
```bash
# Terminal 1
cd ../repo-experiment-a
hermes
# Terminal 2
cd ../repo-experiment-b
hermes
```
Each Hermes process:
- Works on its own branch (`feature/hermes-a` vs `feature/hermes-b`).
- Writes checkpoints under a different shadow repo hash (derived from the worktree path).
- Can use `/rollback` independently without affecting the other.
This is especially useful when:
- Running batch refactors.
- Trying different approaches to the same task.
- Pairing CLI + gateway sessions against the same upstream repo.
## Cleaning Up Worktrees Safely
When you are done with an experiment:
1. Decide whether to keep or discard the work.
2. If you want to keep it:
- Merge the branch into your main branch as usual.
3. Remove the worktree:
```bash
cd /path/to/your/repo
# Remove the worktree directory and its reference
git worktree remove ../repo-feature
```
Notes:
- `git worktree remove` will refuse to remove a worktree with uncommitted changes unless you force it.
- Removing a worktree does **not** automatically delete the branch; you can delete or keep the branch using normal `git branch` commands.
- Hermes checkpoint data under `~/.hermes/checkpoints/` is not automatically pruned when you remove a worktree, but it is usually very small.
## Best Practices
- **One worktree per Hermes experiment**
- Create a dedicated branch/worktree for each substantial change.
- This keeps diffs focused and PRs small and reviewable.
- **Name branches after the experiment**
- e.g. `feature/hermes-checkpoints-docs`, `feature/hermes-refactor-tests`.
- **Commit frequently**
- Use git commits for high‑level milestones.
- Use [checkpoints and /rollback](./checkpoints-and-rollback.md) as a safety net for tool‑driven edits in between.
- **Avoid running Hermes from the bare repo root when using worktrees**
- Prefer the worktree directories instead, so each agent has a clear scope.
## Using `hermes -w` (Automatic Worktree Mode)
Hermes has a built‑in `-w` flag that **automatically creates a disposable git worktree** with its own branch. You don't need to set up worktrees manually — just `cd` into your repo and run:
```bash
cd /path/to/your/repo
hermes -w
```
Hermes will:
- Create a temporary worktree under `.worktrees/` inside your repo.
- Check out an isolated branch (e.g. `hermes/hermes-`).
- Run the full CLI session inside that worktree.
This is the easiest way to get worktree isolation. You can also combine it with a single query:
```bash
hermes -w -q "Fix issue #123"
```
For parallel agents, open multiple terminals and run `hermes -w` in each — every invocation gets its own worktree and branch automatically.
## Putting It All Together
- Use **git worktrees** to give each Hermes session its own clean checkout.
- Use **branches** to capture the high‑level history of your experiments.
- Use **checkpoints + `/rollback`** to recover from mistakes inside each worktree.
This combination gives you:
- Strong guarantees that different agents and experiments do not step on each other.
- Fast iteration cycles with easy recovery from bad edits.
- Clean, reviewable pull requests.
---
# Docker
# Hermes Agent — Docker
There are two distinct ways Docker intersects with Hermes Agent:
1. **Running Hermes IN Docker** — the agent itself runs inside a container (this page's primary focus)
2. **Docker as a terminal backend** — the agent runs on your host but executes every command inside a single, persistent Docker sandbox container that survives across tool calls, `/new`, and subagents for the life of the Hermes process (see [Configuration → Docker Backend](./configuration.md#docker-backend))
This page covers option 1. The container stores all user data (config, API keys, sessions, skills, memories) in a single directory mounted from the host at `/opt/data`. The image itself is stateless and can be upgraded by pulling a new version without losing any configuration.
## Quick start
If this is your first time running Hermes Agent, create a data directory on the host and start the container interactively to run the setup wizard:
```sh
mkdir -p ~/.hermes
docker run -it --rm \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent setup
```
This drops you into the setup wizard, which will prompt you for your API keys and write them to `~/.hermes/.env`. You only need to do this once. It is highly recommended to set up a chat system for the gateway to work with at this point.
## Running in gateway mode
Once configured, run the container in the background as a persistent gateway (Telegram, Discord, Slack, WhatsApp, etc.):
```sh
docker run -d \
--name hermes \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
-p 8642:8642 \
nousresearch/hermes-agent gateway run
```
Port 8642 exposes the gateway's [OpenAI-compatible API server](./features/api-server.md) and health endpoint. It's optional if you only use chat platforms (Telegram, Discord, etc.), but required if you want the dashboard or external tools to reach the gateway.
Note: the API server is gated on `API_SERVER_ENABLED=true`. To expose it beyond `127.0.0.1` inside the container, also set `API_SERVER_HOST=0.0.0.0` and an `API_SERVER_KEY` (minimum 8 characters — generate one with `openssl rand -hex 32`). Example:
```sh
docker run -d \
--name hermes \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
-p 8642:8642 \
-e API_SERVER_ENABLED=true \
-e API_SERVER_HOST=0.0.0.0 \
-e API_SERVER_KEY=your_api_key_here \
-e API_SERVER_CORS_ORIGINS='*' \
nousresearch/hermes-agent gateway run
```
Opening any port on an internet facing machine is a security risk. You should not do it unless you understand the risks.
## Running the dashboard
The built-in web dashboard runs as an optional side-process inside the same container as the gateway. Set `HERMES_DASHBOARD=1` and expose port `9119` alongside the gateway's `8642`:
```sh
docker run -d \
--name hermes \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
-p 8642:8642 \
-p 9119:9119 \
-e HERMES_DASHBOARD=1 \
nousresearch/hermes-agent gateway run
```
The entrypoint starts `hermes dashboard` in the background (running as the non-root `hermes` user) before `exec`-ing the main command. Dashboard output is prefixed with `[dashboard]` in `docker logs` so it's easy to separate from gateway logs.
| Environment variable | Description | Default |
|---------------------|-------------|---------|
| `HERMES_DASHBOARD` | Set to `1` (or `true` / `yes`) to launch the dashboard alongside the main command | *(unset — dashboard not started)* |
| `HERMES_DASHBOARD_HOST` | Bind address for the dashboard HTTP server | `0.0.0.0` |
| `HERMES_DASHBOARD_PORT` | Port for the dashboard HTTP server | `9119` |
| `HERMES_DASHBOARD_TUI` | Set to `1` to expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket) | *(unset)* |
The default `HERMES_DASHBOARD_HOST=0.0.0.0` is required for the host to reach the dashboard through the published port; the entrypoint automatically passes `--insecure` to `hermes dashboard` in that case. Override to `127.0.0.1` if you want to restrict the dashboard to in-container access only (e.g. behind a reverse proxy in a sidecar).
:::note
The dashboard side-process is **not supervised** — if it crashes, it stays down until the container restarts. Running it as a separate container is not supported: the dashboard's gateway-liveness detection requires a shared PID namespace with the gateway process.
:::
## Running interactively (CLI chat)
To open an interactive chat session against a running data directory:
```sh
docker run -it --rm \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent
```
Or if you have already opened a terminal in your running container (via Docker Desktop for instance), just run:
```sh
/opt/hermes/.venv/bin/hermes
```
## Persistent volumes
The `/opt/data` volume is the single source of truth for all Hermes state. It maps to your host's `~/.hermes/` directory and contains:
| Path | Contents |
|------|----------|
| `.env` | API keys and secrets |
| `config.yaml` | All Hermes configuration |
| `SOUL.md` | Agent personality/identity |
| `sessions/` | Conversation history |
| `memories/` | Persistent memory store |
| `skills/` | Installed skills |
| `cron/` | Scheduled job definitions |
| `hooks/` | Event hooks |
| `logs/` | Runtime logs |
| `skins/` | Custom CLI skins |
:::warning
Never run two Hermes **gateway** containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent write access.
:::
## Multi-profile support
Hermes supports [multiple profiles](../reference/profile-commands.md) — separate `~/.hermes/` directories that let you run independent agents (different SOUL, skills, memory, sessions, credentials) from a single installation. **When running under Docker, using Hermes' built-in multi-profile feature is not recommended.**
Instead, the recommended pattern is **one container per profile**, with each container bind-mounting its own host directory as `/opt/data`:
```sh
# Work profile
docker run -d \
--name hermes-work \
--restart unless-stopped \
-v ~/.hermes-work:/opt/data \
-p 8642:8642 \
nousresearch/hermes-agent gateway run
# Personal profile
docker run -d \
--name hermes-personal \
--restart unless-stopped \
-v ~/.hermes-personal:/opt/data \
-p 8643:8642 \
nousresearch/hermes-agent gateway run
```
Why separate containers over profiles in Docker:
- **Isolation** — each container has its own filesystem, process table, and resource limits. A crash, dependency change, or runaway session in one profile can't affect another.
- **Independent lifecycle** — upgrade, restart, pause, or roll back each agent separately (`docker restart hermes-work` leaves `hermes-personal` untouched).
- **Clean port and network separation** — each gateway binds its own host port; there's no risk of cross-talk between chat platforms or API servers.
- **Simpler mental model** — the container *is* the profile. Backups, migrations, and permissions all follow the bind-mounted directory, with no extra `--profile` flags to remember.
- **Avoids concurrent-write risk** — the warning above about never running two gateways against the same data directory still applies to profiles within a single container.
In Docker Compose, this just means declaring one service per profile with distinct `container_name`, `volumes`, and `ports`:
```yaml
services:
hermes-work:
image: nousresearch/hermes-agent:latest
container_name: hermes-work
restart: unless-stopped
command: gateway run
ports:
- "8642:8642"
volumes:
- ~/.hermes-work:/opt/data
hermes-personal:
image: nousresearch/hermes-agent:latest
container_name: hermes-personal
restart: unless-stopped
command: gateway run
ports:
- "8643:8642"
volumes:
- ~/.hermes-personal:/opt/data
```
## Environment variable forwarding
API keys are read from `/opt/data/.env` inside the container. You can also pass environment variables directly:
```sh
docker run -it --rm \
-v ~/.hermes:/opt/data \
-e ANTHROPIC_API_KEY="sk-ant-..." \
-e OPENAI_API_KEY="sk-..." \
nousresearch/hermes-agent
```
Direct `-e` flags override values from `.env`. This is useful for CI/CD or secrets-manager integrations where you don't want keys on disk.
## Docker Compose example
For persistent deployment with both the gateway and dashboard, a `docker-compose.yaml` is convenient:
```yaml
services:
hermes:
image: nousresearch/hermes-agent:latest
container_name: hermes
restart: unless-stopped
command: gateway run
ports:
- "8642:8642" # gateway API
- "9119:9119" # dashboard (only reached when HERMES_DASHBOARD=1)
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_DASHBOARD=1
# Uncomment to forward specific env vars instead of using .env file:
# - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
# - OPENAI_API_KEY=${OPENAI_API_KEY}
# - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
deploy:
resources:
limits:
memory: 4G
cpus: "2.0"
```
Start with `docker compose up -d` and view logs with `docker compose logs -f`. Dashboard output is prefixed with `[dashboard]` so it's easy to filter from gateway logs.
## Resource limits
The Hermes container needs moderate resources. Recommended minimums:
| Resource | Minimum | Recommended |
|----------|---------|-------------|
| Memory | 1 GB | 2–4 GB |
| CPU | 1 core | 2 cores |
| Disk (data volume) | 500 MB | 2+ GB (grows with sessions/skills) |
Browser automation (Playwright/Chromium) is the most memory-hungry feature. If you don't need browser tools, 1 GB is sufficient. With browser tools active, allocate at least 2 GB.
Set limits in Docker:
```sh
docker run -d \
--name hermes \
--restart unless-stopped \
--memory=4g --cpus=2 \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent gateway run
```
## What the Dockerfile does
The official image is based on `debian:13.4` and includes:
- Python 3 with all Hermes dependencies (`uv pip install -e ".[all]"`)
- Node.js + npm (for browser automation and WhatsApp bridge)
- Playwright with Chromium (`npx playwright install --with-deps chromium --only-shell`)
- ripgrep, ffmpeg, git, and tini as system utilities
- **`docker-cli`** — so agents running inside the container can drive the host's Docker daemon (bind-mount `/var/run/docker.sock` to opt in) for `docker build`, `docker run`, container inspection, etc.
- **`openssh-client`** — enables the [SSH terminal backend](/docs/user-guide/configuration#ssh-backend) from inside the container. The SSH backend shells out to the system `ssh` binary; without this, it failed silently in containerized installs.
- The WhatsApp bridge (`scripts/whatsapp-bridge/`)
The entrypoint script (`docker/entrypoint.sh`) bootstraps the data volume on first run:
- Creates the directory structure (`sessions/`, `memories/`, `skills/`, etc.)
- Copies `.env.example` → `.env` if no `.env` exists
- Copies default `config.yaml` if missing
- Copies default `SOUL.md` if missing
- Syncs bundled skills using a manifest-based approach (preserves user edits)
- Optionally launches `hermes dashboard` as a background side-process when `HERMES_DASHBOARD=1` (see [Running the dashboard](#running-the-dashboard))
- Then runs `hermes` with whatever arguments you pass
## Upgrading
Pull the latest image and recreate the container. Your data directory is untouched.
```sh
docker pull nousresearch/hermes-agent:latest
docker rm -f hermes
docker run -d \
--name hermes \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent gateway run
```
Or with Docker Compose:
```sh
docker compose pull
docker compose up -d
```
## Skills and credential files
When using Docker as the execution environment (not the methods above, but when the agent runs commands inside a Docker sandbox — see [Configuration → Docker Backend](./configuration.md#docker-backend)), Hermes reuses a single long-lived container for all tool calls and automatically bind-mounts the skills directory (`~/.hermes/skills/`) and any credential files declared by skills into that container as read-only volumes. Skill scripts, templates, and references are available inside the sandbox without manual configuration, and because the container persists for the life of the Hermes process, any dependencies you install or files you write stay around for the next tool call.
The same syncing happens for SSH and Modal backends — skills and credential files are uploaded via rsync or the Modal mount API before each command.
## Connecting to local inference servers (vLLM, Ollama, etc.)
When running Hermes in Docker and your inference server (vLLM, Ollama, text-generation-inference, etc.) is also running on the host or in another container, networking requires extra attention.
### Docker Compose (recommended)
Put both services on the same Docker network. This is the most reliable approach:
```yaml
services:
vllm:
image: vllm/vllm-openai:latest
container_name: vllm
command: >
--model Qwen/Qwen2.5-7B-Instruct
--served-model-name my-model
--host 0.0.0.0
--port 8000
ports:
- "8000:8000"
networks:
- hermes-net
deploy:
resources:
reservations:
devices:
- capabilities: [gpu]
hermes:
image: nousresearch/hermes-agent:latest
container_name: hermes
restart: unless-stopped
command: gateway run
ports:
- "8642:8642"
volumes:
- ~/.hermes:/opt/data
networks:
- hermes-net
networks:
hermes-net:
driver: bridge
```
Then in your `~/.hermes/config.yaml`, use the **container name** as the hostname:
```yaml
model:
provider: custom
model: my-model
base_url: http://vllm:8000/v1
api_key: "none"
```
:::tip Key points
- Use the **container name** (`vllm`) as the hostname — not `localhost` or `127.0.0.1`, which refer to the Hermes container itself.
- The `model` value must match the `--served-model-name` you passed to vLLM.
- Set `api_key` to any non-empty string (vLLM requires the header but doesn't validate it by default).
- Do **not** include a trailing slash in `base_url`.
:::
### Standalone Docker run (no Compose)
If your inference server runs directly on the host (not in Docker), use `host.docker.internal` on macOS/Windows, or `--network host` on Linux:
**macOS / Windows:**
```sh
docker run -d \
--name hermes \
-v ~/.hermes:/opt/data \
-p 8642:8642 \
nousresearch/hermes-agent gateway run
```
```yaml
# config.yaml
model:
provider: custom
model: my-model
base_url: http://host.docker.internal:8000/v1
api_key: "none"
```
**Linux (host networking):**
```sh
docker run -d \
--name hermes \
--network host \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent gateway run
```
```yaml
# config.yaml
model:
provider: custom
model: my-model
base_url: http://127.0.0.1:8000/v1
api_key: "none"
```
:::warning With `--network host`, the `-p` flag is ignored — all container ports are directly exposed on the host.
:::
### Verifying connectivity
From inside the Hermes container, confirm the inference server is reachable:
```sh
docker exec hermes curl -s http://vllm:8000/v1/models
```
You should see a JSON response listing your served model. If this fails, check:
1. Both containers are on the same Docker network (`docker network inspect hermes-net`)
2. The inference server is listening on `0.0.0.0`, not `127.0.0.1`
3. The port number matches
### Ollama
Ollama works the same way. If Ollama runs on the host, use `host.docker.internal:11434` (macOS/Windows) or `127.0.0.1:11434` (Linux with `--network host`). If Ollama runs in its own container on the same Docker network:
```yaml
model:
provider: custom
model: llama3
base_url: http://ollama:11434/v1
api_key: "none"
```
## Troubleshooting
### Container exits immediately
Check logs: `docker logs hermes`. Common causes:
- Missing or invalid `.env` file — run interactively first to complete setup
- Port conflicts if running with exposed ports
### "Permission denied" errors
The container's entrypoint drops privileges to the non-root `hermes` user (UID 10000) via `gosu`. If your host `~/.hermes/` is owned by a different UID, set `HERMES_UID`/`HERMES_GID` to match your host user, or ensure the data directory is writable:
```sh
chmod -R 755 ~/.hermes
```
### Browser tools not working
Playwright needs shared memory. Add `--shm-size=1g` to your Docker run command:
```sh
docker run -d \
--name hermes \
--shm-size=1g \
-v ~/.hermes:/opt/data \
nousresearch/hermes-agent gateway run
```
### Gateway not reconnecting after network issues
The `--restart unless-stopped` flag handles most transient failures. If the gateway is stuck, restart the container:
```sh
docker restart hermes
```
### Checking container health
```sh
docker logs --tail 50 hermes # Recent logs
docker run -it --rm nousresearch/hermes-agent:latest version # Verify version
docker stats hermes # Resource usage
```
---
# Security
# Security
Hermes Agent is designed with a defense-in-depth security model. This page covers every security boundary — from command approval to container isolation to user authorization on messaging platforms.
## Overview
The security model has seven layers:
1. **User authorization** — who can talk to the agent (allowlists, DM pairing)
2. **Dangerous command approval** — human-in-the-loop for destructive operations
3. **Container isolation** — Docker/Singularity/Modal sandboxing with hardened settings
4. **MCP credential filtering** — environment variable isolation for MCP subprocesses
5. **Context file scanning** — prompt injection detection in project files
6. **Cross-session isolation** — sessions cannot access each other's data or state; cron job storage paths are hardened against path traversal attacks
7. **Input sanitization** — working directory parameters in terminal tool backends are validated against an allowlist to prevent shell injection
## Dangerous Command Approval
Before executing any command, Hermes checks it against a curated list of dangerous patterns. If a match is found, the user must explicitly approve it.
### Approval Modes
The approval system supports three modes, configured via `approvals.mode` in `~/.hermes/config.yaml`:
```yaml
approvals:
mode: manual # manual | smart | off
timeout: 60 # seconds to wait for user response (default: 60)
```
| Mode | Behavior |
|------|----------|
| **manual** (default) | Always prompt the user for approval on dangerous commands |
| **smart** | Use an auxiliary LLM to assess risk. Low-risk commands (e.g., `python -c "print('hello')"`) are auto-approved. Genuinely dangerous commands are auto-denied. Uncertain cases escalate to a manual prompt. |
| **off** | Disable all approval checks — equivalent to running with `--yolo`. All commands execute without prompts. |
:::warning
Setting `approvals.mode: off` disables all safety prompts. Use only in trusted environments (CI/CD, containers, etc.).
:::
### YOLO Mode
YOLO mode bypasses **all** dangerous command approval prompts for the current session. It can be activated three ways:
1. **CLI flag**: Start a session with `hermes --yolo` or `hermes chat --yolo`
2. **Slash command**: Type `/yolo` during a session to toggle it on/off
3. **Environment variable**: Set `HERMES_YOLO_MODE=1`
The `/yolo` command is a **toggle** — each use flips the mode on or off:
```
> /yolo
⚡ YOLO mode ON — all commands auto-approved. Use with caution.
> /yolo
⚠ YOLO mode OFF — dangerous commands will require approval.
```
YOLO mode is available in both CLI and gateway sessions. Internally, it sets the `HERMES_YOLO_MODE` environment variable which is checked before every command execution.
:::danger
YOLO mode disables **all** dangerous command safety checks for the session — **except** the hardline blocklist (see below). Use only when you fully trust the commands being generated (e.g., well-tested automation scripts in disposable environments).
:::
### Hardline Blocklist (Always-On Floor)
Some commands are so catastrophic — irreversible filesystem wipes, fork bombs, direct block-device writes — that Hermes refuses to run them **regardless** of:
- `--yolo` / `/yolo` toggled on
- `approvals.mode: off`
- Cron jobs running in headless `approve` mode
- User explicitly clicking "allow always"
The blocklist is the floor below `--yolo`. It trips **before** the approval layer even sees the command, and there's no override flag. Patterns currently covered (not exhaustive; kept in sync with `tools/approval.py::UNRECOVERABLE_BLOCKLIST`):
| Pattern | Why it's hardline |
|---|---|
| `rm -rf /` and obvious variants | Wipes the filesystem root |
| `rm -rf --no-preserve-root /` | The explicit "yes I mean root" variant |
| `:(){ :\|:& };:` (bash fork bomb) | Pegs the host until reboot |
| `mkfs.*` on a mounted root device | Formats the live system |
| `dd if=/dev/zero of=/dev/sd*` | Zeroes a physical disk |
| Piping untrusted URLs to `sh` at the rootfs top level | Remote-code-execution attack vector too broad to approve |
If you hit the blocklist, the tool call returns an explanatory error to the agent and nothing runs. If a legitimate workflow needs one of these commands (you're the operator of a wipe-and-reinstall pipeline, for example), run it outside the agent.
### Approval Timeout
When a dangerous command prompt appears, the user has a configurable amount of time to respond. If no response is given within the timeout, the command is **denied** by default (fail-closed).
Configure the timeout in `~/.hermes/config.yaml`:
```yaml
approvals:
timeout: 60 # seconds (default: 60)
```
### What Triggers Approval
The following patterns trigger approval prompts (defined in `tools/approval.py`):
| Pattern | Description |
|---------|-------------|
| `rm -r` / `rm --recursive` | Recursive delete |
| `rm ... /` | Delete in root path |
| `chmod 777/666` / `o+w` / `a+w` | World/other-writable permissions |
| `chmod --recursive` with unsafe perms | Recursive world/other-writable (long flag) |
| `chown -R root` / `chown --recursive root` | Recursive chown to root |
| `mkfs` | Format filesystem |
| `dd if=` | Disk copy |
| `> /dev/sd` | Write to block device |
| `DROP TABLE/DATABASE` | SQL DROP |
| `DELETE FROM` (without WHERE) | SQL DELETE without WHERE |
| `TRUNCATE TABLE` | SQL TRUNCATE |
| `> /etc/` | Overwrite system config |
| `systemctl stop/restart/disable/mask` | Stop/restart/disable system services |
| `kill -9 -1` | Kill all processes |
| `pkill -9` | Force kill processes |
| Fork bomb patterns | Fork bombs |
| `bash -c` / `sh -c` / `zsh -c` / `ksh -c` | Shell command execution via `-c` flag (including combined flags like `-lc`) |
| `python -e` / `perl -e` / `ruby -e` / `node -c` | Script execution via `-e`/`-c` flag |
| `curl ... \| sh` / `wget ... \| sh` | Pipe remote content to shell |
| `bash <(curl ...)` / `sh <(wget ...)` | Execute remote script via process substitution |
| `tee` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive file via tee |
| `>` / `>>` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive file via redirection |
| `xargs rm` | xargs with rm |
| `find -exec rm` / `find -delete` | Find with destructive actions |
| `cp`/`mv`/`install` to `/etc/` | Copy/move file into system config |
| `sed -i` / `sed --in-place` on `/etc/` | In-place edit of system config |
| `pkill`/`killall` hermes/gateway | Self-termination prevention |
| `gateway run` with `&`/`disown`/`nohup`/`setsid` | Prevents starting gateway outside service manager |
:::info
**Container bypass**: When running in `docker`, `singularity`, `modal`, `daytona`, or `vercel_sandbox` backends, dangerous command checks are **skipped** because the container itself is the security boundary. Destructive commands inside a container can't harm the host.
:::
### Approval Flow (CLI)
In the interactive CLI, dangerous commands show an inline approval prompt:
```
⚠️ DANGEROUS COMMAND: recursive delete
rm -rf /tmp/old-project
[o]nce | [s]ession | [a]lways | [d]eny
Choice [o/s/a/D]:
```
The four options:
- **once** — allow this single execution
- **session** — allow this pattern for the rest of the session
- **always** — add to permanent allowlist (saved to `config.yaml`)
- **deny** (default) — block the command
### Approval Flow (Gateway/Messaging)
On messaging platforms, the agent sends the dangerous command details to the chat and waits for the user to reply:
- Reply **yes**, **y**, **approve**, **ok**, or **go** to approve
- Reply **no**, **n**, **deny**, or **cancel** to deny
The `HERMES_EXEC_ASK=1` environment variable is automatically set when running the gateway.
### Permanent Allowlist
Commands approved with "always" are saved to `~/.hermes/config.yaml`:
```yaml
# Permanently allowed dangerous command patterns
command_allowlist:
- rm
- systemctl
```
These patterns are loaded at startup and silently approved in all future sessions.
:::tip
Use `hermes config edit` to review or remove patterns from your permanent allowlist.
:::
## User Authorization (Gateway)
When running the messaging gateway, Hermes controls who can interact with the bot through a layered authorization system.
### Authorization Check Order
The `_is_user_authorized()` method checks in this order:
1. **Per-platform allow-all flag** (e.g., `DISCORD_ALLOW_ALL_USERS=true`)
2. **DM pairing approved list** (users approved via pairing codes)
3. **Platform-specific allowlists** (e.g., `TELEGRAM_ALLOWED_USERS=12345,67890`)
4. **Global allowlist** (`GATEWAY_ALLOWED_USERS=12345,67890`)
5. **Global allow-all** (`GATEWAY_ALLOW_ALL_USERS=true`)
6. **Default: deny**
### Platform Allowlists
Set allowed user IDs as comma-separated values in `~/.hermes/.env`:
```bash
# Platform-specific allowlists
TELEGRAM_ALLOWED_USERS=123456789,987654321
DISCORD_ALLOWED_USERS=111222333444555666
WHATSAPP_ALLOWED_USERS=15551234567
SLACK_ALLOWED_USERS=U01ABC123
# Cross-platform allowlist (checked for all platforms)
GATEWAY_ALLOWED_USERS=123456789
# Per-platform allow-all (use with caution)
DISCORD_ALLOW_ALL_USERS=true
# Global allow-all (use with extreme caution)
GATEWAY_ALLOW_ALL_USERS=true
```
:::warning
If **no allowlists are configured** and `GATEWAY_ALLOW_ALL_USERS` is not set, **all users are denied**. The gateway logs a warning at startup:
```
No user allowlists configured. All unauthorized users will be denied.
Set GATEWAY_ALLOW_ALL_USERS=true in ~/.hermes/.env to allow open access,
or configure platform allowlists (e.g., TELEGRAM_ALLOWED_USERS=your_id).
```
:::
### DM Pairing System
For more flexible authorization, Hermes includes a code-based pairing system. Instead of requiring user IDs upfront, unknown users receive a one-time pairing code that the bot owner approves via the CLI.
**How it works:**
1. An unknown user sends a DM to the bot
2. The bot replies with an 8-character pairing code
3. The bot owner runs `hermes pairing approve ` on the CLI
4. The user is permanently approved for that platform
Control how unauthorized direct messages are handled in `~/.hermes/config.yaml`:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `pair` is the default. Unauthorized DMs get a pairing code reply.
- `ignore` silently drops unauthorized DMs.
- Platform sections override the global default, so you can keep pairing on Telegram while keeping WhatsApp silent.
**Security features** (based on OWASP + NIST SP 800-63-4 guidance):
| Feature | Details |
|---------|---------|
| Code format | 8-char from 32-char unambiguous alphabet (no 0/O/1/I) |
| Randomness | Cryptographic (`secrets.choice()`) |
| Code TTL | 1 hour expiry |
| Rate limiting | 1 request per user per 10 minutes |
| Pending limit | Max 3 pending codes per platform |
| Lockout | 5 failed approval attempts → 1-hour lockout |
| File security | `chmod 0600` on all pairing data files |
| Logging | Codes are never logged to stdout |
**Pairing CLI commands:**
```bash
# List pending and approved users
hermes pairing list
# Approve a pairing code
hermes pairing approve telegram ABC12DEF
# Revoke a user's access
hermes pairing revoke telegram 123456789
# Clear all pending codes
hermes pairing clear-pending
```
**Storage:** Pairing data is stored in `~/.hermes/pairing/` with per-platform JSON files:
- `{platform}-pending.json` — pending pairing requests
- `{platform}-approved.json` — approved users
- `_rate_limits.json` — rate limit and lockout tracking
## Container Isolation
When using the `docker` terminal backend, Hermes applies strict security hardening to every container.
### Docker Security Flags
Every container runs with these flags (defined in `tools/environments/docker.py`):
```python
_SECURITY_ARGS = [
"--cap-drop", "ALL", # Drop ALL Linux capabilities
"--cap-add", "DAC_OVERRIDE", # Root can write to bind-mounted dirs
"--cap-add", "CHOWN", # Package managers need file ownership
"--cap-add", "FOWNER", # Package managers need file ownership
"--security-opt", "no-new-privileges", # Block privilege escalation
"--pids-limit", "256", # Limit process count
"--tmpfs", "/tmp:rw,nosuid,size=512m", # Size-limited /tmp
"--tmpfs", "/var/tmp:rw,noexec,nosuid,size=256m", # No-exec /var/tmp
"--tmpfs", "/run:rw,noexec,nosuid,size=64m", # No-exec /run
]
```
### Resource Limits
Container resources are configurable in `~/.hermes/config.yaml`:
```yaml
terminal:
backend: docker
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_forward_env: [] # Explicit allowlist only; empty keeps secrets out of the container
container_cpu: 1 # CPU cores
container_memory: 5120 # MB (default 5GB)
container_disk: 51200 # MB (default 50GB, requires overlay2 on XFS)
container_persistent: true # Persist filesystem across sessions
```
### Filesystem Persistence
- **Persistent mode** (`container_persistent: true`): Bind-mounts `/workspace` and `/root` from `~/.hermes/sandboxes/docker//`
- **Ephemeral mode** (`container_persistent: false`): Uses tmpfs for workspace — everything is lost on cleanup
:::tip
For production gateway deployments, use `docker`, `modal`, `daytona`, or `vercel_sandbox` backend to isolate agent commands from your host system. This eliminates the need for dangerous command approval entirely.
:::
:::warning
If you add names to `terminal.docker_forward_env`, those variables are intentionally injected into the container for terminal commands. This is useful for task-specific credentials like `GITHUB_TOKEN`, but it also means code running in the container can read and exfiltrate them.
:::
## Terminal Backend Security Comparison
| Backend | Isolation | Dangerous Cmd Check | Best For |
|---------|-----------|-------------------|----------|
| **local** | None — runs on host | ✅ Yes | Development, trusted users |
| **ssh** | Remote machine | ✅ Yes | Running on a separate server |
| **docker** | Container | ❌ Skipped (container is boundary) | Production gateway |
| **singularity** | Container | ❌ Skipped | HPC environments |
| **modal** | Cloud sandbox | ❌ Skipped | Scalable cloud isolation |
| **daytona** | Cloud sandbox | ❌ Skipped | Persistent cloud workspaces |
| **vercel_sandbox** | Cloud microVM | ❌ Skipped | Cloud execution with snapshot persistence |
## Environment Variable Passthrough {#environment-variable-passthrough}
Both `execute_code` and `terminal` strip sensitive environment variables from child processes to prevent credential exfiltration by LLM-generated code. However, skills that declare `required_environment_variables` legitimately need access to those vars.
### How It Works
Two mechanisms allow specific variables through the sandbox filters:
**1. Skill-scoped passthrough (automatic)**
When a skill is loaded (via `skill_view` or the `/skill` command) and declares `required_environment_variables`, any of those vars that are actually set in the environment are automatically registered as passthrough. Missing vars (still in setup-needed state) are **not** registered.
```yaml
# In a skill's SKILL.md frontmatter
required_environment_variables:
- name: TENOR_API_KEY
prompt: Tenor API key
help: Get a key from https://developers.google.com/tenor
```
After loading this skill, `TENOR_API_KEY` passes through to `execute_code`, `terminal` (local), **and remote backends (Docker, Modal)** — no manual configuration needed.
:::info Docker & Modal
Prior to v0.5.1, Docker's `forward_env` was a separate system from the skill passthrough. They are now merged — skill-declared env vars are automatically forwarded into Docker containers and Modal sandboxes without needing to add them to `docker_forward_env` manually.
:::
**2. Config-based passthrough (manual)**
For env vars not declared by any skill, add them to `terminal.env_passthrough` in `config.yaml`:
```yaml
terminal:
env_passthrough:
- MY_CUSTOM_KEY
- ANOTHER_TOKEN
```
### Credential File Passthrough (OAuth tokens, etc.) {#credential-file-passthrough}
Some skills need **files** (not just env vars) in the sandbox — for example, Google Workspace stores OAuth tokens as `google_token.json` under the active profile's `HERMES_HOME`. Skills declare these in frontmatter:
```yaml
required_credential_files:
- path: google_token.json
description: Google OAuth2 token (created by setup script)
- path: google_client_secret.json
description: Google OAuth2 client credentials
```
When loaded, Hermes checks if these files exist in the active profile's `HERMES_HOME` and registers them for mounting:
- **Docker**: Read-only bind mounts (`-v host:container:ro`)
- **Modal**: Mounted at sandbox creation + synced before each command (handles mid-session OAuth setup)
- **Local**: No action needed (files already accessible)
You can also list credential files manually in `config.yaml`:
```yaml
terminal:
credential_files:
- google_token.json
- my_custom_oauth_token.json
```
Paths are relative to `~/.hermes/`. Files are mounted to `/root/.hermes/` inside the container.
### What Each Sandbox Filters
| Sandbox | Default Filter | Passthrough Override |
|---------|---------------|---------------------|
| **execute_code** | Blocks vars containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`, `PASSWD`, `AUTH` in name; only allows safe-prefix vars through | ✅ Passthrough vars bypass both checks |
| **terminal** (local) | Blocks explicit Hermes infrastructure vars (provider keys, gateway tokens, tool API keys) | ✅ Passthrough vars bypass the blocklist |
| **terminal** (Docker) | No host env vars by default | ✅ Passthrough vars + `docker_forward_env` forwarded via `-e` |
| **terminal** (Modal) | No host env/files by default | ✅ Credential files mounted; env passthrough via sync |
| **MCP** | Blocks everything except safe system vars + explicitly configured `env` | ❌ Not affected by passthrough (use MCP `env` config instead) |
### Security Considerations
- The passthrough only affects vars you or your skills explicitly declare — the default security posture is unchanged for arbitrary LLM-generated code
- Credential files are mounted **read-only** into Docker containers
- Skills Guard scans skill content for suspicious env access patterns before installation
- Missing/unset vars are never registered (you can't leak what doesn't exist)
- Hermes infrastructure secrets (provider API keys, gateway tokens) should never be added to `env_passthrough` — they have dedicated mechanisms
## MCP Credential Handling
MCP (Model Context Protocol) server subprocesses receive a **filtered environment** to prevent accidental credential leakage.
### Safe Environment Variables
Only these variables are passed through from the host to MCP stdio subprocesses:
```
PATH, HOME, USER, LANG, LC_ALL, TERM, SHELL, TMPDIR
```
Plus any `XDG_*` variables. All other environment variables (API keys, tokens, secrets) are **stripped**.
Variables explicitly defined in the MCP server's `env` config are passed through:
```yaml
mcp_servers:
github:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-github"]
env:
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..." # Only this is passed
```
### Credential Redaction
Error messages from MCP tools are sanitized before being returned to the LLM. The following patterns are replaced with `[REDACTED]`:
- GitHub PATs (`ghp_...`)
- OpenAI-style keys (`sk-...`)
- Bearer tokens
- `token=`, `key=`, `API_KEY=`, `password=`, `secret=` parameters
### Website Access Policy
You can restrict which websites the agent can access through its web and browser tools. This is useful for preventing the agent from accessing internal services, admin panels, or other sensitive URLs.
```yaml
# In ~/.hermes/config.yaml
security:
website_blocklist:
enabled: true
domains:
- "*.internal.company.com"
- "admin.example.com"
shared_files:
- "/etc/hermes/blocked-sites.txt"
```
When a blocked URL is requested, the tool returns an error explaining the domain is blocked by policy. The blocklist is enforced across `web_search`, `web_extract`, `browser_navigate`, and all URL-capable tools.
See [Website Blocklist](/docs/user-guide/configuration#website-blocklist) in the configuration guide for full details.
### SSRF Protection
All URL-capable tools (web search, web extract, vision, browser) validate URLs before fetching them to prevent Server-Side Request Forgery (SSRF) attacks. Blocked addresses include:
- **Private networks** (RFC 1918): `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
- **Loopback**: `127.0.0.0/8`, `::1`
- **Link-local**: `169.254.0.0/16` (includes cloud metadata at `169.254.169.254`)
- **CGNAT / shared address space** (RFC 6598): `100.64.0.0/10` (Tailscale, WireGuard VPNs)
- **Cloud metadata hostnames**: `metadata.google.internal`, `metadata.goog`
- **Reserved, multicast, and unspecified addresses**
SSRF protection is always active for internet-facing use and DNS failures are treated as blocked (fail-closed). Redirect chains are re-validated at each hop to prevent redirect-based bypasses.
#### Intentionally allowing private URLs
Some setups legitimately need private/internal URL access — home networks that resolve `home.arpa` to RFC 1918 space, LAN-only Ollama/llama.cpp endpoints, internal wikis, cloud metadata debugging, and the like. For those cases there's a global opt-out:
```yaml
security:
allow_private_urls: true # default: false
```
When on, web tools, the browser, vision URL fetches, and gateway media downloads no longer reject RFC 1918 / loopback / link-local / CGNAT / cloud-metadata destinations. **This is a deliberate trust boundary** — only enable it on machines where the agent running arbitrary prompt-injected URLs against the local network is an acceptable risk. Public-facing gateways should leave it off.
The host-substring guard (which blocks lookalike Unicode domain tricks even when the underlying IP is public) stays on regardless of this setting.
### Tirith Pre-Exec Security Scanning
Hermes integrates [tirith](https://github.com/sheeki03/tirith) for content-level command scanning before execution. Tirith detects threats that pattern matching alone misses:
- Homograph URL spoofing (internationalized domain attacks)
- Pipe-to-interpreter patterns (`curl | bash`, `wget | sh`)
- Terminal injection attacks
Tirith auto-installs from GitHub releases on first use with SHA-256 checksum verification (and cosign provenance verification if cosign is available).
```yaml
# In ~/.hermes/config.yaml
security:
tirith_enabled: true # Enable/disable tirith scanning (default: true)
tirith_path: "tirith" # Path to tirith binary (default: PATH lookup)
tirith_timeout: 5 # Subprocess timeout in seconds
tirith_fail_open: true # Allow execution when tirith is unavailable (default: true)
```
When `tirith_fail_open` is `true` (default), commands proceed if tirith is not installed or times out. Set to `false` in high-security environments to block commands when tirith is unavailable.
Tirith's verdict integrates with the approval flow: safe commands pass through, while both suspicious and blocked commands trigger user approval with the full tirith findings (severity, title, description, safer alternatives). Users can approve or deny — the default choice is deny to keep unattended scenarios secure.
### Context File Injection Protection
Context files (AGENTS.md, .cursorrules, SOUL.md) are scanned for prompt injection before being included in the system prompt. The scanner checks for:
- Instructions to ignore/disregard prior instructions
- Hidden HTML comments with suspicious keywords
- Attempts to read secrets (`.env`, `credentials`, `.netrc`)
- Credential exfiltration via `curl`
- Invisible Unicode characters (zero-width spaces, bidirectional overrides)
Blocked files show a warning:
```
[BLOCKED: AGENTS.md contained potential prompt injection (prompt_injection). Content not loaded.]
```
## Best Practices for Production Deployment
### Gateway Deployment Checklist
1. **Set explicit allowlists** — never use `GATEWAY_ALLOW_ALL_USERS=true` in production
2. **Use container backend** — set `terminal.backend: docker` in config.yaml
3. **Restrict resource limits** — set appropriate CPU, memory, and disk limits
4. **Store secrets securely** — keep API keys in `~/.hermes/.env` with proper file permissions
5. **Enable DM pairing** — use pairing codes instead of hardcoding user IDs when possible
6. **Review command allowlist** — periodically audit `command_allowlist` in config.yaml
7. **Set `MESSAGING_CWD`** — don't let the agent operate from sensitive directories
8. **Run as non-root** — never run the gateway as root
9. **Monitor logs** — check `~/.hermes/logs/` for unauthorized access attempts
10. **Keep updated** — run `hermes update` regularly for security patches
### Securing API Keys
```bash
# Set proper permissions on the .env file
chmod 600 ~/.hermes/.env
# Keep separate keys for different services
# Never commit .env files to version control
```
### Network Isolation
For maximum security, run the gateway on a separate machine or VM:
```yaml
terminal:
backend: ssh
ssh_host: "agent-worker.local"
ssh_user: "hermes"
ssh_key: "~/.ssh/hermes_agent_key"
```
This keeps the gateway's messaging connections separate from the agent's command execution.
---
# Checkpoints and /rollback
# Checkpoints and `/rollback`
Hermes Agent automatically snapshots your project before **destructive operations** and lets you restore it with a single command. Checkpoints are **enabled by default** — there's zero cost when no file-mutating tools fire.
This safety net is powered by an internal **Checkpoint Manager** that keeps a separate shadow git repository under `~/.hermes/checkpoints/` — your real project `.git` is never touched.
## What Triggers a Checkpoint
Checkpoints are taken automatically before:
- **File tools** — `write_file` and `patch`
- **Destructive terminal commands** — `rm`, `rmdir`, `cp`, `install`, `mv`, `sed -i`, `truncate`, `dd`, `shred`, output redirects (`>`), and `git reset`/`clean`/`checkout`
The agent creates **at most one checkpoint per directory per turn**, so long-running sessions don't spam snapshots.
## Quick Reference
| Command | Description |
|---------|-------------|
| `/rollback` | List all checkpoints with change stats |
| `/rollback ` | Restore to checkpoint N (also undoes last chat turn) |
| `/rollback diff ` | Preview diff between checkpoint N and current state |
| `/rollback ` | Restore a single file from checkpoint N |
## How Checkpoints Work
At a high level:
- Hermes detects when tools are about to **modify files** in your working tree.
- Once per conversation turn (per directory), it:
- Resolves a reasonable project root for the file.
- Initialises or reuses a **shadow git repo** tied to that directory.
- Stages and commits the current state with a short, human‑readable reason.
- These commits form a checkpoint history that you can inspect and restore via `/rollback`.
```mermaid
flowchart LR
user["User command\n(hermes, gateway)"]
agent["AIAgent\n(run_agent.py)"]
tools["File & terminal tools"]
cpMgr["CheckpointManager"]
shadowRepo["Shadow git repo\n~/.hermes/checkpoints/"]
user --> agent
agent -->|"tool call"| tools
tools -->|"before mutate\nensure_checkpoint()"| cpMgr
cpMgr -->|"git add/commit"| shadowRepo
cpMgr -->|"OK / skipped"| tools
tools -->|"apply changes"| agent
```
## Configuration
Checkpoints are enabled by default. Configure in `~/.hermes/config.yaml`:
```yaml
checkpoints:
enabled: true # master switch (default: true)
max_snapshots: 50 # max checkpoints per directory
# Auto-maintenance (opt-in): sweep ~/.hermes/checkpoints/ at startup
# and delete shadow repos whose working directory no longer exists
# (orphans) or whose newest commit is older than retention_days.
# Runs at most once per min_interval_hours, tracked via a
# .last_prune marker inside ~/.hermes/checkpoints/.
auto_prune: false # default off — enable to reclaim disk
retention_days: 7
delete_orphans: true # delete repos whose workdir is gone
min_interval_hours: 24
```
To disable:
```yaml
checkpoints:
enabled: false
```
When disabled, the Checkpoint Manager is a no‑op and never attempts git operations.
## Listing Checkpoints
From a CLI session:
```
/rollback
```
Hermes responds with a formatted list showing change statistics:
```text
📸 Checkpoints for /path/to/project:
1. 4270a8c 2026-03-16 04:36 before patch (1 file, +1/-0)
2. eaf4c1f 2026-03-16 04:35 before write_file
3. b3f9d2e 2026-03-16 04:34 before terminal: sed -i s/old/new/ config.py (1 file, +1/-1)
/rollback restore to checkpoint N
/rollback diff preview changes since checkpoint N
/rollback restore a single file from checkpoint N
```
Each entry shows:
- Short hash
- Timestamp
- Reason (what triggered the snapshot)
- Change summary (files changed, insertions/deletions)
## Previewing Changes with `/rollback diff`
Before committing to a restore, preview what has changed since a checkpoint:
```
/rollback diff 1
```
This shows a git diff stat summary followed by the actual diff:
```text
test.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/test.py b/test.py
--- a/test.py
+++ b/test.py
@@ -1 +1 @@
-print('original content')
+print('modified content')
```
Long diffs are capped at 80 lines to avoid flooding the terminal.
## Restoring with `/rollback`
Restore to a checkpoint by number:
```
/rollback 1
```
Behind the scenes, Hermes:
1. Verifies the target commit exists in the shadow repo.
2. Takes a **pre‑rollback snapshot** of the current state so you can "undo the undo" later.
3. Restores tracked files in your working directory.
4. **Undoes the last conversation turn** so the agent's context matches the restored filesystem state.
On success:
```text
✅ Restored to checkpoint 4270a8c5: before patch
A pre-rollback snapshot was saved automatically.
(^_^)b Undid 4 message(s). Removed: "Now update test.py to ..."
4 message(s) remaining in history.
Chat turn undone to match restored file state.
```
The conversation undo ensures the agent doesn't "remember" changes that have been rolled back, avoiding confusion on the next turn.
## Single-File Restore
Restore just one file from a checkpoint without affecting the rest of the directory:
```
/rollback 1 src/broken_file.py
```
This is useful when the agent made changes to multiple files but only one needs to be reverted.
## Safety and Performance Guards
To keep checkpointing safe and fast, Hermes applies several guardrails:
- **Git availability** — if `git` is not found on `PATH`, checkpoints are transparently disabled.
- **Directory scope** — Hermes skips overly broad directories (root `/`, home `$HOME`).
- **Repository size** — directories with more than 50,000 files are skipped to avoid slow git operations.
- **No‑change snapshots** — if there are no changes since the last snapshot, the checkpoint is skipped.
- **Non‑fatal errors** — all errors inside the Checkpoint Manager are logged at debug level; your tools continue to run.
## Where Checkpoints Live
All shadow repos live under:
```text
~/.hermes/checkpoints/
├── / # shadow git repo for one working directory
├── /
└── ...
```
Each `` is derived from the absolute path of the working directory. Inside each shadow repo you'll find:
- Standard git internals (`HEAD`, `refs/`, `objects/`)
- An `info/exclude` file containing a curated ignore list
- A `HERMES_WORKDIR` file pointing back to the original project root
You normally never need to touch these manually.
## Best Practices
- **Leave checkpoints enabled** — they're on by default and have zero cost when no files are modified.
- **Use `/rollback diff` before restoring** — preview what will change to pick the right checkpoint.
- **Use `/rollback` instead of `git reset`** when you want to undo agent-driven changes only.
- **Combine with Git worktrees** for maximum safety — keep each Hermes session in its own worktree/branch, with checkpoints as an extra layer.
For running multiple agents in parallel on the same repo, see the guide on [Git worktrees](./git-worktrees.md).
---
# Features Overview
# Features Overview
Hermes Agent includes a rich set of capabilities that extend far beyond basic chat. From persistent memory and file-aware context to browser automation and voice conversations, these features work together to make Hermes a powerful autonomous assistant.
## Core
- **[Tools & Toolsets](tools.md)** — Tools are functions that extend the agent's capabilities. They're organized into logical toolsets that can be enabled or disabled per platform, covering web search, terminal execution, file editing, memory, delegation, and more.
- **[Skills System](skills.md)** — On-demand knowledge documents the agent can load when needed. Skills follow a progressive disclosure pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
- **[Persistent Memory](memory.md)** — Bounded, curated memory that persists across sessions. Hermes remembers your preferences, projects, environment, and things it has learned via `MEMORY.md` and `USER.md`.
- **[Context Files](context-files.md)** — Hermes automatically discovers and loads project context files (`.hermes.md`, `AGENTS.md`, `CLAUDE.md`, `SOUL.md`, `.cursorrules`) that shape how it behaves in your project.
- **[Context References](context-references.md)** — Type `@` followed by a reference to inject files, folders, git diffs, and URLs directly into your messages. Hermes expands the reference inline and appends the content automatically.
- **[Checkpoints](../checkpoints-and-rollback.md)** — Hermes automatically snapshots your working directory before making file changes, giving you a safety net to roll back with `/rollback` if something goes wrong.
## Automation
- **[Scheduled Tasks (Cron)](cron.md)** — Schedule tasks to run automatically with natural language or cron expressions. Jobs can attach skills, deliver results to any platform, and support pause/resume/edit operations.
- **[Subagent Delegation](delegation.md)** — The `delegate_task` tool spawns child agent instances with isolated context, restricted toolsets, and their own terminal sessions. Run 3 concurrent subagents by default (configurable) for parallel workstreams.
- **[Code Execution](code-execution.md)** — The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn via sandboxed RPC execution.
- **[Event Hooks](hooks.md)** — Run custom code at key lifecycle points. Gateway hooks handle logging, alerts, and webhooks; plugin hooks handle tool interception, metrics, and guardrails.
- **[Batch Processing](batch-processing.md)** — Run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.
## Media & Web
- **[Voice Mode](voice-mode.md)** — Full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
- **[Browser Automation](browser.md)** — Full browser automation with multiple backends: Browserbase cloud, Browser Use cloud, local Chrome via CDP, or local Chromium. Navigate websites, fill forms, and extract information.
- **[Vision & Image Paste](vision.md)** — Multimodal vision support. Paste images from your clipboard into the CLI and ask the agent to analyze, describe, or work with them using any vision-capable model.
- **[Image Generation](image-generation.md)** — Generate images from text prompts using FAL.ai. Nine models supported (FLUX 2 Klein/Pro, GPT-Image 1.5/2, Nano Banana Pro, Ideogram V3, Recraft V4 Pro, Qwen, Z-Image Turbo); pick one via `hermes tools`.
- **[Voice & TTS](tts.md)** — Text-to-speech output and voice message transcription across all messaging platforms, with ten native provider options: Edge TTS (free), ElevenLabs, OpenAI TTS, MiniMax, Mistral Voxtral, Google Gemini, xAI, NeuTTS, KittenTTS, and Piper — plus custom command providers for any local TTS CLI.
## Integrations
- **[MCP Integration](mcp.md)** — Connect to any MCP server via stdio or HTTP transport. Access external tools from GitHub, databases, file systems, and internal APIs without writing native Hermes tools. Includes per-server tool filtering and sampling support.
- **[Provider Routing](provider-routing.md)** — Fine-grained control over which AI providers handle your requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and priority ordering.
- **[Fallback Providers](fallback-providers.md)** — Automatic failover to backup LLM providers when your primary model encounters errors, including independent fallback for auxiliary tasks like vision and compression.
- **[Credential Pools](credential-pools.md)** — Distribute API calls across multiple keys for the same provider. Automatic rotation on rate limits or failures.
- **[Memory Providers](memory-providers.md)** — Plug in external memory backends (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, Supermemory) for cross-session user modeling and personalization beyond the built-in memory system.
- **[API Server](api-server.md)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Connect any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, and more.
- **[IDE Integration (ACP)](acp.md)** — Use Hermes inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Chat, tool activity, file diffs, and terminal commands render inside your editor.
- **[RL Training](rl-training.md)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning.
## Customization
- **[Personality & SOUL.md](personality.md)** — Fully customizable agent personality. `SOUL.md` is the primary identity file — the first thing in the system prompt — and you can swap in built-in or custom `/personality` presets per session.
- **[Skins & Themes](skins.md)** — Customize the CLI's visual presentation: banner colors, spinner faces and verbs, response-box labels, branding text, and the tool activity prefix.
- **[Plugins](plugins.md)** — Add custom tools, hooks, and integrations without modifying core code. Three plugin types: general plugins (tools/hooks), memory providers (cross-session knowledge), and context engines (alternative context management). Managed via the unified `hermes plugins` interactive UI.
---
# Tools & Toolsets
# Tools & Toolsets
Tools are functions that extend the agent's capabilities. They're organized into logical **toolsets** that can be enabled or disabled per platform.
## Available Tools
Hermes ships with a broad built-in tool registry covering web search, browser automation, terminal execution, file editing, memory, delegation, RL training, messaging delivery, Home Assistant, and more.
:::note
**Honcho cross-session memory** is available as a memory provider plugin (`plugins/memory/honcho/`), not as a built-in toolset. See [Plugins](./plugins.md) for installation.
:::
High-level categories:
| Category | Examples | Description |
|----------|----------|-------------|
| **Web** | `web_search`, `web_extract` | Search the web and extract page content. |
| **Terminal & Files** | `terminal`, `process`, `read_file`, `patch` | Execute commands and manipulate files. |
| **Browser** | `browser_navigate`, `browser_snapshot`, `browser_vision` | Interactive browser automation with text and vision support. |
| **Media** | `vision_analyze`, `image_generate`, `text_to_speech` | Multimodal analysis and generation. |
| **Agent orchestration** | `todo`, `clarify`, `execute_code`, `delegate_task` | Planning, clarification, code execution, and subagent delegation. |
| **Memory & recall** | `memory`, `session_search` | Persistent memory and session search. |
| **Automation & delivery** | `cronjob`, `send_message` | Scheduled tasks with create/list/update/pause/resume/run/remove actions, plus outbound messaging delivery. |
| **Integrations** | `ha_*`, MCP server tools, `rl_*` | Home Assistant, MCP, RL training, and other integrations. |
For the authoritative code-derived registry, see [Built-in Tools Reference](/docs/reference/tools-reference) and [Toolsets Reference](/docs/reference/toolsets-reference).
:::tip Nous Tool Gateway
Paid [Nous Portal](https://portal.nousresearch.com) subscribers can use web search, image generation, TTS, and browser automation through the **[Tool Gateway](tool-gateway.md)** — no separate API keys needed. Run `hermes model` to enable it, or configure individual tools with `hermes tools`.
:::
## Using Toolsets
```bash
# Use specific toolsets
hermes chat --toolsets "web,terminal"
# See all available tools
hermes tools
# Configure tools per platform (interactive)
hermes tools
```
Common toolsets include `web`, `search`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, `homeassistant`, `messaging`, `spotify`, `discord`, `discord_admin`, `debugging`, `safe`, and `rl`.
See [Toolsets Reference](/docs/reference/toolsets-reference) for the full set, including platform presets such as `hermes-cli`, `hermes-telegram`, and dynamic MCP toolsets like `mcp-`.
## Terminal Backends
The terminal tool can execute commands in different environments:
| Backend | Description | Use Case |
|---------|-------------|----------|
| `local` | Run on your machine (default) | Development, trusted tasks |
| `docker` | Isolated containers | Security, reproducibility |
| `ssh` | Remote server | Sandboxing, keep agent away from its own code |
| `singularity` | HPC containers | Cluster computing, rootless |
| `modal` | Cloud execution | Serverless, scale |
| `daytona` | Cloud sandbox workspace | Persistent remote dev environments |
| `vercel_sandbox` | Vercel Sandbox cloud microVM | Cloud execution with snapshot-backed filesystem persistence |
### Configuration
```yaml
# In ~/.hermes/config.yaml
terminal:
backend: local # or: docker, ssh, singularity, modal, daytona, vercel_sandbox
cwd: "." # Working directory
timeout: 180 # Command timeout in seconds
```
### Docker Backend
```yaml
terminal:
backend: docker
docker_image: python:3.11-slim
```
**One persistent container, shared across the whole process.** Hermes starts a single long-lived container on first use (`docker run -d ... sleep 2h`) and routes every terminal, file, and `execute_code` call through `docker exec` into that same container. Working-directory changes, installed packages, environment tweaks, and files written to `/workspace` all carry over from one tool call to the next, across `/new`, `/reset`, and `delegate_task` subagents, for the lifetime of the Hermes process. The container is stopped and removed on shutdown.
This means the Docker backend behaves like a persistent sandbox VM, not a fresh container per command. If you `pip install foo` once, it's there for the rest of the session. If you `cd /workspace/project`, subsequent `ls` calls see that directory. See [Configuration → Docker Backend](../configuration.md#docker-backend) for the full lifecycle details and the `container_persistent` flag that controls whether `/workspace` and `/root` survive across Hermes restarts.
### SSH Backend
Recommended for security — agent can't modify its own code:
```yaml
terminal:
backend: ssh
```
```bash
# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa
```
### Singularity/Apptainer
```bash
# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim
# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif
```
### Modal (Serverless Cloud)
```bash
uv pip install modal
modal setup
hermes config set terminal.backend modal
```
### Vercel Sandbox
```bash
pip install 'hermes-agent[vercel]'
hermes config set terminal.backend vercel_sandbox
hermes config set terminal.vercel_runtime node24
```
Authenticate with all three of `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, and `VERCEL_TEAM_ID`. This access-token setup is the supported path for deployments and normal long-running Hermes processes on Render, Railway, Docker, and similar hosts. Supported runtimes are `node24`, `node22`, and `python3.13`; Hermes defaults to `/vercel/sandbox` as the remote workspace root.
For one-off local development, Hermes also accepts short-lived Vercel OIDC tokens:
```bash
VERCEL_OIDC_TOKEN="$(vc project token )" hermes chat
```
From a linked Vercel project directory:
```bash
VERCEL_OIDC_TOKEN="$(vc project token)" hermes chat
```
With `container_persistent: true`, Hermes uses Vercel snapshots to preserve filesystem state across sandbox recreation for the same task. This can include Hermes-synced credentials, skills, and cache files inside the sandbox. Snapshots do not preserve live processes, PID space, or the same live sandbox identity.
Background terminal commands use Hermes' generic non-local process flow: spawn, poll, wait, log, and kill work through the normal process tool while the sandbox is alive, but Hermes does not provide native Vercel detached-process recovery after cleanup or restart.
Leave `container_disk` unset or at the shared default `51200`; custom disk sizing is unsupported for Vercel Sandbox and will fail diagnostics/backend creation.
### Container Resources
Configure CPU, memory, disk, and persistence for all container backends:
```yaml
terminal:
backend: docker # or singularity, modal, daytona, vercel_sandbox
container_cpu: 1 # CPU cores (default: 1)
container_memory: 5120 # Memory in MB (default: 5GB)
container_disk: 51200 # Disk in MB (default: 50GB)
container_persistent: true # Persist filesystem across sessions (default: true)
```
When `container_persistent: true`, installed packages, files, and config survive across sessions.
### Container Security
All container backends run with security hardening:
- Read-only root filesystem (Docker)
- All Linux capabilities dropped
- No privilege escalation
- PID limits (256 processes)
- Full namespace isolation
- Persistent workspace via volumes, not writable root layer
Docker can optionally receive an explicit env allowlist via `terminal.docker_forward_env`, but forwarded variables are visible to commands inside the container and should be treated as exposed to that session.
## Background Process Management
Start background processes and manage them:
```python
terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}
# Then manage with the process tool:
process(action="list") # Show all running processes
process(action="poll", session_id="proc_abc123") # Check status
process(action="wait", session_id="proc_abc123") # Block until done
process(action="log", session_id="proc_abc123") # Full output
process(action="kill", session_id="proc_abc123") # Terminate
process(action="write", session_id="proc_abc123", data="y") # Send input
```
PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
## Sudo Support
If a command needs sudo, you'll be prompted for your password (cached for the session). Or set `SUDO_PASSWORD` in `~/.hermes/.env`.
:::warning
On messaging platforms, if sudo fails, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`.
:::
---
# Skills System
# Skills System
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
All skills live in **`~/.hermes/skills/`** — the primary directory and source of truth. On fresh install, bundled skills are copied from the repo. Hub-installed and agent-created skills also go here. The agent can modify or delete any skill.
You can also point Hermes at **external skill directories** — additional folders scanned alongside the local one. See [External Skill Directories](#external-skill-directories) below.
See also:
- [Bundled Skills Catalog](/docs/reference/skills-catalog)
- [Official Optional Skills Catalog](/docs/reference/optional-skills-catalog)
## Using Skills
Every installed skill is automatically available as a slash command:
```bash
# In the CLI or any messaging platform:
/gif-search funny cats
/axolotl help me fine-tune Llama 3 on my dataset
/github-pr-workflow create a PR for the auth refactor
/plan design a rollout for migrating our auth provider
# Just the skill name loads it and lets the agent ask what you need:
/excalidraw
```
The bundled `plan` skill is a good example. Running `/plan [request]` loads the skill's instructions, telling Hermes to inspect context if needed, write a markdown implementation plan instead of executing the task, and save the result under `.hermes/plans/` relative to the active workspace/backend working directory.
You can also interact with skills through natural conversation:
```bash
hermes chat --toolsets skills -q "What skills do you have?"
hermes chat --toolsets skills -q "Show me the axolotl skill"
```
## Progressive Disclosure
Skills use a token-efficient loading pattern:
```
Level 0: skills_list() → [{name, description, category}, ...] (~3k tokens)
Level 1: skill_view(name) → Full content + metadata (varies)
Level 2: skill_view(name, path) → Specific reference file (varies)
```
The agent only loads the full skill content when it actually needs it.
## SKILL.md Format
```markdown
---
name: my-skill
description: Brief description of what this skill does
version: 1.0.0
platforms: [macos, linux] # Optional — restrict to specific OS platforms
metadata:
hermes:
tags: [python, automation]
category: devops
fallback_for_toolsets: [web] # Optional — conditional activation (see below)
requires_toolsets: [terminal] # Optional — conditional activation (see below)
config: # Optional — config.yaml settings
- key: my.setting
description: "What this controls"
default: "value"
prompt: "Prompt for setup"
---
# Skill Title
## When to Use
Trigger conditions for this skill.
## Procedure
1. Step one
2. Step two
## Pitfalls
- Known failure modes and fixes
## Verification
How to confirm it worked.
```
### Platform-Specific Skills
Skills can restrict themselves to specific operating systems using the `platforms` field:
| Value | Matches |
|-------|---------|
| `macos` | macOS (Darwin) |
| `linux` | Linux |
| `windows` | Windows |
```yaml
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders, FindMy)
platforms: [macos, linux] # macOS and Linux
```
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted, the skill loads on all platforms.
### Conditional Activation (Fallback Skills)
Skills can automatically show or hide themselves based on which tools are available in the current session. This is most useful for **fallback skills** — free or local alternatives that should only appear when a premium tool is unavailable.
```yaml
metadata:
hermes:
fallback_for_toolsets: [web] # Show ONLY when these toolsets are unavailable
requires_toolsets: [terminal] # Show ONLY when these toolsets are available
fallback_for_tools: [web_search] # Show ONLY when these specific tools are unavailable
requires_tools: [terminal] # Show ONLY when these specific tools are available
```
| Field | Behavior |
|-------|----------|
| `fallback_for_toolsets` | Skill is **hidden** when the listed toolsets are available. Shown when they're missing. |
| `fallback_for_tools` | Same, but checks individual tools instead of toolsets. |
| `requires_toolsets` | Skill is **hidden** when the listed toolsets are unavailable. Shown when they're present. |
| `requires_tools` | Same, but checks individual tools. |
**Example:** The built-in `duckduckgo-search` skill uses `fallback_for_toolsets: [web]`. When you have `FIRECRAWL_API_KEY` set, the web toolset is available and the agent uses `web_search` — the DuckDuckGo skill stays hidden. If the API key is missing, the web toolset is unavailable and the DuckDuckGo skill automatically appears as a fallback.
Skills without any conditional fields behave exactly as before — they're always shown.
## Secure Setup on Load
Skills can declare required environment variables without disappearing from discovery:
```yaml
required_environment_variables:
- name: TENOR_API_KEY
prompt: Tenor API key
help: Get a key from https://developers.google.com/tenor
required_for: full functionality
```
When a missing value is encountered, Hermes asks for it securely only when the skill is actually loaded in the local CLI. You can skip setup and keep using the skill. Messaging surfaces never ask for secrets in chat — they tell you to use `hermes setup` or `~/.hermes/.env` locally instead.
Once set, declared env vars are **automatically passed through** to `execute_code` and `terminal` sandboxes — the skill's scripts can use `$TENOR_API_KEY` directly. For non-skill env vars, use the `terminal.env_passthrough` config option. See [Environment Variable Passthrough](/docs/user-guide/security#environment-variable-passthrough) for details.
### Skill Config Settings
Skills can also declare non-secret config settings (paths, preferences) stored in `config.yaml`:
```yaml
metadata:
hermes:
config:
- key: myplugin.path
description: Path to the plugin data directory
default: "~/myplugin-data"
prompt: Plugin data directory path
```
Settings are stored under `skills.config` in your config.yaml. `hermes config migrate` prompts for unconfigured settings, and `hermes config show` displays them. When a skill loads, its resolved config values are injected into the context so the agent knows the configured values automatically.
See [Skill Settings](/docs/user-guide/configuration#skill-settings) and [Creating Skills — Config Settings](/docs/developer-guide/creating-skills#config-settings-configyaml) for details.
## Skill Directory Structure
```text
~/.hermes/skills/ # Single source of truth
├── mlops/ # Category directory
│ ├── axolotl/
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs
│ │ ├── templates/ # Output formats
│ │ ├── scripts/ # Helper scripts callable from the skill
│ │ └── assets/ # Supplementary files
│ └── vllm/
│ └── SKILL.md
├── devops/
│ └── deploy-k8s/ # Agent-created skill
│ ├── SKILL.md
│ └── references/
├── .hub/ # Skills Hub state
│ ├── lock.json
│ ├── quarantine/
│ └── audit.log
└── .bundled_manifest # Tracks seeded bundled skills
```
## External Skill Directories
If you maintain skills outside of Hermes — for example, a shared `~/.agents/skills/` directory used by multiple AI tools — you can tell Hermes to scan those directories too.
Add `external_dirs` under the `skills` section in `~/.hermes/config.yaml`:
```yaml
skills:
external_dirs:
- ~/.agents/skills
- /home/shared/team-skills
- ${SKILLS_REPO}/skills
```
Paths support `~` expansion and `${VAR}` environment variable substitution.
### How it works
- **Read-only**: External dirs are only scanned for skill discovery. When the agent creates or edits a skill, it always writes to `~/.hermes/skills/`.
- **Local precedence**: If the same skill name exists in both the local dir and an external dir, the local version wins.
- **Full integration**: External skills appear in the system prompt index, `skills_list`, `skill_view`, and as `/skill-name` slash commands — no different from local skills.
- **Non-existent paths are silently skipped**: If a configured directory doesn't exist, Hermes ignores it without errors. Useful for optional shared directories that may not be present on every machine.
### Example
```text
~/.hermes/skills/ # Local (primary, read-write)
├── devops/deploy-k8s/
│ └── SKILL.md
└── mlops/axolotl/
└── SKILL.md
~/.agents/skills/ # External (read-only, shared)
├── my-custom-workflow/
│ └── SKILL.md
└── team-conventions/
└── SKILL.md
```
All four skills appear in your skill index. If you create a new skill called `my-custom-workflow` locally, it shadows the external version.
## Agent-Managed Skills (skill_manage tool)
The agent can create, update, and delete its own skills via the `skill_manage` tool. This is the agent's **procedural memory** — when it figures out a non-trivial workflow, it saves the approach as a skill for future reuse.
### When the Agent Creates Skills
- After completing a complex task (5+ tool calls) successfully
- When it hit errors or dead ends and found the working path
- When the user corrected its approach
- When it discovered a non-trivial workflow
### Actions
| Action | Use for | Key params |
|--------|---------|------------|
| `create` | New skill from scratch | `name`, `content` (full SKILL.md), optional `category` |
| `patch` | Targeted fixes (preferred) | `name`, `old_string`, `new_string` |
| `edit` | Major structural rewrites | `name`, `content` (full SKILL.md replacement) |
| `delete` | Remove a skill entirely | `name` |
| `write_file` | Add/update supporting files | `name`, `file_path`, `file_content` |
| `remove_file` | Remove a supporting file | `name`, `file_path` |
:::tip
The `patch` action is preferred for updates — it's more token-efficient than `edit` because only the changed text appears in the tool call.
:::
## Skills Hub
Browse, search, install, and manage skills from online registries, `skills.sh`, direct well-known skill endpoints, and official optional skills.
### Common commands
```bash
hermes skills browse # Browse all hub skills (official first)
hermes skills browse --source official # Browse only official optional skills
hermes skills search kubernetes # Search all sources
hermes skills search react --source skills-sh # Search the skills.sh directory
hermes skills search https://mintlify.com/docs --source well-known
hermes skills inspect openai/skills/k8s # Preview before installing
hermes skills install openai/skills/k8s # Install with security scan
hermes skills install official/security/1password
hermes skills install skills-sh/vercel-labs/json-render/json-render-react --force
hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
hermes skills install https://sharethis.chat/SKILL.md # Direct URL (single-file SKILL.md)
hermes skills install https://example.com/SKILL.md --name my-skill # Override name when frontmatter has none
hermes skills list --source hub # List hub-installed skills
hermes skills check # Check installed hub skills for upstream updates
hermes skills update # Reinstall hub skills with upstream changes when needed
hermes skills audit # Re-scan all hub skills for security
hermes skills uninstall k8s # Remove a hub skill
hermes skills reset google-workspace # Un-stick a bundled skill from "user-modified" (see below)
hermes skills reset google-workspace --restore # Also restore the bundled version, deleting your local edits
hermes skills publish skills/my-skill --to github --repo owner/repo
hermes skills snapshot export setup.json # Export skill config
hermes skills tap add myorg/skills-repo # Add a custom GitHub source
```
### Supported hub sources
| Source | Example | Notes |
|--------|---------|-------|
| `official` | `official/security/1password` | Optional skills shipped with Hermes. |
| `skills-sh` | `skills-sh/vercel-labs/agent-skills/vercel-react-best-practices` | Searchable via `hermes skills search --source skills-sh`. Hermes resolves alias-style skills when the skills.sh slug differs from the repo folder. |
| `well-known` | `well-known:https://mintlify.com/docs/.well-known/skills/mintlify` | Skills served directly from `/.well-known/skills/index.json` on a website. Search using the site or docs URL. |
| `url` | `https://sharethis.chat/SKILL.md` | Direct HTTP(S) URL to a single-file `SKILL.md`. Name resolution: frontmatter → URL slug → interactive prompt → `--name` flag. |
| `github` | `openai/skills/k8s` | Direct GitHub repo/path installs and custom taps. |
| `clawhub`, `lobehub`, `claude-marketplace` | Source-specific identifiers | Community or marketplace integrations. |
### Integrated hubs and registries
Hermes currently integrates with these skills ecosystems and discovery sources:
#### 1. Official optional skills (`official`)
These are maintained in the Hermes repository itself and install with builtin trust.
- Catalog: [Official Optional Skills Catalog](../../reference/optional-skills-catalog)
- Source in repo: `optional-skills/`
- Example:
```bash
hermes skills browse --source official
hermes skills install official/security/1password
```
#### 2. skills.sh (`skills-sh`)
This is Vercel's public skills directory. Hermes can search it directly, inspect skill detail pages, resolve alias-style slugs, and install from the underlying source repo.
- Directory: [skills.sh](https://skills.sh/)
- CLI/tooling repo: [vercel-labs/skills](https://github.com/vercel-labs/skills)
- Official Vercel skills repo: [vercel-labs/agent-skills](https://github.com/vercel-labs/agent-skills)
- Example:
```bash
hermes skills search react --source skills-sh
hermes skills inspect skills-sh/vercel-labs/json-render/json-render-react
hermes skills install skills-sh/vercel-labs/json-render/json-render-react --force
```
#### 3. Well-known skill endpoints (`well-known`)
This is URL-based discovery from sites that publish `/.well-known/skills/index.json`. It is not a single centralized hub — it is a web discovery convention.
- Example live endpoint: [Mintlify docs skills index](https://mintlify.com/docs/.well-known/skills/index.json)
- Reference server implementation: [vercel-labs/skills-handler](https://github.com/vercel-labs/skills-handler)
- Example:
```bash
hermes skills search https://mintlify.com/docs --source well-known
hermes skills inspect well-known:https://mintlify.com/docs/.well-known/skills/mintlify
hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
```
#### 4. Direct GitHub skills (`github`)
Hermes can install directly from GitHub repositories and GitHub-based taps. This is useful when you already know the repo/path or want to add your own custom source repo.
Default taps (browsable without any setup):
- [openai/skills](https://github.com/openai/skills)
- [anthropics/skills](https://github.com/anthropics/skills)
- [VoltAgent/awesome-agent-skills](https://github.com/VoltAgent/awesome-agent-skills)
- [garrytan/gstack](https://github.com/garrytan/gstack)
- Example:
```bash
hermes skills install openai/skills/k8s
hermes skills tap add myorg/skills-repo
```
#### 5. ClawHub (`clawhub`)
A third-party skills marketplace integrated as a community source.
- Site: [clawhub.ai](https://clawhub.ai/)
- Hermes source id: `clawhub`
#### 6. Claude marketplace-style repos (`claude-marketplace`)
Hermes supports marketplace repos that publish Claude-compatible plugin/marketplace manifests.
Known integrated sources include:
- [anthropics/skills](https://github.com/anthropics/skills)
- [aiskillstore/marketplace](https://github.com/aiskillstore/marketplace)
Hermes source id: `claude-marketplace`
#### 7. LobeHub (`lobehub`)
Hermes can search and convert agent entries from LobeHub's public catalog into installable Hermes skills.
- Site: [LobeHub](https://lobehub.com/)
- Public agents index: [chat-agents.lobehub.com](https://chat-agents.lobehub.com/)
- Backing repo: [lobehub/lobe-chat-agents](https://github.com/lobehub/lobe-chat-agents)
- Hermes source id: `lobehub`
#### 8. Direct URL (`url`)
Install a single-file `SKILL.md` directly from any HTTP(S) URL — useful when an author hosts a skill on their own site (no hub listing, no GitHub path to type). Hermes fetches the URL, parses the YAML frontmatter, security-scans it, and installs.
- Hermes source id: `url`
- Identifier: the URL itself (no prefix needed)
- Scope: **single-file `SKILL.md`** only. Multi-file skills with `references/` or `scripts/` need a manifest and should be published via one of the other sources above.
```bash
hermes skills install https://sharethis.chat/SKILL.md
hermes skills install https://example.com/my-skill/SKILL.md --category productivity
```
Name resolution, in order:
1. `name:` field in the SKILL.md YAML frontmatter (recommended — every well-formed skill has one).
2. Parent directory name from the URL path (e.g. `.../my-skill/SKILL.md` → `my-skill`, or `.../my-skill.md` → `my-skill`), when it's a valid identifier (`^[a-z][a-z0-9_-]*$`).
3. Interactive prompt on a terminal with a TTY.
4. On non-interactive surfaces (the `/skills install` slash command inside the TUI, gateway platforms, scripts), a clean error pointing at the `--name` override.
```bash
# Frontmatter has no name and the URL slug is unhelpful — supply one:
hermes skills install https://example.com/SKILL.md --name sharethis-chat
# Or inside a chat session:
/skills install https://example.com/SKILL.md --name sharethis-chat
```
Trust level is always `community` — the same security scan runs as for every other source. The URL is stored as the install identifier, so `hermes skills update` re-fetches from the same URL automatically when you want to refresh.
### Security scanning and `--force`
All hub-installed skills go through a **security scanner** that checks for data exfiltration, prompt injection, destructive commands, supply-chain signals, and other threats.
`hermes skills inspect ...` now also surfaces upstream metadata when available:
- repo URL
- skills.sh detail page URL
- install command
- weekly installs
- upstream security audit statuses
- well-known index/endpoint URLs
Use `--force` when you have reviewed a third-party skill and want to override a non-dangerous policy block:
```bash
hermes skills install skills-sh/anthropics/skills/pdf --force
```
Important behavior:
- `--force` can override policy blocks for caution/warn-style findings.
- `--force` does **not** override a `dangerous` scan verdict.
- Official optional skills (`official/...`) are treated as builtin trust and do not show the third-party warning panel.
### Trust levels
| Level | Source | Policy |
|-------|--------|--------|
| `builtin` | Ships with Hermes | Always trusted |
| `official` | `optional-skills/` in the repo | Builtin trust, no third-party warning |
| `trusted` | Trusted registries/repos such as `openai/skills`, `anthropics/skills` | More permissive policy than community sources |
| `community` | Everything else (`skills.sh`, well-known endpoints, custom GitHub repos, most marketplaces) | Non-dangerous findings can be overridden with `--force`; `dangerous` verdicts stay blocked |
### Update lifecycle
The hub now tracks enough provenance to re-check upstream copies of installed skills:
```bash
hermes skills check # Report which installed hub skills changed upstream
hermes skills update # Reinstall only the skills with updates available
hermes skills update react # Update one specific installed hub skill
```
This uses the stored source identifier plus the current upstream bundle content hash to detect drift.
:::tip GitHub rate limits
Skills hub operations use the GitHub API, which has a rate limit of 60 requests/hour for unauthenticated users. If you see rate-limit errors during install or search, set `GITHUB_TOKEN` in your `.env` file to increase the limit to 5,000 requests/hour. The error message includes an actionable hint when this happens.
:::
## Bundled skill updates (`hermes skills reset`)
Hermes ships with a set of bundled skills in `skills/` inside the repo. On install and on every `hermes update`, a sync pass copies those into `~/.hermes/skills/` and records a manifest at `~/.hermes/skills/.bundled_manifest` mapping each skill name to the content hash at the time it was synced (the **origin hash**).
On each sync, Hermes recomputes the hash of your local copy and compares it to the origin hash:
- **Unchanged** → safe to pull upstream changes, copy the new bundled version in, record the new origin hash.
- **Changed** → treated as **user-modified** and skipped forever, so your edits never get stomped.
The protection is good, but it has one sharp edge. If you edit a bundled skill and then later want to abandon your changes and go back to the bundled version by just copy-pasting from `~/.hermes/hermes-agent/skills/`, the manifest still holds the *old* origin hash from whenever the last successful sync ran. Your fresh copy-paste contents (current bundled hash) won't match that stale origin hash, so sync keeps flagging it as user-modified.
`hermes skills reset` is the escape hatch:
```bash
# Safe: clears the manifest entry for this skill. Your current copy is preserved,
# but the next sync re-baselines against it so future updates work normally.
hermes skills reset google-workspace
# Full restore: also deletes your local copy and re-copies the current bundled
# version. Use this when you want the pristine upstream skill back.
hermes skills reset google-workspace --restore
# Non-interactive (e.g. in scripts or TUI mode) — skip the --restore confirmation.
hermes skills reset google-workspace --restore --yes
```
The same command works in chat as a slash command:
```text
/skills reset google-workspace
/skills reset google-workspace --restore
```
:::note Profiles
Each profile has its own `.bundled_manifest` under its own `HERMES_HOME`, so `hermes -p coder skills reset ` only affects that profile.
:::
### Slash commands (inside chat)
All the same commands work with `/skills`:
```text
/skills browse
/skills search react --source skills-sh
/skills search https://mintlify.com/docs --source well-known
/skills inspect skills-sh/vercel-labs/json-render/json-render-react
/skills install openai/skills/skill-creator --force
/skills check
/skills update
/skills reset google-workspace
/skills list
```
Official optional skills still use identifiers like `official/security/1password` and `official/migration/openclaw-migration`.
---
# Curator
# Curator
The curator is a background maintenance pass for **agent-created skills**. It tracks how often each skill is viewed, used, and patched, moves long-unused skills through `active → stale → archived` states, and periodically spawns a short auxiliary-model review that proposes consolidations or patches drift.
It exists so that skills created via the [self-improvement loop](/docs/user-guide/features/skills#agent-managed-skills-skill_manage-tool) don't pile up forever. Every time the agent solves a novel problem and saves a skill, that skill lands in `~/.hermes/skills/`. Without maintenance, you end up with dozens of narrow near-duplicates that pollute the catalog and waste tokens.
The curator **never touches** bundled skills (shipped with the repo) or hub-installed skills (from [agentskills.io](https://agentskills.io)). It only reviews skills the agent itself authored. It also **never auto-deletes** — the worst outcome is archival into `~/.hermes/skills/.archive/`, which is recoverable.
Tracks [issue #7816](https://github.com/NousResearch/hermes-agent/issues/7816).
## How it runs
The curator is triggered by an inactivity check, not a cron daemon. On CLI session start, and on a recurring tick inside the gateway's cron-ticker thread, Hermes checks whether:
1. Enough time has passed since the last curator run (`interval_hours`, default **7 days**), and
2. The agent has been idle long enough (`min_idle_hours`, default **2 hours**).
If both are true, it spawns a background fork of `AIAgent` — the same pattern used by the memory/skill self-improvement nudges. The fork runs in its own prompt cache and never touches the active conversation.
:::info First-run behavior
On a brand-new install (or the first time a pre-curator install ticks after `hermes update`), the curator **does not run immediately**. The first observation seeds `last_run_at` to "now" and defers the first real pass by one full `interval_hours`. This gives you a full interval to review your skill library, pin anything important, or opt out entirely before the curator ever touches it.
If you want to see what the curator *would* do before it runs for real, run `hermes curator run --dry-run` — it produces the same review report without mutating the library.
:::
A run has two phases:
1. **Automatic transitions** (deterministic, no LLM). Skills unused for `stale_after_days` (30) become `stale`; skills unused for `archive_after_days` (90) are moved to `~/.hermes/skills/.archive/`.
2. **LLM review** (single aux-model pass, `max_iterations=8`). The forked agent surveys the agent-created skills, can read any of them with `skill_view`, and decides per-skill whether to keep, patch (via `skill_manage`), consolidate overlapping ones, or archive via the terminal tool.
Pinned skills are off-limits to both the curator's auto-transitions and the agent's own `skill_manage` tool. See [Pinning a skill](#pinning-a-skill) below.
## Configuration
All settings live in `config.yaml` under `curator:` (not `.env` — this isn't a secret). Defaults:
```yaml
curator:
enabled: true
interval_hours: 168 # 7 days
min_idle_hours: 2
stale_after_days: 30
archive_after_days: 90
```
To disable entirely, set `curator.enabled: false`.
### Running the review on a cheaper aux model
The curator's LLM review pass is a regular auxiliary task slot — `auxiliary.curator` — alongside Vision, Compression, Session Search, etc. "Auto" means "use my main chat model"; override the slot to pin a specific provider + model for the review pass instead.
**Easiest — `hermes model`:**
```bash
hermes model # → "Auxiliary models — side-task routing"
# → pick "Curator" → pick provider → pick model
```
The same picker is available in the web dashboard under the **Models** tab.
**Direct config.yaml (equivalent):**
```yaml
auxiliary:
curator:
provider: openrouter
model: google/gemini-3-flash-preview
timeout: 600 # generous — reviews can take several minutes
```
Leaving `provider: auto` (the default) routes the review pass through whatever your main chat model is, matching the behavior of every other auxiliary task.
:::note Legacy config
Earlier releases used a one-off `curator.auxiliary.{provider,model}` block. That path still works but emits a deprecation log line — please migrate to `auxiliary.curator` above so the curator shares the same plumbing (`hermes model`, dashboard Models tab, `base_url`, `api_key`, `timeout`, `extra_body`) as every other aux task.
:::
## CLI
```bash
hermes curator status # last run, counts, pinned list, LRU top 5
hermes curator run # trigger a review now (background by default)
hermes curator run --sync # same, but block until the LLM pass finishes
hermes curator run --dry-run # preview only — report without any mutations
hermes curator backup # take a manual snapshot of ~/.hermes/skills/
hermes curator rollback # restore from the newest snapshot
hermes curator rollback --list # list available snapshots
hermes curator rollback --id # restore a specific snapshot
hermes curator rollback -y # skip the confirmation prompt
hermes curator pause # stop runs until resumed
hermes curator resume
hermes curator pin # never auto-transition this skill
hermes curator unpin
hermes curator restore # move an archived skill back to active
```
## Backups and rollback
Before every real curator pass, Hermes takes a tar.gz snapshot of `~/.hermes/skills/` at `~/.hermes/skills/.curator_backups//skills.tar.gz`. If a pass archives or consolidates something you didn't want touched, you can undo the whole run with one command:
```bash
hermes curator rollback # restore newest snapshot (with confirmation)
hermes curator rollback -y # skip the prompt
hermes curator rollback --list # see all snapshots with reason + size
```
The rollback itself is reversible: before replacing the skills tree, Hermes takes another snapshot tagged `pre-rollback to `, so a mistaken rollback can be undone by rolling forward to that one with `--id`.
You can also take manual snapshots at any time with `hermes curator backup --reason "before-refactor"`. The `--reason` string lands in the snapshot's `manifest.json` and is shown in `--list`.
Snapshots are pruned to `curator.backup.keep` (default 5) to keep disk usage bounded:
```yaml
curator:
backup:
enabled: true
keep: 5
```
Set `curator.backup.enabled: false` to disable automatic snapshotting. The manual `hermes curator backup` command still works when backups are disabled only if you set `enabled: true` first — the flag gates both paths symmetrically so there's no way to accidentally skip the pre-run snapshot on mutating runs.
`hermes curator status` also lists the five least-recently-used skills — a quick way to see what's likely to become stale next.
The same subcommands are available as the `/curator` slash command inside a running session (CLI or gateway platforms).
## What "agent-created" means
A skill is considered agent-created if its name is **not** in:
- `~/.hermes/skills/.bundled_manifest` (skills copied from the repo on install), and
- `~/.hermes/skills/.hub/lock.json` (skills installed via `hermes skills install`).
Everything else in `~/.hermes/skills/` is fair game for the curator. This includes:
- Skills the agent saved via `skill_manage(action="create")` during a conversation.
- Skills you created manually with a hand-written `SKILL.md`.
- Skills added via external skill directories you've pointed Hermes at.
:::warning Your hand-written skills look the same as agent-saved ones
Provenance here is **binary** (bundled/hub vs. everything else). The curator cannot tell a hand-authored skill you rely on for private workflows apart from a skill the self-improvement loop saved mid-session. Both land in the "agent-created" bucket.
Before the first real pass (7 days after installation by default), take a moment to:
1. Run `hermes curator run --dry-run` to see exactly what the curator would propose.
2. Use `hermes curator pin ` to fence off anything you don't want touched.
3. Or set `curator.enabled: false` in `config.yaml` if you'd rather manage the library yourself.
Archives are always recoverable via `hermes curator restore `, but it's easier to pin up-front than to chase down a consolidation after the fact.
:::
If you want to protect a specific skill from ever being touched — for example a hand-authored skill you rely on — use `hermes curator pin `. See the next section.
## Pinning a skill
Pinning protects a skill from deletion — both the curator's automated archive passes and the agent's `skill_manage(action="delete")` tool call. Once a skill is pinned:
- The **curator** skips it during auto-transitions (`active → stale → archived`), and its LLM review pass is instructed to leave it alone.
- The **agent's `skill_manage` tool** refuses `delete` on it, pointing the user at `hermes curator unpin `. Patches and edits still go through, so the agent can improve a pinned skill's content as pitfalls come up without a pin/unpin/re-pin dance.
Pin and unpin with:
```bash
hermes curator pin
hermes curator unpin
```
The flag is stored as `"pinned": true` on the skill's entry in `~/.hermes/skills/.usage.json`, so it survives across sessions.
Only **agent-created** skills can be pinned — bundled and hub-installed skills are never subject to curator mutation in the first place, and `hermes curator pin` will refuse with an explanatory message if you try.
If you want a stronger guarantee than "no deletion" — for instance, freezing a skill's content entirely while the agent still reads it — edit `~/.hermes/skills//SKILL.md` directly with your editor. The pin guards tool-driven deletion, not your own filesystem access.
## Usage telemetry
The curator maintains a sidecar at `~/.hermes/skills/.usage.json` with one entry per skill:
```json
{
"my-skill": {
"use_count": 12,
"view_count": 34,
"last_used_at": "2026-04-24T18:12:03Z",
"last_viewed_at": "2026-04-23T09:44:17Z",
"patch_count": 3,
"last_patched_at": "2026-04-20T22:01:55Z",
"created_at": "2026-03-01T14:20:00Z",
"state": "active",
"pinned": false,
"archived_at": null
}
}
```
Counters increment when:
- `view_count`: the agent calls `skill_view` on the skill.
- `use_count`: the skill is loaded into a conversation's prompt.
- `patch_count`: `skill_manage patch/edit/write_file/remove_file` runs on the skill.
Bundled and hub-installed skills are explicitly excluded from telemetry writes.
## Per-run reports
Every curator run writes a timestamped directory under `~/.hermes/logs/curator/`:
```
~/.hermes/logs/curator/
└── 20260429-111512/
├── run.json # machine-readable: full fidelity, stats, LLM output
└── REPORT.md # human-readable summary
```
`REPORT.md` is a quick way to see what a given run did — which skills transitioned, what the LLM reviewer said, which skills it patched. Good for auditing without having to grep `agent.log`.
## Restoring an archived skill
If the curator archived something you still want:
```bash
hermes curator restore
```
This moves the skill back from `~/.hermes/skills/.archive/` to the active tree and resets its state to `active`. The restore refuses if a bundled or hub-installed skill has since been installed under the same name (would shadow upstream).
## Disabling per environment
The curator is on by default. To turn it off:
- **For one profile only:** edit `~/.hermes/config.yaml` (or the active profile's config) and set `curator.enabled: false`.
- **For just one run:** `hermes curator pause` — the pause persists across sessions; use `resume` to re-enable.
The curator also refuses to run if `min_idle_hours` hasn't elapsed, so on an active dev machine it naturally only runs during quiet stretches.
## See also
- [Skills System](/docs/user-guide/features/skills) — how skills work in general and the self-improvement loop that creates them
- [Memory](/docs/user-guide/features/memory) — a parallel background review that maintains long-term memory
- [Bundled Skills Catalog](/docs/reference/skills-catalog)
- [Issue #7816](https://github.com/NousResearch/hermes-agent/issues/7816) — original proposal and design discussion
---
# Persistent Memory
# Persistent Memory
Hermes Agent has bounded, curated memory that persists across sessions. This lets it remember your preferences, your projects, your environment, and things it has learned.
## How It Works
Two files make up the agent's memory:
| File | Purpose | Char Limit |
|------|---------|------------|
| **MEMORY.md** | Agent's personal notes — environment facts, conventions, things learned | 2,200 chars (~800 tokens) |
| **USER.md** | User profile — your preferences, communication style, expectations | 1,375 chars (~500 tokens) |
Both are stored in `~/.hermes/memories/` and are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool — it can add, replace, or remove entries.
:::info
Character limits keep memory focused. When memory is full, the agent consolidates or replaces entries to make room for new information.
:::
## How Memory Appears in the System Prompt
At the start of every session, memory entries are loaded from disk and rendered into the system prompt as a frozen block:
```
══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations
```
The format includes:
- A header showing which store (MEMORY or USER PROFILE)
- Usage percentage and character counts so the agent knows capacity
- Individual entries separated by `§` (section sign) delimiters
- Entries can be multiline
**Frozen snapshot pattern:** The system prompt injection is captured once at session start and never changes mid-session. This is intentional — it preserves the LLM's prefix cache for performance. When the agent adds/removes memory entries during a session, the changes are persisted to disk immediately but won't appear in the system prompt until the next session starts. Tool responses always show the live state.
## Memory Tool Actions
The agent uses the `memory` tool with these actions:
- **add** — Add a new memory entry
- **replace** — Replace an existing entry with updated content (uses substring matching via `old_text`)
- **remove** — Remove an entry that's no longer relevant (uses substring matching via `old_text`)
There is no `read` action — memory content is automatically injected into the system prompt at session start. The agent sees its memories as part of its conversation context.
### Substring Matching
The `replace` and `remove` actions use short unique substring matching — you don't need the full entry text. The `old_text` parameter just needs to be a unique substring that identifies exactly one entry:
```python
# If memory contains "User prefers dark mode in all editors"
memory(action="replace", target="memory",
old_text="dark mode",
content="User prefers light mode in VS Code, dark mode in terminal")
```
If the substring matches multiple entries, an error is returned asking for a more specific match.
## Two Targets Explained
### `memory` — Agent's Personal Notes
For information the agent needs to remember about the environment, workflows, and lessons learned:
- Environment facts (OS, tools, project structure)
- Project conventions and configuration
- Tool quirks and workarounds discovered
- Completed task diary entries
- Skills and techniques that worked
### `user` — User Profile
For information about the user's identity, preferences, and communication style:
- Name, role, timezone
- Communication preferences (concise vs detailed, format preferences)
- Pet peeves and things to avoid
- Workflow habits
- Technical skill level
## What to Save vs Skip
### Save These (Proactively)
The agent saves automatically — you don't need to ask. It saves when it learns:
- **User preferences:** "I prefer TypeScript over JavaScript" → save to `user`
- **Environment facts:** "This server runs Debian 12 with PostgreSQL 16" → save to `memory`
- **Corrections:** "Don't use `sudo` for Docker commands, user is in docker group" → save to `memory`
- **Conventions:** "Project uses tabs, 120-char line width, Google-style docstrings" → save to `memory`
- **Completed work:** "Migrated database from MySQL to PostgreSQL on 2026-01-15" → save to `memory`
- **Explicit requests:** "Remember that my API key rotation happens monthly" → save to `memory`
### Skip These
- **Trivial/obvious info:** "User asked about Python" — too vague to be useful
- **Easily re-discovered facts:** "Python 3.12 supports f-string nesting" — can web search this
- **Raw data dumps:** Large code blocks, log files, data tables — too big for memory
- **Session-specific ephemera:** Temporary file paths, one-off debugging context
- **Information already in context files:** SOUL.md and AGENTS.md content
## Capacity Management
Memory has strict character limits to keep system prompts bounded:
| Store | Limit | Typical entries |
|-------|-------|----------------|
| memory | 2,200 chars | 8-15 entries |
| user | 1,375 chars | 5-10 entries |
### What Happens When Memory is Full
When you try to add an entry that would exceed the limit, the tool returns an error:
```json
{
"success": false,
"error": "Memory at 2,100/2,200 chars. Adding this entry (250 chars) would exceed the limit. Replace or remove existing entries first.",
"current_entries": ["..."],
"usage": "2,100/2,200"
}
```
The agent should then:
1. Read the current entries (shown in the error response)
2. Identify entries that can be removed or consolidated
3. Use `replace` to merge related entries into shorter versions
4. Then `add` the new entry
**Best practice:** When memory is above 80% capacity (visible in the system prompt header), consolidate entries before adding new ones. For example, merge three separate "project uses X" entries into one comprehensive project description entry.
### Practical Examples of Good Memory Entries
**Compact, information-dense entries work best:**
```
# Good: Packs multiple related facts
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh with oh-my-zsh. Editor: VS Code with Vim keybindings.
# Good: Specific, actionable convention
Project ~/code/api uses Go 1.22, sqlc for DB queries, chi router. Run tests with 'make test'. CI via GitHub Actions.
# Good: Lesson learned with context
The staging server (10.0.1.50) needs SSH port 2222, not 22. Key is at ~/.ssh/staging_ed25519.
# Bad: Too vague
User has a project.
# Bad: Too verbose
On January 5th, 2026, the user asked me to look at their project which is
located at ~/code/api. I discovered it uses Go version 1.22 and...
```
## Duplicate Prevention
The memory system automatically rejects exact duplicate entries. If you try to add content that already exists, it returns success with a "no duplicate added" message.
## Security Scanning
Memory entries are scanned for injection and exfiltration patterns before being accepted, since they're injected into the system prompt. Content matching threat patterns (prompt injection, credential exfiltration, SSH backdoors) or containing invisible Unicode characters is blocked.
## Session Search
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
- Search queries return relevant past conversations with Gemini Flash summarization
- The agent can find things it discussed weeks ago, even if they're not in its active memory
```bash
hermes sessions list # Browse past sessions
```
### session_search vs memory
| Feature | Persistent Memory | Session Search |
|---------|------------------|----------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all sessions) |
| **Speed** | Instant (in system prompt) | Requires search + LLM summarization |
| **Use case** | Key facts always available | Finding specific past conversations |
| **Management** | Manually curated by agent | Automatic — all sessions stored |
| **Token cost** | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
**Memory** is for critical facts that should always be in context. **Session search** is for "did we discuss X last week?" queries where the agent needs to recall specifics from past conversations.
## Configuration
```yaml
# In ~/.hermes/config.yaml
memory:
memory_enabled: true
user_profile_enabled: true
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
```
## External Memory Providers
For deeper, persistent memory that goes beyond MEMORY.md and USER.md, Hermes ships with 8 external memory provider plugins — including Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover, and Supermemory.
External providers run **alongside** built-in memory (never replacing it) and add capabilities like knowledge graphs, semantic search, automatic fact extraction, and cross-session user modeling.
```bash
hermes memory setup # pick a provider and configure it
hermes memory status # check what's active
```
See the [Memory Providers](./memory-providers.md) guide for full details on each provider, setup instructions, and comparison.
---
# Memory Providers
# Memory Providers
Hermes Agent ships with 8 external memory provider plugins that give the agent persistent, cross-session knowledge beyond the built-in MEMORY.md and USER.md. Only **one** external provider can be active at a time — the built-in memory is always active alongside it.
## Quick Start
```bash
hermes memory setup # interactive picker + configuration
hermes memory status # check what's active
hermes memory off # disable external provider
```
You can also select the active memory provider via `hermes plugins` → Provider Plugins → Memory Provider.
Or set manually in `~/.hermes/config.yaml`:
```yaml
memory:
provider: openviking # or honcho, mem0, hindsight, holographic, retaindb, byterover, supermemory
```
## How It Works
When a memory provider is active, Hermes automatically:
1. **Injects provider context** into the system prompt (what the provider knows)
2. **Prefetches relevant memories** before each turn (background, non-blocking)
3. **Syncs conversation turns** to the provider after each response
4. **Extracts memories on session end** (for providers that support it)
5. **Mirrors built-in memory writes** to the external provider
6. **Adds provider-specific tools** so the agent can search, store, and manage memories
The built-in memory (MEMORY.md / USER.md) continues to work exactly as before. The external provider is additive.
## Available Providers
### Honcho
AI-native cross-session user modeling with dialectic reasoning, session-scoped context injection, semantic search, and persistent conclusions. Base context now includes the session summary alongside user representation and peer cards, giving the agent awareness of what has already been discussed.
| | |
|---|---|
| **Best for** | Multi-agent systems with cross-session context, user-agent alignment |
| **Requires** | `pip install honcho-ai` + [API key](https://app.honcho.dev) or self-hosted instance |
| **Data storage** | Honcho Cloud or self-hosted |
| **Cost** | Honcho pricing (cloud) / free (self-hosted) |
**Tools (5):** `honcho_profile` (read/update peer card), `honcho_search` (semantic search), `honcho_context` (session context — summary, representation, card, messages), `honcho_reasoning` (LLM-synthesized), `honcho_conclude` (create/delete conclusions)
**Architecture:** Two-layer context injection — a base layer (session summary + representation + peer card, refreshed on `contextCadence`) plus a dialectic supplement (LLM reasoning, refreshed on `dialecticCadence`). The dialectic automatically selects cold-start prompts (general user facts) vs. warm prompts (session-scoped context) based on whether base context exists.
**Three orthogonal config knobs** control cost and depth independently:
- `contextCadence` — how often the base layer refreshes (API call frequency)
- `dialecticCadence` — how often the dialectic LLM fires (LLM call frequency)
- `dialecticDepth` — how many `.chat()` passes per dialectic invocation (1–3, depth of reasoning)
**Setup Wizard:**
```bash
hermes honcho setup # (legacy command)
# or
hermes memory setup # select "honcho"
```
**Config:** `$HERMES_HOME/honcho.json` (profile-local) or `~/.honcho/config.json` (global). Resolution order: `$HERMES_HOME/honcho.json` > `~/.hermes/honcho.json` > `~/.honcho/config.json`. See the [config reference](https://github.com/hermes-ai/hermes-agent/blob/main/plugins/memory/honcho/README.md) and the [Honcho integration guide](https://docs.honcho.dev/v3/guides/integrations/hermes).
Full config reference
| Key | Default | Description |
|-----|---------|-------------|
| `apiKey` | -- | API key from [app.honcho.dev](https://app.honcho.dev) |
| `baseUrl` | -- | Base URL for self-hosted Honcho |
| `peerName` | -- | User peer identity |
| `aiPeer` | host key | AI peer identity (one per profile) |
| `workspace` | host key | Shared workspace ID |
| `contextTokens` | `null` (uncapped) | Token budget for auto-injected context per turn. Truncates at word boundaries |
| `contextCadence` | `1` | Minimum turns between `context()` API calls (base layer refresh) |
| `dialecticCadence` | `2` | Minimum turns between `peer.chat()` LLM calls. Recommended 1–5. Only applies to `hybrid`/`context` modes |
| `dialecticDepth` | `1` | Number of `.chat()` passes per dialectic invocation. Clamped 1–3. Pass 0: cold/warm prompt, pass 1: self-audit, pass 2: reconciliation |
| `dialecticDepthLevels` | `null` | Optional array of reasoning levels per pass, e.g. `["minimal", "low", "medium"]`. Overrides proportional defaults |
| `dialecticReasoningLevel` | `'low'` | Base reasoning level: `minimal`, `low`, `medium`, `high`, `max` |
| `dialecticDynamic` | `true` | When `true`, model can override reasoning level per-call via tool param |
| `dialecticMaxChars` | `600` | Max chars of dialectic result injected into system prompt |
| `recallMode` | `'hybrid'` | `hybrid` (auto-inject + tools), `context` (inject only), `tools` (tools only) |
| `writeFrequency` | `'async'` | When to flush messages: `async` (background thread), `turn` (sync), `session` (batch on end), or integer N |
| `saveMessages` | `true` | Whether to persist messages to Honcho API |
| `observationMode` | `'directional'` | `directional` (all on) or `unified` (shared pool). Override with `observation` object |
| `messageMaxChars` | `25000` | Max chars per message (chunked if exceeded) |
| `dialecticMaxInputChars` | `10000` | Max chars for dialectic query input to `peer.chat()` |
| `sessionStrategy` | `'per-directory'` | `per-directory`, `per-repo`, `per-session`, `global` |
Minimal honcho.json (cloud)
```json
{
"apiKey": "your-key-from-app.honcho.dev",
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"peerName": "your-name",
"workspace": "hermes"
}
}
}
```
Minimal honcho.json (self-hosted)
```json
{
"baseUrl": "http://localhost:8000",
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"peerName": "your-name",
"workspace": "hermes"
}
}
}
```
:::tip Migrating from `hermes honcho`
If you previously used `hermes honcho setup`, your config and all server-side data are intact. Just re-enable through the setup wizard again or manually set `memory.provider: honcho` to reactivate via the new system.
:::
**Multi-peer setup:**
Honcho models conversations as peers exchanging messages — one user peer plus one AI peer per Hermes profile, all sharing a workspace. The workspace is the shared environment: the user peer is global across profiles, each AI peer is its own identity. Every AI peer builds an independent representation / card from its own observations, so a `coder` profile stays code-oriented while a `writer` profile stays editorial against the same user.
The mapping:
| Concept | What it is |
|---------|-----------|
| **Workspace** | Shared environment. All Hermes profiles under one workspace see the same user identity. |
| **User peer** (`peerName`) | The human. Shared across profiles in the workspace. |
| **AI peer** (`aiPeer`) | One per Hermes profile. Host key `hermes` → default; `hermes.` for others. |
| **Observation** | Per-peer toggles controlling what Honcho models from whose messages. `directional` (default, all four on) or `unified` (single-observer pool). |
### New profile, fresh Honcho peer
```bash
hermes profile create coder --clone
```
`--clone` creates a `hermes.coder` host block in `honcho.json` with `aiPeer: "coder"`, shared `workspace`, inherited `peerName`, `recallMode`, `writeFrequency`, `observation`, etc. The AI peer is eagerly created in Honcho so it exists before the first message.
### Existing profiles, backfill Honcho peers
```bash
hermes honcho sync
```
Scans every Hermes profile, creates host blocks for any profile without one, inherits settings from the default `hermes` block, and creates the new AI peers eagerly. Idempotent — skips profiles that already have a host block.
### Per-profile observation
Each host block can override the observation config independently. Example: a code-focused profile where the AI peer observes the user but doesn't self-model:
```json
"hermes.coder": {
"aiPeer": "coder",
"observation": {
"user": { "observeMe": true, "observeOthers": true },
"ai": { "observeMe": false, "observeOthers": true }
}
}
```
**Observation toggles (one set per peer):**
| Toggle | Effect |
|--------|--------|
| `observeMe` | Honcho builds a representation of this peer from its own messages |
| `observeOthers` | This peer observes the other peer's messages (feeds cross-peer reasoning) |
Presets via `observationMode`:
- **`"directional"`** (default) — all four flags on. Full mutual observation; enables cross-peer dialectic.
- **`"unified"`** — user `observeMe: true`, AI `observeOthers: true`, rest false. Single-observer pool; AI models the user but not itself, user peer only self-models.
Server-side toggles set via the [Honcho dashboard](https://app.honcho.dev) win over local defaults — synced back at session init.
See the [Honcho page](./honcho.md#observation-directional-vs-unified) for the full observation reference.
Full honcho.json example (multi-profile)
```json
{
"apiKey": "your-key",
"workspace": "hermes",
"peerName": "eri",
"hosts": {
"hermes": {
"enabled": true,
"aiPeer": "hermes",
"workspace": "hermes",
"peerName": "eri",
"recallMode": "hybrid",
"writeFrequency": "async",
"sessionStrategy": "per-directory",
"observation": {
"user": { "observeMe": true, "observeOthers": true },
"ai": { "observeMe": true, "observeOthers": true }
},
"dialecticReasoningLevel": "low",
"dialecticDynamic": true,
"dialecticCadence": 2,
"dialecticDepth": 1,
"dialecticMaxChars": 600,
"contextCadence": 1,
"messageMaxChars": 25000,
"saveMessages": true
},
"hermes.coder": {
"enabled": true,
"aiPeer": "coder",
"workspace": "hermes",
"peerName": "eri",
"recallMode": "tools",
"observation": {
"user": { "observeMe": true, "observeOthers": false },
"ai": { "observeMe": true, "observeOthers": true }
}
},
"hermes.writer": {
"enabled": true,
"aiPeer": "writer",
"workspace": "hermes",
"peerName": "eri"
}
},
"sessions": {
"/home/user/myproject": "myproject-main"
}
}
```
See the [config reference](https://github.com/hermes-ai/hermes-agent/blob/main/plugins/memory/honcho/README.md) and [Honcho integration guide](https://docs.honcho.dev/v3/guides/integrations/hermes).
---
### OpenViking
Context database by Volcengine (ByteDance) with filesystem-style knowledge hierarchy, tiered retrieval, and automatic memory extraction into 6 categories.
| | |
|---|---|
| **Best for** | Self-hosted knowledge management with structured browsing |
| **Requires** | `pip install openviking` + running server |
| **Data storage** | Self-hosted (local or cloud) |
| **Cost** | Free (open-source, AGPL-3.0) |
**Tools:** `viking_search` (semantic search), `viking_read` (tiered: abstract/overview/full), `viking_browse` (filesystem navigation), `viking_remember` (store facts), `viking_add_resource` (ingest URLs/docs)
**Setup:**
```bash
# Start the OpenViking server first
pip install openviking
openviking-server
# Then configure Hermes
hermes memory setup # select "openviking"
# Or manually:
hermes config set memory.provider openviking
echo "OPENVIKING_ENDPOINT=http://localhost:1933" >> ~/.hermes/.env
```
**Key features:**
- Tiered context loading: L0 (~100 tokens) → L1 (~2k) → L2 (full)
- Automatic memory extraction on session commit (profile, preferences, entities, events, cases, patterns)
- `viking://` URI scheme for hierarchical knowledge browsing
---
### Mem0
Server-side LLM fact extraction with semantic search, reranking, and automatic deduplication.
| | |
|---|---|
| **Best for** | Hands-off memory management — Mem0 handles extraction automatically |
| **Requires** | `pip install mem0ai` + API key |
| **Data storage** | Mem0 Cloud |
| **Cost** | Mem0 pricing |
**Tools:** `mem0_profile` (all stored memories), `mem0_search` (semantic search + reranking), `mem0_conclude` (store verbatim facts)
**Setup:**
```bash
hermes memory setup # select "mem0"
# Or manually:
hermes config set memory.provider mem0
echo "MEM0_API_KEY=your-key" >> ~/.hermes/.env
```
**Config:** `$HERMES_HOME/mem0.json`
| Key | Default | Description |
|-----|---------|-------------|
| `user_id` | `hermes-user` | User identifier |
| `agent_id` | `hermes` | Agent identifier |
---
### Hindsight
Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. The `hindsight_reflect` tool provides cross-memory synthesis that no other provider offers. Automatically retains full conversation turns (including tool calls) with session-level document tracking.
| | |
|---|---|
| **Best for** | Knowledge graph-based recall with entity relationships |
| **Requires** | Cloud: API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io). Local: LLM API key (OpenAI, Groq, OpenRouter, etc.) |
| **Data storage** | Hindsight Cloud or local embedded PostgreSQL |
| **Cost** | Hindsight pricing (cloud) or free (local) |
**Tools:** `hindsight_retain` (store with entity extraction), `hindsight_recall` (multi-strategy search), `hindsight_reflect` (cross-memory synthesis)
**Setup:**
```bash
hermes memory setup # select "hindsight"
# Or manually:
hermes config set memory.provider hindsight
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env
```
The setup wizard installs dependencies automatically and only installs what's needed for the selected mode (`hindsight-client` for cloud, `hindsight-all` for local). Requires `hindsight-client >= 0.4.22` (auto-upgraded on session start if outdated).
**Local mode UI:** `hindsight-embed -p hermes ui start`
**Config:** `$HERMES_HOME/hindsight/config.json`
| Key | Default | Description |
|-----|---------|-------------|
| `mode` | `cloud` | `cloud` or `local` |
| `bank_id` | `hermes` | Memory bank identifier |
| `recall_budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` |
| `memory_mode` | `hybrid` | `hybrid` (context + tools), `context` (auto-inject only), `tools` (tools only) |
| `auto_retain` | `true` | Automatically retain conversation turns |
| `auto_recall` | `true` | Automatically recall memories before each turn |
| `retain_async` | `true` | Process retain asynchronously on the server |
| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
| `retain_tags` | — | Default tags applied to retained memories; merged with per-call tool tags |
| `retain_source` | — | Optional `metadata.source` attached to retained memories |
| `retain_user_prefix` | `User` | Label used before user turns in auto-retained transcripts |
| `retain_assistant_prefix` | `Assistant` | Label used before assistant turns in auto-retained transcripts |
| `recall_tags` | — | Tags to filter on recall |
See [plugin README](https://github.com/NousResearch/hermes-agent/blob/main/plugins/memory/hindsight/README.md) for the full configuration reference.
---
### Holographic
Local SQLite fact store with FTS5 full-text search, trust scoring, and HRR (Holographic Reduced Representations) for compositional algebraic queries.
| | |
|---|---|
| **Best for** | Local-only memory with advanced retrieval, no external dependencies |
| **Requires** | Nothing (SQLite is always available). NumPy optional for HRR algebra. |
| **Data storage** | Local SQLite |
| **Cost** | Free |
**Tools:** `fact_store` (9 actions: add, search, probe, related, reason, contradict, update, remove, list), `fact_feedback` (helpful/unhelpful rating that trains trust scores)
**Setup:**
```bash
hermes memory setup # select "holographic"
# Or manually:
hermes config set memory.provider holographic
```
**Config:** `config.yaml` under `plugins.hermes-memory-store`
| Key | Default | Description |
|-----|---------|-------------|
| `db_path` | `$HERMES_HOME/memory_store.db` | SQLite database path |
| `auto_extract` | `false` | Auto-extract facts at session end |
| `default_trust` | `0.5` | Default trust score (0.0–1.0) |
**Unique capabilities:**
- `probe` — entity-specific algebraic recall (all facts about a person/thing)
- `reason` — compositional AND queries across multiple entities
- `contradict` — automated detection of conflicting facts
- Trust scoring with asymmetric feedback (+0.05 helpful / -0.10 unhelpful)
---
### RetainDB
Cloud memory API with hybrid search (Vector + BM25 + Reranking), 7 memory types, and delta compression.
| | |
|---|---|
| **Best for** | Teams already using RetainDB's infrastructure |
| **Requires** | RetainDB account + API key |
| **Data storage** | RetainDB Cloud |
| **Cost** | $20/month |
**Tools:** `retaindb_profile` (user profile), `retaindb_search` (semantic search), `retaindb_context` (task-relevant context), `retaindb_remember` (store with type + importance), `retaindb_forget` (delete memories)
**Setup:**
```bash
hermes memory setup # select "retaindb"
# Or manually:
hermes config set memory.provider retaindb
echo "RETAINDB_API_KEY=your-key" >> ~/.hermes/.env
```
---
### ByteRover
Persistent memory via the `brv` CLI — hierarchical knowledge tree with tiered retrieval (fuzzy text → LLM-driven search). Local-first with optional cloud sync.
| | |
|---|---|
| **Best for** | Developers who want portable, local-first memory with a CLI |
| **Requires** | ByteRover CLI (`npm install -g byterover-cli` or [install script](https://byterover.dev)) |
| **Data storage** | Local (default) or ByteRover Cloud (optional sync) |
| **Cost** | Free (local) or ByteRover pricing (cloud) |
**Tools:** `brv_query` (search knowledge tree), `brv_curate` (store facts/decisions/patterns), `brv_status` (CLI version + tree stats)
**Setup:**
```bash
# Install the CLI first
curl -fsSL https://byterover.dev/install.sh | sh
# Then configure Hermes
hermes memory setup # select "byterover"
# Or manually:
hermes config set memory.provider byterover
```
**Key features:**
- Automatic pre-compression extraction (saves insights before context compression discards them)
- Knowledge tree stored at `$HERMES_HOME/byterover/` (profile-scoped)
- SOC2 Type II certified cloud sync (optional)
---
### Supermemory
Semantic long-term memory with profile recall, semantic search, explicit memory tools, and session-end conversation ingest via the Supermemory graph API.
| | |
|---|---|
| **Best for** | Semantic recall with user profiling and session-level graph building |
| **Requires** | `pip install supermemory` + [API key](https://supermemory.ai) |
| **Data storage** | Supermemory Cloud |
| **Cost** | Supermemory pricing |
**Tools:** `supermemory_store` (save explicit memories), `supermemory_search` (semantic similarity search), `supermemory_forget` (forget by ID or best-match query), `supermemory_profile` (persistent profile + recent context)
**Setup:**
```bash
hermes memory setup # select "supermemory"
# Or manually:
hermes config set memory.provider supermemory
echo 'SUPERMEMORY_API_KEY=***' >> ~/.hermes/.env
```
**Config:** `$HERMES_HOME/supermemory.json`
| Key | Default | Description |
|-----|---------|-------------|
| `container_tag` | `hermes` | Container tag used for search and writes. Supports `{identity}` template for profile-scoped tags. |
| `auto_recall` | `true` | Inject relevant memory context before turns |
| `auto_capture` | `true` | Store cleaned user-assistant turns after each response |
| `max_recall_results` | `10` | Max recalled items to format into context |
| `profile_frequency` | `50` | Include profile facts on first turn and every N turns |
| `capture_mode` | `all` | Skip tiny or trivial turns by default |
| `search_mode` | `hybrid` | Search mode: `hybrid`, `memories`, or `documents` |
| `api_timeout` | `5.0` | Timeout for SDK and ingest requests |
**Environment variables:** `SUPERMEMORY_API_KEY` (required), `SUPERMEMORY_CONTAINER_TAG` (overrides config).
**Key features:**
- Automatic context fencing — strips recalled memories from captured turns to prevent recursive memory pollution
- Session-end conversation ingest for richer graph-level knowledge building
- Profile facts injected on first turn and at configurable intervals
- Trivial message filtering (skips "ok", "thanks", etc.)
- **Profile-scoped containers** — use `{identity}` in `container_tag` (e.g. `hermes-{identity}` → `hermes-coder`) to isolate memories per Hermes profile
- **Multi-container mode** — enable `enable_custom_container_tags` with a `custom_containers` list to let the agent read/write across named containers. Automatic operations (sync, prefetch) stay on the primary container.
Multi-container example
```json
{
"container_tag": "hermes",
"enable_custom_container_tags": true,
"custom_containers": ["project-alpha", "shared-knowledge"],
"custom_container_instructions": "Use project-alpha for coding context."
}
```
**Support:** [Discord](https://supermemory.link/discord) · [support@supermemory.com](mailto:support@supermemory.com)
---
## Provider Comparison
| Provider | Storage | Cost | Tools | Dependencies | Unique Feature |
|----------|---------|------|-------|-------------|----------------|
| **Honcho** | Cloud | Paid | 5 | `honcho-ai` | Dialectic user modeling + session-scoped context |
| **OpenViking** | Self-hosted | Free | 5 | `openviking` + server | Filesystem hierarchy + tiered loading |
| **Mem0** | Cloud | Paid | 3 | `mem0ai` | Server-side LLM extraction |
| **Hindsight** | Cloud/Local | Free/Paid | 3 | `hindsight-client` | Knowledge graph + reflect synthesis |
| **Holographic** | Local | Free | 2 | None | HRR algebra + trust scoring |
| **RetainDB** | Cloud | $20/mo | 5 | `requests` | Delta compression |
| **ByteRover** | Local/Cloud | Free/Paid | 3 | `brv` CLI | Pre-compression extraction |
| **Supermemory** | Cloud | Paid | 4 | `supermemory` | Context fencing + session graph ingest + multi-container |
## Profile Isolation
Each provider's data is isolated per [profile](/docs/user-guide/profiles):
- **Local storage providers** (Holographic, ByteRover) use `$HERMES_HOME/` paths which differ per profile
- **Config file providers** (Honcho, Mem0, Hindsight, Supermemory) store config in `$HERMES_HOME/` so each profile has its own credentials
- **Cloud providers** (RetainDB) auto-derive profile-scoped project names
- **Env var providers** (OpenViking) are configured via each profile's `.env` file
## Building a Memory Provider
See the [Developer Guide: Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) for how to create your own.
---
# Context Files
# Context Files
Hermes Agent automatically discovers and loads context files that shape how it behaves. Some are project-local and discovered from your working directory. `SOUL.md` is now global to the Hermes instance and is loaded from `HERMES_HOME` only.
## Supported Context Files
| File | Purpose | Discovery |
|------|---------|-----------|
| **.hermes.md** / **HERMES.md** | Project instructions (highest priority) | Walks to git root |
| **AGENTS.md** | Project instructions, conventions, architecture | CWD at startup + subdirectories progressively |
| **CLAUDE.md** | Claude Code context files (also detected) | CWD at startup + subdirectories progressively |
| **SOUL.md** | Global personality and tone customization for this Hermes instance | `HERMES_HOME/SOUL.md` only |
| **.cursorrules** | Cursor IDE coding conventions | CWD only |
| **.cursor/rules/*.mdc** | Cursor IDE rule modules | CWD only |
:::info Priority system
Only **one** project context type is loaded per session (first match wins): `.hermes.md` → `AGENTS.md` → `CLAUDE.md` → `.cursorrules`. **SOUL.md** is always loaded independently as the agent identity (slot #1).
:::
## AGENTS.md
`AGENTS.md` is the primary project context file. It tells the agent how your project is structured, what conventions to follow, and any special instructions.
### Progressive Subdirectory Discovery
At session start, Hermes loads the `AGENTS.md` from your working directory into the system prompt. As the agent navigates into subdirectories during the session (via `read_file`, `terminal`, `search_files`, etc.), it **progressively discovers** context files in those directories and injects them into the conversation at the moment they become relevant.
```
my-project/
├── AGENTS.md ← Loaded at startup (system prompt)
├── frontend/
│ └── AGENTS.md ← Discovered when agent reads frontend/ files
├── backend/
│ └── AGENTS.md ← Discovered when agent reads backend/ files
└── shared/
└── AGENTS.md ← Discovered when agent reads shared/ files
```
This approach has two advantages over loading everything at startup:
- **No system prompt bloat** — subdirectory hints only appear when needed
- **Prompt cache preservation** — the system prompt stays stable across turns
Each subdirectory is checked at most once per session. The discovery also walks up parent directories, so reading `backend/src/main.py` will discover `backend/AGENTS.md` even if `backend/src/` has no context file of its own.
:::info
Subdirectory context files go through the same [security scan](#security-prompt-injection-protection) as startup context files. Malicious files are blocked.
:::
### Example AGENTS.md
```markdown
# Project Context
This is a Next.js 14 web application with a Python FastAPI backend.
## Architecture
- Frontend: Next.js 14 with App Router in `/frontend`
- Backend: FastAPI in `/backend`, uses SQLAlchemy ORM
- Database: PostgreSQL 16
- Deployment: Docker Compose on a Hetzner VPS
## Conventions
- Use TypeScript strict mode for all frontend code
- Python code follows PEP 8, use type hints everywhere
- All API endpoints return JSON with `{data, error, meta}` shape
- Tests go in `__tests__/` directories (frontend) or `tests/` (backend)
## Important Notes
- Never modify migration files directly — use Alembic commands
- The `.env.local` file has real API keys, don't commit it
- Frontend port is 3000, backend is 8000, DB is 5432
```
## SOUL.md
`SOUL.md` controls the agent's personality, tone, and communication style. See the [Personality](/docs/user-guide/features/personality) page for full details.
**Location:**
- `~/.hermes/SOUL.md`
- or `$HERMES_HOME/SOUL.md` if you run Hermes with a custom home directory
Important details:
- Hermes seeds a default `SOUL.md` automatically if one does not exist yet
- Hermes loads `SOUL.md` only from `HERMES_HOME`
- Hermes does not probe the working directory for `SOUL.md`
- If the file is empty, nothing from `SOUL.md` is added to the prompt
- If the file has content, the content is injected verbatim after scanning and truncation
## .cursorrules
Hermes is compatible with Cursor IDE's `.cursorrules` file and `.cursor/rules/*.mdc` rule modules. If these files exist in your project root and no higher-priority context file (`.hermes.md`, `AGENTS.md`, or `CLAUDE.md`) is found, they're loaded as the project context.
This means your existing Cursor conventions automatically apply when using Hermes.
## How Context Files Are Loaded
### At startup (system prompt)
Context files are loaded by `build_context_files_prompt()` in `agent/prompt_builder.py`:
1. **Scan working directory** — checks for `.hermes.md` → `AGENTS.md` → `CLAUDE.md` → `.cursorrules` (first match wins)
2. **Content is read** — each file is read as UTF-8 text
3. **Security scan** — content is checked for prompt injection patterns
4. **Truncation** — files exceeding 20,000 characters are head/tail truncated (70% head, 20% tail, with a marker in the middle)
5. **Assembly** — all sections are combined under a `# Project Context` header
6. **Injection** — the assembled content is added to the system prompt
### During the session (progressive discovery)
`SubdirectoryHintTracker` in `agent/subdirectory_hints.py` watches tool call arguments for file paths:
1. **Path extraction** — after each tool call, file paths are extracted from arguments (`path`, `workdir`, shell commands)
2. **Ancestor walk** — the directory and up to 5 parent directories are checked (stopping at already-visited directories)
3. **Hint loading** — if an `AGENTS.md`, `CLAUDE.md`, or `.cursorrules` is found, it's loaded (first match per directory)
4. **Security scan** — same prompt injection scan as startup files
5. **Truncation** — capped at 8,000 characters per file
6. **Injection** — appended to the tool result, so the model sees it in context naturally
The final prompt section looks roughly like:
```text
# Project Context
The following project context files have been loaded and should be followed:
## AGENTS.md
[Your AGENTS.md content here]
## .cursorrules
[Your .cursorrules content here]
[Your SOUL.md content here]
```
Notice that SOUL content is inserted directly, without extra wrapper text.
## Security: Prompt Injection Protection
All context files are scanned for potential prompt injection before being included. The scanner checks for:
- **Instruction override attempts**: "ignore previous instructions", "disregard your rules"
- **Deception patterns**: "do not tell the user"
- **System prompt overrides**: "system prompt override"
- **Hidden HTML comments**: ``
- **Hidden div elements**: `
`
- **Credential exfiltration**: `curl ... $API_KEY`
- **Secret file access**: `cat .env`, `cat credentials`
- **Invisible characters**: zero-width spaces, bidirectional overrides, word joiners
If any threat pattern is detected, the file is blocked:
```
[BLOCKED: AGENTS.md contained potential prompt injection (prompt_injection). Content not loaded.]
```
:::warning
This scanner protects against common injection patterns, but it's not a substitute for reviewing context files in shared repositories. Always validate AGENTS.md content in projects you didn't author.
:::
## Size Limits
| Limit | Value |
|-------|-------|
| Max chars per file | 20,000 (~7,000 tokens) |
| Head truncation ratio | 70% |
| Tail truncation ratio | 20% |
| Truncation marker | 10% (shows char counts and suggests using file tools) |
When a file exceeds 20,000 characters, the truncation message reads:
```
[...truncated AGENTS.md: kept 14000+4000 of 25000 chars. Use file tools to read the full file.]
```
## Tips for Effective Context Files
:::tip Best practices for AGENTS.md
1. **Keep it concise** — stay well under 20K chars; the agent reads it every turn
2. **Structure with headers** — use `##` sections for architecture, conventions, important notes
3. **Include concrete examples** — show preferred code patterns, API shapes, naming conventions
4. **Mention what NOT to do** — "never modify migration files directly"
5. **List key paths and ports** — the agent uses these for terminal commands
6. **Update as the project evolves** — stale context is worse than no context
:::
### Per-Subdirectory Context
For monorepos, put subdirectory-specific instructions in nested AGENTS.md files:
```markdown
# Frontend Context
- Use `pnpm` not `npm` for package management
- Components go in `src/components/`, pages in `src/app/`
- Use Tailwind CSS, never inline styles
- Run tests with `pnpm test`
```
```markdown
# Backend Context
- Use `poetry` for dependency management
- Run the dev server with `poetry run uvicorn main:app --reload`
- All endpoints need OpenAPI docstrings
- Database models are in `models/`, schemas in `schemas/`
```
---
# Context References
# Context References
Type `@` followed by a reference to inject content directly into your message. Hermes expands the reference inline and appends the content under an `--- Attached Context ---` section.
## Supported References
| Syntax | Description |
|--------|-------------|
| `@file:path/to/file.py` | Inject file contents |
| `@file:path/to/file.py:10-25` | Inject specific line range (1-indexed, inclusive) |
| `@folder:path/to/dir` | Inject directory tree listing with file metadata |
| `@diff` | Inject `git diff` (unstaged working tree changes) |
| `@staged` | Inject `git diff --staged` (staged changes) |
| `@git:5` | Inject last N commits with patches (max 10) |
| `@url:https://example.com` | Fetch and inject web page content |
## Usage Examples
```text
Review @file:src/main.py and suggest improvements
What changed? @diff
Compare @file:old_config.yaml and @file:new_config.yaml
What's in @folder:src/components?
Summarize this article @url:https://arxiv.org/abs/2301.00001
```
Multiple references work in a single message:
```text
Check @file:main.py, and also @file:test.py.
```
Trailing punctuation (`,`, `.`, `;`, `!`, `?`) is automatically stripped from reference values.
## CLI Tab Completion
In the interactive CLI, typing `@` triggers autocomplete:
- `@` shows all reference types (`@diff`, `@staged`, `@file:`, `@folder:`, `@git:`, `@url:`)
- `@file:` and `@folder:` trigger filesystem path completion with file size metadata
- Bare `@` followed by partial text shows matching files and folders from the current directory
## Line Ranges
The `@file:` reference supports line ranges for precise content injection:
```text
@file:src/main.py:42 # Single line 42
@file:src/main.py:10-25 # Lines 10 through 25 (inclusive)
```
Lines are 1-indexed. Invalid ranges are silently ignored (full file is returned).
## Size Limits
Context references are bounded to prevent overwhelming the model's context window:
| Threshold | Value | Behavior |
|-----------|-------|----------|
| Soft limit | 25% of context length | Warning appended, expansion proceeds |
| Hard limit | 50% of context length | Expansion refused, original message returned unchanged |
| Folder entries | 200 files max | Excess entries replaced with `- ...` |
| Git commits | 10 max | `@git:N` clamped to range [1, 10] |
## Security
### Sensitive Path Blocking
These paths are always blocked from `@file:` references to prevent credential exposure:
- SSH keys and config: `~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/authorized_keys`, `~/.ssh/config`
- Shell profiles: `~/.bashrc`, `~/.zshrc`, `~/.profile`, `~/.bash_profile`, `~/.zprofile`
- Credential files: `~/.netrc`, `~/.pgpass`, `~/.npmrc`, `~/.pypirc`
- Hermes env: `$HERMES_HOME/.env`
These directories are fully blocked (any file inside):
- `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `~/.kube/`, `$HERMES_HOME/skills/.hub/`
### Path Traversal Protection
All paths are resolved relative to the working directory. References that resolve outside the allowed workspace root are rejected.
### Binary File Detection
Binary files are detected via MIME type and null-byte scanning. Known text extensions (`.py`, `.md`, `.json`, `.yaml`, `.toml`, `.js`, `.ts`, etc.) bypass MIME-based detection. Binary files are rejected with a warning.
## Platform Availability
Context references are primarily a **CLI feature**. They work in the interactive CLI where `@` triggers tab completion and references are expanded before the message is sent to the agent.
In **messaging platforms** (Telegram, Discord, etc.), the `@` syntax is not expanded by the gateway — messages are passed through as-is. The agent itself can still reference files via the `read_file`, `search_files`, and `web_extract` tools.
## Interaction with Context Compression
When conversation context is compressed, the expanded reference content is included in the compression summary. This means:
- Large file contents injected via `@file:` contribute to context usage
- If the conversation is later compressed, the file content is summarized (not preserved verbatim)
- For very large files, consider using line ranges (`@file:main.py:100-200`) to inject only relevant sections
## Common Patterns
```text
# Code review workflow
Review @diff and check for security issues
# Debug with context
This test is failing. Here's the test @file:tests/test_auth.py
and the implementation @file:src/auth.py:50-80
# Project exploration
What does this project do? @folder:src @file:README.md
# Research
Compare the approaches in @url:https://arxiv.org/abs/2301.00001
and @url:https://arxiv.org/abs/2301.00002
```
## Error Handling
Invalid references produce inline warnings rather than failures:
| Condition | Behavior |
|-----------|----------|
| File not found | Warning: "file not found" |
| Binary file | Warning: "binary files are not supported" |
| Folder not found | Warning: "folder not found" |
| Git command fails | Warning with git stderr |
| URL returns no content | Warning: "no content extracted" |
| Sensitive path | Warning: "path is a sensitive credential file" |
| Path outside workspace | Warning: "path is outside the allowed workspace" |
---
# Personality & SOUL.md
# Personality & SOUL.md
Hermes Agent's personality is fully customizable. `SOUL.md` is the **primary identity** — it's the first thing in the system prompt and defines who the agent is.
- `SOUL.md` — a durable persona file that lives in `HERMES_HOME` and serves as the agent's identity (slot #1 in the system prompt)
- built-in or custom `/personality` presets — session-level system-prompt overlays
If you want to change who Hermes is — or replace it with an entirely different agent persona — edit `SOUL.md`.
## How SOUL.md works now
Hermes now seeds a default `SOUL.md` automatically in:
```text
~/.hermes/SOUL.md
```
More precisely, it uses the current instance's `HERMES_HOME`, so if you run Hermes with a custom home directory, it will use:
```text
$HERMES_HOME/SOUL.md
```
### Important behavior
- **SOUL.md is the agent's primary identity.** It occupies slot #1 in the system prompt, replacing the hardcoded default identity.
- Hermes creates a starter `SOUL.md` automatically if one does not exist yet
- Existing user `SOUL.md` files are never overwritten
- Hermes loads `SOUL.md` only from `HERMES_HOME`
- Hermes does not look in the current working directory for `SOUL.md`
- If `SOUL.md` exists but is empty, or cannot be loaded, Hermes falls back to a built-in default identity
- If `SOUL.md` has content, that content is injected verbatim after security scanning and truncation
- SOUL.md is **not** duplicated in the context files section — it appears only once, as the identity
That makes `SOUL.md` a true per-user or per-instance identity, not just an additive layer.
## Why this design
This keeps personality predictable.
If Hermes loaded `SOUL.md` from whatever directory you happened to launch it in, your personality could change unexpectedly between projects. By loading only from `HERMES_HOME`, the personality belongs to the Hermes instance itself.
That also makes it easier to teach users:
- "Edit `~/.hermes/SOUL.md` to change Hermes' default personality."
## Where to edit it
For most users:
```bash
~/.hermes/SOUL.md
```
If you use a custom home:
```bash
$HERMES_HOME/SOUL.md
```
## What should go in SOUL.md?
Use it for durable voice and personality guidance, such as:
- tone
- communication style
- level of directness
- default interaction style
- what to avoid stylistically
- how Hermes should handle uncertainty, disagreement, or ambiguity
Use it less for:
- one-off project instructions
- file paths
- repo conventions
- temporary workflow details
Those belong in `AGENTS.md`, not `SOUL.md`.
## Good SOUL.md content
A good SOUL file is:
- stable across contexts
- broad enough to apply in many conversations
- specific enough to materially shape the voice
- focused on communication and identity, not task-specific instructions
### Example
```markdown
# Personality
You are a pragmatic senior engineer with strong taste.
You optimize for truth, clarity, and usefulness over politeness theater.
## Style
- Be direct without being cold
- Prefer substance over filler
- Push back when something is a bad idea
- Admit uncertainty plainly
- Keep explanations compact unless depth is useful
## What to avoid
- Sycophancy
- Hype language
- Repeating the user's framing if it's wrong
- Overexplaining obvious things
## Technical posture
- Prefer simple systems over clever systems
- Care about operational reality, not idealized architecture
- Treat edge cases as part of the design, not cleanup
```
## What Hermes injects into the prompt
`SOUL.md` content goes directly into slot #1 of the system prompt — the agent identity position. No wrapper language is added around it.
The content goes through:
- prompt-injection scanning
- truncation if it is too large
If the file is empty, whitespace-only, or cannot be read, Hermes falls back to a built-in default identity ("You are Hermes Agent, an intelligent AI assistant created by Nous Research..."). This fallback also applies when `skip_context_files` is set (e.g., in subagent/delegation contexts).
## Security scanning
`SOUL.md` is scanned like other context-bearing files for prompt injection patterns before inclusion.
That means you should still keep it focused on persona/voice rather than trying to sneak in strange meta-instructions.
## SOUL.md vs AGENTS.md
This is the most important distinction.
### SOUL.md
Use for:
- identity
- tone
- style
- communication defaults
- personality-level behavior
### AGENTS.md
Use for:
- project architecture
- coding conventions
- tool preferences
- repo-specific workflows
- commands, ports, paths, deployment notes
A useful rule:
- if it should follow you everywhere, it belongs in `SOUL.md`
- if it belongs to a project, it belongs in `AGENTS.md`
## SOUL.md vs `/personality`
`SOUL.md` is your durable default personality.
`/personality` is a session-level overlay that changes or supplements the current system prompt.
So:
- `SOUL.md` = baseline voice
- `/personality` = temporary mode switch
Examples:
- keep a pragmatic default SOUL, then use `/personality teacher` for a tutoring conversation
- keep a concise SOUL, then use `/personality creative` for brainstorming
## Built-in personalities
Hermes ships with built-in personalities you can switch to with `/personality`.
| Name | Description |
|------|-------------|
| **helpful** | Friendly, general-purpose assistant |
| **concise** | Brief, to-the-point responses |
| **technical** | Detailed, accurate technical expert |
| **creative** | Innovative, outside-the-box thinking |
| **teacher** | Patient educator with clear examples |
| **kawaii** | Cute expressions, sparkles, and enthusiasm ★ |
| **catgirl** | Neko-chan with cat-like expressions, nya~ |
| **pirate** | Captain Hermes, tech-savvy buccaneer |
| **shakespeare** | Bardic prose with dramatic flair |
| **surfer** | Totally chill bro vibes |
| **noir** | Hard-boiled detective narration |
| **uwu** | Maximum cute with uwu-speak |
| **philosopher** | Deep contemplation on every query |
| **hype** | MAXIMUM ENERGY AND ENTHUSIASM!!! |
## Switching personalities with commands
### CLI
```text
/personality
/personality concise
/personality technical
```
### Messaging platforms
```text
/personality teacher
```
These are convenient overlays, but your global `SOUL.md` still gives Hermes its persistent default personality unless the overlay meaningfully changes it.
## Custom personalities in config
You can also define named custom personalities in `~/.hermes/config.yaml` under `agent.personalities`.
```yaml
agent:
personalities:
codereviewer: >
You are a meticulous code reviewer. Identify bugs, security issues,
performance concerns, and unclear design choices. Be precise and constructive.
```
Then switch to it with:
```text
/personality codereviewer
```
## Recommended workflow
A strong default setup is:
1. Keep a thoughtful global `SOUL.md` in `~/.hermes/SOUL.md`
2. Put project instructions in `AGENTS.md`
3. Use `/personality` only when you want a temporary mode shift
That gives you:
- a stable voice
- project-specific behavior where it belongs
- temporary control when needed
## How personality interacts with the full prompt
At a high level, the prompt stack includes:
1. **SOUL.md** (agent identity — or built-in fallback if SOUL.md is unavailable)
2. tool-aware behavior guidance
3. memory/user context
4. skills guidance
5. context files (`AGENTS.md`, `.cursorrules`)
6. timestamp
7. platform-specific formatting hints
8. optional system-prompt overlays such as `/personality`
`SOUL.md` is the foundation — everything else builds on top of it.
## Related docs
- [Context Files](/docs/user-guide/features/context-files)
- [Configuration](/docs/user-guide/configuration)
- [Tips & Best Practices](/docs/guides/tips)
- [SOUL.md Guide](/docs/guides/use-soul-with-hermes)
## CLI appearance vs conversational personality
Conversational personality and CLI appearance are separate:
- `SOUL.md`, `agent.system_prompt`, and `/personality` affect how Hermes speaks
- `display.skin` and `/skin` affect how Hermes looks in the terminal
For terminal appearance, see [Skins & Themes](./skins.md).
---
# Plugins
# Plugins
Hermes has a plugin system for adding custom tools, hooks, and integrations without modifying core code.
If you want to create a custom tool for yourself, your team, or one project,
this is usually the right path. The developer guide's
[Adding Tools](/docs/developer-guide/adding-tools) page is for built-in Hermes
core tools that live in `tools/` and `toolsets.py`.
**→ [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin)** — step-by-step guide with a complete working example.
## Quick overview
Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:
```
~/.hermes/plugins/my-plugin/
├── plugin.yaml # manifest
├── __init__.py # register() — wires schemas to handlers
├── schemas.py # tool schemas (what the LLM sees)
└── tools.py # tool handlers (what runs when called)
```
Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.
### Minimal working example
Here is a complete plugin that adds a `hello_world` tool and logs every tool call via a hook.
**`~/.hermes/plugins/hello-world/plugin.yaml`**
```yaml
name: hello-world
version: "1.0"
description: A minimal example plugin
```
**`~/.hermes/plugins/hello-world/__init__.py`**
```python
"""Minimal Hermes plugin — registers a tool and a hook."""
import json
def register(ctx):
# --- Tool: hello_world ---
schema = {
"name": "hello_world",
"description": "Returns a friendly greeting for the given name.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "Name to greet",
}
},
"required": ["name"],
},
}
def handle_hello(params, **kwargs):
del kwargs
name = params.get("name", "World")
return json.dumps({"success": True, "greeting": f"Hello, {name}!"})
ctx.register_tool(
name="hello_world",
toolset="hello_world",
schema=schema,
handler=handle_hello,
description="Return a friendly greeting for the given name.",
)
# --- Hook: log every tool call ---
def on_tool_call(tool_name, params, result):
print(f"[hello-world] tool called: {tool_name}")
ctx.register_hook("post_tool_call", on_tool_call)
```
Drop both files into `~/.hermes/plugins/hello-world/`, restart Hermes, and the model can immediately call `hello_world`. The hook prints a log line after every tool invocation.
Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.
## What plugins can do
| Capability | How |
|-----------|-----|
| Add tools | `ctx.register_tool(name=..., toolset=..., schema=..., handler=...)` |
| Add hooks | `ctx.register_hook("post_tool_call", callback)` |
| Add slash commands | `ctx.register_command(name, handler, description)` — adds `/name` in CLI and gateway sessions |
| Dispatch tools from commands | `ctx.dispatch_tool(name, args)` — invokes a registered tool with parent-agent context auto-wired |
| Add CLI commands | `ctx.register_cli_command(name, help, setup_fn, handler_fn)` — adds `hermes ` |
| Inject messages | `ctx.inject_message(content, role="user")` — see [Injecting Messages](#injecting-messages) |
| Ship data files | `Path(__file__).parent / "data" / "file.yaml"` |
| Bundle skills | `ctx.register_skill(name, path)` — namespaced as `plugin:skill`, loaded via `skill_view("plugin:skill")` |
| Gate on env vars | `requires_env: [API_KEY]` in plugin.yaml — prompted during `hermes plugins install` |
| Distribute via pip | `[project.entry-points."hermes_agent.plugins"]` |
## Plugin discovery
| Source | Path | Use case |
|--------|------|----------|
| Bundled | `/plugins/` | Ships with Hermes — see [Built-in Plugins](/docs/user-guide/features/built-in-plugins) |
| User | `~/.hermes/plugins/` | Personal plugins |
| Project | `.hermes/plugins/` | Project-specific plugins (requires `HERMES_ENABLE_PROJECT_PLUGINS=true`) |
| pip | `hermes_agent.plugins` entry_points | Distributed packages |
| Nix | `services.hermes-agent.extraPlugins` / `extraPythonPackages` | NixOS declarative installs — see [Nix Setup](/docs/getting-started/nix-setup#plugins) |
Later sources override earlier ones on name collision, so a user plugin with the same name as a bundled plugin replaces it.
## Plugins are opt-in
**Every plugin — user-installed, bundled, or pip — is disabled by default.** Discovery finds them (so they show up in `hermes plugins` and `/plugins`), but nothing loads until you add the plugin's name to `plugins.enabled` in `~/.hermes/config.yaml`. This stops anything with hooks or tools from running without your explicit consent.
```yaml
plugins:
enabled:
- my-tool-plugin
- disk-cleanup
disabled: # optional deny-list — always wins if a name appears in both
- noisy-plugin
```
Three ways to flip state:
```bash
hermes plugins # interactive toggle (space to check/uncheck)
hermes plugins enable # add to allow-list
hermes plugins disable # remove from allow-list + add to disabled
```
After `hermes plugins install owner/repo`, you're asked `Enable 'name' now? [y/N]` — defaults to no. Skip the prompt for scripted installs with `--enable` or `--no-enable`.
### Migration for existing users
When you upgrade to a version of Hermes that has opt-in plugins (config schema v21+), any user plugins already installed under `~/.hermes/plugins/` that weren't already in `plugins.disabled` are **automatically grandfathered** into `plugins.enabled`. Your existing setup keeps working. Bundled plugins are NOT grandfathered — even existing users have to opt in explicitly.
## Available hooks
Plugins can register callbacks for these lifecycle events. See the **[Event Hooks page](/docs/user-guide/features/hooks#plugin-hooks)** for full details, callback signatures, and examples.
| Hook | Fires when |
|------|-----------|
| [`pre_tool_call`](/docs/user-guide/features/hooks#pre_tool_call) | Before any tool executes |
| [`post_tool_call`](/docs/user-guide/features/hooks#post_tool_call) | After any tool returns |
| [`pre_llm_call`](/docs/user-guide/features/hooks#pre_llm_call) | Once per turn, before the LLM loop — can return `{"context": "..."}` to [inject context into the user message](/docs/user-guide/features/hooks#pre_llm_call) |
| [`post_llm_call`](/docs/user-guide/features/hooks#post_llm_call) | Once per turn, after the LLM loop (successful turns only) |
| [`on_session_start`](/docs/user-guide/features/hooks#on_session_start) | New session created (first turn only) |
| [`on_session_end`](/docs/user-guide/features/hooks#on_session_end) | End of every `run_conversation` call + CLI exit handler |
| [`on_session_finalize`](/docs/user-guide/features/hooks#on_session_finalize) | CLI/gateway tears down an active session (`/new`, GC, CLI quit) |
| [`on_session_reset`](/docs/user-guide/features/hooks#on_session_reset) | Gateway swaps in a new session key (`/new`, `/reset`, `/clear`, idle rotation) |
| [`subagent_stop`](/docs/user-guide/features/hooks#subagent_stop) | Once per child after `delegate_task` finishes |
| [`pre_gateway_dispatch`](/docs/user-guide/features/hooks#pre_gateway_dispatch) | Gateway received a user message, before auth + dispatch. Return `{"action": "skip" \| "rewrite" \| "allow", ...}` to influence flow. |
## Plugin types
Hermes has three kinds of plugins:
| Type | What it does | Selection | Location |
|------|-------------|-----------|----------|
| **General plugins** | Add tools, hooks, slash commands, CLI commands | Multi-select (enable/disable) | `~/.hermes/plugins/` |
| **Memory providers** | Replace or augment built-in memory | Single-select (one active) | `plugins/memory/` |
| **Context engines** | Replace the built-in context compressor | Single-select (one active) | `plugins/context_engine/` |
Memory providers and context engines are **provider plugins** — only one of each type can be active at a time. General plugins can be enabled in any combination.
## NixOS declarative plugins
On NixOS, plugins can be installed declaratively via the module options — no `hermes plugins install` needed. See the **[Nix Setup guide](/docs/getting-started/nix-setup#plugins)** for full details.
```nix
services.hermes-agent = {
# Directory plugin (source tree with plugin.yaml)
extraPlugins = [ (pkgs.fetchFromGitHub { ... }) ];
# Entry-point plugin (pip package)
extraPythonPackages = [ (pkgs.python312Packages.buildPythonPackage { ... }) ];
# Enable in config
settings.plugins.enabled = [ "my-plugin" ];
};
```
Declarative plugins are symlinked with a `nix-managed-` prefix — they coexist with manually installed plugins and are cleaned up automatically when removed from the Nix config.
## Managing plugins
```bash
hermes plugins # unified interactive UI
hermes plugins list # table: enabled / disabled / not enabled
hermes plugins install user/repo # install from Git, then prompt Enable? [y/N]
hermes plugins install user/repo --enable # install AND enable (no prompt)
hermes plugins install user/repo --no-enable # install but leave disabled (no prompt)
hermes plugins update my-plugin # pull latest
hermes plugins remove my-plugin # uninstall
hermes plugins enable my-plugin # add to allow-list
hermes plugins disable my-plugin # remove from allow-list + add to disabled
```
### Interactive UI
Running `hermes plugins` with no arguments opens a composite interactive screen:
```
Plugins
↑↓ navigate SPACE toggle ENTER configure/confirm ESC done
General Plugins
→ [✓] my-tool-plugin — Custom search tool
[ ] webhook-notifier — Event hooks
[ ] disk-cleanup — Auto-cleanup of ephemeral files [bundled]
Provider Plugins
Memory Provider ▸ honcho
Context Engine ▸ compressor
```
- **General Plugins section** — checkboxes, toggle with SPACE. Checked = in `plugins.enabled`, unchecked = in `plugins.disabled` (explicit off).
- **Provider Plugins section** — shows current selection. Press ENTER to drill into a radio picker where you choose one active provider.
- Bundled plugins appear in the same list with a `[bundled]` tag.
Provider plugin selections are saved to `config.yaml`:
```yaml
memory:
provider: "honcho" # empty string = built-in only
context:
engine: "compressor" # default built-in compressor
```
### Enabled vs. disabled vs. neither
Plugins occupy one of three states:
| State | Meaning | In `plugins.enabled`? | In `plugins.disabled`? |
|---|---|---|---|
| `enabled` | Loaded on next session | Yes | No |
| `disabled` | Explicitly off — won't load even if also in `enabled` | (irrelevant) | Yes |
| `not enabled` | Discovered but never opted in | No | No |
The default for a newly-installed or bundled plugin is `not enabled`. `hermes plugins list` shows all three distinct states so you can tell what's been explicitly turned off vs. what's just waiting to be enabled.
In a running session, `/plugins` shows which plugins are currently loaded.
## Injecting Messages
Plugins can inject messages into the active conversation using `ctx.inject_message()`:
```python
ctx.inject_message("New data arrived from the webhook", role="user")
```
**Signature:** `ctx.inject_message(content: str, role: str = "user") -> bool`
How it works:
- If the agent is **idle** (waiting for user input), the message is queued as the next input and starts a new turn.
- If the agent is **mid-turn** (actively running), the message interrupts the current operation — the same as a user typing a new message and pressing Enter.
- For non-`"user"` roles, the content is prefixed with `[role]` (e.g. `[system] ...`).
- Returns `True` if the message was queued successfully, `False` if no CLI reference is available (e.g. in gateway mode).
This enables plugins like remote control viewers, messaging bridges, or webhook receivers to feed messages into the conversation from external sources.
:::note
`inject_message` is only available in CLI mode. In gateway mode, there is no CLI reference and the method returns `False`.
:::
See the **[full guide](/docs/guides/build-a-hermes-plugin)** for handler contracts, schema format, hook behavior, error handling, and common mistakes.
---
# Built-in Plugins
# Built-in Plugins
Hermes ships a small set of plugins bundled with the repository. They live under `/plugins//` and load automatically alongside user-installed plugins in `~/.hermes/plugins/`. They use the same plugin surface as third-party plugins — hooks, tools, slash commands — just maintained in-tree.
See the [Plugins](/docs/user-guide/features/plugins) page for the general plugin system, and [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin) to write your own.
## How discovery works
The `PluginManager` scans four sources, in order:
1. **Bundled** — `/plugins//` (what this page documents)
2. **User** — `~/.hermes/plugins//`
3. **Project** — `./.hermes/plugins//` (requires `HERMES_ENABLE_PROJECT_PLUGINS=1`)
4. **Pip entry points** — `hermes_agent.plugins`
On name collision, later sources win — a user plugin named `disk-cleanup` would replace the bundled one.
`plugins/memory/` and `plugins/context_engine/` are deliberately excluded from bundled scanning. Those directories use their own discovery paths because memory providers and context engines are single-select providers configured through `hermes memory setup` / `context.engine` in config.
## Bundled plugins are opt-in
Bundled plugins ship disabled. Discovery finds them (they appear in `hermes plugins list` and the interactive `hermes plugins` UI), but none load until you explicitly enable them:
```bash
hermes plugins enable disk-cleanup
```
Or via `~/.hermes/config.yaml`:
```yaml
plugins:
enabled:
- disk-cleanup
```
This is the same mechanism user-installed plugins use. Bundled plugins are never auto-enabled — not on fresh install, not for existing users upgrading to a newer Hermes. You always opt in explicitly.
To turn a bundled plugin off again:
```bash
hermes plugins disable disk-cleanup
# or: remove it from plugins.enabled in config.yaml
```
## Currently shipped
The repo ships these bundled plugins under `plugins/`. All are opt-in — enable them via `hermes plugins enable `.
| Plugin | Kind | Purpose |
|---|---|---|
| `disk-cleanup` | hooks + slash command | Auto-track ephemeral files and clean them on session end |
| `observability/langfuse` | hooks | Trace turns / LLM calls / tools to [Langfuse](https://langfuse.com) |
| `spotify` | backend (7 tools) | Native Spotify playback, queue, search, playlists, albums, library |
| `google_meet` | standalone | Join Meet calls, live-caption transcription, optional realtime duplex audio |
| `image_gen/openai` | image backend | OpenAI `gpt-image-2` image generation backend (alternative to FAL) |
| `image_gen/openai-codex` | image backend | OpenAI image generation via Codex OAuth |
| `image_gen/xai` | image backend | xAI `grok-2-image` backend |
| `hermes-achievements` | dashboard tab | Steam-style collectible badges generated from your real Hermes session history |
| `example-dashboard` | dashboard example | Reference dashboard plugin for [Extending the Dashboard](./extending-the-dashboard.md) |
| `strike-freedom-cockpit` | dashboard skin | Sample custom dashboard skin |
Memory providers (`plugins/memory/*`) and context engines (`plugins/context_engine/*`) are listed separately on [Memory Providers](./memory-providers.md) — they're managed through `hermes memory` and `hermes plugins` respectively. The full per-plugin detail for the two long-running hooks-based plugins follows.
### disk-cleanup
Auto-tracks and removes ephemeral files created during sessions — test scripts, temp outputs, cron logs, stale chrome profiles — without requiring the agent to remember to call a tool.
**How it works:**
| Hook | Behaviour |
|---|---|
| `post_tool_call` | When `write_file` / `terminal` / `patch` creates a file matching `test_*`, `tmp_*`, or `*.test.*` inside `HERMES_HOME` or `/tmp/hermes-*`, track it silently as `test` / `temp` / `cron-output`. |
| `on_session_end` | If any test files were auto-tracked during the turn, run the safe `quick` cleanup and log a one-line summary. Stays silent otherwise. |
**Deletion rules:**
| Category | Threshold | Confirmation |
|---|---|---|
| `test` | every session end | Never |
| `temp` | >7 days since tracked | Never |
| `cron-output` | >14 days since tracked | Never |
| empty dirs under HERMES_HOME | always | Never |
| `research` | >30 days, beyond 10 newest | Always (deep only) |
| `chrome-profile` | >14 days since tracked | Always (deep only) |
| files >500 MB | never auto | Always (deep only) |
**Slash command** — `/disk-cleanup` available in both CLI and gateway sessions:
```
/disk-cleanup status # breakdown + top-10 largest
/disk-cleanup dry-run # preview without deleting
/disk-cleanup quick # run safe cleanup now
/disk-cleanup deep # quick + list items needing confirmation
/disk-cleanup track # manual tracking
/disk-cleanup forget # stop tracking (does not delete)
```
**State** — everything lives at `$HERMES_HOME/disk-cleanup/`:
| File | Contents |
|---|---|
| `tracked.json` | Tracked paths with category, size, and timestamp |
| `tracked.json.bak` | Atomic-write backup of the above |
| `cleanup.log` | Append-only audit trail of every track / skip / reject / delete |
**Safety** — cleanup only ever touches paths under `HERMES_HOME` or `/tmp/hermes-*`. Windows mounts (`/mnt/c/...`) are rejected. Well-known top-level state dirs (`logs/`, `memories/`, `sessions/`, `cron/`, `cache/`, `skills/`, `plugins/`, `disk-cleanup/` itself) are never removed even when empty — a fresh install does not get gutted on first session end.
**Enabling:** `hermes plugins enable disk-cleanup` (or check the box in `hermes plugins`).
**Disabling again:** `hermes plugins disable disk-cleanup`.
### observability/langfuse
Traces Hermes turns, LLM calls, and tool invocations to [Langfuse](https://langfuse.com) — an open-source LLM observability platform. One span per turn, one generation per API call, one tool observation per tool call. Usage totals, per-type token counts, and cost estimates come out of Hermes' canonical `agent.usage_pricing` numbers, so the Langfuse dashboard sees the same breakdown (input / output / `cache_read_input_tokens` / `cache_creation_input_tokens` / `reasoning_tokens`) that appears in `hermes logs`.
The plugin is fail-open: no SDK installed, no credentials, or a transient Langfuse error — all turn into a silent no-op in the hook. The agent loop is never impacted.
**Setup (interactive — recommended):**
```bash
hermes tools # → Langfuse Observability → Cloud or Self-Hosted
```
The wizard collects your keys, `pip install`s the `langfuse` SDK, and adds `observability/langfuse` to `plugins.enabled` for you. Restart Hermes and the next turn ships a trace.
**Setup (manual):**
```bash
pip install langfuse
hermes plugins enable observability/langfuse
```
Then put the credentials in `~/.hermes/.env`:
```bash
HERMES_LANGFUSE_PUBLIC_KEY=pk-lf-...
HERMES_LANGFUSE_SECRET_KEY=sk-lf-...
HERMES_LANGFUSE_BASE_URL=https://cloud.langfuse.com # or your self-hosted URL
```
**How it works:**
| Hook | Behaviour |
|---|---|
| `pre_api_request` / `pre_llm_call` | Open (or reuse) a per-turn root span "Hermes turn". Start a `generation` child observation for this API call with serialized recent messages as input. |
| `post_api_request` / `post_llm_call` | Close the generation, attach `usage_details`, `cost_details`, `finish_reason`, assistant output + tool calls. If no tool calls and non-empty content, close the turn. |
| `pre_tool_call` | Start a `tool` child observation with sanitized `args`. |
| `post_tool_call` | Close the tool observation with sanitized `result`. `read_file` payloads get summarized (head + tail + omitted-line count) so a huge file read stays under `HERMES_LANGFUSE_MAX_CHARS`. |
Session grouping keys off the Hermes session ID (or task ID for sub-agents) via `langfuse.propagate_attributes`, so everything in a single `hermes chat` session lives under one Langfuse session.
**Verify:**
```bash
hermes plugins list # observability/langfuse should show "enabled"
hermes chat -q "hello" # check the Langfuse UI for a "Hermes turn" trace
```
**Optional tuning** (in `.env`):
| Variable | Default | Purpose |
|---|---|---|
| `HERMES_LANGFUSE_ENV` | — | Environment tag on traces (`production`, `staging`, …) |
| `HERMES_LANGFUSE_RELEASE` | — | Release/version tag |
| `HERMES_LANGFUSE_SAMPLE_RATE` | `1.0` | Sampling rate passed to the SDK (0.0–1.0) |
| `HERMES_LANGFUSE_MAX_CHARS` | `12000` | Per-field truncation for message content / tool args / tool results |
| `HERMES_LANGFUSE_DEBUG` | `false` | Verbose plugin logging to `agent.log` |
Hermes-prefixed and standard SDK env vars (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_BASE_URL`) are both accepted — Hermes-prefixed wins when both are set.
**Performance:** the Langfuse client is cached after the first hook call. If credentials or SDK are missing, that decision is also cached — subsequent hooks fast-return without re-checking env vars or reloading config.
**Disabling:** `hermes plugins disable observability/langfuse`. The plugin module is still discovered, but no module code runs until you re-enable.
### google_meet
Lets the agent **join, transcribe, and participate in Google Meet calls** — take notes on a meeting, summarize the back-and-forth after, follow up on specific points, and (optionally) speak replies back into the call via TTS.
**What it adds:**
- A headless virtual participant that joins a Meet URL using browser automation
- Live transcription of the meeting audio via the configured STT provider
- A `meet_summarize` / `meet_speak` / `meet_followup` toolset the agent invokes to act on what it heard
- Post-meeting artifacts (transcript, speaker-attributed notes, action items) saved under `~/.hermes/cache/google_meet//`
**Setup:**
```bash
hermes plugins enable google_meet
# Prompts you to sign in via the plugin's OAuth flow on first use —
# needs a Google account with Meet access. Host approval may be required
# if the meeting enforces "only invited participants can join".
```
Usage from chat:
> "Join meet.google.com/abc-defg-hij and take notes. After the call, send me a summary with action items."
The agent kicks off the meeting join, streams the transcription back into its context as the call proceeds, and produces a structured summary when the meeting ends (or when you tell it to stop).
**When to use it:** recurring standups where you want a bot to transcribe + summarize for async attendees; deposition-style interviews where you want structured notes; any case where you'd otherwise need Fireflies / Otter / Grain. When you'd rather not have an AI listening in — don't enable it.
**Disabling:** `hermes plugins disable google_meet`. Any cached transcripts and recordings stay in `~/.hermes/cache/google_meet/` until you remove them.
### hermes-achievements
Adds a **Steam-style achievements tab to the dashboard** — 60+ collectible, tiered badges generated from your real Hermes session history. Tool-chain feats, debugging patterns, vibe-coding streaks, skill/memory usage, model/provider variety, lifestyle quirks (weekend and night sessions). Originally authored by [@PCinkusz](https://github.com/PCinkusz) as an external plugin; brought in-tree so it stays in lockstep with Hermes feature changes.
**How it works:**
- Scans your entire `~/.hermes/state.db` session history on the dashboard backend
- Per-session stats are cached by `(started_at, last_active)` fingerprint, so only new or changed sessions re-analyze on subsequent scans
- First-ever scan runs in a background thread — the dashboard never blocks waiting for it, even on databases with thousands of sessions
- Unlock state is persisted to `$HERMES_HOME/plugins/hermes-achievements/state.json`
**Tier progression:** Copper → Silver → Gold → Diamond → Olympian. Each card exposes a "What counts" section listing the exact metric being tracked.
**Achievement states:**
| State | Meaning |
|---|---|
| Unlocked | At least one tier achieved |
| Discovered | Known achievement, progress visible, not yet earned |
| Secret | Hidden until Hermes detects the first related signal in your history |
**API** — routes mount under `/api/plugins/hermes-achievements/`:
| Endpoint | Purpose |
|---|---|
| `GET /achievements` | Full catalog with per-badge unlock state (returns a pending placeholder while the first cold scan is running) |
| `GET /scan-status` | State of the background scanner: `idle` / `running` / `failed`, last duration, run count |
| `GET /recent-unlocks` | Twenty most recently unlocked badges, newest first |
| `GET /sessions/{id}/badges` | Badges earned primarily in one specific session |
| `POST /rescan` | Manual synchronous rescan (blocks; use when the user clicks the rescan button) |
| `POST /reset-state` | Clear unlock history and cached snapshot |
**State files** — live under `$HERMES_HOME/plugins/hermes-achievements/`:
| File | Contents |
|---|---|
| `state.json` | Unlock history: which badges you've earned and when. Stable across Hermes updates. |
| `scan_snapshot.json` | Last completed scan payload (served immediately on dashboard load) |
| `scan_checkpoint.json` | Per-session stats cache keyed by fingerprint (makes warm rescans fast) |
**Performance notes:**
- Cold scan on ~8,000 sessions takes a few minutes. It runs in a background thread on first dashboard request; the UI sees a pending placeholder and polls `/scan-status`.
- **Incremental results during a cold scan** — the scanner publishes a partial snapshot every ~250 sessions so each dashboard refresh shows more badges unlocked as the scan progresses. No minute-long stare at zeros.
- Warm rescan reuses per-session stats for every session whose `started_at` + `last_active` fingerprint matches the checkpoint — completes in seconds even on large histories.
- The in-memory snapshot TTL is 120s; stale requests serve the old snapshot immediately and kick a background refresh. You never wait on a spinner just because TTL expired.
**Enabling:** Nothing to enable — `hermes-achievements` is a dashboard-only plugin (no lifecycle hooks, no model-visible tools). It auto-registers as a tab in `hermes dashboard` on first launch. The `plugins.enabled` config only gates lifecycle/tool plugins; dashboard plugins are discovered purely via their `dashboard/manifest.json`.
**Opting out:** Delete or rename `plugins/hermes-achievements/dashboard/manifest.json`, or override it with a user plugin of the same name in `~/.hermes/plugins/hermes-achievements/` that ships no dashboard. The plugin's state files under `$HERMES_HOME/plugins/hermes-achievements/` survive — reinstalling preserves your unlock history.
## Adding a bundled plugin
Bundled plugins are written exactly like any other Hermes plugin — see [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin). The only differences are:
- Directory lives at `/plugins//` instead of `~/.hermes/plugins//`
- Manifest source is reported as `bundled` in `hermes plugins list`
- User plugins with the same name override the bundled version
A plugin is a good candidate for bundling when:
- It has no optional dependencies (or they're already `pip install .[all]` deps)
- The behaviour benefits most users and is opt-out rather than opt-in
- The logic ties into lifecycle hooks that the agent would otherwise have to remember to invoke
- It complements a core capability without expanding the model-visible tool surface
Counter-examples — things that should stay as user-installable plugins, not bundled: third-party integrations with API keys, niche workflows, large dependency trees, anything that would meaningfully change agent behaviour by default.
---
# Scheduled Tasks (Cron)
# Scheduled Tasks (Cron)
Schedule tasks to run automatically with natural language or cron expressions. Hermes exposes cron management through a single `cronjob` tool with action-style operations instead of separate schedule/list/remove tools.
## What cron can do now
Cron jobs can:
- schedule one-shot or recurring tasks
- pause, resume, edit, trigger, and remove jobs
- attach zero, one, or multiple skills to a job
- deliver results back to the origin chat, local files, or configured platform targets
- run in fresh agent sessions with the normal static tool list
- run in **no-agent mode** — a script on a schedule, its stdout delivered verbatim, zero LLM involvement (see the [no-agent mode](#no-agent-mode-script-only-jobs) section below)
All of this is available to Hermes itself through the `cronjob` tool, so you can create, pause, edit, and remove jobs by asking in plain language — no CLI required.
:::warning
Cron-run sessions cannot recursively create more cron jobs. Hermes disables cron management tools inside cron executions to prevent runaway scheduling loops.
:::
## Creating scheduled tasks
### In chat with `/cron`
```bash
/cron add 30m "Remind me to check the build"
/cron add "every 2h" "Check server status"
/cron add "every 1h" "Summarize new feed items" --skill blogwatcher
/cron add "every 1h" "Use both skills and combine the result" --skill blogwatcher --skill maps
```
### From the standalone CLI
```bash
hermes cron create "every 2h" "Check server status"
hermes cron create "every 1h" "Summarize new feed items" --skill blogwatcher
hermes cron create "every 1h" "Use both skills and combine the result" \
--skill blogwatcher \
--skill maps \
--name "Skill combo"
```
### Through natural conversation
Ask Hermes normally:
```text
Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram.
```
Hermes will use the unified `cronjob` tool internally.
## Skill-backed cron jobs
A cron job can load one or more skills before it runs the prompt.
### Single skill
```python
cronjob(
action="create",
skill="blogwatcher",
prompt="Check the configured feeds and summarize anything new.",
schedule="0 9 * * *",
name="Morning feeds",
)
```
### Multiple skills
Skills are loaded in order. The prompt becomes the task instruction layered on top of those skills.
```python
cronjob(
action="create",
skills=["blogwatcher", "maps"],
prompt="Look for new local events and interesting nearby places, then combine them into one short brief.",
schedule="every 6h",
name="Local brief",
)
```
This is useful when you want a scheduled agent to inherit reusable workflows without stuffing the full skill text into the cron prompt itself.
## Running a job inside a project directory
Cron jobs default to running detached from any repo — no `AGENTS.md`, `CLAUDE.md`, or `.cursorrules` is loaded, and the terminal / file / code-exec tools run from whatever working directory the gateway started in. Pass `--workdir` (CLI) or `workdir=` (tool call) to change that:
```bash
# Standalone CLI (schedule and prompt are positional)
hermes cron create "every 1d at 09:00" \
"Audit open PRs, summarize CI health, and post to #eng" \
--workdir /home/me/projects/acme
```
```python
# From a chat, via the cronjob tool
cronjob(
action="create",
schedule="every 1d at 09:00",
workdir="/home/me/projects/acme",
prompt="Audit open PRs, summarize CI health, and post to #eng",
)
```
When `workdir` is set:
- `AGENTS.md`, `CLAUDE.md`, and `.cursorrules` from that directory are injected into the system prompt (same discovery order as the interactive CLI)
- `terminal`, `read_file`, `write_file`, `patch`, `search_files`, and `execute_code` all use that directory as their working directory (via `TERMINAL_CWD`)
- The path must be an absolute directory that exists — relative paths and missing directories are rejected at create / update time
- Pass `--workdir ""` (or `workdir=""` via the tool) on edit to clear it and restore the old behaviour
:::note Serialization
Jobs with a `workdir` run sequentially on the scheduler tick, not in the parallel pool. This is deliberate — `TERMINAL_CWD` is process-global, so two workdir jobs running at the same time would corrupt each other's cwd. Workdir-less jobs still run in parallel as before.
:::
## Editing jobs
You do not need to delete and recreate jobs just to change them.
### Chat
```bash
/cron edit --schedule "every 4h"
/cron edit --prompt "Use the revised task"
/cron edit --skill blogwatcher --skill maps
/cron edit --remove-skill blogwatcher
/cron edit --clear-skills
```
### Standalone CLI
```bash
hermes cron edit --schedule "every 4h"
hermes cron edit --prompt "Use the revised task"
hermes cron edit --skill blogwatcher --skill maps
hermes cron edit --add-skill maps
hermes cron edit --remove-skill blogwatcher
hermes cron edit --clear-skills
```
Notes:
- repeated `--skill` replaces the job's attached skill list
- `--add-skill` appends to the existing list without replacing it
- `--remove-skill` removes specific attached skills
- `--clear-skills` removes all attached skills
## Lifecycle actions
Cron jobs now have a fuller lifecycle than just create/remove.
### Chat
```bash
/cron list
/cron pause
/cron resume
/cron run
/cron remove
```
### Standalone CLI
```bash
hermes cron list
hermes cron pause
hermes cron resume
hermes cron run
hermes cron remove
hermes cron status
hermes cron tick
```
What they do:
- `pause` — keep the job but stop scheduling it
- `resume` — re-enable the job and compute the next future run
- `run` — trigger the job on the next scheduler tick
- `remove` — delete it entirely
## How it works
**Cron execution is handled by the gateway daemon.** The gateway ticks the scheduler every 60 seconds, running any due jobs in isolated agent sessions.
```bash
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux: boot-time system service for servers
hermes gateway # Or run in foreground
hermes cron list
hermes cron status
```
### Gateway scheduler behavior
On each tick Hermes:
1. loads jobs from `~/.hermes/cron/jobs.json`
2. checks `next_run_at` against the current time
3. starts a fresh `AIAgent` session for each due job
4. optionally injects one or more attached skills into that fresh session
5. runs the prompt to completion
6. delivers the final response
7. updates run metadata and the next scheduled time
A file lock at `~/.hermes/cron/.tick.lock` prevents overlapping scheduler ticks from double-running the same job batch.
## Delivery options
When scheduling jobs, you specify where the output goes:
| Option | Description | Example |
|--------|-------------|---------|
| `"origin"` | Back to where the job was created | Default on messaging platforms |
| `"local"` | Save to local files only (`~/.hermes/cron/output/`) | Default on CLI |
| `"telegram"` | Telegram home channel | Uses `TELEGRAM_HOME_CHANNEL` |
| `"telegram:123456"` | Specific Telegram chat by ID | Direct delivery |
| `"telegram:-100123:17585"` | Specific Telegram topic | `chat_id:thread_id` format |
| `"discord"` | Discord home channel | Uses `DISCORD_HOME_CHANNEL` |
| `"discord:#engineering"` | Specific Discord channel | By channel name |
| `"slack"` | Slack home channel | |
| `"whatsapp"` | WhatsApp home | |
| `"signal"` | Signal | |
| `"matrix"` | Matrix home room | |
| `"mattermost"` | Mattermost home channel | |
| `"email"` | Email | |
| `"sms"` | SMS via Twilio | |
| `"homeassistant"` | Home Assistant | |
| `"dingtalk"` | DingTalk | |
| `"feishu"` | Feishu/Lark | |
| `"wecom"` | WeCom | |
| `"weixin"` | Weixin (WeChat) | |
| `"bluebubbles"` | BlueBubbles (iMessage) | |
| `"qqbot"` | QQ Bot (Tencent QQ) | |
The agent's final response is automatically delivered. You do not need to call `send_message` in the cron prompt.
### Response wrapping
By default, delivered cron output is wrapped with a header and footer so the recipient knows it came from a scheduled task:
```
Cronjob Response: Morning feeds
-------------
Note: The agent cannot see this message, and therefore cannot respond to it.
```
To deliver the raw agent output without the wrapper, set `cron.wrap_response` to `false`:
```yaml
# ~/.hermes/config.yaml
cron:
wrap_response: false
```
### Silent suppression
If the agent's final response starts with `[SILENT]`, delivery is suppressed entirely. The output is still saved locally for audit (in `~/.hermes/cron/output/`), but no message is sent to the delivery target.
This is useful for monitoring jobs that should only report when something is wrong:
```text
Check if nginx is running. If everything is healthy, respond with only [SILENT].
Otherwise, report the issue.
```
Failed jobs always deliver regardless of the `[SILENT]` marker — only successful runs can be silenced.
## Script timeout
Pre-run scripts (attached via the `script` parameter) have a default timeout of 120 seconds. If your scripts need longer — for example, to include randomized delays that avoid bot-like timing patterns — you can increase this:
```yaml
# ~/.hermes/config.yaml
cron:
script_timeout_seconds: 300 # 5 minutes
```
Or set the `HERMES_CRON_SCRIPT_TIMEOUT` environment variable. The resolution order is: env var → config.yaml → 120s default.
## No-agent mode (script-only jobs)
For recurring jobs that don't need LLM reasoning — classic watchdogs, disk/memory alerts, heartbeats, CI pings — pass `no_agent=True` at creation time. The scheduler runs your script on schedule and delivers its stdout directly, skipping the agent entirely:
```bash
hermes cron create "every 5m" \
--no-agent \
--script memory-watchdog.sh \
--deliver telegram \
--name "memory-watchdog"
```
Semantics:
- Script stdout (trimmed) → delivered verbatim as the message.
- **Empty stdout → silent tick**, no delivery. This is the watchdog pattern: "only say something when something is wrong".
- Non-zero exit or timeout → an error alert is delivered, so a broken watchdog can't fail silently.
- `{"wakeAgent": false}` on the last line → silent tick (same gate LLM jobs use).
- No tokens, no model, no provider fallback — the job never touches the inference layer.
`.sh` / `.bash` files run under `/bin/bash`; anything else under the current Python interpreter (`sys.executable`). Scripts must live in `~/.hermes/scripts/` (same sandboxing rule as the pre-run script gate).
### The agent sets these up for you
The `cronjob` tool's schema exposes `no_agent` to Hermes directly, so you can describe a watchdog in chat and let the agent wire it up:
```text
Ping me on Telegram if RAM is over 85%, every 5 minutes.
```
Hermes will write the check script to `~/.hermes/scripts/` via `write_file`, then call:
```python
cronjob(action="create", schedule="every 5m",
script="memory-watchdog.sh", no_agent=True,
deliver="telegram", name="memory-watchdog")
```
It picks `no_agent=True` automatically when the message content is fully determined by the script (watchdogs, threshold alerts, heartbeats). The same tool also lets the agent pause, resume, edit, and remove jobs — so the whole lifecycle is chat-driven without anyone touching the CLI.
See the [Script-Only Cron Jobs guide](/docs/guides/cron-script-only) for worked examples.
## Chaining jobs with `context_from`
Cron jobs run in isolated sessions with no memory of previous runs. But sometimes one job's output is exactly what the next job needs. The `context_from` parameter wires that connection automatically — Job B's prompt gets Job A's most recent output prepended as context at runtime.
```python
# Job 1: Collect raw data
cronjob(
action="create",
prompt="Fetch the top 10 AI/ML stories from Hacker News. Save them to ~/.hermes/data/briefs/raw.md in markdown format with title, URL, and score.",
schedule="0 7 * * *",
name="AI News Collector",
)
# Job 2: Triage — receives Job 1's output as context
# Get Job 1's ID from: cronjob(action="list")
cronjob(
action="create",
prompt="Read ~/.hermes/data/briefs/raw.md. Score each story 1–10 for engagement potential and novelty. Output the top 5 to ~/.hermes/data/briefs/ranked.md.",
schedule="30 7 * * *",
context_from="",
name="AI News Triage",
)
# Job 3: Ship — receives Job 2's output as context
cronjob(
action="create",
prompt="Read ~/.hermes/data/briefs/ranked.md. Write 3 tweet drafts (hook + body + hashtags). Deliver to telegram:7976161601.",
schedule="0 8 * * *",
context_from="",
name="AI News Brief",
)
```
**How it works:**
- When Job 2 fires, Hermes reads Job 1's most recent output from `~/.hermes/cron/output/{job1_id}/*.md`
- That output is prepended to Job 2's prompt automatically
- Job 2 doesn't need to hardcode "read this file" — it receives the content as context
- The chain can be any length: Job 1 → Job 2 → Job 3 → ...
**What `context_from` accepts:**
| Format | Example |
|--------|---------|
| Single job ID (string) | `context_from="a1b2c3d4"` |
| Multiple job IDs (list) | `context_from=["job_a", "job_b"]` |
Outputs are concatenated in the order listed.
**When to use it:**
- Multi-stage pipelines (collect → filter → format → deliver)
- Dependent tasks where step N's work depends on step N−1's output
- Fan-out/fan-in patterns where one job aggregates results from several others
## Provider recovery
Cron jobs inherit your configured fallback providers and credential pool rotation. If the primary API key is rate-limited or the provider returns an error, the cron agent can:
- **Fall back to an alternate provider** if you have `fallback_providers` (or the legacy `fallback_model`) configured in `config.yaml`
- **Rotate to the next credential** in your [credential pool](/docs/user-guide/configuration#credential-pool-strategies) for the same provider
This means cron jobs that run at high frequency or during peak hours are more resilient — a single rate-limited key won't fail the entire run.
## Schedule formats
The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` to the exact target the scheduler will already deliver to, Hermes skips that duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.
### Relative delays (one-shot)
```text
30m → Run once in 30 minutes
2h → Run once in 2 hours
1d → Run once in 1 day
```
### Intervals (recurring)
```text
every 30m → Every 30 minutes
every 2h → Every 2 hours
every 1d → Every day
```
### Cron expressions
```text
0 9 * * * → Daily at 9:00 AM
0 9 * * 1-5 → Weekdays at 9:00 AM
0 */6 * * * → Every 6 hours
30 8 1 * * → First of every month at 8:30 AM
0 0 * * 0 → Every Sunday at midnight
```
### ISO timestamps
```text
2026-03-15T09:00:00 → One-time at March 15, 2026 9:00 AM
```
## Repeat behavior
| Schedule type | Default repeat | Behavior |
|--------------|----------------|----------|
| One-shot (`30m`, timestamp) | 1 | Runs once |
| Interval (`every 2h`) | forever | Runs until removed |
| Cron expression | forever | Runs until removed |
You can override it:
```python
cronjob(
action="create",
prompt="...",
schedule="every 2h",
repeat=5,
)
```
## Managing jobs programmatically
The agent-facing API is one tool:
```python
cronjob(action="create", ...)
cronjob(action="list")
cronjob(action="update", job_id="...")
cronjob(action="pause", job_id="...")
cronjob(action="resume", job_id="...")
cronjob(action="run", job_id="...")
cronjob(action="remove", job_id="...")
```
For `update`, pass `skills=[]` to remove all attached skills.
## Toolsets available to cron jobs
Cron runs each job in a fresh agent session with no chat platform attached. By default the cron agent gets **the toolset you configured for the `cron` platform in `hermes tools`** — not the CLI default, not everything under the sun.
```bash
hermes tools
# → pick the "cron" platform in the curses UI
# → toggle toolsets on/off just like you would for Telegram/Discord/etc.
```
Tighter per-job control is available via the `enabled_toolsets` field on `cronjob.create` (or on an existing job via `cronjob.update`):
```text
cronjob(action="create", name="weekly-news-summary",
schedule="every sunday 9am",
enabled_toolsets=["web", "file"], # just web + file, no terminal/browser/etc.
prompt="Summarize this week's AI news: ...")
```
When `enabled_toolsets` is set on a job it wins; otherwise the `hermes tools` cron-platform config wins; otherwise Hermes falls back to the built-in defaults. This matters for cost control: carrying `moa`, `browser`, `delegation` into every tiny "fetch news" job bloats the tool-schema prompt on every LLM call.
### Skipping the agent entirely: `wakeAgent`
If your cron job attaches a pre-check script (via `script=`), the script can decide at runtime whether Hermes should even invoke the agent. Emit a final stdout line of the form:
```text
{"wakeAgent": false}
```
…and cron skips the agent run entirely for this tick. Useful for frequent polls (every 1–5 min) that only need to wake the LLM when state actually changed — otherwise you pay for zero-content agent turns over and over.
```python
# pre-check script
import json, sys
latest = fetch_latest_issue_count()
prev = read_state("issue_count")
if latest == prev:
print(json.dumps({"wakeAgent": False})) # skip this tick
sys.exit(0)
write_state("issue_count", latest)
print(json.dumps({"wakeAgent": True, "context": {"new_issues": latest - prev}}))
```
When `wakeAgent` is omitted, the default is `true` (wake the agent as usual).
### Chaining jobs: `context_from`
A cron job can consume the most recent successful output of one or more other jobs by listing their names (or IDs) in `context_from`:
```text
cronjob(action="create", name="daily-digest",
schedule="every day 7am",
context_from=["ai-news-fetch", "github-prs-fetch"],
prompt="Write the daily digest using the outputs above.")
```
The referenced jobs' most recent completed outputs are injected above the prompt as context for this run. Each upstream entry must be a valid job ID or name (see `cronjob action="list"`). Note: chaining reads the *most recent completed* output — it does not wait for upstream jobs that are running in the same tick.
## Job storage
Jobs are stored in `~/.hermes/cron/jobs.json`. Output from job runs is saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
Jobs may store `model` and `provider` as `null`. When those fields are omitted, Hermes resolves them at execution time from the global configuration. They only appear in the job record when a per-job override is set.
The storage uses atomic file writes so interrupted writes do not leave a partially written job file behind.
## Self-contained prompts still matter
:::warning Important
Cron jobs run in a completely fresh agent session. The prompt must contain everything the agent needs that is not already provided by attached skills.
:::
**BAD:** `"Check on that server issue"`
**GOOD:** `"SSH into server 192.168.1.100 as user 'deploy', check if nginx is running with 'systemctl status nginx', and verify https://example.com returns HTTP 200."`
## Security
Scheduled task prompts are scanned for prompt-injection and credential-exfiltration patterns at creation and update time. Prompts containing invisible Unicode tricks, SSH backdoor attempts, or obvious secret-exfiltration payloads are blocked.
---
# Subagent Delegation
# Subagent Delegation
The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently — only its final summary enters the parent's context.
## Single Task
```python
delegate_task(
goal="Debug why tests fail",
context="Error: assertion in test_foo.py line 42",
toolsets=["terminal", "file"]
)
```
## Parallel Batch
Up to 3 concurrent subagents by default (configurable, no hard ceiling):
```python
delegate_task(tasks=[
{"goal": "Research topic A", "toolsets": ["web"]},
{"goal": "Research topic B", "toolsets": ["web"]},
{"goal": "Fix the build", "toolsets": ["terminal", "file"]}
])
```
## How Subagent Context Works
:::warning Critical: Subagents Know Nothing
Subagents start with a **completely fresh conversation**. They have zero knowledge of the parent's conversation history, prior tool calls, or anything discussed before delegation. The subagent's only context comes from the `goal` and `context` fields the parent agent populates when it calls `delegate_task`.
:::
This means the parent agent must pass **everything** the subagent needs in the call:
```python
# BAD - subagent has no idea what "the error" is
delegate_task(goal="Fix the error")
# GOOD - subagent has all context it needs
delegate_task(
goal="Fix the TypeError in api/handlers.py",
context="""The file api/handlers.py has a TypeError on line 47:
'NoneType' object has no attribute 'get'.
The function process_request() receives a dict from parse_body(),
but parse_body() returns None when Content-Type is missing.
The project is at /home/user/myproject and uses Python 3.11."""
)
```
The subagent receives a focused system prompt built from your goal and context, instructing it to complete the task and provide a structured summary of what it did, what it found, any files modified, and any issues encountered.
## Practical Examples
### Parallel Research
Research multiple topics simultaneously and collect summaries:
```python
delegate_task(tasks=[
{
"goal": "Research the current state of WebAssembly in 2025",
"context": "Focus on: browser support, non-browser runtimes, language support",
"toolsets": ["web"]
},
{
"goal": "Research the current state of RISC-V adoption in 2025",
"context": "Focus on: server chips, embedded systems, software ecosystem",
"toolsets": ["web"]
},
{
"goal": "Research quantum computing progress in 2025",
"context": "Focus on: error correction breakthroughs, practical applications, key players",
"toolsets": ["web"]
}
])
```
### Code Review + Fix
Delegate a review-and-fix workflow to a fresh context:
```python
delegate_task(
goal="Review the authentication module for security issues and fix any found",
context="""Project at /home/user/webapp.
Auth module files: src/auth/login.py, src/auth/jwt.py, src/auth/middleware.py.
The project uses Flask, PyJWT, and bcrypt.
Focus on: SQL injection, JWT validation, password handling, session management.
Fix any issues found and run the test suite (pytest tests/auth/).""",
toolsets=["terminal", "file"]
)
```
### Multi-File Refactoring
Delegate a large refactoring task that would flood the parent's context:
```python
delegate_task(
goal="Refactor all Python files in src/ to replace print() with proper logging",
context="""Project at /home/user/myproject.
Use the 'logging' module with logger = logging.getLogger(__name__).
Replace print() calls with appropriate log levels:
- print(f"Error: ...") -> logger.error(...)
- print(f"Warning: ...") -> logger.warning(...)
- print(f"Debug: ...") -> logger.debug(...)
- Other prints -> logger.info(...)
Don't change print() in test files or CLI output.
Run pytest after to verify nothing broke.""",
toolsets=["terminal", "file"]
)
```
## Batch Mode Details
When you provide a `tasks` array, subagents run in **parallel** using a thread pool:
- **Maximum concurrency:** 3 tasks by default (configurable via `delegation.max_concurrent_children` or the `DELEGATION_MAX_CONCURRENT_CHILDREN` env var; floor of 1, no hard ceiling). Batches larger than the limit return a tool error rather than being silently truncated.
- **Thread pool:** Uses `ThreadPoolExecutor` with the configured concurrency limit as max workers
- **Progress display:** In CLI mode, a tree-view shows tool calls from each subagent in real-time with per-task completion lines. In gateway mode, progress is batched and relayed to the parent's progress callback
- **Result ordering:** Results are sorted by task index to match input order regardless of completion order
- **Interrupt propagation:** Interrupting the parent (e.g., sending a new message) interrupts all active children
Single-task delegation runs directly without thread pool overhead.
## Model Override
You can configure a different model for subagents via `config.yaml` — useful for delegating simple tasks to cheaper/faster models:
```yaml
# In ~/.hermes/config.yaml
delegation:
model: "google/gemini-flash-2.0" # Cheaper model for subagents
provider: "openrouter" # Optional: route subagents to a different provider
```
If omitted, subagents use the same model as the parent.
## Toolset Selection Tips
The `toolsets` parameter controls what tools the subagent has access to. Choose based on the task:
| Toolset Pattern | Use Case |
|----------------|----------|
| `["terminal", "file"]` | Code work, debugging, file editing, builds |
| `["web"]` | Research, fact-checking, documentation lookup |
| `["terminal", "file", "web"]` | Full-stack tasks (default) |
| `["file"]` | Read-only analysis, code review without execution |
| `["terminal"]` | System administration, process management |
Certain toolsets are blocked for subagents regardless of what you specify:
- `delegation` — blocked for leaf subagents (the default). Retained for `role="orchestrator"` children, bounded by `max_spawn_depth` — see [Depth Limit and Nested Orchestration](#depth-limit-and-nested-orchestration) below.
- `clarify` — subagents cannot interact with the user
- `memory` — no writes to shared persistent memory
- `code_execution` — children should reason step-by-step
- `send_message` — no cross-platform side effects (e.g., sending Telegram messages)
## Max Iterations
Each subagent has an iteration limit (default: 50) that controls how many tool-calling turns it can take:
```python
delegate_task(
goal="Quick file check",
context="Check if /etc/nginx/nginx.conf exists and print its first 10 lines",
max_iterations=10 # Simple task, don't need many turns
)
```
## Child Timeout
Subagents are killed as stuck if they go quiet for more than `delegation.child_timeout_seconds` wall-clock seconds. The default is **600** (10 minutes) — bumped up from 300s in earlier releases because high-reasoning models on non-trivial research tasks were getting killed mid-think. Tune it per-install:
```yaml
delegation:
child_timeout_seconds: 600 # default
```
Lower it for fast local models; raise it for slow reasoning models on hard problems. The timer resets every time the child makes an API call or tool call — only genuinely idle workers trigger the kill.
:::tip Diagnostic dump on zero-call timeout
If a subagent times out having made **zero** API calls (usually: provider unreachable, auth failure, or tool-schema rejection), `delegate_task` writes a structured diagnostic to `~/.hermes/logs/subagent-timeout--.log` containing the subagent's config snapshot, credential-resolution trace, and any early error messages. Much easier to root-cause than the previous silent-timeout behavior.
:::
## Monitoring Running Subagents (`/agents`)
The TUI ships a `/agents` overlay (alias `/tasks`) that turns recursive `delegate_task` fan-out into a first-class audit surface:
- Live tree view of running and recently-finished subagents, grouped by parent
- Per-branch cost, token, and file-touched rollups
- Kill and pause controls — cancel a specific subagent mid-flight without interrupting its siblings
- Post-hoc review: step through each subagent's turn-by-turn history even after they've returned to the parent
The classic CLI just prints `/agents` as a text summary; the TUI is where the overlay shines. See [TUI — Slash commands](/docs/user-guide/tui#slash-commands).
## Depth Limit and Nested Orchestration
By default, delegation is **flat**: a parent (depth 0) spawns children (depth 1), and those children cannot delegate further. This prevents runaway recursive delegation.
For multi-stage workflows (research → synthesis, or parallel orchestration over sub-problems), a parent can spawn **orchestrator** children that *can* delegate their own workers:
```python
delegate_task(
goal="Survey three code review approaches and recommend one",
role="orchestrator", # Allows this child to spawn its own workers
context="...",
)
```
- `role="leaf"` (default): child cannot delegate further — identical to the flat-delegation behavior.
- `role="orchestrator"`: child retains the `delegation` toolset. Gated by `delegation.max_spawn_depth` (default **1** = flat, so `role="orchestrator"` is a no-op at defaults). Raise `max_spawn_depth` to 2 to allow orchestrator children to spawn leaf grandchildren; 3 for three levels (cap).
- `delegation.orchestrator_enabled: false`: global kill switch that forces every child to `leaf` regardless of the `role` parameter.
**Cost warning:** With `max_spawn_depth: 3` and `max_concurrent_children: 3`, the tree can reach 3×3×3 = 27 concurrent leaf agents. Each extra level multiplies spend — raise `max_spawn_depth` intentionally.
## Lifetime and Durability
:::warning delegate_task is synchronous — not durable
`delegate_task` runs **inside the parent's current turn**. It blocks the parent until every child finishes (or is cancelled). It is **not** a background job queue:
- If the parent is interrupted (user sends a new message, `/stop`, `/new`), all active children are cancelled and return `status="interrupted"`. Their in-progress work is discarded.
- Children do **not** continue running after the parent turn ends.
- Cancelled children return a structured result (`status="interrupted"`, `exit_reason="interrupted"`), but because the parent was interrupted too, that result often never makes it into a user-visible reply.
For **durable long-running work** that must survive interrupts or outlive the current turn, use:
- `cronjob` (action=`create`) — schedules a separate agent run; immune to parent-turn interrupts.
- `terminal(background=True, notify_on_complete=True)` — long-running shell commands that keep running while the agent does other things.
:::
## Key Properties
- Each subagent gets its **own terminal session** (separate from the parent)
- **Nested delegation is opt-in** — only `role="orchestrator"` children can delegate further, and only when `max_spawn_depth` is raised from its default of 1 (flat). Disable globally with `orchestrator_enabled: false`.
- Leaf subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`. Orchestrator subagents retain `delegate_task` but still cannot use the other four.
- **Interrupt propagation** — interrupting the parent interrupts all active children (including grandchildren under orchestrators)
- Only the final summary enters the parent's context, keeping token usage efficient
- Subagents inherit the parent's **API key, provider configuration, and credential pool** (enabling key rotation on rate limits)
## Delegation vs execute_code
| Factor | delegate_task | execute_code |
|--------|--------------|-------------|
| **Reasoning** | Full LLM reasoning loop | Just Python code execution |
| **Context** | Fresh isolated conversation | No conversation, just script |
| **Tool access** | All non-blocked tools with reasoning | 7 tools via RPC, no reasoning |
| **Parallelism** | 3 concurrent subagents by default (configurable) | Single script |
| **Best for** | Complex tasks needing judgment | Mechanical multi-step pipelines |
| **Token cost** | Higher (full LLM loop) | Lower (only stdout returned) |
| **User interaction** | None (subagents can't clarify) | None |
**Rule of thumb:** Use `delegate_task` when the subtask requires reasoning, judgment, or multi-step problem solving. Use `execute_code` when you need mechanical data processing or scripted workflows.
## Configuration
```yaml
# In ~/.hermes/config.yaml
delegation:
max_iterations: 50 # Max turns per child (default: 50)
# max_concurrent_children: 3 # Parallel children per batch (default: 3)
# max_spawn_depth: 1 # Tree depth (1-3, default 1 = flat). Raise to 2 to allow orchestrator children to spawn leaves; 3 for three levels.
# orchestrator_enabled: true # Disable to force all children to leaf role.
model: "google/gemini-3-flash-preview" # Optional provider/model override
provider: "openrouter" # Optional built-in provider
# Or use a direct custom endpoint instead of provider:
delegation:
model: "qwen2.5-coder"
base_url: "http://localhost:1234/v1"
api_key: "local-key"
```
:::tip
The agent handles delegation automatically based on the task complexity. You don't need to explicitly ask it to delegate — it will do so when it makes sense.
:::
---
# Kanban (Multi-Agent Board)
# Kanban — Multi-Agent Profile Collaboration
> **Want a walkthrough?** Read the [Kanban tutorial](./kanban-tutorial) — four user stories (solo dev, fleet farming, role pipeline with retry, circuit breaker) with dashboard screenshots of each. This page is the reference; the tutorial is the narrative.
Hermes Kanban is a durable task board, shared across all your Hermes profiles, that lets multiple named agents collaborate on work without fragile in-process subagent swarms. Every task is a row in `~/.hermes/kanban.db`; every handoff is a row anyone can read and write; every worker is a full OS process with its own identity.
### Two surfaces: the model talks through tools, you talk through the CLI
The board has two front doors, both backed by the same `~/.hermes/kanban.db`:
- **Agents drive the board through a dedicated `kanban_*` toolset** — `kanban_show`, `kanban_complete`, `kanban_block`, `kanban_heartbeat`, `kanban_comment`, `kanban_create`, `kanban_link`. The dispatcher spawns each worker with these tools already in its schema; the model reads its task and hands work off by calling them directly, *not* by shelling out to `hermes kanban`. See [How workers interact with the board](#how-workers-interact-with-the-board) below.
- **You (and scripts, and cron) drive the board through `hermes kanban …`** on the CLI, `/kanban …` as a slash command, or the dashboard. These are for humans and automation — the places without a tool-calling model behind them.
Both surfaces route through the same `kanban_db` layer, so reads see a consistent view and writes can't drift. The rest of this page shows CLI examples because they're easy to copy-paste, but every CLI verb has a tool-call equivalent the model uses.
This is the shape that covers the workloads `delegate_task` can't:
- **Research triage** — parallel researchers + analyst + writer, human-in-the-loop.
- **Scheduled ops** — recurring daily briefs that build a journal over weeks.
- **Digital twins** — persistent named assistants (`inbox-triage`, `ops-review`) that accumulate memory over time.
- **Engineering pipelines** — decompose → implement in parallel worktrees → review → iterate → PR.
- **Fleet work** — one specialist managing N subjects (50 social accounts, 12 monitored services).
For the full design rationale, comparative analysis against Cline Kanban / Paperclip / NanoClaw / Google Gemini Enterprise, and the eight canonical collaboration patterns, see `docs/hermes-kanban-v1-spec.pdf` in the repository.
## Kanban vs. `delegate_task`
They look similar; they are not the same primitive.
| | `delegate_task` | Kanban |
|---|---|---|
| Shape | RPC call (fork → join) | Durable message queue + state machine |
| Parent | Blocks until child returns | Fire-and-forget after `create` |
| Child identity | Anonymous subagent | Named profile with persistent memory |
| Resumability | None — failed = failed | Block → unblock → re-run; crash → reclaim |
| Human in the loop | Not supported | Comment / unblock at any point |
| Agents per task | One call = one subagent | N agents over task's life (retry, review, follow-up) |
| Audit trail | Lost on context compression | Durable rows in SQLite forever |
| Coordination | Hierarchical (caller → callee) | Peer — any profile reads/writes any task |
**One-sentence distinction:** `delegate_task` is a function call; Kanban is a work queue where every handoff is a row any profile (or human) can see and edit.
**Use `delegate_task` when** the parent agent needs a short reasoning answer before continuing, no humans involved, result goes back into the parent's context.
**Use Kanban when** work crosses agent boundaries, needs to survive restarts, might need human input, might be picked up by a different role, or needs to be discoverable after the fact.
They coexist: a kanban worker may call `delegate_task` internally during its run.
## Core concepts
- **Board** — a standalone queue of tasks with its own SQLite DB, workspaces
directory, and dispatcher loop. A single install can have many boards
(e.g. one per project, repo, or domain); see [Boards (multi-project)](#boards-multi-project)
below. Single-project users stay on the `default` board and never see the
word "board" outside this docs section.
- **Task** — a row with title, optional body, one assignee (a profile name), status (`triage | todo | ready | running | blocked | done | archived`), optional tenant namespace, optional idempotency key (dedup for retried automation).
- **Link** — `task_links` row recording a parent → child dependency. The dispatcher promotes `todo → ready` when all parents are `done`.
- **Comment** — the inter-agent protocol. Agents and humans append comments; when a worker is (re-)spawned it reads the full comment thread as part of its context.
- **Workspace** — the directory a worker operates in. Three kinds:
- `scratch` (default) — fresh tmp dir under `~/.hermes/kanban/workspaces//` (or `~/.hermes/kanban/boards//workspaces//` on non-default boards).
- `dir:` — an existing shared directory (Obsidian vault, mail ops dir, per-account folder). **Must be an absolute path.** Relative paths like `dir:../tenants/foo/` are rejected at dispatch because they'd resolve against whatever CWD the dispatcher happens to be in, which is ambiguous and a confused-deputy escape vector. The path is otherwise trusted — it's your box, your filesystem, the worker runs with your uid. This is the trusted-local-user threat model; kanban is single-host by design.
- `worktree` — a git worktree under `.worktrees//` for coding tasks. Worker-side `git worktree add` creates it.
- **Dispatcher** — a long-lived loop that, every N seconds (default 60): reclaims stale claims, reclaims crashed workers (PID gone but TTL not yet expired), promotes ready tasks, atomically claims, spawns assigned profiles. Runs **inside the gateway** by default (`kanban.dispatch_in_gateway: true`). One dispatcher sweeps all boards per tick; workers are spawned with `HERMES_KANBAN_BOARD` pinned so they can't see other boards. After ~5 consecutive spawn failures on the same task the dispatcher auto-blocks it with the last error as the reason — prevents thrashing on tasks whose profile doesn't exist, workspace can't mount, etc.
- **Tenant** — optional string namespace *within* a board. One specialist fleet can serve multiple businesses (`--tenant business-a`) with data isolation by workspace path and memory key prefix. Tenants are a soft filter; boards are the hard isolation boundary.
## Boards (multi-project)
Boards let you separate unrelated streams of work — one per project, repo,
or domain — into isolated queues. A new install has exactly one board
called `default` (DB at `~/.hermes/kanban.db` for back-compat). Users who
only want one stream of work never need to know about boards; the feature
is opt-in.
Per-board isolation is absolute:
- Separate SQLite DB per board (`~/.hermes/kanban/boards//kanban.db`).
- Separate `workspaces/` and `logs/` directories.
- Workers spawned for a task see **only** their board's tasks — the
dispatcher sets `HERMES_KANBAN_BOARD` in the child env and every
`kanban_*` tool the worker has access to reads it.
- Linking tasks across boards is not allowed (keeps the schema simple; if
you really need cross-project refs, use free-text mentions and look
them up by id manually).
### Managing boards from the CLI
```bash
# See what's on disk. Fresh installs show only "default".
hermes kanban boards list
# Create a new board.
hermes kanban boards create atm10-server \
--name "ATM10 Server" \
--description "Minecraft modded server ops" \
--icon 🎮 \
--switch # optional: make it the active board
# Operate on a specific board without switching.
hermes kanban --board atm10-server list
hermes kanban --board atm10-server create "Restart ATM server" --assignee ops
# Change which board is "current" for subsequent calls.
hermes kanban boards switch atm10-server
hermes kanban boards show # who's active right now?
# Rename the display name (the slug is immutable — it's the directory name).
hermes kanban boards rename atm10-server "ATM10 (Prod)"
# Archive (default) — moves the board's dir to boards/_archived/-/.
# Recoverable by moving the dir back.
hermes kanban boards rm atm10-server
# Hard delete — `rm -rf` the board dir. No recovery.
hermes kanban boards rm atm10-server --delete
```
Board resolution order (highest precedence first):
1. Explicit `--board ` on the CLI call.
2. `HERMES_KANBAN_BOARD` env var (set by the dispatcher when spawning a
worker, so workers can't see other boards).
3. `~/.hermes/kanban/current` — the slug persisted by `hermes kanban
boards switch`.
4. `default`.
Slugs are validated: lowercase alphanumerics + hyphens + underscores, 1-64
chars, must start with alphanumeric. Uppercase input is auto-downcased.
Anything else (slashes, spaces, dots, `..`) is rejected at the CLI layer
so path-traversal tricks can't name a board.
### Managing boards from the dashboard
`hermes dashboard` → Kanban tab shows a board switcher at the top as soon
as more than one board exists (or any board has tasks). Single-board users
see only a small `+ New board` button; the switcher is hidden until it
matters.
- **Board dropdown** — pick the active board. Your selection is saved to
the browser's `localStorage` so it persists across reloads without
shifting the CLI's `current` pointer out from under a terminal you left
open.
- **+ New board** — opens a modal asking for slug, display name,
description, and icon. Option to auto-switch to the new board.
- **Archive** — only shown on non-`default` boards. Confirms, then moves
the board dir to `boards/_archived/`.
All dashboard API endpoints accept `?board=` for board scoping. The
events WebSocket is pinned to a board at connection time; switching in
the UI opens a fresh WS against the new board.
## Quick start
The commands below are **you** (the human) setting up the board and creating tasks. Once a task is assigned, the dispatcher spawns the assigned profile as a worker, and from there **the model drives the task through `kanban_*` tool calls, not CLI commands** — see [How workers interact with the board](#how-workers-interact-with-the-board).
```bash
# 1. Create the board (you)
hermes kanban init
# 2. Start the gateway (hosts the embedded dispatcher)
hermes gateway start
# 3. Create a task (you — or an orchestrator agent via kanban_create)
hermes kanban create "research AI funding landscape" --assignee researcher
# 4. Watch activity live (you)
hermes kanban watch
# 5. See the board (you)
hermes kanban list
hermes kanban stats
```
When the dispatcher picks up `t_abcd` and spawns the `researcher` profile, the very first thing that worker's model does is call `kanban_show()` to read its task. It doesn't run `hermes kanban show t_abcd`.
### Gateway-embedded dispatcher (default)
The dispatcher runs inside the gateway process. Nothing to install, no
separate service to manage — if the gateway is up, ready tasks get picked
up on the next tick (60s by default).
```yaml
# config.yaml
kanban:
dispatch_in_gateway: true # default
dispatch_interval_seconds: 60 # default
```
Override the config flag at runtime via `HERMES_KANBAN_DISPATCH_IN_GATEWAY=0`
for debugging. Standard gateway supervision applies: run `hermes gateway
start` directly, or wire the gateway up as a systemd user unit (see the
gateway docs). Without a running gateway, `ready` tasks stay where they are
until one comes up — `hermes kanban create` warns about this at creation
time.
Running `hermes kanban daemon` as a separate process is **deprecated**;
use the gateway. If you truly cannot run the gateway (headless host
policy forbids long-lived services, etc.) a `--force` escape hatch keeps
the old standalone daemon alive for one release cycle, but running both
a gateway-embedded dispatcher AND a standalone daemon against the same
`kanban.db` causes claim races and is not supported.
### Idempotent create (for automation / webhooks)
```bash
# First call creates the task. Any subsequent call with the same key
# returns the existing task id instead of duplicating.
hermes kanban create "nightly ops review" \
--assignee ops \
--idempotency-key "nightly-ops-$(date -u +%Y-%m-%d)" \
--json
```
### Bulk CLI verbs
All the lifecycle verbs accept multiple ids so you can clean up a batch
in one command:
```bash
hermes kanban complete t_abc t_def t_hij --result "batch wrap"
hermes kanban archive t_abc t_def t_hij
hermes kanban unblock t_abc t_def
hermes kanban block t_abc "need input" --ids t_def t_hij
```
## How workers interact with the board
**Workers do not shell out to `hermes kanban`.** When the dispatcher spawns a worker it sets `HERMES_KANBAN_TASK=t_abcd` in the child's env, and that env var flips on a dedicated **kanban toolset** in the model's schema — seven tools that read and mutate the board directly via the Python `kanban_db` layer, same as the CLI does. A running worker calls these like any other tool; it never sees or needs the `hermes kanban` CLI.
| Tool | Purpose | Required params |
|---|---|---|
| `kanban_show` | Read the current task (title, body, prior attempts, parent handoffs, comments, full pre-formatted `worker_context`). Defaults to the env's task id. | — |
| `kanban_complete` | Finish with `summary` + `metadata` structured handoff. | at least one of `summary` / `result` |
| `kanban_block` | Escalate for human input with a `reason`. | `reason` |
| `kanban_heartbeat` | Signal liveness during long operations. Pure side-effect. | — |
| `kanban_comment` | Append a durable note to the task thread. | `task_id`, `body` |
| `kanban_create` | (Orchestrators) fan out into child tasks with an `assignee`, optional `parents`, `skills`, etc. | `title`, `assignee` |
| `kanban_link` | (Orchestrators) add a `parent_id → child_id` dependency edge after the fact. | `parent_id`, `child_id` |
A typical worker turn looks like:
```
# Model's tool calls, in order:
kanban_show() # no args — uses HERMES_KANBAN_TASK
# (model reads the returned worker_context, does the work via terminal/file tools)
kanban_heartbeat(note="halfway through — 4 of 8 files transformed")
# (more work)
kanban_complete(
summary="migrated limiter.py to token-bucket; added 14 tests, all pass",
metadata={"changed_files": ["limiter.py", "tests/test_limiter.py"], "tests_run": 14},
)
```
An **orchestrator** worker fans out instead:
```
kanban_show()
kanban_create(
title="research ICP funding 2024-2026",
assignee="researcher-a",
body="focus on seed + series A, North America, AI-adjacent",
)
# → returns {"task_id": "t_r1", ...}
kanban_create(title="research ICP funding — EU angle", assignee="researcher-b", body="…")
# → returns {"task_id": "t_r2", ...}
kanban_create(
title="synthesize findings into launch brief",
assignee="writer",
parents=["t_r1", "t_r2"], # promotes to ready when both complete
body="one-pager, 300 words, neutral tone",
)
kanban_complete(summary="decomposed into 2 research tasks + 1 writer; linked dependencies")
```
The three "(Orchestrators)" tools — `kanban_create`, `kanban_link`, and `kanban_comment` on foreign tasks — are available to every worker; the convention (enforced by the `kanban-orchestrator` skill) is that worker profiles don't fan out and orchestrator profiles don't execute.
### Why tools instead of shelling to `hermes kanban`
Three reasons:
1. **Backend portability.** Workers whose terminal tool points at a remote backend (Docker / Modal / Singularity / SSH) would run `hermes kanban complete` *inside* the container, where `hermes` isn't installed and `~/.hermes/kanban.db` isn't mounted. The kanban tools run in the agent's own Python process and always reach `~/.hermes/kanban.db` regardless of terminal backend.
2. **No shell-quoting fragility.** Passing `--metadata '{"files": [...]}'` through shlex + argparse is a latent footgun. Structured tool args skip it entirely.
3. **Better errors.** Tool results are structured JSON the model can reason about, not stderr strings it has to parse.
**Zero schema footprint on normal sessions.** A regular `hermes chat` session has zero `kanban_*` tools in its schema. The `check_fn` on each tool only returns True when `HERMES_KANBAN_TASK` is set, which only happens when the dispatcher spawned this process. No tool bloat for users who never touch kanban.
The `kanban-worker` and `kanban-orchestrator` skills teach the model which tool to call when and in what order.
### Recommended handoff evidence
`kanban_complete(summary=..., metadata={...})` is intentionally flexible:
the summary is the human-readable closeout, and `metadata` is the
machine-readable handoff that downstream agents, reviewers, or dashboards can
reuse without scraping prose.
For engineering and review tasks, prefer this optional metadata shape:
```json
{
"changed_files": ["path/to/file.py"],
"verification": ["pytest tests/hermes_cli/test_kanban_db.py -q"],
"dependencies": ["parent task id or external issue, if any"],
"blocked_reason": null,
"retry_notes": "what failed before, if this was a retry",
"residual_risk": ["what was not tested or still needs human review"]
}
```
These keys are a convention, not a schema requirement. The useful property is
that every worker leaves enough evidence for the next reader to answer four
questions quickly:
1. What changed?
2. How was it verified?
3. What can unblock or retry this if it fails?
4. What risk is still deliberately left open?
Keep secrets, raw logs, tokens, OAuth material, and unrelated transcripts out of
`metadata`. Store pointers and summaries instead. If a task has no files or
tests, say so explicitly in `summary` and use `metadata` for the evidence that
does exist, such as source URLs, issue ids, or manual review steps.
### The worker skill
Any profile that should be able to work kanban tasks must load the `kanban-worker` skill. It teaches the worker the full lifecycle in **tool calls**, not CLI commands:
1. On spawn, call `kanban_show()` to read title + body + parent handoffs + prior attempts + full comment thread.
2. `cd $HERMES_KANBAN_WORKSPACE` (via the terminal tool) and do the work there.
3. Call `kanban_heartbeat(note="...")` every few minutes during long operations.
4. Complete with `kanban_complete(summary="...", metadata={...})`, or `kanban_block(reason="...")` if stuck.
Load it with (this one is **you**, installing into a profile — not a tool call):
```bash
hermes skills install devops/kanban-worker
```
The dispatcher also auto-passes `--skills kanban-worker` when spawning every worker, so the worker always has the pattern library available even if a profile's default skills config doesn't include it.
### Pinning extra skills to a specific task
Sometimes a single task needs specialist context the assignee profile doesn't carry by default — a translation job that needs the `translation` skill, a review task that needs `github-code-review`, a security audit that needs `security-pr-audit`. Rather than editing the assignee's profile every time, attach the skills directly to the task.
**From an orchestrator agent** (the usual case — one agent routing work to another), use the `kanban_create` tool's `skills` array:
```
kanban_create(
title="translate README to Japanese",
assignee="linguist",
skills=["translation"],
)
kanban_create(
title="audit auth flow",
assignee="reviewer",
skills=["security-pr-audit", "github-code-review"],
)
```
**From a human (CLI / slash command)**, repeat `--skill` for each one:
```bash
hermes kanban create "translate README to Japanese" \
--assignee linguist \
--skill translation
hermes kanban create "audit auth flow" \
--assignee reviewer \
--skill security-pr-audit \
--skill github-code-review
```
**From the dashboard**, type the skills comma-separated into the **skills** field of the inline create form.
These skills are **additive** to the built-in `kanban-worker` — the dispatcher emits one `--skills ` flag for each (and for the built-in), so the worker spawns with all of them loaded. The skill names must match skills that are actually installed on the assignee's profile (run `hermes skills list` to see what's available); there's no runtime install.
### The orchestrator skill
A **well-behaved orchestrator does not do the work itself.** It decomposes the user's goal into tasks, links them, assigns each to a specialist, and steps back. The `kanban-orchestrator` skill encodes this as tool-call patterns: anti-temptation rules, a standard specialist roster (`researcher`, `writer`, `analyst`, `backend-eng`, `reviewer`, `ops`), and a decomposition playbook keyed on `kanban_create` / `kanban_link` / `kanban_comment`.
A canonical orchestrator turn (two parallel researchers handing off to a writer):
```
# Goal from user: "draft a launch post on the ICP funding landscape"
kanban_create(title="research ICP funding, NA angle", assignee="researcher-a", body="…") # → t_r1
kanban_create(title="research ICP funding, EU angle", assignee="researcher-b", body="…") # → t_r2
kanban_create(
title="synthesize ICP funding research into launch post draft",
assignee="writer",
parents=["t_r1", "t_r2"], # promoted to 'ready' when both researchers complete
body="one-pager, neutral tone, cite sources inline",
) # → t_w1
# Optional: add cross-cutting deps discovered later without re-creating tasks
kanban_link(parent_id="t_r1", child_id="t_followup")
kanban_complete(
summary="decomposed into 2 parallel research tasks → 1 synthesis task; writer starts when both researchers finish",
)
```
Load it into your orchestrator profile:
```bash
hermes skills install devops/kanban-orchestrator
```
For best results, pair it with a profile whose toolsets are restricted to board operations (`kanban`, `gateway`, `memory`) so the orchestrator literally cannot execute implementation tasks even if it tries.
## Dashboard (GUI)
The `/kanban` CLI and slash command are enough to run the board headlessly, but a visual board is often the right interface for humans-in-the-loop: triage, cross-profile supervision, reading comment threads, and dragging cards between columns. Hermes ships this as a **bundled dashboard plugin** at `plugins/kanban/` — not a core feature, not a separate service — following the model laid out in [Extending the Dashboard](./extending-the-dashboard).
Open it with:
```bash
hermes kanban init # one-time: create kanban.db if not already present
hermes dashboard # "Kanban" tab appears in the nav, after "Skills"
```
### What the plugin gives you
- A **Kanban** tab showing one column per status: `triage`, `todo`, `ready`, `running`, `blocked`, `done` (plus `archived` when the toggle is on).
- `triage` is the parking column for rough ideas a specifier is expected to flesh out. Tasks created with `hermes kanban create --triage` (or via the Triage column's inline create) land here and the dispatcher leaves them alone until a human or specifier promotes them to `todo` / `ready`.
- Cards show the task id, title, priority badge, tenant tag, assigned profile, comment/link counts, a **progress pill** (`N/M` children done when the task has dependents), and "created N ago". A per-card checkbox enables multi-select.
- **Per-profile lanes inside Running** — toolbar checkbox toggles sub-grouping of the Running column by assignee.
- **Live updates via WebSocket** — the plugin tails the append-only `task_events` table on a short poll interval; the board reflects changes the instant any profile (CLI, gateway, or another dashboard tab) acts. Reloads are debounced so a burst of events triggers a single refetch.
- **Drag-drop** cards between columns to change status. The drop sends `PATCH /api/plugins/kanban/tasks/:id` which routes through the same `kanban_db` code the CLI uses — the three surfaces can never drift. Moves into destructive statuses (`done`, `archived`, `blocked`) prompt for confirmation. Touch devices use a pointer-based fallback so the board is usable from a tablet.
- **Inline create** — click `+` on any column header to type a title, assignee, priority, and (optionally) a parent task from a dropdown over every existing task. Creating from the Triage column automatically parks the new task in triage.
- **Multi-select with bulk actions** — shift/ctrl-click a card or tick its checkbox to add it to the selection. A bulk action bar appears at the top with batch status transitions, archive, and reassign (by profile dropdown, or "(unassign)"). Destructive batches confirm first. Per-id partial failures are reported without aborting the rest.
- **Click a card** (without shift/ctrl) to open a side drawer (Escape or click-outside closes) with:
- **Editable title** — click the heading to rename.
- **Editable assignee / priority** — click the meta row to rewrite.
- **Editable description** — markdown-rendered by default (headings, bold, italic, inline code, fenced code, `http(s)` / `mailto:` links, bullet lists), with an "edit" button that swaps in a textarea. Markdown rendering is a tiny, XSS-safe renderer — every substitution runs on HTML-escaped input, only `http(s)` / `mailto:` links pass through, and `target="_blank"` + `rel="noopener noreferrer"` are always set.
- **Dependency editor** — chip list of parents and children, each with an `×` to unlink, plus dropdowns over every other task to add a new parent or child. Cycle attempts are rejected server-side with a clear message.
- **Status action row** (→ triage / → ready / → running / block / unblock / complete / archive) with confirm prompts for destructive transitions.
- Result section (also markdown-rendered), comment thread with Enter-to-submit, the last 20 events.
- **Toolbar filters** — free-text search, tenant dropdown (defaults to `dashboard.kanban.default_tenant` from `config.yaml`), assignee dropdown, "show archived" toggle, "lanes by profile" toggle, and a **Nudge dispatcher** button so you don't have to wait for the next 60 s tick.
Visually the target is the familiar Linear / Fusion layout: dark theme, column headers with counts, coloured status dots, pill chips for priority and tenant. The plugin reads only theme CSS vars (`--color-*`, `--radius`, `--font-mono`, ...), so it reskins automatically with whichever dashboard theme is active.
### Architecture
The GUI is strictly a **read-through-the-DB + write-through-kanban_db** layer with no domain logic of its own:
```
┌────────────────────────┐ WebSocket (tails task_events)
│ React SPA (plugin) │ ◀──────────────────────────────────┐
│ HTML5 drag-and-drop │ │
└──────────┬─────────────┘ │
│ REST over fetchJSON │
▼ │
┌────────────────────────┐ writes call kanban_db.* │
│ FastAPI router │ directly — same code path │
│ plugins/kanban/ │ the CLI /kanban verbs use │
│ dashboard/plugin_api.py │
└──────────┬─────────────┘ │
│ │
▼ │
┌────────────────────────┐ │
│ ~/.hermes/kanban.db │ ───── append task_events ──────────┘
│ (WAL, shared) │
└────────────────────────┘
```
### REST surface
All routes are mounted under `/api/plugins/kanban/` and protected by the dashboard's ephemeral session token:
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/board?tenant=&include_archived=…` | Full board grouped by status column, plus tenants + assignees for filter dropdowns |
| `GET` | `/tasks/:id` | Task + comments + events + links |
| `POST` | `/tasks` | Create (wraps `kanban_db.create_task`, accepts `triage: bool` and `parents: [id, …]`) |
| `PATCH` | `/tasks/:id` | Status / assignee / priority / title / body / result |
| `POST` | `/tasks/bulk` | Apply the same patch (status / archive / assignee / priority) to every id in `ids`. Per-id failures reported without aborting siblings |
| `POST` | `/tasks/:id/comments` | Append a comment |
| `POST` | `/links` | Add a dependency (`parent_id` → `child_id`) |
| `DELETE` | `/links?parent_id=…&child_id=…` | Remove a dependency |
| `POST` | `/dispatch?max=…&dry_run=…` | Nudge the dispatcher — skip the 60 s wait |
| `GET` | `/config` | Read `dashboard.kanban` preferences from `config.yaml` — `default_tenant`, `lane_by_profile`, `include_archived_by_default`, `render_markdown` |
| `WS` | `/events?since=` | Live stream of `task_events` rows |
Every handler is a thin wrapper — the plugin is ~700 lines of Python (router + WebSocket tail + bulk batcher + config reader) and adds no new business logic. A tiny `_conn()` helper auto-initializes `kanban.db` on every read and write, so a fresh install works whether the user opened the dashboard first, hit the REST API directly, or ran `hermes kanban init`.
### Dashboard config
Any of these keys under `dashboard.kanban` in `~/.hermes/config.yaml` changes the tab's defaults — the plugin reads them at load time via `GET /config`:
```yaml
dashboard:
kanban:
default_tenant: acme # preselects the tenant filter
lane_by_profile: true # default for the "lanes by profile" toggle
include_archived_by_default: false
render_markdown: true # set false for plain
rendering
```
Each key is optional and falls back to the shown default.
### Security model
The dashboard's HTTP auth middleware [explicitly skips `/api/plugins/`](./extending-the-dashboard#backend-api-routes) — plugin routes are unauthenticated by design because the dashboard binds to localhost by default. That means the kanban REST surface is reachable from any process on the host.
The WebSocket takes one additional step: it requires the dashboard's ephemeral session token as a `?token=…` query parameter (browsers can't set `Authorization` on an upgrade request), matching the pattern used by the in-browser PTY bridge.
If you run `hermes dashboard --host 0.0.0.0`, every plugin route — kanban included — becomes reachable from the network. **Don't do that on a shared host.** The board contains task bodies, comments, and workspace paths; an attacker reaching these routes gets read access to your entire collaboration surface and can also create / reassign / archive tasks.
Tasks in `~/.hermes/kanban.db` are profile-agnostic on purpose (that's the coordination primitive). If you open the dashboard with `hermes -p dashboard`, the board still shows tasks created by any other profile on the host. Same user owns all profiles, but this is worth knowing if multiple personas coexist.
### Live updates
`task_events` is an append-only SQLite table with a monotonic `id`. The WebSocket endpoint holds each client's last-seen event id and pushes new rows as they land. When a burst of events arrives, the frontend reloads the (very cheap) board endpoint — simpler and more correct than trying to patch local state from every event kind. WAL mode means the read loop never blocks the dispatcher's `BEGIN IMMEDIATE` claim transactions.
### Extending it
The plugin uses the standard Hermes dashboard plugin contract — see [Extending the Dashboard](./extending-the-dashboard) for the full manifest reference, shell slots, page-scoped slots, and the Plugin SDK. Extra columns, custom card chrome, tenant-filtered layouts, or full `tab.override` replacements are all expressible without forking this plugin.
To disable without removing: add `dashboard.plugins.kanban.enabled: false` to `config.yaml` (or delete `plugins/kanban/dashboard/manifest.json`).
### Scope boundary
The GUI is deliberately thin. Everything the plugin does is reachable from the CLI; the plugin just makes it comfortable for humans. Auto-assignment, budgets, governance gates, and org-chart views remain user-space — a router profile, another plugin, or a reuse of `tools/approval.py` — exactly as listed in the out-of-scope section of the design spec.
## CLI command reference
This is the surface **you** (or scripts, cron, the dashboard) use to drive the board. Workers running inside the dispatcher use the `kanban_*` [tool surface](#how-workers-interact-with-the-board) for the same operations — the CLI here and the tools there both route through `kanban_db`, so the two surfaces agree by construction.
```
hermes kanban init # create kanban.db + print daemon hint
hermes kanban create "" [--body ...] [--assignee ]
[--parent ]... [--tenant ]
[--workspace scratch|worktree|dir:]
[--priority N] [--triage] [--idempotency-key KEY]
[--max-runtime 30m|2h|1d|]
[--skill ]...
[--json]
hermes kanban list [--mine] [--assignee P] [--status S] [--tenant T] [--archived] [--json]
hermes kanban show [--json]
hermes kanban assign # or 'none' to unassign
hermes kanban link
hermes kanban unlink
hermes kanban claim [--ttl SECONDS]
hermes kanban comment "" [--author NAME]
# Bulk verbs — accept multiple ids:
hermes kanban complete ... [--result "..."]
hermes kanban block "" [--ids ...]
hermes kanban unblock ...
hermes kanban archive ...
hermes kanban tail # follow a single task's event stream
hermes kanban watch [--assignee P] [--tenant T] # live stream ALL events to the terminal
[--kinds completed,blocked,…] [--interval SECS]
hermes kanban heartbeat [--note "..."] # worker liveness signal for long ops
hermes kanban runs [--json] # attempt history (one row per run)
hermes kanban assignees [--json] # profiles on disk + per-assignee task counts
hermes kanban dispatch [--dry-run] [--max N] # one-shot pass
[--failure-limit N] [--json]
hermes kanban daemon --force # DEPRECATED — standalone dispatcher (use `hermes gateway start` instead)
[--failure-limit N] [--pidfile PATH] [-v]
hermes kanban stats [--json] # per-status + per-assignee counts
hermes kanban log [--tail BYTES] # worker log from ~/.hermes/kanban/logs/
hermes kanban notify-subscribe # gateway bridge hook (used by /kanban in the gateway)
--platform --chat-id [--thread-id ] [--user-id ]
hermes kanban notify-list [] [--json]
hermes kanban notify-unsubscribe
--platform --chat-id [--thread-id ]
hermes kanban context # what a worker sees
hermes kanban gc [--event-retention-days N] # workspaces + old events + old logs
[--log-retention-days N]
```
All commands are also available as a slash command in the interactive CLI and in the messaging gateway (see [`/kanban` slash command](#kanban-slash-command) below).
## `/kanban` slash command {#kanban-slash-command}
Every `hermes kanban ` verb is also reachable as `/kanban ` — from inside an interactive `hermes chat` session **and** from any gateway platform (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, email, SMS). Both surfaces call the exact same `hermes_cli.kanban.run_slash()` entry point that reuses the `hermes kanban` argparse tree, so the argument surface, flags, and output format are identical across CLI, `/kanban`, and `hermes kanban`. You don't have to leave the chat to drive the board.
```
/kanban list
/kanban show t_abcd
/kanban create "write launch post" --assignee writer --parent t_research
/kanban comment t_abcd "looks good, ship it"
/kanban unblock t_abcd
/kanban dispatch --max 3
```
Quote multi-word arguments the same way you would on a shell — `run_slash` parses the rest of the line with `shlex.split`, so `"..."` and `'...'` both work.
### Mid-run usage: `/kanban` bypasses the running-agent guard
The gateway normally queues slash commands and user messages while an agent is still thinking — that's what stops you from accidentally starting a second turn while the first is in flight. **`/kanban` is explicitly exempted from this guard.** The board lives in `~/.hermes/kanban.db`, not in the running agent's state, so reads (`list`, `show`, `context`, `tail`, `watch`, `stats`, `runs`) and writes (`comment`, `unblock`, `block`, `assign`, `archive`, `create`, `link`, …) all go through immediately, even mid-turn.
This is the whole point of the separation:
- A worker blocks waiting on a peer → you send `/kanban unblock t_abcd` from your phone and the dispatcher picks the peer up on its next tick. The blocked worker isn't interrupted — it just stops being blocked.
- You spot a card that needs human context → `/kanban comment t_xyz "use the 2026 schema, not 2025"` lands on the task thread and the *next* run of that task will read it in `kanban_show()`.
- You want to know what your fleet is doing without stopping the orchestrator → `/kanban list --mine` or `/kanban stats` inspects the board without touching your main conversation.
### Auto-subscribe on `/kanban create` (gateway only)
When you create a task from the gateway with `/kanban create "…"`, the originating chat (platform + chat id + thread id) is automatically subscribed to that task's terminal events (`completed`, `blocked`, `gave_up`, `crashed`, `timed_out`). You'll get one message back per terminal event — including the first line of the worker's result summary on `completed` — without having to poll or remember the task id.
```
you> /kanban create "transcribe today's podcast" --assignee transcriber
bot> Created t_9fc1a3 (ready, assignee=transcriber)
(subscribed — you'll be notified when t_9fc1a3 completes or blocks)
… ~8 minutes later …
bot> ✓ t_9fc1a3 completed by transcriber
transcribed 42 minutes, saved to podcast/2026-05-04.md
```
Subscriptions auto-remove themselves once the task reaches `done` or `archived`. If you script a create with `--json` (machine output) the auto-subscribe is skipped — the assumption is that scripted callers want to manage subscriptions explicitly via `/kanban notify-subscribe`.
### Output truncation in messaging
Gateway platforms have practical message-length caps. If `/kanban list`, `/kanban show`, or `/kanban tail` produce more than ~3800 characters of output, the response is truncated with a `… (truncated; use \`hermes kanban …\` in your terminal for full output)` footer. The CLI surface has no such cap.
### Autocomplete
In the interactive CLI, typing `/kanban ` and hitting Tab cycles through the built-in subcommand list (`list`, `ls`, `show`, `create`, `assign`, `link`, `unlink`, `claim`, `comment`, `complete`, `block`, `unblock`, `archive`, `tail`, `dispatch`, `context`, `init`, `gc`). The remaining verbs listed in the CLI reference above (`watch`, `stats`, `runs`, `log`, `assignees`, `heartbeat`, `notify-subscribe`, `notify-list`, `notify-unsubscribe`, `daemon`) also work — they're just not in the autocomplete hint list yet.
## Collaboration patterns
The board supports these eight patterns without any new primitives:
| Pattern | Shape | Example |
|---|---|---|
| **P1 Fan-out** | N siblings, same role | "research 5 angles in parallel" |
| **P2 Pipeline** | role chain: scout → editor → writer | daily brief assembly |
| **P3 Voting / quorum** | N siblings + 1 aggregator | 3 researchers → 1 reviewer picks |
| **P4 Long-running journal** | same profile + shared dir + cron | Obsidian vault |
| **P5 Human-in-the-loop** | worker blocks → user comments → unblock | ambiguous decisions |
| **P6 `@mention`** | inline routing from prose | `@reviewer look at this` |
| **P7 Thread-scoped workspace** | `/kanban here` in a thread | per-project gateway threads |
| **P8 Fleet farming** | one profile, N subjects | 50 social accounts |
| **P9 Triage specifier** | rough idea → `triage` → specifier expands body → `todo` | "turn this one-liner into a spec' task" |
For worked examples of each, see `docs/hermes-kanban-v1-spec.pdf`.
## Multi-tenant usage
When one specialist fleet serves multiple businesses, tag each task with a tenant:
```bash
hermes kanban create "monthly report" \
--assignee researcher \
--tenant business-a \
--workspace dir:~/tenants/business-a/data/
```
Workers receive `$HERMES_TENANT` and namespace their memory writes by prefix. The board, the dispatcher, and the profile definitions are all shared; only the data is scoped.
## Gateway notifications
When you run `/kanban create …` from the gateway (Telegram, Discord, Slack, etc.), the originating chat is automatically subscribed to the new task. The gateway's background notifier polls `task_events` every few seconds and delivers one message per terminal event (`completed`, `blocked`, `gave_up`, `crashed`, `timed_out`) to that chat. Completed tasks also send the first line of the worker's `--result` so you see the outcome without having to `/kanban show`.
You can manage subscriptions explicitly from the CLI — useful when a script / cron job wants to notify a chat it didn't originate from:
```bash
hermes kanban notify-subscribe t_abcd \
--platform telegram --chat-id 12345678 --thread-id 7
hermes kanban notify-list
hermes kanban notify-unsubscribe t_abcd \
--platform telegram --chat-id 12345678 --thread-id 7
```
A subscription removes itself automatically once the task reaches `done` or `archived`; no cleanup needed.
## Runs — one row per attempt
A task is a logical unit of work; a **run** is one attempt to execute it. When the dispatcher claims a ready task it creates a row in `task_runs` and points `tasks.current_run_id` at it. When that attempt ends — completed, blocked, crashed, timed out, spawn-failed, reclaimed — the run row closes with an `outcome` and the task's pointer clears. A task that's been attempted three times has three `task_runs` rows.
Why two tables instead of just mutating the task: you need **full attempt history** for real-world postmortems ("the second reviewer attempt got to approve, the third merged"), and you need a clean place to hang per-attempt metadata — which files changed, which tests ran, which findings a reviewer noted. Those are run facts, not task facts.
Runs are also where **structured handoff** lives. When a worker completes a task (via `kanban_complete(...)`) it can pass:
- `summary` (tool param) / `--summary` (CLI) — human handoff; goes on the run; downstream children see it in their `build_worker_context`.
- `metadata` (tool param) / `--metadata` (CLI) — free-form JSON dict on the run; children see it serialized alongside the summary.
- `result` (tool param) / `--result` (CLI) — short log line that goes on the task row (legacy field, kept for back-compat).
Downstream children read the most recent completed run's summary + metadata for each parent. Retrying workers read the prior attempts on their own task (outcome, summary, error) so they don't repeat a path that already failed.
```
# What a worker actually does — a tool call, from inside the agent loop:
kanban_complete(
summary="implemented token bucket, keys on user_id with IP fallback, all tests pass",
metadata={"changed_files": ["limiter.py", "tests/test_limiter.py"], "tests_run": 14},
result="rate limiter shipped",
)
```
The same handoff is reachable from the CLI when you (the human) need to close out a task a worker can't — e.g. a task that was abandoned, or one you marked done manually from the dashboard:
```bash
hermes kanban complete t_abcd \
--result "rate limiter shipped" \
--summary "implemented token bucket, keys on user_id with IP fallback, all tests pass" \
--metadata '{"changed_files": ["limiter.py", "tests/test_limiter.py"], "tests_run": 14}'
# Review the attempt history on a retried task:
hermes kanban runs t_abcd
# # OUTCOME PROFILE ELAPSED STARTED
# 1 blocked worker 12s 2026-04-27 14:02
# → BLOCKED: need decision on rate-limit key
# 2 completed worker 8m 2026-04-27 15:18
# → implemented token bucket, keys on user_id with IP fallback
```
Runs are exposed on the dashboard (Run History section in the drawer, one coloured row per attempt) and on the REST API (`GET /api/plugins/kanban/tasks/:id` returns a `runs[]` array). `PATCH /api/plugins/kanban/tasks/:id` with `{status: "done", summary, metadata}` forwards both to the kernel, so the dashboard's "mark done" button is CLI-equivalent. `task_events` rows carry the `run_id` they belong to so the UI can group them by attempt, and the `completed` event embeds the first-line summary in its payload (capped at 400 chars) so gateway notifiers can render structured handoffs without a second SQL round-trip.
**Bulk close caveat.** `hermes kanban complete a b c --summary X` is refused — structured handoff is per-run, so copy-pasting the same summary to N tasks is almost always wrong. Bulk close *without* `--summary` / `--metadata` still works for the common "I finished a pile of admin tasks" case.
**Reclaimed runs from status changes.** If you drag a running task off `running` in the dashboard (back to `ready`, or straight to `todo`), or archive a task that was still running, the in-flight run closes with `outcome='reclaimed'` rather than being orphaned. The `task_runs` row is always in a terminal state when `tasks.current_run_id` is `NULL`, and vice versa — that invariant holds across CLI, dashboard, dispatcher, and notifier.
**Synthetic runs for never-claimed completions.** Completing or blocking a task that was never claimed (e.g. a human closes a `ready` task from the dashboard with a summary, or a CLI user runs `hermes kanban complete --summary X`) would otherwise drop the handoff. Instead the kernel inserts a zero-duration run row (`started_at == ended_at`) carrying the summary / metadata / reason so attempt history stays complete. The `completed` / `blocked` event's `run_id` points at that row.
**Live drawer refresh.** When the dashboard's WebSocket event stream reports new events for the task the user is currently viewing, the drawer reloads itself (via a per-task event counter threaded into its `useEffect` dependency list). Closing and reopening is no longer required to see a run's new row or updated outcome.
### Forward compatibility
Two nullable columns on `tasks` are reserved for v2 workflow routing: `workflow_template_id` (which template this task belongs to) and `current_step_key` (which step in that template is active). The v1 kernel ignores them for routing but lets clients write them, so a v2 release can add the routing machinery without another schema migration.
## Event reference
Every transition appends a row to `task_events`. Each row carries an optional `run_id` so UIs can group events by attempt. Kinds group into three clusters so filtering is easy (`hermes kanban watch --kinds completed,gave_up,timed_out`):
**Lifecycle** (what changed about the task as a logical unit):
| Kind | Payload | When |
|---|---|---|
| `created` | `{assignee, status, parents, tenant}` | Task inserted. `run_id` is `NULL`. |
| `promoted` | — | `todo → ready` because all parents hit `done`. `run_id` is `NULL`. |
| `claimed` | `{lock, expires, run_id}` | Dispatcher atomically claimed a `ready` task for spawn. |
| `completed` | `{result_len, summary?}` | Worker wrote `--result` / `--summary` and task hit `done`. `summary` is the first-line handoff (400-char cap); full version lives on the run row. If `complete_task` is called on a never-claimed task with handoff fields, a zero-duration run is synthesized so `run_id` still points at something. |
| `blocked` | `{reason}` | Worker or human flipped the task to `blocked`. Synthesizes a zero-duration run when called on a never-claimed task with `--reason`. |
| `unblocked` | — | `blocked → ready`, either manually or via `/unblock`. `run_id` is `NULL`. |
| `archived` | — | Hidden from the default board. If the task was still running, carries the `run_id` of the run that was reclaimed as a side effect. |
**Edits** (human-driven changes that aren't transitions):
| Kind | Payload | When |
|---|---|---|
| `assigned` | `{assignee}` | Assignee changed (including unassignment). |
| `edited` | `{fields}` | Title or body updated. |
| `reprioritized` | `{priority}` | Priority changed. |
| `status` | `{status}` | Dashboard drag-drop wrote a status directly (e.g. `todo → ready`). Carries the `run_id` of the run that was reclaimed when dragging off `running`; otherwise `run_id` is NULL. |
**Worker telemetry** (about the execution process, not the logical task):
| Kind | Payload | When |
|---|---|---|
| `spawned` | `{pid}` | Dispatcher successfully started a worker process. |
| `heartbeat` | `{note?}` | Worker called `hermes kanban heartbeat $TASK` to signal liveness during long operations. |
| `reclaimed` | `{stale_lock}` | Claim TTL expired without a completion; task goes back to `ready`. |
| `crashed` | `{pid, claimer}` | Worker PID no longer alive but TTL hadn't expired yet. |
| `timed_out` | `{pid, elapsed_seconds, limit_seconds, sigkill}` | `max_runtime_seconds` exceeded; dispatcher SIGTERM'd (then SIGKILL'd after 5 s grace) and re-queued. |
| `spawn_failed` | `{error, failures}` | One spawn attempt failed (missing PATH, workspace unmountable, …). Counter increments; task returns to `ready` for retry. |
| `gave_up` | `{failures, error}` | Circuit breaker fired after N consecutive `spawn_failed`. Task auto-blocks with the last error. Default N = 5; override via `--failure-limit`. |
`hermes kanban tail ` shows these for a single task. `hermes kanban watch` streams them board-wide.
## Out of scope
Kanban is deliberately single-host. `~/.hermes/kanban.db` is a local SQLite file and the dispatcher spawns workers on the same machine. Running a shared board across two hosts is not supported — there's no coordination primitive for "worker X on host A, worker Y on host B," and the crash-detection path assumes PIDs are host-local. If you need multi-host, run an independent board per host and use `delegate_task` / a message queue to bridge them.
## Design spec
The complete design — architecture, concurrency correctness, comparison with other systems, implementation plan, risks, open questions — lives in `docs/hermes-kanban-v1-spec.pdf`. Read that before filing any behavior-change PR.
---
# user-guide/features/kanban-tutorial
# Kanban tutorial
A walkthrough of the four use-cases the Hermes Kanban system was designed for, with the dashboard open in a browser. If you haven't read the [Kanban overview](./kanban) yet, start there — this assumes you know what a task, run, assignee, and dispatcher are.
## Setup
```bash
hermes kanban init # optional; first `hermes kanban ` auto-inits
hermes dashboard # opens http://127.0.0.1:9119 in your browser
# click Kanban in the left nav
```
The dashboard is the most comfortable place for **you** to watch the system. Agent workers the dispatcher spawns never see the dashboard or the CLI — they drive the board through a dedicated `kanban_*` [toolset](./kanban#how-workers-interact-with-the-board) (`kanban_show`, `kanban_complete`, `kanban_block`, `kanban_heartbeat`, `kanban_comment`, `kanban_create`, `kanban_link`). All three surfaces — dashboard, CLI, worker tools — route through the same per-board SQLite DB (`~/.hermes/kanban.db` for the default board, `~/.hermes/kanban/boards//kanban.db` for any board you create later), so each board is consistent no matter which side of the fence a change came from.
This tutorial uses the `default` board throughout. If you want multiple isolated queues (one per project / repo / domain), see [Boards (multi-project)](./kanban#boards-multi-project) in the overview — the same CLI / dashboard / worker flows apply per board, and workers physically cannot see tasks on other boards.
Throughout the tutorial, **code blocks labelled `bash` are commands *you* run.** Code blocks labelled `# worker tool calls` are what the spawned worker's model emits as tool calls — shown here so you can see the loop end-to-end, not because you'd ever run them yourself.
## The board at a glance

Six columns, left to right:
- **Triage** — raw ideas, a specifier will flesh out the spec before anyone works on them.
- **Todo** — created but waiting on dependencies, or not yet assigned.
- **Ready** — assigned and waiting for the dispatcher to claim.
- **In progress** — a worker is actively running the task. With "Lanes by profile" on (the default), this column sub-groups by assignee so you can see at a glance what each worker is doing.
- **Blocked** — a worker asked for human input, or the circuit breaker tripped.
- **Done** — completed.
The top bar has filters for search, tenant, and assignee, plus a `Lanes by profile` toggle and a `Nudge dispatcher` button that runs one dispatch tick right now instead of waiting for the daemon's next interval. Clicking any card opens its drawer on the right.
### Flat view
If the profile lanes are noisy, toggle "Lanes by profile" off and the In Progress column collapses to a single flat list ordered by claim time:

## Story 1 — Solo dev shipping a feature
You're building a feature. Classic flow: design a schema, implement the API, write the tests. Three tasks with parent→child dependencies.
```bash
SCHEMA=$(hermes kanban create "Design auth schema" \
--assignee backend-dev --tenant auth-project --priority 2 \
--body "Design the user/session/token schema for the auth module." \
--json | jq -r .id)
API=$(hermes kanban create "Implement auth API endpoints" \
--assignee backend-dev --tenant auth-project --priority 2 \
--parent $SCHEMA \
--body "POST /register, POST /login, POST /refresh, POST /logout." \
--json | jq -r .id)
hermes kanban create "Write auth integration tests" \
--assignee qa-dev --tenant auth-project --priority 2 \
--parent $API \
--body "Cover happy path, wrong password, expired token, concurrent refresh."
```
Because `API` has `SCHEMA` as its parent, and `tests` has `API` as its parent, only `SCHEMA` starts in `ready`. The other two sit in `todo` until their parents complete. This is the dependency promotion engine doing its job — no other worker will pick up the test-writing until there's an API to test.
On the next dispatcher tick (60s by default, or immediately if you hit **Nudge dispatcher**) the `backend-dev` profile spawns as a worker with `HERMES_KANBAN_TASK=$SCHEMA` in its env. Here's what the worker's tool-call loop looks like from inside the agent:
```python
# worker tool calls — NOT commands you run
kanban_show()
# → returns title, body, worker_context, parents, prior attempts, comments
# (worker reads worker_context, uses terminal/file tools to design the schema,
# write migrations, run its own checks, commit — the real work happens here)
kanban_heartbeat(note="schema drafted, writing migrations now")
kanban_complete(
summary="users(id, email, pw_hash), sessions(id, user_id, jti, expires_at); "
"refresh tokens stored as sessions with type='refresh'",
metadata={
"changed_files": ["migrations/001_users.sql", "migrations/002_sessions.sql"],
"decisions": ["bcrypt for hashing", "JWT for session tokens",
"7-day refresh, 15-min access"],
},
)
```
`kanban_show` defaults `task_id` to `$HERMES_KANBAN_TASK`, so the worker doesn't need to know its own id. `kanban_complete` writes the summary + metadata onto the current `task_runs` row, closes that run, and transitions the task to `done` — all in one atomic hop through `kanban_db`.
When `SCHEMA` hits `done`, the dependency engine promotes `API` to `ready` automatically. The API worker, when it picks up, will call `kanban_show()` and see `SCHEMA`'s summary and metadata attached to the parent handoff — so it knows the schema decisions without re-reading a long design doc.
Click the completed schema task on the board and the drawer shows everything:

The Run History section at the bottom is the key addition. One attempt: outcome `completed`, worker `@backend-dev`, duration, timestamp, and the handoff summary in full. The metadata blob (`changed_files`, `decisions`) is stored on the run too and surfaced to any downstream worker that reads this parent.
You can inspect the same data from your terminal at any time — these commands are **you** peeking at the board, not the worker:
```bash
hermes kanban show $SCHEMA
hermes kanban runs $SCHEMA
# # OUTCOME PROFILE ELAPSED STARTED
# 1 completed backend-dev 0s 2026-04-27 19:34
# → users(id, email, pw_hash), sessions(id, user_id, jti, expires_at); refresh tokens ...
```
## Story 2 — Fleet farming
You have three workers (a translator, a transcriber, a copywriter) and a pile of independent tasks. You want all three pulling in parallel and making visible progress. This is the simplest kanban use-case and the one the original design optimized for.
Create the work:
```bash
for lang in Spanish French German; do
hermes kanban create "Translate homepage to $lang" \
--assignee translator --tenant content-ops
done
for i in 1 2 3 4 5; do
hermes kanban create "Transcribe Q3 customer call #$i" \
--assignee transcriber --tenant content-ops
done
for sku in 1001 1002 1003 1004; do
hermes kanban create "Generate product description: SKU-$sku" \
--assignee copywriter --tenant content-ops
done
```
Start the gateway and walk away — it hosts the embedded dispatcher
that picks up all three specialist profiles' tasks on the same
kanban.db:
```bash
hermes gateway start
```
Now filter the board to `content-ops` (or just search for "Transcribe") and you get this:

Two transcribes done, one running, two ready waiting for the next dispatcher tick. The In Progress column is grouped by profile (the "Lanes by profile" default) so you see each worker's active task without scanning a mixed list. The dispatcher will promote the next ready task to running as soon as the current one completes. With three daemons working on three assignee pools in parallel, the whole content queue drains without further human input.
**Everything Story 1 said about structured handoff still applies here.** A translator worker completing a call emits `kanban_complete(summary="translated 4 pages, style matched existing marketing voice", metadata={"duration_seconds": 720, "tokens_used": 2100})` — useful for analytics and for any downstream task that depends on this one.
## Story 3 — Role pipeline with retry
This is where Kanban earns its keep over a flat TODO list. A PM writes a spec. An engineer implements it. A reviewer rejects the first attempt. The engineer tries again with changes. The reviewer approves.
The dashboard view, filtered by `auth-project`:

Three-stage chain visible at once: `Spec: password reset flow` (DONE, pm), `Implement password reset flow` (DONE, backend-dev), `Review password reset PR` (READY, reviewer). Each has its parent in green at the bottom and children as dependencies.
The interesting one is the implementation task, because it was blocked and retried. Here's the full three-agent choreography, shown as the tool calls each worker's model makes:
```python
# --- PM worker spawns on $SPEC and writes the acceptance criteria ---
# worker tool calls
kanban_show()
kanban_complete(
summary="spec approved; POST /forgot-password sends email, "
"GET /reset/:token renders form, POST /reset applies new password",
metadata={"acceptance": [
"expired token returns 410",
"reused last-3 password returns 400 with message",
"successful reset invalidates all active sessions",
]},
)
# → $SPEC is done; $IMPL auto-promotes from todo to ready
# --- Engineer worker spawns on $IMPL (first attempt) ---
# worker tool calls
kanban_show() # reads $SPEC's summary + acceptance metadata in worker_context
# (engineer writes code, runs tests, opens PR)
# Reviewer feedback arrives — engineer decides the concerns are valid and blocks
kanban_block(
reason="Review: password strength check missing, reset link isn't "
"single-use (can be replayed within 30min)",
)
# → $IMPL transitions to blocked; run 1 closes with outcome='blocked'
```
Now you (the human, or a separate reviewer profile) read the block reason, decide the fix direction is clear, and unblock from the dashboard's "Unblock" button — or from the CLI / slash command:
```bash
hermes kanban unblock $IMPL
# or from a chat: /kanban unblock $IMPL
```
The dispatcher promotes `$IMPL` back to `ready` and, on the next tick, respawns the `backend-dev` worker. This second spawn is a **new run** on the same task:
```python
# --- Engineer worker spawns on $IMPL (second attempt) ---
# worker tool calls
kanban_show()
# → worker_context now includes the run 1 block reason, so this worker knows
# which two things to fix instead of re-reading the whole spec
# (engineer adds zxcvbn check, makes reset tokens single-use, re-runs tests)
kanban_complete(
summary="added zxcvbn strength check, reset tokens are now single-use "
"(stored + deleted on success)",
metadata={
"changed_files": [
"auth/reset.py",
"auth/tests/test_reset.py",
"migrations/003_single_use_reset_tokens.sql",
],
"tests_run": 11,
"review_iteration": 2,
},
)
```
Click the implementation task. The drawer shows **two attempts**:

- **Run 1** — `blocked` by `@backend-dev`. The review feedback sits right under the outcome: "password strength check missing, reset link isn't single-use (can be replayed within 30min)".
- **Run 2** — `completed` by `@backend-dev`. Fresh summary, fresh metadata.
Each run is a row in `task_runs` with its own outcome, summary, and metadata. Retry history is not a conceptual afterthought layered on top of a "latest state" task — it's the primary representation. When a retrying worker opens the task, `build_worker_context` shows it the prior attempts, so the second-pass worker sees why the first pass was blocked and addresses those specific findings instead of re-running from scratch.
The reviewer picks up next. When they open `Review password reset PR`, they see:

The parent link is the completed implementation. When the reviewer's worker spawns on `Review password reset PR` and calls `kanban_show()`, the returned `worker_context` includes the parent's most-recent-completed-run summary + metadata — so the reviewer reads "added zxcvbn strength check, reset tokens are now single-use" and has the list of changed files in hand before looking at a diff.
## Story 4 — Circuit breaker and crash recovery
Real workers fail. Missing credentials, OOM kills, transient network errors. The dispatcher has two lines of defense: a **circuit breaker** that auto-blocks after N consecutive failures so the board doesn't thrash forever, and **crash detection** that reclaims a task whose worker PID went away before its TTL expired.
### Circuit breaker — permanent-looking failure
A deploy task that can't spawn its worker because `AWS_ACCESS_KEY_ID` isn't set in the profile's environment:
```bash
hermes kanban create "Deploy to staging (missing creds)" \
--assignee deploy-bot --tenant ops
```
The dispatcher tries to spawn the worker. Spawn fails (`RuntimeError: AWS_ACCESS_KEY_ID not set`). The dispatcher releases the claim, increments a failure counter, and tries again next tick. After three consecutive failures (the default `failure_limit`), the circuit trips: the task goes to `blocked` with outcome `gave_up`. No more retries until a human unblocks it.
Click the blocked task:

Three runs, all with the same error on the `error` field. The first two are `spawn_failed` (retryable), the third is `gave_up` (terminal). The event log above shows the full sequence: `created → claimed → spawn_failed → claimed → spawn_failed → claimed → gave_up`.
On the terminal:
```bash
hermes kanban runs t_ef5d
# # OUTCOME PROFILE ELAPSED STARTED
# 1 spawn_failed deploy-bot 0s 2026-04-27 19:34
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
# 2 spawn_failed deploy-bot 0s 2026-04-27 19:34
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
# 3 gave_up deploy-bot 0s 2026-04-27 19:34
# ! AWS_ACCESS_KEY_ID not set in deploy-bot env
```
If Telegram / Discord / Slack is wired in, a gateway notification fires on the `gave_up` event so you hear about the outage without having to check the board.
### Crash recovery — worker dies mid-flight
Sometimes the spawn succeeds but the worker process dies later — segfault, OOM, `systemctl stop`. The dispatcher polls `kill(pid, 0)` and detects the dead pid; the claim releases, the task goes back to `ready`, and the next tick gives it to a fresh worker.
The example in the seed data is a migration that was running out of memory:
```bash
# Worker claims, starts scanning 2.4M rows, OOM kills it at ~2.3M
# Dispatcher detects dead pid, releases claim, increments attempt counter
# Retry with a chunked strategy succeeds
```
The drawer shows the full two-attempt history:

Run 1 — `crashed`, with the error `OOM kill at row 2.3M (process 99999 gone)`. Run 2 — `completed`, with `"strategy": "chunked with LIMIT + WHERE id > last_id"` in its metadata. The retrying worker saw the crash of run 1 in its context and picked a safer strategy; the metadata makes it obvious to a future observer (or postmortem writer) what changed.
## Structured handoff — why `summary` and `metadata` matter
In every story above, workers called `kanban_complete(summary=..., metadata=...)` at the end. That's not decoration — it's the primary handoff channel between stages of a workflow.
When a worker on task B is spawned and calls `kanban_show()`, the `worker_context` it gets back includes:
- B's **prior attempts** (previous runs: outcome, summary, error, metadata) so a retrying worker doesn't repeat a failed path.
- **Parent task results** — for each parent, the most-recent completed run's summary and metadata — so downstream workers see why and how the upstream work was done.
This replaces the "dig through comments and the work output" dance that plagues flat kanban systems. A PM writes acceptance criteria in the spec's metadata, and the engineer's worker sees them structurally in the parent handoff. An engineer records which tests they ran and how many passed, and the reviewer's worker has that list in hand before opening a diff.
The bulk-close guard exists because this data is per-run. `hermes kanban complete a b c --summary X` (you, from the CLI) is refused — copy-pasting the same summary to three tasks is almost always wrong. Bulk close without the handoff flags still works for the common "I finished a pile of admin tasks" case. The tool surface doesn't expose a bulk variant at all; `kanban_complete` is always single-task-at-a-time for the same reason.
## Inspecting a task currently running
For completeness — here's the drawer of a task still in flight (the API implementation from Story 1, claimed by `backend-dev` but not yet complete):

Status is `Running`. The active run appears in the Run History section with outcome `active` and no `ended_at`. If this worker dies or times out, the dispatcher closes this run with the appropriate outcome and opens a new one on the next claim — the attempt row never disappears.
## Next steps
- [Kanban overview](./kanban) — the full data model, event vocabulary, and CLI reference.
- `hermes kanban --help` — every subcommand, every flag.
- `hermes kanban watch --kinds completed,gave_up,timed_out` — live stream terminal events across the whole board.
- `hermes kanban notify-subscribe --platform telegram --chat-id ` — get a gateway ping when a specific task finishes.
---
# Persistent Goals
# Persistent Goals (`/goal`)
`/goal` gives Hermes a standing objective that survives across turns. After every turn a lightweight judge model checks whether the goal is satisfied by the assistant's last response. If not, Hermes automatically feeds a continuation prompt back into the same session and keeps working — until the goal is achieved, you pause or clear it, or the turn budget runs out.
It's our take on the **Ralph loop**, directly inspired by [Codex CLI 0.128.0's `/goal`](https://github.com/openai/codex) by Eric Traut (OpenAI). The core idea — keep a goal alive across turns and don't stop until it's achieved — is theirs. The implementation here is independent and adapted to Hermes' architecture.
## When to use it
Use `/goal` for tasks where you want Hermes to iterate on its own without you re-prompting every turn:
- "Fix every lint error in `src/` and verify `ruff check` passes"
- "Port feature X from repo Y, including tests, and get CI green"
- "Investigate why session IDs sometimes drift on mid-run compression and write up a report"
- "Build a small CLI to rename files by their EXIF dates, then test it against the photos/ folder"
Tasks where the agent does one turn and stops don't need `/goal`. Tasks where *you'd otherwise have to say "keep going" three times* are where this shines.
## Quick start
```
/goal Fix every failing test in tests/hermes_cli/ and make sure scripts/run_tests.sh passes for that directory
```
What you'll see:
1. **Goal accepted** — `⊙ Goal set (20-turn budget): `
2. **Turn 1 runs** — Hermes starts working as if you'd sent the goal as a normal message.
3. **Judge runs** — after the turn, the judge model decides `done` or `continue`.
4. **Loop fires if needed** — if `continue`, you'll see `↻ Continuing toward goal (1/20): ` and Hermes takes the next step automatically.
5. **Terminates** — eventually you see either `✓ Goal achieved: ` or `⏸ Goal paused — N/20 turns used`.
## Commands
| Command | What it does |
|---|---|
| `/goal ` | Set (or replace) the standing goal. Kicks off the first turn immediately so you don't need to send a separate message. |
| `/goal` or `/goal status` | Show the current goal, its status, and turns used. |
| `/goal pause` | Stop the auto-continuation loop without clearing the goal. |
| `/goal resume` | Resume the loop (resets the turn counter back to zero). |
| `/goal clear` | Drop the goal entirely. |
Works identically on the CLI and every gateway platform (Telegram, Discord, Slack, Matrix, Signal, WhatsApp, SMS, iMessage, Webhook, API server, and the web dashboard).
## Behavior details
### The judge
After every turn, Hermes calls an auxiliary model with:
- The standing goal text
- The agent's most recent final response (last ~4 KB of text)
- A system prompt telling the judge to reply with strict JSON: `{"done": , "reason": ""}`
The judge is deliberately conservative: it marks a goal `done` only when the response **explicitly** confirms the goal is complete, when the final deliverable is clearly produced, or when the goal is unachievable/blocked (treated as DONE with a block reason so we don't burn budget on impossible tasks).
### Fail-open semantics
If the judge errors (network blip, malformed response, unavailable aux client), Hermes treats the verdict as `continue` — a broken judge never wedges progress. The **turn budget** is the real backstop.
### Turn budget
Default is 20 continuation turns (`goals.max_turns` in `config.yaml`). When the budget is hit, Hermes auto-pauses and tells you exactly how to proceed:
```
⏸ Goal paused — 20/20 turns used. Use /goal resume to keep going, or /goal clear to stop.
```
`/goal resume` resets the counter to zero, so you can keep going in measured chunks.
### User messages always preempt
Any real message you send while a goal is active takes priority over the continuation loop. On the CLI your message lands in `_pending_input` ahead of the queued continuation; on the gateway it goes through the adapter FIFO the same way. The judge runs again after your turn — so if your message happens to complete the goal, the judge will catch it and stop.
### Mid-run safety (gateway)
While an agent is already running, `/goal status`, `/goal pause`, and `/goal clear` are safe to run — they only touch control-plane state and don't interrupt the current turn. Setting a **new** goal mid-run (`/goal `) is rejected with a message telling you to `/stop` first, so the old continuation can't race the new one.
### Persistence
Goal state lives in `SessionDB.state_meta` keyed by `goal:`. That means `/resume` picks up right where you left off — set a goal, close your laptop, come back tomorrow, `/resume`, and the goal is still standing exactly as you left it (active, paused, or done).
### Prompt cache
The continuation prompt is a plain user-role message appended to history. It does **not** mutate the system prompt, swap toolsets, or touch the conversation in any way that invalidates Hermes' prompt cache. Running a 20-turn goal costs the same cache-wise as 20 turns of normal conversation.
## Configuration
Add to `~/.hermes/config.yaml`:
```yaml
goals:
# Max continuation turns before Hermes auto-pauses and asks you to
# /goal resume. Default 20. Lower this if you want tighter loops;
# raise it for long-running refactors.
max_turns: 20
```
### Choosing the judge model
The judge uses the `goal_judge` auxiliary task. By default it resolves to your main model (see [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models)). If you want to route the judge to a cheap fast model to keep costs down, add an override:
```yaml
auxiliary:
goal_judge:
provider: openrouter
model: google/gemini-3-flash-preview
```
The judge call is small (~200 output tokens) and runs once per turn, so a cheap fast model is usually the right call.
## Example walkthrough
```
You: /goal Create four files /tmp/note_{1..4}.txt, one per turn, each containing its number as text
⊙ Goal set (20-turn budget): Create four files /tmp/note_{1..4}.txt, one per turn, each containing its number as text
Hermes: Creating /tmp/note_1.txt now.
💻 echo "1" > /tmp/note_1.txt (0.1s)
I've created /tmp/note_1.txt with the content "1". I'll continue with the remaining files on the next turn as you specified.
↻ Continuing toward goal (1/20): Only 1 of 4 files has been created; 3 files remain.
Hermes: [Continuing toward your standing goal]
💻 echo "2" > /tmp/note_2.txt (0.1s)
Created /tmp/note_2.txt. Two more to go.
↻ Continuing toward goal (2/20): 2 of 4 files created; 2 remain.
Hermes: [Continuing toward your standing goal]
💻 echo "3" > /tmp/note_3.txt (0.1s)
Created /tmp/note_3.txt.
↻ Continuing toward goal (3/20): 3 of 4 files created; 1 remains.
Hermes: [Continuing toward your standing goal]
💻 echo "4" > /tmp/note_4.txt (0.1s)
All four files have been created: /tmp/note_1.txt through /tmp/note_4.txt, each containing its number.
✓ Goal achieved: All four files were created with the specified content, completing the goal.
You: _
```
Four turns, one `/goal` invocation, zero "keep going" prompts from you.
## When the judge gets it wrong
No judge is perfect. Two failure modes to watch for:
**False negative — judge says continue when the goal is actually done.** The turn budget catches this. You'll see `⏸ Goal paused` and can `/goal clear` or just send a new message.
**False positive — judge says done when work remains.** You'll see `✓ Goal achieved` but you know better. Send a follow-up message to continue, or re-set the goal more precisely: `/goal `. The judge's system prompt is deliberately conservative to make false positives rarer than false negatives.
If you find a judge verdict unconvincing, the reason text in the `↻ Continuing toward goal` or `✓ Goal achieved` line tells you exactly what the judge saw. That's usually enough to diagnose whether the goal text was ambiguous or the model's response was.
## Attribution
`/goal` is Hermes' take on the **Ralph loop** pattern. The user-facing design — keep a goal alive across turns, don't stop until it's achieved, with create/pause/resume/clear controls — was popularised and shipped in [Codex CLI 0.128.0](https://github.com/openai/codex) by Eric Traut on OpenAI's Codex team. Our implementation is independent (central `CommandDef` registry, `SessionDB.state_meta` persistence, auxiliary-client judge, adapter-FIFO continuation on the gateway side) but the idea is theirs. Credit where credit's due.
---
# Code Execution
# Code Execution (Programmatic Tool Calling)
The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a child process on the agent host, communicating with Hermes over a Unix domain socket RPC.
## How It Works
1. The agent writes a Python script using `from hermes_tools import ...`
2. Hermes generates a `hermes_tools.py` stub module with RPC functions
3. Hermes opens a Unix domain socket and starts an RPC listener thread
4. The script runs in a child process — tool calls travel over the socket back to Hermes
5. Only the script's `print()` output is returned to the LLM; intermediate tool results never enter the context window
```python
# The agent can write scripts like:
from hermes_tools import web_search, web_extract
results = web_search("Python 3.13 features", limit=5)
for r in results["data"]["web"]:
content = web_extract([r["url"]])
# ... filter and process ...
print(summary)
```
**Available tools inside scripts:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only).
## When the Agent Uses This
The agent uses `execute_code` when there are:
- **3+ tool calls** with processing logic between them
- Bulk data filtering or conditional branching
- Loops over results
The key benefit: intermediate tool results never enter the context window — only the final `print()` output comes back, dramatically reducing token usage.
## Practical Examples
### Data Processing Pipeline
```python
from hermes_tools import search_files, read_file
import json
# Find all config files and extract database settings
matches = search_files("database", path=".", file_glob="*.yaml", limit=20)
configs = []
for match in matches.get("matches", []):
content = read_file(match["path"])
configs.append({"file": match["path"], "preview": content["content"][:200]})
print(json.dumps(configs, indent=2))
```
### Multi-Step Web Research
```python
from hermes_tools import web_search, web_extract
import json
# Search, extract, and summarize in one turn
results = web_search("Rust async runtime comparison 2025", limit=5)
summaries = []
for r in results["data"]["web"]:
page = web_extract([r["url"]])
for p in page.get("results", []):
if p.get("content"):
summaries.append({
"title": r["title"],
"url": r["url"],
"excerpt": p["content"][:500]
})
print(json.dumps(summaries, indent=2))
```
### Bulk File Refactoring
```python
from hermes_tools import search_files, read_file, patch
# Find all Python files using deprecated API and fix them
matches = search_files("old_api_call", path="src/", file_glob="*.py")
fixed = 0
for match in matches.get("matches", []):
result = patch(
path=match["path"],
old_string="old_api_call(",
new_string="new_api_call(",
replace_all=True
)
if "error" not in str(result):
fixed += 1
print(f"Fixed {fixed} files out of {len(matches.get('matches', []))} matches")
```
### Build and Test Pipeline
```python
from hermes_tools import terminal, read_file
import json
# Run tests, parse results, and report
result = terminal("cd /project && python -m pytest --tb=short -q 2>&1", timeout=120)
output = result.get("output", "")
# Parse test output
passed = output.count(" passed")
failed = output.count(" failed")
errors = output.count(" error")
report = {
"passed": passed,
"failed": failed,
"errors": errors,
"exit_code": result.get("exit_code", -1),
"summary": output[-500:] if len(output) > 500 else output
}
print(json.dumps(report, indent=2))
```
## Execution Mode
`execute_code` has two execution modes controlled by `code_execution.mode` in `~/.hermes/config.yaml`:
| Mode | Working directory | Python interpreter |
|------|-------------------|--------------------|
| **`project`** (default) | The session's working directory (same as `terminal()`) | Active `VIRTUAL_ENV` / `CONDA_PREFIX` python, falling back to Hermes's own python |
| `strict` | A temp staging directory isolated from the user's project | `sys.executable` (Hermes's own python) |
**When to leave it on `project`:** you want `import pandas`, `from my_project import foo`, or relative paths like `open(".env")` to work the same way they do in `terminal()`. This is almost always what you want.
**When to flip to `strict`:** you need maximum reproducibility — you want the same interpreter every session regardless of which venv the user activated, and you want scripts quarantined from the project tree (no risk of accidentally reading project files through a relative path).
```yaml
# ~/.hermes/config.yaml
code_execution:
mode: project # or "strict"
```
Fallback behavior in `project` mode: if `VIRTUAL_ENV` / `CONDA_PREFIX` is unset, broken, or points at a Python older than 3.8, the resolver falls back cleanly to `sys.executable` — it never leaves the agent without a working interpreter.
Security-critical invariants are identical across both modes:
- environment scrubbing (API keys, tokens, credentials stripped)
- tool whitelist (scripts cannot call `execute_code` recursively, `delegate_task`, or MCP tools)
- resource limits (timeout, stdout cap, tool-call cap)
Switching mode changes where scripts run and which interpreter runs them, not what credentials they can see or which tools they can call.
## Resource Limits
| Resource | Limit | Notes |
|----------|-------|-------|
| **Timeout** | 5 minutes (300s) | Script is killed with SIGTERM, then SIGKILL after 5s grace |
| **Stdout** | 50 KB | Output truncated with `[output truncated at 50KB]` notice |
| **Stderr** | 10 KB | Included in output on non-zero exit for debugging |
| **Tool calls** | 50 per execution | Error returned when limit reached |
All limits are configurable via `config.yaml`:
```yaml
# In ~/.hermes/config.yaml
code_execution:
mode: project # project (default) | strict
timeout: 300 # Max seconds per script (default: 300)
max_tool_calls: 50 # Max tool calls per execution (default: 50)
```
## How Tool Calls Work Inside Scripts
When your script calls a function like `web_search("query")`:
1. The call is serialized to JSON and sent over a Unix domain socket to the parent process
2. The parent dispatches through the standard `handle_function_call` handler
3. The result is sent back over the socket
4. The function returns the parsed result
This means tool calls inside scripts behave identically to normal tool calls — same rate limits, same error handling, same capabilities. The only restriction is that `terminal()` is foreground-only (no `background` or `pty` parameters).
## Error Handling
When a script fails, the agent receives structured error information:
- **Non-zero exit code**: stderr is included in the output so the agent sees the full traceback
- **Timeout**: Script is killed and the agent sees `"Script timed out after 300s and was killed."`
- **Interruption**: If the user sends a new message during execution, the script is terminated and the agent sees `[execution interrupted — user sent a new message]`
- **Tool call limit**: When the 50-call limit is hit, subsequent tool calls return an error message
The response always includes `status` (success/error/timeout/interrupted), `output`, `tool_calls_made`, and `duration_seconds`.
## Security
:::danger Security Model
The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped by default. The script accesses tools exclusively via the RPC channel — it cannot read secrets from environment variables unless explicitly allowed.
:::
Environment variables containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`, `PASSWD`, or `AUTH` in their names are excluded. Only safe system variables (`PATH`, `HOME`, `LANG`, `SHELL`, `PYTHONPATH`, `VIRTUAL_ENV`, etc.) are passed through.
### Skill Environment Variable Passthrough
When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` child processes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.
For non-skill use cases, you can explicitly allowlist variables in `config.yaml`:
```yaml
terminal:
env_passthrough:
- MY_CUSTOM_KEY
- ANOTHER_TOKEN
```
See the [Security guide](/docs/user-guide/security#environment-variable-passthrough) for full details.
Hermes always writes the script and the auto-generated `hermes_tools.py` RPC stub into a temp staging directory that is cleaned up after execution. In `strict` mode the script also *runs* there; in `project` mode it runs in the session's working directory (the staging directory stays on `PYTHONPATH` so imports still resolve). The child process runs in its own process group so it can be cleanly killed on timeout or interruption.
## execute_code vs terminal
| Use Case | execute_code | terminal |
|----------|-------------|----------|
| Multi-step workflows with tool calls between | ✅ | ❌ |
| Simple shell command | ❌ | ✅ |
| Filtering/processing large tool outputs | ✅ | ❌ |
| Running a build or test suite | ❌ | ✅ |
| Looping over search results | ✅ | ❌ |
| Interactive/background processes | ❌ | ✅ |
| Needs API keys in environment | ⚠️ Only via [passthrough](/docs/user-guide/security#environment-variable-passthrough) | ✅ (most pass through) |
**Rule of thumb:** Use `execute_code` when you need to call Hermes tools programmatically with logic between calls. Use `terminal` for running shell commands, builds, and processes.
## Platform Support
Code execution requires Unix domain sockets and is available on **Linux and macOS only**. It is automatically disabled on Windows — the agent falls back to regular sequential tool calls.
---
# Event Hooks
# Event Hooks
Hermes has three hook systems that run custom code at key lifecycle points:
| System | Registered via | Runs in | Use case |
|--------|---------------|---------|----------|
| **[Gateway hooks](#gateway-event-hooks)** | `HOOK.yaml` + `handler.py` in `~/.hermes/hooks/` | Gateway only | Logging, alerts, webhooks |
| **[Plugin hooks](#plugin-hooks)** | `ctx.register_hook()` in a [plugin](/docs/user-guide/features/plugins) | CLI + Gateway | Tool interception, metrics, guardrails |
| **[Shell hooks](#shell-hooks)** | `hooks:` block in `~/.hermes/config.yaml` pointing at shell scripts | CLI + Gateway | Drop-in scripts for blocking, auto-formatting, context injection |
All three systems are non-blocking — errors in any hook are caught and logged, never crashing the agent.
## Gateway Event Hooks
Gateway hooks fire automatically during gateway operation (Telegram, Discord, Slack, WhatsApp, Teams) without blocking the main agent pipeline.
### Creating a Hook
Each hook is a directory under `~/.hermes/hooks/` containing two files:
```text
~/.hermes/hooks/
└── my-hook/
├── HOOK.yaml # Declares which events to listen for
└── handler.py # Python handler function
```
#### HOOK.yaml
```yaml
name: my-hook
description: Log all agent activity to a file
events:
- agent:start
- agent:end
- agent:step
```
The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
#### handler.py
```python
import json
from datetime import datetime
from pathlib import Path
LOG_FILE = Path.home() / ".hermes" / "hooks" / "my-hook" / "activity.log"
async def handle(event_type: str, context: dict):
"""Called for each subscribed event. Must be named 'handle'."""
entry = {
"timestamp": datetime.now().isoformat(),
"event": event_type,
**context,
}
with open(LOG_FILE, "a") as f:
f.write(json.dumps(entry) + "\n")
```
**Handler rules:**
- Must be named `handle`
- Receives `event_type` (string) and `context` (dict)
- Can be `async def` or regular `def` — both work
- Errors are caught and logged, never crashing the agent
### Available Events
| Event | When it fires | Context keys |
|-------|---------------|--------------|
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
| `session:end` | Session ended (before reset) | `platform`, `user_id`, `session_key` |
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
#### Wildcard Matching
Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
### Examples
#### Telegram Alert on Long Tasks
Send yourself a message when the agent takes more than 10 steps:
```yaml
# ~/.hermes/hooks/long-task-alert/HOOK.yaml
name: long-task-alert
description: Alert when agent is taking many steps
events:
- agent:step
```
```python
# ~/.hermes/hooks/long-task-alert/handler.py
import os
import httpx
THRESHOLD = 10
BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
CHAT_ID = os.getenv("TELEGRAM_HOME_CHANNEL")
async def handle(event_type: str, context: dict):
iteration = context.get("iteration", 0)
if iteration == THRESHOLD and BOT_TOKEN and CHAT_ID:
tools = ", ".join(context.get("tool_names", []))
text = f"⚠️ Agent has been running for {iteration} steps. Last tools: {tools}"
async with httpx.AsyncClient() as client:
await client.post(
f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
json={"chat_id": CHAT_ID, "text": text},
)
```
#### Command Usage Logger
Track which slash commands are used:
```yaml
# ~/.hermes/hooks/command-logger/HOOK.yaml
name: command-logger
description: Log slash command usage
events:
- command:*
```
```python
# ~/.hermes/hooks/command-logger/handler.py
import json
from datetime import datetime
from pathlib import Path
LOG = Path.home() / ".hermes" / "logs" / "command_usage.jsonl"
def handle(event_type: str, context: dict):
LOG.parent.mkdir(parents=True, exist_ok=True)
entry = {
"ts": datetime.now().isoformat(),
"command": context.get("command"),
"args": context.get("args"),
"platform": context.get("platform"),
"user": context.get("user_id"),
}
with open(LOG, "a") as f:
f.write(json.dumps(entry) + "\n")
```
#### Session Start Webhook
POST to an external service on new sessions:
```yaml
# ~/.hermes/hooks/session-webhook/HOOK.yaml
name: session-webhook
description: Notify external service on new sessions
events:
- session:start
- session:reset
```
```python
# ~/.hermes/hooks/session-webhook/handler.py
import httpx
WEBHOOK_URL = "https://your-service.example.com/hermes-events"
async def handle(event_type: str, context: dict):
async with httpx.AsyncClient() as client:
await client.post(WEBHOOK_URL, json={
"event": event_type,
**context,
}, timeout=5)
```
### Tutorial: BOOT.md — Run a Startup Checklist on Every Gateway Boot
A popular pattern from the community: drop a Markdown checklist at `~/.hermes/BOOT.md`, and have the agent run it once every time the gateway starts. Useful for "on every boot, check overnight cron failures and ping me on Discord if anything failed," or "summarize the last 24h of deploy.log and post it to Slack #ops."
This tutorial shows how to build it yourself as a user-defined hook. Hermes does not ship a built-in BOOT.md hook — you wire up exactly the behavior you want.
#### What we're building
1. A file at `~/.hermes/BOOT.md` with natural-language startup instructions.
2. A gateway hook that fires on `gateway:startup`, spawns a one-shot agent with your gateway's resolved model/credentials, and runs the BOOT.md instructions.
3. A `[SILENT]` convention so the agent can opt out of sending a message when there's nothing to report.
#### Step 1: Write your checklist
Create `~/.hermes/BOOT.md`. Write it as if you were giving instructions to a human assistant:
```markdown
# Startup Checklist
1. Run `hermes cron list` and check if any scheduled jobs failed overnight.
2. If any failed, send a summary to Discord #ops using the `send_message` tool.
3. Check if `/opt/app/deploy.log` has any ERROR lines from the last 24 hours. If yes, summarize them and include in the same Discord message.
4. If nothing went wrong, reply with only `[SILENT]` so no message is sent.
```
The agent sees this as part of its prompt, so anything you can describe in plain language works — tool calls, shell commands, sending messages, summarizing files.
#### Step 2: Create the hook
```text
~/.hermes/hooks/boot-md/
├── HOOK.yaml
└── handler.py
```
**`~/.hermes/hooks/boot-md/HOOK.yaml`**
```yaml
name: boot-md
description: Run ~/.hermes/BOOT.md on gateway startup
events:
- gateway:startup
```
**`~/.hermes/hooks/boot-md/handler.py`**
```python
"""Run ~/.hermes/BOOT.md on every gateway startup."""
import logging
import threading
from pathlib import Path
logger = logging.getLogger("hooks.boot-md")
BOOT_FILE = Path.home() / ".hermes" / "BOOT.md"
def _build_prompt(content: str) -> str:
return (
"You are running a startup boot checklist. Follow the instructions "
"below exactly.\n\n"
"---\n"
f"{content}\n"
"---\n\n"
"Execute each instruction. Use the send_message tool to deliver any "
"messages to platforms like Discord or Slack.\n"
"If nothing needs attention and there is nothing to report, reply "
"with ONLY: [SILENT]"
)
def _run_boot_agent(content: str) -> None:
"""Spawn a one-shot agent and execute the checklist.
Uses the gateway's resolved model and runtime credentials so this works
against custom endpoints, aggregators, and OAuth-based providers alike.
"""
try:
from gateway.run import _resolve_gateway_model, _resolve_runtime_agent_kwargs
from run_agent import AIAgent
agent = AIAgent(
model=_resolve_gateway_model(),
**_resolve_runtime_agent_kwargs(),
platform="gateway",
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
max_iterations=20,
)
result = agent.run_conversation(_build_prompt(content))
response = result.get("final_response", "")
if response and "[SILENT]" not in response:
logger.info("boot-md completed: %s", response[:200])
else:
logger.info("boot-md completed (nothing to report)")
except Exception as e:
logger.error("boot-md agent failed: %s", e)
async def handle(event_type: str, context: dict) -> None:
if not BOOT_FILE.exists():
return
content = BOOT_FILE.read_text(encoding="utf-8").strip()
if not content:
return
logger.info("Running BOOT.md (%d chars)", len(content))
# Background thread so gateway startup isn't blocked on a full agent turn.
thread = threading.Thread(
target=_run_boot_agent,
args=(content,),
name="boot-md",
daemon=True,
)
thread.start()
```
The two key lines:
- `_resolve_gateway_model()` reads the gateway's currently-configured model.
- `_resolve_runtime_agent_kwargs()` resolves provider credentials the same way a normal gateway turn does — including API keys, base URLs, OAuth tokens, and credential pools.
Without these, a bare `AIAgent()` falls back to built-in defaults and will 401 against any non-default endpoint.
#### Step 3: Test it
Restart the gateway:
```bash
hermes gateway restart
```
Watch the logs:
```bash
hermes logs --follow --level INFO | grep boot-md
```
You should see `Running BOOT.md (N chars)` followed by either `boot-md completed: ...` (summary of what the agent did) or `boot-md completed (nothing to report)` when the agent replied `[SILENT]`.
Delete `~/.hermes/BOOT.md` to disable the checklist — the hook stays loaded but silently skips when the file isn't there.
#### Extending the pattern
- **Schedule-aware checklists:** key off `datetime.now().weekday()` inside BOOT.md's instructions ("if it's Monday, also check the weekly deploy log"). The instructions are free-form text, so anything the agent can reason about is fair game.
- **Multiple checklists:** point the hook at a different file (`STARTUP.md`, `MORNING.md`, etc.) and register separate hook directories for each.
- **Non-agent variant:** if you don't need a full agent loop, skip `AIAgent` entirely and have the handler post a fixed notification directly via `httpx`. Cheaper, faster, and has no provider dependency.
#### Why this isn't a built-in
An earlier version of Hermes shipped this as a built-in hook and silently spawned an agent with bare defaults on every gateway boot. That surprised users with custom endpoints and made the feature invisible to users who didn't know it was running. Keeping it as a documented pattern — built by you, in your hooks directory — means you see exactly what it does and opt in by writing the files.
### How It Works
1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
3. Handlers are registered for their declared events
4. At each lifecycle point, `hooks.emit()` fires all matching handlers
5. Errors in any handler are caught and logged — a broken hook never crashes the agent
:::info
Gateway hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp, Teams). The CLI does not load gateway hooks. For hooks that work everywhere, use [plugin hooks](#plugin-hooks).
:::
## Plugin Hooks
[Plugins](/docs/user-guide/features/plugins) can register hooks that fire in **both CLI and gateway** sessions. These are registered programmatically via `ctx.register_hook()` in your plugin's `register()` function.
```python
def register(ctx):
ctx.register_hook("pre_tool_call", my_tool_observer)
ctx.register_hook("post_tool_call", my_tool_logger)
ctx.register_hook("pre_llm_call", my_memory_callback)
ctx.register_hook("post_llm_call", my_sync_callback)
ctx.register_hook("on_session_start", my_init_callback)
ctx.register_hook("on_session_end", my_cleanup_callback)
```
**General rules for all hooks:**
- Callbacks receive **keyword arguments**. Always accept `**kwargs` for forward compatibility — new parameters may be added in future versions without breaking your plugin.
- If a callback **crashes**, it's logged and skipped. Other hooks and the agent continue normally. A misbehaving plugin can never break the agent.
- Two hooks' return values affect behavior: [`pre_tool_call`](#pre_tool_call) can **block** the tool, and [`pre_llm_call`](#pre_llm_call) can **inject context** into the LLM call. All other hooks are fire-and-forget observers.
### Quick reference
| Hook | Fires when | Returns |
|------|-----------|---------|
| [`pre_tool_call`](#pre_tool_call) | Before any tool executes | `{"action": "block", "message": str}` to veto the call |
| [`post_tool_call`](#post_tool_call) | After any tool returns | ignored |
| [`pre_llm_call`](#pre_llm_call) | Once per turn, before the tool-calling loop | `{"context": str}` to prepend context to the user message |
| [`post_llm_call`](#post_llm_call) | Once per turn, after the tool-calling loop | ignored |
| [`on_session_start`](#on_session_start) | New session created (first turn only) | ignored |
| [`on_session_end`](#on_session_end) | Session ends | ignored |
| [`on_session_finalize`](#on_session_finalize) | CLI/gateway tears down an active session (flush, save, stats) | ignored |
| [`on_session_reset`](#on_session_reset) | Gateway swaps in a fresh session key (e.g. `/new`, `/reset`) | ignored |
| [`subagent_stop`](#subagent_stop) | A `delegate_task` child has exited | ignored |
| [`pre_gateway_dispatch`](#pre_gateway_dispatch) | Gateway received a user message, before auth + dispatch | `{"action": "skip" \| "rewrite" \| "allow", ...}` to influence flow |
| [`pre_approval_request`](#pre_approval_request) | Dangerous command needs user approval, before the prompt/notification is sent | ignored |
| [`post_approval_response`](#post_approval_response) | User responded to an approval prompt (or it timed out) | ignored |
| [`transform_tool_result`](#transform_tool_result) | After any tool returns, before the result is handed back to the model | `str` to replace the result, `None` to leave unchanged |
| [`transform_terminal_output`](#transform_terminal_output) | Inside the `terminal` tool, before truncation/ANSI-strip/redact | `str` to replace the raw output, `None` to leave unchanged |
---
### `pre_tool_call`
Fires **immediately before** every tool execution — built-in tools and plugin tools alike.
**Callback signature:**
```python
def my_callback(tool_name: str, args: dict, task_id: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `tool_name` | `str` | Name of the tool about to execute (e.g. `"terminal"`, `"web_search"`, `"read_file"`) |
| `args` | `dict` | The arguments the model passed to the tool |
| `task_id` | `str` | Session/task identifier. Empty string if not set. |
**Fires:** In `model_tools.py`, inside `handle_function_call()`, before the tool's handler runs. Fires once per tool call — if the model calls 3 tools in parallel, this fires 3 times.
**Return value — veto the call:**
```python
return {"action": "block", "message": "Reason the tool call was blocked"}
```
The agent short-circuits the tool with `message` as the error returned to the model. The first matching block directive wins (Python plugins registered first, then shell hooks). Any other return value is ignored, so existing observer-only callbacks keep working unchanged.
**Use cases:** Logging, audit trails, tool call counters, blocking dangerous operations, rate limiting, per-user policy enforcement.
**Example — tool call audit log:**
```python
import json, logging
from datetime import datetime
logger = logging.getLogger(__name__)
def audit_tool_call(tool_name, args, task_id, **kwargs):
logger.info("TOOL_CALL session=%s tool=%s args=%s",
task_id, tool_name, json.dumps(args)[:200])
def register(ctx):
ctx.register_hook("pre_tool_call", audit_tool_call)
```
**Example — warn on dangerous tools:**
```python
DANGEROUS = {"terminal", "write_file", "patch"}
def warn_dangerous(tool_name, **kwargs):
if tool_name in DANGEROUS:
print(f"⚠ Executing potentially dangerous tool: {tool_name}")
def register(ctx):
ctx.register_hook("pre_tool_call", warn_dangerous)
```
---
### `post_tool_call`
Fires **immediately after** every tool execution returns.
**Callback signature:**
```python
def my_callback(tool_name: str, args: dict, result: str, task_id: str,
duration_ms: int, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `tool_name` | `str` | Name of the tool that just executed |
| `args` | `dict` | The arguments the model passed to the tool |
| `result` | `str` | The tool's return value (always a JSON string) |
| `task_id` | `str` | Session/task identifier. Empty string if not set. |
| `duration_ms` | `int` | How long the tool's dispatch took, in milliseconds (measured with `time.monotonic()` around `registry.dispatch()`). |
**Fires:** In `model_tools.py`, inside `handle_function_call()`, after the tool's handler returns. Fires once per tool call. Does **not** fire if the tool raised an unhandled exception (the error is caught and returned as an error JSON string instead, and `post_tool_call` fires with that error string as `result`).
**Return value:** Ignored.
**Use cases:** Logging tool results, metrics collection, tracking tool success/failure rates, latency dashboards, per-tool budget alerts, sending notifications when specific tools complete.
**Example — track tool usage metrics:**
```python
from collections import Counter, defaultdict
import json
_tool_counts = Counter()
_error_counts = Counter()
_latency_ms = defaultdict(list)
def track_metrics(tool_name, result, duration_ms=0, **kwargs):
_tool_counts[tool_name] += 1
_latency_ms[tool_name].append(duration_ms)
try:
parsed = json.loads(result)
if "error" in parsed:
_error_counts[tool_name] += 1
except (json.JSONDecodeError, TypeError):
pass
def register(ctx):
ctx.register_hook("post_tool_call", track_metrics)
```
---
### `pre_llm_call`
Fires **once per turn**, before the tool-calling loop begins. This is the **only hook whose return value is used** — it can inject context into the current turn's user message.
**Callback signature:**
```python
def my_callback(session_id: str, user_message: str, conversation_history: list,
is_first_turn: bool, model: str, platform: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | `str` | Unique identifier for the current session |
| `user_message` | `str` | The user's original message for this turn (before any skill injection) |
| `conversation_history` | `list` | Copy of the full message list (OpenAI format: `[{"role": "user", "content": "..."}]`) |
| `is_first_turn` | `bool` | `True` if this is the first turn of a new session, `False` on subsequent turns |
| `model` | `str` | The model identifier (e.g. `"anthropic/claude-sonnet-4.6"`) |
| `platform` | `str` | Where the session is running: `"cli"`, `"telegram"`, `"discord"`, etc. |
**Fires:** In `run_agent.py`, inside `run_conversation()`, after context compression but before the main `while` loop. Fires once per `run_conversation()` call (i.e. once per user turn), not once per API call within the tool loop.
**Return value:** If the callback returns a dict with a `"context"` key, or a plain non-empty string, the text is appended to the current turn's user message. Return `None` for no injection.
```python
# Inject context
return {"context": "Recalled memories:\n- User likes Python\n- Working on hermes-agent"}
# Plain string (equivalent)
return "Recalled memories:\n- User likes Python"
# No injection
return None
```
**Where context is injected:** Always the **user message**, never the system prompt. This preserves the prompt cache — the system prompt stays identical across turns, so cached tokens are reused. The system prompt is Hermes's territory (model guidance, tool enforcement, personality, skills). Plugins contribute context alongside the user's input.
All injected context is **ephemeral** — added at API call time only. The original user message in the conversation history is never mutated, and nothing is persisted to the session database.
When **multiple plugins** return context, their outputs are joined with double newlines in plugin discovery order (alphabetical by directory name).
**Use cases:** Memory recall, RAG context injection, guardrails, per-turn analytics.
**Example — memory recall:**
```python
import httpx
MEMORY_API = "https://your-memory-api.example.com"
def recall(session_id, user_message, is_first_turn, **kwargs):
try:
resp = httpx.post(f"{MEMORY_API}/recall", json={
"session_id": session_id,
"query": user_message,
}, timeout=3)
memories = resp.json().get("results", [])
if not memories:
return None
text = "Recalled context:\n" + "\n".join(f"- {m['text']}" for m in memories)
return {"context": text}
except Exception:
return None
def register(ctx):
ctx.register_hook("pre_llm_call", recall)
```
**Example — guardrails:**
```python
POLICY = "Never execute commands that delete files without explicit user confirmation."
def guardrails(**kwargs):
return {"context": POLICY}
def register(ctx):
ctx.register_hook("pre_llm_call", guardrails)
```
---
### `post_llm_call`
Fires **once per turn**, after the tool-calling loop completes and the agent has produced a final response. Only fires on **successful** turns — does not fire if the turn was interrupted.
**Callback signature:**
```python
def my_callback(session_id: str, user_message: str, assistant_response: str,
conversation_history: list, model: str, platform: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | `str` | Unique identifier for the current session |
| `user_message` | `str` | The user's original message for this turn |
| `assistant_response` | `str` | The agent's final text response for this turn |
| `conversation_history` | `list` | Copy of the full message list after the turn completed |
| `model` | `str` | The model identifier |
| `platform` | `str` | Where the session is running |
**Fires:** In `run_agent.py`, inside `run_conversation()`, after the tool loop exits with a final response. Guarded by `if final_response and not interrupted` — so it does **not** fire when the user interrupts mid-turn or the agent hits the iteration limit without producing a response.
**Return value:** Ignored.
**Use cases:** Syncing conversation data to an external memory system, computing response quality metrics, logging turn summaries, triggering follow-up actions.
**Example — sync to external memory:**
```python
import httpx
MEMORY_API = "https://your-memory-api.example.com"
def sync_memory(session_id, user_message, assistant_response, **kwargs):
try:
httpx.post(f"{MEMORY_API}/store", json={
"session_id": session_id,
"user": user_message,
"assistant": assistant_response,
}, timeout=5)
except Exception:
pass # best-effort
def register(ctx):
ctx.register_hook("post_llm_call", sync_memory)
```
**Example — track response lengths:**
```python
import logging
logger = logging.getLogger(__name__)
def log_response_length(session_id, assistant_response, model, **kwargs):
logger.info("RESPONSE session=%s model=%s chars=%d",
session_id, model, len(assistant_response or ""))
def register(ctx):
ctx.register_hook("post_llm_call", log_response_length)
```
---
### `on_session_start`
Fires **once** when a brand-new session is created. Does **not** fire on session continuation (when the user sends a second message in an existing session).
**Callback signature:**
```python
def my_callback(session_id: str, model: str, platform: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | `str` | Unique identifier for the new session |
| `model` | `str` | The model identifier |
| `platform` | `str` | Where the session is running |
**Fires:** In `run_agent.py`, inside `run_conversation()`, during the first turn of a new session — specifically after the system prompt is built but before the tool loop starts. The check is `if not conversation_history` (no prior messages = new session).
**Return value:** Ignored.
**Use cases:** Initializing session-scoped state, warming caches, registering the session with an external service, logging session starts.
**Example — initialize a session cache:**
```python
_session_caches = {}
def init_session(session_id, model, platform, **kwargs):
_session_caches[session_id] = {
"model": model,
"platform": platform,
"tool_calls": 0,
"started": __import__("datetime").datetime.now().isoformat(),
}
def register(ctx):
ctx.register_hook("on_session_start", init_session)
```
---
### `on_session_end`
Fires at the **very end** of every `run_conversation()` call, regardless of outcome. Also fires from the CLI's exit handler if the agent was mid-turn when the user quit.
**Callback signature:**
```python
def my_callback(session_id: str, completed: bool, interrupted: bool,
model: str, platform: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | `str` | Unique identifier for the session |
| `completed` | `bool` | `True` if the agent produced a final response, `False` otherwise |
| `interrupted` | `bool` | `True` if the turn was interrupted (user sent new message, `/stop`, or quit) |
| `model` | `str` | The model identifier |
| `platform` | `str` | Where the session is running |
**Fires:** In two places:
1. **`run_agent.py`** — at the end of every `run_conversation()` call, after all cleanup. Always fires, even if the turn errored.
2. **`cli.py`** — in the CLI's atexit handler, but **only** if the agent was mid-turn (`_agent_running=True`) when the exit occurred. This catches Ctrl+C and `/exit` during processing. In this case, `completed=False` and `interrupted=True`.
**Return value:** Ignored.
**Use cases:** Flushing buffers, closing connections, persisting session state, logging session duration, cleanup of resources initialized in `on_session_start`.
**Example — flush and cleanup:**
```python
_session_caches = {}
def cleanup_session(session_id, completed, interrupted, **kwargs):
cache = _session_caches.pop(session_id, None)
if cache:
# Flush accumulated data to disk or external service
status = "completed" if completed else ("interrupted" if interrupted else "failed")
print(f"Session {session_id} ended: {status}, {cache['tool_calls']} tool calls")
def register(ctx):
ctx.register_hook("on_session_end", cleanup_session)
```
**Example — session duration tracking:**
```python
import time, logging
logger = logging.getLogger(__name__)
_start_times = {}
def on_start(session_id, **kwargs):
_start_times[session_id] = time.time()
def on_end(session_id, completed, interrupted, **kwargs):
start = _start_times.pop(session_id, None)
if start:
duration = time.time() - start
logger.info("SESSION_DURATION session=%s seconds=%.1f completed=%s interrupted=%s",
session_id, duration, completed, interrupted)
def register(ctx):
ctx.register_hook("on_session_start", on_start)
ctx.register_hook("on_session_end", on_end)
```
---
### `on_session_finalize`
Fires when the CLI or gateway **tears down** an active session — for example, when the user runs `/new`, the gateway GC'd an idle session, or the CLI quit with an active agent. This is the last chance to flush state tied to the outgoing session before its identity is gone.
**Callback signature:**
```python
def my_callback(session_id: str | None, platform: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | `str` or `None` | The outgoing session ID. May be `None` if no active session existed. |
| `platform` | `str` | `"cli"` or the messaging platform name (`"telegram"`, `"discord"`, etc.). |
**Fires:** In `cli.py` (on `/new` / CLI exit) and `gateway/run.py` (when a session is reset or GC'd). Always paired with `on_session_reset` on the gateway side.
**Return value:** Ignored.
**Use cases:** Persist final session metrics before the session ID is discarded, close per-session resources, emit a final telemetry event, drain queued writes.
---
### `on_session_reset`
Fires when the gateway **swaps in a new session key** for an active chat — the user invoked `/new`, `/reset`, `/clear`, or the adapter picked a fresh session after an idle window. This lets plugins react to the fact that conversation state has been wiped without waiting for the next `on_session_start`.
**Callback signature:**
```python
def my_callback(session_id: str, platform: str, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `session_id` | `str` | The new session's ID (already rotated to the fresh value). |
| `platform` | `str` | The messaging platform name. |
**Fires:** In `gateway/run.py`, immediately after the new session key is allocated but before the next inbound message is processed. On the gateway, the order is: `on_session_finalize(old_id)` → swap → `on_session_reset(new_id)` → `on_session_start(new_id)` on the first inbound turn.
**Return value:** Ignored.
**Use cases:** Reset per-session caches keyed by `session_id`, emit "session rotated" analytics, prime a fresh state bucket.
---
See the **[Build a Plugin guide](/docs/guides/build-a-hermes-plugin)** for the full walkthrough including tool schemas, handlers, and advanced hook patterns.
---
### `subagent_stop`
Fires **once per child agent** after `delegate_task` finishes. Whether you delegated a single task or a batch of three, this hook fires once for each child, serialised on the parent thread.
**Callback signature:**
```python
def my_callback(parent_session_id: str, child_role: str | None,
child_summary: str | None, child_status: str,
duration_ms: int, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `parent_session_id` | `str` | Session ID of the delegating parent agent |
| `child_role` | `str \| None` | Orchestrator role tag set on the child (`None` if the feature isn't enabled) |
| `child_summary` | `str \| None` | The final response the child returned to the parent |
| `child_status` | `str` | `"completed"`, `"failed"`, `"interrupted"`, or `"error"` |
| `duration_ms` | `int` | Wall-clock time spent running the child, in milliseconds |
**Fires:** In `tools/delegate_tool.py`, after `ThreadPoolExecutor.as_completed()` drains all child futures. Firing is marshalled to the parent thread so hook authors don't have to reason about concurrent callback execution.
**Return value:** Ignored.
**Use cases:** Logging orchestration activity, accumulating child durations for billing, writing post-delegation audit records.
**Example — log orchestrator activity:**
```python
import logging
logger = logging.getLogger(__name__)
def log_subagent(parent_session_id, child_role, child_status, duration_ms, **kwargs):
logger.info(
"SUBAGENT parent=%s role=%s status=%s duration_ms=%d",
parent_session_id, child_role, child_status, duration_ms,
)
def register(ctx):
ctx.register_hook("subagent_stop", log_subagent)
```
:::info
With heavy delegation (e.g. orchestrator roles × 5 leaves × nested depth), `subagent_stop` fires many times per turn. Keep your callback fast; push expensive work to a background queue.
:::
---
### `pre_gateway_dispatch`
Fires **once per incoming `MessageEvent`** in the gateway, after the internal-event guard but **before** auth/pairing and agent dispatch. This is the interception point for gateway-level message-flow policies (listen-only windows, human handover, per-chat routing, etc.) that don't fit cleanly into any single platform adapter.
**Callback signature:**
```python
def my_callback(event, gateway, session_store, **kwargs):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `event` | `MessageEvent` | The normalized inbound message (has `.text`, `.source`, `.message_id`, `.internal`, etc.). |
| `gateway` | `GatewayRunner` | The active gateway runner, so plugins can call `gateway.adapters[platform].send(...)` for side-channel replies (owner notifications, etc.). |
| `session_store` | `SessionStore` | For silent transcript ingestion via `session_store.append_to_transcript(...)`. |
**Fires:** In `gateway/run.py`, inside `GatewayRunner._handle_message()`, immediately after `is_internal` is computed. **Internal events skip the hook entirely** (they are system-generated — background-process completions, etc. — and must not be gate-kept by user-facing policy).
**Return value:** `None` or a dict. The first recognized action dict wins; remaining plugin results are ignored. Exceptions in plugin callbacks are caught and logged; the gateway always falls through to normal dispatch on error.
| Return | Effect |
|--------|--------|
| `{"action": "skip", "reason": "..."}` | Drop the message — no agent reply, no pairing flow, no auth. Plugin is assumed to have handled it (e.g. silent-ingested into the transcript). |
| `{"action": "rewrite", "text": "new text"}` | Replace `event.text`, then continue normal dispatch with the modified event. Useful for collapsing buffered ambient messages into a single prompt. |
| `{"action": "allow"}` / `None` | Normal dispatch — runs the full auth / pairing / agent-loop chain. |
**Use cases:** Listen-only group chats (only respond when tagged; buffer ambient messages into context); human handover (silent-ingest customer messages while owner handles the chat manually); per-profile rate limiting; policy-driven routing.
**Example — drop unauthorized DMs silently without triggering the pairing code:**
```python
def deny_unauthorized_dms(event, **kwargs):
src = event.source
if src.chat_type == "dm" and not _is_approved_user(src.user_id):
return {"action": "skip", "reason": "unauthorized-dm"}
return None
def register(ctx):
ctx.register_hook("pre_gateway_dispatch", deny_unauthorized_dms)
```
**Example — rewrite an ambient-message buffer into a single prompt on mention:**
```python
_buffers = {}
def buffer_or_rewrite(event, **kwargs):
key = (event.source.platform, event.source.chat_id)
buf = _buffers.setdefault(key, [])
if _bot_mentioned(event.text):
combined = "\n".join(buf + [event.text])
buf.clear()
return {"action": "rewrite", "text": combined}
buf.append(event.text)
return {"action": "skip", "reason": "ambient-buffered"}
def register(ctx):
ctx.register_hook("pre_gateway_dispatch", buffer_or_rewrite)
```
---
### `pre_approval_request`
Fires **immediately before** an approval request is shown to the user — covers every surface: interactive CLI, the Ink TUI, gateway platforms (Telegram, Discord, Slack, WhatsApp, Matrix, etc.), and ACP clients (VS Code, Zed, JetBrains).
This is the right place to wire a custom notifier — for example, a macOS menu-bar app that pops an allow/deny notification, or an audit log that records every approval request with context.
**Callback signature:**
```python
def my_callback(
command: str,
description: str,
pattern_key: str,
pattern_keys: list[str],
session_key: str,
surface: str,
**kwargs,
):
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `command` | `str` | The shell command awaiting approval |
| `description` | `str` | Human-readable reason(s) the command is flagged (combined when multiple patterns match) |
| `pattern_key` | `str` | Primary pattern key that triggered the approval (e.g. `"rm_rf"`, `"sudo"`) |
| `pattern_keys` | `list[str]` | All pattern keys that matched |
| `session_key` | `str` | Session identifier, useful for scoping notifications per-chat |
| `surface` | `str` | `"cli"` for interactive CLI/TUI prompts, `"gateway"` for async platform approvals |
**Return value:** ignored. Hooks here are observer-only; they cannot veto or pre-answer the approval. Use [`pre_tool_call`](#pre_tool_call) to block a tool before it reaches the approval system.
**Use cases:** Desktop notifications, push alerts, audit logging, Slack webhooks, escalation routing, metrics.
**Example — desktop notification on macOS:**
```python
import subprocess
def notify_approval(command, description, session_key, **kwargs):
title = "Hermes needs approval"
body = f"{description}: {command[:80]}"
subprocess.Popen([
"osascript", "-e",
f'display notification "{body}" with title "{title}"',
])
def register(ctx):
ctx.register_hook("pre_approval_request", notify_approval)
```
---
### `post_approval_response`
Fires **after** the user responds to an approval prompt (or the prompt times out).
**Callback signature:**
```python
def my_callback(
command: str,
description: str,
pattern_key: str,
pattern_keys: list[str],
session_key: str,
surface: str,
choice: str,
**kwargs,
):
```
Same kwargs as `pre_approval_request`, plus:
| Parameter | Type | Description |
|-----------|------|-------------|
| `choice` | `str` | One of `"once"`, `"session"`, `"always"`, `"deny"`, or `"timeout"` |
**Return value:** ignored.
**Use cases:** Close the matching desktop notification, record the final decision in an audit log, update metrics, roll forward a rate limiter.
```python
def log_decision(command, choice, session_key, **kwargs):
logger.info("approval %s: %s for session %s", choice, command[:60], session_key)
def register(ctx):
ctx.register_hook("post_approval_response", log_decision)
```
---
### `transform_tool_result`
Fires **after** a tool returns and **before** the result is appended to the conversation. Lets a plugin rewrite ANY tool's result string — not just terminal output — before the model sees it.
**Callback signature:**
```python
def my_callback(
tool_name: str,
arguments: dict,
result: str,
task_id: str | None,
**kwargs,
) -> str | None:
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `tool_name` | `str` | Tool that produced the result (`read_file`, `web_extract`, `delegate_task`, …). |
| `arguments` | `dict` | Arguments the model called the tool with. |
| `result` | `str` | The tool's raw result string, post-truncation and post-ANSI-strip. |
| `task_id` | `str \| None` | Task/session ID when running inside RL/benchmark environments. |
**Return value:** `str` to replace the result (the returned string is what the model sees), `None` to leave it unchanged.
**Use cases:** Redact organization-specific PII from `web_extract` output, wrap long JSON tool responses in a summary header, inject retrieval-augmented hints into `read_file` results, rewrite `delegate_task` subagent reports into a project-specific schema.
```python
import re
SECRET = re.compile(r"sk-[A-Za-z0-9]{32,}")
def redact_secrets(tool_name, result, **kwargs):
if SECRET.search(result):
return SECRET.sub("[REDACTED]", result)
return None
def register(ctx):
ctx.register_hook("transform_tool_result", redact_secrets)
```
Applies to every tool. For terminal-only rewriting see `transform_terminal_output` below — it's narrower and runs earlier in the pipeline (pre-truncation, pre-redaction).
---
### `transform_terminal_output`
Fires inside the `terminal` tool's foreground-output pipeline, **before** the default 50 KB truncation, ANSI strip, and secret redaction. Lets plugins rewrite the raw stdout/stderr of a shell command before any downstream processing touches it.
**Callback signature:**
```python
def my_callback(
command: str,
output: str,
exit_code: int,
cwd: str,
task_id: str | None,
**kwargs,
) -> str | None:
```
| Parameter | Type | Description |
|-----------|------|-------------|
| `command` | `str` | The shell command that produced the output. |
| `output` | `str` | Raw combined stdout/stderr (may be very large — truncation happens after the hook). |
| `exit_code` | `int` | Process exit code. |
| `cwd` | `str` | Working directory the command ran in. |
**Return value:** `str` to replace the output, `None` to leave it unchanged.
**Use cases:** Inject summaries for commands that produce massive output (`du -ah`, `find`, `tree`), tag output with a project-specific marker so downstream hooks know how to handle it, strip timing noise that flaps between runs and defeats prompt caching.
```python
def summarize_find(command, output, **kwargs):
if command.startswith("find ") and len(output) > 50_000:
lines = output.count("\n")
head = "\n".join(output.splitlines()[:40])
return f"{head}\n\n[summary: {lines} paths total, showing first 40]"
return None
def register(ctx):
ctx.register_hook("transform_terminal_output", summarize_find)
```
Pairs well with `transform_tool_result` (which covers every other tool).
---
## Shell Hooks
Declare shell-script hooks in your `cli-config.yaml` and Hermes will run them as subprocesses whenever the corresponding plugin-hook event fires — in both CLI and gateway sessions. No Python plugin authoring required.
Use shell hooks when you want a drop-in, single-file script (Bash, Python, anything with a shebang) to:
- **Block a tool call** — reject dangerous `terminal` commands, enforce per-directory policies, require approval for destructive `write_file` / `patch` operations.
- **Run after a tool call** — auto-format Python or TypeScript files that the agent just wrote, log API calls, trigger a CI workflow.
- **Inject context into the next LLM turn** — prepend `git status` output, the current weekday, or retrieved documents to the user message (see [`pre_llm_call`](#pre_llm_call)).
- **Observe lifecycle events** — write a log line when a subagent completes (`subagent_stop`) or a session starts (`on_session_start`).
Shell hooks are registered by calling `agent.shell_hooks.register_from_config(cfg)` at both CLI startup (`hermes_cli/main.py`) and gateway startup (`gateway/run.py`). They compose naturally with Python plugin hooks — both flow through the same dispatcher.
### Comparison at a glance
| Dimension | Shell hooks | [Plugin hooks](#plugin-hooks) | [Gateway hooks](#gateway-event-hooks) |
|-----------|-------------|-------------------------------|---------------------------------------|
| Declared in | `hooks:` block in `~/.hermes/config.yaml` | `register()` in a `plugin.yaml` plugin | `HOOK.yaml` + `handler.py` directory |
| Lives under | `~/.hermes/agent-hooks/` (by convention) | `~/.hermes/plugins//` | `~/.hermes/hooks//` |
| Language | Any (Bash, Python, Go binary, …) | Python only | Python only |
| Runs in | CLI + Gateway | CLI + Gateway | Gateway only |
| Events | `VALID_HOOKS` (incl. `subagent_stop`) | `VALID_HOOKS` | Gateway lifecycle (`gateway:startup`, `agent:*`, `command:*`) |
| Can block a tool call | Yes (`pre_tool_call`) | Yes (`pre_tool_call`) | No |
| Can inject LLM context | Yes (`pre_llm_call`) | Yes (`pre_llm_call`) | No |
| Consent | First-use prompt per `(event, command)` pair | Implicit (Python plugin trust) | Implicit (dir trust) |
| Inter-process isolation | Yes (subprocess) | No (in-process) | No (in-process) |
### Configuration schema
```yaml
hooks:
: # Must be in VALID_HOOKS
- matcher: "" # Optional; used for pre/post_tool_call only
command: "" # Required; runs via shlex.split, shell=False
timeout: # Optional; default 60, capped at 300
hooks_auto_accept: false # See "Consent model" below
```
Event names must be one of the [plugin hook events](#plugin-hooks); typos produce a "Did you mean X?" warning and are skipped. Unknown keys inside a single entry are ignored; missing `command` is a skip-with-warning. `timeout > 300` is clamped with a warning.
### JSON wire protocol
Each time the event fires, Hermes spawns a subprocess for every matching hook (matcher permitting), pipes a JSON payload to **stdin**, and reads **stdout** back as JSON.
**stdin — payload the script receives:**
```json
{
"hook_event_name": "pre_tool_call",
"tool_name": "terminal",
"tool_input": {"command": "rm -rf /"},
"session_id": "sess_abc123",
"cwd": "/home/user/project",
"extra": {"task_id": "...", "tool_call_id": "..."}
}
```
`tool_name` and `tool_input` are `null` for non-tool events (`pre_llm_call`, `subagent_stop`, session lifecycle). The `extra` dict carries all event-specific kwargs (`user_message`, `conversation_history`, `child_role`, `duration_ms`, …). Unserialisable values are stringified rather than omitted.
**stdout — optional response:**
```jsonc
// Block a pre_tool_call (both shapes accepted; normalised internally):
{"decision": "block", "reason": "Forbidden: rm -rf"} // Claude-Code style
{"action": "block", "message": "Forbidden: rm -rf"} // Hermes-canonical
// Inject context for pre_llm_call:
{"context": "Today is Friday, 2026-04-17"}
// Silent no-op — any empty / non-matching output is fine:
```
Malformed JSON, non-zero exit codes, and timeouts log a warning but never abort the agent loop.
### Worked examples
#### 1. Auto-format Python files after every write
```yaml
# ~/.hermes/config.yaml
hooks:
post_tool_call:
- matcher: "write_file|patch"
command: "~/.hermes/agent-hooks/auto-format.sh"
```
```bash
#!/usr/bin/env bash
# ~/.hermes/agent-hooks/auto-format.sh
payload="$(cat -)"
path=$(echo "$payload" | jq -r '.tool_input.path // empty')
[[ "$path" == *.py ]] && command -v black >/dev/null && black "$path" 2>/dev/null
printf '{}\n'
```
The agent's in-context view of the file is **not** re-read automatically — the reformat only affects the file on disk. Subsequent `read_file` calls pick up the formatted version.
#### 2. Block destructive `terminal` commands
```yaml
hooks:
pre_tool_call:
- matcher: "terminal"
command: "~/.hermes/agent-hooks/block-rm-rf.sh"
timeout: 5
```
```bash
#!/usr/bin/env bash
# ~/.hermes/agent-hooks/block-rm-rf.sh
payload="$(cat -)"
cmd=$(echo "$payload" | jq -r '.tool_input.command // empty')
if echo "$cmd" | grep -qE 'rm[[:space:]]+-rf?[[:space:]]+/'; then
printf '{"decision": "block", "reason": "blocked: rm -rf / is not permitted"}\n'
else
printf '{}\n'
fi
```
#### 3. Inject `git status` into every turn (Claude-Code `UserPromptSubmit` equivalent)
```yaml
hooks:
pre_llm_call:
- command: "~/.hermes/agent-hooks/inject-cwd-context.sh"
```
```bash
#!/usr/bin/env bash
# ~/.hermes/agent-hooks/inject-cwd-context.sh
cat - >/dev/null # discard stdin payload
if status=$(git status --porcelain 2>/dev/null) && [[ -n "$status" ]]; then
jq --null-input --arg s "$status" \
'{context: ("Uncommitted changes in cwd:\n" + $s)}'
else
printf '{}\n'
fi
```
Claude Code's `UserPromptSubmit` event is intentionally not a separate Hermes event — `pre_llm_call` fires at the same place and already supports context injection. Use it here.
#### 4. Log every subagent completion
```yaml
hooks:
subagent_stop:
- command: "~/.hermes/agent-hooks/log-orchestration.sh"
```
```bash
#!/usr/bin/env bash
# ~/.hermes/agent-hooks/log-orchestration.sh
log=~/.hermes/logs/orchestration.log
jq -c '{ts: now, parent: .session_id, extra: .extra}' < /dev/stdin >> "$log"
printf '{}\n'
```
### Consent model
Each unique `(event, command)` pair prompts the user for approval the first time Hermes sees it, then persists the decision to `~/.hermes/shell-hooks-allowlist.json`. Subsequent runs (CLI or gateway) skip the prompt.
Three escape hatches bypass the interactive prompt — any one is sufficient:
1. `--accept-hooks` flag on the CLI (e.g. `hermes --accept-hooks chat`)
2. `HERMES_ACCEPT_HOOKS=1` environment variable
3. `hooks_auto_accept: true` in `cli-config.yaml`
Non-TTY runs (gateway, cron, CI) need one of these three — otherwise any newly-added hook silently stays un-registered and logs a warning.
**Script edits are silently trusted.** The allowlist keys on the exact command string, not the script's hash, so editing the script on disk does not invalidate consent. `hermes hooks doctor` flags mtime drift so you can spot edits and decide whether to re-approve.
### The `hermes hooks` CLI
| Command | What it does |
|---------|--------------|
| `hermes hooks list` | Dump configured hooks with matcher, timeout, and consent status |
| `hermes hooks test [--for-tool X] [--payload-file F]` | Fire every matching hook against a synthetic payload and print the parsed response |
| `hermes hooks revoke ` | Remove every allowlist entry matching `` (takes effect on next restart) |
| `hermes hooks doctor` | For every configured hook: check exec bit, allowlist status, mtime drift, JSON output validity, and rough execution time |
### Security
Shell hooks run with **your full user credentials** — same trust boundary as a cron entry or a shell alias. Treat the `hooks:` block in `config.yaml` as privileged configuration:
- Only reference scripts you wrote or fully reviewed.
- Keep scripts inside `~/.hermes/agent-hooks/` so the path is easy to audit.
- Re-run `hermes hooks doctor` after you pull a shared config to spot newly-added hooks before they register.
- If your config.yaml is version-controlled across a team, review PRs that change the `hooks:` section the same way you'd review CI config.
### Ordering and precedence
Both Python plugin hooks and shell hooks flow through the same `invoke_hook()` dispatcher. Python plugins are registered first (`discover_and_load()`), shell hooks second (`register_from_config()`), so Python `pre_tool_call` block decisions take precedence in tie cases. The first valid block wins — the aggregator returns as soon as any callback produces `{"action": "block", "message": str}` with a non-empty message.
---
# Batch Processing
# Batch Processing
Batch processing lets you run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured trajectory data. This is primarily used for **training data generation** — producing ShareGPT-format trajectories with tool usage statistics that can be used for fine-tuning or evaluation.
## Overview
The batch runner (`batch_runner.py`) processes a JSONL dataset of prompts, running each through a full agent session with tool access. Each prompt gets its own isolated environment. The output is structured trajectory data with full conversation history, tool call statistics, and reasoning coverage metrics.
## Quick Start
```bash
# Basic batch run
python batch_runner.py \
--dataset_file=data/prompts.jsonl \
--batch_size=10 \
--run_name=my_first_run \
--model=anthropic/claude-sonnet-4.6 \
--num_workers=4
# Resume an interrupted run
python batch_runner.py \
--dataset_file=data/prompts.jsonl \
--batch_size=10 \
--run_name=my_first_run \
--resume
# List available toolset distributions
python batch_runner.py --list_distributions
```
## Dataset Format
The input dataset is a JSONL file (one JSON object per line). Each entry must have a `prompt` field:
```jsonl
{"prompt": "Write a Python function that finds the longest palindromic substring"}
{"prompt": "Create a REST API endpoint for user authentication using Flask"}
{"prompt": "Debug this error: TypeError: cannot unpack non-iterable NoneType object"}
```
Entries can optionally include:
- `image` or `docker_image`: A container image to use for this prompt's sandbox (works with Docker, Modal, and Singularity backends)
- `cwd`: Working directory override for the task's terminal session
## Configuration Options
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--dataset_file` | (required) | Path to JSONL dataset |
| `--batch_size` | (required) | Prompts per batch |
| `--run_name` | (required) | Name for this run (used for output dir and checkpointing) |
| `--distribution` | `"default"` | Toolset distribution to sample from |
| `--model` | `claude-sonnet-4.6` | Model to use |
| `--base_url` | `https://openrouter.ai/api/v1` | API base URL |
| `--api_key` | (env var) | API key for model |
| `--max_turns` | `10` | Maximum tool-calling iterations per prompt |
| `--num_workers` | `4` | Parallel worker processes |
| `--resume` | `false` | Resume from checkpoint |
| `--verbose` | `false` | Enable verbose logging |
| `--max_samples` | all | Only process first N samples from dataset |
| `--max_tokens` | model default | Maximum tokens per model response |
### Provider Routing (OpenRouter)
| Parameter | Description |
|-----------|-------------|
| `--providers_allowed` | Comma-separated providers to allow (e.g., `"anthropic,openai"`) |
| `--providers_ignored` | Comma-separated providers to ignore (e.g., `"together,deepinfra"`) |
| `--providers_order` | Comma-separated preferred provider order |
| `--provider_sort` | Sort by `"price"`, `"throughput"`, or `"latency"` |
### Reasoning Control
| Parameter | Description |
|-----------|-------------|
| `--reasoning_effort` | Effort level: `none`, `minimal`, `low`, `medium`, `high`, `xhigh` |
| `--reasoning_disabled` | Completely disable reasoning/thinking tokens |
### Advanced Options
| Parameter | Description |
|-----------|-------------|
| `--ephemeral_system_prompt` | System prompt used during execution but NOT saved to trajectories |
| `--log_prefix_chars` | Characters to show in log previews (default: 100) |
| `--prefill_messages_file` | Path to JSON file with prefill messages for few-shot priming |
## Toolset Distributions
Each prompt gets a randomly sampled set of toolsets from a **distribution**. This ensures training data covers diverse tool combinations. Use `--list_distributions` to see all available distributions.
In the current implementation, distributions assign a probability to **each individual toolset**. The sampler flips each toolset independently, then guarantees that at least one toolset is enabled. This is different from a hand-authored table of prebuilt combinations.
## Output Format
All output goes to `data//`:
```text
data/my_run/
├── trajectories.jsonl # Combined final output (all batches merged)
├── batch_0.jsonl # Individual batch results
├── batch_1.jsonl
├── ...
├── checkpoint.json # Resume checkpoint
└── statistics.json # Aggregate tool usage stats
```
### Trajectory Format
Each line in `trajectories.jsonl` is a JSON object:
```json
{
"prompt_index": 42,
"conversations": [
{"from": "human", "value": "Write a function..."},
{"from": "gpt", "value": "I'll create that function...",
"tool_calls": [...]},
{"from": "tool", "value": "..."},
{"from": "gpt", "value": "Here's the completed function..."}
],
"metadata": {
"batch_num": 2,
"timestamp": "2026-01-15T10:30:00",
"model": "anthropic/claude-sonnet-4.6"
},
"completed": true,
"partial": false,
"api_calls": 3,
"toolsets_used": ["terminal", "file"],
"tool_stats": {
"terminal": {"count": 2, "success": 2, "failure": 0},
"read_file": {"count": 1, "success": 1, "failure": 0}
},
"tool_error_counts": {
"terminal": 0,
"read_file": 0
}
}
```
The `conversations` field uses a ShareGPT-like format with `from` and `value` fields. Tool stats are normalized to include all possible tools with zero defaults, ensuring consistent schema across entries for HuggingFace datasets compatibility.
## Checkpointing
The batch runner has robust checkpointing for fault tolerance:
- **Checkpoint file:** Saved after each batch completes, tracking which prompt indices are done
- **Content-based resume:** On `--resume`, the runner scans existing batch files and matches completed prompts by their actual text content (not just indices), enabling recovery even if the dataset order changes
- **Failed prompts:** Only successfully completed prompts are marked as done — failed prompts will be retried on resume
- **Batch merging:** On completion, all batch files (including from previous runs) are merged into a single `trajectories.jsonl`
### How Resume Works
1. Scan all `batch_*.jsonl` files for completed prompts (by content matching)
2. Filter the dataset to exclude already-completed prompts
3. Re-batch the remaining prompts
4. Process only the remaining prompts
5. Merge all batch files (old + new) into final output
## Quality Filtering
The batch runner applies automatic quality filtering:
- **No-reasoning filter:** Samples where zero assistant turns contain reasoning (no `` or native thinking tokens) are discarded
- **Corrupted entry filter:** Entries with hallucinated tool names (not in the valid tool list) are filtered out during the final merge
- **Reasoning statistics:** Tracks percentage of turns with/without reasoning across the entire run
## Statistics
After completion, the runner prints comprehensive statistics:
- **Tool usage:** Call counts, success/failure rates per tool
- **Reasoning coverage:** Percentage of assistant turns with reasoning
- **Samples discarded:** Count of samples filtered for lacking reasoning
- **Duration:** Total processing time
Statistics are also saved to `statistics.json` for programmatic analysis.
## Use Cases
### Training Data Generation
Generate diverse tool-use trajectories for fine-tuning:
```bash
python batch_runner.py \
--dataset_file=data/coding_prompts.jsonl \
--batch_size=20 \
--run_name=coding_v1 \
--model=anthropic/claude-sonnet-4.6 \
--num_workers=8 \
--distribution=default \
--max_turns=15
```
### Model Evaluation
Evaluate how well a model uses tools across standardized prompts:
```bash
python batch_runner.py \
--dataset_file=data/eval_suite.jsonl \
--batch_size=10 \
--run_name=eval_gpt4 \
--model=openai/gpt-4o \
--num_workers=4 \
--max_turns=10
```
### Per-Prompt Container Images
For benchmarks requiring specific environments, each prompt can specify its own container image:
```jsonl
{"prompt": "Install numpy and compute eigenvalues of a 3x3 matrix", "image": "python:3.11-slim"}
{"prompt": "Compile this Rust program and run it", "image": "rust:1.75"}
{"prompt": "Set up a Node.js Express server", "image": "node:20-alpine", "cwd": "/app"}
```
The batch runner verifies Docker images are accessible before running each prompt.
---
# Voice Mode
# Voice Mode
Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
## Prerequisites
Before using voice features, make sure you have:
1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
2. **An LLM provider configured** — run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
:::tip
The `~/.hermes/` directory and default `config.yaml` are created automatically the first time you run `hermes`. You only need to create `~/.hermes/.env` manually for API keys.
:::
## Overview
| Feature | Platform | Description |
|---------|----------|-------------|
| **Interactive Voice** | CLI | Press Ctrl+B to record, agent auto-detects silence and responds |
| **Auto Voice Reply** | Telegram, Discord | Agent sends spoken audio alongside text responses |
| **Voice Channel** | Discord | Bot joins VC, listens to users speaking, speaks replies back |
## Requirements
### Python Packages
```bash
# CLI voice mode (microphone + audio playback)
pip install "hermes-agent[voice]"
# Discord + Telegram messaging (includes discord.py[voice] for VC support)
pip install "hermes-agent[messaging]"
# Premium TTS (ElevenLabs)
pip install "hermes-agent[tts-premium]"
# Local TTS (NeuTTS, optional)
python -m pip install -U neutts[all]
# Everything at once
pip install "hermes-agent[all]"
```
| Extra | Packages | Required For |
|-------|----------|-------------|
| `voice` | `sounddevice`, `numpy` | CLI voice mode |
| `messaging` | `discord.py[voice]`, `python-telegram-bot`, `aiohttp` | Discord & Telegram bots |
| `tts-premium` | `elevenlabs` | ElevenLabs TTS provider |
Optional local TTS provider: install `neutts` separately with `python -m pip install -U neutts[all]`. On first use it downloads the model automatically.
:::info
`discord.py[voice]` installs **PyNaCl** (for voice encryption) and **opus bindings** automatically. This is required for Discord voice channel support.
:::
### System Dependencies
```bash
# macOS
brew install portaudio ffmpeg opus
brew install espeak-ng # for NeuTTS
# Ubuntu/Debian
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng # for NeuTTS
```
| Dependency | Purpose | Required For |
|-----------|---------|-------------|
| **PortAudio** | Microphone input and audio playback | CLI voice mode |
| **ffmpeg** | Audio format conversion (MP3 → Opus, PCM → WAV) | All platforms |
| **Opus** | Discord voice codec | Discord voice channels |
| **espeak-ng** | Phonemizer backend | Local NeuTTS provider |
### API Keys
Add to `~/.hermes/.env`:
```bash
# Speech-to-Text — local provider needs NO key at all
# pip install faster-whisper # Free, runs locally, recommended
GROQ_API_KEY=your-key # Groq Whisper — fast, free tier (cloud)
VOICE_TOOLS_OPENAI_KEY=your-key # OpenAI Whisper — paid (cloud)
# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
ELEVENLABS_API_KEY=*** # ElevenLabs — premium quality
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
```
:::tip
If `faster-whisper` is installed, voice mode works with **zero API keys** for STT. The model (~150 MB for `base`) downloads automatically on first use.
:::
---
## CLI Voice Mode
Voice mode is available in both the **classic CLI** (`hermes chat`) and the **TUI** (`hermes --tui`). Behavior is identical across both — same slash commands, same VAD silence detection, same streaming TTS, same hallucination filter. The TUI additionally forwards crash-forensic logs to `~/.hermes/logs/` so push-to-talk failures on exotic audio backends can be reported with a full stack trace rather than disappearing silently.
### Quick Start
Start the CLI and enable voice mode:
```bash
hermes # Start the interactive CLI
```
Then use these commands inside the CLI:
```
/voice Toggle voice mode on/off
/voice on Enable voice mode
/voice off Disable voice mode
/voice tts Toggle TTS output
/voice status Show current state
```
### How It Works
1. Start the CLI with `hermes` and enable voice mode with `/voice on`
2. **Press Ctrl+B** — a beep plays (880Hz), recording starts
3. **Speak** — a live audio level bar shows your input: `● [▁▂▃▅▇▇▅▂] ❯`
4. **Stop speaking** — after 3 seconds of silence, recording auto-stops
5. **Two beeps** play (660Hz) confirming the recording ended
6. Audio is transcribed via Whisper and sent to the agent
7. If TTS is enabled, the agent's reply is spoken aloud
8. Recording **automatically restarts** — speak again without pressing any key
This loop continues until you press **Ctrl+B** during recording (exits continuous mode) or 3 consecutive recordings detect no speech.
:::tip
The record key is configurable via `voice.record_key` in `~/.hermes/config.yaml` (default: `ctrl+b`).
:::
### Silence Detection
Two-stage algorithm detects when you've finished speaking:
1. **Speech confirmation** — waits for audio above the RMS threshold (200) for at least 0.3s, tolerating brief dips between syllables
2. **End detection** — once speech is confirmed, triggers after 3.0 seconds of continuous silence
If no speech is detected at all for 15 seconds, recording stops automatically.
Both `silence_threshold` and `silence_duration` are configurable in `config.yaml`. You can also disable the record start/stop beeps with `voice.beep_enabled: false`.
### Streaming TTS
When TTS is enabled, the agent speaks its reply **sentence-by-sentence** as it generates text — you don't wait for the full response:
1. Buffers text deltas into complete sentences (min 20 chars)
2. Strips markdown formatting and `` blocks
3. Generates and plays audio per sentence in real-time
### Hallucination Filter
Whisper sometimes generates phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a set of 26 known hallucination phrases across multiple languages, plus a regex pattern that catches repetitive variations.
---
## Gateway Voice Reply (Telegram & Discord)
If you haven't set up your messaging bots yet, see the platform-specific guides:
- [Telegram Setup Guide](../messaging/telegram.md)
- [Discord Setup Guide](../messaging/discord.md)
Start the gateway to connect to your messaging platforms:
```bash
hermes gateway # Start the gateway (connects to configured platforms)
hermes gateway setup # Interactive setup wizard for first-time configuration
```
### Discord: Channels vs DMs
The bot supports two interaction modes on Discord:
| Mode | How to Talk | Mention Required | Setup |
|------|------------|-----------------|-------|
| **Direct Message (DM)** | Open the bot's profile → "Message" | No | Works immediately |
| **Server Channel** | Type in a text channel where the bot is present | Yes (`@botname`) | Bot must be invited to the server |
**DM (recommended for personal use):** Just open a DM with the bot and type — no @mention needed. Voice replies and all commands work the same as in channels.
**Server channels:** The bot only responds when you @mention it (e.g. `@hermesbyt4 hello`). Make sure you select the **bot user** from the mention popup, not the role with the same name.
:::tip
To disable the mention requirement in server channels, add to `~/.hermes/.env`:
```bash
DISCORD_REQUIRE_MENTION=false
```
Or set specific channels as free-response (no mention needed):
```bash
DISCORD_FREE_RESPONSE_CHANNELS=123456789,987654321
```
:::
### Commands
These work in both Telegram and Discord (DMs and text channels):
```
/voice Toggle voice mode on/off
/voice on Voice replies only when you send a voice message
/voice tts Voice replies for ALL messages
/voice off Disable voice replies
/voice status Show current setting
```
### Modes
| Mode | Command | Behavior |
|------|---------|----------|
| `off` | `/voice off` | Text only (default) |
| `voice_only` | `/voice on` | Speaks reply only when you send a voice message |
| `all` | `/voice tts` | Speaks reply to every message |
Voice mode setting is persisted across gateway restarts.
### Platform Delivery
| Platform | Format | Notes |
|----------|--------|-------|
| **Telegram** | Voice bubble (Opus/OGG) | Plays inline in chat. ffmpeg converts MP3 → Opus if needed |
| **Discord** | Native voice bubble (Opus/OGG) | Plays inline like a user voice message. Falls back to file attachment if voice bubble API fails |
---
## Discord Voice Channels
The most immersive voice feature: the bot joins a Discord voice channel, listens to users speaking, transcribes their speech, processes through the agent, and speaks the reply back in the voice channel.
### Setup
#### 1. Discord Bot Permissions
If you already have a Discord bot set up for text (see [Discord Setup Guide](../messaging/discord.md)), you need to add voice permissions.
Go to the [Discord Developer Portal](https://discord.com/developers/applications) → your application → **Installation** → **Default Install Settings** → **Guild Install**:
**Add these permissions to the existing text permissions:**
| Permission | Purpose | Required |
|-----------|---------|----------|
| **Connect** | Join voice channels | Yes |
| **Speak** | Play TTS audio in voice channels | Yes |
| **Use Voice Activity** | Detect when users are speaking | Recommended |
**Updated Permissions Integer:**
| Level | Integer | What's Included |
|-------|---------|----------------|
| Text only | `274878286912` | View Channels, Send Messages, Read History, Embeds, Attachments, Threads, Reactions |
| Text + Voice | `274881432640` | All above + Connect, Speak |
**Re-invite the bot** with the updated permissions URL:
```
https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274881432640
```
Replace `YOUR_APP_ID` with your Application ID from the Developer Portal.
:::warning
Re-inviting the bot to a server it's already in will update its permissions without removing it. You won't lose any data or configuration.
:::
#### 2. Privileged Gateway Intents
In the [Developer Portal](https://discord.com/developers/applications) → your application → **Bot** → **Privileged Gateway Intents**, enable all three:
| Intent | Purpose |
|--------|---------|
| **Presence Intent** | Detect user online/offline status |
| **Server Members Intent** | Resolve usernames in `DISCORD_ALLOWED_USERS` to numeric IDs (conditional) |
| **Message Content Intent** | Read text message content in channels |
**Message Content Intent** is required. **Server Members Intent** is only needed if your `DISCORD_ALLOWED_USERS` list uses usernames — if you use numeric user IDs, you can leave it OFF. Voice-channel SSRC → user_id mapping comes from Discord's SPEAKING opcode on the voice websocket and does **not** require the Server Members Intent.
#### 3. Opus Codec
The Opus codec library must be installed on the machine running the gateway:
```bash
# macOS (Homebrew)
brew install opus
# Ubuntu/Debian
sudo apt install libopus0
```
The bot auto-loads the codec from:
- **macOS:** `/opt/homebrew/lib/libopus.dylib`
- **Linux:** `libopus.so.0`
#### 4. Environment Variables
```bash
# ~/.hermes/.env
# Discord bot (already configured for text)
DISCORD_BOT_TOKEN=your-bot-token
DISCORD_ALLOWED_USERS=your-user-id
# STT — local provider needs no key (pip install faster-whisper)
# GROQ_API_KEY=your-key # Alternative: cloud-based, fast, free tier
# TTS — optional. Edge TTS and NeuTTS need no key.
# ELEVENLABS_API_KEY=*** # Premium quality
# VOICE_TOOLS_OPENAI_KEY=*** # OpenAI TTS / Whisper
```
### Start the Gateway
```bash
hermes gateway # Start with existing configuration
```
The bot should come online in Discord within a few seconds.
### Commands
Use these in the Discord text channel where the bot is present:
```
/voice join Bot joins your current voice channel
/voice channel Alias for /voice join
/voice leave Bot disconnects from voice channel
/voice status Show voice mode and connected channel
```
:::info
You must be in a voice channel before running `/voice join`. The bot joins the same VC you're in.
:::
### How It Works
When the bot joins a voice channel, it:
1. **Listens** to each user's audio stream independently
2. **Detects silence** — 1.5s of silence after at least 0.5s of speech triggers processing
3. **Transcribes** the audio via Whisper STT (local, Groq, or OpenAI)
4. **Processes** through the full agent pipeline (session, tools, memory)
5. **Speaks** the reply back in the voice channel via TTS
### Text Channel Integration
When the bot is in a voice channel:
- Transcripts appear in the text channel: `[Voice] @user: what you said`
- Agent responses are sent as text in the channel AND spoken in the VC
- The text channel is the one where `/voice join` was issued
### Echo Prevention
The bot automatically pauses its audio listener while playing TTS replies, preventing it from hearing and re-processing its own output.
### Access Control
Only users listed in `DISCORD_ALLOWED_USERS` can interact via voice. Other users' audio is silently ignored.
```bash
# ~/.hermes/.env
DISCORD_ALLOWED_USERS=284102345871466496
```
---
## Configuration Reference
### config.yaml
```yaml
# Voice recording (CLI)
voice:
record_key: "ctrl+b" # Key to start/stop recording
max_recording_seconds: 120 # Maximum recording length
auto_tts: false # Auto-enable TTS when voice mode starts
beep_enabled: true # Play record start/stop beeps
silence_threshold: 200 # RMS level (0-32767) below which counts as silence
silence_duration: 3.0 # Seconds of silence before auto-stop
# Speech-to-Text
stt:
provider: "local" # "local" (free) | "groq" | "openai"
local:
model: "base" # tiny, base, small, medium, large-v3
# model: "whisper-1" # Legacy: used when provider is not set
# Text-to-Speech
tts:
provider: "edge" # "edge" (free) | "elevenlabs" | "openai" | "neutts" | "minimax"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
base_url: "https://api.openai.com/v1" # optional: override for self-hosted or OpenAI-compatible endpoints
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
```
### Environment Variables
```bash
# Speech-to-Text providers (local needs no key)
# pip install faster-whisper # Free local STT — no API key needed
GROQ_API_KEY=... # Groq Whisper (fast, free tier)
VOICE_TOOLS_OPENAI_KEY=... # OpenAI Whisper (paid)
# STT advanced overrides (optional)
STT_GROQ_MODEL=whisper-large-v3-turbo # Override default Groq STT model
STT_OPENAI_MODEL=whisper-1 # Override default OpenAI STT model
GROQ_BASE_URL=https://api.groq.com/openai/v1 # Custom Groq endpoint
STT_OPENAI_BASE_URL=https://api.openai.com/v1 # Custom OpenAI STT endpoint
# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
ELEVENLABS_API_KEY=*** # ElevenLabs (premium quality)
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
# Discord voice channel
DISCORD_BOT_TOKEN=...
DISCORD_ALLOWED_USERS=...
```
### STT Provider Comparison
| Provider | Model | Speed | Quality | Cost | API Key |
|----------|-------|-------|---------|------|---------|
| **Local** | `base` | Fast (depends on CPU/GPU) | Good | Free | No |
| **Local** | `small` | Medium | Better | Free | No |
| **Local** | `large-v3` | Slow | Best | Free | No |
| **Groq** | `whisper-large-v3-turbo` | Very fast (~0.5s) | Good | Free tier | Yes |
| **Groq** | `whisper-large-v3` | Fast (~1s) | Better | Free tier | Yes |
| **OpenAI** | `whisper-1` | Fast (~1s) | Good | Paid | Yes |
| **OpenAI** | `gpt-4o-transcribe` | Medium (~2s) | Best | Paid | Yes |
Provider priority (automatic fallback): **local** > **groq** > **openai**
### TTS Provider Comparison
| Provider | Quality | Cost | Latency | Key Required |
|----------|---------|------|---------|-------------|
| **Edge TTS** | Good | Free | ~1s | No |
| **ElevenLabs** | Excellent | Paid | ~2s | Yes |
| **OpenAI TTS** | Good | Paid | ~1.5s | Yes |
| **NeuTTS** | Good | Free | Depends on CPU/GPU | No |
NeuTTS uses the `tts.neutts` config block above.
---
## Troubleshooting
### "No audio device found" (CLI)
PortAudio is not installed:
```bash
brew install portaudio # macOS
sudo apt install portaudio19-dev # Ubuntu
```
### Bot doesn't respond in Discord server channels
The bot requires an @mention by default in server channels. Make sure you:
1. Type `@` and select the **bot user** (with the #discriminator), not the **role** with the same name
2. Or use DMs instead — no mention needed
3. Or set `DISCORD_REQUIRE_MENTION=false` in `~/.hermes/.env`
### Bot joins VC but doesn't hear me
- Check your Discord user ID is in `DISCORD_ALLOWED_USERS`
- Make sure you're not muted in Discord
- The bot needs a SPEAKING event from Discord before it can map your audio — start speaking within a few seconds of joining
### Bot hears me but doesn't respond
- Verify STT is available: install `faster-whisper` (no key needed) or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY`
- Check the LLM model is configured and accessible
- Review gateway logs: `tail -f ~/.hermes/logs/gateway.log`
### Bot responds in text but not in voice channel
- TTS provider may be failing — check API key and quota
- Edge TTS (free, no key) is the default fallback
- Check logs for TTS errors
### Whisper returns garbage text
The hallucination filter catches most cases automatically. If you're still getting phantom transcripts:
- Use a quieter environment
- Adjust `silence_threshold` in config (higher = less sensitive)
- Try a different STT model
---
# user-guide/features/browser
# Browser Automation
Hermes Agent includes a full browser automation toolset with multiple backend options:
- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
- **Browser Use cloud mode** via [Browser Use](https://browser-use.com) as an alternative cloud browser provider
- **Firecrawl cloud mode** via [Firecrawl](https://firecrawl.dev) for cloud browsers with built-in scraping
- **Camofox local mode** via [Camofox](https://github.com/jo-inc/camofox-browser) for local anti-detection browsing (Firefox-based fingerprint spoofing)
- **Local Chrome via CDP** — connect browser tools to your own Chrome instance using `/browser connect`
- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation
In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
## Overview
Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
Key capabilities:
- **Multi-provider cloud execution** — Browserbase, Browser Use, or Firecrawl — no local browser needed
- **Local Chrome integration** — attach to your running Chrome via CDP for hands-on browsing
- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
- **Session isolation** — each task gets its own browser session
- **Automatic cleanup** — inactive sessions are closed after a timeout
- **Vision analysis** — screenshot + AI analysis for visual understanding
## Setup
:::tip Nous Subscribers
If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription, you can use browser automation through the **[Tool Gateway](tool-gateway.md)** without any separate API keys. Run `hermes model` or `hermes tools` to enable it.
:::
### Browserbase cloud mode
To use Browserbase-managed cloud browsers, add:
```bash
# Add to ~/.hermes/.env
BROWSERBASE_API_KEY=***
BROWSERBASE_PROJECT_ID=your-project-id-here
```
Get your credentials at [browserbase.com](https://browserbase.com).
### Browser Use cloud mode
To use Browser Use as your cloud browser provider, add:
```bash
# Add to ~/.hermes/.env
BROWSER_USE_API_KEY=***
```
Get your API key at [browser-use.com](https://browser-use.com). Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.
### Firecrawl cloud mode
To use Firecrawl as your cloud browser provider, add:
```bash
# Add to ~/.hermes/.env
FIRECRAWL_API_KEY=fc-***
```
Get your API key at [firecrawl.dev](https://firecrawl.dev). Then select Firecrawl as your browser provider:
```bash
hermes setup tools
# → Browser Automation → Firecrawl
```
Optional settings:
```bash
# Self-hosted Firecrawl instance (default: https://api.firecrawl.dev)
FIRECRAWL_API_URL=http://localhost:3002
# Session TTL in seconds (default: 300)
FIRECRAWL_BROWSER_TTL=600
```
### Hybrid routing: cloud for public URLs, local for LAN/localhost
When a cloud provider is configured, Hermes auto-spawns a **local Chromium sidecar**
for URLs that resolve to a private/loopback/LAN address (`localhost`, `127.0.0.1`,
`192.168.x.x`, `10.x.x.x`, `172.16-31.x.x`, `*.local`, `*.lan`, `*.internal`,
IPv6 loopback `::1`, link-local `169.254.x.x`). Public URLs continue to use the
cloud provider in the same conversation.
This solves the common "I'm developing locally but using Browserbase" workflow —
the agent can screenshot your dashboard at `http://localhost:3000` AND scrape
`https://github.com` without you switching providers or disabling the SSRF guard.
The cloud provider never sees the private URL.
The feature is **on by default**. To disable it (all URLs go to the configured
cloud provider, as before):
```yaml
# ~/.hermes/config.yaml
browser:
cloud_provider: browserbase
auto_local_for_private_urls: false
```
With auto-routing disabled, private URLs are rejected with
`"Blocked: URL targets a private or internal address"` unless you also set
`browser.allow_private_urls: true` (which lets the cloud provider attempt them —
usually won't work since Browserbase etc. can't reach your LAN).
Requirements: the local sidecar uses the same `agent-browser` CLI as pure local
mode, so you need it installed (`hermes setup tools → Browser Automation`
auto-installs it). Post-navigation redirects from a public URL onto a private
address are still blocked (you can't use a redirect-to-internal trick to reach
your LAN through the public path).
### Camofox local mode
[Camofox](https://github.com/jo-inc/camofox-browser) is a self-hosted Node.js server wrapping Camoufox (a Firefox fork with C++ fingerprint spoofing). It provides local anti-detection browsing without cloud dependencies.
```bash
# Clone the Camofox browser server first
git clone https://github.com/jo-inc/camofox-browser
cd camofox-browser
# Build and start with Docker using the default container settings
# (auto-detects arch: aarch64 on M1/M2, x86_64 on Intel)
make up
# Stop and remove the default container
make down
# Force a clean rebuild (for example, after upgrading VERSION/RELEASE)
make reset
# Just download binaries without building
make fetch
# Override arch or version explicitly
make up ARCH=x86_64
make up VERSION=135.0.1 RELEASE=beta.24
```
`make up` starts the default container immediately. If you want custom runtime settings such as a larger Node heap, VNC, or a persistent profile directory, build the image first and then run it yourself:
```bash
# Build the image without starting the default container
make build
# Start with persistence, VNC live view, and a larger Node heap
mkdir -p ~/.camofox-docker
docker run -d \
--name camofox-browser \
--restart unless-stopped \
-p 9377:9377 \
-p 6080:6080 \
-p 5901:5900 \
-e CAMOFOX_PORT=9377 \
-e ENABLE_VNC=1 \
-e VNC_BIND=0.0.0.0 \
-e VNC_RESOLUTION=1920x1080 \
-e MAX_OLD_SPACE_SIZE=2048 \
-v ~/.camofox-docker:/root/.camofox \
camofox-browser:135.0.1-aarch64
```
With VNC enabled, the browser runs in headed mode and can be watched live in your browser at `http://localhost:6080` (noVNC). You can also connect a native VNC client to `localhost:5901`.
If you already ran `make up`, stop and remove that default container before starting the custom one:
```bash
make down
# then run the custom docker run command above
```
Then set in `~/.hermes/.env`:
```bash
CAMOFOX_URL=http://localhost:9377
```
Or configure via `hermes tools` → Browser Automation → Camofox.
When `CAMOFOX_URL` is set, all browser tools automatically route through Camofox instead of Browserbase or agent-browser.
#### Persistent browser sessions
By default, each Camofox session gets a random identity — cookies and logins don't survive across agent restarts. To enable persistent browser sessions, add the following to `~/.hermes/config.yaml`:
```yaml
browser:
camofox:
managed_persistence: true
```
Then fully restart Hermes so the new config is picked up.
:::warning Nested path matters
Hermes reads `browser.camofox.managed_persistence`, **not** a top-level `managed_persistence`. A common mistake is writing:
```yaml
# ❌ Wrong — Hermes ignores this
managed_persistence: true
```
If the flag is placed at the wrong path, Hermes silently falls back to a random ephemeral `userId` and your login state will be lost on every session.
:::
##### What Hermes does
- Sends a deterministic profile-scoped `userId` to Camofox so the server can reuse the same Firefox profile across sessions.
- Skips server-side context destruction on cleanup, so cookies and logins survive between agent tasks.
- Scopes the `userId` to the active Hermes profile, so different Hermes profiles get different browser profiles (profile isolation).
##### What Hermes does not do
- It does not force persistence on the Camofox server. Hermes only sends a stable `userId`; the server must honor it by mapping that `userId` to a persistent Firefox profile directory.
- If your Camofox server build treats every request as ephemeral (e.g. always calls `browser.newContext()` without loading a stored profile), Hermes cannot make those sessions persist. Make sure you are running a Camofox build that implements userId-based profile persistence.
##### Verify it's working
1. Start Hermes and your Camofox server.
2. Open Google (or any login site) in a browser task and sign in manually.
3. End the browser task normally.
4. Start a new browser task.
5. Open the same site again — you should still be signed in.
If step 5 logs you out, the Camofox server isn't honoring the stable `userId`. Double-check your config path, confirm you fully restarted Hermes after editing `config.yaml`, and verify your Camofox server version supports persistent per-user profiles.
##### Where state lives
Hermes derives the stable `userId` from the profile-scoped directory `~/.hermes/browser_auth/camofox/` (or the equivalent under `$HERMES_HOME` for non-default profiles). The actual browser profile data lives on the Camofox server side, keyed by that `userId`. To fully reset a persistent profile, clear it on the Camofox server and remove the corresponding Hermes profile's state directory.
#### VNC live view
When Camofox runs in headed mode (with a visible browser window), it exposes a VNC port in its health check response. Hermes automatically discovers this and includes the VNC URL in navigation responses, so the agent can share a link for you to watch the browser live.
### Local Chrome via CDP (`/browser connect`)
Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.
:::note
`/browser connect` is an **interactive-CLI slash command** — it is not dispatched by the gateway. If you try to run it inside a WebUI, Telegram, Discord, or other gateway chat, the message will be sent to the agent as plain text and the command will not execute. Start Hermes from the terminal (`hermes` or `hermes chat`) and issue `/browser connect` there.
:::
In the CLI, use:
```
/browser connect # Connect to Chrome at ws://localhost:9222
/browser connect ws://host:port # Connect to a specific CDP endpoint
/browser status # Check current connection
/browser disconnect # Detach and return to cloud/local mode
```
If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with `--remote-debugging-port=9222`.
:::tip
To start Chrome manually with CDP enabled, use a dedicated user-data-dir so the debug port actually comes up even if Chrome is already running with your normal profile:
```bash
# Linux
google-chrome \
--remote-debugging-port=9222 \
--user-data-dir=$HOME/.hermes/chrome-debug \
--no-first-run \
--no-default-browser-check &
# macOS
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/.hermes/chrome-debug" \
--no-first-run \
--no-default-browser-check &
```
Then launch the Hermes CLI and run `/browser connect`.
**Why `--user-data-dir`?** Without it, launching Chrome while a regular Chrome instance is already running typically opens a new window on the existing process — and that existing process was not started with `--remote-debugging-port`, so port 9222 never opens. A dedicated user-data-dir forces a fresh Chrome process where the debug port actually listens. `--no-first-run --no-default-browser-check` skips the first-launch wizard for the fresh profile.
:::
When connected via CDP, all browser tools (`browser_navigate`, `browser_click`, etc.) operate on your live Chrome instance instead of spinning up a cloud session.
### WSL2 + Windows Chrome: prefer MCP over `/browser connect`
If Hermes runs inside WSL2 but the Chrome window you want to control runs on the Windows host, `/browser connect` is often not the best path.
Why:
- `/browser connect` expects Hermes itself to reach a usable CDP endpoint
- modern Chrome live-debugging sessions often expose a host-local endpoint that is not directly reachable from WSL the same way a classic `9222` port is
- even when Windows Chrome is debuggable, the cleanest integration is often to let a Windows-side browser MCP server attach to Chrome and let Hermes talk to that MCP server
For that setup, prefer `chrome-devtools-mcp` through Hermes MCP support.
See the MCP guide for the practical setup:
- [Use MCP with Hermes](../../guides/use-mcp-with-hermes.md#wsl2-bridge-hermes-in-wsl-to-windows-chrome)
### Local browser mode
If you do **not** set any cloud credentials and don't use `/browser connect`, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
### Optional Environment Variables
```bash
# Residential proxies for better CAPTCHA solving (default: "true")
BROWSERBASE_PROXIES=true
# Advanced stealth with custom Chromium — requires Scale Plan (default: "false")
BROWSERBASE_ADVANCED_STEALTH=false
# Session reconnection after disconnects — requires paid plan (default: "true")
BROWSERBASE_KEEP_ALIVE=true
# Custom session timeout in milliseconds (default: project default)
# Examples: 600000 (10min), 1800000 (30min)
BROWSERBASE_SESSION_TIMEOUT=600000
# Inactivity timeout before auto-cleanup in seconds (default: 120)
BROWSER_INACTIVITY_TIMEOUT=120
```
### Install agent-browser CLI
```bash
npm install -g agent-browser
# Or install locally in the repo:
npm install
```
:::info
The `browser` toolset must be included in your config's `toolsets` list or enabled via `hermes config set toolsets '["hermes-cli", "browser"]'`.
:::
## Available Tools
### `browser_navigate`
Navigate to a URL. Must be called before any other browser tool. Initializes the Browserbase session.
```
Navigate to https://github.com/NousResearch
```
:::tip
For simple information retrieval, prefer `web_search` or `web_extract` — they are faster and cheaper. Use browser tools when you need to **interact** with a page (click buttons, fill forms, handle dynamic content).
:::
### `browser_snapshot`
Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs like `@e1`, `@e2` for use with `browser_click` and `browser_type`.
- **`full=false`** (default): Compact view showing only interactive elements
- **`full=true`**: Complete page content
Snapshots over 8000 characters are automatically summarized by an LLM.
### `browser_click`
Click an element identified by its ref ID from the snapshot.
```
Click @e5 to press the "Sign In" button
```
### `browser_type`
Type text into an input field. Clears the field first, then types the new text.
```
Type "hermes agent" into the search field @e3
```
### `browser_scroll`
Scroll the page up or down to reveal more content.
```
Scroll down to see more results
```
### `browser_press`
Press a keyboard key. Useful for submitting forms or navigation.
```
Press Enter to submit the form
```
Supported keys: `Enter`, `Tab`, `Escape`, `ArrowDown`, `ArrowUp`, and more.
### `browser_back`
Navigate back to the previous page in browser history.
### `browser_get_images`
List all images on the current page with their URLs and alt text. Useful for finding images to analyze.
### `browser_vision`
Take a screenshot and analyze it with vision AI. Use this when text snapshots don't capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.
The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the `MEDIA:` mechanism.
```
What does the chart on this page show?
```
Screenshots are stored in `~/.hermes/cache/screenshots/` and automatically cleaned up after 24 hours.
### `browser_console`
Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
```
Check the browser console for any JavaScript errors
```
Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
### `browser_cdp`
Raw Chrome DevTools Protocol passthrough — the escape hatch for browser operations not covered by the other tools. Use for native dialog handling, iframe-scoped evaluation, cookie/network control, or any CDP verb the agent needs.
**Only available when a CDP endpoint is reachable at session start** — meaning `/browser connect` has attached to a running Chrome, or `browser.cdp_url` is set in `config.yaml`. The default local agent-browser mode, Camofox, and cloud providers (Browserbase, Browser Use, Firecrawl) do not currently expose CDP to this tool — cloud providers have per-session CDP URLs but live-session routing is a follow-up.
**CDP method reference:** https://chromedevtools.github.io/devtools-protocol/ — the agent can `web_extract` a specific method's page to look up parameters and return shape.
Common patterns:
```
# List tabs (browser-level, no target_id)
browser_cdp(method="Target.getTargets")
# Handle a native JS dialog on a tab
browser_cdp(method="Page.handleJavaScriptDialog",
params={"accept": true, "promptText": ""},
target_id="")
# Evaluate JS in a specific tab
browser_cdp(method="Runtime.evaluate",
params={"expression": "document.title", "returnByValue": true},
target_id="")
# Get all cookies
browser_cdp(method="Network.getAllCookies")
```
Browser-level methods (`Target.*`, `Browser.*`, `Storage.*`) omit `target_id`. Page-level methods (`Page.*`, `Runtime.*`, `DOM.*`, `Emulation.*`) require a `target_id` from `Target.getTargets`. Each stateless call is independent — sessions do not persist between calls.
**Cross-origin iframes:** pass `frame_id` (from `browser_snapshot.frame_tree.children[]` where `is_oopif=true`) to route the CDP call through the supervisor's live session for that iframe. This is how `Runtime.evaluate` inside a cross-origin iframe works on Browserbase, where stateless CDP connections would hit signed-URL expiry. Example:
```
browser_cdp(
method="Runtime.evaluate",
params={"expression": "document.title", "returnByValue": True},
frame_id="",
)
```
Same-origin iframes don't need `frame_id` — use `document.querySelector('iframe').contentDocument` from a top-level `Runtime.evaluate` instead.
### `browser_dialog`
Responds to a native JS dialog (`alert` / `confirm` / `prompt` / `beforeunload`). Before this tool existed, dialogs would silently block the page's JavaScript thread and subsequent `browser_*` calls would hang or throw; now the agent sees pending dialogs in `browser_snapshot` output and responds explicitly.
**Workflow:**
1. Call `browser_snapshot`. If a dialog is blocking the page, it shows up as `pending_dialogs: [{"id": "d-1", "type": "alert", "message": "..."}]`.
2. Call `browser_dialog(action="accept")` or `browser_dialog(action="dismiss")`. For `prompt()` dialogs, pass `prompt_text="..."` to supply the response.
3. Re-snapshot — `pending_dialogs` is empty; the page's JS thread has resumed.
**Detection happens automatically** via a persistent CDP supervisor — one WebSocket per task that subscribes to Page/Runtime/Target events. The supervisor also populates a `frame_tree` field in the snapshot so the agent can see the iframe structure of the current page, including cross-origin (OOPIF) iframes.
**Availability matrix:**
| Backend | Detection via `pending_dialogs` | Response (`browser_dialog` tool) |
|---|---|---|
| Local Chrome via `/browser connect` or `browser.cdp_url` | ✓ | ✓ full workflow |
| Browserbase | ✓ | ✓ full workflow (via injected XHR bridge) |
| Camofox / default local agent-browser | ✗ | ✗ (no CDP endpoint) |
**How it works on Browserbase.** Browserbase's CDP proxy auto-dismisses real native dialogs server-side within ~10ms, so we can't use `Page.handleJavaScriptDialog`. The supervisor injects a small script via `Page.addScriptToEvaluateOnNewDocument` that overrides `window.alert`/`confirm`/`prompt` with a synchronous XHR. We intercept those XHRs via `Fetch.enable` — the page's JS thread stays blocked on the XHR until we call `Fetch.fulfillRequest` with the agent's response. `prompt()` return values round-trip back into page JS unchanged.
**Dialog policy** is configured in `config.yaml` under `browser.dialog_policy`:
| Policy | Behavior |
|--------|----------|
| `must_respond` (default) | Capture, surface in snapshot, wait for explicit `browser_dialog()` call. Safety auto-dismiss after `browser.dialog_timeout_s` (default 300s) so a buggy agent can't stall forever. |
| `auto_dismiss` | Capture, dismiss immediately. Agent still sees the dialog in `browser_state` history but doesn't have to act. |
| `auto_accept` | Capture, accept immediately. Useful when navigating pages with aggressive `beforeunload` prompts. |
**Frame tree** inside `browser_snapshot.frame_tree` is capped to 30 frames and OOPIF depth 2 to keep payloads bounded on ad-heavy pages. A `truncated: true` flag surfaces when limits were hit; agents needing the full tree can use `browser_cdp` with `Page.getFrameTree`.
## Practical Examples
### Filling Out a Web Form
```
User: Sign up for an account on example.com with my email john@example.com
Agent workflow:
1. browser_navigate("https://example.com/signup")
2. browser_snapshot() → sees form fields with refs
3. browser_type(ref="@e3", text="john@example.com")
4. browser_type(ref="@e5", text="SecurePass123")
5. browser_click(ref="@e8") → clicks "Create Account"
6. browser_snapshot() → confirms success
```
### Researching Dynamic Content
```
User: What are the top trending repos on GitHub right now?
Agent workflow:
1. browser_navigate("https://github.com/trending")
2. browser_snapshot(full=true) → reads trending repo list
3. Returns formatted results
```
## Session Recording
Automatically record browser sessions as WebM video files:
```yaml
browser:
record_sessions: true # default: false
```
When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
## Stealth Features
Browserbase provides automatic stealth capabilities:
| Feature | Default | Notes |
|---------|---------|-------|
| Basic Stealth | Always on | Random fingerprints, viewport randomization, CAPTCHA solving |
| Residential Proxies | On | Routes through residential IPs for better access |
| Advanced Stealth | Off | Custom Chromium build, requires Scale Plan |
| Keep Alive | On | Session reconnection after network hiccups |
:::note
If paid features aren't available on your plan, Hermes automatically falls back — first disabling `keepAlive`, then proxies — so browsing still works on free plans.
:::
## Session Management
- Each task gets an isolated browser session via Browserbase
- Sessions are automatically cleaned up after inactivity (default: 2 minutes)
- A background thread checks every 30 seconds for stale sessions
- Emergency cleanup runs on process exit to prevent orphaned sessions
- Sessions are released via the Browserbase API (`REQUEST_RELEASE` status)
## Limitations
- **Text-based interaction** — relies on accessibility tree, not pixel coordinates
- **Snapshot size** — large pages may be truncated or LLM-summarized at 8000 characters
- **Session timeout** — cloud sessions expire based on your provider's plan settings
- **Cost** — cloud sessions consume provider credits; sessions are automatically cleaned up when the conversation ends or after inactivity. Use `/browser connect` for free local browsing.
- **No file downloads** — cannot download files from the browser
---
# user-guide/features/vision
# Vision & Image Paste
Hermes Agent supports **multimodal vision** — you can paste images from your clipboard directly into the CLI and ask the agent to analyze, describe, or work with them. Images are sent to the model as base64-encoded content blocks, so any vision-capable model can process them.
## How It Works
1. Copy an image to your clipboard (screenshot, browser image, etc.)
2. Attach it using one of the methods below
3. Type your question and press Enter
4. The image appears as a `[📎 Image #1]` badge above the input
5. On submit, the image is sent to the model as a vision content block
You can attach multiple images before sending — each gets its own badge. Press `Ctrl+C` to clear all attached images.
Images are saved to `~/.hermes/images/` as PNG files with timestamped filenames.
## Paste Methods
How you attach an image depends on your terminal environment. Not all methods work everywhere — here's the full breakdown:
### `/paste` Command
**The most reliable explicit image-attach fallback.**
```
/paste
```
Type `/paste` and press Enter. Hermes checks your clipboard for an image and attaches it. This is the safest option when your terminal rewrites `Cmd+V`/`Ctrl+V`, or when you copied only an image and there is no bracketed-paste text payload to inspect.
### Ctrl+V / Cmd+V
Hermes now treats paste as a layered flow:
- normal text paste first
- native clipboard / OSC52 text fallback if the terminal did not deliver text cleanly
- image attach when the clipboard or pasted payload resolves to an image or image path
This means pasted macOS screenshot temp paths and `file://...` image URIs can attach immediately instead of sitting in the composer as raw text.
:::warning
If your clipboard has **only an image** (no text), terminals still cannot send binary image bytes directly. Use `/paste` as the explicit image-attach fallback.
:::
### `/terminal-setup` for VS Code / Cursor / Windsurf
If you run the TUI inside a local VS Code-family integrated terminal on macOS, Hermes can install the recommended `workbench.action.terminal.sendSequence` bindings for better multiline and undo/redo parity:
```text
/terminal-setup
```
This is especially useful when `Cmd+Enter`, `Cmd+Z`, or `Shift+Cmd+Z` are being intercepted by the IDE. Run it on the local machine only — not inside an SSH session.
## Platform Compatibility
| Environment | `/paste` | Cmd/Ctrl+V | `/terminal-setup` | Notes |
|---|:---:|:---:|:---:|---|
| **macOS Terminal / iTerm2** | ✅ | ✅ | n/a | Best experience — native clipboard + screenshot-path recovery |
| **Apple Terminal** | ✅ | ✅ | n/a | If Cmd+←/→/⌫ gets rewritten, use Ctrl+A / Ctrl+E / Ctrl+U fallbacks |
| **Linux X11 desktop** | ✅ | ✅ | n/a | Requires `xclip` (`apt install xclip`) |
| **Linux Wayland desktop** | ✅ | ✅ | n/a | Requires `wl-paste` (`apt install wl-clipboard`) |
| **WSL2 (Windows Terminal)** | ✅ | ✅ | n/a | Uses `powershell.exe` — no extra install needed |
| **VS Code / Cursor / Windsurf (local)** | ✅ | ✅ | ✅ | Recommended for better Cmd+Enter / undo / redo parity |
| **VS Code / Cursor / Windsurf (SSH)** | ❌² | ❌² | ❌³ | Run `/terminal-setup` on the local machine instead |
| **SSH terminal (any)** | ❌² | ❌² | n/a | Remote clipboard not accessible |
² See [SSH & Remote Sessions](#ssh--remote-sessions) below
³ The command writes local IDE keybindings and should not be run from the remote host
## Platform-Specific Setup
### macOS
**No setup required.** Hermes uses `osascript` (built into macOS) to read the clipboard. For faster performance, optionally install `pngpaste`:
```bash
brew install pngpaste
```
### Linux (X11)
Install `xclip`:
```bash
# Ubuntu/Debian
sudo apt install xclip
# Fedora
sudo dnf install xclip
# Arch
sudo pacman -S xclip
```
### Linux (Wayland)
Modern Linux desktops (Ubuntu 22.04+, Fedora 34+) often use Wayland by default. Install `wl-clipboard`:
```bash
# Ubuntu/Debian
sudo apt install wl-clipboard
# Fedora
sudo dnf install wl-clipboard
# Arch
sudo pacman -S wl-clipboard
```
:::tip How to check if you're on Wayland
```bash
echo $XDG_SESSION_TYPE
# "wayland" = Wayland, "x11" = X11, "tty" = no display server
```
:::
### WSL2
**No extra setup required.** Hermes detects WSL2 automatically (via `/proc/version`) and uses `powershell.exe` to access the Windows clipboard through .NET's `System.Windows.Forms.Clipboard`. This is built into WSL2's Windows interop — `powershell.exe` is available by default.
The clipboard data is transferred as base64-encoded PNG over stdout, so no file path conversion or temp files are needed.
:::info WSLg Note
If you're running WSLg (WSL2 with GUI support), Hermes tries the PowerShell path first, then falls back to `wl-paste`. WSLg's clipboard bridge only supports BMP format for images — Hermes auto-converts BMP to PNG using Pillow (if installed) or ImageMagick's `convert` command.
:::
#### Verify WSL2 clipboard access
```bash
# 1. Check WSL detection
grep -i microsoft /proc/version
# 2. Check PowerShell is accessible
which powershell.exe
# 3. Copy an image, then check
powershell.exe -NoProfile -Command "Add-Type -AssemblyName System.Windows.Forms; [System.Windows.Forms.Clipboard]::ContainsImage()"
# Should print "True"
```
## SSH & Remote Sessions
**Clipboard image paste does not fully work over SSH.** When you SSH into a remote machine, the Hermes CLI runs on the remote host. Clipboard tools (`xclip`, `wl-paste`, `powershell.exe`, `osascript`) read the clipboard of the machine they run on — which is the remote server, not your local machine. Your local clipboard image is therefore inaccessible from the remote side.
Text can sometimes still bridge through terminal paste or OSC52, but image clipboard access and local screenshot temp paths remain tied to the machine running Hermes.
### Workarounds for SSH
1. **Upload the image file** — Save the image locally, upload it to the remote server via `scp`, VSCode's file explorer (drag-and-drop), or any file transfer method. Then reference it by path. *(A `/attach ` command is planned for a future release.)*
2. **Use a URL** — If the image is accessible online, just paste the URL in your message. The agent can use `vision_analyze` to look at any image URL directly.
3. **X11 forwarding** — Connect with `ssh -X` to forward X11. This lets `xclip` on the remote machine access your local X11 clipboard. Requires an X server running locally (XQuartz on macOS, built-in on Linux X11 desktops). Slow for large images.
4. **Use a messaging platform** — Send images to Hermes via Telegram, Discord, Slack, or WhatsApp. These platforms handle image upload natively and are not affected by clipboard/terminal limitations.
## Why Terminals Can't Paste Images
This is a common source of confusion, so here's the technical explanation:
Terminals are **text-based** interfaces. When you press Ctrl+V (or Cmd+V), the terminal emulator:
1. Reads the clipboard for **text content**
2. Wraps it in [bracketed paste](https://en.wikipedia.org/wiki/Bracketed-paste) escape sequences
3. Sends it to the application through the terminal's text stream
If the clipboard contains only an image (no text), the terminal has nothing to send. There is no standard terminal escape sequence for binary image data. The terminal simply does nothing.
This is why Hermes uses a separate clipboard check — instead of receiving image data through the terminal paste event, it calls OS-level tools (`osascript`, `powershell.exe`, `xclip`, `wl-paste`) directly via subprocess to read the clipboard independently.
## Supported Models
Image paste works with any vision-capable model. The image is sent as a base64-encoded data URL in the OpenAI vision content format:
```json
{
"type": "image_url",
"image_url": {
"url": "data:image/png;base64,..."
}
}
```
Most modern models support this format, including GPT-4 Vision, Claude (with vision), Gemini, and open-source multimodal models served through OpenRouter.
## Image Routing (Vision-Capable vs Text-Only Models)
When a user attaches an image — from the CLI clipboard, the gateway (Telegram/Discord photo), or any other entry point — Hermes routes it based on whether your current model actually supports vision:
| Your model | What happens to the image |
|---|---|
| **Vision-capable** (GPT-4V, Claude with vision, Gemini, Qwen-VL, MiMo-VL, etc.) | Sent as **real pixels** using the provider's native image content format above. No text summary layer. |
| **Text-only** (DeepSeek V3, smaller open-source models, older chat-only endpoints) | Routed through the `vision_analyze` auxiliary tool — an auxiliary vision model describes the image, and the text description is injected into the conversation. |
You don't configure this — Hermes looks up your current model's capability in the provider metadata and picks the right path automatically. The practical effect: you can switch between vision and non-vision models mid-session and image handling "just works" without changing your workflow. Text-only models get coherent context about the image rather than a broken multimodal payload they'd have to reject.
Which auxiliary model handles the text-description path is configurable under `auxiliary.vision` — see [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models).
---
# user-guide/features/image-generation
# Image Generation
Hermes Agent generates images from text prompts via FAL.ai. Nine models are supported out of the box, each with different speed, quality, and cost tradeoffs. The active model is user-configurable via `hermes tools` and persists in `config.yaml`.
## Supported Models
| Model | Speed | Strengths | Price |
|---|---|---|---|
| `fal-ai/flux-2/klein/9b` *(default)* | `<1s` | Fast, crisp text | $0.006/MP |
| `fal-ai/flux-2-pro` | ~6s | Studio photorealism | $0.03/MP |
| `fal-ai/z-image/turbo` | ~2s | Bilingual EN/CN, 6B params | $0.005/MP |
| `fal-ai/nano-banana-pro` | ~8s | Gemini 3 Pro, reasoning depth, text rendering | $0.15/image (1K) |
| `fal-ai/gpt-image-1.5` | ~15s | Prompt adherence | $0.034/image |
| `fal-ai/gpt-image-2` | ~20s | SOTA text rendering + CJK, world-aware photorealism | $0.04–0.06/image |
| `fal-ai/ideogram/v3` | ~5s | Best typography | $0.03–0.09/image |
| `fal-ai/recraft/v4/pro/text-to-image` | ~8s | Design, brand systems, production-ready | $0.25/image |
| `fal-ai/qwen-image` | ~12s | LLM-based, complex text | $0.02/MP |
Prices are FAL's pricing at time of writing; check [fal.ai](https://fal.ai/) for current numbers.
## Setup
:::tip Nous Subscribers
If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription, you can use image generation through the **[Tool Gateway](tool-gateway.md)** without a FAL API key. Your model selection persists across both paths.
If the managed gateway returns `HTTP 4xx` for a specific model, that model isn't yet proxied on the portal side — the agent will tell you so, with remediation steps (set `FAL_KEY` for direct access, or pick a different model).
:::
### Get a FAL API Key
1. Sign up at [fal.ai](https://fal.ai/)
2. Generate an API key from your dashboard
### Configure and Pick a Model
Run the tools command:
```bash
hermes tools
```
Navigate to **🎨 Image Generation**, pick your backend (Nous Subscription or FAL.ai), then the picker shows all supported models in a column-aligned table — arrow keys to navigate, Enter to select:
```
Model Speed Strengths Price
fal-ai/flux-2/klein/9b <1s Fast, crisp text $0.006/MP ← currently in use
fal-ai/flux-2-pro ~6s Studio photorealism $0.03/MP
fal-ai/z-image/turbo ~2s Bilingual EN/CN, 6B $0.005/MP
...
```
Your selection is saved to `config.yaml`:
```yaml
image_gen:
model: fal-ai/flux-2/klein/9b
use_gateway: false # true if using Nous Subscription
```
### GPT-Image Quality
The `fal-ai/gpt-image-1.5` and `fal-ai/gpt-image-2` request quality is pinned to `medium` (~$0.034–$0.06/image at 1024×1024). We don't expose the `low` / `high` tiers as a user-facing option so that Nous Portal billing stays predictable across all users — the cost spread between tiers is 3–22×. If you want a cheaper option, pick Klein 9B or Z-Image Turbo; if you want higher quality, use Nano Banana Pro or Recraft V4 Pro.
## Usage
The agent-facing schema is intentionally minimal — the model picks up whatever you've configured:
```
Generate an image of a serene mountain landscape with cherry blossoms
```
```
Create a square portrait of a wise old owl — use the typography model
```
```
Make me a futuristic cityscape, landscape orientation
```
## Aspect Ratios
Every model accepts the same three aspect ratios from the agent's perspective. Internally, each model's native size spec is filled in automatically:
| Agent input | image_size (flux/z-image/qwen/recraft/ideogram) | aspect_ratio (nano-banana-pro) | image_size (gpt-image-1.5) | image_size (gpt-image-2) |
|---|---|---|---|---|
| `landscape` | `landscape_16_9` | `16:9` | `1536x1024` | `landscape_4_3` (1024×768) |
| `square` | `square_hd` | `1:1` | `1024x1024` | `square_hd` (1024×1024) |
| `portrait` | `portrait_16_9` | `9:16` | `1024x1536` | `portrait_4_3` (768×1024) |
GPT Image 2 maps to 4:3 presets rather than 16:9 because its minimum pixel count is 655,360 — the `landscape_16_9` preset (1024×576 = 589,824) would be rejected.
This translation happens in `_build_fal_payload()` — agent code never has to know about per-model schema differences.
## Automatic Upscaling
Upscaling via FAL's **Clarity Upscaler** is gated per-model:
| Model | Upscale? | Why |
|---|---|---|
| `fal-ai/flux-2-pro` | ✓ | Backward-compat (was the pre-picker default) |
| All others | ✗ | Fast models would lose their sub-second value prop; hi-res models don't need it |
When upscaling runs, it uses these settings:
| Setting | Value |
|---|---|
| Upscale factor | 2× |
| Creativity | 0.35 |
| Resemblance | 0.6 |
| Guidance scale | 4 |
| Inference steps | 18 |
If upscaling fails (network issue, rate limit), the original image is returned automatically.
## How It Works Internally
1. **Model resolution** — `_resolve_fal_model()` reads `image_gen.model` from `config.yaml`, falls back to the `FAL_IMAGE_MODEL` env var, then to `fal-ai/flux-2/klein/9b`.
2. **Payload building** — `_build_fal_payload()` translates your `aspect_ratio` into the model's native format (preset enum, aspect-ratio enum, or GPT literal), merges the model's default params, applies any caller overrides, then filters to the model's `supports` whitelist so unsupported keys are never sent.
3. **Submission** — `_submit_fal_request()` routes via direct FAL credentials or the managed Nous gateway.
4. **Upscaling** — runs only if the model's metadata has `upscale: True`.
5. **Delivery** — final image URL returned to the agent, which emits a `MEDIA:` tag that platform adapters convert to native media.
## Debugging
Enable debug logging:
```bash
export IMAGE_TOOLS_DEBUG=true
```
Debug logs go to `./logs/image_tools_debug_.json` with per-call details (model, parameters, timing, errors).
## Platform Delivery
| Platform | Delivery |
|---|---|
| **CLI** | Image URL printed as markdown `` — click to open |
| **Telegram** | Photo message with the prompt as caption |
| **Discord** | Embedded in a message |
| **Slack** | URL unfurled by Slack |
| **WhatsApp** | Media message |
| **Others** | URL in plain text |
## Limitations
- **Requires FAL credentials** (direct `FAL_KEY` or Nous Subscription)
- **Text-to-image only** — no inpainting, img2img, or editing via this tool
- **Temporary URLs** — FAL returns hosted URLs that expire after hours/days; save locally if needed
- **Per-model constraints** — some models don't support `seed`, `num_inference_steps`, etc. The `supports` filter silently drops unsupported params; this is expected behavior
---
# Voice & TTS
# Voice & TTS
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
:::tip Nous Subscribers
If you have a paid [Nous Portal](https://portal.nousresearch.com) subscription, OpenAI TTS is available through the **[Tool Gateway](tool-gateway.md)** without a separate OpenAI API key. Run `hermes model` or `hermes tools` to enable it.
:::
## Text-to-Speech
Convert text to speech with ten providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
| **MiniMax TTS** | Excellent | Paid | `MINIMAX_API_KEY` |
| **Mistral (Voxtral TTS)** | Excellent | Paid | `MISTRAL_API_KEY` |
| **Google Gemini TTS** | Excellent | Free tier | `GEMINI_API_KEY` |
| **xAI TTS** | Excellent | Paid | `XAI_API_KEY` |
| **NeuTTS** | Good | Free (local) | None needed |
| **KittenTTS** | Good | Free (local) | None needed |
| **Piper** | Good | Free (local) | None needed |
### Platform Delivery
| Platform | Delivery | Format |
|----------|----------|--------|
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
| Discord | Voice bubble (Opus/OGG), falls back to file attachment | Opus/MP3 |
| WhatsApp | Audio file attachment | MP3 |
| CLI | Saved to `~/.hermes/audio_cache/` | MP3 |
### Configuration
```yaml
# In ~/.hermes/config.yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai" | "minimax" | "mistral" | "gemini" | "xai" | "neutts" | "kittentts" | "piper"
speed: 1.0 # Global speed multiplier (provider-specific settings override this)
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
speed: 1.0 # Converted to rate percentage (+/-%)
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
speed: 1.0 # 0.25 - 4.0
minimax:
model: "speech-2.8-hd" # speech-2.8-hd (default), speech-2.8-turbo
voice_id: "English_Graceful_Lady" # See https://platform.minimax.io/faq/system-voice-id
speed: 1 # 0.5 - 2.0
vol: 1 # 0 - 10
pitch: 0 # -12 - 12
mistral:
model: "voxtral-mini-tts-2603"
voice_id: "c69964a6-ab8b-4f8a-9465-ec0925096ec8" # Paul - Neutral (default)
gemini:
model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts
voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc.
xai:
voice_id: "eve" # or a custom voice ID — see docs below
language: "en" # ISO 639-1 code
sample_rate: 24000 # 22050 / 24000 (default) / 44100 / 48000
bit_rate: 128000 # MP3 bitrate; only applies when codec=mp3
# base_url: "https://api.x.ai/v1" # Override via XAI_BASE_URL env var
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
kittentts:
model: KittenML/kitten-tts-nano-0.8-int8 # 25MB int8; also: kitten-tts-micro-0.8 (41MB), kitten-tts-mini-0.8 (80MB)
voice: Jasper # Jasper, Bella, Luna, Bruno, Rosie, Hugo, Kiki, Leo
speed: 1.0 # 0.5 - 2.0
clean_text: true # Expand numbers, currencies, units
piper:
voice: en_US-lessac-medium # voice name (auto-downloaded) OR absolute path to .onnx
# voices_dir: '' # default: ~/.hermes/cache/piper-voices/
# use_cuda: false # requires onnxruntime-gpu
# length_scale: 1.0 # 2.0 = twice as slow
# noise_scale: 0.667
# noise_w_scale: 0.8
# volume: 1.0 # 0.5 = half as loud
# normalize_audio: true
```
**Speed control**: The global `tts.speed` value applies to all providers by default. Each provider can override it with its own `speed` setting (e.g., `tts.openai.speed: 1.5`). Provider-specific speed takes precedence over the global value. Default is `1.0` (normal speed).
### Input length limits
Each provider has a documented per-request input-character cap. Hermes truncates text before calling the provider so requests never fail with a length error:
| Provider | Default cap (chars) |
|----------|---------------------|
| Edge TTS | 5000 |
| OpenAI | 4096 |
| xAI | 15000 |
| MiniMax | 10000 |
| Mistral | 4000 |
| Google Gemini | 5000 |
| ElevenLabs | Model-aware (see below) |
| NeuTTS | 2000 |
| KittenTTS | 2000 |
**ElevenLabs** picks a cap from the configured `model_id`:
| `model_id` | Cap (chars) |
|------------|-------------|
| `eleven_flash_v2_5` | 40000 |
| `eleven_flash_v2` | 30000 |
| `eleven_multilingual_v2` (default), `eleven_multilingual_v1`, `eleven_english_sts_v2`, `eleven_english_sts_v1` | 10000 |
| `eleven_v3`, `eleven_ttv_v3` | 5000 |
| Unknown model | Falls back to provider default (10000) |
**Override per provider** with `max_text_length:` under the provider section of your TTS config:
```yaml
tts:
openai:
max_text_length: 8192 # raise or lower the provider cap
```
Only positive integers are honored. Zero, negative, non-numeric, or boolean values fall through to the provider default, so a broken config can't accidentally disable truncation.
### Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:
- **OpenAI, ElevenLabs, and Mistral** produce Opus natively — no extra setup
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
- **MiniMax TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **Google Gemini TTS** outputs raw PCM and uses **ffmpeg** to encode Opus directly for Telegram voice bubbles
- **xAI TTS** outputs MP3 and needs **ffmpeg** to convert for Telegram voice bubbles
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
- **KittenTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
- **Piper** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Fedora
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, KittenTTS, and Piper audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider.
:::
### xAI Custom Voices (voice cloning)
xAI supports cloning your voice and using it with TTS. Create a custom voice in the [xAI Console](https://console.x.ai/team/default/voice/voice-library), then set the resulting `voice_id` in your config:
```yaml
tts:
provider: xai
xai:
voice_id: "nlbqfwie" # your custom voice ID
```
See the [xAI Custom Voices docs](https://docs.x.ai/developers/model-capabilities/audio/custom-voices) for details on recording, supported formats, and limits.
### Piper (local, 44 languages)
Piper is a fast, local neural TTS engine from the Open Home Foundation (the Home Assistant maintainers). It runs entirely on CPU, supports **44 languages** with pre-trained voices, and needs no API key.
**Install via `hermes tools`** → Voice & TTS → Piper — Hermes runs `pip install piper-tts` for you. Or install manually: `pip install piper-tts`.
**Switch to Piper:**
```yaml
tts:
provider: piper
piper:
voice: en_US-lessac-medium
```
On the first TTS call for a voice that isn't cached locally, Hermes runs `python -m piper.download_voices ` and downloads the model (~20-90MB depending on quality tier) into `~/.hermes/cache/piper-voices/`. Subsequent calls reuse the cached model.
**Picking a voice.** The [full voice catalog](https://github.com/OHF-Voice/piper1-gpl/blob/main/docs/VOICES.md) covers English, Spanish, French, German, Italian, Dutch, Portuguese, Russian, Polish, Turkish, Chinese, Arabic, Hindi, and more — each with `x_low` / `low` / `medium` / `high` quality tiers. Sample voices at [rhasspy.github.io/piper-samples](https://rhasspy.github.io/piper-samples/).
**Using a pre-downloaded voice.** Set `tts.piper.voice` to an absolute path ending in `.onnx`:
```yaml
tts:
piper:
voice: /path/to/my-custom-voice.onnx
```
**Advanced knobs** (`tts.piper.length_scale` / `noise_scale` / `noise_w_scale` / `volume` / `normalize_audio`, `use_cuda`) correspond 1:1 to Piper's `SynthesisConfig`. They're ignored on older `piper-tts` versions.
### Custom command providers
If a TTS engine you want isn't natively supported (VoxCPM, MLX-Kokoro, XTTS CLI, a voice-cloning script, anything else that exposes a CLI), you can wire it in as a **command-type provider** without writing any Python. Hermes writes the input text to a temp UTF-8 file, runs your shell command, and reads the audio file the command produced.
Declare one or more providers under `tts.providers.` and switch between them with `tts.provider: ` — the same way you switch between built-ins like `edge` and `openai`.
```yaml
tts:
provider: voxcpm # pick any name under tts.providers
providers:
voxcpm:
type: command
command: "voxcpm --ref ~/voice.wav --text-file {input_path} --out {output_path}"
output_format: mp3
timeout: 180
voice_compatible: true # try to deliver as a Telegram voice bubble
mlx-kokoro:
type: command
command: "python -m mlx_kokoro --in {input_path} --out {output_path} --voice {voice}"
voice: af_sky
output_format: wav
piper-custom: # native Piper also supports custom .onnx via tts.piper.voice
type: command
command: "piper -m /path/to/custom.onnx -f {output_path} < {input_path}"
output_format: wav
```
#### Example: Doubao (Chinese seed-tts-2.0)
For high-quality Chinese TTS via ByteDance's [seed-tts-2.0](https://www.volcengine.com/docs/6561/1257544) bidirectional-streaming API, install the [`doubao-speech`](https://pypi.org/project/doubao-speech/) PyPI package and wire it in as a command provider:
```bash
pip install doubao-speech
export VOLCENGINE_APP_ID="your-app-id"
export VOLCENGINE_ACCESS_TOKEN="your-access-token"
```
```yaml
tts:
provider: doubao
providers:
doubao:
type: command
command: "doubao-speech say --text-file {input_path} --out {output_path}"
output_format: mp3
max_text_length: 1024
timeout: 30
```
Credentials come from your shell environment (`VOLCENGINE_APP_ID` / `VOLCENGINE_ACCESS_TOKEN`) or `~/.doubao-speech/config.yaml`. Pick a voice by adding `--voice zh-female-warm` (or any other alias from `doubao-speech list-voices`) to the command. `doubao-speech` also bundles streaming ASR — see the [STT section below](#example-doubao--volcengine-asr) for Hermes integration. Source and full docs: [github.com/Hypnus-Yuan/doubao-speech](https://github.com/Hypnus-Yuan/doubao-speech).
#### Placeholders
Your command template can reference these placeholders. Hermes substitutes them at render time and shell-quotes each value for the surrounding context (bare / single-quoted / double-quoted), so paths with spaces and other shell-sensitive characters are safe.
| Placeholder | Meaning |
|------------------|------------------------------------------------------|
| `{input_path}` | Path to the temp UTF-8 text file Hermes wrote |
| `{text_path}` | Alias for `{input_path}` |
| `{output_path}` | Path the command must write audio to |
| `{format}` | `mp3` / `wav` / `ogg` / `flac` |
| `{voice}` | `tts.providers..voice`, empty when unset |
| `{model}` | `tts.providers..model` |
| `{speed}` | Resolved speed multiplier (provider or global) |
Use `{{` and `}}` for literal braces.
#### Optional keys
| Key | Default | Meaning |
|--------------------|---------|------------------------------------------------------------------------------------------------------------|
| `timeout` | `120` | Seconds; the process tree is killed on expiry (Unix `killpg`, Windows `taskkill /T`). |
| `output_format` | `mp3` | One of `mp3` / `wav` / `ogg` / `flac`. Auto-inferred from the output extension if Hermes picks a path. |
| `voice_compatible` | `false` | When `true`, Hermes converts MP3/WAV output to Opus/OGG via ffmpeg so Telegram renders a voice bubble. |
| `max_text_length` | `5000` | Input is truncated to this length before rendering the command. |
| `voice` / `model` | empty | Passed to the command as placeholder values only. |
#### Behavior notes
- **Built-in names always win.** A `tts.providers.openai` entry never shadows the native OpenAI provider, so no user config can silently replace a built-in.
- **Default delivery is a document.** Command providers deliver as regular audio attachments on every platform. Opt in to voice-bubble delivery per-provider with `voice_compatible: true`.
- **Command failures surface to the agent.** Non-zero exit, empty output, or timeout all return an error with the command's stderr/stdout included so you can debug the provider from the conversation.
- **`type: command` is the default when `command:` is set.** Writing `type: command` explicitly is good practice but not required; an entry with a non-empty `command` string is treated as a command provider.
- **`{input_path}` / `{text_path}` are interchangeable.** Use whichever reads better in your command.
#### Security
Command-type providers run whatever shell command you configure, with your user's permissions. Hermes quotes placeholder values and enforces the configured timeout, but the command template itself is trusted local input — treat it the same way you would a shell script on your PATH.
## Voice Message Transcription (STT)
Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Local Whisper** (default) | Good | Free | None needed |
| **Groq Whisper API** | Good–Best | Free tier | `GROQ_API_KEY` |
| **OpenAI Whisper API** | Good–Best | Paid | `VOICE_TOOLS_OPENAI_KEY` or `OPENAI_API_KEY` |
:::info Zero Config
Local transcription works out of the box when `faster-whisper` is installed. If that's unavailable, Hermes can also use a local `whisper` CLI from common install locations (like `/opt/homebrew/bin`) or a custom command via `HERMES_LOCAL_STT_COMMAND`.
:::
### Configuration
```yaml
# In ~/.hermes/config.yaml
stt:
provider: "local" # "local" | "groq" | "openai" | "mistral" | "xai"
local:
model: "base" # tiny, base, small, medium, large-v3
openai:
model: "whisper-1" # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
mistral:
model: "voxtral-mini-latest" # voxtral-mini-latest, voxtral-mini-2602
xai:
model: "grok-stt" # xAI Grok STT
```
### Provider Details
**Local (faster-whisper)** — Runs Whisper locally via [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Uses CPU by default, GPU if available. Model sizes:
| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| `tiny` | ~75 MB | Fastest | Basic |
| `base` | ~150 MB | Fast | Good (default) |
| `small` | ~500 MB | Medium | Better |
| `medium` | ~1.5 GB | Slower | Great |
| `large-v3` | ~3 GB | Slowest | Best |
**Groq API** — Requires `GROQ_API_KEY`. Good cloud fallback when you want a free hosted STT option.
**OpenAI API** — Accepts `VOICE_TOOLS_OPENAI_KEY` first and falls back to `OPENAI_API_KEY`. Supports `whisper-1`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe`.
**Mistral API (Voxtral Transcribe)** — Requires `MISTRAL_API_KEY`. Uses Mistral's [Voxtral Transcribe](https://docs.mistral.ai/capabilities/audio/speech_to_text/) models. Supports 13 languages, speaker diarization, and word-level timestamps. Install with `pip install hermes-agent[mistral]`.
**xAI Grok STT** — Requires `XAI_API_KEY`. Posts to `https://api.x.ai/v1/stt` as multipart/form-data. Good choice if you're already using xAI for chat or TTS and want one API key for everything. Auto-detection order puts it after Groq — explicitly set `stt.provider: xai` to force it.
**Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders. Your command must write a `.txt` transcript somewhere under `{output_dir}`.
#### Example: Doubao / Volcengine ASR
If you use [`doubao-speech`](https://pypi.org/project/doubao-speech/) for Doubao TTS (see [above](#example-doubao-chinese-seed-tts-20)), the same package handles speech-to-text via the local-command STT surface:
```bash
pip install doubao-speech
export VOLCENGINE_APP_ID="your-app-id"
export VOLCENGINE_ACCESS_TOKEN="your-access-token"
export HERMES_LOCAL_STT_COMMAND='doubao-speech transcribe {input_path} --out {output_dir}/transcript.txt'
```
```yaml
stt:
provider: local_command
```
Hermes writes the incoming voice message to `{input_path}`, runs the command, and reads the `.txt` file produced under `{output_dir}`. Language is auto-detected by the Volcengine bigmodel endpoint.
### Fallback Behavior
If your configured provider isn't available, Hermes automatically falls back:
- **Local faster-whisper unavailable** → Tries a local `whisper` CLI or `HERMES_LOCAL_STT_COMMAND` before cloud providers
- **Groq key not set** → Falls back to local transcription, then OpenAI
- **OpenAI key not set** → Falls back to local transcription, then Groq
- **Mistral key/SDK not set** → Skipped in auto-detect; falls through to next available provider
- **Nothing available** → Voice messages pass through with an accurate note to the user
---
# Messaging Gateway
# Messaging Gateway
Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Home Assistant, Mattermost, Matrix, DingTalk, Feishu/Lark, WeCom, Weixin, BlueBubbles (iMessage), QQ, Yuanbao, Microsoft Teams, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
## Platform Comparison
| Platform | Voice | Images | Files | Threads | Reactions | Typing | Streaming |
|----------|:-----:|:------:|:-----:|:-------:|:---------:|:------:|:---------:|
| Telegram | ✅ | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| Discord | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Slack | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| WhatsApp | — | ✅ | ✅ | — | — | ✅ | ✅ |
| Signal | — | ✅ | ✅ | — | — | ✅ | ✅ |
| SMS | — | — | — | — | — | — | — |
| Email | — | ✅ | ✅ | ✅ | — | — | — |
| Home Assistant | — | — | — | — | — | — | — |
| Mattermost | ✅ | ✅ | ✅ | ✅ | — | ✅ | ✅ |
| Matrix | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DingTalk | — | ✅ | ✅ | — | ✅ | — | ✅ |
| Feishu/Lark | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| WeCom | ✅ | ✅ | ✅ | — | — | ✅ | ✅ |
| WeCom Callback | — | — | — | — | — | — | — |
| Weixin | ✅ | ✅ | ✅ | — | — | ✅ | ✅ |
| BlueBubbles | — | ✅ | ✅ | — | ✅ | ✅ | — |
| QQ | ✅ | ✅ | ✅ | — | — | ✅ | — |
| Yuanbao | ✅ | ✅ | ✅ | — | — | ✅ | ✅ |
| Microsoft Teams | — | ✅ | — | ✅ | — | ✅ | — |
**Voice** = TTS audio replies and/or voice message transcription. **Images** = send/receive images. **Files** = send/receive file attachments. **Threads** = threaded conversations. **Reactions** = emoji reactions on messages. **Typing** = typing indicator while processing. **Streaming** = progressive message updates via editing.
## Architecture
```mermaid
flowchart TB
subgraph Gateway["Hermes Gateway"]
subgraph Adapters["Platform adapters"]
tg[Telegram]
dc[Discord]
wa[WhatsApp]
sl[Slack]
sig[Signal]
sms[SMS]
em[Email]
ha[Home Assistant]
mm[Mattermost]
mx[Matrix]
dt[DingTalk]
fs[Feishu/Lark]
wc[WeCom]
wcb[WeCom Callback]
wx[Weixin]
bb[BlueBubbles]
qq[QQ]
yb[Yuanbao]
ms[Microsoft Teams]
api["API Server (OpenAI-compatible)"]
wh[Webhooks]
end
store["Session store per chat"]
agent["AIAgent run_agent.py"]
cron["Cron scheduler ticks every 60s"]
end
tg --> store
dc --> store
wa --> store
sl --> store
sig --> store
sms --> store
em --> store
ha --> store
mm --> store
mx --> store
dt --> store
fs --> store
wc --> store
wcb --> store
wx --> store
bb --> store
qq --> store
yb --> store
ms --> store
api --> store
wh --> store
store --> agent
cron --> store
```
Each platform adapter receives messages, routes them through a per-chat session store, and dispatches them to the AIAgent for processing. The gateway also runs the cron scheduler, ticking every 60 seconds to execute any due jobs.
## Quick Setup
The easiest way to configure messaging platforms is the interactive wizard:
```bash
hermes gateway setup # Interactive setup for all messaging platforms
```
This walks you through configuring each platform with arrow-key selection, shows which platforms are already configured, and offers to start/restart the gateway when done.
## Gateway Commands
```bash
hermes gateway # Run in foreground
hermes gateway setup # Configure messaging platforms interactively
hermes gateway install # Install as a user service (Linux) / launchd service (macOS)
sudo hermes gateway install --system # Linux only: install a boot-time system service
hermes gateway start # Start the default service
hermes gateway stop # Stop the default service
hermes gateway status # Check default service status
hermes gateway status --system # Linux only: inspect the system service explicitly
```
## Chat Commands (Inside Messaging)
| Command | Description |
|---------|-------------|
| `/new` or `/reset` | Start a fresh conversation |
| `/model [provider:model]` | Show or change the model (supports `provider:model` syntax) |
| `/personality [name]` | Set a personality |
| `/retry` | Retry the last message |
| `/undo` | Remove the last exchange |
| `/status` | Show session info |
| `/stop` | Stop the running agent |
| `/approve` | Approve a pending dangerous command |
| `/deny` | Reject a pending dangerous command |
| `/sethome` | Set this chat as the home channel |
| `/compress` | Manually compress conversation context |
| `/title [name]` | Set or show the session title |
| `/resume [name]` | Resume a previously named session |
| `/usage` | Show token usage for this session |
| `/insights [days]` | Show usage insights and analytics |
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
| `/rollback [number]` | List or restore filesystem checkpoints |
| `/background ` | Run a prompt in a separate background session |
| `/reload-mcp` | Reload MCP servers from config |
| `/update` | Update Hermes Agent to the latest version |
| `/help` | Show available commands |
| `/` | Invoke any installed skill |
## Session Management
### Session Persistence
Sessions persist across messages until they reset. The agent remembers your conversation context.
### Reset Policies
Sessions reset based on configurable policies:
| Policy | Default | Description |
|--------|---------|-------------|
| Daily | 4:00 AM | Reset at a specific hour each day |
| Idle | 1440 min | Reset after N minutes of inactivity |
| Both | (combined) | Whichever triggers first |
Configure per-platform overrides in `~/.hermes/gateway.json`:
```json
{
"reset_by_platform": {
"telegram": { "mode": "idle", "idle_minutes": 240 },
"discord": { "mode": "idle", "idle_minutes": 60 }
}
}
```
## Security
**By default, the gateway denies all users who are not in an allowlist or paired via DM.** This is the safe default for a bot with terminal access.
```bash
# Restrict to specific users (recommended):
TELEGRAM_ALLOWED_USERS=123456789,987654321
DISCORD_ALLOWED_USERS=123456789012345678
SIGNAL_ALLOWED_USERS=+155****4567,+155****6543
SMS_ALLOWED_USERS=+155****4567,+155****6543
EMAIL_ALLOWED_USERS=trusted@example.com,colleague@work.com
MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c
MATRIX_ALLOWED_USERS=@alice:matrix.org
DINGTALK_ALLOWED_USERS=user-id-1
FEISHU_ALLOWED_USERS=ou_xxxxxxxx,ou_yyyyyyyy
WECOM_ALLOWED_USERS=user-id-1,user-id-2
WECOM_CALLBACK_ALLOWED_USERS=user-id-1,user-id-2
TEAMS_ALLOWED_USERS=aad-object-id-1,aad-object-id-2
# Or allow
GATEWAY_ALLOWED_USERS=123456789,987654321
# Or explicitly allow all users (NOT recommended for bots with terminal access):
GATEWAY_ALLOW_ALL_USERS=true
```
### DM Pairing (Alternative to Allowlists)
Instead of manually configuring user IDs, unknown users receive a one-time pairing code when they DM the bot:
```bash
# The user sees: "Pairing code: XKGH5N7P"
# You approve them with:
hermes pairing approve telegram XKGH5N7P
# Other pairing commands:
hermes pairing list # View pending + approved users
hermes pairing revoke telegram 123456789 # Remove access
```
Pairing codes expire after 1 hour, are rate-limited, and use cryptographic randomness.
## Interrupting the Agent
Send any message while the agent is working to interrupt it. Key behaviors:
- **In-progress terminal commands are killed immediately** (SIGTERM, then SIGKILL after 1s)
- **Tool calls are cancelled** — only the currently-executing one runs, the rest are skipped
- **Multiple messages are combined** — messages sent during interruption are joined into one prompt
- **`/stop` command** — interrupts without queuing a follow-up message
### Queue vs interrupt vs steer (busy-input mode)
By default, messaging a busy agent interrupts it. Two other modes are available:
- `queue` — follow-up messages wait and run as the next turn after the current task finishes.
- `steer` — follow-up messages are injected into the current run via `/steer`, arriving at the agent after the next tool call. No interrupt, no new turn. Falls back to `queue` behavior if the agent hasn't started yet.
```yaml
display:
busy_input_mode: steer # or queue, or interrupt (default)
busy_ack_enabled: true # set to false to suppress the ⚡/⏳/⏩ chat reply entirely
```
The first time you message a busy agent on any platform, Hermes appends a one-line reminder to the busy-ack explaining the knob (`"💡 First-time tip — …"`). The reminder fires once per install — a flag under `onboarding.seen.busy_input_prompt` latches it. Delete that key to see the tip again.
If you find the busy-ack noisy — especially with voice input or rapid-fire messages — set `display.busy_ack_enabled: false`. Your input is still queued/steered/interrupts as normal, only the chat reply is silenced.
## Tool Progress Notifications
Control how much tool activity is displayed in `~/.hermes/config.yaml`:
```yaml
display:
tool_progress: all # off | new | all | verbose
tool_progress_command: false # set to true to enable /verbose in messaging
```
When enabled, the bot sends status messages as it works:
```text
💻 `ls -la`...
🔍 web_search...
📄 web_extract...
🐍 execute_code...
```
## Background Sessions
Run a prompt in a separate background session so the agent works on it independently while your main chat stays responsive:
```
/background Check all servers in the cluster and report any that are down
```
Hermes confirms immediately:
```
🔄 Background task started: "Check all servers in the cluster..."
Task ID: bg_143022_a1b2c3
```
### How It Works
Each `/background` prompt spawns a **separate agent instance** that runs asynchronously:
- **Isolated session** — the background agent has its own session with its own conversation history. It has no knowledge of your current chat context and receives only the prompt you provide.
- **Same configuration** — inherits your model, provider, toolsets, reasoning settings, and provider routing from the current gateway setup.
- **Non-blocking** — your main chat stays fully interactive. Send messages, run other commands, or start more background tasks while it works.
- **Result delivery** — when the task finishes, the result is sent back to the **same chat or channel** where you issued the command, prefixed with "✅ Background task complete". If it fails, you'll see "❌ Background task failed" with the error.
### Background Process Notifications
When the agent running a background session uses `terminal(background=true)` to start long-running processes (servers, builds, etc.), the gateway can push status updates to your chat. Control this with `display.background_process_notifications` in `~/.hermes/config.yaml`:
```yaml
display:
background_process_notifications: all # all | result | error | off
```
| Mode | What you receive |
|------|-----------------|
| `all` | Running-output updates **and** the final completion message (default) |
| `result` | Only the final completion message (regardless of exit code) |
| `error` | Only the final message when the exit code is non-zero |
| `off` | No process watcher messages at all |
You can also set this via environment variable:
```bash
HERMES_BACKGROUND_NOTIFICATIONS=result
```
### Use Cases
- **Server monitoring** — "/background Check the health of all services and alert me if anything is down"
- **Long builds** — "/background Build and deploy the staging environment" while you continue chatting
- **Research tasks** — "/background Research competitor pricing and summarize in a table"
- **File operations** — "/background Organize the photos in ~/Downloads by date into folders"
:::tip
Background tasks on messaging platforms are fire-and-forget — you don't need to wait or check on them. Results arrive in the same chat automatically when the task finishes.
:::
## Service Management
### Linux (systemd)
```bash
hermes gateway install # Install as user service
hermes gateway start # Start the service
hermes gateway stop # Stop the service
hermes gateway status # Check status
journalctl --user -u hermes-gateway -f # View logs
# Enable lingering (keeps running after logout)
sudo loginctl enable-linger $USER
# Or install a boot-time system service that still runs as your user
sudo hermes gateway install --system
sudo hermes gateway start --system
sudo hermes gateway status --system
journalctl -u hermes-gateway -f
```
Use the user service on laptops and dev boxes. Use the system service on VPS or headless hosts that should come back at boot without relying on systemd linger.
Avoid keeping both the user and system gateway units installed at once unless you really mean to. Hermes will warn if it detects both because start/stop/status behavior gets ambiguous.
:::info Multiple installations
If you run multiple Hermes installations on the same machine (with different `HERMES_HOME` directories), each gets its own systemd service name. The default `~/.hermes` uses `hermes-gateway`; other installations use `hermes-gateway-`. The `hermes gateway` commands automatically target the correct service for your current `HERMES_HOME`.
:::
### macOS (launchd)
```bash
hermes gateway install # Install as launchd agent
hermes gateway start # Start the service
hermes gateway stop # Stop the service
hermes gateway status # Check status
tail -f ~/.hermes/logs/gateway.log # View logs
```
The generated plist lives at `~/Library/LaunchAgents/ai.hermes.gateway.plist`. It includes three environment variables:
- **PATH** — your full shell PATH at install time, with the venv `bin/` and `node_modules/.bin` prepended. This ensures user-installed tools (Node.js, ffmpeg, etc.) are available to gateway subprocesses like the WhatsApp bridge.
- **VIRTUAL_ENV** — points to the Python virtualenv so tools can resolve packages correctly.
- **HERMES_HOME** — scopes the gateway to your Hermes installation.
:::tip PATH changes after install
launchd plists are static — if you install new tools (e.g. a new Node.js version via nvm, or ffmpeg via Homebrew) after setting up the gateway, run `hermes gateway install` again to capture the updated PATH. The gateway will detect the stale plist and reload automatically.
:::
:::info Multiple installations
Like the Linux systemd service, each `HERMES_HOME` directory gets its own launchd label. The default `~/.hermes` uses `ai.hermes.gateway`; other installations use `ai.hermes.gateway-`.
:::
## Platform-Specific Toolsets
Each platform has its own toolset:
| Platform | Toolset | Capabilities |
|----------|---------|--------------|
| CLI | `hermes-cli` | Full access |
| Telegram | `hermes-telegram` | Full tools including terminal |
| Discord | `hermes-discord` | Full tools including terminal |
| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
| Slack | `hermes-slack` | Full tools including terminal |
| Signal | `hermes-signal` | Full tools including terminal |
| SMS | `hermes-sms` | Full tools including terminal |
| Email | `hermes-email` | Full tools including terminal |
| Home Assistant | `hermes-homeassistant` | Full tools + HA device control (ha_list_entities, ha_get_state, ha_call_service, ha_list_services) |
| Mattermost | `hermes-mattermost` | Full tools including terminal |
| Matrix | `hermes-matrix` | Full tools including terminal |
| DingTalk | `hermes-dingtalk` | Full tools including terminal |
| Feishu/Lark | `hermes-feishu` | Full tools including terminal |
| WeCom | `hermes-wecom` | Full tools including terminal |
| WeCom Callback | `hermes-wecom-callback` | Full tools including terminal |
| Weixin | `hermes-weixin` | Full tools including terminal |
| BlueBubbles | `hermes-bluebubbles` | Full tools including terminal |
| QQBot | `hermes-qqbot` | Full tools including terminal |
| Yuanbao | `hermes-yuanbao` | Full tools including terminal |
| Microsoft Teams | `hermes-teams` | Full tools including terminal |
| API Server | `hermes` (default) | Full tools including terminal |
| Webhooks | `hermes-webhook` | Full tools including terminal |
## Next Steps
- [Telegram Setup](telegram.md)
- [Discord Setup](discord.md)
- [Slack Setup](slack.md)
- [WhatsApp Setup](whatsapp.md)
- [Signal Setup](signal.md)
- [SMS Setup (Twilio)](sms.md)
- [Email Setup](email.md)
- [Home Assistant Integration](homeassistant.md)
- [Mattermost Setup](mattermost.md)
- [Matrix Setup](matrix.md)
- [DingTalk Setup](dingtalk.md)
- [Feishu/Lark Setup](feishu.md)
- [WeCom Setup](wecom.md)
- [WeCom Callback Setup](wecom-callback.md)
- [Weixin Setup (WeChat)](weixin.md)
- [BlueBubbles Setup (iMessage)](bluebubbles.md)
- [QQBot Setup](qqbot.md)
- [Yuanbao Setup](yuanbao.md)
- [Microsoft Teams Setup](teams.md)
- [Open WebUI + API Server](open-webui.md)
- [Webhooks](webhooks.md)
---
# Telegram
# Telegram Setup
Hermes Agent integrates with Telegram as a full-featured conversational bot. Once connected, you can chat with your agent from any device, send voice memos that get auto-transcribed, receive scheduled task results, and use the agent in group chats. The integration is built on [python-telegram-bot](https://python-telegram-bot.org/) and supports text, voice, images, and file attachments.
## Step 1: Create a Bot via BotFather
Every Telegram bot requires an API token issued by [@BotFather](https://t.me/BotFather), Telegram's official bot management tool.
1. Open Telegram and search for **@BotFather**, or visit [t.me/BotFather](https://t.me/BotFather)
2. Send `/newbot`
3. Choose a **display name** (e.g., "Hermes Agent") — this can be anything
4. Choose a **username** — this must be unique and end in `bot` (e.g., `my_hermes_bot`)
5. BotFather replies with your **API token**. It looks like this:
```
123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
```
:::warning
Keep your bot token secret. Anyone with this token can control your bot. If it leaks, revoke it immediately via `/revoke` in BotFather.
:::
## Step 2: Customize Your Bot (Optional)
These BotFather commands improve the user experience. Message @BotFather and use:
| Command | Purpose |
|---------|---------|
| `/setdescription` | The "What can this bot do?" text shown before a user starts chatting |
| `/setabouttext` | Short text on the bot's profile page |
| `/setuserpic` | Upload an avatar for your bot |
| `/setcommands` | Define the command menu (the `/` button in chat) |
| `/setprivacy` | Control whether the bot sees all group messages (see Step 3) |
:::tip
For `/setcommands`, a useful starting set:
```
help - Show help information
new - Start a new conversation
sethome - Set this chat as the home channel
```
:::
## Step 3: Privacy Mode (Critical for Groups)
Telegram bots have a **privacy mode** that is **enabled by default**. This is the single most common source of confusion when using bots in groups.
**With privacy mode ON**, your bot can only see:
- Messages that start with a `/` command
- Replies directly to the bot's own messages
- Service messages (member joins/leaves, pinned messages, etc.)
- Messages in channels where the bot is an admin
**With privacy mode OFF**, the bot receives every message in the group.
### How to disable privacy mode
1. Message **@BotFather**
2. Send `/mybots`
3. Select your bot
4. Go to **Bot Settings → Group Privacy → Turn off**
:::warning
**You must remove and re-add the bot to any group** after changing the privacy setting. Telegram caches the privacy state when a bot joins a group, and it will not update until the bot is removed and re-added.
:::
:::tip
An alternative to disabling privacy mode: promote the bot to **group admin**. Admin bots always receive all messages regardless of the privacy setting, and this avoids needing to toggle the global privacy mode.
:::
## Step 4: Find Your User ID
Hermes Agent uses numeric Telegram user IDs to control access. Your user ID is **not** your username — it's a number like `123456789`.
**Method 1 (recommended):** Message [@userinfobot](https://t.me/userinfobot) — it instantly replies with your user ID.
**Method 2:** Message [@get_id_bot](https://t.me/get_id_bot) — another reliable option.
Save this number; you'll need it for the next step.
## Step 5: Configure Hermes
### Option A: Interactive Setup (Recommended)
```bash
hermes gateway setup
```
Select **Telegram** when prompted. The wizard asks for your bot token and allowed user IDs, then writes the configuration for you.
### Option B: Manual Configuration
Add the following to `~/.hermes/.env`:
```bash
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
TELEGRAM_ALLOWED_USERS=123456789 # Comma-separated for multiple users
```
### Start the Gateway
```bash
hermes gateway
```
The bot should come online within seconds. Send it a message on Telegram to verify.
## Sending Generated Files from Docker-backed Terminals
If your terminal backend is `docker`, keep in mind that Telegram attachments are
sent by the **gateway process**, not from inside the container. That means the
final `MEDIA:/...` path must be readable on the host where the gateway is
running.
Common pitfall:
- the agent writes a file inside Docker to `/workspace/report.txt`
- the model emits `MEDIA:/workspace/report.txt`
- Telegram delivery fails because `/workspace/report.txt` only exists inside the
container, not on the host
Recommended pattern:
```yaml
terminal:
backend: docker
docker_volumes:
- "/home/user/.hermes/cache/documents:/output"
```
Then:
- write files inside Docker to `/output/...`
- emit the **host-visible** path in `MEDIA:`, for example:
`MEDIA:/home/user/.hermes/cache/documents/report.txt`
If you already have a `docker_volumes:` section, add the new mount to the same
list. YAML duplicate keys silently override earlier ones.
### Supported `MEDIA:` file extensions
The gateway extracts `MEDIA:/path/to/file` tags from agent replies and ships the referenced file as a platform-native attachment. Supported extensions across all gateway platforms:
| Category | Extensions |
|---|---|
| Images | `png`, `jpg`, `jpeg`, `gif`, `webp`, `bmp`, `tiff`, `svg` |
| Audio | `mp3`, `wav`, `ogg`, `m4a`, `opus`, `flac`, `aac` |
| Video | `mp4`, `mov`, `webm`, `mkv`, `avi` |
| **Documents** | `pdf`, `txt`, `md`, `csv`, `json`, `xml`, `html`, `yaml`, `yml`, `log` |
| **Office** | `docx`, `xlsx`, `pptx`, `odt`, `ods`, `odp` |
| **Archives** | `zip`, `rar`, `7z`, `tar`, `gz`, `bz2` |
| **Books / packages** | `epub`, `apk`, `ipa` |
Anything on this list delivered as a native attachment on platforms that support it (Telegram, Discord, Signal, Slack, WhatsApp, Feishu, Matrix, etc.); on platforms without native support it falls back to a link or plain-text indicator. The **bold** categories were added in the last few releases — if you were relying on the model saying `here is the file: /path/to/report.docx` instead, swap to `MEDIA:/path/to/report.docx` for native delivery.
## Webhook Mode
By default, Hermes connects to Telegram using **long polling** — the gateway makes outbound requests to Telegram's servers to fetch new updates. This works well for local and always-on deployments.
For **cloud deployments** (Fly.io, Railway, Render, etc.), **webhook mode** is more cost-effective. These platforms can auto-wake suspended machines on inbound HTTP traffic, but not on outbound connections. Since polling is outbound, a polling bot can never sleep. Webhook mode flips the direction — Telegram pushes updates to your bot's HTTPS URL, enabling sleep-when-idle deployments.
| | Polling (default) | Webhook |
|---|---|---|
| Direction | Gateway → Telegram (outbound) | Telegram → Gateway (inbound) |
| Best for | Local, always-on servers | Cloud platforms with auto-wake |
| Setup | No extra config | Set `TELEGRAM_WEBHOOK_URL` |
| Idle cost | Machine must stay running | Machine can sleep between messages |
### Configuration
Add the following to `~/.hermes/.env`:
```bash
TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
TELEGRAM_WEBHOOK_SECRET="$(openssl rand -hex 32)" # required
# TELEGRAM_WEBHOOK_PORT=8443 # optional, default 8443
```
| Variable | Required | Description |
|----------|----------|-------------|
| `TELEGRAM_WEBHOOK_URL` | Yes | Public HTTPS URL where Telegram will send updates. The URL path is auto-extracted (e.g., `/telegram` from the example above). |
| `TELEGRAM_WEBHOOK_SECRET` | **Yes** (when `TELEGRAM_WEBHOOK_URL` is set) | Secret token that Telegram echoes in every webhook request for verification. The gateway refuses to start without it — see [GHSA-3vpc-7q5r-276h](https://github.com/NousResearch/hermes-agent/security/advisories/GHSA-3vpc-7q5r-276h). Generate with `openssl rand -hex 32`. |
| `TELEGRAM_WEBHOOK_PORT` | No | Local port the webhook server listens on (default: `8443`). |
When `TELEGRAM_WEBHOOK_URL` is set, the gateway starts an HTTP webhook server instead of polling. When unset, polling mode is used — no behavior change from previous versions.
### Cloud deployment example (Fly.io)
1. Add the env vars to your Fly.io app secrets:
```bash
fly secrets set TELEGRAM_WEBHOOK_URL=https://my-app.fly.dev/telegram
fly secrets set TELEGRAM_WEBHOOK_SECRET=$(openssl rand -hex 32)
```
2. Expose the webhook port in your `fly.toml`:
```toml
[[services]]
internal_port = 8443
protocol = "tcp"
[[services.ports]]
handlers = ["tls", "http"]
port = 443
```
3. Deploy:
```bash
fly deploy
```
The gateway log should show: `[telegram] Connected to Telegram (webhook mode)`.
## Proxy Support
If Telegram's API is blocked or you need to route traffic through a proxy, set a Telegram-specific proxy URL. This takes priority over the generic `HTTPS_PROXY` / `HTTP_PROXY` env vars.
**Option 1: config.yaml (recommended)**
```yaml
telegram:
proxy_url: "socks5://127.0.0.1:1080"
```
**Option 2: environment variable**
```bash
TELEGRAM_PROXY=socks5://127.0.0.1:1080
```
Supported schemes: `http://`, `https://`, `socks5://`.
The proxy applies to both the main Telegram connection and the fallback IP transport. If no Telegram-specific proxy is set, the gateway falls back to `HTTPS_PROXY` / `HTTP_PROXY` / `ALL_PROXY` (or macOS system proxy auto-detection).
## Home Channel
Use the `/sethome` command in any Telegram chat (DM or group) to designate it as the **home channel**. Scheduled tasks (cron jobs) deliver their results to this channel.
You can also set it manually in `~/.hermes/.env`:
```bash
TELEGRAM_HOME_CHANNEL=-1001234567890
TELEGRAM_HOME_CHANNEL_NAME="My Notes"
```
:::tip
Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM chat ID is the same as your user ID.
:::
## Voice Messages
### Incoming Voice (Speech-to-Text)
Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
- `local` uses `faster-whisper` on the machine running Hermes — no API key required
- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
### Outgoing Voice (Text-to-Speech)
When the agent generates audio via TTS, it's delivered as native Telegram **voice bubbles** — the round, inline-playable kind.
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup needed
- **Edge TTS** (the default free provider) outputs MP3 and requires **ffmpeg** to convert to Opus:
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable, but uses the rectangular player instead of a voice bubble).
Configure the TTS provider in your `config.yaml` under the `tts.provider` key.
## Group Chat Usage
Hermes Agent works in Telegram group chats with a few considerations:
- **Privacy mode** determines what messages the bot can see (see [Step 3](#step-3-privacy-mode-critical-for-groups))
- `TELEGRAM_ALLOWED_USERS` still applies — only authorized users can trigger the bot, even in groups
- You can keep the bot from responding to ordinary group chatter with `telegram.require_mention: true`
- With `telegram.require_mention: true`, group messages are accepted when they are:
- replies to one of the bot's messages
- `@botusername` mentions
- `/command@botusername` (Telegram's bot-menu command form that includes the bot name)
- matches for one of your configured regex wake words in `telegram.mention_patterns`
- Use `telegram.ignored_threads` to keep Hermes silent in specific Telegram forum topics, even when the group would otherwise allow free responses or mention-triggered replies
- If `telegram.require_mention` is left unset or false, Hermes keeps the previous open-group behavior and responds to normal group messages it can see
### Troubleshooting: works in DMs but not groups
If the bot responds in a private chat but stays silent in a group, check these
gates in order:
1. **Telegram delivery:** turn off BotFather privacy mode, promote the bot to
admin, or mention the bot directly. Hermes cannot respond to group messages
that Telegram never delivers to the bot.
2. **Rejoin after changing privacy:** remove the bot from the group and add it
again after changing BotFather privacy settings. Telegram may keep the old
delivery behavior for existing memberships.
3. **Hermes authorization:** make sure the sender is listed in
`TELEGRAM_ALLOWED_USERS` or `TELEGRAM_GROUP_ALLOWED_USERS`, or allow the
group chat with `TELEGRAM_GROUP_ALLOWED_CHATS`.
4. **Mention filters:** if `telegram.require_mention: true` is set, normal
group chatter is ignored unless the message is a slash command, reply to the
bot, `@botusername` mention, or configured `mention_patterns` match.
Negative chat IDs are normal for Telegram groups and supergroups. If you use
chat-scoped authorization, put those IDs in `TELEGRAM_GROUP_ALLOWED_CHATS`, not
the sender-user allowlist.
### Example group trigger configuration
Add this to `~/.hermes/config.yaml`:
```yaml
telegram:
require_mention: true
mention_patterns:
- "^\\s*chompy\\b"
ignored_threads:
- 31
- "42"
```
This example allows all the usual direct triggers plus messages that begin with `chompy`, even if they do not use an `@mention`.
Messages in Telegram topics `31` and `42` are always ignored before the mention and free-response checks run.
### Notes on `mention_patterns`
- Patterns use Python regular expressions
- Matching is case-insensitive
- Patterns are checked against both text messages and media captions
- Invalid regex patterns are ignored with a warning in the gateway logs rather than crashing the bot
- If you want a pattern to match only at the start of a message, anchor it with `^`
## Private Chat Topics (Bot API 9.4)
Telegram Bot API 9.4 (February 2026) introduced **Private Chat Topics** — bots can create forum-style topic threads directly in 1-on-1 DM chats, no supergroup needed. This lets you run multiple isolated workspaces within your existing DM with Hermes.
### Use case
If you work on several long-running projects, topics keep their context separate:
- **Topic "Website"** — work on your production web service
- **Topic "Research"** — literature review and paper exploration
- **Topic "General"** — miscellaneous tasks and quick questions
Each topic gets its own conversation session, history, and context — completely isolated from the others.
### Configuration
:::caution Prerequisites
Before adding topics to your config, the user must **enable Topics mode** in the DM chat with the bot:
1. Open your private chat with the Hermes bot in Telegram
2. Tap the bot's name at the top to open chat info
3. Enable **Topics** (the toggle to turn the chat into a forum)
Without this, Hermes will log `The chat is not a forum` on startup and skip topic creation. This is a Telegram client-side setting — the bot cannot enable it programmatically.
:::
Add topics under `platforms.telegram.extra.dm_topics` in `~/.hermes/config.yaml`:
```yaml
platforms:
telegram:
extra:
dm_topics:
- chat_id: 123456789 # Your Telegram user ID
topics:
- name: General
icon_color: 7322096
- name: Website
icon_color: 9367192
- name: Research
icon_color: 16766590
skill: arxiv # Auto-load a skill in this topic
```
**Fields:**
| Field | Required | Description |
|-------|----------|-------------|
| `name` | Yes | Topic display name |
| `icon_color` | No | Telegram icon color code (integer) |
| `icon_custom_emoji_id` | No | Custom emoji ID for the topic icon |
| `skill` | No | Skill to auto-load on new sessions in this topic |
| `thread_id` | No | Auto-populated after topic creation — don't set manually |
### How it works
1. On gateway startup, Hermes calls `createForumTopic` for each topic that doesn't have a `thread_id` yet
2. The `thread_id` is saved back to `config.yaml` automatically — subsequent restarts skip the API call
3. Each topic maps to an isolated session key: `agent:main:telegram:dm:{chat_id}:{thread_id}`
4. Messages in each topic have their own conversation history, memory flush, and context window
### Skill binding
Topics with a `skill` field automatically load that skill when a new session starts in the topic. This works exactly like typing `/skill-name` at the start of a conversation — the skill content is injected into the first message, and subsequent messages see it in the conversation history.
For example, a topic with `skill: arxiv` will have the arxiv skill pre-loaded whenever its session resets (due to idle timeout, daily reset, or manual `/reset`).
:::tip
Topics created outside of the config (e.g., by manually calling the Telegram API) are discovered automatically when a `forum_topic_created` service message arrives. You can also add topics to the config while the gateway is running — they'll be picked up on the next cache miss.
:::
## Multi-session DM mode (`/topic`)
A ChatGPT-style multi-session DM — one bot, many parallel conversations. Unlike the operator-curated `extra.dm_topics` above, this mode is **user-driven**: no config, no pre-declared topic names. The end user flips it on with `/topic`, then taps the Telegram **+** button to create as many topics as they want, each one a fully independent Hermes session.
### `/topic` subcommands
| Form | Context | Effect |
|------|---------|--------|
| `/topic` | Root DM, not yet enabled | Check BotFather capabilities, enable multi-session mode, create pinned System topic |
| `/topic` | Root DM, already enabled | Show status: unlinked sessions available for restore |
| `/topic` | Inside a topic | Show the current topic's session binding |
| `/topic help` | Any | Inline usage |
| `/topic off` | Root DM | Disable multi-session mode and clear all topic bindings for this chat |
| `/topic ` | Inside a topic | Restore a previous Telegram session into the current topic |
Only authorized users (allowlist via `TELEGRAM_ALLOWED_USERS` / platform auth config) can run `/topic`. An unauthorized sender gets a refusal instead of activation.
### DM Topics vs Multi-session DM mode
| | `extra.dm_topics` (config-driven) | `/topic` (user-driven) |
|---|---|---|
| Who activates it | Operator, in `config.yaml` | End user, by sending `/topic` |
| Topic list | Fixed set declared in config | User creates/deletes topics freely |
| Topic names | Chosen by operator | Chosen by user; auto-renamed to match Hermes session title |
| Root DM behavior | Unchanged — normal chat | Becomes a system lobby (non-command messages are rejected) |
| Primary use case | Permanent workspaces with optional skill binding | Ad-hoc parallel sessions |
| Persistence | `extra.dm_topics` in config | `telegram_dm_topic_mode` + `telegram_dm_topic_bindings` SQLite tables |
Both features can coexist on the same bot — you'd run `/topic` from a user's DM, and `extra.dm_topics` continues to manage operator-declared topics for other chats.
### Prerequisites
In **@BotFather**, open your bot → **Bot Settings → Threads Settings**:
1. Turn on **Threaded Mode** (enables `has_topics_enabled`)
2. Do **not** disable users creating topics (keeps `allows_users_to_create_topics` on)
When the user first runs `/topic`, Hermes calls `getMe` to verify both flags. If either is off, Hermes sends a screenshot of the BotFather Threads Settings page and explains what to toggle — no activation happens until prerequisites are met.
### Activation flow
From the root DM, send:
```
/topic
```
Hermes will:
1. Check `getMe().has_topics_enabled` and `allows_users_to_create_topics`
2. If both are true, enable multi-session topic mode for this DM
3. Create and pin a **System** topic for status/commands (best-effort)
4. Reply with a list of previous unlinked Telegram sessions the user can restore
After activation, the **root DM is a lobby**: normal prompts are rejected with guidance pointing at **All Messages**. System commands (`/status`, `/sessions`, `/usage`, `/help`, etc.) still work in the root.
### Creating a new topic (end-user flow)
1. Open the bot DM in Telegram
2. Tap **All Messages** at the top of the bot interface, then send any message
3. Telegram creates a new topic for that message
4. Hermes responds inside that topic — the topic is now a standalone session
Every topic gets its own conversation history, model state, tool execution, and session ID. The isolation key is `agent:main:telegram:dm:{chat_id}:{thread_id}` — identical to the config-driven DM topics isolation.
### Auto-renamed topics
When Hermes generates a session title for a topic (via the auto-title pipeline, after the first exchange), the Telegram topic itself is renamed to match — e.g. "New Topic" becomes "Database migration plan". The rename is best-effort: failures are logged but don't break the session.
### `/new` inside a topic
Resets the current topic's session (new session ID, fresh history) without touching other topics. Hermes replies with a reminder that for parallel work, creating another topic (via **All Messages**) is usually what you want.
### Restoring a previous session
Inside a topic, send:
```
/topic
```
This binds the current topic to an existing Hermes session instead of starting fresh. Useful for continuing a conversation that started before topic mode was enabled. Restrictions:
- The target session must belong to the same Telegram user
- The target session must not already be bound to another topic
Hermes confirms with the session title and replays the last assistant message for context.
To discover session IDs, send `/topic` (no argument) in the root DM — Hermes lists the user's unlinked Telegram sessions.
### `/topic` inside a topic (no argument)
Shows the current topic's binding: session title, session ID, and hints for `/new` vs creating another topic.
### Under the hood
- Activation persists to `telegram_dm_topic_mode(chat_id, user_id, enabled, ...)` in `state.db`
- Each topic binding persists to `telegram_dm_topic_bindings(chat_id, thread_id, session_id, ...)` with `ON DELETE CASCADE` on `session_id` — pruning a session automatically clears its topic binding
- The topic-mode SQLite migration is **opt-in**: it runs on the first `/topic` call, never on gateway startup. Until a user runs `/topic` in this profile, `state.db` is unchanged
- Each inbound DM message looks up its `(chat_id, thread_id)` binding. If present, the lookup routes the message to the bound session via `SessionStore.switch_session()` so the session-key-to-session-id mapping stays consistent on disk
- `/new` inside a topic rewrites the binding row to point at the new session ID, so the next message stays on the fresh session
- Topics declared in `extra.dm_topics` are **never auto-renamed** — the operator-chosen name is preserved even when multi-session mode is enabled
- The General (pinned top) topic in a forum-enabled DM is treated as the root lobby, regardless of whether Telegram delivers its messages with `message_thread_id=1` or with no thread_id
- Root-lobby reminders are rate-limited to one message per 30 seconds per chat — a user who forgets topic mode is on and types ten prompts in the root won't get ten replies
- BotFather setup screenshots are rate-limited to one send per 5 minutes per chat — repeated `/topic` attempts while Threads Settings are still disabled won't re-upload the same image
- `/background ` started inside a topic delivers its result back to the same topic; background sessions don't trigger auto-rename of the owning topic
- `/topic` itself is gated by the bot's user authorization check — unauthorized DMs get a refusal instead of activation
### Disabling multi-session mode
Send `/topic off` in the root DM. Hermes flips the row off, clears the chat's `(thread_id → session_id)` bindings, and the root DM reverts to a normal Hermes chat. Existing topics in Telegram aren't deleted — they just stop being gated as independent sessions. Re-run `/topic` later to turn it back on.
If you need to clean up by hand (e.g. a bulk reset across many chats), remove the rows directly:
```bash
sqlite3 ~/.hermes/state.db \
"UPDATE telegram_dm_topic_mode SET enabled = 0 WHERE chat_id = ''; \
DELETE FROM telegram_dm_topic_bindings WHERE chat_id = '';"
```
### Downgrading Hermes
If you downgrade to a Hermes version that predates `/topic`, the feature simply stops working — the `telegram_dm_topic_mode` and `telegram_dm_topic_bindings` tables remain in `state.db` but are ignored by older code. DMs revert to the native per-thread isolation (each `message_thread_id` still gets its own session via `build_session_key`), so your existing Telegram topics keep working as parallel sessions. The root DM is no longer a lobby — messages there go into the agent like they used to. Re-upgrading reactivates multi-session mode exactly where it was.
## Group Forum Topic Skill Binding
Supergroups with **Topics mode** enabled (also called "forum topics") already get session isolation per topic — each `thread_id` maps to its own conversation. But you may want to **auto-load a skill** when messages arrive in a specific group topic, just like DM topic skill binding works.
### Use case
A team supergroup with forum topics for different workstreams:
- **Engineering** topic → auto-loads the `software-development` skill
- **Research** topic → auto-loads the `arxiv` skill
- **General** topic → no skill, general-purpose assistant
### Configuration
Add topic bindings under `platforms.telegram.extra.group_topics` in `~/.hermes/config.yaml`:
```yaml
platforms:
telegram:
extra:
group_topics:
- chat_id: -1001234567890 # Supergroup ID
topics:
- name: Engineering
thread_id: 5
skill: software-development
- name: Research
thread_id: 12
skill: arxiv
- name: General
thread_id: 1
# No skill — general purpose
```
**Fields:**
| Field | Required | Description |
|-------|----------|-------------|
| `chat_id` | Yes | The supergroup's numeric ID (negative number starting with `-100`) |
| `name` | No | Human-readable label for the topic (informational only) |
| `thread_id` | Yes | Telegram forum topic ID — visible in `t.me/c//` links |
| `skill` | No | Skill to auto-load on new sessions in this topic |
### How it works
1. When a message arrives in a mapped group topic, Hermes looks up the `chat_id` and `thread_id` in `group_topics` config
2. If a matching entry has a `skill` field, that skill is auto-loaded for the session — identical to DM topic skill binding
3. Topics without a `skill` key get session isolation only (existing behavior, unchanged)
4. Unmapped `thread_id` values or `chat_id` values fall through silently — no error, no skill
### Differences from DM Topics
| | DM Topics | Group Topics |
|---|---|---|
| Config key | `extra.dm_topics` | `extra.group_topics` |
| Topic creation | Hermes creates topics via API if `thread_id` is missing | Admin creates topics in Telegram UI |
| `thread_id` | Auto-populated after creation | Must be set manually |
| `icon_color` / `icon_custom_emoji_id` | Supported | Not applicable (admin controls appearance) |
| Skill binding | ✓ | ✓ |
| Session isolation | ✓ | ✓ (already built-in for forum topics) |
:::tip
To find a topic's `thread_id`, open the topic in Telegram Web or Desktop and look at the URL: `https://t.me/c/1234567890/5` — the last number (`5`) is the `thread_id`. The `chat_id` for supergroups is the group ID prefixed with `-100` (e.g., group `1234567890` becomes `-1001234567890`).
:::
## Recent Bot API Features
- **Bot API 9.4 (Feb 2026):** Private Chat Topics — bots can create forum topics in 1-on-1 DM chats via `createForumTopic`. Hermes uses this for two distinct features: operator-curated [Private Chat Topics](#private-chat-topics-bot-api-94) (config-driven, fixed topic list) and user-driven [Multi-session DM mode](#multi-session-dm-mode-topic) (activated by `/topic`, unlimited user-created topics).
- **Privacy policy:** Telegram now requires bots to have a privacy policy. Set one via BotFather with `/setprivacy_policy`, or Telegram may auto-generate a placeholder. This is particularly important if your bot is public-facing.
- **Message streaming:** Bot API 9.x added support for streaming long responses, which can improve perceived latency for lengthy agent replies.
## Rendering: Tables and Link Previews
Telegram's MarkdownV2 has no native table syntax — pipe tables render as backslash-escaped noise if passed through raw. Hermes normalizes markdown tables automatically:
- **Small tables** are flattened into **row-group bullets** — each row becomes a readable bulleted list under the column headings. Good for 2–4 columns and short cells.
- **Larger or wider tables** fall back to a **fenced code block** with aligned columns so nothing collapses. A one-line prompt hint is added so the agent knows to prefer prose follow-ups over more tables on Telegram.
There's nothing to configure — the adapter picks the right fallback per message. If you want the legacy "always code-block" behavior, disable table normalization by setting `telegram.pretty_tables: false` in `config.yaml` (default: `true`).
**Link previews.** Telegram auto-generates link previews for URLs in bot messages. If you'd rather suppress those (long `/tools` output, agent reply that mentions ten links, etc.):
```yaml
gateway:
platforms:
telegram:
extra:
disable_link_previews: true
```
When enabled, Hermes attaches Telegram's `LinkPreviewOptions(is_disabled=True)` to every outgoing message and falls back to the legacy `disable_web_page_preview` parameter on older `python-telegram-bot` versions.
## Group Allowlisting
Telegram groups and forum chats have two orthogonal gates you can configure:
- **Sender user IDs** (`group_allow_from` / `TELEGRAM_GROUP_ALLOWED_USERS`) — sender-scoped allowlist that applies only to group/forum messages. Use this when you want specific users to be able to invoke the bot in groups without adding them to `TELEGRAM_ALLOWED_USERS` (which would also give them DM access).
- **Chat IDs** (`group_allowed_chats` / `TELEGRAM_GROUP_ALLOWED_CHATS`) — chat-scoped allowlist. Any member of these groups/forums can interact with the bot. Useful for team/support bots where group membership itself is the access signal.
```yaml
gateway:
platforms:
telegram:
extra:
# Global access (DMs + groups). Users here can always invoke the bot.
allow_from:
- "123456789"
# Sender IDs allowed in groups/forums only. Does NOT grant DM access.
group_allow_from:
- "987654321"
# Entire groups/forums — any member is authorized.
group_allowed_chats:
- "-1001234567890"
```
Equivalent env vars:
```bash
TELEGRAM_ALLOWED_USERS="123456789"
TELEGRAM_GROUP_ALLOWED_USERS="987654321"
TELEGRAM_GROUP_ALLOWED_CHATS="-1001234567890"
```
Behavior:
- `TELEGRAM_ALLOWED_USERS` covers all chat types (DMs, groups, forums).
- `TELEGRAM_GROUP_ALLOWED_USERS` only authorizes the listed senders in groups/forums. They still can't DM the bot unless listed in `TELEGRAM_ALLOWED_USERS`.
- A chat in `TELEGRAM_GROUP_ALLOWED_CHATS` authorizes every member of that chat, regardless of sender.
- Use `*` in any of these to allow any sender/chat.
- This layers on top of existing mention/pattern triggers and on top of `group_topics` + `ignored_threads`.
### Migration from before PR #17686
Prior to this split, `TELEGRAM_GROUP_ALLOWED_USERS` was the only knob and users put **chat IDs** in it. For backward compatibility, chat-ID-shaped values (starting with `-`) in `TELEGRAM_GROUP_ALLOWED_USERS` are still honored as chat IDs and a deprecation warning is logged once. Migration:
```bash
# Old (still works, but deprecated)
TELEGRAM_GROUP_ALLOWED_USERS="-1001234567890"
# New
TELEGRAM_GROUP_ALLOWED_CHATS="-1001234567890"
```
## Interactive Model Picker
When you send `/model` with no arguments in a Telegram chat, Hermes shows an interactive inline keyboard for switching models:
1. **Provider selection** — buttons showing each available provider with model counts (e.g., "OpenAI (15)", "✓ Anthropic (12)" for the current provider).
2. **Model selection** — paginated model list with **Prev**/**Next** navigation, a **Back** button to return to providers, and **Cancel**.
The current model and provider are displayed at the top. All navigation happens by editing the same message in-place (no chat clutter).
:::tip
If you know the exact model name, type `/model ` directly to skip the picker. You can also type `/model --global` to persist the change across sessions.
:::
## DNS-over-HTTPS Fallback IPs
In some restricted networks, `api.telegram.org` may resolve to an IP that is unreachable. The Telegram adapter includes a **fallback IP** mechanism that transparently retries connections against alternative IPs while preserving the correct TLS hostname and SNI.
### How it works
1. If `TELEGRAM_FALLBACK_IPS` is set, those IPs are used directly.
2. Otherwise, the adapter automatically queries **Google DNS** and **Cloudflare DNS** via DNS-over-HTTPS (DoH) to discover alternative IPs for `api.telegram.org`.
3. IPs returned by DoH that differ from the system DNS result are used as fallbacks.
4. If DoH is also blocked, a hardcoded seed IP (`149.154.167.220`) is used as a last resort.
5. Once a fallback IP succeeds, it becomes "sticky" — subsequent requests use it directly without retrying the primary path first.
### Configuration
```bash
# Explicit fallback IPs (comma-separated)
TELEGRAM_FALLBACK_IPS=149.154.167.220,149.154.167.221
```
Or in `~/.hermes/config.yaml`:
```yaml
platforms:
telegram:
extra:
fallback_ips:
- "149.154.167.220"
```
:::tip
You usually don't need to configure this manually. The auto-discovery via DoH handles most restricted-network scenarios. The `TELEGRAM_FALLBACK_IPS` env var is only needed if DoH is also blocked on your network.
:::
## Proxy Support
If your network requires an HTTP proxy to reach the internet (common in corporate environments), the Telegram adapter automatically reads standard proxy environment variables and routes all connections through the proxy.
### Supported variables
The adapter checks these environment variables in order, using the first one that is set:
1. `HTTPS_PROXY`
2. `HTTP_PROXY`
3. `ALL_PROXY`
4. `https_proxy` / `http_proxy` / `all_proxy` (lowercase variants)
### Configuration
Set the proxy in your environment before starting the gateway:
```bash
export HTTPS_PROXY=http://proxy.example.com:8080
hermes gateway
```
Or add it to `~/.hermes/.env`:
```bash
HTTPS_PROXY=http://proxy.example.com:8080
```
The proxy applies to both the primary transport and all fallback IP transports. No additional Hermes configuration is needed — if the environment variable is set, it's used automatically.
:::note
This covers the custom fallback transport layer that Hermes uses for Telegram connections. The standard `httpx` client used elsewhere already respects proxy env vars natively.
:::
## Message Reactions
The bot can add emoji reactions to messages as visual processing feedback:
- 👀 when the bot starts processing your message
- ✅ when the response is delivered successfully
- ❌ if an error occurs during processing
Reactions are **disabled by default**. Enable them in `config.yaml`:
```yaml
telegram:
reactions: true
```
Or via environment variable:
```bash
TELEGRAM_REACTIONS=true
```
:::note
Unlike Discord (where reactions are additive), Telegram's Bot API replaces all bot reactions in a single call. The transition from 👀 to ✅/❌ happens atomically — you won't see both at once.
:::
:::tip
If the bot doesn't have permission to add reactions in a group, the reaction calls fail silently and message processing continues normally.
:::
## Per-Channel Prompts
Assign ephemeral system prompts to specific Telegram groups or forum topics. The prompt is injected at runtime on every turn — never persisted to transcript history — so changes take effect immediately.
```yaml
telegram:
channel_prompts:
"-1001234567890": |
You are a research assistant. Focus on academic sources,
citations, and concise synthesis.
"42": |
This topic is for creative writing feedback. Be warm and
constructive.
```
Keys are chat IDs (groups/supergroups) or forum topic IDs. For forum groups, topic-level prompts override the group-level prompt:
- Message in topic `42` inside group `-1001234567890` → uses topic `42`'s prompt
- Message in topic `99` (no explicit entry) → falls back to group `-1001234567890`'s prompt
- Message in a group with no entry → no channel prompt applied
Numeric YAML keys are automatically normalized to strings.
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
| Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
| Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
| Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
| Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |
| Webhook not receiving updates | Verify `TELEGRAM_WEBHOOK_URL` is publicly reachable (test with `curl`). Ensure your platform/reverse proxy routes inbound HTTPS traffic from the URL's port to the local listen port configured by `TELEGRAM_WEBHOOK_PORT` (they do not need to be the same number). Ensure SSL/TLS is active — Telegram only sends to HTTPS URLs. Check firewall rules. |
## Exec Approval
When the agent tries to run a potentially dangerous command, it asks you for approval in the chat:
> ⚠️ This command is potentially dangerous (recursive delete). Reply "yes" to approve.
Reply "yes"/"y" to approve or "no"/"n" to deny.
## Security
:::warning
Always set `TELEGRAM_ALLOWED_USERS` to restrict who can interact with your bot. Without it, the gateway denies all users by default as a safety measure.
:::
Never share your bot token publicly. If compromised, revoke it immediately via BotFather's `/revoke` command.
For more details, see the [Security documentation](/user-guide/security). You can also use [DM pairing](/user-guide/messaging#dm-pairing-alternative-to-allowlists) for a more dynamic approach to user authorization.
---
# Discord
# Discord Setup
Hermes Agent integrates with Discord as a bot, letting you chat with your AI assistant through direct messages or server channels. The bot receives your messages, processes them through the Hermes Agent pipeline (including tool use, memory, and reasoning), and responds in real time. It supports text, voice messages, file attachments, and slash commands.
Before setup, here's the part most people want to know: how Hermes behaves once it's in your server.
## How Hermes Behaves
| Context | Behavior |
|---------|----------|
| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
| **Server channels** | By default, Hermes only responds when you `@mention` it. If you post in a channel without mentioning it, Hermes ignores the message. |
| **Free-response channels** | You can make specific channels mention-free with `DISCORD_FREE_RESPONSE_CHANNELS`, or disable mentions globally with `DISCORD_REQUIRE_MENTION=false`. Messages in these channels are answered inline — auto-threading is skipped so the channel stays a lightweight chat. |
| **Threads** | Hermes replies in the same thread. Mention rules still apply unless that thread or its parent channel is configured as free-response. Threads stay isolated from the parent channel for session history. |
| **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel for safety and clarity. Two people talking in the same channel do not share one transcript unless you explicitly disable that. |
| **Messages mentioning other users** | When `DISCORD_IGNORE_NO_MENTION` is `true` (the default), Hermes stays silent if a message @mentions other users but does **not** mention the bot. This prevents the bot from jumping into conversations directed at other people. Set to `false` if you want the bot to respond to all messages regardless of who is mentioned. This only applies in server channels, not DMs. |
:::tip
If you want a normal bot-help channel where people can talk to Hermes without tagging it every time, add that channel to `DISCORD_FREE_RESPONSE_CHANNELS`.
:::
### Discord Gateway Model
Hermes on Discord is not a webhook that replies statelessly. It runs through the full messaging gateway, which means each incoming message goes through:
1. authorization (`DISCORD_ALLOWED_USERS`)
2. mention / free-response checks
3. session lookup
4. session transcript loading
5. normal Hermes agent execution, including tools, memory, and slash commands
6. response delivery back to Discord
That matters because behavior in a busy server depends on both Discord routing and Hermes session policy.
### Session Model in Discord
By default:
- each DM gets its own session
- each server thread gets its own session namespace
- each user in a shared channel gets their own session inside that channel
So if Alice and Bob both talk to Hermes in `#research`, Hermes treats those as separate conversations by default even though they are using the same visible Discord channel.
This is controlled by `config.yaml`:
```yaml
group_sessions_per_user: true
```
Set it to `false` only if you explicitly want one shared conversation for the entire room:
```yaml
group_sessions_per_user: false
```
Shared sessions can be useful for a collaborative room, but they also mean:
- users share context growth and token costs
- one person's long tool-heavy task can bloat everyone else's context
- one person's in-flight run can interrupt another person's follow-up in the same room
### Interrupts and Concurrency
Hermes tracks running agents by session key.
With the default `group_sessions_per_user: true`:
- Alice interrupting her own in-flight request only affects Alice's session in that channel
- Bob can keep talking in the same channel without inheriting Alice's history or interrupting Alice's run
With `group_sessions_per_user: false`:
- the whole room shares one running-agent slot for that channel/thread
- follow-up messages from different people can interrupt or queue behind each other
This guide walks you through the full setup process — from creating your bot on Discord's Developer Portal to sending your first message.
## Step 1: Create a Discord Application
1. Go to the [Discord Developer Portal](https://discord.com/developers/applications) and sign in with your Discord account.
2. Click **New Application** in the top-right corner.
3. Enter a name for your application (e.g., "Hermes Agent") and accept the Developer Terms of Service.
4. Click **Create**.
You'll land on the **General Information** page. Note the **Application ID** — you'll need it later to build the invite URL.
## Step 2: Create the Bot
1. In the left sidebar, click **Bot**.
2. Discord automatically creates a bot user for your application. You'll see the bot's username, which you can customize.
3. Under **Authorization Flow**:
- Set **Public Bot** to **ON** — required to use the Discord-provided invite link (recommended). This allows the Installation tab to generate a default authorization URL.
- Leave **Require OAuth2 Code Grant** set to **OFF**.
:::tip
You can set a custom avatar and banner for your bot on this page. This is what users will see in Discord.
:::
:::info[Private Bot Alternative]
If you prefer to keep your bot private (Public Bot = OFF), you **must** use the **Manual URL** method in Step 5 instead of the Installation tab. The Discord-provided link requires Public Bot to be enabled.
:::
## Step 3: Enable Privileged Gateway Intents
This is the most critical step in the entire setup. Without the correct intents enabled, your bot will connect to Discord but **will not be able to read message content**.
On the **Bot** page, scroll down to **Privileged Gateway Intents**. You'll see three toggles:
| Intent | Purpose | Required? |
|--------|---------|-----------|
| **Presence Intent** | See user online/offline status | Optional |
| **Server Members Intent** | Access the member list, resolve usernames | **Required** |
| **Message Content Intent** | Read the text content of messages | **Required** |
**Enable both Server Members Intent and Message Content Intent** by toggling them **ON**.
- Without **Message Content Intent**, your bot receives message events but the message text is empty — the bot literally cannot see what you typed.
- Without **Server Members Intent**, the bot cannot resolve usernames for the allowed users list and may fail to identify who is messaging it.
:::warning[This is the #1 reason Discord bots don't work]
If your bot is online but never responds to messages, the **Message Content Intent** is almost certainly disabled. Go back to the [Developer Portal](https://discord.com/developers/applications), select your application → Bot → Privileged Gateway Intents, and make sure **Message Content Intent** is toggled ON. Click **Save Changes**.
:::
**Regarding server count:**
- If your bot is in **fewer than 100 servers**, you can simply toggle intents on and off freely.
- If your bot is in **100 or more servers**, Discord requires you to submit a verification application to use privileged intents. For personal use, this is not a concern.
Click **Save Changes** at the bottom of the page.
## Step 4: Get the Bot Token
The bot token is the credential Hermes Agent uses to log in as your bot. Still on the **Bot** page:
1. Under the **Token** section, click **Reset Token**.
2. If you have two-factor authentication enabled on your Discord account, enter your 2FA code.
3. Discord will display your new token. **Copy it immediately.**
:::warning[Token shown only once]
The token is only displayed once. If you lose it, you'll need to reset it and generate a new one. Never share your token publicly or commit it to Git — anyone with this token has full control of your bot.
:::
Store the token somewhere safe (a password manager, for example). You'll need it in Step 8.
## Step 5: Generate the Invite URL
You need an OAuth2 URL to invite the bot to your server. There are two ways to do this:
### Option A: Using the Installation Tab (Recommended)
:::note[Requires Public Bot]
This method requires **Public Bot** to be set to **ON** in Step 2. If you set Public Bot to OFF, use the Manual URL method below instead.
:::
1. In the left sidebar, click **Installation**.
2. Under **Installation Contexts**, enable **Guild Install**.
3. For **Install Link**, select **Discord Provided Link**.
4. Under **Default Install Settings** for Guild Install:
- **Scopes**: select `bot` and `applications.commands`
- **Permissions**: select the permissions listed below.
### Option B: Manual URL
You can construct the invite URL directly using this format:
```
https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274878286912
```
Replace `YOUR_APP_ID` with the Application ID from Step 1.
### Required Permissions
These are the minimum permissions your bot needs:
- **View Channels** — see the channels it has access to
- **Send Messages** — respond to your messages
- **Embed Links** — format rich responses
- **Attach Files** — send images, audio, and file outputs
- **Read Message History** — maintain conversation context
### Recommended Additional Permissions
- **Send Messages in Threads** — respond in thread conversations
- **Add Reactions** — react to messages for acknowledgment
### Permission Integers
| Level | Permissions Integer | What's Included |
|-------|-------------------|-----------------|
| Minimal | `117760` | View Channels, Send Messages, Read Message History, Attach Files |
| Recommended | `274878286912` | All of the above plus Embed Links, Send Messages in Threads, Add Reactions |
## Step 6: Invite to Your Server
1. Open the invite URL in your browser (from the Installation tab or the manual URL you constructed).
2. In the **Add to Server** dropdown, select your server.
3. Click **Continue**, then **Authorize**.
4. Complete the CAPTCHA if prompted.
:::info
You need the **Manage Server** permission on the Discord server to invite a bot. If you don't see your server in the dropdown, ask a server admin to use the invite link instead.
:::
After authorizing, the bot will appear in your server's member list (it will show as offline until you start the Hermes gateway).
## Step 7: Find Your Discord User ID
Hermes Agent uses your Discord User ID to control who can interact with the bot. To find it:
1. Open Discord (desktop or web app).
2. Go to **Settings** → **Advanced** → toggle **Developer Mode** to **ON**.
3. Close settings.
4. Right-click your own username (in a message, the member list, or your profile) → **Copy User ID**.
Your User ID is a long number like `284102345871466496`.
:::tip
Developer Mode also lets you copy **Channel IDs** and **Server IDs** the same way — right-click the channel or server name and select Copy ID. You'll need a Channel ID if you want to set a home channel manually.
:::
## Step 8: Configure Hermes Agent
### Option A: Interactive Setup (Recommended)
Run the guided setup command:
```bash
hermes gateway setup
```
Select **Discord** when prompted, then paste your bot token and user ID when asked.
### Option B: Manual Configuration
Add the following to your `~/.hermes/.env` file:
```bash
# Required
DISCORD_BOT_TOKEN=your-bot-token
DISCORD_ALLOWED_USERS=284102345871466496
# Multiple allowed users (comma-separated)
# DISCORD_ALLOWED_USERS=284102345871466496,198765432109876543
```
Then start the gateway:
```bash
hermes gateway
```
The bot should come online in Discord within a few seconds. Send it a message — either a DM or in a channel it can see — to test.
:::tip
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
:::
## Configuration Reference
Discord behavior is controlled through two files: **`~/.hermes/.env`** for credentials and env-level toggles, and **`~/.hermes/config.yaml`** for structured settings. Environment variables always take precedence over config.yaml values when both are set.
### Environment Variables (`.env`)
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `DISCORD_BOT_TOKEN` | **Yes** | — | Bot token from the [Discord Developer Portal](https://discord.com/developers/applications). |
| `DISCORD_ALLOWED_USERS` | **Yes** | — | Comma-separated Discord user IDs allowed to interact with the bot. Without this **or** `DISCORD_ALLOWED_ROLES`, the gateway denies all users. |
| `DISCORD_ALLOWED_ROLES` | No | — | Comma-separated Discord role IDs. Any member with one of these roles is authorized — OR semantics with `DISCORD_ALLOWED_USERS`. Auto-enables the **Server Members Intent** on connect. Useful when moderation teams churn: new mods get access as soon as the role is granted, no config push needed. |
| `DISCORD_HOME_CHANNEL` | No | — | Channel ID where the bot sends proactive messages (cron output, reminders, notifications). |
| `DISCORD_HOME_CHANNEL_NAME` | No | `"Home"` | Display name for the home channel in logs and status output. |
| `DISCORD_COMMAND_SYNC_POLICY` | No | `"safe"` | Controls native slash-command startup sync. `"safe"` diffs existing global commands and only updates what changed, recreating commands when Discord metadata changes cannot be applied via patch. `"bulk"` preserves the old `tree.sync()` behavior. `"off"` skips startup sync entirely. |
| `DISCORD_REQUIRE_MENTION` | No | `true` | When `true`, the bot only responds in server channels when `@mentioned`. Set to `false` to respond to all messages in every channel. |
| `DISCORD_FREE_RESPONSE_CHANNELS` | No | — | Comma-separated channel IDs where the bot responds without requiring an `@mention`, even when `DISCORD_REQUIRE_MENTION` is `true`. |
| `DISCORD_IGNORE_NO_MENTION` | No | `true` | When `true`, the bot stays silent if a message `@mentions` other users but does **not** mention the bot. Prevents the bot from jumping into conversations directed at other people. Only applies in server channels, not DMs. |
| `DISCORD_AUTO_THREAD` | No | `true` | When `true`, automatically creates a new thread for every `@mention` in a text channel, so each conversation is isolated (similar to Slack behavior). Messages already inside threads or DMs are unaffected. |
| `DISCORD_ALLOW_BOTS` | No | `"none"` | Controls how the bot handles messages from other Discord bots. `"none"` — ignore all other bots. `"mentions"` — only accept bot messages that `@mention` Hermes. `"all"` — accept all bot messages. |
| `DISCORD_REACTIONS` | No | `true` | When `true`, the bot adds emoji reactions to messages during processing (👀 when starting, ✅ on success, ❌ on error). Set to `false` to disable reactions entirely. |
| `DISCORD_IGNORED_CHANNELS` | No | — | Comma-separated channel IDs where the bot **never** responds, even when `@mentioned`. Takes priority over all other channel settings. |
| `DISCORD_ALLOWED_CHANNELS` | No | — | Comma-separated channel IDs. When set, the bot **only** responds in these channels (plus DMs if allowed). Overrides `config.yaml` `discord.allowed_channels`. Combine with `DISCORD_IGNORED_CHANNELS` to express allow/deny rules. |
| `DISCORD_NO_THREAD_CHANNELS` | No | — | Comma-separated channel IDs where the bot responds directly in the channel instead of creating a thread. Only relevant when `DISCORD_AUTO_THREAD` is `true`. |
| `DISCORD_REPLY_TO_MODE` | No | `"first"` | Controls reply-reference behavior: `"off"` — never reply to the original message, `"first"` — reply-reference on the first message chunk only (default), `"all"` — reply-reference on every chunk. |
| `DISCORD_ALLOW_MENTION_EVERYONE` | No | `false` | When `false` (default), the bot cannot ping `@everyone` or `@here` even if its response contains those tokens. Set to `true` to opt back in. See [Mention Control](#mention-control) below. |
| `DISCORD_ALLOW_MENTION_ROLES` | No | `false` | When `false` (default), the bot cannot ping `@role` mentions. Set to `true` to allow. |
| `DISCORD_ALLOW_MENTION_USERS` | No | `true` | When `true` (default), the bot can ping individual users by ID. |
| `DISCORD_ALLOW_MENTION_REPLIED_USER` | No | `true` | When `true` (default), replying to a message pings the original author. |
| `DISCORD_PROXY` | No | — | Proxy URL for Discord connections (HTTP, WebSocket, REST). Overrides `HTTPS_PROXY`/`ALL_PROXY`. Supports `http://`, `https://`, and `socks5://` schemes. |
| `HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS` | No | `0.6` | Grace window the adapter waits before flushing a queued text chunk. Useful for smoothing streamed output. |
| `HERMES_DISCORD_TEXT_BATCH_SPLIT_DELAY_SECONDS` | No | `2.0` | Delay between split chunks when a single message exceeds Discord's length limit. |
### Config File (`config.yaml`)
The `discord` section in `~/.hermes/config.yaml` mirrors the env vars above. Config.yaml settings are applied as defaults — if the equivalent env var is already set, the env var wins.
```yaml
# Discord-specific settings
discord:
require_mention: true # Require @mention in server channels
free_response_channels: "" # Comma-separated channel IDs (or YAML list)
auto_thread: true # Auto-create threads on @mention
reactions: true # Add emoji reactions during processing
ignored_channels: [] # Channel IDs where bot never responds
no_thread_channels: [] # Channel IDs where bot responds without threading
channel_prompts: {} # Per-channel ephemeral system prompts
allow_mentions: # What the bot is allowed to ping (safe defaults)
everyone: false # @everyone / @here pings (default: false)
roles: false # @role pings (default: false)
users: true # @user pings (default: true)
replied_user: true # reply-reference pings the author (default: true)
# Session isolation (applies to all gateway platforms, not just Discord)
group_sessions_per_user: true # Isolate sessions per user in shared channels
```
#### `discord.require_mention`
**Type:** boolean — **Default:** `true`
When enabled, the bot only responds in server channels when directly `@mentioned`. DMs always get a response regardless of this setting.
#### `discord.free_response_channels`
**Type:** string or list — **Default:** `""`
Channel IDs where the bot responds to all messages without needing an `@mention`. Accepts either a comma-separated string or a YAML list:
```yaml
# String format
discord:
free_response_channels: "1234567890,9876543210"
# List format
discord:
free_response_channels:
- 1234567890
- 9876543210
```
If a thread's parent channel is in this list, the thread also becomes mention-free.
Free-response channels also **skip auto-threading** — the bot replies inline rather than spinning off a new thread per message. This keeps the channel usable as a lightweight chat surface. If you want threading behavior, don't list the channel as free-response (use normal `@mention` flow instead).
#### `discord.auto_thread`
**Type:** boolean — **Default:** `true`
When enabled, every `@mention` in a regular text channel automatically creates a new thread for the conversation. This keeps the main channel clean and gives each conversation its own isolated session history. Once a thread is created, subsequent messages in that thread don't require `@mention` — the bot knows it's already participating.
Messages sent in existing threads or DMs are unaffected by this setting. Channels listed in `discord.free_response_channels` or `discord.no_thread_channels` also bypass auto-threading and get inline replies instead.
#### `discord.reactions`
**Type:** boolean — **Default:** `true`
Controls whether the bot adds emoji reactions to messages as visual feedback:
- 👀 added when the bot starts processing your message
- ✅ added when the response is delivered successfully
- ❌ added if an error occurs during processing
Disable this if you find the reactions distracting or if the bot's role doesn't have the **Add Reactions** permission.
#### `discord.ignored_channels`
**Type:** string or list — **Default:** `[]`
Channel IDs where the bot **never** responds, even when directly `@mentioned`. This takes the highest priority — if a channel is in this list, the bot silently ignores all messages there, regardless of `require_mention`, `free_response_channels`, or any other setting.
```yaml
# String format
discord:
ignored_channels: "1234567890,9876543210"
# List format
discord:
ignored_channels:
- 1234567890
- 9876543210
```
If a thread's parent channel is in this list, messages in that thread are also ignored.
#### `discord.no_thread_channels`
**Type:** string or list — **Default:** `[]`
Channel IDs where the bot responds directly in the channel instead of auto-creating a thread. This only has an effect when `auto_thread` is `true` (the default). In these channels, the bot responds inline like a normal message rather than spawning a new thread.
```yaml
discord:
no_thread_channels:
- 1234567890 # Bot responds inline here
```
Useful for channels dedicated to bot interaction where threads would add unnecessary noise.
#### `discord.channel_prompts`
**Type:** mapping — **Default:** `{}`
Per-channel ephemeral system prompts that are injected on every turn in the matching Discord channel or thread without being persisted to transcript history.
```yaml
discord:
channel_prompts:
"1234567890": |
This channel is for research tasks. Prefer deep comparisons,
citations, and concise synthesis.
"9876543210": |
This forum is for therapy-style support. Be warm, grounded,
and non-judgmental.
```
Behavior:
- Exact thread/channel ID matches win.
- If a message arrives inside a thread or forum post and that thread has no explicit entry, Hermes falls back to the parent channel/forum ID.
- Prompts are applied ephemerally at runtime, so changing them affects future turns immediately without rewriting past session history.
#### `group_sessions_per_user`
**Type:** boolean — **Default:** `true`
This is a global gateway setting (not Discord-specific) that controls whether users in the same channel get isolated session histories.
When `true`: Alice and Bob talking in `#research` each have their own separate conversation with Hermes. When `false`: the entire channel shares one conversation transcript and one running-agent slot.
```yaml
group_sessions_per_user: true
```
See the [Session Model](#session-model-in-discord) section above for the full implications of each mode.
#### `display.tool_progress`
**Type:** string — **Default:** `"all"` — **Values:** `off`, `new`, `all`, `verbose`
Controls whether the bot sends progress messages in the chat while processing (e.g., "Reading file...", "Running terminal command..."). This is a global gateway setting that applies to all platforms.
```yaml
display:
tool_progress: "all" # off | new | all | verbose
```
- `off` — no progress messages
- `new` — only show the first tool call per turn
- `all` — show all tool calls (truncated to 40 characters in gateway messages)
- `verbose` — show full tool call details (can produce long messages)
#### `display.tool_progress_command`
**Type:** boolean — **Default:** `false`
When enabled, makes the `/verbose` slash command available in the gateway, letting you cycle through tool progress modes (`off → new → all → verbose → off`) without editing config.yaml.
```yaml
display:
tool_progress_command: true
```
## Interactive Model Picker
Send `/model` with no arguments in a Discord channel to open a dropdown-based model picker:
1. **Provider selection** — a Select dropdown showing available providers (up to 25).
2. **Model selection** — a second dropdown with models for the chosen provider (up to 25).
The picker times out after 120 seconds. Only authorized users (those in `DISCORD_ALLOWED_USERS`) can interact with it. If you know the model name, type `/model ` directly.
## Native Slash Commands for Skills
Hermes automatically registers installed skills as **native Discord Application Commands**. This means skills appear in Discord's autocomplete `/` menu alongside built-in commands.
- Each skill becomes a Discord slash command (e.g., `/code-review`, `/ascii-art`)
- Skills accept an optional `args` string parameter
- Discord has a limit of 100 application commands per bot — if you have more skills than available slots, extra skills are skipped with a warning in the logs
- Skills are registered during bot startup alongside built-in commands like `/model`, `/reset`, and `/background`
No extra configuration is needed — any skill installed via `hermes skills install` is automatically registered as a Discord slash command on the next gateway restart.
### Disabling Slash Command Registration
If you run multiple Hermes gateways against the same Discord application (e.g. staging + production), only one of them should own the global slash-command registration — otherwise the last startup wins and the registrations flap. Turn slash registration off on the "follower" gateway:
```yaml
gateway:
platforms:
discord:
extra:
slash_commands: false # default: true
```
Leaving this at `true` on the "primary" gateway keeps the normal behavior — global `/`-menu commands for built-ins and installed skills.
## Sending Media (`send_message` + `MEDIA:` tags)
The Discord adapter supports native file uploads for every common media type via the `send_message` tool and inline `MEDIA:/path/to/file` tags emitted by the agent:
| Type | How it's delivered |
|---|---|
| Images (PNG/JPG/WebP) | Native Discord image attachment with inline preview |
| Animated GIFs | `send_animation` uploads as `animation.gif` so Discord plays it inline (not as a static thumbnail) |
| Video (MP4/MOV) | `send_video` — native video player |
| Audio / Voice | `send_voice` — native voice message when possible, file attachment otherwise |
| Documents (PDF/ZIP/docx/etc.) | `send_document` — native attachment with download button |
Discord's per-upload size limit depends on the server's boost tier (25 MB free, up to 500 MB). If Hermes gets an HTTP 413, the adapter falls back to a link pointing at the local cache path rather than failing silently.
## Home Channel
You can designate a "home channel" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:
### Using the Slash Command
Type `/sethome` in any Discord channel where the bot is present. That channel becomes the home channel.
### Manual Configuration
Add these to your `~/.hermes/.env`:
```bash
DISCORD_HOME_CHANNEL=123456789012345678
DISCORD_HOME_CHANNEL_NAME="#bot-updates"
```
Replace the ID with the actual channel ID (right-click → Copy Channel ID with Developer Mode on).
## Voice Messages
Hermes Agent supports Discord voice messages:
- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
- **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
For the full setup and operational guide, see:
- [Voice Mode](/docs/user-guide/features/voice-mode)
- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
## Forum Channels
Discord forum channels (type 15) don't accept direct messages — every post in a forum must be a thread. Hermes auto-detects forum channels and creates a new thread post whenever it needs to send there, so `send_message`, TTS, images, voice messages, and file attachments all work without special handling from the agent.
- **Thread name** is derived from the first line of the message (markdown heading prefix stripped, capped at 100 chars). When the message is attachment-only, the filename is used as the fallback thread name.
- **Attachments** ride along on the starter message of the new thread — no separate upload step, no partial sends.
- **One call, one thread**: each forum send creates a new thread. Successive sends to the same forum will therefore produce separate threads.
- **Detection is three-layered**: the channel directory cache first, a process-local probe cache second, and a live `GET /channels/{id}` probe as a last resort (whose result is then memoized for the life of the process).
Refreshing the directory (`/channels refresh` on platforms that expose it, or a gateway restart) populates the cache with any forum channels created after the bot started.
## Troubleshooting
### Bot is online but not responding to messages
**Cause**: Message Content Intent is disabled.
**Fix**: Go to [Developer Portal](https://discord.com/developers/applications) → your app → Bot → Privileged Gateway Intents → enable **Message Content Intent** → Save Changes. Restart the gateway.
### "Disallowed Intents" error on startup
**Cause**: Your code requests intents that aren't enabled in the Developer Portal.
**Fix**: Enable all three Privileged Gateway Intents (Presence, Server Members, Message Content) in the Bot settings, then restart.
### Bot can't see messages in a specific channel
**Cause**: The bot's role doesn't have permission to view that channel.
**Fix**: In Discord, go to the channel's settings → Permissions → add the bot's role with **View Channel** and **Read Message History** enabled.
### 403 Forbidden errors
**Cause**: The bot is missing required permissions.
**Fix**: Re-invite the bot with the correct permissions using the URL from Step 5, or manually adjust the bot's role permissions in Server Settings → Roles.
### Bot is offline
**Cause**: The Hermes gateway isn't running, or the token is incorrect.
**Fix**: Check that `hermes gateway` is running. Verify `DISCORD_BOT_TOKEN` in your `.env` file. If you recently reset the token, update it.
### "User not allowed" / Bot ignores you
**Cause**: Your User ID isn't in `DISCORD_ALLOWED_USERS`.
**Fix**: Add your User ID to `DISCORD_ALLOWED_USERS` in `~/.hermes/.env` and restart the gateway.
### People in the same channel are sharing context unexpectedly
**Cause**: `group_sessions_per_user` is disabled, or the platform cannot provide a user ID for the messages in that context.
**Fix**: Set this in `~/.hermes/config.yaml` and restart the gateway:
```yaml
group_sessions_per_user: true
```
If you intentionally want a shared room conversation, leave it off — just expect shared transcript history and shared interrupt behavior.
## Security
:::warning
Always set `DISCORD_ALLOWED_USERS` (or `DISCORD_ALLOWED_ROLES`) to restrict who can interact with the bot. Without either, the gateway denies all users by default as a safety measure. Only authorize people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
:::
### Role-Based Access Control
For servers where access is managed by roles instead of individual user lists (moderator teams, support staff, internal tooling), use `DISCORD_ALLOWED_ROLES` — a comma-separated list of role IDs. Any member with one of those roles is authorized.
```bash
# ~/.hermes/.env — works alongside or instead of DISCORD_ALLOWED_USERS
DISCORD_ALLOWED_ROLES=987654321098765432,876543210987654321
```
Semantics:
- **OR with user allowlist.** A user is authorized if their ID is in `DISCORD_ALLOWED_USERS` **or** they have any role in `DISCORD_ALLOWED_ROLES`.
- **Server Members Intent auto-enabled.** When `DISCORD_ALLOWED_ROLES` is set, the bot enables the Members intent on connect — required for Discord to send role information with member records.
- **Role IDs, not names.** Grab them from Discord: **User Settings → Advanced → Developer Mode ON**, then right-click any role → **Copy Role ID**.
- **DM fallback.** In DMs the role check scans mutual guilds; a user with an allowed role in any shared server is authorized in DMs too.
This is the preferred pattern when the moderation team churns — new moderators get access the moment the role is granted, with no `.env` edit or gateway restart.
### Mention Control
By default, Hermes blocks the bot from pinging `@everyone`, `@here`, and role mentions, even if its reply contains those tokens. This prevents a poorly-worded prompt or echoed user content from spamming a whole server. Individual `@user` pings and reply-reference pings (the little "replying to…" chip) stay enabled so normal conversation still works.
You can relax these defaults via either env vars or `config.yaml`:
```yaml
# ~/.hermes/config.yaml
discord:
allow_mentions:
everyone: false # allow the bot to ping @everyone / @here
roles: false # allow the bot to ping @role mentions
users: true # allow the bot to ping individual @users
replied_user: true # ping the author when replying to their message
```
```bash
# ~/.hermes/.env — env vars win over config.yaml
DISCORD_ALLOW_MENTION_EVERYONE=false
DISCORD_ALLOW_MENTION_ROLES=false
DISCORD_ALLOW_MENTION_USERS=true
DISCORD_ALLOW_MENTION_REPLIED_USER=true
```
:::tip
Leave `everyone` and `roles` at `false` unless you know exactly why you need them. It is very easy for an LLM to produce the string `@everyone` inside a normal-looking response; without this protection, that would notify every member of your server.
:::
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
---
# Slack
# Slack Setup
Connect Hermes Agent to Slack as a bot using Socket Mode. Socket Mode uses WebSockets instead of
public HTTP endpoints, so your Hermes instance doesn't need to be publicly accessible — it works
behind firewalls, on your laptop, or on a private server.
:::warning Classic Slack Apps Deprecated
Classic Slack apps (using RTM API) were **fully deprecated in March 2025**. Hermes uses the modern
Bolt SDK with Socket Mode. If you have an old classic app, you must create a new one following
the steps below.
:::
## Overview
| Component | Value |
|-----------|-------|
| **Library** | `slack-bolt` / `slack_sdk` for Python (Socket Mode) |
| **Connection** | WebSocket — no public URL required |
| **Auth tokens needed** | Bot Token (`xoxb-`) + App-Level Token (`xapp-`) |
| **User identification** | Slack Member IDs (e.g., `U01ABC2DEF3`) |
---
## Step 1: Create a Slack App
The fastest path is to paste a manifest Hermes generates for you. It
declares every built-in slash command (`/btw`, `/stop`, `/model`, …),
every required OAuth scope, every event subscription, and enables Socket
Mode — all at once.
### Option A: From a Hermes-generated manifest (recommended)
1. Generate the manifest:
```bash
hermes slack manifest --write
```
This writes `~/.hermes/slack-manifest.json` and prints paste-in
instructions.
2. Go to [https://api.slack.com/apps](https://api.slack.com/apps) →
**Create New App** → **From an app manifest**
3. Pick your workspace, paste the JSON contents, review, click **Next**
→ **Create**
4. Skip ahead to **Step 6: Install App to Workspace**. The manifest
handled scopes, events, and slash commands for you.
### Option B: From scratch (manual)
1. Go to [https://api.slack.com/apps](https://api.slack.com/apps)
2. Click **Create New App**
3. Choose **From scratch**
4. Enter an app name (e.g., "Hermes Agent") and select your workspace
5. Click **Create App**
You'll land on the app's **Basic Information** page. Continue with
Steps 2–6 below.
---
## Step 2: Configure Bot Token Scopes
Navigate to **Features → OAuth & Permissions** in the sidebar. Scroll to **Scopes → Bot Token Scopes** and add the following:
| Scope | Purpose |
|-------|---------|
| `chat:write` | Send messages as the bot |
| `app_mentions:read` | Detect when @mentioned in channels |
| `channels:history` | Read messages in public channels the bot is in |
| `channels:read` | List and get info about public channels |
| `groups:history` | Read messages in private channels the bot is invited to |
| `im:history` | Read direct message history |
| `im:read` | View basic DM info |
| `im:write` | Open and manage DMs |
| `users:read` | Look up user information |
| `files:read` | Read and download attached files, including voice notes/audio |
| `files:write` | Upload files (images, audio, documents) |
:::caution Missing scopes = missing features
Without `channels:history` and `groups:history`, the bot **will not receive messages in channels** —
it will only work in DMs. Without `files:read`, Hermes can chat but **cannot reliably read user-uploaded attachments**.
These are the most commonly missed scopes.
:::
**Optional scopes:**
| Scope | Purpose |
|-------|---------|
| `groups:read` | List and get info about private channels |
---
## Step 3: Enable Socket Mode
Socket Mode lets the bot connect via WebSocket instead of requiring a public URL.
1. In the sidebar, go to **Settings → Socket Mode**
2. Toggle **Enable Socket Mode** to ON
3. You'll be prompted to create an **App-Level Token**:
- Name it something like `hermes-socket` (the name doesn't matter)
- Add the **`connections:write`** scope
- Click **Generate**
4. **Copy the token** — it starts with `xapp-`. This is your `SLACK_APP_TOKEN`
:::tip
You can always find or regenerate app-level tokens under **Settings → Basic Information → App-Level Tokens**.
:::
---
## Step 4: Subscribe to Events
This step is critical — it controls what messages the bot can see.
1. In the sidebar, go to **Features → Event Subscriptions**
2. Toggle **Enable Events** to ON
3. Expand **Subscribe to bot events** and add:
| Event | Required? | Purpose |
|-------|-----------|---------|
| `message.im` | **Yes** | Bot receives direct messages |
| `message.channels` | **Yes** | Bot receives messages in **public** channels it's added to |
| `message.groups` | **Recommended** | Bot receives messages in **private** channels it's invited to |
| `app_mention` | **Yes** | Prevents Bolt SDK errors when bot is @mentioned |
4. Click **Save Changes** at the bottom of the page
:::danger Missing event subscriptions is the #1 setup issue
If the bot works in DMs but **not in channels**, you almost certainly forgot to add
`message.channels` (for public channels) and/or `message.groups` (for private channels).
Without these events, Slack simply never delivers channel messages to the bot.
:::
---
## Step 5: Enable the Messages Tab
This step enables direct messages to the bot. Without it, users see **"Sending messages to this app has been turned off"** when trying to DM the bot.
1. In the sidebar, go to **Features → App Home**
2. Scroll to **Show Tabs**
3. Toggle **Messages Tab** to ON
4. Check **"Allow users to send Slash commands and messages from the messages tab"**
:::danger Without this step, DMs are completely blocked
Even with all the correct scopes and event subscriptions, Slack will not allow users to send direct messages to the bot unless the Messages Tab is enabled. This is a Slack platform requirement, not a Hermes configuration issue.
:::
---
## Step 6: Install App to Workspace
1. In the sidebar, go to **Settings → Install App**
2. Click **Install to Workspace**
3. Review the permissions and click **Allow**
4. After authorization, you'll see a **Bot User OAuth Token** starting with `xoxb-`
5. **Copy this token** — this is your `SLACK_BOT_TOKEN`
:::tip
If you change scopes or event subscriptions later, you **must reinstall the app** for the changes
to take effect. The Install App page will show a banner prompting you to do so.
:::
---
## Step 7: Find User IDs for the Allowlist
Hermes uses Slack **Member IDs** (not usernames or display names) for the allowlist.
To find a Member ID:
1. In Slack, click on the user's name or avatar
2. Click **View full profile**
3. Click the **⋮** (more) button
4. Select **Copy member ID**
Member IDs look like `U01ABC2DEF3`. You need your own Member ID at minimum.
---
## Step 8: Configure Hermes
Add the following to your `~/.hermes/.env` file:
```bash
# Required
SLACK_BOT_TOKEN=xoxb-your-bot-token-here
SLACK_APP_TOKEN=xapp-your-app-token-here
SLACK_ALLOWED_USERS=U01ABC2DEF3 # Comma-separated Member IDs
# Optional
SLACK_HOME_CHANNEL=C01234567890 # Default channel for cron/scheduled messages
SLACK_HOME_CHANNEL_NAME=general # Human-readable name for the home channel (optional)
```
Or run the interactive setup:
```bash
hermes gateway setup # Select Slack when prompted
```
Then start the gateway:
```bash
hermes gateway # Foreground
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux only: boot-time system service
```
---
## Step 9: Invite the Bot to Channels
After starting the gateway, you need to **invite the bot** to any channel where you want it to respond:
```
/invite @Hermes Agent
```
The bot will **not** automatically join channels. You must invite it to each channel individually.
---
## Slash Commands
Every Hermes command (`/btw`, `/stop`, `/new`, `/model`, `/help`, ...)
is a native Slack slash command — exactly the way they work on Telegram
and Discord. Type `/` in Slack and the autocomplete picker lists every
Hermes command with its description.
Under the hood: Hermes ships with a generated Slack app manifest (see
Step 1, Option A) that declares every command in
[`COMMAND_REGISTRY`](https://github.com/NousResearch/hermes-agent/blob/main/hermes_cli/commands.py)
as a slash command. In Socket Mode, Slack routes the command event
through the WebSocket regardless of the manifest's `url` field.
### Refreshing slash commands after updates
When Hermes adds new commands (e.g. after `hermes update`), regenerate
the manifest and update your Slack app:
```bash
hermes slack manifest --write
```
Then in Slack:
1. Open [https://api.slack.com/apps](https://api.slack.com/apps) →
your Hermes app
2. **Features → App Manifest → Edit**
3. Paste the new contents of `~/.hermes/slack-manifest.json`
4. **Save**. Slack will prompt to reinstall the app if scopes or slash
commands changed.
### Legacy `/hermes ` still works
For backward compatibility with older manifests, you can still type
`/hermes btw run the tests` — Hermes routes it the same way as `/btw
run the tests`. Free-form questions also work: `/hermes what's the
weather?` is treated as a regular message.
### Advanced: emit only the slash-commands array
If you maintain your Slack manifest by hand and just want the slash
command list:
```bash
hermes slack manifest --slashes-only > /tmp/slashes.json
```
Paste that array into the `features.slash_commands` key of your
existing manifest.
---
## How the Bot Responds
Understanding how Hermes behaves in different contexts:
| Context | Behavior |
|---------|----------|
| **DMs** | Bot responds to every message — no @mention needed |
| **Channels** | Bot **only responds when @mentioned** (e.g., `@Hermes Agent what time is it?`). In channels, Hermes replies in a thread attached to that message. |
| **Threads** | If you @mention Hermes inside an existing thread, it replies in that same thread. Once the bot has an active session in a thread, **subsequent replies in that thread do not require @mention** — the bot follows the conversation naturally. |
:::tip
In channels, always @mention the bot to start a conversation. Once the bot is active in a thread, you can reply in that thread without mentioning it. Outside of threads, messages without @mention are ignored to prevent noise in busy channels.
:::
---
## Configuration Options
Beyond the required environment variables from Step 8, you can customize Slack bot behavior through `~/.hermes/config.yaml`.
### Thread & Reply Behavior
```yaml
platforms:
slack:
# Controls how multi-part responses are threaded
# "off" — never thread replies to the original message
# "first" — first chunk threads to user's message (default)
# "all" — all chunks thread to user's message
reply_to_mode: "first"
extra:
# Whether to reply in a thread (default: true).
# When false, channel messages get direct channel replies instead
# of threads. Messages inside existing threads still reply in-thread.
reply_in_thread: true
# Also post thread replies to the main channel
# (Slack's "Also send to channel" feature).
# Only the first chunk of the first reply is broadcast.
reply_broadcast: false
```
| Key | Default | Description |
|-----|---------|-------------|
| `platforms.slack.reply_to_mode` | `"first"` | Threading mode for multi-part messages: `"off"`, `"first"`, or `"all"` |
| `platforms.slack.extra.reply_in_thread` | `true` | When `false`, channel messages get direct replies instead of threads. Messages inside existing threads still reply in-thread. |
| `platforms.slack.extra.reply_broadcast` | `false` | When `true`, thread replies are also posted to the main channel. Only the first chunk is broadcast. |
### Session Isolation
```yaml
# Global setting — applies to Slack and all other platforms
group_sessions_per_user: true
```
When `true` (the default), each user in a shared channel gets their own isolated conversation session. Two people talking to Hermes in `#general` will have separate histories and contexts.
Set to `false` if you want a collaborative mode where the entire channel shares one conversation session. Be aware this means users share context growth and token costs, and one user's `/reset` clears the session for everyone.
### Mention & Trigger Behavior
```yaml
slack:
# Require @mention in channels (this is the default behavior;
# the Slack adapter enforces @mention gating in channels regardless,
# but you can set this explicitly for consistency with other platforms)
require_mention: true
# Prevent thread auto-engagement: only reply to channel messages that
# contain an explicit @mention. With this OFF (default), Slack can
# "auto-engage" — remembering past mentions in a thread and following
# up on bot-message replies, and resuming active sessions without a
# fresh mention. With strict_mention ON, every new channel message
# must @mention the bot before Hermes will respond.
strict_mention: false
# Custom mention patterns that trigger the bot
# (in addition to the default @mention detection)
mention_patterns:
- "hey hermes"
- "hermes,"
# Text prepended to every outgoing message
reply_prefix: ""
```
:::tip When to use `strict_mention`
Set this to `true` in busy workspaces where Slack's default "the bot remembers this thread" behavior surprises users — for example, a long tech-support thread where the bot helped at the start and you'd rather it stay silent unless explicitly pinged again. DMs and active interactive sessions are unaffected.
:::
:::info
Slack supports both patterns: `@mention` required to start a conversation by default, but you can opt specific channels out via `SLACK_FREE_RESPONSE_CHANNELS` (comma-separated channel IDs) or `slack.free_response_channels` in `config.yaml`. Once the bot has an active session in a thread, subsequent thread replies do not require a mention. In DMs the bot always responds without needing a mention.
:::
### Unauthorized User Handling
```yaml
slack:
# What happens when an unauthorized user (not in SLACK_ALLOWED_USERS) DMs the bot
# "pair" — prompt them for a pairing code (default)
# "ignore" — silently drop the message
unauthorized_dm_behavior: "pair"
```
You can also set this globally for all platforms:
```yaml
unauthorized_dm_behavior: "pair"
```
The platform-specific setting under `slack:` takes precedence over the global setting.
### Voice Transcription
```yaml
# Global setting — enable/disable automatic transcription of incoming voice messages
stt_enabled: true
```
When `true` (the default), incoming audio messages are automatically transcribed using the configured STT provider before being processed by the agent.
### Full Example
```yaml
# Global gateway settings
group_sessions_per_user: true
unauthorized_dm_behavior: "pair"
stt_enabled: true
# Slack-specific settings
slack:
require_mention: true
unauthorized_dm_behavior: "pair"
# Platform config
platforms:
slack:
reply_to_mode: "first"
extra:
reply_in_thread: true
reply_broadcast: false
```
---
## Home Channel
Set `SLACK_HOME_CHANNEL` to a channel ID where Hermes will deliver scheduled messages,
cron job results, and other proactive notifications. To find a channel ID:
1. Right-click the channel name in Slack
2. Click **View channel details**
3. Scroll to the bottom — the Channel ID is shown there
```bash
SLACK_HOME_CHANNEL=C01234567890
```
Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
---
## Multi-Workspace Support
Hermes can connect to **multiple Slack workspaces** simultaneously using a single gateway instance. Each workspace is authenticated independently with its own bot user ID.
### Configuration
Provide multiple bot tokens as a **comma-separated list** in `SLACK_BOT_TOKEN`:
```bash
# Multiple bot tokens — one per workspace
SLACK_BOT_TOKEN=xoxb-workspace1-token,xoxb-workspace2-token,xoxb-workspace3-token
# A single app-level token is still used for Socket Mode
SLACK_APP_TOKEN=xapp-your-app-token
```
Or in `~/.hermes/config.yaml`:
```yaml
platforms:
slack:
token: "xoxb-workspace1-token,xoxb-workspace2-token"
```
### OAuth Token File
In addition to tokens in the environment or config, Hermes also loads tokens from an **OAuth token file** at:
```
~/.hermes/slack_tokens.json
```
This file is a JSON object mapping team IDs to token entries:
```json
{
"T01ABC2DEF3": {
"token": "xoxb-workspace-token-here",
"team_name": "My Workspace"
}
}
```
Tokens from this file are merged with any tokens specified via `SLACK_BOT_TOKEN`. Duplicate tokens are automatically deduplicated.
### How it works
- The **first token** in the list is the primary token, used for the Socket Mode connection (AsyncApp).
- Each token is authenticated via `auth.test` on startup. The gateway maps each `team_id` to its own `WebClient` and `bot_user_id`.
- When a message arrives, Hermes uses the correct workspace-specific client to respond.
- The primary `bot_user_id` (from the first token) is used for backward compatibility with features that expect a single bot identity.
---
## Voice Messages
Hermes supports voice on Slack:
- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
- **Outgoing:** TTS responses are sent as audio file attachments
---
## Per-Channel Prompts
Assign ephemeral system prompts to specific Slack channels. The prompt is injected at runtime on every turn — never persisted to transcript history — so changes take effect immediately.
```yaml
slack:
channel_prompts:
"C01RESEARCH": |
You are a research assistant. Focus on academic sources,
citations, and concise synthesis.
"C02ENGINEERING": |
Code review mode. Be precise about edge cases and
performance implications.
```
Keys are Slack channel IDs (find them via channel details → "About" → scroll to bottom). All messages in the matching channel get the prompt injected as an ephemeral system instruction.
## Per-Channel Skill Bindings
Auto-load a skill whenever a new session starts in a specific channel or DM. Unlike per-channel prompts (which are injected on every turn), skill bindings inject the skill content as a user message at **session start** — it becomes part of the conversation history and does not need to be reloaded on subsequent turns.
This is ideal for DMs or channels with a dedicated purpose (flashcards, a domain-specific Q&A bot, a support triage channel, etc.) where you don't want the model's own skill selector to decide whether to load on every short reply.
```yaml
slack:
channel_skill_bindings:
# DM channel — always runs in "german-flashcards" mode
- id: "D0ATH9TQ0G6"
skills:
- german-flashcards
# Research channel — preload multiple skills in order
- id: "C01RESEARCH"
skills:
- arxiv
- writing-plans
# Short form: single skill as a string
- id: "C02SUPPORT"
skill: hubspot-on-demand
```
Notes:
- The binding matches by channel ID. For threaded messages in a bound channel, the thread inherits the parent channel's binding.
- The skill is loaded only at session start (new session or after auto-reset). If you change the binding, run `/new` or wait for the session to auto-reset for it to take effect.
- Combine with `channel_prompts` for per-channel tone/constraints on top of the skill's instructions.
## Troubleshooting
| Problem | Solution |
|---------|----------|
| Bot doesn't respond to DMs | Verify `message.im` is in your event subscriptions and the app is reinstalled |
| Bot works in DMs but not in channels | **Most common issue.** Add `message.channels` and `message.groups` to event subscriptions, reinstall the app, and invite the bot to the channel with `/invite @Hermes Agent` |
| Bot doesn't respond to @mentions in channels | 1) Check `message.channels` event is subscribed. 2) Bot must be invited to the channel. 3) Ensure `channels:history` scope is added. 4) Reinstall the app after scope/event changes |
| Bot ignores messages in private channels | Add both the `message.groups` event subscription and `groups:history` scope, then reinstall the app and `/invite` the bot |
| "Sending messages to this app has been turned off" in DMs | Enable the **Messages Tab** in App Home settings (see Step 5) |
| "not_authed" or "invalid_auth" errors | Regenerate your Bot Token and App Token, update `.env` |
| Bot responds but can't post in a channel | Invite the bot to the channel with `/invite @Hermes Agent` |
| Bot can chat but can't read uploaded images/files | Add `files:read`, then **reinstall** the app. Hermes now surfaces attachment access diagnostics in-chat when Slack returns scope/auth/permission failures. |
| `missing_scope` error | Add the required scope in OAuth & Permissions, then **reinstall** the app |
| Socket disconnects frequently | Check your network; Bolt auto-reconnects but unstable connections cause lag |
| Changed scopes/events but nothing changed | You **must reinstall** the app to your workspace after any scope or event subscription change |
### Quick Checklist
If the bot isn't working in channels, verify **all** of the following:
1. ✅ `message.channels` event is subscribed (for public channels)
2. ✅ `message.groups` event is subscribed (for private channels)
3. ✅ `app_mention` event is subscribed
4. ✅ `channels:history` scope is added (for public channels)
5. ✅ `groups:history` scope is added (for private channels)
6. ✅ App was **reinstalled** after adding scopes/events
7. ✅ Bot was **invited** to the channel (`/invite @Hermes Agent`)
8. ✅ You are **@mentioning** the bot in your message
---
## Security
:::warning
**Always set `SLACK_ALLOWED_USERS`** with the Member IDs of authorized users. Without this setting,
the gateway will **deny all messages** by default as a safety measure. Never share your bot tokens —
treat them like passwords.
:::
- Tokens should be stored in `~/.hermes/.env` (file permissions `600`)
- Rotate tokens periodically via the Slack app settings
- Audit who has access to your Hermes config directory
- Socket Mode means no public endpoint is exposed — one less attack surface
---
# WhatsApp
# WhatsApp Setup
Hermes connects to WhatsApp through a built-in bridge based on **Baileys**. This works by emulating a WhatsApp Web session — **not** through the official WhatsApp Business API. No Meta developer account or Business verification is required.
:::warning Unofficial API — Ban Risk
WhatsApp does **not** officially support third-party bots outside the Business API. Using a third-party bridge carries a small risk of account restrictions. To minimize risk:
- **Use a dedicated phone number** for the bot (not your personal number)
- **Don't send bulk/spam messages** — keep usage conversational
- **Don't automate outbound messaging** to people who haven't messaged first
:::
:::warning WhatsApp Web Protocol Updates
WhatsApp periodically updates their Web protocol, which can temporarily break compatibility
with third-party bridges. When this happens, Hermes will update the bridge dependency. If the
bot stops working after a WhatsApp update, pull the latest Hermes version and re-pair.
:::
## Two Modes
| Mode | How it works | Best for |
|------|-------------|----------|
| **Separate bot number** (recommended) | Dedicate a phone number to the bot. People message that number directly. | Clean UX, multiple users, lower ban risk |
| **Personal self-chat** | Use your own WhatsApp. You message yourself to talk to the agent. | Quick setup, single user, testing |
---
## Prerequisites
- **Node.js v18+** and **npm** — the WhatsApp bridge runs as a Node.js process
- **A phone with WhatsApp** installed (for scanning the QR code)
Unlike older browser-driven bridges, the current Baileys-based bridge does **not** require a local Chromium or Puppeteer dependency stack.
---
## Step 1: Run the Setup Wizard
```bash
hermes whatsapp
```
The wizard will:
1. Ask which mode you want (**bot** or **self-chat**)
2. Install bridge dependencies if needed
3. Display a **QR code** in your terminal
4. Wait for you to scan it
**To scan the QR code:**
1. Open WhatsApp on your phone
2. Go to **Settings → Linked Devices**
3. Tap **Link a Device**
4. Point your camera at the terminal QR code
Once paired, the wizard confirms the connection and exits. Your session is saved automatically.
:::tip
If the QR code looks garbled, make sure your terminal is at least 60 columns wide and supports
Unicode. You can also try a different terminal emulator.
:::
---
## Step 2: Getting a Second Phone Number (Bot Mode)
For bot mode, you need a phone number that isn't already registered with WhatsApp. Three options:
| Option | Cost | Notes |
|--------|------|-------|
| **Google Voice** | Free | US only. Get a number at [voice.google.com](https://voice.google.com). Verify WhatsApp via SMS through the Google Voice app. |
| **Prepaid SIM** | $5–15 one-time | Any carrier. Activate, verify WhatsApp, then the SIM can sit in a drawer. Number must stay active (make a call every 90 days). |
| **VoIP services** | Free–$5/month | TextNow, TextFree, or similar. Some VoIP numbers are blocked by WhatsApp — try a few if the first doesn't work. |
After getting the number:
1. Install WhatsApp on a phone (or use WhatsApp Business app with dual-SIM)
2. Register the new number with WhatsApp
3. Run `hermes whatsapp` and scan the QR code from that WhatsApp account
---
## Step 3: Configure Hermes
Add the following to your `~/.hermes/.env` file:
```bash
# Required
WHATSAPP_ENABLED=true
WHATSAPP_MODE=bot # "bot" or "self-chat"
# Access control — pick ONE of these options:
WHATSAPP_ALLOWED_USERS=15551234567 # Comma-separated phone numbers (with country code, no +)
# WHATSAPP_ALLOWED_USERS=* # OR use * to allow everyone
# WHATSAPP_ALLOW_ALL_USERS=true # OR set this flag instead (same effect as *)
```
:::tip Allow-all shorthand
Setting `WHATSAPP_ALLOWED_USERS=*` allows **all** senders (equivalent to `WHATSAPP_ALLOW_ALL_USERS=true`).
This is consistent with [Signal group allowlists](/docs/reference/environment-variables).
To use the pairing flow instead, remove both variables and rely on the
[DM pairing system](/docs/user-guide/security#dm-pairing-system).
:::
Optional behavior settings in `~/.hermes/config.yaml`:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `unauthorized_dm_behavior: pair` is the global default. Unknown DM senders get a pairing code.
- `whatsapp.unauthorized_dm_behavior: ignore` makes WhatsApp stay silent for unauthorized DMs, which is usually the better choice for a private number.
Then start the gateway:
```bash
hermes gateway # Foreground
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux only: boot-time system service
```
The gateway starts the WhatsApp bridge automatically using the saved session.
---
## Session Persistence
The Baileys bridge saves its session under `~/.hermes/platforms/whatsapp/session`. This means:
- **Sessions survive restarts** — you don't need to re-scan the QR code every time
- The session data includes encryption keys and device credentials
- **Do not share or commit this session directory** — it grants full access to the WhatsApp account
---
## Re-pairing
If the session breaks (phone reset, WhatsApp update, manually unlinked), you'll see connection
errors in the gateway logs. To fix it:
```bash
hermes whatsapp
```
This generates a fresh QR code. Scan it again and the session is re-established. The gateway
handles **temporary** disconnections (network blips, phone going offline briefly) automatically
with reconnection logic.
---
## Voice Messages
Hermes supports voice on WhatsApp:
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
- **Outgoing:** TTS responses are sent as MP3 audio file attachments
- Agent responses are prefixed with "⚕ **Hermes Agent**" by default. You can customize or disable this in `config.yaml`:
```yaml
# ~/.hermes/config.yaml
whatsapp:
reply_prefix: "" # Empty string disables the header
# reply_prefix: "🤖 *My Bot*\n──────\n" # Custom prefix (supports \n for newlines)
```
---
## Message Formatting & Delivery
WhatsApp supports **streaming (progressive) responses** — the bot edits its message in real-time as the AI generates text, just like Discord and Telegram. Internally, WhatsApp is classified as a TIER_MEDIUM platform for delivery capabilities.
### Chunking
Long responses are automatically split into multiple messages at **4,096 characters** per chunk (WhatsApp's practical display limit). You don't need to configure anything — the gateway handles splitting and sends chunks sequentially.
### WhatsApp-Compatible Markdown
Standard Markdown in AI responses is automatically converted to WhatsApp's native formatting:
| Markdown | WhatsApp | Renders as |
|----------|----------|------------|
| `**bold**` | `*bold*` | **bold** |
| `~~strikethrough~~` | `~strikethrough~` | ~~strikethrough~~ |
| `# Heading` | `*Heading*` | Bold text (no native headings) |
| `[link text](url)` | `link text (url)` | Inline URL |
Code blocks and inline code are preserved as-is since WhatsApp supports triple-backtick formatting natively.
### Tool Progress
When the agent calls tools (web search, file operations, etc.), WhatsApp displays real-time progress indicators showing which tool is running. This is enabled by default — no configuration needed.
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| **QR code not scanning** | Ensure terminal is wide enough (60+ columns). Try a different terminal. Make sure you're scanning from the correct WhatsApp account (bot number, not personal). |
| **QR code expires** | QR codes refresh every ~20 seconds. If it times out, restart `hermes whatsapp`. |
| **Session not persisting** | Check that `~/.hermes/platforms/whatsapp/session` exists and is writable. If containerized, mount it as a persistent volume. |
| **Logged out unexpectedly** | WhatsApp unlinks devices after long inactivity. Keep the phone on and connected to the network, then re-pair with `hermes whatsapp` if needed. |
| **Bridge crashes or reconnect loops** | Restart the gateway, update Hermes, and re-pair if the session was invalidated by a WhatsApp protocol change. |
| **Bot stops working after WhatsApp update** | Update Hermes to get the latest bridge version, then re-pair. |
| **macOS: "Node.js not installed" but node works in terminal** | launchd services don't inherit your shell PATH. Run `hermes gateway install` to re-snapshot your current PATH into the plist, then `hermes gateway start`. See the [Gateway Service docs](./index.md#macos-launchd) for details. |
| **Messages not being received** | Verify `WHATSAPP_ALLOWED_USERS` includes the sender's number (with country code, no `+` or spaces), or set it to `*` to allow everyone. Set `WHATSAPP_DEBUG=true` in `.env` and restart the gateway to see raw message events in `bridge.log`. |
| **Bot replies to strangers with a pairing code** | Set `whatsapp.unauthorized_dm_behavior: ignore` in `~/.hermes/config.yaml` if you want unauthorized DMs to be silently ignored instead. |
---
## Security
:::warning
**Configure access control** before going live. Set `WHATSAPP_ALLOWED_USERS` with specific
phone numbers (including country code, without the `+`), use `*` to allow everyone, or set
`WHATSAPP_ALLOW_ALL_USERS=true`. Without any of these, the gateway **denies all incoming
messages** as a safety measure.
:::
By default, unauthorized DMs still receive a pairing code reply. If you want a private WhatsApp number to stay completely silent to strangers, set:
```yaml
whatsapp:
unauthorized_dm_behavior: ignore
```
- The `~/.hermes/platforms/whatsapp/session` directory contains full session credentials — protect it like a password
- Set file permissions: `chmod 700 ~/.hermes/platforms/whatsapp/session`
- Use a **dedicated phone number** for the bot to isolate risk from your personal account
- If you suspect compromise, unlink the device from WhatsApp → Settings → Linked Devices
- Phone numbers in logs are partially redacted, but review your log retention policy
---
# Signal
# Signal Setup
Hermes connects to Signal through the [signal-cli](https://github.com/AsamK/signal-cli) daemon running in HTTP mode. The adapter streams messages in real-time via SSE (Server-Sent Events) and sends responses via JSON-RPC.
Signal is the most privacy-focused mainstream messenger — end-to-end encrypted by default, open-source protocol, minimal metadata collection. This makes it ideal for security-sensitive agent workflows.
:::info No New Python Dependencies
The Signal adapter uses `httpx` (already a core Hermes dependency) for all communication. No additional Python packages are required. You just need signal-cli installed externally.
:::
---
## Prerequisites
- **signal-cli** — Java-based Signal client ([GitHub](https://github.com/AsamK/signal-cli))
- **Java 17+** runtime — required by signal-cli
- **A phone number** with Signal installed (for linking as a secondary device)
### Installing signal-cli
```bash
# macOS
brew install signal-cli
# Linux (download latest release)
VERSION=$(curl -Ls -o /dev/null -w %{url_effective} \
https://github.com/AsamK/signal-cli/releases/latest | sed 's/^.*\/v//')
curl -L -O "https://github.com/AsamK/signal-cli/releases/download/v${VERSION}/signal-cli-${VERSION}.tar.gz"
sudo tar xf "signal-cli-${VERSION}.tar.gz" -C /opt
sudo ln -sf "/opt/signal-cli-${VERSION}/bin/signal-cli" /usr/local/bin/
```
:::caution
signal-cli is **not** in apt or snap repositories. The Linux install above downloads directly from [GitHub releases](https://github.com/AsamK/signal-cli/releases).
:::
---
## Step 1: Link Your Signal Account
Signal-cli works as a **linked device** — like WhatsApp Web, but for Signal. Your phone stays the primary device.
```bash
# Generate a linking URI (displays a QR code or link)
signal-cli link -n "HermesAgent"
```
1. Open **Signal** on your phone
2. Go to **Settings → Linked Devices**
3. Tap **Link New Device**
4. Scan the QR code or enter the URI
---
## Step 2: Start the signal-cli Daemon
```bash
# Replace +1234567890 with your Signal phone number (E.164 format)
signal-cli --account +1234567890 daemon --http 127.0.0.1:8080
```
:::tip
Keep this running in the background. You can use `systemd`, `tmux`, `screen`, or run it as a service.
:::
Verify it's running:
```bash
curl http://127.0.0.1:8080/api/v1/check
# Should return: {"versions":{"signal-cli":...}}
```
---
## Step 3: Configure Hermes
The easiest way:
```bash
hermes gateway setup
```
Select **Signal** from the platform menu. The wizard will:
1. Check if signal-cli is installed
2. Prompt for the HTTP URL (default: `http://127.0.0.1:8080`)
3. Test connectivity to the daemon
4. Ask for your account phone number
5. Configure allowed users and access policies
### Manual Configuration
Add to `~/.hermes/.env`:
```bash
# Required
SIGNAL_HTTP_URL=http://127.0.0.1:8080
SIGNAL_ACCOUNT=+1234567890
# Security (recommended)
SIGNAL_ALLOWED_USERS=+1234567890,+0987654321 # Comma-separated E.164 numbers or UUIDs
# Optional
SIGNAL_GROUP_ALLOWED_USERS=groupId1,groupId2 # Enable groups (omit to disable, * for all)
SIGNAL_HOME_CHANNEL=+1234567890 # Default delivery target for cron jobs
```
Then start the gateway:
```bash
hermes gateway # Foreground
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux only: boot-time system service
```
---
## Access Control
### DM Access
DM access follows the same pattern as all other Hermes platforms:
1. **`SIGNAL_ALLOWED_USERS` set** → only those users can message
2. **No allowlist set** → unknown users get a DM pairing code (approve via `hermes pairing approve signal CODE`)
3. **`SIGNAL_ALLOW_ALL_USERS=true`** → anyone can message (use with caution)
### Group Access
Group access is controlled by the `SIGNAL_GROUP_ALLOWED_USERS` env var:
| Configuration | Behavior |
|---------------|----------|
| Not set (default) | All group messages are ignored. The bot only responds to DMs. |
| Set with group IDs | Only listed groups are monitored (e.g., `groupId1,groupId2`). |
| Set to `*` | The bot responds in any group it's a member of. |
---
## Features
### Attachments
The adapter supports sending and receiving media in both directions.
**Incoming** (user → agent):
- **Images** — PNG, JPEG, GIF, WebP (auto-detected via magic bytes)
- **Audio** — MP3, OGG, WAV, M4A (voice messages transcribed if Whisper is configured)
- **Documents** — PDF, ZIP, and other file types
**Outgoing** (agent → user):
The agent can send media files via `MEDIA:` tags in responses. The following delivery methods are supported:
- **Images** — `send_multiple_images` and `send_image_file` send PNG, JPEG, GIF, WebP as native Signal attachments
- **Voice** — `send_voice` sends audio files (OGG, MP3, WAV, M4A, AAC) as attachments
- **Video** — `send_video` sends MP4 video files
- **Documents** — `send_document` sends any file type (PDF, ZIP, etc.)
All outgoing media goes through Signal's standard attachment API. Unlike some platforms, Signal does not distinguish between voice messages and file attachments at the protocol level.
Attachment size limit: **100 MB** (both directions).
:::warning
**Signal servers will rate-limit attachment uploads**, the adapter uses a scheduler for multiple image sending that batches images in groups of 32 and throttles uploads to match the Signal server policy.
:::
### Native Formatting, Reply Quotes, and Reactions
Signal messages render with **native formatting** instead of literal markdown characters. The adapter converts markdown (`**bold**`, `*italic*`, `` `code` ``, `~~strike~~`, `||spoiler||`, headings) into Signal `bodyRanges` so the text shows up with real styling on the recipient's client rather than as visible `**` / `` ` `` characters.
**Reply quotes.** When Hermes replies to a specific message, it now posts a native reply that quotes the original — same UI affordance Signal users see when they use "Reply" themselves. This is automatic for replies generated in response to an inbound message.
**Reactions.** The agent can react to messages via the standard reaction API; reactions surface in Signal as emoji reactions on the referenced message rather than as extra text.
None of this requires additional config — it ships on by default in recent signal-cli builds. If your `signal-cli` version is too old, Hermes falls back to plaintext delivery and logs a one-time warning.
### Typing Indicators
The bot sends typing indicators while processing messages, refreshing every 8 seconds.
### Phone Number Redaction
All phone numbers are automatically redacted in logs:
- `+15551234567` → `+155****4567`
- This applies to both Hermes gateway logs and the global redaction system
### Note to Self (Single-Number Setup)
If you run signal-cli as a **linked secondary device** on your own phone number (rather than a separate bot number), you can interact with Hermes through Signal's "Note to Self" feature.
Just send a message to yourself from your phone — signal-cli picks it up and Hermes responds in the same conversation.
**How it works:**
- "Note to Self" messages arrive as `syncMessage.sentMessage` envelopes
- The adapter detects when these are addressed to the bot's own account and processes them as regular inbound messages
- Echo-back protection (sent-timestamp tracking) prevents infinite loops — the bot's own replies are filtered out automatically
**No extra configuration needed.** This works automatically as long as `SIGNAL_ACCOUNT` matches your phone number.
### Health Monitoring
The adapter monitors the SSE connection and automatically reconnects if:
- The connection drops (with exponential backoff: 2s → 60s)
- No activity is detected for 120 seconds (pings signal-cli to verify)
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| **"Cannot reach signal-cli"** during setup | Ensure signal-cli daemon is running: `signal-cli --account +YOUR_NUMBER daemon --http 127.0.0.1:8080` |
| **Messages not received** | Check that `SIGNAL_ALLOWED_USERS` includes the sender's number in E.164 format (with `+` prefix) |
| **"signal-cli not found on PATH"** | Install signal-cli and ensure it's in your PATH, or use Docker |
| **Connection keeps dropping** | Check signal-cli logs for errors. Ensure Java 17+ is installed. |
| **Group messages ignored** | Configure `SIGNAL_GROUP_ALLOWED_USERS` with specific group IDs, or `*` to allow all groups. |
| **Bot responds to no one** | Configure `SIGNAL_ALLOWED_USERS`, use DM pairing, or explicitly allow all users through gateway policy if you want broader access. |
| **Duplicate messages** | Ensure only one signal-cli instance is listening on your phone number |
---
## Security
:::warning
**Always configure access controls.** The bot has terminal access by default. Without `SIGNAL_ALLOWED_USERS` or DM pairing, the gateway denies all incoming messages as a safety measure.
:::
- Phone numbers are redacted in all log output
- Use DM pairing or explicit allowlists for safe onboarding of new users
- Keep groups disabled unless you specifically need group support, or allowlist only the groups you trust
- Signal's end-to-end encryption protects message content in transit
- The signal-cli session data in `~/.local/share/signal-cli/` contains account credentials — protect it like a password
---
## Environment Variables Reference
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `SIGNAL_HTTP_URL` | Yes | — | signal-cli HTTP endpoint |
| `SIGNAL_ACCOUNT` | Yes | — | Bot phone number (E.164) |
| `SIGNAL_ALLOWED_USERS` | No | — | Comma-separated phone numbers/UUIDs |
| `SIGNAL_GROUP_ALLOWED_USERS` | No | — | Group IDs to monitor, or `*` for all (omit to disable groups) |
| `SIGNAL_ALLOW_ALL_USERS` | No | `false` | Allow any user to interact (skip allowlist) |
| `SIGNAL_HOME_CHANNEL` | No | — | Default delivery target for cron jobs |
---
# Email
# Email Setup
Hermes can receive and reply to emails using standard IMAP and SMTP protocols. Send an email to the agent's address and it replies in-thread — no special client or bot API needed. Works with Gmail, Outlook, Yahoo, Fastmail, or any provider that supports IMAP/SMTP.
:::info No External Dependencies
The Email adapter uses Python's built-in `imaplib`, `smtplib`, and `email` modules. No additional packages or external services are required.
:::
---
## Prerequisites
- **A dedicated email account** for your Hermes agent (don't use your personal email)
- **IMAP enabled** on the email account
- **An app password** if using Gmail or another provider with 2FA
### Gmail Setup
1. Enable 2-Factor Authentication on your Google Account
2. Go to [App Passwords](https://myaccount.google.com/apppasswords)
3. Create a new App Password (select "Mail" or "Other")
4. Copy the 16-character password — you'll use this instead of your regular password
### Outlook / Microsoft 365
1. Go to [Security Settings](https://account.microsoft.com/security)
2. Enable 2FA if not already active
3. Create an App Password under "Additional security options"
4. IMAP host: `outlook.office365.com`, SMTP host: `smtp.office365.com`
### Other Providers
Most email providers support IMAP/SMTP. Check your provider's documentation for:
- IMAP host and port (usually port 993 with SSL)
- SMTP host and port (usually port 587 with STARTTLS)
- Whether app passwords are required
---
## Step 1: Configure Hermes
The easiest way:
```bash
hermes gateway setup
```
Select **Email** from the platform menu. The wizard prompts for your email address, password, IMAP/SMTP hosts, and allowed senders.
### Manual Configuration
Add to `~/.hermes/.env`:
```bash
# Required
EMAIL_ADDRESS=hermes@gmail.com
EMAIL_PASSWORD=abcd efgh ijkl mnop # App password (not your regular password)
EMAIL_IMAP_HOST=imap.gmail.com
EMAIL_SMTP_HOST=smtp.gmail.com
# Security (recommended)
EMAIL_ALLOWED_USERS=your@email.com,colleague@work.com
# Optional
EMAIL_IMAP_PORT=993 # Default: 993 (IMAP SSL)
EMAIL_SMTP_PORT=587 # Default: 587 (SMTP STARTTLS)
EMAIL_POLL_INTERVAL=15 # Seconds between inbox checks (default: 15)
EMAIL_HOME_ADDRESS=your@email.com # Default delivery target for cron jobs
```
---
## Step 2: Start the Gateway
```bash
hermes gateway # Run in foreground
hermes gateway install # Install as a user service
sudo hermes gateway install --system # Linux only: boot-time system service
```
On startup, the adapter:
1. Tests IMAP and SMTP connections
2. Marks all existing inbox messages as "seen" (only processes new emails)
3. Starts polling for new messages
---
## How It Works
### Receiving Messages
The adapter polls the IMAP inbox for UNSEEN messages at a configurable interval (default: 15 seconds). For each new email:
- **Subject line** is included as context (e.g., `[Subject: Deploy to production]`)
- **Reply emails** (subject starting with `Re:`) skip the subject prefix — the thread context is already established
- **Attachments** are cached locally:
- Images (JPEG, PNG, GIF, WebP) → available to the vision tool
- Documents (PDF, ZIP, etc.) → available for file access
- **HTML-only emails** have tags stripped for plain text extraction
- **Self-messages** are filtered out to prevent reply loops
- **Automated/noreply senders** are silently ignored — `noreply@`, `mailer-daemon@`, `bounce@`, `no-reply@`, and emails with `Auto-Submitted`, `Precedence: bulk`, or `List-Unsubscribe` headers
### Sending Replies
Replies are sent via SMTP with proper email threading:
- **In-Reply-To** and **References** headers maintain the thread
- **Subject line** preserved with `Re:` prefix (no double `Re: Re:`)
- **Message-ID** generated with the agent's domain
- Responses are sent as plain text (UTF-8)
### File Attachments
The agent can send file attachments in replies. Include `MEDIA:/path/to/file` in the response and the file is attached to the outgoing email.
### Skipping Attachments
To ignore all incoming attachments (for malware protection or bandwidth savings), add to your `config.yaml`:
```yaml
platforms:
email:
skip_attachments: true
```
When enabled, attachment and inline parts are skipped before payload decoding. The email body text is still processed normally.
---
## Access Control
Email access follows the same pattern as all other Hermes platforms:
1. **`EMAIL_ALLOWED_USERS` set** → only emails from those addresses are processed
2. **No allowlist set** → unknown senders get a pairing code
3. **`EMAIL_ALLOW_ALL_USERS=true`** → any sender is accepted (use with caution)
:::warning
**Always configure `EMAIL_ALLOWED_USERS`.** Without it, anyone who knows the agent's email address could send commands. The agent has terminal access by default.
:::
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| **"IMAP connection failed"** at startup | Verify `EMAIL_IMAP_HOST` and `EMAIL_IMAP_PORT`. Ensure IMAP is enabled on the account. For Gmail, enable it in Settings → Forwarding and POP/IMAP. |
| **"SMTP connection failed"** at startup | Verify `EMAIL_SMTP_HOST` and `EMAIL_SMTP_PORT`. Check that your password is correct (use App Password for Gmail). |
| **Messages not received** | Check `EMAIL_ALLOWED_USERS` includes the sender's email. Check spam folder — some providers flag automated replies. |
| **"Authentication failed"** | For Gmail, you must use an App Password, not your regular password. Ensure 2FA is enabled first. |
| **Duplicate replies** | Ensure only one gateway instance is running. Check `hermes gateway status`. |
| **Slow response** | The default poll interval is 15 seconds. Reduce with `EMAIL_POLL_INTERVAL=5` for faster response (but more IMAP connections). |
| **Replies not threading** | The adapter uses In-Reply-To headers. Some email clients (especially web-based) may not thread correctly with automated messages. |
---
## Security
:::warning
**Use a dedicated email account.** Don't use your personal email — the agent stores the password in `.env` and has full inbox access via IMAP.
:::
- Use **App Passwords** instead of your main password (required for Gmail with 2FA)
- Set `EMAIL_ALLOWED_USERS` to restrict who can interact with the agent
- The password is stored in `~/.hermes/.env` — protect this file (`chmod 600`)
- IMAP uses SSL (port 993) and SMTP uses STARTTLS (port 587) by default — connections are encrypted
---
## Environment Variables Reference
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `EMAIL_ADDRESS` | Yes | — | Agent's email address |
| `EMAIL_PASSWORD` | Yes | — | Email password or app password |
| `EMAIL_IMAP_HOST` | Yes | — | IMAP server host (e.g., `imap.gmail.com`) |
| `EMAIL_SMTP_HOST` | Yes | — | SMTP server host (e.g., `smtp.gmail.com`) |
| `EMAIL_IMAP_PORT` | No | `993` | IMAP server port |
| `EMAIL_SMTP_PORT` | No | `587` | SMTP server port |
| `EMAIL_POLL_INTERVAL` | No | `15` | Seconds between inbox checks |
| `EMAIL_ALLOWED_USERS` | No | — | Comma-separated allowed sender addresses |
| `EMAIL_HOME_ADDRESS` | No | — | Default delivery target for cron jobs |
| `EMAIL_ALLOW_ALL_USERS` | No | `false` | Allow all senders (not recommended) |
---
# SMS (Twilio)
# SMS Setup (Twilio)
Hermes connects to SMS through the [Twilio](https://www.twilio.com/) API. People text your Twilio phone number and get AI responses back — same conversational experience as Telegram or Discord, but over standard text messages.
:::info Shared Credentials
The SMS gateway shares credentials with the optional [telephony skill](/docs/reference/skills-catalog). If you've already set up Twilio for voice calls or one-off SMS, the gateway works with the same `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, and `TWILIO_PHONE_NUMBER`.
:::
---
## Prerequisites
- **Twilio account** — [Sign up at twilio.com](https://www.twilio.com/try-twilio) (free trial available)
- **A Twilio phone number** with SMS capability
- **A publicly accessible server** — Twilio sends webhooks to your server when SMS arrives
- **aiohttp** — `pip install 'hermes-agent[sms]'`
---
## Step 1: Get Your Twilio Credentials
1. Go to the [Twilio Console](https://console.twilio.com/)
2. Copy your **Account SID** and **Auth Token** from the dashboard
3. Go to **Phone Numbers → Manage → Active Numbers** — note your phone number in E.164 format (e.g., `+15551234567`)
---
## Step 2: Configure Hermes
### Interactive setup (recommended)
```bash
hermes gateway setup
```
Select **SMS (Twilio)** from the platform list. The wizard will prompt for your credentials.
### Manual setup
Add to `~/.hermes/.env`:
```bash
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token_here
TWILIO_PHONE_NUMBER=+15551234567
# Security: restrict to specific phone numbers (recommended)
SMS_ALLOWED_USERS=+15559876543,+15551112222
# Optional: set a home channel for cron job delivery
SMS_HOME_CHANNEL=+15559876543
```
---
## Step 3: Configure Twilio Webhook
Twilio needs to know where to send incoming messages. In the [Twilio Console](https://console.twilio.com/):
1. Go to **Phone Numbers → Manage → Active Numbers**
2. Click your phone number
3. Under **Messaging → A MESSAGE COMES IN**, set:
- **Webhook**: `https://your-server:8080/webhooks/twilio`
- **HTTP Method**: `POST`
:::tip Exposing Your Webhook
If you're running Hermes locally, use a tunnel to expose the webhook:
```bash
# Using cloudflared
cloudflared tunnel --url http://localhost:8080
# Using ngrok
ngrok http 8080
```
Set the resulting public URL as your Twilio webhook.
:::
**Set `SMS_WEBHOOK_URL` to the same URL you configured in Twilio.** This is required for Twilio signature validation — the adapter will refuse to start without it:
```bash
# Must match the webhook URL in your Twilio Console
SMS_WEBHOOK_URL=https://your-server:8080/webhooks/twilio
```
The webhook port defaults to `8080`. Override with:
```bash
SMS_WEBHOOK_PORT=3000
```
---
## Step 4: Start the Gateway
```bash
hermes gateway
```
You should see:
```
[sms] Twilio webhook server listening on 0.0.0.0:8080, from: +1555***4567
```
If you see `Refusing to start: SMS_WEBHOOK_URL is required`, set `SMS_WEBHOOK_URL` to the public URL configured in your Twilio Console (see Step 3).
Text your Twilio number — Hermes will respond via SMS.
---
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `TWILIO_ACCOUNT_SID` | Yes | Twilio Account SID (starts with `AC`) |
| `TWILIO_AUTH_TOKEN` | Yes | Twilio Auth Token (also used for webhook signature validation) |
| `TWILIO_PHONE_NUMBER` | Yes | Your Twilio phone number (E.164 format) |
| `SMS_WEBHOOK_URL` | Yes | Public URL for Twilio signature validation — must match the webhook URL in your Twilio Console |
| `SMS_WEBHOOK_PORT` | No | Webhook listener port (default: `8080`) |
| `SMS_WEBHOOK_HOST` | No | Webhook bind address (default: `0.0.0.0`) |
| `SMS_INSECURE_NO_SIGNATURE` | No | Set to `true` to disable signature validation (local dev only — **not for production**) |
| `SMS_ALLOWED_USERS` | No | Comma-separated E.164 phone numbers allowed to chat |
| `SMS_ALLOW_ALL_USERS` | No | Set to `true` to allow anyone (not recommended) |
| `SMS_HOME_CHANNEL` | No | Phone number for cron job / notification delivery |
| `SMS_HOME_CHANNEL_NAME` | No | Display name for the home channel (default: `Home`) |
---
## SMS-Specific Behavior
- **Plain text only** — Markdown is automatically stripped since SMS renders it as literal characters
- **1600 character limit** — Longer responses are split across multiple messages at natural boundaries (newlines, then spaces)
- **Echo prevention** — Messages from your own Twilio number are ignored to prevent loops
- **Phone number redaction** — Phone numbers are redacted in logs for privacy
---
## Security
### Webhook signature validation
Hermes validates that inbound webhooks genuinely originate from Twilio by verifying the `X-Twilio-Signature` header (HMAC-SHA1). This prevents attackers from injecting forged messages.
**`SMS_WEBHOOK_URL` is required.** Set it to the public URL configured in your Twilio Console. The adapter will refuse to start without it.
For local development without a public URL, you can disable validation:
```bash
# Local dev only — NOT for production
SMS_INSECURE_NO_SIGNATURE=true
```
### User allowlists
**The gateway denies all users by default.** Configure an allowlist:
```bash
# Recommended: restrict to specific phone numbers
SMS_ALLOWED_USERS=+15559876543,+15551112222
# Or allow all (NOT recommended for bots with terminal access)
SMS_ALLOW_ALL_USERS=true
```
:::warning
SMS has no built-in encryption. Don't use SMS for sensitive operations unless you understand the security implications. For sensitive use cases, prefer Signal or Telegram.
:::
---
## Troubleshooting
### Messages not arriving
1. Check your Twilio webhook URL is correct and publicly accessible
2. Verify `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are correct
3. Check the Twilio Console → **Monitor → Logs → Messaging** for delivery errors
4. Ensure your phone number is in `SMS_ALLOWED_USERS` (or `SMS_ALLOW_ALL_USERS=true`)
### Replies not sending
1. Check `TWILIO_PHONE_NUMBER` is set correctly (E.164 format with `+`)
2. Verify your Twilio account has SMS-capable numbers
3. Check Hermes gateway logs for Twilio API errors
### Webhook port conflicts
If port 8080 is already in use, change it:
```bash
SMS_WEBHOOK_PORT=3001
```
Update the webhook URL in Twilio Console to match.
---
# Matrix
# Matrix Setup
Hermes Agent integrates with Matrix, the open, federated messaging protocol. Matrix lets you run your own homeserver or use a public one like matrix.org — either way, you keep control of your communications. The bot connects via the `mautrix` Python SDK, processes messages through the Hermes Agent pipeline (including tool use, memory, and reasoning), and responds in real time. It supports text, file attachments, images, audio, video, and optional end-to-end encryption (E2EE).
Hermes works with any Matrix homeserver — Synapse, Conduit, Dendrite, or matrix.org.
Before setup, here's the part most people want to know: how Hermes behaves once it's connected.
## How Hermes Behaves
| Context | Behavior |
|---------|----------|
| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. Set `MATRIX_DM_MENTION_THREADS=true` to start a thread when the bot is `@mentioned` in a DM. |
| **Rooms** | By default, Hermes requires an `@mention` to respond. Set `MATRIX_REQUIRE_MENTION=false` or add room IDs to `MATRIX_FREE_RESPONSE_ROOMS` for free-response rooms. Room invites are auto-accepted. |
| **Threads** | Hermes supports Matrix threads (MSC3440). If you reply in a thread, Hermes keeps the thread context isolated from the main room timeline. Threads where the bot has already participated do not require a mention. |
| **Auto-threading** | By default, Hermes auto-creates a thread for each message it responds to in a room. This keeps conversations isolated. Set `MATRIX_AUTO_THREAD=false` to disable. |
| **Shared rooms with multiple users** | By default, Hermes isolates session history per user inside the room. Two people talking in the same room do not share one transcript unless you explicitly disable that. |
:::tip
The bot automatically joins rooms when invited. Just invite the bot's Matrix user to any room and it will join and start responding.
:::
### Session Model in Matrix
By default:
- each DM gets its own session
- each thread gets its own session namespace
- each user in a shared room gets their own session inside that room
This is controlled by `config.yaml`:
```yaml
group_sessions_per_user: true
```
Set it to `false` only if you explicitly want one shared conversation for the entire room:
```yaml
group_sessions_per_user: false
```
Shared sessions can be useful for a collaborative room, but they also mean:
- users share context growth and token costs
- one person's long tool-heavy task can bloat everyone else's context
- one person's in-flight run can interrupt another person's follow-up in the same room
### Mention and Threading Configuration
You can configure mention and auto-threading behavior via environment variables or `config.yaml`:
```yaml
matrix:
require_mention: true # Require @mention in rooms (default: true)
free_response_rooms: # Rooms exempt from mention requirement
- "!abc123:matrix.org"
auto_thread: true # Auto-create threads for responses (default: true)
dm_mention_threads: false # Create thread when @mentioned in DM (default: false)
```
Or via environment variables:
```bash
MATRIX_REQUIRE_MENTION=true
MATRIX_FREE_RESPONSE_ROOMS=!abc123:matrix.org,!def456:matrix.org
MATRIX_AUTO_THREAD=true
MATRIX_DM_MENTION_THREADS=false
MATRIX_REACTIONS=true # default: true — emoji reactions during processing
```
:::tip Disabling reactions
`MATRIX_REACTIONS=false` turns off the processing-lifecycle emoji reactions (👀/✅/❌) the bot posts on inbound messages. Useful for rooms where reaction events are noisy or aren't supported by all participating clients.
:::
:::note
If you are upgrading from a version that did not have `MATRIX_REQUIRE_MENTION`, the bot previously responded to all messages in rooms. To preserve that behavior, set `MATRIX_REQUIRE_MENTION=false`.
:::
This guide walks you through the full setup process — from creating your bot account to sending your first message.
## Step 1: Create a Bot Account
You need a Matrix user account for the bot. There are several ways to do this:
### Option A: Register on Your Homeserver (Recommended)
If you run your own homeserver (Synapse, Conduit, Dendrite):
1. Use the admin API or registration tool to create a new user:
```bash
# Synapse example
register_new_matrix_user -c /etc/synapse/homeserver.yaml http://localhost:8008
```
2. Choose a username like `hermes` — the full user ID will be `@hermes:your-server.org`.
### Option B: Use matrix.org or Another Public Homeserver
1. Go to [Element Web](https://app.element.io) and create a new account.
2. Pick a username for your bot (e.g., `hermes-bot`).
### Option C: Use Your Own Account
You can also run Hermes as your own user. This means the bot posts as you — useful for personal assistants.
## Step 2: Get an Access Token
Hermes needs an access token to authenticate with the homeserver. You have two options:
### Option A: Access Token (Recommended)
The most reliable way to get a token:
**Via Element:**
1. Log in to [Element](https://app.element.io) with the bot account.
2. Go to **Settings** → **Help & About**.
3. Scroll down and expand **Advanced** — the access token is displayed there.
4. **Copy it immediately.**
**Via the API:**
```bash
curl -X POST https://your-server/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"user": "@hermes:your-server.org",
"password": "your-password"
}'
```
The response includes an `access_token` field — copy it.
:::warning[Keep your access token safe]
The access token gives full access to the bot's Matrix account. Never share it publicly or commit it to Git. If compromised, revoke it by logging out all sessions for that user.
:::
### Option B: Password Login
Instead of providing an access token, you can give Hermes the bot's user ID and password. Hermes will log in automatically on startup. This is simpler but means the password is stored in your `.env` file.
```bash
MATRIX_USER_ID=@hermes:your-server.org
MATRIX_PASSWORD=your-password
```
## Step 3: Find Your Matrix User ID
Hermes Agent uses your Matrix User ID to control who can interact with the bot. Matrix User IDs follow the format `@username:server`.
To find yours:
1. Open [Element](https://app.element.io) (or your preferred Matrix client).
2. Click your avatar → **Settings**.
3. Your User ID is displayed at the top of the profile (e.g., `@alice:matrix.org`).
:::tip
Matrix User IDs always start with `@` and contain a `:` followed by the server name. For example: `@alice:matrix.org`, `@bob:your-server.com`.
:::
## Step 4: Configure Hermes Agent
### Option A: Interactive Setup (Recommended)
Run the guided setup command:
```bash
hermes gateway setup
```
Select **Matrix** when prompted, then provide your homeserver URL, access token (or user ID + password), and allowed user IDs when asked.
### Option B: Manual Configuration
Add the following to your `~/.hermes/.env` file:
**Using an access token:**
```bash
# Required
MATRIX_HOMESERVER=https://matrix.example.org
MATRIX_ACCESS_TOKEN=***
# Optional: user ID (auto-detected from token if omitted)
# MATRIX_USER_ID=@hermes:matrix.example.org
# Security: restrict who can interact with the bot
MATRIX_ALLOWED_USERS=@alice:matrix.example.org
# Multiple allowed users (comma-separated)
# MATRIX_ALLOWED_USERS=@alice:matrix.example.org,@bob:matrix.example.org
```
**Using password login:**
```bash
# Required
MATRIX_HOMESERVER=https://matrix.example.org
MATRIX_USER_ID=@hermes:matrix.example.org
MATRIX_PASSWORD=***
# Security
MATRIX_ALLOWED_USERS=@alice:matrix.example.org
```
Optional behavior settings in `~/.hermes/config.yaml`:
```yaml
group_sessions_per_user: true
```
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared rooms
### Start the Gateway
Once configured, start the Matrix gateway:
```bash
hermes gateway
```
The bot should connect to your homeserver and start syncing within a few seconds. Send it a message — either a DM or in a room it has joined — to test.
:::tip
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
:::
## End-to-End Encryption (E2EE)
Hermes supports Matrix end-to-end encryption, so you can chat with your bot in encrypted rooms.
### Requirements
E2EE requires the `mautrix` library with encryption extras and the `libolm` C library:
```bash
# Install mautrix with E2EE support
pip install 'mautrix[encryption]'
# Or install with hermes extras
pip install 'hermes-agent[matrix]'
```
You also need `libolm` installed on your system:
```bash
# Debian/Ubuntu
sudo apt install libolm-dev
# macOS
brew install libolm
# Fedora
sudo dnf install libolm-devel
```
### Enable E2EE
Add to your `~/.hermes/.env`:
```bash
MATRIX_ENCRYPTION=true
```
When E2EE is enabled, Hermes:
- Stores encryption keys in `~/.hermes/platforms/matrix/store/` (legacy installs: `~/.hermes/matrix/store/`)
- Uploads device keys on first connection
- Decrypts incoming messages and encrypts outgoing messages automatically
- Auto-joins encrypted rooms when invited
### Cross-Signing Verification (Recommended)
If your Matrix account has cross-signing enabled (the default in Element), set the recovery key so the bot can self-sign its device on startup. Without this, other Matrix clients may refuse to share encryption sessions with the bot after a device key rotation.
```bash
MATRIX_RECOVERY_KEY=EsT... your recovery key here
```
**Where to find it:** In Element, go to **Settings** → **Security & Privacy** → **Encryption** → your recovery key (also called the "Security Key"). This is the key you were asked to save when you first set up cross-signing.
On each startup, if `MATRIX_RECOVERY_KEY` is set, Hermes imports cross-signing keys from the homeserver's secure secret storage and signs the current device. This is idempotent and safe to leave enabled permanently.
:::warning[Deleting the crypto store]
If you delete `~/.hermes/platforms/matrix/store/crypto.db`, the bot loses its encryption identity. Simply restarting with the same device ID will **not** fully recover — the homeserver still holds one-time keys signed with the old identity key, and peers cannot establish new Olm sessions.
Hermes detects this condition on startup and refuses to enable E2EE, logging: `device XXXX has stale one-time keys on the server signed with a previous identity key`.
**Easiest recovery: generate a new access token** (which gets a fresh device ID with no stale key history). See the "Upgrading from a previous version with E2EE" section below. This is the most reliable path and avoids touching the homeserver database.
**Manual recovery** (advanced — keeps the same device ID):
1. Stop Synapse and delete the old device from its database:
```bash
sudo systemctl stop matrix-synapse
sudo sqlite3 /var/lib/matrix-synapse/homeserver.db "
DELETE FROM e2e_device_keys_json WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
DELETE FROM e2e_one_time_keys_json WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
DELETE FROM e2e_fallback_keys_json WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
DELETE FROM devices WHERE device_id = 'DEVICE_ID' AND user_id = '@hermes:your-server';
"
sudo systemctl start matrix-synapse
```
Or via the Synapse admin API (note the URL-encoded user ID):
```bash
curl -X DELETE -H "Authorization: Bearer ADMIN_TOKEN" \
'https://your-server/_synapse/admin/v2/users/%40hermes%3Ayour-server/devices/DEVICE_ID'
```
Note: deleting a device via the admin API may also invalidate the associated access token. You may need to generate a new token afterward.
2. Delete the local crypto store and restart Hermes:
```bash
rm -f ~/.hermes/platforms/matrix/store/crypto.db*
# restart hermes
```
Other Matrix clients (Element, matrix-commander) may cache the old device keys. After recovery, type `/discardsession` in Element to force a new encryption session with the bot.
:::
:::info
If `mautrix[encryption]` is not installed or `libolm` is missing, the bot falls back to a plain (unencrypted) client automatically. You'll see a warning in the logs.
:::
## Home Room
You can designate a "home room" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:
### Using the Slash Command
Type `/sethome` in any Matrix room where the bot is present. That room becomes the home room.
### Manual Configuration
Add this to your `~/.hermes/.env`:
```bash
MATRIX_HOME_ROOM=!abc123def456:matrix.example.org
```
:::tip
To find a Room ID: in Element, go to the room → **Settings** → **Advanced** → the **Internal room ID** is shown there (starts with `!`).
:::
## Troubleshooting
### Bot is not responding to messages
**Cause**: The bot hasn't joined the room, or `MATRIX_ALLOWED_USERS` doesn't include your User ID.
**Fix**: Invite the bot to the room — it auto-joins on invite. Verify your User ID is in `MATRIX_ALLOWED_USERS` (use the full `@user:server` format). Restart the gateway.
### "Failed to authenticate" / "whoami failed" on startup
**Cause**: The access token or homeserver URL is incorrect.
**Fix**: Verify `MATRIX_HOMESERVER` points to your homeserver (include `https://`, no trailing slash). Check that `MATRIX_ACCESS_TOKEN` is valid — try it with curl:
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
https://your-server/_matrix/client/v3/account/whoami
```
If this returns your user info, the token is valid. If it returns an error, generate a new token.
### "mautrix not installed" error
**Cause**: The `mautrix` Python package is not installed.
**Fix**: Install it:
```bash
pip install 'mautrix[encryption]'
```
Or with Hermes extras:
```bash
pip install 'hermes-agent[matrix]'
```
### Encryption errors / "could not decrypt event"
**Cause**: Missing encryption keys, `libolm` not installed, or the bot's device isn't trusted.
**Fix**:
1. Verify `libolm` is installed on your system (see the E2EE section above).
2. Make sure `MATRIX_ENCRYPTION=true` is set in your `.env`.
3. In your Matrix client (Element), go to the bot's profile -> Sessions -> verify/trust the bot's device.
4. If the bot just joined an encrypted room, it can only decrypt messages sent *after* it joined. Older messages are inaccessible.
### Upgrading from a previous version with E2EE
:::tip
If you also manually deleted `crypto.db`, see the "Deleting the crypto store" warning in the E2EE section above — there are additional steps to clear stale one-time keys from the homeserver.
:::
If you previously used Hermes with `MATRIX_ENCRYPTION=true` and are upgrading to
a version that uses the new SQLite-based crypto store, the bot's encryption
identity has changed. Your Matrix client (Element) may cache the old device keys
and refuse to share encryption sessions with the bot.
**Symptoms**: The bot connects and shows "E2EE enabled" in the logs, but all
messages show "could not decrypt event" and the bot never responds.
**What's happening**: The old encryption state (from the previous `matrix-nio` or
serialization-based `mautrix` backend) is incompatible with the new SQLite crypto
store. The bot creates a fresh encryption identity, but your Matrix client still
has the old keys cached and won't share the room's encryption session with a
device whose keys changed. This is a Matrix security feature -- clients treat
changed identity keys for the same device as suspicious.
**Fix** (one-time migration):
1. **Generate a new access token** to get a fresh device ID. The simplest way:
```bash
curl -X POST https://your-server/_matrix/client/v3/login \
-H "Content-Type: application/json" \
-d '{
"type": "m.login.password",
"identifier": {"type": "m.id.user", "user": "@hermes:your-server.org"},
"password": "***",
"initial_device_display_name": "Hermes Agent"
}'
```
Copy the new `access_token` and update `MATRIX_ACCESS_TOKEN` in `~/.hermes/.env`.
2. **Delete old encryption state**:
```bash
rm -f ~/.hermes/platforms/matrix/store/crypto.db
rm -f ~/.hermes/platforms/matrix/store/crypto_store.*
```
3. **Set your recovery key** (if you use cross-signing — most Element users do). Add to `~/.hermes/.env`:
```bash
MATRIX_RECOVERY_KEY=EsT... your recovery key here
```
This lets the bot self-sign with cross-signing keys on startup, so Element trusts the new device immediately. Without this, Element may see the new device as unverified and refuse to share encryption sessions. Find your recovery key in Element under **Settings** → **Security & Privacy** → **Encryption**.
4. **Force your Matrix client to rotate the encryption session**. In Element,
open the DM room with the bot and type `/discardsession`. This forces Element
to create a new encryption session and share it with the bot's new device.
5. **Restart the gateway**:
```bash
hermes gateway run
```
If `MATRIX_RECOVERY_KEY` is set, you should see `Matrix: cross-signing verified via recovery key` in the logs.
6. **Send a new message**. The bot should decrypt and respond normally.
:::note
After migration, messages sent *before* the upgrade cannot be decrypted -- the old
encryption keys are gone. This only affects the transition; new messages work
normally.
:::
:::tip
**New installations are not affected.** This migration is only needed if you had
a working E2EE setup with a previous version of Hermes and are upgrading.
**Why a new access token?** Each Matrix access token is bound to a specific device
ID. Reusing the same device ID with new encryption keys causes other Matrix
clients to distrust the device (they see changed identity keys as a potential
security breach). A new access token gets a new device ID with no stale key
history, so other clients trust it immediately.
:::
## Proxy Mode (E2EE on macOS)
Matrix E2EE requires `libolm`, which doesn't compile on macOS ARM64 (Apple Silicon). The `hermes-agent[matrix]` extra is gated to Linux only. If you're on macOS, proxy mode lets you run E2EE in a Docker container on a Linux VM while the actual agent runs natively on macOS with full access to your local files, memory, and skills.
### How It Works
```
macOS (Host):
└─ hermes gateway
├─ api_server adapter ← listens on 0.0.0.0:8642
├─ AIAgent ← single source of truth
├─ Sessions, memory, skills
└─ Local file access (Obsidian, projects, etc.)
Linux VM (Docker):
└─ hermes gateway (proxy mode)
├─ Matrix adapter ← E2EE decryption/encryption
└─ HTTP forward → macOS:8642/v1/chat/completions
(no LLM API keys, no agent, no inference)
```
The Docker container only handles Matrix protocol + E2EE. When a message arrives, it decrypts it and forwards the text to the host via a standard HTTP request. The host runs the agent, calls tools, generates a response, and streams it back. The container encrypts and sends the response to Matrix. All sessions are unified — CLI, Matrix, Telegram, and any other platform share the same memory and conversation history.
### Step 1: Configure the Host (macOS)
Enable the API server so the host accepts incoming requests from the Docker container.
Add to `~/.hermes/.env`:
```bash
API_SERVER_ENABLED=true
API_SERVER_KEY=your-secret-key-here
API_SERVER_HOST=0.0.0.0
```
- `API_SERVER_HOST=0.0.0.0` binds to all interfaces so the Docker container can reach it.
- `API_SERVER_KEY` is required for non-loopback binding. Pick a strong random string.
- The API server runs on port 8642 by default (change with `API_SERVER_PORT` if needed).
Start the gateway:
```bash
hermes gateway
```
You should see the API server start alongside any other platforms you have configured. Verify it's reachable from the VM:
```bash
# From the Linux VM
curl http://:8642/health
```
### Step 2: Configure the Docker Container (Linux VM)
The container needs Matrix credentials and the proxy URL. It does NOT need LLM API keys.
**`docker-compose.yml`:**
```yaml
services:
hermes-matrix:
build: .
environment:
# Matrix credentials
MATRIX_HOMESERVER: "https://matrix.example.org"
MATRIX_ACCESS_TOKEN: "syt_..."
MATRIX_ALLOWED_USERS: "@you:matrix.example.org"
MATRIX_ENCRYPTION: "true"
MATRIX_DEVICE_ID: "HERMES_BOT"
# Proxy mode — forward to host agent
GATEWAY_PROXY_URL: "http://192.168.1.100:8642"
GATEWAY_PROXY_KEY: "your-secret-key-here"
volumes:
- ./matrix-store:/root/.hermes/platforms/matrix/store
```
**`Dockerfile`:**
```dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get install -y libolm-dev && rm -rf /var/lib/apt/lists/*
RUN pip install 'hermes-agent[matrix]'
CMD ["hermes", "gateway"]
```
That's the entire container. No API keys for OpenRouter, Anthropic, or any inference provider.
### Step 3: Start Both
1. Start the host gateway first:
```bash
hermes gateway
```
2. Start the Docker container:
```bash
docker compose up -d
```
3. Send a message in an encrypted Matrix room. The container decrypts it, forwards it to the host, and streams the response back.
### Configuration Reference
Proxy mode is configured on the **container side** (the thin gateway):
| Setting | Description |
|---------|-------------|
| `GATEWAY_PROXY_URL` | URL of the remote Hermes API server (e.g., `http://192.168.1.100:8642`) |
| `GATEWAY_PROXY_KEY` | Bearer token for authentication (must match `API_SERVER_KEY` on the host) |
| `gateway.proxy_url` | Same as `GATEWAY_PROXY_URL` but in `config.yaml` |
The host side needs:
| Setting | Description |
|---------|-------------|
| `API_SERVER_ENABLED` | Set to `true` |
| `API_SERVER_KEY` | Bearer token (shared with the container) |
| `API_SERVER_HOST` | Set to `0.0.0.0` for network access |
| `API_SERVER_PORT` | Port number (default: `8642`) |
### Works for Any Platform
Proxy mode is not limited to Matrix. Any platform adapter can use it — set `GATEWAY_PROXY_URL` on any gateway instance and it will forward to the remote agent instead of running one locally. This is useful for any deployment where the platform adapter needs to run in a different environment from the agent (network isolation, E2EE requirements, resource constraints).
:::tip
Session continuity is maintained via the `X-Hermes-Session-Id` header. The host's API server tracks sessions by this ID, so conversations persist across messages just like they would with a local agent.
:::
:::note
**Limitations (v1):** Tool progress messages from the remote agent are not relayed back — the user sees the streamed final response only, not individual tool calls. Dangerous command approval prompts are handled on the host side, not relayed to the Matrix user. These can be addressed in future updates.
:::
### Sync issues / bot falls behind
**Cause**: Long-running tool executions can delay the sync loop, or the homeserver is slow.
**Fix**: The sync loop automatically retries every 5 seconds on error. Check the Hermes logs for sync-related warnings. If the bot consistently falls behind, ensure your homeserver has adequate resources.
### Bot is offline
**Cause**: The Hermes gateway isn't running, or it failed to connect.
**Fix**: Check that `hermes gateway` is running. Look at the terminal output for error messages. Common issues: wrong homeserver URL, expired access token, homeserver unreachable.
### "User not allowed" / Bot ignores you
**Cause**: Your User ID isn't in `MATRIX_ALLOWED_USERS`.
**Fix**: Add your User ID to `MATRIX_ALLOWED_USERS` in `~/.hermes/.env` and restart the gateway. Use the full `@user:server` format.
## Security
:::warning
Always set `MATRIX_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
:::
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
## Notes
- **Any homeserver**: Works with Synapse, Conduit, Dendrite, matrix.org, or any spec-compliant Matrix homeserver. No specific homeserver software required.
- **Federation**: If you're on a federated homeserver, the bot can communicate with users from other servers — just add their full `@user:server` IDs to `MATRIX_ALLOWED_USERS`.
- **Auto-join**: The bot automatically accepts room invites and joins. It starts responding immediately after joining.
- **Media support**: Hermes can send and receive images, audio, video, and file attachments. Media is uploaded to your homeserver using the Matrix content repository API.
- **Native voice messages (MSC3245)**: The Matrix adapter automatically tags outgoing voice messages with the `org.matrix.msc3245.voice` flag. This means TTS responses and voice audio are rendered as **native voice bubbles** in Element and other clients that support MSC3245, rather than as generic audio file attachments. Incoming voice messages with the MSC3245 flag are also correctly identified and routed to speech-to-text transcription. No configuration is needed — this works automatically.
---
# Mattermost
# Mattermost Setup
Hermes Agent integrates with Mattermost as a bot, letting you chat with your AI assistant through direct messages or team channels. Mattermost is a self-hosted, open-source Slack alternative — you run it on your own infrastructure, keeping full control of your data. The bot connects via Mattermost's REST API (v4) and WebSocket for real-time events, processes messages through the Hermes Agent pipeline (including tool use, memory, and reasoning), and responds in real time. It supports text, file attachments, images, and slash commands.
No external Mattermost library is required — the adapter uses `aiohttp`, which is already a Hermes dependency.
Before setup, here's the part most people want to know: how Hermes behaves once it's in your Mattermost instance.
## How Hermes Behaves
| Context | Behavior |
|---------|----------|
| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
| **Public/private channels** | Hermes responds when you `@mention` it. Without a mention, Hermes ignores the message. |
| **Threads** | If `MATTERMOST_REPLY_MODE=thread`, Hermes replies in a thread under your message. Thread context stays isolated from the parent channel. |
| **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel. Two people talking in the same channel do not share one transcript unless you explicitly disable that. |
:::tip
If you want Hermes to reply as threaded conversations (nested under your original message), set `MATTERMOST_REPLY_MODE=thread`. The default is `off`, which sends flat messages in the channel.
:::
### Session Model in Mattermost
By default:
- each DM gets its own session
- each thread gets its own session namespace
- each user in a shared channel gets their own session inside that channel
This is controlled by `config.yaml`:
```yaml
group_sessions_per_user: true
```
Set it to `false` only if you explicitly want one shared conversation for the entire channel:
```yaml
group_sessions_per_user: false
```
Shared sessions can be useful for a collaborative channel, but they also mean:
- users share context growth and token costs
- one person's long tool-heavy task can bloat everyone else's context
- one person's in-flight run can interrupt another person's follow-up in the same channel
This guide walks you through the full setup process — from creating your bot on Mattermost to sending your first message.
## Step 1: Enable Bot Accounts
Bot accounts must be enabled on your Mattermost server before you can create one.
1. Log in to Mattermost as a **System Admin**.
2. Go to **System Console** → **Integrations** → **Bot Accounts**.
3. Set **Enable Bot Account Creation** to **true**.
4. Click **Save**.
:::info
If you don't have System Admin access, ask your Mattermost administrator to enable bot accounts and create one for you.
:::
## Step 2: Create a Bot Account
1. In Mattermost, click the **☰** menu (top-left) → **Integrations** → **Bot Accounts**.
2. Click **Add Bot Account**.
3. Fill in the details:
- **Username**: e.g., `hermes`
- **Display Name**: e.g., `Hermes Agent`
- **Description**: optional
- **Role**: `Member` is sufficient
4. Click **Create Bot Account**.
5. Mattermost will display the **bot token**. **Copy it immediately.**
:::warning[Token shown only once]
The bot token is only displayed once when you create the bot account. If you lose it, you'll need to regenerate it from the bot account settings. Never share your token publicly or commit it to Git — anyone with this token has full control of the bot.
:::
Store the token somewhere safe (a password manager, for example). You'll need it in Step 5.
:::tip
You can also use a **personal access token** instead of a bot account. Go to **Profile** → **Security** → **Personal Access Tokens** → **Create Token**. This is useful if you want Hermes to post as your own user rather than a separate bot user.
:::
## Step 3: Add the Bot to Channels
The bot needs to be a member of any channel where you want it to respond:
1. Open the channel where you want the bot.
2. Click the channel name → **Add Members**.
3. Search for your bot username (e.g., `hermes`) and add it.
For DMs, simply open a direct message with the bot — it will be able to respond immediately.
## Step 4: Find Your Mattermost User ID
Hermes Agent uses your Mattermost User ID to control who can interact with the bot. To find it:
1. Click your **avatar** (top-left corner) → **Profile**.
2. Your User ID is displayed in the profile dialog — click it to copy.
Your User ID is a 26-character alphanumeric string like `3uo8dkh1p7g1mfk49ear5fzs5c`.
:::warning
Your User ID is **not** your username. The username is what appears after `@` (e.g., `@alice`). The User ID is a long alphanumeric identifier that Mattermost uses internally.
:::
**Alternative**: You can also get your User ID via the API:
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
https://your-mattermost-server/api/v4/users/me | jq .id
```
:::tip
To get a **Channel ID**: click the channel name → **View Info**. The Channel ID is shown in the info panel. You'll need this if you want to set a home channel manually.
:::
## Step 5: Configure Hermes Agent
### Option A: Interactive Setup (Recommended)
Run the guided setup command:
```bash
hermes gateway setup
```
Select **Mattermost** when prompted, then paste your server URL, bot token, and user ID when asked.
### Option B: Manual Configuration
Add the following to your `~/.hermes/.env` file:
```bash
# Required
MATTERMOST_URL=https://mm.example.com
MATTERMOST_TOKEN=***
MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c
# Multiple allowed users (comma-separated)
# MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c,8fk2jd9s0a7bncm1xqw4tp6r3e
# Optional: reply mode (thread or off, default: off)
# MATTERMOST_REPLY_MODE=thread
# Optional: respond without @mention (default: true = require mention)
# MATTERMOST_REQUIRE_MENTION=false
# Optional: channels where bot responds without @mention (comma-separated channel IDs)
# MATTERMOST_FREE_RESPONSE_CHANNELS=channel_id_1,channel_id_2
```
Optional behavior settings in `~/.hermes/config.yaml`:
```yaml
group_sessions_per_user: true
```
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared channels and threads
### Start the Gateway
Once configured, start the Mattermost gateway:
```bash
hermes gateway
```
The bot should connect to your Mattermost server within a few seconds. Send it a message — either a DM or in a channel where it's been added — to test.
:::tip
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
:::
## Home Channel
You can designate a "home channel" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:
### Using the Slash Command
Type `/sethome` in any Mattermost channel where the bot is present. That channel becomes the home channel.
### Manual Configuration
Add this to your `~/.hermes/.env`:
```bash
MATTERMOST_HOME_CHANNEL=abc123def456ghi789jkl012mn
```
Replace the ID with the actual channel ID (click the channel name → View Info → copy the ID).
## Reply Mode
The `MATTERMOST_REPLY_MODE` setting controls how Hermes posts responses:
| Mode | Behavior |
|------|----------|
| `off` (default) | Hermes posts flat messages in the channel, like a normal user. |
| `thread` | Hermes replies in a thread under your original message. Keeps channels clean when there's lots of back-and-forth. |
Set it in your `~/.hermes/.env`:
```bash
MATTERMOST_REPLY_MODE=thread
```
## Mention Behavior
By default, the bot only responds in channels when `@mentioned`. You can change this:
| Variable | Default | Description |
|----------|---------|-------------|
| `MATTERMOST_REQUIRE_MENTION` | `true` | Set to `false` to respond to all messages in channels (DMs always work). |
| `MATTERMOST_FREE_RESPONSE_CHANNELS` | _(none)_ | Comma-separated channel IDs where the bot responds without `@mention`, even when require_mention is true. |
To find a channel ID in Mattermost: open the channel, click the channel name header, and look for the ID in the URL or channel details.
When the bot is `@mentioned`, the mention is automatically stripped from the message before processing.
## Troubleshooting
### Bot is not responding to messages
**Cause**: The bot is not a member of the channel, or `MATTERMOST_ALLOWED_USERS` doesn't include your User ID.
**Fix**: Add the bot to the channel (channel name → Add Members → search for the bot). Verify your User ID is in `MATTERMOST_ALLOWED_USERS`. Restart the gateway.
### 403 Forbidden errors
**Cause**: The bot token is invalid, or the bot doesn't have permission to post in the channel.
**Fix**: Check that `MATTERMOST_TOKEN` in your `.env` file is correct. Make sure the bot account hasn't been deactivated. Verify the bot has been added to the channel. If using a personal access token, ensure your account has the required permissions.
### WebSocket disconnects / reconnection loops
**Cause**: Network instability, Mattermost server restarts, or firewall/proxy issues with WebSocket connections.
**Fix**: The adapter automatically reconnects with exponential backoff (2s → 60s). Check your server's WebSocket configuration — reverse proxies (nginx, Apache) need WebSocket upgrade headers configured. Verify no firewall is blocking WebSocket connections on your Mattermost server.
For nginx, ensure your config includes:
```nginx
location /api/v4/websocket {
proxy_pass http://mattermost-backend;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 600s;
}
```
### "Failed to authenticate" on startup
**Cause**: The token or server URL is incorrect.
**Fix**: Verify `MATTERMOST_URL` points to your Mattermost server (include `https://`, no trailing slash). Check that `MATTERMOST_TOKEN` is valid — try it with curl:
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
https://your-server/api/v4/users/me
```
If this returns your bot's user info, the token is valid. If it returns an error, regenerate the token.
### Bot is offline
**Cause**: The Hermes gateway isn't running, or it failed to connect.
**Fix**: Check that `hermes gateway` is running. Look at the terminal output for error messages. Common issues: wrong URL, expired token, Mattermost server unreachable.
### "User not allowed" / Bot ignores you
**Cause**: Your User ID isn't in `MATTERMOST_ALLOWED_USERS`.
**Fix**: Add your User ID to `MATTERMOST_ALLOWED_USERS` in `~/.hermes/.env` and restart the gateway. Remember: the User ID is a 26-character alphanumeric string, not your `@username`.
## Per-Channel Prompts
Assign ephemeral system prompts to specific Mattermost channels. The prompt is injected at runtime on every turn — never persisted to transcript history — so changes take effect immediately.
```yaml
mattermost:
channel_prompts:
"channel_id_abc123": |
You are a research assistant. Focus on academic sources,
citations, and concise synthesis.
"channel_id_def456": |
Code review mode. Be precise about edge cases and
performance implications.
```
Keys are Mattermost channel IDs (find them in the channel URL or via the API). All messages in the matching channel get the prompt injected as an ephemeral system instruction.
## Security
:::warning
Always set `MATTERMOST_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
:::
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
## Notes
- **Self-hosted friendly**: Works with any self-hosted Mattermost instance. No Mattermost Cloud account or subscription required.
- **No extra dependencies**: The adapter uses `aiohttp` for HTTP and WebSocket, which is already included with Hermes Agent.
- **Team Edition compatible**: Works with both Mattermost Team Edition (free) and Enterprise Edition.
---
# user-guide/messaging/homeassistant
# Home Assistant Integration
Hermes Agent integrates with [Home Assistant](https://www.home-assistant.io/) in two ways:
1. **Gateway platform** — subscribes to real-time state changes via WebSocket and responds to events
2. **Smart home tools** — four LLM-callable tools for querying and controlling devices via the REST API
## Setup
### 1. Create a Long-Lived Access Token
1. Open your Home Assistant instance
2. Go to your **Profile** (click your name in the sidebar)
3. Scroll to **Long-Lived Access Tokens**
4. Click **Create Token**, give it a name like "Hermes Agent"
5. Copy the token
### 2. Configure Environment Variables
```bash
# Add to ~/.hermes/.env
# Required: your Long-Lived Access Token
HASS_TOKEN=your-long-lived-access-token
# Optional: HA URL (default: http://homeassistant.local:8123)
HASS_URL=http://192.168.1.100:8123
```
:::info
The `homeassistant` toolset is automatically enabled when `HASS_TOKEN` is set. Both the gateway platform and the device control tools activate from this single token.
:::
### 3. Start the Gateway
```bash
hermes gateway
```
Home Assistant will appear as a connected platform alongside any other messaging platforms (Telegram, Discord, etc.).
## Available Tools
Hermes Agent registers four tools for smart home control:
### `ha_list_entities`
List Home Assistant entities, optionally filtered by domain or area.
**Parameters:**
- `domain` *(optional)* — Filter by entity domain: `light`, `switch`, `climate`, `sensor`, `binary_sensor`, `cover`, `fan`, `media_player`, etc.
- `area` *(optional)* — Filter by area/room name (matches against friendly names): `living room`, `kitchen`, `bedroom`, etc.
**Example:**
```
List all lights in the living room
```
Returns entity IDs, states, and friendly names.
### `ha_get_state`
Get detailed state of a single entity, including all attributes (brightness, color, temperature setpoint, sensor readings, etc.).
**Parameters:**
- `entity_id` *(required)* — The entity to query, e.g., `light.living_room`, `climate.thermostat`, `sensor.temperature`
**Example:**
```
What's the current state of climate.thermostat?
```
Returns: state, all attributes, last changed/updated timestamps.
### `ha_list_services`
List available services (actions) for device control. Shows what actions can be performed on each device type and what parameters they accept.
**Parameters:**
- `domain` *(optional)* — Filter by domain, e.g., `light`, `climate`, `switch`
**Example:**
```
What services are available for climate devices?
```
### `ha_call_service`
Call a Home Assistant service to control a device.
**Parameters:**
- `domain` *(required)* — Service domain: `light`, `switch`, `climate`, `cover`, `media_player`, `fan`, `scene`, `script`
- `service` *(required)* — Service name: `turn_on`, `turn_off`, `toggle`, `set_temperature`, `set_hvac_mode`, `open_cover`, `close_cover`, `set_volume_level`
- `entity_id` *(optional)* — Target entity, e.g., `light.living_room`
- `data` *(optional)* — Additional parameters as a JSON object
**Examples:**
```
Turn on the living room lights
→ ha_call_service(domain="light", service="turn_on", entity_id="light.living_room")
```
```
Set the thermostat to 22 degrees in heat mode
→ ha_call_service(domain="climate", service="set_temperature",
entity_id="climate.thermostat", data={"temperature": 22, "hvac_mode": "heat"})
```
```
Set living room lights to blue at 50% brightness
→ ha_call_service(domain="light", service="turn_on",
entity_id="light.living_room", data={"brightness": 128, "color_name": "blue"})
```
## Gateway Platform: Real-Time Events
The Home Assistant gateway adapter connects via WebSocket and subscribes to `state_changed` events. When a device state changes and matches your filters, it's forwarded to the agent as a message.
### Event Filtering
:::warning Required Configuration
By default, **no events are forwarded**. You must configure at least one of `watch_domains`, `watch_entities`, or `watch_all` to receive events. Without filters, a warning is logged at startup and all state changes are silently dropped.
:::
Configure which events the agent sees in `~/.hermes/config.yaml` under the Home Assistant platform's `extra` section:
```yaml
platforms:
homeassistant:
enabled: true
extra:
watch_domains:
- climate
- binary_sensor
- alarm_control_panel
- light
watch_entities:
- sensor.front_door_battery
ignore_entities:
- sensor.uptime
- sensor.cpu_usage
- sensor.memory_usage
cooldown_seconds: 30
```
| Setting | Default | Description |
|---------|---------|-------------|
| `watch_domains` | *(none)* | Only watch these entity domains (e.g., `climate`, `light`, `binary_sensor`) |
| `watch_entities` | *(none)* | Only watch these specific entity IDs |
| `watch_all` | `false` | Set to `true` to receive **all** state changes (not recommended for most setups) |
| `ignore_entities` | *(none)* | Always ignore these entities (applied before domain/entity filters) |
| `cooldown_seconds` | `30` | Minimum seconds between events for the same entity |
:::tip
Start with a focused set of domains — `climate`, `binary_sensor`, and `alarm_control_panel` cover the most useful automations. Add more as needed. Use `ignore_entities` to suppress noisy sensors like CPU temperature or uptime counters.
:::
### Event Formatting
State changes are formatted as human-readable messages based on domain:
| Domain | Format |
|--------|--------|
| `climate` | "HVAC mode changed from 'off' to 'heat' (current: 21, target: 23)" |
| `sensor` | "changed from 21°C to 22°C" |
| `binary_sensor` | "triggered" / "cleared" |
| `light`, `switch`, `fan` | "turned on" / "turned off" |
| `alarm_control_panel` | "alarm state changed from 'armed_away' to 'triggered'" |
| *(other)* | "changed from 'old' to 'new'" |
### Agent Responses
Outbound messages from the agent are delivered as **Home Assistant persistent notifications** (via `persistent_notification.create`). These appear in the HA notification panel with the title "Hermes Agent".
### Connection Management
- **WebSocket** with 30-second heartbeat for real-time events
- **Automatic reconnection** with backoff: 5s → 10s → 30s → 60s
- **REST API** for outbound notifications (separate session to avoid WebSocket conflicts)
- **Authorization** — HA events are always authorized (no user allowlist needed, since the `HASS_TOKEN` authenticates the connection)
## Security
The Home Assistant tools enforce security restrictions:
:::warning Blocked Domains
The following service domains are **blocked** to prevent arbitrary code execution on the HA host:
- `shell_command` — arbitrary shell commands
- `command_line` — sensors/switches that execute commands
- `python_script` — scripted Python execution
- `pyscript` — broader scripting integration
- `hassio` — addon control, host shutdown/reboot
- `rest_command` — HTTP requests from HA server (SSRF vector)
Attempting to call services in these domains returns an error.
:::
Entity IDs are validated against the pattern `^[a-z_][a-z0-9_]*\.[a-z0-9_]+$` to prevent injection attacks.
## Example Automations
### Morning Routine
```
User: Start my morning routine
Agent:
1. ha_call_service(domain="light", service="turn_on",
entity_id="light.bedroom", data={"brightness": 128})
2. ha_call_service(domain="climate", service="set_temperature",
entity_id="climate.thermostat", data={"temperature": 22})
3. ha_call_service(domain="media_player", service="turn_on",
entity_id="media_player.kitchen_speaker")
```
### Security Check
```
User: Is the house secure?
Agent:
1. ha_list_entities(domain="binary_sensor")
→ checks door/window sensors
2. ha_get_state(entity_id="alarm_control_panel.home")
→ checks alarm status
3. ha_list_entities(domain="lock")
→ checks lock states
4. Reports: "All doors closed, alarm is armed_away, all locks engaged."
```
### Reactive Automation (via Gateway Events)
When connected as a gateway platform, the agent can react to events:
```
[Home Assistant] Front Door: triggered (was cleared)
Agent automatically:
1. ha_get_state(entity_id="binary_sensor.front_door")
2. ha_call_service(domain="light", service="turn_on",
entity_id="light.hallway")
3. Sends notification: "Front door opened. Hallway lights turned on."
```
---
# Webhooks
# Webhooks
Receive events from external services (GitHub, GitLab, JIRA, Stripe, etc.) and trigger Hermes agent runs automatically. The webhook adapter runs an HTTP server that accepts POST requests, validates HMAC signatures, transforms payloads into agent prompts, and routes responses back to the source or to another configured platform.
The agent processes the event and can respond by posting comments on PRs, sending messages to Telegram/Discord, or logging the result.
## Video Tutorial
---
## Quick Start
1. Enable via `hermes gateway setup` or environment variables
2. Define routes in `config.yaml` **or** create them dynamically with `hermes webhook subscribe`
3. Point your service at `http://your-server:8644/webhooks/`
---
## Setup
There are two ways to enable the webhook adapter.
### Via setup wizard
```bash
hermes gateway setup
```
Follow the prompts to enable webhooks, set the port, and set a global HMAC secret.
### Via environment variables
Add to `~/.hermes/.env`:
```bash
WEBHOOK_ENABLED=true
WEBHOOK_PORT=8644 # default
WEBHOOK_SECRET=your-global-secret
```
### Verify the server
Once the gateway is running:
```bash
curl http://localhost:8644/health
```
Expected response:
```json
{"status": "ok", "platform": "webhook"}
```
---
## Configuring Routes {#configuring-routes}
Routes define how different webhook sources are handled. Each route is a named entry under `platforms.webhook.extra.routes` in your `config.yaml`.
### Route properties
| Property | Required | Description |
|----------|----------|-------------|
| `events` | No | List of event types to accept (e.g. `["pull_request"]`). If empty, all events are accepted. Event type is read from `X-GitHub-Event`, `X-GitLab-Event`, or `event_type` in the payload. |
| `secret` | **Yes** | HMAC secret for signature validation. Falls back to the global `secret` if not set on the route. Set to `"INSECURE_NO_AUTH"` for testing only (skips validation). |
| `prompt` | No | Template string with dot-notation payload access (e.g. `{pull_request.title}`). If omitted, the full JSON payload is dumped into the prompt. |
| `skills` | No | List of skill names to load for the agent run. |
| `deliver` | No | Where to send the response: `github_comment`, `telegram`, `discord`, `slack`, `signal`, `sms`, `whatsapp`, `matrix`, `mattermost`, `homeassistant`, `email`, `dingtalk`, `feishu`, `wecom`, `weixin`, `bluebubbles`, `qqbot`, or `log` (default). |
| `deliver_extra` | No | Additional delivery config — keys depend on `deliver` type (e.g. `repo`, `pr_number`, `chat_id`). Values support the same `{dot.notation}` templates as `prompt`. |
| `deliver_only` | No | If `true`, skip the agent entirely — the rendered `prompt` template becomes the literal message that gets delivered. Zero LLM cost, sub-second delivery. See [Direct Delivery Mode](#direct-delivery-mode) for use cases. Requires `deliver` to be a real target (not `log`). |
### Full example
```yaml
platforms:
webhook:
enabled: true
extra:
port: 8644
secret: "global-fallback-secret"
routes:
github-pr:
events: ["pull_request"]
secret: "github-webhook-secret"
prompt: |
Review this pull request:
Repository: {repository.full_name}
PR #{number}: {pull_request.title}
Author: {pull_request.user.login}
URL: {pull_request.html_url}
Diff URL: {pull_request.diff_url}
Action: {action}
skills: ["github-code-review"]
deliver: "github_comment"
deliver_extra:
repo: "{repository.full_name}"
pr_number: "{number}"
deploy-notify:
events: ["push"]
secret: "deploy-secret"
prompt: "New push to {repository.full_name} branch {ref}: {head_commit.message}"
deliver: "telegram"
```
### Prompt Templates
Prompts use dot-notation to access nested fields in the webhook payload:
- `{pull_request.title}` resolves to `payload["pull_request"]["title"]`
- `{repository.full_name}` resolves to `payload["repository"]["full_name"]`
- `{__raw__}` — special token that dumps the **entire payload** as indented JSON (truncated at 4000 characters). Useful for monitoring alerts or generic webhooks where the agent needs the full context.
- Missing keys are left as the literal `{key}` string (no error)
- Nested dicts and lists are JSON-serialized and truncated at 2000 characters
You can mix `{__raw__}` with regular template variables:
```yaml
prompt: "PR #{pull_request.number} by {pull_request.user.login}: {__raw__}"
```
If no `prompt` template is configured for a route, the entire payload is dumped as indented JSON (truncated at 4000 characters).
The same dot-notation templates work in `deliver_extra` values.
### Forum Topic Delivery
When delivering webhook responses to Telegram, you can target a specific forum topic by including `message_thread_id` (or `thread_id`) in `deliver_extra`:
```yaml
webhooks:
routes:
alerts:
events: ["alert"]
prompt: "Alert: {__raw__}"
deliver: "telegram"
deliver_extra:
chat_id: "-1001234567890"
message_thread_id: "42"
```
If `chat_id` is not provided in `deliver_extra`, the delivery falls back to the home channel configured for the target platform.
---
## GitHub PR Review (Step by Step) {#github-pr-review}
This walkthrough sets up automatic code review on every pull request.
### 1. Create the webhook in GitHub
1. Go to your repository → **Settings** → **Webhooks** → **Add webhook**
2. Set **Payload URL** to `http://your-server:8644/webhooks/github-pr`
3. Set **Content type** to `application/json`
4. Set **Secret** to match your route config (e.g. `github-webhook-secret`)
5. Under **Which events?**, select **Let me select individual events** and check **Pull requests**
6. Click **Add webhook**
### 2. Add the route config
Add the `github-pr` route to your `~/.hermes/config.yaml` as shown in the example above.
### 3. Ensure `gh` CLI is authenticated
The `github_comment` delivery type uses the GitHub CLI to post comments:
```bash
gh auth login
```
### 4. Test it
Open a pull request on the repository. The webhook fires, Hermes processes the event, and posts a review comment on the PR.
---
## GitLab Webhook Setup {#gitlab-webhook-setup}
GitLab webhooks work similarly but use a different authentication mechanism. GitLab sends the secret as a plain `X-Gitlab-Token` header (exact string match, not HMAC).
### 1. Create the webhook in GitLab
1. Go to your project → **Settings** → **Webhooks**
2. Set the **URL** to `http://your-server:8644/webhooks/gitlab-mr`
3. Enter your **Secret token**
4. Select **Merge request events** (and any other events you want)
5. Click **Add webhook**
### 2. Add the route config
```yaml
platforms:
webhook:
enabled: true
extra:
routes:
gitlab-mr:
events: ["merge_request"]
secret: "your-gitlab-secret-token"
prompt: |
Review this merge request:
Project: {project.path_with_namespace}
MR !{object_attributes.iid}: {object_attributes.title}
Author: {object_attributes.last_commit.author.name}
URL: {object_attributes.url}
Action: {object_attributes.action}
deliver: "log"
```
---
## Delivery Options {#delivery-options}
The `deliver` field controls where the agent's response goes after processing the webhook event.
| Deliver Type | Description |
|-------------|-------------|
| `log` | Logs the response to the gateway log output. This is the default and is useful for testing. |
| `github_comment` | Posts the response as a PR/issue comment via the `gh` CLI. Requires `deliver_extra.repo` and `deliver_extra.pr_number`. The `gh` CLI must be installed and authenticated on the gateway host (`gh auth login`). |
| `telegram` | Routes the response to Telegram. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `discord` | Routes the response to Discord. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `slack` | Routes the response to Slack. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `signal` | Routes the response to Signal. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `sms` | Routes the response to SMS via Twilio. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `whatsapp` | Routes the response to WhatsApp. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `matrix` | Routes the response to Matrix. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `mattermost` | Routes the response to Mattermost. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `homeassistant` | Routes the response to Home Assistant. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `email` | Routes the response to Email. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `dingtalk` | Routes the response to DingTalk. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `feishu` | Routes the response to Feishu/Lark. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `wecom` | Routes the response to WeCom. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `weixin` | Routes the response to Weixin (WeChat). Uses the home channel, or specify `chat_id` in `deliver_extra`. |
| `bluebubbles` | Routes the response to BlueBubbles (iMessage). Uses the home channel, or specify `chat_id` in `deliver_extra`. |
For cross-platform delivery, the target platform must also be enabled and connected in the gateway. If no `chat_id` is provided in `deliver_extra`, the response is sent to that platform's configured home channel.
---
## Direct Delivery Mode {#direct-delivery-mode}
By default, every webhook POST triggers an agent run — the payload becomes a prompt, the agent processes it, and the agent's response is delivered. This costs LLM tokens on every event.
For use cases where you just want to **push a plain notification** — no reasoning, no agent loop, just deliver the message — set `deliver_only: true` on the route. The rendered `prompt` template becomes the literal message body, and the adapter dispatches it directly to the configured delivery target.
### When to use direct delivery
- **External service push** — Supabase/Firebase webhook fires on a database change → notify a user in Telegram instantly
- **Monitoring alerts** — Datadog/Grafana alert webhook → push to a Discord channel
- **Inter-agent pings** — Agent A notifies Agent B's user that a long-running task finished
- **Background job completion** — Cron job finishes → post result to Slack
Benefits:
- **Zero LLM tokens** — the agent is never invoked
- **Sub-second delivery** — a single adapter call, no reasoning loop
- **Same security as agent mode** — HMAC auth, rate limits, idempotency, and body-size limits all still apply
- **Synchronous response** — the POST returns `200 OK` once delivery succeeds, or `502` if the target rejects it, so your upstream service can retry intelligently
### Example: Telegram push from Supabase
```yaml
platforms:
webhook:
enabled: true
extra:
port: 8644
secret: "global-secret"
routes:
antenna-matches:
secret: "antenna-webhook-secret"
deliver: "telegram"
deliver_only: true
prompt: "🎉 New match: {match.user_name} matched with you!"
deliver_extra:
chat_id: "{match.telegram_chat_id}"
```
Your Supabase edge function signs the payload with HMAC-SHA256 and POSTs to `https://your-server:8644/webhooks/antenna-matches`. The webhook adapter validates the signature, renders the template from the payload, delivers to Telegram, and returns `200 OK`.
### Example: Dynamic subscription via CLI
```bash
hermes webhook subscribe antenna-matches \
--deliver telegram \
--deliver-chat-id "123456789" \
--deliver-only \
--prompt "🎉 New match: {match.user_name} matched with you!" \
--description "Antenna match notifications"
```
### Response codes
| Status | Meaning |
|--------|---------|
| `200 OK` | Delivered successfully. Body: `{"status": "delivered", "route": "...", "target": "...", "delivery_id": "..."}` |
| `200 OK` (status=duplicate) | Duplicate `X-GitHub-Delivery` ID within the idempotency TTL (1 hour). Not re-delivered. |
| `401 Unauthorized` | HMAC signature invalid or missing. |
| `400 Bad Request` | Malformed JSON body. |
| `404 Not Found` | Unknown route name. |
| `413 Payload Too Large` | Body exceeded `max_body_bytes`. |
| `429 Too Many Requests` | Route rate limit exceeded. |
| `502 Bad Gateway` | Target adapter rejected the message or raised. The error is logged server-side; the response body is a generic `Delivery failed` to avoid leaking adapter internals. |
### Configuration gotchas
- `deliver_only: true` requires `deliver` to be a real target. `deliver: log` (or omitting `deliver`) is rejected at startup — the adapter refuses to start if it finds a misconfigured route.
- The `skills` field is ignored in direct delivery mode (no agent runs, so there's nothing to inject skills into).
- Template rendering uses the same `{dot.notation}` syntax as agent mode, including the `{__raw__}` token.
- Idempotency uses the same `X-GitHub-Delivery` / `X-Request-ID` header — retries with the same ID return `status=duplicate` and do NOT re-deliver.
---
## Dynamic Subscriptions (CLI) {#dynamic-subscriptions}
In addition to static routes in `config.yaml`, you can create webhook subscriptions dynamically using the `hermes webhook` CLI command. This is especially useful when the agent itself needs to set up event-driven triggers.
### Create a subscription
```bash
hermes webhook subscribe github-issues \
--events "issues" \
--prompt "New issue #{issue.number}: {issue.title}\nBy: {issue.user.login}\n\n{issue.body}" \
--deliver telegram \
--deliver-chat-id "-100123456789" \
--description "Triage new GitHub issues"
```
This returns the webhook URL and an auto-generated HMAC secret. Configure your service to POST to that URL.
### List subscriptions
```bash
hermes webhook list
```
### Remove a subscription
```bash
hermes webhook remove github-issues
```
### Test a subscription
```bash
hermes webhook test github-issues
hermes webhook test github-issues --payload '{"issue": {"number": 42, "title": "Test"}}'
```
### How dynamic subscriptions work
- Subscriptions are stored in `~/.hermes/webhook_subscriptions.json`
- The webhook adapter hot-reloads this file on each incoming request (mtime-gated, negligible overhead)
- Static routes from `config.yaml` always take precedence over dynamic ones with the same name
- Dynamic subscriptions use the same route format and capabilities as static routes (events, prompt templates, skills, delivery)
- No gateway restart required — subscribe and it's immediately live
### Agent-driven subscriptions
The agent can create subscriptions via the terminal tool when guided by the `webhook-subscriptions` skill. Ask the agent to "set up a webhook for GitHub issues" and it will run the appropriate `hermes webhook subscribe` command.
---
## Security {#security}
The webhook adapter includes multiple layers of security:
### HMAC signature validation
The adapter validates incoming webhook signatures using the appropriate method for each source:
- **GitHub**: `X-Hub-Signature-256` header — HMAC-SHA256 hex digest prefixed with `sha256=`
- **GitLab**: `X-Gitlab-Token` header — plain secret string match
- **Generic**: `X-Webhook-Signature` header — raw HMAC-SHA256 hex digest
If a secret is configured but no recognized signature header is present, the request is rejected.
### Secret is required
Every route must have a secret — either set directly on the route or inherited from the global `secret`. Routes without a secret cause the adapter to fail at startup with an error. For development/testing only, you can set the secret to `"INSECURE_NO_AUTH"` to skip validation entirely.
### Rate limiting
Each route is rate-limited to **30 requests per minute** by default (fixed-window). Configure this globally:
```yaml
platforms:
webhook:
extra:
rate_limit: 60 # requests per minute
```
Requests exceeding the limit receive a `429 Too Many Requests` response.
### Idempotency
Delivery IDs (from `X-GitHub-Delivery`, `X-Request-ID`, or a timestamp fallback) are cached for **1 hour**. Duplicate deliveries (e.g. webhook retries) are silently skipped with a `200` response, preventing duplicate agent runs.
### Body size limits
Payloads exceeding **1 MB** are rejected before the body is read. Configure this:
```yaml
platforms:
webhook:
extra:
max_body_bytes: 2097152 # 2 MB
```
### Prompt injection risk
:::warning
Webhook payloads contain attacker-controlled data — PR titles, commit messages, issue descriptions, etc. can all contain malicious instructions. Run the gateway in a sandboxed environment (Docker, VM) when exposed to the internet. Consider using the Docker or SSH terminal backend for isolation.
:::
---
## Troubleshooting {#troubleshooting}
### Webhook not arriving
- Verify the port is exposed and accessible from the webhook source
- Check firewall rules — port `8644` (or your configured port) must be open
- Verify the URL path matches: `http://your-server:8644/webhooks/`
- Use the `/health` endpoint to confirm the server is running
### Signature validation failing
- Ensure the secret in your route config exactly matches the secret configured in the webhook source
- For GitHub, the secret is HMAC-based — check `X-Hub-Signature-256`
- For GitLab, the secret is a plain token match — check `X-Gitlab-Token`
- Check gateway logs for `Invalid signature` warnings
### Event being ignored
- Check that the event type is in your route's `events` list
- GitHub events use values like `pull_request`, `push`, `issues` (the `X-GitHub-Event` header value)
- GitLab events use values like `merge_request`, `push` (the `X-GitLab-Event` header value)
- If `events` is empty or not set, all events are accepted
### Agent not responding
- Run the gateway in foreground to see logs: `hermes gateway run`
- Check that the prompt template is rendering correctly
- Verify the delivery target is configured and connected
### Duplicate responses
- The idempotency cache should prevent this — check that the webhook source is sending a delivery ID header (`X-GitHub-Delivery` or `X-Request-ID`)
- Delivery IDs are cached for 1 hour
### `gh` CLI errors (GitHub comment delivery)
- Run `gh auth login` on the gateway host
- Ensure the authenticated GitHub user has write access to the repository
- Check that `gh` is installed and on the PATH
---
## Environment Variables {#environment-variables}
| Variable | Description | Default |
|----------|-------------|---------|
| `WEBHOOK_ENABLED` | Enable the webhook platform adapter | `false` |
| `WEBHOOK_PORT` | HTTP server port for receiving webhooks | `8644` |
| `WEBHOOK_SECRET` | Global HMAC secret (used as fallback when routes don't specify their own) | _(none)_ |
---
# Integrations
# Integrations
Hermes Agent connects to external systems for AI inference, tool servers, IDE workflows, programmatic access, and more. These integrations extend what Hermes can do and where it can run.
## AI Providers & Routing
Hermes supports multiple AI inference providers out of the box. Use `hermes model` to configure interactively, or set them in `config.yaml`.
- **[AI Providers](/docs/user-guide/features/provider-routing)** — OpenRouter, Anthropic, OpenAI, Google, and any OpenAI-compatible endpoint. Hermes auto-detects capabilities like vision, streaming, and tool use per provider.
- **[Provider Routing](/docs/user-guide/features/provider-routing)** — Fine-grained control over which underlying providers handle your OpenRouter requests. Optimize for cost, speed, or quality with sorting, whitelists, blacklists, and explicit priority ordering.
- **[Fallback Providers](/docs/user-guide/features/fallback-providers)** — Automatic failover to backup LLM providers when your primary model encounters errors. Includes primary model fallback and independent auxiliary task fallback for vision, compression, and web extraction.
## Tool Servers (MCP)
- **[MCP Servers](/docs/user-guide/features/mcp)** — Connect Hermes to external tool servers via Model Context Protocol. Access tools from GitHub, databases, file systems, browser stacks, internal APIs, and more without writing native Hermes tools. Supports both stdio and SSE transports, per-server tool filtering, and capability-aware resource/prompt registration.
## Web Search Backends
The `web_search` and `web_extract` tools support four backend providers, configured via `config.yaml` or `hermes tools`:
| Backend | Env Var | Search | Extract | Crawl |
|---------|---------|--------|---------|-------|
| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — |
Quick setup example:
```yaml
web:
backend: firecrawl # firecrawl | parallel | tavily | exa
```
If `web.backend` is not set, the backend is auto-detected from whichever API key is available. Self-hosted Firecrawl is also supported via `FIRECRAWL_API_URL`.
## Browser Automation
Hermes includes full browser automation with multiple backend options for navigating websites, filling forms, and extracting information:
- **Browserbase** — Managed cloud browsers with anti-bot tooling, CAPTCHA solving, and residential proxies
- **Browser Use** — Alternative cloud browser provider
- **Local Chrome via CDP** — Connect to your running Chrome instance using `/browser connect`
- **Local Chromium** — Headless local browser via the `agent-browser` CLI
See [Browser Automation](/docs/user-guide/features/browser) for setup and usage.
## Voice & TTS Providers
Text-to-speech and speech-to-text across all messaging platforms:
| Provider | Quality | Cost | API Key |
||----------|---------|------|---------|
|| **Edge TTS** (default) | Good | Free | None needed |
|| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
|| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
|| **MiniMax** | Good | Paid | `MINIMAX_API_KEY` |
|| **NeuTTS** | Good | Free | None needed |
Speech-to-text supports six providers: local faster-whisper (free, runs on-device), a local command wrapper, Groq, OpenAI Whisper API, Mistral, and xAI. Voice message transcription works across Telegram, Discord, WhatsApp, and other messaging platforms. See [Voice & TTS](/docs/user-guide/features/tts) and [Voice Mode](/docs/user-guide/features/voice-mode) for details.
## IDE & Editor Integration
- **[IDE Integration (ACP)](/docs/user-guide/features/acp)** — Use Hermes Agent inside ACP-compatible editors such as VS Code, Zed, and JetBrains. Hermes runs as an ACP server, rendering chat messages, tool activity, file diffs, and terminal commands inside your editor.
## Programmatic Access
- **[API Server](/docs/user-guide/features/api-server)** — Expose Hermes as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox — can connect and use Hermes as a backend with its full toolset.
## Memory & Personalization
- **[Built-in Memory](/docs/user-guide/features/memory)** — Persistent, curated memory via `MEMORY.md` and `USER.md` files. The agent maintains bounded stores of personal notes and user profile data that survive across sessions.
- **[Memory Providers](/docs/user-guide/features/memory-providers)** — Plug in external memory backends for deeper personalization. Eight providers are supported: Honcho (dialectic reasoning), OpenViking (tiered retrieval), Mem0 (cloud extraction), Hindsight (knowledge graphs), Holographic (local SQLite), RetainDB (hybrid search), ByteRover (CLI-based), and Supermemory.
## Messaging Platforms
Hermes runs as a gateway bot on 19+ messaging platforms, all configured through the same `gateway` subsystem:
- **[Telegram](/docs/user-guide/messaging/telegram)**, **[Discord](/docs/user-guide/messaging/discord)**, **[Slack](/docs/user-guide/messaging/slack)**, **[WhatsApp](/docs/user-guide/messaging/whatsapp)**, **[Signal](/docs/user-guide/messaging/signal)**, **[Matrix](/docs/user-guide/messaging/matrix)**, **[Mattermost](/docs/user-guide/messaging/mattermost)**, **[Email](/docs/user-guide/messaging/email)**, **[SMS](/docs/user-guide/messaging/sms)**, **[DingTalk](/docs/user-guide/messaging/dingtalk)**, **[Feishu/Lark](/docs/user-guide/messaging/feishu)**, **[WeCom](/docs/user-guide/messaging/wecom)**, **[WeCom Callback](/docs/user-guide/messaging/wecom-callback)**, **[Weixin](/docs/user-guide/messaging/weixin)**, **[BlueBubbles](/docs/user-guide/messaging/bluebubbles)**, **[QQ Bot](/docs/user-guide/messaging/qqbot)**, **[Yuanbao](/docs/user-guide/messaging/yuanbao)**, **[Home Assistant](/docs/user-guide/messaging/homeassistant)**, **[Microsoft Teams](/docs/user-guide/messaging/teams)**, **[Webhooks](/docs/user-guide/messaging/webhooks)**
See the [Messaging Gateway overview](/docs/user-guide/messaging) for the platform comparison table and setup guide.
## Home Automation
- **[Home Assistant](/docs/user-guide/messaging/homeassistant)** — Control smart home devices via four dedicated tools (`ha_list_entities`, `ha_get_state`, `ha_list_services`, `ha_call_service`). The Home Assistant toolset activates automatically when `HASS_TOKEN` is configured.
## Plugins
- **[Plugin System](/docs/user-guide/features/plugins)** — Extend Hermes with custom tools, lifecycle hooks, and CLI commands without modifying core code. Plugins are discovered from `~/.hermes/plugins/`, project-local `.hermes/plugins/`, and pip-installed entry points.
- **[Build a Plugin](/docs/guides/build-a-hermes-plugin)** — Step-by-step guide for creating Hermes plugins with tools, hooks, and CLI commands.
## Training & Evaluation
- **[RL Training](/docs/user-guide/features/rl-training)** — Generate trajectory data from agent sessions for reinforcement learning and model fine-tuning. Supports Atropos environments with customizable reward functions.
- **[Batch Processing](/docs/user-guide/features/batch-processing)** — Run the agent across hundreds of prompts in parallel, generating structured ShareGPT-format trajectory data for training data generation or evaluation.
---
# AI Providers
# AI Providers
This page covers setting up inference providers for Hermes Agent — from cloud APIs like OpenRouter and Anthropic, to self-hosted endpoints like Ollama and vLLM, to advanced routing and fallback configurations. You need at least one provider configured to use Hermes.
## Inference Providers
You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
| Provider | Setup |
|----------|-------|
| **Nous Portal** | `hermes model` (OAuth, subscription-based) |
| **OpenAI Codex** | `hermes model` (ChatGPT OAuth, uses Codex models) |
| **GitHub Copilot** | `hermes model` (OAuth device code flow, `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, or `gh auth token`) |
| **GitHub Copilot ACP** | `hermes model` (spawns local `copilot --acp --stdio`) |
| **Anthropic** | `hermes model` (Claude Max + extra usage credits via OAuth; also supports Anthropic API key or manual setup-token — see note below) |
| **OpenRouter** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
| **AI Gateway** | `AI_GATEWAY_API_KEY` in `~/.hermes/.env` (provider: `ai-gateway`) |
| **z.ai / GLM** | `GLM_API_KEY` in `~/.hermes/.env` (provider: `zai`) |
| **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) |
| **Kimi / Moonshot (China)** | `KIMI_CN_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding-cn`; aliases: `kimi-cn`, `moonshot-cn`) |
| **Arcee AI** | `ARCEEAI_API_KEY` in `~/.hermes/.env` (provider: `arcee`; aliases: `arcee-ai`, `arceeai`) |
| **GMI Cloud** | `GMI_API_KEY` in `~/.hermes/.env` (provider: `gmi`; aliases: `gmi-cloud`, `gmicloud`) |
| **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) |
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`) |
| **Alibaba Coding Plan** | `DASHSCOPE_API_KEY` (provider: `alibaba-coding-plan`, alias: `alibaba_coding`) — separate billing SKU, different endpoint |
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
| **Xiaomi MiMo** | `XIAOMI_API_KEY` in `~/.hermes/.env` (provider: `xiaomi`, aliases: `mimo`, `xiaomi-mimo`) |
| **Tencent TokenHub** | `TOKENHUB_API_KEY` in `~/.hermes/.env` (provider: `tencent-tokenhub`, aliases: `tencent`, `tokenhub`, `tencentmaas`) |
| **OpenCode Zen** | `OPENCODE_ZEN_API_KEY` in `~/.hermes/.env` (provider: `opencode-zen`) |
| **OpenCode Go** | `OPENCODE_GO_API_KEY` in `~/.hermes/.env` (provider: `opencode-go`) |
| **DeepSeek** | `DEEPSEEK_API_KEY` in `~/.hermes/.env` (provider: `deepseek`) |
| **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) |
| **Google / Gemini** | `GOOGLE_API_KEY` (or `GEMINI_API_KEY`) in `~/.hermes/.env` (provider: `gemini`) |
| **Google Gemini (OAuth)** | `hermes model` → "Google Gemini (OAuth)" (provider: `google-gemini-cli`, free tier supported, browser PKCE login) |
| **LM Studio** | `hermes model` → "LM Studio" (provider: `lmstudio`, optional `LM_API_KEY`) |
| **Custom Endpoint** | `hermes model` → choose "Custom endpoint" (saved in `config.yaml`) |
For the official API-key path, see the dedicated [Google Gemini guide](/docs/guides/google-gemini).
:::tip Model key alias
In the `model:` config section, you can use either `default:` or `model:` as the key name for your model ID. Both `model: { default: my-model }` and `model: { model: my-model }` work identically.
:::
### Google Gemini via OAuth (`google-gemini-cli`)
The `google-gemini-cli` provider uses Google's Cloud Code Assist backend — the
same API that Google's own `gemini-cli` tool uses. This supports both the
**free tier** (generous daily quota for personal accounts) and **paid tiers**
(Standard/Enterprise via a GCP project).
**Quick start:**
```bash
hermes model
# → pick "Google Gemini (OAuth)"
# → see policy warning, confirm
# → browser opens to accounts.google.com, sign in
# → done — Hermes auto-provisions your free tier on first request
```
Hermes ships Google's **public** `gemini-cli` desktop OAuth client by default —
the same credentials Google includes in their open-source `gemini-cli`. Desktop
OAuth clients are not confidential (PKCE provides the security). You do not
need to install `gemini-cli` or register your own GCP OAuth client.
**How auth works:**
- PKCE Authorization Code flow against `accounts.google.com`
- Browser callback at `http://127.0.0.1:8085/oauth2callback` (with ephemeral-port fallback if busy)
- Tokens stored at `~/.hermes/auth/google_oauth.json` (chmod 0600, atomic write, cross-process `fcntl` lock)
- Automatic refresh 60 s before expiry
- Headless environments (SSH, `HERMES_HEADLESS=1`) → paste-mode fallback
- Inflight refresh deduplication — two concurrent requests won't double-refresh
- `invalid_grant` (revoked refresh) → credential file wiped, user prompted to re-login
**How inference works:**
- Traffic goes to `https://cloudcode-pa.googleapis.com/v1internal:generateContent`
(or `:streamGenerateContent?alt=sse` for streaming), NOT the paid `v1beta/openai` endpoint
- Request body wrapped `{project, model, user_prompt_id, request}`
- OpenAI-shaped `messages[]`, `tools[]`, `tool_choice` are translated to Gemini's native
`contents[]`, `tools[].functionDeclarations`, `toolConfig` shape
- Responses translated back to OpenAI shape so the rest of Hermes works unchanged
**Tiers & project IDs:**
| Your situation | What to do |
|---|---|
| Personal Google account, want free tier | Nothing — sign in, start chatting |
| Workspace / Standard / Enterprise account | Set `HERMES_GEMINI_PROJECT_ID` or `GOOGLE_CLOUD_PROJECT` to your GCP project ID |
| VPC-SC-protected org | Hermes detects `SECURITY_POLICY_VIOLATED` and forces `standard-tier` automatically |
Free tier auto-provisions a Google-managed project on first use. No GCP setup required.
**Quota monitoring:**
```
/gquota
```
Shows remaining Code Assist quota per model with progress bars:
```
Gemini Code Assist quota (project: 123-abc)
gemini-2.5-pro ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░ 85%
gemini-2.5-flash [input] ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░ 92%
```
:::warning Policy risk
Google considers using the Gemini CLI OAuth client with third-party software a
policy violation. Some users have reported account restrictions. For the lowest-risk
experience, use your own API key via the `gemini` provider instead. Hermes shows
an upfront warning and requires explicit confirmation before OAuth begins.
:::
**Custom OAuth client (optional):**
If you'd rather register your own Google OAuth client — e.g., to keep quota
and consent scoped to your own GCP project — set:
```bash
HERMES_GEMINI_CLIENT_ID=your-client.apps.googleusercontent.com
HERMES_GEMINI_CLIENT_SECRET=... # optional for Desktop clients
```
Register a **Desktop app** OAuth client at
[console.cloud.google.com/apis/credentials](https://console.cloud.google.com/apis/credentials)
with the Generative Language API enabled.
:::info Codex Note
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Hermes stores the resulting credentials in its own auth store under `~/.hermes/auth.json` and can import existing Codex CLI credentials from `~/.codex/auth.json` when present. No Codex CLI installation is required.
:::
:::warning
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model. By default (`auxiliary.*.provider: "auto"`), Hermes routes these tasks to your **main chat model** — the same model you picked in `hermes model`. You can override each task individually to route it to a cheaper/faster model (e.g. Gemini Flash on OpenRouter) — see [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models).
:::
:::tip Nous Tool Gateway
Paid Nous Portal subscribers also get access to the **[Tool Gateway](/docs/user-guide/features/tool-gateway)** — web search, image generation, TTS, and browser automation routed through your subscription. No extra API keys needed. It's offered automatically during `hermes model` setup, or enable it later with `hermes tools`.
:::
### Two Commands for Model Management
Hermes has **two** model commands that serve different purposes:
| Command | Where to run | What it does |
|---------|-------------|--------------|
| **`hermes model`** | Your terminal (outside any session) | Full setup wizard — add providers, run OAuth, enter API keys, configure endpoints |
| **`/model`** | Inside a Hermes chat session | Quick switch between **already-configured** providers and models |
If you're trying to switch to a provider you haven't set up yet (e.g. you only have OpenRouter configured and want to use Anthropic), you need `hermes model`, not `/model`. Exit your session first (`Ctrl+C` or `/quit`), run `hermes model`, complete the provider setup, then start a new session.
### Anthropic (Native)
Use Claude models directly through the Anthropic API — no OpenRouter proxy needed. Supports three auth methods:
:::caution Requires Claude Max "extra usage" credits
When you authenticate via `hermes model` → Anthropic OAuth (or via `hermes auth add anthropic --type oauth`), Hermes routes as Claude Code against your Anthropic account. **It only works if you're on a Claude Max plan and have purchased extra usage credits.** The base Max plan allowance (the usage included in Claude Code by default) is not consumed by Hermes — only the extra/overage credits you've added on top are. Claude Pro subscribers cannot use this path.
If you don't have Max + extra credits, use an `ANTHROPIC_API_KEY` instead — requests are billed pay-per-token against that key's organization (standard API pricing, independent of any Claude subscription).
:::
```bash
# With an API key (pay-per-token)
export ANTHROPIC_API_KEY=***
hermes chat --provider anthropic --model claude-sonnet-4-6
# Preferred: authenticate through `hermes model`
# Hermes will use Claude Code's credential store directly when available
hermes model
# Manual override with a setup-token (fallback / legacy)
export ANTHROPIC_TOKEN=*** # setup-token or manual OAuth token
hermes chat --provider anthropic
# Auto-detect Claude Code credentials (if you already use Claude Code)
hermes chat --provider anthropic # reads Claude Code credential files automatically
```
When you choose Anthropic OAuth through `hermes model`, Hermes prefers Claude Code's own credential store over copying the token into `~/.hermes/.env`. That keeps refreshable Claude credentials refreshable.
Or set it permanently:
```yaml
model:
provider: "anthropic"
default: "claude-sonnet-4-6"
```
:::tip Aliases
`--provider claude` and `--provider claude-code` also work as shorthand for `--provider anthropic`.
:::
### GitHub Copilot
Hermes supports GitHub Copilot as a first-class provider with two modes:
**`copilot` — Direct Copilot API** (recommended). Uses your GitHub Copilot subscription to access GPT-5.x, Claude, Gemini, and other models through the Copilot API.
```bash
hermes chat --provider copilot --model gpt-5.4
```
**Authentication options** (checked in this order):
1. `COPILOT_GITHUB_TOKEN` environment variable
2. `GH_TOKEN` environment variable
3. `GITHUB_TOKEN` environment variable
4. `gh auth token` CLI fallback
If no token is found, `hermes model` offers an **OAuth device code login** — the same flow used by the Copilot CLI and opencode.
:::warning Token types
The Copilot API does **not** support classic Personal Access Tokens (`ghp_*`). Supported token types:
| Type | Prefix | How to get |
|------|--------|------------|
| OAuth token | `gho_` | `hermes model` → GitHub Copilot → Login with GitHub |
| Fine-grained PAT | `github_pat_` | GitHub Settings → Developer settings → Fine-grained tokens (needs **Copilot Requests** permission) |
| GitHub App token | `ghu_` | Via GitHub App installation |
If your `gh auth token` returns a `ghp_*` token, use `hermes model` to authenticate via OAuth instead.
:::
:::info Copilot auth behavior in Hermes
Hermes sends a supported GitHub token (`gho_*`, `github_pat_*`, or `ghu_*`) directly to `api.githubcopilot.com` and includes Copilot-specific headers (`Editor-Version`, `Copilot-Integration-Id`, `Openai-Intent`, `x-initiator`).
On HTTP 401, Hermes now performs a one-shot credential recovery before fallback:
1. Re-resolve token via the normal priority chain (`COPILOT_GITHUB_TOKEN` → `GH_TOKEN` → `GITHUB_TOKEN` → `gh auth token`)
2. Rebuild the shared OpenAI client with refreshed headers
3. Retry the request once
Some older community proxies use `api.github.com/copilot_internal/v2/token` exchange flows. That endpoint can be unavailable for some account types (returns 404). Hermes therefore keeps direct-token auth as the primary path and relies on runtime credential refresh + retry for robustness.
:::
**API routing**: GPT-5+ models (except `gpt-5-mini`) automatically use the Responses API. All other models (GPT-4o, Claude, Gemini, etc.) use Chat Completions. Models are auto-detected from the live Copilot catalog.
**`copilot-acp` — Copilot ACP agent backend**. Spawns the local Copilot CLI as a subprocess:
```bash
hermes chat --provider copilot-acp --model copilot-acp
# Requires the GitHub Copilot CLI in PATH and an existing `copilot login` session
```
**Permanent config:**
```yaml
model:
provider: "copilot"
default: "gpt-5.4"
```
| Environment variable | Description |
|---------------------|-------------|
| `COPILOT_GITHUB_TOKEN` | GitHub token for Copilot API (first priority) |
| `HERMES_COPILOT_ACP_COMMAND` | Override the Copilot CLI binary path (default: `copilot`) |
| `HERMES_COPILOT_ACP_ARGS` | Override ACP args (default: `--acp --stdio`) |
### First-Class API-Key Providers
These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:
```bash
# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-5
# Requires: GLM_API_KEY in ~/.hermes/.env
# Kimi / Moonshot AI (international: api.moonshot.ai)
hermes chat --provider kimi-coding --model kimi-for-coding
# Requires: KIMI_API_KEY in ~/.hermes/.env
# Kimi / Moonshot AI (China: api.moonshot.cn)
hermes chat --provider kimi-coding-cn --model kimi-k2.5
# Requires: KIMI_CN_API_KEY in ~/.hermes/.env
# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-M2.7
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-M2.7
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
# Alibaba Cloud / DashScope (Qwen models)
hermes chat --provider alibaba --model qwen3.5-plus
# Requires: DASHSCOPE_API_KEY in ~/.hermes/.env
# Xiaomi MiMo
hermes chat --provider xiaomi --model mimo-v2-pro
# Requires: XIAOMI_API_KEY in ~/.hermes/.env
# Tencent TokenHub (Hy3 Preview)
hermes chat --provider tencent-tokenhub --model hy3-preview
# Requires: TOKENHUB_API_KEY in ~/.hermes/.env
# Arcee AI (Trinity models)
hermes chat --provider arcee --model trinity-large-thinking
# Requires: ARCEEAI_API_KEY in ~/.hermes/.env
# GMI Cloud
# Use the exact model ID returned by GMI's /v1/models endpoint.
hermes chat --provider gmi --model zai-org/GLM-5.1-FP8
# Requires: GMI_API_KEY in ~/.hermes/.env
```
Or set the provider permanently in `config.yaml`:
```yaml
model:
provider: "gmi"
default: "zai-org/GLM-5.1-FP8"
```
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, `DASHSCOPE_BASE_URL`, `XIAOMI_BASE_URL`, `GMI_BASE_URL`, or `TOKENHUB_BASE_URL` environment variables.
:::note Z.AI Endpoint Auto-Detection
When using the Z.AI / GLM provider, Hermes automatically probes multiple endpoints (global, China, coding variants) to find one that accepts your API key. You don't need to set `GLM_BASE_URL` manually — the working endpoint is detected and cached automatically.
:::
### xAI (Grok) — Responses API + Prompt Caching
xAI is wired through the Responses API (`codex_responses` transport) for automatic reasoning support on Grok 4 models — no `reasoning_effort` parameter needed, the server reasons by default. Set `XAI_API_KEY` in `~/.hermes/.env` and pick xAI in `hermes model`, or drop `grok` as a shortcut into `/model grok-4-1-fast-reasoning`.
When using xAI as a provider (any base URL containing `x.ai`), Hermes automatically enables prompt caching by sending the `x-grok-conv-id` header with every API request. This routes requests to the same server within a conversation session, allowing xAI's infrastructure to reuse cached system prompts and conversation history.
No configuration is needed — caching activates automatically when an xAI endpoint is detected and a session ID is available. This reduces latency and cost for multi-turn conversations.
xAI also ships a dedicated TTS endpoint (`/v1/tts`). Select **xAI TTS** in `hermes tools` → Voice & TTS, or see the [Voice & TTS](../user-guide/features/tts.md#text-to-speech) page for config.
### Ollama Cloud — Managed Ollama Models, OAuth + API Key
[Ollama Cloud](https://ollama.com/cloud) hosts the same open-weight catalog as local Ollama but without the GPU requirement. Pick it in `hermes model` as **Ollama Cloud**, paste your API key from [ollama.com/settings/keys](https://ollama.com/settings/keys), and Hermes auto-discovers the available models.
```bash
hermes model
# → pick "Ollama Cloud"
# → paste your OLLAMA_API_KEY
# → select from discovered models (gpt-oss:120b, glm-4.6:cloud, qwen3-coder:480b-cloud, etc.)
```
Or `config.yaml` directly:
```yaml
model:
provider: "ollama-cloud"
default: "gpt-oss:120b"
```
The model catalog is fetched dynamically from `ollama.com/v1/models` and cached for one hour. `model:tag` notation (e.g. `qwen3-coder:480b-cloud`) is preserved through normalization — don't use dashes.
:::tip Ollama Cloud vs local Ollama
Both speak the same OpenAI-compatible API. Cloud is a first-class provider (`--provider ollama-cloud`, `OLLAMA_API_KEY`); local Ollama is reached via the Custom Endpoint flow (base URL `http://localhost:11434/v1`, no key). Use cloud for large models you can't run locally; use local for privacy or offline work.
:::
### AWS Bedrock
Anthropic Claude, Amazon Nova, DeepSeek v3.2, Meta Llama 4, and other models via AWS Bedrock. Uses the AWS SDK (`boto3`) credential chain — no API key, just standard AWS auth.
```bash
# Simplest — named profile in ~/.aws/credentials
hermes chat --provider bedrock --model us.anthropic.claude-sonnet-4-6
# Or with explicit env vars
AWS_PROFILE=myprofile AWS_REGION=us-east-1 hermes chat --provider bedrock --model us.anthropic.claude-sonnet-4-6
```
Or permanently in `config.yaml`:
```yaml
model:
provider: "bedrock"
default: "us.anthropic.claude-sonnet-4-6"
bedrock:
region: "us-east-1" # or set AWS_REGION
# profile: "myprofile" # or set AWS_PROFILE
# discovery: true # auto-discover region from IAM
# guardrail: # optional Bedrock Guardrails
# id: "your-guardrail-id"
# version: "DRAFT"
```
Authentication uses the standard boto3 chain: explicit `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY`, `AWS_PROFILE` from `~/.aws/credentials`, IAM role on EC2/ECS/Lambda, IMDS, or SSO. No env var is required if you're already authenticated with the AWS CLI.
Bedrock uses the **Converse API** under the hood — requests are translated to Bedrock's model-agnostic shape, so the same config works for Claude, Nova, DeepSeek, and Llama models. Set `BEDROCK_BASE_URL` only if you're calling a non-default regional endpoint.
See the [AWS Bedrock guide](/docs/guides/aws-bedrock) for a walkthrough of IAM setup, region selection, and cross-region inference.
### Qwen Portal (OAuth)
Alibaba's Qwen Portal with browser-based OAuth login. Pick **Qwen OAuth (Portal)** in `hermes model`, sign in through the browser, and Hermes persists the refresh token.
```bash
hermes model
# → pick "Qwen OAuth (Portal)"
# → browser opens; sign in with your Alibaba account
# → confirm — credentials are saved to ~/.hermes/auth.json
hermes chat # uses portal.qwen.ai/v1 endpoint
```
Or configure `config.yaml`:
```yaml
model:
provider: "qwen-oauth"
default: "qwen3-coder-plus"
```
Set `HERMES_QWEN_BASE_URL` only if the portal endpoint relocates (default: `https://portal.qwen.ai/v1`).
:::tip Qwen OAuth vs DashScope (Alibaba)
`qwen-oauth` uses the consumer-facing Qwen Portal with OAuth login — ideal for individual users. The `alibaba` provider uses DashScope's enterprise API with a `DASHSCOPE_API_KEY` — ideal for programmatic / production workloads. Both route to Qwen-family models but live at different endpoints.
:::
### Alibaba Coding Plan
If you're subscribed to Alibaba's **Coding Plan** (a pricing SKU separate from standard DashScope API access), Hermes exposes it as its own first-class provider: `alibaba-coding-plan`. Endpoint: `https://coding-intl.dashscope.aliyuncs.com/v1`. It's OpenAI-compatible like the regular `alibaba` provider but with a different base URL and billing surface.
```yaml
model:
provider: alibaba_coding # alias for alibaba-coding-plan
model: qwen3-coder-plus
```
Or from the CLI:
```bash
hermes chat --provider alibaba_coding --model qwen3-coder-plus
```
`alibaba_coding` uses the same `DASHSCOPE_API_KEY` your `alibaba` entry already uses — no separate key needed, just a different routing target. Before this provider was registered, users who set `provider: alibaba_coding` in `config.yaml` silently fell through to OpenRouter routing.
### MiniMax (OAuth)
MiniMax-M2.7 via browser OAuth login — no API key needed. Pick **MiniMax (OAuth)** in `hermes model`, sign in through the browser, and Hermes persists the access + refresh tokens. Uses the Anthropic Messages-compatible endpoint (`/anthropic`) under the hood.
```bash
hermes model
# → pick "MiniMax (OAuth)"
# → browser opens; sign in with your MiniMax account (global or CN region)
# → confirm — credentials are saved to ~/.hermes/auth.json
hermes chat # uses api.minimax.io/anthropic endpoint
```
Or configure `config.yaml`:
```yaml
model:
provider: "minimax-oauth"
default: "MiniMax-M2.7"
```
Supported models: `MiniMax-M2.7` (main) and `MiniMax-M2.7-highspeed` (wired as the default auxiliary model). The OAuth path ignores `MINIMAX_API_KEY` / `MINIMAX_BASE_URL`.
:::tip MiniMax OAuth vs API key
`minimax-oauth` uses MiniMax's consumer-facing portal with OAuth login — no billing setup required. The `minimax` and `minimax-cn` providers use `MINIMAX_API_KEY` / `MINIMAX_CN_API_KEY` — for programmatic access. See the [MiniMax OAuth guide](/docs/guides/minimax-oauth) for a full walkthrough.
:::
### NVIDIA NIM
Nemotron and other open source models via [build.nvidia.com](https://build.nvidia.com) (free API key) or a local NIM endpoint.
```bash
# Cloud (build.nvidia.com)
hermes chat --provider nvidia --model nvidia/nemotron-3-super-120b-a12b
# Requires: NVIDIA_API_KEY in ~/.hermes/.env
# Local NIM endpoint — override base URL
NVIDIA_BASE_URL=http://localhost:8000/v1 hermes chat --provider nvidia --model nvidia/nemotron-3-super-120b-a12b
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "nvidia"
default: "nvidia/nemotron-3-super-120b-a12b"
```
:::tip Local NIM
For on-prem deployments (DGX Spark, local GPU), set `NVIDIA_BASE_URL=http://localhost:8000/v1`. NIM exposes the same OpenAI-compatible chat completions API as build.nvidia.com, so switching between cloud and local is a one-line env-var change.
:::
### GMI Cloud
Open and reasoning models via [GMI Cloud](https://inference.gmi.ai) — OpenAI-compatible API, API key authentication.
```bash
# GMI Cloud
hermes chat --provider gmi --model deepseek-ai/DeepSeek-R1
# Requires: GMI_API_KEY in ~/.hermes/.env
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "gmi"
default: "deepseek-ai/DeepSeek-R1"
```
The base URL can be overridden with `GMI_BASE_URL` (default: `https://api.gmi.ai/v1`).
### StepFun
Step-series models via [StepFun](https://platform.stepfun.com) — OpenAI-compatible API, API key authentication.
```bash
# StepFun
hermes chat --provider stepfun --model step-3-mini
# Requires: STEPFUN_API_KEY in ~/.hermes/.env
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "stepfun"
default: "step-3-mini"
```
The base URL can be overridden with `STEPFUN_BASE_URL` (default: `https://api.stepfun.com/v1`).
### Hugging Face Inference Providers
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.
```bash
# Use any available model
hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
# Requires: HF_TOKEN in ~/.hermes/.env
# Short alias
hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "huggingface"
default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
```
Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) — make sure to enable the "Make calls to Inference Providers" permission. Free tier included ($0.10/month credit, no markup on provider rates).
You can append routing suffixes to model names: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
The base URL can be overridden with `HF_BASE_URL`.
## Custom & Self-Hosted LLM Providers
Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
### General Setup
Three ways to configure a custom endpoint:
**Interactive setup (recommended):**
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name
```
**Manual config (`config.yaml`):**
```yaml
# In ~/.hermes/config.yaml
model:
default: your-model-name
provider: custom
base_url: http://localhost:8000/v1
api_key: your-key-or-leave-empty-for-local
```
:::warning Legacy env vars
`OPENAI_BASE_URL` and `LLM_MODEL` in `.env` are **removed**. Neither is read by any part of Hermes — `config.yaml` is the single source of truth for model and endpoint configuration. If you have stale entries in your `.env`, they are automatically cleared on the next `hermes setup` or config migration. Use `hermes model` or edit `config.yaml` directly.
:::
Both approaches persist to `config.yaml`, which is the source of truth for model, provider, and base URL.
### Switching Models with `/model`
:::warning hermes model vs /model
**`hermes model`** (run from your terminal, outside any chat session) is the **full provider setup wizard**. Use it to add new providers, run OAuth flows, enter API keys, and configure custom endpoints.
**`/model`** (typed inside an active Hermes chat session) can only **switch between providers and models you've already set up**. It cannot add new providers, run OAuth, or prompt for API keys. If you've only configured one provider (e.g. OpenRouter), `/model` will only show models for that provider.
**To add a new provider:** Exit your session (`Ctrl+C` or `/quit`), run `hermes model`, set up the new provider, then start a new session.
:::
Once you have at least one custom endpoint configured, you can switch models mid-session:
```
/model custom:qwen-2.5 # Switch to a model on your custom endpoint
/model custom # Auto-detect the model from the endpoint
/model openrouter:claude-sonnet-4 # Switch back to a cloud provider
```
If you have **named custom providers** configured (see below), use the triple syntax:
```
/model custom:local:qwen-2.5 # Use the "local" custom provider with model qwen-2.5
/model custom:work:llama3 # Use the "work" custom provider with llama3
```
When switching providers, Hermes persists the base URL and provider to config so the change survives restarts. When switching away from a custom endpoint to a built-in provider, the stale base URL is automatically cleared.
:::tip
`/model custom` (bare, no model name) queries your endpoint's `/models` API and auto-selects the model if exactly one is loaded. Useful for local servers running a single model.
:::
Everything below follows this same pattern — just change the URL, key, and model name.
---
### Ollama — Local Models, Zero Config
[Ollama](https://ollama.com/) runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use. Supports tool calling via the OpenAI-compatible API.
```bash
# Install and run a model
ollama pull qwen2.5-coder:32b
ollama serve # Starts on port 11434
```
Then configure Hermes:
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:11434/v1
# Skip API key (Ollama doesn't need one)
# Enter model name (e.g. qwen2.5-coder:32b)
```
Or configure `config.yaml` directly:
```yaml
model:
default: qwen2.5-coder:32b
provider: custom
base_url: http://localhost:11434/v1
context_length: 32768 # See warning below
```
:::caution Ollama defaults to very low context lengths
Ollama does **not** use your model's full context window by default. Depending on your VRAM, the default is:
| Available VRAM | Default context |
|----------------|----------------|
| Less than 24 GB | **4,096 tokens** |
| 24–48 GB | 32,768 tokens |
| 48+ GB | 256,000 tokens |
For agent use with tools, **you need at least 16k–32k context**. At 4k, the system prompt + tool schemas alone can fill the window, leaving no room for conversation.
**How to increase it** (pick one):
```bash
# Option 1: Set server-wide via environment variable (recommended)
OLLAMA_CONTEXT_LENGTH=32768 ollama serve
# Option 2: For systemd-managed Ollama
sudo systemctl edit ollama.service
# Add: Environment="OLLAMA_CONTEXT_LENGTH=32768"
# Then: sudo systemctl daemon-reload && sudo systemctl restart ollama
# Option 3: Bake it into a custom model (persistent per-model)
echo -e "FROM qwen2.5-coder:32b\nPARAMETER num_ctx 32768" > Modelfile
ollama create qwen2.5-coder-32k -f Modelfile
```
**You cannot set context length through the OpenAI-compatible API** (`/v1/chat/completions`). It must be configured server-side or via a Modelfile. This is the #1 source of confusion when integrating Ollama with tools like Hermes.
:::
**Verify your context is set correctly:**
```bash
ollama ps
# Look at the CONTEXT column — it should show your configured value
```
:::tip
List available models with `ollama list`. Pull any model from the [Ollama library](https://ollama.com/library) with `ollama pull `. Ollama handles GPU offloading automatically — no configuration needed for most setups.
:::
---
### vLLM — High-Performance GPU Inference
[vLLM](https://docs.vllm.ai/) is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.
```bash
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--max-model-len 65536 \
--tensor-parallel-size 2 \
--enable-auto-tool-choice \
--tool-call-parser hermes
```
Then configure Hermes:
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:8000/v1
# Skip API key (or enter one if you configured vLLM with --api-key)
# Enter model name: meta-llama/Llama-3.1-70B-Instruct
```
**Context length:** vLLM reads the model's `max_position_embeddings` by default. If that exceeds your GPU memory, it errors and asks you to set `--max-model-len` lower. You can also use `--max-model-len auto` to automatically find the maximum that fits. Set `--gpu-memory-utilization 0.95` (default 0.9) to squeeze more context into VRAM.
**Tool calling requires explicit flags:**
| Flag | Purpose |
|------|---------|
| `--enable-auto-tool-choice` | Required for `tool_choice: "auto"` (the default in Hermes) |
| `--tool-call-parser ` | Parser for the model's tool call format |
Supported parsers: `hermes` (Qwen 2.5, Hermes 2/3), `llama3_json` (Llama 3.x), `mistral`, `deepseek_v3`, `deepseek_v31`, `xlam`, `pythonic`. Without these flags, tool calls won't work — the model will output tool calls as text.
:::tip
vLLM supports human-readable sizes: `--max-model-len 64k` (lowercase k = 1000, uppercase K = 1024).
:::
---
### SGLang — Fast Serving with RadixAttention
[SGLang](https://github.com/sgl-project/sglang) is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.
```bash
pip install "sglang[all]"
python -m sglang.launch_server \
--model meta-llama/Llama-3.1-70B-Instruct \
--port 30000 \
--context-length 65536 \
--tp 2 \
--tool-call-parser qwen
```
Then configure Hermes:
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:30000/v1
# Enter model name: meta-llama/Llama-3.1-70B-Instruct
```
**Context length:** SGLang reads from the model's config by default. Use `--context-length` to override. If you need to exceed the model's declared maximum, set `SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1`.
**Tool calling:** Use `--tool-call-parser` with the appropriate parser for your model family: `qwen` (Qwen 2.5), `llama3`, `llama4`, `deepseekv3`, `mistral`, `glm`. Without this flag, tool calls come back as plain text.
:::caution SGLang defaults to 128 max output tokens
If responses seem truncated, add `max_tokens` to your requests or set `--default-max-tokens` on the server. SGLang's default is only 128 tokens per response if not specified in the request.
:::
---
### llama.cpp / llama-server — CPU & Metal Inference
[llama.cpp](https://github.com/ggml-org/llama.cpp) runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.
```bash
# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
--jinja -fa \
-c 32768 \
-ngl 99 \
-m models/qwen2.5-coder-32b-instruct-Q4_K_M.gguf \
--port 8080 --host 0.0.0.0
```
**Context length (`-c`):** Recent builds default to `0` which reads the model's training context from the GGUF metadata. For models with 128k+ training context, this can OOM trying to allocate the full KV cache. Set `-c` explicitly to what you need (32k–64k is a good range for agent use). If using parallel slots (`-np`), the total context is divided among slots — with `-c 32768 -np 4`, each slot only gets 8k.
Then configure Hermes to point at it:
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter URL: http://localhost:8080/v1
# Skip API key (local servers don't need one)
# Enter model name — or leave blank to auto-detect if only one model is loaded
```
This saves the endpoint to `config.yaml` so it persists across sessions.
:::caution `--jinja` is required for tool calling
Without `--jinja`, llama-server ignores the `tools` parameter entirely. The model will try to call tools by writing JSON in its response text, but Hermes won't recognize it as a tool call — you'll see raw JSON like `{"name": "web_search", ...}` printed as a message instead of an actual search.
Native tool calling support (best performance): Llama 3.x, Qwen 2.5 (including Coder), Hermes 2/3, Mistral, DeepSeek, Functionary. All other models use a generic handler that works but may be less efficient. See the [llama.cpp function calling docs](https://github.com/ggml-org/llama.cpp/blob/master/docs/function-calling.md) for the full list.
You can verify tool support is active by checking `http://localhost:8080/props` — the `chat_template` field should be present.
:::
:::tip
Download GGUF models from [Hugging Face](https://huggingface.co/models?library=gguf). Q4_K_M quantization offers the best balance of quality vs. memory usage.
:::
---
### LM Studio — Desktop App with Local Models
[LM Studio](https://lmstudio.ai/) is a desktop app for running local models with a GUI. Best for: users who prefer a visual interface, quick model testing, developers on macOS/Windows/Linux.
Start the server from the LM Studio app (Developer tab → Start Server), or use the CLI:
```bash
lms server start # Starts on port 1234
lms load qwen2.5-coder --context-length 32768
```
Then configure Hermes:
```bash
hermes model
# Select "LM Studio"
# Press Enter to use http://localhost:1234/v1
# Pick one of the discovered models
# If LM Studio server auth is enabled, enter LM_API_KEY when prompted
```
Hermes will automatically load a LM Studio model with 64K context length
To change context length in LM Studio:
1. Click the gear icon next to the model picker
2. Set "Context Length" to at least 64000 for a smooth experience
3. Reload the model for the change to take effect
4. If your machine cannot fit 64000, consider using a smaller model with larger context lengths.
Alternatively, use the CLI: `lms load model-name --context-length 64000`
You can use the CLI to estimate if the model will fit: `lms load model-name --context-length 64000 --estimate-only`
To set persistent per-model defaults: My Models tab → gear icon on the model → set context size.
:::
**Tool calling:** Supported since LM Studio 0.3.6. Models with native tool-calling training (Qwen 2.5, Llama 3.x, Mistral, Hermes) are auto-detected and shown with a tool badge. Other models use a generic fallback that may be less reliable.
---
### WSL2 Networking (Windows Users)
Since Hermes Agent requires a Unix environment, Windows users run it inside WSL2. If your model server (Ollama, LM Studio, etc.) runs on the **Windows host**, you need to bridge the network gap — WSL2 uses a virtual network adapter with its own subnet, so `localhost` inside WSL2 refers to the Linux VM, **not** the Windows host.
:::tip Both in WSL2? No problem.
If your model server also runs inside WSL2 (common for vLLM, SGLang, and llama-server), `localhost` works as expected — they share the same network namespace. Skip this section.
:::
#### Option 1: Mirrored Networking Mode (Recommended)
Available on **Windows 11 22H2+**, mirrored mode makes `localhost` work bidirectionally between Windows and WSL2 — the simplest fix.
1. Create or edit `%USERPROFILE%\.wslconfig` (e.g., `C:\Users\YourName\.wslconfig`):
```ini
[wsl2]
networkingMode=mirrored
```
2. Restart WSL from PowerShell:
```powershell
wsl --shutdown
```
3. Reopen your WSL2 terminal. `localhost` now reaches Windows services:
```bash
curl http://localhost:11434/v1/models # Ollama on Windows — works
```
:::note Hyper-V Firewall
On some Windows 11 builds, the Hyper-V firewall blocks mirrored connections by default. If `localhost` still doesn't work after enabling mirrored mode, run this in an **Admin PowerShell**:
```powershell
Set-NetFirewallHyperVVMSetting -Name '{40E0AC32-46A5-438A-A0B2-2B479E8F2E90}' -DefaultInboundAction Allow
```
:::
#### Option 2: Use the Windows Host IP (Windows 10 / older builds)
If you can't use mirrored mode, find the Windows host IP from inside WSL2 and use that instead of `localhost`:
```bash
# Get the Windows host IP (the default gateway of WSL2's virtual network)
ip route show | grep -i default | awk '{ print $3 }'
# Example output: 172.29.192.1
```
Use that IP in your Hermes config:
```yaml
model:
default: qwen2.5-coder:32b
provider: custom
base_url: http://172.29.192.1:11434/v1 # Windows host IP, not localhost
```
:::tip Dynamic helper
The host IP can change on WSL2 restart. You can grab it dynamically in your shell:
```bash
export WSL_HOST=$(ip route show | grep -i default | awk '{ print $3 }')
echo "Windows host at: $WSL_HOST"
curl http://$WSL_HOST:11434/v1/models # Test Ollama
```
Or use your machine's mDNS name (requires `libnss-mdns` in WSL2):
```bash
sudo apt install libnss-mdns
curl http://$(hostname).local:11434/v1/models
```
:::
#### Server Bind Address (Required for NAT Mode)
If you're using **Option 2** (NAT mode with the host IP), the model server on Windows must accept connections from outside `127.0.0.1`. By default, most servers only listen on localhost — WSL2 connections in NAT mode come from a different virtual subnet and will be refused. In mirrored mode, `localhost` maps directly so the default `127.0.0.1` binding works fine.
| Server | Default bind | How to fix |
|--------|-------------|------------|
| **Ollama** | `127.0.0.1` | Set `OLLAMA_HOST=0.0.0.0` environment variable before starting Ollama (System Settings → Environment Variables on Windows, or edit the Ollama service) |
| **LM Studio** | `127.0.0.1` | Enable **"Serve on Network"** in the Developer tab → Server settings |
| **llama-server** | `127.0.0.1` | Add `--host 0.0.0.0` to the startup command |
| **vLLM** | `0.0.0.0` | Already binds to all interfaces by default |
| **SGLang** | `127.0.0.1` | Add `--host 0.0.0.0` to the startup command |
**Ollama on Windows (detailed):** Ollama runs as a Windows service. To set `OLLAMA_HOST`:
1. Open **System Properties** → **Environment Variables**
2. Add a new **System variable**: `OLLAMA_HOST` = `0.0.0.0`
3. Restart the Ollama service (or reboot)
#### Windows Firewall
Windows Firewall treats WSL2 as a separate network (in both NAT and mirrored mode). If connections still fail after the steps above, add a firewall rule for your model server's port:
```powershell
# Run in Admin PowerShell — replace PORT with your server's port
New-NetFirewallRule -DisplayName "Allow WSL2 to Model Server" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 11434
```
Common ports: Ollama `11434`, vLLM `8000`, SGLang `30000`, llama-server `8080`, LM Studio `1234`.
#### Quick Verification
From inside WSL2, test that you can reach your model server:
```bash
# Replace URL with your server's address and port
curl http://localhost:11434/v1/models # Mirrored mode
curl http://172.29.192.1:11434/v1/models # NAT mode (use your actual host IP)
```
If you get a JSON response listing your models, you're good. Use that same URL as the `base_url` in your Hermes config.
---
### Troubleshooting Local Models
These issues affect **all** local inference servers when used with Hermes.
#### "Connection refused" from WSL2 to a Windows-hosted model server
If you're running Hermes inside WSL2 and your model server on the Windows host, `http://localhost:` won't work in WSL2's default NAT networking mode. See [WSL2 Networking](#wsl2-networking-windows-users) above for the fix.
#### Tool calls appear as text instead of executing
The model outputs something like `{"name": "web_search", "arguments": {...}}` as a message instead of actually calling the tool.
**Cause:** Your server doesn't have tool calling enabled, or the model doesn't support it through the server's tool calling implementation.
| Server | Fix |
|--------|-----|
| **llama.cpp** | Add `--jinja` to the startup command |
| **vLLM** | Add `--enable-auto-tool-choice --tool-call-parser hermes` |
| **SGLang** | Add `--tool-call-parser qwen` (or appropriate parser) |
| **Ollama** | Tool calling is enabled by default — make sure your model supports it (check with `ollama show model-name`) |
| **LM Studio** | Update to 0.3.6+ and use a model with native tool support |
#### Model seems to forget context or give incoherent responses
**Cause:** Context window is too small. When the conversation exceeds the context limit, most servers silently drop older messages. Hermes's system prompt + tool schemas alone can use 4k–8k tokens.
**Diagnosis:**
```bash
# Check what Hermes thinks the context is
# Look at startup line: "Context limit: X tokens"
# Check your server's actual context
# Ollama: ollama ps (CONTEXT column)
# llama.cpp: curl http://localhost:8080/props | jq '.default_generation_settings.n_ctx'
# vLLM: check --max-model-len in startup args
```
**Fix:** Set context to at least **32,768 tokens** for agent use. See each server's section above for the specific flag.
#### "Context limit: 2048 tokens" at startup
Hermes auto-detects context length from your server's `/v1/models` endpoint. If the server reports a low value (or doesn't report one at all), Hermes uses the model's declared limit which may be wrong.
**Fix:** Set it explicitly in `config.yaml`:
```yaml
model:
default: your-model
provider: custom
base_url: http://localhost:11434/v1
context_length: 32768
```
#### Responses get cut off mid-sentence
**Possible causes:**
1. **Low output cap (`max_tokens`) on the server** — SGLang defaults to 128 tokens per response. Set `--default-max-tokens` on the server or configure Hermes with `model.max_tokens` in config.yaml. Note: `max_tokens` controls response length only — it is unrelated to how long your conversation history can be (that is `context_length`).
2. **Context exhaustion** — The model filled its context window. Increase `model.context_length` or enable [context compression](/docs/user-guide/configuration#context-compression) in Hermes.
---
### LiteLLM Proxy — Multi-Provider Gateway
[LiteLLM](https://docs.litellm.ai/) is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.
```bash
# Install and start
pip install "litellm[proxy]"
litellm --model anthropic/claude-sonnet-4 --port 4000
# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000
```
Then configure Hermes with `hermes model` → Custom endpoint → `http://localhost:4000/v1`.
Example `litellm_config.yaml` with fallback:
```yaml
model_list:
- model_name: "best"
litellm_params:
model: anthropic/claude-sonnet-4
api_key: sk-ant-...
- model_name: "best"
litellm_params:
model: openai/gpt-4o
api_key: sk-...
router_settings:
routing_strategy: "latency-based-routing"
```
---
### ClawRouter — Cost-Optimized Routing
[ClawRouter](https://github.com/BlockRunAI/ClawRouter) by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).
```bash
# Install and start
npx @blockrun/clawrouter # Starts on port 8402
```
Then configure Hermes with `hermes model` → Custom endpoint → `http://localhost:8402/v1` → model name `blockrun/auto`.
Routing profiles:
| Profile | Strategy | Savings |
|---------|----------|---------|
| `blockrun/auto` | Balanced quality/cost | 74-100% |
| `blockrun/eco` | Cheapest possible | 95-100% |
| `blockrun/premium` | Best quality models | 0% |
| `blockrun/free` | Free models only | 100% |
| `blockrun/agentic` | Optimized for tool use | varies |
:::note
ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run `npx @blockrun/clawrouter doctor` to check wallet status.
:::
---
### Other Compatible Providers
Any service with an OpenAI-compatible API works. Some popular options:
| Provider | Base URL | Notes |
|----------|----------|-------|
| [Together AI](https://together.ai) | `https://api.together.xyz/v1` | Cloud-hosted open models |
| [Groq](https://groq.com) | `https://api.groq.com/openai/v1` | Ultra-fast inference |
| [DeepSeek](https://deepseek.com) | `https://api.deepseek.com/v1` | DeepSeek models |
| [Fireworks AI](https://fireworks.ai) | `https://api.fireworks.ai/inference/v1` | Fast open model hosting |
| [GMI Cloud](https://www.gmicloud.ai/) | `https://api.gmi-serving.com/v1` | Managed OpenAI-compatible inference |
| [Cerebras](https://cerebras.ai) | `https://api.cerebras.ai/v1` | Wafer-scale chip inference |
| [Mistral AI](https://mistral.ai) | `https://api.mistral.ai/v1` | Mistral models |
| [OpenAI](https://openai.com) | `https://api.openai.com/v1` | Direct OpenAI access |
| [Azure OpenAI](https://azure.microsoft.com) | `https://YOUR.openai.azure.com/` | Enterprise OpenAI |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` | Self-hosted, multi-model |
| [Jan](https://jan.ai) | `http://localhost:1337/v1` | Desktop app with local models |
Configure any of these with `hermes model` → Custom endpoint, or in `config.yaml`:
```yaml
model:
default: meta-llama/Llama-3.1-70B-Instruct-Turbo
provider: custom
base_url: https://api.together.xyz/v1
api_key: your-together-key
```
---
### Context Length Detection
:::note Two settings, easy to confuse
**`context_length`** is the **total context window** — the combined budget for input *and* output tokens (e.g. 200,000 for Claude Opus 4.6). Hermes uses this to decide when to compress history and to validate API requests.
**`model.max_tokens`** is the **output cap** — the maximum number of tokens the model may generate in a *single response*. It has nothing to do with how long your conversation history can be. The industry-standard name `max_tokens` is a common source of confusion; Anthropic's native API has since renamed it `max_output_tokens` for clarity.
Set `context_length` when auto-detection gets the window size wrong.
Set `model.max_tokens` only when you need to limit how long individual responses can be.
:::
Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:
1. **Config override** — `model.context_length` in config.yaml (highest priority)
2. **Custom provider per-model** — `custom_providers[].models.