Hermes S6 Container Supervision
Modify, debug, or extend the s6-overlay supervision tree inside the Hermes Agent Docker image — adding new services, debugging profile gateways, understanding the Architecture B main-program pattern.
Skill metadata
| Source | Bundled (installed by default) |
| Path | skills/software-development/hermes-s6-container-supervision |
| Version | 1.0.0 |
| Author | Hermes Agent |
| License | MIT |
| Tags | docker, s6, supervision, gateway, profiles |
| Related skills | hermes-agent, hermes-agent-dev |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Hermes s6-overlay Container Supervision
When to use this skill
Load this skill when you're working on:
- Adding or removing a static service in the Hermes Docker image (something that should be supervised at every container start, like the dashboard)
- Diagnosing why a per-profile gateway isn't starting, restarting, or surviving docker restart
- Understanding why the container's CMD is /opt/hermes/docker/main-wrapper.sh and how leading-dash args reach the user's program
- Modifying cont-init.d boot scripts (UID remap, volume seeding, profile reconciliation)
- Changing the rendered run-script for per-profile gateways (Phase 4)
If you're just running the Hermes Agent and want to use Docker, see website/docs/user-guide/docker.md instead.
Architecture at a glance
/init ← PID 1 (s6-overlay v3.2.3.0)
├── cont-init.d ← oneshot setup, runs as root
│ ├── 01-hermes-setup ← docker/stage2-hook.sh
│ │ ├── UID/GID remap
│ │ ├── chown /opt/data
│ │ ├── chown /opt/data/profiles (every boot)
│ │ ├── seed .env / config.yaml / SOUL.md
│ │ └── skills_sync.py
│ └── 02-reconcile-profiles ← hermes_cli.container_boot
│ ├── chown /run/service (hermes-writable for runtime register)
│ └── walk $HERMES_HOME/profiles/<name>/gateway_state.json
│ → recreate /run/service/gateway-<name>/
│ → auto-start only those with prior_state == "running"
│
├── s6-rc.d (static services, in /etc/s6-overlay/s6-rc.d/)
│ ├── main-hermes/run ← exec sleep infinity (no-op slot)
│ └── dashboard/run ← if HERMES_DASHBOARD=1, runs `hermes dashboard`
│
├── /run/service (s6-svscan watches; tmpfs)
│ ├── gateway-coder/ ← runtime-registered per-profile
│ │ ├── type ("longrun")
│ │ ├── run ("#!/command/with-contenv sh ... exec s6-setuidgid hermes hermes -p coder gateway run")
│ │ ├── down (marker — present means "registered but don't auto-start")
│ │ └── log/run (s6-log → $HERMES_HOME/logs/gateways/coder/current)
│ └── ...
│
└── CMD ("main program") ← /opt/hermes/docker/main-wrapper.sh
└── routes user args: bare exec | hermes subcommand | hermes (no args)
— exec'd by /init with stdin/stdout/stderr inherited (TTY for --tui)
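The reconcile step in the diagram can be pictured with a minimal Python sketch. It is not the code in hermes_cli/container_boot.py: the function name plan_reconcile is hypothetical, and it assumes that gateway_state.json is a JSON object with a "prior_state" key and that profiles without a state file are ignored. The SOUL.md check mirrors the "real profile" marker described under Common pitfalls.

```python
import json
from pathlib import Path


def plan_reconcile(profiles_dir: Path) -> list[tuple[str, str, str]]:
    """Return (profile, prior_state, action) for every profile that would get a slot.

    Hypothetical helper: it only computes the plan, it does not touch /run/service.
    """
    plan = []
    for profile_dir in sorted(p for p in profiles_dir.iterdir() if p.is_dir()):
        # SOUL.md marks a real profile; stray or half-restored directories are skipped.
        if not (profile_dir / "SOUL.md").exists():
            continue
        state_file = profile_dir / "gateway_state.json"
        # ASSUMPTION: a profile without a state file never had a gateway registered.
        if not state_file.exists():
            continue
        try:
            # ASSUMPTION: the file is a JSON object with a "prior_state" key.
            prior_state = json.loads(state_file.read_text()).get("prior_state", "unknown")
        except (json.JSONDecodeError, AttributeError):
            prior_state = "unknown"
        # Only gateways that were running before the restart are auto-started;
        # everything else is registered with a "down" marker.
        action = "started" if prior_state == "running" else "registered"
        plan.append((profile_dir.name, prior_state, action))
    return plan
```

Each tuple corresponds to one line of container-boot.log: action "started" means the slot is recreated without a down marker, "registered" means the slot is recreated with one.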
Key files
| Path | Role |
|---|---|
| Dockerfile | s6-overlay install + cont-init.d wiring + ENTRYPOINT ["/init", "/opt/hermes/docker/main-wrapper.sh"] |
| docker/stage2-hook.sh | The "old entrypoint logic": UID remap, chown, seed, skills sync. Runs as cont-init.d/01-hermes-setup. |
| docker/cont-init.d/02-reconcile-profiles | Calls hermes_cli.container_boot on every boot to restore profile gateway slots from the persistent volume. |
| docker/main-wrapper.sh | The container's CMD. Routes user args, drops to hermes via s6-setuidgid, exec's the chosen program. |
| docker/s6-rc.d/main-hermes/run | No-op sleep infinity. The slot exists so the s6-rc user bundle is valid; main hermes runs as the CMD, not as a supervised service. |
| docker/s6-rc.d/dashboard/run | Conditional service: exec sleep infinity unless HERMES_DASHBOARD is truthy. |
| docker/entrypoint.sh | Back-compat shim that execs the stage2 hook. External scripts that hard-coded the old entrypoint path still work. |
| hermes_cli/service_manager.py | S6ServiceManager: register_profile_gateway, unregister_profile_gateway, start/stop/restart/is_running, list_profile_gateways. |
| hermes_cli/container_boot.py | reconcile_profile_gateways(): walks persistent profiles, regenerates s6 slots, emits container-boot.log. |
| hermes_cli/gateway.py::_dispatch_via_service_manager_if_s6 | Intercepts hermes gateway start/stop/restart and routes to s6 when running in a container. |
Why Architecture B (CMD as main program, not s6-supervised)
The original plan (v1–v3) called for main hermes to run as a supervised s6-rc service. Two real s6-overlay v3 mechanics blocked that:
- cont-init.d scripts receive no CMD args, so the stage2 hook can't parse docker run <image> chat -q "hi" to set HERMES_ARGS for a service run script to consume.
- /run/s6/basedir/bin/halt does NOT propagate the exit code written to /run/s6-linux-init-container-results/exitcode. Containers always exit 143 (SIGTERM) regardless. Confirmed by skarnet (s6 author) in issue #477: "if you want a container shutdown, you need to either have your CMD exit, or, if you have no CMD, write the container exit code you want then call halt".
So we use the s6-overlay-native CMD pattern: ENTRYPOINT ["/init", "/opt/hermes/docker/main-wrapper.sh"]. Because the wrapper is part of ENTRYPOINT, Docker appends user args after it automatically, so docker run <image> --version becomes /init /opt/hermes/docker/main-wrapper.sh --version, and --version doesn't get intercepted by /init's POSIX shell. The wrapper drops to hermes via s6-setuidgid, then exec's the chosen program. The program's exit code becomes the container exit code, exactly matching the pre-s6 tini contract.
Trade-off: main hermes is unsupervised under s6. That exactly matches its behavior under tini (the pre-s6 image). Dashboard supervision is the only new guarantee — and per-profile gateways under /run/service/ get full supervision.
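The three routes the wrapper takes (bare exec, hermes subcommand, hermes with no args) can be modeled in a few lines of Python. This is only a sketch of the decision, not a port of docker/main-wrapper.sh: the function route_args, the subcommand set, and the "is it an executable on PATH" test are assumptions made for illustration.

```python
import shutil

HERMES = "hermes"
# ASSUMPTION: illustrative subset of subcommands mentioned in this document,
# not the wrapper's real list.
HERMES_SUBCOMMANDS = {"chat", "gateway", "dashboard", "profile", "setup"}


def route_args(args, which=shutil.which) -> list[str]:
    """Return the argv the wrapper would exec for the given user args."""
    if not args:
        # docker run <image>  ->  plain hermes
        return [HERMES]
    first = args[0]
    if first.startswith("-") or first in HERMES_SUBCOMMANDS:
        # Leading-dash flags and known subcommands always belong to hermes.
        return [HERMES, *args]
    if which(first) is not None:
        # ASSUMPTION: anything else that resolves on PATH is exec'd as-is.
        return list(args)
    # Fall back to hermes so it can report the unknown subcommand itself.
    return [HERMES, *args]
```

For example, route_args(["--version"]) yields ["hermes", "--version"], which is the leading-dash case the ENTRYPOINT layout is designed to protect.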
Quick recipes
Verify s6 is PID 1 in a running container
docker exec <c> sh -c 'cat /proc/1/comm; readlink /proc/1/exe'
# Expect: s6-svscan or init / /package/admin/s6/.../s6-svscan
Inspect a profile gateway service
# /command/ isn't on docker-exec PATH — use absolute path
docker exec <c> /command/s6-svstat /run/service/gateway-<name>
# "up (pid …) … seconds" → running
# "down (exitcode N) … seconds, normally up, want up, …" → s6 wants it up but the process keeps exiting (crash loop)
# "down … normally up, ready …" → user stopped it
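When scripting around s6-svstat, the three readings above can be turned into a small classifier. The sketch below is a hypothetical helper, not part of hermes_cli; it keys only on the substrings quoted in this document, and the exact s6-svstat wording may differ between s6 versions.

```python
import re


def classify_svstat(line: str) -> str:
    """Map one line of s6-svstat output to a coarse state (hypothetical helper)."""
    text = line.strip()
    if "s6-supervise not running" in text:
        # The slot directory exists but nothing supervises it yet.
        return "no-supervisor"
    if text.startswith("up "):
        return "running"
    if text.startswith("down "):
        # "want up" means s6 keeps trying to start it: the process is crash-looping.
        if re.search(r"\bwant up\b", text):
            return "crash-loop"
        # Down and not wanted up: someone stopped it on purpose.
        return "stopped"
    return "unknown"
```

The input would typically be the stdout of docker exec <c> /command/s6-svstat /run/service/gateway-<name>.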
Bring a service up/down manually
docker exec <c> /command/s6-svc -u /run/service/gateway-<name> # up
docker exec <c> /command/s6-svc -d /run/service/gateway-<name> # down
docker exec <c> /command/s6-svc -t /run/service/gateway-<name> # SIGTERM (restart)
Watch the cont-init reconciler log
docker exec <c> tail -n 50 /opt/data/logs/container-boot.log
# 2026-05-21T06:18:05+0000 profile=coder prior_state=running action=started
# 2026-05-21T06:18:05+0000 profile=writer prior_state=stopped action=registered
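The log lines shown above are a timestamp followed by key=value pairs, which makes them easy to post-process. A minimal parsing sketch, assuming that layout holds for every line (parse_boot_log_line is a hypothetical name, not a hermes_cli function):

```python
def parse_boot_log_line(line: str) -> dict[str, str]:
    """Split one container-boot.log line into a dict (hypothetical helper)."""
    tokens = line.split()
    if not tokens:
        return {}
    # The first token is the timestamp; the rest are key=value pairs.
    record = {"timestamp": tokens[0]}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value
            record[key] = value
    return record
```

Filtering the parsed records on action == "started" then lists the gateways that were auto-started at boot.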
Add a new static service
- Create docker/s6-rc.d/<name>/type with longrun\n and docker/s6-rc.d/<name>/run (use #!/command/with-contenv sh + # shellcheck shell=sh).
- Drop to hermes via s6-setuidgid hermes at the top of run (unless you specifically need root).
- Create empty docker/s6-rc.d/<name>/dependencies.d/base so it waits for the base bundle.
- Create empty docker/s6-rc.d/user/contents.d/<name> so it joins the user bundle.
- The COPY docker/s6-rc.d/ in the Dockerfile picks it up automatically; no other changes are needed.
Change the per-profile gateway run command
Edit S6ServiceManager._render_run_script in hermes_cli/service_manager.py. The function is also called by hermes_cli/container_boot.py::_register_service during boot reconciliation, so it's the single source of truth. Update the corresponding assertion in tests/hermes_cli/test_service_manager.py::test_s6_register_creates_service_dir_and_triggers_scan.
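To make the "single source of truth" idea concrete, here is a stand-in renderer with a matching assertion. It is not the body of S6ServiceManager._render_run_script: the function name render_run_script is hypothetical, and it emits only the shebang and exec line shown in the architecture diagram, omitting whatever the real script contains between them.

```python
import shlex


def render_run_script(profile: str) -> str:
    """Render a minimal run script for one profile gateway (hypothetical stand-in)."""
    return (
        "#!/command/with-contenv sh\n"
        # Quote the profile name so an unusual name cannot break the command line.
        f"exec s6-setuidgid hermes hermes -p {shlex.quote(profile)} gateway run\n"
    )


def check_render_mentions_profile() -> None:
    """The kind of assertion to keep in sync when the command changes."""
    script = render_run_script("coder")
    assert script.startswith("#!/command/with-contenv sh\n")
    assert script.rstrip().endswith("exec s6-setuidgid hermes hermes -p coder gateway run")
```

Because both the runtime register path and the boot reconciler call the same renderer, a change to the exec line needs exactly one code edit plus one updated assertion.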
Run the docker test harness
docker build -t hermes-agent-harness:latest .
HERMES_TEST_IMAGE=hermes-agent-harness:latest scripts/run_tests.sh tests/docker/ -v
# Expect 19 passed, 0 xfailed against the s6 image
The harness lives in tests/docker/ and skips when Docker isn't available. The per-test timeout is bumped to 180s (see tests/docker/conftest.py).
Common pitfalls
"command not found" via docker exec
/command/ (where s6-overlay puts its binaries) is on PATH only for processes spawned by the supervision tree — services, cont-init.d, main-wrapper.sh. docker exec <c> s6-svstat … will fail with "command not found"; always use the absolute path /command/s6-svstat. The hermes binary works because the Dockerfile adds /opt/hermes/.venv/bin to the runtime ENV PATH.
Profile directory ownership
The cont-init reconciler runs as hermes (s6-setuidgid hermes in 02-reconcile-profiles). If a profile dir ends up root-owned (e.g. because docker exec <c> hermes profile create … ran as root by default), the reconciler can't read SOUL.md and fails with PermissionError. Mitigation: stage2-hook.sh chowns $HERMES_HOME/profiles to hermes on every boot, idempotently. Don't remove that block.
Files written by docker exec are root-owned
docker exec defaults to root. Either pass --user hermes or rely on the stage2 chown sweep at the next boot. Avoid writing files under $HERMES_HOME/profiles/<name>/ as root by hand: the sweep at the next boot fixes ownership, but in-flight operations may hit permission errors until then.
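Both docker exec pitfalls (the missing /command/ on PATH and the root default) can be handled in one place when you script against a container. The helper below is a hypothetical sketch; it only builds the argument list and assumes the s6 tools named in this document live under /command/.

```python
from typing import Optional

# Tools that s6-overlay installs under /command/, as used in the recipes above.
S6_TOOLS = {"s6-svstat", "s6-svc", "s6-svscanctl"}


def docker_exec_argv(container: str, tool: str, *args: str,
                     user: Optional[str] = "hermes") -> list[str]:
    """Build a docker exec argv that avoids the PATH and ownership pitfalls."""
    argv = ["docker", "exec"]
    if user:
        # Run as hermes so files created by the command are not root-owned.
        argv += ["--user", user]
    # s6 binaries are not on the docker-exec PATH, so use the absolute path.
    program = f"/command/{tool}" if tool in S6_TOOLS else tool
    return [*argv, container, program, *args]
```

Pass the result to subprocess.run(..., check=True). Use user=None to keep docker exec's root default, as the s6-svstat and s6-svc recipes in this document do.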
Service slot exists but s6-svstat says "s6-supervise not running"
The service directory is on tmpfs and was wiped on container restart. Either the cont-init reconciler hasn't run yet (give it a moment after docker restart) or it failed. Check docker logs <c> | grep '02-reconcile'.
Gateway starts then immediately exits (down (exitcode 1) in svstat)
Most likely the profile has no model or auth configured. The service slot is correct — the gateway itself is unconfigured. Run hermes -p <profile> setup first. The s6 supervisor will keep restarting it; that's the desired behavior (when you fix the config, the next attempt succeeds and stays up).
Reconciler skipped a profile
The reconciler keys on the presence of SOUL.md as the "real profile" marker. hermes profile create always seeds it. If a profile dir is missing SOUL.md (stray directory, partial restore, backup-in-progress), the reconciler skips it intentionally. Add a SOUL.md (even empty) to opt back in.
"Help, the container exits 143!"
Check whether something is invoking s6-svscanctl -t or /run/s6/basedir/bin/halt. Both cause /init to begin stage 3 shutdown but return 143 (SIGTERM) rather than the desired exit code. This behavior is what drove the Phase 2 architecture pivot from A to B. For container shutdown with a real exit code, you must let the CMD (main-wrapper.sh) exit normally; do not try to control exit from a finish script.
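The number 143 follows the common 128 + signal-number convention: SIGTERM is signal 15, and 128 + 15 = 143. A small decoding sketch (describe_exit_code is a hypothetical helper) can help tell a signal-driven shutdown apart from a real program exit status when reading docker inspect or CI output:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code == 0:
        return "clean exit"
    if code > 128:
        try:
            # 143 - 128 = 15, which is SIGTERM.
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"program exited with status {code}"
```

A result of "terminated by SIGTERM" for a container that should have reported its own status points at the halt or s6-svscanctl -t path described above rather than at the CMD.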
Related skills
- hermes-agent-dev: General hermes-agent codebase navigation
- hermes-tool-quirks: Specific Hermes-tool workarounds (sed/grep/etc.); load when debugging the s6 stack's interaction with hermes built-in tools.