Browser Automation

Hermes Agent includes a full browser automation toolset with multiple backend options:

Browserbase cloud mode via Browserbase for managed cloud browsers and anti-bot tooling
Browser Use cloud mode via Browser Use as an alternative cloud browser provider
Firecrawl cloud mode via Firecrawl for cloud browsers with built-in scraping
Camofox local mode via Camofox for local anti-detection browsing (Firefox-based fingerprint spoofing)
Local Chromium-family CDP — connect browser tools to your own Chrome, Brave, Chromium, or Edge instance using /browser connect
Local browser mode via the agent-browser CLI and a local Chromium installation

In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.

Overview

Pages are represented as accessibility trees (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like @e1, @e2) that the agent uses for clicking and typing.

Key capabilities:

Multi-provider cloud execution — Browserbase, Browser Use, or Firecrawl — no local browser needed
Local Chromium-family integration — attach to your running Chrome, Brave, Chromium, or Edge browser via CDP for hands-on browsing
Built-in stealth — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
Session isolation — each task gets its own browser session
Automatic cleanup — inactive sessions are closed after a timeout
Vision analysis — screenshot + AI analysis for visual understanding

Setup

Nous Subscribers

If you have a paid Nous Portal subscription, you can use browser automation through the Tool Gateway without any separate API keys. New installs can run hermes setup --portal to log in and turn on every gateway tool at once; existing installs can pick Nous Subscription as the browser provider via hermes model or hermes tools.

Browserbase cloud mode

To use Browserbase-managed cloud browsers, add:

# Add to ~/.hermes/.env
BROWSERBASE_API_KEY=***
BROWSERBASE_PROJECT_ID=your-project-id-here

Get your credentials at browserbase.com.

Browser Use cloud mode

To use Browser Use as your cloud browser provider, add:

# Add to ~/.hermes/.env
BROWSER_USE_API_KEY=***

Get your API key at browser-use.com. Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.

Firecrawl cloud mode

To use Firecrawl as your cloud browser provider, add:

# Add to ~/.hermes/.env
FIRECRAWL_API_KEY=fc-***

Get your API key at firecrawl.dev. Then select Firecrawl as your browser provider:

hermes setup tools
# → Browser Automation → Firecrawl

Optional settings:

# Self-hosted Firecrawl instance (default: https://api.firecrawl.dev)
FIRECRAWL_API_URL=http://localhost:3002

# Session TTL in seconds (default: 300)
FIRECRAWL_BROWSER_TTL=600

Hybrid routing: cloud for public URLs, local for LAN/localhost

When a cloud provider is configured, Hermes auto-spawns a local Chromium sidecar for URLs that resolve to a private/loopback/LAN address (localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, 172.16-31.x.x, *.local, *.lan, *.internal, IPv6 loopback ::1, link-local 169.254.x.x). Public URLs continue to use the cloud provider in the same conversation.

This solves the common "I'm developing locally but using Browserbase" workflow — the agent can screenshot your dashboard at http://localhost:3000 AND scrape https://github.com without you switching providers or disabling the SSRF guard. The cloud provider never sees the private URL.
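As a rough illustration of the routing decision described above, the sketch below classifies a URL by its host. It is not Hermes's actual implementation: the function names and the "local-sidecar"/"cloud" labels are hypothetical, and it only inspects literal IPs and hostname suffixes, whereas the real feature acts on what the URL resolves to.

```python
import ipaddress
from urllib.parse import urlsplit

# Ranges and suffixes copied from the list above; everything else is illustrative.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
        "192.168.0.0/16", "169.254.0.0/16", "::1/128",
    )
]
PRIVATE_SUFFIXES = (".local", ".lan", ".internal")


def is_private_target(url: str) -> bool:
    """Return True when the URL's host looks private, loopback, or LAN-only."""
    host = (urlsplit(url).hostname or "").lower()
    if host == "localhost" or host.endswith(PRIVATE_SUFFIXES):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; a real router would resolve DNS here
    return any(address in network for network in PRIVATE_NETWORKS)


def pick_backend(url: str) -> str:
    """Hypothetical router: local sidecar for private hosts, cloud otherwise."""
    return "local-sidecar" if is_private_target(url) else "cloud"
```

Under these assumptions, pick_backend("http://localhost:3000") selects the local sidecar while pick_backend("https://github.com") stays on the cloud provider.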

The feature is on by default. To disable it (all URLs go to the configured cloud provider, as before):

# ~/.hermes/config.yaml
browser:
  cloud_provider: browserbase
  auto_local_for_private_urls: false

With auto-routing disabled, private URLs are rejected with "Blocked: URL targets a private or internal address" unless you also set browser.allow_private_urls: true. That setting lets the cloud provider attempt them, which usually fails because cloud browsers such as Browserbase cannot reach your LAN.

Requirements: the local sidecar uses the same agent-browser CLI as pure local mode, so you need it installed (hermes setup tools → Browser Automation auto-installs it). Post-navigation redirects from a public URL onto a private address are still blocked (you can't use a redirect-to-internal trick to reach your LAN through the public path).

Camofox local mode

Camofox is a self-hosted Node.js server wrapping Camoufox (a Firefox fork with C++ fingerprint spoofing). It provides local anti-detection browsing without cloud dependencies.

# Clone the Camofox browser server first
git clone https://github.com/jo-inc/camofox-browser
cd camofox-browser

# Build and start with Docker using the default container settings
# (auto-detects arch: aarch64 on M1/M2, x86_64 on Intel)
make up

# Stop and remove the default container
make down

# Force a clean rebuild (for example, after upgrading VERSION/RELEASE)
make reset

# Just download binaries without building
make fetch

# Override arch or version explicitly
make up ARCH=x86_64
make up VERSION=135.0.1 RELEASE=beta.24

make up starts the default container immediately. If you want custom runtime settings such as a larger Node heap, VNC, or a persistent profile directory, build the image first and then run it yourself:

# Build the image without starting the default container
make build

# Start with persistence, VNC live view, and a larger Node heap
mkdir -p ~/.camofox-docker
docker run -d \
  --name camofox-browser \
  --restart unless-stopped \
  -p 9377:9377 \
  -p 6080:6080 \
  -p 5901:5900 \
  -e CAMOFOX_PORT=9377 \
  -e ENABLE_VNC=1 \
  -e VNC_BIND=0.0.0.0 \
  -e VNC_RESOLUTION=1920x1080 \
  -e MAX_OLD_SPACE_SIZE=2048 \
  -v ~/.camofox-docker:/root/.camofox \
  camofox-browser:135.0.1-aarch64

With VNC enabled, the browser runs in headed mode and can be watched live in your browser at http://localhost:6080 (noVNC). You can also connect a native VNC client to localhost:5901.

If you already ran make up, stop and remove that default container before starting the custom one:

make down
# then run the custom docker run command above

Then set in ~/.hermes/.env:

CAMOFOX_URL=http://localhost:9377

If Camofox is running in Docker and you want it to open web apps served from the host machine, enable loopback rewriting. CAMOFOX_URL should still point at the host-published control API, but page URLs such as http://127.0.0.1:3000 must be opened from inside the container as http://host.docker.internal:3000:

# ~/.hermes/config.yaml
browser:
  camofox:
    rewrite_loopback_urls: true
    loopback_host_alias: host.docker.internal  # default; use a LAN IP if needed

Equivalent env vars:

CAMOFOX_REWRITE_LOOPBACK_URLS=true
CAMOFOX_LOOPBACK_HOST_ALIAS=host.docker.internal

The rewrite only applies to page navigation URLs with loopback hosts (localhost, 127.0.0.1, ::1). It does not change CAMOFOX_URL. Leave it disabled for non-Docker Camofox installs, where the browser already runs on the host and loopback URLs are correct.
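The rewrite rule can be pictured with a small sketch. This is an illustration of the behavior described above, not the code Hermes ships: rewrite_loopback_url is a hypothetical helper, and for brevity it drops any user:password part of the URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Loopback hosts named in the paragraph above.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def rewrite_loopback_url(url: str, alias: str = "host.docker.internal") -> str:
    """Swap a loopback host for the alias, keeping port, path, and query."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() not in LOOPBACK_HOSTS:
        return url  # non-loopback URLs pass through untouched
    # Rebuild the network location with the alias and the original port, if any.
    netloc = alias if parts.port is None else f"{alias}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, http://127.0.0.1:3000/app would become http://host.docker.internal:3000/app, while https://example.com/ is returned unchanged.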

Or configure via hermes tools → Browser Automation → Camofox.

When CAMOFOX_URL is set, all browser tools automatically route through Camofox instead of Browserbase or agent-browser.
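The precedence rules stated so far (Camofox first when CAMOFOX_URL is set, Browserbase ahead of Browser Use, local agent-browser otherwise) can be summarized in a short sketch. It is deliberately incomplete and hypothetical: it ignores Firecrawl, the Nous subscription, /browser connect, and anything chosen in config.yaml, and it assumes both Browserbase variables are required.

```python
def pick_browser_backend(env: dict[str, str]) -> str:
    """Illustrative precedence only; the returned labels are made up for this sketch."""
    if env.get("CAMOFOX_URL"):
        return "camofox"  # CAMOFOX_URL overrides Browserbase and agent-browser
    # Assumption: both Browserbase variables must be present.
    if env.get("BROWSERBASE_API_KEY") and env.get("BROWSERBASE_PROJECT_ID"):
        return "browserbase"  # takes priority over Browser Use
    if env.get("BROWSER_USE_API_KEY"):
        return "browser-use"
    return "local"  # local Chromium driven by agent-browser
```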

Persistent browser sessions

By default, each Camofox session gets a random identity — cookies and logins don't survive across agent restarts. To enable persistent browser sessions, add the following to ~/.hermes/config.yaml:

browser:
  camofox:
    managed_persistence: true

Then fully restart Hermes so the new config is picked up.

Nested path matters

Hermes reads browser.camofox.managed_persistence, not a top-level managed_persistence. A common mistake is writing:

# ❌ Wrong — Hermes ignores this
managed_persistence: true

If the flag is placed at the wrong path, Hermes silently falls back to a random ephemeral userId and your login state will be lost on every session.

What Hermes does

Sends a deterministic profile-scoped userId to Camofox so the server can reuse the same Firefox profile across sessions.
Skips server-side context destruction on cleanup, so cookies and logins survive between agent tasks.
Scopes the userId to the active Hermes profile, so different Hermes profiles get different browser profiles (profile isolation).

What Hermes does not do

It does not force persistence on the Camofox server. Hermes only sends a stable userId; the server must honor it by mapping that userId to a persistent Firefox profile directory.
If your Camofox server build treats every request as ephemeral (e.g. always calls browser.newContext() without loading a stored profile), Hermes cannot make those sessions persist. Make sure you are running a Camofox build that implements userId-based profile persistence.

Verify it's working

1. Start Hermes and your Camofox server.
2. Open Google (or any login site) in a browser task and sign in manually.
3. End the browser task normally.
4. Start a new browser task.
5. Open the same site again. You should still be signed in.

If step 5 logs you out, the Camofox server isn't honoring the stable userId. Double-check your config path, confirm you fully restarted Hermes after editing config.yaml, and verify your Camofox server version supports persistent per-user profiles.

Where state lives

Hermes derives the stable userId from the profile-scoped directory ~/.hermes/browser_auth/camofox/ (or the equivalent under $HERMES_HOME for non-default profiles). The actual browser profile data lives on the Camofox server side, keyed by that userId. To fully reset a persistent profile, clear it on the Camofox server and remove the corresponding Hermes profile's state directory.
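To make "deterministic, profile-scoped userId" concrete, here is one way such an ID could be derived. The hashing scheme, the hermes_ prefix, and the function name are all assumptions for illustration; this page does not document the derivation Hermes uses.

```python
import hashlib
from pathlib import Path


def stable_user_id(profile_state_dir: Path) -> str:
    """Hypothetical derivation: hash the profile's state directory path."""
    digest = hashlib.sha256(str(profile_state_dir).encode("utf-8")).hexdigest()
    # The same directory always yields the same ID; another profile yields another.
    return f"hermes_{digest[:16]}"
```

The property that matters is that the same Hermes profile always maps to the same userId, and two different profiles never share one.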

Externally managed Camofox sessions

When another app drives the visible Camofox browser (a desktop assistant, a custom integration, another agent), configure Hermes to operate inside that same identity instead of spawning its own isolated profile.

Three knobs control the behavior:

Setting	Env var	Effect
`browser.camofox.user_id`	`CAMOFOX_USER_ID`	Camofox `userId` Hermes uses when creating tabs. Setting this opts the session into "externally managed" mode.
`browser.camofox.session_key`	`CAMOFOX_SESSION_KEY`	`sessionKey` (a.k.a. `listItemId`) sent on tab creation. Used to match an existing tab during adoption. Defaults to a per-task value if unset.
`browser.camofox.adopt_existing_tab`	`CAMOFOX_ADOPT_EXISTING_TAB`	When true, Hermes calls `GET /tabs?userId=<user_id>` on first use and reuses an existing tab before creating a new one.

Env vars take precedence over config.yaml. Either form works:

browser:
  camofox:
    user_id: shared-camofox
    session_key: visible-tab
    adopt_existing_tab: true

CAMOFOX_USER_ID=shared-camofox
CAMOFOX_SESSION_KEY=visible-tab
CAMOFOX_ADOPT_EXISTING_TAB=true
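The "env vars take precedence" rule amounts to a two-step lookup. The sketch below assumes config.yaml has already been parsed into a dict; resolve_camofox_setting is a hypothetical helper, and it returns the raw environment string without guessing how Hermes parses booleans.

```python
from typing import Any, Mapping, Optional

# Setting-to-variable mapping taken from the table above.
ENV_NAMES = {
    "user_id": "CAMOFOX_USER_ID",
    "session_key": "CAMOFOX_SESSION_KEY",
    "adopt_existing_tab": "CAMOFOX_ADOPT_EXISTING_TAB",
}


def resolve_camofox_setting(
    key: str, config: Mapping[str, Any], environ: Mapping[str, str]
) -> Optional[Any]:
    """Prefer the environment variable, then browser.camofox.<key> in the config."""
    env_value = environ.get(ENV_NAMES[key])
    if env_value:  # unset or empty falls through to the config
        return env_value
    return config.get("browser", {}).get("camofox", {}).get(key)
```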

What changes when user_id is set:

Hermes skips destructive cleanup at task end (same as managed_persistence: true). The other app's tab/cookies/profile survive.
Hermes does not call DELETE /sessions/<user_id> — that endpoint wipes all user data, so it would nuke the external app's session if it fired.

How tab adoption works (when adopt_existing_tab: true):

1. On the first browser tool call after a process start, Hermes issues GET /tabs?userId=<user_id> (5-second timeout).
2. If any tab in the response has listItemId == session_key, Hermes adopts the most recently created one in that group.
3. Otherwise, Hermes adopts the most recently created tab for the user (any listItemId).
4. If no tabs exist or the request fails, Hermes falls back to creating a new tab on the next operation.
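The selection logic in the steps above can be sketched as a pure function over the tab list. Only listItemId comes from this page; the tabId and createdAt field names are assumptions about the response shape, and the function name is hypothetical.

```python
from typing import Optional


def choose_tab_to_adopt(tabs: list[dict], session_key: str) -> Optional[dict]:
    """Pick the tab to adopt, or None when a new tab should be created."""
    if not tabs:
        return None  # nothing to adopt; the caller creates a tab instead
    # Prefer tabs whose listItemId matches session_key, else consider all tabs.
    matching = [tab for tab in tabs if tab.get("listItemId") == session_key]
    candidates = matching or tabs
    # "createdAt" is an assumed field used to find the most recently created tab.
    return max(candidates, key=lambda tab: tab["createdAt"])
```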

Adoption only fires until tab_id is populated for the session. If the external app closes the adopted tab mid-run, the next browser tool call will surface a Camofox error — Hermes does not re-poll for a fresh tab on every call.

Picking session_key: if you want Hermes to reliably attach to a specific existing tab, set session_key to the listItemId the external app used when creating it. If you leave session_key unset and only set user_id, Hermes generates a per-task session_key (task_<id>) — Hermes will share cookies and the profile with the external app, but will open its own tab alongside instead of reusing one.

Concurrency note: the external app and Hermes can drive the same Camofox userId simultaneously, but Camofox does not coordinate per-tab focus between clients. Coordinate ownership at the application layer (e.g. the external app pauses while Hermes runs).

VNC live view

When Camofox runs in headed mode (with a visible browser window), it exposes a VNC port in its health check response. Hermes automatically discovers this and includes the VNC URL in navigation responses, so the agent can share a link for you to watch the browser live.

Local Chromium-family browser via CDP (`/browser connect`)

Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome, Brave, Chromium, or Edge instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.

Note: /browser connect is an interactive-CLI slash command; it is not dispatched by the gateway. If you try to run it inside a WebUI, Telegram, Discord, or other gateway chat, the message will be sent to the agent as plain text and the command will not execute. Start Hermes from the terminal (hermes or hermes chat) and issue /browser connect there.

In the CLI, use:

/browser connect                 # Auto-launch/connect to a local Chromium-family browser at http://127.0.0.1:9222
/browser connect ws://host:port  # Connect to a specific CDP endpoint
/browser status                  # Check current connection
/browser disconnect              # Detach and return to cloud/local mode

If a browser isn't already running with remote debugging, Hermes will attempt to auto-launch a supported Chromium-family browser with --remote-debugging-port=9222. Detection includes Brave, Google Chrome, Chromium, and Microsoft Edge, with common Linux install paths such as /opt/brave-bin/brave and /snap/bin/brave.

Tip: To start a Chromium-family browser manually with CDP enabled, use a dedicated user-data-dir so the debug port actually comes up even if the browser is already running with your normal profile:

# Linux — Brave
brave-browser \
  --remote-debugging-port=9222 \
  --user-data-dir=$HOME/.hermes/chrome-debug \
  --no-first-run \
  --no-default-browser-check &

# Linux — Google Chrome
google-chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=$HOME/.hermes/chrome-debug \
  --no-first-run \
  --no-default-browser-check &

# macOS — Brave
"/Applications/Brave Browser.app/Contents/MacOS/Brave Browser" \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/.hermes/chrome-debug" \
  --no-first-run \
  --no-default-browser-check &

# macOS — Google Chrome
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/.hermes/chrome-debug" \
  --no-first-run \
  --no-default-browser-check &

Then launch the Hermes CLI and run /browser connect.

Why --user-data-dir? Without it, launching a Chromium-family browser while a regular instance is already running typically opens a new window on the existing process — and that existing process was not started with --remote-debugging-port, so port 9222 never opens. A dedicated user-data-dir forces a fresh browser process where the debug port actually listens. --no-first-run --no-default-browser-check skips the first-launch wizard for the fresh profile.
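If you want to confirm that the debug port really opened before running /browser connect, a small probe can help. The sketch below queries the /json/version HTTP endpoint that Chromium-family browsers serve on the remote debugging port and reads its webSocketDebuggerUrl field. The helper names are hypothetical, and this is a standalone check, not part of Hermes.

```python
import json
import urllib.request
from typing import Optional


def websocket_url_from_version(payload: bytes) -> Optional[str]:
    """Pull webSocketDebuggerUrl out of a /json/version response body."""
    return json.loads(payload).get("webSocketDebuggerUrl")


def probe_cdp(base_url: str = "http://127.0.0.1:9222", timeout: float = 2.0) -> Optional[str]:
    """Return the browser WebSocket URL, or None if nothing answers on the port."""
    try:
        with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as response:
            return websocket_url_from_version(response.read())
    except OSError:
        return None  # connection refused or timed out: the debug port is not up
```

Calling probe_cdp() returns None when the browser was attached to an existing process without the debug port, which is the failure mode the dedicated user-data-dir avoids.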

When connected via CDP, all browser tools (browser_navigate, browser_click, etc.) operate on your live browser instance instead of spinning up a cloud session.

WSL2 + Windows Chrome: prefer MCP over `/browser connect`

If Hermes runs inside WSL2 but the Chrome window you want to control runs on the Windows host, /browser connect is often not the best path.

Why:

/browser connect expects Hermes itself to reach a usable CDP endpoint
modern Chrome live-debugging sessions often expose a host-local endpoint that is not directly reachable from WSL the same way a classic 9222 port is
even when Windows Chrome is debuggable, the cleanest integration is often to let a Windows-side browser MCP server attach to Chrome and let Hermes talk to that MCP server

For that setup, prefer chrome-devtools-mcp through Hermes MCP support.

See the MCP guide for the practical setup:

Use MCP with Hermes

Local browser mode

If you do not set any cloud credentials and don't use /browser connect, Hermes can still use the browser tools through a local Chromium install driven by agent-browser.

Optional Environment Variables

# Residential proxies for better CAPTCHA solving (default: "true")
BROWSERBASE_PROXIES=true

# Advanced stealth with custom Chromium — requires Scale Plan (default: "false")
BROWSERBASE_ADVANCED_STEALTH=false

# Session reconnection after disconnects — requires paid plan (default: "true")
BROWSERBASE_KEEP_ALIVE=true

# Custom session timeout in seconds (max 21600 = 6 hours) (default: project default)
# Examples: 600 (10min), 1800 (30min), 21600 (6h max)
BROWSERBASE_SESSION_TIMEOUT=1800

# Inactivity timeout before auto-cleanup in seconds (default: 120)
BROWSER_INACTIVITY_TIMEOUT=120

# Extra Chromium launch flags (comma- or newline-separated). Hermes auto-injects
# `--no-sandbox,--disable-dev-shm-usage` when it detects root or AppArmor-restricted
# unprivileged user namespaces (Ubuntu 23.10+, DGX Spark, many container images),
# so most users don't need to set this. Set it manually only if you need a flag
# Hermes doesn't add automatically; setting it disables the auto-injection.
AGENT_BROWSER_ARGS=--no-sandbox
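For reference, a "comma- or newline-separated" value like AGENT_BROWSER_ARGS can be split as shown below. This is a minimal sketch with a hypothetical function name, not Hermes's parser; note that it would also split a flag whose value itself contains a comma.

```python
import re


def parse_browser_args(raw: str) -> list[str]:
    """Split on commas or newlines, trimming whitespace and dropping empty entries."""
    return [flag.strip() for flag in re.split(r"[,\n]", raw) if flag.strip()]
```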

Install agent-browser CLI

npm install -g agent-browser
# Or install locally in the repo:
npm install

Note: The browser toolset must be included in your config's toolsets list or enabled via hermes config set toolsets '["hermes-cli", "browser"]'.

Available Tools

`browser_navigate`

Navigate to a URL. Must be called before any other browser tool. Initializes the browser session for the active backend (for example, a Browserbase session in cloud mode).

Navigate to https://github.com/NousResearch

Tip: For simple information retrieval, prefer web_search or web_extract, which are faster and cheaper. Use browser tools when you need to interact with a page (click buttons, fill forms, handle dynamic content).

`browser_snapshot`

Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs like @e1, @e2 for use with browser_click and browser_type.

full=false (default): Compact view showing only interactive elements
full=true: Complete page content

Snapshots over 8000 characters are automatically summarized by an LLM.

`browser_click`

Click an element identified by its ref ID from the snapshot.

Click @e5 to press the "Sign In" button

`browser_type`

Type text into an input field. Clears the field first, then types the new text.

Type "hermes agent" into the search field @e3

`browser_scroll`

Scroll the page up or down to reveal more content.

Scroll down to see more results

`browser_press`

Press a keyboard key. Useful for submitting forms or navigation.

Press Enter to submit the form

Supported keys: Enter, Tab, Escape, ArrowDown, ArrowUp, and more.

`browser_back`

Navigate back to the previous page in browser history.

`browser_get_images`

List all images on the current page with their URLs and alt text. Useful for finding images to analyze.

`browser_vision`

Take a screenshot and analyze it with vision AI. Use this when text snapshots don't capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.

The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the MEDIA: mechanism.

What does the chart on this page show?

Screenshots are stored in ~/.hermes/cache/screenshots/ and automatically cleaned up after 24 hours.

`browser_console`

Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.

Check the browser console for any JavaScript errors

Use clear=True to clear the console after reading, so subsequent calls only show new messages.

browser_console also evaluates JavaScript when called with an expression argument. It behaves like the DevTools console, and the result comes back parsed: JSON-serialized objects become dicts, and primitive values stay primitive.

browser_console(expression="document.querySelector('h1').textContent")
browser_console(expression="JSON.stringify(performance.timing)")

When a CDP supervisor is active for the current session (typical for any session that has run browser_navigate against a CDP-capable backend), evaluation runs over the supervisor's persistent WebSocket, with no subprocess startup cost. Otherwise, it falls through to the standard agent-browser CLI path. Behaviour is identical either way; only latency changes.

`browser_cdp`

Raw Chrome DevTools Protocol passthrough — the escape hatch for browser operations not covered by the other tools. Use for native dialog handling, iframe-scoped evaluation, cookie/network control, or any CDP verb the agent needs.

Only available when a CDP endpoint is reachable at session start — meaning /browser connect has attached to a running Chrome, Brave, Chromium, or Edge browser, or browser.cdp_url is set in config.yaml. The default local agent-browser mode, Camofox, and cloud providers (Browserbase, Browser Use, Firecrawl) do not currently expose CDP to this tool — cloud providers have per-session CDP URLs but live-session routing is a follow-up.

CDP method reference: https://chromedevtools.github.io/devtools-protocol/ — the agent can web_extract a specific method's page to look up parameters and return shape.

Common patterns:

# List tabs (browser-level, no target_id)
browser_cdp(method="Target.getTargets")

# Handle a native JS dialog on a tab
browser_cdp(method="Page.handleJavaScriptDialog",
            params={"accept": True, "promptText": ""},
            target_id="<tabId>")

# Evaluate JS in a specific tab
browser_cdp(method="Runtime.evaluate",
            params={"expression": "document.title", "returnByValue": True},
            target_id="<tabId>")

# Get all cookies
browser_cdp(method="Network.getAllCookies")

Browser-level methods (Target.*, Browser.*, Storage.*) omit target_id. Page-level methods (Page.*, Runtime.*, DOM.*, Emulation.*) require a target_id from Target.getTargets. Each stateless call is independent — sessions do not persist between calls.
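The target_id rule can be enforced before a call is made. The sketch below builds the argument dict for a browser_cdp call and rejects page-level methods that lack a target_id. It only knows the domains listed above, so any other domain (such as Network in the cookie example) passes through unchecked. build_cdp_call is a hypothetical helper, not a Hermes API.

```python
from typing import Optional

# Page-level domains named above; these need a target_id from Target.getTargets.
PAGE_LEVEL_DOMAINS = {"Page", "Runtime", "DOM", "Emulation"}


def build_cdp_call(
    method: str, params: Optional[dict] = None, target_id: Optional[str] = None
) -> dict:
    """Assemble browser_cdp arguments, validating target_id for page-level methods."""
    domain = method.partition(".")[0]
    if domain in PAGE_LEVEL_DOMAINS and target_id is None:
        raise ValueError(f"{method} is page-level and needs a target_id")
    call = {"method": method, "params": params or {}}
    if target_id is not None:
        call["target_id"] = target_id
    return call
```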

Cross-origin iframes: pass frame_id (from browser_snapshot.frame_tree.children[] where is_oopif=true) to route the CDP call through the supervisor's live session for that iframe. This is how Runtime.evaluate inside a cross-origin iframe works on Browserbase, where stateless CDP connections would hit signed-URL expiry. Example:

browser_cdp(
  method="Runtime.evaluate",
  params={"expression": "document.title", "returnByValue": True},
  frame_id="<frame_id from browser_snapshot>",
)

Same-origin iframes don't need frame_id — use document.querySelector('iframe').contentDocument from a top-level Runtime.evaluate instead.

`browser_dialog`

Responds to a native JS dialog (alert / confirm / prompt / beforeunload). Before this tool existed, dialogs would silently block the page's JavaScript thread and subsequent browser_* calls would hang or throw; now the agent sees pending dialogs in browser_snapshot output and responds explicitly.

Workflow:

1. Call browser_snapshot. If a dialog is blocking the page, it shows up as pending_dialogs: [{"id": "d-1", "type": "alert", "message": "..."}].
2. Call browser_dialog(action="accept") or browser_dialog(action="dismiss"). For prompt() dialogs, pass prompt_text="..." to supply the response.
3. Re-snapshot: pending_dialogs is now empty and the page's JS thread has resumed.

Detection happens automatically via a persistent CDP supervisor — one WebSocket per task that subscribes to Page/Runtime/Target events. The supervisor also populates a frame_tree field in the snapshot so the agent can see the iframe structure of the current page, including cross-origin (OOPIF) iframes.

Availability matrix:

Backend	Detection via `pending_dialogs`	Response (`browser_dialog` tool)
Local Chrome via `/browser connect` or `browser.cdp_url`	✓	✓ full workflow
Browserbase	✓	✓ full workflow (via injected XHR bridge)
Camofox / default local agent-browser	✗	✗ (no CDP endpoint)

How it works on Browserbase. Browserbase's CDP proxy auto-dismisses real native dialogs server-side within ~10ms, so we can't use Page.handleJavaScriptDialog. The supervisor injects a small script via Page.addScriptToEvaluateOnNewDocument that overrides window.alert/confirm/prompt with a synchronous XHR. We intercept those XHRs via Fetch.enable — the page's JS thread stays blocked on the XHR until we call Fetch.fulfillRequest with the agent's response. prompt() return values round-trip back into page JS unchanged.

Dialog policy is configured in config.yaml under browser.dialog_policy:

Policy	Behavior
`must_respond` (default)	Capture, surface in snapshot, wait for explicit `browser_dialog()` call. Safety auto-dismiss after `browser.dialog_timeout_s` (default 300s) so a buggy agent can't stall forever.
`auto_dismiss`	Capture, dismiss immediately. Agent still sees the dialog in `browser_state` history but doesn't have to act.
`auto_accept`	Capture, accept immediately. Useful when navigating pages with aggressive `beforeunload` prompts.
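The three policies in the table reduce to a small decision function. This is a sketch of the described behavior only: the function name and the "wait"/"dismiss"/"accept" return labels are invented for illustration, and the 300-second default mirrors browser.dialog_timeout_s.

```python
def dialog_action(policy: str, waited_s: float, timeout_s: float = 300.0) -> str:
    """Decide what to do with a captured dialog under the given policy."""
    if policy == "auto_dismiss":
        return "dismiss"
    if policy == "auto_accept":
        return "accept"
    if policy == "must_respond":
        # Wait for an explicit browser_dialog() call until the safety timeout expires.
        return "dismiss" if waited_s >= timeout_s else "wait"
    raise ValueError(f"unknown dialog policy: {policy}")
```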

Frame tree inside browser_snapshot.frame_tree is capped to 30 frames and OOPIF depth 2 to keep payloads bounded on ad-heavy pages. A truncated: true flag surfaces when limits were hit; agents needing the full tree can use browser_cdp with Page.getFrameTree.

Practical Examples

Filling Out a Web Form

User: Sign up for an account on example.com with my email john@example.com

Agent workflow:
1. browser_navigate("https://example.com/signup")
2. browser_snapshot()  → sees form fields with refs
3. browser_type(ref="@e3", text="john@example.com")
4. browser_type(ref="@e5", text="SecurePass123")
5. browser_click(ref="@e8")  → clicks "Create Account"
6. browser_snapshot()  → confirms success

Researching Dynamic Content

User: What are the top trending repos on GitHub right now?

Agent workflow:
1. browser_navigate("https://github.com/trending")
2. browser_snapshot(full=true)  → reads trending repo list
3. Returns formatted results

Session Recording

Automatically record browser sessions as WebM video files:

browser:
  record_sessions: true  # default: false

When enabled, recording starts automatically on the first browser_navigate and saves to ~/.hermes/browser_recordings/ when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.

Stealth Features

Browserbase provides automatic stealth capabilities:

Feature	Default	Notes
Basic Stealth	Always on	Random fingerprints, viewport randomization, CAPTCHA solving
Residential Proxies	On	Routes through residential IPs for better access
Advanced Stealth	Off	Custom Chromium build, requires Scale Plan
Keep Alive	On	Session reconnection after network hiccups

Note: If paid features aren't available on your plan, Hermes automatically falls back, first disabling keepAlive and then proxies, so browsing still works on free plans.
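The fallback order can be pictured as a sequence of progressively reduced option sets to try. This is a sketch under assumptions: the generator name is hypothetical, and the keepAlive/proxies keys stand in for whatever session options Hermes actually sends.

```python
from typing import Iterator


def session_option_attempts(keep_alive: bool = True, proxies: bool = True) -> Iterator[dict]:
    """Yield option sets in the order they would be tried: full, no keepAlive, no proxies."""
    options = {"keepAlive": keep_alive, "proxies": proxies}
    yield dict(options)
    if options["keepAlive"]:
        options["keepAlive"] = False  # first fallback: drop keepAlive
        yield dict(options)
    if options["proxies"]:
        options["proxies"] = False  # second fallback: drop proxies as well
        yield dict(options)
```

A caller would attempt session creation with each option set in turn and stop at the first one the plan accepts.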

Session Management

Each task gets an isolated browser session via Browserbase
Sessions are automatically cleaned up after inactivity (default: 2 minutes)
A background thread checks every 30 seconds for stale sessions
Emergency cleanup runs on process exit to prevent orphaned sessions
Sessions are released via the Browserbase API (REQUEST_RELEASE status)
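The inactivity cleanup described above comes down to tracking a last-used timestamp per task and reaping entries older than the timeout. The sketch below shows that bookkeeping only; SessionTracker is a hypothetical class, and the background thread and the provider release call are left out.

```python
import time
from typing import Optional


class SessionTracker:
    """Tracks last activity per task and reports sessions idle past the timeout."""

    def __init__(self, inactivity_timeout_s: float = 120.0) -> None:
        self.inactivity_timeout_s = inactivity_timeout_s
        self._last_used: dict[str, float] = {}

    def touch(self, task_id: str, now: Optional[float] = None) -> None:
        """Record activity; call this on every browser tool invocation."""
        self._last_used[task_id] = time.monotonic() if now is None else now

    def reap_stale(self, now: Optional[float] = None) -> list[str]:
        """Remove and return task IDs idle for at least the timeout."""
        current = time.monotonic() if now is None else now
        stale = [
            task_id
            for task_id, last in self._last_used.items()
            if current - last >= self.inactivity_timeout_s
        ]
        for task_id in stale:
            del self._last_used[task_id]  # a real cleaner would also release the session
        return stale
```

A periodic checker (every 30 seconds in the description above) would call reap_stale() and release each returned session.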

Limitations

Text-based interaction — relies on accessibility tree, not pixel coordinates
Snapshot size — large pages may be truncated or LLM-summarized at 8000 characters
Session timeout — cloud sessions expire based on your provider's plan settings
Cost — cloud sessions consume provider credits; sessions are automatically cleaned up when the conversation ends or after inactivity. Use /browser connect for free local browsing.
No file downloads — cannot download files from the browser
