Web Pentest
Authorized web application penetration testing — reconnaissance, vulnerability analysis, proof-based exploitation, and professional reporting. Adapts Shannon's "No Exploit, No Report" methodology with hard guardrails for scope, authorization, and aux-client leakage. Active testing against running applications you own or have written authorization to test.
Skill metadata
| Source | Optional — install with `hermes skills install official/security/web-pentest` |
|---|---|
| Path | `optional-skills/security/web-pentest` |
| Platforms | linux, macos |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
Web Application Penetration Testing
A phased pentesting workflow for running web applications. Adapted from Shannon's pipeline (Keygraph, AGPL — concepts only, no code borrowed). Built around three rules:
- No exploit, no report — every finding requires reproducible evidence.
- Bounded scope — every active request goes against a target the operator pre-declared. Off-scope hosts are refused.
- Bypass exhaustion before false-positive dismissal — a "blocked" payload is not a clean bill of health until you've tried the bypass set.
⚠️ Hard Guardrails — Read Before Every Engagement
Violating any of these invalidates the engagement and may be illegal.
- Authorization gate. Before the first active scan in a session, you MUST confirm with the user, in writing, that they own or have written authorization to test the target. Record the acknowledgement in `engagement/authorization.md` (see template). No acknowledgement → no active scanning. Reading public pages with `curl` is fine; sending payloads is not.
- Scope allowlist. Maintain `engagement/scope.txt` — one hostname or CIDR per line. Every `nmap`, `curl`, `whatweb`, browser navigation, or payload-bearing request MUST be against an entry in scope. If a target redirects you off-scope (a 3xx to a different host, a link in HTML), STOP and confirm with the user before following.
- No production systems without paper. If the user hasn't told you "yes, prod is in scope and I have written sign-off," assume not. Default targets are staging, local docker, and dedicated test instances.
- Cloud metadata is off by default. Do not probe `169.254.169.254`, `metadata.google.internal`, `100.100.100.200`, `[fd00:ec2::254]`, or equivalent unless the engagement explicitly includes SSRF-to-metadata as a goal AND the target is one you control. The agent's browser tool can reach these from inside your own infrastructure — don't.
- Destructive payloads need approval. SQLi payloads that DROP/DELETE, filesystem-write SSTI, command injection with `rm`/`shutdown`/`mkfs`, anything that mutates beyond a single test row → ASK FIRST. The `approval.py` system catches some; don't rely on it alone.
- Aux-client leakage risk (Hermes-specific). This skill produces sessions full of SQLi/XSS/RCE payloads, captured credentials, and JWT tokens. Hermes' compression and title-generation paths replay history through the auxiliary client (often the main model), so anything sensitive you write to the conversation can leave the box on the next compression. Mitigation:
  - Redact captured tokens/credentials to the LAST 6 CHARS before logging them in any message. Full values go to `engagement/evidence/` files, never into chat history.
  - If the engagement is sensitive, set `auxiliary.title_generation.enabled: false` in `~/.hermes/config.yaml` for the session.
- Rate limit yourself. Default 200ms between active requests against any single host. The `recon-scan.sh` script enforces this. Don't bypass it without operator approval.
- Authority of the report. This skill produces a security assessment, not a "PASS." Even a clean run is "no exploitable issues FOUND in scope X within time T using methods Y", not "the application is secure." Mirror that language in the report.
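The scope-allowlist rule can be made mechanical. Below is a minimal bash sketch, assuming `scope.txt` holds hostnames, IPv4 addresses, or IPv4 CIDRs (one per line, `#` comments allowed). The function names are hypothetical, IPv6 is not parsed (so IPv6 targets fall out of scope), and this is not the rule set in `references/scope-enforcement.md`, which should take precedence.

```bash
#!/usr/bin/env bash
# Hypothetical scope-allowlist helpers (sketch). Assumes scope.txt holds one
# hostname, IPv4 address, or IPv4 CIDR per line; '#' starts a comment.
# IPv6 is not parsed, so IPv6 targets are treated as out of scope.

# Extract the lowercased host from a URL or a bare host[:port].
host_from_url() {
  local rest=${1#*://}      # drop scheme, if any
  rest=${rest%%[/?#]*}      # drop path, query, fragment
  rest=${rest##*@}          # drop userinfo
  rest=${rest%:*}           # drop port
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

# Print a dotted-quad IPv4 address as a 32-bit integer; fail on anything else.
ip_to_int() {
  local IFS=. o
  set -- $1
  [ $# -eq 4 ] || return 1
  for o in "$@"; do
    case $o in ''|*[!0-9]*) return 1 ;; esac
    [ "$((10#$o))" -le 255 ] || return 1
  done
  echo $(( (10#$1 << 24) | (10#$2 << 16) | (10#$3 << 8) | 10#$4 ))
}

# Succeed if IPv4 address $1 lies inside $2 (a CIDR, or a bare IP meaning /32).
ip_in_cidr() {
  local net=${2%/*} bits=32 ipn netn mask
  case $2 in */*) bits=${2#*/} ;; esac
  case $bits in ''|*[!0-9]*) return 1 ;; esac
  [ "$bits" -le 32 ] || return 1
  ipn=$(ip_to_int "$1") || return 1
  netn=$(ip_to_int "$net") || return 1
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ipn & mask )) -eq $(( netn & mask )) ]
}

# Succeed only if the URL/host in $1 matches an entry in scope file $2.
in_scope() {
  local host entry
  host=$(host_from_url "$1")
  [ -n "$host" ] || return 1
  while IFS= read -r entry || [ -n "$entry" ]; do
    entry=${entry%%#*}                                  # strip comment
    entry=$(printf '%s' "$entry" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
    [ -n "$entry" ] || continue
    [ "$host" = "$entry" ] && return 0                  # exact host/IP match
    ip_in_cidr "$host" "$entry" && return 0             # IPv4 inside a CIDR
  done < "$2"
  return 1
}
```

Hostnames match exactly, so a subdomain such as `api.staging.example.com` needs its own line; that keeps an implicit wildcard from silently widening scope. Anything the helpers cannot parse is refused rather than allowed.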
Phase 0: Engagement Setup
Before any scanning happens, create the engagement directory and authorization acknowledgement.
```bash
ENGAGEMENT=engagement-$(date +%Y%m%d-%H%M%S)
mkdir -p "$ENGAGEMENT"/{evidence,findings,reports}
cd "$ENGAGEMENT"
```
- Ask the user (verbatim):
  > "Confirm: (a) the target URL is [X], (b) you own this application or have written authorization to test it, and (c) the engagement may run for up to [N] hours starting now. Reply 'authorized' to proceed."
- Wait for an explicit `authorized` response. Any other answer means STOP.
- Record authorization to `engagement/authorization.md` using the template in `templates/authorization.md`. Include:
  - Target URL(s) and IP(s)
  - Authorization basis (ownership / written authz from $name)
  - Engagement window
  - Out-of-scope items (production, third-party services, etc.)
  - Operator name (the user driving this session)
- Build `scope.txt`:
  ```text
  localhost
  127.0.0.1
  staging.example.com
  192.168.1.0/24   # internal lab only, with operator OK
  ```
- Read `references/scope-enforcement.md` before issuing the first active request — that doc has the host-extraction rules you apply to every command/URL before it goes out.
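The setup steps above amount to a gate that later phases can check. A minimal sketch with hypothetical helper names: one function writes the acknowledgement fields listed above, the other refuses to continue unless both `authorization.md` and a `scope.txt` with at least one real entry exist. The file layout is illustrative, not the contents of `templates/authorization.md`.

```bash
#!/usr/bin/env bash
# Hypothetical authorization-gate helpers (sketch). The fields mirror the list
# above; the real layout lives in templates/authorization.md and may differ.

# usage: record_authorization DIR TARGETS BASIS WINDOW OUT_OF_SCOPE OPERATOR
# Call only after the user has replied 'authorized'.
record_authorization() {
  cat > "$1/authorization.md" <<EOF
# Engagement Authorization
- Target URL(s) and IP(s): $2
- Authorization basis: $3
- Engagement window: $4
- Out of scope: $5
- Operator: $6
- Acknowledged ('authorized'): $(date -u +%Y-%m-%dT%H:%M:%SZ)
EOF
}

# usage: require_authorization DIR
# Returns non-zero (refuse active work) unless both the acknowledgement and a
# scope.txt with at least one non-comment, non-blank line exist.
require_authorization() {
  if [ ! -s "$1/authorization.md" ]; then
    echo "REFUSED: no authorization.md in $1" >&2
    return 1
  fi
  if ! grep -Eqv '^[[:space:]]*(#|$)' "$1/scope.txt" 2>/dev/null; then
    echo "REFUSED: scope.txt missing or has no entries in $1" >&2
    return 1
  fi
}
```

Calling `require_authorization "$ENGAGEMENT"` at the top of any active step turns "no acknowledgement → no active scanning" into a hard failure instead of a reminder.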
Phase 1: Pre-Recon (Code Analysis, optional)
Skip if no source access (black-box engagement).
If you have read access to the application source:
- Map the architecture — framework, routing, middleware stack
- Inventory sinks — every `execute(`, `os.system(`, `eval(`, template render, file read/write, redirect target
- Map auth — session cookie vs JWT, OAuth flows, password reset, privileged endpoints
- Identify trust boundaries — what's authenticated, what's not, what comes from `request.*`
- Backward taint from each sink to a request source. Early-terminate when proper sanitization is found (parameterized queries, allowlists, `shlex.quote`, well-known escapers).
Output: evidence/pre-recon.md — architecture map, sink inventory,
suspected vulnerable code paths.
This is OFFLINE work. No traffic to the target.
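A grep pass is a cheap way to seed the sink inventory before reading code by hand. This is a sketch, not a taint analyzer: the first three patterns are the sinks named above, while `render_template_string(`, `subprocess.` and `redirect(` are illustrative extras (Flask/Python-flavoured) that you would extend per framework. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a first-pass sink inventory for Phase 1 (offline; no target traffic).
# Hits are leads for backward taint tracing, not findings.

# usage: sink_inventory SRC_DIR OUTFILE   -> prints the number of hits
sink_inventory() {
  grep -rnE \
    --include='*.py' --include='*.js' --include='*.ts' \
    --include='*.php' --include='*.rb' \
    -e 'execute\(' -e 'os\.system\(' -e 'eval\(' \
    -e 'render_template_string\(' -e 'subprocess\.' -e 'redirect\(' \
    "$1" > "$2" || true        # grep exits 1 when nothing matches
  wc -l < "$2" | tr -d ' '
}
```

For example, `sink_inventory ./src evidence/sinks.txt` leaves `file:line:match` rows that can be triaged into the sink inventory section of `evidence/pre-recon.md`.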
Phase 2: Recon (Live, Read-Only)
Maps the attack surface. All requests are GETs of public pages, no payloads yet. Still scope-bounded.
- Verify scope. Resolve every target hostname → IP. Confirm the IPs are in scope (avoids the "DNS points somewhere unexpected" trap).
- Network surface (only if scope permits port scanning):
  ```bash
  nmap -sT -T3 --top-ports 100 -oN evidence/nmap.txt "$TARGET"
  ```
  Use `-T3` (default), not `-T4`/`-T5`. It is stealthier and avoids tripping IDS/IPS in shared environments.
- Tech fingerprint:
  ```bash
  whatweb -v "$TARGET_URL" > evidence/whatweb.txt
  curl -sIk "$TARGET_URL" > evidence/headers.txt
  ```
- Endpoint discovery:
  - Crawl the app with the browser tool (`browser_navigate`, `browser_get_images`, follow links).
  - Inspect `robots.txt`, `sitemap.xml`, `.well-known/*`.
  - Use the developer tools network panel via the browser tool to capture XHR/fetch calls.
- Auth surface: Identify login, registration, password reset, session cookie names, token formats. Do NOT send credentials yet — just observe.
- Correlate with pre-recon (if you have source). For each `evidence/pre-recon.md` finding, mark whether the live surface confirms it's reachable.
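The discovery-file part of endpoint discovery can be scripted without breaking the rate limit. A sketch, assuming the base URL has already passed the scope check: it sends plain GETs only, waits 200ms between requests to match the default above, and uses `.well-known/security.txt` as a stand-in for the wider `.well-known/*` set. The function and output file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rate-limited, read-only fetch of common discovery files.
# Assumes BASE_URL already passed the scope check. Plain GETs only, no -L
# (redirects are not followed), and 200ms between requests.

# usage: fetch_discovery_files BASE_URL OUTDIR
fetch_discovery_files() {
  local base=${1%/} outdir=$2 path name
  for path in robots.txt sitemap.xml .well-known/security.txt; do
    name=recon-$(printf '%s' "$path" | tr '/.' '__')   # e.g. recon-robots_txt
    # -f: treat HTTP >= 400 as failure so 404 pages are not saved as evidence
    curl -fsk --max-time 10 -o "$outdir/$name" "$base/$path" \
      || rm -f "$outdir/$name"
    sleep 0.2
  done
}
```

Omitting `-L` is deliberate: a 3xx pointing at another host is saved rather than followed, which matches the off-scope redirect rule in the guardrails.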
Output: evidence/recon.md — endpoints, technologies, auth model,
input vectors.
Phase 3: Vulnerability Analysis
One delegate_task per vulnerability class. Each agent reads
evidence/recon.md (+ evidence/pre-recon.md if present), produces
findings/<class>-queue.json using templates/exploitation-queue.json.
Use delegate_task with these focused subagents (parallel where possible):
| Class | Goal | Reference |
|---|---|---|
| injection | SQLi, command, path traversal, SSTI, LFI/RFI, deserialization | references/vuln-taxonomy.md (slot types) |
| xss | Reflected, stored, DOM-based | references/vuln-taxonomy.md (render contexts) |
| auth | Login bypass, JWT confusion, session fixation, OAuth flaws | references/exploitation-techniques.md |
| authz | IDOR, vertical/horizontal escalation, business logic | references/exploitation-techniques.md |
| ssrf | Internal reachability, metadata, protocol smuggling | Skip metadata unless explicitly authorized |
| infra | Misconfig, info disclosure, default creds, exposed admin | references/exploitation-techniques.md |
Each queue entry has: id, vuln class, source (file:line if known),
endpoint, parameter, slot type, suspected defense, verdict
(identified / partial / confirmed / critical), witness payload,
confidence (0-1), notes.
The analysis phase doesn't send malicious payloads yet — it stages them. The exploitation phase actually fires them.
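To make the queue fields concrete, here is a sketch that emits one entry as a JSON line. The key names are assumed from the field list above, and `templates/exploitation-queue.json` remains the authoritative schema. The helper validates the verdict against the four levels plus the `false_positive` verdict used in Phase 4, and checks that confidence lies in [0, 1].

```bash
#!/usr/bin/env bash
# Sketch: emit one exploitation-queue entry as a JSON line. Field names are
# assumed from the list above; templates/exploitation-queue.json is authoritative.

# Minimal JSON string escaping: backslash, double quote, newline, tab.
json_str() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  printf '"%s"' "$s"
}

# usage: queue_entry ID CLASS SOURCE ENDPOINT PARAM SLOT DEFENSE VERDICT PAYLOAD CONFIDENCE NOTES
queue_entry() {
  case $8 in
    identified|partial|confirmed|critical|false_positive) ;;
    *) echo "invalid verdict: $8" >&2; return 1 ;;
  esac
  # confidence must be a number in [0, 1]
  awk -v c="${10}" 'BEGIN { exit !(c ~ /^(0(\.[0-9]+)?|1(\.0+)?)$/) }' \
    || { echo "invalid confidence: ${10}" >&2; return 1; }
  printf '{"id": %s, "class": %s, "source": %s, "endpoint": %s, "parameter": %s, ' \
    "$(json_str "$1")" "$(json_str "$2")" "$(json_str "$3")" "$(json_str "$4")" "$(json_str "$5")"
  printf '"slot_type": %s, "suspected_defense": %s, "verdict": %s, "witness_payload": %s, ' \
    "$(json_str "$6")" "$(json_str "$7")" "$(json_str "$8")" "$(json_str "$9")"
  printf '"confidence": %s, "notes": %s}\n' "${10}" "$(json_str "${11}")"
}
```

Each subagent could append one such line per candidate to `findings/<class>-queue.json` (as JSON Lines, an assumption) and leave actual sending to Phase 4.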
Phase 4: Exploitation (Proof-Based, Conditional)
Only run a sub-agent per class where the analysis queue has actionable
entries (identified or partial).
For each candidate:
- Pre-send check — host in scope? auth gate satisfied? payload approved if destructive?
- Send the witness payload — minimal proof. SQLi: `' AND 1=1--` then `' AND 1=2--`. XSS: a benign marker like `<svg/onload=console.log("HERMES-PENTEST-XSS")>`. Never `alert(1)` in stored XSS — it'll fire for other users in shared environments.
- Verify the witness fires — for blind injection, use a sleep probe (`SLEEP(5)`) and time the response. For SSRF, use a tester-controlled callback host you own (NOT a public service like webhook.site for sensitive engagements — exfil paths).
- Promote level:
  - L1 Identified — pattern matched, no behavior change
  - L2 Partial — sink reached, but defense in place
  - L3 Confirmed — payload changed app behavior in an observable way
  - L4 Critical — data extracted, code executed, access escalated
- Bypass exhaustion before classifying as FP. For each candidate that blocks, try at least the bypass set in `references/bypass-techniques.md` for that class. Only after the set is exhausted may you write `verdict: false_positive`.
- Record evidence for every L3/L4:
  - Full request (method, URL, headers, body)
  - Response (status, headers, relevant body excerpt)
  - Reproducer command (curl one-liner)
  - Impact statement
Output: findings/exploitation-evidence.md
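Evidence capture can be kept uniform with a small wrapper. A sketch under stated assumptions: the URL has already passed the scope check, any destructive payload was approved, and the Markdown layout is illustrative. It records method, URL, body, the first 40 lines of the response (status line, headers, body start via `-i`) and a shell-quoted curl reproducer; request headers beyond curl's defaults are not captured. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: append request, response excerpt, and a curl reproducer for an
# L3/L4 finding to an evidence file. Layout is illustrative.
# Redact captured secrets in the output before quoting it anywhere in chat.

# usage: record_evidence OUTFILE FINDING_ID METHOD URL [BODY]
record_evidence() {
  local out=$1 id=$2 method=$3 url=$4 body=${5-}
  local -a args=(-sk -i --max-time 30 -X "$method")
  if [ -n "$body" ]; then args+=(--data-raw "$body"); fi
  {
    printf '## %s\n\n### Request\n\n%s %s\n' "$id" "$method" "$url"
    if [ -n "$body" ]; then printf 'Body: %s\n' "$body"; fi
    printf '\n### Response (first 40 lines)\n\n'
    curl "${args[@]}" "$url" 2>&1 | head -n 40
    printf '\n### Reproducer\n\ncurl'
    printf ' %q' "${args[@]}" "$url"      # shell-quoted so it can be pasted as-is
    printf '\n\n### Impact\n\nTODO\n\n'
  } >> "$out"
}
```

The reproducer reuses exactly the arguments that produced the captured response, so the report's reproduction steps cannot drift from what was actually sent.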
Redact in evidence files:
- Any captured credentials/tokens → last 6 chars only in chat; full value to `findings/secrets-vault.md` (gitignored).
- Other users' PII → redact.
- Your test credentials → fine to keep.
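The redaction rule is easy to apply consistently with a helper. A minimal sketch with hypothetical names: chat-facing output shows only the last 6 characters, the full value is appended to the vault file, and the vault's name is added to a `.gitignore` beside it. Secrets of 6 characters or fewer are masked completely, since "last 6" would otherwise reveal the whole value.

```bash
#!/usr/bin/env bash
# Sketch of the redaction rule: chat/log output shows only the last 6 chars;
# the full value goes to a gitignored vault file. Names are hypothetical.

# usage: redact SECRET -> "***" + last 6 chars (fully masked if <= 6 chars)
redact() {
  local s=$1
  if [ "${#s}" -le 6 ]; then
    printf '***\n'            # showing 6 chars would reveal the whole secret
  else
    printf '***%s\n' "${s: -6}"
  fi
}

# usage: vault_secret VAULT_FILE LABEL SECRET -> prints the redacted form only
vault_secret() {
  local vault=$1 label=$2 secret=$3 dir name
  dir=$(dirname "$vault"); name=$(basename "$vault")
  # Keep the vault out of git: list it in a .gitignore next to it.
  grep -qxF "$name" "$dir/.gitignore" 2>/dev/null || printf '%s\n' "$name" >> "$dir/.gitignore"
  ( umask 077; printf -- '- %s: %s\n' "$label" "$secret" >> "$vault" )
  redact "$secret"
}
```

Only the return value of `vault_secret` should ever appear in a message, which keeps full tokens out of the history that compression and title generation replay.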
Phase 5: Reporting
Generate the final report using templates/pentest-report.md. Sections:
- Executive summary
- Engagement scope (from `engagement/scope.txt`)
- Authorization (from `engagement/authorization.md`)
- Findings (L3/L4 only — proof-required). Per finding:
  - Title, severity (CVSS 3.1), CWE
  - Affected endpoint(s)
  - Proof (request + response excerpt)
  - Reproduction steps
  - Impact
  - Remediation
- Not-exploited candidates (L1/L2 with notes on what blocked them)
- Out-of-scope observations
- Methodology / tools used
- Limitations and what was NOT tested
Severity policy: CVSS only for L3/L4. L1/L2 are "candidates pending verification" — don't assign CVSS to unverified findings.
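The mechanical parts of the report can be assembled straight from the engagement directory. A sketch with a hypothetical function name: section headings follow the list above (the real template is `templates/pentest-report.md`), scope and authorization are copied in verbatim, and an empty findings file produces the bounded assessment wording rather than a "PASS". Sections that need human judgment are left as TODO.

```bash
#!/usr/bin/env bash
# Sketch: assemble a report skeleton from the engagement directory.
# TODO markers are left for the human-written sections.

# usage: build_report ENGAGEMENT_DIR OUTFILE
build_report() {
  local d=$1 out=$2 s
  local ev="$d/findings/exploitation-evidence.md"
  {
    printf '# Penetration Test Report\n\n## Executive Summary\n\nTODO\n'
    printf '\n## Engagement Scope\n\n'
    grep -Ev '^[[:space:]]*(#|$)' "$d/scope.txt" | sed 's/^/- /'
    printf '\n## Authorization\n\n'
    cat "$d/authorization.md"
    printf '\n## Findings (L3/L4 only)\n\n'
    if [ -s "$ev" ]; then
      cat "$ev"
    else
      # Assessment language, not a PASS: scope, window, and methods bound the claim.
      printf 'No exploitable issues FOUND in the scope above within the engagement window using the methods listed below.\n'
    fi
    for s in 'Not-Exploited Candidates (L1/L2)' 'Out-of-Scope Observations' \
             'Methodology / Tools Used' 'Limitations and What Was NOT Tested'; do
      printf '\n## %s\n\nTODO\n' "$s"
    done
  } > "$out"
}
```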
When to Stop
- The user revokes authorization.
- A candidate finding clearly impacts production data and you don't have approval for destructive testing — STOP and ask.
- The target starts returning 503/429 storms — back off, reconvene with the operator.
- You discover something outside the contracted scope (e.g. an exposed customer database while testing an unrelated endpoint). STOP, document, report to the operator. Do not pivot without explicit approval — that pivot is what makes pentesting illegal.
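The 503/429 stop condition can be checked from recent status codes rather than judged by eye. A sketch: the 20% threshold and the idea of keeping a rolling window of codes are assumptions, not values from this skill. Codes can be collected per request with `curl -s -o /dev/null -w '%{http_code}' "$url"`.

```bash
#!/usr/bin/env bash
# Sketch: decide whether to back off based on recent HTTP status codes.
# The 20% default threshold is an assumption, not a value from this skill.

# usage: printf '%s\n' "${recent_codes[@]}" | should_back_off [THRESHOLD_PERCENT]
# Exit 0 means "stop and reconvene with the operator".
should_back_off() {
  awk -v t="${1:-20}" '
    { n++; if ($1 == 429 || $1 == 503) bad++ }
    END { exit !(n > 0 && bad * 100 >= t * n) }'
}
```

A triggered check should pause all active requests to that host, not just slow them, since the stop rule calls for reconvening with the operator.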
What This Skill Does NOT Cover
- Network-layer pentesting beyond port scanning (no Metasploit, Cobalt Strike, AD attacks, network protocol fuzzing).
- Reverse engineering / binary analysis (see issue #383).
- Source-only static analysis (see issue #382).
- Active social engineering / phishing.
- Anything against systems the operator hasn't pre-authorized.
If the engagement needs any of these, escalate to a professional pentester. This skill complements professional pentesting; it does not replace it.
Further Reading
- `references/scope-enforcement.md` — how to bound every active request
- `references/vuln-taxonomy.md` — slot types, render contexts, OWASP map
- `references/exploitation-techniques.md` — per-class payload patterns
- `references/bypass-techniques.md` — common WAF/filter bypasses
- `templates/authorization.md` — engagement authorization template
- `templates/pentest-report.md` — final report template
- `templates/exploitation-queue.json` — per-class finding queue schema
- `scripts/recon-scan.sh` — rate-limited nmap+whatweb+headers wrapper