Unbroker
Autonomously remove your info from data-broker sites.
Skill metadata
| Source | Optional — install with hermes skills install official/security/unbroker |
| Path | optional-skills/security/unbroker |
| Version | 1.0.0 |
| Author | SHL0MS (github.com/SHL0MS) |
| License | MIT |
| Platforms | linux, macos, windows |
| Tags | privacy, data-broker, opt-out, ccpa, gdpr, security, doxxing |
| Related skills | google-workspace, agentmail, himalaya, scrapling, osint-investigation |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
unbroker
Find where a person's personal information (name, addresses, phone, email, relatives) is exposed on data brokers and people-search sites, then remove it - automatically where possible, with guided human steps only where a site demands a CAPTCHA, government ID, phone call, or fax. Manages multiple people independently. It does not defeat anti-bot systems, does not act on anyone without recorded consent, and does not remove public records (voter/property/court) or accounts the person controls.
The Python CLI (`scripts/pdd.py`) owns the deterministic state - config, dossiers + consent, the
broker database, tier planning, the ledger, drafts, reports, email sending (SMTP), verification-link
polling (IMAP), and the autonomous action queue (`next`). You (the agent) do the scanning and
form-driving with native tools: `web_extract` and `browser_navigate` for searching and web forms, and
`cronjob` for recurring re-scans.
Autonomy contract
This skill is designed to run hands-off. After intake (+ recorded consent) there are exactly TWO
legitimate human touchpoints: (1) the intake conversation itself, and (2) ONE consolidated human-task
digest at the end of the run (`$PDD tasks`). Between those:
- Never ask the operator to choose configuration. `$PDD setup --auto` detects capabilities and picks the most autonomous valid config itself.
- Never pause before individual submissions when `autonomy=full` (the default): the consent recorded at intake is standing authorization for T0-T2 opt-outs. (`autonomy=assisted` restores per-submission confirmation for cautious operators - honor `confirm_first` flags in `next` output.)
- Never interrupt the run for human-only work. Record it (`record ... human_task_queued --reason "..."`) and keep going; it all surfaces once in the final digest.
- Drive the whole run as a loop over `$PDD next <subject>` - it returns the exact ordered actions to take right now (scan, poll verification, re-check, opt out parents-first, requeue blocked), plus the human digest. Execute every action, record outcomes, re-run `next`, repeat until `done_for_now`. Then present the digest, report, and schedule the cron.
The hard limits that autonomy never overrides: no acting without recorded consent, no disclosure
beyond `disclosure_fields`, no CAPTCHA/anti-bot bypass, and `confirmed_removed` only after a
verifying re-scan.
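The hard limits above can be pictured as plain guard checks that run before any submission. The sketch below is illustrative only: the helper names (`allowed_payload`, `extra_fields_requested`, `may_confirm_removed`) and the dictionary shapes are hypothetical and are not taken from `scripts/pdd.py`, which enforces these gates itself.

```python
def allowed_payload(form_values, disclosure_fields):
    """Keep only the values the plan allows disclosing (hypothetical helper)."""
    allowed = set(disclosure_fields)
    return {key: value for key, value in form_values.items() if key in allowed}


def extra_fields_requested(requested_fields, disclosure_fields):
    """Return the fields a broker asks for beyond the planned disclosure.

    A non-empty result means: stop this case and queue a human task
    instead of disclosing more.
    """
    return sorted(set(requested_fields) - set(disclosure_fields))


def may_confirm_removed(has_consent, rescan_shows_listing_gone):
    """confirmed_removed requires consent on record AND a verifying re-scan."""
    return bool(has_consent) and bool(rescan_shows_listing_gone)
```

For example, a form that asks for a government ID when only name and email are planned would yield a non-empty list from `extra_fields_requested`, which maps to `human_task_queued` rather than a submission.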
When to Use
- "Remove my (or my family member's) data from data brokers / people-search sites."
- "Opt me out", "delete me from Spokeo/Whitepages/etc.", "clean up after a doxxing."
- "Set up recurring privacy monitoring" (brokers re-list people).
- Checking which brokers still expose someone and why.
Prerequisites
- `python3` (stdlib only; no extra packages needed for the core engine).
- Optional upgrades (the skill works zero-config without these; `setup --auto` turns on every one it detects, reading credentials from the shell env and from `$HERMES_HOME/.env` so keys Hermes already loads for its own tools are picked up without re-exporting - each one converts a class of human tasks into agent actions):
  - Cloud browser (recommended default): `BROWSERBASE_API_KEY`. `setup --auto` selects it whenever the key is present, and it is the intended baseline: a real residential-IP cloud browser clears soft/managed CAPTCHAs (Cloudflare Turnstile, hCaptcha/reCAPTCHA checkbox) as normal operation, so those brokers stay automated (T1) instead of becoming human tasks. This is not CAPTCHA "solving" - no solver service, no fingerprint spoofing; only interactive/behavioral ("hard") challenges the browser genuinely cannot pass fall back to a human task. Without the key, the plain agent browser is used and soft-CAPTCHA brokers drop to T2 (human).
  - Email automation, two options (one credential-free, one using stored credentials):
    - Browser mode (no password): `setup --email-mode browser`. The agent sends opt-out/CCPA emails and opens verification links through the operator's logged-in webmail using `browser_*` tools. Nothing is stored. This requires Hermes to be pointed at the operator's own logged-in browser, NOT a cloud browser: a headless cloud browser (Browserbase) holds no webmail session and is itself Cloudflare/DataDome-gated on webmail and on session-bound broker gates (e.g. PeopleConnect guided-mode). Drive the operator's real Chrome over CDP - launch `chrome --remote-debugging-port=9222 --user-data-dir="$HOME/.hermes/chrome-debug"` (a dedicated debug profile signed into the webmail once, not the Default profile) and connect the browser tools to `127.0.0.1:9222`. `$PDD cdp` launches this for you (finds Chrome/Chromium/Brave/Edge, starts it detached on the dedicated profile, prints the CDP endpoint; `--check` to test, `--print` for the command). See `references/methods.md` -> "Browser backends: scan vs execute". Falls back to drafts for an email if the inbox isn't reachable.
    - SMTP/IMAP (stored creds): `EMAIL_ADDRESS` + `EMAIL_PASSWORD` (+ `EMAIL_SMTP_HOST`/`EMAIL_IMAP_HOST` for non-mainstream providers; gmail/outlook/yahoo/icloud/fastmail inferred). The CLI sends via `send-email` and reads verify links via `poll-verification`. The `agentmail` skill (per-broker aliases) also counts.
  - Google Sheets tracker: the `google-workspace` skill.
  - The `scrapling` skill for stealth/Cloudflare-protected pages.
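To make the autodetection rules above concrete, here is a minimal sketch of how capabilities could be derived from the environment. It is not the logic in `scripts/pdd.py`: the function names, the returned keys and values (`"browserbase"`, `"agent"`, `"programmatic"`, `"draft"`), and the choice to let shell variables override `.env` entries are all assumptions made for illustration. Only the environment variable names and the `age` binary check come from the text.

```python
import shutil


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; comments and blank lines are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def detect_capabilities(shell_env, dotenv_text="", which=shutil.which):
    """Pick the most autonomous option each detected credential allows."""
    # Assumption: shell variables win over .env entries on conflict.
    env = {**parse_dotenv(dotenv_text), **shell_env}
    has_smtp = bool(env.get("EMAIL_ADDRESS")) and bool(env.get("EMAIL_PASSWORD"))
    return {
        "browser": "browserbase" if env.get("BROWSERBASE_API_KEY") else "agent",
        "email_mode": "programmatic" if has_smtp else "draft",
        "encryption": "age" if which("age") else "none",
        "autonomy": "full",
    }
```

Passing `which` as a parameter keeps the sketch testable on machines with or without `age` installed.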
How to Run
Run everything through the terminal tool. From this skill's directory:
PDD="python3 scripts/pdd.py"
The engine stores data under `$PDD_DATA_DIR` (default `$HERMES_HOME/unbroker`), written
`0600`. Run via `terminal`, not `execute_code` (that sandbox scrubs env and redacts output, which
breaks reading the dossier).
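For an operator scripting around the CLI from their own shell (not from the `execute_code` sandbox), a thin wrapper along the following lines may be convenient. The helper names `pdd_argv`, `run_pdd`, and `subject_id_from_intake` are hypothetical. The only behavior assumed about the CLI is that `intake` prints JSON containing a `subject_id` key, which matches the dry-run command in the Verification section.

```python
import json
import subprocess
import sys


def pdd_argv(*args, script="scripts/pdd.py"):
    """Build the argument vector for one CLI call (run from the skill directory)."""
    return [sys.executable, script, *args]


def run_pdd(*args, script="scripts/pdd.py"):
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        pdd_argv(*args, script=script),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def subject_id_from_intake(stdout):
    """Extract subject_id from the JSON that intake prints."""
    return json.loads(stdout)["subject_id"]
```

`sys.executable` is used here in place of a literal `python3` so the wrapper runs under the same interpreter that invoked it.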
Quick Reference
| Command | Purpose |
|---|---|
| `$PDD setup --auto` | Autonomous setup: detect capabilities, pick the most autonomous valid config (no questions) |
| `$PDD doctor` | Readiness check: config, broker count, and which upgrades are on/available |
| `$PDD cdp [--check] [--print] [--port N]` | Launch/detect the operator's Chrome over CDP for Phase-2 browser + webmail (dedicated debug profile; the reliable way to send webmail and clear session-bound gates) |
| `$PDD intake --full-name "..." [--alias ...] [--email ... --phone ...] [--city --state] [--prior-location "City,ST"] --consent` | Create a consenting subject; captures aliases + multiple emails/phones + prior locations; prints `subject_id` |
| `$PDD next <subject>` | The autonomous loop driver: ordered agent actions right now + human digest + `next_wake_at` |
| `$PDD brokers [--priority crucial]` | List the people-search broker database (curated + live) |
| `$PDD refresh-brokers` | Pull the latest BADBOOL people-search list and the CA Data Broker Registry (`next` requeues this automatically when the cache is stale) |
| `$PDD registry [--search NAME]` | State registry coverage (CA ~545 ingested; VT/OR/TX portals surfaced); the DROP/email lane, not scanned |
| `$PDD drop <subject> [--filed]` | The one-shot legal lever: one CA DROP request deletes from ALL registered brokers; `--filed` records it |
| `$PDD plan <subject> [--priority crucial]` | Per-broker tier + method + `search_vectors` + the exact fields to disclose |
| `$PDD plan <subject> --batch` | Reduce view: overlays ledger state, groups brokers by next action (unscanned/found/indirect/blocked/in_progress/done), collapses ownership clusters, orders found cluster-parents-first + emits a tailored `parent_playbook`, prints `next_actions` |
| `$PDD fanout <subject> [--priority crucial] [--size 5]` | Batch brokers into parallel `delegate_task` subagents (auto for large runs; batches of 5 - 8+ time out) |
| `$PDD record <subject> <broker> <state> [--found true] [--evidence JSON] [--disclosed F --channel C] [--reason "..."]` | Update the ledger (validated state machine); auto-stamps `next_recheck_at` |
| `$PDD show <subject> <broker>` | Read back a case's recorded state + evidence + disclosure log (so the parent re-verifies a subagent's `found` without re-deriving the listing URL) |
| `$PDD send-email <subject> <broker> --listing <url> [--kind ccpa_indirect ...]` | Render + record the request (recipient locked to the broker's own address). browser mode returns a `compose` payload to send via webmail (no password); programmatic mode SMTP-sends |
| `$PDD verify-link <subject> <broker> --text '<body>'` | browser mode: extract a broker's verification link from webmail text you read (anti-phishing scored) |
| `$PDD poll-verification <subject> [--broker <id>]` | programmatic mode: poll IMAP for verification links (anti-phishing scored); auto-advances `submitted` → `verification_pending` |
| `$PDD render-email <subject> <broker> --listing <url>` | Draft only (fallback when no email mode is configured) |
| `$PDD due <subject>` | Cases whose recheck window arrived (the cron re-scan queue) |
| `$PDD tasks <subject>` | ONE consolidated human-task digest (present at END of run) |
| `$PDD status <subject>` | Markdown status report |
| `$PDD report <subject> --sheets` | Rows for the Google Sheets tracker |
Batch operation (two-phase: crawl-all, then delete)
For anything past a couple of brokers, run this as map → reduce → act, not broker-by-broker:
- Phase 1 - DISCOVER (read-only, parallel, idempotent). Crawl every broker first and record a verdict for each (`found`/`not_found`/`indirect_exposure`/`blocked`). Scanning has no side effects, so it is safe to parallelize and retry. Getting the full exposure map before acting is what unlocks cluster dedup and prioritization below. Default: the parent drives `web_extract` probes directly - most people-search sites render name/phone/address results as static HTML that `web_extract` reads in seconds. Escalate to `browser_*` only for the few JS-only sites, and to `delegate_task` subagents only for genuinely reasoning-heavy work (large-scale namesake/relative disambiguation). Do NOT hand a browser-toolset subagent a big list of brokers to crawl - in the field this timed out repeatedly (600s, ~5-6 brokers each, no summary) because browser navigation is heavy; the ledger writes that survived came at 10x the cost of parent `web_extract`. A `blocked` (DataDome/Cloudflare/antibot) site is not a subagent job either: record `blocked` and requeue it for a stealth/cloud browser (Browserbase) pass. Subagent reports are self-reports - the parent re-fetches key URLs to confirm a `found` before trusting it (this cuts both ways: it caught a real listing the parent had wrongly assumed was a false positive).
- REDUCE - `$PDD plan <subject> --batch`. Collapses the crawl into a phase-oriented plan: groups by next action, collapses ownership clusters (a parent removal that clears children is ONE action, not N - e.g. one Intelius/PeopleConnect suppression covers Truthfinder/Instant Checkmate/US Search/…), and prints `next_actions`. `phase` is `discover` while anything is unscanned, else `delete`.
- Phase 2 - DELETE (sequential, irreversible). Work the reduced groups parents first: `plan --batch` orders the `found` group cluster-parents-first (most children first) and emits a `parent_playbook` with tailored, ordered steps per parent - follow that order and those steps (full recipes in `references/methods.md` → "Ownership clusters - DO PARENTS FIRST"). Do the cluster parents (skipping the covered children), re-scan each parent's children after it confirms (they usually drop out), then the standalone listings; send the `indirect_exposure` cases as CCPA/GDPR delete-my-PII emails (`send-email --kind ccpa_indirect`), and defer `blocked` to the stealth-browser pass. Opt-outs hit CAPTCHAs, email-verification loops, and session binding - work them one at a time, carefully (this is the opposite of fan-out), but do NOT stop to ask permission per submission in `autonomy=full`; in `assisted`, confirm each one. Usually prefer deletion over suppression where a broker offers both (Spokeo/BeenVerified) - but follow the record's `deletion.prefer`: PeopleConnect is the exception (`prefer: false`), where deleting your user data removes your suppressions and does not stop public-records re-listing, so you suppress-and-maintain instead.
- Blind opt-out is the DEFAULT, not a fallback. Submit an opt-out/deletion on every site with an accessible removal channel, even when a listing was not first confirmed - it discloses only the subject's own identifiers to the broker's own official channel, so it does not violate least-disclosure. Two corollaries: (1) a guided flow that matches email+DOB+name and says "no results" is a stronger `not_found` than any scrape - the opt-out flow doubles as the search; (2) when a form is automation-hostile (hard CAPTCHA, Cloudflare/DataDome, slide-to-verify slider), default to the broker's cited rights-request email (name+state+contact-email only) rather than recording `blocked`. CAPTCHA policy: never defeat behavioral/token/slider challenges; OK to read a static distorted-text or plain-arithmetic CAPTCHA on the subject's own opt-out, but stop if the site rejects the whole submission after a correct answer (it is fingerprinting the automation). Third-party/indirect records are the exception - still confirm those before acting. Per-site game plans + the meta-search no-op skip-list are in `references/site-playbooks.md`; the full policy is in `references/methods.md`.
- PeopleConnect delete-wipes-suppression (permanent rule). A PeopleConnect deletion wipes the suppression and the subject re-lists across the whole affiliate cluster. If a "Your deletion request for PeopleConnect.us is Complete" email ever appears, the suppression is gone -> re-run suppression and re-verify the Control step reads "suppressed". Never leave this cluster on a completed deletion (see `references/brokers/intelius.json`).

Subagent reports are self-reports: the parent re-verifies key claims (listing URLs, match basis) before
recording `found` and before any deletion.
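The parents-first ordering used in Phase 2 can be sketched as a small sorting step. This is an illustration of the idea only, not the implementation behind `plan --batch`: the function name, the `clusters` mapping shape, and the broker ids in the usage note are hypothetical, and a parent that is itself a child of another cluster is not handled.

```python
def order_parents_first(found, clusters):
    """Order found brokers for deletion.

    found:    broker ids with a confirmed listing, in scan order
    clusters: parent broker id -> list of child broker ids it covers

    Cluster parents come first (most children first), children covered by a
    found parent are skipped, and standalone listings follow.
    """
    parents = [broker for broker in found if broker in clusters]
    # sorted() is stable, so parents with equal child counts keep scan order.
    parents = sorted(parents, key=lambda p: len(clusters[p]), reverse=True)
    covered = {child for parent in parents for child in clusters[parent]}
    standalone = [
        broker for broker in found
        if broker not in clusters and broker not in covered
    ]
    return parents + standalone
```

With `found = ["spokeo", "truthfinder", "intelius", "parent_b"]` and clusters mapping `intelius` to three children (including `truthfinder`) and `parent_b` to one, the sketch yields `intelius`, then `parent_b`, then `spokeo`; `truthfinder` is left for the post-removal re-scan of the parent's children.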
Procedure (the autonomous loop)
1. Setup (once, no questions). Run `$PDD setup --auto` - it detects capabilities and configures the most autonomous valid combination itself (programmatic email when `EMAIL_*` creds exist, Browserbase when its key exists, `age` encryption when the binary exists, `autonomy=full`). Then `$PDD doctor` and show the operator the readiness output for information, not as a question - proceed immediately. Mention what would unlock more automation (e.g. email creds) but do not wait.
2. Intake + consent (the ONE human conversation). `$PDD intake ...` with `--consent` (and `--consent-method`). Without consent the engine refuses to plan or act. Collect everything in one pass - names/aliases, current + prior cities, emails, phones - so you never have to come back with questions. For California subjects, also read `references/legal/drop.md`: `next` will surface a `drop_submit` one-shot that deletes from every registered broker (~545) at once, which is the single highest-leverage action. File it, then `drop <subject> --filed`. For non-CA subjects the registry is covered by targeted CCPA/GDPR emails (`registry --search`, then `send-email`); the people-search sites are worked directly in either case.
3. Drain the queue. Loop:

       while true:
         q = $PDD next <subject>
         if q.actions is empty: break
         execute EVERY action in order; record each outcome via $PDD record

   `next` emits, in order: `refresh_brokers` (stale cache), `fanout_scan`/`scan_inline` (Phase 1 crawl - see step 4), `poll_verification` (in-flight email confirmations), `verify_removal` (due re-checks), `optout_web_form`/`optout_email_send` (Phase 2, parents-first with playbook steps), `indirect_email_send`, and `stealth_rescan`. Human-only work never appears as an action - it accumulates in `q.human_digest`. In `autonomy=full`, execute actions without pausing; honor `confirm_first` in `assisted` mode.
4. Scanning (when `next` says so). For `fanout_scan`: run `$PDD fanout <subject>` and spawn one `delegate_task` subagent per `batch`, in parallel, passing that batch's ready-made `brief` - do not scan all brokers yourself sequentially. For `scan_inline`: scan the few brokers yourself. Either way, each broker gets every `search_vectors` entry via the `references/methods.md` ladder (`web_extract` → `site:` probe → `browser_navigate` → `scrapling`), a 404 is INCONCLUSIVE (not `not_found`), `blocked` is recorded when `antibot` is set and no stealth browser is available, and subject vs namesake/relative is confirmed before recording: `$PDD record <subject> <broker> <found|not_found|indirect_exposure|blocked> --found <bool> --evidence '{"listing_urls":[...]}'`. The parent re-verifies key `found` claims from subagents before trusting them.
5. Opt-outs (when `next` says so). Actions come pre-ordered parents-first with `steps` from each broker record's own `optout.playbook` (field-verified; cluster parents like PeopleConnect, Whitepages, BeenVerified, Spokeo have exact, live-checked recipes). Deletion usually beats suppression: when an action carries `prefer_deletion`, complete the record's DELETION lane, not just the hide-my-listing flow. When it carries `prefer_suppression` instead (PeopleConnect - deleting removes your suppressions and does not stop re-listing), do the suppression flow and keep it maintained; use their Delete button only for a deliberate data-purge. Per method:
   - web_form → drive `optout_url` with `browser_navigate`/`browser_type`/`browser_click`, submit only `disclosure_fields`, screenshot the confirmation, then the action's `after` record command. Playbooks may end with a right-to-delete `send-email` follow-up - do it (full erasure, not just listing suppression).
   - email → `$PDD send-email <subject> <broker> --kind <ccpa|gdpr|generic> --to <addr> --listing <url>` records + discloses in one step (recipient locked to addresses the broker record declares; `next` picks the kind from residency - never claim CCPA/GDPR for someone who can't). In browser mode it returns a recipient-locked `compose` payload: compose a new message to `compose.to` with `compose.subject`/`compose.body` exactly in the operator's webmail via `browser_*` and send (no password); in programmatic mode it SMTP-sends. `next` also routes human-gated forms (phone-callback/gov-ID) through a broker's deletion email when one exists - the rescue lane (verified Whitepages pattern). Draft-only falls back to `render-email` + a digest entry.
   - captcha → soft/managed challenges clear automatically on the default cloud browser (proceed as normal); only a hard interactive/behavioral challenge it can't pass is recorded `blocked` (requeued for the stealth/operator-browser pass). Never a solver service.
   - phone_callback / account / gov_id / fax / mail / voice (T3) without a deletion email → never an agent action; `next` already routed these to the digest. Record them: `$PDD record <subject> <broker> human_task_queued --reason "..."`.
6. Verification (when `next` says so). In programmatic mode `$PDD poll-verification <subject>` finds arrived confirmation links via IMAP (anti-phishing scored, auto-advances state). In browser mode, open the broker's confirmation email in the operator's webmail and run `$PDD verify-link <subject> <broker> --text '<body>'` to score the link. Either way open the link in the same browser (several brokers bind the verification session to the browser that opens it), finish the flow, then record `awaiting_processing`. `confirmed_removed` ONLY after a verifying re-scan shows the listing gone - never off the submission flow's own confirmation page.
7. Wrap up (once per run). When `next` returns no actions: present `$PDD tasks <subject>` (the consolidated human digest) if non-empty, then `$PDD status <subject>`; if the Sheets tracker is on, append `$PDD report <subject> --sheets` rows via the `google-workspace` skill.
8. Schedule the next wake-up. `next` returns `next_wake_at` (earliest due re-check). Create ONE `cronjob` that re-runs this skill's loop for the subject (a prompt like: "run the unbroker loop for <subject_id>: `$PDD next` and execute all actions"). Processing windows, verification polls, and reappearance sweeps all flow through the same queue, so the case keeps advancing with zero human attention.
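The drain-the-queue loop from the procedure can be sketched in Python with the three side-effecting pieces injected as callables. This is a sketch under assumptions, not the agent's actual control flow: `drain_queue`, its parameters, and the `max_rounds` safety cap are hypothetical, and the queue is assumed to be a dictionary with `actions`, `human_digest`, and `next_wake_at` keys as named in the text.

```python
def drain_queue(get_queue, execute, record, max_rounds=100):
    """Repeat: fetch the queue, run every action in order, record each outcome.

    get_queue: callable returning the parsed output of `$PDD next <subject>`
    execute:   callable(action) -> outcome (scan, opt out, poll, ...)
    record:    callable(action, outcome) that writes the ledger entry
    """
    for rounds in range(max_rounds):
        queue = get_queue()
        actions = queue.get("actions", [])
        if not actions:
            # Nothing left for the agent: hand back the human digest and wake time.
            return {
                "rounds": rounds,
                "human_digest": queue.get("human_digest", []),
                "next_wake_at": queue.get("next_wake_at"),
            }
        for action in actions:
            record(action, execute(action))
    raise RuntimeError("queue did not drain within max_rounds")
```

The returned `human_digest` corresponds to the one consolidated digest presented at wrap-up, and `next_wake_at` is what the single scheduled `cronjob` would be keyed on.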
Pitfalls
- Never disclose more than the broker already shows. Submit only `disclosure_fields`. The engine never volunteers SSN/ID numbers; you must not either.
- No consent, no action. The engine enforces this; do not work around it to "research" a third party.
- `send-email` is idempotent + rate-limited. It refuses to re-send a case already `submitted` or beyond (use `--force` only if a genuine re-send is needed), and SMTP sends are paced by `email_min_interval_seconds` (default 20s) with retry/backoff. Do not loop it to "make sure" - a successful SMTP handoff is not proof of delivery; the due-queue re-scan is the real confirmation.
- Ledger writes are locked. Concurrent runs (cron + manual) serialize safely; if you ever see a lock timeout, another run is mid-write - let it finish, don't delete the `.lock` by hand.
- Autonomy ≠ improvisation. Full autonomy means not asking between steps; it does not loosen any gate. If a broker demands MORE than the planned `disclosure_fields` mid-flow, stop that case and queue it (`human_task_queued --reason`) rather than deciding alone to disclose extra PII.
- Don't interrupt the run with questions. Config choices are `setup --auto`'s job; human-only work goes to the digest. The only mid-run question that's ever warranted is a missing-identity fact that blocks scanning (e.g. no city at all) - and that should have been collected at intake.
- Use `terminal`, not `execute_code` for `pdd.py` (secret scrubbing + output redaction break it).
- Dossiers are plaintext by default (JSON, `0600` under `HERMES_HOME`). For at-rest encryption run `$PDD setup --encryption age` - it generates a local `age` key and encrypts dossiers + ledgers (the audit log holds field names only and stays plaintext). It guards casual/backup/commit exposure, not a full-`HERMES_HOME` read; set `PDD_AGE_IDENTITY` to a separate volume for real key separation. `$PDD doctor` shows whether encryption is actually engaged (not just whether `age` is installed).
- "Hidden from free search" ≠ deleted. Only mark `confirmed_removed` after verifying the record is actually gone; note paid-tier retention in the report.
- Soft CAPTCHAs clear by default; don't fight the hard ones. The default cloud browser passes managed/soft challenges as normal operation (those brokers stay T1). For a hard interactive one it genuinely can't pass, record `blocked` and let the stealth/operator-browser pass take it - never a third-party solver service or fingerprint spoofing.
- Broker pages change. If a flow breaks, `$PDD record ... blocked` and flag the broker file in `references/brokers/` for re-verification instead of guessing.
- Verify non-field-verified records before submitting. `confidence: auto` records came from parsing BADBOOL (read `optout.notes`/`optout.links`, confirm the real opt-out URL). `confidence: documented` records (several people-search sites) carry the correct published opt-out URL but have not been field-verified (they 403 datacenter IPs), so confirm the live flow via the operator's residential browser on first use, then set `last_verified`. Field-verified curated records (no `confidence`, e.g. the cluster parents) have checked mechanics and take precedence.
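The idempotency and pacing behavior described for `send-email` can be illustrated with two small checks. This is a hedged sketch, not the CLI's code: the function names are hypothetical, and the set of states treated as "`submitted` or beyond" is an assumption inferred from the states the text mentions (`submitted`, `verification_pending`, `awaiting_processing`, `confirmed_removed`).

```python
# Assumption: these are the ledger states at or past submission.
SUBMITTED_OR_BEYOND = {
    "submitted",
    "verification_pending",
    "awaiting_processing",
    "confirmed_removed",
}


def should_send(case_state, force=False):
    """Refuse a re-send for cases already submitted, unless forced."""
    return force or case_state not in SUBMITTED_OR_BEYOND


def seconds_until_next_send(last_sent_at, now, min_interval=20.0):
    """How long to wait so sends stay at least min_interval seconds apart.

    min_interval mirrors email_min_interval_seconds (default 20s in the text).
    """
    return max(0.0, min_interval - (now - last_sent_at))
```

A caller would check `should_send` first and then sleep for the returned number of seconds; neither check says anything about delivery, which only the later re-scan confirms.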
Verification
- `scripts/run_tests.sh tests/skills/test_unbroker_skill.py` (hermetic; no network), or the dependency-free runner `python3 tests/skills/test_unbroker_skill.py`.
- Dry run: `$PDD setup --auto && $PDD doctor && SID=$($PDD intake --full-name "Test Person" --email t@example.com --consent | python3 -c 'import sys,json;print(json.load(sys.stdin)["subject_id"])') && $PDD next "$SID"` and confirm a readiness summary plus an ordered action queue.