Skip to main content

Darwinian Evolver

Evolve prompts/regex/SQL/code with Imbue's evolution loop.

Skill metadata

SourceOptional — install with hermes skills install official/research/darwinian-evolver
Pathoptional-skills/research/darwinian-evolver
Version0.1.0
AuthorBihruze (Asahi0x), Hermes Agent
LicenseMIT
Platformslinux, macos
Tagsevolution, optimization, prompt-engineering, research
Related skillsarxiv, jupyter-live-kernel

Reference: full SKILL.md

info

The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.

Darwinian Evolver

Run Imbue's darwinian_evolver — an LLM-driven evolutionary search loop — to optimize a prompt, regex, SQL query, or small code snippet against a fitness function.

Status: thin wrapper around the upstream tool. The skill installs it, walks the agent through writing a Problem definition (organism + evaluator + mutator), and drives the loop via the upstream CLI or a small custom Python driver.

License: the upstream tool is AGPL-3.0. The skill ONLY ever invokes it via the upstream CLI or a subprocess/uv run call (mere aggregation). Do NOT import upstream classes into Hermes itself.

When to Use

  • User says "optimize this prompt", "evolve a regex for X", "auto-improve this code/SQL", "search for a better instruction".
  • You have a scorer (exact match, regex pass-rate, unit test, LLM-judge, runtime metric) AND a starting candidate (organism). If you don't have a scorer, stop and define one first — that's the hard part.
  • Cost is OK: a typical run is 50–500 LLM calls. On gpt-4o-mini that's pennies; on Claude Sonnet it can be a few dollars.

Do not use this when:

  • The optimization target is differentiable (use gradient descent / DSPy).
  • You only need to try 2–3 variants — just write them by hand.
  • The fitness signal is purely subjective with no measurable criterion.

Prerequisites

  • Python ≥3.11
  • git, uv (or pip)
  • One of: OPENROUTER_API_KEY, ANTHROPIC_API_KEY, or OPENAI_API_KEY

The skill ships a small parrot_openrouter.py driver that uses OPENROUTER_API_KEY via the OpenAI SDK, so any model on OpenRouter works. The upstream CLI itself hardcodes Anthropic and needs ANTHROPIC_API_KEY.

Install (One-Time)

Run via the terminal tool:

mkdir -p ~/.hermes/cache/darwinian-evolver && cd ~/.hermes/cache/darwinian-evolver
[ -d darwinian_evolver ] || git clone --depth 1 https://github.com/imbue-ai/darwinian_evolver.git
cd darwinian_evolver && uv sync

Verify:

cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver \
&& uv run darwinian_evolver --help | head -5

Quick Start — The Built-In Parrot Example

Tiny smoke test (requires ANTHROPIC_API_KEY):

cd ~/.hermes/cache/darwinian-evolver/darwinian_evolver
uv run darwinian_evolver parrot \
--num_iterations 2 \
--num_parents_per_iteration 2 \
--mutator_concurrency 2 --evaluator_concurrency 2 \
--output_dir /tmp/parrot_demo

Outputs:

  • /tmp/parrot_demo/snapshots/iteration_N.pkl — pickled population per iteration
  • /tmp/parrot_demo/<jsonl> — per-iteration JSON log (path printed at end)

Open ~/.hermes/cache/darwinian-evolver/darwinian_evolver/darwinian_evolver/lineage_visualizer.html in a browser and load the JSON log to see the evolutionary tree.

Quick Start — OpenRouter Driver (No Anthropic Key)

The skill ships scripts/parrot_openrouter.py — same parrot problem, but the LLM call goes through OpenRouter so any provider works.

# From wherever the skill is installed:
SKILL_DIR=~/.hermes/skills/research/darwinian-evolver
DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver

cd "$DE_DIR" && \
EVOLVER_MODEL='openai/gpt-4o-mini' \
uv run --with openai python "$SKILL_DIR/scripts/parrot_openrouter.py" \
--num_iterations 3 --num_parents_per_iteration 2 \
--output_dir /tmp/parrot_or

Inspect the result with scripts/show_snapshot.py:

uv run --with openai python "$SKILL_DIR/scripts/show_snapshot.py" \
/tmp/parrot_or/snapshots/iteration_3.pkl

Expected output: 7 evolved prompt templates ranked by score, with the best landing around 0.6–0.8 (the seed Say {{ phrase }} scored 0.000).

Defining a Custom Problem

The skill ships templates/custom_problem_template.py — copy, edit, run. Three things you must define:

  1. Organism — a Pydantic BaseModel subclass holding the artifact being evolved (prompt_template: str, regex_pattern: str, sql_query: str, code_block: str, etc.). Add a run(*args) method that exercises it.

  2. Evaluator.evaluate(organism) -> EvaluationResult(score=..., trainable_failure_cases=[...], holdout_failure_cases=[...], is_viable=True).

    • score is in [0, 1]. Higher is better.
    • trainable_failure_cases — what the mutator sees. Include enough context (input, expected, actual) for the LLM to diagnose.
    • holdout_failure_cases — kept out of the mutator's view. Use these to detect overfitting.
    • is_viable=True unless the organism is completely broken (raises, returns None, etc.). A 0-score viable organism is fine — it just gets down-weighted in parent selection.
  3. Mutator.mutate(organism, failure_cases, learning_log_entries) -> list[Organism]. Typically: build an LLM prompt that includes the current organism + a failure case + an ask to propose a fix; parse the LLM's response; return a new Organism. Return [] on parse failure — the loop handles it.

Then write a driver script that wires Problem(initial_organism, evaluator, [mutators]) into EvolveProblemLoop and iterates over loop.run(num_iterations=N) — the shipped scripts/parrot_openrouter.py is the reference.

Hyperparameters That Actually Matter

flagdefaultwhen to change
--num_iterations5bump to 10–20 once you trust the evaluator
--num_parents_per_iteration4drop to 2 for cheap exploration
--mutator_concurrency10drop to 2–4 to avoid rate limits
--evaluator_concurrency10same; evaluator hits the LLM too
--batch_size1raise to 3–5 once your mutator handles multiple failures
--verify_mutationsoffturn on once mutator is wasteful (>10× cost saving on later runs per Imbue)
--midpoint_scorep75leave alone unless scores cluster
--sharpness10leave alone

Pitfalls

  1. Initial organism must be viable — set is_viable=True in your EvaluationResult even on a 0-score seed. The loop refuses non-viable organisms because they imply the loop has nothing to evolve from.
  2. Provider content filters kill runs. Azure-backed OpenRouter models reject phrases like "ignore previous instructions" with HTTP 400. Wrap the LLM call in try/except and return f"<LLM_ERROR: {e}>" — the evolver will just score that organism 0 and move on.
  3. loop.run() is a generator — calling it doesn't run anything until you iterate. Use for snap in loop.run(num_iterations=N):.
  4. Snapshots are nested pickles. iteration_N.pkl contains a dict with population_snapshot (more pickled bytes). To unpickle you must have the Organism class importable under the same dotted path it was pickled at.
  5. Concurrency defaults are aggressive. 10/10 will hit rate limits on most providers. Start with 2/2.
  6. CLI is hardcoded to Anthropic. uv run darwinian_evolver <problem> reaches for ANTHROPIC_API_KEY and uses Claude Sonnet. To use any other provider, write a driver like parrot_openrouter.py.
  7. AGPL. Never from darwinian_evolver import ... inside Hermes core. Custom driver scripts under ~/.hermes/skills/... are user-side and fine.
  8. No PyPI package. pip install darwinian-evolver will pull the wrong thing. Always install from the GitHub repo.

Verification

After install + a parrot run, exit code 0 from this is sufficient:

DE_DIR=~/.hermes/cache/darwinian-evolver/darwinian_evolver
ls "$DE_DIR/darwinian_evolver/lineage_visualizer.html" >/dev/null && \
cd "$DE_DIR" && uv run darwinian_evolver --help >/dev/null && \
echo "darwinian-evolver: OK"

References