Building a Context Engine Plugin

Context engine plugins replace the built-in ContextCompressor with an alternative strategy for managing conversation context. One example is a Lossless Context Management (LCM) engine, which builds a knowledge DAG instead of relying on lossy summarization.

How it works

The agent's context management is built on the ContextEngine ABC (agent/context_engine.py). The built-in ContextCompressor is the default implementation. Plugin engines must implement the same interface.

Only one context engine can be active at a time. Selection is config-driven:

# config.yaml
context:
  engine: "compressor"   # default built-in
  # engine: "lcm"        # use this instead to activate a plugin engine named "lcm"

Plugin engines are never auto-activated — the user must explicitly set context.engine to the plugin's name.
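
The selection rule above can be sketched as a small lookup. This is only an illustration of the "missing key means built-in" behaviour; `resolve_engine_name` and `DEFAULT_ENGINE` are hypothetical names, and the agent's actual loader is not shown on this page.

```python
DEFAULT_ENGINE = "compressor"  # the built-in engine's name, as in config.yaml above


def resolve_engine_name(config: dict) -> str:
    """Return the configured engine name, falling back to the built-in.

    `config` is assumed to be the already-parsed contents of config.yaml.
    """
    context_cfg = config.get("context") or {}
    name = context_cfg.get("engine") or DEFAULT_ENGINE
    return str(name).strip()
```

With an empty config this returns "compressor", so a plugin engine is only chosen when the user names it.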

Directory structure

Each context engine lives in plugins/context_engine/<name>/:

plugins/context_engine/lcm/
├── __init__.py # exports the ContextEngine subclass
├── plugin.yaml # metadata (name, description, version)
└── ... # any other modules your engine needs

The ContextEngine ABC

Your engine must implement these required methods:

from agent.context_engine import ContextEngine

class LCMEngine(ContextEngine):

    @property
    def name(self) -> str:
        """Short identifier, e.g. 'lcm'. Must match config.yaml value."""
        return "lcm"

    def update_from_response(self, usage: dict) -> None:
        """Called after every LLM call with the usage dict.

        Update self.last_prompt_tokens, self.last_completion_tokens,
        self.last_total_tokens from the response.
        """

    def should_compress(self, prompt_tokens: int = None) -> bool:
        """Return True if compaction should fire this turn."""

    def compress(self, messages: list, current_tokens: int = None,
                 focus_topic: str = None) -> list:
        """Compact the message list and return a new (possibly shorter) list.

        The returned list must be a valid OpenAI-format message sequence.

        ``focus_topic`` is an optional topic string from manual
        ``/compress <focus>``; engines that support guided compression should
        prioritise preserving information related to it, others may ignore it.
        """

Class attributes your engine must maintain

The agent reads these directly for display and logging:

last_prompt_tokens: int = 0
last_completion_tokens: int = 0
last_total_tokens: int = 0
threshold_tokens: int = 0 # when compression triggers
context_length: int = 0 # model's full context window
compression_count: int = 0 # how many times compress() has run
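
A minimal sketch of an engine that implements the required methods and keeps the attributes above current. `TailWindowEngine`, its `keep_last` and `threshold_ratio` parameters, and the OpenAI-style usage keys (`prompt_tokens`, `completion_tokens`, `total_tokens`) are assumptions made for illustration. The fallback `ContextEngine` class exists only so the snippet runs outside the agent codebase.

```python
from typing import Optional

try:
    from agent.context_engine import ContextEngine
except ImportError:
    # Stand-in base class (assumption) so the sketch runs without the agent installed.
    class ContextEngine:
        pass


class TailWindowEngine(ContextEngine):
    """Hypothetical engine: keeps system messages plus the most recent messages."""

    def __init__(self, context_length: int = 200000,
                 threshold_ratio: float = 0.8, keep_last: int = 20):
        self.context_length = context_length
        self.threshold_tokens = int(context_length * threshold_ratio)
        self.keep_last = max(1, keep_last)
        self.last_prompt_tokens = 0
        self.last_completion_tokens = 0
        self.last_total_tokens = 0
        self.compression_count = 0

    @property
    def name(self) -> str:
        return "tail-window"

    def update_from_response(self, usage: dict) -> None:
        # Assumes OpenAI-style usage keys; missing or null values count as 0.
        self.last_prompt_tokens = int(usage.get("prompt_tokens") or 0)
        self.last_completion_tokens = int(usage.get("completion_tokens") or 0)
        self.last_total_tokens = int(usage.get("total_tokens") or 0)

    def should_compress(self, prompt_tokens: Optional[int] = None) -> bool:
        tokens = self.last_prompt_tokens if prompt_tokens is None else prompt_tokens
        return tokens >= self.threshold_tokens

    def compress(self, messages: list, current_tokens: Optional[int] = None,
                 focus_topic: Optional[str] = None) -> list:
        system = [m for m in messages if m.get("role") == "system"]
        rest = [m for m in messages if m.get("role") != "system"]
        tail = rest[-self.keep_last:]
        # A leading "tool" message would be orphaned from its assistant tool call.
        while tail and tail[0].get("role") == "tool":
            tail = tail[1:]
        self.compression_count += 1
        return system + tail
```

Plain truncation like this is lossy and ignores `focus_topic`. It is shown only because it is the smallest strategy that exercises every required member.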

Optional methods

These have sensible defaults in the ABC. Override as needed:

| Method | Default | Override when |
| --- | --- | --- |
| on_session_start(session_id, **kwargs) | No-op | You need to load persisted state (DAG, DB) |
| on_session_end(session_id, messages) | No-op | You need to flush state, close connections |
| on_session_reset() | Resets token counters | You have per-session state to clear |
| update_model(model, context_length, ...) | Updates context_length + threshold | You need to recalculate budgets on model switch |
| get_tool_schemas() | Returns [] | Your engine provides agent-callable tools (e.g., lcm_grep) |
| handle_tool_call(name, args, **kwargs) | Returns error JSON | You implement tool handlers |
| should_compress_preflight(messages) | Returns False | You can do a cheap pre-API-call estimate |
| get_status() | Standard token/threshold dict | You have custom metrics to expose |
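
As a sketch of how a few of these overrides might look, the mixin below persists per-session state as JSON and adds a character-based preflight estimate. Everything here is hypothetical: `SessionStateMixin`, the `state_dir` location, the `state` attribute, and the four-characters-per-token heuristic are assumptions, not part of the ABC. It is written as a mixin so it can be combined with your own ContextEngine subclass, and it assumes `session_id` is safe to use as a file name.

```python
import json
import tempfile
from pathlib import Path


class SessionStateMixin:
    """Hypothetical mixin: combine with your ContextEngine subclass."""

    state_dir = Path(tempfile.gettempdir()) / "context_engine_state"  # assumed location
    chars_per_token = 4  # rough heuristic, not a tokenizer

    def _state_path(self, session_id):
        return Path(self.state_dir) / f"{session_id}.json"

    def on_session_start(self, session_id, **kwargs):
        # Load persisted state if a previous session with this id saved any.
        path = self._state_path(session_id)
        self.state = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}

    def on_session_end(self, session_id, messages):
        # Flush state to disk at the session boundary.
        path = self._state_path(session_id)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(getattr(self, "state", {})), encoding="utf-8")

    def on_session_reset(self):
        # Keep the base class's default behaviour (token counter reset) if present.
        parent_reset = getattr(super(), "on_session_reset", None)
        if callable(parent_reset):
            parent_reset()
        self.state = {}

    def should_compress_preflight(self, messages):
        # Cheap estimate before the API call, compared against threshold_tokens.
        chars = sum(len(str(m.get("content") or "")) for m in messages)
        threshold = getattr(self, "threshold_tokens", 0)
        return threshold > 0 and chars // self.chars_per_token >= threshold
```

A usage pattern would be `class LCMEngine(SessionStateMixin, ContextEngine): ...`, with the required methods defined on `LCMEngine` itself.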

Engine tools

Context engines can expose tools the agent calls directly. Return schemas from get_tool_schemas() and handle calls in handle_tool_call():

import json

def get_tool_schemas(self):
    return [{
        "name": "lcm_grep",
        "description": "Search the context knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"],
        },
    }]

def handle_tool_call(self, name, args, **kwargs):
    if name == "lcm_grep":
        results = self._search_dag(args["query"])
        return json.dumps({"results": results})
    return json.dumps({"error": f"Unknown tool: {name}"})

Engine tools are injected into the agent's tool list at startup and dispatched automatically — no registry registration needed.
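
Because tool arguments come from the model, a handler may want to validate them before use. The sketch below shows one way to do that and how the JSON result round-trips. `GrepToolDemo` is a hypothetical stand-in for an engine, and its `_search_dag` is a toy substring search over a list rather than a real knowledge graph.

```python
import json


class GrepToolDemo:
    """Hypothetical stand-in for an engine; only the tool plumbing is shown."""

    def __init__(self, nodes):
        self._nodes = list(nodes)  # toy replacement for a knowledge DAG

    def _search_dag(self, query):
        # Case-insensitive substring match over the stored node texts.
        needle = query.lower()
        return [n for n in self._nodes if needle in n.lower()]

    def handle_tool_call(self, name, args, **kwargs):
        if name != "lcm_grep":
            return json.dumps({"error": f"Unknown tool: {name}"})
        query = (args or {}).get("query")
        if not isinstance(query, str) or not query.strip():
            # Report bad arguments as error JSON instead of raising KeyError.
            return json.dumps({"error": "lcm_grep requires a non-empty 'query' string"})
        return json.dumps({"results": self._search_dag(query)})


demo = GrepToolDemo(["Deploy notes for staging", "Billing bug triage"])
reply = json.loads(demo.handle_tool_call("lcm_grep", {"query": "billing"}))
```

Returning error JSON for bad input mirrors the ABC default for unknown tools, so the agent always receives a parseable string.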

Registration

Place your engine in plugins/context_engine/<name>/. The __init__.py must export a ContextEngine subclass. The discovery system finds and instantiates it automatically.

Via general plugin system

A general plugin can also register a context engine:

def register(ctx):
    engine = LCMEngine(context_length=200000)
    ctx.register_context_engine(engine)

Only one engine can be registered. A second plugin attempting to register is rejected with a warning.
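
To unit-test a plugin's register() function without the full plugin system, a small test double can mimic the single-engine rule. `FakePluginContext` and `DummyEngine` are hypothetical, and using the `warnings` module is an assumption; the real ctx object may report the rejection differently (for example through logging).

```python
import warnings


class DummyEngine:
    """Placeholder for a real engine instance such as LCMEngine(...)."""

    def __init__(self, name):
        self.name = name


class FakePluginContext:
    """Test double (assumption): mimics only register_context_engine()."""

    def __init__(self):
        self.context_engine = None

    def register_context_engine(self, engine):
        if self.context_engine is not None:
            # Second registration is rejected; the first engine stays active.
            warnings.warn(
                f"context engine {self.context_engine.name!r} already registered; "
                f"ignoring {engine.name!r}",
                stacklevel=2,
            )
            return
        self.context_engine = engine


def register(ctx):
    ctx.register_context_engine(DummyEngine("lcm"))
```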

Lifecycle

1. Engine instantiated (plugin load or directory discovery)
2. on_session_start() — conversation begins
3. update_from_response() — after each API call
4. should_compress() — checked each turn
5. compress() — called when should_compress() returns True
6. on_session_end() — session boundary (CLI exit, /reset, gateway expiry)

on_session_reset() is called on /new or /reset to clear per-session state without a full shutdown.
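
The call order above can be traced with a recording engine and a simplified driver. Both `RecordingEngine` and `run_session` are hypothetical; the real agent loop does more work between these calls, and the exact interleaving within a turn is an assumption here.

```python
class RecordingEngine:
    """Hypothetical engine that logs each lifecycle hook as it is called."""

    def __init__(self, threshold_tokens):
        self.threshold_tokens = threshold_tokens
        self.last_prompt_tokens = 0
        self.calls = []

    def on_session_start(self, session_id, **kwargs):
        self.calls.append("on_session_start")

    def update_from_response(self, usage):
        self.calls.append("update_from_response")
        self.last_prompt_tokens = usage.get("prompt_tokens", 0)

    def should_compress(self, prompt_tokens=None):
        self.calls.append("should_compress")
        tokens = self.last_prompt_tokens if prompt_tokens is None else prompt_tokens
        return tokens >= self.threshold_tokens

    def compress(self, messages, current_tokens=None, focus_topic=None):
        self.calls.append("compress")
        return messages[-2:]  # toy strategy: keep only the latest exchange

    def on_session_end(self, session_id, messages):
        self.calls.append("on_session_end")


def run_session(engine, usages, session_id="demo"):
    """Simplified driver (assumption): one fake API call per usage dict."""
    messages = []
    engine.on_session_start(session_id)
    for i, usage in enumerate(usages):
        messages.append({"role": "user", "content": f"turn {i}"})
        messages.append({"role": "assistant", "content": f"reply {i}"})
        engine.update_from_response(usage)
        if engine.should_compress():
            messages = engine.compress(messages)
    engine.on_session_end(session_id, messages)
    return messages


engine = RecordingEngine(threshold_tokens=100)
final_messages = run_session(engine, [{"prompt_tokens": 10}, {"prompt_tokens": 150}])
```

After this run, `engine.calls` shows that compress() fires only on the second turn, once the reported prompt tokens cross the threshold.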

Configuration

Users select your engine via hermes plugins → Provider Plugins → Context Engine, or by editing config.yaml:

context:
  engine: "lcm"   # must match your engine's name property

The compression config block (compression.threshold, compression.protect_last_n, etc.) is specific to the built-in ContextCompressor. Your engine should define its own config format if needed, reading from config.yaml during initialization.
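
One possible shape for engine-specific settings is a dataclass populated from the parsed config. This is a sketch under stated assumptions: the `context.lcm` key, the `LCMSettings` class, and every field in it are hypothetical, and nothing here is defined by the agent.

```python
from dataclasses import dataclass, fields


@dataclass
class LCMSettings:
    """Hypothetical settings; none of these keys are defined by the agent."""

    db_path: str = "lcm.sqlite"
    max_nodes: int = 10000
    threshold_ratio: float = 0.8

    @classmethod
    def from_config(cls, config: dict) -> "LCMSettings":
        # Assumes the engine keeps its options under context.lcm in config.yaml.
        section = (config.get("context") or {}).get("lcm") or {}
        known = {f.name for f in fields(cls)}
        unknown = set(section) - known
        if unknown:
            # Fail early on typos rather than silently ignoring them.
            raise ValueError(f"Unknown LCM settings: {sorted(unknown)}")
        return cls(**section)
```

An engine could call `LCMSettings.from_config(parsed_config)` in its constructor, so missing keys fall back to defaults and misspelled keys are reported at startup.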

Testing

from agent.context_engine import ContextEngine

# YourEngine is a placeholder: import your ContextEngine subclass from your plugin package.

def test_engine_satisfies_abc():
    engine = YourEngine(context_length=200000)
    assert isinstance(engine, ContextEngine)
    assert engine.name == "your-name"

def test_compress_returns_valid_messages():
    engine = YourEngine(context_length=200000)
    msgs = [{"role": "user", "content": "hello"}]
    result = engine.compress(msgs)
    assert isinstance(result, list)
    assert all("role" in m for m in result)

See tests/agent/test_context_engine.py for the full ABC contract test suite.
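
Since compress() must return a valid OpenAI-format sequence, a stricter structural check can be useful in your own tests. The helper below is a hypothetical sketch, not part of the contract suite: it checks roles and tool-call pairing only, and the set of allowed roles is an assumption.

```python
def assert_valid_message_sequence(messages):
    """Check basic structure; this is not a full OpenAI schema validation."""
    allowed_roles = {"system", "user", "assistant", "tool"}  # assumed role set
    pending = set()  # tool_call ids announced by the latest assistant message
    for i, msg in enumerate(messages):
        role = msg.get("role")
        assert role in allowed_roles, f"message {i}: unexpected role {role!r}"
        if role == "tool":
            call_id = msg.get("tool_call_id")
            assert call_id in pending, f"message {i}: orphaned tool result {call_id!r}"
            pending.discard(call_id)
            continue
        # Any other message must not interrupt unanswered tool calls.
        assert not pending, f"message {i}: tool calls {pending} were never answered"
        if role == "assistant":
            pending = {call.get("id") for call in msg.get("tool_calls") or []}


good = [
    {"role": "user", "content": "search for billing"},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "c1"}]},
    {"role": "tool", "tool_call_id": "c1", "content": "{}"},
    {"role": "assistant", "content": "done"},
]
assert_valid_message_sequence(good)
```

Calling it on the output of compress() catches the most common compaction bug, a tool result whose originating assistant message was dropped.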
