Runners
A runner is the backend that actually drives an agent through a phase. The
orchestrator doesn't know or care which agent ran the work — it hands the
runner a rendered prompt and a session id, and gets back a RunnerResult with
captured stdout, stderr, transcript, return code, timing, and the parsed
session id (models.RunnerResult).
Every runner implements the same three-phase contract (runners.base.Runner):
| Method | Phase | Session |
|---|---|---|
| `start_attempt` | attempt | opens a new session |
| `reflect` | reflection | resumes the attempt session |
| `build_tooling` | build / repair | resumes the same session |
| `close` | n/a | tears down any backend process |
Reflection and build resume the attempt session so the agent reflects on work it genuinely did, not a summary handed back to a cold session.
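The contract above can be sketched roughly as follows. This is a minimal illustration, not the actual `runners.base.Runner` or `models.RunnerResult`: the field names, the method signatures, and the `EchoRunner` and `run_iteration` helpers are all assumptions made for the example. Only the four method names and the kinds of data a result carries come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class RunnerResult:
    """Assumed shape; the real models.RunnerResult may name fields differently."""
    stdout: str
    stderr: str
    transcript: str
    returncode: int
    duration_s: float
    session_id: Optional[str]


class Runner(Protocol):
    """Hypothetical signatures for the three phases plus teardown."""
    def start_attempt(self, prompt: str) -> RunnerResult: ...
    def reflect(self, prompt: str, session_id: str) -> RunnerResult: ...
    def build_tooling(self, prompt: str, session_id: str) -> RunnerResult: ...
    def close(self) -> None: ...


class EchoRunner:
    """Toy deterministic backend (hypothetical) that echoes each prompt."""

    def __init__(self) -> None:
        self._sessions = 0
        self.closed = False

    def _result(self, phase: str, prompt: str, session_id: str) -> RunnerResult:
        text = f"[{phase}] {prompt}"
        return RunnerResult(text, "", text, 0, 0.0, session_id)

    def start_attempt(self, prompt: str) -> RunnerResult:
        # The attempt phase opens a new session.
        self._sessions += 1
        return self._result("attempt", prompt, f"session-{self._sessions}")

    def reflect(self, prompt: str, session_id: str) -> RunnerResult:
        # Reflection resumes the attempt session.
        return self._result("reflection", prompt, session_id)

    def build_tooling(self, prompt: str, session_id: str) -> RunnerResult:
        # Build / repair resumes the same session again.
        return self._result("build", prompt, session_id)

    def close(self) -> None:
        self.closed = True


def run_iteration(runner: Runner, prompts: Dict[str, str]) -> List[RunnerResult]:
    """Drive one attempt -> reflection -> build cycle on a single session."""
    try:
        attempt = runner.start_attempt(prompts["attempt"])
        session_id = attempt.session_id or ""
        reflection = runner.reflect(prompts["reflection"], session_id)
        build = runner.build_tooling(prompts["build"], session_id)
        return [attempt, reflection, build]
    finally:
        runner.close()  # always tear down the backend, even on failure
```

Because the driver depends only on the four methods, any backend that satisfies the protocol can be swapped in without changes to the calling code.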
Picking a runner
Select with --runner:
| Runner | Backend | Use it for |
|---|---|---|
| `exec` | Codex CLI (`codex exec`) | the recommended MVP path |
| `claude` | Claude Code CLI (`claude -p`) | running the loop on Claude Code |
| `fake` | none (deterministic) | tests, demos, install checks |
| `tui-goal` | Codex PTY (`/goal`) | experimental |
What's the same across runners
Whatever the backend, HarnessGym applies the same machinery:
- Process-group timeouts. The runner launches in its own process group, and HarnessGym terminates the whole group on timeout so child benchmark/tool processes can't keep pipes open past the deadline.
- MCP activation + telemetry. Generated MCP servers are wired into the runner's native config and launched through a telemetry wrapper so every `tools/call` is logged. See Activation and Telemetry.
- Skill activation. Generated skills are symlinked into the runner's native skills directory (`.agents/skills` for Codex, `.claude/skills` for Claude).
- Captured artifacts. Each phase writes `<phase>.prompt.txt`, `.stdout.txt`, `.stderr.txt`, and `.transcript.txt` into the iteration directory.
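The process-group timeout in the first bullet can be illustrated with a short POSIX-only sketch. It is not HarnessGym's implementation: `run_with_group_timeout`, `GroupRunResult`, and the SIGTERM-then-SIGKILL grace period are assumptions chosen for the example. The point it shows is that signalling the whole group, rather than only the direct child, also stops background children that would otherwise hold the stdout/stderr pipes open.

```python
import os
import signal
import subprocess
import time
from dataclasses import dataclass
from typing import List


@dataclass
class GroupRunResult:
    """Hypothetical result container for this sketch."""
    returncode: int
    stdout: str
    stderr: str
    timed_out: bool
    duration_s: float


def _signal_group(pgid: int, sig: int) -> None:
    """Send sig to every process in the group; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_group_timeout(
    cmd: List[str], timeout_s: float, grace_s: float = 2.0
) -> GroupRunResult:
    """Run cmd in its own process group and kill the whole group on timeout."""
    started = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        # POSIX: the child leads a new session, so its process group id
        # equals its pid and its descendants inherit that group.
        start_new_session=True,
    )
    timed_out = False
    try:
        stdout, stderr = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        _signal_group(proc.pid, signal.SIGTERM)  # polite request first
        try:
            stdout, stderr = proc.communicate(timeout=grace_s)
        except subprocess.TimeoutExpired:
            _signal_group(proc.pid, signal.SIGKILL)  # then force the group down
            stdout, stderr = proc.communicate()
    return GroupRunResult(
        proc.returncode, stdout, stderr, timed_out, time.monotonic() - started
    )
```

With a command such as `["sh", "-c", "sleep 30 & sleep 30"]` and a short timeout, killing only the shell would leave the background `sleep` holding the pipes until it exits; signalling the group lets `communicate` return promptly.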
The differences are entirely in how each backend is invoked and how its MCP stdio is framed — covered on the per-runner pages.