Skip to content

Runners

A runner is the backend that actually drives an agent through a phase. The orchestrator doesn't know or care which agent ran the work — it hands the runner a rendered prompt and a session id, and gets back a RunnerResult with captured stdout, stderr, transcript, return code, timing, and the parsed session id (models.RunnerResult).

Every runner implements the same three-phase contract (runners.base.Runner):

Method Phase Session
start_attempt attempt opens a new session
reflect reflection resumes the attempt session
build_tooling build / repair resumes the same session
close tears down any backend process

Reflection and build resume the attempt session so the agent reflects on work it genuinely did, not a summary handed back to a cold session.

Picking a runner

Select with --runner:

Runner Backend Use it for
exec Codex CLI (codex exec) the recommended MVP path
claude Claude Code CLI (claude -p) running the loop on Claude Code
fake none (deterministic) tests, demos, install checks
tui-goal Codex PTY (/goal) experimental

What's the same across runners

Whatever the backend, HarnessGym applies the same machinery:

  • Process-group timeouts. The runner launches in its own process group, and HarnessGym terminates the whole group on timeout so child benchmark/tool processes can't keep pipes open past the deadline.
  • MCP activation + telemetry. Generated MCP servers are wired into the runner's native config and launched through a telemetry wrapper so every tools/call is logged. See Activation and Telemetry.
  • Skill activation. Generated skills are symlinked into the runner's native skills directory (.agents/skills for Codex, .claude/skills for Claude).
  • Captured artifacts. Each phase writes <phase>.prompt.txt, .stdout.txt, .stderr.txt, and .transcript.txt into the iteration directory.

The differences are entirely in how each backend is invoked and how its MCP stdio is framed — covered on the per-runner pages.