Skip to content

Codex runner (exec)

The exec runner is the recommended MVP backend. It drives the OpenAI Codex CLI (runners.exec_runner.ExecRunner).

harnessgym run --task task.md --workspace . --iterations 3 --runner exec

How phases map to Codex

Phase Command
attempt codex exec <prompt>
reflect codex exec resume <session_id> <prompt>
build / repair codex exec resume <session_id> <prompt>

The attempt opens a fresh Codex session; reflection and build resume it via the session id Codex reports. It uses ordinary autonomous prompts — it does not assume codex exec "/goal ..." sets a goal.

Flags

Flag Default Meaning
--codex-bin codex path to the Codex executable

MCP activation for Codex

Generated MCP servers are injected as codex exec -c mcp_servers... overrides, because codex exec loads the user config by default and HarnessGym should not have to mutate global Codex config for a repo-local generated server.

Each generated server is launched through harnessgym.mcp_telemetry_proxy, which preserves Content-Length MCP framing while logging each tools/call to .harnessgym/mcp_calls.jsonl.

The mcp_call.py helper

Activation also writes .harnessgym/runtime/mcp_call.py. When a codex exec worker can't see native MCP callables in its session, it should call generated tools through this helper rather than writing an ad-hoc JSON-RPC client:

python3 .harnessgym/runtime/mcp_call.py \
  --server <server> --tool <tool> --arguments '<json-object>'

This routes through the telemetry proxy, so the call is logged and counts as concrete harness tool use. Bypassing it — launching .harnessgym/mcp/... server files directly — loses telemetry and does not count as verified tool use.

Timeout handling

HarnessGym launches Codex in a process group and terminates the whole group on timeout, so spawned benchmark/tool children cannot keep pipes open indefinitely. A timed-out attempt still has its workspace scored when --post-attempt-command is set — see How It Works.

Claude Code runner → Activation →