Experiments

These are preserved notes from real HarnessGym validation runs: the raw engineering record behind the Results page. Each note captures the exact commands, the generated artifacts, the qualification and repair events, and the measured before/after numbers.

They are evidence that the mechanism works end to end on one machine, not statistically powered benchmarks. Where an attempt timed out, the notes say so; where an artifact was quarantined or repaired, the notes record it.

Date	Experiment	What it validated
2026-05-19	Tensor Layout Qualification	fresh-workspace artifact qualification end to end
2026-05-20	Claude Code Runner	the `claude` headless runner, MCP activation, replay
2026-05-23	Claude MCP Telemetry	machine-readable evidence that Claude actually called generated tools
2026-05-24	CPU MoE Real Smoke	the compiled top-2 MoE kernel task path
2026-05-26	H100 Triton RMSNorm	real H100-over-SSH scoring, 150.0 → 103.3 µs
2026-05-27	H100 Triton RMSNorm (Long)	longer run from a generated bundle, 17 active tools, 142.8 → 99.7 µs
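The two H100 rows report before/after latencies rather than ratios. The sketch below is plain arithmetic on those table numbers, not HarnessGym output; `RUNS` and `summarize` are illustrative names. Because each row is a single run, the resulting ratios describe those runs only, not a benchmarked expectation.

```python
# Before/after latencies in microseconds, copied from the two H100 Triton RMSNorm rows.
RUNS = {
    "H100 Triton RMSNorm": (150.0, 103.3),
    "H100 Triton RMSNorm (Long)": (142.8, 99.7),
}


def summarize(before_us: float, after_us: float) -> tuple[float, float]:
    """Return (speedup, percent latency reduction) for one before/after pair."""
    speedup = before_us / after_us
    reduction_pct = 100.0 * (before_us - after_us) / before_us
    return speedup, reduction_pct


if __name__ == "__main__":
    for name, (before, after) in RUNS.items():
        speedup, reduction = summarize(before, after)
        print(f"{name}: {speedup:.2f}x faster, {reduction:.1f}% lower latency")
```

Running it prints `1.45x faster, 31.1% lower latency` for the first run and `1.43x faster, 30.2% lower latency` for the longer one.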

How to read these

Each note follows the same arc the loop does:

Task: what was being optimized and how it was scored.
Generation: the `harnessgym run` command and the artifacts it produced.
Qualification: whether artifacts passed the clean-room gate, plus any repair or quarantine events.
Replay: the `harnessgym compare` run (or post-attempt scoring) and the plain-vs-harnessed numbers.
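When reading a note, it can help to picture the four-stage arc as one record with a field per stage. The following is a hypothetical Python model for that purpose only, not HarnessGym's actual data format: `ExperimentNote`, `Qualification`, and `active_artifacts` are illustrative names, and treating repaired artifacts as still active is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Qualification(Enum):
    """Hypothetical clean-room gate outcomes for one generated artifact."""
    PASSED = "passed"
    REPAIRED = "repaired"
    QUARANTINED = "quarantined"


@dataclass
class ExperimentNote:
    """Hypothetical shape of one note: Task, Generation, Qualification, Replay."""
    task: str                      # Task: what was optimized and how it was scored
    generation_cmd: str            # Generation: the command that produced the artifacts
    artifacts: Dict[str, Qualification] = field(default_factory=dict)  # Qualification
    plain_score: Optional[float] = None      # Replay: score without the harness
    harnessed_score: Optional[float] = None  # Replay: score with the generated tools

    def active_artifacts(self) -> List[str]:
        """Artifacts not quarantined (assumption: repaired ones stay active)."""
        return [
            name
            for name, outcome in self.artifacts.items()
            if outcome is not Qualification.QUARANTINED
        ]
```

Keeping the replay scores optional reflects that a note may record a timed-out attempt with no comparable number.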

To reproduce one, use the commands in its note; they are the same ones shown on the Examples page, run against the corresponding workspace.