Experiments
These are preserved notes from real HarnessGym validation runs — the raw engineering record behind the Results page. Each captures the exact commands, the generated artifacts, the qualification/repair events, and the measured before/after numbers.
They are evidence that the mechanism works end to end on one machine; they are not statistically powered benchmarks. Where an attempt timed out, the notes say so; where an artifact was quarantined or repaired, the notes record it.
| Date | Experiment | What it validated |
|---|---|---|
| 2026-05-19 | Tensor Layout Qualification | fresh-workspace artifact qualification end to end |
| 2026-05-20 | Claude Code Runner | the claude headless runner, MCP activation, replay |
| 2026-05-23 | Claude MCP Telemetry | machine-readable evidence that Claude actually called generated tools |
| 2026-05-24 | CPU MoE Real Smoke | the compiled top-2 MoE kernel task path |
| 2026-05-26 | H100 Triton RMSNorm | real H100-over-SSH scoring, 150.0 → 103.3 µs |
| 2026-05-27 | H100 Triton RMSNorm (Long) | longer run from a generated bundle, 17 active tools, 142.8 → 99.7 µs |
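The two H100 rows report a before and after latency in microseconds. As a minimal sketch of how to read them, the snippet below turns each pair into a speedup factor and a percent reduction. The helper name `summarize` is hypothetical and not part of HarnessGym. The numbers are copied from the table above, and treating them as plain-versus-harnessed latencies is an assumption based on the table's wording.

```python
def summarize(before_us: float, after_us: float) -> tuple[float, float]:
    """Return (speedup factor, percent latency reduction) for one run.

    Hypothetical helper for illustration only; not a HarnessGym API.
    """
    if before_us <= 0 or after_us <= 0:
        raise ValueError("latencies must be positive")
    speedup = before_us / after_us
    reduction_pct = (before_us - after_us) / before_us * 100.0
    return speedup, reduction_pct


# (before, after) latencies in microseconds, copied from the table above.
runs = {
    "H100 Triton RMSNorm": (150.0, 103.3),
    "H100 Triton RMSNorm (Long)": (142.8, 99.7),
}

for name, (before, after) in runs.items():
    speedup, reduction = summarize(before, after)
    print(f"{name}: {speedup:.2f}x speedup, {reduction:.1f}% lower latency")
```

Both runs come out to roughly a 1.4x speedup. Because these are single-machine validation runs, the ratios are best read as indicative rather than as benchmark results.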
How to read these
Each note follows the same arc the loop does:
- Task: what was being optimized and how it was scored.
- Generation: the `harnessgym run` command and the artifacts it produced.
- Qualification: whether artifacts passed the clean-room gate, and any repair/quarantine events.
- Replay: the `harnessgym compare` (or post-attempt scoring) and the plain-vs-harnessed numbers.
If you want to reproduce one, the commands in each note are the same ones on the Examples page, run against the corresponding workspace.