Research Experiments
Each experiment tests a specific hypothesis about making AI-generated software systematically reliable.
Two Roads to Deployment
Status: In Progress
Can a guided agent loop with 99.7% local compute match a 19K-line orchestration pipeline, and what does each approach trade away?
Trials: 19 | Failure Modes: 43 | Approaches Converged: 1/2
Normative Convergence
Status: Complete
Can a 5-layer epistemic scorer, mapped to ISO/IEC 25010, measure real code quality, or does the model just learn to pass the scorer?
Trials: 2 | Failure Modes: 9 | Layers Converged: 5/5
Discrete Convergence
Status: Complete
Does code that an LLM rates as high quality actually pass real tools?
Trials: 28 | Failure Modes: 64 | Phases Converged: 5/5
Layered Convergence
Status: Complete
Can a specification-first methodology converge across 10 full-stack layers?
Trials: 44 | Failure Modes: 102 | Layers Converged: 10/10