Stage 2: pre-registered multi-agent value dynamics test; frontier agents; $1.5k

What this is. A pre-registered, budget-capped, publish-either-way experiment testing whether frontier-model agents engage the coupled-regime predictions of a published theory of multi-agent value dynamics, in a setting where neither Kelly portfolios nor classical control alone predicts the result.

The paper. A Mathematical Theory of Value (arXiv:2606.12502; Zenodo DOI 10.5281/zenodo.20487041; public repo: github.com/macrokit/value; tool: value.macrokit.dev). The theory derives a coding theorem for value (realized growth ΔG ≤ I(X;Y)), a fleet capacity region with an explicit gap law, and a governance result: for payout-coupled agent populations, reducing what the misaligned goal pays converges faster than correcting drift after it occurs (residual ‖Vg‖/γ; does not shrink under pure oversight). The paper states its decisive test and binary decision rule in print.
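
Illustration (not from the paper). The bound ΔG ≤ I(X;Y) has a classical special case that is easy to check numerically: in a Kelly horse race with proportional betting, the growth-rate gain from a side signal Y equals the mutual information I(X;Y). The sketch below assumes that textbook setting. The joint distribution, the odds, and the function names are made up for illustration and are not taken from the paper or its repo.

```python
import math

def marginals(joint):
    """Return (p(x), p(y)) for a joint distribution stored as joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def mutual_information(joint):
    """I(X;Y) in bits."""
    px, py = marginals(joint)
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, p in enumerate(row)
        if p > 0
    )

def growth_rate(joint, odds, use_signal):
    """Expected log2 wealth growth per round under proportional betting.

    With the signal the bettor stakes p(x|y) on outcome x;
    without it the bettor stakes p(x).
    """
    px, py = marginals(joint)
    total = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p == 0:
                continue
            stake = p / py[y] if use_signal else px[x]
            total += p * math.log2(stake * odds[x])
    return total

# Hypothetical two-outcome race with a noisy signal (illustrative numbers only).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
odds = [2.0, 2.0]  # assumed fair odds

delta_g = growth_rate(joint, odds, True) - growth_rate(joint, odds, False)
info = mutual_information(joint)
print(f"delta_G = {delta_g:.6f} bits, I(X;Y) = {info:.6f} bits")
```

In this toy case the two printed numbers coincide (about 0.278 bits), so the bound is met with equality. The experiment concerns the coupled multi-agent regime, where this single-bettor identity is not expected to predict the result on its own.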

The experiment. Stage 2: a payout-coupled multi-agent economy on frontier-model API agents (shared-bankroll / parimutuel; varied perception overlaps). Two pre-registered predictions with frozen pass/fail bands: (i) the gap law G_a − G_b = I_a − I_b and capacity-region structure; (ii) the residual-scaling exponents ‖Vg‖/γ ∝ g^{+1} γ^{-1}.
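
Illustration (assumptions flagged). One plausible way to score prediction (ii) is an ordinary least-squares fit in log-log space: if the residual scales as g^a γ^b, then log(residual) is linear in log g and log γ, and the fitted slopes estimate a and b. The sketch below runs that fit on synthetic data generated to follow the predicted law exactly. The grid values, the tolerance `HYPOTHETICAL_TOL`, and the function name are placeholders, not the frozen pre-registered bands or the repo's analysis code.

```python
import math

def fit_exponents(rows):
    """Fit residual ∝ g^a * gamma^b by least squares on logs.

    rows: sequence of (g, gamma, residual) with all values > 0.
    Returns (a, b).
    """
    rows = list(rows)
    u = [math.log(g) for g, _, _ in rows]
    v = [math.log(gm) for _, gm, _ in rows]
    w = [math.log(r) for _, _, r in rows]
    n = len(rows)
    mu, mv, mw = sum(u) / n, sum(v) / n, sum(w) / n

    # Centered sums of squares and cross-products (normal equations).
    suu = sum((x - mu) ** 2 for x in u)
    svv = sum((y - mv) ** 2 for y in v)
    suv = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    suw = sum((x - mu) * (z - mw) for x, z in zip(u, w))
    svw = sum((y - mv) * (z - mw) for y, z in zip(v, w))

    det = suu * svv - suv ** 2
    if det == 0:
        raise ValueError("g and gamma must both vary, and not collinearly in logs")
    a = (suw * svv - svw * suv) / det
    b = (svw * suu - suw * suv) / det
    return a, b

# Synthetic grid obeying residual = 0.7 * g / gamma (made-up values, not results).
grid = [(g, gm, 0.7 * g / gm)
        for g in (0.1, 0.2, 0.4)
        for gm in (0.5, 1.0, 2.0)]

a_hat, b_hat = fit_exponents(grid)

HYPOTHETICAL_TOL = 0.15  # placeholder; the real bands are frozen in the pre-registration
passes = abs(a_hat - 1.0) <= HYPOTHETICAL_TOL and abs(b_hat + 1.0) <= HYPOTHETICAL_TOL
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, within placeholder band: {passes}")
```

On real agent data the fitted slopes would carry noise, so the pass/fail comparison would be made against the pre-registered bands rather than the placeholder tolerance used here.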

Why frontier models. The design was validated in synthetic simulation (15/15 pre-registered checks). Two prior pre-registered attempts on 1.5B-parameter local models came back CAP from opposite directions — the 1.5B instrument is affirmatively exhausted; frontier-model API is the next bar.

The gate. Before any grid spend, a pre-registered responsiveness gate (≈$14 at frontier prices, ~3% of the cap) verifies the agents engage the coupled dynamics at all. Gate-fail → full stop, published gate report, no further spend. This is not boilerplate: the gate is exactly the mechanism that would have saved the two prior grid budgets if it had been in place.

The decision rule (printed in the paper, applied mechanically): Predictions pass → the unification earns the stronger word on real agents. Gate passes, predictions fail → the distinctively-unified layer is retired and published as a clean negative with the same prominence a positive would receive. Gate fails → published gate report only; no grid spend. There is no outcome that is not published, and no outcome that gets shelved for being inconvenient.
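
Illustration. The three branches above can be written as a small function, which makes it easy to see that every input maps to a published outcome. This is a minimal sketch: the names are hypothetical, and treating "predictions pass" as "both pre-registered predictions pass" is an assumption, since the text above does not spell out the mixed case.

```python
from enum import Enum

class Outcome(Enum):
    GATE_REPORT_ONLY = "gate fails: publish gate report, no grid spend"
    CLEAN_NEGATIVE = "gate passes, predictions fail: retire the unified layer, publish negative"
    PASS = "predictions pass: unification earns the stronger word on real agents"

def decide(gate_passed, predictions_passed):
    """Map gate and prediction results to the outcome that gets published.

    predictions_passed: one boolean per pre-registered prediction.
    Assumption: a pass requires every prediction to pass.
    """
    if not gate_passed:
        # Prediction results are ignored: the grid is never run after a gate failure.
        return Outcome.GATE_REPORT_ONLY
    if all(predictions_passed):
        return Outcome.PASS
    return Outcome.CLEAN_NEGATIVE

# Every combination of inputs lands on one of the three published outcomes.
for gate in (True, False):
    for preds in ((True, True), (True, False), (False, False)):
        print(gate, preds, "->", decide(gate, preds).name)
```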

The honesty record. Published three prior pre-registered negatives in full; absorbed six external-review passes across v1–v5; issued a public erratum in v5 (a false fleet-ceiling claim caught in audit, corrected with a counterexample that now defines what Stage 2 must implement). Pre-registration provably precedes results (git history, public repo).

The ask. $1,500 hard cap (likely less: frontier full program ≈$240 with safety factor, plus reserve for cross-vendor replication arm). No salary, no hardware, no overhead — frontier API credits only.
