The reference checkpoint rotates on schedule; every accepted rollout carries proof that it was generated with the right weights. Open distributed training (v3) is on the roadmap.
v1 · live
v1 · what's live
Reference checkpoint. The validator pins a single canonical model hash; miners must serve that exact checkpoint to be scored.
GRAIL drift detection. Every accepted rollout carries a sketch proof. Miners on stale weights produce a sketch that fails verification; the mismatch is caught within milliseconds and the rollout earns no emission.
Scheduled rotation. The reference policy rotates per the trainer cadence. Cycles publish to R2 as signed attestations indexed under attestations/<netuid>/.
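The pinning rule above can be illustrated with a minimal sketch. It covers only the hash comparison, not the GRAIL sketch proof itself. The hash function (SHA-256) and the helper names `checkpoint_digest`, `serves_reference`, and `attestation_prefix` are assumptions made for illustration; the only details taken from this page are the single pinned hash and the `attestations/<netuid>/` prefix.

```python
import hashlib
import hmac
from typing import Iterable


def checkpoint_digest(chunks: Iterable[bytes]) -> str:
    """Hash checkpoint bytes incrementally.

    SHA-256 is an assumption; the page only says "canonical model hash".
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def serves_reference(served_digest: str, pinned_digest: str) -> bool:
    """True only if the served checkpoint matches the pinned hash exactly."""
    # compare_digest compares in constant time.
    return hmac.compare_digest(served_digest, pinned_digest)


def attestation_prefix(netuid: int) -> str:
    """Key prefix under which rotation attestations are indexed.

    Object names below this prefix are not specified here, so none are built.
    """
    return f"attestations/{netuid}/"
```

For example, `serves_reference(checkpoint_digest([b"weights-v1"]), checkpoint_digest([b"weights-v2"]))` returns `False`, which corresponds to a miner that would not be scored.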
Once the open inference API (v2) is live, the next step is closing the loop: miner rollouts feed the trainer that produces the next checkpoint. GRPO outer-loop telemetry (loss_mean_window, grad_norm, advantage_variance, kl_vs_reference) will become live counters on this surface.
Bands worth flagging when that lands: loss_mean_window drifting up by more than 0.1 over 128 steps is the first sign of reward hacking; grad_norm above the clip value (default 1.0) flags numerical instability; kl_vs_reference outside the [-clip_eps, +clip_eps] band indicates the policy is running away from the reference.
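The three bands above could be checked with something like the following sketch. The thresholds 0.1, 128 steps, and the 1.0 clip default come from the text. The `clip_eps` default of 0.2, the `Bands` container, and the `flag_telemetry` function are hypothetical, since no telemetry API is published yet.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Bands:
    loss_drift: float = 0.1  # max allowed rise in loss_mean_window
    window: int = 128        # number of steps the rise is measured over
    grad_clip: float = 1.0   # gradient clip value (default from the text)
    clip_eps: float = 0.2    # assumption: the text gives no default


def flag_telemetry(
    loss_mean_window: Sequence[float],
    grad_norm: float,
    kl_vs_reference: float,
    bands: Bands = Bands(),
) -> List[str]:
    """Return the names of the bands that the latest step violates."""
    flags: List[str] = []

    # A drift over `window` steps needs window + 1 samples.
    if len(loss_mean_window) > bands.window:
        drift = loss_mean_window[-1] - loss_mean_window[-1 - bands.window]
        if drift > bands.loss_drift:
            flags.append("loss_drift")

    if grad_norm > bands.grad_clip:
        flags.append("grad_norm")

    if not (-bands.clip_eps <= kl_vs_reference <= bands.clip_eps):
        flags.append("kl_band")

    return flags
```

With fewer than 129 loss samples the drift check is skipped rather than guessed, so a fresh run reports only the gradient and KL bands until the window fills.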