Reliquary is a decentralized GRPO training market on Bittensor Subnet 81. A model learns fastest from prompts at its learning frontier — hard enough that rollouts disagree, easy enough that the gradient still carries signal. Reliquary turns finding those prompts into a competitive market.
Miners aren't paid per rollout. They're paid for verified rollouts the trainer actually needs. A miner predicts which prompts sit in the learning zone, generates rollouts, and attaches a cryptographic proof that the work is real. The validator verifies every proof, runs a GRPO step on what survives, and publishes the updated checkpoint to Hugging Face.
The actors
- Miners select prompts at the learning frontier, run the model, and submit rollouts together with a GRAIL proof.
- The validator recomputes rewards, verifies proofs, runs the GRPO step, and publishes checkpoints.
- The chain records selection and scoring so outside observers can audit later.
One window, end to end
- The validator announces the active checkpoint and the per-window randomness.
- Miners bet their own compute on prompts they predict sit in the trainable band, generate M rollouts per group, and attach a GRAIL sketch binding each rollout to the announced weights.
- The window seals once enough valid distinct-prompt groups land; final ordering follows drand/canonical rules, not validator-side latency.
- The validator recomputes every reward, verifies every sketch, and assembles a GRPO batch from the survivors. Fabricated work earns zero.
- A PPO-clipped GRPO step runs; updated weights are published to Hugging Face every ten trained windows, with a signed manifest recording the chain of custody from the base model.
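As a rough illustration of the last two steps, the sketch below computes group-relative advantages for one prompt group and a PPO-style clipped surrogate for a single rollout. It is a minimal sketch, not Reliquary's trainer: the function names, the use of sequence-level log-probabilities, and the defaults (eps, clip_eps = 0.2) are assumptions made for illustration.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps) within one prompt group.

    `eps` is an assumed stabilizer that avoids division by zero.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one rollout (a quantity to maximize).

    Uses sequence-level log-probabilities for brevity; a real trainer
    would more likely work per token. `clip_eps` is an assumed default.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# A group where rollouts disagree yields nonzero advantages.
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])

# A group where every rollout earns the same reward yields all zeros,
# so it contributes nothing to the gradient.
uniform = group_advantages([1.0, 1.0, 1.0, 1.0])
```

The uniform group shows why a prompt that is too easy or too hard is wasted compute: every advantage is zero, so the clipped surrogate is zero regardless of the probability ratio.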
Why selection is the game
Only a small slice of prompts sits in the learning zone at any checkpoint, and the band narrows as the policy matures. A miner who picks well lands on winning prompts and earns emission; a miner who picks poorly burns its own compute on submissions rejected for reasons such as out_of_zone. This converts DAPO's reactive generate-then-discard filter into an ex-ante prediction market, and it makes selection intelligence more valuable over time, not less.
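To make the selection bet concrete, here is a minimal sketch of how a group might be classified as inside or outside the learning zone, and how a miner's hit rate follows from its picks. The criterion used here (population standard deviation of rewards at or above a threshold) and the names classify_group, hit_rate, and min_std are assumptions for illustration; only the out_of_zone label comes from the text above, and the real band definition may differ.

```python
import statistics


def classify_group(rewards, min_std=0.1):
    """Label a prompt group by how much its rollouts disagree.

    Assumption: a group is "in_zone" when the spread of its rewards is at
    least `min_std`; otherwise it is rejected as "out_of_zone".
    """
    spread = statistics.pstdev(rewards)
    return "in_zone" if spread >= min_std else "out_of_zone"


def hit_rate(groups, min_std=0.1):
    """Fraction of a miner's submitted groups that land in the zone."""
    if not groups:
        return 0.0
    hits = sum(classify_group(g, min_std) == "in_zone" for g in groups)
    return hits / len(groups)


# Hypothetical picks: one solved every time, one failed every time,
# and one where rollouts disagree.
picks = [
    [1.0, 1.0, 1.0, 1.0],  # too easy
    [0.0, 0.0, 0.0, 0.0],  # too hard
    [1.0, 0.0, 1.0, 0.0],  # rollouts disagree
]
labels = [classify_group(g) for g in picks]
rate = hit_rate(picks)
```

Under these assumptions only the third pick survives, so two thirds of this miner's compute was spent on groups that would be rejected.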
Why the work can't be faked
Every rollout carries a GRAIL sketch — a fingerprint over hidden-state activations at sampled positions, bound to per-window randomness the miner can't predict in advance. The validator recomputes it and rejects anything outside tolerance. The full attack-class audit lives at /docs/scoring.
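The general shape of such a check can be sketched with a toy stand-in. This is not the GRAIL specification: the hash-based position sampling, the random sign projection, the tolerance, and every function name below are assumptions chosen only to show how a fingerprint can be bound to randomness that is revealed per window and then recomputed by a verifier.

```python
import hashlib


def _stream(seed: bytes, label: bytes, n: int):
    """Return n deterministic 32-bit integers derived from seed and label."""
    out = []
    counter = 0
    while len(out) < n:
        digest = hashlib.sha256(seed + label + counter.to_bytes(4, "big")).digest()
        for i in range(0, 32, 4):
            out.append(int.from_bytes(digest[i:i + 4], "big"))
        counter += 1
    return out[:n]


def sample_positions(randomness: bytes, seq_len: int, k: int):
    """Pick k token positions from the window randomness (repeats are possible)."""
    return [v % seq_len for v in _stream(randomness, b"pos", k)]


def sketch(hidden_states, randomness: bytes, k: int = 4):
    """Toy fingerprint: project the hidden state at each sampled position
    onto a +/-1 vector that is also derived from the window randomness."""
    dim = len(hidden_states[0])
    signs = [1.0 if v & 1 else -1.0 for v in _stream(randomness, b"proj", dim)]
    positions = sample_positions(randomness, len(hidden_states), k)
    return [sum(s * h for s, h in zip(signs, hidden_states[p])) for p in positions]


def verify(claimed, hidden_states, randomness: bytes, tol: float = 1e-3):
    """Recompute the sketch and accept only if every entry is within tol."""
    expected = sketch(hidden_states, randomness, k=len(claimed))
    return all(abs(a - b) <= tol for a, b in zip(claimed, expected))


# Hypothetical activations: 16 positions, 8 dimensions each.
hidden = [[0.1 * i + 0.01 * j for j in range(8)] for i in range(16)]
window_randomness = b"window-7"  # placeholder for the announced randomness

honest = sketch(hidden, window_randomness)
noisy = [v + 1e-5 for v in honest]     # small numeric drift, within tolerance
fabricated = [v + 1.0 for v in honest]  # values that do not match the activations
```

In this toy, verify accepts honest and noisy and rejects fabricated. Because the positions and the projection both depend on the window randomness, a miner cannot precompute the fingerprint before the window is announced. The tolerance is there because recomputed activations may differ slightly in floating point from the ones the miner produced.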
What's different from today's subnets
Most Bittensor subnets pay miners for volume. Reliquary pays for the rollouts the trainer needs — and proves they're real before a single gradient step. The network's output is a continuously-trained model published to Hugging Face, not just a leaderboard.