Awaiting the first sealed window. The moment a window seals, its submitted bundles flow through the stages below — each recomputed by every validator. Survivors don’t just pass the gates — they become the next GRPO step.
01
window
event-sealed · each tempo
02
select
miner bets a prompt
03
rollout
M=8 sampled completions
04
GRAIL sketch
binds rollout to weights
05
learning zone
group-σ in the band
06
GRPO step
checkpoint → HF
the gate every rollout passes — recompute, or earn zero.
A model only learns from prompts at its frontier — and only 1–15% of prompts are ever there. Find them and every step counts; miss them and you burn compute on prompts that teach nothing. Reliquary turns that search into a race, so a miner that predicts well is faster — and the model reaches a given accuracy in far fewer, far cheaper steps.
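Why the frontier matters can be seen in a few lines. This is a minimal sketch, not Reliquary's validator code: it assumes binary rewards and a hypothetical σ band (`SIGMA_MIN`, `SIGMA_MAX`), since the page does not state the real thresholds. The advantage formula is the usual group-relative one used by GRPO.

```python
import statistics

M = 8  # sampled completions per prompt, as in the rollout stage

# Hypothetical band: the real thresholds are not given on this page.
SIGMA_MIN, SIGMA_MAX = 0.2, 0.5


def group_sigma(rewards):
    """Population standard deviation of one prompt's group of rewards."""
    return statistics.pstdev(rewards)


def in_learning_zone(rewards, lo=SIGMA_MIN, hi=SIGMA_MAX):
    """True when the group's reward spread falls inside the band."""
    return lo <= group_sigma(rewards) <= hi


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# A prompt the model always solves: no spread, so no learning signal.
too_easy = [1.0] * M
# A prompt the model solves 3 times out of 8: spread, so usable signal.
frontier = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

for name, rewards in (("too_easy", too_easy), ("frontier", frontier)):
    print(name, in_learning_zone(rewards), grpo_advantages(rewards))
```

When every completion in a group earns the same reward, every advantage is zero and the step has nothing to learn from. That is the sense in which out-of-zone prompts teach nothing.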
pass@1 · held-out math · 300 steps
+14pp
the frontier, narrowing
As the policy improves, the zone shrinks toward 1%. A reactive filter then discards 99% of its compute. A market doesn't — miners track the moving frontier. That's why ~2× is a floor, not a ceiling.
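The arithmetic behind the shrinking zone, as a rough sketch. It assumes every prompt costs the same to roll out, that a reactive filter picks prompts uniformly at random and discards out-of-zone groups, and that `precision` is a hypothetical hit rate for a miner's picks. It illustrates the argument; it is not the measurement behind the figures on this page.

```python
def wasted_fraction(in_zone_rate):
    """Share of rollout compute discarded when only in-zone groups are kept."""
    if not 0.0 < in_zone_rate <= 1.0:
        raise ValueError("in_zone_rate must be in (0, 1]")
    return 1.0 - in_zone_rate


def prompts_per_hit(hit_rate):
    """Expected prompts rolled out per in-zone prompt (geometric mean, 1/p)."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    return 1.0 / hit_rate


def relative_cost_saving(precision, in_zone_rate):
    """How many times fewer rollouts a selector with the given precision
    needs per in-zone prompt, compared with uniform random selection."""
    return prompts_per_hit(in_zone_rate) / prompts_per_hit(precision)


for rate in (0.15, 0.05, 0.01):
    print(f"in-zone {rate:.0%}: wasted {wasted_fraction(rate):.0%}, "
          f"{prompts_per_hit(rate):.1f} prompts per hit")
```

The gap between random selection and a good predictor widens as the in-zone rate falls, because the cost per hit under random selection grows as 1/p.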
+14pp
pass@1 vs vanilla GRPO
identical 300-step budget
~2×
training efficiency
a lower bound — it grows
1–15%
of prompts are in-zone
the rest teach nothing
We aim to be the fastest RL training layer on Bittensor.
read the measurement →
Windows seal every few minutes, once enough distinct in-zone prompts arrive. The moment this validator publishes its archive, the submitted / accepted split and the cross-validator acceptance spread appear right here.
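An event-sealed window can be pictured as a simple counting rule. The sketch below is an illustration only: the `prompt_id` and `in_zone` fields and the `min_distinct_in_zone` threshold are hypothetical names, and the real sealing rule may take other inputs.

```python
def should_seal(submissions, min_distinct_in_zone=4):
    """Seal once enough distinct prompts have landed in the learning zone.

    `submissions` is a list of dicts with hypothetical keys:
      - "prompt_id": identifier of the prompt the miner bet on
      - "in_zone":   whether the group's reward spread was in the band
    """
    distinct = {s["prompt_id"] for s in submissions if s["in_zone"]}
    return len(distinct) >= min_distinct_in_zone


submissions = [
    {"prompt_id": "a", "in_zone": True},
    {"prompt_id": "a", "in_zone": True},   # duplicate prompt, counted once
    {"prompt_id": "b", "in_zone": False},  # out of zone, not counted
    {"prompt_id": "c", "in_zone": True},
]
print(should_seal(submissions, min_distinct_in_zone=2))
```

Counting distinct prompts rather than raw submissions means several miners betting on the same prompt do not seal a window early.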
anatomy of a sealed window
The whole bundle is archived to R2, replayable byte-for-byte — the exact dataset behind the checkpoint.
One protocol, two instruments. The archive is the cool one — R2 stores the full bundle the trainer saw, so every checkpoint ships with the exact dataset behind it, replayable byte-for-byte. Anyone can re-hash a bundle, re-derive its GRAIL commitment, and check the signatures against the on-chain hotkeys.
01
prompt
the seeded input from drand
02
rollout
miner forward-pass token stream
03
grail_sketch
256-bit cryptographic commitment
04
logprob_field
per-token log-probabilities, recomputed
05
signatures
validator hotkey attestations
Don't trust us. Replay it.
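Re-hashing a bundle could look something like the sketch below. It assumes the bundle is available as JSON with the five fields listed above and that SHA-256 over a canonical serialization is the comparison digest. Both are assumptions, as are the sample values. It covers only the re-hash step: re-deriving the GRAIL commitment and checking hotkey signatures depend on the protocol's own definitions, which this page does not give.

```python
import hashlib
import hmac
import json


def canonical_bytes(bundle):
    """Serialize with sorted keys and fixed separators so that the same
    content always produces the same bytes."""
    text = json.dumps(bundle, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def bundle_digest(bundle):
    """Hex SHA-256 digest (256 bits) of the canonical bundle bytes."""
    return hashlib.sha256(canonical_bytes(bundle)).hexdigest()


def matches_archive(bundle, expected_hex):
    """Constant-time comparison against a published digest."""
    return hmac.compare_digest(bundle_digest(bundle), expected_hex)


# Hypothetical bundle contents, for illustration only.
bundle = {
    "prompt": "example prompt",
    "rollout": [101, 7, 42],
    "grail_sketch": "00" * 32,
    "logprob_field": [-0.1, -2.3, -0.7],
    "signatures": ["example-signature"],
}
published = bundle_digest(bundle)
print(matches_archive(bundle, published))

tampered = dict(bundle, rollout=[101, 7, 43])
print(matches_archive(tampered, published))
```

Changing a single token in the rollout changes the digest, which is what makes a byte-for-byte replay checkable by anyone holding the archive.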
The hot instrument. Each window, the validator runs a GRPO step on the rollouts that survived GRAIL and pushes the updated weights to Hugging Face. The model is trained on the exact prompts the market found at the frontier — and every checkpoint is benchmarked against the one before it.
01
rollouts
GRAIL-accepted miner outputs
02
curation
log-prob-weighted sampling
03
GRPO step
train on the survivors
04
benchmark
every checkpoint vs. the last
05
publish
new weights → Hugging Face
Watch the furnace tick.
The architecture isn't tied to one model or domain — it's a general-purpose RL inference layer. A client brings a model and a set of environments; the miner network handles rollout generation, prompt selection, and verification. RL inference on demand.
who · 01
Bring a model and your environments. Skip building and babysitting rollout infrastructure — the network returns an optimized training signal, already verified.
who · 02
Any Bittensor subnet that wants to RL-tune a model can route its rollout generation here instead of standing up a trainer of its own.
The RL inference layer on Bittensor.
read the vision →
Run a miner. Predict which prompts sit at the model’s edge. Land in the learning zone and your verified rollouts train the next checkpoint.
◉ Reliquary · subnet 81 · the learning frontier, as a market