The validator is watching.

FIG.01 · THE LOOP

From rollout to checkpoint.

Awaiting the first sealed window. The moment a window seals, its submitted bundles flow through the stages below — each recomputed by every validator. Survivors don’t just pass the gates — they become the next GRPO step.

01 window: event-sealed · each tempo
02 select: miner bets a prompt
03 rollout: M=8 sampled completions
04 GRAIL sketch: binds rollout to weights
05 learning zone: group-σ in the band
06 GRPO step: checkpoint → HF

the gate every rollout passes — recompute, or earn zero.

GRAIL verdict (schematic)

rollout
proof: recomputed ✓
reward μ: 0.62 ✓
group σ: in-band ✓
signatures: 8 / 8 valid ✓
verdict: accept
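
The schematic above can be read as a conjunction of checks. The sketch below is one hedged way to write that down, not the validator's actual code: the function name `grail_verdict`, the band limits in `SIGMA_BAND`, and the rule that claimed rewards must equal the recomputed ones are all assumptions made for illustration. The page only states that the proof is recomputed, the group σ must sit in a band, and the signatures must be valid.

```python
from statistics import fmean, pstdev

# Hypothetical limits: the page only says group sigma must be "in the band".
SIGMA_BAND = (0.1, 0.5)

def grail_verdict(proof_recomputed, claimed_rewards, recomputed_rewards,
                  valid_signatures, total_signatures, band=SIGMA_BAND):
    """Return (verdict, details) for one group of M rollouts of a single prompt."""
    mu = fmean(recomputed_rewards)
    sigma = pstdev(recomputed_rewards)  # population std over the group
    checks = {
        "proof": bool(proof_recomputed),
        # Assumption: a claim passes only if it matches the validator's recomputation.
        "reward": list(claimed_rewards) == list(recomputed_rewards),
        "group_sigma": band[0] <= sigma <= band[1],
        "signatures": valid_signatures == total_signatures,
    }
    verdict = "accept" if all(checks.values()) else "reject"
    return verdict, {"mu": mu, "sigma": sigma, **checks}

rewards = [1, 1, 1, 1, 1, 0, 0, 0]  # 5 of 8 completions correct, mean 0.625
print(grail_verdict(True, rewards, rewards, 8, 8))
```

A group where every completion gets the same reward has σ = 0 and falls outside any band with a positive lower limit, so it is rejected even when the proof and signatures check out.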

FIG.02 · THE CLAIM

The fastest learning signal in RL.

A model only learns from prompts at its frontier — and only 1–15% of prompts are ever there. Find them and every step counts; miss them and you burn compute on prompts that teach nothing. Reliquary turns that search into a race, so a miner that predicts well is faster — and the model reaches a given accuracy in far fewer, far cheaper steps.
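
Why do prompts outside the frontier teach nothing? GRPO scores each completion relative to its own group, so a group with identical rewards yields zero advantage everywhere. A minimal sketch of that group normalization follows; the helper name `group_advantages` and the `eps` value are illustrative, and the population standard deviation is an assumption (implementations differ on this detail).

```python
from statistics import fmean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages in the GRPO style: (r - mean) / (std + eps)."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

print(group_advantages([1, 1, 1, 1, 1, 1, 1, 1]))  # reliably solved: all zeros
print(group_advantages([0, 0, 0, 0, 0, 0, 0, 0]))  # reliably failed: all zeros
print(group_advantages([1, 1, 1, 0, 0, 0, 0, 0]))  # learning zone: nonzero signal
```

Only the mixed group produces a gradient signal, which is why the share of in-zone prompts sets how much of the rollout compute actually moves the model.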

pass@1 · held-out math · 300 steps

base model: 0.33
vanilla GRPO: 0.47
Reliquary: 0.61 (+14pp vs vanilla GRPO)

the frontier, narrowing

reliably solved | learning zone | reliably failed

As the policy improves, the zone shrinks toward 1%. A reactive filter then discards 99% of its compute. A market doesn't — miners track the moving frontier. That's why ~2× is a floor, not a ceiling.
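
The arithmetic behind the 99% figure can be sketched as follows, assuming a filter that rolls out uniformly sampled prompts and discards every out-of-zone group, with each group costing the same compute. The function name `reactive_filter_cost` is illustrative.

```python
def reactive_filter_cost(in_zone_fraction):
    """Cost of sampling prompts blindly and discarding out-of-zone groups."""
    wasted = 1.0 - in_zone_fraction          # share of rollout compute discarded
    groups_per_hit = 1.0 / in_zone_fraction  # expected groups rolled out per kept group
    return wasted, groups_per_hit

for p in (0.15, 0.05, 0.01):
    wasted, per_hit = reactive_filter_cost(p)
    print(f"in-zone {p:.0%}: discards {wasted:.0%}, ~{per_hit:.0f} groups per useful one")
```

Under these assumptions the waste grows as the zone narrows, while a selector that predicts the zone correctly pays for rollouts only on prompts it expects to land there.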

+14pp: pass@1 vs vanilla GRPO, identical 300-step budget
~2×: training efficiency, a lower bound that grows
1–15%: of prompts are in-zone, the rest teach nothing

We aim to be the fastest RL training layer on Bittensor.

read the measurement →

FIG.03 · THE LAST WINDOW

Awaiting the next sealed window.

Windows seal every few minutes, once enough distinct in-zone prompts arrive. The moment this validator publishes its archive, the submitted / accepted split and the cross-validator acceptance spread appear right here.
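
As a rough sketch of event-based sealing, assuming a window seals once a threshold of distinct accepted (in-zone) prompts is reached: the class name `Window`, the `seal_threshold` parameter, and its value are hypothetical, since the page does not say how many prompts are "enough".

```python
class Window:
    """Collects submissions until enough distinct in-zone prompts have arrived."""

    def __init__(self, seal_threshold):
        self.seal_threshold = seal_threshold  # hypothetical: not given on the page
        self.in_zone_prompts = set()
        self.submitted = 0
        self.accepted = 0
        self.sealed = False

    def submit(self, prompt_id, accepted):
        """Record one bundle; return True once the window has sealed."""
        if self.sealed:
            raise RuntimeError("window already sealed")
        self.submitted += 1
        if accepted:
            self.accepted += 1
            self.in_zone_prompts.add(prompt_id)  # duplicates do not count twice
        if len(self.in_zone_prompts) >= self.seal_threshold:
            self.sealed = True
        return self.sealed

w = Window(seal_threshold=2)
w.submit("p1", accepted=True)
w.submit("p1", accepted=True)   # same prompt again: still one distinct prompt
w.submit("p2", accepted=False)  # rejected bundle: counted as submitted only
print(w.submit("p3", accepted=True), w.submitted, w.accepted)
```

The `submitted` and `accepted` counters correspond to the submitted / accepted split the dashboard is described as showing.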

open the dashboard → · subnet 81 · finney

anatomy of a sealed window

prompt: selected at the frontier
rollouts: M = 8 sampled completions
grail_sketch: binds each rollout to the weights
rewards: recomputed by the validator
signatures: one per attesting validator

The whole bundle is archived to R2, replayable byte-for-byte — the exact dataset behind the checkpoint.

FIG.04 · THE ARCHIVE · LEDGER

Every selected rollout, kept forever.

One protocol, two instruments. The archive is the cool one — R2 stores the full bundle the trainer saw, so every checkpoint ships with the exact dataset behind it, replayable byte-for-byte. Anyone can re-hash a bundle, re-derive its GRAIL commitment, and check the signatures against the on-chain hotkeys.

01 prompt: the seeded input from drand
02 rollout: miner forward-pass token stream
03 grail_sketch: 256-bit cryptographic commitment
04 logprob_field: per-token log-probabilities, recomputed
05 signatures: validator hotkey attestations
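
To make "re-hash a bundle" concrete, here is a minimal sketch of a bundle record with the five fields listed above and a deterministic digest. The field types, the canonical-JSON serialization, and the use of SHA-256 are assumptions for illustration; the page does not specify the archive's wire format or how the GRAIL commitment is derived, and signature verification against on-chain hotkeys is left out.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Bundle:
    prompt: str          # the seeded input
    rollout: list        # token ids from the miner's forward pass
    grail_sketch: str    # hex string of the 256-bit commitment
    logprob_field: list  # per-token log-probabilities
    signatures: dict     # hypothetical layout: hotkey -> signature hex

def bundle_digest(bundle):
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    payload = json.dumps(asdict(bundle), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

b = Bundle("2+2=?", [17, 4, 99], "ab" * 32, [-0.1, -0.7, -0.02], {"hotkey_a": "00ff"})
print(bundle_digest(b))
```

Because the encoding is canonical, two parties holding the same bytes get the same digest, and changing a single token changes it.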

Don't trust us. Replay it.

open the explorer → · read the protocol

FIG.05 · THE FORGE

Where the next checkpoint is forged.

The hot instrument. Each window, the validator runs a GRPO step on the rollouts that survived GRAIL and pushes the updated weights to Hugging Face. The model is trained on the exact prompts the market found at the frontier — and every checkpoint is benchmarked against the one before it.

01 rollouts: GRAIL-accepted miner outputs
02 curation: log-prob-weighted sampling
03 GRPO step: train on the survivors
04 benchmark: every checkpoint vs. the last
05 publish: new weights → Hugging Face
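
The curation stage is described only as "log-prob-weighted sampling". One possible reading, sketched below, draws rollouts with probability proportional to the exponential of their summed token log-probabilities. The weighting direction, sampling with replacement, the `logprobs` key, and the function name `sample_by_logprob` are all assumptions; the real curation rule may differ.

```python
import math
import random

def sample_by_logprob(rollouts, k, seed=0):
    """Draw k rollouts (with replacement), weighted by exp(sequence log-prob)."""
    logps = [sum(r["logprobs"]) for r in rollouts]
    top = max(logps)
    # Subtract the max before exponentiating to avoid underflow across the board.
    weights = [math.exp(lp - top) for lp in logps]
    rng = random.Random(seed)  # seeded so a replay draws the same sample
    return rng.choices(rollouts, weights=weights, k=k)

rollouts = [
    {"id": "a", "logprobs": [-0.1, -0.2]},
    {"id": "b", "logprobs": [-1.5, -2.0]},
    {"id": "c", "logprobs": [-0.4, -0.3]},
]
print([r["id"] for r in sample_by_logprob(rollouts, k=4)])
```

Seeding the generator keeps the selection reproducible, which matters if the archived bundle is meant to replay into the same training batch.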

Watch the furnace tick.

live training stream → · training cycles

FIG.06 · THE LAYER

Bring a model. We handle the RL.

The architecture isn't tied to one model or domain — it's a general-purpose RL inference layer. A client brings a model and a set of environments; the miner network handles rollout generation, prompt selection, and verification. RL inference on demand.

who · 01

Teams running RL pipelines

Bring a model and your environments. Skip building and babysitting rollout infrastructure — the network returns an optimized training signal, already verified.

who · 02

Subnets that need an RL phase

Any Bittensor subnet that wants to RL-tune a model can route its rollout generation here instead of standing up a trainer of its own.

The RL inference layer on Bittensor.

read the vision →

FIG.07 · MINE THE FRONTIER

Mine the frontier.

Run a miner. Predict which prompts sit at the model’s edge. Land in the learning zone and your verified rollouts train the next checkpoint.
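
A miner's prediction step might look like the sketch below, assuming binary (pass/fail) rewards, for which a prompt with pass rate p has reward standard deviation sqrt(p(1 - p)). The band limits, the function names, and the idea of picking the prompt with the highest predicted σ are illustrative assumptions, not the protocol's selection rule.

```python
import math

def predicted_sigma(pass_rate):
    """Std of a Bernoulli reward with the given pass rate."""
    return math.sqrt(pass_rate * (1.0 - pass_rate))

def pick_prompt(estimates, band=(0.1, 0.5)):
    """estimates maps prompt id -> estimated pass rate.

    Returns the in-band prompt with the largest predicted sigma, or None.
    The band limits are hypothetical.
    """
    in_band = {}
    for prompt_id, p in estimates.items():
        sigma = predicted_sigma(p)
        if band[0] <= sigma <= band[1]:
            in_band[prompt_id] = sigma
    if not in_band:
        return None
    return max(in_band, key=in_band.get)

estimates = {"easy": 0.99, "edge": 0.5, "hard": 0.02}
print(pick_prompt(estimates))  # edge
```

A miner whose pass-rate estimates are accurate spends its rollouts on prompts that are likely to be accepted, which is the bet the page describes.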

run a miner → · live dashboard · github

◉ Reliquary · subnet 81 · the learning frontier, as a market

Reliquary — a market for the learning frontier
