Forge · live training · team only · Reliquary
5H47sFL6-base-reset-qwen35 · step 16,247 · live
status: running
runtime: 19d 18h
last seen: —
loss: 5.57e-4
kl: 6.76e-4
grad_norm: 2.125
reward μ: 0.5703
steps / h: 34 (target met)
lr: 4.72e-6
gpu util: 0.0%
gpu mem: 0.0%
PPO loss · primary objective
KL divergence · budget kl_beta = 0.04
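The PPO loss and KL divergence panels plot two parts of one training objective. As a minimal sketch, assuming the standard PPO clipped surrogate with ppo_clip_epsilon = 0.2 and an additive KL penalty toward a frozen reference policy weighted by kl_beta = 0.04 (both values from the run config), a per-token version could look like the following. The k3 KL estimator, the token-mean reduction, and the function name `ppo_kl_loss` are illustrative assumptions, not Reliquary's confirmed implementation.

```python
import math
from typing import Sequence, Tuple


def ppo_kl_loss(
    logp_new: Sequence[float],    # log-probs of sampled tokens under the current policy
    logp_old: Sequence[float],    # log-probs under the policy that generated the rollouts
    logp_ref: Sequence[float],    # log-probs under a frozen reference model (assumed)
    advantages: Sequence[float],  # per-token advantages
    clip_eps: float = 0.2,        # ppo_clip_epsilon from the run config
    kl_beta: float = 0.04,        # kl_beta from the run config
) -> Tuple[float, float]:
    """Return (loss, mean_kl), both averaged over tokens (assumed reduction)."""
    n = len(advantages)
    surrogate_sum = 0.0
    kl_sum = 0.0
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # PPO takes the pessimistic (smaller) of the unclipped and clipped surrogates.
        surrogate_sum += min(ratio * adv, clipped * adv)
        # k3 estimator of KL(new || ref): exp(d) - d - 1 with d = ref - new, always >= 0.
        d = lp_ref - lp_new
        kl_sum += math.exp(d) - d - 1.0
    mean_kl = kl_sum / n
    # Maximize the surrogate (hence the minus sign) while penalizing drift from the reference.
    loss = -surrogate_sum / n + kl_beta * mean_kl
    return loss, mean_kl
```

When the policy has not moved (ratio 1) and matches the reference, the KL term is 0 and the loss reduces to the negative mean advantage; a ratio above 1 + clip_eps on a positive-advantage token is capped at 1.2.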
grad_norm · clipped at 1
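A grad_norm of 2.125 against a clip of 1 is consistent with the chart plotting the norm before clipping, though that is an assumption. The sketch below shows global-norm clipping over plain Python lists, mirroring how PyTorch's `torch.nn.utils.clip_grad_norm_` rescales all gradients jointly and returns the pre-clip norm; `clip_by_global_norm` is a hypothetical helper, and the 1e-6 epsilon follows PyTorch's convention.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(
    grads: List[List[float]],  # one flattened gradient list per parameter tensor
    max_norm: float = 1.0,     # grad_clip_norm from the run config
) -> Tuple[List[List[float]], float]:
    """Rescale all gradients so their joint L2 norm is at most max_norm.

    Returns (clipped_grads, pre_clip_norm).
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # Scale factor is capped at 1.0, so gradients already under the limit are untouched.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    clipped = [[g * scale for g in tensor] for tensor in grads]
    return clipped, total_norm
```

Clipping by the global norm, rather than per tensor, preserves the direction of the full update and only shrinks its length.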
learning rate · cosine schedule
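The run config sets learning_rate = 0.000005, lr_warmup_windows = 10, and lr_cosine_max_windows = 10000, which suggests a schedule indexed by training window rather than optimizer step. A minimal sketch, assuming linear warmup followed by half-cosine decay to a floor of zero at the final window (the floor, the warmup shape, and the helper name `lr_at_window` are assumptions):

```python
import math


def lr_at_window(
    window: int,
    peak_lr: float = 5e-6,      # learning_rate from the run config
    warmup_windows: int = 10,   # lr_warmup_windows from the run config
    max_windows: int = 10_000,  # lr_cosine_max_windows from the run config
    min_lr: float = 0.0,        # assumed floor; not present in the run config
) -> float:
    """Learning rate for a 0-indexed training window (hypothetical schedule)."""
    if window < warmup_windows:
        # Linear warmup: peak_lr / warmup_windows at window 0, peak_lr at the last warmup window.
        return peak_lr * (window + 1) / warmup_windows
    # Half-cosine decay from peak_lr at the end of warmup to min_lr at max_windows, then flat.
    span = max(1, max_windows - warmup_windows)
    progress = min(1.0, (window - warmup_windows) / span)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate reaches 5e-6 by window 10 and decays smoothly to 0 at window 10,000, staying there afterwards.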
rewards · mean ± std
degenerate-group ratio · zero-variance reward groups
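With m_rollouts_per_prompt = 8, rollouts are presumably scored in groups of eight per prompt. If advantages are computed relative to the group (reward minus group mean, an assumption here), a group whose eight rewards are identical yields zero advantage for every rollout and contributes no policy gradient, which is what this ratio tracks. A hypothetical helper that computes it from a flat, group-ordered reward list:

```python
from statistics import pvariance
from typing import Sequence


def degenerate_group_ratio(
    rewards: Sequence[float],  # flat list, rewards of each prompt's rollouts stored contiguously
    group_size: int = 8,       # m_rollouts_per_prompt from the run config
    tol: float = 0.0,          # variance at or below this counts as degenerate
) -> float:
    """Fraction of prompt groups whose rewards have (near-)zero variance."""
    if group_size <= 0 or len(rewards) == 0 or len(rewards) % group_size != 0:
        raise ValueError("rewards must split evenly into non-empty groups of group_size")
    groups = [list(rewards[i:i + group_size]) for i in range(0, len(rewards), group_size)]
    degenerate = sum(1 for g in groups if pvariance(g) <= tol)
    return degenerate / len(groups)
```

For example, three prompts where one group is all 1.0, one is all 0.0, and one alternates 1.0/0.0 give a ratio of 2/3: only the mixed group carries a learning signal.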
valid rollout ratio · GRAIL accepted / submitted
model improvement · checkpoint evals
held-out pass@1 · math + code
No results yet: checkpoint benchmarks will stream in once the eval pipeline publishes them to R2.
sm occupancy: 0.0%
gpu temp: 28°C
power: 114 W
run config (11 keys)
b_batch: 8
grad_clip_norm: 1
kl_beta: 0.04
learning_rate: 0.000005
lr_cosine_max_windows: 10000
lr_warmup_windows: 10
m_rollouts_per_prompt: 8
ppo_clip_epsilon: 0.2
reliquary_version: 0.1.0
wandb_training_version: v1
window_length: 5