VLA-Zoo — reports & docs

π0.5 pre-training — landscape, throughput & π-equivalent sizing

Sizing π0-equivalent pre-training (matching timesteps seen) on our 8×H200 nodes: a reference-driven VLA landscape table, throughput and memory versus batch size, and 1-node versus multi-node time and cost.

π0 target 0.7–1.4B ts · 1 node 31–62 d · 4 nodes 8–15 d · node tput 269 fr/s

pi0.5 · pretraining · throughput · scaling · multi-node · landscape

updated 2026-05-24 · tohkawa25 · ready
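The sizing figures on this card can be roughly reproduced with a back-of-envelope calculation. The sketch below is not taken from the report: it assumes one frame counts as one timestep, that the 269 fr/s figure is per 8×H200 node, and that throughput scales linearly with node count (the `scaling_efficiency` parameter and the function name are hypothetical). Under those assumptions it gives about 30 to 60 days on one node, slightly under the card's 31–62 d, so the report presumably accounts for overhead or uses less rounded inputs.

```python
SECONDS_PER_DAY = 86_400

def days_to_target(target_timesteps, node_frames_per_sec,
                   num_nodes=1, scaling_efficiency=1.0):
    """Wall-clock days needed to see `target_timesteps` frames.

    Assumes (not confirmed by the report) that one frame equals one
    timestep and that cluster throughput is
    node throughput * node count * scaling efficiency.
    """
    cluster_rate = node_frames_per_sec * num_nodes * scaling_efficiency
    return target_timesteps / cluster_rate / SECONDS_PER_DAY

# Card values: target of 0.7-1.4B timesteps, 269 fr/s per node.
for nodes in (1, 4):
    low = days_to_target(0.7e9, 269, nodes)
    high = days_to_target(1.4e9, 269, nodes)
    print(f"{nodes} node(s): {low:.1f} to {high:.1f} days")
```

Setting `scaling_efficiency` below 1.0 is one way to model multi-node communication overhead; the report's own multi-node numbers should take precedence over this estimate.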

π0.5 wrist-only — full-FT baselines

report

Wrist-only π0.5 difficulty is mostly a masking artifact: physically removing the third-person camera from attention nearly recovers the both-camera ceiling.

attention drop 94.2% · zero-mask 27.4% · both-cam 96.6% · LoRA floor 5.0%

pi0.5 · libero · ablation · wrist-only · distillation

updated 2026-05-23 · tohkawa25 · ready
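The gap between "zero-mask" and "attention drop" on this card is easier to see in a toy example. The sketch below is illustrative only and is not the π0.5 code: it uses a single query with hand-picked 2-d keys, and it simplifies zero-masking to all-zero key vectors (in a real model a blacked-out image would still pass through the vision encoder). It shows that zeroed tokens stay in the softmax and keep absorbing attention mass, while dropped tokens leave the key set entirely so the remaining weights renormalize over the wrist tokens.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over `keys`."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Hypothetical toy tokens: two wrist-camera keys, two third-person keys.
wrist_keys = [[1.0, 0.0], [0.5, 0.5]]
third_keys = [[0.0, 1.0], [0.2, 0.8]]
query = [1.0, 0.0]

# Zero-mask: third-person tokens are zeroed but remain in the key set.
zero_masked_keys = wrist_keys + [[0.0, 0.0] for _ in third_keys]
w_zero = attention_weights(query, zero_masked_keys)
mass_on_masked = sum(w_zero[len(wrist_keys):])

# Attention drop: third-person tokens are removed from the key set.
w_drop = attention_weights(query, wrist_keys)

print(f"zero-mask: mass on masked tokens = {mass_on_masked:.3f}")
print(f"drop: weights over wrist tokens = {[round(w, 3) for w in w_drop]}")
```

Whether this mechanism fully explains the reported accuracy gap is a question for the report itself; the sketch only illustrates why the two ablations are not equivalent.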

π0.5 — architecture & tensor IO

model card

Dual-expert flow-matching VLA, pinned down: per-module tensor IO for pi05_libero, the 3-camera input contract, and every ablation knob mapped to the exact module it touches.

pi0.5 · architecture · libero · tensor-io · ablation

updated 2026-05-23 · tohkawa25 · ready