π0.5 — architecture & tensor IO
model card
A dual-expert flow-matching VLA, pinned down: every module's input/output tensor shape for pi05_libero, the camera-input contract, and each GitHub-issue ablation knob mapped to the exact module it touches.
- Dual-expert Gemma (PaliGemma gemma_2b, 2048-w + action expert gemma_300m, 1024-w), 18 layers, joint attention
- SigLIP So400m/14 image tower → 256 tokens/camera; 3 fixed image slots (base + left/right wrist)
- Per-module tensor IO table + flow-matching train/sample loop
- Ablation knobs: action_horizon, action_dim padding, LoRA vs full-FT, head pretraining, view masking, fps/codec, multi- vs per-task
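The 256 tokens/camera figure follows from the patch grid of a /14 image tower. A minimal sketch of that arithmetic, assuming a 224×224 input resolution (the resolution is not stated above, and the function and constant names here are illustrative, not openpi identifiers):

```python
def image_tokens_per_camera(image_size: int = 224, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT-style tower emits for one square image.

    image_size=224 is an assumption; patch_size=14 comes from "So400m/14".
    """
    side = image_size // patch_size  # patches along one edge: 224 // 14 = 16
    return side * side               # 16 * 16 = 256 tokens


NUM_IMAGE_SLOTS = 3  # base + left wrist + right wrist, per the card above

tokens = image_tokens_per_camera()
total_image_tokens = NUM_IMAGE_SLOTS * tokens

print(tokens, total_image_tokens)
```

Under these assumptions the three fixed slots contribute 768 image tokens to the prefix, whether or not a given camera is actually present.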
π0.5 wrist-only — full-FT baselines
report · comparison
The apparent difficulty of wrist-only π0.5 was largely an artifact of the masking mechanism, not a capacity ceiling: physically removing the third-person camera from attention nearly recovers the both-camera ceiling.
physical removal 94.2%
zero-mask 27.4%
both-cam ceiling 96.6%
LoRA floor 5.0%
- 5 full-FT baselines (2 evaluated, 3 in-flight self-distillation ablations)
- How each is trained & wired into the openpi pipeline, with the line of code per mechanism
- GitHub issues #5 (wrist-only floor) · #10 (curriculum + distillation)
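The gap between zero-mask and physical removal can be illustrated with a toy attention computation. This is a NumPy sketch, not the openpi code path. It assumes "zero-mask" means the third-person camera's content is zeroed while its tokens stay in the attention pattern, and "physical removal" means those tokens are excluded as keys. Zeroing the keys directly is a simplification (a zeroed image passed through a real image tower would not yield zero embeddings), and all names are hypothetical.

```python
import numpy as np


def masked_attention_weights(q, k, key_mask):
    """Softmax attention weights where keys with key_mask=False get zero weight."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(key_mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
n_base, n_wrist, n_query, dim = 4, 4, 2, 8  # toy sizes, not the real token counts
base = rng.normal(size=(n_base, dim))    # third-person camera tokens
wrist = rng.normal(size=(n_wrist, dim))  # wrist camera tokens
q = rng.normal(size=(n_query, dim))      # e.g. action-expert queries

# Zero-mask (assumed reading): base content is zeroed but still attended to.
k_zero = np.concatenate([np.zeros_like(base), wrist])
all_visible = np.ones(n_base + n_wrist, dtype=bool)
w_zero = masked_attention_weights(q, k_zero, all_visible)

# Physical removal: base tokens are dropped from attention entirely.
k_full = np.concatenate([base, wrist])
wrist_only = np.concatenate([np.zeros(n_base, dtype=bool), np.ones(n_wrist, dtype=bool)])
w_removed = masked_attention_weights(q, k_full, wrist_only)

base_mass_zero = w_zero[:, :n_base].sum(axis=-1)        # strictly positive
base_mass_removed = w_removed[:, :n_base].sum(axis=-1)  # exactly zero
print(base_mass_zero, base_mass_removed)
```

In this toy setting the zeroed tokens still absorb part of every query's attention mass, while removed tokens absorb none, which is one way an uninformative image slot could dilute the wrist view.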