Three complementary, self-contained stacks — a world-model-driven vision-language navigation model, semantics-rich language-driven indoor & outdoor navigation, and high-speed embodied marathon autonomy in the open.
Natural language drives navigation directly — no predefined path or map.
A keyframe history of past observations preserves critical decision points over long horizons.
ODE integration yields smooth, continuous trajectories beyond discrete action spaces.
The world model looks ahead implicitly in latent space for anticipatory, robust decisions.
One model jointly optimizes four complementary tasks with shared knowledge.
Three navigation stacks — a language-driven model, indoor wayfinding, and outdoor long-range autonomy — running on real robots.
The robot finds its own way from a plain-language instruction.
Ask for any place by name and the robot takes you there.
A humanoid covers a long outdoor course on its own — collision-free.