RUYi Dynamics · Navigation

Embodied Autonomous Navigation Stack

Three complementary, self-contained stacks: a world-model-driven vision-language navigation model, semantics-rich language-driven navigation indoors and outdoors, and high-speed embodied marathon autonomy in open outdoor environments.

Unified World-Action Architecture
Vision: RGB-D Stream (egocentric observation)
Language: NL Instruction (free-form command)
State: Proprioception (pose + odometry)
Memory: Spatial History Memory (keyframe observation history)
Multi-Modal DiT Backbone: Unified World Model (visual + action streams · cross-modal attention fusion · Flow Matching)
4-Task Router: Policy (action decision)
4-Task Router: Fwd / Inv Dynamics (state prediction)
4-Task Router: Visual Generation (future foresight)
Output: Action Prediction (body-frame control)
Deploy: Edge Closed-Loop (sensor-to-actuator, real-time)
Language-Driven

Natural language drives navigation directly — no predefined path or map.

Spatial History Memory

A keyframe history of past observations preserves critical decision points over long horizons.
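
One way to picture a keyframe history is a bounded buffer that stores an observation only when it differs enough from the most recent keyframe. The sketch below is illustrative only: `KeyframeMemory`, the cosine-distance novelty test, `capacity`, and `min_distance` are assumptions made for the example, not details of the RUYi stack.

```python
import math
from collections import deque


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


class KeyframeMemory:
    """Hypothetical bounded keyframe store (names and thresholds are assumptions)."""

    def __init__(self, capacity=32, min_distance=0.1):
        self.frames = deque(maxlen=capacity)  # oldest keyframe is evicted first
        self.min_distance = min_distance

    def observe(self, step, feature):
        """Store the observation if it is novel enough; return True if stored."""
        if self.frames:
            _, last = self.frames[-1]
            if cosine_distance(last, feature) < self.min_distance:
                return False  # too similar to the latest keyframe, skip it
        self.frames.append((step, list(feature)))
        return True

    def history(self):
        """Return stored keyframes as (step, feature) pairs, oldest first."""
        return list(self.frames)
```

Near-duplicate frames along a straight corridor are skipped, while a turn or a doorway produces a new keyframe, so a fixed-size buffer can span a long route.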

Continuous Trajectory

ODE integration yields smooth, continuous trajectories rather than selections from a discrete action space.
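
In flow-matching models, a sample is typically produced by integrating dx/dt = v(x, t) from t = 0 to t = 1. The sketch below shows that integration with a plain Euler solver. `toy_velocity` is a hand-written stand-in for a learned velocity network, and `integrate_flow`, `target`, and `steps` are hypothetical names chosen for illustration; nothing here is taken from the actual model.

```python
def toy_velocity(x, t, target):
    """Stand-in for a learned velocity field (assumption): straight-line flow to target."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def integrate_flow(x0, target, steps=10):
    """Euler-integrate dx/dt = v(x, t) from t=0 to t=1; return every intermediate state."""
    dt = 1.0 / steps
    x = list(x0)
    path = [list(x)]
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        path.append(list(x))
    return path
```

Every state along `path` is a real-valued vector, so the result is not restricted to a fixed menu of moves such as "forward", "left", or "right".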

Latent Foresight

The world model looks ahead implicitly in latent space for anticipatory, robust decisions.

4-Task Unified

One model jointly optimizes four complementary tasks (policy, forward dynamics, inverse dynamics, and visual generation), sharing knowledge across them.
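
A common way to train one network on several tasks is to vary which data streams are given as conditioning and which are generated. The sketch below assumes the four tasks are policy, forward dynamics, inverse dynamics, and visual generation, reading "Fwd / Inv Dynamics" in the architecture as two tasks. The stream names, the `ROUTES` table, and the `route` function are hypothetical and only illustrate the idea.

```python
from enum import Enum


class Task(Enum):
    POLICY = "policy"
    FORWARD_DYNAMICS = "forward_dynamics"
    INVERSE_DYNAMICS = "inverse_dynamics"
    VISUAL_GENERATION = "visual_generation"


# Hypothetical routing table: (streams given as conditioning, streams to generate).
ROUTES = {
    Task.POLICY: ({"observation", "instruction"}, {"action"}),
    Task.FORWARD_DYNAMICS: ({"observation", "action"}, {"future_observation"}),
    Task.INVERSE_DYNAMICS: ({"observation", "future_observation"}, {"action"}),
    Task.VISUAL_GENERATION: ({"observation", "instruction"}, {"future_observation"}),
}


def route(task, batch):
    """Split a batch dict into (conditioning, targets) for the given task."""
    given, generated = ROUTES[task]
    missing = (given | generated) - set(batch)
    if missing:
        raise KeyError(f"batch is missing streams: {sorted(missing)}")
    cond = {k: batch[k] for k in given}
    targets = {k: batch[k] for k in generated}
    return cond, targets
```

Because every task draws on the same streams and the same backbone, a training loop under this assumption could sample a task per batch and reuse one set of weights for all four.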

Proven in the Real World

Three navigation stacks — a language-driven model, indoor wayfinding, and outdoor long-range autonomy — running on real robots.

Language-Guided Navigation

The robot finds its own way from a plain-language instruction.

Indoor Wayfinding

Ask for any place by name and the robot takes you there.

Outdoor Long-Range

A humanoid covers a long outdoor course on its own — collision-free.