RUYi Dynamics · Navigation

Embodied Autonomous Navigation Stack

Three complementary, self-contained stacks — a world-model-driven vision-language navigation model, semantics-rich language-driven indoor & outdoor navigation, and high-speed embodied marathon autonomy in the open.

Unified World-Action Architecture

Vision

RGB-D Stream

Egocentric observation

Language

NL Instruction

Free-form command

State

Proprioception

Pose + odometry

Memory

Spatial History Memory

Keyframe observation history

Multi-Modal DiT Backbone

Unified World Model

Visual + action streams · cross-modal attention fusion · Flow Matching

4-Task Router

Policy

Action decision

4-Task Router

Fwd / Inv Dynamics

State prediction

4-Task Router

Visual Generation

Future foresight

Output

Action Prediction

Body-frame control

Deploy

Edge Closed-Loop

Sensor-to-actuator, real-time

Language-Driven

Natural language drives navigation directly — no predefined path or map.

Spatial History Memory

A keyframe history of past observations preserves critical decision points over long horizons.
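
As a rough illustration of a keyframe history, the Python sketch below admits an observation only when the robot has moved or turned enough since the last keyframe. The source does not say how keyframes are selected or stored, so the `KeyframeMemory` class, its thresholds, and the oldest-first eviction are all assumptions made for this example.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Keyframe:
    x: float        # position in metres (hypothetical odometry frame)
    y: float
    yaw: float      # heading in radians
    frame_id: int   # index of the stored observation


class KeyframeMemory:
    """Hypothetical bounded keyframe buffer; not the vendor's implementation."""

    def __init__(self, capacity: int = 8, min_dist: float = 0.5,
                 min_turn: float = math.radians(30)) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.frames: Deque[Keyframe] = deque(maxlen=capacity)
        self.min_dist = min_dist
        self.min_turn = min_turn

    def maybe_add(self, x: float, y: float, yaw: float, frame_id: int) -> bool:
        """Store the frame if it differs enough from the last keyframe."""
        if self.frames:
            last = self.frames[-1]
            moved = math.hypot(x - last.x, y - last.y)
            dyaw = yaw - last.yaw
            # Wrap the heading difference into [-pi, pi] before comparing.
            turned = abs(math.atan2(math.sin(dyaw), math.cos(dyaw)))
            if moved < self.min_dist and turned < self.min_turn:
                return False
        self.frames.append(Keyframe(x, y, yaw, frame_id))
        return True


memory = KeyframeMemory(capacity=3)
poses = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.6, 0.0, 0.0),
         (0.6, 0.0, math.pi / 2), (1.2, 0.0, math.pi / 2)]
kept = [memory.maybe_add(x, y, yaw, i) for i, (x, y, yaw) in enumerate(poses)]
kept_ids = [kf.frame_id for kf in memory.frames]
```

Here the second pose is skipped because it is too close to the first, while the in-place 90 degree turn is kept because turns tend to mark decision points. A real system would likely use a smarter retention rule than dropping the oldest keyframe.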

Continuous Trajectory

ODE integration yields smooth, continuous trajectories beyond discrete action spaces.
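
To make the idea concrete, the Python sketch below integrates a velocity field with fixed-step Euler updates, which is one common way to sample from a flow-matching model. The source names Flow Matching and ODE integration but gives no details, so the solver choice, the step count, and the `toy_velocity` function (a hand-written stand-in for a learned network) are assumptions made for this example.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def toy_velocity(x: Sequence[float], t: float, target: Sequence[float]) -> Vec:
    """Stand-in for a learned velocity field: straight-line flow to `target`.

    For a linear path that reaches `target` at t = 1, the velocity at
    (x, t) is (target - x) / (1 - t). A trained model would replace this.
    """
    remaining = max(1.0 - t, 1e-6)  # guard against division by zero at t = 1
    return [(g - xi) / remaining for xi, g in zip(x, target)]


def integrate_euler(x0: Sequence[float],
                    velocity: Callable[[Sequence[float], float], Vec],
                    steps: int = 10) -> List[Vec]:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    dt = 1.0 / steps
    x = list(x0)
    path = [list(x)]
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        path.append(list(x))
    return path


target = [1.0, 0.5]  # hypothetical body-frame waypoint (metres)
path = integrate_euler([0.0, 0.0],
                       lambda x, t: toy_velocity(x, t, target),
                       steps=10)
```

Each entry of `path` is a real-valued waypoint, so the output is not restricted to a fixed menu of discrete actions. Increasing `steps` gives a finer trajectory at the cost of more model evaluations.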

Latent Foresight

The world model looks ahead implicitly in latent space for anticipatory, robust decisions.

4-Task Unified

One model jointly optimizes four complementary tasks with shared knowledge.

Proven in the Real World

Three navigation stacks — a mapless VLN model, mapped semantic navigation, and outdoor long-range autonomy — running on real robots.

Mapless VLN Model

RY-SOLO

Fine-Grained Instruction

Go straight, turn right, turn left at the yellow wall painting, and stop beside the red sofa.

2 segments

Mapped · SLAM + Topology

RY-LUMO

Language-Interactive Tour

A hierarchical scene graph lets you name any place — the robot localizes it and leads the way.
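
As an illustration of how a named place might resolve to a navigation goal, the Python sketch below searches a small tree of places depth-first. The source does not describe RY-LUMO's graph schema or matching method, so the `SceneNode` type, the `find_place` helper, the node names, and the coordinates are all hypothetical. Matching here is a plain case-insensitive name comparison.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SceneNode:
    name: str
    position: Tuple[float, float]  # hypothetical map-frame (x, y) in metres
    children: List["SceneNode"] = field(default_factory=list)


def find_place(node: SceneNode, query: str,
               path: Tuple[str, ...] = ()
               ) -> Optional[Tuple[List[str], Tuple[float, float]]]:
    """Depth-first search; returns (path from root, goal position) or None."""
    here = path + (node.name,)
    if node.name.lower() == query.lower():
        return list(here), node.position
    for child in node.children:
        hit = find_place(child, query, here)
        if hit is not None:
            return hit
    return None


# Hypothetical hierarchy: building -> floor -> room -> object.
building = SceneNode("Building", (0.0, 0.0), [
    SceneNode("Floor 1", (0.0, 0.0), [
        SceneNode("Lounge", (4.0, 2.0), [
            SceneNode("Red Sofa", (4.0, 2.5)),
        ]),
        SceneNode("Kitchen", (8.0, 1.0)),
    ]),
])

result = find_place(building, "red sofa")
missing = find_place(building, "garage")
```

The returned path gives the chain of containing places, and the position could be handed to a planner as the goal. A language-driven system would presumably match free-form descriptions rather than exact names.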

Outdoor · RTK Fusion

RY-VASTO

Marathon Autonomous Navigation

A humanoid runs a 21 km urban course (half-marathon distance) on its own, from start to finish, with zero collisions.