RUYi Dynamics · ACT-DEXO Policy

Multi-Modal End-to-End VLA Model

A unified Vision-Language Transformer fuses language, vision, tactile sensing, and proprioception into autoregressive embodied control.

01 · RY-ACT-DEXO · Multi-Modal VLA Policy

Vision-Language-Action Architecture

Vision

Visual Observation

Egocentric RGB stream

Language

Manipulation Instruction

Natural-language task prompt

Tactile

Contact Sensing

Force & touch feedback

State

Proprioception

Joint & end-effector pose

Memory

Spatial Memory

Keyframe long-horizon context

Multimodal Backbone

Vision-Language Transformer

Unified autoregressive reasoning

Unified Token Sequence

Motion Prior · Perceptual Consistency

World Model

Future Scene Prediction

Flow matching in vision latent space

Action Head

Trajectory Generation

Chunked continuous decoding

Output

Embodied Action

Real-time closed-loop control

Multi-Modal Fusion

Vision, language, tactile, and proprioceptive tokens attend jointly in a single autoregressive sequence.
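
As a rough illustration of a single shared sequence, the sketch below flattens per-modality embeddings into one tagged token list and builds a causal mask over it. The modality order, the `Token` layout, and both function names are assumptions made for this example; the page does not describe how RY-ACT-DEXO tokenizes or orders its inputs.

```python
from typing import Dict, List, Tuple

# (modality tag, position within that modality, embedding vector)
Token = Tuple[str, int, List[float]]

# Hypothetical ordering; the source does not specify one.
MODALITY_ORDER = ["language", "vision", "tactile", "proprioception"]


def build_token_sequence(embeddings: Dict[str, List[List[float]]]) -> List[Token]:
    """Flatten per-modality embeddings into one tagged sequence."""
    sequence: List[Token] = []
    for modality in MODALITY_ORDER:
        for position, vector in enumerate(embeddings.get(modality, [])):
            sequence.append((modality, position, vector))
    return sequence


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


if __name__ == "__main__":
    seq = build_token_sequence({
        "vision": [[0.1, 0.2], [0.3, 0.4]],
        "language": [[1.0, 0.0]],
        "tactile": [[0.5, 0.5]],
    })
    for modality, position, _ in seq:
        print(modality, position)
    print(causal_mask(len(seq)))
```

Because every modality lives in the same sequence, one attention mask governs all cross-modal interactions; a real model would replace the plain lists with learned embeddings and positional encodings.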

Spatial Memory

Keyframe selection preserves critical decision points across long-horizon episodes.
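
One simple way to picture keyframe selection is a novelty rule: keep a frame only when its embedding has moved far enough from the last stored keyframe, and cap the memory size. This is a minimal sketch under that assumption. The distance threshold, the capacity, and the `KeyframeMemory` class are hypothetical and are not taken from the page, which does not say how decision points are detected.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KeyframeMemory:
    """Bounded store of frame embeddings that differ enough from the last keyframe."""

    def __init__(self, threshold: float, capacity: int) -> None:
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest keyframe when full.
        self.frames: Deque[List[float]] = deque(maxlen=capacity)

    def observe(self, embedding: Sequence[float]) -> bool:
        """Store the frame if it is novel; return True when it was kept."""
        if not self.frames or l2_distance(embedding, self.frames[-1]) > self.threshold:
            self.frames.append(list(embedding))
            return True
        return False


if __name__ == "__main__":
    memory = KeyframeMemory(threshold=0.5, capacity=2)
    for frame in ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [2.0, 0.0]):
        print(frame, "kept" if memory.observe(frame) else "skipped")
    print(list(memory.frames))
```

A learned selector could replace the distance rule without changing the interface: the policy only ever sees the short list of retained keyframes.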

Latent World Model

Forecasts the future in vision latent space with a built-in dynamics prior.
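
The diagram names flow matching in vision latent space as the forecasting mechanism. The sketch below shows the widely used linear-path form of flow matching on plain vectors: a training pair is an interpolated point plus the constant velocity that carries the start to the target, and sampling integrates a velocity field with Euler steps. Whether RY-ACT-DEXO uses this exact path, and how it conditions on actions or language, is not stated on the page; the function names are illustrative.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def flow_matching_pair(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Return (x_t, target velocity) for the linear path x_t = (1 - t) * x0 + t * x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    current_latent = [0.0, 0.0]   # stand-in for the current (or noise) latent
    future_latent = [2.0, 4.0]    # stand-in for the future-frame latent
    x_t, u = flow_matching_pair(current_latent, future_latent, t=0.5)
    print(x_t, u)
    # A trained network would be regressed onto `u`; here the true
    # constant velocity stands in for it.
    print(euler_sample(lambda x, t: u, current_latent, steps=4))
```

In training, a network would be fit to `target_velocity` at randomly drawn `t`; at inference, the same Euler loop would roll a latent forward to a predicted future scene.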

Chunked Action

Decodes multi-step continuous trajectories for temporally coherent execution.
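
Chunked decoding is often paired with receding-horizon execution: the policy predicts several actions at once, the controller executes a prefix of the chunk, then the policy replans from the new observation. The loop below is a minimal sketch of that pattern on a toy 1-D state. `run_chunked`, the `horizon` and `execute` parameters, and the toy policy are assumptions for illustration, not details from the page.

```python
from typing import Callable, List, Tuple

Policy = Callable[[float, int], List[float]]   # (observation, horizon) -> action chunk
StepFn = Callable[[float, float], float]       # (observation, action) -> next observation


def run_chunked(policy: Policy, step_env: StepFn, obs: float,
                horizon: int, execute: int, total_steps: int) -> Tuple[List[float], float]:
    """Predict `horizon` actions per call, execute the first `execute`, then replan."""
    if not 1 <= execute <= horizon:
        raise ValueError("execute must be between 1 and horizon")
    executed: List[float] = []
    while len(executed) < total_steps:
        chunk = policy(obs, horizon)
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        for action in chunk[:execute]:
            if len(executed) == total_steps:
                break
            obs = step_env(obs, action)
            executed.append(action)
    return executed, obs


if __name__ == "__main__":
    goal = 3.0

    def toy_policy(obs: float, horizon: int) -> List[float]:
        # Move toward the goal in equal continuous steps over the horizon.
        return [(goal - obs) / horizon] * horizon

    actions, final_obs = run_chunked(toy_policy, lambda o, a: o + a,
                                     obs=0.0, horizon=4, execute=2, total_steps=6)
    print(actions, final_obs)
```

Executing only part of each chunk trades some compute for responsiveness: a smaller `execute` replans more often, a larger one leans on the chunk's temporal coherence.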

Edge Real-Time Inference

Runs on-device with asynchronous inference for low-latency, real-world closed-loop control.
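
Asynchronous inference can be pictured as overlapping two activities: while the controller executes the current action chunk, the next chunk is already being computed in the background. The sketch below shows that overlap with a single worker thread from Python's standard library. `run_async_control` and its callback parameters are hypothetical names; the page does not describe the actual on-device runtime or scheduling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Observation = float
Action = float


def run_async_control(policy: Callable[[Observation], List[Action]],
                      read_obs: Callable[[], Observation],
                      apply_action: Callable[[Action], None],
                      n_chunks: int) -> None:
    """Execute chunk i while chunk i + 1 is being inferred on a worker thread."""
    if n_chunks <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(policy, read_obs())
        for i in range(n_chunks):
            # Blocks only when inference is slower than executing a chunk.
            chunk = pending.result()
            if i + 1 < n_chunks:
                # Start the next inference before executing the current chunk.
                pending = pool.submit(policy, read_obs())
            for action in chunk:
                apply_action(action)


if __name__ == "__main__":
    state = {"x": 0.0}
    log: List[Action] = []

    def apply(action: Action) -> None:
        state["x"] += action
        log.append(action)

    run_async_control(policy=lambda obs: [1.0, 1.0],
                      read_obs=lambda: state["x"],
                      apply_action=apply,
                      n_chunks=3)
    print(log, state["x"])
```

The cost of this overlap is that each chunk is planned from an observation taken before the previous chunk finished executing, so a real system has to tolerate or compensate for that staleness.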

Built for Vertical Markets

One VLA model, adapted and deployed across real industrial and commercial scenarios.

Drive-Controller Sorting

Picking and sorting industrial autonomous-driving domain controllers on the production line.

Flexible-PCB Handling

Grasping and arranging delicate flexible printed circuit boards.