RUYi Dynamics · ACT-DEXO Policy

Multi-Modal End-to-End VLA Model

A unified Vision-Language Transformer fuses language, vision, tactile sensing, and proprioception for autoregressive embodied control.

Vision-Language-Action Architecture
Vision · Visual Observation · Egocentric RGB stream
Language · Manipulation Instruction · Natural-language task prompt
Tactile · Contact Sensing · Force & touch feedback
State · Proprioception · Joint & end-effector pose
Memory · Spatial Memory · Keyframe long-horizon context
Multimodal Backbone · Vision-Language Transformer · Unified autoregressive reasoning
Unified Token Sequence
Motion Prior · Perceptual Consistency
World Model · Future Scene Prediction · Flow matching in vision latent space
Action Head · Trajectory Generation · Chunked continuous decoding
Output · Embodied Action · Real-time closed-loop control

Multi-Modal Fusion

Vision, language, tactile, and proprioceptive inputs attend jointly in a single autoregressive sequence.
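
As a rough illustration of what a single shared sequence could look like, the sketch below concatenates per-modality embeddings into one tagged token list and builds a causal mask over it, so that each token may attend to every earlier token regardless of modality. The modality order, the names `build_sequence` and `causal_mask`, and the toy embeddings are assumptions for illustration only, not details of the ACT-DEXO implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical ordering; the source does not say how modalities are arranged.
MODALITY_ORDER = ("language", "vision", "tactile", "proprioception")

Token = Tuple[str, Tuple[float, ...]]


def build_sequence(streams: Dict[str, Sequence[Sequence[float]]]) -> List[Token]:
    """Concatenate per-modality embeddings into one tagged token sequence."""
    sequence: List[Token] = []
    for modality in MODALITY_ORDER:
        for embedding in streams.get(modality, ()):
            sequence.append((modality, tuple(embedding)))
    return sequence


def causal_mask(length: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# Toy two-dimensional embeddings, purely illustrative.
streams = {
    "language": [[0.1, 0.2]],
    "vision": [[0.3, 0.4], [0.5, 0.6]],
    "tactile": [[0.7, 0.8]],
    "proprioception": [[0.9, 1.0]],
}
sequence = build_sequence(streams)
mask = causal_mask(len(sequence))
```

In this toy layout the proprioception token sits last, so its mask row allows attention to the language, vision, and tactile tokens before it.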

Spatial Memory

Keyframe selection preserves critical decision points across long-horizon episodes.
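
The page does not state how keyframes are chosen. One simple possibility, shown here only as a sketch, is to keep a frame whenever its feature vector has moved far enough from the last kept keyframe, and to cap the memory at a fixed size. The function name `select_keyframes`, the Euclidean distance criterion, and both parameters are hypothetical.

```python
import math
from typing import List, Sequence


def select_keyframes(
    features: Sequence[Sequence[float]],
    threshold: float,
    max_keyframes: int,
) -> List[int]:
    """Return indices of frames kept as keyframes (assumed criterion).

    A frame is kept when its distance to the most recent keyframe exceeds
    `threshold`; only the newest `max_keyframes` indices are retained.
    """
    if not features or max_keyframes <= 0:
        return []
    kept = [0]  # always keep the first frame as an anchor
    for i in range(1, len(features)):
        if math.dist(features[i], features[kept[-1]]) > threshold:
            kept.append(i)
    return kept[-max_keyframes:]


# Toy per-frame features; frames 2 and 4 differ sharply from their predecessors.
frames = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (2.0, 0.0)]
keyframes = select_keyframes(frames, threshold=0.5, max_keyframes=8)
```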

Latent World Model

Forecasts the future in vision latent space with a built-in dynamics prior.
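
The architecture labels mention flow matching in vision latent space. As a generic sketch of that technique (not the model's actual predictor), the code below shows the straight-line interpolation and velocity target commonly used to train a flow-matching model, plus Euler integration of a velocity field from a starting latent to a predicted one. The closed-form `toy_velocity` stands in for a learned network and is an assumption; all names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Regression target for the velocity along the straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(goal: Vector) -> Callable[[Vector, float], Vector]:
    """Stand-in for a learned field: exact velocity toward a single goal latent."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(g - xi) / (1.0 - t) for g, xi in zip(goal, x)]
    return velocity


# Move a toy "current" latent toward a toy "future" latent in four steps.
future_latent = euler_sample(toy_velocity([2.0, 4.0]), [0.0, 0.0], steps=4)
```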

Chunked Action

Decodes multi-step continuous trajectories for temporally coherent execution.
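
A minimal sketch of chunked execution, under assumed details: the policy returns several continuous actions per call, the controller executes only the first few, then asks for a new chunk from the updated state. The one-dimensional state, `toy_policy`, and the parameter names are hypothetical and chosen only to keep the example small.

```python
from typing import Callable, List, Tuple


def rollout_chunked(
    predict_chunk: Callable[[float], List[float]],
    state: float,
    total_steps: int,
    execute_per_chunk: int,
) -> Tuple[float, List[float], int]:
    """Execute `total_steps` actions, replanning after each partial chunk."""
    if execute_per_chunk <= 0:
        raise ValueError("execute_per_chunk must be positive")
    executed: List[float] = []
    calls = 0
    while len(executed) < total_steps:
        chunk = predict_chunk(state)  # one policy call yields many actions
        if not chunk:
            raise ValueError("policy returned an empty chunk")
        calls += 1
        for action in chunk[:execute_per_chunk]:
            if len(executed) == total_steps:
                break
            state += action  # toy dynamics: the action is a position delta
            executed.append(action)
    return state, executed, calls


def toy_policy(state: float, goal: float = 1.0, horizon: int = 8) -> List[float]:
    """Hypothetical policy: equal steps that would reach the goal in `horizon` moves."""
    step = (goal - state) / horizon
    return [step] * horizon


final_state, actions, policy_calls = rollout_chunked(
    toy_policy, state=0.0, total_steps=10, execute_per_chunk=4
)
```

Here ten actions are executed with three policy calls, since each call contributes at most four executed actions before the next replan.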

Edge Real-Time Inference

Runs on-device with asynchronous inference for low-latency, real-world closed-loop control.
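
One common way to decouple inference from the control loop is a producer/consumer queue: a background thread computes action chunks while the control loop consumes them. The sketch below illustrates that pattern with Python's standard `threading` and `queue` modules. It is an assumption about the general approach, not a description of the actual runtime, and it feeds a fixed list of observations where a real system would stream fresh sensor readings.

```python
import queue
import threading
from typing import Callable, List, Optional, Sequence


def inference_worker(
    observations: Sequence[float],
    predict_chunk: Callable[[float], List[float]],
    out: "queue.Queue[Optional[List[float]]]",
) -> None:
    """Producer: run the (hypothetical) policy and enqueue action chunks."""
    for obs in observations:
        out.put(predict_chunk(obs))  # blocks when the buffer is full
    out.put(None)  # sentinel: no more chunks


def control_loop(out: "queue.Queue[Optional[List[float]]]") -> List[float]:
    """Consumer: execute actions as chunks become available."""
    executed: List[float] = []
    while True:
        chunk = out.get()
        if chunk is None:
            break
        executed.extend(chunk)  # stand-in for sending commands to actuators
    return executed


def run_async(
    observations: Sequence[float],
    predict_chunk: Callable[[float], List[float]],
    buffer_size: int = 2,
) -> List[float]:
    """Run inference in a background thread while the control loop consumes."""
    out: "queue.Queue[Optional[List[float]]]" = queue.Queue(maxsize=buffer_size)
    worker = threading.Thread(
        target=inference_worker, args=(observations, predict_chunk, out), daemon=True
    )
    worker.start()
    executed = control_loop(out)
    worker.join()
    return executed


actions = run_async([0.0, 10.0], lambda obs: [obs, obs + 1.0])
```

The bounded queue keeps the producer from running far ahead of execution, which limits how stale a queued chunk can become.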

Built for Vertical Markets

One VLA model, adapted and deployed across real industrial and commercial scenarios.

Drive-Controller Sorting

Picking and sorting industrial autonomous-driving domain controllers on the production line.

Flexible-PCB Handling

Grasping and arranging delicate flexible printed circuit boards.