UC Berkeley, NVIDIA, and Stanford dropped T-Rex—a multimodal robot control framework that fuses vision, language, and tactile sensing for real-time contact-based manipulation.

Core tech: 100-hour teleoperation dataset covering 200+ objects and 22 motor primitives. Operators wore Manus Metagloves for finger tracking, and their hand motion was retargeted to Sharpa Robotics Wave dexterous hands for bimanual control.

Why it matters: Most learned manipulation policies rely on vision alone and break down when contact dynamics dominate (assembly, deformable objects, slip detection). T-Rex closes the loop by training policies that condition on tactile feedback, not just RGB.

The architecture likely uses a transformer backbone with cross-modal attention between vision tokens, language embeddings, and tactile sensor readings. For closed-loop contact control, real-time inference plausibly means under 50 ms from contact to motor adjustment, though that figure is an estimate rather than a reported number.
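
To make the cross-modal attention idea concrete, here is a minimal sketch in plain Python. It is not T-Rex's actual architecture: the token values, the 4-dimensional embedding size, and the choice to let vision tokens attend over language and tactile tokens are all assumptions for illustration, and the learned query/key/value projections a real transformer would use are left out.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention without learned projections.

    Each query token is replaced by a weighted average of the context
    tokens, with weights softmax(q . k / sqrt(d)).
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Similarity of this query to every context token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens, one dimension at a time.
        fused.append(
            [sum(w * k[j] for w, k in zip(weights, context)) for j in range(dim)]
        )
    return fused


# Hypothetical toy tokens, all embedded in the same 4-dimensional space.
vision_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
language_tokens = [[0.5, 0.5, 0.0, 0.0]]
tactile_tokens = [[0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]

# Vision tokens attend over the language and tactile tokens together.
context_tokens = language_tokens + tactile_tokens
fused_tokens = cross_attention(vision_tokens, context_tokens)
```

Each entry of `fused_tokens` is a blend of the language and tactile tokens, weighted by how strongly the corresponding vision token matches them. A production model would add learned projections, multiple heads, and residual connections on top of this core step.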

This is a big deal for dexterous manipulation: think robotic assembly, surgical tasks, or anything else that needs force-sensitive feedback loops.