Patent Pending
Bridging the gap between a language model’s next-word prediction and physical robot control, researchers at UC Berkeley have developed LLARVA (Large Language model for Robotic Vision and Action). The model uses a novel vision-action instruction tuning method that enables a robot to handle varied tasks and environments without task-specific fine-tuning.
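To make the instruction-tuning idea concrete, the sketch below shows one way a structured vision-action instruction could be assembled: the text side of the prompt encodes the robot type, control mode, current proprioceptive state, and the natural-language task, while the camera image is supplied to the model separately. This is a minimal illustration under stated assumptions; the names (`RobotState`, `build_prompt`) and the exact prompt wording are hypothetical, not the authors' published interface.

```python
# Hypothetical sketch of assembling a structured vision-action instruction
# for a model in the style of LLARVA. Names and prompt format are assumptions.

from dataclasses import dataclass


@dataclass
class RobotState:
    robot_type: str        # e.g. "Franka" or "UR5"
    control_mode: str      # e.g. "end-effector" or "joint" control
    proprio: list[float]   # current joint / gripper readings
    task: str              # natural-language instruction


def build_prompt(state: RobotState, n_future_steps: int = 1) -> str:
    """Compose the text side of a vision-action instruction.

    The camera image is passed to the model separately; the text encodes
    robot identity, control mode, current state, and the task, and asks
    for the next action(s) plus a 2-D visual trace of the end effector.
    """
    proprio_str = ", ".join(f"{v:.3f}" for v in state.proprio)
    return (
        f"You are controlling a {state.robot_type} robot in "
        f"{state.control_mode} control mode. "
        f"Current proprioceptive state: [{proprio_str}]. "
        f"Task: {state.task}. "
        f"Predict the next {n_future_steps} action(s) and the 2-D visual "
        f"trace of the end effector in the image."
    )


if __name__ == "__main__":
    prompt = build_prompt(
        RobotState(
            robot_type="Franka",
            control_mode="end-effector",
            proprio=[0.12, -0.45, 0.33, 0.0, 1.0, 0.0, 0.04],
            task="stack the red block on the blue block",
        )
    )
    print(prompt)
```

Because the robot type and control mode are part of the instruction itself, the same model weights can in principle be prompted for different robots and tasks, which is what the applications below rely on.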
Applications:
- General-Purpose Robot Assistants: Equipping service robots with the ability to follow natural language instructions for varied household tasks like "clear the table" or "stack the boxes."
- Multi-Robot Industrial Automation: Implementing a single, unified model that can control different robot models (e.g., Franka or UR5) across diverse manufacturing scene configurations.
- Rapid Task Deployment: Enabling robots in warehouse environments to switch between novel manipulation tasks instantly via simple text-based prompts and visual context.
- Enhanced Teleoperation: Providing operators with predictive visual traces that show the robot's intended path, improving the precision of remote control in complex environments.
- Robot Skill Acquisition: Serving as a foundation model for learning complex, long-horizon manipulation sequences through instruction-based "waypoint" prediction.
Advantages:
- Superior Generalization: The model can adapt to different robot configurations and environments because it is trained on diverse, large-scale datasets rather than specialized niche data.
- Scalable Data Efficiency: By leveraging the Open X-Embodiment dataset, the system utilizes millions of existing trajectories, reducing the need for expensive, manual real-world demonstrations.
- Zero-Shot Task Execution: LLARVA can often perform new tasks correctly the first time by reasoning through the structured language prompts that define the robot type and control mode.
- Improved Spatial Awareness: The auxiliary task of waypoint prediction provides the robot with better fine-grained localization, leading to higher success rates in contact-rich tasks like stacking (see the sketch after this list).
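The sketch below makes the waypoint idea concrete: it parses a model response that contains a predicted action and a 2-D visual trace given as waypoints in image coordinates. The response format, the `ACTION`/`TRACE` field names, and the `parse_response` helper are illustrative assumptions, not the published LLARVA output specification.

```python
# Hypothetical sketch: parsing a predicted action and 2-D visual trace
# (image-space waypoints) from a model's text response. The format shown
# here is an assumption for illustration only.

import re


def parse_response(text: str) -> tuple[list[float], list[tuple[int, int]]]:
    """Split a model response into an action vector and pixel-space waypoints.

    Assumed (hypothetical) format:
        "ACTION: 0.10 -0.02 0.25 0.0 0.0 0.0 1.0; TRACE: (120,88) (134,92) (150,101)"
    """
    action_match = re.search(r"ACTION:\s*([-\d.\s]+);", text)
    trace_matches = re.findall(r"\((\d+),\s*(\d+)\)", text)
    action = [float(v) for v in action_match.group(1).split()] if action_match else []
    trace = [(int(x), int(y)) for x, y in trace_matches]
    return action, trace


if __name__ == "__main__":
    demo = ("ACTION: 0.10 -0.02 0.25 0.0 0.0 0.0 1.0; "
            "TRACE: (120,88) (134,92) (150,101)")
    action, trace = parse_response(demo)
    print("action:", action)   # end-effector command (assumed convention)
    print("trace:", trace)     # waypoints that can be overlaid on the camera feed
```

In a deployment along these lines, the parsed trace could be drawn over the camera feed for teleoperation, while the action vector is mapped to the controller of whichever robot the prompt named.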