Blog

We are bringing general Embodied AI into the physical world.

BRMData Dataset

BRMData: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks

BRMData is a bimanual-mobile robot manipulation dataset designed for household applications, featuring 10 diverse tasks that span single-arm and dual-arm, tabletop, and mobile manipulation. It includes multi-view and depth-sensing data, with tasks ranging from single-object to multi-object grasping, from non-interactive to human-robot interactive scenarios, and from rigid-object to flexible-object manipulation. A novel Manipulation Efficiency Score (MES) metric is introduced to evaluate both the precision and the efficiency of robot manipulation methods.
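
The post does not give the MES formula. Purely as an illustration of a metric that jointly scores precision (success) and efficiency (time), here is a hypothetical Python sketch; the function name and the time-budget weighting are assumptions, not the BRMData definition.

```python
# Hypothetical MES-style metric: rewards successful episodes and discounts
# them by how much of a per-task time budget they consume. The real BRMData
# definition may differ; this only illustrates the precision + efficiency idea.

def manipulation_efficiency_score(episodes, time_budget):
    """episodes: list of (succeeded: bool, duration_s: float) rollouts."""
    if not episodes:
        return 0.0
    score = 0.0
    for succeeded, duration in episodes:
        if succeeded:
            # Full credit for instant success, decaying to zero at the budget.
            score += max(0.0, 1.0 - duration / time_budget)
    return score / len(episodes)

# Example: three rollouts of a task with a 60-second budget.
rollouts = [(True, 20.0), (True, 45.0), (False, 60.0)]
print(manipulation_efficiency_score(rollouts, time_budget=60.0))  # ~0.306
```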

Object-Focus Actor

Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation

The Object-Focus Actor (OFA) introduces a data-efficient approach to generalized dexterous robot manipulation, addressing the difficulty of generalizing across diverse scenes and object placements. It employs a hierarchical pipeline of object perception and pose estimation, pre-manipulation pose arrival, and object-focus policy learning, exploiting the consistent end trajectories this structure yields to train policies efficiently. Real-world experiments across seven tasks demonstrate superior positional and background generalization, achieving robust results with only 10 demonstrations.
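
A minimal sketch of the three-stage pipeline in Python. Every function here is a hypothetical stub standing in for the paper's components; it shows the control flow, not the OFA implementation.

```python
import numpy as np

# Hypothetical stubs for the three OFA stages; names are illustrative only.

def estimate_object_pose(rgb, depth):
    """Stage 1: object perception and pose estimation.
    Returns a 4x4 object pose in the robot base frame (stubbed to identity)."""
    return np.eye(4)

def pre_manipulation_pose(object_pose, standoff=0.10):
    """Stage 2: a pose backed off along the object's approach (z) axis."""
    pose = object_pose.copy()
    pose[:3, 3] -= standoff * pose[:3, 2]
    return pose

def object_focus_policy(obs_object_frame):
    """Stage 3: learned policy acting in object-centric coordinates
    (stubbed; would return a 6-DoF end-effector delta plus gripper)."""
    return np.zeros(7)

def run_episode(rgb, depth):
    obj_pose = estimate_object_pose(rgb, depth)
    approach = pre_manipulation_pose(obj_pose)
    # Starting every rollout from the same pre-manipulation pose gives the
    # policy near-identical end trajectories, which is what makes training
    # feasible from only a handful of demonstrations.
    obs = np.linalg.inv(obj_pose) @ approach  # observe in the object frame
    return object_focus_policy(obs)

print(run_episode(rgb=None, depth=None))
```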

JDVLA-Align Representation Learning

JDVLA-Align: Enhancing Visual Language Action Representation Learning

JDVLA-Align enhances visual-language-action (VLA) representation learning by leveraging pre-trained CLIP models to align vision and language, extracting task-relevant visual representations that improve 3D spatial understanding in VLA models. Through early-fusion mechanisms for representation augmentation, the approach supports robot generalization driven by language instructions.
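
As a rough illustration of early fusion, the sketch below conditions frozen CLIP visual tokens on a pooled CLIP text embedding with a FiLM-style scale-and-shift before they reach the action head. The dimensions and the FiLM choice are assumptions for the sketch, not JDVLA-Align's actual architecture.

```python
import torch
import torch.nn as nn

# Illustrative early-fusion module: language-conditioned modulation of visual
# features before the action decoder. Dimensions and the FiLM design are
# assumptions for this sketch, not JDVLA-Align's implementation.

class EarlyFusion(nn.Module):
    def __init__(self, vis_dim=768, txt_dim=512):
        super().__init__()
        self.film = nn.Linear(txt_dim, 2 * vis_dim)  # FiLM-style scale & shift

    def forward(self, vis_tokens, txt_embed):
        # vis_tokens: (B, N, vis_dim) patch features from a frozen CLIP ViT
        # txt_embed:  (B, txt_dim) pooled CLIP text embedding of the instruction
        scale, shift = self.film(txt_embed).chunk(2, dim=-1)
        return vis_tokens * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

fusion = EarlyFusion()
fused = fusion(torch.randn(2, 196, 768), torch.randn(2, 512))
print(fused.shape)  # torch.Size([2, 196, 768])
```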

JDVLA-RL Manipulation

JDVLA-RL: Towards General Robotic Manipulation with Reinforcement Learning

JDVLA-RL is an algorithmic framework that uses online reinforcement learning (RL) to refine pre-trained autoregressive VLA models, enabling scalable, general robotic manipulation. RL fine-tuning improves alignment with task completion and supports dynamic skill acquisition.
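
The post does not name the RL algorithm, so the sketch below uses a plain REINFORCE-style policy-gradient loop over a toy environment to show the general shape of such fine-tuning. The environment, policy head, and hyperparameters are all placeholders.

```python
import torch
from torch.distributions import Categorical

# Placeholder REINFORCE-style fine-tuning loop. In the real setting the
# policy would be the autoregressive VLA model decoding action tokens and
# the environment a robot (or simulator); both are toy stand-ins here.

class DummyEnv:
    """Five-step toy task that rewards action 0."""
    def reset(self):
        self.t = 0
        return torch.zeros(4)
    def step(self, action):
        self.t += 1
        return torch.zeros(4), float(action.item() == 0), self.t >= 5, {}

policy_head = torch.nn.Linear(4, 3)  # stand-in for the VLA action decoder

def policy(obs):
    return Categorical(logits=policy_head(obs))

def rl_finetune(env, optimizer, episodes=200, gamma=0.99):
    for _ in range(episodes):
        obs, done = env.reset(), False
        log_probs, rewards = [], []
        while not done:
            dist = policy(obs)          # distribution over action tokens
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done, _ = env.step(action)
            rewards.append(reward)
        # Discounted returns, then a policy-gradient step.
        g, returns = 0.0, []
        for r in reversed(rewards):
            g = r + gamma * g
            returns.insert(0, g)
        loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

rl_finetune(DummyEnv(), torch.optim.Adam(policy_head.parameters(), lr=1e-2))
```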

JDVLA-Human Learning

JDVLA-Human: Learning from Human Video

JDVLA-Human trains VLA models from egocentric human videos, predicting wrist and hand actions that are then converted to robot motions via inverse kinematics and retargeting. Combined with dynamic pose estimation, this enables large-scale learning from human video.
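
A toy illustration of that final conversion step: retargeting a predicted human wrist position into the robot frame and solving inverse kinematics, here for a 2-link planar arm with Jacobian iteration. The link lengths, scaling factor, and solver are invented for this sketch; JDVLA-Human's retargeting is more involved.

```python
import numpy as np

# Toy retargeting + inverse-kinematics step: map a predicted human wrist
# position into the robot frame, then solve IK for a 2-link planar arm by
# Jacobian-based iteration. All constants here are invented for the sketch.

L1, L2 = 0.3, 0.25  # link lengths (m)

def forward_kinematics(q):
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def solve_ik(target, q=np.zeros(2), iters=200, step=0.5):
    for _ in range(iters):
        err = target - forward_kinematics(q)
        q = q + step * np.linalg.pinv(jacobian(q)) @ err
    return q

# Retargeting: crude limb-length-ratio scaling of a predicted human wrist
# position (human frame) into the robot frame.
human_wrist = np.array([0.5, 0.2])
robot_target = 0.8 * human_wrist
q = solve_ik(robot_target)
print(q, forward_kinematics(q))  # joint angles reach the scaled target
```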