🦾 ForceMimic

Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation

Teaser

Abstract

In most contact-rich manipulation tasks, humans apply time-varying forces to the target object, compensating for inaccuracies in the vision-guided hand trajectory. However, current robot learning algorithms primarily focus on trajectory-based policies, with limited attention given to learning force-related skills. To address this limitation, we introduce ForceMimic, a force-centric robot learning system that provides a natural, force-aware, and robot-free demonstration collection system, along with a hybrid force-motion imitation learning algorithm for robust contact-rich manipulation. Using the proposed ForceCapture system, an operator can peel a zucchini in 5 minutes, while force-feedback teleoperation takes over 13 minutes and struggles with task completion. With the collected data, we propose HybridIL to train a force-centric imitation learning model, equipped with a hybrid force-position control primitive to fit the predicted wrench-position parameters during robot execution. Experiments demonstrate that our approach enables the model to learn a more robust policy for the contact-rich task of vegetable peeling, increasing the success rate by a relative 54.5% compared to state-of-the-art pure-vision-based imitation learning.

Video

ForceCapture

  • Objectives
    • Scalability
    • On-site force realism
    • Ergonomic comfort
  • Mechanisms
    • Ratchet locking
    • Gravity compensation

Two versions of ForceCapture are designed, one with a fixed tool and the other with an adaptive gripper. Both designs share one core feature: a six-axis force sensor placed between the end-effector and the user's gripping handle, which captures the effector-environment interaction forces.

ForceCapture is straightforward to manufacture, with the main body produced entirely by 3D printing. The total cost of the printed parts and encoder is approximately $50. The device equipped with the gripper weighs only 0.8 kg, of which the force sensor accounts for 0.5 kg and our accessories for only 0.3 kg, which is lighter than a can of cola. Its center of mass is positioned above the handle, which conforms to the natural force application habits of the human hand.

Besides the effector-environment interaction forces, the sensor also captures the forces exerted by the human hand during opening and closing, as well as the gravity of the effector. A ratchet is inserted to isolate the human hand forces by locking unidirectionally when closed, and a least-squares estimate of the effector's center of mass and weight is used to compensate for self-gravity before collecting data.
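The gravity compensation step can be illustrated with a minimal sketch. It assumes a bias-free sensor, static calibration poses, and gravity along the world -z axis; the function names `estimate_gravity_params` and `compensate` are hypothetical and are not taken from the ForceCapture code.

```python
import numpy as np

G_DIR = np.array([0.0, 0.0, -1.0])  # assumed gravity direction in the world frame


def skew(v):
    """Return the matrix S such that S @ c == np.cross(v, c)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def estimate_gravity_params(rotations, forces, torques):
    """Least-squares estimate of effector weight (N) and center of mass (m).

    rotations: sensor-to-world rotation matrices for each static pose
    forces, torques: wrenches measured in the sensor frame at those poses
    """
    forces = np.asarray(forces, dtype=float)
    torques = np.asarray(torques, dtype=float)
    # Gravity direction expressed in the sensor frame for each pose.
    dirs = np.stack([R.T @ G_DIR for R in rotations])
    # Model: force_i = weight * dirs_i  ->  scalar least squares.
    weight = float(np.sum(dirs * forces) / np.sum(dirs * dirs))
    # Model: torque_i = com x force_i = -skew(force_i) @ com.
    A = np.concatenate([-skew(f) for f in forces])
    b = torques.reshape(-1)
    com, *_ = np.linalg.lstsq(A, b, rcond=None)
    return weight, com


def compensate(R, force, torque, weight, com):
    """Subtract the estimated self-gravity wrench from one measurement."""
    g_force = weight * (R.T @ G_DIR)
    return force - g_force, torque - np.cross(com, g_force)
```

At least two poses with non-parallel gravity directions in the sensor frame are needed for the center of mass to be identifiable; in practice more poses would be recorded to average out noise.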

Hardware

HybridIL

Pipeline

We first convert the collected robot-free data into (pseudo-)robot data to mitigate the domain gap. The captured wrench is compensated to account for self-gravity effects. The pose recorded by the SLAM camera is transformed into the robot TCP pose. RGB-D observation images are back-projected into a point cloud, and unrelated points are filtered out. Using this data, a diffusion-based policy is learned that predicts both pose and wrench trajectories, conditioned on the encoded point cloud features, the pose history, and diffusion timestep embeddings. According to the predicted force value, either an IK joint position primitive or a hybrid force-position primitive is selected and fitted to the predicted force-position parameters to execute the actions.
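The back-projection and filtering step can be sketched as follows. This is a minimal illustration assuming a pinhole camera model with known intrinsics and an axis-aligned workspace box as the filter; the function names and the box-based filter are assumptions, not the authors' implementation.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an (N, 3) camera-frame point cloud."""
    h, w = depth.shape
    # u indexes columns, v indexes rows; both arrays have shape (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels without a valid depth reading.
    return points[points[:, 2] > 0]


def crop_to_workspace(points, lower, upper):
    """Keep only points inside an axis-aligned box (hypothetical filter)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mask = np.all((points >= lower) & (points <= upper), axis=1)
    return points[mask]
```

A typical call would be `crop_to_workspace(backproject_depth(depth, fx, fy, cx, cy), lower, upper)`, with the box chosen to enclose the object and tool while excluding the background.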

  • Using an orthogonal hybrid force-position controller to fit the predicted wrench-position parameters
    • Activated when the force threshold is exceeded consecutively
    • Determine motion direction
    • Orthogonalize force and position
    • Press opposite initially

Realized with the hybrid force-position control primitive of the Flexiv RDK.
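The selection and orthogonalization logic listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the threshold, the run length, the use of the chunk's first and last positions as the motion direction, and the use of the mean predicted force are all hypothetical choices, and nothing here calls the Flexiv RDK.

```python
import numpy as np


def use_hybrid_primitive(forces, threshold=2.0, min_consecutive=3):
    """Return True if the force magnitude exceeds `threshold` on at least
    `min_consecutive` consecutive predicted steps (both values are assumed)."""
    run = 0
    for magnitude in np.linalg.norm(np.asarray(forces, dtype=float), axis=1):
        run = run + 1 if magnitude > threshold else 0
        if run >= min_consecutive:
            return True
    return False


def orthogonalize(positions, forces):
    """Split a predicted chunk into a unit motion direction and a force
    target that is orthogonal to that direction."""
    positions = np.asarray(positions, dtype=float)
    forces = np.asarray(forces, dtype=float)
    direction = positions[-1] - positions[0]
    norm = np.linalg.norm(direction)
    if norm < 1e-9:
        raise ValueError("predicted positions do not define a motion direction")
    direction = direction / norm
    mean_force = forces.mean(axis=0)
    # Remove the force component along the motion axis so that position
    # control and force control act on orthogonal directions.
    force_perp = mean_force - (mean_force @ direction) * direction
    return direction, force_perp
```

When `use_hybrid_primitive` returns False, the predicted poses would instead be sent to the IK joint position primitive.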

Controller

Data Collection Efficiency

We conduct a case study of data collection by peeling a zucchini with a single arm. The procedure involves picking up the peeler, peeling the zucchini on a stand, placing the peeler down, then grasping the zucchini to adjust its orientation for further peeling, repeated until the entire vegetable is peeled. We use the gripper version of ForceCapture, and the teleoperation setup follows the configuration in RH20T.


Demo clip by ForceCapture

Quantitative Results

Efficiency comparison


Demo clip by Teleoperation

Not only is the collection time with ForceCapture very close to that of a human, but it also requires almost no additional training time and incurs almost no operational errors.

Robotic Peeling Experiments

We use the fixed-tool version of ForceCapture to collect data on 15 zucchinis in total, yielding 438 peeling skill segments and 30,199 action sequences. Using the collected data, we train the HybridIL model and the baseline methods, each for 500 epochs.


Dataset replay by ForceCapture


Pose-augmented dataset replay by ForceCapture


Rollout on validation dataset by HybridIL


Rollout on real robot by HybridIL

  • Raw DP used raw visual perception and robot pose as inputs, outputting the end-effector pose sequence based on diffusion policy.
  • Force DP incorporated visual perception, robot pose, and robot force sensing as inputs, also outputting the end-effector pose sequence.
  • Force+Hybrid DP used visual perception, robot pose, and robot force sensing as inputs, but output both pose and wrench sequences.
  • For baselines that output wrench-position parameters, hybrid force-position control primitives were employed to match and switch between control modes. Raw DP and HybridIL were tested for 20 peeling actions, while Force DP and Force+Hybrid DP were tested for 10 peeling actions.

    Method               Success rate (%)
                         Motion correct   Peel length > 10 cm
    Raw DP               80               55
    Force DP             60               10
    Force+Hybrid DP      80               20
    HybridIL (proposed)  100              85
    Raw DP

    Peeled skins by Raw DP

    Ablation

    Peeled skins by Force DP (left) and Force+Hybrid DP (right)

    HybridIL

    Peeled skins by HybridIL


    Rollouts on real robot by Raw DP and HybridIL

    GT Force Curve

    Force curve during data collection

    Position Force Curve

    Force curve during rollout of Raw DP

    Hybrid Force Curve

    Force curve during rollout of HybridIL

    Force DP and Force+Hybrid DP performed poorly. The mismatch between the input forces and the force distribution in the dataset made it difficult for the models to predict the correct actions. Not only are the success rates of HybridIL higher than those of the baselines, but its peeled skins are also longer and smoother, and the interaction forces during execution are more similar to those in the data collected by human operators.
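As a quick arithmetic check, the 54.5% relative gain quoted in the abstract matches the "peel length > 10cm" success rates reported above (85 for HybridIL versus 55 for Raw DP). That the abstract refers to this column is an assumption; the helper name `relative_gain` is illustrative.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Peel length > 10cm success rates: HybridIL 85, Raw DP 55.
print(f"{relative_gain(85, 55):.1f}%")  # prints 54.5%
```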