SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies


Abstract

Offline Imitation Learning (IL) methods such as Behavior Cloning (BC) are effective at acquiring complex robotic manipulation skills. However, existing IL-trained policies are constrained to executing tasks at the same speed as the demonstration. This limits the task throughput of a robotic system, a critical requirement for applications such as industrial automation. In this paper, we introduce and formalize the novel problem of enabling faster-than-demonstration execution of visuomotor policies, and identify fundamental challenges in robot dynamics and state-action distribution shift. We instantiate these insights as SAIL (Speed-Adaptive Imitation Learning), a full-stack system integrating four tightly connected components: (1) a consistency-preserving action inference algorithm for smooth motion at high speed, (2) high-fidelity tracking of controller-invariant motion targets, (3) adaptive speed modulation that dynamically adjusts execution speed based on motion complexity, and (4) action scheduling to handle real-world system latencies. Experiments on 12 tasks across simulation and two real robot platforms show that SAIL achieves up to a 4× speedup over demonstration speed in simulation and up to a 3.2× speedup on physical robots.

Real-World Results

Stacking Cups to Pyramid Shape

Wiping Board

Baking: pick up the bowl and put it in the oven

Folding Cloth

Plate Fruits

Bimanual Serve: pick up the peach, put it in the bowl and serve

Simulation Results

Can

Mug Cleanup: open the drawer, put the mug in it and close the drawer

SAIL System Overview

SAIL Overview

(a) Policy Level: Starting with synchronized observations (Obs. sync) from robot state and camera inputs, the system generates (1) temporally-consistent action predictions through action-conditioned CFG and (2) time-varying action interval δt.
(b) Controller Level: The predicted actions are scheduled for execution while accounting for sensing-inference delays, with outdated actions being discarded. The scheduled actions are executed using a high-gain controller with velocity feedforward (Vel FF) terms to track trajectory at the specified time parametrization.

Policy Level

Challenge 1: Divergence of Consecutive Action Predictions. Diffusion policies can produce diverging action predictions across consecutive inference steps, which is harmful for high-speed execution. We propose a novel action-conditioned classifier-free guidance (CFG) scheme to generate temporally consistent action predictions.
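For intuition, the guidance step in classifier-free guidance blends an unconditional and a conditional noise prediction. The sketch below shows only this blending rule, not SAIL's full inference pipeline; the function name and toy inputs are illustrative assumptions.

```python
import numpy as np

def cfg_denoise_step(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance blending: combine the unconditional and
    conditional noise predictions. guidance_scale = 1 recovers the
    conditional prediction; larger values push samples further toward
    the condition (here, the previous action chunk)."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy 1-D "noise predictions" to show the extrapolation behavior.
eps_u = np.zeros(2)
eps_c = np.array([1.0, -1.0])
blended = cfg_denoise_step(eps_u, eps_c, 2.0)  # extrapolates past eps_c
```

With `guidance_scale = 1` the guided prediction equals the conditional one; values above 1 strengthen the pull toward the conditioning action, which is what encourages consecutive predictions to stay consistent.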

SAIL Overview

Challenge 2: Adaptive Speed Modulation. Motion segments that are nonlinear or require precise control should not be sped up. We therefore adjust the execution speed adaptively based on motion complexity.
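One simple way to turn "motion complexity" into a speed factor is to measure how far each waypoint deviates from the straight line between its neighbors. This is a minimal sketch of that idea, not SAIL's actual complexity estimator; the function name, the linearity measure, and `max_speedup` are assumptions for illustration.

```python
import numpy as np

def segment_speed_factors(waypoints, max_speedup=4.0):
    """Assign a speed-up factor per interior waypoint from a simple
    linearity measure: if the middle point lies close to the straight
    line between its neighbors, the segment is treated as simple and
    sped up; strongly curved segments keep demonstration speed."""
    factors = []
    for i in range(len(waypoints) - 2):
        a, b, c = (np.asarray(waypoints[j], dtype=float) for j in (i, i + 1, i + 2))
        chord = c - a
        chord_len = np.linalg.norm(chord)
        if chord_len < 1e-9:
            factors.append(1.0)  # degenerate segment: do not speed up
            continue
        # Perpendicular distance of the midpoint b from the line a -> c.
        deviation = np.linalg.norm(np.cross(chord, b - a)) / chord_len
        # Normalize: small deviation -> large speedup, large deviation -> 1x.
        complexity = min(deviation / chord_len, 1.0)
        factors.append(1.0 + (max_speedup - 1.0) * (1.0 - complexity))
    return factors
```

A perfectly straight segment gets the full `max_speedup`, while a sharp corner is driven back toward 1× (demonstration speed), matching the behavior described above.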

speed modulation

Controller Level

Challenge 3: Controller Behavior Shift. At higher execution speeds, the same commanded poses can produce different controller behavior, shifting the executed motion away from the demonstration distribution. SAIL addresses this by tracking controller-invariant motion targets with a high-gain controller with velocity feedforward.

Challenge 4: System Latency in the Control Loop. System latencies caused by communication and the control loop can lead to out-of-distribution inputs to the policy and time-misaligned action commands to the controller. To address this, we propose a novel action scheduling mechanism that handles real-world system latencies.
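The core of such a scheduler can be sketched in a few lines: each predicted action is stamped with its intended execution time relative to the observation it was predicted from, and any action whose time has already passed by the end of inference is discarded. This is a hedged sketch under assumed interfaces (the function name, the flat action list, and the timestamp convention are illustrative, not SAIL's actual implementation).

```python
import time

def schedule_actions(actions, t_obs, dt, now=None):
    """Drop action steps whose execution time has already passed due to
    sensing and inference latency; return the rest with their scheduled
    wall-clock execution times.

    actions: action commands predicted from an observation taken at
             wall-clock time t_obs, spaced dt seconds apart.
    """
    now = time.monotonic() if now is None else now
    scheduled = []
    for k, action in enumerate(actions):
        t_exec = t_obs + k * dt
        if t_exec >= now:  # keep only actions still in the future
            scheduled.append((t_exec, action))
    return scheduled
```

Discarding stale actions keeps the controller's command stream time-aligned even when inference takes longer than one action interval.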

SAIL Overview

Evaluation

Quantitative Results

SAIL Overview

Real world: In throughput-with-regret (TPR) at 4× execution speedup on the real-world evaluation tasks, SAIL outperforms the Diffusion Policy (DP) baseline by 32% on average.

Sim number


Simulation: We test SAIL on 5 tasks in simulation, where it achieves up to a 4× speedup over demonstration speed.

High Gain Controller and Reached Pose Prediction

replay vs kp

We examine the effects of increasing controller gains and execution speed when replaying demonstrations in simulation. Left: using commanded poses performs better when replaying at the original speed (c = 1), but using reached poses matches its performance under high gains. Right: at higher execution speeds, a high-gain controller using reached poses outperforms one using commanded poses.
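The tracking law described above (high gains plus a velocity feedforward term, as in the system overview) can be sketched as a single PD step. The function name and scalar interface are illustrative assumptions; a real controller would operate on full joint or Cartesian state.

```python
def pd_with_velocity_feedforward(q, q_des, qd, qd_des, kp, kd):
    """One step of a PD tracking law with velocity feedforward.

    q, qd      : current position and velocity
    q_des      : desired position from the time-parametrized trajectory
    qd_des     : desired velocity (the feedforward term) -- at high
                 execution speed this term lets the controller track
                 the trajectory without relying solely on position error
    kp, kd     : proportional and derivative gains (high for tracking)
    """
    return kp * (q_des - q) + kd * (qd_des - qd)
```

With high `kp`, the executed pose stays close to the commanded trajectory even at sped-up time parametrizations, which is why reached poses and commanded poses converge in the high-gain regime shown on the left.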

What makes a good action condition for Classifier-Free Guidance?

Action conditioned on unconditional action distribution

Action conditioned on temporally perturbed action

Yellow: action condition; grey: unconditional action distribution; red: conditional action distribution. CFG works best when the action condition lies within the unconditional action distribution.

Can task rollout trajectory with CFG, with the action condition drawn from the unconditional action distribution.

Can task rollout trajectory without CFG.

Adaptive Speed Modulation Increases Task Success Rate

Compared to a simple gripper heuristic, Adaptive Speed Modulation detects the motion complexity of the action sequence and adjusts the execution speed accordingly. As a result, policy execution is sped up only when the motion is linear and smooth, and slowed down for complex motion (e.g., precise manipulation). In the video above, blue stars mark extracted waypoints and green stars mark clusters of complex motion segments.

reach pose

Metrics. Adaptive Speed Modulation is necessary for success in high-precision tasks. As the results on the left show, our ablation baseline without ASM (SAIL-AS) achieves a lower success rate than SAIL. Tasks like Square, Stack, and Mug require precise motion, where even minor tracking or prediction errors can lead to failure, so slowing down in those segments is important.

Limitations

While SAIL shows promising results in both accelerating policy execution in simulation and real-world deployment, we do not explicitly tackle the dynamics shift of robot-object interaction. Future research could address this by developing methods to incorporate explicit dynamics modeling into policies, either through learning speed-dependent dynamic models or leveraging physics simulation during training.
Adaptive Speed Modulation is also currently applied only in simulation, while a gripper heuristic for slowdown is used in real-world experiments: due to the noisiness of real data, ASM did not consistently slow down in the right segments. Future research could aim to make ASM robust to noisy real-world data.

