SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies

Georgia Institute of Technology

Abstract

We propose SAIL (Speed-Adaptive Imitation Learning), a framework that enables faster-than-demonstration execution of policies by addressing key technical challenges in robot dynamics and state-action distribution shift. Offline Imitation Learning (IL) methods such as Behavior Cloning (BC) are a simple and effective way to acquire complex robotic manipulation skills. However, existing IL-trained policies are confined to executing the task at the same speed as shown in the demonstration. This limits the task throughput of a robotic system, a critical requirement for applications such as industrial automation. SAIL features four tightly connected components: (1) high-gain control to enable high-fidelity tracking of IL policy trajectories, (2) consistency-preserving trajectory generation to ensure smoother robot motion, (3) adaptive speed modulation that dynamically adjusts execution speed based on motion complexity, and (4) action scheduling to handle real-world system latencies. Experimental validation on six robotic manipulation tasks shows that SAIL achieves up to a 4× speedup over demonstration speed in simulation and up to a 3.2× speedup on physical robots.

Real-World Results

Stacking Cups into a Pyramid Shape

Wiping Board

Baking: picking up bowls and putting them in the oven

Folding Cloth

SAIL System Overview

Figure: SAIL system overview.

(a) Policy level: Starting from synchronized observations (Obs. sync) of robot state and camera inputs, the system generates (1) temporally consistent action predictions through action-conditioned CFG and (2) a time-varying action interval δt.
(b) Controller level: The predicted actions are scheduled for execution while accounting for sensing and inference delays, with outdated actions discarded. The scheduled actions are executed by a high-gain controller with velocity feedforward (Vel FF) terms to track the trajectory at the specified time parametrization.
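For concreteness, here is a minimal Python sketch of this two-level loop. Every interface name (`predict`, `schedule_and_track`, and the robot/camera handles) is a hypothetical placeholder, not SAIL's actual API.

```python
import time

def sail_loop(policy, controller, robot, camera, horizon=16):
    """Hypothetical end-to-end sketch of the two levels in the figure.
    All interfaces (predict, schedule_and_track, etc.) are assumptions."""
    while not robot.task_done():
        t_obs = time.monotonic()                     # observation timestamp
        obs = {"state": robot.get_state(), "image": camera.get_frame()}

        # Policy level: a chunk of temporally consistent actions plus a
        # time-varying interval dt for each action (adaptive speed).
        actions, dts = policy.predict(obs, horizon)  # shapes: (H, D) and (H,)

        # Controller level: schedule actions relative to t_obs, so sensing
        # and inference latency are accounted for, then track them with a
        # high-gain controller with velocity feedforward.
        controller.schedule_and_track(t_obs, actions, dts)
```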

Policy Level

Challenge 1: Divergence of Consecutive Action Predictions

A diffusion policy can produce diverging consecutive action predictions, which is harmful for high-speed execution. We propose a novel action-conditioned classifier-free guidance (CFG) to generate temporally consistent action predictions.
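As a rough illustration, the sketch below wires an action condition into the standard classifier-free guidance combination, eps = eps_uncond + w (eps_cond − eps_uncond); the model signature and guidance weight are assumptions, not the paper's exact formulation.

```python
import torch

def guided_noise(model, x_t, t, obs, prev_actions, w=2.0):
    """One denoising step's noise estimate with action-conditioned CFG.

    The condition is the previously predicted action chunk, steering the
    new prediction toward temporal consistency with it. The hypothetical
    model signature is model(x_t, t, obs, action_cond=None) -> noise.
    """
    eps_uncond = model(x_t, t, obs, action_cond=None)
    eps_cond = model(x_t, t, obs, action_cond=prev_actions)
    # Standard CFG combination: push the estimate toward the conditional one.
    return eps_uncond + w * (eps_cond - eps_uncond)
```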


Challenge 2: Adaptive Speed Modulation

Motion segments that are nonlinear and require precise control should not be sped up. We therefore adjust the execution speed adaptively based on the motion complexity; a sketch follows the figure below.

Figure: adaptive speed modulation.
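One simple way to instantiate this idea, assuming a chord-deviation linearity score of our own choosing rather than the paper's exact complexity criterion:

```python
import numpy as np

def speed_factors(waypoints, c_max=4.0, c_min=1.0, tol=0.02):
    """Per-segment speed-up factors from a simple linearity score.

    For each window of three consecutive waypoints, measure how far the
    middle point deviates from the chord joining the endpoints; near-linear
    windows get the full speed-up c_max, curved/precise ones fall back
    toward c_min. Thresholds and the mapping are illustrative assumptions.
    """
    factors = []
    for i in range(len(waypoints) - 2):
        p0, p2 = waypoints[i], waypoints[i + 2]
        chord = p2 - p0
        chord_dir = chord / (np.linalg.norm(chord) + 1e-9)
        d = waypoints[i + 1] - p0
        deviation = np.linalg.norm(d - d.dot(chord_dir) * chord_dir)
        lin = max(0.0, 1.0 - deviation / tol)   # 1.0 = perfectly linear
        factors.append(c_min + (c_max - c_min) * lin)
    return np.array(factors)
```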

Controller Level

Challenge 3: Controller Behavior Shift

Executing a trajectory faster than demonstrated changes the controller's tracking behavior: a default low-gain controller lags the sped-up commands, yielding out-of-distribution states for the policy. We address this with a high-gain controller with velocity feedforward, and by training the policy on reached poses rather than commanded poses (see the analysis further below).
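A textbook joint-space tracking law of this kind, as a sketch (the gains and the interface are illustrative, not the paper's implementation):

```python
import numpy as np

def high_gain_pd_with_vel_ff(q, qd, q_ref, qd_ref, kp=400.0, kd=40.0):
    """Joint-space PD tracking with a velocity feedforward term.

    tau = Kp (q_ref - q) + Kd (qd_ref - qd), where qd_ref is the reference
    velocity from the retimed trajectory. High Kp tightens tracking of the
    sped-up trajectory; the feedforward keeps the error small between
    waypoints instead of letting the robot lag the moving target.
    """
    return kp * (np.asarray(q_ref) - q) + kd * (np.asarray(qd_ref) - qd)
```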

Challenge 4: System Latency in the Control Loop

System latencies from communication and the control loop can lead to out-of-distribution inputs to the policy and time-misaligned action commands to the controller. To address this, we propose a novel action-scheduling mechanism to handle real-world system latencies, sketched below.
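A minimal sketch of such a scheduler, with execution times measured from the observation timestamp rather than from "now"; the data layout is an assumption:

```python
import time

def schedule_actions(t_obs, actions, dts):
    """Assign each predicted action an absolute execution time anchored at
    the observation timestamp, and drop any action whose slot has already
    passed due to sensing/inference latency. A sketch; a real scheduler
    must also merge with actions still queued from earlier predictions."""
    now = time.monotonic()
    schedule, t_exec = [], t_obs
    for action, dt in zip(actions, dts):
        t_exec += dt
        if t_exec > now:          # outdated actions are discarded
            schedule.append((t_exec, action))
    return schedule
```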


Evaluation

Quantitative Results

Figure: real-world quantitative results.

Real world: In throughput-with-regret (TPR) at 4× execution speedup on the real-world evaluation tasks, SAIL outperforms the baseline Diffusion Policy (DP) by 32% on average.

Figure: simulation results.


Simulation: We evaluate SAIL on five tasks in simulation, where it achieves up to a 4× speedup over demonstration speed.

High Gain Controller and Reached Pose Prediction

Figure: demo replay success vs. controller gain and execution speed.

We examine the effects of increasing controller gains and execution speed when replaying demonstrations in simulation. Left: using commanded poses performs better when replaying at the original speed (c = 1), but using reached poses matches its performance under high gains. Right: at higher execution speeds, a high-gain controller using reached poses outperforms one using commanded poses.
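A minimal sketch of the data relabeling that "using reached poses" suggests, assuming a hypothetical per-step episode layout:

```python
def relabel_with_reached_poses(episode):
    """Replace commanded-pose action labels with the poses the robot
    actually reached one control step later, so the policy models what
    the plant does under the high-gain controller rather than what was
    commanded. `episode` is a hypothetical list of step dicts; the final
    step keeps its original label since no successor pose exists."""
    for step, next_step in zip(episode, episode[1:]):
        step["action"] = next_step["reached_pose"]  # not step["commanded_pose"]
    return episode
```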

What makes a good action condition for Classifier-Free Guidance?

Action condition drawn from the unconditional action distribution

Action condition from a temporally perturbed action

Yellow: action condition; grey: unconditional action distribution; red: conditional action distribution. CFG works best when the action condition lies within the unconditional action distribution.

Can task: rollout trajectory with CFG, with the action condition drawn from the unconditional action distribution.

Can task: rollout trajectory without CFG.

Adaptive Speed Modulation Increases Task Success Rate

Compared to a simple gripper heuristic, Adaptive Speed Modulation detects the motion complexity of the action sequence and adjusts the execution speed accordingly. As a result, policy execution is sped up only when the motion is linear and smooth, and slowed down only for complex motion (e.g., precise manipulation). In the video above, blue stars represent extracted waypoints and green stars represent clusters of complex motion segments; a sketch of this clustering follows below.
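Continuing the `speed_factors` sketch from above, complex segments can be grouped by clustering consecutive low-speed-factor indices; the thresholds here are illustrative assumptions:

```python
def complex_clusters(factors, slow_thresh=1.5, min_len=2):
    """Group runs of consecutive low-speed-factor segments into 'complex
    motion' clusters, analogous to the green stars in the video. Returns
    (start_index, end_index) pairs over the factor sequence."""
    clusters, current = [], []
    for i, f in enumerate(factors):
        if f < slow_thresh:
            current.append(i)
        else:
            if len(current) >= min_len:
                clusters.append((current[0], current[-1]))
            current = []
    if len(current) >= min_len:     # flush a run that ends at the sequence end
        clusters.append((current[0], current[-1]))
    return clusters
```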

Figure: ablation results for Adaptive Speed Modulation.

Metrics. Adaptive Speed Modulation is necessary for success in high-precision tasks. As seen in the results to the left, our ablation baseline (SAIL-AS), which removes ASM, achieves a reduced success rate compared to SAIL. Since tasks like Square, Stack, and Mug require precise motion, even minor errors in tracking or model prediction can lead to failure, which is why slowing down in these segments matters.

Limitations

While SAIL shows promising results in accelerating policy execution both in simulation and in real-world deployment, we do not explicitly tackle the dynamics shift of robot-object interaction. Future research could address this by incorporating explicit dynamics modeling into policies, either by learning speed-dependent dynamics models or by leveraging physics simulation during training.
Adaptive Speed Modulation is also currently applied only in simulation, while a gripper heuristic for slowdown is used in the real-world experiments. We found that ASM did not consistently slow down in the right segments due to the noisiness of real data. Future work could build on this problem.

Acknowledgement

TODO: add acknowledgement

BibTeX
