Publication
Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation
Anish Abhijit Diwan; Julen Urain; Jens Kober; Jan Peters
In: The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. International Conference on Learning Representations (ICLR), Pages 1-22, OpenReview.net, 2025.
Abstract
This paper introduces a new imitation learning framework based on energy-based generative models capable of learning complex, physics-dependent robot motion policies from state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert’s motion data distribution and learns smooth, well-defined representations of the data distribution’s energy function using denoising score matching. We propose to use these learnt energy functions as reward functions to learn imitation policies via reinforcement learning. We also present a strategy to gradually switch between the learnt energy functions, ensuring that the learnt rewards are always well-defined in the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts and compare it with state-only adversarial imitation learning algorithms like Adversarial Motion Priors (AMP). Our framework sidesteps the optimisation challenges of adversarial imitation learning techniques and produces results comparable to AMP in several quantitative metrics across multiple imitation settings. Code and videos are available at anishhdiwan.github.io/noise-conditioned-energy-based-annealed-rewards/
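
The abstract names two core components: noise-conditioned energy functions learnt with denoising score matching, and an annealed scheme that uses those energies as reinforcement learning rewards. Below is a minimal sketch of both ideas, not the authors' released implementation; the network architecture, the energy-to-reward mapping, and the annealing rule are illustrative assumptions.

    import torch
    import torch.nn as nn

    class NoiseConditionedEnergy(nn.Module):
        # E_theta(x, sigma): scalar energy of a state(-transition) sample x
        # at perturbation level sigma (architecture is an assumption).
        def __init__(self, obs_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + 1, hidden), nn.SiLU(),
                nn.Linear(hidden, hidden), nn.SiLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x, sigma):
            sigma_col = torch.full((x.shape[0], 1), sigma, device=x.device)
            return self.net(torch.cat([x, sigma_col], dim=-1)).squeeze(-1)

    def dsm_loss(energy, x_expert, sigma):
        # Denoising score matching at one noise level: the model score,
        # -grad_x E(x_noisy, sigma), is regressed onto the score of the
        # Gaussian perturbation kernel, -(x_noisy - x_expert) / sigma^2,
        # with the standard sigma^2 weighting.
        noise = torch.randn_like(x_expert) * sigma
        x_noisy = (x_expert + noise).requires_grad_(True)
        e = energy(x_noisy, sigma)
        model_score = -torch.autograd.grad(e.sum(), x_noisy, create_graph=True)[0]
        target_score = -noise / sigma**2
        return (sigma**2 * ((model_score - target_score) ** 2).sum(dim=-1)).mean()

    def annealed_reward(energy, x_policy, sigma):
        # Hypothetical reward shaping: low energy (closeness to the expert
        # manifold at the current perturbation level) maps to high reward.
        with torch.no_grad():
            return -energy(x_policy, sigma)

    # Illustrative annealing rule (an assumption, not the paper's exact
    # criterion): reward the policy at a coarse noise level first, and once
    # policy samples achieve sufficiently low energy there, switch to the
    # next finer level so the reward stays well-defined on the manifold of
    # policy-generated samples.
    sigmas = [2.0, 1.0, 0.5, 0.25]

A usage sketch: train the energy model by minimising dsm_loss over expert batches at each sigma in sigmas, then run any standard RL algorithm with annealed_reward as the per-step reward, stepping through sigmas from coarse to fine as the policy improves.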
