Publication
Handling Long-Term Safety and Uncertainty in Safe Reinforcement Learning
Jonas Günster; Puze Liu; Jan Peters; Davide Tateo
In: Pulkit Agrawal; Oliver Kroemer; Wolfram Burgard (Eds.). Conference on Robot Learning, 6-9 November 2024, Munich, Germany. Conference on Robot Learning (CoRL), Pages 4670-4697, Proceedings of Machine Learning Research, Vol. 270, PMLR, 2024.
Abstract
Safety is one of the key issues preventing the deployment of reinforcement learning techniques in real-world robots. While most approaches in the Safe Reinforcement Learning area do not require prior knowledge of constraints and robot kinematics and rely solely on data, it is often difficult to deploy them in complex real-world settings. Instead, model-based approaches that incorporate prior knowledge of the constraints and dynamics into the learning framework have proven capable of deploying the learning algorithm directly on the real robot. Unfortunately, while an approximated model of the robot dynamics is often available, the safety constraints are task-specific and hard to obtain: they may be too complicated to encode analytically, too expensive to compute, or it may be difficult to envision a priori the long-term safety requirements. In this paper, we bridge this gap by extending the safe exploration method, ATACOM, with learnable constraints, with a particular focus on ensuring long-term safety and handling of uncertainty. Our approach is competitive or superior to state-of-the-art methods in final performance while maintaining safer behavior during training.
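To make the idea of acting on a learned constraint manifold more concrete, the following minimal sketch illustrates the ATACOM-style action correction in Python. It assumes velocity-level control of the state, substitutes a hand-written placeholder for the learned constraint network, and approximates the constraint Jacobian with finite differences; all names and simplifications are illustrative and do not reproduce the authors' implementation.

```python
import numpy as np

def learned_constraint(s):
    """Placeholder for a learned constraint c(s) <= 0 (illustrative only).

    Here: keep the first state coordinate below 1.0.
    """
    return np.array([s[0] - 1.0])

def constraint_jacobian(c_fn, s, eps=1e-5):
    """Finite-difference Jacobian of the constraint w.r.t. the state."""
    c0 = c_fn(s)
    J = np.zeros((c0.shape[0], s.shape[0]))
    for i in range(s.shape[0]):
        ds = np.zeros_like(s)
        ds[i] = eps
        J[:, i] = (c_fn(s + ds) - c0) / eps
    return J

def safe_action(s, alpha, c_fn, gain=1.0):
    """Conceptual ATACOM-style correction of a raw policy action.

    The raw action `alpha` is projected onto the null space of the constraint
    Jacobian (directions that do not change c(s)), and an error-correction
    term pushes the state back toward the constraint boundary when c(s) > 0.
    """
    J = constraint_jacobian(c_fn, s)           # (n_constraints, n_state)
    J_pinv = np.linalg.pinv(J)                 # Moore-Penrose pseudo-inverse
    N = np.eye(s.shape[0]) - J_pinv @ J        # null-space projector
    correction = -gain * J_pinv @ np.maximum(c_fn(s), 0.0)
    return N @ alpha + correction

# Example: near the boundary, the unsafe component of the action is removed.
s = np.array([0.8, 0.2])
alpha = np.array([0.5, -0.3])
print(safe_action(s, alpha, learned_constraint))
```

In the paper, the placeholder constraint would instead be a model learned from data, and the correction operates through the robot dynamics rather than directly on the state; this sketch only conveys the null-space projection idea that keeps exploration on the safe manifold.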
