Authors: Mansur Arief, Mike Timmerman, Jiachen Li, David Isele, Mykel J Kochenderfer
Abstract: Training intelligent agents to navigate highly interactive environments
presents significant challenges. While the guided meta reinforcement learning (RL)
approach, which first trains a guiding policy that in turn trains the ego agent,
has proven effective in improving generalizability across various levels of
interaction, the state-of-the-art method tends to be overly sensitive to extreme
cases, impairing the agents' performance in more common scenarios. This study
introduces a novel training framework that integrates guided meta RL with
importance sampling (IS) to optimize training distributions for navigating
highly interactive driving scenarios, such as T-intersections. Unlike
traditional methods that may underrepresent critical interactions or
overemphasize extreme cases during training, our approach strategically adjusts
the training distribution towards more challenging driving behaviors using IS
proposal distributions and applies the importance ratio to de-bias the resulting estimates.
By estimating a naturalistic distribution from real-world datasets and
employing a mixture model for iterative training refinements, the framework
ensures a balanced focus across common and extreme driving scenarios.
Experiments conducted with both a synthetic dataset and T-intersection scenarios
from the InD dataset demonstrate not only accelerated training but also improved
agent performance under naturalistic conditions, showcasing the efficacy of
combining IS with meta RL in training reliable autonomous agents for highly
interactive navigation tasks.
Source: http://arxiv.org/abs/2407.15839v1
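
Below is a minimal sketch (not the authors' implementation) of the importance-sampling step described in the abstract: scenarios are drawn from a proposal distribution q shifted toward more challenging driving behaviors, and the importance ratio p(x)/q(x) de-biases statistics back to the naturalistic distribution p estimated from data. All names and parameters here (p_mean, q_mean, rollout_return, the scalar scenario parameterization) are illustrative assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical 1-D scenario parameter (e.g., how aggressive the other
    # driver is). p = naturalistic distribution estimated from data;
    # q = IS proposal shifted toward harder scenarios. Parameters assumed.
    p_mean, p_std = 0.0, 1.0
    q_mean, q_std = 1.5, 1.0

    def log_pdf(x, mean, std):
        # Gaussian log-density, used for numerically stable importance ratios.
        return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

    def rollout_return(scenario):
        # Stand-in for an RL rollout: harder scenarios yield lower return.
        return -scenario + rng.normal(scale=0.1)

    # Sample training/evaluation scenarios from the proposal q.
    scenarios = rng.normal(q_mean, q_std, size=10_000)
    returns = np.array([rollout_return(s) for s in scenarios])

    # Importance ratios w = p(x) / q(x) de-bias estimates back to p.
    log_w = log_pdf(scenarios, p_mean, p_std) - log_pdf(scenarios, q_mean, q_std)
    w = np.exp(log_w - log_w.max())  # constant shift for numerical stability

    # Self-normalized IS estimate of the expected return under p; the shift
    # above cancels in the ratio, so the estimate is unaffected by it.
    print("naturalistic return estimate:", np.sum(w * returns) / np.sum(w))

Because the estimate is self-normalized, performance measured on scenarios deliberately oversampled from the harder proposal q still reflects the naturalistic mix of common and extreme cases, which is the de-biasing role the abstract assigns to the importance ratio.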