Authors: Anthony GX-Chen, Kenneth Marino, Rob Fergus
Abstract: In the face of difficult exploration problems in reinforcement learning, we
study whether giving an agent an object-centric mapping (describing a set of
items and their attributes) allows for more efficient learning. We find this
problem is best solved hierarchically by modelling items at a higher level of
state abstraction than pixels, and attribute changes at a higher level of temporal
abstraction than primitive actions. This abstraction simplifies the transition
dynamics by making specific future states easier to predict. We make use of this
to propose a fully model-based algorithm that learns a discriminative world
model, plans to explore efficiently with only a count-based intrinsic reward,
and can subsequently plan to reach any discovered (abstract) states.
We demonstrate the model’s ability to (i) efficiently solve single tasks,
(ii) transfer zero-shot and few-shot across item types and environments, and
(iii) plan across long horizons. Across a suite of 2D crafting and MiniHack
environments, we empirically show our model significantly outperforms
state-of-the-art low-level methods (without abstraction), as well as performant
model-free and model-based methods using the same abstraction. Finally, we show
how to learn the low-level object-perturbing policies via reinforcement learning, and
the object mapping itself via supervised learning.
Source: http://arxiv.org/abs/2408.11816v1
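The only exploration component the abstract names concretely is a count-based intrinsic reward over the discovered abstract states. The sketch below is a minimal, hypothetical illustration of such a bonus (the class name `CountBonus` and the 1/sqrt(N) form are assumptions for illustration; the paper's world model and planner are not reproduced here).

```python
from collections import defaultdict
import math


class CountBonus:
    """Minimal count-based intrinsic reward over discrete (abstract) states.

    Hypothetical sketch: only illustrates a visitation-count novelty bonus;
    it is not the paper's full model-based exploration algorithm.
    """

    def __init__(self, scale: float = 1.0):
        self.counts = defaultdict(int)  # visitation count per abstract state
        self.scale = scale

    def bonus(self, abstract_state) -> float:
        # Abstract states (e.g. tuples of item attributes) must be hashable.
        self.counts[abstract_state] += 1
        return self.scale / math.sqrt(self.counts[abstract_state])


# Usage during exploration: add the bonus to the environment reward, e.g.
#   reward = env_reward + counter.bonus(current_abstract_state)
```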