Authors: Anay Pattanaik, Lav R. Varshney
Abstract: This paper considers an online reinforcement learning algorithm that
leverages pre-collected data (passive memory) from the environment during online
interaction. We show that using passive memory improves performance, and we
provide theoretical regret guarantees that are near-minimax optimal. The results
show that the quality of the passive memory determines the sub-optimality of the
incurred regret. The proposed approach and results hold for both continuous and
discrete state-action spaces.
Source: http://arxiv.org/abs/2410.14665v1
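To make the general idea concrete, here is a minimal sketch of online learning that also draws on a passive memory of pre-collected transitions. This is not the paper's algorithm (whose details and guarantees are in the source above); it is a generic illustration in which tabular Q-learning updates from online interaction are mixed with updates replayed from a fixed offline dataset. The toy chain MDP, hyperparameters, and replay ratio are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical toy setup: a small chain MDP with discrete states and actions.
# NOT the paper's method; a generic sketch of online Q-learning augmented
# with updates replayed from a passive memory of pre-collected transitions.

N_STATES, N_ACTIONS = 5, 2
GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1  # assumed hyperparameters

def step(s, a):
    """Toy chain dynamics: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0  # reward only at the right end
    return s2, r

def q_update(Q, s, a, r, s2):
    """Standard one-step Q-learning backup."""
    target = r + GAMMA * max(Q[(s2, b)] for b in range(N_ACTIONS))
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Passive memory: transitions gathered beforehand by some behavior policy
# (here, a uniformly random one); its coverage determines its quality.
random.seed(0)
passive_memory = []
s = 0
for _ in range(200):
    a = random.randrange(N_ACTIONS)
    s2, r = step(s, a)
    passive_memory.append((s, a, r, s2))
    s = s2

# Online interaction that replays a few passive transitions per step.
Q = defaultdict(float)
s = 0
for t in range(1000):
    # Epsilon-greedy online action selection.
    if random.random() < EPS:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda b: Q[(s, b)])
    s2, r = step(s, a)
    q_update(Q, s, a, r, s2)           # update from the online transition
    for tr in random.sample(passive_memory, 4):
        q_update(Q, *tr)               # updates replayed from passive memory
    s = s2

print("Greedy action per state:",
      [max(range(N_ACTIONS), key=lambda b: Q[(s_, b)]) for s_ in range(N_STATES)])
```

In this sketch, a passive memory whose behavior policy covers the state-action space well accelerates learning, while poor coverage contributes little, loosely mirroring the abstract's claim that the quality of the passive memory governs the sub-optimality of the incurred regret.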