Authors: Randy Lefebvre, Audrey Durand
Abstract: Formulating a real-world problem under the Reinforcement Learning framework
involves non-trivial design choices, such as selecting a discount factor for
the learning objective (discounted cumulative rewards), which articulates the
planning horizon of the agent. This work investigates the impact of the
discount factor on the biasvariance trade-off given structural parameters of
the underlying Markov Decision Process. Our results support the idea that a
shorter planning horizon might be beneficial, especially under partial
observability.
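Note (standard RL background, not part of the abstract itself): the discounted cumulative reward objective and the usual effective-horizon reading of the discount factor \gamma can be written as

  G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad \gamma \in [0, 1),
  \qquad H_{\mathrm{eff}} \approx \frac{1}{1 - \gamma},

so a smaller \gamma corresponds to a shorter planning horizon, the quantity whose trade-offs the paper studies.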
Source: http://arxiv.org/abs/2407.15820v1