Authors: Phurinut Srisawad, Juergen Branke, Long Tran-Thanh
Abstract: This paper formulates a new Best-Arm Identification problem in the
non-stationary stochastic bandits setting, where the means of all arms are
shifted in the same way due to a global influence of the environment. The aim
is to identify the unique best arm across environmental changes given a fixed
total budget. While this setting can be regarded as a special case of
Adversarial Bandits or Corrupted Bandits, we demonstrate that existing
solutions tailored to those settings do not fully utilise the nature of this
global influence and thus do not work well in practice, despite their
theoretical guarantees. To overcome this issue, we develop a
novel selection policy that is consistent and robust in dealing with global
environmental shifts. We then propose an allocation policy, LinLUCB, which
exploits information about global shifts across all arms in each environment.
Empirical tests show that our policies significantly outperform existing
methods.
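
The setting described above can be illustrated with a small simulation. The sketch below is not the paper's selection policy or LinLUCB. It assumes an additive model in which every reward equals the arm's base mean plus a shift shared by all arms in the current environment plus Gaussian noise. All names (make_samples, naive_best_arm, shift_adjusted_best_arm) and all numbers are hypothetical. It contrasts a pooled average, which can be misled when arms are sampled unevenly across environments, with a simple within-environment centring that cancels the shared shift.

```python
import random
from collections import defaultdict


def make_samples(base_means, shifts, counts, noise_sd=0.05, seed=0):
    """Draw (env, arm, reward) triples under the assumed additive model.

    counts[e][k] is the number of pulls of arm k in environment e.
    Reward = base mean of the arm + shift shared by all arms + noise.
    """
    rng = random.Random(seed)
    samples = []
    for e, shift in enumerate(shifts):
        for k, mu in enumerate(base_means):
            for _ in range(counts[e][k]):
                samples.append((e, k, mu + shift + rng.gauss(0.0, noise_sd)))
    return samples


def naive_best_arm(samples, n_arms):
    """Pick the arm with the highest pooled average, ignoring environments.

    Assumes every arm was pulled at least once overall.
    """
    totals = [0.0] * n_arms
    pulls = [0] * n_arms
    for _, k, r in samples:
        totals[k] += r
        pulls[k] += 1
    return max(range(n_arms), key=lambda k: totals[k] / pulls[k])


def shift_adjusted_best_arm(samples, n_arms):
    """Pick the arm with the highest summed within-environment centred mean.

    Assumes every arm was pulled at least once in every environment.
    Centring by the environment's average arm mean removes the shared shift.
    """
    by_env = defaultdict(lambda: [[] for _ in range(n_arms)])
    for e, k, r in samples:
        by_env[e][k].append(r)
    scores = [0.0] * n_arms
    for rewards in by_env.values():
        arm_means = [sum(rs) / len(rs) for rs in rewards]
        env_level = sum(arm_means) / n_arms
        for k in range(n_arms):
            scores[k] += arm_means[k] - env_level
    return max(range(n_arms), key=lambda k: scores[k])


# Arm 1 has the higher base mean, but arm 0 is pulled mostly in the
# environment with the large positive shift.
samples = make_samples([0.5, 0.6], [2.0, 0.0], [[9, 1], [1, 9]])
print(naive_best_arm(samples, 2), shift_adjusted_best_arm(samples, 2))
```

With the noise switched off (noise_sd=0.0), the pooled average selects arm 0 in this example because its pulls are concentrated in the high-shift environment, whereas the centred comparison selects arm 1, the arm with the higher base mean.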
Source: http://arxiv.org/abs/2408.12581v1