Authors: Yue Yang, Linfeng Zhao, Mingyu Ding, Gedas Bertasius, Daniel Szafir
Abstract: Robotics has long sought to develop visual-servoing robots capable of
completing previously unseen long-horizon tasks. Hierarchical approaches offer
a pathway for achieving this goal by executing skill combinations arranged by a
task planner, with each visuomotor skill pre-trained using a specific imitation
learning (IL) algorithm. However, even in simple long-horizon tasks like skill
chaining, hierarchical approaches often struggle due to a problem we identify
as Observation Space Shift (OSS), where the sequential execution of preceding
skills causes shifts in the observation space, disrupting the performance of
subsequent individually trained skill policies. To validate OSS and evaluate
its impact on long-horizon tasks, we introduce BOSS (a Benchmark for
Observation Space Shift). BOSS comprises three distinct challenges: “Single
Predicate Shift”, “Accumulated Predicate Shift”, and “Skill Chaining”, each
designed to assess a different aspect of OSS’s negative effect. We evaluate
several popular recent IL algorithms on BOSS, including three Behavioral
Cloning methods and the Vision-Language-Action model OpenVLA. Even on the
simplest challenge, we observe average performance drops of 67%, 35%, 34%, and
54%, respectively, when comparing skill performance with and without OSS.
Additionally, we investigate a potential solution to OSS: scaling up the
training data for each skill with a larger and more visually diverse set of
demonstrations. Our results show that this alone is not sufficient to resolve
OSS.
The project page is: https://boss-benchmark.github.io/
Source: http://arxiv.org/abs/2502.15679v1