Authors: Hana Chockler, David A. Kelly, Daniel Kroening, Youcheng Sun
Abstract: Existing algorithms for explaining the output of image classifiers use
different definitions of explanations and a variety of techniques to extract
them. However, none of the existing tools takes a principled approach to
extracting explanations grounded in formal definitions of causes and explanations.
In this paper we present a novel black-box approach to computing explanations
grounded in the theory of actual causality. We prove relevant theoretical
results and present an algorithm for computing approximate explanations based
on these definitions. We prove that our algorithm terminates and discuss its
complexity and the degree of approximation relative to the precise definition.
We have implemented the framework in a tool, rex, and present experimental
results and a comparison with state-of-the-art tools. We demonstrate that rex
is the most efficient tool, produces the smallest explanations, and outperforms
other black-box tools on standard quality measures.
Source: http://arxiv.org/abs/2411.08875v1