1. Introduction
2. Related Work
3. ROAR: Remove And Retrain
4. Large scale experiments
4.1. Estimators under consideration
Base estimators
- Gradients or Sensitivity heatmaps (GRAD)
- the gradient of the output activation of interest $A^l_n$ with respect to the input $x_i$
$$
\mathbf{e} = \frac{\partial A^l_n}{\partial x_i}
$$
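A minimal PyTorch sketch of this estimator (the function name and interface are my own, not from the paper):

```python
import torch

def grad_saliency(model, x, class_idx):
    # Vanilla gradient saliency: gradient of the target output
    # activation A^l_n with respect to the input.
    x = x.clone().requires_grad_(True)
    logits = model(x)                  # shape: (1, num_classes)
    logits[0, class_idx].backward()    # backprop from the activation of interest
    return x.grad.detach()             # same shape as x
```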
- Guided Backprop (GB)
- stops the backward flow of gradients at a ReLU gate when the gradient is less than zero
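This can be implemented with backward hooks; a minimal PyTorch sketch, assuming the model uses non-in-place `nn.ReLU` modules (function names are illustrative):

```python
import torch
import torch.nn as nn

def guided_relu_hook(module, grad_input, grad_output):
    # On top of the standard ReLU mask (zero gradient where the
    # forward input was negative), also zero out negative gradients.
    return (torch.clamp(grad_input[0], min=0.0),)

def attach_guided_backprop(model):
    # Register the hook on every ReLU; assumes inplace=False,
    # since full backward hooks reject in-place modules.
    handles = [m.register_full_backward_hook(guided_relu_hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    return handles  # call h.remove() on each to restore normal backprop
```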
- Integrated Gradients (IG)
- accumulates gradients along a straight-line path from a non-informative reference point $\mathbf{x}^0$ to the actual input $\mathbf{x}$
$$
\mathbf{e} = (\mathbf{x}_i-\mathbf{x}^0_i) \times \sum^{k}_{j=1} \frac{\partial f_w\left(\mathbf{x}^0 + \frac{j}{k}(\mathbf{x}-\mathbf{x}^0)\right)}{\partial \mathbf{x}_i} \times \frac{1}{k}
$$
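A minimal PyTorch sketch of the Riemann-sum approximation above (the interface and the default number of steps $k$ are illustrative):

```python
import torch

def integrated_gradients(model, x, x0, class_idx, k=50):
    # Accumulate gradients at k points along the straight line
    # from the baseline x0 to the input x.
    total = torch.zeros_like(x)
    for j in range(1, k + 1):
        xj = (x0 + (j / k) * (x - x0)).requires_grad_(True)
        logits = model(xj)
        total += torch.autograd.grad(logits[0, class_idx], xj)[0]
    return (x - x0) * total / k   # elementwise, per the formula above
```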
Ensembling methods
- Classic SmoothGrad (SG)
- averages a set of $J$ noisy estimates of feature importance, each computed on the input perturbed with Gaussian noise $\eta$ (a combined sketch of all three ensembling variants follows this list)
$$
\mathbf{e} = \frac{1}{J}\sum^J_{i=1} g_i(\mathbf{x} + \eta, A^l_n)
$$
- SmoothGrad^2 (SG-SQ)
- squares each noisy estimate $g_i$ before averaging
$$
\mathbf{e} = \frac{1}{J}\sum^J_{i=1} g_i(\mathbf{x}+\eta, A^l_n)^2
$$
- VarGrad (Var)
- computes the variance of the noisy estimates rather than their mean
$$
\mathbf{e} = \operatorname{Var}\left(g_i(\mathbf{x}+\eta, A^l_n)\right)
$$
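A minimal PyTorch sketch covering all three ensembling variants; `grad_fn` stands for any base estimator (e.g. the `grad_saliency` sketch above), and the defaults for `J` and `sigma` are illustrative choices, not the paper's settings:

```python
import torch

def noise_ensemble(grad_fn, x, J=15, sigma=0.15, mode="sg"):
    # Noise-ensemble wrapper around a base estimator grad_fn(x).
    # mode: "sg" (mean), "sg_sq" (mean of squares), "var" (variance).
    samples = []
    for _ in range(J):
        noise = sigma * torch.randn_like(x)
        samples.append(grad_fn(x + noise))
    stack = torch.stack(samples)        # shape: (J, *x.shape)
    if mode == "sg":
        return stack.mean(dim=0)        # SmoothGrad
    if mode == "sg_sq":
        return stack.pow(2).mean(dim=0) # SmoothGrad^2
    return stack.var(dim=0)             # VarGrad
```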
Control variants