1. Introduction
2. Related Work
3. ROAR: Remove And Retrain
4. Large scale experiments
4.1. Estimators under consideration
Base estimators
- Gradients or Sensitivity heatmaps (GRAD)
- the gradient of the output activation of interest $A^l_n$ with respect to the input $x_i$
$$
\mathbf{e} = \frac{\partial A^l_n}{\partial x_i}
$$
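A minimal PyTorch sketch of this estimator (the function name and interface are my own, not from the paper):

```python
import torch

def grad_saliency(model, x, class_idx):
    # Vanilla gradient saliency: gradient of the target output
    # activation A^l_n with respect to the input.
    x = x.clone().requires_grad_(True)
    logits = model(x)                  # shape: (1, num_classes)
    logits[0, class_idx].backward()    # backprop from the activation of interest
    return x.grad.detach()             # same shape as x
```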
- Guided Backprop (GB)
- stops the backward flow of gradients at a ReLU gate when the gradient is less than zero
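This can be implemented with backward hooks; a minimal PyTorch sketch, assuming the model uses non-in-place `nn.ReLU` modules (function names are illustrative):

```python
import torch
import torch.nn as nn

def guided_relu_hook(module, grad_input, grad_output):
    # On top of the standard ReLU mask (zero gradient where the
    # forward input was negative), also zero out negative gradients.
    return (torch.clamp(grad_input[0], min=0.0),)

def attach_guided_backprop(model):
    # Register the hook on every ReLU; assumes inplace=False,
    # since full backward hooks reject in-place modules.
    handles = [m.register_full_backward_hook(guided_relu_hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    return handles  # call h.remove() on each to restore normal backprop
```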
- Integrated Gradients (IG)
- accumulates gradients along a straight-line path from a non-informative reference point $\mathbf{x}^0$ to the actual input $\mathbf{x}$
$$
\mathbf{e} = (\mathbf{x}_i-\mathbf{x}^0_i) \times \sum^{k}_{j=1} \frac{\partial f_w\left(\mathbf{x}^0 + \frac{j}{k}(\mathbf{x}-\mathbf{x}^0)\right)}{\partial \mathbf{x}_i} \times \frac{1}{k}
$$
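A minimal PyTorch sketch of the Riemann-sum approximation above (the interface and the default number of steps $k$ are illustrative):

```python
import torch

def integrated_gradients(model, x, x0, class_idx, k=50):
    # Accumulate gradients at k points along the straight line
    # from the baseline x0 to the input x.
    total = torch.zeros_like(x)
    for j in range(1, k + 1):
        xj = (x0 + (j / k) * (x - x0)).requires_grad_(True)
        logits = model(xj)
        total += torch.autograd.grad(logits[0, class_idx], xj)[0]
    return (x - x0) * total / k   # elementwise, per the formula above
```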
Ensembling methods
- Classic SmoothGrad (SG)
- averages a set of $J$ noisy estimates of feature importance, each computed on the input perturbed with Gaussian noise $\eta$ (a combined sketch of all three ensembling variants follows this list)
$$
\mathbf{e} = \frac{1}{J}\sum^J_{i=1} g_i(\mathbf{x} + \eta, A^l_n)
$$
- SmoothGrad^2 (SG-SQ)
- squares each noisy estimate $g_i$ before averaging
$$
\mathbf{e} = \frac{1}{J}\sum^J_{i=1} g_i(\mathbf{x}+\eta, A^l_n)^2
$$
- VarGrad (Var)
- computes the variance of the noisy estimates rather than their mean
$$
\mathbf{e} = \operatorname{Var}\left(g_i(\mathbf{x}+\eta, A^l_n)\right)
$$
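A minimal PyTorch sketch covering all three ensembling variants; `grad_fn` stands for any base estimator (e.g. the `grad_saliency` sketch above), and the defaults for `J` and `sigma` are illustrative choices, not the paper's settings:

```python
import torch

def noise_ensemble(grad_fn, x, J=15, sigma=0.15, mode="sg"):
    # Noise-ensemble wrapper around a base estimator grad_fn(x).
    # mode: "sg" (mean), "sg_sq" (mean of squares), "var" (variance).
    samples = []
    for _ in range(J):
        noise = sigma * torch.randn_like(x)
        samples.append(grad_fn(x + noise))
    stack = torch.stack(samples)        # shape: (J, *x.shape)
    if mode == "sg":
        return stack.mean(dim=0)        # SmoothGrad
    if mode == "sg_sq":
        return stack.pow(2).mean(dim=0) # SmoothGrad^2
    return stack.var(dim=0)             # VarGrad
```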
Control variants