General Information

Interpretability of machine learning is crucial for so-called conservative domains such as medicine or credit scoring. That is why, in many cases, more sophisticated models that provide more accurate predictions are rejected in favor of less accurate but more interpretable solutions. If we cannot clearly explain why the model returns one response or another, we cannot trust the model and use it as the basis for decision-making.

One of the potential solutions to this issue is to apply modern approaches to explaining complex models that have solid theoretical foundations and practical implementations. Explanation methods for complex models can be divided into two big groups - global and local explanations.

A global explanation describes the model in general, in terms of the features that impact the model the most - for instance, Feature Importance, which is available in the Model Performance dashboard.

A local explanation, in turn, aims to identify how the different input variables, or features, influenced a specific prediction or model response.

The most prominent examples of local-explanation methods are model-agnostic approaches such as Explanation Vectors, LIME (Local Interpretable Model-agnostic Explanations), and Shapley values.

The Shapley values approach builds on concepts from cooperative game theory - a method originally devised for assigning payouts to players depending on their contribution to the total payout. In the context of model explanation, the features are treated as players, and the prediction is the total payout.

Shapley values explain the difference between a specific prediction and the global average prediction: the sum of the Shapley values over all features equals the difference between this particular prediction (the probability of event A under these particular conditions) and the mean probability of event A. The LIME approach, by contrast, explains the impact of specific features on the model response in comparison with the model responses for similar conditions.

For the credit scoring case, when the model predicts the probability of credit default, the sum of the Shapley values equals the difference between the predicted credit default probability and the global average credit default probability. In contrast, the sum of the LIME values is the difference between the predicted credit default probability and the average credit default probability for persons similar to the one being analyzed.
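As a purely illustrative calculation with assumed numbers (not taken from any real scoring model): if the model predicts a default probability of 0.30 for a particular applicant while the average predicted default probability is 0.12, the Shapley values of that applicant's features must sum to

$$ \sum_{j}\phi_j = 0.30 - 0.12 = 0.18 $$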

In the study A Unified Approach to Interpreting Model Predictions, the authors showed that Shapley values have a much stronger agreement with human explanations than LIME, so this approach is preferable as the basis of a universal solution for explaining modeling results.

SHAP for model explanation

The Shapley value is defined via a value function $v$ that assigns a payout to every coalition $S$ of players. The Shapley value of a player $i$ reflects its contribution to the payout, weighted and summed over all possible feature value combinations:

$$ \phi_i(v)=\sum_{S\subseteq\{1,\ldots,N\}\setminus\{i\}}\frac{|S|!\,(N-|S|-1)!}{N!}\bigl(v(S\cup\{i\})-v(S)\bigr) $$

where $S$ is a subset of the features used in the model and $N$ is the number of features.
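For a small number of features, this formula can be evaluated directly by brute force. Below is a minimal Python sketch, assuming a hypothetical value function v(S) that returns the payout of a coalition of features; real SHAP implementations use far more efficient approximations (for example, Kernel SHAP or Tree SHAP).

```python
from itertools import combinations
from math import factorial

def shapley_value(i, features, v):
    """Shapley value of feature i: weighted average of its marginal
    contributions v(S + {i}) - v(S) over all coalitions S that exclude i."""
    n = len(features)
    others = [f for f in features if f != i]
    phi = 0.0
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

# Hypothetical value function: the payout of each coalition of two features.
payouts = {frozenset(): 0.0, frozenset({"a"}): 10.0,
           frozenset({"b"}): 20.0, frozenset({"a", "b"}): 50.0}

def v(S):
    return payouts[frozenset(S)]

features = ["a", "b"]
print({f: shapley_value(f, features, v) for f in features})  # {'a': 20.0, 'b': 30.0}
# The values sum to v({a, b}) - v({}) = 50.0, the total payout to distribute.
```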

The goal of SHAP is to explain a specific prediction (for an instance $x$) by computing the individual contribution of each feature to that prediction. Shapley values show how the prediction is distributed among the features, and the explanation is specified as:

$$ g(z')=\phi_0+\sum_{j=1}^{M}\phi_j z'_j $$

where $g$ is the explanation model, $z'\in\{0,1\}^M$ is the coalition vector, $M$ is the maximum coalition size, and $\phi_j\in\mathbb{R}$ is the feature attribution (Shapley value) for feature $j$. The coalition vector is a binary vector in which an entry of 1 means that the corresponding feature value is "present" and 0 means that it is "absent".
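As a small illustration of the additive form of $g$, the sketch below evaluates the explanation model for different coalition vectors; the attribution values here are made up for illustration, not the output of a real explainer.

```python
import numpy as np

phi_0 = 0.32                                  # base value: average model response
phi = np.array([0.15, -0.08, 0.04, 0.21])     # illustrative Shapley values, M = 4

def g(z_prime):
    """Additive explanation model: base value plus the attributions
    of the features marked as 'present' in the coalition vector."""
    return phi_0 + np.dot(phi, np.asarray(z_prime))

print(g([1, 1, 1, 1]))   # all features present -> reproduces the explained prediction
print(g([1, 0, 0, 1]))   # only the first and the last feature present
```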

The sum of the SHAP values represents the difference between the model response for the analyzed case and the average model response. The latter is reported as the "base value". For instance, suppose we are solving the "Titanic Survival" problem and obtain the following results:

$$ \def\arraystretch{1.4}\begin{array}{|l|l|l|l|l|}\hline\textsf{\textbf{Feature}} & \textsf{\textbf{Example 1 Values}} & \textsf{\textbf{Example 1 SHAP}} & \textsf{\textbf{Example 2 Values}} & \textsf{\textbf{Example 2 SHAP}}\\\hline\textsf{pclass} & \textsf{1.00} & \textsf{63.50} & \textsf{3.00} & \textsf{-50.53}\\\hline\textsf{age} & \textsf{29.00} & \textsf{14.12} & \textsf{29.00} & \textsf{6.28}\\\hline\textsf{sibsp} & \textsf{0.00} & \textsf{5.84} & \textsf{0.00} & \textsf{6.60}\\\hline\textsf{parch} & \textsf{0.00} & \textsf{-3.81} & \textsf{0.00} & \textsf{-4.18}\\\hline\textsf{fare} & \textsf{211.34} & \textsf{90.70} & \textsf{211.34} & \textsf{68.76}\\\hline\textsf{P(Survived)} & \textsf{1.00} & \textsf{} & \textsf{0.00} & \textsf{}\\\hline\textsf{SUM(SHAP)} & \textsf{} & \textsf{170.35} & \textsf{} & \textsf{26.93}\\\hline\end{array} $$

The first example corresponds to a positive response of the model (the predicted probability is 1.0), while the second one corresponds to a negative response. The SHAP values allow us to understand why the model reacts so differently while the feature values are mostly the same.
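The sketch below shows one way to reproduce such a check with the open-source shap package. The synthetic data, the model choice, and the column names are assumptions for illustration only, and the exact shapes returned by the explainer can differ between shap versions and model types.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the Titanic features used in the table above.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "pclass": rng.integers(1, 4, 500),
    "age": rng.uniform(1, 80, 500),
    "sibsp": rng.integers(0, 4, 500),
    "parch": rng.integers(0, 3, 500),
    "fare": rng.uniform(5, 300, 500),
})
y = (X["pclass"] == 1).astype(int)  # toy target, only to have something to fit

model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)      # exact Tree SHAP for tree ensembles
shap_values = explainer.shap_values(X)     # one row of attributions per passenger
base_value = np.ravel(explainer.expected_value)[0]   # the "base value"

# Local accuracy: base value + sum of the SHAP values reproduces each prediction
# in the model's raw (log-odds) output space.
raw_prediction = model.decision_function(X)
print(np.allclose(base_value + shap_values.sum(axis=1), raw_prediction))
```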