Scoring models Interpretability — Explainable AI applied to churn prediction

10 min read · Nov 2, 2023

Introduction

In the dynamic landscape of marketing, the ability to communicate with precision to the right client at the right moment is a strategic imperative for any company. This personalized approach encourages conversion, engagement, and resonance. Achieving it demands a multi-faceted strategy, including data analysis to identify optimal touchpoints, customer segmentation for tailored messaging, and predictive analytics to anticipate needs. Interpretable models play a pivotal role in this process. By providing transparent insights into the factors driving predictions, these models empower marketers to refine strategies in alignment with audience preferences. To this end, Explainable Artificial Intelligence (XAI) methods help obtain a clear interpretation of prediction results.

This article explores several methods designed to improve the interpretability of machine learning models: built-in feature importance, permutation importance, Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), SHAPASH, and InterpretML with Explainable Boosting Machine (EBM) models.

We start by describing each technique, with its strengths and limitations, and close that part with a summary comparison of all methods. Then, we apply these methods to our use case with detailed code, graphical output, and interpretation.

A few reminders about scoring and client knowledge

In marketing, understanding clients is crucial for effective decision-making. This knowledge informs targeted campaigns and personalized experiences. Scoring, a common method, assigns each lead or customer a score based on defined criteria, helping predict different kinds of actions, such as churn. Churn scoring relies on data-driven techniques that assign each customer a probability of attrition. Our first article discusses developing a scoring model for precise churn prediction.

I — Interpretability methods

1. Built-in feature importance

Overview

To determine whether a feature is relevant in predicting the target, XGBoost embeds a feature importance function. This is useful for feature selection, interpretation, and assessing the influence of features on predictions. Importance values are derived from how each feature is used in the splits of the boosted trees. Multiple importance types exist:

Gain: The average improvement in the training objective brought by the splits that use the feature.
Weight: The number of times the feature is used to split the data across all trees.
Cover: The average number of observations affected by the splits that use the feature.

These types measure different things and can rank features differently, so always state which type is being reported to avoid misunderstandings.

Pros and cons

The choice of importance type depends on the goal. Gain is the most relevant when the focus is on each feature's contribution to predictive performance. For a fuller picture, compare the scores from several types to better understand feature behavior. For more information about built-in feature importance with XGBoost, we recommend this Medium article.

2. Permutation importance

Overview

Permutation importance gauges feature significance in machine learning by shuffling feature values and observing the resulting impact on model performance, revealing the feature’s contribution to the relationship with the target.

Here is the permutation importance procedure:

Train model with original data, noting performance metric.
Shuffle a feature’s values while preserving target.
Predict with shuffled data, measure performance drop, which indicates feature importance.
Repeat for all features.
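
The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn implementation: the toy data, the fixed `model` function (standing in for an already trained model), and the helper names `mse` and `permutation_importance_sketch` are all assumptions made for the example.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error, used here as the performance metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def model(row):
    """Hypothetical 'trained' model: it relies on feature 0 and ignores feature 1."""
    return 3.0 * row[0]

def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = mse(y, [predict(row) for row in X_shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy data: the target depends only on feature 0.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * row[0] for row in X]

importances = permutation_importance_sketch(model, X, y)
print(importances)
```

Shuffling feature 0 degrades the error, while shuffling feature 1 leaves it unchanged because the model never reads it. Repeating the shuffle several times and averaging reduces the noise of a single permutation.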

Pros and cons

3. LIME

Overview

LIME (Local Interpretable Model-agnostic Explanations) explains the predictions of complex machine learning models locally. It approximates the complex model around a given instance with a simpler “local” model and uses it to explain why a particular prediction was made. It was introduced in 2016 and is available as a Python library.

LIME’s explanation process involves the following steps:

Pick an instance for explanation.
Modify its features slightly to create a dataset of similar, perturbed instances.
Use the original model to predict the perturbed instances.
Assign weights to the perturbed instances based on their proximity to the original instance.
Train a simpler, interpretable model locally on the weighted perturbed instances.
Evaluate feature importance through the coefficients of the interpretable model.
Explain the complex model's prediction for the chosen instance, showcasing influential features and their effects in context.
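
To make these steps concrete, here is a simplified sketch of the idea using NumPy. It is not the `lime` library's implementation, which also handles discretization, sampling from training statistics, and feature selection. The `black_box` function, the Gaussian perturbation, the exponential kernel, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def black_box(X):
    """Hypothetical complex model: returns a churn-like probability for each row."""
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))

def lime_sketch(predict, instance, n_samples=500, scale=0.5, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear surrogate around `instance`; return its coefficients."""
    rng = np.random.default_rng(seed)
    # Perturb the instance to build a local neighbourhood.
    neighbours = instance + rng.normal(0.0, scale, size=(n_samples, instance.size))
    # Query the original model on the perturbed points.
    predictions = predict(neighbours)
    # Weight each perturbed point by its proximity to the instance.
    distances = np.linalg.norm(neighbours - instance, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Fit a weighted linear model (intercept + one coefficient per feature).
    design = np.column_stack([np.ones(n_samples), neighbours - instance])
    sqrt_w = np.sqrt(weights)
    solution, *_ = np.linalg.lstsq(design * sqrt_w[:, None], predictions * sqrt_w, rcond=None)
    return solution[1:]  # drop the intercept, keep the per-feature coefficients

instance = np.array([0.0, 0.0])
coefficients = lime_sketch(black_box, instance)
print(coefficients)
```

The sign of each coefficient indicates whether the feature pushes the local prediction up or down, and its magnitude indicates how strongly. Here the first feature has a positive and larger effect than the second, as expected from the toy model.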

Pros and cons

4. SHAP

Overview

SHAP (SHapley Additive exPlanations) uses the Shapley value from cooperative game theory to explain machine learning predictions. The Shapley value provides a fair way of distributing the “worth” of a coalition of players in a game. SHAP explanations aim to assign a contribution to each feature for a given prediction in a way that is both intuitive and mathematically sound. SHAP was introduced in 2017, and the SHAP library is available for various programming languages, including Python and R.

Focus: Cooperative game theory

“Cooperative game theory” examines collaboration among players with shared goals, distributing rewards fairly based on contributions. SHAP employs the “Shapley value,” a cooperative game theory concept, for feature attribution. It ensures just allocation of coalition rewards. Introduced by Lloyd Shapley, it calculates individual impacts within coalitions and is vital in explaining complex model predictions by attributing feature contributions.

Now, let’s draw the connection between cooperative game theory and SHAP explanations:

Features as Players: Each instance’s prediction is a “game.” Features are players. A coalition is a subset of collaborating features for a prediction.
Contribution: Features affect prediction. Shapley value averages marginal impact of each player on outcomes. SHAP calculates feature’s average contribution to prediction across subsets.
Fair Distribution: Shapley fairly distributes coalition’s worth among players based on contribution. SHAP assigns worth (model’s prediction) among features to explain contributions.
Individual and Group Contributions: Both approaches respect individual importance and coalition dynamics.
Consistency and Intuition: Just as the Shapley value matches our intuition about each player's contribution, SHAP attributes value to features consistently and intuitively, enhancing interpretability.
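
The analogy can be illustrated with an exact Shapley computation on a tiny example. This is a brute-force sketch that enumerates every coalition, which is only feasible for a handful of features; the SHAP library relies on much more efficient algorithms. The toy `model`, the `instance`, and the choice of replacing absent features with a zero `baseline` are assumptions made for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_players):
    """Exact Shapley values for a coalition value function over players 0..n_players-1."""
    players = range(n_players)
    result = []
    for i in players:
        others = [p for p in players if p != i]
        phi = 0.0
        for size in range(len(others) + 1):
            for coalition in combinations(others, size):
                # Weight of this coalition in the Shapley average.
                weight = factorial(size) * factorial(n_players - size - 1) / factorial(n_players)
                # Marginal contribution of player i when joining the coalition.
                marginal = value(set(coalition) | {i}) - value(set(coalition))
                phi += weight * marginal
        result.append(phi)
    return result

# Hypothetical model and instance: features outside the coalition take a baseline value.
instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]

def model(x):
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]

def coalition_value(coalition):
    x = [instance[j] if j in coalition else baseline[j] for j in range(len(instance))]
    return model(x)

phi = shapley_values(coalition_value, n_players=3)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the "fair distribution" property. The interaction term `x[0] * x[2]` is split equally between the two features involved.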

Pros and cons

5. SHAPASH

Overview

SHAPASH, a Python library, simplifies the interpretability of machine learning models for tabular data. Built on the SHAP framework, it attributes feature contributions to predictions. With a user-friendly interface, SHAPASH offers interactive visual explanations.

SHAPASH interpretability steps are:

Load data into Pandas DataFrame.
Train machine learning model on your dataset.
Use SmartExplainer for explanations.
Explain predictions, visualize feature impacts using interactive, customizable visualizations.

SHAPASH extends beyond single predictions, analyzing feature importance and interactions globally. It also supports deployment and sharing of explanations, helping transparency and comprehension in model predictions.

SHAPASH explanation steps (from SHAPASH documentation)

Pros and cons

6. InterpretML/EBM

InterpretML is a Python library offering a unified interface for model interpretation. It includes techniques like the Explainable Boosting Machine (EBM), a transparent model built on trees and cyclic gradient boosting, with automatic interaction detection. EBM aims for accuracy comparable to Random Forest and Boosted Trees while remaining interpretable. Although often slightly less accurate than XGBoost, EBM excels in intelligibility because it is a Generalized Additive Model (GAM).
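
For reference, the additive structure of a GAM with pairwise interactions is commonly written as below. The notation is chosen here for illustration and does not come from the article.

```latex
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
```

Here $g$ is the link function (the logit for a classifier), $\beta_0$ is the intercept, each $f_j$ is a function learned for a single feature, and each $f_{ij}$ is a learned pairwise interaction term. Because every $f_j$ depends on one feature only, its contribution to a prediction can be plotted and read directly.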

Here’s how interpretability with InterpretML and EBM models works:

Load data, train complex model (XGBoost in our case).
Create an EBM model using the “ExplainableBoostingClassifier” class and train it on the same data. It serves as an interpretable counterpart to the complex model.
Use the EBM's explanation functions to interpret predictions, offering global and local insights.
Visualize explanations via InterpretML's tools, showcasing individual feature contributions to model predictions.

Pros and cons

Final comparison of methods

II — Application to our use case

1. Built-in feature importance

In our case, the key variables differ depending on the type of feature importance considered:

Code

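import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

importance_types = ["gain", "cover", "weight"]
colors = palette  # palette: list of colors, assumed to be defined earlier in the notebook
sns.set(style="darkgrid")
for f, color in zip(importance_types, colors):
    # xgb is the fitted XGBClassifier; get_score returns {feature_name: importance}
    f_importance = xgb.get_booster().get_score(importance_type=f)
    importance_df = pd.DataFrame.from_dict(data=f_importance, orient="index").sort_values(by=[0], ascending=False)
    # Plot the nine most important features, most important at the top
    importance_df[0:9].plot(kind="barh", color=color, legend=None, grid=False, figsize=(8, 6)).invert_yaxis()
    plt.title("XGBClassifier Features Importance | " + f)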

2. Permutation importance

We use the permutation_importance function from scikit-learn's inspection module.

Code

from sklearn.inspection import permutation_importance
perm_importance = permutation_importance(xgb, X_test, y_test)
sorted_idx = perm_importance.importances_mean.argsort()
plt.barh(X_test.columns[sorted_idx][-9:], perm_importance.importances_mean[sorted_idx][-9:], color = "#1DE9B6")
plt.title("XGBClassifier Features Importance")
plt.xlabel("Permutation Importance")

Once again, the autorenew_not_cancel feature is the most important. This corroborates the impact of this feature on the model results.

3. LIME

Given a classified individual, LIME applies the process described in the first part to determine the importance of the different features in the classification.

For our use case, we consider the case where the client is classified as churn and the autorenew_not_cancel feature is 0. To keep only these clients, we create the X_churn_with_renew_0 dataset, which contains only the clients from the X_test dataset that satisfy both conditions. This dataset is also used for the following methods of the article.
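
The article does not show how this subset is built. One possible way is sketched below with pandas, under stated assumptions: `StubModel` is a hypothetical stand-in for the trained XGBoost classifier, and the toy `X_test` with its `active_days` column is invented for the example. Only `autorenew_not_cancel` and `X_churn_with_renew_0` are names taken from the article.

```python
import pandas as pd

class StubModel:
    """Hypothetical stand-in for the trained classifier (the article uses XGBoost)."""
    def predict(self, X):
        # Toy rule: predict churn (1) when the client has few active days.
        return (X["active_days"] < 10).astype(int).to_numpy()

# Toy test set; only `autorenew_not_cancel` is a column name from the article.
X_test = pd.DataFrame({
    "autorenew_not_cancel": [0, 1, 0, 0],
    "active_days": [3, 2, 25, 7],
})
model = StubModel()

# Keep clients that are predicted as churn AND have autorenew_not_cancel equal to 0.
predicted_churn = model.predict(X_test) == 1
no_autorenew = (X_test["autorenew_not_cancel"] == 0).to_numpy()
X_churn_with_renew_0 = X_test[predicted_churn & no_autorenew]
print(X_churn_with_renew_0)
```

The original index of `X_test` is preserved in the subset, which makes it possible to look up the matching labels or explanations for the same clients later.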

The results are the following:

Code

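from lime.lime_tabular import LimeTabularExplainer

# Assumed setup: the article does not show how `explainer` was created
explainer = LimeTabularExplainer(
    X_train.to_numpy(),
    feature_names=list(X_train.columns),
    mode="classification"
)
X_churn_with_renew_0_n = X_churn_with_renew_0.to_numpy()
# Explain the first selected client with the ten most influential features
exp1 = explainer.explain_instance(
    data_row=X_churn_with_renew_0_n[0],
    predict_fn=xgb.predict_proba,
    num_features=10
)
exp1.show_in_notebook(show_table=True)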

The prediction probabilities displayed by LIME are the class probabilities returned by the model. In this case, the model assigns a 95% probability to churn (is_churn=1). On the right, the graph shows the weights of the features' contributions to this prediction. The feature autorenew_not_cancel has a weight of 0.38.

4. SHAP

SHAP provides different types of explainers tailored to various machine learning models and scenarios. Since our model is XGBoost, we use TreeExplainer, which is designed for tree-based models. Since SHAP provides both global and local interpretability, we study both cases.

Global interpretability

To visualize the Shapley values for our use case, we use a beeswarm summary plot. This graph combines the variables' importance and their effects. Each point on the graph is a Shapley value for a variable and a data item. The color represents the value of the variable, from low (blue) to high (red).

For our case, the plot is the following:

Code

import shap
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)
plt.grid(False)
shap.summary_plot(shap_values, X_test)

These results can be interpreted as follows:

No subscription cancellation implies a lower churn likelihood (since high values of autorenew_not_cancel have a negative impact on churn probability).
A higher average subscription duration implies a higher churn chance (since high values of mean_membership_duration have a positive impact on churn probability).
autorenew_not_cancel greatly influences the model's decision.

Local interpretability

SHAP can also provide an explanation for a particular prediction. For this task, several plots are available to visualize the impact of different features on the prediction.

Considering the case where the client is classified as churn and the autorenew_not_cancel feature is 0, we can observe the impact of the features with a SHAP force plot:

Code

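shap.initjs()
shap_values = explainer.shap_values(X_churn_with_renew_0)
# force_plot needs the base value, the SHAP values and the feature values of the instance
shap.force_plot(explainer.expected_value, shap_values[0], X_churn_with_renew_0.iloc[0])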

In the plot, the model's output score (3.19, shown in bold) determines the prediction: a low score leads to class 0, a high score to class 1. The most influential features are shown in red when they push the score up and in blue when they push it down. The closer a feature is to the boundary between red and blue, the larger its impact, which is also reflected in the size of its bar. Here, the client's churn classification is driven by the red variables (autorenew_not_cancel, activity_level_february, mean_active_march).
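
To relate this score to a probability, the sketch below assumes the score is expressed in log-odds, which is the default output of TreeExplainer for an XGBoost binary classifier. The helper name `log_odds_to_probability` is introduced here for illustration only.

```python
import math

def log_odds_to_probability(score):
    """Convert a raw log-odds score into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))

raw_score = 3.19  # value shown in bold on the force plot
churn_probability = log_odds_to_probability(raw_score)
print(round(churn_probability, 2))  # 0.96
```

Under this assumption, a score of 0 corresponds to a probability of 0.5, so positive scores lean toward churn and negative scores toward non-churn.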

The results can be interpreted as follows:

Longer-than-average subscriptions increase the chances of churn (mean_membership_duration)
Non-renewal of subscriptions increases the chances of churn (autorenew_not_cancel)

5. SHAPASH

As mentioned, SHAPASH provides both local and global interpretability. For global interpretability, the results are the following:

Code

# Install first from a shell: pip install shapash
from shapash import SmartExplainer

xpl = SmartExplainer(model=xgb, backend='acv')
xpl.compile(x=X_test, y_target=y_test)
xpl.plot.features_importance()

Once again, the autorenew_not_cancel feature is the one with the highest contribution. Since SHAPASH builds on Shapley-value contributions, the results are very similar to those obtained with SHAP.

For local interpretability, let’s consider once again the case where a client is classified as churn, knowing that the feature autorenew_not_cancel is 0. We observe the following results from SHAPASH:

Code

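# acv_backend and feature_dico are not defined in this excerpt: they are assumed to be
# a SHAPASH backend and a dict mapping column names to readable labels
xpl = SmartExplainer(
    model=xgb,
    backend=acv_backend,
    features_dict=feature_dico
)
xpl.compile(x=X_churn_with_renew_0)
idx = X_churn_with_renew_0.index
xpl.plot.local_plot(index=idx[1])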

This graph represents the contributions of the different features to this prediction, with positive contributions shown in yellow and negative contributions in blue. As in the previous examples, the mean_membership_duration and autorenew_not_cancel features have the biggest contributions.

6. InterpretML/EBM

InterpretML with EBM models provides global and local interpretability. For global interpretability, these are the obtained results:

Code

# Install first from a shell: pip install interpret
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier(random_state=seed)  # seed: integer assumed to be defined elsewhere
ebm.fit(X_train, y_train)
ebm_global = ebm.explain_global()
show(ebm_global)

In this case too, the trend remains the same: the features with the greatest importance are similar to those identified by the other methods.

For local interpretability, we consider the same case with a client classified as churn. The results are the following:

Code

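# True labels of the selected clients (assumes y_test is a pandas Series sharing X_test's index)
y_churn_with_renew_0 = y_test.loc[X_churn_with_renew_0.index]
ebm_local1 = ebm.explain_local(X_churn_with_renew_0, y_churn_with_renew_0, name='Case 1')
show(ebm_local1)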

The graph shows the contribution of different features in the prediction, with features having a positive contribution in orange, and features having a negative contribution in blue.

The results can be interpreted as follows:

Longer-than-average subscriptions reduce the chances of churn (mean_membership_duration)
Non-renewal of subscriptions increases the chances of churn (autorenew_not_cancel)
The amount paid relative to the average reduces the chances of churn (mean_amount_paid)

Conclusion

In summary, this article highlights various methods in the pursuit of making artificial intelligence more transparent and accountable. Built-in feature importance and permutation importance offer quick insights but may be less efficient with complex models. LIME provides model-agnostic interpretability, while SHAP uncovers global feature contributions with a strong theoretical foundation. SHAPASH adds interactivity for real-time exploration, and InterpretML with EBM models offers a principled framework. Each method comes with unique advantages and trade-offs, supporting diverse scenarios. This evolving field equips practitioners to balance prediction accuracy and transparency for ethical and responsible AI development.


Written by Heka.ai