Scoring Model Interpretability: Explainable AI Applied to Churn Prediction
Introduction
In the dynamic landscape of marketing, the ability to communicate precisely with the right client at the right moment is a strategic imperative for any company. This personalized approach encourages conversion, engagement, and resonance. Meeting these demands requires a multi-faceted strategy, including data analysis to identify optimal touchpoints, customer segmentation for tailored messaging, and predictive analytics to anticipate needs. Interpretable models play a pivotal role in this process: by providing transparent insights into the factors driving predictions, they empower marketers to refine strategies in line with audience preferences. Explainable Artificial Intelligence (XAI) methods serve this goal by providing clear interpretations of our prediction results.
This article explores various methods designed to improve the interpretability of machine learning models: built-in feature importance, permutation importance, Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), SHAPASH, and InterpretML with Explainable Boosting Machines (EBM).
We start by presenting each technique with its strengths and limitations, and close this first part with a summary comparison of all methods. We then apply these methods to our use case, with detailed code, graphical outputs, and interpretation.
A Few Reminders about Scoring and Client Knowledge
In marketing, understanding clients is crucial for effective decision-making. This knowledge informs targeted campaigns and personalized experiences. Scoring, a common method, quantifies leads or customers based on criteria, helping predict different kinds of actions, such as churn. Churn prediction scoring involves data-driven techniques assigning probabilities to anticipate customer attrition. Our first article discusses developing a scoring model for precise churn prediction.
I — Interpretability methods
1. Built-in feature importance
Overview
XGBoost provides a built-in feature importance function that shows whether a feature is relevant for predicting the target. It is useful for feature selection, for interpretation, and for assessing how features influence predictions. The scores are computed from the splits that use each feature across all boosted trees. Several importance types exist:
- Gain: the average gain (reduction of the training loss) of the splits that use the feature. It indicates how much the feature improves predictions.
- Weight: the number of times the feature is used to split the data across all trees.
- Cover: the average coverage of the splits that use the feature, i.e. how many observations those splits affect, weighted by the second-order gradient.
These types measure different things and can rank features differently, so keep their differences in mind to avoid misinterpretation. XGBoost's built-in function gives access to all of them, and each offers a different view of feature importance.
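The following minimal pure-Python sketch makes the three definitions concrete. It aggregates split records (feature, gain of the split, cover of the split) into weight, gain and cover scores, in the same way XGBoost summarizes splits for get_score. The split records are made up for illustration. In a real model they would come from every split node of every tree. For squared-error loss the cover of a split is simply the number of samples reaching it.

```python
from collections import defaultdict

# Hypothetical split records: (feature, gain of the split, cover of the split).
splits = [
    ("autorenew_not_cancel", 120.0, 900.0),
    ("autorenew_not_cancel", 80.0, 500.0),
    ("mean_membership_duration", 30.0, 700.0),
    ("mean_membership_duration", 10.0, 100.0),
    ("mean_membership_duration", 20.0, 400.0),
]


def importance_scores(splits):
    """Aggregate split records into XGBoost-style importance types."""
    count = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        count[feature] += 1
        total_gain[feature] += gain
        total_cover[feature] += cover
    return {
        # weight: number of splits that use the feature
        "weight": dict(count),
        # gain: average loss reduction per split using the feature
        "gain": {f: total_gain[f] / count[f] for f in count},
        # cover: average cover per split using the feature
        "cover": {f: total_cover[f] / count[f] for f in count},
    }


scores = importance_scores(splits)
print(scores["weight"])  # mean_membership_duration is used in more splits
print(scores["gain"])    # autorenew_not_cancel has the higher average gain
```

In this toy case the types disagree. mean_membership_duration ranks first by weight, while autorenew_not_cancel ranks first by gain and cover. This is the kind of discrepancy the paragraph above warns about.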
Pros and cons
The right importance type depends on your goals. Gain is usually the most relevant type for assessing how much a feature contributes to prediction quality. For a fuller picture, compare the scores from several types to better understand each feature's behavior. For more information about built-in feature importance in XGBoost, we recommend this Medium article.
2. Permutation importance
Overview
Permutation importance measures how significant a feature is by shuffling its values and observing the resulting impact on model performance. Shuffling breaks the relationship between the feature and the target, so the resulting performance drop reveals how much the model relies on that feature.
Here is the permutation importance procedure:
- Train the model and record its performance metric on the original data.
- Shuffle one feature's values, leaving the target unchanged.
- Predict on the shuffled data and measure the performance drop, which indicates the feature's importance.
- Repeat for every feature.
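The procedure above can be sketched from scratch in a few lines of numpy. This is a minimal illustration and not scikit-learn's implementation. The toy data, the rule-based model standing in for a trained classifier, and the helper names are all hypothetical. Each feature is shuffled several times and the drops are averaged, which smooths out the randomness of a single shuffle.

```python
import numpy as np


def permutation_importance_sketch(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean drop in `metric` when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))  # reference score on the original data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):  # repeat for every feature
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column only; the target y is left untouched
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))


def model(X):
    """Hypothetical stand-in for a trained classifier: uses column 0 only."""
    return (X[:, 0] > 0).astype(int)


# Toy data: the label depends only on column 0; column 1 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)

imp = permutation_importance_sketch(model, X, y, accuracy)
print(imp)  # large drop for column 0, no drop at all for column 1
```

Because the toy model ignores column 1, shuffling that column cannot change any prediction, so its importance is exactly zero.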
Pros and cons
3. LIME
Overview
LIME (Local Interpretable Model-agnostic Explanations) explains the predictions of complex machine learning models locally. Around a given prediction, it approximates the model with a simple, interpretable “local” model, which explains why that particular prediction was made. It was introduced in 2016 and is available as a Python library.
LIME’s explanation process involves the following steps:
- Pick an instance to explain.
- Perturb its features slightly to create a dataset of similar instances.
- Use the original model to predict the perturbed instances.
- Weight the perturbed instances by their proximity to the chosen instance.
- Train a simple, interpretable model (e.g. a linear model) on these weighted instances, and read feature importance from its coefficients.
- Explain the complex model's prediction for the chosen instance from this interpretation, showing the influential features and their effects in context.
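The steps above can be sketched with numpy as follows. This is a simplified illustration, not the lime package's algorithm. The lime package samples perturbations from training-data statistics, discretizes continuous features by default, and fits a ridge model. This sketch uses Gaussian noise around the instance and plain weighted least squares instead. The black-box function and helper names are hypothetical. The proximity kernel has the same form as lime's default exponential kernel.

```python
import numpy as np


def lime_sketch(predict_proba, x, n_samples=5000, noise=1.0, kernel_width=None, seed=0):
    """Local linear explanation of predict_proba(...)[:, 1] around instance x."""
    rng = np.random.default_rng(seed)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(x.size)  # same default width formula as lime
    # Perturb the chosen instance with Gaussian noise to create similar instances
    Z = x + rng.normal(scale=noise, size=(n_samples, x.size))
    # Query the black-box model on the perturbed instances
    p = predict_proba(Z)[:, 1]
    # Weight each perturbed instance by its proximity to x (exponential kernel)
    d = np.linalg.norm(Z - x, axis=1)
    w = np.sqrt(np.exp(-(d ** 2) / kernel_width ** 2))
    # Fit an interpretable weighted linear model (intercept + one slope per feature)
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], p * sw, rcond=None)
    return coef[1:]  # local feature weights; the intercept is dropped


def black_box(X):
    """Hypothetical classifier: churn probability is logistic in 3 features."""
    logit = 3.0 * X[:, 0] - 0.5 * X[:, 1] + 0.0 * X[:, 2]
    p1 = 1.0 / (1.0 + np.exp(-logit))
    return np.column_stack([1.0 - p1, p1])


weights = lime_sketch(black_box, np.zeros(3))
print(weights)  # positive for feature 0, smaller negative for feature 1, near 0 for feature 2
```

The signs and ranking of the local weights match the black box's true dependence on each feature. The magnitudes are smaller than the logit coefficients because the linear surrogate approximates probabilities, not log-odds.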
Pros and cons
4. SHAP
Overview
SHAP (SHapley Additive exPlanations) uses the Shapley value from cooperative game theory to explain machine learning predictions. The Shapley value provides a fair way of distributing the “worth” of a coalition of players in a game. SHAP explanations assign a contribution to each feature for a given prediction, in a way that is both intuitive and mathematically sound. SHAP was introduced in 2017. Its reference implementation is the Python shap library, and implementations also exist for other languages such as R.
Focus: Cooperative game theory
“Cooperative game theory” studies collaboration among players with shared goals, and how to distribute rewards fairly based on each player's contribution. SHAP relies on one of its concepts, the Shapley value, for feature attribution. Introduced by Lloyd Shapley, the Shapley value allocates a coalition's reward fairly by computing each player's individual impact within coalitions. This makes it well suited to attributing a complex model's prediction to its features.
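In formula form, consider a set of players N and a value function v that gives the worth of each coalition S. The Shapley value of player i is its weighted average marginal contribution over all coalitions that exclude it:

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
```

The weight |S|!(|N|-|S|-1)!/|N|! is the fraction of player orderings in which i joins right after the members of S. The values of all players sum to v(N) - v(∅), a property known as efficiency.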
Now, let’s draw the connection between cooperative game theory and SHAP explanations:
- Features as Players: Each instance’s prediction is a “game.” Features are players. A coalition is a subset of collaborating features for a prediction.
- Contribution: Features affect prediction. Shapley value averages marginal impact of each player on outcomes. SHAP calculates feature’s average contribution to prediction across subsets.
- Fair Distribution: Shapley fairly distributes coalition’s worth among players based on contribution. SHAP assigns worth (model’s prediction) among features to explain contributions.
- Individual and Group Contributions: Both account for each player's individual importance and for coalition dynamics.
- Consistency and Intuition: Just as the Shapley value allocates rewards among players in a principled way, SHAP attributes value to features consistently and intuitively, enhancing interpretability.
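To make the analogy concrete, here is a brute-force sketch that computes exact Shapley values by enumerating every coalition. The "game" is a model prediction, and the worth of a coalition S is the average model output when the features in S take the explained client's values and the other features keep background values. This is the so-called interventional definition, one common choice among several. The toy linear model, data and helper names are hypothetical. Enumerating all coalitions grows exponentially with the number of features, which is why the shap library relies on specialized algorithms such as TreeExplainer for tree models.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(value, n):
    """Exact Shapley values of players 0..n-1 for a coalition value function."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                # Marginal contribution of player i when joining coalition S
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi


def model_value_function(f, x, background):
    """v(S): mean model output when features in S take x's values and the
    remaining features keep their background values."""
    def value(S):
        Z = background.copy()
        cols = sorted(S)
        if cols:
            Z[:, cols] = x[cols]
        return float(np.mean(f(Z)))
    return value


w = np.array([2.0, -1.0, 0.5])


def f(X):
    """Hypothetical linear model standing in for a trained scorer."""
    return X @ w


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
x = np.array([1.0, 2.0, -1.0])

phi = shapley_values(model_value_function(f, x, background), n=3)
print(phi)  # for a linear model: w * (x - background.mean(axis=0))
print(phi.sum(), f(x[None, :])[0] - f(background).mean())  # efficiency: both equal
```

The last line checks the "fair distribution" property. The feature contributions add up exactly to the gap between this prediction and the average prediction over the background data.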
Pros and cons
5. SHAPASH
Overview
SHAPASH is a Python library that simplifies the interpretation of machine learning models on tabular data. It relies on SHAP by default to attribute feature contributions to predictions, with other backends such as LIME and ACV also available. Through a user-friendly interface, SHAPASH offers interactive visual explanations.
SHAPASH interpretability steps are:
- Load the data into a pandas DataFrame.
- Train a machine learning model on the dataset.
- Create a SmartExplainer to compute explanations.
- Explain predictions and visualize feature impacts with interactive, customizable plots.
SHAPASH goes beyond single predictions by analyzing feature importance and interactions globally. It also supports deploying and sharing explanations, which fosters transparency and understanding of model predictions.
Pros and cons
6. InterpretML/EBM
InterpretML is a Python library offering a unified interface for model interpretation. It includes Explainable Boosting Machines (EBM), a glassbox model trained with cyclic gradient boosting on shallow trees, one feature at a time, with automatic detection of pairwise interactions. EBMs aim to approach the accuracy of methods such as Random Forest and boosted trees while remaining interpretable. They can be slightly less accurate than XGBoost, but they excel in intelligibility, since an EBM is a Generalized Additive Model (GAM).
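As a GAM, an EBM predicts through a sum of learned per-feature shape functions f_j, plus a limited number of pairwise interaction terms f_ij. The sum is passed through a link function g, which is the logit for a binary target such as churn:

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
```

Each term can be plotted on its own, so the model's global and local explanations read its actual structure rather than approximating it.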
Here’s how interpretability with InterpretML and EBM models works:
- Load the data and train the complex model (XGBoost in our case).
- Train an EBM with the “ExplainableBoostingClassifier” class on the same data, as an interpretable counterpart to the complex model.
- Use the EBM explanations to interpret predictions, both globally and locally.
- Visualize the explanations with InterpretML's tools, which show each feature's contribution to the model's predictions.
Pros and cons
Final comparison of methods
II — Application to our use case
1. Built-in feature importance
In our case, the key variables differ depending on the type of feature importance considered:
Code
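import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
# xgb is the XGBClassifier trained in the previous article
importance_types = ["gain", "cover", "weight"]  # no stray spaces: XGBoost rejects unknown types
colors = sns.color_palette(n_colors=len(importance_types))  # any 3-colour palette works
sns.set(style="darkgrid")
for imp_type, color in zip(importance_types, colors):
    # Dictionary {feature_name: score} for the chosen importance type
    f_importance = xgb.get_booster().get_score(importance_type=imp_type)
    importance_df = pd.DataFrame.from_dict(data=f_importance, orient="index").sort_values(by=[0], ascending=False)
    # Top 9 features, most important at the top
    importance_df[0:9].plot(kind="barh", color=color, legend=None, grid=False, figsize=(8, 6)).invert_yaxis()
    plt.title("XGBClassifier Features Importance | " + imp_type)
    plt.show()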
2. Permutation importance
We use the permutation_importance function from scikit-learn's sklearn.inspection module.
Code
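import matplotlib.pyplot as plt
from sklearn.inspection import permutation_importance

# Shuffle each feature of the test set several times and record the mean score drop
perm_importance = permutation_importance(xgb, X_test, y_test, random_state=0)
sorted_idx = perm_importance.importances_mean.argsort()  # ascending order
# Plot the 9 most important features (last 9 of the ascending order)
plt.barh(X_test.columns[sorted_idx][-9:], perm_importance.importances_mean[sorted_idx][-9:], color="#1DE9B6")
plt.title("XGBClassifier Features Importance")
plt.xlabel("Permutation Importance")
plt.show()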
Once again, the autorenew_not_cancel feature is the most important. This corroborates the impact of this feature on the model results.
3. LIME
For a given classified individual, LIME determines the importance of each feature in the classification with the process described in Part I.
For our use case, we consider clients classified as churn whose autorenew_not_cancel feature is 0. To select only these clients, we create the X_churn_with_renew_0 dataset, which contains the clients of the X_test dataset that satisfy these conditions. We reuse this dataset for the remaining methods in this article.
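The article does not show how X_churn_with_renew_0 is built. A minimal pandas sketch could look like the one below. It assumes that "classified as churn" means a predicted label of 1, for example from xgb.predict(X_test). If the true labels were meant instead, pass y_test in place of the predictions. The helper name and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def select_churn_with_renew_0(X, y_pred):
    """Rows predicted as churn (label 1) whose autorenew_not_cancel is 0."""
    y_pred = np.asarray(y_pred)
    mask = (y_pred == 1) & (X["autorenew_not_cancel"].to_numpy() == 0)
    return X[mask]  # keeps the original index, useful to look up labels later


# Toy usage with hypothetical values; in the article X would be X_test
X_demo = pd.DataFrame({
    "autorenew_not_cancel": [0, 1, 0, 0],
    "mean_membership_duration": [30.0, 12.0, 45.0, 8.0],
})
y_pred_demo = [1, 1, 0, 1]
print(select_churn_with_renew_0(X_demo, y_pred_demo).index.tolist())  # [0, 3]
```

Keeping the original index, rather than resetting it, makes it easy to retrieve the matching true labels from y_test later on.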
The results are the following:
Code
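from lime.lime_tabular import LimeTabularExplainer

# Explainer fitted on the training data, from which LIME samples perturbations
explainer = LimeTabularExplainer(X_train.to_numpy(), feature_names=list(X_train.columns), mode="classification")
X_churn_with_renew_0_n = X_churn_with_renew_0.to_numpy()
# Explain the first selected client using its 10 most influential features
exp1 = explainer.explain_instance(
    data_row=X_churn_with_renew_0_n[0],
    predict_fn=xgb.predict_proba,
    num_features=10,
)
exp1.show_in_notebook(show_table=True)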
The “Prediction probabilities” panel shows the probabilities predicted by the XGBoost model for this client. Here, the model gives a 95% probability of churn (is_churn=1). On the right, the chart shows the weight of each feature's contribution to this prediction in LIME's local model. The autorenew_not_cancel feature has a weight of 0.38.
4. SHAP
SHAP provides different types of explainers tailored to various machine learning models and scenarios. Since our model is XGBoost, we use TreeExplainer, which is designed for tree-based models. SHAP provides both global and local interpretability, so we study both.
Global interpretability
To visualize the Shapley values for our use case, we use a beeswarm summary plot. This graph combines the variables' importance and their effects. Each point on the graph is the Shapley value of one variable for one data item, and its horizontal position gives the impact on the model output. The color encodes the value of the variable, from low (blue) to high (red).
For our case, the plot is the following:
Code
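import matplotlib.pyplot as plt
import shap

# TreeExplainer computes exact SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(xgb)
shap_values = explainer.shap_values(X_test)  # one row per client, one column per feature
plt.grid(False)
shap.summary_plot(shap_values, X_test)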
These results can be interpreted as follows:
- No subscription cancellation implies a lower churn likelihood, since high values of autorenew_not_cancel push the churn probability down.
- A longer average subscription duration implies a higher churn likelihood, since high values of mean_membership_duration push the churn probability up.
Overall, autorenew_not_cancel greatly influences the model's decision.
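The vertical order of the beeswarm plot is itself a global importance ranking. summary_plot sorts features by the magnitude of their SHAP values over the dataset, which gives the same order as the mean absolute SHAP value. Here is a minimal numpy sketch of that ranking, which could be applied to the shap_values array computed above. The toy SHAP matrix and the helper name are hypothetical.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value (a global importance score)."""
    scores = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(scores)[::-1]  # descending importance
    return [(feature_names[k], float(scores[k])) for k in order]


# Hypothetical SHAP matrix: 3 clients x 3 features
toy_shap = np.array([
    [-2.0, 0.5, 0.1],
    [1.5, -0.4, 0.0],
    [-1.0, 0.3, -0.2],
])
names = ["autorenew_not_cancel", "mean_membership_duration", "mean_amount_paid"]
for name, score in rank_by_mean_abs_shap(toy_shap, names):
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging matters. A feature that strongly pushes some clients towards churn and others away from it would otherwise appear unimportant.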
Local interpretability
SHAP is also capable of providing explanation for a particular prediction. For this task, several plots are available to visualize the impact of different features on the prediction.
Considering again a client classified as churn whose autorenew_not_cancel feature is 0, we can observe the impact of each feature with a SHAP force plot:
Code
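shap.initjs()  # load the JavaScript needed for interactive plots in notebooks
shap_values = explainer.shap_values(X_churn_with_renew_0)
# Force plot for the first selected client: base value, its SHAP values, its feature values
shap.plots.force(explainer.expected_value, shap_values[0], X_churn_with_renew_0.iloc[0])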
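In the plot, the bold value (3.19) is the model's output for this client. A higher output predicts class 1 (churn), and a lower one predicts class 0. Starting from the base value, the average model output, each feature pushes the output higher (red) or lower (blue). The width of each bar indicates the strength of the feature's contribution, and the features closest to the boundary between red and blue have the largest impact. The client's churn classification is driven by the red features (autorenew_not_cancel, activity_level_february, mean_active_march).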
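The bold score is not a probability. For a binary XGBoost classifier, TreeExplainer's default output is the raw margin in log-odds, and this sketch assumes that default was used. Under that assumption, the logistic function maps the score back to a churn probability:

```python
import math


def logit_to_probability(margin):
    """Convert a log-odds margin into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-margin))


print(round(logit_to_probability(3.19), 3))  # 0.96
```

A margin of 3.19 therefore corresponds to a churn probability of roughly 96%, close to the 95% shown in the LIME panel for this client.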
The results can be interpreted as follows:
- Longer-than-average subscriptions increase the chances of churn (mean_membership_duration).
- Non-renewal of subscriptions increases the chances of churn (autorenew_not_cancel).
5. SHAPASH
As mentioned, SHAPASH provides both local and global interpretability. For global interpretability, the results are the following:
Code
# Install first with: pip install shapash
from shapash import SmartExplainer

xpl = SmartExplainer(model=xgb, backend="acv")  # contributions computed with the ACV backend
xpl.compile(x=X_test, y_target=y_test)
xpl.plot.features_importance()
Once again, the autorenew_not_cancel feature has the highest contribution. The ACV backend also computes Shapley-value-based contributions, so the results are close to those obtained with SHAP.
For local interpretability, let's again consider a client classified as churn whose autorenew_not_cancel feature is 0. SHAPASH gives the following results:
Code
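# feature_dico: dictionary mapping column names to readable labels (defined earlier in the notebook)
xpl = SmartExplainer(
    model=xgb,
    backend="acv",
    features_dict=feature_dico,
)
xpl.compile(x=X_churn_with_renew_0)
idx = X_churn_with_renew_0.index
xpl.plot.local_plot(index=idx[1])  # second selected client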
This graph shows the contribution of each feature to this prediction, with positive contributions in yellow and negative contributions in blue. As in the previous examples, the mean_membership_duration and autorenew_not_cancel features have the largest contributions.
6. InterpretML/EBM
InterpretML with EBM models provides global and local interpretability. For global interpretability, these are the obtained results:
Code
# Install first with: pip install interpret
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier(random_state=seed)  # seed: random seed used throughout the notebook
ebm.fit(X_train, y_train)
ebm_global = ebm.explain_global()
show(ebm_global)
Here too, the trend remains the same: the most important features are similar to those identified by the other methods.
For local interpretability, we consider the same case with a client classified as churn. The results are the following:
Code
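# True labels of the selected clients, aligned on the DataFrame index (y_test assumed to be a pandas Series)
y_churn_with_renew_0 = y_test.loc[X_churn_with_renew_0.index]
ebm_local1 = ebm.explain_local(X_churn_with_renew_0, y_churn_with_renew_0, name="Case 1")
show(ebm_local1)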
The graph shows the contribution of different features in the prediction, with features having a positive contribution in orange, and features having a negative contribution in blue.
The results can be interpreted as follows:
- Longer-than-average subscriptions reduce the chances of churn (mean_membership_duration).
- Non-renewal of subscriptions increases the chances of churn (autorenew_not_cancel).
- The amount paid relative to the average reduces the chances of being a churn customer (mean_amount_paid).
Conclusion
In summary, this article presents various methods for making artificial intelligence more transparent and accountable. Built-in feature importance and permutation importance offer quick insights but may be less informative with complex models. LIME provides model-agnostic local explanations. SHAP explains both global and local feature contributions and rests on a strong theoretical foundation. SHAPASH adds interactivity for real-time exploration, and InterpretML with EBM models offers a model that is interpretable by design. Each method has its own advantages and trade-offs, suited to different scenarios. This evolving field equips practitioners to balance prediction accuracy and transparency for ethical and responsible AI development.