How can AI leverage banks’ credit risk modelling methods?

Heka.ai
Nov 13, 2023

Unlike our usual articles on the latest technologies, this study delves into the profound impact that Machine Learning and Deep Learning are having on credit risk modelling. Through the lens of credit risk modelling, the recent AI journey of banks comes to light.

The 2008 financial crisis, triggered by the collapse of the real estate market in the United States, highlighted the failures and deficiencies of the international banking system and severely damaged the credibility of financial authorities.

In response to this crisis, credit institutions, especially traditional banks, face a growing number of new banking and financial regulations, mainly concerning credit risk modelling. These regulations generate new constraints, particularly regarding the granting of credit, the setting of interest rates, and the determination of the amounts of provisions and capital associated with credit portfolios. They include the Basel III Accord (2014) and the Basel IV Accord (still in development) at the instigation of the G20, as well as the IFRS 9 accounting standard (2014). Banks are thus being driven to develop and improve their own internal models, but also to incorporate new variables into their approaches, such as forward-looking macroeconomic scenarios.

At the same time, in the era of the digital revolution, banks are also facing new competitors in their credit activities: banking Fintechs, credit platforms, online banks, neobanks and so on. These new players rely on new sources of information (such as social networks) and on the digitization of the customer relationship to develop their own advanced credit risk models. To stay competitive, traditional banks are again pushed to use these new data sources and to build models that are more effective, more reliable, faster and more profitable.

Confronted with these two challenges (regulation and competition), banks are now calling their traditional credit risk modelling approaches into question and are considering more advanced risk modelling methods based on Machine Learning (ML) and Deep Learning (DL), fields which are evolving rapidly.

This article aims precisely to analyse how ML can improve banks' credit risk modelling methods.

First, we will describe the models traditionally used by banks in credit risk modelling and their particularities. In a second part, we will analyse the different contributions of ML to credit risk modelling: improved predictive performance, productivity gains, and the use of new, richer, and larger sources of data. Finally, in a third part, we will describe the current limits of ML-oriented approaches, especially regarding the requirements of international regulations.

Credit risk modelling: definition, context, and traditional models

Definition and context

Credit risk modelling is an analytical process implemented by financial institutions such as banks and credit companies to assess and quantify the risk of default associated with their credit and investment portfolios.

Credit risk modelling is based on the analysis of many financial variables and data points, for example in the case of corporate credit: credit history, credit ratings from rating agencies, nature and duration of the credit, financial situation, quality of guarantees, economic conditions (unemployment rate, inflation rate, interest rates), sector-specific characteristics and so on.

This data is used to develop statistical models that estimate, for each credit line of a given credit portfolio:

  • The probability of default (PD).
  • The exposure at default (EAD).
  • The loss given default (LGD).

These estimations are then used by financial institutions and banks to:

  • Decide whether to grant new credits.
  • Set the interest rates of their loans.
  • Assess the level of risk of a portfolio and thus determine the amount of provisions (IFRS 9) or the amount of capital required (Basel III).
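For reference, these three parameters are commonly combined into an expected loss, EL = PD × LGD × EAD, which underpins both provisioning (IFRS 9) and capital computations (Basel III). A minimal illustration with purely hypothetical figures:

```python
# Expected loss of a single credit line, combining the three risk parameters.
# All figures below are purely illustrative.
pd_ = 0.02       # probability of default over the horizon (2 %)
lgd = 0.45       # loss given default (45 % of the exposure is lost)
ead = 100_000    # exposure at default (in euros)

expected_loss = pd_ * lgd * ead
print(f"Expected loss: {expected_loss:,.0f} EUR")   # -> 900 EUR
```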

In this article, we will mainly focus our analysis on the central and fundamental Probability of Default (PD) estimation model, which is involved in all three uses listed above.

Traditional models of credit risk modelling

The default probability estimation models traditionally used today by most banks are parametric models: they assume that the data is generated from specific probability distributions and rely on a fixed number of parameters to be estimated.

One of the most commonly used models is Logistic Regression, which relies on several strong assumptions: observations must be independent, explanatory variables must not be multicollinear, and the logit of the explained variable must be a linear function of the explanatory variables.
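As a minimal sketch, assuming a hypothetical dataset credit_portfolio.csv with one row per credit line, a binary default column and numeric explanatory variables, a Logistic Regression PD model could be fitted as follows (the correlation matrix gives a quick check of the non-multicollinearity assumption):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("credit_portfolio.csv")            # hypothetical credit dataset
X = df.drop(columns=["default"])
y = df["default"]

# Quick check of the non-multicollinearity assumption: strongly correlated
# explanatory variables violate one of the model's hypotheses.
print(X.corr().round(2))

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# The logit of the default probability is assumed to be linear in the explanatory variables.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pd_estimates = model.predict_proba(X_test)[:, 1]    # estimated probabilities of default
```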

Other, more complex parametric models are also used, such as the Merton structural model or reduced-form models. These models again rely on numerous probabilistic assumptions about the distributions followed by the variables as well as the nature of the relationships between them.
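For illustration, the Merton structural model derives the default probability from the firm's asset value and its debt: default occurs if assets end up below the debt level at the horizon, under log-normal asset dynamics. A sketch with hypothetical inputs:

```python
from math import log, sqrt
from statistics import NormalDist

def merton_pd(asset_value, debt, drift, volatility, horizon):
    """Probability that the firm's assets fall below its debt at the horizon
    (Merton structural model, log-normal asset dynamics)."""
    distance_to_default = (
        log(asset_value / debt) + (drift - 0.5 * volatility**2) * horizon
    ) / (volatility * sqrt(horizon))
    return NormalDist().cdf(-distance_to_default)

# Hypothetical firm: assets 120, debt 100, 5 % drift, 25 % asset volatility, 1-year horizon.
print(f"Merton PD: {merton_pd(120, 100, 0.05, 0.25, 1.0):.1%}")   # roughly 21 %
```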

What are the benefits of machine learning models compared to these traditional modelling approaches?

The contributions of Machine Learning in credit risk modelling

Improvement of the predictive performance of models

First, the metrics used to evaluate the performance of a PD model are the usual classification metrics: accuracy, precision and recall. Banking methodologies also construct ROC curves (Receiver Operating Characteristic) and measure the AUC (Area Under the Curve). Other, more complex performance measures can also be used, such as the Brier score or the H-measure.
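These metrics can be computed in a few lines with scikit-learn; the sketch below reuses the observed defaults y_test and the predicted probabilities pd_estimates from the earlier Logistic Regression sketch:

```python
from sklearn.metrics import (accuracy_score, brier_score_loss, precision_score,
                             recall_score, roc_auc_score, roc_curve)

y_pred = (pd_estimates >= 0.5).astype(int)          # hard classification at a 50 % cut-off

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, pd_estimates))
print("Brier    :", brier_score_loss(y_test, pd_estimates))

fpr, tpr, thresholds = roc_curve(y_test, pd_estimates)   # points of the ROC curve
```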

The results of several studies highlight the superior performance of ML algorithms for predicting credit defaults compared to traditional approaches. The study by Stefan Lessmann, Bart Baesens et al. (2015), “Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research”, provides a comparative analysis of 41 classification algorithms on 8 credit databases. Their results showed that ensemble methods (Ensemble Learning), whether based on bagging (Random Forest) or boosting (XGBoost), achieve significantly higher performance than parametric classifiers (Logistic Regression) and individual ML classifiers (Decision Tree). Heterogeneous ensemble methods (such as the Weighted Average Ensemble) proved to be the most effective.
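The spirit of such a benchmark can be reproduced on any credit dataset. The sketch below compares a parametric classifier, an individual classifier and two ensemble methods by cross-validated AUC; scikit-learn's gradient boosting stands in for XGBoost, and X, y are the explanatory variables and default flag assumed in the earlier sketches:

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

models = {
    "Logistic Regression (parametric)": make_pipeline(StandardScaler(),
                                                      LogisticRegression(max_iter=1000)),
    "Decision Tree (individual)": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest (bagging)": RandomForestClassifier(n_estimators=300, random_state=0),
    "Gradient Boosting (boosting)": GradientBoostingClassifier(random_state=0),
}

for name, candidate in models.items():
    auc = cross_val_score(candidate, X, y, cv=5, scoring="roc_auc")
    print(f"{name:<35} mean AUC = {auc.mean():.3f} (+/- {auc.std():.3f})")
```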

This advantage of ML methods over parametric approaches (such as Logistic Regression) can be partially explained by two factors: the ability of ML algorithms to take into account interactions between explanatory variables, and their ability to capture non-linear effects (such as threshold effects) between the explanatory variables and the explained variable. A Logistic Regression could in principle account for these two aspects, but only at the cost of advanced feature engineering work to identify the relevant combinations of variables and to build the appropriate thresholds when constructing the parametric model.
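As an illustration of this feature-engineering burden, thresholds and interactions must be built explicitly before a Logistic Regression can exploit them; a sketch assuming hypothetical debt_ratio and income columns in the training data:

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures

# Discretize continuous variables to capture threshold effects...
binning = ColumnTransformer(
    [("bins", KBinsDiscretizer(n_bins=5, encode="onehot-dense"), ["debt_ratio", "income"])],
    remainder="passthrough",
)

# ...and add pairwise interaction terms before fitting the parametric model.
engineered_logit = make_pipeline(
    binning,
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(max_iter=1000),
)
engineered_logit.fit(X_train, y_train)
```

A tree-based ensemble learns the same thresholds and interactions directly from the raw variables, without this manual step.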

This explanation must nevertheless be qualified in the field of credit risk modelling. The performance gap between individual classifiers (Decision Trees, Neural Networks) and the parametric approach is smaller than the gap observed with Ensemble Learning algorithms, even though the previous explanation still holds. In credit risk modelling, the variables used by the models contain too few non-linear relationships to fully explain the differences in performance.

The use of ML approaches in credit risk modelling not only improves predictive performance, but also results in significant productivity gains.

Increase in productivity

ML algorithms enable strong productivity gains compared to traditional approaches by reducing the time needed to pre-process credit data before modelling.

Thus, in the case of the traditional parametric approach, many data transformations are required to meet the requirements and assumptions of the models:

  • Processing of missing values, outliers, and duplicate values.
  • Grouping the classes of discrete variables to increase their discriminatory power.
  • Discretization of continuous variables to take into account non-linear effects (such as threshold effects) and to reduce the influence of extreme values.
  • Combination of variables to take into account interactions between explanatory variables.
  • Study of the correlations between explanatory variables (correlation matrix) in order to verify the non-multicollinearity assumption and to measure the degree of correlation.
  • Selection of the explanatory variables of the final model (manual or automatic selection), especially to limit overfitting with “fat data” (many variables relative to the number of observations).

These different preprocessing steps involve many algorithms and statistical tests: the chi-squared test, the Kolmogorov–Smirnov test, the Tschuprow test, the Tukey–Kramer test and so on.
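Two of these tests, illustrated with scipy and assuming the hypothetical DataFrame df contains a discrete sector variable and a continuous debt_ratio variable alongside the default flag:

```python
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp

# Chi-squared test of independence between a discrete explanatory variable and
# the default flag (used when grouping the classes of discrete variables).
contingency = pd.crosstab(df["sector"], df["default"])
chi2, p_value, dof, _ = chi2_contingency(contingency)
print(f"Chi-squared p-value: {p_value:.4f}")

# Kolmogorov–Smirnov test: does the distribution of a continuous variable differ
# between defaulted and non-defaulted credit lines?
defaulted = df.loc[df["default"] == 1, "debt_ratio"]
healthy = df.loc[df["default"] == 0, "debt_ratio"]
stat, p_value = ks_2samp(defaulted, healthy)
print(f"Kolmogorov–Smirnov p-value: {p_value:.4f}")
```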

By contrast, most ML algorithms make it possible to skip most of these pre-processing steps. The predictive performance of an algorithm such as the Random Forest is generally not affected by the presence of missing values, outliers, duplicate observations, strong correlations between explanatory variables, discrete variables with many categories, or continuous variables with non-linear effects. At the same time, Lasso Regression (L1-norm penalization) automatically selects the variables of the model and resolves multicollinearity issues.
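A sketch of this automatic selection with an L1-penalized Logistic Regression in scikit-learn, where the penalty strength (the C parameter) drives how many coefficients are shrunk exactly to zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# L1 penalty: irrelevant or redundant (correlated) variables get a zero coefficient.
lasso_logit = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)
lasso_logit.fit(X_train, y_train)

coefs = lasso_logit.named_steps["logisticregression"].coef_.ravel()
selected = [name for name, c in zip(X_train.columns, coefs) if not np.isclose(c, 0.0)]
print("Variables kept by the L1 penalty:", selected)
```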

Moreover, by bypassing these preprocessing steps and working on almost raw data, ML methods also reduce the risk of introducing bias during modelling.

Finally, the ease of access, use and implementation of ML models contrasts with the difficulties and long development times of traditional parametric models. This is particularly true when banks go beyond Logistic Regression towards more specific models, such as Merton structural models and reduced-form models, which require lengthy mathematical modelling and software implementation work. In the case of ML, on the contrary, the programming languages most used in credit risk modelling (Python, R and SAS) each provide many packages that make it easy to set up most models, to tune their hyperparameters and to measure their performance.
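As an illustration of this ease of use, a few lines are enough to tune and evaluate a Random Forest with scikit-learn (the hyperparameter grid below is arbitrary):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [5, 10, None],
    "min_samples_leaf": [1, 20],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",          # performance measured directly by cross-validated AUC
    cv=5,
)
search.fit(X_train, y_train)
print("Best hyperparameters:", search.best_params_)
print("Best cross-validated AUC:", round(search.best_score_, 3))
```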

The last advantage of ML models is their ability to mobilize more diverse and voluminous data than traditional approaches.

The use of new, richer, and larger datasets in the era of digital revolution and Big Data

The data traditionally used in credit risk models, whether for corporate or individual credit, generally consist of: credit history, credit ratings from rating agencies, nature and duration of the credit, financial situation, quality of guarantees, economic conditions (unemployment rate, inflation rate, interest rates, growth rate), sector-specific characteristics, characteristics of the borrower (in the case of an individual) and so on.

In the era of the digital revolution, with the appearance of new sources of information (social networks) and the digitization of the customer relationship, new, much more diverse and richer data are being collected, especially by new players in the credit sector (banking Fintechs, credit platforms, online banks, neobanks and so on):

  • Data from the social networks of companies (or their managers) or individuals, such as LinkedIn: activity on the profile or official page, number of publications, number of contacts, quality of contacts (including their solvency).
  • Browsing data during online credit requests: IP address, number of connections, device used, type of email address, operating system, location, and time of connection.
  • Personal data, usually not used by traditional banks: bank accounts, history of payments and direct debit (health, food, education), wages.

ML models make it possible to exploit these new data sources, which are often very large (Big Data). Many of these new variables have non-linear effects on the probability of default, which makes it difficult to determine their functional relationship manually within a parametric approach. Likewise, datasets with a large number of explanatory variables and a relatively small number of observations (such as internet connection and browsing data) can lead to severe overfitting of parametric models. Addressing this requires variable selection methods, which again involve the manual implementation of many tests and controls. ML methods, on the contrary, can both automatically determine the optimal functional forms (Decision Trees) and automatically select the relevant variables (Lasso Regression).
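A sketch of this flexibility with scikit-learn's histogram-based gradient boosting, which accepts missing values in the raw features directly; the df_new DataFrame and its browsing-related columns are hypothetical:

```python
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical design matrix mixing traditional ratios with new digital features
# (number of connections, hour of connection, ...); missing values are left as NaN.
X_new = df_new[["debt_ratio", "n_connections", "connection_hour", "n_linkedin_contacts"]]
y_new = df_new["default"]

model = HistGradientBoostingClassifier(max_iter=300, random_state=0)   # handles NaN natively
auc = cross_val_score(model, X_new, y_new, cv=5, scoring="roc_auc")
print(f"AUC on raw, partially missing features: {auc.mean():.3f}")
```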

According to several studies, combining these new data with ML methods significantly increases the predictive performance of credit risk models. For example, the study by María Óskarsdóttir et al. (2019), “The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analysis”, estimated credit card holder defaults with and without borrowers' social network data in addition to traditional data. The results showed that the AUC of the model using all available data is higher than that of the model using only traditional data. Moreover, the authors showed that a model using only the new data (social networks) achieves performance similar to the model using only traditional data.

Current limits of Machine Learning oriented approaches

The question of interpretability

Traditional models used in credit risk modelling are considered highly interpretable. In parametric models, a weight is associated with each explanatory variable, quantifying its contribution to the prediction. The counterpart of this transparency is the many assumptions underlying the model, including the constrained functional form linking the explanatory variables to the explained variable.

ML models, on the other hand, are considered less transparent and are often described as “black boxes”, especially the most efficient Ensemble Learning methods (Random Forest, XGBoost and so on). This is mainly due to their greater flexibility: the functional forms of the relationships between the explanatory variables and the explained variable are not predefined.

However, a growing number of so-called “Model-Agnostic Methods” make it possible to interpret the importance of each variable in ML algorithms, often in exchange for additional assumptions about the model: PDP (Partial Dependence Plot), LIME (Local Interpretable Model-agnostic Explanations), QII (Quantitative Input Influence), SHAP (SHapley Additive exPlanations), Gini importance (for Random Forest) and so on.
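Two of these techniques can be applied in a few lines with scikit-learn to any fitted classifier, here the Random Forest tuned in an earlier sketch (the debt_ratio column is again hypothetical):

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

fitted_model = search.best_estimator_      # Random Forest tuned in the earlier sketch

# Permutation importance: drop in AUC when a variable is randomly shuffled.
result = permutation_importance(fitted_model, X_test, y_test,
                                scoring="roc_auc", n_repeats=10, random_state=0)
for name, importance in sorted(zip(X_test.columns, result.importances_mean),
                               key=lambda item: -item[1]):
    print(f"{name:<25} {importance:.4f}")

# Partial dependence plot: average effect of one variable on the predicted PD.
PartialDependenceDisplay.from_estimator(fitted_model, X_test, features=["debt_ratio"])
plt.show()
```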

Nevertheless, the interpretability of a model is not limited to only quantifying the importance of each variable in the prediction process.

The question of causal inference

Parametric models, such as Logistic Regression, make it possible to reason about causal relationships between the predictors and the predicted variable: they allow the effect of a variation of an explanatory variable on the explained variable to be identified “all other things being equal”. The ability to carry out such reasoning is a prerequisite for many banking regulations, especially for stress-testing exercises designed to challenge the robustness of models. These stress tests involve simulating extreme but plausible economic and financial conditions in order to identify the consequences for banks and to measure their resilience to such situations.
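As a rudimentary illustration of this “all other things being equal” reasoning, a stress scenario can be simulated by shifting a single macroeconomic variable in the portfolio data and comparing the average predicted PD; the unemployment_rate column and the fitted_model from the previous sketches are hypothetical:

```python
# Baseline average probability of default on the test portfolio.
baseline_pd = fitted_model.predict_proba(X_test)[:, 1].mean()

# Stressed scenario: unemployment rate shifted up by 3 points, everything else unchanged.
X_stressed = X_test.copy()
X_stressed["unemployment_rate"] = X_stressed["unemployment_rate"] + 3.0

stressed_pd = fitted_model.predict_proba(X_stressed)[:, 1].mean()
print(f"Average PD: baseline {baseline_pd:.2%} -> stressed {stressed_pd:.2%}")
```

Such a simulation only shows how the model reacts; whether the shift can be read causally depends on the assumptions behind the model, which is precisely where parametric approaches currently retain an advantage.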

In the case of ML models, many studies are now being conducted on the notion of causality, but for most algorithms it remains difficult, for now, to establish causal relationships.

The challenges related to the use of new data sources

The use of the new data available in the era of the digital revolution can raise regulatory issues under the GDPR (General Data Protection Regulation). While the use of new data in credit risk modelling can help banks better comply with the new banking regulatory requirements in terms of predictive performance, it may come up against the GDPR. Banks wishing to exploit these new data sources must therefore pay attention to the nature of the data and implement anonymization processes and secure data sharing.

The use of this new data also raises ethical issues. Conditioning the access to credit of individuals or companies on the values of certain variables, even if those variables are strong predictors of credit default, can be questionable. This ethical issue can appear in the direct, deliberate selection of variables, for example when the quality of a person's LinkedIn network is used to worsen their conditions of access to credit. But it can also result from unintentional biases in ML models, through inequitable treatments that they may induce and that can lead to the discrimination of certain populations. To avoid such discriminatory biases, checking for the absence of discriminatory variables in the model databases is not always sufficient: biases can occur indirectly via the interaction of variables that are not discriminatory in themselves but act as proxies (proxy discrimination). Such discrimination may concern, in particular, borrowers' gender, origin, or place of residence.
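As a minimal sketch of such a control, assuming a hypothetical protected_attribute_test series (e.g., gender) that is kept out of the model and used for audit purposes only, one can compare predicted PDs across groups and check which model inputs correlate with the protected attribute:

```python
import pandas as pd

audit = pd.DataFrame({
    "predicted_pd": fitted_model.predict_proba(X_test)[:, 1],
    "group": protected_attribute_test.to_numpy(),   # never fed to the model itself
})

# Average predicted PD per group: a large gap may reveal indirect (proxy) discrimination.
print(audit.groupby("group")["predicted_pd"].mean())

# Which explanatory variables correlate most with the protected attribute (potential proxies)?
encoded_group = (protected_attribute_test == protected_attribute_test.iloc[0]).astype(float)
proxy_strength = X_test.corrwith(encoded_group).sort_values(key=abs, ascending=False)
print(proxy_strength.head())
```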

However, a precise and thoughtful choice of the variables used, together with the implementation of the interpretation methods for ML models described above (new ones are still in development), can help address these issues.

The contributions of ML algorithms to credit risk modelling, as a replacement for traditional parametric models, are numerous. They improve the predictive performance of the models, bring productivity gains in data pre-processing, and enable the use of more diverse and voluminous data sources.

These different advantages make ML-oriented approaches ideal candidates to meet the challenges that banks are currently facing, such as the implementation of ever more international regulations and the emergence of new competitors in the credit market.

However, these algorithms must still be used with caution, especially because of their limited interpretability (mainly regarding causal inference), the new variables they use and the potential discriminatory biases they may induce. These difficulties mean that such approaches do not always comply with international regulatory requirements such as stress-testing exercises or the GDPR.

The intense research activity in the field of ML and the constant development of new algorithms and approaches, especially in the area of causal inference, should help address these issues in the near future.
