Transparency and Governance in Machine Learning Applications

Christian Schitton
Published in Analytics Vidhya
Feb 6, 2022 · 9 min read

Machine learning models and, above all, deep learning applications are quite often labeled as “black box” tools. A lot of input data go into the machines. The data are chewed on for some time. Eventually, a result is spit out, but it is not quite clear how exactly that result came about.

Why did a valuation model price an apartment at EUR 345,000 while pricing another one at just EUR 295,000? Why did a credit scoring model rate one credit applicant as potentially defaulting while another one was classified as potentially non-defaulting?

The issue with these advanced techniques is that such applications can read complex patterns out of huge amounts of data (in order to use them for certain tasks) in a way human beings would not be capable of. As a consequence, transparency in using those models is lost, as human beings can no longer retrace the detailed logic behind the results.

This situation might not be in line with governance rules rooted in the European Union’s ESG taxonomy and incorporated by companies into their respective governance policies. It might be even worse when risk decisions have to be made in a data-driven environment within a more strongly regulated industry, such as financial institutions, where transparency of decision making is a regulatory requirement.

Here, we discuss how Shapley values help to bring more transparency into the data-driven decision making process and why Game Theory provides an invaluable foundation for all of this.

Transparency and Governance Issues

Needless to say, one should have an understanding of the architecture of a machine learning/deep learning model, the algorithms involved, the embedded loss functions and the purpose for which the whole tool was made. This is a first step towards establishing transparency and proper governance in a data-driven business environment.

It is another story to keep this commitment when putting a machine learning/deep learning model into action. A huge number of data points is then churned through this architecture based on the underlying algorithms and functions. As said before, it is the job of those models to detect patterns and behaviours in the data at hand which would not be accessible to humans due to the underlying complexity.

Imagine a dataset of 60,000 RGB images on a 256 x 256 pixel grid. This would mean a total of 11,796,480,000 data points (60,000 observations x 256 pixels height x 256 pixels width x 3 colour channels) in 4D tensor objects made available to train a deep learning model. It is impossible for a human being to follow the training process in detail.
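
As a quick check of that count, assuming the usual 4D image-tensor layout of observations x height x width x channels, a two-line calculation in R:

```r
# 60,000 RGB images on a 256 x 256 pixel grid, stored as a 4D tensor
dims <- c(obs = 60000, height = 256, width = 256, channels = 3)
prod(dims)   # 11,796,480,000 individual data points
```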


So, how can transparency and governance be handled in a data-driven process? A solution to this problem can be found in Shapley values.

Shapley Values and Game Theory

The concept of the Shapley value was established in 1953 and comes from the world of Game Theory (references can be found below).

To be a bit more specific, we are touching on characteristic-function games, which are the most widely used model in the field of cooperative game theory. A characteristic-function game is defined by a set of players and a function that gives the result achieved by any cooperating group of those players.

Different combinations of players will achieve different results. One problem is that the number of combinations grows exponentially with the number of players. Another problem of cooperative games is that a subset of cooperating players has to be stable, i.e. no player should have an incentive to leave the group, which is the case when every player earns more by cooperating than by acting alone.

A third issue is how to divide the result achieved by the cooperative game among the players. Here, the Shapley value offers a principled way to do this. According to those principles, each player should be paid an amount that satisfies the following axioms (a compact formalisation follows the list):

  • Efficiency: The total value of the achieved result should be distributed.
  • Dummy Player: Players who make no contribution should receive nothing.
  • Symmetry: Players who make the same contribution should receive the same.
  • Additivity: The value should be additive over games, i.e. a player’s value in a combined game is the sum of that player’s values in the component games.
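
In compact notation, writing N for the set of players, v for the characteristic function and φ_i(v) for player i’s payout, these axioms read roughly as follows:

```latex
\text{Efficiency:}\qquad \sum_{i \in N} \varphi_i(v) = v(N)

\text{Dummy player:}\quad v(S \cup \{i\}) = v(S)\ \text{ for all } S \subseteq N \setminus \{i\} \;\Rightarrow\; \varphi_i(v) = 0

\text{Symmetry:}\qquad v(S \cup \{i\}) = v(S \cup \{j\})\ \text{ for all } S \subseteq N \setminus \{i, j\} \;\Rightarrow\; \varphi_i(v) = \varphi_j(v)

\text{Additivity:}\qquad \varphi_i(v + w) = \varphi_i(v) + \varphi_i(w)
```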

Because the number of coalitions grows exponentially with the number of players, an exact analytical determination of the Shapley values becomes impractical, if not impossible, in many cases, and the value has to be approximated. There are also several compact representations of such games that ease the computation of Shapley values, though this goes beyond the scope of this article. For details, please see the references below.

Shapley Values and Artificial Intelligence

So, what does this have to do with our machine learning/ deep learning applications?

In order to make the Shapley value concept applicable to machine learning/deep learning models, the input features of a data set are treated as the players in such a cooperative game.

In this context, the average marginal contribution of each of the input parameters is measured with respect to the output of the model and in relation to the other input parameters. This is done by carefully nudging input features and then checking how those changes to the input parameters correspond to the final model output. And it is done over all observations, where the same feature can have a different marginal contribution in different observations.

This is once more summarised in the Shapley values equation:

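In its standard game-theoretic form, stated here for reference, with N the set of players (here: input features), S a coalition not containing player i, and v the value function:

```latex
\varphi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}\; \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```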

To put it another way, what this equation does is calculate what the prediction of the model would be without a certain input feature, then calculate the prediction of the model with that very feature included, and finally take the difference. The change in the model’s prediction (i.e. the difference) is essentially the effect of that specific feature.

It should be noted that the order in which the input features are added to the model matters for how their contributions are assigned; there can be quite a difference depending on the order. The Shapley value therefore considers every possible ordering and computes a weighted sum to find the final value. That is the reason why the equation for the Shapley value must sum over all possible subsets of features (excluding the feature we are interested in).

This makes computing the Shapley value computationally expensive: the number of necessary calculations rises exponentially with the number of input parameters. In order to keep the complexity manageable, Shapley values are often calculated for a subgroup of features and computed with respect to a comparison or background group which takes the role of a benchmark.

Furthermore, approximation methods have to be used to keep the computation feasible.

In other words, estimates of the true Shapley values have to be produced. One of those methods is Kernel SHAP, a computationally efficient approximation to Shapley values in higher dimensions.
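
Kernel SHAP itself estimates Shapley values by sampling coalitions and fitting a weighted linear regression. As a simpler illustration of the general idea of approximating rather than enumerating, here is a minimal Monte Carlo permutation-sampling sketch; it is not Kernel SHAP and not taken from any package. The names shapley_mc, predict_fn, x_row and background are placeholders, feature is a column index, and absent features are filled in from a random background row, i.e. the marginal (independence) assumption applies:

```r
# Monte Carlo estimate of one feature's Shapley value via random feature orderings.
# predict_fn: function returning a numeric prediction for a 1-row input
# x_row:      the single observation to explain (1-row data frame or matrix)
# feature:    column index of the feature of interest
# background: reference data set with the same columns (fills in "absent" features)
shapley_mc <- function(predict_fn, x_row, feature, background, n_samples = 200) {
  p <- ncol(background)
  contributions <- numeric(n_samples)
  for (s in seq_len(n_samples)) {
    ordering <- sample(p)                                          # random feature ordering
    z <- background[sample(nrow(background), 1), , drop = FALSE]   # random reference row
    pos <- which(ordering == feature)
    before <- if (pos > 1) ordering[1:(pos - 1)] else integer(0)   # features preceding ours
    x_with <- z                                                    # preceding + our feature from x_row
    x_with[, c(before, feature)] <- x_row[, c(before, feature)]
    x_without <- z                                                 # only preceding features from x_row
    if (length(before) > 0) x_without[, before] <- x_row[, before]
    contributions[s] <- predict_fn(x_with) - predict_fn(x_without)
  }
  mean(contributions)  # average marginal contribution = estimated Shapley value
}
```

Averaging the sampled marginal contributions converges towards the exact (marginal) Shapley value as the number of samples grows, without ever enumerating all coalitions; Kernel SHAP reaches a comparable estimate more efficiently in higher dimensions.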

Of course, there are also limitations in using approximation methods. One major limitation of Kernel SHAP is that it assumes independence of the input features. In order to overcome this, different approaches have to be used to estimate the conditional distributions of the features (the difference this makes is spelled out after the list), like:

  • Multivariate Gaussian Distribution Approach
  • Gaussian Copula Approach
  • Empirical Conditional Distribution Approach
  • Conditional Inference Tree Approach
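
To spell out the difference: the value function the Shapley decomposition needs is the expected model output conditional on the features in a coalition S, whereas basic Kernel SHAP replaces the conditional distribution of the remaining features (denoted S̄) by their marginal distribution, which only coincides with the conditional one when the features are independent. In standard notation:

```latex
\text{desired value function:}\qquad v(S) = \mathbb{E}\bigl[\, f(X) \mid X_S = x_S \,\bigr]

\text{Kernel SHAP under independence:}\qquad v(S) \approx \mathbb{E}_{X_{\bar S}}\bigl[\, f(x_S,\, X_{\bar S}) \,\bigr]
```

The four approaches listed above are different ways of estimating that conditional expectation directly.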

Another issue is that Shapley values are vulnerable to unrealistic inputs, e.g. a housing price model which takes in latitude and longitude as separate features and is queried on real estate in the middle of a mountain or in the middle of open water. Such circumstances could even be exploited by attackers.

For further details, see references below.

Shapley Values — Example

Let’s see how this works in practice.

An xgboost machine learning model is trained on a credit card portfolio in order to classify clients as potential defaulters.

The input features in this example are:

  • MARRIAGE … married, single or others
  • LIMIT_BAL … credit card limit
  • PAY_AMT1, PAY_AMT2 … amounts of payments in previous months
  • PAY_0, PAY_2, PAY_3 … history of payments in previous months, like 1 = pays duly, 2 = payment delay for 1 month, 3 = payment delay for 2 months and so on

The resulting classification of the model for a potential borrower is then 0 for non-default and 1 for default.

In this example, we computed the Shapley values with the Kernel SHAP method based on the Conditional Inference Tree approach, using the shapr package in R. The contribution of each of the input parameters to the classification result is as follows:

[Image by author: SHAP summary plot of the feature contributions]

On the y-axis, the input parameters which were used in the xgboost model are shown. On the x-axis, the impact of the respective parameter on the model result is shown. All these yellow, orange and purple points are single observations from the data set the model was trained on. The colour of a point represents whether the feature value is low (e.g. a low credit limit) or high (e.g. extended months of delay in repayment) for that observation.
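
For readers who want to set up a similar analysis, here is a minimal sketch (not the original code behind the plot). It assumes numerically encoded feature matrices x_train/x_test and a 0/1 default label y_train (placeholder names) and uses the shapr 0.2.x API that was current when this article was written, where an explainer is built first and explain() is then called with approach = "ctree" for the Conditional Inference Tree approach:

```r
library(xgboost)
library(shapr)

# x_train, x_test: numeric matrices with the features listed above (MARRIAGE, LIMIT_BAL,
# PAY_AMT1, PAY_AMT2, PAY_0, PAY_2, PAY_3), categorical features numerically encoded;
# y_train: 0 = non-default, 1 = default.
model <- xgboost(
  data      = x_train,
  label     = y_train,
  nrounds   = 50,
  objective = "binary:logistic",
  verbose   = FALSE
)

# Build the explainer on the training features ...
explainer <- shapr(x_train, model)

# ... and estimate Shapley values for the test observations, modelling the conditional
# feature distributions with the Conditional Inference Tree approach ("ctree").
explanation <- explain(
  x_test,
  approach        = "ctree",
  explainer       = explainer,
  prediction_zero = mean(y_train)   # baseline: average default rate in the training data
)

# Shapley decomposition per explained observation.
plot(explanation)
```

Switching approach to "empirical", "gaussian" or "copula" selects the other conditional-distribution estimators listed earlier.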

The outcome of the Shapley values in this example should be taken with a grain of salt. The example is for illustration purposes only: the features of the model were not really engineered and the pooling of different input features was limited.

Nevertheless, some interpretation of the Shapley values is already possible. Take the parameter ‘credit limit’ as an example. Its impact on the classification result is not very strong in either direction, but it shows that rather small credit limits contribute more towards defaults. This could be an indication that smaller limits are more easily granted to borrowers with rather low repayment capacity.

Moving to the PAY_x parameters reveals that lower values (of course) tend not to lead to a default classification, although the impact is not very pronounced. Higher values (e.g. 9 = payment delays of 8 months and above) already have a clearer impact on the model result.

It should be noted that the impact of each input parameter on single observations can be quite different. This can be seen in the following graph for 4 randomly chosen observations from the data set.

[Image by author: Shapley value decompositions for 4 randomly chosen observations]

Additionally, and this is definitely one of the major limitations of this method, the computation of the Shapley values becomes expensive very quickly with every input feature added. This calls for wise feature engineering when developing the machine learning/deep learning model and for a sufficient computational infrastructure.

Conclusion

An important question in machine learning/ deep learning is why an algorithm made a certain decision.

Being able to interpret individual predictions of machine learning/deep learning applications helps to bring transparency into data-driven decision making environments and hence helps to meet governance standards within ESG policies.

Shapley values quantify the influence of an input variable by comparing what a model predicts with and what it predicts without that feature. This is done for every possible ordering, as the order in which the features are added can affect the contribution attributed to each of them.

The concept of Shapley values is regarded as the only model-agnostic explanation method with a solid theoretical foundation. Model-agnostic means that the Shapley values can be calculated without any knowledge of the model’s internal mechanisms, as long as the inputs (i.e. observations and input features) as well as the output (i.e. the model result) are on the table.

And, the theoretical foundation is deduced from Game Theory.

To leave no doubt:

A lot of financial institutions still stick to logistic regression when building up their “advanced” credit scoring systems, thereby cutting out the potential of more effective credit scoring applications. The reason is that it would otherwise not be possible for them to maintain the necessary transparency and to explain what the models are doing, as requested by financial authorities.

Shapley values are one more tool to help change this attitude!

References

A Value for n-Person Games by L. Shapley, published in Contributions to the Theory of Games II, Princeton University Press, pages 307–317, 1953

Cooperative Game Theory: Basic Concepts and Computational Challenges by Georgios Chalkiadakis, Edith Elkind and Michael Wooldridge, published in AI and Game Theory, pages 86–90, IEEE Computer Society, 2012

The Shapley Value for ML Models by Divya Gopinath, published in Towards Data Science, October 26, 2021

shap (SHapley Additive exPlanations) by Scott Lundberg, on GitHub, latest commit October 20, 2021

shapr: Explaining individual machine learning predictions with Shapley values by Camilla Lingjaerde, Martin Jullum & Nikolai Sellereite, published on cran.r-project.org, 2019

Interpreting complex models with SHAP values by Gabriel Tseng, published in Medium, June 21, 2018

SHAP Visualization in R by Yang Liu, published in Yang’s Research Blog, October 14, 2018
