Fostering Trust in ML Inferences

Dalmo Cirne
Workday Technology
Sep 6, 2023

Abstract

The Machine Learning teams at Workday have a tremendous responsibility to develop reliable AI and ML. Building ever more trustworthy ML inferences is a path both to increase the value of our products (i.e., increased trust in the results) and to engage in conversations with customers. In this article we examine the dynamic of trust between a service provider (Workday/Trustor) and service users (Customers/Trustees). Trustors are required to be trusting and trustworthy, whereas trustees need not be trusting nor trustworthy. The challenge for trustors is to provide services that are good enough to make a trustee increase their level of trust above a minimum threshold for: 1) doing business together; 2) continuation of service.

Introduction

The paradigm explored in this article assumes that trust is built by an initial altruistic act by the trustor, signaling that the actor is trustworthy. More specifically, Workday's altruistic act would be to invest in building a product and offer it to customers with the promise that it will generate value for them; more value than what is paid in return for the service. The trustor decides how much to invest, and the trustee decides whether to reciprocate and give continuity to the business relationship.

The objective is to make them [customers] trusting, above a minimum threshold T, so that they engage in the Trust Game [1]. These games are extensions built on top of Game Theory [2]. Furthermore, trust has a temporal element to it. Once established, there is no guarantee of continuation; therefore we model the interactions in extensive form, where both actors collaborate, observe each other, and react to one another's historical actions.

Trust Games

A Trust Game is built around two actors: a trustor (Workday) and a trustee (a Workday customer). The trustor has a service of value V to offer to a trustee; the value in question is quality machine learning inferences. Machine learning is implemented as a software service and, by its nature, software can be replicated to any number n of customers without physical constraints, thus V can be offered independently and concurrently to all customers.

Note that the nature of concurrency allows for independent actors (trustees) to observe and react to the actions of other actors.

The value V of ML inferences may be only partially utilized by a trustee. This limited, portioned consumption could be due to a variety of reasons, including, but not limited to: eligibility/capacity to use all the features (i.e., satisfying all requirements), service subscription tiers, users who have yet to be trained, and so forth.

In order to accommodate for such scenarios, the trustor may transfer the entirety of value V or a smaller portion p of it, where {p ∈ ℝ | 0 ≤ p ≤ 1}.

The initial remittance sent by trustor u is:

Ru = pV (Equation 1)

Depending on the quality of the results delivered by the trustor, the perception of value by trustees may be magnified or contracted by a factor K, where {K ∈ ℝ}. For K > 1, it means that the trustor improved the efficiency of operations for the trustee (they do better than operating on their own). For K = 1 the trustee is operating at about the same efficiency, and for K < 1 (negative values are also possible) the trustee is less efficient than before they started using the service.

The initial perceived gain G received by trustee v is:

Gv = K Ru = KpV (Equation 2)

A trustee is free to reciprocate or not. During a trial period they may choose to decline further service. Even if under contract, they may choose to skip renewal. On the other hand, assuming that the value received from ML inferences improved their efficiency, the incentive is to continue to engage. In either case a trustee will give back a portion q of the gain received, where {q ∈ ℝ | 0 ≤ q < 1}. The value sent back may take the form of monetary payment for the service, focus groups, interviews, usability feedback, labeling of transactions, or a combination of those. The repayment B expected by trustor u is therefore:

Bu = q Gv = qKpV (Equation 3)

One could suggest the introduction of a magnification factor on the repayment from trustee v. That, however, is not necessary in the scope of this article, since trustees do not need to be trustworthy; trustor u is not evaluating whether to trust them or not.

Figure 1 represents the flow of the initial step in this trust game. The blue line segment represents the range of possible values delivered to trustees by the trustor, the large blue circle is the magnification factor applied to the value delivered, and the orange segment represents the range of possible values reciprocated to the trustor by a trustee.

Figure 1: Trust Game payoffs.

Regarding the magnification factor, for the cases where K > 1, the value received back by trustor u is positive and enables the necessary conditions for an extensive form of the trust game (long-term engagement). It becomes a strong indicator that trustee v's trustiness towards trustor u is equal to or above the minimum threshold T, where {T ∈ ℝ | 0 ≤ T ≤ 1}.

When 0 ≤ K < 1, the service is causing the trustee some form of disruption (in the sense that efficiency has dropped below the level prior to using the service). This would be acceptable during the development phase of a product, where the trustee takes part in a beta test program. In such a situation, the trustee sees a benefit in participating, assuming future value in adopting the service and the ability to harvest the benefits early on.

The worst-case scenario comes when K < 0. This could lead to rapid erosion of the trustor's trustworthiness, customer churn, and other negative outcomes.

Quantifying Trust

The aim of this trust game is to create the circumstances necessary for continuous and repeated interactions between trustor and trustee that take place over long periods of time, with no specified temporal upper limit.

After the initial remittance Ru (eq. 1), there may be residual value r on the trustor’s side that a trustee did not take advantage of. For instance, maybe not all product features are being used, inference happens in batches and data is yet to be sent through the pipeline, or some other reason. That residual value is what is left from V:

ru = V − Ru = (1 − p)V (Equation 4)

The accumulated value A for trustor u upon completing the first cycle is the residual value ru (eq. 4) plus the repayment Bu (eq. 3) received from the trustee:

Au = ru + Bu = (1 − p)V + qKpV (Equation 5)

On the trustee’s side, they will have received a value of Gv (eq. 2) and given back a portion q of it. The net gain N for trustee v at the end of the first cycle is:

Nv = (1 − q) Gv = (1 − q)KpV (Equation 6)
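In code, equations 1 through 6 amount to a handful of multiplications. Below is a minimal Python sketch of a single cycle; the function and field names are illustrative, not taken from the notebook in Appendix A.

```python
from dataclasses import dataclass

@dataclass
class CycleOutcome:
    remittance: float      # Ru = p * V        (eq. 1)
    perceived_gain: float  # Gv = K * Ru       (eq. 2)
    repayment: float       # Bu = q * Gv       (eq. 3)
    residual: float        # ru = V - Ru       (eq. 4)
    trustor_total: float   # Au = ru + Bu      (eq. 5)
    trustee_net: float     # Nv = (1 - q) * Gv (eq. 6)

def trust_cycle(V: float, p: float, K: float, q: float) -> CycleOutcome:
    """One cycle of the trust game between a trustor and a single trustee."""
    Ru = p * V
    Gv = K * Ru
    Bu = q * Gv
    ru = V - Ru
    return CycleOutcome(Ru, Gv, Bu, ru, ru + Bu, Gv - Bu)
```

For example, trust_cycle(1_000_000, 0.65, 2.0, 0.14) yields the Case 1 numbers discussed later: a trustor total of 532,000 points and a trustee net gain of 1,118,000 points.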

Generalizing the gains for trustor and trustee over n cycles of the trust game, we have, for the trustor:

Au(n) = Σ_{i=1..n} (ru,i + Bu,i) (Equation 7)

and trustee:

Nv(n) = Σ_{i=1..n} (1 − qᵢ) Gv,i (Equation 8)

The objective is to maximize the payoff to trustee and trustor, establishing a region where the exchange of values is considered fair trade. As such, trust has to be repaid [3] (i.e., q > 0). The trustor benefits from economies of scale by the aggregate of payoffs from all trustees.
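If p, K, q, and V are held constant from cycle to cycle (an assumption made here for illustration), the sums in equations 7 and 8 collapse to n times the single-cycle gains, and the role of q > 0 becomes easy to see:

```python
def totals_after(n: int, V: float, p: float, K: float, q: float):
    """Accumulated trustor and trustee gains after n cycles (eqs. 7 and 8),
    assuming V, p, K, and q stay constant, so every cycle contributes equally."""
    au_per_cycle = (1 - p) * V + q * K * p * V  # eq. 5
    nv_per_cycle = (1 - q) * K * p * V          # eq. 6
    return n * au_per_cycle, n * nv_per_cycle

# With q = 0 trust is never repaid: the trustor keeps only the residual value.
print(totals_after(10, 1_000_000, 0.65, 2.0, q=0.0))
# With q > 0 both sides gain, the fair-trade region described above.
print(totals_after(10, 1_000_000, 0.65, 2.0, q=0.14))
```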

Threshold

In order for a trustor to increase its trustworthiness (Wu) in the eyes of a trustee, the gains delivered by the service have to be higher than if the trustee were operating on their own. That condition is satisfied by the following system of inequalities:

Ru = pV ≥ T,  K ≥ 1 (Equation 9)

That happens when the value of the remittance Ru is equal to or greater than the threshold T (the value sent is at a minimum equal to the perceived value received), and the magnification factor K is greater than or equal to one.

Being a system of inequalities, it is also possible to have a lower remittance (pV < T ), as long as the magnification factor is large enough (K ≫ 1) to make up for the shortfall. Although possible, this would be uncommon.
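As a sketch, this check can be written in a couple of lines of Python. The strict form mirrors equation 9; the relaxed form, in which a large K compensates for a smaller remittance, is an assumption encoded here as K · pV ≥ T:

```python
def builds_trust(p: float, V: float, K: float, T: float) -> bool:
    """Trust-building check sketched from equation 9."""
    strict = p * V >= T and K >= 1.0  # remittance meets the threshold, no value lost
    relaxed = K * p * V >= T          # assumed encoding: K >> 1 makes up for pV < T
    return strict or relaxed
```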

Simulated Experiments

The following are four experiments that simulate scenarios ranging from fostering to eroding trust as a result of the quality of ML inference outputs. In all cases, the trust game begins with the trustor having an initial arbitrary balance, let's say one million points (1,000,000), to offer to trustees as added value.

The expectation is that by providing good ML inferences, a trustee would increase their trustiness level towards the trustor, and that results which are not good enough would have the opposite effect (i.e., erode trust).

In each of the results, notice not only the shape of the gain curves, but also the scale, as they will vary substantially. Later we plot all four experiments side-by-side using the same scale.

Case 1: ML Inferences Add Value

For this first experiment, let's go step by step through the first interaction. For subsequent experiments only the final graph plots will be shown. Irrespective of the experiment, they all can be reproduced with the source code in Appendix A of this article.

The assumption in case 1 is that ML inferences are magnifying the value of the product (K > 1).

Assume that in the first cycle the trustor begins with V = 1,000,000 points and was able to send a remittance of 65% of the ML inference value (Ru = 0.65 × 1,000,000) to a trustee. The magnification factor perceived by the trustee was K = 2, thus the gain is 1,300,000 (Gv = 2 × 650,000) points.

The trustee sent back a portion (q = 0.14) of the value by interacting with the user interface, providing a feedback label, and paying for the service. The rebate received by the trustor was 182,000 (Bu = 0.14 × 1,300,000) points.

Adding the rebate to the residual value (ru = 0.35 × 1,000,000), the trustor’s accumulated gain is equal to 532,000 (Au = 350,000 + 182,000) points. And the trustee’s gain is 1,118,000 (Nv = 0.86 × 1,300,000) points.
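The arithmetic above can be verified with a few lines of Python:

```python
V, p, K, q = 1_000_000, 0.65, 2.0, 0.14

Ru = p * V          # 650,000 points remitted (eq. 1)
Gv = K * Ru         # 1,300,000 points perceived by the trustee (eq. 2)
Bu = q * Gv         # 182,000 points repaid to the trustor (eq. 3)
Au = (V - Ru) + Bu  # trustor's accumulated gain (eq. 5)
Nv = Gv - Bu        # trustee's net gain (eq. 6)

print(round(Au), round(Nv))  # 532000 1118000
```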

The trustee perceived more value than what the trustor had to offer (win), and the trustor received a rebate in the form of feedback, adding value that wasn't there before (win). Finally, after aggregating across all trustees, the trustor will have accumulated more than the initial value offered (win). Win-win-win.

Figure 2 shows the graph with the accumulated gains from the remaining cycles.

Figure 2: Accumulated Gains (K > 1).

Case 2: ML Inferences Are Neutral

In the second experiment a neutral magnification factor (K = 1) is being simulated — the value sent and the value received are perceived equally.

Throughout all four experiment simulations, all parameters are kept the same, varying only K.

Figure 3 shows the graph with the accumulated gains. The trustee receives increasing value, but the trustor sees a small decline. That would be acceptable because the trustor's final gain is the aggregate from all trustees.

Figure 3: Accumulated Gains (K = 1).

Case 3: ML Inferences Are Causing Inefficiencies

The third experiment shows a scenario, figure 4, where inefficiencies are being brought upon the trustee (0 ≤ K < 1). Their gains are at best marginal, and at the same time there is a significant drop in the trustor’s gains.

This situation would be plausible and acceptable during the development phase of a product, when the trustee has agreed to be an Early Adopter of the service.

Figure 4: Accumulated Gains (0 ≤ K < 1).

Case 4: ML Inferences Are Leading to Rapid Erosion of Trust

The last experiment shows the worst case scenario, figure 5, where ML inferences are eroding the trustor’s trustworthiness (K < 0), therefore reducing the trustee’s ability to be trusting.

The trustee's gains are negative, meaning that they are worse off using the service than operating without it.

Figure 5: Accumulated Gains (K < 0).

Side-by-side Comparison

Here you can see all four use cases using the same scale on the graph. The only situation generating gains to both trustor and trustee is when the magnification factor K is greater than 1.

Figure 6: Side-by-side at the same scale.

Conclusion

This article has demonstrated that good ML inference results satisfy a valid criterion for increasing a trustor's trustworthiness, allowing trustees to be more trusting.

According to what we have shown here, there exists a strong motivation for Workday’s ML products to provide ML inferences only when a minimum confidence level has been cleared. It would be better to not produce a result than to provide an erroneous one. When nothing is provided, a customer can still operate at their nominal level of productivity.

References

[1] J. Berg, J. Dickhaut, and K. McCabe, “Trust, reciprocity, and social history,” Games and Economic Behavior, vol. 10, 1995.

[2] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton University Press, 1944.

[3] D. Kreps, “Corporate culture and economic theory,” Perspectives on Positive Political Economy, pp. 90–142, 1990.

Appendix A

The framework shown here was implemented in a Jupyter notebook, making it easy to reproduce the results and also adapt them to run alternative scenarios.
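A minimal Python sketch of such a simulation is shown below. It assumes the same parameters as the cases above (a balance of 1,000,000 points refreshed every cycle, p = 0.65, q = 0.14) and one representative K value per case (the specific values below 1 are illustrative); per-cycle gains are summed as in equations 7 and 8, and the original notebook may differ in its exact accumulation rule and plotting details.

```python
import matplotlib.pyplot as plt

def simulate(V: float, p: float, K: float, q: float, cycles: int):
    """Accumulated trustor (Au) and trustee (Nv) gains over repeated cycles,
    assuming V, p, K, and q stay constant from cycle to cycle."""
    au_history, nv_history = [], []
    Au = Nv = 0.0
    for _ in range(cycles):
        Ru = p * V           # eq. 1: remittance
        Gv = K * Ru          # eq. 2: perceived gain
        Bu = q * Gv          # eq. 3: repayment
        Au += (V - Ru) + Bu  # eq. 7: trustor gains summed over cycles
        Nv += Gv - Bu        # eq. 8: trustee gains summed over cycles
        au_history.append(Au)
        nv_history.append(Nv)
    return au_history, nv_history

# One representative magnification factor per case.
scenarios = {"Case 1: K > 1": 2.0, "Case 2: K = 1": 1.0,
             "Case 3: 0 <= K < 1": 0.5, "Case 4: K < 0": -0.5}

fig, axes = plt.subplots(1, len(scenarios), figsize=(16, 4), sharey=True)
for ax, (title, K) in zip(axes, scenarios.items()):
    au, nv = simulate(V=1_000_000, p=0.65, K=K, q=0.14, cycles=20)
    ax.plot(au, label="Trustor (Au)")
    ax.plot(nv, label="Trustee (Nv)")
    ax.set_title(title)
    ax.set_xlabel("Cycle")
axes[0].set_ylabel("Accumulated gain (points)")
axes[0].legend()
plt.tight_layout()
plt.show()
```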
