An AI watchdog for public procurement

Andriy Kucherenko
5 min read · Jan 22, 2019


In summer 2018, we went live with a pilot of a machine-learning-based monitoring system for public procurement. In this series of articles, I will describe our objective, the results we have achieved so far, and where we are heading next.

Although the experiment is still at an early stage, we have already obtained some measurable results and, most importantly, gained experience worth sharing and learning from.

Hopefully, someone will find inspiration here to build something similar or make a breakthrough in this field. It will also be fun to come back to these memories later and see how naive we were at the beginning of the journey 😉

Background

In 2016, Ukraine made a substantial change in the area of public procurement: we entirely centralized all data about public procurement and made it available to the public. The system is called “Prozorro”, a slightly modified form of the Ukrainian word for “transparently”.

In this context, “data available to the public” meant more than dashboards with aggregated information: it meant detailed, accurate, auditable real-time transactions. As of January 1, 2019, Prozorro contained details of 1.14 million tenders and 1.61 million reports on direct purchases. So, anyone sufficiently skilled can build independent analytical systems on top of it.
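
To give a flavor of what building “on top of it” means in practice: the data is exposed through a public JSON API. Below is a minimal Python sketch that pages through the tender feed. The endpoint URL and response fields follow the OpenProcurement conventions as I recall them, so treat them as assumptions and verify against the current API documentation.

```python
import requests

# Public Prozorro (OpenProcurement) tender feed. The exact host and
# API version here are assumptions; check the current API docs.
FEED_URL = "https://public.api.openprocurement.org/api/2.5/tenders"

def iter_tenders(max_pages=3):
    """Yield tender summaries, following the feed's 'next_page' cursor."""
    url = FEED_URL
    for _ in range(max_pages):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload["data"]
        url = payload["next_page"]["uri"]

for tender in iter_tenders(max_pages=1):
    print(tender["id"], tender["dateModified"])
```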

Obviously, despite digitalization and increased transparency, public procurement still required serious monitoring and oversight by civil society.

Shortly after Prozorro was launched, Transparency International Ukraine started another project: a non-governmental monitoring portal aimed at overseeing the integrity of public procurement. In August 2016, with my mission completed, I resigned from the position of Prozorro IT Project Manager and, together with colleagues from the Prozorro IT team, joined the new project. It was called “Dozorro”, a blend of the Ukrainian word for “watcher” and Prozorro. In other words, a watchdog for Prozorro.

We started by collecting market feedback on individual tenders, with the goal of becoming a “Tripadvisor of public procurement”. Transparency International then built a community of activists in charge of reacting to the feedback received on Dozorro. When they found evidence of improper conduct by the organization running a tender, they worked with the buyer and, if needed, with law enforcement authorities to improve the situation.

Pretty soon we understood the limitations of this approach. The feedback we received covered just a small fraction of tenders; 99% of them stayed outside the focus of Dozorro activists. Moreover, this feedback was not official and therefore useless to government auditors.

Around that time, two initiatives with some degree of similarity started almost simultaneously:

1) Development of an official public procurement monitoring system, led by the Government and supported by the EBRD

2) Development of an unofficial monitoring system, led by Transparency International Ukraine and supported by TAPAS (a USAID project), Omidyar Network, and the Open Contracting Partnership.

I was involved in the second project. There was no intention to duplicate the official system, and, with the freedom of a non-governmental organization, we had an opportunity to experiment.

Objective

The team, comprising experts from Transparency International Ukraine and civil activists, developed a set of risk indicators that could be calculated automatically for each tender in Prozorro. Activists used these risk indicators to look for the tenders with the highest probability of buyer misconduct.
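
To make the idea concrete, here is a minimal sketch of what such automatically calculated indicators can look like. The two indicator definitions below are hypothetical illustrations of the concept, not the actual indicators our experts developed.

```python
from datetime import datetime

def single_bidder(tender):
    """Illustrative indicator: fires when only one bid was submitted."""
    return int(len(tender.get("bids", [])) == 1)

def short_tender_period(tender, min_days=15):
    """Illustrative indicator: fires when the bidding window is short."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    period = tender["tenderPeriod"]
    start = datetime.strptime(period["startDate"][:19], fmt)
    end = datetime.strptime(period["endDate"][:19], fmt)
    return int((end - start).days < min_days)

# A toy tender record (field names mimic the Prozorro data model).
tender = {
    "bids": [{"id": "bid-1"}],
    "tenderPeriod": {"startDate": "2018-06-01T00:00:00+03:00",
                     "endDate": "2018-06-08T00:00:00+03:00"},
}
print([single_bidder(tender), short_tender_period(tender)])  # [1, 1]
```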

There was a certain bottleneck with this approach as well: the risk indicators were diverse and formed thousands of combinations. In some combinations the indicators compensated for each other; in others, just the opposite, they amplified each other. Prioritizing all possible combinations manually was impossible.

So, we decided to hire a machine to do this job :)

Training the machine, first steps

First, we created a team of experts from Transparency International Ukraine who volunteered to train the model.

We did not ask them to think about which algorithms might be applied. Instead, we selected a set of tenders with different risk indicators, trying to make the selection representative, and formed pairs of tenders out of it. We then showed these pairs to our experts, asking them to decide which tender in each pair deserved more investigative attention.

Their answers formed the basis for the machine’s initial training. In other words, we asked the machine to identify the patterns the ‘collective mind’ of our experts used to prioritize different combinations of risk indicators.
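
For intuition, here is a minimal sketch of one standard way to turn such pairwise judgments into a ranking model: reduce them to a classification problem over feature differences. Everything here is illustrative (synthetic data standing in for the expert answers, logistic regression as the learner); it shows the technique, not our exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 tenders, each a 0/1 vector of 12 risk indicators.
X = rng.integers(0, 2, size=(200, 12)).astype(float)

# Simulated expert answers: for each pair (i, j), label 1 means the
# experts judged tender i riskier. (Real labels came from the experts.)
hidden_w = rng.normal(size=12)  # the "collective mind" we try to recover
pairs = [(i, j) for i, j in rng.integers(0, 200, size=(500, 2)) if i != j]
diffs = np.array([X[i] - X[j] for i, j in pairs])
labels = (diffs @ hidden_w > 0).astype(int)

# Pairwise-ranking reduction: classify the sign of the feature
# difference; the coefficients become weights over risk indicators.
ranker = LogisticRegression(fit_intercept=False).fit(diffs, labels)

# Score every tender: higher score = riskier under the learned patterns.
risk_scores = X @ ranker.coef_.ravel()
print("ten riskiest tenders:", np.argsort(-risk_scores)[:10])
```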

Sounds easy and obvious? In real life, it was not :( We tried different approaches and spent hundreds of hours on work that was then simply discarded and restarted. We also had to develop a number of tools to make the job easier. In the next articles, I will give some details about them.

Beta test

Once the machine showed reasonable results, we started a closed beta test. We applied the model to the entire population of tenders in Prozorro and asked civil activists to check the tenders with the highest risk ranking calculated by the machine.

Activists’ findings were registered in Dozorro, so measuring the overall impact of the new model was relatively easy: we simply compared the average number of problems identified by activists per month before and after the model’s introduction. For our experiment, we selected three types of probable misconduct: ‘unfair selection of the winner’, ‘groundless disqualification of a tenderer’ and ‘collusion between tender participants’.
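
The computation behind the comparison is straightforward. Here is a sketch of what was measured; the monthly counts below are hypothetical placeholders, chosen only so that the resulting growth percentages match the results reported just after.

```python
# Average number of problems found per month, before vs. after the
# model went live. Counts are hypothetical; only the growth
# percentages correspond to our reported results.
before = {"unfair selection of the winner": 50,
          "groundless disqualification of a tenderer": 100,
          "collusion between tender participants": 50}
after = {"unfair selection of the winner": 63,
         "groundless disqualification of a tenderer": 137,
         "collusion between tender participants": 199}

for issue, base in before.items():
    growth = (after[issue] - base) / base * 100
    print(f"{issue}: {growth:+.0f}%")
```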

The results of the first beta test were the following:

  • unfair selection of the winner: +26% growth after implementation of the new model
  • groundless disqualification of a tenderer: +37%
  • collusion between tender participants: +298%

As you can see, all three metrics improved far beyond statistical error.

Training the machine, next steps

At the beginning of the project, we thought we would finish it once the beta test was complete and the results analyzed. Later on, we realized it was just the beginning of the journey.

Now we see how to improve the model substantially.

The initial training, based on the experts’ prioritization of pairs, was a good start. But for sustainable improvement of the system, we need another solution. We are currently changing the model to take into account the real findings of activists registered in the system.

The moment an activist registers misconduct in the system, the AI will re-evaluate its model and recalculate the weights of the risk indicators to identify new risky tenders more accurately. In a way, this will make the system truly self-training.
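
A minimal sketch of what such incremental re-weighting could look like, assuming each registered finding arrives as a labeled indicator vector. The use of scikit-learn’s SGDClassifier with partial_fit is my illustrative choice, not necessarily what the production system runs:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

N_INDICATORS = 12  # illustrative size of the risk-indicator set

# Log-loss gives logistic-regression-style weights that can be
# updated one labeled finding at a time via partial_fit.
model = SGDClassifier(loss="log_loss", random_state=0)
CLASSES = np.array([0, 1])  # 0 = no misconduct found, 1 = confirmed

def register_finding(indicators, confirmed):
    """Update indicator weights the moment a finding is registered."""
    x = np.asarray(indicators, dtype=float).reshape(1, -1)
    model.partial_fit(x, [int(confirmed)], classes=CLASSES)

# Example: a confirmed finding on a tender where indicators 2 and 7
# fired nudges those indicators' weights upward.
vec = np.zeros(N_INDICATORS)
vec[[2, 7]] = 1
register_finding(vec, confirmed=True)
print(model.coef_.ravel()[[2, 7]])  # weights moved for the fired indicators
```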

To be continued

In the next articles, I will go into the details of this project. Stay tuned :)
