Optimizing Banknote Sorting Machine Settings

Michiel Nijhuis
DNB — Data Science Hub
7 min read · Oct 11, 2022

A multi-objective genetic algorithm approach

DNB, the Dutch central bank, is responsible for the quality of the banknotes in the Netherlands. To monitor this quality, banknotes are tested regularly during sorting. Over its lifetime a banknote can tear, get taped or marked, or become dirty. In these cases the banknote is deemed unfit: its quality is so low that it should be removed from circulation. Once banknotes are deposited, they are first sorted and tested at external cash centers. If a banknote is determined to be unfit there, it is sent to DNB. DNB analyses these banknotes again, on an identical sorting machine, to test whether they are really unfit or can still be used. Because the machines are calibrated in the same way, this second check should produce the same results. In practice, however, the results often differ significantly: a share of the banknotes sent in as unfit is actually still fit. On the plus side, this second check reduces the number of fit banknotes that are destroyed. It also means that in some cases the transport might not have been necessary, and reducing transportation saves costs and CO2 emissions.

Our goal was thus to adjust the settings of the sorting machines at the external cash centers such that when a banknote is tested by DNB, the conclusion is the same as when it was tested at the cash center. The settings we can adjust are the thresholds of the sensors that determine whether a banknote is no longer fit. Setting these thresholds is essentially a combinatorial optimization problem, and we chose to apply a genetic algorithm to the banknote data. Let’s dive in and see how we did this. We first define two basic concepts:

False unfit
The number of banknotes that DNB sorts as fit, divided by the total number of banknotes sent to DNB as unfit.

False fit
The number of unfit banknotes wrongly classified as fit, divided by the total number of fit banknotes.
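To make these two objectives concrete, here is a minimal sketch of how a candidate set of thresholds could be scored. The data model is purely an assumption for illustration: measurements holds the sensor readings per banknote, dnb_unfit holds the verdict of the DNB machine, and a banknote is assumed to be flagged unfit when any reading exceeds its threshold.

```python
import numpy as np

# Hypothetical data model, only for illustration (not DNB's actual data format):
#   measurements : (n_notes, n_sensors) sensor readings per banknote at the cash center
#   dnb_unfit    : (n_notes,) boolean, True if DNB's reference check deems the note unfit
#   thresholds   : (n_sensors,) one candidate threshold per sensor

def classify_unfit(measurements, thresholds):
    """Assumed sorting rule: a note is flagged unfit when any reading exceeds its threshold."""
    return (measurements > thresholds).any(axis=1)

def objectives(thresholds, measurements, dnb_unfit):
    """Score one candidate set of thresholds as (false unfit rate, false fit rate)."""
    sorted_unfit = classify_unfit(measurements, thresholds)
    # False unfit: notes sent to DNB as unfit that DNB sorts as fit.
    false_unfit = (sorted_unfit & ~dnb_unfit).sum() / max(sorted_unfit.sum(), 1)
    # False fit: notes kept in circulation as fit that DNB would deem unfit.
    false_fit = (~sorted_unfit & dnb_unfit).sum() / max((~sorted_unfit).sum(), 1)
    return false_unfit, false_fit
```

Every candidate set of thresholds thus maps to a point (false unfit, false fit) that we would like to push towards the origin.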

Ideally, our new thresholds reduce the number of false unfits without increasing the number of false fits. Realistically this will probably be unattainable, so we have to trade off between the two. For the trade-off, we need to find the optimal points where the number of false unfits cannot be reduced without increasing the number of false fits. The problem is thus a multi-objective combinatorial optimization problem, and we apply the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to solve it. Let’s see how this algorithm works step by step.

Random search

The simplest way we could find new thresholds is just to randomly select a set of thresholds and evaluate the performance. In the figure below we show how well this performs for 50 iterations of 100 random solutions:

Performance for a random search approach

The dark blue dots show the Pareto optimal solutions. This means that for each of those points we cannot reduce the number of false unfits without creating more false fits. As there are about 1,000,000,000,000,000,000 possible combinations of thresholds, finding optimal ones in this way would take an endless amount of time.
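As a sketch of this baseline, the snippet below draws random threshold sets and keeps the non-dominated ones. It reuses the hypothetical objectives, measurements and dnb_unfit from above and assumes a rule_table that lists the allowed values for each threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_population(rule_table, size=100):
    """Draw `size` random threshold sets; rule_table[i] lists the allowed values for threshold i."""
    return np.array([[rng.choice(options) for options in rule_table] for _ in range(size)])

def pareto_mask(scores):
    """Boolean mask of the non-dominated rows of `scores` (one objective pair per row, lower is better)."""
    n = len(scores)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        dominates_i = (scores <= scores[i]).all(axis=1) & (scores < scores[i]).any(axis=1)
        if dominates_i.any():
            mask[i] = False
    return mask

# Random search: 50 iterations of 100 random candidates, keeping the non-dominated solutions found so far.
best_thresholds = np.empty((0, len(rule_table)))
best_scores = np.empty((0, 2))
for _ in range(50):
    pop = random_population(rule_table)
    scores = np.array([objectives(t, measurements, dnb_unfit) for t in pop])
    all_thresholds = np.vstack([best_thresholds, pop])
    all_scores = np.vstack([best_scores, scores])
    keep = pareto_mask(all_scores)
    best_thresholds, best_scores = all_thresholds[keep], all_scores[keep]
```

The pareto_mask helper returns the Pareto optimal subset of a set of scores and is reused in the sketches below.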

Mutation

The first step to improve the search is to take the Pareto optimal points and start randomly adjusting their thresholds: we make variations of the current best points in the hope of finding better ones. This process is called mutation; below you can find the code used for it.
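The original code is not reproduced here; below is a minimal sketch that follows the description in the next paragraph, assuming pop is a NumPy array with one set of thresholds per row and rule_table again lists the allowed values per threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(pop, rule_table, mutation_factor=0.1):
    """Return a mutated copy of `pop` (one set of thresholds per row).

    A random matrix of 0's (no change) and 1's (change threshold) picks where mutations
    occur, with probability `mutation_factor` per entry; every selected entry is set to a
    new random value from the rule table, i.e. the allowed options for that threshold.
    """
    mutated = pop.copy()
    change = rng.random(pop.shape) < mutation_factor
    for row, col in zip(*np.nonzero(change)):
        mutated[row, col] = rng.choice(rule_table[col])
    return mutated

# As described below, the mutation is only applied to the sets of thresholds on the Pareto
# front, e.g. something like: children = mutate(pop[pareto_mask(scores)], rule_table)
```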

In the code we use pop as the variable for the population of possible sets of thresholds; in our case pop contains 100 sets of thresholds. Based on the mutation factor we first select the places in the population where a mutation should occur. We do this by randomly drawing a matrix of 0’s (no change) and 1’s (change threshold). Next, we set the selected thresholds to a new random value from the rule table (a table with the possible options for each threshold). We only apply this mutation to the sets of thresholds that are on the Pareto front. In this way we are adjusting the already good values, and the improvement in performance, as can be seen in the figure below, is much larger.

Performance for mutating the best results

The figure shows that by purely mutating the best results, the newly generated points are mostly close to the Pareto front. Two phases can be distinguished: a first phase in which the front changes quickly and becomes pronounced, and a second phase in which the front slowly moves towards the left corner. Besides applying mutation to generate new sets of thresholds, we can also combine two sets of thresholds to find a new and possibly better set.

Crossover

This can be done with crossover. In this case we used a k-point crossover as the different thresholds are independent. Below you can find the code that we used for performing the crossover.
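Again, what follows is a sketch rather than the original code. The choices mask is drawn independently per threshold, which the independence of the thresholds allows; a strict k-point variant would instead derive the mask from k sorted cut points. parents_a and parents_b are assumed to be arrays with one parent per row:

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover(parents_a, parents_b):
    """Combine two equally shaped arrays of parent threshold sets into child threshold sets.

    `choices` selects, per position, whether a threshold is taken from parents_a (True)
    or from parents_b (False); the chosen values are then written into the child array.
    """
    choices = rng.random(parents_a.shape) < 0.5
    children = np.where(choices, parents_a, parents_b)
    return children
```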

We first make a choices vector that selects which thresholds should be taken from parents_a; the remaining thresholds come from parents_b. The child is created and the thresholds from parents_a and parents_b are written into the child vector.

At this point we still face the question of how to select the two parent vectors. For this we use a two-stage tournament selection, meaning parent solutions go head-to-head in two rounds: in each match the solution with the highest rank wins and progresses, and the winner of the final match is selected as a parent. The rank of a solution is given by the Pareto front it belongs to. The highest ranked solutions form the Pareto front of all the data; the second highest ranked solutions form the Pareto front once the highest ranked solutions are removed from the data, and so on. Many points will still share the same rank, so to distinguish between them we make a random decision. This can be implemented by choosing the best parent, based on ranking, from a group of 4 randomly selected candidates, as shown in the code below. For both parents this is done 100 times, so we get 100 child solutions for the next iteration.
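A sketch of this selection step, assuming the higher-is-better rank convention described above (the overall Pareto front gets the highest rank) and reusing the pareto_mask helper from the random-search sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def front_ranks(scores):
    """Rank solutions by repeatedly peeling off Pareto fronts.

    Following the convention above, the overall Pareto front gets the highest rank,
    the next front one lower, and so on (pareto_mask is the helper defined earlier).
    """
    remaining = np.arange(len(scores))
    fronts = []
    while len(remaining):
        mask = pareto_mask(scores[remaining])
        fronts.append(remaining[mask])
        remaining = remaining[~mask]
    ranks = np.empty(len(scores))
    for depth, front in enumerate(fronts):
        ranks[front] = len(fronts) - depth
    return ranks

def tournament_select(ranks, n_parents=100, group_size=4):
    """Select `n_parents` parent indices via tournaments of `group_size` random candidates.

    Taking the best of four candidates is equivalent to two head-to-head matches plus a
    final; ties between equally ranked candidates are broken by a random decision.
    """
    winners = np.empty(n_parents, dtype=int)
    for i in range(n_parents):
        candidates = rng.integers(0, len(ranks), size=group_size)
        keys = ranks[candidates]
        winners[i] = rng.choice(candidates[keys == keys.max()])
    return winners

# Usage: two parent pools of 100, combined into 100 children.
# ranks = front_ranks(scores)
# parents_a = pop[tournament_select(ranks)]
# parents_b = pop[tournament_select(ranks)]
# children  = crossover(parents_a, parents_b)
```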

Adding the crossover to the code improves the optimization process. The shape of the Pareto front becomes visible more quickly in the first phase, and the second phase of incremental improvement yields more progress.

Performance for a combined mutation and crossover approach

Now we arrive at what seems like a good Pareto front quite quickly. There is still one more improvement we can make: if you look at the results, you can see that the points on the left side of the Pareto front are spaced quite far apart and move towards an optimum rather slowly. This can be fixed by applying the crowding distance.

Crowding distance

When selecting parents, we make a random decision between points that are ranked equally. However, it seems intuitive to give an advantage to points in emptier regions of the solution space, as those regions have been explored less. This can be achieved by penalizing solutions that are close together; essentially, we try to fill up the empty spaces on the Pareto front. The penalty for each solution is calculated from the crowding distance, which we compute with the code below:
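A sketch of this step, again assuming the higher-is-better ranks produced by front_ranks and a two-column scores array; the crowding term is scaled into (0, 1) here, an assumption not spelled out in the article, so that it can only break ties within a rank and never override the rank itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_crowding(ranks, scores):
    """Return `ranks` with a crowding term added as a tie-breaker within each rank.

    For every solution the distance to its neighbouring solutions with the same rank is
    computed per objective (after sorting along that objective) and summed; boundary
    points of a front get the largest distance. The total is scaled into (0, 1) per rank,
    an assumption made here, so the fraction never overrides the integer rank. Fronts
    with only one or two points get a random number between 0 and 1 instead, as crowding
    distances are not defined for them.
    """
    keys = ranks.astype(float)
    for r in np.unique(ranks):
        idx = np.flatnonzero(ranks == r)
        if len(idx) <= 2:
            keys[idx] += rng.uniform(0, 1, size=len(idx))
            continue
        crowd = np.zeros(len(idx))
        for obj in range(scores.shape[1]):
            col = scores[idx, obj]
            order = np.argsort(col)
            vals = col[order]
            span = max(vals[-1] - vals[0], 1e-12)
            crowd[order[1:-1]] += (vals[2:] - vals[:-2]) / span   # gap to the two neighbours
            crowd[order[[0, -1]]] += 2.0                          # edges of the front
        keys[idx] += 0.999 * crowd / crowd.max()
    return keys
```

The augmented ranks returned by add_crowding can then be passed to tournament_select in place of the plain front_ranks output.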

In the code we calculate, for every solution, the distance to the two neighbouring solutions with the same rank. We add these distances to the rank to distinguish between solutions of equal rank. For ranks with just one or two points, a random number between 0 and 1 (exclusive) is added to the rank instead, as crowding distances do not exist for these points. Adding the crowding distance to our optimization gives us the NSGA-II.

Performance of the NSGA-II algorithm

The NSGA-II finds the Pareto front quickly and gives us a large collection of possible sets of thresholds. A trade-off can now be made on how the thresholds should be adjusted to get an acceptable number of false unfits and false fits. For each of the points on the Pareto front we do have to check whether the results hold for out-of-sample data as well. Therefore, we evaluate all the points on a test data set and only keep the points of the Pareto front whose performance does not change significantly. This gives us an indication of the stability of the solutions and increases our confidence that a new set of thresholds will perform as expected. Note that the optimization of the thresholds should be rerun periodically to ensure the performance does not drift.

With the Pareto front we can now adjust the thresholds for the sorting machines to get a better grip on the number of false unfits.
