Classifying the needles in the data haystack

Facing multiple challenges while developing a tool to diagnose malaria

Christiaan Schoenaker
Orikami blog
13 min read · Nov 26, 2018


Momala is a project company from Orikami that is developing a tool to diagnose malaria. This blog post is about one of the challenges Momala faced in its development process. The challenge is not limited to our situation, or even to machine learning. It boils down to how to make a yes/no decision when that decision relies on many relevant conditions, some of which you know will be wrongly assessed.

The blog starts with the context of detecting malaria parasites and becomes more abstract towards the end. The more in-depth parts of the text are put in grayed textboxes.

What we do

At Momala we are good at digitally detecting malaria parasites in (digital) photographs of thick-blood smears taken through the lens of a microscope. We developed an algorithm that can classify individual parasites, white blood cells and other objects visible in the photographs of blood smears.

The blood smears are stained with Giemsa to enhance the contrast, as is routine in malaria microscopy. The Giemsa-stained thick blood smears are then scanned by taking 100 pictures with a mobile phone through the lens of a bright-field microscope set to 1000x magnification. We developed a framework (application) to guide the scanning and to feed the offline algorithm with the images. In total, we scan 0.2 microliters (µL) of blood to detect the presence of malaria parasites.

If you are curious about our complete solution and why we think this will improve malaria disease management we wrote other parts that might interest you.

The challenge

Looking for malaria parasites (Plasmodium) in the scanned images of blood smears is sometimes like finding a needle in a haystack. A patient who is infected with malaria has a certain concentration of parasites in their bloodstream, called the parasitemia, which is measured in parasites per microliter (µL¯¹).

The wide range of parasitemia levels that can occur makes it hard to spot parasites or to discriminate them from debris. We have seen samples with parasitemias ranging from 1,000,000 µL¯¹ down to 1 µL¯¹. For the latter, it is rare to find a parasite in our scanned sample of only 0.2 µL of blood.

The parasites are very small compared to the total volume that is scanned. The combination of rare and small makes it difficult to spot a parasite at low parasitemia levels. By comparison, it would be like finding a medium-sized pinhead in a haystack as tall as an average human.

Analyzing a volume of 0.2 µL of blood equals scanning approximately 2.2 mm² of surface area. The trophozoite stage of the Plasmodium falciparum (P.f.) parasite, which is typically seen at these low parasitemia levels in a P.f. malaria infection, is roughly 3 µm² in size, or on the order of one millionth of the scanned area.

The microscope can only focus on one focal plane at a time. This means that the parasite can still hide in a layer of the blood smear that is not in focus. A comparison by volume, i.e. one that takes the focal planes into account, shows that it is even more difficult to spot a parasite. The volume of the parasite is on the order of one billion times smaller than that of the analyzed blood sample.
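The area comparison above can be checked with a few lines of arithmetic. This is only a sketch using the two figures quoted in the text (2.2 mm² and roughly 3 µm²); the variable names are ours.

```python
# Figures quoted in the text above.
scanned_area_mm2 = 2.2     # surface area covered by a 0.2 µL scan
parasite_area_um2 = 3.0    # approximate area of a P.f. trophozoite

UM2_PER_MM2 = 1_000_000    # 1 mm = 1000 µm, so 1 mm² = 10^6 µm²

scanned_area_um2 = scanned_area_mm2 * UM2_PER_MM2
area_fraction = parasite_area_um2 / scanned_area_um2

print(f"One parasite covers about {area_fraction:.1e} of the scanned area")
```

The resulting fraction lies between one and two millionths, which is what "on the order of one millionth" refers to.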

Other objects in the blood smear (white blood cells, debris, platelets, etc.) also pose a challenge, even when the parasitemia is higher than 1 µL¯¹. Some of these objects, e.g. debris, make the parasites harder to spot. Debris can be leftovers from the staining process, broken white blood cells, etc. Telling a piece of debris and a parasite apart is not easy and is usually left to a WHO-certified level one expert. An improved analogy would be: instead of finding one medium-sized pinhead in a haystack, we need to find multiple pinheads in a pool of pinheads of a slightly different colour.

Better safe than sorry?

We now let go of the analogy of the haystack. We make the challenge more abstract by generalizing parasites to red dots while white blood cells are larger black dots. We assume that a scan is a subsample of the whole blood smear sample and that the blood smear is flat, smooth and has no volume, i.e. only one focus plane is available and required to scan the sample.

The figure below demonstrates how one field-of-view (right) is taken from the blood smear (left). The field-of-view shows different objects: white blood cells (black dots), parasites (red dots), background (white) and debris (orange dots). Can you see the difference on your screen?

Abstract representation of a blood smear (left) and a field of view (right). The Field of view is an enlarged part of the blood smear. Individual white blood cells (black dots), parasites (red dots) and debris (orange dots) can be distinguished.

The parasites (red) can easily be confused with other objects in the blood, like debris (orange) and even background (white). This is true both for the Momala algorithm and for humans, including routine microscopists. That is the reason why there are only a few well-trained, WHO-certified level one malaria microscopists who can make that distinction. Yet some differences exist between how algorithms and humans classify objects.

The algorithm can only fall back on patterns it has studied in the past when asked to classify the objects. New, strangely shaped and/or slightly discolored objects can throw the algorithm off balance, leading to false positive and false negative object classifications. Such errors are always present in a classification process, e.g. diagnostics, and are reflected in the sensitivity and specificity of that process. The algorithm, however, makes many (thousands of) classifications per blood sample. One false positive (which is very likely to occur) may give the wrong impression that the blood sample is positive for malaria.

One might argue that when the parasitemia is high enough, it should not matter whether some of the objects are misclassified. In that case, we have enough evidence that there are at least some parasites to justify a positive Malaria diagnosis. However, we do not know the parasitemia beforehand and we cannot exclude the possibility that all the detected parasites are mere false positives and that the sample is negative.

Even with a near-perfect classifier, we can never rule out the possibility that multiple objects are misclassified. This is caused by the enormous number of objects that are scanned in the blood sample. Telling the classifier that it is better to be safe than sorry does not resolve the issue. Too many classifications will be pulled to the “be safe” side as a result of this rule. Consequently, all blood samples will be classified positive for malaria. We need an extra step to get from the classification of individual objects to a classification of a whole blood smear sample.
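To get a feeling for how quickly false positives pile up, here is a minimal sketch. It assumes that every classification is independent and that all objects share the same false positive rate; both are simplifications, and the numbers in the example are hypothetical rather than Momala's actual figures.

```python
def prob_at_least_one_false_positive(false_positive_rate: float, n_objects: int) -> float:
    """Probability that at least one of n non-parasite objects is called a parasite.

    Assumes independent classifications with an identical false positive rate.
    """
    prob_all_correct = (1.0 - false_positive_rate) ** n_objects
    return 1.0 - prob_all_correct


# Hypothetical example: a 0.1% false positive rate and 5,000 non-parasite objects.
p = prob_at_least_one_false_positive(0.001, 5000)
print(f"P(at least one false positive) = {p:.4f}")
```

With these example numbers the probability is above 99%, so a negative sample would almost always show at least one "parasite".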

Patching the image

Images of blood smears show parts that clearly do not contain any objects, e.g. background. The other parts contain objects, some of which are potential parasites. The Momala algorithm is smart enough to only classify potential parasites by separating the background from objects. That way the algorithm can spend more of its precious processing time on deciding whether an object is a parasite, rather than on analyzing parts that are clearly background.

Regardless of how smart the algorithm is, it still needs to scan the whole image one “patch” at a time to decide where the potential parasites are located. Once these are identified, the algorithm evaluates the patch containing the potential parasite. We call this algorithm the patch-classifier (or classifier).

A patch is a part of the image that the algorithm can focus on. By using a grid, we can select patches from the image. For simplicity, we draw the patches without overlap and assume that each parasite belongs to one patch only.

We can always impose that the patches do not overlap and that each parasite belongs to one patch only. In this abstract example, the parasites do not have an intrinsic size; they are points. We can increase the number of patches, making the size of a patch arbitrarily small, in such a way that a patch includes at most one parasite.

As previously mentioned in this post, actual parasites do have a size. The algorithm uses multiple overlapping grids to detect the “potential parasite objects”. Each potential parasite is then assigned its own patch and is placed in the center of that patch. Multiple parasites can end up in one patch if they are too close together; however, this is rare. Continue reading to find out how you can calculate the probability yourself!

An abstract representation of an image of a field-of-view. The grid that is shown divides the image into patches. The patches contain either “nothing” (white), white blood cells (black dots), parasites (red dots) or debris (orange dots).

The probability of finding a parasite

A patch either contains a malaria parasite (m=1) or it does not (m=0). In machine learning, this is known as the ground truth of that patch. What the classifier picks (m*) depends on how likely it is to make an error: Type I (False Positive) and Type II (False Negative) errors, to be precise. Likewise, the classifier can make the correct call (m* = m), i.e. a True Positive or a True Negative.

The probability of classifying malaria (m*=1) is the sum over all the ways the classifier can arrive at a positive answer: the probability of a True Positive times the probability that the patch contains a parasite, plus the probability of a False Positive times the probability that the patch does not contain a parasite.
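The sum described above is the law of total probability, which can be written as a small function. The function name and the example rates are our own illustration, not values measured for the Momala classifier.

```python
def prob_classified_positive(p_parasite: float,
                             true_positive_rate: float,
                             false_positive_rate: float) -> float:
    """P(m*=1) = P(m*=1 | m=1) * P(m=1) + P(m*=1 | m=0) * P(m=0)."""
    p_no_parasite = 1.0 - p_parasite
    return true_positive_rate * p_parasite + false_positive_rate * p_no_parasite


# Hypothetical example: 1 in 1,000 patches holds a parasite,
# the classifier finds 95% of parasites and mislabels 1% of empty patches.
p_positive = prob_classified_positive(0.001, 0.95, 0.01)
print(f"P(m*=1) = {p_positive:.5f}")
```

In this example the false positive term (0.01 × 0.999) is roughly ten times larger than the true positive term (0.95 × 0.001), so most positive calls would be wrong even though the classifier looks accurate on paper.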

The sample has a certain parasitemia (that can also be equal to 0 µL¯¹). With this information, we can estimate what the expected number of parasites per image (field-of-view) would be.

It is assumed that the parasites are uniformly distributed across the blood smear (and therefore across fields-of-view and patches), i.e. every spot is equally likely to house a parasite. The distribution of the number of parasites in a patch is therefore known: it is a Poisson distribution that depends on the actual parasitemia.

For microscopists, one could argue that they consider a whole slide (with multiple fields-of-view) as one big “patch”. While the algorithm can only look at a small area of the image at a time, human eyes are not limited by such a construct. We keep the concept of a field-of-view but consider it only to be a measure of how much surface area of the smear has been scanned.
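As a sketch of what the Poisson assumption implies for a single patch, the snippet below computes the expected number of parasites per patch and the probability that a patch holds two or more. The field volume follows from the 0.2 µL spread over 100 pictures mentioned earlier; the number of patches per field-of-view is a hypothetical value chosen only for illustration, and patches are assumed to be equally sized.

```python
import math

FIELD_VOLUME_UL = 0.2 / 100  # 0.2 µL scanned in 100 fields-of-view (from the text)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_parasites_per_patch(parasitemia_per_ul: float, patches_per_field: int) -> float:
    """Mean parasite count in one patch, assuming equally sized patches."""
    return parasitemia_per_ul * FIELD_VOLUME_UL / patches_per_field


def prob_multiple_parasites(lam: float) -> float:
    """Probability that a patch contains two or more parasites."""
    return 1.0 - poisson_pmf(0, lam) - poisson_pmf(1, lam)


# Hypothetical example: parasitemia of 1,000 µL¯¹ and 1,000 patches per field-of-view.
lam = expected_parasites_per_patch(1000, patches_per_field=1000)
print(f"expected parasites per patch: {lam}")
print(f"P(two or more in one patch): {prob_multiple_parasites(lam):.1e}")
```

For small means the probability of a doubly occupied patch is close to half the square of the mean, which is why it stays tiny unless the parasitemia is very high or the patches are very large.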

Calculating the odds

We can calculate the probability of finding at least one parasite in a sample blood smear. This outcome depends only on the parasitemia and the volume of the sample, i.e. the number of fields-of-view with a fixed volume. If only one field-of-view is examined, we need a parasitemia of 800 µL¯¹ to get an 80% probability that at least one parasite is in that one field-of-view. Looking at more fields-of-view, e.g. 100, we see that a parasitemia of only 8 µL¯¹ is required to get an 80% probability that at least one parasite is in one of those one hundred fields-of-view.

Figure 1: Four cases are displayed where microscopists look at a different number of fields-of-view (1, 10, 50 and 100). Scanning more fields increases the chance of coming across at least one parasite. The same is true for higher parasitemia levels: if the parasitemia is higher, it is more likely that one of the examined fields-of-view contains parasites.

The above graph and calculation do not consider the possibility of classification errors (Type I and Type II), such as missing a parasite completely. The Poisson distribution can be used in combination with conditional probability to estimate the probability of classifying a parasite in a patch (including the possibility that the patch-classifier algorithm might be wrong!).
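The numbers quoted above can be reproduced with a short calculation. This sketch assumes the Poisson model described earlier and a field volume of 0.2 µL / 100 = 0.002 µL; the function names are ours.

```python
import math

FIELD_VOLUME_UL = 0.2 / 100  # volume of blood in one field-of-view (from the text)


def prob_at_least_one_parasite(parasitemia_per_ul: float, n_fields: int) -> float:
    """P(at least one parasite in the scanned volume) under a Poisson model."""
    expected_count = parasitemia_per_ul * FIELD_VOLUME_UL * n_fields
    return 1.0 - math.exp(-expected_count)


def parasitemia_for_probability(target_prob: float, n_fields: int) -> float:
    """Parasitemia (per µL) needed to reach target_prob with n_fields examined."""
    return -math.log(1.0 - target_prob) / (FIELD_VOLUME_UL * n_fields)


for n_fields in (1, 10, 50, 100):
    needed = parasitemia_for_probability(0.80, n_fields)
    print(f"{n_fields:>3} fields-of-view: {needed:.1f} parasites per µL for 80%")
```

This gives about 805 µL¯¹ for a single field-of-view and about 8 µL¯¹ for 100 fields-of-view, in line with the rounded values of 800 and 8 mentioned in the text.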

From patch to blood smear

With Momala, we do not look at only one field-of-view but at many (one hundred, to be precise), each holding numerous patches. The probability that at least one patch is classified as containing a parasite is almost 100% because of Type I errors, even when the error rate is small.

Given our knowledge about Type I, Type II errors of the patch-classifier algorithm, the number of patches we look at, and the source parasitemia, we get a good picture of what we might expect as “classified” parasitemia. Figure 2 shows the relationship between the Type I error (False Positive patches), true parasitemia and the expected classified parasitemia.

Note that the lines only show the expected value, i.e. the mean of the distribution. The classified parasitemia will be drawn from a distribution that also depends on the number of fields-of-view.

Figure 2: Five cases are shown with various imposed Type I error rates. The expected classified parasitemia increases when the parasitemia increases, but the Type I error rate imposes a lower limit on the classified parasitemia. Below that point, the Type I error rate causes the algorithm to wrongly identify many non-parasites. Only when the Type I error rate is equal to 0% does the expected classified parasitemia match the true parasitemia.

For lower parasitemia levels, figure 2 shows us that even a low Type I error rate can already lead to an expected classified parasitemia that does not come near the true parasitemia. A common solution for these kinds of issues is to enforce a detection limit.
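The relationship in figure 2 can be approximated with a simple model. The sketch below is our own simplification and not necessarily the calculation behind the figure: it assumes equally sized, independent patches, a Poisson-distributed parasite count per patch, and a hypothetical number of patches per field-of-view.

```python
import math

FIELD_VOLUME_UL = 0.2 / 100  # volume of blood in one field-of-view (from the text)


def expected_classified_parasitemia(true_parasitemia_per_ul: float,
                                    true_positive_rate: float,
                                    false_positive_rate: float,
                                    n_fields: int = 100,
                                    patches_per_field: int = 1000) -> float:
    """Expected number of positively classified patches per µL of scanned blood.

    patches_per_field is a hypothetical value used only for illustration.
    """
    total_volume = FIELD_VOLUME_UL * n_fields
    n_patches = n_fields * patches_per_field
    mean_per_patch = true_parasitemia_per_ul * total_volume / n_patches
    p_parasite = 1.0 - math.exp(-mean_per_patch)  # patch holds at least one parasite
    p_positive = (true_positive_rate * p_parasite
                  + false_positive_rate * (1.0 - p_parasite))
    return n_patches * p_positive / total_volume


for rho in (1, 10, 100, 1000, 10000):
    estimate = expected_classified_parasitemia(rho, 1.0, 0.001)
    print(f"true {rho:>6} per µL -> expected classified {estimate:.0f} per µL")
```

With these hypothetical settings, a Type I error rate of 0.1% alone produces a floor of 500 classified parasites per µL, so any true parasitemia well below that level is drowned out. Lowering the True Positive rate in the call scales the parasite contribution down.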

When we look at a real-world example, we see similar behaviour for the classified parasitemia. Figure 2 only shows the influence of the Type I error. The True Positive rate also has an effect. When the True Positive rate is lowered, the lines from figure 2 shift downwards.

Figure 3: Parasitemia estimated with the use of (part of) the Momala algorithm v1.0. Six sample blood smears are analyzed three times each. Each smear is diluted by a factor of 10 from the previous one.

The estimated parasitemia does not adequately predict the true parasitemia. Figure 3 shows data points that are in line with what we theoretically predicted. The estimated parasitemia falls below the true-parasitemia line at high parasitemia levels, which indicates that the True Positive rate is lower than 1. The Type I error rate, which is between 0.1% and 1% (quite low), prevents us from estimating the parasitemia correctly at low levels. Momala developed an extra step to solve this challenge.

The extra step

Momala solved the discrepancy that exists between the expected classified parasitemia <ρ*> and the true parasitemia ρ. By looking at many (thousands of) scans of sample blood smears, we discovered a pattern. By reversing this pattern, Momala solved the challenge. We called the extra step the “post statistical analysis”, or PSA.

In contrast to the aforementioned theoretical example, reality is much more complicated. The marginal probabilities are not known beforehand. What we do know is the confusion matrix CM of the algorithm that scans the patches for parasites. A link exists between the marginal probabilities and the confusion matrix, C=f(CM). This function is one of the parts of the pattern discovered by the PSA implementation.

Compared to the example from the previous paragraph, the algorithm does not only recognize a generic malaria parasite (Plasmodium) but makes a much more elaborate distinction between many classes: 3 types of Plasmodium parasites, white blood cells, and debris. The parasites are known to have multiple (life-cycle) stages and the white blood cells have many subtypes, all of which look different. Debris is the most diverse of the classes in shape, colour, and size. In addition, the algorithm needs to work with images that are not perfect. The scans of blood smears can be blurry or contain out-of-focus objects, the lighting conditions may be poor, or the smear may be badly stained.

The patch-classifier algorithm has a feature that can help us. For every patch we get back additional information, i.e. a “score” for each of the categories the patch might belong to. The score cannot be interpreted as a probability directly; however, it does share some common properties. The scores (also called softmax values) of all categories add up to 1. A higher softmax value for one category implies that this category is more likely to be true. All in all, we bring together all the insights the algorithm provides on the patches in the sample blood smear. We matched these to the many blood smear samples that we collected to find a trustworthy way of estimating the parasitemia.

The parasitemia is only one of the three diagnostic results that the Momala algorithm provides. Next to parasitemia, the species and the Plasmodium status are also returned. Of course, when there are no Plasmodium parasites, there are no species to identify and no parasitemia to estimate. The three diagnostic results have a lot in common. Consequently, the Plasmodium detection and the species identification are both integrated into the PSA algorithm that looks for patterns. The patterns that the algorithm finds are too complex to be interpreted by a human, so, unfortunately, we cannot visualize them in this blog post.
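For readers unfamiliar with the term, the sketch below shows the standard softmax function, which turns raw classifier outputs into scores with the properties described above. The class names and raw values are hypothetical and are not taken from the Momala classifier.

```python
import math


def softmax(raw_outputs: list[float]) -> list[float]:
    """Map raw classifier outputs to positive scores that sum to 1."""
    shift = max(raw_outputs)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - shift) for x in raw_outputs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical categories and raw outputs for a single patch.
categories = ["parasite", "white blood cell", "debris"]
scores = softmax([2.0, 0.5, -1.0])

for name, score in zip(categories, scores):
    print(f"{name}: {score:.3f}")
print("most likely category:", categories[scores.index(max(scores))])
```

The category with the highest raw output always receives the highest score, but the score itself is not a calibrated probability, which is one reason why it cannot be taken at face value.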

Solution

The algorithm that we developed is a very powerful tool for malaria diagnostics and for other domains that cope with similar issues. One might think that very powerful computers are needed to run such algorithms. Thanks to our expertise and determination, Momala managed to run this powerful algorithm on a mobile device (2017). Analyzing 100 images with a resolution of 12 megapixels takes only 5 minutes on an HTC U11 smartphone.

Screens of Momala’s solution. Left: the camera view where the image is captured through the lens of the microscope. Right: the results view after the algorithm has finished processing 100 fields-of-view.

At Orikami, we have solved the Momala problem by building an advanced analysis tool called the PSA, which compares the many features that the patch-classifier outputs for each object to give a result on malaria infection and parasitemia levels. We are continuously improving this process with new insights and new data.

Do you also need to solve an issue where multiple classifications are confusing you, or would you like to hear whether our solution works for you? Please do not hesitate to contact us, and maybe we can help you out.
