Megaface Benchmark. What Does It Show? Part I.

Faceter Fog
Faceter
Published Mar 15, 2018 · 7 min read

If you have been following us here, you have probably read our story about the LFW benchmark (we are number six there, btw). Recently, Faceter was listed on the website of another popular benchmark: MegaFace.

This benchmark was developed by the University of Washington for scientific use, to help anyone check the accuracy of facial recognition algorithms. They created an open dataset, available to anyone, and a procedure for measuring recognition accuracy.

It is not an industrial standard, because there are ways to fake the results. Some companies have started to use MegaFace for PR reasons, and there are more of them every year, which makes the situation increasingly difficult. On one hand, you never know who has cheated and who has published real results, so the initial purpose of MegaFace as a tool for scientists has almost lost its meaning. On the other hand, MegaFace has become a “must do” benchmark for every facial recognition product. That’s why we couldn’t skip it.

We sent our results months ago, but it took a while to see them published. Our algorithms have improved since then, and we are working on the next, more advanced version of the product, which will be ready by the end of March. But anyway, we are glad to see Faceter listed among the other projects on the MegaFace website.

There are several necessary steps to get the results and see them published on the website.

How MegaFace benchmark works

1. Get Started

You need to register first: fill out a form to get access to the three datasets. You will receive an email with access information after your application is approved.

The form is very simple. You need to provide the team name, the affiliated company name, an email address, and the name of the researcher to add to the license.

2. Feature vector extraction

When you get access, you can download three datasets:

MegaFace (~1 mln pics), FaceScrub (~100K pics) and FGNet (~3,500 pics).

You must already have a deep neural network. Run your algorithm to extract feature vectors for all three datasets.
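As a sketch, the extraction step looks something like this in Python. The `embed` function here is only a stand-in for your own network’s forward pass (MegaFace doesn’t prescribe one); the real requirements are a fixed-length vector per image and a consistent distance metric, which is why we L2-normalize:

```python
import numpy as np

EMBEDDING_DIM = 128  # typical face embeddings are 128-512 floats

def embed(image: np.ndarray) -> np.ndarray:
    """Stand-in for a real CNN forward pass: image -> feature vector.

    A deterministic toy mapping is used so the sketch runs as-is;
    replace it with your network's inference call."""
    rng = np.random.default_rng(int(image.sum()))
    v = rng.standard_normal(EMBEDDING_DIM)
    return v / np.linalg.norm(v)  # L2-normalize so distances are comparable

def extract_features(images) -> np.ndarray:
    """One feature vector per image: the artifact you submit to MegaFace."""
    return np.stack([embed(img) for img in images])

# Toy stand-ins for the datasets' images (8x8 grayscale)
images = [np.full((8, 8), i, dtype=np.uint8) for i in range(3)]
features = extract_features(images)
print(features.shape)  # (3, 128)
```

You would run `extract_features` once per dataset and keep the three vector sets separate, since they are submitted separately.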

Important: You may train your algorithm with any dataset except these three.

Why is this so? Because if you train on these three, your algorithm will show better results than it would in real life. This is the first opportunity to cheat, btw. But not the last.

It’s also better to pick a training dataset that doesn’t overlap with those three. For example, you shouldn’t use one of the most popular datasets, MS-Celeb-1M (one million pictures of celebrities), to train your neural network: FaceScrub also consists of pictures of celebrities, so the two will certainly contain pictures of the same people.

For example, you’ll probably see Brad Pitt in both. That means your NN will recognize Brad Pitt much better if it was trained on his pictures, and the result will not be objective.

In our case, we used a proprietary dataset provided by our partner from the financial industry.

3. Challenge

There are two different challenges. In the first, you can train your NN with any dataset (except those three, of course). In the second, you need to download a special MegaFace training dataset of 672K identities (4.7 million photos) and use it to train your NN.

The first one is the most popular. All results of this challenge are published here (for the FaceScrub dataset) and here (for FGNet). Let’s clarify what they mean.

Identification Rate vs. Distractors Size

In step two you prepared feature vectors for FaceScrub and for the MegaFace pictures. You will need them now.

The FaceScrub dataset comprises a total of 107,818 unconstrained face images of 530 celebrities crawled from the Internet, with about 200 images per person.

The guys from the University of Washington divided this dataset into three parts and sent only one of them to developers. You will have to run special scripts on that piece of the dataset. The MegaFace dataset (one million pics) is used as a set of distractors.

What does it mean? Let’s dig deeper into the process of comparison.

You take pic #1 (its feature vector) of person #1 from FaceScrub and add it to the set of distractors.

Then you need to compare pic #2 of the same person from FaceScrub with every feature vector from MegaFace (including pic #1 that we added before).

The result of every comparison is a number that shows the distance between two vectors. If your algorithm works well, the distance between the vector of pic #2 and every distractor vector will be bigger than the distance between the vector of pic #2 and the vector of pic #1 (because it is the same person in two different pictures). A smaller distance between pic #1 and pic #2 is better. When you’ve finished with the first pic, you roll back all changes to the two datasets, take the next vector, and do the same procedure. Then repeat for each pic in the FaceScrub set.
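The loop above can be sketched in a few lines, under the assumption that features are plain NumPy vectors and the metric is Euclidean distance (the real benchmark runs its own devkit scripts; the names below are ours):

```python
import numpy as np

def rank1_identification(probe_pairs, distractors):
    """probe_pairs: list of (vec1, vec2), two pictures of the same person.
    distractors: (N, D) array of gallery feature vectors.

    For each pair, pic #1 is planted into the gallery next to the
    distractors; the probe (pic #2) scores a hit when pic #1 is its
    nearest neighbor, i.e. closer than every distractor."""
    hits = 0
    for v1, v2 in probe_pairs:
        genuine = np.linalg.norm(v2 - v1)                    # pic #2 vs pic #1
        impostor = np.linalg.norm(distractors - v2, axis=1)  # pic #2 vs gallery
        if genuine < impostor.min():
            hits += 1
    return hits / len(probe_pairs)  # fraction of correct identifications

rng = np.random.default_rng(0)
# 1,000 random unit vectors play the role of the 1 mln MegaFace distractors
gallery = rng.standard_normal((1000, 128))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
# Genuine pairs: two slightly noisy views of the same identity vector
pairs = []
for _ in range(50):
    base = rng.standard_normal(128)
    pairs.append((base + 0.05 * rng.standard_normal(128),
                  base + 0.05 * rng.standard_normal(128)))
acc = rank1_identification(pairs, gallery)
print(acc)
```

On this easy synthetic data every probe finds its planted match; real embeddings against a million real distractors are, of course, much harder.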

An obvious way to cheat is to blur or crop the pictures from MegaFace before vector extraction. In that case, the distances between pics from FaceScrub and pics from MegaFace will be very big. Nobody will know that the distractors were changed: the participant sends only feature vectors and results for review.

But we hope the University of Washington added some pics from FaceScrub to MegaFace as Easter eggs. If they did, they know for sure which vectors must be at the closest distance, and if the distractor pictures were changed, this would reveal it.

4. Results

In the end, you will have X cases where your algorithm was right, i.e., it recognized the person in the picture correctly. Accuracy = X divided by the total number of pics in the FaceScrub dataset.

Let’s look at the current table of the results:

Faceter is number 17 in the world. Not bad, right? Especially considering that the algorithm that was tested is already out of date, and the new one is much more powerful.

You can see that there are numbers for different sets: 1, 2 and 3. As we mentioned before, developers receive only one part of the FaceScrub dataset. They send their results together with all three sets of vectors. The MegaFace team at the University of Washington then tests them against the two other parts of FaceScrub to verify the results. The essence of this benchmark is to find out which algorithm extracts vectors from pics in the best way.

When you send your data, you must also state the size of the training dataset you used. In most cases it is large (0.5 mln pics and bigger), but sometimes it is 100K-500K, which counts as really small here. Of course, an algorithm trained on a large dataset will be stronger than one trained on ~100K pics.

If you look at the results page, you can see two charts at the bottom. They haven’t been updated in a long time. Their purpose is to show how an algorithm performs with different numbers of distractors. If you have just one picture to recognize and only one distractor, any algorithm will give 100% accuracy. But as you increase the number of distractors, different algorithms behave in different ways. You can see how dramatically some algorithms trained on small datasets drop. But not all of them: Beijing Deep Sense is pretty good and can compete with Google FaceNet and the other folks.
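The shape of those curves is easy to reproduce on synthetic data. The toy sketch below is our own construction, not anything from the MegaFace devkit: it fixes a set of noisy genuine pairs and measures the rank-1 rate as the distractor pool grows, which is exactly the quantity those charts plot on the y-axis:

```python
import numpy as np

def rank1_rate(pairs, gallery):
    """Fraction of probes whose genuine match beats every distractor."""
    hits = 0
    for v1, v2 in pairs:
        if np.linalg.norm(v2 - v1) < np.linalg.norm(gallery - v2, axis=1).min():
            hits += 1
    return hits / len(pairs)

rng = np.random.default_rng(1)
DIM = 64
NOISE = 0.11  # embedding noise level: higher = weaker algorithm

# 100 genuine pairs: one unit "identity" vector seen through noise twice
pairs = []
for _ in range(100):
    base = rng.standard_normal(DIM)
    base /= np.linalg.norm(base)
    pairs.append((base + NOISE * rng.standard_normal(DIM),
                  base + NOISE * rng.standard_normal(DIM)))

distractors = rng.standard_normal((10_000, DIM))
distractors /= np.linalg.norm(distractors, axis=1, keepdims=True)

sizes = (10, 100, 1_000, 10_000)
rates = [rank1_rate(pairs, distractors[:n]) for n in sizes]
for n, r in zip(sizes, rates):
    print(f"{n:>6} distractors: rank-1 rate {r:.2f}")
```

Because each smaller gallery is a subset of the larger one, the rate can only stay flat or fall as distractors are added; with enough noise it falls steeply, which is the drop-off you see in the published charts.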

If you used a small training dataset and your curve is still at the top, it means your technology (your CNN) is more advanced. But from a practical point of view, the most important factor is whether your technology can actually be deployed. If you achieved great results with a CNN that consumes all the resources of a data center and spends hours on vector extraction, it can’t be used in real life and can’t be part of a market product. Or if it were, the product would definitely cost a lot.

That’s why the MegaFace benchmark is not a contest, but it is just one of many ways to test your algorithms.

In Part II, we will tell you what “Identification Rate vs Rank” means and discuss the difference between FaceScrub and FGNet.
