Automated profile picture moderation at BlaBlaCar using Deep Learning

Raphaël Berly
BlaBlaCar
Published in BlaBlaCar · 6 min read · Mar 9, 2023

A story of how we drastically reduced manual profile picture moderation at BlaBlaCar.

Profile picture moderation at BlaBlaCar

More than 19 million profile pictures were uploaded to the BlaBlaCar carpooling platform in 2022. This represents roughly 53k pictures per day.

These profile pictures help build trust in our carpooling community. They help carpoolers know who they carpool with, and recognize each other at the meeting point.

Profile pictures help you know who you carpool with, and easily recognize them at the meeting point.

Our Community Relations team has been moderating pictures since BlaBlaCar was created. They help maintain the highest level of quality on the platform by ensuring profile pictures respect a set of rules. Of the 53k pictures moderated daily, roughly 25% are refused.

Profile picture moderation is done using a set of rules presented in this table.

Until late 2021, all profile picture moderation was done manually. The amount of work was significant (roughly 12 FTE), and led to substantial delays in the approval of profile pictures during peak times on the platform. For instance, the delay approached 3 days during the French strikes in December 2019, vs a few hours only usually.

Designing an automated pipeline

The Data Services team proposed designing an automated pipeline for profile picture moderation. Our objective was to automate most of the profile picture moderation work, without making compromises on our quality standards.

The pipeline takes a picture as an input, and returns the following:

  • A label: either “refuse”, “accept”, or “send to manual labeling”
  • Coordinates: in the case of an accepted picture, the coordinates used to crop the picture around the face.

We split the task into sub-tasks, based on whether they were generic or BlaBlaCar-specific:

  1. Detect faces in the picture and crop them: Generic. If there is not exactly one face, then refuse the picture. Else, crop the picture around the face coordinates. To perform this task, we used an open-source library called face-detection. It enabled us to locate faces on pictures using a one-liner of Python code, with an estimated accuracy greater than 99%.
  2. Predict whether the picture is to be accepted or refused: Specific. This requires training a classifier to learn the moderation rules of BlaBlaCar. We will focus on this task in the next section.
  3. Detect celebrities: Generic. In some countries, uploading a celebrity’s picture as a profile picture is a common practice. In such countries, we need to detect celebrity pictures and refuse them. This is a complex task: detecting faces is one thing, recognizing faces is something else. Luckily, it is not specific to our use-case. For this task, we decided to use AWS Rekognition, a paid API with which fewer than 5 lines of code sufficed to detect whether a picture was that of a celebrity.
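The first, generic step and its refuse/crop rule can be sketched as follows. This is a minimal sketch: the bounding boxes are assumed to come from the face-detection library mentioned above (its exact API is not shown here), using an assumed (left, top, right, bottom) pixel convention.

```python
from typing import List, Optional, Tuple

# Assumed box convention: (left, top, right, bottom) pixel coordinates.
Box = Tuple[int, int, int, int]

def face_gate(boxes: List[Box]) -> Optional[Box]:
    """Step 1 of the pipeline: refuse unless exactly one face is found.

    Returns the crop box around the single face, or None to refuse.
    """
    if len(boxes) != 1:
        return None  # zero or several faces -> refuse the picture
    return boxes[0]

# One detected face: the picture proceeds with this crop.
assert face_gate([(40, 30, 200, 210)]) == (40, 30, 200, 210)
# Two faces: refused.
assert face_gate([(0, 0, 50, 50), (60, 0, 110, 50)]) is None
```

Keeping this gate as a small pure function makes it trivial to reuse, both in the production pipeline and when building the training dataset.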

Focus on the BlaBlaCar-specific classification task

We built a classifier whose purpose is to predict whether a picture cropped around exactly one face should be accepted or refused, given the moderation rules of BlaBlaCar.

Building the dataset

We collected the data using manual profile picture moderation logs, which gave us access to both the raw picture (before cropping) uploaded by the user, and the associated label (“accept” or “refuse”).

Then, to obtain our final dataset, we applied the same preprocessing that would be applied in production:

  1. Detect faces on all pictures, using the open-source Python library mentioned above;
  2. If there is not exactly one face, then exclude from the training dataset. Otherwise, crop the picture around the face coordinates.
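Under the same assumptions (a detector returning one bounding box per face), the dataset construction reduces to filtering the moderation logs down to single-face pictures. The row structure below is illustrative, not the actual log schema:

```python
def build_dataset(log_rows):
    """Keep only pictures with exactly one detected face, paired with
    the human moderation label ("accept" or "refuse").

    Each row is assumed to look like {"faces": [boxes...], "label": str}.
    """
    dataset = []
    for row in log_rows:
        if len(row["faces"]) != 1:
            continue  # excluded from training, mirroring production behavior
        dataset.append((row["faces"][0], row["label"]))
    return dataset

rows = [
    {"faces": [(0, 0, 80, 80)], "label": "accept"},
    {"faces": [], "label": "refuse"},  # no face: excluded
    {"faces": [(0, 0, 40, 40), (50, 0, 90, 40)], "label": "refuse"},  # two faces: excluded
]
assert build_dataset(rows) == [((0, 0, 80, 80), "accept")]
```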

Training the model

To train the model, we used a technique called Transfer Learning. According to Wikipedia, this technique consists of “storing knowledge gained while solving one problem and applying it to a different but related problem.”

Concretely, we followed these steps:

  1. We took an open-source deep learning model pre-trained on ImageNet, a large open-source image database, to perform generic classification tasks.
  2. We froze all layers of the network, except two fully-connected layers at the very end. They constitute what we call the “specialization layer.” When fitting the model to our profile picture dataset, these layers will adapt the general knowledge held by the pre-trained model to the specificities of the BlaBlaCar profile picture moderation task.
We used a pre-trained model and specialized it on BlaBlaCar pictures.
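The freezing step can be sketched in PyTorch as follows. This is a minimal sketch: the tiny `backbone` below merely stands in for a real ImageNet-pre-trained network (in practice, e.g. a torchvision model), and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained backbone (illustrative; in practice this
# would be a network pre-trained on ImageNet).
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# The two fully-connected "specialization" layers trained on our data.
head = nn.Sequential(
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),  # single logit: accept vs. refuse
)
model = nn.Sequential(backbone, head)

# Freeze every backbone parameter; only the head will be updated.
for p in backbone.parameters():
    p.requires_grad = False

# The optimizer only ever sees the head's parameters.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

Because gradients are only computed and applied for the head, fitting is fast and needs far less labeled data than training a full network from scratch.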

Using this approach, we reached a ROC AUC above .98, indicating very strong predictive power.

Partial automation of profile picture moderation using the model

The distribution below shows that the model is able to separate the data very well:

The data is well-separated by our classifier.

However, in the middle of the distribution plot, one can observe an area containing roughly as many refused as accepted pictures. We call this the “gray zone”: it is where a decision based on the model’s prediction is hardest to make, since the prediction is close to neither 0% nor 100%, and where that prediction is least reliable.

In order to maintain the highest quality standards on the platform, we decided to send the pictures in the gray zone to manual moderation, as the following graph illustrates:

The gray zone is sent to manual moderation. The rest is automatically accepted or refused.
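The routing rule amounts to two thresholds on the model score. A minimal sketch, with illustrative threshold values (in practice they would be tuned against the quality bar):

```python
def route(score: float, low: float = 0.2, high: float = 0.9) -> str:
    """Map a model score (probability of "accept") to a decision.

    Thresholds are illustrative: scores below `low` are refused,
    scores above `high` are accepted, and the gray zone in between
    goes to a human moderator.
    """
    if score <= low:
        return "refuse"
    if score >= high:
        return "accept"
    return "manual"  # gray zone

assert route(0.05) == "refuse"
assert route(0.97) == "accept"
assert route(0.55) == "manual"
```

Widening or narrowing the gray zone trades automation rate against accuracy, which makes the thresholds a natural tuning knob.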

Furthermore, we wanted to continuously monitor the model’s quality. Thus, we decided to collect unbiased labels by sending a random sample of 5% of all pictures to manual moderation, regardless of the model’s prediction. This would enable us to continuously measure the model’s performance, notably its precision and recall. The collected labels (either accepted or refused) also help retrain the model.
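The monitoring mechanism can be sketched as follows. Function names and the gray-zone thresholds are illustrative: a picture is reviewed by a human if it falls in the gray zone or if it is drawn into the unbiased 5% sample, and the sampled (decision, label) pairs feed precision/recall estimates.

```python
import random

def needs_manual_review(score: float, rng: random.Random,
                        low: float = 0.2, high: float = 0.9,
                        sample_rate: float = 0.05) -> bool:
    """A picture goes to manual moderation if it falls in the gray zone,
    or if it is drawn into the 5% unbiased monitoring sample."""
    in_gray_zone = low < score < high
    in_sample = rng.random() < sample_rate
    return in_gray_zone or in_sample

def precision_recall(pairs):
    """Precision and recall of the "accept" decision, computed on the
    unbiased sample of (model_decision, human_label) pairs."""
    tp = sum(1 for d, l in pairs if d == "accept" and l == "accept")
    fp = sum(1 for d, l in pairs if d == "accept" and l == "refuse")
    fn = sum(1 for d, l in pairs if d == "refuse" and l == "accept")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

sample = [("accept", "accept"), ("accept", "refuse"),
          ("refuse", "accept"), ("accept", "accept")]
assert precision_recall(sample) == (2 / 3, 2 / 3)
```

Because the 5% sample is drawn independently of the model's score, the resulting precision and recall estimates are unbiased, unlike metrics computed only on gray-zone pictures.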

Using this approach, we managed to automate 80% of the picture moderation, while increasing its accuracy to 97% (vs. 93% when manually moderated). The remaining 20% are sent to manual moderation. The collected labels will also be used to retrain the model, and will help make it better at predicting complex cases that are currently close to the decision boundary.

The model is in production at the time of writing, and has been for a year. It requires very little or no maintenance, since we do not observe drift on the task.

Key takeaways

If you had to remember 3 things from this read, here is what they would be:

  1. To build an automated machine learning or deep learning pipeline, split the work between generic tasks and non-generic tasks. If a task is generic, a package likely already exists that does exactly what you are looking for. This way, you can focus your attention and energy on the tasks that are specific to your use-case.
  2. Transfer learning helps you achieve great results with very little effort and labeled data, by using pre-trained models and adapting them to your specific tasks.
  3. Human input is still valuable, as it provides the raw material for training the model: labels. But by automating the easy decisions, you can focus human attention precisely where it has the most added value: on the hard cases, closer to the decision boundary, and on quality monitoring.

Published in BlaBlaCar
BlaBlaCar is the world’s leading community-based travel app enabling 26 million members a year to carpool or travel by bus in 21 countries.

Written by Raphaël Berly
Senior Data Scientist @BlaBlaCar