Bumble Inc. open-sources Private Detector™ and takes another step towards a safer internet for women
A version of the model powering the feature that detects and blurs lewd images is now available on GitHub
At Bumble Inc., the parent company of Bumble, Badoo, and Fruitz, user safety has long been a central part of our mission and a core value that informs the company’s product innovations and roadmap. We’ve leveraged the latest advances in technology and Artificial Intelligence (AI) to give our community the tools and resources they need to have a safe experience on all our platforms. In 2019 we launched Private Detector™ across the Bumble and Badoo apps. This AI-powered feature detects and blurs lewd images and warns users about a photo before they open it.
As just one of many players in the world of dating apps and social media at large, we also recognize the need to look beyond the Bumble ecosystem and engage in a larger conversation about unsolicited lewd photos, also known as cyberflashing, to make the internet a safer and kinder place for everyone.
To help address cyberflashing more broadly, Bumble teamed up with legislators from across the aisle in Texas in 2019 to pass HB 2789, a bill that effectively made sending unsolicited lewd photos a punishable offense. Since then, Bumble has continued to advocate successfully for similar laws across the rest of the United States and globally.
In 2022, Bumble reached another milestone in public policy by helping to pass SB 493 in Virginia and most recently SB 53 in California, adding another layer of online safety in one of the most populous states in the United States.
These new laws are the first step towards creating accountability and consequences for this everyday form of harassment, which causes victims (predominantly women) to feel distressed, violated, and vulnerable online.
As Bumble continues to help curb cyberflashing through legislative efforts and to provide safety tools such as Private Detector™ that keep our community safe from unsolicited nudes within our apps, we hope to create a ripple effect of change across the internet and social media at large. This is why today we are extremely proud to release a version of Private Detector™ to the wider tech community, with the hope of democratizing access to our technology and helping scientists and engineers around the world who face the same challenges improve their approach to online safety.
How does it work?
Since the early days of Badoo, we have been pioneers in leveraging technology and advanced procedures to improve both our match-making experience and our integrity and safety capabilities. Behind the scenes, we started designing and implementing machine learning solutions for lewd image detection almost a decade ago. Our dominant position in the dating industry lets us draw on both our best-in-class knowledge of the tech space and the insights collected by our apps.
Machine learning (ML) is a field devoted to understanding and building methods that learn (or, better, mimic) how to reach human-level performance on specific tasks, leveraging data to improve their accuracy. The development cycle requires carefully designing a neural network’s architecture and iteratively providing it with a curated set of samples (a dataset) drawn from the problem: in our case, detecting whether a picture contains lewd content.
Even though the number of users sending lewd images on our apps is fortunately very small (just 0.1%), our scale allows us to collect a best-in-the-industry dataset of both lewd and non-lewd images, tailored to achieve the best possible performance on the task. Private Detector™ is trained on very high-volume datasets. The negative samples (those without any lewd content) are carefully selected to reflect edge cases and other parts of the human body (e.g. legs, arms), so that the model does not flag them as abusive. Across our machine learning endeavors over the years, we have iteratively added samples to the training dataset to reflect actual user behavior or to correct misclassifications found in testing, and this has proved to be a successful exercise. Even when the downstream task is framed as binary classification (as in our case), nothing prevents data scientists from defining additional concepts (or labels) during annotation and then merging them back into the two target classes just before training.
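The idea of annotating with finer-grained labels and merging them back into two classes before training can be sketched as follows. This is a minimal illustration, not the labeling scheme actually used for Private Detector™: the label names and the FINE_TO_BINARY mapping are hypothetical.

```python
from collections import Counter

# Hypothetical fine-grained annotation labels (assumption: the real
# taxonomy is not described in this article).
FINE_TO_BINARY = {
    "lewd": 1,            # positive class
    "legs": 0,            # hard negatives: body parts that are not lewd
    "arms": 0,
    "other_non_lewd": 0,  # everything else
}


def merge_labels(fine_labels):
    """Map each fine-grained label to its binary target (1 = lewd, 0 = not lewd)."""
    merged = []
    for label in fine_labels:
        if label not in FINE_TO_BINARY:
            raise ValueError(f"unmapped label: {label}")
        merged.append(FINE_TO_BINARY[label])
    return merged


annotated = ["lewd", "legs", "other_non_lewd", "arms", "lewd"]
binary_targets = merge_labels(annotated)

# The fine-grained counts remain available for auditing how well
# edge cases are represented, even though training only sees 0/1.
fine_counts = Counter(annotated)
```

Keeping the fine-grained labels around makes it possible to check how many hard negatives of each kind the dataset holds, while the classifier itself is still trained on a single binary target.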
Exploring the trade-offs between state-of-the-art performance and the ability to serve our user base at scale, we implemented (in its latest iteration) an EfficientNetV2-based binary classifier: a convolutional network with faster training speed and better parameter efficiency overall. It combines a better-designed architecture and scaling with building blocks such as MBConv (which uses 1x1 convolutions to widen the channel space and depth-wise convolutions to reduce the overall number of parameters) and FusedMBConv (which merges some steps of the vanilla MBConv for faster execution). Used together, these blocks balance accuracy against training and inference speed. The model was trained on our GPU-powered data centers in a continuous optimization exercise covering the dataset, the network, and the hyperparameters (the settings that control the training process and affect its speed and results).
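To make the difference between the two block types more concrete, the sketch below counts convolution weights for a simplified MBConv block and a simplified FusedMBConv block. It is a back-of-the-envelope illustration under stated assumptions: biases, normalization layers, and squeeze-and-excitation are ignored, and the channel count, expansion factor, and kernel size are arbitrary example values, not the configuration of the released model.

```python
def mbconv_weights(c_in, c_out, expand=4, k=3):
    """Conv weights in a simplified MBConv block:
    1x1 expansion -> k x k depth-wise conv -> 1x1 projection."""
    hidden = c_in * expand
    expand_1x1 = c_in * hidden    # widens the channel space
    depthwise = k * k * hidden    # one k x k filter per channel
    project_1x1 = hidden * c_out  # back to the output width
    return expand_1x1 + depthwise + project_1x1


def fused_mbconv_weights(c_in, c_out, expand=4, k=3):
    """Conv weights in a simplified FusedMBConv block:
    a single regular k x k conv replaces expansion + depth-wise conv."""
    hidden = c_in * expand
    fused_kxk = k * k * c_in * hidden
    project_1x1 = hidden * c_out
    return fused_kxk + project_1x1


# Example values only (assumption): 32 channels in and out.
mbconv = mbconv_weights(32, 32)
fused = fused_mbconv_weights(32, 32)
```

With these example values the fused block carries more weights than the MBConv block, but it replaces two layers with one dense convolution, which tends to run faster on accelerators. That is the usual motivation for placing fused blocks in the early, low-channel stages of an EfficientNetV2-style network and MBConv blocks in the later ones.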
When analyzing its performance in different conditions (both offline and online), we are proud to say that it delivers world-class performance (>98% accuracy) in both upsampled and production-like settings, with no apparent trade-off between precision and recall.
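Accuracy, precision, and recall can all be derived from the four cells of a confusion matrix, and a short sketch shows why the three are reported together. The counts below are invented for illustration only and are not measurements of Private Detector™. They assume a production-like sample of 100,000 images in which 0.1% are lewd.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Of everything flagged as lewd, how much really was lewd?
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything that was lewd, how much was flagged?
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts: 100 lewd images among 100,000.
# A model that never flags anything still looks accurate...
always_negative = classification_metrics(tp=0, fp=0, fn=100, tn=99_900)

# ...while a model that actually detects lewd images is judged
# on precision and recall as well.
detector = classification_metrics(tp=90, fp=50, fn=10, tn=99_850)
```

On data this imbalanced, the model that flags nothing reaches very high accuracy with zero recall, which is why evaluating on both upsampled (more balanced) and production-like class ratios, and looking at precision and recall alongside accuracy, gives a fuller picture.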
What are we releasing today?
Alongside this white paper, we are releasing on GitHub the source code we used to train the machine learning engine that powers Private Detector™. We are also releasing a ready-to-use SavedModel for deploying this version of the model as is (using TensorFlow Serving), and a checkpoint for fine-tuning it with additional images to improve its performance on samples that matter for specific use cases. In both scenarios, the repository comes with extensive documentation and a comprehensive user guide to make the experience as smooth as possible for scientists, engineers, and product folks around the world.
This version of Private Detector™ is released under the Apache License, making it available for everyone to implement as the standard means of blurring lewd images, either as is or after fine-tuning it with additional training samples. Further improvements to the architecture or to the overall code quality and structure are welcome.
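For readers unfamiliar with TensorFlow Serving, a SavedModel served this way is typically queried over its REST predict endpoint. The sketch below only builds such a request, using the Python standard library, and does not send it. Several details are assumptions rather than facts about the released model: the model name private_detector, the host and port, and the input format (a nested list of pixel values) are all placeholders, and the repository documentation is the authority on the real input signature.

```python
import json
import urllib.request


def build_predict_request(instances, host="localhost", port=8501,
                          model_name="private_detector"):
    """Build (but do not send) a TensorFlow Serving REST predict request.

    model_name, host and port are hypothetical placeholders.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder 2x2 RGB "image" (assumption: the real model expects a
# different size and possibly a different encoding).
dummy_image = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
request = build_predict_request([dummy_image])

# With a TensorFlow Serving instance running, the request could be sent with:
#   with urllib.request.urlopen(request) as response:
#       predictions = json.loads(response.read())["predictions"]
```

Building the request separately from sending it keeps the example runnable without a server and makes the URL and payload easy to inspect.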