The FaceApp Saga: The Elation, Gloom and the Future of Privacy

Dr. Santanu Bhattacharya
Published in DataSeries
7 min read · Aug 18, 2019

Yet another day goes by, and yet another (rather, many) data and privacy breach story shows up. Consider the list of major global breaches maintained by CYWARE.

August 16: Grays Harbor Community Hospital, 20 departments of the Texas Government, Camp Verde School District

August 15: NCH Healthcare, Capital One, Credit Karma, Hy-Vee (customer payment data at fuel pumps)

August 14: British Airways, Choice Hotels, “Biostar 2” Biometric System used by major Police Forces, Border Control and Armies, Indian Army Northern Command

and the list goes on…

We use the recent example of the FaceApp saga to take a deep dive into what happened and into what the future of privacy holds for us.

Unless you have been living on a deserted island for the past month, you have seen amusing, and sometimes baffling, pictures on social media of your friends looking younger, older or of a different gender, all thanks to an app called FaceApp. FaceApp uses artificial intelligence-powered filters to gender-swap or radically age selfies. Somewhat unlike most apps that go viral, this one was a long time coming. Released in January 2017, it went viral on both iOS and Android immediately, then lost momentum within two months. In its second incarnation, it went viral again in July 2019, and this time around, the infamy followed.

First came the concerns: that FaceApp was not only accessing the submitted photos, but also grabbing the entire camera roll and contacts from the user's phone. What did not help was the nationality of its founder, Yaroslav Goncharov, a Russian who had once worked on Windows Mobile for Microsoft and then co-founded a company that was sold to Yandex, Russia's Google, in a $38 million deal.

Given the current political situation in the United States, his nationality gave rise to tons of exaggerated clickbait articles about the story. Russophobes raised concerns about where all the face data was going, leading the powerful Democratic Senator Chuck Schumer to ask for an FBI investigation into the app.

The Beginning of a Saga: One Tweet at a Time

It all started with Twitter user Joshua Nozzi sending off a tweet claiming that FaceApp uploads all the photos on the device without any interaction from the user. Joshua, as he described later in a blog post, "posted this, was angry about it for a bit, then went to bed."

The next day, his story was picked up by 9to5Mac and TechCrunch, and a day later Forbes had its own unique take on the event. Soon, the tweet and related claims went viral and were picked up by every major news outlet around the world.

Nozzi later wrote a blog post explaining that he was mistaken and that his original claim was not correct. The blog post, however, did not receive nearly as much attention as his original tweet.

Figure 1: The tweet that was heard around the world

The Technology and Innovation Behind FaceApp

The technology behind FaceApp is quite straightforward. In artificial intelligence (AI) terminology, a neural network that generates an image, be it of a cat or a human, is called a generator. The generator takes in a noise vector, essentially a set of random numbers, to introduce variety into the faces it produces; without it, all faces generated by the AI would look the same.
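To make this concrete, here is a minimal, illustrative generator in PyTorch. This is not FaceApp's actual code; the latent size and layer widths are arbitrary assumptions:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a random noise vector to a flattened 64x64 RGB image."""
    def __init__(self, latent_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 64 * 64 * 3),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

# Different noise vectors yield different faces.
z = torch.randn(16, 128)      # 16 random noise vectors
fake_faces = Generator()(z)   # 16 distinct generated images
```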

Training a generator to create a new, realistic-looking face requires a large dataset created or curated by humans. When the generator does create an artificial face, a human critiques the result and provides feedback on which parts of the face are not realistic. The generator makes adjustments and eventually produces a realistic face that passes muster with a human.

In 2014, Google researcher Ian Goodfellow employed a new idea: instead of a human critic, an "adversarial" neural network acts as the "art critic" of the generative network's attempts. Such Generative Adversarial Networks (GANs) made the task of producing realistic images much faster and cheaper, and became a boon for the digital advertising industry.
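Schematically, training alternates between the two networks: the critic learns to tell real faces from generated ones, and the generator learns to fool the critic. A hedged sketch of one such step, reusing the illustrative Generator above:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """The 'art critic': scores an image as real (1) or generated (0)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(64 * 64 * 3, 512),
            nn.LeakyReLU(0.2),
            nn.Linear(512, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def training_step(gen, disc, real_images, opt_g, opt_d, latent_dim=128):
    bce = nn.BCELoss()
    n = real_images.size(0)
    real_labels, fake_labels = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Train the critic to separate real faces from generated ones.
    fakes = gen(torch.randn(n, latent_dim)).detach()
    d_loss = bce(disc(real_images), real_labels) + bce(disc(fakes), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator to fool the critic into answering 'real'.
    fakes = gen(torch.randn(n, latent_dim))
    g_loss = bce(disc(fakes), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```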

To produce an age- or gender-defined face, a "conditional" GAN needs categorical inputs, e.g., a specific feature set for elderly or gender-specific faces. A conditional GAN can thus generate realistic faces of a specific age, gender or any other category, including how one might have looked as a Neanderthal.
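One common way to condition a generator, and again this is an illustrative sketch rather than FaceApp's implementation, is to embed the category label and feed it in alongside the noise vector:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Generates a face conditioned on a category label (e.g., an age bracket)."""
    def __init__(self, latent_dim=128, num_classes=4, embed_dim=16):
        super().__init__()
        self.label_embedding = nn.Embedding(num_classes, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 64 * 64 * 3),
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Concatenate the noise with the embedded category before generating.
        cond = torch.cat([z, self.label_embedding(labels)], dim=1)
        return self.net(cond)

# Same noise, different label: the same "person" rendered old vs. young.
gen = ConditionalGenerator()
z = torch.randn(1, 128)
old = gen(z, torch.tensor([0]))    # label 0 = "elderly" (illustrative)
young = gen(z, torch.tensor([1]))  # label 1 = "young"
```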

Figure 2. FaceApp’s identity-preserving, age and gender specific filters

FaceApp's innovation seems to go further. In order to preserve the specific features of a person, it required an identity-preserving conditional GAN. The generative GAN could not start from a random noise vector; instead, it starts from a stripped-down version of a specific input face image. This "non-noisy" vector is then used to regenerate images that look very much like one's older self, or one's other-gender self.
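In other words, an encoder network plausibly replaces the random noise: it compresses the input selfie into a latent vector that preserves identity, which the conditional generator then decodes with the desired attribute. A speculative sketch, reusing the ConditionalGenerator above:

```python
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    """Compresses an input face into a 'non-noisy' identity-preserving vector."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(64 * 64 * 3, 512),
            nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, face: torch.Tensor) -> torch.Tensor:
        return self.net(face)

# Pipeline: your photo -> identity latent -> an "aged" version of *you*.
encoder, gen = FaceEncoder(), ConditionalGenerator()  # from the sketch above
my_face = torch.rand(1, 64 * 64 * 3)        # stand-in for a real selfie
identity = encoder(my_face)                 # stripped-down identity vector
aged_me = gen(identity, torch.tensor([0]))  # decode with the "elderly" label
```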

The Lessons in Privacy

Notwithstanding the general fear-mongering about the Russians, FaceApp did have a vaguely worded privacy policy, which caused the story to blow up even further. But as many people pointed out, many other apps have similar privacy policies, including Instagram, which is owned by Facebook.

Figure 3. “Allow access to your photos”, combined with a poorly worded privacy policy was interpreted by Joshua Nozzi as “IT’S UPLOADING ALL YOUR PHOTOS”

The accusations against FaceApp were easy enough to debunk. People were able to intercept FaceApp's network activity using network-monitoring and packet-sniffing tools and verifiably prove that FaceApp did not upload all user photos without permission.
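For example, routing a phone's traffic through a proxy such as mitmproxy makes every outbound request visible. A minimal addon script (the file name is our own) that logs what an app actually sends:

```python
# log_uploads.py -- run with: mitmproxy -s log_uploads.py
# Point the phone's Wi-Fi proxy at this machine, then watch the app's traffic.
from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    size = len(flow.request.raw_content or b"")
    # A bulk upload of the whole camera roll would appear as a stream of
    # large POST requests; FaceApp sent only the single edited photo.
    print(f"{flow.request.method} {flow.request.pretty_url} ({size:,} bytes)")
```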

However, it raises questions about two things that are prevalent online:

“Free” products: When a service is free, the product is what you provide them with: your personal data and images.

How did users expect the FaceApp creator to pay for all the infrastructure and computation costs? By creating a database of millions of real faces and selling it to other companies for training? By selling advanced photo-editing features? By creating deepfake videos for advertising? The possibilities range from benign to genuinely dark.

Privacy: Tens of millions of people gave away their privacy in exchange for seeing how they might look in 20 years.

In 2019, while enterprises face increasing regulation through GDPR and local data privacy laws, individuals seem oblivious to the protection of their personal data.

In fact, FaceApp was not hiding anything. A section of the terms, since modified, said users grant the company “irrevocable” access to “use, reproduce, modify, adapt, publish … distribute” any name, username or likeness provided. Its viral success showed how little people scrutinize companies’ privacy policies before giving them access to sensitive information.

The Silver Lining

Eventually the story blew up, and that is perhaps the silver lining in all this, even though it started with an incorrect claim made on Twitter, amplified by the fear of Russians owning Americans' face data. At the same time, most people did not seem to realize that they share similar, or even more, information on other apps.

The public outrage and backlash forced Yaroslav Goncharov to change the terms and conditions. The good news is that, when informed, people do seem to care about privacy, and would prefer solutions that are secure by design.

Do something your future self will thank you for

Is this all gloom and doom with no chance for escape or course correction? Machine learning products are relatively easy to build, as this case demonstrated, and once created, easy for their owners to manipulate. This paradigm is hard to change, but not impossible.

Fortunately, the timing seems about right: new technologies supporting privacy-preserving AI are just becoming available. We consider the following possible recipes for a solution.

Powerful AI solutions that do not require data collection or storage

Recently we have witnessed the beginning of a decentralized AI model, called Federated Learning, born at the intersection of on-device AI, blockchain, and edge computing/IoT. In contrast to traditional AI methods, Federated Learning brings the model to the data source or client device for training and inference. The local copy of the model on the device eliminates the network latencies and costs incurred by continuously sharing data with the server. Being local, model responses are hyper-personalized for a particular user. Federated Learning also utilizes computing and storage resources on the user's device, reducing cloud infrastructure overheads even at scale.

Figure 4: A user’s phone personalizes the model locally, based on her usage (A). Many users’ updates are then aggregated (B) to form a consensus change (C) to the shared model. This process is then repeated.

Additionally, Federated Learning techniques are privacy-preserving by design, for two very important reasons. One, the data stays on the user's device, and the AI models are brought to the data source, e.g., the mobile, IoT or other edge device. Two, powerful encryption techniques such as homomorphic encryption are used to ensure that the user's identity remains private.
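A toy simulation of the aggregation loop in Figure 4 (federated averaging, with made-up client data for a simple linear model): each device computes its update locally, and only the updates, never the raw data, reach the server:

```python
import numpy as np

def local_update(weights, local_data, lr=0.01):
    """Runs on the device: one gradient step on private data (linear model)."""
    X, y = local_data
    grad = X.T @ (X @ weights - y) / len(y)  # mean-squared-error gradient
    return weights - lr * grad               # raw data never leaves the phone

def federated_average(updates, num_samples):
    """Runs on the server: weight each client's model by its data size."""
    total = sum(num_samples)
    return sum(w * (n / total) for w, n in zip(updates, num_samples))

# Three simulated phones, each with its own private dataset.
rng = np.random.default_rng(0)
global_w = np.zeros(5)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]

for _ in range(10):  # communication rounds
    updates = [local_update(global_w, data) for data in clients]
    global_w = federated_average(updates, [len(y) for _, y in clients])
```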

Adoption of Differential Privacy

Differential privacy techniques impose constraints on the algorithms used to publish aggregate information about a statistical database, limiting the privacy impact on the individuals whose information it contains. For example, differentially private algorithms are used by some government agencies to publish demographic information while ensuring the confidentiality of survey responses. In general, an algorithm is differentially private if an analyst or observer seeing its output cannot tell whether a particular individual's information was used in the computation.
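As a toy illustration of the idea, the classic Laplace mechanism adds noise calibrated to a query's sensitivity, so the published count looks roughly the same whether or not any one person's record is included (the data and epsilon below are arbitrary):

```python
import numpy as np

def private_count(records, predicate, epsilon=0.5):
    """Publish a count with Laplace noise; the sensitivity of a count is 1."""
    true_count = sum(predicate(r) for r in records)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [34, 29, 41, 52, 38, 45]
# With or without any single individual's record, the answer looks about the
# same, so an observer cannot tell whether that person was in the database.
print(private_count(ages, lambda a: a > 40))
```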

Companies above a certain threshold, say one million downloads, could be required to use the same techniques to collect information about user behavior while controlling what is visible even to internal analysts.

Nine Ways a Cat Lives, Privately!

A complex, multi-faceted privacy issue? A far-fetched fear of a national security threat? A souped-up scenario of domination by foreign AI?

It’s not! As the ecosystem grows, we urge consumers to take a privacy-first approach. Emerging technologies can already enable countries and regulatory authorities to build a secure, robust, world-class privacy framework for their citizens.

The timing is right. An emerging class of technologies, such as Federated Learning and secure computation, is becoming available. A strong partnership among governments, regulatory bodies, industry associations and technology companies would be a good first step towards a constructive, action-oriented dialogue, one that leads us towards a society free from privacy-breaching, data-led domination by a select few entities.


Dr. Santanu Bhattacharya

Chief Technologist at NatWest, Prof/Scholar at IISc & MIT, worked for NASA, Facebook & Airtel, built start-ups, and future settler for Mars & Tatooine