Defeating Deepfakes: Ivan Begunov of Lucky Loki On How We Can Identify Convincingly Real Fake Video, Pictures, and Writing, And How We Can Push Back
Businesses may design and tailor the perfect brand representative and control that representative’s image and online presence for maximum impact and engagement.
Most of us are very impressed with the results produced by generative AI like ChatGPT, DALL-E and Midjourney. Their results are indeed very impressive. But all of us will be struggling with a huge problem in the near future. With the ability for AI to create convincingly real images, video, and text, how will we know what is real and what is fake, what is reality and what is not reality? See this NYT article for a recent example. This is not just a problem for the future; it is already a struggle today. Media organizations are struggling with a problem of fake people, people with AI-generated faces and AI-generated text, applying to do interviews. This problem will only get worse as AI gets more advanced. In this interview series, called “Defeating Deepfakes: How We Can Identify Convincingly Real Fake Video, Pictures, and Writing, And How We Can Push Back,” we are talking to thought leaders, business leaders, journalists, editors, and media publishers about how to identify fake text, fake images and fake video, and what all of us can do to push back against disinformation spread by deepfakes. As a part of this series we had the distinct pleasure of interviewing Ivan Begunov.
Ivan Begunov, founder of the AI startup Lucky Loki (lucky-loki.com), is a tech-trend-watcher and AI enthusiast with an engineering background and professional experience in product development. His passion lies in facilitating the effective implementation of positive innovations in everyday life. With over three years of research on generative AI algorithms, he is well-versed in the technical aspects of the field.
Thank you so much for joining us. Before we dive in, our readers would love to ‘get to know you’ a bit better. Can you share with us the “backstory” about how you got started in your career?
As a serial tech entrepreneur, I have been dealing with technology for most of my life. I have dedicated more than three years to the study and exploration of generative AI algorithms. I was a co-organizer of the Synthetics.media foundation, where we researched AI-powered solutions for hassle-free digital content creation and modification. In 2019 these technologies seemed quite promising. Looks like we were right.
Can you share the most interesting story that occurred to you in the course of your career?
One of our experiments involved tests with synthetic accounts, inspired by Lil Miquela (https://www.instagram.com/lilmiquela). These are social media accounts of personalities that don’t exist in the real world. We created several synthetic accounts on Instagram and Tinder. The goal was to create synthetic personalities and test how they can interact with real people. Actually, we could only admire, and slightly envy, the popularity of the synthetic accounts on Tinder.
It has been said that our mistakes can be our greatest teachers. Can you share a story about the funniest mistake you made when you were first starting? Can you tell us what lesson you learned from that?
One of our projects was based on GPT-2, long before the ChatGPT hype. Our technical team managed to train the language model to write texts on behalf of different characters, for example Rick from “Rick and Morty”. You could talk to the character through a messenger bot, which was quite fun. But it was totally useless commercially, even though a couple of corporate clients showed interest in using it as a concierge bot. The generated texts were sometimes offensive. Our first mistake was that we didn’t pay much attention to the limits of applying the technology to everyday tasks and business routines.
We recognized that even if we could successfully manage a product with retrained GPT-2, it was at risk of becoming quickly outdated and irrelevant due to the release of newer, more advanced models like GPT-3 and ChatGPT.
This realization highlighted the importance of staying current with the latest technology trends and continuously re-evaluating our projects to ensure they remain relevant and useful.
What are some of the most interesting or exciting projects you are working on now?
Our team has developed a sophisticated AI-powered architecture that provides realistic, automatic face swap in photo and video content. It helps content creators and brands protect their intellectual property and privacy and, if necessary, reduce content production costs by using pre-filmed footage.
Our face swap technology has enabled several digital media production studios to significantly reduce their filming budget by more than 75%. With our technology, filming a scene is no longer necessary as it allows for the seamless substitution of an actor’s face with that of another. By simply using stock videos, you can save a significant amount of money while achieving the same desired result.
For the benefit of our readers, can you share why you are an authority about the topic of Deepfakes?
My startup focuses on developing a high-quality face-swap service for photo and video content, which is often called “Deepfakes”. However, “DeepFakes” is an inappropriate term for our products, and we’ll discuss that a bit later.
Ok, thank you for that. Let’s now shift to the main parts of our interview. Let’s start with a basic set of definitions so that we are all on the same page. Can you help define what a “Deepfake” is? How is it different than a parody or satire?
First of all, “DeepFake” is a technical term. It describes a certain GAN (Generative Adversarial Network) architecture that uses deep learning to perform face swaps in photo and video content.
By the way, the DeepFakes architecture has some serious limitations (it needs retraining for every new identity to be swapped), and we don’t use it in our project.
Limitations of DeepFakes:
The DeepFakes architecture contains two parts: a common Encoder 𝐸𝑛𝑐 and two identity-specific Decoders, 𝐷𝑒𝑐𝑆 and 𝐷𝑒𝑐𝑇. In the training stage, the 𝐸𝑛𝑐-𝐷𝑒𝑐𝑆 architecture takes in warped source images and restores them to the original unwarped source images. The same procedure is conducted with the target images using the 𝐸𝑛𝑐-𝐷𝑒𝑐𝑇 architecture. In the test stage, a target image is sent to the 𝐸𝑛𝑐-𝐷𝑒𝑐𝑆 architecture. During this process, the Encoder 𝐸𝑛𝑐 extracts the target’s features, which contain both identity and attribute information of the target face. Since the Decoder 𝐷𝑒𝑐𝑆 manages to convert the target’s features to an image with the source’s identity, the identity information of the source face must have been integrated into the weights of 𝐷𝑒𝑐𝑆.
So the Decoder in DeepFakes can only be applied to one specific identity.
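The shared-encoder, two-decoder layout described above can be sketched in a few lines of NumPy. This is only a toy illustration under loud assumptions: single linear layers stand in for deep convolutional networks, random vectors stand in for face crops, additive noise stands in for image warping, and every name and size (`IMG_DIM`, `CODE_DIM`, `train_step`, `warp`) is hypothetical. It is not Lucky Loki's code or any published DeepFakes implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, CODE_DIM = 64, 16  # hypothetical sizes: flattened 8x8 crop, 16-d code


def init(n_in, n_out):
    """Weights of one dense layer; a stand-in for a deep convolutional stack."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out))


def reconstruct(enc, dec, x):
    """Dec(Enc(x)) for a batch of flattened images."""
    return (x @ enc) @ dec


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


def train_step(enc, dec, warped, original, lr=0.1):
    """One gradient step on the reconstruction loss; updates enc and dec in place."""
    code = warped @ enc
    err = code @ dec - original
    scale = 2.0 / err.size
    grad_dec = scale * code.T @ err
    grad_enc = scale * warped.T @ (err @ dec.T)
    dec -= lr * grad_dec
    enc -= lr * grad_enc


def warp(x):
    """Toy 'warp': additive noise in place of a geometric distortion."""
    return x + rng.normal(0.0, 0.05, size=x.shape)


# Toy stand-ins for the two face datasets (random vectors, not real images).
faces_s = rng.uniform(0.0, 1.0, size=(32, IMG_DIM))  # source identity S
faces_t = rng.uniform(0.0, 1.0, size=(32, IMG_DIM))  # target identity T

enc = init(IMG_DIM, CODE_DIM)    # shared encoder Enc
dec_s = init(CODE_DIM, IMG_DIM)  # decoder tied to identity S
dec_t = init(CODE_DIM, IMG_DIM)  # decoder tied to identity T

loss_before = mse(reconstruct(enc, dec_s, faces_s), faces_s)
for _ in range(200):
    train_step(enc, dec_s, warp(faces_s), faces_s)  # Enc-Dec_S on source faces
    train_step(enc, dec_t, warp(faces_t), faces_t)  # Enc-Dec_T on target faces
loss_after = mse(reconstruct(enc, dec_s, faces_s), faces_s)

# Test stage: a target face goes through Enc, then through the *source* decoder.
swapped = reconstruct(enc, dec_s, faces_t[:1])
```

In this layout, supporting a third identity means creating and training a third decoder, which is the retraining cost mentioned above.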
In casual usage, “Deepfake” often refers to video content that appears to be real but is actually AI-modified. It involves using deep learning algorithms to create highly realistic and convincing media that can be difficult to detect as fake. Deepfakes can be created for a variety of purposes.
On the other hand, parody and satire are forms of creative expression that use humor and exaggeration to make a point. They are intended to be obvious, unlike deepfakes. Parodies and satires are intended to be clearly identifiable as fictional or exaggerated representations of reality.
Can you help articulate to our readers why Deepfakes should be a serious concern right now, and why we should take measures to identify them?
First of all, it is important to differentiate between casual face swap in digital content and “Deepfakes”. Face swap may bring value to businesses and individuals in various ways:
- Businesses may design and tailor the perfect brand representative and control that representative’s image and online presence for maximum impact and engagement.
- Make content appear native in various regions without the need for refilming.
- Reduce content production costs by utilizing pre-filmed footage and swapping in the necessary identity or shoot content in far-off locations using a stand-in actor.
Influencers and private users may:
- Protect their identity and privacy, experience safety and freedom of expression.
- Take social media videos to the next level by incorporating unique role-played characters.
- Increase likes and followers by creating fun memes and featuring notable appearances.
As you can see, face swap technology can be used to create a new “synthetic” identity without necessarily copying someone else’s identity. In my opinion, creating new identities to enhance creativity and safeguard privacy carries great potential and value.
For instance, if you attempt to swap your face with that of Elon Musk, the resulting image is likely to depict someone who bears a resemblance to both you and Elon (assuming your facial structure and hairstyle do not align perfectly with his). It may be fine for a parody, but not for a fake video.
However, DeepFake content (generated with retrained models) that copies someone’s identity may be used for malicious purposes such as:
1. Misinformation: Deepfakes have the potential to spread misinformation and create confusion, making it difficult to distinguish between real and fake content.
2. Harm to individuals: Deepfakes can also be used to harm individuals by spreading false or defamatory content. For example, a Deepfake video could be created to damage a person’s reputation with fabricated footage.
3. Security risks: Deepfakes can also pose a security risk by enabling fraud and impersonation.
Why would a person go to such lengths to create a deepfake? How exactly can malicious actors benefit from making them?
Individuals who stand to profit from misinformation or fraud may utilize the deceptive practices of DeepFake technology for their own gain.
Can you please share with our readers a few ways to identify fake images?
If we talk about human photos, creating a fully realistic DeepFake is quite a challenging process. You can often notice imperfections: visible borderlines or skin color discrepancies between the substituted face and the rest of the image. Notably, hair hanging down the face can be particularly challenging to replicate with accuracy, as can teeth, when the person in the image is talking or smiling.
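The skin-color discrepancy mentioned above can be turned into a rough number. The sketch below is a hypothetical heuristic, not a deepfake detector: it assumes you already have a face bounding box from some face detector, that the image is a float RGB array, and that `seam_color_gap`, the `band` width, and the box format `(top, left, bottom, right)` are names and conventions invented for this example. A large gap only suggests where to look more closely, since hair, shadows, and background can also cause one.

```python
import numpy as np


def seam_color_gap(image, box, band=4):
    """Mean absolute RGB difference between a face box and the band around it.

    image: float array of shape (height, width, 3)
    box:   (top, left, bottom, right) in pixels, bottom/right exclusive
    band:  width in pixels of the surrounding ring to compare against
    """
    top, left, bottom, right = box
    height, width = image.shape[:2]
    # Grow the box by `band` pixels, clamped to the image borders.
    t, l = max(top - band, 0), max(left - band, 0)
    b, r = min(bottom + band, height), min(right + band, width)

    face = image[top:bottom, left:right].reshape(-1, 3)
    padded = image[t:b, l:r].reshape(-1, 3)
    ring_pixels = len(padded) - len(face)
    if len(face) == 0 or ring_pixels <= 0:
        raise ValueError("box must be non-empty and leave room for a ring")

    face_mean = face.mean(axis=0)
    # Ring = padded box minus the face box, computed from the two sums.
    ring_mean = (padded.sum(axis=0) - face.sum(axis=0)) / ring_pixels
    return float(np.abs(face_mean - ring_mean).mean())
```

For example, a uniformly colored image scores close to zero, while brightening only the boxed region by 0.1 gives a gap of about 0.1.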
Similarly, can you please share with our readers a few ways to identify fake audio?
There are currently plenty of solutions for voice generation and cloning. It’s becoming easy to copy the tone of a voice. However, a much more complex task is reproducing the emotional coloring of speech, sighs, and so on, although there are editors for intonation markup that address this.
If the speech is too monotonous and has misplaced word stress, it is a hint that the voice was generated.
If you find yourself in a situation where someone calls you on the phone and you suspect that you are talking to a bot using a pre-recorded generated voice, try to steer the conversation in an unexpected direction.
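The "too monotonous" hint above can be approximated with a simple statistic. The sketch below assumes you already have a pitch contour (fundamental frequency per frame, in Hz) from some pitch tracker, with unvoiced frames given as `0` or `None`. The function names and the `0.05` threshold are hypothetical choices for illustration, not calibrated values, and a flat contour is only a hint, never proof.

```python
from statistics import mean, pstdev


def pitch_variation(f0_hz):
    """Coefficient of variation (std / mean) of the voiced pitch values."""
    voiced = [f for f in f0_hz if f]  # drop unvoiced frames (0 or None)
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return pstdev(voiced) / mean(voiced)


def looks_monotonous(f0_hz, threshold=0.05):
    """True when pitch barely moves; the threshold is an assumed value."""
    return pitch_variation(f0_hz) < threshold


flat = [120, 121, 119, 120, 0, 120]     # nearly constant pitch
expressive = [110, 150, 95, 180, 130]   # pitch rises and falls
```

Here `looks_monotonous(flat)` is true and `looks_monotonous(expressive)` is false.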
Next, can you please share with our readers a few ways to identify fake text?
I would first question the definition of “fake text”. Any misleading information, whether from a human or a machine, may be considered fake text. So the only way to deal with it is critical thinking and fact-checking.
If we are talking about text generated by language models, it may be quite truthful and correct.
Early-stage language models often generated texts that seemed human-like but made no sense. Recent language models generate brilliant texts. So if the text looks too good, maybe it has been AI-generated. BUT that doesn’t necessarily mean that it contains false information.
Finally, can you please share with our readers a few ways to identify fake video?
The biggest problem for algorithms is replacing a face in a video when there are strong head rotations from a portrait view, face occlusions with glasses, phones or other objects, sudden turns, and movements. In such moments, you may notice twitching or complete disappearance of the applied mask.
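The twitching or disappearing mask described above shows up as sudden jumps or gaps in a per-frame face track. The sketch below assumes you already have one face-center position per frame from some face tracker, with `None` where tracking was lost. The function name and the `max_jump` value are hypothetical. It only points to frames worth inspecting by eye, since genuine fast head movement produces jumps too.

```python
from math import hypot


def flag_unstable_frames(centers, max_jump=15.0):
    """Return indices of frames where the face is lost or its center jumps.

    centers:  list of (x, y) pixel positions, or None when tracking failed
    max_jump: largest frame-to-frame movement (pixels) treated as normal
    """
    flagged = []
    previous = None
    for index, center in enumerate(centers):
        if center is None:
            flagged.append(index)  # tracking or mask lost in this frame
            continue               # keep the last known position
        if previous is not None:
            distance = hypot(center[0] - previous[0], center[1] - previous[1])
            if distance > max_jump:
                flagged.append(index)  # sudden jump relative to last position
        previous = center
    return flagged


track = [(100, 100), (102, 101), (140, 100), (141, 101), None, (142, 100)]
suspicious = flag_unstable_frames(track)
```

With this track, `suspicious` is `[2, 4]`: frame 2 jumps about 38 pixels and frame 4 has no face at all.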
How can the public neutralize the threat posed by deepfakes? Is there anything we can do to push back?
I believe that generative AI algorithms are a promising and powerful tool that will improve many content creation processes and enable people to save budgets and unleash their creativity. And we shouldn’t ban or restrict the development of these technologies.
But, like any powerful technology, it can be used on the dark side.
I think one solution to the problem could be a universally recognized standard of labels for AI-generated or AI-modified content. They would promote the positive use of technology and help address the limitations that arise from the misuse of technology on social media and elsewhere.
For example, there is a technology known as a spectral watermark for visual content. A watermark that is invisible to the human eye is embedded in the image (or video frame). It is resistant to most common image edits: cropping, flipping, changing the palette, etc. The label can be used to mark the content as AI-generated, as well as to refer to its origin.
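To make the idea of a watermark hidden in the frequency spectrum more concrete, here is a minimal sketch of one generic approach: a key-seeded pattern of +1 and -1 slightly scales the magnitudes of a block of mid-frequency Fourier coefficients, and detection correlates the magnitudes with the same pattern. This is an assumption-heavy toy, not the specific technology referred to above. The band location, `strength`, `threshold`, and all function names are hypothetical, the image is a 128x128 grayscale float array, and the sketch does not demonstrate robustness to cropping, flipping, or recoloring.

```python
import numpy as np

# Hypothetical mid-frequency block of the half-spectrum returned by rfft2.
# Columns 8..39 avoid the DC and Nyquist columns, so edits stay consistent.
BAND = (slice(8, 40), slice(8, 40))


def _pattern(key):
    """Key-seeded +/-1 pattern, one entry per coefficient in BAND."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=(32, 32))


def embed(image, key, strength=0.3):
    """Scale the band's magnitudes up or down according to the key pattern."""
    spectrum = np.fft.rfft2(image)
    spectrum[BAND] *= 1.0 + strength * _pattern(key)
    return np.fft.irfft2(spectrum, s=image.shape)


def score(image, key):
    """Normalized correlation between band magnitudes and the key pattern."""
    magnitudes = np.abs(np.fft.rfft2(image)[BAND])
    return float((magnitudes * _pattern(key)).sum() / magnitudes.sum())


def is_marked(image, key, threshold=0.15):
    """True when the correlation exceeds an assumed detection threshold."""
    return score(image, key) > threshold


rng = np.random.default_rng(1)
original = rng.uniform(0.0, 1.0, size=(128, 128))  # toy grayscale image
marked = embed(original, key=42)
```

A marked image scores close to `strength` under the right key, while an unmarked image scores close to zero. A real system would also need to handle pixel clipping, compression, and geometric edits, which this sketch leaves out.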
This is the signature question we ask in most of our interviews. Can you share your “5 Things I Wish Someone Told Me When I First Started” and why? Please share a story or an example for each.
1. Don’t try to make the PERFECT product; make the one that is most in demand.
2. Think about both aspects: how to gain revenue today and capitalization tomorrow.
3. One of the key indicators of a startup’s success is how many hypotheses you can test before running out of money. Try to increase that number.
4. If you are considering training your own AI model, always ask yourself: is it likely that one of the Big Tech companies is doing the same?
5. Always buy Bitcoin when it’s low ☺
You are a person of enormous influence. If you could start a movement that would bring the most amount of good to the greatest amount of people, what would that be? You never know what your idea can trigger. :-)
As I have mentioned before, I would suggest starting a movement to develop a widely recognized standard of tags for AI-generated or AI-modified content. Actually, I would love to discuss some technical solutions and ideas with market players.
How can our readers further follow your work online?
Thank you so much for the time you spent on this. We greatly appreciate it and wish you continued success!