Telling Apart AI and Humans: #2 Photo VS GAN Generated Image
If you missed the 1st installment of this series, Humans vs Androids is here.
Prompted by advances in Generative Adversarial Networks (GAN), a year ago I tweeted a thread about telling apart pictures taken with a camera from generated pictures.
In the years since the 1st GAN paper, there have been many advances leading to better and better “false” images.
A few of my tips are still relevant. Who can say for how long? Get’em while they’re hot!
Not quite human humans.
Our brains are especially good at recognizing human faces, and this is the category where you have the best chance of spotting a GAN. Remember that the next time stressful music plays and you have to choose which one is your best friend and which is the evil double.
Look for perspective issues: eyes looking in different directions, eyes facing forward but face turned to the side, or in this case, Picasso symmetry: we shouldn’t be seeing that man’s left ear…
Symmetry issues are sometimes easier to spot on women with makeup. Only one eye with eyeshadow, eyebrows with different shapes… Granted, the women below have more pressing issues, but they also have seriously original makeup — one panda eye, one missing eyebrow, differently colored eyelids ↓
Even the latest GANs still have a hard time dealing with these issues.
Older networks tended to have strong distortions at the 4 corners of the frame, leading to strangely fuzzy worlds. This is not an issue for newer networks, but fuzziness shows up in a different way: GANs’ most characteristic feature, their ability to morph smoothly from one image to another, is also their weakness.
Pause anywhere in the above gif. Can you tell where the faces and hair start, and where they end? Often they just blend into another person’s hair or face. Follow the object’s contour and check that everything is as sharp as it should be, and that parts aren’t blending into others.
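For the curious, this kind of smooth morph is usually produced by sliding between two points in the generator's input (latent) space and rendering an image at each step. Here is a minimal sketch of that idea, assuming plain linear interpolation; the function name `lerp_latents` and the tiny two-number vectors are made up for illustration, and the trained generator that would turn each vector into a frame is left out entirely.

```python
def lerp_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced from z_start to z_end (inclusive).

    Hypothetical helper: a real GAN would feed each vector to its generator
    to render one frame of the morphing gif.
    """
    if steps < 2:
        raise ValueError("need at least 2 steps to include both endpoints")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same length")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 (start) to 1.0 (end)
        frames.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return frames


# Toy example with 2-dimensional "latents"; real ones have many more dimensions.
frames = lerp_latents([0.0, 2.0], [1.0, 4.0], steps=3)
```

Every in-between vector is a blend of the two endpoints, which is probably why the in-between faces and hair blend too.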
Sometimes it’s… something different.
Janelle Shane nailed it: GANs can’t write. They get the concept of text being on labels and numbers on clock faces, but there’s something not quite right. The images below are from the same paper as above.
GANs can’t count. Up to two it’s mostly alright: two eyes, two hands… above that, all bets are off!
This thread is wonderful and I invite you to read every single tweet.
High resolution images.
This one stems from a very basic fact: GAN algorithms are a pain to run on a computer. They require specific hardware and take days to train on a dataset. They are naturally gluttonous in computing time and resources, and are therefore usually trained to generate small, low-resolution images. It is also simply easier to fool the human eye if the image is small and blurry. Common sizes go from 64x64 up to 256x256 pixels if you feel frisky; sometimes a bit more. So a large, sharp, high-resolution image is more likely to have come from a camera.
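If you want to turn this tip into a quick check, here is a rough sketch. It assumes the 64 to 256 pixel range quoted above and assumes the image is square; the function name and thresholds are mine, and a match is only a weak hint, since plenty of real photos are small thumbnails too.

```python
def is_typical_gan_size(width, height, min_side=64, max_side=256):
    """Return True when the image is square and falls in the small size
    range mentioned in this post (64x64 to 256x256 by default).

    Assumption: the thresholds come from the sizes quoted above, not from
    any measurement. Treat a True result as "look closer", not as proof.
    """
    return width == height and min_side <= width <= max_side


print(is_typical_gan_size(128, 128))    # small and square: worth a closer look
print(is_typical_gan_size(1920, 1080))  # large photo-like frame: less suspicious
```

The first call prints True and the second prints False.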
I hope you feel better about spotting fakes: for the most part, Photoshop is still much more worrisome than GANs. But GANs are more fun! This spider agrees.