These Bored Apes Do Not Exist
TL;DR: I complain about NFTs, then attempt to train both a GAN and a super-resolution model to generate Bored Apes that do not exist. You can check out all the generated images on thisboredapedoesnotexist.nathancooperjones.com.
Friday, November 12th, 2021 started out as a normal day for me. Before starting my day at work, I decided to open Twitter to see if I missed anything since the night before.
Then, it happened. I saw this Tweet:
Say what you will about how Jimmy Fallon tells and reacts to jokes, but if this was a joke, I did not understand it. An animated monkey dressed in a sailor cap? I soon learned that this wasn’t just any animated ape, but one of exactly 10,000 unique images produced in a collection called Bored Ape Yacht Club. After ten minutes down a Twitter rabbit hole, I thought nothing of it and continued on with my day.
But by this point, it was too late. Twitter, too, had gone down the NFT rabbit hole with me. To my dismay, throughout the next few weeks, my entire feed was plastered with suggested tweets about NFTs.
In case you are lucky enough to have missed it, NFT stands for non-fungible token. The simplest way to understand an NFT is like this: an artist creates a piece of digital art (the NFT), then sells the ownership of the art to a buyer, who pays using cryptocurrency. In an effort to turn a profit, the buyer may then hold or trade the NFT, like a commodity.
If this is unclear, think of this like the concept of purchasing a star from one of those sites that generates a printable certificate attesting that you are now the owner of that star. You might technically own it (I suppose), but I can still look at it, NASA can still visit it, and an alien can decide to blow it up at any time — and you have no jurisdiction over any of that. NFTs are like that, but even more lucrative and incredibly damaging to the environment (see here and here for more info on that).
And of all the NFTs out there today, a collection of 10,000 Bored Apes stands out as royalty.
Apparently, Jimmy Fallon, Post Malone, Steph Curry, Mark Cuban, and Logan Paul all spent hundreds of thousands of dollars purchasing Bored Apes. Someone purchases a Bored Ape for $2.7 million. Not to be outdone, someone else purchases a Bored Ape for $3.4 million. Suddenly, the Bored Ape ecosystem is valued at over one billion dollars. What is going on?!
Just a few weeks after my initial descent into the Twitter NFT rabbit hole, I see the Tweet that pushes me over the edge: someone suggests creating a GAN model to learn to reproduce the apes, generate a ton of these fake apes, release them for free, and flood the market — a virtual middle finger to the NFT industry. In fact, someone does create a GAN to generate the apes…and then ends up selling these images as NFTs…
The description of this collection, called GAN APES on OpenSea, reads:
A GAN, or a Generative Adversarial Network, is a type of machine learning framework that “learns” in an unsupervised manner. This collection was created by training our GAN for over 300 hours based on 9940 HQ Original Bored Ape Yacht Club (BAYC) images as well as our custom PKL file.
The PKL file is specially trained on psychedelic / abstract images. The PKL file used to create this collection has been destroyed to ensure that the collection can never be restored and will remain unique forever. Everything is stored on IPFS — you own it forever.
At this point, I had had enough — I was too invested in this not to do anything. Someone needed to do something. Today, I hope to be that someone.
Generating the Apes
I think the GAN APES NFT collection had the right idea in a lot of respects, but missed a couple of things that may have affected the final results. It’s hard to say for sure (since the PKL file has been destroyed), but my hunch is that the GAN collapsed at some point and could only produce images that resembled apes, but without any solid features.
My ‘data science spidey-sense’ also tingles a bit with a couple of additional observations:
- All Bored Apes face the right, but some of the generated GAN apes face the left, implying the author used some sort of image augmentation during training that included a horizontal flip.
- 300 hours is usually not enough time to fully train a standard GAN, but it can be enough to see more solid features in the images. However, these images don’t show that. This implies either a hyperparameter or architecture change could be necessary.
- The author only uses 9,940 apes, intentionally excluding 60 from the dataset. Why was this done?
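To illustrate the first observation, here is a minimal sketch of a random horizontal flip, a common image augmentation. I have no idea what the GAN APES author actually used, so the function name `random_horizontal_flip` and the default probability are purely my own assumptions. The point is only that, with a flip like this in the training pipeline, roughly half of the right-facing apes the model sees would be facing left.

```python
import random

from PIL import Image, ImageOps


def random_horizontal_flip(img, p=0.5, rng=random):
    """Mirror the image left-to-right with probability p (hypothetical helper)."""
    # ImageOps.mirror flips horizontally, turning a right-facing ape into a left-facing one.
    return ImageOps.mirror(img) if rng.random() < p else img
```

Leaving this augmentation out (or setting `p=0.0`) keeps every training image facing the same way as the originals.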
One advantage for my project is that we can use this GAN as a great baseline, and see if we can’t improve on these results. With this, I set out on a quest to see if I could generate 100,000 Bored Apes (10x more than the number of images currently in the collection) that didn’t exist in the original, yet matched this Bored Ape Yacht Club style as closely as possible.
Scraping the Data
We can’t get a model without first getting the data, and in this case, that meant somehow collecting all 10,000 high-quality images of the Bored Ape Yacht Club collection. My first instinct was to write a complicated Python script to scrape all the images in the OpenSea collection, but, luckily for me, it turns out someone already did! In fact, they made a whole GitHub repo with the images. I cloned it locally, went into the images directory, wrote a quick script to convert the PNGs to JPEGs (a format that’s easier to use for training models), and… that’s it. Within seconds, I had 10,000 Bored Apes sitting on my desktop, waiting to be used. Nice!
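For the curious, a conversion script along these lines can be sketched with Pillow. This is not the exact script I ran: the function name `convert_pngs_to_jpegs`, the directory arguments, the white background, and the `quality` setting are all assumptions for illustration. Since JPEG has no alpha channel, the sketch flattens any transparency onto a solid background first.

```python
from pathlib import Path

from PIL import Image


def convert_pngs_to_jpegs(src_dir, dst_dir, quality=95):
    """Convert every PNG in src_dir to a JPEG in dst_dir and return the new paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for png_path in sorted(src_dir.glob("*.png")):
        with Image.open(png_path) as img:
            rgba = img.convert("RGBA")
            # JPEG cannot store transparency, so paste onto a white background (assumed color).
            flattened = Image.new("RGB", rgba.size, (255, 255, 255))
            flattened.paste(rgba.convert("RGB"), mask=rgba.getchannel("A"))
        jpg_path = dst_dir / (png_path.stem + ".jpg")
        flattened.save(jpg_path, "JPEG", quality=quality)
        written.append(jpg_path)
    return written
```

Calling `convert_pngs_to_jpegs("images", "images_jpg")` would process a whole folder; both folder names here are placeholders.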
I think the GAN APES project had it right by using a GAN (generative adversarial network) here, but for my project, I initially wanted to use an evolution of the GAN made by NVIDIA called StyleGAN, whose architecture and results are so impressive, it’s one of those “even data scientists think this is magic” type of models. If you have never heard of this model before, check out thispersondoesnotexist.com — a collection of faces generated from a StyleGAN model where every face is from a person that does not exist.
I promise not to dwell on the technical details too much here, but, in short, a GAN (and StyleGAN, for that matter) consists of two major components: a generator that tries to make new images that look like the images you show it, and a discriminator that tries to differentiate between the real images and the images made by the generator (yes, I am super duper oversimplifying this). Initially, both the generator and discriminator start out with no idea what they are doing, but eventually, they begin to learn that they are in direct competition with each other, as these are opposing tasks. The beauty of the GAN is that as the generator gets better, the discriminator has to get better in order to compete, and vice versa. Both components want to win in their own respective game. This competition is what makes these results so incredible.
For nearly ten days, I made countless attempts at training both a StyleGAN3 and StyleGAN2 model on the Bored Apes dataset, but despite my best efforts, my model completely collapsed before getting any decent results.
“Okay,” I thought, “surely I just need to soup up some of these hyperparameters!” For the next week, I tried six more variations of the model: some generating smaller images, some larger; some with a massive batch size, some with a smaller one; some with lots of attention, some with none. I tried them all, even getting my partner in on the drama while I was out of town, which she did not appreciate (but nonetheless complied with).
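To make that competition a bit more concrete, here is a tiny sketch of the two opposing objectives in their textbook form. This is the classic GAN formulation with a non-saturating generator loss, not the exact objective used by StyleGAN or any repo mentioned in this post, and the function names are my own. Here `d_real` and `d_fake` stand for the discriminator's predicted probability (strictly between 0 and 1) that a real or generated image is real.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: low when real images score near 1 and fakes score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator scores fakes near 1."""
    return -math.log(d_fake)
```

Notice that the same quantity, `d_fake`, pushes the two losses in opposite directions: whatever makes the generator happier makes the discriminator worse off, and vice versa.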
After days of attempts, I had just about admitted defeat to the NFTs. It seemed I had overestimated how simple it would be for a model to generate these Bored Apes.
But then, I had an epiphany — a StyleGAN (and even a GAN for that matter) is an incredibly complicated architecture, one meant to generate complex images resembling a complex subject matter, such as the human face. But the Bored Apes were simple — they were all the same size and shape, all faced the right, all wore some top-piece of clothing, had some variant of hair or hats, etc. Using a vanilla GAN or StyleGAN might actually be overkill for such a simple, invariable task like this.
I needed to reconsider my architecture to use something simpler, something lighter… something like a Lightweight GAN!
Keeping It Simple
To the rescue came Phil Wang. Phil recently created a repo called “Lightweight GAN” based on a 2020 paper proposing a simpler version of a GAN by Liu et al., able to quickly and accurately converge on a variety of simpler image tasks with very few images in the training dataset. By introducing skip-connections into the generator (similar to what exists in a ResNet) and treating the discriminator as a self-supervised auto-encoder, the training is not only incredibly quick, but surprisingly stable. I know I didn’t do this paper justice with that single-sentence description, so I highly recommend reading the full paper! It’s a great read.
I cloned the repo, kicked off a training run (the command I used for that can be found here), and within hours, was seeing Bored Apes that did not exist. I couldn’t believe my eyes!
I let my model train for about 120 hours before the discriminator finally seemed like it had won. And, just like that, I had me some fake apes!
At this point, I felt like I had achieved my goal, but my apes were 1) a different size than the images in the Bored Ape Yacht Club collection (512 px vs 631 px), and 2) a different image format (mine were JPEGs, while the originals were PNGs with transparent rounded corners).
A normal person would likely say to themselves, “Well, I did what I set out to do. No one is going to care about an extra 59.5 pixels on each side. Time to go take my free time and live my life again!”
But as many therapists have told me in my life so far, I am not a “normal person.” I, instead, decided to spend the next few days over-engineering a solution.
Painstakingly Matching the Bored Apes Style
At this point, I’m sure there is a flock (or shrewdness, if you will) of data scientists upset that I am going down this road, when a simpler solution exists. Why not instead just train up a Lightweight GAN to produce 1024x1024 px images, then just easily scale them down to 631x631 px? To this, I have two responses:
- Generally, doing so would take about 4x longer on my single GPU. Waiting 480 hours to have any results feels like torture.
- (More importantly) GANs aren’t perfect models, and sometimes leave small noise artifacts and imperfections in images. There’s a way that we can train a model not only to upscale an image, but also to remove the noise artifacts in a single go, which makes the final images from our GAN a lot cleaner and more realistic.
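The back-of-the-envelope math behind that first bullet (and the 59.5 px figure from earlier) looks like this. The big assumption here, and it is only an assumption, is that training time scales linearly with the number of pixels per image.

```python
def pixel_ratio(target_px, base_px):
    """Ratio of pixel counts between two square image sizes."""
    return (target_px ** 2) / (base_px ** 2)


# Doubling the side length from 512 px to 1024 px quadruples the pixel count.
ratio = pixel_ratio(1024, 512)

# Assumes training time grows linearly with pixels, starting from my 120-hour run.
estimated_hours = 120 * ratio

# The size gap between my 512 px apes and the 631 px originals, split across two sides.
padding_per_side = (631 - 512) / 2
```

Real training time rarely scales this cleanly, so treat the 480-hour estimate as a rough guess rather than a measurement.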
After doing some research, it seemed like the answer was to duct-tape two completely separate models into one by also training up a super-resolution model. Ideally, this model can enlarge (and clean up, if need be) an image without making it blurrier or more pixelated. You can see the difference in the image below: the first is a 256 px ape manually upscaled to 512 px, and the second is a 256 px ape upscaled to 512 px using super-resolution.
I settled on a repo called Waifu2x written by user yu45020, who left a nice base training script to start with, as well as some pre-trained weights. The repo contains a whole bunch of models to do super-resolution, so I (completely randomly and arbitrarily) picked a Cascading Residual Network (paper here). To oversimplify another model architecture (three in one blog post must be a world record of some kind), this architecture mimics a ResNet, but with residual blocks replaced with cascading ones that allow for both local and global connections. The result allows us to upscale an image in an efficient manner without losing quality.
I spent a couple of hours modifying the training script and data loading methods to add more noise to the training images (in the hopes the model would learn how to get rid of these smaller noise artifacts in the final image), and within a couple of hours, I had trained up a custom Bored Ape super-resolution model (that training and data augmentation code can be found here)! Best of all, the changes I made to the augmentation to remove noise seemed to be working as expected!
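The general idea of that augmentation can be sketched as follows. This is not the code from my training script: the function name `make_training_pair`, the Gaussian noise, the `sigma` value, and the bicubic downscale are all stand-in assumptions. The model is shown a small, noisy image as input and the clean, full-size image as the target, so it has to learn to upscale and denoise at the same time.

```python
from PIL import Image, ImageChops


def make_training_pair(high_res, scale=2, sigma=10.0):
    """Return (noisy low-res input, clean high-res target) for a super-resolution model."""
    target = high_res.convert("RGB")
    low_size = (target.width // scale, target.height // scale)
    low_res = target.resize(low_size, Image.BICUBIC)

    # Independent Gaussian noise for each channel; effect_noise is centered on 128.
    noise = Image.merge(
        "RGB", [Image.effect_noise(low_size, sigma) for _ in range(3)]
    )
    # offset=-128 re-centers the noise on zero; the result is clipped to 0..255.
    noisy_low_res = ImageChops.add(low_res, noise, scale=1.0, offset=-128)
    return noisy_low_res, target
```

A larger `sigma` makes the denoising task harder, so it is worth tuning against the kind of artifacts the GAN actually produces.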
Finally, I wrote a small script to extract the alpha channel from an original Bored Ape image, copy it to a generated ape image, and save it in PNG format. Thanks to the efforts made by some brave souls who wrote the Python Pillow library, it only took me a couple of minutes (hours) to do.
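A minimal sketch of that alpha-channel step with Pillow might look like this. The function name `apply_reference_alpha` is hypothetical, and the resize is just a safety net I added for the sketch in case the two images differ in size.

```python
from PIL import Image


def apply_reference_alpha(generated, reference):
    """Copy the alpha channel of reference onto generated and return an RGBA image."""
    alpha = reference.convert("RGBA").getchannel("A")
    # putalpha needs matching sizes, so resize the generated image if necessary.
    result = generated.convert("RGB").resize(alpha.size, Image.BICUBIC)
    # putalpha converts the RGB image to RGBA in place.
    result.putalpha(alpha)
    return result
```

Saving the result with `result.save("ape.png")` (a placeholder filename) keeps the transparent rounded corners, since PNG supports an alpha channel.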
And with that, I sat back and decided to generate, super-resolve, and format 100,000 Bored Apes that did not exist.
Finally, the fun part: a quiz!
Of the nine Bored Apes below, which one is generated from a GAN, and thus does not exist as part of the original 10,000 images in the Bored Ape Yacht Club collection?
I’m sure you saw this twist coming, but yes, they are all generated from the GAN. Surprise!
I think there is a bit of quality degradation when converting from a JPEG to a PNG, so future work should explore keeping the image format as a PNG throughout the entire generation process (something I can try the next time I have a free 120 hours). It’s certainly not perfect, but regardless, I think these results are incredible for a V1.
I solemnly swear I will not sell these generated apes as NFTs. And, to hold me to that promise, I am open sourcing the code and model weights I used to generate these images, as well as the 100,000 fake apes. All the code used to train and evaluate these models can be found on GitHub here.
GitHub - nathancooperjones/thisboredapedoesnotexist
And, just to make it even more fun, I took some inspiration from NVIDIA and hosted all the images on the website thisboredapedoesnotexist.nathancooperjones.com. Feel free to check it out and refresh a couple times to see some fun Bored Apes that do not exist!
Feel free to use these images in any way you want.
Except if you are Jimmy Fallon — you’ve done enough already.