AI, Crypto & Art: Generating NFT Style Images with Neural Networks

Ethan Sidelsky
11 min read · Dec 6, 2021


By Taylor Tucker, Dean Wahle, and Ethan Sidelsky

Intro

Humans have been creating artistic works for almost as long as we have walked the Earth. As technology has evolved, so too has the way artists express themselves. In the 20th century, the invention of computers unlocked nearly limitless new avenues for artists to explore. In the 21st century, the way art is owned, viewed, and shared is also evolving. Our project applies neural networks to explore new ways of generating art while examining the changing forms of art ownership.

NFTs

Innovating upon the foundation created by Satoshi Nakamoto’s Bitcoin, blockchains were developed to hold more than just an alternative currency. In the 2010s, non-fungible tokens (NFTs) emerged as unique, non-interchangeable units of data stored on a digital ledger (a blockchain). Because NFTs live on a blockchain, there is a shared, public ledger that provides proof of ownership, and that ownership can be transferred easily. NFTs matter because, unlike most digital images, which can be reproduced infinitely, they provide a way to easily identify the owner of a digital art piece.

Bored Apes

CryptoPunks

Digital art was one of the first use cases for NFTs because digital works can easily be linked to tokens on the blockchain. One of the most popular and highly valued NFT collections is CryptoPunks, which is inspired by the London punk scene and the cyberpunk movement. There are 10,000 unique CryptoPunk tokens, some of which have sold for many millions of dollars. A CryptoPunk's value depends on its unique attributes, which are mixed and matched to create each token. For example, the “Alien” body type is considered one of the rarest, appearing in just 9 of the 10,000 images.

CryptoPunk #3100, Valued at over $7,500,000

Goals

Using neural networks and deep learning, our project explores ways to generate new art. Specifically, we will use Generative Adversarial Networks (GANs) to generate art in the CryptoPunk style. Additionally, we will use style transfer to create CryptoPunk-style pictures of real people.

Data

To begin our project, we needed access to the entire library of CryptoPunk NFTs. OpenSea is one of the most popular marketplaces for NFT collections, and we used its open API to download all 10,000 images in the collection. To be clear, downloading an image of an NFT does not mean you now own it; that would require a transfer of the asset on the blockchain. What we did is more akin to taking a photo of an art piece. Scraping the data was relatively straightforward, as the OpenSea API is easy to work with. The code for the scraper can be found in the GitHub repository as download_images.py.
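For reference, here is a minimal sketch of what download_images.py does. It assumes OpenSea's (since-retired) v1 assets endpoint with its collection, limit, and offset parameters, and it omits any API-key or rate-limit handling, so the details may differ from our actual script:

import os
import requests

def download_collection(collection="cryptopunks", out_dir="./cryptopunks",
                        total=10000, page_size=50):
    # Hypothetical sketch: page through OpenSea's v1 /assets endpoint and save each image
    os.makedirs(out_dir, exist_ok=True)
    for offset in range(0, total, page_size):
        resp = requests.get(
            "https://api.opensea.io/api/v1/assets",
            params={"collection": collection, "limit": page_size, "offset": offset},
        )
        resp.raise_for_status()
        for asset in resp.json().get("assets", []):
            image_url = asset.get("image_url")
            if not image_url:
                continue
            with open(os.path.join(out_dir, "%s.png" % asset["token_id"]), "wb") as f:
                f.write(requests.get(image_url).content)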

Visualizing Our Dataset

What is a Generative Adversarial Network?

A GAN is a machine learning framework built from two neural networks, a generative network and a discriminative network, that play a zero-sum game against each other. The generative network learns to map points from a latent space to the data distribution (in this case, images), attempting to mimic real data points. Before training, the generator produces pure noise. The discriminator takes both an image from the generator and an image from the real dataset and tries to distinguish which is which. The training loss is calculated from how accurately the discriminator separates real from generated images and is backpropagated through both networks. Over time, this process pushes the generator to produce new images that look like they belong in the original dataset.
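For reference, this zero-sum game is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014), where G is the generator, D is the discriminator, x is a real sample, and z is a latent noise vector:

\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]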

Data Pre-Processing

Next, we needed to format the data so that we could feed it to our GAN. Below, we take the directory of images and use Keras to create TensorFlow BatchDataset objects for training and validation. We also normalize the pixel values of our CryptoPunk images to lie between -1 and 1, matching the generator's tanh output.

image_sizes = {"cryptopunks": (32, 32),
"boredapeyachtclub": (512, 512),
"mutant-ape-yacht-club": (512, 512),
"decentraland": (512, 512),
"grey": (336, 336)}
batch_size = 32
def get_data(collection):
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
"./%s"%collection,
validation_split=0.2,
subset="training",
seed=1337,
image_size=image_sizes[collection],
#batch_size=batch_size,
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
"./%s"%collection,
validation_split=0.2,
subset="validation",
seed=1337,
image_size=image_sizes[collection],
#batch_size=batch_size,
)
return train_ds, val_ds
train_ds, val_ds = get_data("cryptopunks")def process(image,label):
image = tf.cast((image - 127.5)/127.5 ,tf.float32)
return image,label
train_ds = train_ds.map(process)

Generator Model

The generative model, or generator, starts from a low-resolution feature map and progressively upscales it (in this case, by a factor of 2 per transposed-convolution layer) while applying its learned weights and biases. Our generator outputs a 32x32 RGB image, hence the shape after the final Conv2DTranspose layer (32x32x3). Our generator consisted of the following architecture:

from tensorflow.keras import layers
import matplotlib.pyplot as plt

def make_generator_model():
    model = tf.keras.Sequential()
    model.add(layers.Dense(8*8*256, use_bias=False, input_shape=(100,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Reshape((8, 8, 256)))
    assert model.output_shape == (None, 8, 8, 256)  # Note: None is the batch size
    model.add(layers.Conv2DTranspose(128, (5, 5), strides=(1, 1), padding='same', use_bias=False))
    assert model.output_shape == (None, 8, 8, 128)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Conv2DTranspose(64, (5, 5), strides=(2, 2), padding='same', use_bias=False))
    assert model.output_shape == (None, 16, 16, 64)
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU())
    model.add(layers.Conv2DTranspose(3, (5, 5), strides=(2, 2), padding='same', use_bias=False, activation='tanh'))
    assert model.output_shape == (None, 32, 32, 3)
    return model

generator = make_generator_model()

# Sanity check: run a random noise vector through the untrained generator
noise = tf.random.normal([1, 100])
generated_image = generator(noise, training=False)
plt.imshow((127.5 + 127.5*generated_image[0, :, :, :].numpy()).astype("uint8"), cmap='viridis')
Generator Architecture

We confirmed the generator worked by passing a noise vector through it with training disabled, which produced the expected output: a noisy 32x32 RGB image.

Generator Noise

Discriminator Model

The discriminator model is architecturally simpler than the generator. It passes an image through two Conv2D convolutional layers and ends with a single-neuron Dense layer that outputs a logit for how likely the image is to be real. The discriminator consisted of the following architecture:

def make_discriminator_model():
    model = tf.keras.Sequential()
    model.add(layers.Conv2D(64, (5, 5), strides=(2, 2), padding='same',
                            input_shape=[32, 32, 3]))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Conv2D(128, (5, 5), strides=(2, 2), padding='same'))
    model.add(layers.LeakyReLU())
    model.add(layers.Dropout(0.3))
    model.add(layers.Flatten())
    model.add(layers.Dense(1))  # single logit: higher means "more likely real"
    return model

discriminator = make_discriminator_model()

# The untrained discriminator's verdict on the noise image generated above
decision = discriminator(generated_image)
print(decision)
Discriminator Architecture

Training the Model

Training the model is fairly straightforward. For every epoch, we go through the training dataset in batches. For each batch, we pass the generator noise vectors, which it uses to produce images, and the discriminator then scores both the generated images and the real images from the batch. The generator and discriminator losses (both forms of cross-entropy) are computed inside a GradientTape, and the resulting gradients are applied to both networks after each batch.
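Concretely, both losses are built from binary cross-entropy on the discriminator's logits. The train_step helper called below isn't shown in this post; here is a sketch of how its losses are typically defined, following the standard TensorFlow DCGAN recipe that our code closely mirrors (exact names in our notebook may differ):

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_output, fake_output):
    # The discriminator wants real images scored as 1 and generated images as 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss

def generator_loss(fake_output):
    # The generator wants the discriminator to score its images as real (1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)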

We trained the model three distinct times, using different numbers of epochs. First, we trained for 60, then 150, and finally 200 epochs. Total training times for the model were roughly 30, 85, and 160 minutes respectively.

import time
from IPython import display

def train(dataset, epochs):
    for epoch in range(epochs):
        start = time.time()
        for image_batch, labels in dataset:
            train_step(image_batch)

        # Produce images for the GIF as you go
        display.clear_output(wait=True)
        generate_and_save_images(generator, epoch + 1, seed)
        print('Time for epoch {} is {} sec'.format(epoch + 1, time.time() - start))

    # Generate after the final epoch
    display.clear_output(wait=True)
    generate_and_save_images(generator, epochs, seed)
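train_step, generate_and_save_images, and the fixed seed noise batch are defined elsewhere in our notebook. Here is a sketch of what they typically look like in this setup, again following the TensorFlow DCGAN recipe; the optimizer settings, grid layout, and the ./epoch_images/ output path are illustrative assumptions:

import os

noise_dim = 100
num_examples_to_generate = 16
seed = tf.random.normal([num_examples_to_generate, noise_dim])  # fixed noise reused every epoch

generator_optimizer = tf.keras.optimizers.Adam(1e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(images):
    noise = tf.random.normal([batch_size, noise_dim])
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator(noise, training=True)
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)
    # One GradientTape per network: compute and apply gradients for both models
    gen_grads = gen_tape.gradient(gen_loss, generator.trainable_variables)
    disc_grads = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gen_grads, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(disc_grads, discriminator.trainable_variables))

def generate_and_save_images(model, epoch, test_input):
    # Generate from the fixed seed so frames are comparable across epochs
    predictions = model(test_input, training=False)
    plt.figure(figsize=(4, 4))
    for i in range(predictions.shape[0]):
        plt.subplot(4, 4, i + 1)
        plt.imshow((127.5 + 127.5 * predictions[i].numpy()).astype("uint8"))
        plt.axis('off')
    os.makedirs('./epoch_images', exist_ok=True)
    plt.savefig('./epoch_images/image_at_epoch_{:04d}.png'.format(epoch))
    plt.show()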

CryptoGIFs

After each epoch, we saved a batch of images from the model to be used to create a GIF of the model’s performance as it trained. The GIF was fascinating, as it allowed us to see the model progress from noise to very recognizable images. One interesting thing to note about the GIFs is that, while the generator quickly learned the face shapes of the CryptoPunks, which are all very similar, the accessories and other attributes, such as a cigar or different hair styles, became more like blurs because those features vary in shape and color so much. Therefore, in the GIFs, sometimes you can see cigars fade in and fade out over a number of epochs. You may also see hazy clouds of color around the heads of the CryptoPunks, which seem to come from the wide variety of hairstyles that are used on the CryptoPunks.

import tensorflow_docs.vis.embed as embed
import glob
import imageio

anim_file = 'training.gif'

with imageio.get_writer(anim_file, mode='I') as writer:
    filenames = glob.glob('./epoch_images/image*.png')
    filenames = sorted(filenames)
    for filename in filenames:
        image = imageio.imread(filename)
        writer.append_data(image)
    # Append the last frame once more so the GIF lingers on the final epoch
    image = imageio.imread(filename)
    writer.append_data(image)

embed.embed_file(anim_file)
Progress through 60 Epochs
Progress through 150 epochs
Progress through 200 epochs

Neural Style Transfer

For the next part of our project, we implemented a neural style transfer, taking images of real people and creating CryptoPunk-style versions of them. A neural style transfer takes two images, a content image and a style reference image, and blends them together so that the output looks like the content image rendered in the style of the reference image.

Data

As mentioned above, neural style transfer requires two distinct datasets. In this project, we used the CryptoPunk library we had already assembled, along with the TensorFlow Datasets “aflw2k3d” dataset, a publicly available collection of human face images.
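Loading the face data through TensorFlow Datasets is a one-liner. A quick sketch (the split name and the "image" feature key follow the public aflw2k3d catalog entry, but are worth double-checking):

import tensorflow_datasets as tfds

# aflw2k3d provides 450x450 RGB face photos under the "image" feature
faces_raw = tfds.load("aflw2k3d", split="train")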

Pre-Processing

Our first goal in processing these datasets was to get the images to a uniform size. We ran into several issues along the way, but eventually settled on scaling the original 450x450-pixel faces down to 50x50 and then taking a random 32x32 crop from that, matching the CryptoPunks' native 32x32 resolution. The face dataset also went through random augmentation, such as random flips and the random crops just described, to add variety.

For both datasets, we first normalized the RGB channel values to lie between -1 and 1. We also resized the CryptoPunk images from 336x336x3 to 32x32x3, which, because of the pixelated nature of CryptoPunks, caused almost no loss of detail while drastically improving runtime efficiency. Finally, we prefetched both datasets to speed up training by keeping buffers of data in RAM.
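Putting those steps together, the preprocessing pipeline looks roughly like the sketch below. The directory path, batch size, and exact function names are illustrative assumptions; faces_raw is the TFDS dataset loaded above:

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def preprocess_face(example):
    image = tf.cast(example["image"], tf.float32)
    image = tf.image.resize(image, (50, 50))               # 450x450 -> 50x50
    image = tf.image.random_crop(image, size=(32, 32, 3))  # random 32x32 crop
    image = tf.image.random_flip_left_right(image)         # random horizontal flip
    return (image - 127.5) / 127.5                         # normalize to [-1, 1]

def preprocess_punk(image, label):
    # Resize to the CryptoPunks' native 32x32 and normalize to [-1, 1]
    image = tf.image.resize(tf.cast(image, tf.float32), (32, 32))
    return (image - 127.5) / 127.5

faces_ds = (faces_raw
            .map(preprocess_face, num_parallel_calls=AUTOTUNE)
            .batch(32)
            .prefetch(AUTOTUNE))

punks_ds = (tf.keras.preprocessing.image_dataset_from_directory(
                "./cryptopunks", image_size=(336, 336), seed=1337)
            .map(preprocess_punk, num_parallel_calls=AUTOTUNE)
            .prefetch(AUTOTUNE))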

In the end, we visualized both of our datasets using matplotlib. While the images may appear to be different sizes here, the preprocessing steps above run after this visualization and before model training.

CycleGAN Model

CycleGAN is a model used for image-to-image translation. Normally, learning the mapping between an input image and an output image requires a training set of aligned image pairs. Unfortunately, paired examples are not always easy to find. The good news is that CycleGAN can learn this mapping without paired input-output images by using cycle-consistent adversarial networks.

Generator and Discriminator Models

In essence, the CycleGAN model works as follows. A GAN takes a real image (in this case, a face) and tries to recreate it in the style of the style dataset (in this case, CryptoPunks). That generated image is then passed through another GAN, this time “in reverse,” to try to recreate the original image. This step ensures that the generated image does not stray too far from the details of the original (i.e. the face). The generated image is also passed, alongside an image from the style dataset, through a discriminator to push it toward that style. In essence, CycleGAN walks a fine line between staying close to the details of the original image and nudging itself closer and closer to the style images.
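That “recreate the original” step corresponds to the cycle-consistency loss from the CycleGAN paper. With G mapping faces to punks and F mapping punks to faces, the loss penalizes how far a round trip drifts from the input:

\mathcal{L}_{\text{cyc}}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]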

The generator and discriminator models are similar to those in the regular GAN described above, but with considerably more complex architectures. While the generator in the regular GAN only contains Dense, Conv2D, BatchNormalization, LeakyReLU, and Reshape layers, the CycleGAN generator also contains addition (residual), instance normalization, activation, and reflection-padding layers, along with more Conv2D layers. Similarly, the CycleGAN discriminator adds instance normalization layers as well as another Conv2D layer. In the end, the generators and discriminators work much like those in the plain GAN above, playing a zero-sum game in which each generator tries to trick its discriminator into believing the generated images are real.
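For illustration, here is a sketch of the kind of building block those layer names refer to, modeled on the Keras CycleGAN example: a reflection-padded convolution followed by instance normalization inside a residual block. The layer sizes are illustrative and the exact configuration in our model may differ:

import tensorflow as tf
from tensorflow.keras import layers
import tensorflow_addons as tfa

class ReflectionPadding2D(layers.Layer):
    # Pads the spatial dimensions by mirroring border pixels instead of zero-padding
    def __init__(self, padding=(1, 1), **kwargs):
        super().__init__(**kwargs)
        self.padding = padding

    def call(self, x):
        ph, pw = self.padding
        return tf.pad(x, [[0, 0], [ph, ph], [pw, pw], [0, 0]], mode="REFLECT")

def residual_block(x, filters=256):
    # Assumes the input already has `filters` channels so the skip connection lines up
    shortcut = x
    x = ReflectionPadding2D()(x)
    x = layers.Conv2D(filters, (3, 3), padding="valid", use_bias=False)(x)
    x = tfa.layers.InstanceNormalization()(x)
    x = layers.Activation("relu")(x)
    x = ReflectionPadding2D()(x)
    x = layers.Conv2D(filters, (3, 3), padding="valid", use_bias=False)(x)
    x = tfa.layers.InstanceNormalization()(x)
    return layers.add([shortcut, x])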

Training the Model

For CycleGAN, it is necessary to calculate several categories of losses for the generators and discriminators. In our models, we first passed real images through the generator networks to produce generated images. Next, we passed the generated images back through the opposite generators to check whether we could reconstruct the original images from the generated ones.

After that, we created an identity mapping of the real images using the generators. Next, we passed the generated images into the discriminators. We then calculated each generator's total loss (adversarial + cycle + identity) as well as each discriminator's loss, and used those to update the weights of the generators and discriminators. Finally, we returned the losses in a dictionary.
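A condensed sketch of how those pieces combine into a generator's total loss, in the style of the Keras CycleGAN example. The model and argument names (gen_G for faces-to-punks, gen_F for punks-to-faces, disc_Y for the punk discriminator) and the loss weights of 10 and 5 are illustrative assumptions:

import tensorflow as tf

def total_generator_loss(gen_G, gen_F, disc_Y, real_face, real_punk,
                         adv_loss_fn, lambda_cycle=10.0, lambda_identity=5.0):
    # Sketch of the adversarial + cycle + identity loss for the faces-to-punks generator
    fake_punk = gen_G(real_face, training=True)      # faces -> punks
    cycled_face = gen_F(fake_punk, training=True)    # faces -> punks -> faces
    fake_face = gen_F(real_punk, training=True)      # punks -> faces
    cycled_punk = gen_G(fake_face, training=True)    # punks -> faces -> punks
    same_punk = gen_G(real_punk, training=True)      # identity mapping of a real punk

    # Adversarial term: gen_G wants disc_Y to call its fake punks real
    adv_loss = adv_loss_fn(disc_Y(fake_punk, training=True))
    # Cycle term: translating there and back should reproduce the input
    cycle_loss = (tf.reduce_mean(tf.abs(real_face - cycled_face)) +
                  tf.reduce_mean(tf.abs(real_punk - cycled_punk))) * lambda_cycle
    # Identity term: feeding gen_G a real punk should barely change it
    identity_loss = tf.reduce_mean(tf.abs(real_punk - same_punk)) * lambda_identity

    return adv_loss + cycle_loss + identity_loss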

Conclusion/Reflection

In all, we found our CycleGAN project to be much more successful than we had first imagined. The images from the two datasets could not be more different: the facial dataset being pixelated bits of real human faces, and the CryptoPunks being hyper-cartoonish pixelated images of humanoids.

Despite this, we found some surprising results. For example, one of the aforementioned attributes that a CryptoPunk can have is an eyepatch. When the model saw a woman whose hair was partially covering her eye, it gave the corresponding generated CryptoPunk an eyepatch over the same eye. We also noticed fantastic improvement between epochs of training. While the generated CryptoPunks in the first epoch were basically a blur, by only epoch three there was clearly a CryptoPunk there, although it was relatively amorphous and seemed to not know what to do with the features from the human image.

At the same time, we found that many of the generated CryptoPunk images simply had a hard time finding a happy medium between the two domains they were generated from. In the end, we managed to create some very interesting, albeit sometimes scary, images while also learning a lot about AI and blockchain. Check out some examples below and, if you’re curious, check out our code and datasets on GitHub.

Generated CryptoPunks over the first three epochs
CryptoPunk showing a strong facial resemblance, especially in the eyes and hair.

These CryptoPunks mimic the head orientation of the human images.

Check out our code and datasets on GitHub
