Images from Humans of Nowhere Instagram account @

How to train Pix2Pix model and generating on the web with ml5.js

Dongphil Yoo
Dec 20, 2018 · 5 min read

In this post, I will go through the process of training a generative image model using and then porting the model to . This is how I did for my project during the class in NYU ITP this semester. I made a simple interactive web app using newly trained my own Pix2Pix model.

What is Pix2Pix?

Pix2Pix, or Image-to-Image Translation, can learn how to convert images of one type into another type of new image. Pix2Pix uses a kind of generative algorithm called Conditional Adversarial Network(cGAN) where the generation process is “conditioned” on the input image. The original and published by Phillip Isola et al.

Fortunately, I could get in-depth explanations of the model and technical details from classes which are Neural Aesthetic and Machine Learning for the Web taught by Gene Kogan and Yining Shi, respectively. Here are good resources from classes that contain more detailed descriptions of how Pix2Pix works.

Image courtesy of Machine Learning for Artists, Gene Kogan

Also, we can try some well-known interactive demo and find out a great post including a detailed explanation of how the encoding-decoding process works on this made by . I used his and ran it inside Spell run.

What I built

First, I created an interactive human picture generator which is based on the . Here is a demo link of mine:

I trained a model on pairs of photos of people and color segmented image of those original pictures. From that, Pix2Pix learned how to convert the color segmented images into output images that show human-like pictures. So you can use the same approach to train a model in any kind of related pairs of images you like.

I also made another web app which is called #HUMANSOFNOWHERE. You can use a webcam stream image as an input source for Pix2Pix generation. It also generates text with LSTM model ported by ml5.js. The model trained with all the images and texts that scraped from an existing Instagram account Humans of New York. So the basic idea of this project was making an autonomous Instagram bot which generates images and caption then posts them so that the account could be like a reflection of , one of New York’s iconic IG account.

Actually, this one is yet completed, but I would release a link of its alpha version and the Instagram account here.

How I built

I will cover only a process about the first Pix2Pix example, not about the other one which is combined with LSTM. If you want to train and use LSTM with ml5.js, there is a good to start with.

1) Data preparation

I needed a couple of hundred paired images of human photo and its person segmentation to train my own Pix2Pix model. Since the model doesn’t seem to require insanely large sets of image data compare to other generative models, I decided to scrape all images from Humans of New York Instagram account. I used python script for this. As a result, I got total 4200 images from the account.

The hardest part of data preparation was refining images. To train Pix2Pix properly, I needed to get 256x256 pixel size of square images first, and then I had to make them into 512x256 images paired with the same size of person segmented images like this:

One image pair. Left is the original photo(A) and right is person segmented image(B). Pix2Pix will learn to translate from (B) to (A)

I had a look several methods to do this effectively, and I ended up in a dirty way. Firstly, I almost randomly chose a few hundred square images among the scraped images that have various poses, gender, figures of people. And then I fed chosen images into the person segmentation process.

For the person segmentation, I wrote a script using (if it possible, I recommend using instead). The code snippet below is particularly used for pair images processing. After image processing, I downloaded 240 paired images by hand.

There are other choices to get person segmented images. For instance, model likely can help to make it happen.

2) Training

Once pairs of images have been collected, I needed to train my model. In order to do so, I used GPU accelerated computing. This process is done with Spell. If you are not familiar with Spell, check this introductory by Daniel Shiffman *Choo Choo* or Learn with Spell . And also Yinying has a that handled the exact same process so that you can get more sense of Spell.

After all the Spell settings are done on my local, I opened terminal and typed spell login to login Spell. Then I cloned the Pix2Pix Tensorflow implementation repo and moved to the directory.

git clone 
cd pix2pix-tensorflow

Before starting the training, I uploaded the prepared image dataset to remote Spell resources folder. In my case, I set the upload folder name as ‘data’.

spell upload DIRECTORY_PATH_WITH_DATASET [--name new-name]

If uploading succeeds, you can find a folder at Spell Resources dashboard with the same name you set.

All seemed good, so I ran the command to train.

spell run --machine-type V100 --mount uploads/data:/data/ --framework tensorflow 'python --mode train --input_dir /data/ --output_dir ckpt --which_direction BtoA --max_epochs 200'

It took around a half an hour, but the running time could be varied by different machines or hyperparameters you set.

In root folder pix2pix-tensorflow, I could export the model to use it on the web with ml5.js.

# create folder in pix2pix-tensorflow repo folder
mkdir ckpt
cd ckpt

# copy the result from spell run
spell cp runs/YOUR_RUN_NUMBER/ckpt

# export the model in pix2pix-tensorflow root folder
python --mode export --output_dir export/ --checkpoint ckpt/ --which_direction BtoA

# port the model to tensorflow.js
cd server
cd static
mkdir models
cd ..
python3 tools/ --checkpoint ../export --output_file static/models/MY_MODEL_BtoA.pict

These commands gave the exported model file in .pict format. I used that model on ml5.js.

3) Use the model

In ml5.js script, I could simply change the path to point to my new model.

pix2pix = ml5.pix2pix('model/MY_OWN_MODEL.pict', modelLoaded);

If anyone wants to play with the example I made, here is the . This approach and code for training Pix2Pix should work with any well-structured pairs of images you want to use. You can check some interesting results created by this model on Instagram.

Dongphil Yoo

Written by

Creative Technologist / Master candidate at New York University

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade