Runway asked me to come in to their office for a few hours each week over the summer to work on some language-related prototypes. This post is a summary of what I got up to. I also talk a little bit about what I think is special about Runway.
Porting my first model
I’m an educator and my students are mostly artists and designers. A recent focus of my teaching is doing creative things at the intersection of machine learning and the language arts, mostly using the Python programming language. If you’ve ever taught programming before, you’ll know that one of the biggest challenges for instructors is just getting the damn software installed and working on your students’ machines. The challenge is even tougher when you’re teaching machine learning: there are dozens of machine learning frameworks, and hundreds of frameworks built on top of those frameworks, each of which potentially has multiple versions, mutually incompatible dependencies, and/or arcane installation instructions. Even downloading and installing pre-trained models can be a hassle: services like TensorFlow Hub aim to make the process easier, but actually end up adding yet another library requirement on top of everything you’ve already installed.
One model that I frequently recommend to my students is Google’s Universal Sentence Encoder. (In class, I usually start off the discussion of sentence vectors by telling students just to average the word vectors in a sentence together to get a sentence vector. This works fine for many applications, especially poetic applications, but the Universal Sentence Encoder is a handy step up from that technique.) But in order to use this model, you have to go through the rigamarole of installing TensorFlow, SentencePiece, etc. This makes it difficult to use as a drop-in replacement for other techniques, especially in introductory classes and workshops.
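The averaging baseline mentioned above fits in a few lines of plain Python. This is only a minimal sketch: the three-dimensional `WORD_VECTORS` table is made-up toy data (real pre-trained word vectors have hundreds of dimensions), and `sentence_vector` is an illustrative name, not part of any library.

```python
# Toy word vectors, invented for illustration only.
WORD_VECTORS = {
    "the": [0.0, 0.0, 1.0],
    "cat": [1.0, 0.0, 0.0],
    "sleeps": [0.0, 1.0, 0.0],
}


def sentence_vector(sentence, vectors=WORD_VECTORS):
    """Average the vectors of the known words in a sentence."""
    # Naive whitespace tokenization; unknown words are simply skipped.
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        raise ValueError("no known words in sentence")
    dims = len(next(iter(vectors.values())))
    # Component-wise mean over the word vectors.
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]


print(sentence_vector("The cat sleeps"))
```

Because every sentence ends up as a vector of the same size, sentences of different lengths can be compared directly, which is the same property the Universal Sentence Encoder provides with a much better model.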
So the first project I took on at Runway was to make what the Runway folks call a “port” of this model. A Runway port is usually just a wrapper around an existing model that extracts the parts of the underlying code that perform prediction and puts them together in a simple web API, using a Flask-like framework called the Runway Model SDK. The interface of this web API, including the data formats and data types, follows a set of conventions understood by the Runway desktop application. The “porting” process itself is easy, even for beginner Python programmers, and getting the basic version of the model running in Runway’s local development mode ended up taking less than an hour. (The one tricky part of porting the Universal Sentence Encoder model was including an extra step in the build instructions to download the model from TensorFlow Hub at image build-time. Thank you Anastasis for helping me figure this out!) The source code for my port is open source and freely available, if you want to take a look.
(The Runway Model SDK is, essentially, a common set of conventions that machine learning researchers and engineers can use to make their models easy to use and interoperable. This is, in my opinion, Runway’s most interesting and important innovation.)
Sentences and t-SNE
Runway makes the output from models immediately visible in a “Preview” panel, which makes the interface friendly in a way that notebooks and command lines are not. (You don’t have to go poking and prodding through directories just to see what happened.) Many of the models most prominently featured in the application seem to be chosen specifically for their immediate visual impact. The application comes with a number of different Preview modes, which are selected automatically based on the type of data returned by the model.
My problem when porting the Universal Sentence Encoder was this: the Model SDK didn’t seem to have a good match for the kind of data that the model outputs. The best solution I could figure out was to make the model return two arrays: the first is an array of strings (the sentences from the input text) and the second is an array of vectors (the encodings from the model), where the strings and their corresponding vectors share indexes in their respective arrays.
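To make the shape of that output concrete, here is a rough sketch of the two parallel arrays. Everything in it is an assumption for illustration: `fake_encode` is a hypothetical stand-in for the real encoder (it returns 4 numbers instead of 512), the sentence splitter is a naive regular expression, and the key names `sentences` and `encodings` are made up rather than taken from the port.

```python
import re


def fake_encode(sentence, dims=4):
    """Hypothetical stand-in for the real encoder: returns a fixed-size vector."""
    return [float(len(sentence) % (i + 2)) for i in range(dims)]


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def encode_text(text):
    """Return two parallel arrays: sentences[i] corresponds to encodings[i]."""
    sentences = split_sentences(text)
    return {
        "sentences": sentences,
        "encodings": [fake_encode(s) for s in sentences],
    }


result = encode_text("Hello there. How are you? Fine!")
for sentence, vector in zip(result["sentences"], result["encodings"]):
    print(sentence, vector)
```

Keeping the strings and vectors in separate arrays means each array holds a single type of item, at the cost of relying on index alignment to pair them back up.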
Simple enough! But Runway doesn’t have a built-in preview mode for this kind of data type, like it does for images and text. This means that if you were to use the model right now, nothing would show up in the output panel at all, even though the model is successfully returning data. So another part of my work with Runway was to propose how the preview panel should work when displaying this kind of data. The obvious solution, of course, is to reduce the number of dimensions in the vectors from 512 to two (with an algorithm like t-SNE) and then use those dimensions as X/Y coordinates when plotting the sentences on the screen.
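As a rough sketch of that idea, the snippet below projects 512-dimensional vectors down to two coordinates per sentence. Note the swap: it uses PCA (via NumPy's SVD) rather than t-SNE, purely to keep the example short and dependency-light; t-SNE would usually give more readable clusters. The random `encodings` are placeholder data, and `project_to_2d` is an illustrative name, not part of Runway.

```python
import numpy as np


def project_to_2d(vectors):
    """Project row vectors onto their first two principal components (PCA)."""
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
encodings = rng.normal(size=(5, 512))  # placeholder for real sentence encodings
sentences = [f"sentence {i}" for i in range(5)]

coords = project_to_2d(encodings)
for sentence, (x, y) in zip(sentences, coords):
    # Each sentence would be drawn at its (x, y) position on screen.
    print(f"{sentence}: ({x:.2f}, {y:.2f})")
```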
I happened to be in the office while Runway was rolling out their initial version of a feature called model “chaining” — the ability for one model to use another model’s output as its input. I was impressed with how quickly Runway was able to deploy a working version of this feature in the application. The feature itself is a ton of fun, and I spent a while playing around with it just to understand the affordances and possibilities. My favorite experiment was making a loop, connecting SPADE-COCO (generates images from segmentations) to im2txt (generates captions from images) to AttnGAN (generates images from captions) to DeepLab (image segmentation) and back to SPADE-COCO again. Here’s a brief screen capture of the kind of weirdness that ensues:
Cris asked me to write a short tutorial on how to use the chaining feature, which you can read on the Runway website.
I love Runway’s “vector grid” interface, which lets you explore the latent space of certain kinds of models. But all of the models that take advantage of that interface right now are image-based. So for my second project, I wanted to make something that works in the vector grid, but with text instead of images.
For a model to work with this kind of vector grid, the model has to be “generative,” like Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). Both types of models are designed to imitate the data they’re trained on by learning the distribution of underlying latent variables, which are mapped to a fixed-size vector. I’ve been working a lot with VAEs recently in my own research, so I decided to train one on my Gutenberg Poetry Corpus. I was inspired, of course, by Bowman et al.’s magical “Generating Sentences from a Continuous Space” paper, and also by Robin Sloan’s excellent Voyages in Sentence Space.
It turns out that it’s a lot easier to train generative models to produce images than it is to train generative models to produce sequences (like text). Most VAE model architectures suffer from “posterior collapse,” wherein the decoder part of the model learns to just ignore the latent variable, and instead decodes from whatever bits and pieces it learned from the sequences themselves. Over the past few years, researchers have devised one arcane technique after another to avoid posterior collapse, with varying levels of success. Bowman et al. used annealing, Semeniuta et al. used a convolutional decoder, and others suggest using exotic probability distributions instead of the more conventional unit Gaussian. I ended up using the technique and source code from He et al.’s Lagging Inference Networks paper, which mitigates posterior collapse by training the encoder more aggressively than the decoder. (I am extraordinarily grateful to He and co-authors for this paper and for sharing their code: it’s the first VAE training code I’ve tried that actually works off the shelf. As an added bonus, the code is easy to understand, customize and modify.)
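To give a feel for the simplest of these tricks, here is a minimal sketch of annealing the KL term of the VAE loss. It is not the code from any of the papers above: the linear schedule, the `warmup_steps` default, and the function names are all assumptions chosen for illustration. The idea is that the KL penalty starts at zero, so the decoder has an incentive to use the latent variable early in training, and is then phased in gradually.

```python
def kl_weight(step, warmup_steps=10000):
    """Linear warm-up of the KL weight from 0 to 1 (an assumed schedule)."""
    return min(1.0, step / warmup_steps)


def vae_loss(reconstruction_loss, kl_divergence, step, warmup_steps=10000):
    """Reconstruction loss plus the annealed KL term."""
    return reconstruction_loss + kl_weight(step, warmup_steps) * kl_divergence


# The same raw losses are weighted differently as training progresses.
for step in (0, 5000, 10000, 20000):
    print(step, vae_loss(2.0, 4.0, step))
```

The Lagging Inference Networks approach attacks the same problem from a different angle: instead of reweighting the loss, it changes how often the encoder is updated relative to the decoder.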
My fork of the code is available on GitHub. I made some changes to facilitate the kind of model I wanted to train. Most importantly, I modified the code so that you can use pre-trained sub-word embeddings from BPEmb. This speeds up training and (to my eye) improves the quality of the output. I also included a few helper classes and Jupyter Notebooks in my fork that make experimenting with trained models a little bit easier. My fork also, of course, includes the files necessary to build and deploy a model on Runway.
Anyway, I used my fork to train a VAE on one million lines of poetry from Project Gutenberg. This model (along with the training data) is available for download as well (116MB zip file). Using this model, you can do fun things like interpolate between two lines of poetry:
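Interpolation here means walking in a straight line between the latent vectors of two lines of poetry and decoding each point along the way. The sketch below shows only the vector arithmetic; the encoder and decoder are left out, the two-dimensional vectors are toy values, and `interpolate` is an illustrative helper rather than a function from my fork.

```python
def interpolate(z_start, z_end, steps=5):
    """Return `steps` evenly spaced points on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0 inclusive
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points


# With a trained model, z_start and z_end would be the encoded latent vectors
# of two lines of poetry, and each point would be passed to the decoder.
for z in interpolate([0.0, 0.0], [1.0, 2.0], steps=3):
    print(z)
```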
Poems on the grid
You can spin up a copy of this model in the Runway application right now, and even explore the lines of poetry in the vector grid. Unfortunately, the vector grid doesn’t yet display the generated lines of poems in the grid itself, though you can see them in the Preview panel below. The last thing I did for Runway was to modify my fork of the desktop application’s source to display text in the vector widget. Here’s my fork in action:
I enjoyed working with the folks at Runway. The Runway application is impressive to me both as an educator and an artist: it strikes just the right balance between flexibility and ease of use, which is a difficult sweet spot to find. I’m excited to watch what happens as even more researchers, professionals and creative practitioners incorporate Runway into their workflow.
For reference, here are links to the GitHub repositories for the models referenced in this post:
- Universal Sentence Encoder Runway Port
- My fork of Lagging Inference Networks (with Runway port and links to pre-trained poetry model)