RunwayML Flash Residency: MatCap Mayhem

Purvi Goel
May 23, 2020

A MatCap, short for “material capture”, is a way of representing an object’s material and lighting environment within a single spherical image. Because all of this information is compressed into the image, we can apply a MatCap to an object and skip any expensive lighting, reflection, or shadow computation.

Image from https://learn.foundry.com/modo/901/content/help/pages/shading_lighting/shader_items/matcap.html, an excellent resource on creating MatCaps

To apply a MatCap texture to an object, we transfer its shading and color by mapping normals from the sphere to corresponding normals on the target object. The mapping is easily calculated, and can produce some very interesting non-photorealistic results at blazing fast speeds. As a result, MatCaps are a tried-and-true way to prototype and interact with materials and 3D models.
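The normal-to-normal mapping described above boils down to a simple remap: the x and y components of the view-space normal, which lie in [-1, 1], become [0, 1] texture coordinates into the MatCap image. A minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def matcap_uv(normal_view):
    """Map a view-space surface normal to MatCap texture coordinates.

    The sphere in a MatCap image covers every normal a camera can see,
    so the normal's x/y components index directly into the texture
    once remapped from [-1, 1] to [0, 1].
    """
    n = np.asarray(normal_view, dtype=float)
    n = n / np.linalg.norm(n)          # normals must be unit length
    u = n[0] * 0.5 + 0.5
    v = n[1] * 0.5 + 0.5
    return u, v

# A normal pointing straight at the camera samples the sphere's center.
print(matcap_uv([0.0, 0.0, 1.0]))      # (0.5, 0.5)
```

In a real renderer this runs per-fragment in a shader, but the arithmetic is identical, which is exactly why MatCap shading is so cheap.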

Models textured with deep-learning generated MatCaps!

MatCaps come in a wide range of colors and varieties, but they all share a few defining features. First, as a consequence of the normal-to-normal mapping process, MatCaps are all square images. Second, because MatCaps describe how texture and lighting would look on the surface of a sphere, they contain a distinctive “circular” shape. Third, most tend to model materials with only low-frequency details. If an object is made of two separate materials, it is much neater to model them with two separate MatCaps rather than compressing both materials into a single MatCap.

There’s a lot of room within these constraints to explore and create. I was interested in applying deep learning to this exploration task: generating valid MatCaps that could be taken right out of a neural network and applied convincingly onto a 3D model. I also wanted to add some user control to the model’s generations. The model would draw some “inspiration” from a small, user-provided piece of an image and generate a low-resolution MatCap, inferring a spherical shape, existence or positions of highlights, and the size of shadowed regions.

The general goal: train a model that can generate MatCaps “inspired” by patches of color on an image. The patch specifies the color; the model chooses where to place specular highlights, shadows, and exposure. I generated both of these MatCaps using deep learning, from patches in the images shown.

Week 1: Collecting Data

RunwayML had several models at its disposal. Inspired by Yuka Takeda, who used a generative model called StyleGAN to generate MatCaps, I started by taking the application on a test drive. To get the hang of querying Runway’s hosted models, and to get my hands on one of those mesmerizing latent space walks, I fine-tuned StyleGAN on a small online dataset of MatCaps. Fine-tuning took only a handful of hours, and did not disappoint.

MatCap-StyleGAN latent space walk, courtesy of RunwayML. You’ll have to excuse the gif’s poor resolution — that’s due to Medium’s size limits!

Mulling these results over, I settled on three key takeaways to inform the rest of the residency.

  1. Even though StyleGAN produces high-resolution images (1024 × 1024 pixels), the MatCaps still exhibited some blurriness around the edges and highlights. I’m sure fine-tuning on a larger dataset for a longer period of time would gradually clear up these issues, but since even the most advanced GANs can struggle to produce large images, I decided to stick with generating low-resolution MatCaps and leave higher resolutions for future exploration.
  2. I chose to fine-tune on RunwayML’s “Faces” checkpoint because both human faces and MatCaps were vaguely…spherical? At some point halfway through the fine-tuning process, I was lucky enough to watch my model generate a human-MatCap hybrid. Think: a bright green MatCap wearing slightly blurry glasses. I regret not taking a screenshot.
  3. There’s no denying that StyleGAN is a powerful tool for image generation, and Runway made interacting with the model very accessible. But without some surgery on the model’s architecture, there was no way to actually control what StyleGAN was generating. I wanted a neural model that could draw “inspiration” from an existing splash of color or texture when generating MatCaps. If I wanted more creative control over what a GAN produced, I would need a different model.

Week 2: Choice of Architecture

Some reading led me to conditional GANs (CGANs). While StyleGAN and other similar networks produce novel-yet-arbitrary images from a vector of random numbers, CGANs are built to generate samples that exhibit a specific condition. CGANs have been used for tasks like super-resolution of a low-resolution input image, image-to-image translation, and generating images from text. Since I wanted my neural network to generate a MatCap based on characteristics specific to an input patch, a CGAN seemed like the right choice. I spent the second week of the residency building out one of these architectures, poking and prodding hyperparameters into line, and watching its training epochs tick by.
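To make the conditioning idea concrete, here is a toy PyTorch sketch of a patch-conditioned generator. The layer sizes and architecture are my own illustrative choices, not the residency model itself; the point is that the condition (the color patch) is the network’s only input, with no random noise vector at all:

```python
import torch
import torch.nn as nn

class PatchConditionedGenerator(nn.Module):
    """Toy conditional generator: a 16x16 RGB patch in, a 64x64 MatCap out.

    Unlike an unconditional GAN generator, there is no latent noise
    vector here; the "condition" (the patch) is the only input.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(3, 32, 4, stride=2, padding=1),   # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 32 -> 64
            nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, patch):
        return self.net(patch)

gen = PatchConditionedGenerator()
patch = torch.rand(1, 3, 16, 16)  # one random 16x16 RGB patch
out = gen(patch)
print(out.shape)                  # torch.Size([1, 3, 64, 64])
```

In a full CGAN, the discriminator would also see the patch alongside each real or generated MatCap, so it can penalize outputs that ignore the condition.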

The pipeline. The conditional GAN accepts an image patch as input and generates a low-resolution MatCap. We can then use that MatCap to texture a 3D model.

I generated a few hundred MatCaps to train the model, using the first week’s fine-tuned StyleGAN model. I selected 15 random 16x16 pixel squares from each MatCap image, and trained the generator to upsample the small patches back into their parent MatCaps. The model would learn about colors and shading during this training process so that, at test time, it could generate MatCaps from patches of pixels outside the training dataset.
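The patch-sampling step above is straightforward to sketch in NumPy (the function name and the random MatCap stand-in are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_patches(image, patch_size=16, count=15):
    """Crop `count` random patch_size x patch_size squares from an H x W x C image."""
    h, w = image.shape[:2]
    patches = []
    for _ in range(count):
        y = rng.integers(0, h - patch_size + 1)
        x = rng.integers(0, w - patch_size + 1)
        patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

matcap = rng.random((128, 128, 3))  # stand-in for a real MatCap image
# Each (patch, matcap) pair is one training example: the patch is the
# generator's input, the full parent MatCap is the target.
pairs = [(p, matcap) for p in random_patches(matcap)]
print(len(pairs), pairs[0][0].shape)  # 15 (16, 16, 3)
```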

While that was training, I spun up a visualizer with Three.js on top of its helpful MatCap example. I ran the model on small patches chosen from random images found on the web. Here’s a handful of Week 2 results!


Week 3: To Interactivity and Beyond…

Let’s generate a MatCap inspired by this image.
More specifically, this patch from one of the flower petals.

In order to more easily interact with the model, I uploaded it to Runway as a Hosted Model. This feature provided an API endpoint I could use to query and control the model. The whole uploading process was very easy with Runway’s numerous tutorials and examples, and only took about five minutes.

The web interface adds a few more seconds of buffering time, but removes the middle man.

I extended Three.js’s MatCap viewer to extract a patch from an input image, query the model over the web, and texture a 3D object with the output. The results are in the video above. It takes a few extra seconds to send data to and from the model, but I think it’s worth it!
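The query itself is just an HTTP request carrying an encoded image. Here is a sketch of how the request body might be packaged from Python; the field name "image" and the base64 encoding are assumptions about the schema, not Runway’s actual API contract, so check the hosted model’s documentation for the real one:

```python
import base64
import json

def build_query(patch_bytes):
    """Package an encoded image patch as a JSON body for a hosted model.

    ASSUMPTION: the hypothetical endpoint accepts {"image": <base64 str>};
    real hosted-model schemas vary, so consult the model's docs.
    """
    encoded = base64.b64encode(patch_bytes).decode("ascii")
    return json.dumps({"image": encoded})

# In the real pipeline you would POST this body to the model's endpoint
# (e.g. with urllib.request) and base64-decode the MatCap image that
# comes back in the response.
body = build_query(b"fake patch bytes")
print(json.loads(body)["image"][:8])
```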

Future Work

I had a really good time developing this project as a proof of concept, and I can think of several features I’m excited to add.

You’re gonna need a bigger ̶b̶o̶a̶t̶ 3D content archive.
  1. As augmented reality on mobile devices becomes more prevalent, the demand for 3D content and interactivity is increasing rapidly. A smoother pipeline might query the model right from a phone application. Texturing simulated 3D models in augmented reality with MatCaps “inspired” by pieces and patches of the real world would be an interesting way to add more interactivity to the experience.
  2. There’s always a higher resolution. While low-resolution MatCaps can still work on most models, we inevitably miss some sharp colors and finer details around specular highlights. And of course, there’s something very compelling about watching a latent space walk through StyleGAN’s high-resolution outputs.
An object augmented with both a normal-map and a MatCap. The possibilities!

3. MatCaps are one of several representations of appearance and texture. If we fed patches of normal maps or UV texture maps into a CGAN, what sort of outputs would we get? It would be interesting to explore upsampling small image patches into both 2D MatCaps and 3D normal maps. By their powers combined…

4. The model has a distinctive bias toward producing blueish-purplish MatCaps, which I attribute to insufficient diversity in the training set. I’ll have to collect more varied training data for future iterations of the project.

Finally, I want to thank RunwayML for giving me the opportunity to participate in this Flash Residency. The application was intuitively designed and easy to use, the documentation was plentiful, and the Runway developers were generous with their advice and constant support!
