Mixing movement and machine

A behind-the-scenes look at building A.I. experiments with dance legend Bill T. Jones

As a creative technologist at the Google Creative Lab, I had the opportunity to collaborate with Bill T. Jones to create Body, Movement, Language, an online collection of A.I. sketches. This is where we ended up, but technically, how did we get there?

Over the past 5 months, our team has been working with legendary choreographer Bill T. Jones and his company to explore how A.I., and more specifically PoseNet, could be used in combination with dance. PoseNet is Google’s machine learning model that can estimate human pose from an image or video in a web browser.

The Creative Lab has previously experimented with PoseNet: Move Mirror lets you search 80,000 images with your movement and Creatability makes creative tools more accessible on the web. This project raised a new question: How would pioneering artist Bill T. Jones use PoseNet as an instrument for creation?

Bill and dancer Vinson Fraley experimenting with a prototype during a workshop session

When we demoed PoseNet for Bill, his central question was not “What are all the things it can do?” but rather: “What do I want from it?”

The technologist in me thought: This incredible machine learning model can estimate where you are in space from just a webcam image in an internet browser — we can create anything you want.

But, the dancer in me understood his hesitation.

Bill is an award-winning choreographer, dancer, writer, and director who is known for creating deeply personal work. He was one of the first choreographers to blur traditional gender roles on stage by having men lift men. He introduced improvised speech into dance, breaking the idea that dance should exist as pure movement. He broke down barriers to the avant-garde modern dance world by working with non-classically trained dancers. He has never needed A.I. to tell his stories before, so how could it serve him now?

We quickly discovered that PoseNet was only interesting to Bill if it helped him convey meaning. The tech wasn’t an end in itself; it was only useful to him as a tool for artistic expression. Working with Bill taught me to focus less on what the machine learning model could do and more on what it could help convey. While building the experiments for this project, I learned a lesson in technological restraint: let the concept star and the technology play a supporting role.

Where we ended: Body, Movement, Language

The product of our collaboration is a collection of PoseNet and speech experiments titled Body, Movement, Language: A.I. Sketches with Bill T. Jones. They are all inspired by Bill’s work interweaving speech and movement in performance, and are the direct result of his and his company’s engagement with this nascent technology. We also captured the process of creating these experiments in a short film.

The behind-the-scenes film

One particularly powerful experiment encourages you to match Bill’s poses alongside him as he performs his iconic solo 21. The experiment uses PoseNet to determine the similarity between your pose and Bill’s. After completing the poses, you can download your GIF and explore his personal reflections on each pose, as well as what it means to put a live performance piece on the internet. This experiment builds upon work done by Move Mirror, which includes a mechanism for calculating how similar two PoseNet-generated poses are to one another (read about their process and tech in the blog post here).
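For a rough sense of how that kind of similarity calculation can work, here is a minimal sketch in plain JavaScript: flatten each pose’s keypoints into a vector, normalize to unit length, and take the cosine similarity. Treat the function names and steps as assumptions, not the project’s actual source — Move Mirror’s real pipeline does more work (such as bounding-box normalization and keypoint-confidence weighting), as their blog post describes.

```javascript
// Flatten PoseNet-style keypoints ({position: {x, y}}) into [x0, y0, x1, y1, ...].
function toVector(keypoints) {
  return keypoints.flatMap((kp) => [kp.position.x, kp.position.y]);
}

// Scale a vector to unit length so the comparison ignores overall body size.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  return vec.map((v) => v / norm);
}

// Cosine similarity of two unit vectors is just their dot product:
// 1 means the poses match exactly; values near 0 mean they are unrelated.
function poseSimilarity(keypointsA, keypointsB) {
  const a = l2Normalize(toVector(keypointsA));
  const b = l2Normalize(toVector(keypointsB));
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}
```

Because both vectors are normalized first, a pose and a uniformly scaled copy of it — the same person standing closer to the camera — score as identical.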

Though the final product is exciting, as a maker, I crave the ability to peer behind the scenes and see exactly what tools and techniques were used in the process of pushing a creation toward its final form. So, I want to peel back the curtain and take you back to the beginning.

Where we started: The evolution of a single experiment

For our first workshop, I created this collection of web experiments to showcase the wide range of interactions PoseNet can enable. At the time, I expected it to be a starting point from which our ideas would grow more complex and layered. Instead, over the course of four workshops, we watched as Bill systematically stripped down each experiment until it contained only what he needed to convey a concept.

We started our first day together experimenting with a prototype I called Movement Multiplier.

The team discussing how to evolve the current prototype for the next workshop

Watching his dancers try it out, Bill said he liked the way it responded to the dancers’ movement, but it looked “like a screen saver”.

Two more experiments stood out to him that day: Body Writer and Audio Controller.

Body Writer (speak to attach words to your body) and Audio Controller (manipulate sound with a single body point)

Bill liked the ability to transcribe speech (Body Writer) and the simplicity of seeing a dot relating to a single body point (Audio Controller). Could we combine elements from all three of these prototypes we saw today, he asked, so that a person’s spoken sentence becomes a trail of words behind their hand? Yes, we could.

So for our next workshop, we brought Text Trailer. The dancers could now speak and pull their spoken sentences around the screen.

Text trailer: Use a body point to trail letters behind you (no person visible)

We went from utilizing all 17 PoseNet keypoints to focusing on only one at a time, and from displaying a large block of text to a single elegant line. Bill is a famous improviser with speech, so the ability to speak freely and interact with speech and movement was compelling.
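A minimal sketch of the Text Trailer mechanic, with hypothetical names rather than the project’s actual source: remember the last N positions of the single tracked keypoint, then lay one character of the spoken sentence at each remembered position, newest character at the hand.

```javascript
const TRAIL_LENGTH = 40; // assumed cap on remembered positions

// Append the latest keypoint position, dropping the oldest once over the cap.
function updateTrail(trail, position) {
  const next = [...trail, position];
  return next.length > TRAIL_LENGTH ? next.slice(next.length - TRAIL_LENGTH) : next;
}

// Pair each character with a trail position so the text follows the
// hand's recent path. Returns [{char, x, y}, ...] ready for drawing.
function layoutSentence(sentence, trail) {
  const chars = [...sentence];
  const n = Math.min(chars.length, trail.length);
  const points = trail.slice(trail.length - n); // newest n positions
  const visible = chars.slice(chars.length - n); // last n characters
  return visible.map((char, i) => ({ char, x: points[i].x, y: points[i].y }));
}
```

Each frame, you would call updateTrail with the wrist keypoint from PoseNet and layoutSentence with the latest transcript from the browser’s speech recognition, then draw the returned characters to the canvas.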

But he had two more issues: ever inspired by the human form, he disliked seeing only the graphics and not the dancer. He also wanted a way to “stick” lines of text onto the screen in order to build a composition.

So I built a new version in which the live webcam feed is always visible, so you always see the dancer/user, and when you want to lock in a line, you simply move to the edge of the screen to drop it off.

Christina Robson, dancer with Bill T. Jones/Arnie Zane company, testing the next iteration of Text Trailer

To test this new version out, Bill prompted his dancers to use it to tell a story from their lives. He asked them to talk about themselves, what they do, and why they do it. Below, Vinson Fraley, a dancer with Bill T. Jones/Arnie Zane company, performs with the final version of the experiment. Bill aptly named it Manifesto.

Vinson Fraley, dancer with Bill T. Jones/Arnie Zane company, improvising with Manifesto

Finally, we had reached a point at which the human and story were the focus, and the technology simply a means. You can create your own Manifesto on the Body, Movement, Language site.

But if you want to build your own “movement meets machine” experiments, where should you start?

Where you can start: PoseNet Sketchbook

An online collection of PoseNet experiments. Check out the repo on GitHub for installation and development instructions.

I’ve published here what is essentially my raw starter sketchbook that I used with Bill during our collaboration.

It is not a library or evolving repository. Instead, it is an archive of Body, Movement, Language’s beginnings. I hope it’s a starting point for anyone to create their own wacky, wild, or just plain useful PoseNet experiments that push the boundaries of digital interaction design beyond the current standard of clicks, presses, and taps.

PoseNet is continually improving, thanks to the tensorflow.js team. To learn more about the model, you can read their blog post for a high-level description of how it works.

Two prototypes from the sketchbook: Basic and Movement Multiplier (grid mode)

A few PoseNet specific technicalities I discovered:

  • It recognizes humans best in pedestrian positions, as opposed to unusual forms like a dancer with her leg by her head.
  • It recognizes some points better than others: the nose, for example, tends to be pretty consistent — way more consistent than an elbow or wrist, especially if the person is moving around a lot.
  • Multi-pose works best if more than one person is expected to be in the space. Otherwise, it tries to make sense of multiple people as a single person.
  • PoseNet has no sense of depth, but you can calculate scale from the distance between two points. I like to use the distance between the eyes, because people tend to be watching the screen, so the eyes are consistently recognized points. As you get closer to the camera, the distance between your eyes grows in the image, so you can use that value to scale any elements that should respond to a person’s proximity to the machine.
  • Smoothing is useful, but only truly works when a single person is in the frame. Because PoseNet has no persistent knowledge of who is attached to each pose, it often returns poses in a different order from frame to frame.
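To make the last two points concrete, here is a small sketch with assumed helper names and an arbitrary baseline value — PoseNet keypoints look like {part: 'leftEye', position: {x, y}, score}:

```javascript
// Inter-eye distance in pixels: grows as a person approaches the camera.
function eyeDistance(keypoints) {
  const left = keypoints.find((kp) => kp.part === 'leftEye').position;
  const right = keypoints.find((kp) => kp.part === 'rightEye').position;
  return Math.hypot(left.x - right.x, left.y - right.y);
}

// Map eye distance to a scale factor, relative to a baseline distance
// measured once at a "normal" standing spot (60px is an arbitrary guess).
function proximityScale(keypoints, baseline = 60) {
  return eyeDistance(keypoints) / baseline;
}

// Single-person smoothing: exponential moving average per keypoint.
// alpha near 0 is smooth but laggy; alpha near 1 is responsive but jittery.
function smoothKeypoints(previous, current, alpha = 0.5) {
  if (!previous) return current;
  return current.map((kp, i) => ({
    ...kp,
    position: {
      x: alpha * kp.position.x + (1 - alpha) * previous[i].position.x,
      y: alpha * kp.position.y + (1 - alpha) * previous[i].position.y,
    },
  }));
}
```

The smoothing sketch assumes keypoints arrive in a stable order, which — per the note above — only holds reliably when one person is in the frame.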

Throughout all this testing and tinkering, I enjoyed uncovering these useful snippets about the model, but my most important lesson was learning to keep the technology subtle in order to let the meaning of what Bill and the team were making shine through.

So, if you decide to remix any of the starter sketches from the collection, I encourage you to forget about the technicalities (for a moment!) and consider the concept. Or, as Bill would say, can you “make those dots make somebody cry”?

Bill on stage at New York Live Arts during our last workshop, warming up for his performance.

Maya Man is a Brooklyn-based artist and technologist. Currently, she’s building experiments at the Google Creative Lab. You can find her online at mayaontheinter.net.