Motion Music

Creating music by movement (p5.js + poseNet)

Atharva Patil
Disintegration Anxiety | Fall 2018
Dec 9, 2018


Screengrab from final presentation

As part of my Introduction to Computational Media class at ITP, I had to build a playful application using JavaScript (probably using the p5.js library). I built my final project using a combination of the p5.js JavaScript library and ml5’s pose estimator library (poseNet).


During the 14-week class I learned about the different elements that make up the beautiful websites we love, how they work, and how to build some myself. It was an amazing learning experience in which I explored the DOM, serial communication, cueing and manipulating video and sound, working with APIs (I built my own weather display), and machine learning applications using the ml5 library.

As part of my final presentation I had 2 objectives:

  • Use multiple aspects of what I had learnt.
  • Build an interaction between the user and the system.
Interaction and fascination similar to these cats playing with Newton’s cradle

So, during ideation I came up with a few project ideas that I thought were feasible in two weeks. After discussion and feedback from the class and Cassie, I narrowed them down to two ideas: mixing a zodiac API with the weather API to predict weird futures based on user inputs, or playing music based on how people move or interact.


As someone who knew little more than that poseNet & ml5 existed, I decided to break my workflow into three steps before building my final idea.

- Get poseNet to work (figure out the configuration to connect, receive and understand the returned data).
- Display a circle at the nose (or googly eyes) and sort through the variables for the 17 data points returned.
- Build a simple interaction using a body part.
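
The second step can be illustrated with a small helper. This is a minimal sketch, not the project's code: it assumes each pose is an object with a `keypoints` array whose entries look like `{ part, score, position: { x, y } }`, which is the shape I understand the ml5 poseNet wrapper to return. `findKeypoint` and `samplePose` are hypothetical names, and the sample values are made up.

```javascript
// Hypothetical helper: look up one body part in a poseNet-style pose.
// Assumes pose.keypoints entries look like { part, score, position: { x, y } }.
function findKeypoint(pose, partName) {
  const match = pose.keypoints.find((kp) => kp.part === partName);
  return match ? match.position : null;
}

// Made-up sample data in the assumed shape (a real pose has 17 keypoints).
const samplePose = {
  keypoints: [
    { part: 'nose', score: 0.98, position: { x: 320, y: 180 } },
    { part: 'leftEye', score: 0.95, position: { x: 335, y: 165 } },
  ],
};

const nose = findKeypoint(samplePose, 'nose');
console.log(nose.x, nose.y);
```

Inside a p5.js `draw()` function, the returned position could then be passed to `ellipse(nose.x, nose.y, 30, 30)` to draw the circle on the nose.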


I got poseNet to work (and turned myself into a clown by putting a red circle on my nose).

self five(source giphy)

Next was building some interactive logic around it. I took the nose as the interactive trigger because it was the easiest one to test with while sitting and debugging. So, I decided to map the nose position along the X axis and have different songs play in different segments of the screen window. After a day’s work and finding some cute minion sounds on a website, I ended up building this:
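
As a rough illustration of the segment mapping described above (not the project's actual code), the nose's x position can be turned into a segment index and used to pick a sound. The helper name `segmentIndex`, the 640-pixel canvas width and the file names are all placeholders chosen for this example.

```javascript
// Hypothetical helper: which vertical slice of the canvas does x fall in?
function segmentIndex(x, canvasWidth, segmentCount) {
  // Clamp so a nose detected slightly off-canvas still maps to an edge segment.
  const clamped = Math.min(Math.max(x, 0), canvasWidth - 1);
  return Math.floor((clamped / canvasWidth) * segmentCount);
}

// Placeholder file names, one per segment.
const sounds = ['sound0.mp3', 'sound1.mp3', 'sound2.mp3', 'sound3.mp3'];

const index = segmentIndex(350, 640, sounds.length);
console.log(sounds[index]); // sound2.mp3
```

Remembering the previous index and only starting a sound when the index changes would avoid restarting the same clip on every frame.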

This was a fun experiment, but it had two major problems that became apparent in the playtest:

  • The screen was mirrored, so figuring out one’s own sense of direction was a bit weird.
  • Even though it is not apparent in the video, the experience was laggy because the video capture display was slowing it down.


Cassie suggested a very simple fix for the mirroring problem: flipping the X-axis. It takes just a few lines of code, which look like this:

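// Move the origin to the right edge, then flip the X-axis so the image is mirrored
translate(video.width, 0);
scale(-1, 1);
// Repaint with a slightly transparent white background
background(255, 240);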

Reducing the size of the video also helped reduce the latency in the video stream. Even with these two issues solved, there was one more thing to worry about: communicating how to use this to a random user. For a good user experience there should be a prompt, so that someone who simply walks up and stands in front of it knows what to do. I tried placing the text around the video canvas and a few variations of the text. I narrowed these down to a few and built a fun minion player controlled by the nose.
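
For reference, the X-axis flip from the snippet above moves anything drawn at `x` to `width - x`. If a keypoint is drawn outside the flipped transform, the same correction can be applied by hand. A minimal sketch, with `mirrorX` as a hypothetical helper name and 640 as an assumed canvas width:

```javascript
// Hypothetical helper: the x-coordinate a point lands on after
// translate(width, 0) followed by scale(-1, 1).
function mirrorX(x, width) {
  return width - x;
}

console.log(mirrorX(100, 640)); // 540
```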


Moving to a different body part (from the nose to the right wrist) for playtesting while building the right-hand-controlled drum kit gave a very interesting insight. The ported model for p5.js + ml5 is a bit unreliable when it comes to certain body parts. It detects the nose, eyes, shoulders and hips with much higher confidence than the wrists, ankles or elbows.
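
One way to cope with uneven confidence is to ignore keypoints below a threshold. The sketch below is an illustration under assumptions, not the project's code: it assumes keypoints shaped like `{ part, score, position }`, the 0.5 cutoff is arbitrary, the scores are made up, and `confidentKeypoints` is a hypothetical name.

```javascript
// Hypothetical helper: keep only keypoints the model is reasonably sure about.
function confidentKeypoints(pose, minScore) {
  return pose.keypoints.filter((kp) => kp.score >= minScore);
}

// Made-up scores that mimic the pattern described above:
// face and shoulders score high, the wrist scores low.
const noisyPose = {
  keypoints: [
    { part: 'nose', score: 0.99, position: { x: 320, y: 180 } },
    { part: 'leftShoulder', score: 0.9, position: { x: 380, y: 260 } },
    { part: 'rightWrist', score: 0.25, position: { x: 200, y: 400 } },
  ],
};

const reliableParts = confidentKeypoints(noisyPose, 0.5).map((kp) => kp.part);
console.log(reliableParts); // nose and leftShoulder remain, rightWrist is dropped
```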

I tried to track changes in body motion by connecting different body parts as vectors and measuring the change in the angles between them as people move. The first test I ran was with two vectors: shoulder to hip and shoulder to elbow. As I moved my arm up and down to test the code, I found a large discrepancy between what I was seeing in the console and the actions I was doing. I checked whether it was just a glitch, but the pattern existed across all body parts.
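
The angle measurement described above can be sketched in plain JavaScript. This is an illustrative version under assumptions, not the code used in the project: `angleAt` is a hypothetical helper and the coordinates are made up. p5.js also offers `p5.Vector` with an `angleBetween` method, which could be used for the same purpose.

```javascript
// Angle in degrees at `joint` between the segments joint->a and joint->b.
function angleAt(joint, a, b) {
  const v1 = { x: a.x - joint.x, y: a.y - joint.y };
  const v2 = { x: b.x - joint.x, y: b.y - joint.y };
  const dot = v1.x * v2.x + v1.y * v2.y;
  const cross = v1.x * v2.y - v1.y * v2.x;
  // atan2 of cross and dot gives the signed angle; abs makes it unsigned.
  return Math.abs(Math.atan2(cross, dot)) * (180 / Math.PI);
}

// Made-up positions: arm held straight out to the side.
const shoulder = { x: 300, y: 200 };
const hip = { x: 300, y: 400 };
const elbow = { x: 400, y: 200 };

const armAngle = angleAt(shoulder, hip, elbow);
console.log(Math.round(armAngle)); // 90
```

Because the angle depends on three keypoints at once, jitter in any one of them shows up in the result, which may be part of why the readings looked so noisy.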

This made me shift from my original plan of detecting changes in pose live to detecting where different body parts are and triggering music accordingly. This meant I had to rethink the visual design and the way people interacted.

I decided to stick with the original hypothesis: people move and music plays. I wanted to know how people would explore and interact if they were given the fewest possible prompts about what to do. I decided to get rid of the live video, connect the detected body parts as a skeleton, add logic similar to the drum kit logic, and let people play with it.
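
Connecting body parts into a skeleton can be sketched as turning pairs of part names into line segments. Everything here is an assumption made for illustration: the `BONES` list is a small hand-picked subset, the keypoint shape is `{ part, score, position }`, the 0.5 threshold is arbitrary, and `skeletonSegments` is a hypothetical name.

```javascript
// Hand-picked pairs of parts to connect (a subset, for illustration only).
const BONES = [
  ['leftShoulder', 'rightShoulder'],
  ['leftShoulder', 'leftElbow'],
  ['leftElbow', 'leftWrist'],
];

// Return one segment per bone whose two ends were both detected confidently.
function skeletonSegments(pose, bones, minScore) {
  const byPart = new Map(pose.keypoints.map((kp) => [kp.part, kp]));
  const segments = [];
  for (const [fromPart, toPart] of bones) {
    const a = byPart.get(fromPart);
    const b = byPart.get(toPart);
    if (a && b && a.score >= minScore && b.score >= minScore) {
      segments.push({ from: a.position, to: b.position });
    }
  }
  return segments;
}

// Made-up pose in which the wrist is detected with low confidence.
const armPose = {
  keypoints: [
    { part: 'leftShoulder', score: 0.9, position: { x: 380, y: 260 } },
    { part: 'rightShoulder', score: 0.9, position: { x: 260, y: 260 } },
    { part: 'leftElbow', score: 0.8, position: { x: 420, y: 340 } },
    { part: 'leftWrist', score: 0.2, position: { x: 440, y: 420 } },
  ],
};

const segments = skeletonSegments(armPose, BONES, 0.5);
console.log(segments.length); // 2
```

In a p5.js sketch, each segment could be drawn with `line(s.from.x, s.from.y, s.to.x, s.to.y)`.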

Here’s a video of Cara trying it out during the class playtest session.

Some things I learnt during the playtest:

  • People moved their hands a lot to figure out if something was going on.
  • Head movements were the second most common (moving the eyes in different directions was also tried).
  • Once the triggers were figured out, people experimented with trying to create their own song.

Based on my observations during the playtests, I decided not to add permanent coloured sections and labels based on the location of body parts, but just simple cues to indicate when people have explored and started playing a new sound.
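
One possible reading of such a cue is "flash something the first time a sound is discovered". The sketch below illustrates that interpretation only; it is not the project's logic, and `createDiscoveryTracker` and the zone names are hypothetical.

```javascript
// Hypothetical tracker: reports true only the first time a zone is visited.
function createDiscoveryTracker() {
  const discovered = new Set();
  return function visit(zoneId) {
    const isNew = !discovered.has(zoneId);
    discovered.add(zoneId);
    return isNew;
  };
}

const visit = createDiscoveryTracker();
console.log(visit('kick'));  // true  (first time: show a cue)
console.log(visit('kick'));  // false (already discovered)
console.log(visit('snare')); // true
```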

The final piece for the class looks like this:

Screenshot from final presentation

GitHub repository with the final code.

Even though it did not turn out to be what I had imagined, in terms of interaction & engagement people enjoy moving around and making some sounds. Even if waving their hands at weird angles makes them look weird, they still enjoy doing it because of the fun results they get out of it.


source giphy

I will be working with Arnab on expanding his static image poseNet estimator to estimate how close a person’s pose is to those of famous Bollywood celebrities. The idea is to get it installed somewhere as a fun exhibit.

I might have done much more if the vectors had worked with a lower margin of error. Points from a Kinect are relatively more reliable, and I already have part of the logic, so it may be possible to expand the project in that direction. Using Tone.js to generate music based on position/moves is another direction for improving the project.

Adding an introduction page would also help people who use it in the browser without any introduction, since they need instructions to walk back.