Creating Waveforms Out of Spotify Tracks

Using the Spotify Platform’s Audio Analysis Endpoint

Lee Martin
The Startup
7 min read · Oct 11, 2020


One of the key features of my recent Future Islands project was pre-rendering waveforms so that we had a cool visual to accompany audio playback. It's a topic I hadn't thought about since I worked at SoundCloud many years ago. Since we didn't stream the audio from Spotify on that project, I ended up extracting the waveform data using Meyda. (See the case study for more info.) However, when I finished the project, I started to think about how I might actually create waveform images from Spotify tracks. Spotify doesn't grant access to the full-length audio files (for good reason) which would be required to extract this data. In addition, from what I can tell, their Web Playback SDK does not expose the audio in a way that would let you generate this in real time using Web Audio.

I was ready to give up when I recalled that Spotify's platform provides an Audio Analysis endpoint. This endpoint provides all sorts of interesting analysis of a track's structure and musical content, including rhythm, pitch, and timbre. The object I was most interested in was Segments. These are sections of the track which contain roughly consistent sound. Each segment has many interesting properties, but I was most interested in three: the start point (in seconds), the duration (in seconds), and the max loudness (in decibels) of the segment. Using this data, I should be able to visualize the audio levels of the track. First, let's download the data.
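For reference, each entry in the segments array looks roughly like this (abbreviated, and the values here are illustrative rather than from a real track — the three fields we'll use are start, duration, and loudness_max):

```json
{
  "start": 152.73952,
  "duration": 0.31021,
  "confidence": 0.639,
  "loudness_start": -23.312,
  "loudness_max_time": 0.04045,
  "loudness_max": -14.031,
  "pitches": [ ... ],
  "timbre": [ ... ]
}
```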

Calling the Endpoint

Since I'm going to be simplifying this data before using it for visualizations, I like using curl to download the data locally. This can be done very simply by passing the track ID and a Spotify access token. You can generate a temporary access token using the Spotify Platform console. Once you do, just add --output track.json to the curl command to save the response to a file.

curl -X "GET" "https://api.spotify.com/v1/audio-analysis/TRACK_ID" -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_ACCESS_TOKEN" --output track.json

Now, we can simplify the data.

Preparing the Data

As I mentioned, the Audio Analysis endpoint provides all sorts of interesting data, which makes the returned payload pretty large to work with in practical applications. I downloaded Future Islands' new track "Thrill" and the data amounted to 378 KB. 😰 What I want to do is greatly simplify this data down to just an array of loudness levels from 0 to 1. I'll then use this array of levels to generate a waveform using HTML5 canvas or SVG. I do this with a little Node script.

First, we'll include the Node file system module and also our downloaded data.

const fs = require('fs')
const data = require('./track.json')

Next, we’ll create a variable for the track’s duration which is part of the data Spotify provides.

let duration = data.track.duration

Then, we’ll map the segments data to only include the start, duration, and loudness properties.

let segments = data.segments.map(segment => {
  let loudness = segment.loudness_max

  return {
    start: segment.start / duration,
    duration: segment.duration / duration,
    loudness: 1 - (Math.min(Math.max(loudness, -35), 0) / -35)
  }
})

If you look closely, you'll see that I'm not mapping the properties directly, as I want to simplify things even further. Instead of start and duration being values in seconds, I want them to be floats between 0 and 1. We get this by dividing each value by the duration we declared earlier.

Loudness is a bit more complicated because it is declared in decibels from 0 to -60. Similar to the two time properties, I want to turn this into a float from 0 to 1. In order to do this, I need to define a range of values. You might think that simply setting the range at 0 and -60 would yield a good result, but how often do Spotify tracks actually hit those lower dB values? On Spotify's Audio Features endpoint, they actually share a chart which shows the overall distribution of decibel data on the platform.

Using this chart, I decided to set my range between 0 and -35. Without this, the waveform would actually have a lot of dead space for all those decibels which don't appear frequently. You can then use Math.min and Math.max with a bit of division to produce that 0 to 1 float we're looking for.
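Pulled out on its own, that clamp-and-scale step looks like this (a standalone sketch of the expression used in the map above, with -35 as the floor chosen from the chart):

```javascript
// Map a loudness value in decibels to a 0–1 float.
// Values are clamped to the [-35, 0] range first, so anything
// quieter than -35 dB bottoms out at 0 and 0 dB maps to 1.
function normalizeLoudness (db) {
  let clamped = Math.min(Math.max(db, -35), 0)
  return 1 - (clamped / -35)
}

console.log(normalizeLoudness(0))    // 1
console.log(normalizeLoudness(-35))  // 0
console.log(normalizeLoudness(-60))  // 0 (clamped to the floor)
```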

Another way of deciding which range of decibels to use is simply to figure out the lowest and highest dB values that exist on the track. Let's establish those values now from our simplified segments data, as they may come in handy.

let min = Math.min(...segments.map(segment => segment.loudness))
let max = Math.max(...segments.map(segment => segment.loudness))

If you look closely at the segment data, you'll notice the reason for the start and duration properties: the Spotify analyzer does not slice tracks at a nice consistent interval like every second. Instead, these segments are organized into chunks of consistent sound. What we want to do is create a new array called levels containing 1000 new evenly spaced audio levels from the beginning to the end of the track.

Since we already turned both our start and duration properties into floats, we can achieve this by using a for loop to increment through 1000 values between 0 and 1. Within each loop iteration, we'll find the segment in which the current duration iterator falls and add that loudness value to the array. However, we won't add loudness directly; instead we'll add one more level of simplification and divide the loudness value by the max value, just in case the max doesn't actually reach 1. Oh, and we'll also round the loudness value to two decimal places too. OK, enough simplification. 😅

let levels = []

for (let i = 0.000; i < 1; i += 0.001) {
  let s = segments.find(segment => {
    return i <= segment.start + segment.duration
  })

  let loudness = Math.round((s.loudness / max) * 100) / 100

  levels.push(loudness)
}

The last thing to do is write the levels array to a JSON file so we can use it for visualization. You can do this with the fs writeFile function.

fs.writeFile('levels.json', JSON.stringify(levels), (err) => {
  if (err) console.log(err)
})

This is a rough solution I came to quickly, and it could certainly be cleaned up quite a bit. In the meantime, here's the gist of it.
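If you'd like the whole simplification in one place, here's a sketch that wraps the steps above into a single reusable function — same logic as the snippets in this post, just gathered together (the function name and the synthetic usage are my own, not from the original gist):

```javascript
// Reduce a full Spotify audio-analysis response to an array of
// `count` evenly spaced 0–1 loudness levels.
function analysisToLevels (data, count = 1000) {
  const duration = data.track.duration

  // Normalize segment timing to 0–1 and clamp loudness to [-35, 0] dB.
  const segments = data.segments.map(segment => ({
    start: segment.start / duration,
    duration: segment.duration / duration,
    loudness: 1 - (Math.min(Math.max(segment.loudness_max, -35), 0) / -35)
  }))

  const max = Math.max(...segments.map(segment => segment.loudness))

  // Resample into `count` evenly spaced levels, scaled so the
  // loudest segment hits exactly 1, rounded to two decimal places.
  const levels = []

  for (let i = 0; i < count; i++) {
    const t = i / count
    const s = segments.find(segment => t <= segment.start + segment.duration)
    levels.push(Math.round((s.loudness / max) * 100) / 100)
  }

  return levels
}

// Usage with the files from this post:
// const fs = require('fs')
// const levels = analysisToLevels(require('./track.json'))
// fs.writeFileSync('levels.json', JSON.stringify(levels))
```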

Generating the Waveform

Depending on where you plan on using your waveform, there are many ways to generate a visual of it, from SVG to Canvas. Personally, I prefer generating my waveforms using Canvas so they can be both dynamic and responsive to screen sizes. Check out this CodePen which loads up our data and generates a responsive waveform. I'll explain a bit about what's happening.

First, I’m using axios to load the waveform data but you can use fetch if you’d like. I establish a Vue.js method called renderWaveform to do the actual rendering and make sure to call it anytime the window is resized.

let { data } = await axios.get('levels.json')

this.waveform = data

this.renderWaveform()

window.onresize = () => this.renderWaveform()

We can finally move on to actually rendering this thing. I like to place my <canvas> element into a responsive parent div and simply resize the canvas size based on the parent’s size. Once you do that, establish the context for drawing.

let canvas = this.$refs.canvas
let { height, width } = canvas.parentNode.getBoundingClientRect()

canvas.width = width
canvas.height = height

let context = canvas.getContext('2d')

We'll then want to loop through each pixel of the width of our canvas and decide if and how we should draw a waveform line there. In this example, I'm going to draw a 4-pixel-wide line every 8 pixels, as I find this aesthetically pleasing. I have also decided to mirror the waveform around a vertical center line, which is common in waveform visualization.

for (let x = 0; x < width; x++) {
  if (x % 8 == 0) {
    let i = Math.ceil(this.waveform.length * (x / width))
    let h = Math.round(this.waveform[i] * height) / 2

    context.fillRect(x, (height / 2) - h, 4, h)
    context.fillRect(x, (height / 2), 4, h)
  }
}

First, I check if the x value is divisible by 8. If so, we establish which value from the waveform data we should use for this line. We can define the height of this line by multiplying the selected waveform level with the height of the canvas. Since we're mirroring the levels vertically, we'll halve this value. Finally, we'll draw two lines using the fillRect method of HTML canvas: one which extends from the middle up and another which extends from the middle down.
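One small caveat with the indexing: for some canvas widths, Math.ceil(this.waveform.length * (x / width)) can round up to this.waveform.length, one past the last valid index, which reads undefined from the array and silently skips that bar. A possible tweak (my addition, not part of the original pen) is to clamp the computed index:

```javascript
// Pick the waveform level for a given x position, clamping the
// computed index so it never runs past the end of the array.
function levelAt (waveform, x, width) {
  const i = Math.min(
    Math.ceil(waveform.length * (x / width)),
    waveform.length - 1
  )
  return waveform[i]
}
```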

Again, check out and fork this pen if you’re interested in this topic.

So what's next? That's really up to you. This could be a nice method of adding a known visual element to any custom Spotify player interface. I immediately began thinking about an embeddable Spotify player with timed comments. Let me know what you think.


Netmaker. Playing the Internet in your favorite band for two decades. Previously Silva Artist Management, SoundCloud, and Songkick.