Rebuilding the Player Experience

Before and After switching to a waveform player.

The recording archive at UNT serves about 12,000 audio downloads but 70,000 audio streams per year to authenticated faculty and students. Even though we push a considerable amount of streaming video, supporting streaming audio is just as vital, and our users spend much of their time listening to those recordings.

At first glance this might look like a SoundCloud knockoff, but the switch to a waveform-based player was a calculated improvement.

Most people on streaming services listen to a track of five minutes or less from start to finish. That just isn't the case for an archive with this type of content. After a year of tracking interactions, it was clear that students and faculty were listening to specific portions of a concert, then skipping around within a piece.

Screen recordings made it clear that listeners were struggling to find a specific spot in the audio timeline. Below is an example; the clicks shown were rapid, all within about five seconds.

Each red dot is a mouse click.

This is a frustrating experience and wastes time. Tracks averaged eight seeks per play, more than five times the amount of seeking we see in our streaming video. In some cases users spent more time trying to find the right spot than they spent listening to the track.

In post-production, when cutting up tracks, we have waveforms that show where the loud and soft points are. Like seeing the code in The Matrix, after a while we start to spot key points such as applause, movement breaks, and sections within a piece. That lets us rapidly cut a single file into movements and pieces without listening to the entire concert again or constantly seeking. If musicians could see the same peaks and troughs, the loud and the soft, it should help them navigate a piece just as well.

Waveforms from a concert.

The Technicals

Newer browser features, including canvas and the Web Audio API, make it easier to build a waveform audio player. I used a waveform player project called Wavesurfer.js and switched to an internal API to serve track data. Server-side, the process checks and builds compressed AAC audio from WAV files, then checks and builds waveform data using the BBC's audiowaveform project, saving it to a small data file. When a visitor loads the page, Wavesurfer loads the waveform data, draws a player in a canvas element, and attaches the typical audio player event handlers and methods.
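As a rough sketch of the client side, the pre-computed data file can be reduced to a peaks array before being handed to a player. The field names below follow audiowaveform's JSON output format (interleaved min/max pairs in `data`, scaled to `bits`-bit signed integers); the sample values are illustrative:

```javascript
// Convert audiowaveform's JSON output into peaks normalized to [-1, 1].
// `data` holds interleaved min/max sample pairs; `bits` is 8 or 16.
function normalizePeaks(waveformJson) {
  const scale = Math.pow(2, waveformJson.bits - 1);
  return waveformJson.data.map((v) => v / scale);
}

// Example data, as produced by `audiowaveform --output-format json`:
const waveform = {
  bits: 8,
  data: [-64, 64, -128, 127, 0, 32], // interleaved min/max pairs
};

const peaks = normalizePeaks(waveform);
// peaks are now fractions of full scale, ready to hand to a player
```

Wavesurfer can accept pre-computed peaks alongside the audio URL, so the browser never has to decode the whole file just to draw the waveform.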

The post-production review process also benefited from waveforms. Wavesurfer can build the waveform data in the browser instead of server-side, so before any data files are created I can review a concert with real-time waveform data and inspect the tracks more quickly for quality control.

So why not use browser-based waveform generation everywhere instead of server-side? Generating this data for multiple tracks, over an hour of audio, uses a lot of memory and can crash the browser after about a dozen tracks. On a mobile browser it only takes two or three tracks before a crash.
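To see why, consider what the browser has to do: decode each entire file to raw PCM and hold it in memory before it can be reduced to peaks. A sketch of the reduction step, assuming the audio has already been decoded into a Float32Array (this is an illustration of the technique, not Wavesurfer's internal code):

```javascript
// Reduce decoded PCM samples to min/max peak pairs, one pair per
// `samplesPerPixel` window. The expensive part is not this loop but
// holding the full decoded buffer in memory at once: one hour of
// 44.1 kHz mono audio is ~159 million samples (~635 MB as Float32)
// before any reduction, which is why a dozen tracks can exhaust a tab.
function buildPeaks(samples, samplesPerPixel) {
  const peaks = [];
  for (let i = 0; i < samples.length; i += samplesPerPixel) {
    let min = Infinity;
    let max = -Infinity;
    const end = Math.min(i + samplesPerPixel, samples.length);
    for (let j = i; j < end; j++) {
      if (samples[j] < min) min = samples[j];
      if (samples[j] > max) max = samples[j];
    }
    peaks.push(min, max);
  }
  return peaks;
}

// Tiny demo buffer: two windows of three samples each.
const demo = new Float32Array([0.1, -0.3, 0.5, -0.2, 0.4, 0.0]);
const demoPeaks = buildPeaks(demo, 3);
```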

The new player rolled out in January 2017. Below is a comparison of the two most active months of the year:

Stats for two busy months before and after the switch.

Seeks were reduced by 55%, even while plays increased. Screen recordings show less rapid seeking and longer play times. User session times doubled.

Where this really shines is in single-track concerts. Some recordings are not cut into separate files: rehearsals, pieces with hard-to-find movement breaks (attacca), or doctoral lectures. In fact, the time spent finding movements can double the post-production editing time.

A concert not cut into individual files.

Each one of those globs of waveforms is a piece. You can even see the applause before each piece as the musicians walk on stage.

Other improvements were added to make the page behave more like a playlist, with keyboard navigation.
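Playlist keyboard navigation mostly comes down to mapping key presses onto a track index. A minimal sketch; the key bindings and the wiring below are illustrative, not necessarily what shipped:

```javascript
// Map a key press to the next playlist position. Kept as a pure
// function so the same logic can drive any player implementation.
function nextTrackIndex(current, trackCount, key) {
  switch (key) {
    case "ArrowDown": // next track, stop at the end
      return Math.min(current + 1, trackCount - 1);
    case "ArrowUp": // previous track, stop at the start
      return Math.max(current - 1, 0);
    case "Home":
      return 0;
    case "End":
      return trackCount - 1;
    default:
      return current; // unrecognized keys leave the position alone
  }
}

// In the page it would be wired to keydown events (browser-only):
// document.addEventListener("keydown", (e) => {
//   current = nextTrackIndex(current, tracks.length, e.key);
//   players[current].play();
// });
```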

By studying a sample of users, I found the amount of time spent per visit had increased almost 14%, with 8% less time spent in navigation (total visit length minus the time audio/video was playing). Reducing the friction between users and their goal improved engagement.
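The navigation metric itself is simple arithmetic: whatever part of a visit is not playback is navigation. As a sketch (the numbers below are illustrative, not our actual session data):

```javascript
// Navigation time: the part of a visit not spent playing media.
function navigationTime(visitSeconds, playbackSeconds) {
  return visitSeconds - playbackSeconds;
}

// Share of a visit spent navigating rather than listening.
function navigationShare(visitSeconds, playbackSeconds) {
  return navigationTime(visitSeconds, playbackSeconds) / visitSeconds;
}

// Illustrative example: a 20-minute visit with 17 minutes of playback
// leaves 3 minutes of navigation, i.e. 15% of the visit.
const share = navigationShare(1200, 1020);
```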

And given that half of our visitors are not on desktops, yes, it works quite well on mobile.