Measuring the subjective — improving the quality of AI-generated music as a PM

For context, I used to be a Product Manager at Jukedeck, a London-based AI music startup. I probably knew the least about music there.

Jukedeck’s warmup for Boiler Room at Slush Music Conference

Music quality is subjective, so we broke it down into three groups of levers we could pull: composition, arrangement and production.

We brainstormed tests to measure incremental improvements in music quality, and considered a range of variables.

To avoid analysis paralysis, we picked the simplest test: present two tracks (one from before a change, one from after) and ask the user to rate both on an arbitrary scale.

Using Google Apps Script, we could quickly run the test at scale. We’d serve up two random tracks (one old, one new) for a user to compare and rate.
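To make the pairing step concrete, here is a minimal sketch in Python rather than the scripting setup we actually used. The function name `pick_pair`, the track lists, and the idea of shuffling the presentation order so the listener can't tell which track is the new one are all illustrative assumptions, not a record of the original implementation.

```python
import random


def pick_pair(old_tracks, new_tracks, rng=random):
    """Pick one old and one new track at random for a listener to rate.

    Hypothetical helper: old_tracks and new_tracks are any non-empty
    lists of track identifiers (file names, URLs, IDs).
    """
    pair = [
        ("old", rng.choice(old_tracks)),
        ("new", rng.choice(new_tracks)),
    ]
    # Assumption: randomise the playback order so position does not
    # give away which version is which.
    rng.shuffle(pair)
    return pair


if __name__ == "__main__":
    first, second = pick_pair(["old_01", "old_02"], ["new_01", "new_02"])
    print("Play first:", first[1])
    print("Play second:", second[1])
```

The labels stay attached to each track so the ratings can be matched back to "old" or "new" afterwards, even though the listener never sees them.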

We tracked two measures: 1) the percentage of users who preferred the new track to the old one, and 2) how much the rated quality had improved.
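Both measures fall out of the paired ratings with very little arithmetic. The sketch below is one plausible way to compute them, not our exact formula: the function name `summarise` is hypothetical, a tie is counted as "did not prefer the new track", and "how much" is taken to be the average difference between the new and old rating.

```python
from statistics import mean


def summarise(ratings):
    """Summarise paired ratings.

    ratings: list of (old_rating, new_rating) tuples, one per response.
    Returns (percent_preferring_new, average_uplift).
    """
    if not ratings:
        raise ValueError("need at least one pair of ratings")
    # Measure 1: share of responses that rated the new track higher.
    # Assumption: ties do not count as a preference for the new track.
    preferred_new = sum(1 for old, new in ratings if new > old)
    percent = 100 * preferred_new / len(ratings)
    # Measure 2: average rating difference (new minus old).
    uplift = mean(new - old for old, new in ratings)
    return percent, uplift


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale, purely for illustration.
    sample = [(3, 4), (2, 5), (4, 4), (5, 3)]
    print(summarise(sample))  # (50.0, 0.5)
```

Because the scale is arbitrary, the uplift number only means something when compared against other tests run on the same scale.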

