Yes, ML can be hard. Yes, most companies are still unaware that Applied ML and ML Research are two completely different disciplines. Yes, there’s still technical friction in going from experimentation to production. Yes, managing data at scale can be painful. And yes… many ML projects fail because of one or more of the above reasons.
But when you trace the real issues out there all the way down to the root, I’ve learned that, in most cases, it all comes down to the fractal nature of decision-making:
poor definitions of BUSINESS OBJECTIVES.
… which end up mapping to volatile…
This is the second article in the ‘Audio AI’ series I began back in March, and it can be considered Part 2, following my first article on vocal isolation using CNNs. If you haven’t read that one yet, I highly recommend you start there!
As a quick recap: in that first article, I showed you that we can build a Convolutional Neural Network that’s pretty small for the task (~300k parameters) to perform vocal isolation in real time. We tricked this network into ‘thinking’ it was solving a simpler problem, and eventually we got results like these:
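To make the recap concrete, here is a minimal sketch of the masking idea behind that approach: the network estimates a binary mask over the mixture’s spectrogram, and applying the mask before inverting the STFT yields the isolated source. Everything below is illustrative (toy sinusoids standing in for vocals and accompaniment, and an oracle frequency-threshold mask standing in for the CNN’s output); it is not the model from the first article.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 8192  # hypothetical sample rate for this toy example
t = np.arange(fs) / fs

# Toy "mixture": a low-frequency accompaniment plus a high-frequency "vocal"
accomp = np.sin(2 * np.pi * 220 * t)
vocal = np.sin(2 * np.pi * 1760 * t)
mix = accomp + vocal

# Forward STFT of the mixture
f, _, Z = stft(mix, fs=fs, nperseg=512)

# Stand-in for the CNN: an oracle binary mask that keeps bins above 1 kHz
mask = (f > 1000).astype(float)[:, None]

# Apply the mask and invert back to the time domain
_, est = istft(mask * Z, fs=fs, nperseg=512)
est = est[: len(vocal)]
```

With a real model, `mask` would come from the network’s prediction on the mixture spectrogram rather than an oracle threshold, but the apply-and-invert step is the same.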
Now, how do we go from here…
What if we could go back to 1965, knock on Abbey Road Studios’ front door holding an ‘All Access’ badge, and have the privilege of listening to those signature Lennon-McCartney harmonies a cappella? Our input here is a medium-quality mp3 of We Can Work it Out by The Beatles. The top track is the input mix and the bottom track is the isolated vocals coming out of our model.
Formally known as Audio Source Separation, the problem we are trying to solve here consists of recovering or reconstructing one or more source signals that, through some (linear or convolutive) process…
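The “linear or convolutive” mixing process mentioned above can be written down under standard assumptions. For an instantaneous linear mix, the observed signal is a weighted sum of the sources; in the convolutive case, each source is filtered by a channel impulse response before summing (symbols below are the conventional ones, not taken from this article):

```latex
\text{Linear (instantaneous) mixing:}\quad
x(t) = \sum_{j=1}^{N} a_j \, s_j(t)

\text{Convolutive mixing:}\quad
x(t) = \sum_{j=1}^{N} (h_j * s_j)(t)
      = \sum_{j=1}^{N} \int h_j(\tau)\, s_j(t - \tau)\, d\tau
```

Here $s_j(t)$ are the $N$ source signals, $a_j$ are mixing weights, and $h_j$ are channel impulse responses; source separation aims to recover the $s_j$ given only $x(t)$ (or a few such mixtures).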
Head of Machine Learning @Splice. Audio AI. Undercover product guy. Made in 🇦🇷, living in Venice, CA.