Audio Style Transfer

Gautham Santhosh
Apr 17, 2018 · 3 min read

Applying CycleGAN to audio texture synthesis and style transfer.

CycleGAN is a deep learning method for unpaired image-to-image translation.

Normally CycleGAN gives you epic results like the one below.

So we liked the idea of replacing an object in an image with another.

So these are our image outputs:

Apples to Oranges
Well this didn’t work

Apples to oranges and vice versa.

Well, not all were bad, but some were pretty awful.

So we decided to apply this idea in another domain. AUDIO.

Wait wait….

Let's look at CycleGAN first.

The basic idea is that you have images in one style, and you copy that style onto other images.

(Yeah, like the Prisma app, which went viral some time ago.)

The cool thing is you only need unpaired image samples.

So with a simple Google search you can make your own dataset.

If you wanna know more, read the paper. In short, it combines a cycle-consistency loss with an adversarial loss, using two generators and two discriminators.
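Here is a minimal NumPy sketch of how those pieces fit together in the generator objective. It is an illustration, not the training code behind this post: `G`, `F`, `D_X` and `D_Y` are placeholder callables, and the least-squares adversarial term and the weight `lam=10.0` are assumed from the defaults described in the CycleGAN paper.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should give back x, and G(F(y)) should give back y."""
    forward = np.mean(np.abs(F(G(x)) - x))   # X -> Y -> X
    backward = np.mean(np.abs(G(F(y)) - y))  # Y -> X -> Y
    return forward + backward

def lsgan_generator_loss(d_fake):
    """Least-squares adversarial term: the generator wants D(fake) close to 1."""
    return np.mean((d_fake - 1.0) ** 2)

def generator_objective(x, y, G, F, D_X, D_Y, lam=10.0):
    """Adversarial loss for both generators plus the weighted cycle loss."""
    adversarial = lsgan_generator_loss(D_Y(G(x))) + lsgan_generator_loss(D_X(F(y)))
    return adversarial + lam * cycle_consistency_loss(x, y, G, F)
```

Here `G` maps style X to style Y, `F` maps Y back to X, and each discriminator scores images from its own domain. The cycle term is what lets the whole thing train on unpaired samples.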

Tweak

Let's convert audio to an image and apply the same thing.

We first tried MIDI files, but the results were not that great.
So instead we used spectrograms.

So we converted the audio to greyscale spectrogram images, and used two sets of these images as the data samples.
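A rough sketch of the audio-to-greyscale-image step, using only NumPy. The FFT size, hop length and log scaling are assumptions picked for illustration, not the exact settings behind the results in this post, and resizing to the final image size is left out.

```python
import numpy as np

def audio_to_grey_spectrogram(samples, n_fft=1024, hop=256):
    """Turn a mono signal into a uint8 greyscale spectrogram.

    Returns (image, lo, hi); lo and hi are the log-magnitude range,
    kept so the 0-255 scaling can be undone later.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([samples[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Magnitude of the FFT of each frame, laid out as (frequency, time).
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    log_mag = np.log1p(magnitude)  # compress the dynamic range
    lo, hi = log_mag.min(), log_mag.max()
    image = np.round(255 * (log_mag - lo) / max(hi - lo, 1e-12)).astype(np.uint8)
    return image, lo, hi
```

Note that a magnitude spectrogram throws the phase away, so going back to audio needs a phase estimate (Griffin-Lim is one common choice; this is a suggestion, not necessarily what was used here).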

Steps

  • Take two 20-second audio samples.
  • Slice each of them into 4 images of 5 seconds each (512px x 512px).
  • Apply the model.
  • Take the outputs and convert them back to audio.
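The slicing and rejoining in the steps above could look roughly like this. It is a sketch under assumptions: the helper names `slice_clip` and `join_slices` are made up for illustration, short clips are zero-padded to the full 20 seconds, and the spectrogram conversion and 512 x 512 resize are not shown.

```python
import numpy as np

def slice_clip(samples, sample_rate, clip_seconds=20, n_slices=4):
    """Pad or trim a mono signal to clip_seconds, then cut it into equal slices."""
    total = clip_seconds * sample_rate
    clip = np.zeros(total, dtype=np.float32)
    n = min(len(samples), total)
    clip[:n] = samples[:n]            # zero-pad if the input is too short
    return np.split(clip, n_slices)   # 4 slices of 5 seconds each by default

def join_slices(slices):
    """Stitch processed slices back into one signal, in order."""
    return np.concatenate(slices)
```

Each slice would be turned into a spectrogram image, passed through the model, converted back to audio, and then the four pieces joined in their original order.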

We got this.

a) Style 1
b) Style 2 applied on Style 1
c) Style 2
d) Style 1 applied on Style 2

Results

Inputs

Style 1
Style 2

Outputs

Style 2 applied on Style 1
Style 1 applied on Style 2

Well, it gets good at some point during training, then it starts producing results that will rip your ears off again. So this is around the point where it is at its optimum.

Good news is, it works.

A better network should give you better results, and applying a noise filter might help too.

Some Future Tweaks

  • InputAudio -> Tweaked CycleGAN -> OutputAudio (well, it's almost the same), using librosa for audio input.
  • Use RGB instead of greyscale.
  • Try DiscoGAN and compare the results.

Now look at this epic tiger to panther conversion.

I couldn’t find a good benchmark. If you have an idea for one, please comment below.
If you have some interesting ideas, comment below. Let's try some stuff.

I would like to thank Ankit Petkar and Amrit Daimary for their valuable contributions.
Thank you for reading.

If you are looking for a job in machine learning, check out mljobslist.com

This blog was originally posted at gauthamzz.com
