Audio Style Transfer
Applying CycleGAN to audio texture synthesis and style transfer.
CycleGAN is a deep learning method for unpaired image-to-image translation.
Normally CycleGAN gives you epic results like the one below.
So we liked the idea of replacing an object in an image with another.
Here are our image outputs:
Apples to oranges and vice versa.
Well, not all were bad, but some were pretty awful.
So we decided to apply this idea in another domain: audio.
Let's look at CycleGAN first.
The basic idea: you have an image in one style, and you copy that style onto another image.
(Yeah, like the Prisma app, which went viral some time ago.)
The cool thing is that you only need unpaired image samples.
So with a simple Google search you can build your own dataset.
If you want to know more, read the paper. In short, it combines a cycle-consistency loss with the adversarial loss, using two generators and two discriminators.
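To make the cyclic part a bit more concrete, here is a minimal sketch of the cycle-consistency term on its own. It assumes an L1 distance, and the two "generators" G and F are toy stand-in functions I made up for illustration, not trained networks and not our actual model.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two arrays."""
    return float(np.mean(np.abs(a - b)))

def cycle_consistency_loss(G, F, x, y):
    """x -> G -> F should land back on x, and y -> F -> G back on y."""
    return l1(F(G(x)), x) + l1(G(F(y)), y)

# Hypothetical toy generators: G maps domain X to Y, F maps Y back to X.
G = lambda x: x + 1.0
F = lambda y: y - 1.0

x = np.zeros((4, 4))  # stand-in sample from domain X
y = np.ones((4, 4))   # stand-in sample from domain Y

# G and F are exact inverses here, so the round trips are lossless.
perfect = cycle_consistency_loss(G, F, x, y)

# A generator that does not undo G leaves a non-zero cycle loss.
F_bad = lambda y: y - 0.5
imperfect = cycle_consistency_loss(G, F_bad, x, y)
```

In the full objective this term is added to the adversarial losses of the two discriminators, usually scaled by a weighting factor.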
Let's convert audio to images and apply the same thing.
We first tried MIDI files, but the results were not that great.
So instead we used spectrograms.
We converted the audio to greyscale spectrogram images, and two sets of these images were used as the dataset.
- Take two 20-second audio samples.
- Slice each one up into 4 images of 5 seconds each (512px x 512px).
- Feed them to the model.
- Take the outputs and convert them back to audio.
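The slicing and restoring steps above can be sketched roughly like this. It is only an illustration under my own assumptions: the spectrogram is a random stand-in array rather than real audio, the dB scaling and all function names are hypothetical, and phase recovery (which you need to get a waveform back) is left out entirely.

```python
import numpy as np

TILE = 512  # tile edge in pixels, matching the 512px x 512px images above

def to_greyscale(mag):
    """Map a magnitude spectrogram to uint8 pixels on a dB scale."""
    db = 20.0 * np.log10(np.maximum(mag, 1e-10))
    lo, hi = float(db.min()), float(db.max())
    scaled = (db - lo) / max(hi - lo, 1e-12)
    return np.round(scaled * 255).astype(np.uint8), (lo, hi)

def from_greyscale(img, db_range):
    """Invert to_greyscale, up to 8-bit quantisation error."""
    lo, hi = db_range
    db = img.astype(np.float64) / 255.0 * (hi - lo) + lo
    return 10.0 ** (db / 20.0)

def slice_tiles(img, tile=TILE):
    """Cut the image along the time axis into square tiles."""
    n = img.shape[1] // tile
    return [img[:, i * tile:(i + 1) * tile] for i in range(n)]

def stitch_tiles(tiles):
    """Put the tiles back side by side along the time axis."""
    return np.concatenate(tiles, axis=1)

# Stand-in for the magnitude spectrogram of one 20-second clip.
rng = np.random.default_rng(0)
mag = rng.random((TILE, 4 * TILE)) + 0.01

img, db_range = to_greyscale(mag)
tiles = slice_tiles(img)            # 4 tiles, one per 5 seconds
restored_img = stitch_tiles(tiles)  # here the model outputs would go instead
restored_mag = from_greyscale(restored_img, db_range)
```

In a real run the tiles would pass through the trained generator before stitching, and the restored magnitudes would still need a phase estimate before they become audio again.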
We got this.
a) Style 1
b) Style 2 applied on Style 1
c) Style 2
d) Style 1 applied on Style 2
Well, it gets good at some point in training, then it starts producing results that will rip your ears off again. So this is around the point where it is at its optimum.
The good news is it works.
A better network should give you better results, and applying a noise filter might help too.
Some Future Tweaks
- InputAudio -> Tweaked CycleGAN -> OutputAudio (well, it's almost the same), using librosa for audio input.
- Use RGB instead of greyscale.
- Try it on DiscoGAN and compare results.
Now look at this epic tiger to panther conversion.
I couldn't find a good benchmark. If you have an idea for one, please comment below.
If you have some interesting ideas, comment below. Let's try some stuff.
I would like to thank Ankit Petkar and Amrit Daimary for their valuable contributions.
Thank you for reading.
This blog was originally posted at gauthamzz.com