Week 7 — Tune It Up

Fidan Samet
BBM406 Spring 2021 Projects
May 30, 2021

Hello world,
We are Fidan Samet, Oğuz Bakır and Adnan Fidan. For our Fundamentals of Machine Learning course project, we have been working on music genre transfer and prediction. We have been blogging about our progress throughout the project, and this is the last post of the series. In it, we cover the final results of our contributions to the baseline model and offer some closing remarks. So let’s get started!

Previously on Tune It Up…

Timeline of Tune It Up

Last week, we talked about removing Pop tracks from our dataset, the genre classification results we obtained on the updated dataset, our contributions to the CycleGAN baseline model, and the visual results of genre style transfer. You can find last week’s blog here. This week, we will present the final results of our contributions to the baseline model and our closing remarks.

Experimental Results

Last week, we described the contributions we made to the baseline model. Once training was finished, we tested the models by feeding their outputs to the music genre classifier we built with the multi-layer perceptron (MLP) algorithm, which lets us evaluate each style transfer model. So let’s see the accuracy results for each of our contributions.
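The evaluation loop above can be sketched as follows. This is a minimal illustration, not our actual pipeline: the feature vectors, labels, and network sizes here are placeholder assumptions, standing in for featurized MIDI tracks.

```python
# Sketch of the genre-classifier evaluation, assuming each track is
# represented as a fixed-length feature vector (e.g. a flattened piano
# roll) with an integer genre label. All names/shapes are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 64))       # placeholder track features
y = rng.integers(0, 2, size=400)     # 0 = Jazz, 1 = Classic (dummy labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300,
                    random_state=0)
clf.fit(X_train, y_train)

# Style-transfer outputs would be featurized the same way and scored here:
# a higher accuracy on transferred tracks means a more convincing transfer.
accuracy = clf.score(X_test, y_test)
print(f"classifier accuracy: {accuracy:.2f}")
```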

1. Generator Networks

Test Results — ResNet vs U-Net

As we mentioned last week, CycleGAN uses ResNet as its default generator network. To adapt the baseline model to our task, we also experimented with another well-known generator network, U-Net. However, ResNet performed significantly better than the U-Net generator.
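The structural difference between the two generators can be sketched in a few lines. This is only an illustration of the skip mechanisms: `transform` stands in for the learned convolution layers and is not part of either real architecture.

```python
# Minimal numpy sketch: a ResNet block ADDS its input to the transformed
# features (identity skip), while a U-Net skip CONCATENATES encoder
# features onto the decoder path. transform() is a placeholder.
import numpy as np

def transform(x):
    # Placeholder for learned convolutions; here just a fixed linear map.
    return 0.5 * x

def resnet_block(x):
    # Identity skip: output keeps the input's shape.
    return x + transform(x)

def unet_skip(encoder_feat, decoder_feat):
    # Concatenation skip: channels are stacked, doubling the width.
    return np.concatenate([encoder_feat, decoder_feat], axis=-1)

x = np.ones((4, 8))                      # (time steps, channels)
print(resnet_block(x).shape)             # (4, 8)  shape preserved
print(unet_skip(x, transform(x)).shape)  # (4, 16) channels doubled
```

The identity skip keeps the generator's output close to its input, which may suit genre transfer, where most of the note content should survive the translation.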

2. Auxiliary Discriminators

Test Results — Baseline vs Auxiliary Discriminators

As you may recall, we added auxiliary discriminators to our baseline to keep the model on the music manifold and generate more realistic music. However, on their own they did not improve the baseline performance.
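One way such an auxiliary term can be folded into the generator objective is sketched below. This is a hedged illustration, not our exact training code: the discriminators are replaced by dummy score arrays, and the loss weights and names are assumptions.

```python
# Sketch of a CycleGAN-style generator loss extended with an auxiliary
# discriminator term. Real discriminator networks are replaced by dummy
# score arrays; lambda weights are illustrative assumptions.
import numpy as np

def lsgan_g_loss(d_score):
    # Least-squares GAN generator loss: push D's score toward 1 ("real").
    return np.mean((d_score - 1.0) ** 2)

def total_generator_loss(d_genre_score, d_aux_score, cycle_err,
                         lam_cycle=10.0, lam_aux=1.0):
    # Genre discriminator asks: "does this sound like the target genre?"
    # Auxiliary discriminator asks: "does this sound like real music at all?"
    return (lsgan_g_loss(d_genre_score)
            + lam_aux * lsgan_g_loss(d_aux_score)
            + lam_cycle * cycle_err)

loss = total_generator_loss(d_genre_score=np.array([0.8, 0.6]),
                            d_aux_score=np.array([0.9, 0.7]),
                            cycle_err=0.05)
print(f"total generator loss: {loss:.3f}")  # 0.650
```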

3. Auxiliary Discriminators & Triplet Loss

Test Results — Baseline vs Auxiliary Discriminators & Triplet Loss

We also added triplet loss to the model with auxiliary discriminators to obtain better style transfer results. As the table above shows, triplet loss increased the model’s performance. Below are examples of this setup on the Jazz2Classic and Classic2Jazz translation tasks.

  • Jazz2Classic
Audio Form of the Input MIDI File — Jazz
Audio Form of the Output MIDI File — Classic
  • Classic2Jazz
Audio Form of the Input MIDI File — Classic
Audio Form of the Output MIDI File — Jazz
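The margin-based triplet loss used in this setup can be sketched as follows. The embedding vectors here are toy values chosen for illustration; in practice they would come from a learned embedding of the tracks.

```python
# Minimal numpy sketch of triplet loss: pull an anchor toward a
# same-genre (positive) embedding and push it away from an other-genre
# (negative) one, up to a margin. Embedding values are illustrative.
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Squared Euclidean distances between embedding vectors.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    # Loss is zero once the negative is at least `margin` farther away.
    return max(d_pos - d_neg + margin, 0.0)

anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])   # same genre, close to the anchor
negative = np.array([2.0, 0.0])   # other genre, far from the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0: constraint satisfied
```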

4. Triplet Loss

Test Results — Baseline vs Auxiliary Discriminators & Triplet Loss vs Triplet Loss

To investigate the role of the auxiliary discriminators in the previous model, we removed them and trained the model with only the triplet loss added. As the table above shows, the auxiliary discriminators have little effect on model performance. Below are examples of this setup on the Jazz2Classic and Classic2Jazz translation tasks.

  • Jazz2Classic
Audio Form of the Input MIDI File — Jazz
Audio Form of the Output MIDI File — Classic
  • Classic2Jazz
Audio Form of the Input MIDI File — Classic
Audio Form of the Output MIDI File — Jazz

Conclusions

Our method builds on a strong CycleGAN baseline, which we extended with auxiliary discriminators and triplet loss. These contributions improve the baseline model, and we obtain especially good results for Jazz2Classic genre transfer.

One important weakness of our method is the lack of an objective evaluation metric. We therefore have to rely on the prediction model or on a subjective user study. For this reason, we prepared a subjective experiment, and we would be more than happy if you could join our user study. You can find it here.

Another weakness is that our method is not successful enough on the Classic2Jazz translation. However, the prediction model also performs poorly on this task, so we believe Classic2Jazz is more challenging than Jazz2Classic.

In the future, we can improve our results further, for instance by adding new loss terms. We could also adopt more recent Generative Adversarial Network models as our baseline, and later apply our method to new music genres.
