Week 4 — Tune It Up

Fidan Samet
BBM406 Spring 2021 Projects
May 9, 2021

Hello world,
We are Fidan Samet, Oğuz Bakır and Adnan Fidan. As part of the Fundamentals of Machine Learning course project, we are working on music genre transfer and prediction. We write blog posts about our progress throughout the project, and this is the fourth in the series. In this post, we cover the dataset we use, the baseline results of music genre transfer, and the results of various machine learning algorithms on music genre prediction. So let's get started!

Previously on Tune It Up…

Timeline of Tune It Up

In the previous weeks, we worked on prediction and style transfer of song release years. While reviewing the literature in the second week's blog post, we discovered that audio files are needed for music style transfer. While analyzing the datasets in the third week's blog post, we found that the songs belonging to the most distinguishable decades are too few in MSD¹ and almost absent in FMA². After last week's blog post, we discussed this issue with our instructors and decided to change the domain of our topic. Therefore, we now work on music genre transfer and prediction instead of release years. This week, we examine the dataset we use, the baseline results of music genre transfer, and the results of machine learning algorithms on music genre prediction.

Dataset

After changing the domain of our topic, we searched for datasets containing audio files to perform music style transfer. Although such datasets exist, vocals and various instruments are mixed together in them. To perform music genre transfer, these components must be separated from each other, but due to time constraints, we cannot afford that. In our literature review, we came across a related work that performs music genre style transfer, which will be covered in the next section. Its authors created a music dataset³ that contains only piano as an instrument to perform symbolic style transfer. Below is the chart of the distribution of samples according to music genres in this dataset.

Distribution of Samples According to Music Genres

The dataset consists of MIDI files, and the number of samples is high for each music genre. Since this dataset was prepared for symbolic music genre transfer, it is well suited to our task. Therefore, we decided to use it.
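To work with MIDI files, models like the one in the next section typically convert each sample into a piano-roll matrix (pitches × time steps). As a minimal sketch of that representation — using hand-written note tuples rather than a real MIDI parser such as pretty_midi, which actual preprocessing would use — the conversion looks roughly like this:

```python
import numpy as np

def notes_to_piano_roll(notes, fs=16, n_pitches=128):
    """Build a binary piano-roll matrix (pitches x time steps) from
    (pitch, start_sec, end_sec) note tuples, sampled at fs steps/second."""
    end = max(e for _, _, e in notes)
    roll = np.zeros((n_pitches, int(np.ceil(end * fs))), dtype=np.uint8)
    for pitch, start, stop in notes:
        roll[pitch, int(start * fs):int(stop * fs)] = 1
    return roll

# Toy two-note example: C4 (pitch 60) for half a second, then E4 (pitch 64)
roll = notes_to_piano_roll([(60, 0.0, 0.5), (64, 0.5, 1.0)], fs=16)
print(roll.shape)  # (128, 16)
```

The resulting matrix is what the piano-roll figures below visualize: one row per pitch, one column per time step.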

Music Genre Transfer

As mentioned in the previous section, there is prior work on music genre transfer: Symbolic Music Genre Transfer with CycleGAN⁴. In this work, the authors perform music genre transfer by adding extra discriminators and classifiers to the CycleGAN⁵ model. Since their code is available, we decided to use this model as our baseline. Below are the note sequence results of the model after approximately 10 hours of training. Note that we trained this model to transfer from the jazz genre to the classical genre.
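A key ingredient of CycleGAN-style models is the cycle-consistency loss: a sample translated to the target genre and back should reconstruct the original. The following is only an illustrative numpy sketch of that idea (the stand-in generators G and F are placeholder functions, not the authors' actual networks):

```python
import numpy as np

def cycle_consistency_loss(x, x_reconstructed):
    """L1 cycle loss: how far F(G(x)) is from the original x."""
    return np.abs(x - x_reconstructed).mean()

# Stand-in generators: G maps jazz -> classical, F maps back.
rng = np.random.default_rng(0)
x = rng.random((128, 64))          # a fake piano-roll "jazz" sample
G = lambda a: a * 0.9 + 0.05       # placeholder transfer functions,
F = lambda a: (a - 0.05) / 0.9     # chosen here to be exact inverses
loss = cycle_consistency_loss(x, F(G(x)))
print(loss)  # near zero, since F inverts G exactly
```

In the real model, this loss is what encourages the note pitch conservation we observe in the results below.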

Note Sequence of the Input MIDI File — Jazz
Note Sequence of the Output MIDI File — Classical

As can be seen from the piano rolls, the transferred note sequence is in harmony with the source note sequence. This preservation of note pitches is an expected feature of the model. Below are the audio forms of these note sequences. You can listen to them and hear the difference. As you can notice, the transferred music sounds like classical music while retaining enough of the jazz melody.

Audio Form of the Input MIDI File — Jazz
Audio Form of the Output MIDI File — Classical

While diving into the code and trying to run it, we had some issues with code readability and the older version of the TensorFlow package it depends on. As the authors state, their code needs refactoring, and they are working on it. Therefore, we decided to use this CycleGAN-based model as our baseline and make improvements on it in our next steps.

Music Genre Prediction

To perform music genre prediction, we used three different machine learning algorithms: Naive Bayes, k-Nearest Neighbors and Random Forest. We trained each algorithm on the training data and measured its performance on the test data. With the Naive Bayes algorithm, we obtained 51.57% test accuracy. Below is the test accuracy plot for different k values in the k-Nearest Neighbors algorithm. The best accuracy we obtained with this algorithm is 67.73%.
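The k-sweep above can be sketched as follows. This is a self-contained toy version with a numpy k-NN and synthetic two-class data standing in for our genre features, not our actual experiment code:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority-vote k-NN with Euclidean distance."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of k closest train points
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy 2-class data standing in for our genre features
rng = np.random.default_rng(1)
X_tr = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y_tr = np.array([0] * 50 + [1] * 50)
X_te = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(3, 1, (10, 4))])
y_te = np.array([0] * 10 + [1] * 10)

for k in (1, 3, 5):
    acc = (knn_predict(X_tr, y_tr, X_te, k) == y_te).mean()
    print(k, acc)
```

In practice, the best k is picked from such a sweep on held-out data, exactly as in the plot above.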

Test Accuracies of Different k Values in k-Nearest Neighbors Algorithm

We tuned the max depth parameter of the Random Forest algorithm. Below are the train and test accuracies for different max depth values. For max depths greater than 30, although train accuracy keeps increasing, there is no significant increase in test accuracy, so we set the tuned max depth to 30. The best accuracy we obtained with this algorithm is 83.76%.
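The depth-tuning procedure can be sketched with scikit-learn on synthetic data — a hypothetical stand-in for our genre features, not the actual experiment:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem standing in for the genre features
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep max_depth and compare train vs test accuracy
for depth in (5, 10, 30, 50):
    clf = RandomForestClassifier(max_depth=depth, n_estimators=100,
                                 random_state=0).fit(X_tr, y_tr)
    print(depth, clf.score(X_tr, y_tr), round(clf.score(X_te, y_te), 3))
```

The typical pattern is the one we observed: train accuracy climbs with depth while test accuracy plateaus, so the smallest depth at the plateau is the tuned value.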

Train and Test Accuracies According to Different Max Depths in Random Forest Algorithm

That is all for this week. Thank you for reading and we hope to see you next week!

Bob Ross Says Goodbye

References

[1] Teixeira, M., & Rodríguez, M. M0444 Project One: Release Year Prediction for Songs.
[2] Defferrard, M., Benzi, K., Vandergheynst, P., & Bresson, X. (2016). FMA: A dataset for music analysis. arXiv preprint arXiv:1612.01840.
[3]https://drive.google.com/file/d/1zyN4IEM8LbDHIMSwoiwB6wRSgFyz7MEH/view?usp=sharing
[4] Brunner, G., Wang, Y., Wattenhofer, R., & Zhao, S. (2018, November). Symbolic music genre transfer with CycleGAN. In 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI) (pp. 786–793). IEEE.
[5] Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223–2232).
