[CC Lab 22 Fall] Odd-time signature project (ongoing)

Shoma Sawa
Computational Creativity Lab at Keio SFC
6 min read · Feb 8, 2023

0. Motivation

Deep learning music generation models are mostly trained on, and generate in, the 4/4 time signature, with little emphasis on other time signatures. Furthermore, the scarcity of datasets for non-4/4 time signatures hinders exploration beyond 4/4. Because odd time signatures remain underexplored in this research area, I decided to focus on odd-time-signature MIDI generation.

1. Introduction

Generation of irregular and mixed meter has been explored in a previous paper [1], but with limited control over the time signature generated. That paper uses a hybrid temporal-scope representation and a bidirectional Long Short-Term Memory (LSTM) network to generate music with mixed and irregular meter; however, because of this representation, it offers no control over which time signature to generate in. Time signatures are often overlooked in music processing, and a survey on time signature detection [2] has discussed the need for such analysis. Music generation models are mostly trained on 4/4, yet other time signatures exist as well. By exploring music generation beyond 4/4, we open up the creative freedom to choose the time signature of the generated music. We hypothesize that a Conditional Variational Autoencoder (CVAE) with an RNN layer will allow controllable time-signature music generation.
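The conditioning idea behind a CVAE can be sketched minimally: the time signature is one-hot encoded into a conditional vector and concatenated onto the model's input (and onto the latent code at decoding time). The sketch below is illustrative only; the list of supported signatures matches the ones in the appendix, but the function names are assumptions, not the project's actual code.

```python
# Illustrative sketch of CVAE conditioning, not the project's real model code.
SIGNATURES = ['3/4', '4/4', '5/4', '6/8', '7/4']  # signatures from the appendix

def one_hot(signature: str):
    """Conditional vector c for a given time signature."""
    vec = [0.0] * len(SIGNATURES)
    vec[SIGNATURES.index(signature)] = 1.0
    return vec

def condition(features, signature):
    """Concatenate the condition vector onto a flat feature vector,
    as done before feeding the encoder/decoder."""
    return features + one_hot(signature)

x = [0.5, 0.0, 1.0]         # toy feature vector
print(condition(x, '7/4'))  # [0.5, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
```

At generation time, the same condition vector is what lets the user request a specific time signature from the decoder.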

2. Datasets and Methodologies

A dataset from Reddit's Drum Percussion MIDI collection was used, in which each MIDI file with a supported time signature was sliced into 2-bar segments using PrettyMIDI [3] and Magenta's note-seq [4]. Each sliced MIDI was then converted to a matrix consisting of the drum onsets, velocities, and a conditional vector (the time signature). All MIDI was quantized to 12 steps per quarter note to support triplets, and the matrix shape was set to (168, 9): 168 time steps and 9 possible drum instruments. The largest time signature in our experiment was 7/4, so the time axis was set to 168 steps (a quarter note spans 12 steps, so 2 bars of 7/4 take (12 × 7) × 2 = 168). An RNN was used in the CVAE, and the matrices were filled with padding where necessary (once a given time signature's notes end, the rest of the matrix is padded). Padding was crucial because some time signatures leave empty space in the matrix. For example, 2 bars of 4/4 occupy only 96 steps, (12 × 4) × 2 = 96, leaving the rest of the matrix empty. This emptiness may lead to poor generations for certain time signatures, especially since 4/4 is the most dominant meter in the dataset.

Dataset distribution for the time signatures
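The matrix sizing and padding arithmetic above can be sketched as follows. This is a minimal illustration of the dimensions described in the text; the function names are assumptions, not the project's actual code.

```python
# Sketch of the 2-bar onset matrix sizing and padding described above.
TICKS_PER_QUARTER = 12   # quantization fine enough to support triplets
MAX_NUMERATOR = 7        # largest supported time signature is 7/4
N_STEPS = TICKS_PER_QUARTER * MAX_NUMERATOR * 2  # (12 * 7) * 2 = 168
N_DRUMS = 9              # possible drum instruments

def active_steps(numerator: int, bars: int = 2) -> int:
    """Number of time steps actually used by `bars` bars of numerator/4."""
    return TICKS_PER_QUARTER * numerator * bars

def empty_matrix():
    """A zero-filled (168, 9) matrix; unused tail rows act as padding."""
    return [[0.0] * N_DRUMS for _ in range(N_STEPS)]

print(active_steps(7))  # 168 -> 2 bars of 7/4 fill the whole matrix
print(active_steps(4))  # 96  -> 2 bars of 4/4 leave rows 96..167 as padding
```

This makes the sparsity problem concrete: for 4/4 input, 72 of the 168 rows are always padding, which the model must learn to ignore.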

3. Results

The output from the trained CVAE was poor (see the appendix). Of all the conditioned outputs, the 4/4 time signature gave the best result. This is unsurprising, as the dataset's distribution is heavily skewed toward 4/4. The problem may lie in representing the data as absolute values (a matrix) rather than relative values (symbolic tokens). The matrix approach may have been inappropriate because of the aforementioned sparseness of the matrix. A symbolic approach may therefore alleviate the problem and improve the model's training and generation.

The symbolic approach would look like the following figure: it converts the MIDI into tokens.

Example 1.

Tokenization of example 1.

['BAR',
'BEAT_0',
'POS_0',
'KICK',
'BEAT_1',
'POS_6',
'KICK',
'BEAT_2',
'POS_0',
'KICK',
'BEAT_3',
'POS_6',
'KICK',
'BEAT_4',
'POS_0',
'KICK',
'BEAT_5',
'POS_6',
'KICK',
'BAR',
'BEAT_0',
'POS_0',
'KICK',
'BEAT_2',
'POS_0',
'KICK',
'BEAT_4',
'POS_0',
'KICK',
'END']
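The tokenization above can be reproduced with a short sketch: each bar opens with 'BAR', each onset is encoded as a BEAT, POS, and instrument token, and the sequence closes with 'END'. The event tuple format (bar, beat, position, drum) is an assumption for illustration, not the project's actual representation.

```python
# Illustrative tokenizer reproducing the token sequence shown above.
def tokenize(events):
    """events: time-ordered (bar, beat, pos, drum) tuples -> token list."""
    tokens = []
    current_bar = -1
    for bar, beat, pos, drum in events:
        if bar != current_bar:       # a new bar opens with a 'BAR' token
            tokens.append('BAR')
            current_bar = bar
        tokens.append(f'BEAT_{beat}')
        tokens.append(f'POS_{pos}')
        tokens.append(drum)
    tokens.append('END')
    return tokens

# The kick pattern from Example 1: alternating on-beat and offset kicks in
# bar 0, then kicks on beats 0, 2, 4 in bar 1.
events = [(0, b, p, 'KICK') for b, p in
          [(0, 0), (1, 6), (2, 0), (3, 6), (4, 0), (5, 6)]]
events += [(1, b, 0, 'KICK') for b in (0, 2, 4)]
print(tokenize(events)[:4])  # ['BAR', 'BEAT_0', 'POS_0', 'KICK']
```

Because positions are expressed relative to their beat and bars are delimited explicitly, the same vocabulary covers any time signature without the padding needed by the fixed-size matrix.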

For future research, I would use this symbolic approach instead. Although my research on odd time signatures is still ongoing, I will continue researching music generation (especially symbolic), as my goal is to create an AI system that can both generate music and assist users in composing a song in a DAW.

3.1 Extra: ChatGPT

ChatGPT from OpenAI is now a big thing, which surprised me because AI models (deep learning models) usually do not get picked up by the media. The feats achieved by ChatGPT may explain the hype and praise it gets. However, the ethical concerns about outsourcing data labelling to Kenyan workers should be highlighted: yes, it is great to achieve better accuracy, but at what cost? Like any other language model, it also contains bias. It is important for the company to develop a model that helps people, not one that dehumanizes them. Furthermore, I believe the majority of consumers do not understand what exactly the model is doing and what it was trained on. It is important to note that its training data only extends to 2021, and that it can generate incorrect information. There is a lack of awareness and knowledge about ChatGPT, so the company should be more transparent about its services and educate its consumers.

4. Conclusion

Through this seminar, I have been inspired by various interesting projects and presentations from my lab mates. Despite our different research areas, everyone seems passionate about enhancing our lives creatively. I enjoyed my time at CC Lab, and would like to thank my fellow lab mates and professor.

5. Appendix

Generated output for 4/4 drum midi
Generated output for 3/4 drum midi
Generated output for 5/4 drum midi
Generated output for 6/8 drum midi
Generated output for 7/4 drum midi

6. References

[1] Z. J. Kan and A. Sourin, “Generation of Irregular Music Patterns With Deep Learning,” 2020 International Conference on Cyberworlds (CW), 2020, pp. 188–195, doi: 10.1109/CW49994.2020.00038.

[2] Abimbola, J., Kostrzewa, D., & Kasprowski, P. Time Signature Detection: A Survey. Sensors (Basel). 2021 Sep 29;21(19):6494. doi: 10.3390/s21196494. PMID: 34640814; PMCID: PMC8512143.

[3] Colin Raffel and Daniel P. W. Ellis. Intuitive Analysis, Creation and Manipulation of MIDI Data with pretty_midi. In 15th International Conference on Music Information Retrieval Late Breaking and Demo Papers, 2014.

[4] https://github.com/magenta/note-seq

