Indian Rāgas 101: Detour — analytical methods

Parikshit Sanyal
Published in Indian Rāgas 101
Sep 29, 2019

In a discussion of music, we can never leave maths too far behind, which means we need techniques to analyse, and visualise, sound.

For electronic music, this is easy. Lossless files (WAV) are stored as a sequence of numbers that trace the waveform itself. For example, when I say ‘Sa’ into my microphone and save it as a WAV file, it gets stored like this:

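import matplotlib.pyplot as plt
from scipy.fftpack import fft
from scipy.io import wavfile
# wavfile.read returns the sample rate (samples per second) and the samples themselves
sample_rate, data = wavfile.read('sa.wav') # load the data
print(repr(data))
# Output: one row per sample, one column per channel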
array([[184, 0],
[138, 0],
[186, 0],
...,
[730, -1],
[673, 0],
[710, 0]], dtype=int16)

When plotted directly, this data gives the bare waveform:

a = data.T[0] # Select the first channel; this recording is stereo, i.e. two separate channels
plt.plot(a[:1000]) # Plot the first 1000 values for easy visualisation
plt.show()
The bare waveform of a ‘Sa’ in my voice (amplitude plotted against time)

As you can see, I am not producing a pure tone (far from it, actually). I am producing a mix of many frequencies. To break it down, we will use a Fast Fourier Transform (code snippet from user eusoubrasileiro at https://stackoverflow.com/questions/23377665/python-scipy-fft-wav-files)

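import matplotlib.pyplot as plt
from scipy.fftpack import fft
from scipy.io import wavfile
sample_rate, data = wavfile.read('sa.wav') # sample rate and samples
data_fft = fft(data.T[0]) # FFT of the first channel
length = int(len(data_fft)/2) # for real input, the second half mirrors the first
# Note: the x-axis here is the FFT bin index; multiply by sample_rate / len(data_fft) to get Hz
plt.xlabel("Frequency")
plt.ylabel("Amplitude")
plt.plot(abs(data_fft[:(length-1)]),'r')
plt.show()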
The frequency domain of the same

My ‘Sa’ is made up of a great many frequencies, most prominently those between 1000–2000 Hz. This is not very useful, so let’s try it out with a piano.

The Fourier transform of a D# from a piano

As expected, the piano produces a much cleaner graph (pardon the ambient noise at my home, which shows up as the low frequencies). The Fourier transform produces nice frequency graphs, which will be handy for our purpose.
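One detail worth keeping in mind when reading such graphs: the raw FFT output is indexed by bin number, and the bins only become frequencies in Hz once they are scaled by the sample rate. Below is a minimal sketch of that conversion, assuming NumPy is available. The function name dominant_frequency and the synthetic 440 Hz tone (standing in for a real recording) are illustrative assumptions, not part of the snippet above.

```python
import numpy as np

def dominant_frequency(samples, sample_rate):
    """Return the frequency (in Hz) of the strongest FFT component of a mono signal."""
    samples = np.asarray(samples, dtype=float)
    samples = samples - samples.mean()            # remove any constant (DC) offset
    spectrum = np.abs(np.fft.rfft(samples))       # magnitudes for non-negative frequencies
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)  # bin index -> Hz
    return freqs[np.argmax(spectrum)]

# Hypothetical stand-in for a recording: one second of a pure 440 Hz sine, stored as int16
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = (10000 * np.sin(2 * np.pi * 440.0 * t)).astype(np.int16)

peak_hz = dominant_frequency(tone, sample_rate)
print(peak_hz)  # should land at (or very near) 440 Hz
```

With a real file, the samples and sample rate returned by wavfile.read could be passed in instead of the synthetic tone, after selecting a single channel.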

Visualising all at once

Any sound is, in effect, a set of frequencies with certain amplitudes that change with time. We can, in theory, make a three-dimensional plot of all three variables (Wavosaur has built-in 3D spectrograms: https://www.wavosaur.com/download.php).

x = time, y = frequency, z = amplitude

Alternatively, we can represent the third dimension with the intensity of color (brighter = louder) and convert any sound (of a certain duration) into an image!

The plot of ambient sound in my room
And what happens when I say ‘Sa’
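An image like this (a spectrogram) can be approximated by cutting the sound into short overlapping frames and taking a Fourier transform of each one. The following is only a rough sketch under stated assumptions: it uses NumPy alone, the function name spectrogram and the frame_size and hop parameters are hypothetical choices, and the test signal is synthetic rather than a real recording.

```python
import numpy as np

def spectrogram(samples, sample_rate, frame_size=1024, hop=512):
    """Short-time Fourier transform magnitudes of a mono signal.

    Returns (times, freqs, magnitudes), where magnitudes has one row per
    frequency and one column per time slice.
    """
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(frame_size)               # taper each frame to reduce leakage
    n_frames = 1 + (len(samples) - frame_size) // hop
    frames = np.stack([
        samples[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    magnitudes = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_size / 2) / sample_rate  # frame centres
    return times, freqs, magnitudes

# Hypothetical test signal: half a second of 440 Hz followed by half a second of 880 Hz
sample_rate = 8000
t = np.arange(sample_rate // 2) / sample_rate
signal = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t)])

times, freqs, mags = spectrogram(signal, sample_rate)
loudest = freqs[np.argmax(mags, axis=0)]          # strongest frequency in each time slice
print(loudest[0], loudest[-1])                    # near 440 Hz at the start, near 880 Hz at the end
```

To turn the result into a picture, times, freqs and mags could be handed to a heat-map routine such as matplotlib's plt.pcolormesh, so that brighter cells mean louder frequencies. SciPy's scipy.signal.spectrogram is a ready-made alternative to the hand-rolled function here.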

Try it out at https://auditoryneuroscience.com/acoustics/spectrogram.
