How to create a music visualizer

Avi Rzayev · Analytics Vidhya · Apr 30, 2020 · 5 min read

We all like music. It comes in many forms and genres, and each of us has a different taste. What we have in common is that we can all listen to music. But why only listen to music when we can also see it?

This article is about my experience making a music visualizer, and it will also show you how to create your own.

How can we visualize music?

We need to know what music consists of and how to visualize those parts. Music is a combination of sounds, and sounds are vibrations that our ears detect. A vibration is characterized by its frequency and amplitude: its speed and its loudness.
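To make frequency and amplitude concrete, here is a small sketch (using NumPy, with illustrative values not taken from the article) that generates a pure tone:

```python
import numpy as np

# A pure tone is a sine wave: frequency sets the pitch, amplitude the loudness.
sample_rate = 22050          # samples per second
duration = 1.0               # seconds
frequency = 440.0            # Hz (the pitch of concert A)
amplitude = 0.5              # loudness, on a 0..1 scale

t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)
tone = amplitude * np.sin(2 * np.pi * frequency * t)

print(tone.shape)   # one sample per time step
print(tone.max())   # the peak is (almost exactly) the chosen amplitude
```

Raising `frequency` makes the pitch higher; raising `amplitude` makes it louder. Real music is a sum of many such waves, which is exactly what the analysis below will pull apart.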

The easiest way to visualize this is by drawing a row of bars, where each bar represents a frequency. As the music plays, the bars move up or down depending on the amplitude at their frequency.

Implementation in Python

Before you start coding, you need to install a graphics library and an audio analysis library. In my project, I used Pygame (graphics) and Librosa (audio).

Librosa has very useful functions that help us analyze sounds. Documentation: https://librosa.github.io/librosa/

Here is the code that returns a 2-dimensional array of frequency magnitudes for each point in time:

import librosa
import numpy as np

# getting information from the file
time_series, sample_rate = librosa.load(filename)
# getting a matrix which contains amplitude values according to frequency and time indexes
stft = np.abs(librosa.stft(time_series, hop_length=512, n_fft=2048*4))
# converting the matrix to decibel matrix
spectrogram = librosa.amplitude_to_db(stft, ref=np.max)

librosa.load() reads the given file and keeps its contents for later use. time_series is a one-dimensional array containing the audio samples themselves, and sample_rate is the number of samples taken per second.

librosa.stft() returns a 2-dimensional array indexed by frequency and time. Then you can see that I converted this array from amplitude to decibels. This step is not necessary unless you want to use decibel units.

The Short-time Fourier transform (STFT), is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.(Wikipedia)

hop_length is the number of samples between consecutive frames, and n_fft is the number of samples in each frame. I found that increasing n_fft makes the result more accurate, so I set it 4 times greater than its default value.
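The shape of the resulting spectrogram can be predicted from these parameters alone. Below is a small sketch of that arithmetic, assuming librosa's defaults (a 22050 Hz resampling rate and center=True padding) and a hypothetical 30-second track:

```python
# Predicting the spectrogram's shape from the STFT parameters used above.
n_fft = 2048 * 4        # samples per frame
hop_length = 512        # samples between consecutive frames
sample_rate = 22050     # librosa's default resampling rate
duration = 30           # an assumed 30-second track

n_samples = sample_rate * duration
freq_bins = 1 + n_fft // 2             # one row per frequency up to Nyquist (sr / 2)
frames = 1 + n_samples // hop_length   # one column per frame, with center=True padding

print(freq_bins, frames)
print(sample_rate / n_fft)             # frequency resolution per bin, in Hz
```

This also shows why a larger n_fft feels more accurate: with n_fft = 8192, each bin covers only about 2.7 Hz, at the cost of longer frames.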

You can also see the result of the STFT using matplotlib:

import librosa.display
import matplotlib.pyplot as plt

librosa.display.specshow(spectrogram,
                         y_axis='log', x_axis='time')
plt.title('Your title')
plt.colorbar(format='%+2.0f dB')
plt.tight_layout()
plt.show()
Now you can see the decibel level at each time and frequency. 0 dB is the loudest.

You can access the array’s values using indexes, but we want to pick a value by its time and frequency. This code might help:

# getting an array of frequencies
frequencies = librosa.core.fft_frequencies(n_fft=2048*4)

# getting an array of time periods
times = librosa.core.frames_to_time(np.arange(spectrogram.shape[1]),
                                    sr=sample_rate, hop_length=512, n_fft=2048*4)

time_index_ratio = len(times) / times[len(times) - 1]
frequencies_index_ratio = len(frequencies) / frequencies[len(frequencies) - 1]

I separated the 2d-array’s axes into arrays that tell us which time or frequency a given index corresponds to. The sample rate is constant, so we can compute a ratio between the index and the time, and the same for the frequency. Then we just multiply the time and the frequency we want by the ratio, and we get the indexes:

def get_decibel(target_time, freq):
    return spectrogram[int(freq * frequencies_index_ratio)][int(target_time * time_index_ratio)]
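To see the ratio trick on concrete numbers, here is a standalone sketch (not the article’s code) that rebuilds the linear frequency axis with NumPy and looks up the bin for an assumed 440 Hz target:

```python
import numpy as np

# The STFT's frequency axis is linear from 0 to sr/2, so an index can be
# found as frequency × (number of bins / highest frequency).
frequencies = np.linspace(0, 11025, 4097)   # the axis for n_fft=8192, sr=22050
frequencies_index_ratio = len(frequencies) / frequencies[-1]

target = 440.0                               # look up the bin for 440 Hz
index = int(target * frequencies_index_ratio)
print(index, frequencies[index])             # the bin's center is close to 440 Hz
```

The mapping is approximate (it can be off by about one bin), but with bins only a few Hz apart that is more than precise enough for a visualizer.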

Now we only need to represent it using the “moving bars” that I mentioned at the beginning.

Create a class that will represent our frequency bar:

class AudioBar:

    def __init__(self, x, y, freq, color, width=50, min_height=10, max_height=100,
                 min_decibel=-80, max_decibel=0):
        self.x, self.y, self.freq = x, y, freq
        self.color = color
        self.width, self.min_height, self.max_height = width, min_height, max_height
        self.height = min_height
        self.min_decibel, self.max_decibel = min_decibel, max_decibel
        self.__decibel_height_ratio = (self.max_height - self.min_height) / (self.max_decibel - self.min_decibel)

    def update(self, dt, decibel):
        desired_height = decibel * self.__decibel_height_ratio + self.max_height
        speed = (desired_height - self.height) / 0.1
        self.height += speed * dt
        # clamp(min_value, max_value, value) is a small helper from the full code
        # that keeps value inside [min_value, max_value]
        self.height = clamp(self.min_height, self.max_height, self.height)

    def render(self, screen):
        pygame.draw.rect(screen, self.color,
                         (self.x, self.y + self.max_height - self.height, self.width, self.height))

I gave each bar x, y coordinates, a frequency, a color, and ranges for its height and decibel level. I defined a ratio between the height and the decibel to determine the bar’s height later. In the update() method, I compute the desired height of the bar for the current decibel and set the speed at which the bar grows toward it.
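The update() smoothing can be checked with plain numbers. This sketch (illustrative values, not the article’s code) steps a bar toward a target height for half a second at 60 FPS:

```python
# The bar moves toward the desired height at a speed proportional to the
# remaining distance, so it eases in instead of jumping.
height = 10.0            # current bar height
desired_height = 100.0   # height implied by the current decibel
dt = 1 / 60              # one frame at 60 FPS

for _ in range(30):      # half a second of frames
    speed = (desired_height - height) / 0.1
    height += speed * dt

print(round(height, 1))  # close to 100, but not quite there yet
```

Each frame closes a fixed fraction of the remaining gap, which is what makes the bars look springy rather than jittery.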

bars = []
x, width = 0, 10   # starting position and bar width; in the full code width is derived from the window size
frequencies = np.arange(100, 8000, 100)
for c in frequencies:
    bars.append(AudioBar(x, 300, c, (255, 0, 0), max_height=400, width=width))
    x += width

Here I am creating a list that holds the bars: one bar for every 100 Hz from 100 Hz up to 8000 Hz (79 bars in total), each appended to the list.

Then, you simply run a Pygame window and draw the bars:

import pygame

pygame.init()
screen = pygame.display.set_mode([1366, 768])   # the window size is illustrative

t = pygame.time.get_ticks()
getTicksLastFrame = t

pygame.mixer.music.load(filename)
pygame.mixer.music.play(0)

# Run until the user asks to quit
running = True
while running:

    t = pygame.time.get_ticks()
    deltaTime = (t - getTicksLastFrame) / 1000.0
    getTicksLastFrame = t

    # Did the user click the window close button?
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    # Fill the background with white
    screen.fill((255, 255, 255))

    for b in bars:
        b.update(deltaTime, get_decibel(pygame.mixer.music.get_pos() / 1000.0, b.freq))
        b.render(screen)

    # Flip the display
    pygame.display.flip()

# Done! Time to quit.
pygame.quit()

Note that I also used pygame.mixer to play the music, and I accessed the playback time using pygame.mixer.music.get_pos().

You made a music visualizer! You can find the full code here: https://gitlab.com/avirzayev/medium-audio-visualizer-code/-/blob/master/main.py

Expanding your project

This article covered the basics of creating a simple music visualizer, but you can build amazing visualizers starting from this little example.

Firstly, try to simplify the code by wrapping it in classes. Creating a class for the audio analysis will make the code tidier and help you avoid writing the same code again.

Secondly, you may try to visualize the audio in another way. Instead of drawing a row of bars, you can place them in a circle. You can also create triggers that affect the visualizer’s appearance. For example, you can create a bass trigger: when there is a certain amount of bass, make the bars change color.
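As a starting point for the circular layout, here is a sketch (with an assumed window center and ring radius, not the article’s code) that computes a position and angle for each bar on a ring:

```python
import math

# Place each bar on a ring: one angle per bar, bars grow outward from the ring.
center_x, center_y = 400, 300   # assumed window center
radius = 150                    # assumed ring radius
n_bars = 79                     # the same bars as before: 100 Hz .. 7900 Hz

positions = []
for i in range(n_bars):
    angle = 2 * math.pi * i / n_bars
    x = center_x + radius * math.cos(angle)
    y = center_y + radius * math.sin(angle)
    positions.append((x, y, angle))   # render each bar rotated by its angle

print(len(positions))
print(positions[0][:2])   # the first bar sits on the ring's right side
```

When rendering, you would rotate each bar by its angle (for example, by drawing a rotated surface or a polygon) so it points away from the center.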

To summarize, you can create wonderful visualizers, and this article can guide you. Happy coding!
