Master Real-Time Frequency Extraction in Flutter to Elevate Your App Experience

Florian Vögtle
neusta mobile solutions

--

If the title caught your eye, you’re probably wondering what this is all about. I’ve been fascinated by the latest trend from Apple and Samsung — integrating shaders directly into apps to create a fluid, dynamic user experience (See Video 1). These shaders often sync with real-time audio and, when combined with AI, give the interface a natural, organic feel. The result? AI feels like a seamless part of the entire system rather than an add-on. I wanted to bring that same experience to my own project — so here’s how I made it happen.

Video 1: Apple Intelligence uses the frequencies of the voice for its animation

I set out to build a real-time, sound-reactive shader in Flutter, aiming to have different frequency ranges in a voice independently alter specific parts of the shader. For example, the bass frequencies would influence one part of the shader, while the mid and high frequencies would affect other parts. The final result looks like this:

Let’s examine the code that enables frequency extraction from a real-time audio signal:

import 'package:fftea/fftea.dart';
import 'package:flutter/foundation.dart';
import 'package:flutter_audio_capture/flutter_audio_capture.dart';

class VoiceApi {
  void getFrequencies(
    void Function(List<({FrequencySpectrum spectrum, double value})> data)
        onData,
  ) {
    final _flutterAudioCapture = FlutterAudioCapture();
    _flutterAudioCapture.start(
      (data) {
        final buffer = data;

        // Transform the PCM buffer from the time domain into the frequency domain.
        final fft = FFT(buffer.length);
        final freq = fft.realFft(buffer);
        final freqList = freq.discardConjugates().magnitudes().toList();

        // Third-octave bands covering the audible range (explained below).
        final frequencies = [
          FrequencySpectrum(0, 20),
          FrequencySpectrum(20, 25),
          FrequencySpectrum(25, 31),
          FrequencySpectrum(31, 40),
          FrequencySpectrum(40, 50),
          FrequencySpectrum(50, 63),
          FrequencySpectrum(63, 80),
          FrequencySpectrum(80, 100),
          FrequencySpectrum(100, 125),
          FrequencySpectrum(125, 160),
          FrequencySpectrum(160, 200),
          FrequencySpectrum(200, 250),
          FrequencySpectrum(250, 315),
          FrequencySpectrum(315, 400),
          FrequencySpectrum(400, 500),
          FrequencySpectrum(500, 630),
          FrequencySpectrum(630, 800),
          FrequencySpectrum(800, 1000),
          FrequencySpectrum(1000, 1250),
          FrequencySpectrum(1250, 1600),
          FrequencySpectrum(1600, 2000),
          FrequencySpectrum(2000, 2500),
          FrequencySpectrum(2500, 3150),
          FrequencySpectrum(3150, 4000),
          FrequencySpectrum(4000, 5000),
          FrequencySpectrum(5000, 6300),
          FrequencySpectrum(6300, 8000),
          FrequencySpectrum(8000, 10000),
          FrequencySpectrum(10000, 12500),
          FrequencySpectrum(12500, 16000),
          FrequencySpectrum(16000, 20000),
        ];

        // Sum the magnitudes of all FFT bins that fall into each band.
        List<({FrequencySpectrum spectrum, double value})> frequencyValues =
            frequencies.map((e) {
          final min = fft.indexOfFrequency(e.min.toDouble(), 44000);
          final max = fft.indexOfFrequency(e.max.toDouble(), 44000);

          return (
            spectrum: e,
            value: freqList
                .sublist(min.floor(), max.ceil())
                .reduce((a, b) => a + b),
          );
        }).toList();
        onData(frequencyValues);
      },
      (e) {
        debugPrint(e.toString());
      },
      sampleRate: 44000,
      bufferSize: 256,
    );
  }
}

// A frequency spectrum with a lower and upper bound in Hz
class FrequencySpectrum {
  FrequencySpectrum(this.min, this.max);

  final int min;
  final int max;
}

This might not look like much code, but it carries a surprising amount of complexity. To really understand how it works, we need to dig (somewhat) deep into audio processing, and that is exactly what this article does.

Implementation Details and Methodology

Before delving into the code, it is essential to understand what an audio frequency actually is; only then can we extract it effectively from an audio signal.

Audio Frequencies

What exactly are audio frequencies? A frequency is a periodic vibration that is audible to the human ear¹. The unit of a frequency is called Hertz (Hz) and refers to how often the wave of the audio signal repeats complete cycles within a second, i.e. 5 repetitions equal 5 Hz (See Figure 2).

Figure 2: frequency (f), amplitude (A), and period (T). The orange portion of the wave signifies a complete cycle (from 0–0.2 seconds).²

We, as humans, link a small Hz number with a low tone, e.g. bass or the lowest “a” key on the piano, while a larger Hz number is linked to a high tone, such as whistling or the highest “a” key on the piano. Generally, the human ear can recognize frequencies between 20 Hz and 20,000 Hz (20 kHz). Therefore, every audible sound a human can recognize is a combination of different frequencies at different levels within this range (See Figure 4).

Figure 4: An audio signal is reproduced using FFT.⁹

Now that we understand what frequencies are, how are these extracted from an audio signal?

Microphone Signal

Before we can perform any extraction, we first need the captured audio signal from the microphone. There are packages on pub.dev that take care of this; in this case, flutter_audio_capture is used, but feel free to swap in any other package.

To begin using this package, we must first initialize the flutter_audio_capture before listening to the audio signal.

final _flutterAudioCapture = FlutterAudioCapture();
await _flutterAudioCapture.initialize();

Once initialized, we can begin listening to the audio signal directly from the microphone:

_flutterAudioCapture.start(
  (data) { ... },
  (e) { ... },
  sampleRate: <sample_rate>,
  bufferSize: <buffer_size>,
);

Before applying any logic to the microphone data, let’s first examine the two parameters sampleRate and bufferSize:

Sample Rate

The sample rate refers to the number of samples that are captured per second (See Figure 4). To capture high frequencies from an audio signal, a higher sampling rate is required. For instance, a typical phone call has a sampling rate of only 8000 Hz (8 kHz), which limits the capture of higher frequencies, resulting in a muffled sound quality and a loss of information.

Figure 4: The sample rate determines which frequencies can be captured¹⁰

Therefore, it would be logical to use a sampling rate of approximately 20 kHz; however, the actual sampling rate must be at least 40 kHz. This requirement is based on the Nyquist–Shannon sampling theorem.

Nyquist–Shannon sampling theorem

Audio consists of pressure waves, which, when visualized, exhibit an oscillating pattern, i.e. they go up and down. To accurately capture the full range of an audio wave, it is essential to record the points where the wave reaches its lowest and highest levels, ensuring that the wave has completed a full cycle. Otherwise, this can lead to incorrect signal capture, as illustrated in Figure 5 A and C. This phenomenon is called aliasing.

Figure 5: Only having a sampling rate equal to the frequency (e.g. both 40 Hz), will result in loss of data. Instead, the sampling rate should be double the actual frequency (e.g. 80 Hz for 40 Hz frequency)⁷.

The wagon wheel effect describes a similar phenomenon: the wheel of a car spinning 24 times per second appears stationary when filmed with a 24 frames per second (FPS) camera. Depending on the rotation speed relative to the frame rate, the wheel can even appear to spin in reverse.

Figure 6: Cameras can have aliasing with fast moving objects⁶

That is the reason why, according to the Nyquist–Shannon sampling theorem, the sample rate must be at least twice the highest frequency that should be measured/reconstructed. This requirement ensures that the original waveform can be reconstructed accurately.

For 20,000 Hz, the upper bound of human hearing, this means a sampling rate of at least 40,000 Hz is required to capture the full spectrum from 20 Hz to 20,000 Hz.

However, the most common sampling rates are 44.1 kHz and 48 kHz, not 40 kHz. The reason is that any frequency above half the sample rate must be removed before sampling, but it is not possible to cut off signals above a certain frequency abruptly. The so-called anti-aliasing filters used for this feature a transition band: it takes, for example, from 20,000 Hz to 22,000 Hz until all frequencies above the 20,000 Hz threshold are fully removed (See Figure 7). Within this range (e.g. 20,000–22,000 Hz), the frequencies gradually decrease but are still present until they are completely absent.

Figure 7: Anti-aliasing filters require a transition band to remove all frequencies above a given threshold⁵

This is why the sample rate should not be exactly 40 kHz but should include some headroom for the transition band. Going back to our code example, the snippet uses a sample rate of 44,000 Hz; the common standards of 44,100 Hz or 48,000 Hz would work just as well.

Buffer Size

The buffer size defines the number of samples that are buffered before processing. The more data points we buffer, the more delay there is in the signal. The fewer we buffer, the less time we have for our calculations before the next buffer arrives: if, for example, a buffer holds 53 ms of data (buffer size divided by sample rate) and our calculation takes 59 ms, there will be 6 ms of stutter/silence when playing the data back in real time.

We will go with a buffer size of 256, i.e. 256 data points are buffered, but feel free to change to a lower or higher number if necessary.
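With these values, one buffer covers only a few milliseconds of audio. A quick sketch of that arithmetic in plain Dart (using the 44,000 Hz sample rate and 256-sample buffer from the code above):

void main() {
  const sampleRate = 44000; // samples per second, as used in the snippet above
  const bufferSize = 256;   // samples per buffer

  // One buffer covers bufferSize / sampleRate seconds of audio.
  final bufferMs = bufferSize / sampleRate * 1000;
  print('One buffer holds ${bufferMs.toStringAsFixed(1)} ms of audio'); // ~5.8 ms
}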

With both the sample rate and buffer size in place, we can finally take a look at the code for extracting the frequencies from our audio signal:

_flutterAudioCapture.start(
  (data) {
    final buffer = data;
    <...>
  },
  ...
);

So what is the data that we get from the package?

PCM Data

The data the package returns is called PCM data. Basically, this is the raw audio data representing sound wave amplitudes over time. PCM stands for Pulse Code Modulation and is the method for transforming analog signals into digital ones. Only with this raw/uncompressed audio data is it possible to extract the frequencies as desired.
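To make this more tangible, here is a small sketch (not from the original project) that generates one second of PCM-style data for a 440 Hz sine wave. The data delivered by the capture package has the same shape: a flat list of amplitude samples, typically in the range -1.0 to 1.0, though the exact list type may differ.

import 'dart:math';
import 'dart:typed_data';

/// Generates one second of PCM samples for a [frequency] Hz sine wave,
/// i.e. [sampleRate] amplitude values in the range [-1.0, 1.0].
Float32List sineWavePcm({int sampleRate = 44000, double frequency = 440}) {
  final samples = Float32List(sampleRate);
  for (var i = 0; i < sampleRate; i++) {
    samples[i] = sin(2 * pi * frequency * i / sampleRate);
  }
  return samples;
}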

How can the frequencies be extracted? Are they directly accessible in the data? No, the data must first be transformed. This transformation is performed in the following code lines using the fftea package:

final fft = FFT(buffer.length);
final freq = fft.realFft(buffer);
final freqList = freq.discardConjugates().magnitudes().toList();

What is happening here? To fully understand this, we must first grasp the underlying concept.

Time vs Frequency Domain

Why can we not extract the frequency information directly? Because the PCM data is in the time domain, i.e. it represents the audio signal over time, not per frequency. Luckily, the same information also exists in the frequency domain; we just cannot access it directly (See Figure 1). Instead, a Fast Fourier Transform (FFT) has to be applied. The FFT transforms the data from the time domain into the frequency domain.

Figure 1: Audio data in the time domain can be transformed into the frequency domain and the other way around.⁸

Fast-Fourier-Transform: Transforming Time to Frequency Domain

How does the Fast Fourier Transform (FFT) work? In the ‘Audio Frequencies’ section, it was explained that every sound is composed of various frequencies at different levels. Therefore, it should be possible to reconstruct any sound by combining and adding these different frequencies. That is correct, and the fascinating part is that while reconstructing a sound from different frequencies, the level of each frequency needed to reproduce the waveform is determined (See Figure 4). This is exactly what is required: the level of each frequency, i.e. the frequency domain.

Figure 4: An audio signal is reproduced using FFT.⁹

That is what the FFT is all about: it reverse-engineers the audio signal into the frequency domain by determining the level of each frequency required. This is the process described by the FFT formula:
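For reference, the formula the FFT evaluates efficiently is the discrete Fourier transform. For a buffer of N samples x_0, …, x_{N-1}:

X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-2\pi i \, k n / N}, \qquad k = 0, \ldots, N-1

Each output X_k is a complex number that describes how strongly the frequency k · (sampleRate / N) Hz is present in the buffer.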

Basically, this is what is happening behind the scenes when executing the following code:

final freq = fft.realFft(buffer);

Symmetrical Result

The output of the FFT is a list of complex numbers (each consisting of a real and an imaginary part), from which values such as the magnitude can be calculated. We then apply two functions to this output. The first is fftOutput.discardConjugates(), which removes half of the values from the FFT. This is possible because the FFT produces a symmetrical, mirrored result for real signals (See Figure 9), with one half representing negative frequencies and the other representing positive frequencies.

Figure 9: The result of a FFT visualized in a graph³

Both the positive and negative frequency components are required to fully reconstruct a signal. However, since the use case is not about full reconstruction but about extracting frequencies, the negative frequencies are not needed and can be discarded, effectively reducing the output size by half.
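As a small illustration (assuming the fftea package), a 256-sample real FFT yields 256 complex values; after discardConjugates() roughly half of them remain:

import 'dart:typed_data';
import 'package:fftea/fftea.dart';

void main() {
  final fft = FFT(256);
  final input = Float64List(256); // stand-in for a 256-sample PCM buffer
  final spectrum = fft.realFft(input);

  print(spectrum.length);                      // 256 complex values
  print(spectrum.discardConjugates().length);  // ~129: the mirrored half is dropped
}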

Magnitude

The second function we call is fftOutput.magnitudes().

The FFT gives us complex numbers, which are not directly usable. What we want is the strength of each frequency, known as its magnitude. To calculate the magnitude, the Pythagorean theorem (a² + b² = c²) is used, where a and b are the real and imaginary parts of a complex FFT value and c is the magnitude.

This calculation does not come out of nowhere: the values a and b represent the cosine and sine components of the signal, and together with the third value, the magnitude, they form a triangle within a circle (See Figure 8).

Figure 8: It is possible to apply the Pythagorean theorem on the result of a FFT to get the magnitude, because the three values form a triangle within a circle when visualized⁴

Therefore, we can calculate the magnitude by calculating the missing side of a triangle using the Pythagorean theorem.
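In code, this is exactly what magnitudes() computes for every complex value; a minimal hand-rolled version of the same calculation looks like this:

import 'dart:math';

/// Magnitude of a complex FFT value with real part [re] and imaginary part [im]:
/// the hypotenuse of the triangle described above, sqrt(re² + im²).
double magnitude(double re, double im) => sqrt(re * re + im * im);

void main() {
  // A hypothetical FFT output value of 3 + 4i has a magnitude of 5.
  print(magnitude(3, 4)); // 5.0
}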

Frequency Spectrums

Finally, we will group various frequencies into broader frequency spectrums to reduce the number of data points for visualization. This step is optional.

For this we will use a FrequencySpectrum model, which includes a minimum and maximum frequency value:

class FrequencySpectrum {
FrequencySpectrum(this.min, this.max);

final int min;
final int max;
}

The frequencies are grouped based on octaves. Within an octave the frequency doubles, e.g. 40–80 Hz is one octave and the next octave spans 80–160 Hz. Interestingly, humans perceive these two ranges as qualitatively identical, merely at a higher pitch, even though the distance between the start and end frequency is twice as large. For a finer resolution, the list below uses the standard third-octave bands (three bands per octave), starting with 0–20 Hz and ending with 16–20 kHz:

final frequencies = [
  FrequencySpectrum(0, 20),
  FrequencySpectrum(20, 25),
  FrequencySpectrum(25, 31),
  FrequencySpectrum(31, 40),
  FrequencySpectrum(40, 50),
  FrequencySpectrum(50, 63),
  FrequencySpectrum(63, 80),
  FrequencySpectrum(80, 100),
  FrequencySpectrum(100, 125),
  FrequencySpectrum(125, 160),
  FrequencySpectrum(160, 200),
  FrequencySpectrum(200, 250),
  FrequencySpectrum(250, 315),
  FrequencySpectrum(315, 400),
  FrequencySpectrum(400, 500),
  FrequencySpectrum(500, 630),
  FrequencySpectrum(630, 800),
  FrequencySpectrum(800, 1000),
  FrequencySpectrum(1000, 1250),
  FrequencySpectrum(1250, 1600),
  FrequencySpectrum(1600, 2000),
  FrequencySpectrum(2000, 2500),
  FrequencySpectrum(2500, 3150),
  FrequencySpectrum(3150, 4000),
  FrequencySpectrum(4000, 5000),
  FrequencySpectrum(5000, 6300),
  FrequencySpectrum(6300, 8000),
  FrequencySpectrum(8000, 10000),
  FrequencySpectrum(10000, 12500),
  FrequencySpectrum(12500, 16000),
  FrequencySpectrum(16000, 20000),
];
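As an aside, the list does not have to be hardcoded: each upper edge is roughly the lower edge multiplied by the cube root of two, so the bands could also be generated. A sketch of that idea (the rounded values differ slightly from the hand-picked ones above):

import 'dart:math';

/// Generates third-octave band edges from [start] up to [limit] Hz.
/// Each edge is the previous one multiplied by 2^(1/3), so three bands
/// together span one octave (a doubling in frequency).
List<FrequencySpectrum> thirdOctaveBands({double start = 20, double limit = 20000}) {
  final factor = pow(2, 1 / 3).toDouble();
  final bands = <FrequencySpectrum>[FrequencySpectrum(0, start.round())];
  var lower = start;
  while (lower < limit) {
    final upper = min(lower * factor, limit);
    bands.add(FrequencySpectrum(lower.round(), upper.round()));
    lower = upper;
  }
  return bands;
}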

For each frequency spectrum, the combined magnitude is calculated afterwards:


List<({FrequencySpectrum spectrum, double value})> frequencyValues =
    frequencies.map((e) {
  // Map the band edges (in Hz) to FFT bin indices for the given sample rate.
  final min = fft.indexOfFrequency(e.min.toDouble(), 44000);
  final max = fft.indexOfFrequency(e.max.toDouble(), 44000);

  return (
    spectrum: e,
    // Sum the magnitudes of all bins that fall into this band.
    value: freqList
        .sublist(min.floor(), max.ceil())
        .reduce((a, b) => a + b),
  );
}).toList();

Finally, these frequencyValues are passed to the onData callback, which can then be used for visualizing the data in the UI.
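As a hypothetical usage example (the ValueNotifier wiring below is illustrative and not from the original project), the values could be exposed to whatever widget draws the visualization, e.g. a CustomPainter or a fragment shader:

import 'package:flutter/foundation.dart';

// Holds the latest band values; any listener repaints when they change.
final frequencyData =
    ValueNotifier<List<({FrequencySpectrum spectrum, double value})>>([]);

void startListening() {
  VoiceApi().getFrequencies((values) {
    frequencyData.value = values;
  });
}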

We can now extract frequencies from a real-time audio signal, allowing us to visualize, animate, and build various creative applications, e.g. feeding different frequency bands into an Apple Intelligence clone like I did here:

Summary

In this article, we explored how frequency data can be extracted from real-time audio signals using Dart/Flutter to create a responsive visualization. To achieve this, raw PCM data is captured first, then a Fast Fourier Transform (FFT) is applied to convert the data from the time domain to the frequency domain. By analyzing the transformed data, we can group various frequencies to drive different parts of a shader, allowing specific audio frequency spectrums to influence separate parts of an animation.

We also covered essential concepts such as sample rate and buffer size, explaining how they affect real-time audio capture and processing. The sample rate should be high enough to prevent aliasing as suggested by the Nyquist–Shannon sampling theorem, which requires a rate at least double the highest frequency in the signal. Buffer size, meanwhile, controls the latency and processing delay, balancing responsiveness with smooth performance.

By following these steps, you can successfully integrate audio-responsive visualizations in Flutter, aligning with the latest trends set by companies like Apple and Google.

The full source code, including the visualization with the shader and equalizer, is available here:

If you enjoyed this guide and would like to see a Part 2, let me know! Clap for this article, leave a comment, or share it with others who might find it useful. Your feedback will help me create even more in-depth guides on different Flutter topics!

You can connect with me on X (formerly Twitter) or LinkedIn if you’re interested in Flutter and want to explore more in-depth topics.

Sources

[1] Pilhofer, Michael (2007). Music Theory for Dummies. For Dummies. p. 97. ISBN 9780470167946.

[2] https://vru.vibrationresearch.com/lesson/introduction-sine/ (15.10.2024)

[3] Chowdhury, Mehdi & Cheung, Ray C. C. (2019). Reconfigurable Architecture for Multi-lead ECG Signal Compression with High-frequency Noise Reduction. Scientific Reports. 9. 10.1038/s41598-019-53460-3.

[4] https://www.youtube.com/watch?v=rUtz-471LkQ (06.11.24)

[5] Kapić, Aladin & Sarić, Rijad & Lubura, Slobodan & Jokic, Dejan. (2021). FPGA-based Implementation of IIR Filter for Real-Time Noise Reduction in Signal. Journal of Engineering and Natural Sciences. 3. 10.14706/JONSAE2021316.

[6] https://www.youtube.com/watch?v=VNftf5qLpiA (06.11.2024)

[7] https://www.ni.com/de/shop/data-acquisition/measurement-fundamentals/analog-fundamentals/acquiring-an-analog-signal--bandwidth--nyquist-sampling-theorem-.html (06.11.2024)

[8] Mastriani, Mario. (2018). Quantum-Classical Algorithm for an Instantaneous Spectral Analysis of Signals: A Complement to Fourier Theory. Journal of Quantum Information Science. 08. 52–77. 10.4236/jqis.2018.82005.

[9] https://www.jezzamon.com/fourier/ (06.11.2024)

[10] https://www.hollyland.com/blog/tips/what-is-sample-rate-in-audio (12.11.2024)
