Sunday, July 8, 2018

signal analysis - How to build a simple real-time audio spectrogram?



I'm trying to build a spectrogram view with a 4-second history, and I'm stuck at the part where you load/draw the FFT vectors into the 2D bitmap. I just don't know how to do that, but I already have the FFT running in real time.


I'm doing it in Swift (iOS).



Thank you!



Answer



The Discrete Fourier Transform of a signal is a series of $\mathbb{C}$omplex numbers. When it is performed over short time segments and depicted in a spectrogram, what is really depicted is a spectrum, that is, the strength of each frequency component, which is a $\mathbb{R}$eal number.


There are various different expressions for the strength of a component of the spectrum such as Amplitude, Power, Power Spectral Density and so on.


The most straightforward one is the Amplitude spectrum. To calculate the Amplitude of a given spectrum component, you need to get the absolute value of the complex number that represents a frequency component in your spectrum.


Programming languages such as C, Python, Julia, MATLAB / Octave and others have native complex data types. In that case, the output of an fft() kind of function is an array of complex numbers, and the language provides a function (e.g. abs()) that returns the absolute value of each element. If your programming language does not have a complex data type, then the output of your fft() is probably two arrays of real numbers: one for the real part of each complex number and one for the imaginary part. In that case, to obtain the spectrum, all that you have to do is spc[m] = sqrt(real_part[m]^2 + imaginary_part[m]^2), where ^ denotes exponentiation and m is the index of the $m^{th}$ element of your DFT spectrum.
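As a minimal sketch of the two-array case in Swift: the function name `amplitudeSpectrum` and the parameter names `real` and `imag` are made up for illustration and stand in for whatever your FFT routine actually returns.

```swift
import Foundation

// Hypothetical helper: `real` and `imag` are assumed to be the two
// equally sized arrays produced by your FFT routine.
func amplitudeSpectrum(real: [Float], imag: [Float]) -> [Float] {
    precondition(real.count == imag.count, "real and imaginary parts must have equal length")
    var spc = [Float](repeating: 0, count: real.count)
    for m in 0..<real.count {
        // |z| = sqrt(re^2 + im^2) for bin m
        spc[m] = (real[m] * real[m] + imag[m] * imag[m]).squareRoot()
    }
    return spc
}

let spc = amplitudeSpectrum(real: [3, 0, -1], imag: [4, 2, 0])
print(spc) // [5.0, 2.0, 1.0]
```

For a real-valued input signal only the first half of the bins carries independent information, so in practice you would usually keep just those.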


Once you do that, you will have an array of real numbers. These are going to span a very wide range, because of the way the DFT sums work, and for this reason you usually apply a logarithmic transformation. This transformation compresses the dynamic range: large values are reduced far more than small ones, so weak components remain visible next to strong ones. You would typically apply this with something like spc[m] = log(1 + spc[m]).
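A short Swift sketch of that transformation, with `logCompress` as a made-up name. The `1 +` inside the logarithm keeps a zero amplitude at zero instead of sending it to minus infinity.

```swift
import Foundation

// Hypothetical helper applying spc[m] = log(1 + spc[m]) to every bin.
func logCompress(_ spectrum: [Float]) -> [Float] {
    return spectrum.map { log(1 + $0) }
}

// The input spans a factor of 1000 between its non-zero values;
// after compression they differ by roughly a factor of 10.
let compressed = logCompress([0, 1, 1000])
print(compressed)
```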


At this point, you can depict spc[m] in grayscale by mapping it to (usually) 256 tones of gray. First normalise spc[m] to the range 0 to 1, for example by dividing by the largest value you expect to see, and then use something like rendered_spc[m] = int(round(spc[m]*255.0)). Here int denotes an integer data type; you are likely to use something like a BYTE or whatever is more convenient for your bitmaps. The other thing you can do is to map the value of spc[m] to a palette of N colours and "paint" the pixels with that colour.
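The grayscale mapping could look like the following in Swift. This is only a sketch: `grayscaleColumn` and `referenceMax` are illustrative names, and using one fixed reference maximum for every column (rather than each column's own peak) is an assumption made here so that brightness stays comparable across time.

```swift
import Foundation

// Hypothetical helper: maps log-amplitudes to 8-bit gray levels.
// `referenceMax` is an assumed, caller-chosen value that maps to white.
func grayscaleColumn(_ spectrum: [Float], referenceMax: Float) -> [UInt8] {
    precondition(referenceMax > 0, "referenceMax must be positive")
    return spectrum.map { (value: Float) -> UInt8 in
        // Normalise to 0...1, scale to 0...255, then clamp so that
        // values above referenceMax cannot overflow a byte.
        let scaled = (value / referenceMax * 255).rounded()
        return UInt8(min(max(scaled, 0), 255))
    }
}

let pixels = grayscaleColumn([0.0, 0.5, 1.0, 2.0, 3.0], referenceMax: 2.0)
print(pixels) // [0, 64, 128, 255, 255]
```

For a colour palette, the same byte can be used as an index into a table of 256 colours.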


Finally, it is probably useful to emphasise here that the spectrogram does not depict one "real time FFT" but many short-time Fourier transforms. So, if you want to depict 4 seconds worth of a spectrogram, you would have to accumulate a number of FFTs, get their amplitude spectra, transform them logarithmically, map them to colours and then depict them. Think of each FFT amplitude spectrum as one column of your bitmap.
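One way to hold those columns is a small rolling buffer that drops the oldest column once the history is full. The Swift sketch below is an assumption-laden illustration, not a definitive design: `SpectrogramHistory`, `hopSize` (samples between successive FFTs) and the row-major layout with the lowest frequency on the bottom row are all choices made for this example.

```swift
import Foundation

// Hypothetical rolling store of already rendered (8-bit) spectrum columns.
struct SpectrogramHistory {
    let binCount: Int
    let maxColumns: Int
    private(set) var columns: [[UInt8]] = []

    init(binCount: Int, historySeconds: Double, sampleRate: Double, hopSize: Int) {
        self.binCount = binCount
        // One column per hop, e.g. 4 s * 44100 Hz / 1024 samples gives 172 columns.
        self.maxColumns = max(1, Int(historySeconds * sampleRate / Double(hopSize)))
    }

    mutating func append(_ column: [UInt8]) {
        precondition(column.count == binCount, "column must have one value per bin")
        columns.append(column)
        if columns.count > maxColumns {
            columns.removeFirst(columns.count - maxColumns)
        }
    }

    // Row-major grayscale pixels: width = maxColumns, height = binCount.
    // The newest column sits at the right edge, bin 0 on the bottom row.
    func bitmap() -> [UInt8] {
        var pixels = [UInt8](repeating: 0, count: maxColumns * binCount)
        let offset = maxColumns - columns.count
        for (x, column) in columns.enumerated() {
            for (bin, value) in column.enumerated() {
                let row = binCount - 1 - bin
                pixels[row * maxColumns + offset + x] = value
            }
        }
        return pixels
    }
}

// Tiny example: room for 2 columns of 2 bins each.
var history = SpectrogramHistory(binCount: 2, historySeconds: 4, sampleRate: 4, hopSize: 8)
history.append([1, 2])
history.append([3, 4])
history.append([5, 6]) // pushes out [1, 2]
print(history.bitmap()) // [4, 6, 3, 5]
```

The resulting byte array is what you would hand to your platform's image or texture API each time a new column arrives.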



This is a brief outline of how to render spectrograms in practice. Depending on the desired accuracy, you might have to add some more complex features, such as "overlapping" DFT segments, interpolation (for smoother bitmaps with less data) and others.
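To illustrate what "overlapping" segments means, here is a small Swift sketch that lists where each DFT segment would start in a buffer of samples. The names `frameStarts`, `frameSize` and `hopSize` are illustrative; a hop smaller than the frame size makes consecutive segments share samples.

```swift
import Foundation

// Hypothetical helper: start index of every full frame in the buffer.
func frameStarts(sampleCount: Int, frameSize: Int, hopSize: Int) -> [Int] {
    precondition(frameSize > 0 && hopSize > 0, "sizes must be positive")
    guard sampleCount >= frameSize else { return [] }
    return Array(stride(from: 0, through: sampleCount - frameSize, by: hopSize))
}

// Frames of 8 samples with a hop of 4 overlap by 50%.
let starts = frameStarts(sampleCount: 16, frameSize: 8, hopSize: 4)
print(starts) // [0, 4, 8]
```

Each start index yields one spectrum and therefore one bitmap column, so halving the hop doubles the number of columns per second.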


Hope this helps.

