Monday, October 16, 2017

frequency - Performing classification based on FFT results


I should start by saying that I am a complete beginner in DSP. I have a number of audio recordings of bird songs (sampling rate 22 kHz), which I have been trying to analyse using the FFT in Matlab/Octave. In particular, I am trying to show, using machine learning classification algorithms, that different classes of recordings are dominated by different frequencies (or frequency ranges). The recordings are of variable length and, due to computational limitations, the largest FFT I can compute is 2^19 points (which, as I understand it, is the number of samples it takes from each audio file). So, my first question is: if I break my recordings into parts, each as long as the FFT size I have chosen, would it still be reasonable to treat those parts as separate data examples (i.e. separate recordings), and what kind of information do I lose by splitting the longer recordings in this way?


The second question is: is there a better way for a beginner to perform this analysis that is less computationally expensive? I suspect that working with vectors of size 2^18+1 is not the best approach here.



Answer



I think the FFT is a poor choice of representation for your problem: it captures many properties of the signal that are irrelevant to your application and, as you suspect, it generates a huge amount of data to process if you take the FFT of the whole signal.
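To put rough numbers on "a huge amount of data", here is a small back-of-the-envelope sketch in Python. It assumes that "22 kHz" means 22050 Hz, which the question does not state exactly, and the variable names are purely illustrative.

```python
# Assumption: "22 kHz" is taken to mean 22050 Hz; the exact rate is not given.
SAMPLE_RATE_HZ = 22050
FFT_SIZE = 2 ** 19

# Duration of audio covered by one FFT of this size.
frame_seconds = FFT_SIZE / SAMPLE_RATE_HZ

# A real-valued input yields FFT_SIZE/2 + 1 distinct frequency bins.
num_bins = FFT_SIZE // 2 + 1

# Spacing between adjacent frequency bins.
bin_width_hz = SAMPLE_RATE_HZ / FFT_SIZE

print(f"{frame_seconds:.1f} s per chunk, {num_bins} bins, {bin_width_hz:.3f} Hz per bin")
```

Under that assumption each chunk covers roughly 24 seconds of audio and produces 2^18+1 values at a resolution of a few hundredths of a hertz, which is far finer than needed to tell frequency ranges apart.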


It seems to me that the most important quantity to consider when studying birdsongs is pitch (fundamental frequency) - all other dimensions of sound (loudness, timbre) are actually sources of variability that you want to get rid of. For example, two recordings of the same birdsong made in different environments and with different equipment will exhibit different frequency responses due to the variation in conditions; but fortunately the pitch profile would be essentially the same!


So I suggest you use a pitch transcription utility (canned solutions: aubio, Praat, Sonic Visualiser...) to extract a pitch contour - a function giving the predominant "note" as a function of time. From that, you could define a feature vector containing pitch statistics (mean, standard deviation, maybe higher-order moments), or maybe just build a histogram of pitch values; this would result in a very compact feature vector suitable for automatic classification. To improve your results, you might then add features capturing the dynamics of pitch over time - dominant modulation rates extracted from the pitch contour, variation of pitch statistics over chunks of a few seconds of audio, etc.
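The feature vector described above could be sketched roughly as follows in plain Python. This is only an illustration under several assumptions that are not part of the answer: the pitch contour is already available as a list of values in Hz (for example exported from one of the tools mentioned), frames without a detected pitch are marked with 0, statistics are taken on a log-frequency scale, and the function name `pitch_features` and the defaults `fmin`, `fmax` and `n_bins` are hypothetical choices.

```python
import math
import statistics


def pitch_features(contour_hz, fmin=500.0, fmax=8000.0, n_bins=12):
    """Summarise a pitch contour as a fixed-length feature vector.

    contour_hz holds one pitch estimate per analysis frame, in Hz.
    Assumption: frames with no detected pitch are marked as 0 (or less).
    """
    voiced = [f for f in contour_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    # Assumption: work in log2(Hz) so that equal pitch intervals weigh equally.
    log_f = [math.log2(f) for f in voiced]

    mean = statistics.fmean(log_f)
    std = statistics.pstdev(log_f)
    # Skewness (third standardised moment); defined as 0 for a constant contour.
    if std == 0:
        skew = 0.0
    else:
        skew = sum(((x - mean) / std) ** 3 for x in log_f) / len(log_f)

    # Normalised histogram over [fmin, fmax]; out-of-range values are
    # clamped into the first or last bin.
    lo, hi = math.log2(fmin), math.log2(fmax)
    counts = [0] * n_bins
    for x in log_f:
        idx = int((x - lo) / (hi - lo) * n_bins)
        counts[min(n_bins - 1, max(0, idx))] += 1
    hist = [c / len(log_f) for c in counts]

    return [mean, std, skew] + hist


# Toy contour: two unvoiced frames and three voiced ones.
features = pitch_features([0, 1000, 2000, 4000, 0])
print(len(features))
```

With the default `n_bins` this gives 15 numbers per recording regardless of its length, compared with 2^18+1 values per FFT chunk. The same function could be applied to chunks of a few seconds to track how the statistics vary over time.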

