I've seen many posts here stating that an FFT is almost all you need to get the fundamental frequency from an input stream. The problem is that, with sound, this doesn't work; we need to look for periodicity as well. I know that autocorrelation is useful here, but I don't know how to extract the fundamental frequency from the data once the autocorrelation has been computed. I am also aware that there are alternative methods.
I am not interested in the easiest solution; I am interested in (1) the most efficient solution and (2) the most accurate solution, especially if these are mutually exclusive.
Answer
FFTs alone are lousy at pitch estimation for many really common sound sources, including male voices and low piano or guitar notes, where the strongest spectral peak is often a harmonic rather than the fundamental. Thus, the comments that an FFT is all you need are false and misleading, although FFTs can be used as components of a more accurate composite pitch detection algorithm.
When using autocorrelation, ignore the trivial peak at zero lag and remove or discount the subharmonic lag peaks (those at integer multiples of the pitch period). The pitch frequency is then approximately the reciprocal of the time lag of the remaining autocorrelation peak.
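As a rough illustration of that reciprocal-of-lag step, here is a minimal sketch in plain Python. The function name `autocorr_pitch`, the search range `fmin`/`fmax`, and the `threshold` used to prefer the shortest strong lag are all assumptions made for this example, not part of any particular published algorithm, and the direct summation is O(N * lags), so it is meant for clarity rather than speed.

```python
def autocorr_pitch(samples, sample_rate, fmin=50.0, fmax=1000.0, threshold=0.9):
    """Estimate pitch in Hz from one frame via autocorrelation.

    Illustrative sketch: fmin, fmax and threshold are assumed tuning
    parameters. Returns None for silence or when no peak is found.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]            # remove any DC offset
    energy = sum(v * v for v in x)             # autocorrelation at lag 0
    if energy == 0.0:
        return None                            # silent frame

    min_lag = max(1, int(sample_rate / fmax))  # shortest period searched
    max_lag = min(int(sample_rate / fmin), n - 2)
    first = min_lag - 1                        # one extra lag on each side

    # Normalized autocorrelation for lags first .. max_lag + 1.
    r = [sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
         for lag in range(first, max_lag + 2)]

    # Local maxima strictly inside the searched lag range.
    peaks = [k for k in range(1, len(r) - 1) if r[k - 1] < r[k] >= r[k + 1]]
    if not peaks:
        return None
    best = max(r[k] for k in peaks)
    if best <= 0.0:
        return None

    # Discount subharmonic peaks (lags near 2T, 3T, ...) by taking the
    # shortest lag whose peak is nearly as strong as the strongest one.
    k = next(k for k in peaks if r[k] >= threshold * best)

    # Parabolic interpolation around the peak for sub-sample lag resolution.
    # The denominator is strictly negative because r[k] is a local maximum.
    denom = r[k - 1] - 2.0 * r[k] + r[k + 1]
    offset = 0.5 * (r[k - 1] - r[k + 1]) / denom

    return sample_rate / (first + k + offset)  # pitch = 1 / period
```

Called on a frame a few periods long, for example `autocorr_pitch(frame, 8000)`, it returns a frequency in Hz or `None`. Real recordings usually need more care (voicing decisions, smoothing across frames), which is where the composite methods come in.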
Pitch often refers to the human perception of such, and thus an estimate sometimes needs to account for psychoacoustic effects and illusions. There are multiple methods for pitch detection, and lots of book chapters and research papers on the problem. How well each algorithm works might depend on your particular sound source and requirements.
Pitch detection/estimation methods include lag estimators, such as autocorrelation, weighted autocorrelation, AMDF and ASDF; and frequency-domain analysis methods based on initial FFTs, such as cepstral/cepstrum methods and the harmonic product spectrum algorithm. Composite methods, and ones that add some form of decision analysis, are harder to implement but can be more accurate or robust for some sound sources. Look for YIN, RAPT and YAAPT. This list is by no means exhaustive.
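To give a feel for the frequency-domain family, here is a minimal sketch of the harmonic product spectrum idea in plain Python. It assumes a frame whose length is a power of two, uses a small textbook recursive FFT so that the example stays self-contained, and the names `fft`, `hps_pitch`, `harmonics` and `fmin` are choices made for this illustration. The result is quantized to one FFT bin (`sample_rate / n` Hz), so treat it as a coarse estimate.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def hps_pitch(samples, sample_rate, harmonics=3, fmin=50.0):
    """Coarse pitch estimate in Hz via a harmonic product spectrum.

    Illustrative sketch: harmonics and fmin are assumed parameters.
    Returns None if the frame is too short for the requested search.
    """
    n = len(samples)
    if n < 2 or n & (n - 1):
        raise ValueError("frame length must be a power of two")

    # Hann window to limit spectral leakage, then magnitude spectrum.
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, s in enumerate(samples)]
    mag = [abs(c) for c in fft(windowed)[: n // 2]]

    lo = max(1, math.ceil(fmin * n / sample_rate))  # lowest candidate bin
    hi = (len(mag) - 1) // harmonics                # keep h * k in range
    if lo > hi:
        return None

    # For each candidate bin k, multiply the magnitudes at k, 2k, 3k, ...
    # A true fundamental lines up with its harmonics and scores highest.
    def score(k):
        return math.prod(mag[h * k] for h in range(1, harmonics + 1))

    best_bin = max(range(lo, hi + 1), key=score)
    return best_bin * sample_rate / n
```

The product rewards bins whose multiples also carry energy, so a tone whose second harmonic is louder than its fundamental can still be assigned to the fundamental, which picking the single largest FFT bin would not do. It still depends on several harmonics being present and can make octave errors on real signals.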