Could anybody tell me a method to calculate the "4 Hz modulation energy" for a speech signal?
Thanks.
EDIT
I found some more details: 4 Hz modulation energy: Speech has a characteristic energy modulation peak around the 4 Hz syllabic rate [3]. We use a portion of the MFCC algorithm [4] to convert the audio signal into 40 perceptual channels. We extract the energy in each band, bandpass filter each channel with a second-order filter with a center frequency of 4 Hz, then calculate the short-term energy by squaring and smoothing the result. We normalize each channel's 4 Hz energy by the overall channel energy in the frame, and sum the result from all channels. Speech tends to have more modulation energy at 4 Hz than music does.
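For what it's worth, here is a rough Python sketch of how the steps quoted above might be put together with NumPy and SciPy. Several things in it are assumptions of mine, not from the quoted description: the mel filterbank construction, the 25 ms frames with a 10 ms hop, the 3 to 5 Hz passband, the 1 s moving-average smoother, and normalizing by the smoothed squared channel energy so the ratio is dimensionless. The function names are made up for illustration.

```python
import numpy as np
from scipy.signal import butter, lfilter, stft


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular mel-spaced filters, shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_filters triangles need n_filters + 2 edge frequencies.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def modulation_energy_4hz_filterbank(s, fs, n_channels=40,
                                     frame_len=0.025, hop=0.010):
    """Per-frame 4 Hz modulation energy, summed over all channels."""
    s = np.asarray(s, dtype=float)
    nperseg = int(round(frame_len * fs))
    step = int(round(hop * fs))

    # Short-time power spectrum, shape (n_bins, n_frames).
    _, _, Z = stft(s, fs=fs, nperseg=nperseg, noverlap=nperseg - step,
                   boundary=None, padded=False)
    power = np.abs(Z) ** 2

    # Energy per perceptual channel over time, shape (n_channels, n_frames).
    band_energy = mel_filterbank(n_channels, nperseg, fs) @ power

    # butter(1, ..., btype="bandpass") yields a second-order bandpass filter.
    # The 3-5 Hz passband around 4 Hz is an assumption.
    frame_rate = fs / step
    b, a = butter(1, [3.0, 5.0], btype="bandpass", fs=frame_rate)
    modulated = lfilter(b, a, band_energy, axis=1)

    # Square and smooth with a causal 1 s moving average (assumed length).
    n_smooth = max(1, int(round(frame_rate)))
    kernel = np.ones(n_smooth) / n_smooth
    mod_energy = lfilter(kernel, [1.0], modulated ** 2, axis=1)
    chan_energy = lfilter(kernel, [1.0], band_energy ** 2, axis=1)

    # Normalize per channel, then sum across channels.
    ratio = mod_energy / (chan_energy + np.finfo(float).eps)
    return ratio.sum(axis=0)
```

Calling `modulation_energy_4hz_filterbank(s, fs)` returns one value per 10 ms frame; larger values suggest more energy modulation near 4 Hz. The first second or so of output is affected by the filter and smoother start-up.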
Answer
The link in my comment suggests the following:
I would recommend first extracting the envelope by using either a halfwave rectification (i.e., replace all the negative values in the time waveform with zeros) or a Hilbert Transform and then lowpass filtering the waveform at around 50 Hz (the lowpass is optional if you only care about the 4 Hz component). Once you have the envelope, simply do an FFT (making sure that your frequency resolution is at least 1 Hz) and look for the energy around 4 Hz. If any of these steps don't seem obvious to you, let me know and I can send you some Matlab code.
So, what that means is you need to take your speech $s[n]$ and find the envelope:
$$ e[n] = \left | s[n] + j{\bf H}\left[s[n]\right] \right | $$
where ${\bf H}[\cdot]$ denotes the Hilbert transform, so $s[n] + j{\bf H}\left[s[n]\right]$ is the analytic signal of your speech and $e[n]$ is its magnitude.
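Then take the FFT of $e[n]$ and look at the energy in the bin (or bins) around 4 Hz. The bin spacing is the sampling rate divided by the FFT length, so you need at least about one second of signal to get a resolution of 1 Hz or better.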
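In case it helps, a minimal Python sketch of this envelope-then-FFT recipe is below, using `scipy.signal.hilbert` (which returns the analytic signal, so its magnitude is $e[n]$). The function name, the 3 to 5 Hz band, the removal of the envelope mean, and the normalization by the total non-DC envelope energy are my own assumptions for illustration, not part of the quoted advice.

```python
import numpy as np
from scipy.signal import hilbert


def modulation_energy_4hz(s, fs, band=(3.0, 5.0)):
    """Fraction of (non-DC) envelope spectrum energy lying in `band`, in Hz."""
    s = np.asarray(s, dtype=float)

    # e[n] = |s[n] + j H{s[n]}|; hilbert() returns the analytic signal.
    envelope = np.abs(hilbert(s))

    # Remove the mean so the DC term does not dominate the spectrum.
    envelope = envelope - envelope.mean()

    # Power spectrum of the envelope; bin spacing is fs / len(s).
    spectrum = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(envelope.size, d=1.0 / fs)

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = spectrum[1:].sum()  # everything except the DC bin
    return spectrum[in_band].sum() / total if total > 0 else 0.0
```

For a quick sanity check, a tone whose amplitude is modulated at 4 Hz should give a value close to 1, while the same tone modulated at 20 Hz should give a value close to 0.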