I am building a speech recognition system using a Hidden Markov Model (HMM) in Python. I referred to this and this question and their answers, which were very helpful.
In my approach, I split the continuous speech into separate words. I am thinking of using an HMM to detect each word, so the states of my HMM will be phones.
What I understand so far is that an HMM estimates the next state based on the current state (phone). But I don't get how to estimate the first state of the HMM (i.e. the first phone of the word).
Can you suggest the best approach to use an HMM to achieve this?
Also, the states of the HMM will be phones, but I don't see what the observations should be in this problem. There are multiple frames in a single phone, and there is a feature vector corresponding to each frame. What should I use as the observation?
Answer
The Baum-Welch algorithm is an instance of the EM (Expectation-Maximization) algorithm. It estimates the model parameters $(T, E, \pi)$ from the observed sequences, where:
$T$: the transition probabilities
$E$: the emission probabilities
$\pi$: the initial state distribution, i.e. the probability of each state being the first state of a sequence
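To make the role of the three parameters concrete, here is a minimal sketch of how a discrete HMM generates a sequence: the first state is drawn from $\pi$, every later state from the row of $T$ belonging to the current state, and every observation from the row of $E$ belonging to the current state. The function name `sample_hmm` and the toy numbers (a three-state left-to-right model with three discrete symbols) are assumptions for illustration only, not values from a real speech model.

```python
import numpy as np

def sample_hmm(T, E, pi, length, seed=0):
    """Draw a state sequence and an observation sequence from a discrete HMM."""
    rng = np.random.default_rng(seed)
    states, symbols = [], []
    # The first state comes from the initial distribution pi.
    state = rng.choice(len(pi), p=pi)
    for _ in range(length):
        states.append(int(state))
        # The observation depends only on the current state (row of E).
        symbols.append(int(rng.choice(E.shape[1], p=E[state])))
        # The next state depends only on the current state (row of T).
        state = rng.choice(len(pi), p=T[state])
    return states, symbols

# Hypothetical left-to-right model: always start in state 0, never move backwards.
pi = np.array([1.0, 0.0, 0.0])
T = np.array([[0.6, 0.4, 0.0],
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
E = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

states, symbols = sample_hmm(T, E, pi, length=10)
print(states)
print(symbols)
```

With $\pi = (1, 0, 0)$ every sampled sequence starts in state 0, which is how a fixed first state can be expressed in the model.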
Some years ago, I made the following quick-and-dirty implementation (may be fairly broken now), for the discrete case.
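As an illustration of the discrete case, a minimal Baum-Welch sketch with NumPy could look like the following. It is not the original implementation; it assumes a single observation sequence of integer symbols, random initial guesses for $T$ and $E$, a uniform initial $\pi$, and per-step scaling of the forward and backward variables to avoid underflow. The function name `baum_welch` and its parameters are chosen here for illustration.

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=50, seed=0):
    """Estimate (T, E, pi) of a discrete HMM from one observation sequence."""
    obs = np.asarray(obs)
    n = len(obs)
    rng = np.random.default_rng(seed)
    # Random row-stochastic starting points (an assumption of this sketch).
    T = rng.random((n_states, n_states))
    T /= T.sum(axis=1, keepdims=True)
    E = rng.random((n_states, n_symbols))
    E /= E.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    log_likelihoods = []

    for _ in range(n_iter):
        # E-step, forward pass with per-step scaling.
        alpha = np.zeros((n, n_states))
        scale = np.zeros(n)
        alpha[0] = pi * E[:, obs[0]]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]

        # E-step, backward pass using the same scaling factors.
        beta = np.ones((n, n_states))
        for t in range(n - 2, -1, -1):
            beta[t] = (T @ (E[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

        # Posterior state probabilities; each row sums to 1 with this scaling.
        gamma = alpha * beta
        # Posterior transition probabilities xi[t, i, j].
        emit_beta = E[:, obs[1:]].T * beta[1:]
        xi = alpha[:-1, :, None] * T[None, :, :] * emit_beta[:, None, :]
        xi /= scale[1:, None, None]

        # Log-likelihood of the data under the parameters used in this E-step.
        log_likelihoods.append(float(np.log(scale).sum()))

        # M-step: re-estimate the parameters from expected counts.
        pi = gamma[0]
        T = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            E[:, k] = gamma[obs == k].sum(axis=0)
        E /= gamma.sum(axis=0)[:, None]

    return T, E, pi, log_likelihoods

# Toy sequence over three discrete symbols (made-up data).
obs = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 2, 2, 0, 1, 0, 2]
T, E, pi, ll = baum_welch(obs, n_states=2, n_symbols=3, n_iter=20)
print(np.round(T, 3))
print(np.round(E, 3))
print(np.round(pi, 3))
```

The recorded log-likelihood should not decrease from one iteration to the next, which is a useful sanity check. EM only finds a local optimum, so different seeds can give different estimates.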
Hope this helps.