Monday, August 13, 2018

audio - Feature extraction for sound classification


I'm trying to extract features from a sound file and classify the sound as belonging to a particular category (e.g. dog bark, vehicle engine, etc.). I'd like some clarity on the following things:


1) Is this doable at all? There are programs that can recognize speech, and differentiate between different types of dog bark. But is it possible to have a program that can receive a sound sample and just say what kind of a sound it is? (Assume there's a database containing a lot of sound samples to refer to). The input sound samples can be a bit noisy (microphone input).


2) I assume that the first step is audio feature extraction. This article suggests extracting MFCCs and feeding them to a machine learning algorithm. Is MFCC enough? Are there any other features that are generally used for sound classification?



Thank you for your time.



Answer




  1. Yes, it is definitely doable; to what extent, you will see. Environmental sound classification is not very well studied. The choice of machine learning paradigm is also crucial: a statistical approach, or perhaps a binary classifier? You can start with GMMs, ANNs and SVMs; I would opt for GMMs and ANNs.

  2. Yes, most people use MFCCs because they correlate well with what people actually hear, and no one has come up with anything clearly better since. You might also want to add extra features such as MPEG-7 descriptors. Proper feature optimisation must be performed, because sometimes you don't need that many features, especially when they do not separate the classes well. For more info, please refer to my previous answers:



Feature extraction from spectrum


MFCC extraction


Detection of sounds
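
To make the MFCC-plus-statistical-model idea above concrete, here is a minimal sketch in Python with NumPy. It is an illustration under several assumptions, not a reference implementation: the MFCC routine is simplified (no pre-emphasis, no liftering, no delta features), each class is modelled with a single diagonal Gaussian standing in for a full GMM, and the two "sounds" are synthetic tones in noise. The names `mfcc`, `fit_gaussian`, `synth`, `classify`, `low_hum` and `high_whine` are hypothetical and chosen only for this example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):       # rising slope
            bank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling slope
            bank[i - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc(signal, sample_rate, n_fft=512, hop=256, n_filters=26, n_coeffs=13):
    """Simplified MFCCs; returns an array of shape (n_frames, n_coeffs)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hamming(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    # Log mel-band energies; the small constant avoids log(0).
    log_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalised DCT-II decorrelates the log energies.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * np.arange(n_filters) + 1)
                   / (2 * n_filters))
    return log_energy @ basis.T

def fit_gaussian(features):
    """One diagonal Gaussian per class (assumption: stands in for a GMM)."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(features, model):
    mean, var = model
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)))

def synth(freq, rng, sample_rate=16000, seconds=0.5):
    """Synthetic stand-in for a recording: a sine tone plus white noise."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return np.sin(2 * np.pi * freq * t) + 0.3 * rng.standard_normal(t.size)

SR = 16000
rng = np.random.default_rng(0)
# "Training": one model per (made-up) sound class.
models = {
    "low_hum": fit_gaussian(mfcc(synth(200.0, rng), SR)),
    "high_whine": fit_gaussian(mfcc(synth(3000.0, rng), SR)),
}

def classify(signal):
    """Pick the class whose model gives the highest total frame log-likelihood."""
    feats = mfcc(signal, SR)
    return max(models, key=lambda name: log_likelihood(feats, models[name]))

print(classify(synth(220.0, rng)))
print(classify(synth(2900.0, rng)))
```

With real recordings you would replace `synth` with audio loaded from files, train on many clips per class, and most likely use a proper GMM or a neural network rather than a single Gaussian. The overall shape of the pipeline (frame the signal, extract MFCCs, score the frames against per-class models) stays the same.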



