fft - How to distinguish voice from snoring?

Wednesday, September 5, 2018

Background: I'm working on an iPhone application (alluded to in several other posts) that "listens to" snoring/breathing while one is asleep and determines if there are signs of sleep apnea (as a pre-screen for "sleep lab" testing). The application principally employs "spectral difference" to detect snores/breaths, and it works quite well (ca. 0.85-0.90 correlation) when tested against sleep lab recordings (which are actually quite noisy).

Problem: Most "bedroom" noise (fans, etc) I can filter out through several techniques, and often reliably detect breathing at S/N levels where the human ear cannot detect it. The problem is voice noise. It's not unusual to have a television or radio running in the background (or to simply have someone talking in the distance), and the rhythm of voice closely matches breathing/snoring. In fact, I ran a recording of the late author/storyteller Bill Holm through the app and it was essentially indistinguishable from snoring in rhythm, level variability, and several other measures. (Though I can say that apparently he didn't have sleep apnea, at least not while awake.)

So this is a bit of a long shot (and probably a stretch of forum rules), but I'm looking for some ideas on how to distinguish voice. We don't need to filter the snores out somehow (though that would be nice); rather, we just need a way to reject as "too noisy" any sound that is overly polluted with voice.

Any ideas?

Files published: I've placed some files on dropbox.com:

The first is a rather random piece of rock (I guess) music, and the second is a recording of the late Bill Holm speaking. Both (which I use as my samples of "noise" to be differentiated from snoring) have been mixed with noise to obfuscate the signal somewhat. (This makes the task of identifying them significantly more difficult.) The third file is ten minutes of a recording of yours truly, where the first third is mostly breathing, the middle third is mixed breathing/snoring, and the final third is fairly steady snoring. (You get a cough for a bonus.)

All three files have been renamed from ".wav" to "_wav.dat", since many browsers make it maddeningly difficult to download wav files. Just rename them back to ".wav" after downloading.

Update: I thought entropy was "doing the trick" for me, but it turned out to mostly be peculiarities of the test cases I was using, plus an algorithm that wasn't too well designed. In the general case entropy is doing very little for me.
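For readers who want to see what an entropy measure of this kind looks like, here is a minimal sketch of spectral entropy for a single frame. It is an illustration only, not the algorithm used in the app: the function names, the 64-sample frame, and the naive DFT (chosen to keep the example dependency-free) are all assumptions.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short demo frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to the range 0..1."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: no distribution to measure
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return entropy / math.log2(len(power))

# Hypothetical test frames: a pure tone (energy in one bin) and white noise.
random.seed(0)
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
noise = [random.gauss(0.0, 1.0) for _ in range(64)]
tone_entropy = spectral_entropy(tone)
noise_entropy = spectral_entropy(noise)
```

A tonal frame scores near 0 and a noise-like frame scores near 1. Snoring, breathing, and speech all fall somewhere in between, which may be part of why the measure separated them poorly in the general case.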

I subsequently tried a technique where I compute the FFT (using several different window functions) of the overall signal magnitude (I tried power, spectral flux, and several other measures) sampled about 8 times a second (taking the stats from the main FFT cycle, which runs every 1024/8000 seconds). With 1024 samples this covers a time range of about two minutes. I was hoping that I would be able to see patterns in this due to the slow rhythm of snoring/breathing vs voice/music (and that it might also be a better way to address the "variability" issue), but while there are hints of a pattern here and there, there's nothing I can really latch onto.
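To make the arithmetic concrete, the sketch below works out the envelope frame rate, the time span, and the frequency resolution from the figures given above, then takes the FFT of a synthetic envelope and reports its strongest modulation frequency. The helper names, the Hann window, the small recursive FFT, and the 0.2 Hz test envelope are assumptions for illustration, not the app's code.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000   # implied by the 1024/8000 s main FFT cycle
HOP_SAMPLES = 1024      # one envelope value per main FFT cycle
ENVELOPE_LEN = 1024     # envelope values fed to the second FFT

FRAME_RATE_HZ = SAMPLE_RATE_HZ / HOP_SAMPLES   # 7.8125 envelope values per second
SPAN_SECONDS = ENVELOPE_LEN / FRAME_RATE_HZ    # 131.072 s, a bit over two minutes
RESOLUTION_HZ = FRAME_RATE_HZ / ENVELOPE_LEN   # roughly 0.0076 Hz per bin

def fft(values):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def dominant_modulation_hz(envelope, frame_rate_hz=FRAME_RATE_HZ):
    """Frequency of the largest non-DC peak in the envelope spectrum."""
    n = len(envelope)
    mean = sum(envelope) / n
    # Remove the mean and apply a Hann window to limit leakage.
    windowed = [(v - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, v in enumerate(envelope)]
    magnitudes = [abs(c) for c in fft(windowed)[1:n // 2]]
    peak_bin = 1 + magnitudes.index(max(magnitudes))
    return peak_bin * frame_rate_hz / n

# Synthetic envelope: a level that rises and falls 0.2 times per second.
envelope = [1.0 + 0.5 * math.sin(2 * math.pi * 0.2 * i / FRAME_RATE_HZ)
            for i in range(ENVELOPE_LEN)]
peak_hz = dominant_modulation_hz(envelope)
```

With a clean synthetic envelope the peak lands on the bin nearest 0.2 Hz. Real recordings are far less tidy, which is the difficulty described above.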

(Further info: For some cases the FFT of signal magnitude produces a very distinct pattern with a strong peak at about 0.2Hz and stairstep harmonics. But the pattern is not nearly so distinct most of the time, and voice and music can generate less distinct versions of a similar pattern. There might be some way to calculate a correlation value for a figure of merit, but it seems that would require curve fitting to about a 4th order polynomial, and doing that once a second in a phone seems impractical.)

I also attempted to do the same FFT of average amplitude for the 5 individual "bands" I've divided the spectrum into. The bands are 4000-2000 Hz, 2000-1000 Hz, 1000-500 Hz, and 500-0 Hz. The pattern for the first 4 bands was generally similar to the overall pattern (though there was no real "stand-out" band, and often vanishingly small signal in the higher frequency bands), but the 500-0 Hz band generally was just random.
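As a rough illustration of the band split, the sketch below averages FFT bin magnitudes into the band edges listed above. Only four edge pairs are listed, so only four are used here. The function name, the half-open band convention, and the 8000 Hz sample rate (taken from the 1024/8000 s cycle) are assumptions.

```python
import math

# Band edges in Hz as (low, high), in the order listed in the post.
BANDS_HZ = [(2000, 4000), (1000, 2000), (500, 1000), (0, 500)]

def band_averages(magnitudes, sample_rate_hz=8000, bands=BANDS_HZ):
    """Average magnitude per band; magnitudes holds bins 0..N/2 of an N-point FFT."""
    n_fft = 2 * (len(magnitudes) - 1)
    bin_hz = sample_rate_hz / n_fft
    nyquist_hz = sample_rate_hz / 2
    averages = []
    for low, high in bands:
        first = math.ceil(low / bin_hz)
        # Bands are half-open [low, high), except that the top band keeps the Nyquist bin.
        end = len(magnitudes) if high >= nyquist_hz else math.ceil(high / bin_hz)
        selected = magnitudes[first:end]
        averages.append(sum(selected) / len(selected) if selected else 0.0)
    return averages

# Hypothetical frame: a 1024-point FFT with all energy in bin 100 (781.25 Hz).
magnitudes = [0.0] * 513
magnitudes[100] = 256.0
per_band = band_averages(magnitudes)
```

Collecting one such list per FFT cycle gives a separate envelope for each band, and each envelope can then be run through the same slow FFT as the overall magnitude.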

Bounty: I'm going to give Nathan the bounty, even though he's not offered anything new, given that his was the most productive suggestion to date. I still have a few points I'd be willing to award to someone else, though, if they came through with some good ideas.
