There is a recording that contains a very noisy voice (speech with a low signal-to-noise ratio).
It is the female voice of a stationary speaker (right in front of the microphone), and there are many other recordings of the same speaker with much better quality.
What can be done to denoise the audio?
It might be possible to extract statistics from the other recordings that would let a model separate the speaker's voice from the noise.
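One conventional way to turn "statistics from the clean recordings" into a filter is a Wiener-style gain: in each frequency bin, keep the fraction S / (S + N) of the signal, where S is the speech power spectrum and N is the noise power spectrum. The numpy sketch below rests on several assumptions that are not in the question. S is the long-term average spectrum of the clean recordings, already scaled to the speech level on the noisy tape. N comes from a speech-free stretch of the noisy tape. A single fixed gain is used for the whole file, so this behaves like a static equalizer; practical Wiener or spectral-subtraction denoisers re-estimate the gain frame by frame. All function names and the frame length are illustrative. It also cannot restore anything that is no longer on the tape.

```python
import numpy as np

N_FFT = 1024          # frame length in samples (illustrative choice)
HOP = N_FFT // 2      # 50% overlap


def periodic_hann(n):
    """Periodic Hann window: copies shifted by n // 2 add up to exactly 1."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def average_power_spectrum(x, n_fft=N_FFT, hop=HOP):
    """Long-term power spectrum: mean |FFT|^2 over windowed frames of x."""
    w = periodic_hann(n_fft)
    frames = np.array([x[i:i + n_fft] * w
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)


def wiener_gain(speech_psd, noise_psd, floor=0.05):
    """Per-bin Wiener gain S / (S + N), clipped so no bin is muted completely."""
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(gain, floor)


def apply_gain(noisy, gain, n_fft=N_FFT, hop=HOP):
    """Multiply every frame's spectrum by the same gain and overlap-add the frames.

    Only samples covered by two frames are reconstructed correctly; the first
    hop and the tail of the signal come out windowed or zero in this sketch.
    """
    w = periodic_hann(n_fft)
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - n_fft + 1, hop):
        spectrum = np.fft.rfft(noisy[i:i + n_fft] * w)
        out[i:i + n_fft] += np.fft.irfft(spectrum * gain, n=n_fft)
    return out
```

Hypothetical usage: noise_psd = average_power_spectrum(noise_only_stretch), speech_psd = level_match * average_power_spectrum(clean_recording), then cleaned = apply_gain(noisy, wiener_gain(speech_psd, noise_psd)). The names noise_only_stretch, clean_recording and level_match are placeholders for data you would have to prepare yourself.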
Existing questions only address snoring, involve non-random noise, aim to identify a specific artifact, or deal with a moving speaker. The question is-it-possible-to-filter-out-a-persons-voice-out-of-100-of-other-voices is similar, but its answer lacks detail and is more about distinguishing voices than extracting one.
Some of the tapes are direct copies; others are copies of copies that were also recorded at a lower tape speed (4.75 cm/s). This degraded the quality before digitization.
Answer
I am afraid there's not much you can do. The voice seems to have gone through the equivalent of a low-pass filter with a cutoff around 1000 Hz. Essentially, all of the speech components above 1000 Hz are gone.
The filtering need not have been intentional; it may have been caused by improper tape bias during recording. If it is an old tape, it may simply have deteriorated over time. The playback head may also have needed degaussing.
Running it through a high-pass filter flattens the frequency response, but it pushes the level down so far that the signal drowns in the noise, so that is no help.
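Flattening the response by filtering cannot help in principle. Any equalizer is linear, so within a given frequency band it scales the surviving speech and the tape noise by the same factor, and the in-band SNR stays exactly where it was. The numpy sketch below checks this on synthetic data. A weak 2 kHz tone stands in for the remaining treble and white noise stands in for the hiss. The FFT-domain "equalizer", the 1 to 4 kHz band, and the 30 dB boost are illustrative assumptions, not what the answer actually ran.

```python
import numpy as np


def band_mask(n, fs, lo, hi):
    """Boolean mask selecting the rfft bins of an n-sample signal in [lo, hi) Hz."""
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return (freqs >= lo) & (freqs < hi)


def band_snr_db(speech, noise, fs, lo, hi):
    """SNR in dB of `speech` against `noise`, counting only energy in [lo, hi) Hz."""
    band = band_mask(len(speech), fs, lo, hi)
    s = np.sum(np.abs(np.fft.rfft(speech))[band] ** 2)
    n = np.sum(np.abs(np.fft.rfft(noise))[band] ** 2)
    return 10 * np.log10(s / n)


def boost_band(x, fs, lo, hi, gain_db):
    """Crude FFT-domain equalizer: scale the [lo, hi) Hz band of x by gain_db."""
    spectrum = np.fft.rfft(x)
    spectrum[band_mask(len(x), fs, lo, hi)] *= 10 ** (gain_db / 20)
    return np.fft.irfft(spectrum, n=len(x))


# Synthetic stand-ins: a weak 2 kHz tone for the surviving treble, white noise for the hiss.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
speech = 0.01 * np.sin(2 * np.pi * 2000 * t)
noise = 0.1 * rng.standard_normal(fs)

# An equalizer acts on the whole recording, so speech and noise get the same boost.
before = band_snr_db(speech, noise, fs, 1000, 4000)
after = band_snr_db(boost_band(speech, fs, 1000, 4000, 30.0),
                    boost_band(noise, fs, 1000, 4000, 30.0), fs, 1000, 4000)
print(f"in-band SNR before: {before:.2f} dB, after a 30 dB boost: {after:.2f} dB")
```

The same argument applies to cutting the bass instead of boosting the treble. Only the overall balance changes; the ratio inside each band does not.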
The best result I got came from combining a very steep low-pass filter with a 1000 Hz cutoff and a very steep high-pass filter with a 160 Hz cutoff. That removes the noise outside the 160 to 1000 Hz band by passing only what is left of the actual speech, but obviously it can't recover what was lost.
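The answer does not say which tool or filter design was used. As a stand-in, the sketch below implements the "steepest possible" version of that band-pass: it transforms the whole file with numpy's FFT, zeros every bin outside 160 to 1000 Hz, and transforms back. A brick-wall mask like this can cause audible ringing on transients. A more conventional choice is a high-order Butterworth band-pass, for example scipy.signal.butter(..., btype='bandpass', output='sos') applied with scipy.signal.sosfiltfilt. The function name and the three test tones are illustrative.

```python
import numpy as np


def brickwall_bandpass(x, fs, low_hz=160.0, high_hz=1000.0):
    """Keep only the FFT bins between low_hz and high_hz (inclusive); zero the rest."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


fs = 8000
t = np.arange(fs) / fs                  # one second, so integer frequencies fall on FFT bins
hum = np.sin(2 * np.pi * 100 * t)       # below the 160 Hz cutoff: removed
voice = np.sin(2 * np.pi * 400 * t)     # inside the kept band: passed
hiss = np.sin(2 * np.pi * 3000 * t)     # above the 1000 Hz cutoff: removed
y = brickwall_bandpass(hum + voice + hiss, fs)
print("max deviation from the 400 Hz tone:", np.max(np.abs(y - voice)))
```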
Your real problem is not the noise, it is the lost frequency range.
This is the spectrum of the bad recording:
This is the spectrum of the good recording:
As you can see, there's a lot of stuff missing from the bad recording. So, it isn't simply a problem of removing noise. The problem is that there's stuff that's just GONE.
Look at the range from 1000 Hz to 7000 Hz. There's a lot of content there in the good recording, but in the bad one it is just a flat spectrum a good 30 dB below the voice peaks around 400 Hz.
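The 30 dB figure is read off the two spectrum plots. A rough way to put a number on the same comparison for any excerpt is to average its power spectrum over many frames and measure how far the mean level in 1000 to 7000 Hz sits below the strongest peak near 400 Hz. The numpy sketch below assumes a sample rate of at least 14 kHz (so that 7 kHz is representable). The 200 to 700 Hz "peak" band, the frame length and the function names are illustrative choices, and the test signals are synthetic.

```python
import numpy as np


def averaged_spectrum(x, fs, n_fft=2048):
    """Welch-style estimate: mean |FFT|^2 over 50%-overlapping Hann-windowed frames."""
    w = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, n_fft // 2)
    frames = np.array([x[s:s + n_fft] * w for s in starts])
    psd = np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)
    return psd, np.fft.rfftfreq(n_fft, d=1.0 / fs)


def band_gap_db(x, fs, peak_band=(200.0, 700.0), hf_band=(1000.0, 7000.0)):
    """dB by which the mean level in hf_band lies below the strongest bin in peak_band."""
    psd, freqs = averaged_spectrum(x, fs)
    in_peak = (freqs >= peak_band[0]) & (freqs <= peak_band[1])
    in_hf = (freqs >= hf_band[0]) & (freqs <= hf_band[1])
    return 10 * np.log10(psd[in_peak].max() / psd[in_hf].mean())


# Synthetic check: a strong 400 Hz "voice" over a faint white-noise floor, versus flat noise.
fs = 16000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(2)
muffled = np.sin(2 * np.pi * 400 * t) + 0.01 * rng.standard_normal(t.size)
flat = rng.standard_normal(t.size)
print(f"muffled: {band_gap_db(muffled, fs):.1f} dB, flat noise: {band_gap_db(flat, fs):.1f} dB")
```

Running band_gap_db on speech excerpts of the good and the bad recording, at the same sample rate, gives one number per recording to compare.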
Some of what's missing might be buried in the noise, but trying to recover it would create artifacts that are worse than the noise and the muffled sound.
Looking at the noise alone, there doesn't seem to be much to recover from it. Its spectrum looks just like that of the portions with speech (except between 160 Hz and 1000 Hz), so anything that is in there is going to be buried really deep.
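That comparison can be made band by band. Average the power spectrum over a noise-only stretch and over the speech passages, then see how many dB the speech passages rise above the noise in each band. Where the excess is close to 0 dB, there is essentially nothing left to recover. The numpy sketch below assumes you can cut those two kinds of excerpts from the same recording. The band edges, frame length and function names are illustrative choices, and a synthetic 400 Hz tone plus white noise stands in for the real tape.

```python
import numpy as np


def averaged_spectrum(x, fs, n_fft=2048):
    """Welch-style estimate: mean |FFT|^2 over 50%-overlapping Hann-windowed frames."""
    w = np.hanning(n_fft)
    starts = range(0, len(x) - n_fft + 1, n_fft // 2)
    frames = np.array([x[s:s + n_fft] * w for s in starts])
    psd = np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)
    return psd, np.fft.rfftfreq(n_fft, d=1.0 / fs)


def band_excess_db(with_speech, noise_only, fs, edges):
    """For each band [edges[i], edges[i+1]) Hz, dB of speech passages above the noise."""
    s, freqs = averaged_spectrum(with_speech, fs)
    n, _ = averaged_spectrum(noise_only, fs)
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        result.append(10 * np.log10(s[band].mean() / n[band].mean()))
    return result


# Synthetic stand-in: speech energy only between 160 and 1000 Hz, same noise everywhere.
fs = 16000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(3)
noise_only = 0.01 * rng.standard_normal(t.size)
with_speech = np.sin(2 * np.pi * 400 * t) + 0.01 * rng.standard_normal(t.size)
edges = [0, 160, 1000, 4000, 8000]
excess = band_excess_db(with_speech, noise_only, fs, edges)
for lo, hi, db in zip(edges[:-1], edges[1:], excess):
    print(f"{lo:>5} to {hi:>5} Hz: {db:+.1f} dB above the noise")
```

In this synthetic case only the 160 to 1000 Hz band stands out; the other bands sit within a fraction of a dB of the noise, which is the pattern described above.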