Sunday, February 12, 2017

frequency spectrum - Which transform most closely mimics the human auditory system?


The Fourier transform is commonly used for frequency analysis of sounds. However, it has some disadvantages when it comes to analyzing the human perception of sound. For example, its frequency bins are linearly spaced, whereas the human ear responds to frequency roughly logarithmically.
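To make the linear-versus-logarithmic mismatch concrete, here is a minimal sketch that counts how many linearly spaced DFT bins land in each octave. The sampling rate, FFT length, and octave edges are illustrative assumptions, not values from the question.

```python
import numpy as np

fs = 16000      # sampling rate in Hz (assumed for illustration)
n_fft = 1024    # FFT length (assumed for illustration)

# Bin k of a length-n_fft DFT sits at k * fs / n_fft Hz: a linear grid.
freqs = np.arange(n_fft // 2 + 1) * fs / n_fft

# Octave edges at 125, 250, ..., 8000 Hz (each octave doubles in width).
octave_edges = 125.0 * 2.0 ** np.arange(7)

# Count the DFT bins whose centre falls inside each octave [lo, hi).
bins_per_octave = [
    int(np.sum((freqs >= lo) & (freqs < hi)))
    for lo, hi in zip(octave_edges[:-1], octave_edges[1:])
]

for lo, hi, n in zip(octave_edges[:-1], octave_edges[1:], bins_per_octave):
    print(f"{lo:6.0f} to {hi:6.0f} Hz: {n} bins")
```

Each octave receives twice as many bins as the one below it, so a linear grid spends most of its bins on the top octaves, while the low octaves are covered only coarsely.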


Wavelet transforms can modify the resolution for different frequency ranges, unlike the Fourier transform. The wavelet transform’s properties allow large temporal supports for lower frequencies while maintaining short temporal widths for higher frequencies.


The Morlet wavelet is closely related to human perception of hearing. It can be applied to music transcription and produces very accurate results that are not possible using Fourier transform techniques. It is capable of capturing short bursts of repeating and alternating music notes with a clear start and end time for each note.


The constant-Q transform (closely related to the Morlet wavelet transform) is also well suited to musical data. As the output of the transform is effectively amplitude/phase against log frequency, fewer spectral bins are required to cover a given range effectively, and this proves useful when frequencies span several octaves.


The transform's frequency resolution decreases toward higher-frequency bins, which is desirable for auditory applications. This mirrors the human auditory system, in which spectral resolution is better at lower frequencies, whereas temporal resolution improves at higher frequencies.
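The trade-off described for the constant-Q transform can be sketched numerically. The snippet below assumes the usual geometric spacing of centre frequencies, f_k = f_min * 2^(k/B), with a constant ratio Q between centre frequency and bandwidth. The function name `constant_q_grid` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def constant_q_grid(f_min, n_octaves, bins_per_octave, fs):
    """Centre frequencies, bandwidths and window lengths of a constant-Q grid."""
    k = np.arange(n_octaves * bins_per_octave)
    centres = f_min * 2.0 ** (k / bins_per_octave)        # geometric spacing
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)      # centre / bandwidth
    bandwidths = centres / q                              # widen with frequency
    # Analysis window length in samples: about q periods of each centre frequency.
    window_lengths = np.ceil(q * fs / centres).astype(int)
    return centres, bandwidths, window_lengths, q

# Assumed example: 7 octaves from 55 Hz, 12 bins per octave, 44.1 kHz audio.
centres, bandwidths, window_lengths, q = constant_q_grid(55.0, 7, 12, 44100)

# Number of linearly spaced bins needed to match the finest (lowest) bandwidth
# over the same frequency range.
n_linear = int(np.ceil((centres[-1] - centres[0]) / bandwidths[0]))

print(len(centres), "constant-Q bins versus", n_linear, "linear bins")
print("window length, lowest bin:", window_lengths[0], "samples")
print("window length, highest bin:", window_lengths[-1], "samples")
```

The bandwidths grow in proportion to the centre frequency (coarser frequency resolution at the top), while the window lengths shrink (finer time resolution at the top), which is the behaviour the paragraph above describes.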


My question is this: Are there other transforms which closely mimic the human auditory system? Has anyone attempted to design a transform that anatomically/neurologically matches the human auditory system as closely as possible?


For example, it is known that human ears have a logarithmic response to sound intensity. It is also known that equal-loudness contours vary not only with intensity, but with the spacing in frequency of spectral components. Sounds containing spectral components in many critical bands are perceived as louder even if the total sound pressure remains constant.



Finally, the human ear has a frequency-dependent limited temporal resolution. Perhaps this could be taken into account as well.



Answer



In designing such transformations, one should take into account competing interests:



  • fidelity to the human auditory system (which varies from person to person), including non-linear or even chaotic aspects (tinnitus)

  • simplicity of the mathematical formulation for the analysis part

  • ability to discretize it and to allow fast implementations

  • existence of a suitable stable inverse


Two recent designs have caught my ear:

Auditory-motivated Gammatone wavelet transform, Signal Processing, 2014




The ability of the continuous wavelet transform (CWT) to provide good time and frequency localization has made it a popular tool in time–frequency analysis of signals. Wavelets exhibit constant-Q property, which is also possessed by the basilar membrane filters in the peripheral auditory system. The basilar membrane filters or auditory filters are often modeled by a Gammatone function, which provides a good approximation to experimentally determined responses. The filterbank derived from these filters is referred to as a Gammatone filterbank. In general, wavelet analysis can be likened to a filterbank analysis and hence the interesting link between standard wavelet analysis and Gammatone filterbank. However, the Gammatone function does not exactly qualify as a wavelet because its time average is not zero. We show how bona fide wavelets can be constructed out of Gammatone functions. We analyze properties such as admissibility, time-bandwidth product, vanishing moments, which are particularly relevant in the context of wavelets. We also show how the proposed auditory wavelets are produced as the impulse response of a linear, shift-invariant system governed by a linear differential equation with constant coefficients. We propose analog circuit implementations of the proposed CWT. We also show how the Gammatone-derived wavelets can be used for singularity detection and time–frequency analysis of transient signals.
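The abstract's remark that a Gammatone function has a nonzero time average can be checked numerically. This is only a sketch: it assumes the common form g(t) = t^(n-1) exp(-2πbt) cos(2πf_c t) for t ≥ 0, with order n = 4, and the centre frequency and bandwidth values are arbitrary. It does not reproduce the paper's wavelet construction.

```python
import numpy as np

def gammatone(t, fc, order=4, bandwidth=125.0):
    """Unnormalised gammatone impulse response, defined for t >= 0."""
    t = np.asarray(t, dtype=float)
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * bandwidth * t)
    return envelope * np.cos(2.0 * np.pi * fc * t)

fc, b = 1000.0, 125.0                 # assumed centre frequency and bandwidth (Hz)
dt = 1.0 / 200000.0                   # fine time step for the numerical integral
t = np.arange(0.0, 0.1, dt)           # the response has decayed long before 0.1 s
g = gammatone(t, fc, order=4, bandwidth=b)

# Numerical time integral (Riemann sum) of the response.
numeric_integral = np.sum(g) * dt

# Closed form for order 4: the integral of t^3 exp(-s t) over t >= 0 is 3!/s^4,
# with s = a - i w and cos(w t) = Re(exp(i w t)).
a, w = 2.0 * np.pi * b, 2.0 * np.pi * fc
analytic_integral = (6.0 / (a - 1j * w) ** 4).real

# Size of the integral relative to the integral of |g|.
relative_dc = numeric_integral / (np.sum(np.abs(g)) * dt)
print(numeric_integral, analytic_integral, relative_dc)
```

The integral is small but not zero, so under these assumptions the function fails the zero-mean condition required of a wavelet, which is the gap the paper sets out to close.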



The ERBlet transform: An auditory-based time-frequency representation with perfect reconstruction, ICASSP 2013



This paper describes a method for obtaining a perceptually motivated and perfectly invertible time-frequency representation of a sound signal. Based on frame theory and the recent non-stationary Gabor transform, a linear representation with resolution evolving across frequency is formulated and implemented as a non-uniform filterbank. To match the human auditory time-frequency resolution, the transform uses Gaussian windows equidistantly spaced on the psychoacoustic “ERB” frequency scale. Additionally, the transform features adaptable resolution and redundancy. Simulations showed that perfect reconstruction can be achieved using fast iterative methods and preconditioning even using one filter per ERB and a very low redundancy (1.08). Comparison with a linear gammatone filterbank showed that the ERBlet approximates well the auditory time-frequency resolution.
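To illustrate what "equidistantly spaced on the ERB scale" means, here is a small sketch. It assumes the widely used Glasberg and Moore formulas for the ERB bandwidth and ERB-number scale, which the abstract does not spell out. The function names and the 50 Hz to 8 kHz range are hypothetical, and this is not the ERBlet implementation itself.

```python
import numpy as np

def hz_to_erb_number(f_hz):
    """Frequency in Hz to ERB-number (Glasberg and Moore formula, assumed)."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_number_to_hz(erb):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (np.asarray(erb, dtype=float) / 21.4) - 1.0) / 0.00437

def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz."""
    return 24.7 * (4.37 * np.asarray(f_hz, dtype=float) / 1000.0 + 1.0)

def erb_spaced_centres(f_min, f_max, filters_per_erb=1.0):
    """Centre frequencies equally spaced on the ERB-number scale."""
    lo, hi = hz_to_erb_number(f_min), hz_to_erb_number(f_max)
    n = int(np.floor((hi - lo) * filters_per_erb)) + 1
    return erb_number_to_hz(lo + np.arange(n) / filters_per_erb)

# Assumed range: 50 Hz to 8 kHz with one filter per ERB.
erb_centres = erb_spaced_centres(50.0, 8000.0, filters_per_erb=1.0)
spacing = np.diff(erb_centres)
ratio = spacing / erb_bandwidth_hz(erb_centres[:-1])

print(len(erb_centres), "centre frequencies")
print("spacing / ERB ranges from", ratio.min(), "to", ratio.max())
```

With one filter per ERB, the gap between neighbouring centre frequencies stays close to the local ERB, so the filters are narrow and dense at low frequencies and progressively wider and sparser at high frequencies.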



I should also mention:


An Auditory-Based Transform For Audio Signal Processing, WASPAA 2009




An auditory-based transform is presented in this paper. Through an analysis process, the transform converts time-domain signals into a set of filter bank output. The frequency responses and distributions of the filter bank are similar to those in the basilar membrane of the cochlea. Signal processing can be conducted in the decomposed signal domain. Through a synthesis process, the decomposed signals can be synthesized back to the original signal through a simple computation. Also, fast algorithms for discrete-time signals are presented for both the forward and inverse transforms. The transform has been approved in theory and validated in experiments. An example on noise reduction application is presented. The proposed transform is robust to background and computational noises and is free from pitch harmonics. The derived fast algorithm can also be used to compute continuous wavelet transform.


