Thursday, September 7, 2017

Multi-channel audio upsampling interpolation


I have a four-channel audio signal from a tetrahedral microphone array. I wish to upsample it from 48 kHz to 240 kHz.


Is there a preferred interpolation method for audio? Does cubic interpolation (or any other) have any advantages over linear for the specific case of audio?


Assuming I am using cubic interpolation, do I interpolate each channel separately or is there any benefit in using a bicubic interpolation over all four channels?



Answer





Does cubic interpolation (or any other) have any advantages over linear for the specific case of audio?



You'd use neither for audio. The reason is simple: the signal models we typically assume for audio signals are very "Fourier-y", that is to say, they assume that sound is composed of weighted harmonic oscillations and is inherently bandlimited.


Neither linear interpolation nor cubic interpolation respects that.


Instead, you'd use a resampler whose anti-imaging / anti-aliasing filter is a good low-pass filter.


Let's take a step back:


When we have a signal that is discrete in time, i.e. that has been sampled on a regular lattice of time instants, its spectrum is periodic: it repeats every $f_s$ (the sampling frequency).
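In formulas, as a standard-textbook sketch (the symbols $x(t)$ for the analog signal, $X(f)$ for its spectrum and $T_s = 1/f_s$ are introduced here for illustration): the spectrum $S(f)$ of the sampled signal is a sum of copies of $X(f)$ shifted by multiples of $f_s$, which is exactly why it repeats every $f_s$:

$$S(f) = \sum_{n=-\infty}^{\infty} x(nT_s)\, e^{-j 2\pi f n T_s} = f_s \sum_{k=-\infty}^{\infty} X(f - k f_s), \qquad S(f + f_s) = S(f)$$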


Of course, we rarely look at it this way: since our sampling can only represent a bandwidth of $f_s/2$, we typically draw the spectrum only from 0 to $f_s/2$, for example:


[Figure: sketch of the magnitude spectrum S(f) of the sampled signal, drawn only from 0 to f_s/2]

In reality, though, for real-valued signals the (magnitude) spectrum is symmetric about $f=0$:



[Figure: the same spectrum S(f), now drawn from -f_s/2 to f_s/2, mirror-symmetric about f = 0]
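Stated as a formula (a standard property of the Fourier transform, written with the $S(f)$ of the sketches): a real-valued signal has a conjugate-symmetric spectrum, so its magnitude is mirror-symmetric about $f = 0$:

$$x(t) \in \mathbb{R} \;\Longrightarrow\; S(-f) = S^{*}(f) \;\Longrightarrow\; |S(-f)| = |S(f)|$$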


But because the signal was effectively multiplied with a "sampling instant impulse train", its spectrum repeats infinitely to both sides; we typically only "see" the first Nyquist zone (marked by :):


[Figure: S(f) repeating periodically to both sides (…); the first Nyquist zone, from -f_s/2 to f_s/2, is marked by :, and the next copy of the spectrum is centered on f_s]


When we increase the sampling rate, we "just" widen the observed frequency range. For example:


[Figure: the same periodic spectrum S(f), with the new, higher Nyquist frequency "new f_s/2" marked by : between f_s/2 and f_s; the part of the spectral copy around f_s that lies below new f_s/2 now falls inside the observed band]

Try that! Take an audio file and let the tool of your liking show you its spectrum. Then insert a $0$ after every sample, save the result as a new audio file at twice the original sampling rate (Python works very well for such experiments), and display its spectrum. On the left, you'll see the positive half of the original spectrum, and on the right, its mirror image!
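A minimal Python sketch of that experiment, using NumPy. Instead of reading an audio file, it assumes a synthetic one-second test signal at 48 kHz made of two tones (1 kHz and 5 kHz, arbitrary choices), so the peaks are easy to check:

```python
import numpy as np

fs = 48_000                                   # assumed original sampling rate
n = np.arange(fs)                             # one second of samples
# Synthetic stand-in for an audio file: tones at 1 kHz and 5 kHz
x = np.sin(2 * np.pi * 1_000 * n / fs) + 0.5 * np.sin(2 * np.pi * 5_000 * n / fs)

# Insert a zero after every sample; the result must be played/analyzed at 2*fs
fs_new = 2 * fs
y = np.zeros(2 * len(x))
y[::2] = x

# One-sided magnitude spectrum of the zero-stuffed signal (1 Hz bin spacing)
Y = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / fs_new)

# The four strongest bins: the original tones plus their mirror images about fs/2
peaks = sorted(int(round(f)) for f in freqs[np.argsort(Y)[-4:]])
print(peaks)   # [1000, 5000, 43000, 47000]
```

To listen to the result or open it in an audio editor, you could write it with `scipy.io.wavfile.write(filename, fs_new, y.astype(np.float32))`; the filename is up to you.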


Now, to get rid of these images, you'd just low-pass filter to your original Nyquist bandwidth.


And that's really all a resampler does: change the sampling rate, and make sure repetitions and foldovers (aliases) don't appear in the output signal.


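If you're upsampling by an integer factor $N$ (say, 48 kHz -> 192 kHz, i.e. $N = 4$), you just insert $N-1$ zeros after every input sample and then low-pass filter with a passband gain of $N$ (zero-stuffing scales the baseband copy of the spectrum by $1/N$); it's really that simple.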
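A minimal sketch of integer-factor upsampling with NumPy and SciPy, assuming the question's 48 kHz → 240 kHz, which happens to be an integer factor ($N = 5$). The filter length (121 taps) and `firwin`'s default Hamming window are illustrative choices, not tuned values, and `upsample_integer` is a hypothetical helper name. The input is shaped (samples, channels), and the same filter runs along the time axis of each channel independently:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def upsample_integer(x, N, numtaps=121):
    """Hypothetical helper: upsample x (samples[, channels]) by integer N."""
    x = np.asarray(x, dtype=float)
    # 1) Zero-stuffing: insert N-1 zeros after every input sample
    y = np.zeros((len(x) * N,) + x.shape[1:])
    y[::N] = x
    # 2) Low-pass at the original Nyquist frequency. firwin normalizes the cutoff
    #    to the new Nyquist frequency, so the original f_s/2 sits at 1/N.
    #    The factor N restores the amplitude that zero-stuffing divided by N.
    h = N * firwin(numtaps, 1.0 / N)
    # Filter each channel along time; this adds (numtaps-1)/2 samples of delay
    return lfilter(h, 1.0, y, axis=0)

fs = 48_000
N = 5
t = np.arange(4800) / fs                      # 100 ms
tones = np.array([500, 1000, 1500, 2000])     # assumed: one test tone per channel
x = np.sin(2 * np.pi * t[:, None] * tones)    # shape (4800, 4)
y = upsample_integer(x, N)
print(y.shape)   # (24000, 4)
```

After the 60-sample filter delay, each output column follows its tone sampled at 240 kHz.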


In the ideal case, that filter's frequency response would be a rectangle: pass the original bandwidth unaltered and suppress everything outside it. A filter with a rectangular frequency response has an (infinitely long!) sinc-shaped impulse response, so that's what sinc interpolation is (and why it's pretty much as perfect as it gets).
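For reference, the textbook form of ideal sinc interpolation, with $x[n]$ the samples taken at rate $f_s$ and the normalized sinc. For integer upsampling by $N$, sampling the same kernel at the new rate gives the discrete-time filter $h[m]$, which has gain $N$ below the original $f_s/2$ and zero above it; since $h[0] = 1$ and $h[kN] = 0$ for $k \neq 0$, the original samples pass through unchanged:

$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}(f_s t - n), \qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}$$

$$h[m] = \operatorname{sinc}\!\left(\frac{m}{N}\right), \qquad m \in \mathbb{Z}$$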


Since that sinc is infinitely long and your signal isn't, that isn't actually realizable. You can, however, use a truncated sinc interpolation.
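A minimal NumPy sketch of truncated sinc interpolation for an integer factor $N$, assuming the kernel is simply cut off after `K` input samples on each side (`K = 8` is an arbitrary choice; `truncated_sinc_upsample` is a hypothetical name). Because the kernel is 1 at its center and (numerically) 0 at every other multiple of $N$, the original samples survive unchanged:

```python
import numpy as np

def truncated_sinc_upsample(x, N, K=8):
    """Hypothetical helper: upsample by integer N with a sinc truncated to +/-K input samples."""
    m = np.arange(-K * N, K * N + 1)   # odd-length tap index, centered on 0
    h = np.sinc(m / N)                 # np.sinc(u) = sin(pi u)/(pi u): h = 1 at m = 0, ~0 at other multiples of N
    y = np.zeros(len(x) * N)
    y[::N] = x                         # zero-stuffing
    # mode="same" keeps the output aligned with y for a symmetric, odd-length kernel
    # (assumes len(y) >= len(h))
    return np.convolve(y, h, mode="same")

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = truncated_sinc_upsample(x, 4)
print(np.allclose(y[::4], x))   # True
```

Plain truncation like this is equivalent to a rectangular window on the sinc; multiplying `h` by a tapering window such as `np.kaiser(len(m), beta)` is the usual refinement.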


As a matter of fact, even that would be overkill: your original audio already has a low-pass characteristic, simply because of the anti-aliasing filter that is invariably needed before sampling the analog audio source (not to mention that high frequencies are inaudible anyway).


So you'd simply go with a "good enough" low-pass filter after inserting the zeros. That keeps the computational effort in check, and might even perform better than a truncated sinc: plain truncation amounts to a rectangular window, which causes ripple in both passband and stopband.
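One way to get such a "good enough" filter, sketched with SciPy's Kaiser-window design helpers. It assumes the question's 240 kHz output rate, that nothing above 20 kHz needs to be preserved, that images must be gone by 24 kHz, and an 80 dB stopband target; all of these numbers are assumptions to adjust. When used after zero-stuffing by $N$, the taps would additionally be scaled by $N$:

```python
import numpy as np
from scipy.signal import firwin, freqz, kaiserord

fs_new = 240_000                  # 48 kHz upsampled by 5, as in the question
f_pass, f_stop = 20_000, 24_000   # assumed: keep up to 20 kHz, images gone by 24 kHz
atten_db = 80.0                   # assumed stopband attenuation target

# Kaiser estimate of the length and window shape needed for this spec
nyq = fs_new / 2
numtaps, beta = kaiserord(atten_db, (f_stop - f_pass) / nyq)
h = firwin(numtaps, (f_pass + f_stop) / 2, window=("kaiser", beta), fs=fs_new)

# Check the design: gain in dB across passband and stopband
f, H = freqz(h, worN=8192, fs=fs_new)
gain_db = 20 * np.log10(np.abs(H) + 1e-12)
passband_dev = np.max(np.abs(gain_db[f <= f_pass]))   # worst deviation from 0 dB
stopband_max = np.max(gain_db[f >= f_stop])           # worst leakage of images
print(numtaps, round(passband_dev, 4), round(stopband_max, 1))
```

For this spec the estimate comes out at a few hundred taps; relaxing the attenuation or widening the transition band shortens the filter.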



Now, what if your problem is decidedly not an integer-factor interpolation? For example, 240000 / 44800 = 75/14 ≈ 5.36 is not an integer. So, what to do?


In this relatively benign case, I'd go for a rational resampler: first, we go up by an integer factor $N$, so that the resulting sampling rate is a multiple of the target sampling rate. We do the low-pass filtering as explained above, limiting the resulting signal to its original 44.8 kHz/2 = 22.4 kHz bandwidth, and then downsample by $M$, i.e. anti-aliasing filter it to the target 240 kHz/2 = 120 kHz bandwidth and then throw out $M-1$ of every $M$ samples.


It's really that easy!


In fact, we can simplify further: since the anti-imaging filter cuts off at 22.4 kHz and the anti-aliasing filter only at 120 kHz, the latter is redundant and can be eliminated, so the overall structure of a rational resampler becomes:


Upsampling -> core filter -> downsampling
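A direct (deliberately inefficient) NumPy/SciPy sketch of that structure for the 44.8 kHz → 240 kHz example: upsample by $N$, one core low-pass filter, downsample by $M$. The cutoff is the lower of the two Nyquist frequencies, as in the text; the filter length and Kaiser window (β = 5) are assumed choices, and `rational_resample` is a hypothetical name. At the intermediate 3.36 MHz rate this filter needs about 1500 taps, which is why efficient multirate structures matter in practice:

```python
import numpy as np
from math import gcd
from scipy.signal import firwin, fftconvolve

def rational_resample(x, fs_in, fs_out):
    """Hypothetical helper: resample a 1-D signal from fs_in to fs_out (integer rates)."""
    g = gcd(fs_in, fs_out)
    N, M = fs_out // g, fs_in // g             # 44800 -> 240000: N = 75, M = 14
    # Upsampling: insert N-1 zeros after every sample (intermediate rate N * fs_in)
    up = np.zeros(len(x) * N)
    up[::N] = x
    # Core filter: cutoff at the lower of the input/output Nyquist frequencies,
    # as a fraction of the intermediate Nyquist (firwin's default normalization)
    cutoff = 1.0 / max(N, M)
    numtaps = 20 * max(N, M) + 1               # odd length, so the delay is an integer
    h = N * firwin(numtaps, cutoff, window=("kaiser", 5.0))   # gain N restores amplitude
    delay = (numtaps - 1) // 2
    filtered = fftconvolve(up, h)[delay:delay + len(up)]      # compensate the filter delay
    # Downsampling: keep every M-th sample
    return filtered[::M]

fs_in, fs_out = 44_800, 240_000
x = np.sin(2 * np.pi * 1_000 * np.arange(2240) / fs_in)     # assumed 50 ms, 1 kHz test tone
y = rational_resample(x, fs_in, fs_out)
print(len(y))   # 12000
```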


(In fact, we can even apply multirate techniques such as polyphase decomposition to swap the order of filtering and rate change, greatly reducing the effort, but that would lead too far here.)
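In practice you rarely build that structure by hand: SciPy's `scipy.signal.resample_poly` implements the upsample → FIR → downsample chain with a polyphase filter, so the inserted zeros are never actually multiplied. A minimal sketch for a four-channel signal shaped (samples, channels), with one assumed test tone per channel standing in for the tetrahedral array; `axis=0` makes it resample every channel independently with the same filter:

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

fs_in, fs_out = 44_800, 240_000
g = gcd(fs_in, fs_out)
up, down = fs_out // g, fs_in // g            # 75 and 14

# Assumed stand-in for the 4-channel array: 50 ms, one test tone per channel
t = np.arange(2240) / fs_in
tones = np.array([500, 1000, 2000, 3000])
x = np.sin(2 * np.pi * t[:, None] * tones)    # shape (samples, channels) = (2240, 4)

# Polyphase up-by-75 / filter / down-by-14, applied to each channel along axis 0
y = resample_poly(x, up, down, axis=0)
print(up, down, y.shape)   # 75 14 (12000, 4)
```

For the 48 kHz → 240 kHz rates stated in the question, the same call reduces to `resample_poly(x, 5, 1, axis=0)`.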


So, what are your rates here? For 44800 Hz in and 240000 Hz out, the least common multiple is 3,360,000 Hz = 3.36 MHz: that's up by a factor of 75 (3360000 / 44800), low-pass filter, and then down by 14 (3360000 / 240000). So you'd need a low-pass filter with its cutoff at 1/75 of the intermediate Nyquist frequency, i.e. at 22.4 kHz. It's easy to design one using Python or Octave!
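The rate arithmetic as a small Python sketch (needs Python 3.9+ for `math.lcm`; `rational_factors` is a hypothetical helper). It also covers the question's stated 48 kHz input, for which the factors collapse to plain integer upsampling by 5:

```python
from math import lcm   # Python 3.9+

def rational_factors(fs_in, fs_out):
    """Hypothetical helper: (intermediate rate, up factor N, down factor M)."""
    f_mid = lcm(fs_in, fs_out)        # smallest rate that is a multiple of both
    return f_mid, f_mid // fs_in, f_mid // fs_out

for fs_in in (44_800, 48_000):
    f_mid, N, M = rational_factors(fs_in, 240_000)
    # Core low-pass cutoff: input Nyquist as a fraction of the intermediate Nyquist
    cutoff = (fs_in / 2) / (f_mid / 2)
    print(fs_in, f_mid, N, M, cutoff == 1 / N)
# 44800 3360000 75 14 True
# 48000 240000 5 1 True
```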

