Thursday, May 4, 2017

audio - What is the difference between PSOLA and TDHS time-scaling or pitch-shifting?


What is the difference (if any) between PSOLA (Pitch Synchronous Overlap and Add) and TDHS (time domain harmonic scaling) time-pitch modification algorithms?



Answer




I implemented both some time ago. TDHS in principle only applies time-scale modification; to change the pitch you also need to resample (interpolate) the result, and that shifts the spectral envelope as well.
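As a rough sketch of the resampling step only (the time-scaling stage is not included here), the lines below read a signal `beta` times faster using linear interpolation. The ratio `beta`, the variable names, and the use of `interp1` instead of a proper band-limited resampler are assumptions made for illustration. To keep the original duration while raising the pitch by `beta`, the signal would first be time-stretched by $\alpha = \beta$ and then read back this way.

```matlab
% Resampling step of a "time-stretch, then resample" pitch shifter.
% beta is a hypothetical pitch ratio; x stands in for a signal that
% has already been time-stretched by some TSM method.
Fs   = 44100;
f    = 735;
beta = 1.5;
x    = 0.9*sin(2*pi*f/Fs*(0:999))';          % 60 samples per cycle

t = (1:beta:length(x))';                     % fractional read positions
y = interp1((1:length(x))', x, t, 'linear'); % linear interpolation

% y is about length(x)/beta samples long and has 60/beta = 40 samples
% per cycle, so at the same Fs it sounds beta times higher.
plot(y)
```

Because every sample of the waveform is squeezed in time, any formant structure in a real signal moves up by the same ratio, which is the spectral envelope shift mentioned above.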


For TDHS it is hard to find a paper that explains how it really works. I learned the math and how it works from Dzevdet Burazerovic's paper:


$N_p$ is defined as the local Pitch Period


$\alpha$ is the time stretch factor


$N_c$ is the cross-fade length, here half the length of the window.


To compress in time ($\alpha < 1$) the equation is:


$$ \begin{align} N_c & = \operatorname{round}\left(N_p\frac{\alpha}{1-\alpha}\right) \\ & = \left\lfloor N_p\frac{\alpha}{1-\alpha} + \frac12 \right\rfloor \\ \end{align} $$


where "$\lfloor \cdot \rfloor $" is the floor() function, which returns the largest integer not exceeding its argument.
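As a quick numeric check of the compression formula, take the illustrative values $N_p = 60$ and $\alpha = 3/4$ (these numbers are chosen here for the example, not taken from the paper):

$$ \begin{align} N_c & = \operatorname{round}\left(60 \cdot \frac{0.75}{1-0.75}\right) = 180 \\ \frac{N_c}{N_c + N_p} & = \frac{180}{240} = 0.75 = \alpha \end{align} $$

Each frame reads $N_c + N_p = 240$ input samples (4 cycles) and writes $N_c = 180$ output samples (3 cycles).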


The illustration from Burazerovic's paper shows how the algorithm works using a triangular window (n.b. 4 cycles become 3, and the period and fundamental frequency (pitch) of the waveform are not changed):


[Figure: TDHS time compression with a triangular window]



Now for expansion, $\alpha > 1$ (note that 3 cycles become 4, and the period and fundamental frequency (pitch) of the waveform are not changed):


$$ \begin{align} N_c & = \operatorname{round}\left(N_p\frac{\alpha}{\alpha-1}\right) \\ & = \left\lfloor N_p\frac{\alpha}{\alpha-1} + \frac12 \right\rfloor \\ \end{align} $$


[Figure: TDHS time expansion]
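A numeric check of the expansion formula, using the illustrative values $N_p = 60$ and $\alpha = 4/3$, and assuming that $N_c$ is the number of output samples per cross-fade while the input position advances by $N_c - N_p$ (this reading is an assumption, chosen because it reproduces the "3 cycles become 4" behaviour):

$$ \begin{align} N_c & = \operatorname{round}\left(60 \cdot \frac{4/3}{4/3-1}\right) = 240 \\ \frac{N_c}{N_c - N_p} & = \frac{240}{180} = \frac43 = \alpha \end{align} $$

So each frame writes 240 output samples (4 cycles) while consuming 180 input samples (3 cycles).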


Here is my simple demo of how TDHS works for compression:


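% TDHS time compression demo (alpha < 1) on a pure sine wave
alpha  = 0.5;        % time-scale factor (output length / input length)
f      = 735;        % sine frequency in Hz
Fs     = 44100;      % sample rate in Hz
signal = 0.9*sin(2*pi*f/Fs*(1:1000))';
period = 60;         % pitch period Np in samples (Fs/f)

Nc       = round((period*alpha)/(1-alpha));   % cross-fade length
nsamples = length(signal);
fadeIn   = linspace(0,1,Nc)';                 % ramp from 0 to 1

ii   = 1;
out2 = [];
% Each pass reads Nc+period input samples and writes Nc output samples.
% A trailing partial frame is dropped, which is fine for a test.
while ii+Nc+period-1 <= nsamples
    frame  = signal(ii:ii+Nc+period-1);
    frame1 = frame(1:Nc) .* (1-fadeIn);              % start of frame, fading out
    frame2 = frame(period+1:Nc+period) .* fadeIn;    % one period later, fading in
    out2   = [out2; frame1+frame2];
    ii     = ii + Nc + period;                       % jump to the next frame
end

plot(out2)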


Of course this is just to show you the concept; I used a fixed integer period $N_p = 60$ ($44100/735$).
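The expansion case can be sketched in the same style. This is only an illustration under stated assumptions: a fixed integer period, $\alpha = 4/3$, and an input position that advances by `Nc - period` per frame while the fading-in branch is taken one period earlier. The names `hop`, `cur`, and `prev` are made up for this example.

```matlab
% Sketch of TDHS time expansion (alpha > 1) on a pure sine wave.
alpha  = 4/3;                        % 3 cycles in -> 4 cycles out
f      = 735;
Fs     = 44100;
period = 60;                         % pitch period Np = Fs/f
signal = 0.9*sin(2*pi*f/Fs*(1:1000))';

Nc     = round(period*alpha/(alpha-1));  % cross-fade (output) length
hop    = Nc - period;                    % input samples consumed per frame
fadeIn = linspace(0,1,Nc)';

ii  = period + 1;     % keep one period of history for the delayed branch
out = [];
while ii + Nc - 1 <= length(signal)
    cur  = signal(ii:ii+Nc-1);                 % current position, fading out
    prev = signal(ii-period:ii-period+Nc-1);   % one period earlier, fading in
    out  = [out; cur.*(1-fadeIn) + prev.*fadeIn];
    ii   = ii + hop;
end

plot(out)
```

With these values every frame produces 240 samples from 180 consumed input samples, and on this exactly periodic test signal the output is again a clean sine with the same period.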


With PSOLA things are different. You usually need the pitch information to place pitch marks on the signal at the glottal closure instants; with those marks you can change both the pitch and the duration (the formants are not changed here).
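To make the mechanics concrete, here is a minimal time-domain PSOLA sketch that changes the pitch while keeping the duration. It assumes the pitch marks are already known (on this synthetic signal they are simply placed every `period` samples), uses a two-period Hann window, and skips gain normalization. The ratio `beta` and all variable names are assumptions for illustration; a real implementation would estimate the marks from the signal.

```matlab
% Minimal TD-PSOLA pitch-shift sketch on a synthetic periodic signal.
period = 60;                          % analysis pitch period in samples
beta   = 1.5;                         % hypothetical pitch ratio (new f0 / old f0)
n      = (0:1999)';
x      = 0.9*sin(2*pi*n/period);      % stand-in for a voiced sound

% Analysis marks at the known period, synthesis marks at the new period.
marksIn  = (period+1 : period : length(x)-period)';
newPer   = round(period/beta);
marksOut = (period+1 : newPer : length(x)-period)';

k   = (-period:period)';
win = 0.5*(1 + cos(pi*k/period));     % Hann window spanning two periods

y = zeros(size(x));
for m = 1:length(marksOut)
    % use the analysis mark closest in time to this synthesis mark
    [~, idx] = min(abs(marksIn - marksOut(m)));
    grain = x(marksIn(idx) + k) .* win;              % grain is copied, not resampled
    y(marksOut(m) + k) = y(marksOut(m) + k) + grain; % overlap-add at the new spacing
end

plot(y)
```

The grains are only re-spaced, never stretched, which is why the spectral envelope stays in place. A pure sine has no formants, so this only shows the mark-and-overlap mechanics.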


Take a look at my answer about PSOLA here


