What is the difference (if any) between PSOLA (Pitch Synchronous Overlap and Add) and TDHS (time domain harmonic scaling) time-pitch modification algorithms?
Answer
OK, I implemented both some time ago. TDHS in principle only performs time-scale modification; to change the pitch you also need to resample (interpolate) the result, and that shifts the spectral envelope too.
For TDHS it is hard to find a paper that explains how it really works. I learned the math and how it works from the Burazerovic Dzevdet paper:
$N_p$ is defined as the local Pitch Period
$\alpha$ is the time stretch factor
$N_c$ is the cross-fade length, here half the length of the window.
To compress in time ($\alpha < 1$) the equation is:
$$ \begin{align} N_c & = \operatorname{round}\left(N_p\frac{\alpha}{1-\alpha}\right) \\ & = \left\lfloor N_p\frac{\alpha}{1-\alpha} + \frac12 \right\rfloor \\ \end{align} $$
where "$\lfloor \cdot \rfloor $" is the floor() function, returning the largest integer not exceeding the argument.
The illustration from the Dzevdet paper shows how the algorithm works using a triangular window (n.b. 4 cycles become 3, and the period and fundamental frequency (pitch) of the waveform are not changed):
Now for expansion, $\alpha > 1$ (note that 3 cycles become 4, and the period and fundamental frequency (pitch) of the waveform are not changed):
$$ \begin{align} N_c & = \operatorname{round}\left(N_p\frac{\alpha}{\alpha-1}\right) \\ & = \left\lfloor N_p\frac{\alpha}{\alpha-1} + \frac12 \right\rfloor \\ \end{align} $$
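As a quick sanity check of the two formulas, here is my own arithmetic for $N_p = 60$. It assumes that $N_c$ counts the output samples of one cross-fade, and that one cross-fade consumes $N_c + N_p$ input samples when compressing and $N_c - N_p$ when expanding (the second part is my reading of the formula, not something stated in the paper excerpt):

$$ \begin{align} \alpha = \tfrac34: \quad N_c & = \operatorname{round}\left(60\cdot\frac{3/4}{1-3/4}\right) = 180 = 3N_p, & N_c + N_p & = 240 = 4N_p \\ \alpha = \tfrac43: \quad N_c & = \operatorname{round}\left(60\cdot\frac{4/3}{4/3-1}\right) = 240 = 4N_p, & N_c - N_p & = 180 = 3N_p \\ \end{align} $$

So 4 cycles become 3 in the first case and 3 become 4 in the second, and in both cases output length divided by input length equals $\alpha$.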
OK, here is my simple demo of how TDHS works for compression:
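alpha = 0.5;                              % time-stretch factor (< 1 compresses)
f = 735;                                  % test tone frequency in Hz
Fs = 44100;                               % sampling rate in Hz
signal = 0.9*sin(2*pi*f/Fs*(1:1000))';    % column vector
period = 60;                              % fixed pitch period Np = Fs/f, in samples
Nc = round((period*alpha)/(1-alpha));     % cross-fade length
fadeIn = linspace(0,1,Nc)';               % rising half of the triangular window
fadeOut = 1 - fadeIn;                     % falling half
nsamples = length(signal);
out2 = [];
ii = 1;
% Each pass consumes Nc+period input samples and emits Nc output samples.
% A trailing partial frame is dropped, so a few samples are lost at the end
% of out2; no problem, this is just a test.
while ii+Nc+period-1 <= nsamples
    frame = signal(ii:ii+Nc+period-1);
    frame1 = frame(1:Nc) .* fadeOut;                % start of the frame, fading out
    frame2 = frame(period+1:period+Nc) .* fadeIn;   % exactly one period later, fading in
    out2 = [out2; frame1+frame2];
    ii = ii + Nc + period;                          % jump to the next unread sample
end
plot(out2)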
Of course this is just to show you the concept; I used a fixed integer period $N_p = 60$ (44100/735).
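For anyone who wants to try both directions, here is a rough Python sketch of the same idea (standard library only). The compression branch follows the demo above. The expansion branch is my own assumption of how to mirror it: the copy one period later fades out, the earlier copy fades in, and the read position advances by only $N_c - N_p$. The names `crossfade_length` and `tdhs` are made up for this example, and a constant, known pitch period is assumed.

```python
import math


def crossfade_length(period, alpha):
    """Nc from the formulas above, rounded as floor(x + 1/2)."""
    ratio = alpha / (1 - alpha) if alpha < 1 else alpha / (alpha - 1)
    return math.floor(period * ratio + 0.5)


def tdhs(signal, period, alpha):
    """Time-scale `signal` by `alpha`, assuming a constant pitch period in samples."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    nc = crossfade_length(period, alpha)
    # Input samples consumed per cross-fade (the expansion value is an assumption).
    hop = nc + period if alpha < 1 else nc - period
    if nc < 2 or hop <= 0:
        raise ValueError("alpha is too extreme for this period")
    fade_in = [n / (nc - 1) for n in range(nc)]  # same ramp as linspace(0, 1, Nc)
    out = []
    pos = 0
    # Stop when a full frame no longer fits; the tail is dropped, as in the demo.
    while pos + nc + period <= len(signal):
        early = signal[pos:pos + nc]
        late = signal[pos + period:pos + period + nc]  # one period later
        if alpha < 1:
            # Compression: start on the early copy, end on the late copy.
            out.extend(e * (1 - w) + l * w for e, l, w in zip(early, late, fade_in))
        else:
            # Expansion: start on the late copy, end on the early copy.
            out.extend(l * (1 - w) + e * w for e, l, w in zip(early, late, fade_in))
        pos += hop
    return out
```

With the test tone from the demo (period 60), `tdhs(signal, 60, 0.5)` gives a shorter tone and `tdhs(signal, 60, 2.0)` a longer one, both still with a 60-sample period. In each case the output runs on seamlessly from one cross-fade to the next because the last sample of one block and the first sample of the next are neighbours in the input, up to a whole period.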
Now to PSOLA
Things are different here. Usually you need to use the pitch information to mark the signal at the glottal instants; with those marks you can change both the pitch and the time (with no change to the formants).
Take a look at my answer about PSOLA here
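To make the contrast with TDHS concrete, here is a very rough Python sketch of the PSOLA idea for pitch change only. It is a toy under strong assumptions, not a full implementation: the pitch period is constant and known, the pitch marks are simply placed every `period` samples instead of being detected at glottal instants, each grain is a Hann window two periods long, and the output is divided by the summed window so the level stays roughly the same. The function name `psola_constant_pitch` is made up for this example.

```python
import math


def psola_constant_pitch(signal, period, pitch_factor):
    """Toy PSOLA pitch shift; pitch_factor > 1 raises the pitch, duration is kept."""
    if pitch_factor <= 0:
        raise ValueError("pitch_factor must be positive")
    # Periodic Hann window spanning two pitch periods.
    win = [0.5 - 0.5 * math.cos(math.pi * n / period) for n in range(2 * period)]
    # Analysis pitch marks (assumed regular here, one per period).
    marks = list(range(period, len(signal) - period + 1, period))
    if not marks:
        raise ValueError("signal is shorter than two periods")
    out = [0.0] * len(signal)
    norm = [0.0] * len(signal)
    new_period = period / pitch_factor  # spacing of the synthesis marks
    t = float(period)
    while t <= marks[-1]:
        centre = int(round(t))
        # Reuse the grain whose analysis mark is closest in time.
        mark = min(marks, key=lambda m: abs(m - t))
        for n in range(2 * period):
            idx = centre - period + n
            if 0 <= idx < len(out):
                out[idx] += signal[mark - period + n] * win[n]
                norm[idx] += win[n]
        t += new_period
    # Compensate for the changed window overlap.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]
```

The grains themselves are never resampled, only re-spaced, which is why the spectral envelope (formants) stays where it was while the pitch moves. With `pitch_factor = 1` the output reproduces the input away from the edges.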