Next: Phase relationships between channels Up: Fourier analysis and resynthesis Previous: Timbre stamping (classical vocoder)   Contents   Index


Phase

So far we have operated on signals by altering the magnitudes of their windowed Fourier transforms while leaving the phases intact. The magnitudes encode the spectral envelope of the sound. The phases, on the other hand, encode frequency and time, in the sense that the phase change from one window to the next accumulates, over time, according to frequency. A transformation that allows independent control over frequency and time therefore requires analyzing and reconstructing the phase.

Figure 9.10: Phase in windowed Fourier analysis: (a) a complex sinusoid analyzed on three successive windows; (b) the result for a single channel (k=3), for the three windows
\begin{figure}\psfig{file=figs/fig09.10.ps}\end{figure}

Figure 9.10 shows how the phase of the Fourier transform changes from window to window, using a complex sinusoid as input. The sinusoid's frequency is $\alpha = 3\omega$, so that the peak in the Fourier transform is centered at $k=3$. If the initial phase is $\phi$, then the neighboring phases can be filled in as:

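\begin{displaymath}
\begin{array}{lll}
{\angle S[0, 2] = \phi + \pi} &
{\angle S[0, 3] = \phi} &
{\angle S[0, 4] = \phi + \pi}\\
{\angle S[1, 2] = \phi + H\alpha + \pi} &
{\angle S[1, 3] = \phi + H\alpha} &
{\angle S[1, 4] = \phi + H\alpha + \pi}\\
{\angle S[2, 2] = \phi + 2H\alpha + \pi} &
{\angle S[2, 3] = \phi + 2H\alpha} &
{\angle S[2, 4] = \phi + 2H\alpha + \pi}\\
\end{array}\end{displaymath}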
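The pattern of phases in this table can be checked numerically. The following is a minimal sketch, not taken from the text: it assumes NumPy, a periodic Hann window, and the convention that $\phi$ is the phase at the first sample of window 0. The function name `window_phases` and the sizes $N=64$, $H=16$ are illustrative choices only.

```python
import numpy as np

def window_phases(phi, N=64, H=16, k0=3, frames=3):
    """Phases of channels k0-1, k0, k0+1 for successive windows of a
    complex sinusoid whose frequency is exactly k0 * omega."""
    n = np.arange(N)
    w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann window (assumed)
    alpha = 2 * np.pi * k0 / N                  # sinusoid frequency, k0 * omega
    phases = np.empty((frames, 3))
    for m in range(frames):
        # Window m starts m * H samples later in the same sinusoid.
        x = np.exp(1j * (alpha * (n + m * H) + phi))
        S = np.fft.fft(w * x)
        phases[m] = np.angle(S[k0 - 1:k0 + 2])
    return phases

# Expected values from the table: phi + m*H*alpha, plus pi in the two neighbors.
phi, N, H, k0 = 0.3, 64, 16, 3
alpha = 2 * np.pi * k0 / N
measured = window_phases(phi, N, H, k0)
expected = phi + H * alpha * np.arange(3)[:, None] + np.array([np.pi, 0.0, np.pi])
```

Since `np.angle` returns values wrapped into $(-\pi, \pi]$, the measured and expected phases should be compared modulo $2\pi$, for instance by comparing `np.exp(1j * measured)` with `np.exp(1j * expected)`.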

This gives an excellent way of estimating the frequency $\alpha$: pick any channel whose amplitude is dominated by the sinusoid and subtract two successive phases to get $H\alpha$:

\begin{displaymath}
H \alpha = \angle S[1, 3] - \angle S[0, 3]
\end{displaymath}


\begin{displaymath}
\alpha = {{\angle S[1, 3] - \angle S[0, 3] + 2 p \pi} \over H}
\end{displaymath}

where $p$ is an integer, needed because the phases are known only modulo $2\pi$. There are $H$ possible frequencies, spaced by $2\pi/H$. If we are using an overlap of 4, that is, $H=N/4$, the frequencies are spaced by $8\pi/N = 4 \omega$. Happily, this is the width of the main lobe for the Hann window, so no more than one possible value of $\alpha$ can explain any measured phase difference within the main lobe of a peak. The correct value of $p$ is the one that gives the frequency closest to the nominal frequency of the channel, $k\omega$.
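The estimation procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: it assumes NumPy, uses NumPy's symmetric Hann window, and takes the two windows to start at samples 0 and $H$. The function name `estimate_frequency` is hypothetical.

```python
import numpy as np

def estimate_frequency(x, N, H, k):
    """Estimate a sinusoid's angular frequency (radians per sample)
    from the phase advance of channel k over one hop of H samples."""
    w = np.hanning(N)                         # Hann window (NumPy's symmetric variant)
    S0 = np.fft.fft(w * x[:N])                # window starting at sample 0
    S1 = np.fft.fft(w * x[H:H + N])           # window starting at sample H
    dphi = np.angle(S1[k]) - np.angle(S0[k])  # H * alpha, up to a multiple of 2 * pi
    nominal = 2 * np.pi * k / N               # nominal channel frequency, k * omega
    # Choose the integer p that brings the estimate closest to k * omega.
    p = np.round((H * nominal - dphi) / (2 * np.pi))
    return (dphi + 2 * np.pi * p) / H

# Usage: a complex sinusoid at 3.3 bins, analyzed with an overlap of 4.
N, H = 1024, 256
alpha = 2 * np.pi * 3.3 / N
x = np.exp(1j * (alpha * np.arange(N + H) + 0.5))
estimate = estimate_frequency(x, N, H, k=3)
```

Here the true frequency lies within $2\omega$ of the nominal frequency of channel 3, so the choice of $p$ is unambiguous and the estimate should agree with `alpha` to within rounding error.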

In the analysis/synthesis examples of the previous section, the phases of the output are derived directly from the phases of an input. This is appropriate when the output signal corresponds in time with the input signal. Sometimes time modifications are desired, for instance to do time stretching or contraction. Alternatively, the output phase might depend on more than one input, for instance when attempting to morph between one sound and another.

In these situations, the important thing is to try to maintain the appropriate phase relationships between successive resynthesis windows, and also between adjacent channels. These two sets of relationships are not always compatible, however. We will make it our primary obligation to honor the relations between successive resynthesis windows, and worry about phase relationships between channels afterward.

Suppose we want to construct the $m$th spectrum $S[m, k]$ for resynthesis (having already constructed the previous one, number $m-1$). Suppose we wish the phase relationships between windows $m-1$ and $m$ to be those of a signal $x[n]$, but that the phases of window number $m-1$ might have come from somewhere else and can't be assumed to be in line with our wishes.

Figure 9.11: Propagating phases in resynthesis. Each phase, such as that of $S[6, k]$ here, depends on the previous output phase and the difference of the input phases.
\begin{figure}\psfig{file=figs/fig09.11.ps}\end{figure}

Figure 9.12: Phases of one channel of the analysis windows and two successive resynthesis windows.
\begin{figure}\psfig{file=figs/fig09.12.ps}\end{figure}

To find out how much the phase of each channel should differ from the previous one, we do two analyses of the signal $x[n]$, separated by the same hop size $H$ that we're using for resynthesis:

\begin{displaymath}
T[k] = {\cal FT}(W(n)x[n]) (k)
\end{displaymath}


\begin{displaymath}
T'[k] = {\cal FT}(W(n)x[n+H]) (k)
\end{displaymath}

Figure 9.11 shows the process of phase accumulation, in which the output phases each depend on the previous output phase and the phase difference for two windowed analyses of the input. Figure 9.12 illustrates the phase relationship in the complex plane. The phase of the new output $S[m, k]$ should be that of the previous one plus the difference between the phases of the two analyses:

\begin{displaymath}
\angle S[m, k] = \angle S[m-1, k] +
\left ( \angle T'[k] - \angle T[k] \right )
\end{displaymath}


\begin{displaymath}
= \angle \left (
{{S[m-1, k] T'[k]}
\over
{T[k]}}
\right )
\end{displaymath}

Here we used the fact that multiplying or dividing two complex numbers gives the sum or difference of their arguments.
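The equivalence of the two forms can be checked on arbitrary numbers. The sketch below assumes NumPy; the helper names `propagated_phase` and `wrap` and the sample values are illustrative only. Because `np.angle` returns phases in $(-\pi, \pi]$, the explicit sum of phases is wrapped into the same range before comparison.

```python
import numpy as np

def propagated_phase(S_prev, T, T_prime):
    """Phase for the new output channel, taken from the complex quotient."""
    return np.angle(S_prev * T_prime / T)

def wrap(theta):
    """Wrap an angle into (-pi, pi], the range returned by np.angle."""
    return np.angle(np.exp(1j * theta))

# Arbitrary illustrative values: the magnitudes differ, only the phases matter.
S_prev = 2.0 * np.exp(1j * 0.4)
T = 0.5 * np.exp(1j * 2.9)
T_prime = 3.0 * np.exp(-1j * 3.0)

by_quotient = propagated_phase(S_prev, T, T_prime)
by_sum = wrap(np.angle(S_prev) + (np.angle(T_prime) - np.angle(T)))
```

Both routes should give the same phase; the quotient form has the practical advantage that no explicit wrapping is needed.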

If the desired magnitude is a real number $a$, then we should set $S[m, k]$ to:

\begin{displaymath}
S[m, k] \; = \;
a
\; \cdot \;
{
{ \left \vert
{{S[m-1, k] T'[k]}
\over
{T[k]}}
\right \vert } ^ {-1}
}
\; \cdot \;
{
{{S[m-1, k] T'[k]}
\over
{T[k]}}
}
\end{displaymath}

The magnitudes of the second and third terms cancel out, so that the magnitude of $S[m, k]$ reduces to $a$; the first two terms are real numbers so the argument is controlled by the last term.
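One resynthesis step for all channels at once might look like the following sketch, assuming NumPy arrays of complex spectra. Two details are assumptions of this example and not part of the formula: the quotient is replaced by a product with the conjugate of $T[k]$, which has the same argument but avoids dividing by a possibly zero $T[k]$, and a tiny constant `eps` guards the normalization. The name `resynthesis_step` is hypothetical.

```python
import numpy as np

def resynthesis_step(S_prev, T, T_prime, a, eps=1e-20):
    """Return a spectrum with magnitude a and phase
    angle(S_prev) + angle(T_prime) - angle(T), channel by channel."""
    # Same argument as S_prev * T_prime / T, since 1/T = conj(T) / |T|^2.
    q = S_prev * T_prime * np.conj(T)
    # Normalize q to (nearly) unit magnitude, then scale by the desired magnitude.
    # eps is an assumed guard against division by zero in silent channels.
    return a * q / (np.abs(q) + eps)

# Usage with random spectra standing in for real analyses.
rng = np.random.default_rng(0)
K = 8
S_prev = rng.normal(size=K) + 1j * rng.normal(size=K)
T = rng.normal(size=K) + 1j * rng.normal(size=K)
T_prime = rng.normal(size=K) + 1j * rng.normal(size=K)
a = np.linspace(0.5, 2.0, K)
S_new = resynthesis_step(S_prev, T, T_prime, a)
```

The result has magnitude `a` in every channel, and its phase matches that of `S_prev * T_prime / T`.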

If we want to take the magnitude from the analysis spectrum $T'$ as well, we can set $a = \vert T'[k]\vert$, giving a simpler formula:

\begin{displaymath}
S[m, k] \; = \;
{
{ \left \vert
{{S[m-1, k]}
\over
{T[k]}}
\right \vert } ^ {-1}
}
\; \cdot \;
{
{{S[m-1, k] T'[k]}
\over
{T[k]}}
}
\end{displaymath}
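
To see where the simplification comes from, substitute $a = \vert T'[k]\vert$ into the real-valued prefactor of the general formula; the factors of $\vert T'[k]\vert$ cancel:

\begin{displaymath}
\vert T'[k] \vert
\; \cdot \;
{ \left \vert
{{S[m-1, k] T'[k]}
\over
{T[k]}}
\right \vert } ^ {-1}
\; = \;
{{\vert T'[k] \vert \; \vert T[k] \vert}
\over
{\vert S[m-1, k] \vert \; \vert T'[k] \vert}}
\; = \;
{ \left \vert
{{S[m-1, k]}
\over
{T[k]}}
\right \vert } ^ {-1}
\end{displaymath}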



Miller Puckette 2006-03-03