 
 
 
 
 
   
Our working measure of timbre will be the one assumed by the bonk~
object available in Pd and Max [Puckette 1998].  The incoming sound is
split into 11 frequency bands: three with center frequencies 100, 300, and 500
Hz and bandwidth 200 Hz, and eight more tuned to each half-octave above 500
Hz, so that the top one is centered at 8 kHz.  In each band we estimate a
loudness contribution as the fourth root of the power in the band; this is
close to a loudness measure suggested in [Rossing et al. 2002].
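The band layout and the per-band loudness estimate can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the bonk~ implementation; the function names are hypothetical, and the band powers passed in are placeholder values rather than measurements.

```python
import math

def band_centers():
    """Center frequencies (Hz) of the 11 analysis bands described in the text."""
    low = [100.0, 300.0, 500.0]
    # Eight more bands, one per half-octave above 500 Hz (top band: 8 kHz).
    high = [500.0 * 2.0 ** (k / 2.0) for k in range(1, 9)]
    return low + high

def loudness_contributions(band_powers):
    """Fourth root of the power in each band."""
    return [p ** 0.25 for p in band_powers]

centers = band_centers()
# Placeholder powers: a flat spectrum with power 16 in every band.
raw_timbre = loudness_contributions([16.0] * 11)
```

The eight upper centers come out at roughly 707, 1000, 1414, 2000, 2828, 4000, 5657, and 8000 Hz, which agrees with the top band being centered at 8 kHz.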
It turns out, of course, that the measured powers in these eleven bands are strongly
intercorrelated.  We decorrelate them in two steps.  If the raw timbre vector is

$$R = [r_1, r_2, \cdots, r_{11}]^{\dagger}$$

we first separate out a loudness component (suitably normalized) and ten other orthogonal components
to form another timbre-without-loudness vector of ten dimensions.  We
then apply multidimensional scaling to that vector to give yet another timbre vector,
also in ten dimensions, so that each of its components has sample mean zero
and variance one, and so that its components are uncorrelated.  To
find this transformation, a representative corpus of sounds
is analyzed.  When analyzing synthetic timbres, a systematic sampling is
made as each of the usable parameters attains all the values in its domain;
for input sounds the corpus is assembled intuitively.
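A rough NumPy sketch of the two decorrelation steps follows. It rests on several assumptions that the text does not spell out: loudness is taken to be the projection of the raw vector onto the all-ones direction, the ten remaining components are any orthonormal basis of the complement, and the scaling step is stood in for by an eigendecomposition of the sample covariance (PCA whitening). The names `S`, `T`, `loudness_free_basis`, and `fit_whitening` are hypothetical, and the corpus is random placeholder data.

```python
import numpy as np

def loudness_free_basis(n_bands=11):
    """Orthonormal basis of the subspace orthogonal to the all-ones direction."""
    # Assumption: loudness is the projection onto (1, ..., 1) / sqrt(n_bands).
    seed = np.column_stack([np.ones(n_bands), np.eye(n_bands)[:, : n_bands - 1]])
    q, _ = np.linalg.qr(seed)
    return q[:, 1:]  # drop the loudness direction, keep the other ten

def fit_whitening(S):
    """Mean and linear map giving zero-mean, unit-variance, uncorrelated output."""
    mean = S.mean(axis=0)
    cov = np.cov(S - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs / np.sqrt(eigvals)  # scale each eigenvector column
    return mean, W

# Placeholder corpus: 500 raw timbre vectors sharing a common loudness-like term.
rng = np.random.default_rng(0)
corpus = 0.5 * rng.random((500, 11)) + rng.random((500, 1))

basis = loudness_free_basis()
S = corpus @ basis            # timbre-without-loudness, ten dimensions
mean, W = fit_whitening(S)    # fitted once per corpus
T = (S - mean) @ W            # decorrelated timbre vectors
```

The fitted `mean` and `W` are then reused for every new vector from the same source, which is why the corpus needs to be representative.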
The musician's controlling signal and a database of possible synthetic sounds are both thus analyzed; each of the two requires its own decorrelating transformation. Associated with each synthetic sound, we also store the synthesis parameters that led to the sound so that we can re-create it later.
By normalizing the timbre vectors of both the input and the available outputs to have the same means and variances, we maximize the closeness of fit between the two; this maximizes the likelihood of finding 'good' output parameters. In doing this we are dropping any promise of making the output timbre imitate the input timbre exactly; they should move in roughly the same directions, but each according to its natural span.
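One simple way to use the two normalized spaces is a nearest-neighbor lookup, sketched below. The text does not say how the match is made, so the Euclidean nearest-neighbor rule is an assumption, as are the function name, the two-dimensional vectors (the real ones have ten components), and the `index` synthesis parameter.

```python
import math

def nearest_parameters(input_timbre, database):
    """Return the stored synthesis parameters whose timbre vector is closest.

    `database` is a list of (timbre_vector, parameters) pairs, both already
    normalized by their own decorrelating transformation.
    """
    best = min(database, key=lambda entry: math.dist(input_timbre, entry[0]))
    return best[1]

# Hypothetical database of normalized output timbres and their parameters.
database = [
    ((0.0, 0.0), {"index": 0.1}),
    ((1.0, -1.0), {"index": 0.5}),
    ((-1.0, 2.0), {"index": 0.9}),
]

params = nearest_parameters((0.8, -0.7), database)
```

Because input and output are each scaled to their own span, an input at one standard deviation along a component is matched to outputs near one standard deviation along the corresponding output component, not to an identical spectrum.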
 
 
 
 
