Miller Puckette ^{1}
Music, like any art form, defies scientific analysis: the scientific method seeks traits common to a given species, whereas a piece of music is significant precisely because of its differences from other pieces of music. But even if we aren't able to find many reliable scientific laws about music, we can sometimes create interesting new tools for making it. Doing so requires a combination of intuition about music and mathematical understanding of the underlying medium, sound. Here I'll discuss one point of view on sound as a medium, which emphasizes its dependence on time.
More than any other art form, music concerns itself with time. A piece of music takes us on a path from its beginning to its end. A work of visual art, reflected from right to left, might still be recognizably the same work. But a piece of music turned from back to front is garbage. Time is the independent coordinate of music on which the dependent ones (pitch, loudness, ...) depend. Musical phrases are functions of time in the same way that a gesture in a painting is a function of two spatial dimensions.
This time-orientation may derive partly from the fact that music is usually transmitted using sound. When we perceive a sound, we can detect extraordinarily small changes in timing, easily down to a millisecond; but our acoustical perception of spatial location, even in the best of conditions, is limited to about one degree of resolution and is usually much worse. Visually, we can sometimes resolve images to about 1/60 of a degree, both horizontally and vertically, but our eyes usually can't perceive time to better than 50 milliseconds or so of accuracy.
This is reflected in the common digital formats for storing sounds and moving images. Sounds are usually stored at between 40,000 and 50,000 frames per second, but with only between 2 and 6 spatial channels. Video normally requires only 30 frames per second but even low-quality video typically needs 500,000 or more spatial channels (pixels).
Since music is made of sound, and since sound, in our perception at least, can approximately be reduced to one or a few real-valued functions of time, we can get some insight into the workings of music by thinking about the real-valued functions of time and our perception of them. This will never lead to anything like a theory of music, but can shed light on the reasons music is what it is in certain respects--and can help us greatly in our attempts to make music.
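The space of real-valued functions is symmetric with respect to translations in
time:
$$(T_\tau x)(t) = x(t - \tau).$$
This is a linear operator whose eigenfunctions are the complex sinusoids:
$$T_\tau e^{i \omega t} = e^{-i \omega \tau} e^{i \omega t}.$$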
Both natural sounds and the sound of the human voice abound with signals
(real-valued functions of time) that are approximately periodic. Since
periodicity is a time symmetry, it is natural to look at the eigenvalue expansion
of a periodic function of time. A signal $x(t)$ with period $\tau$ (and hence
with frequency $\omega = 2\pi/\tau$ radians per unit of time) can be expanded as:
$$x(t) = \sum_{k=-\infty}^{\infty} a_k e^{i k \omega t}.$$
Spectra of periodic functions (their Fourier series) can be graphed as in Figure 1. The amplitude of the $k$th harmonic is $|a_k|$. The timbre of the periodic sound is thought to depend mostly on a (not well-defined) curve called the spectral envelope, shown in the figure; if the fundamental frequency is changed but the spectral envelope kept the same, the resulting sound often has a similar timbre.
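As a minimal numerical sketch of this expansion, the following Python fragment estimates the coefficients $a_k$ of a periodic signal by approximating the integral $a_k = \frac{1}{\tau} \int_0^\tau x(t) e^{-i k \omega t} dt$ (with $\tau$ the period and $\omega = 2\pi/\tau$) by a midpoint sum. The square wave, the function names, and the step count are illustrative assumptions, not taken from the text.

```python
import cmath
import math


def fourier_coefficient(signal, period, k, steps=4096):
    """Approximate a_k = (1/period) * integral of signal(t) * exp(-i*k*omega*t)
    over one period, using the midpoint rule."""
    omega = 2 * math.pi / period
    dt = period / steps
    total = 0j
    for n in range(steps):
        t = (n + 0.5) * dt  # midpoint of the n-th sub-interval
        total += signal(t) * cmath.exp(-1j * k * omega * t)
    return total * dt / period


def square_wave(t, period=1.0):
    """Hypothetical test signal: +1 on the first half period, -1 on the second."""
    return 1.0 if (t % period) < period / 2 else -1.0


# Amplitudes |a_k| of harmonics 1..7 of the square wave.
amplitudes = [abs(fourier_coefficient(square_wave, 1.0, k)) for k in range(1, 8)]
for k, amp in enumerate(amplitudes, start=1):
    print(k, round(amp, 4))
```

For this particular signal the odd harmonics fall off as $2/(\pi k)$ and the even ones vanish; plotting the amplitudes against $k \omega$ gives a spectrum of the kind described above.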
Combining two or more periodic signals whose fundamental frequencies have simple ratios (fractions with integers less than about 7 in the numerator and denominator) gives rise to spectra with many shared frequencies; this is the basis for the Helmholtz theory of harmony [1], which depends on the fact that the Fourier series, considered as a spectrum, occupies frequencies at integer multiples of the fundamental frequency.
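The claim about shared frequencies can be checked with a small sketch. It lists, for two tones whose fundamentals stand in a given ratio, which of their first twelve harmonics coincide. The cutoff of twelve harmonics and the two sample ratios (3:2 and 45:32) are assumptions chosen for illustration.

```python
from fractions import Fraction


def harmonics(fundamental, count):
    """Set of the first `count` integer multiples of a fundamental frequency."""
    return {fundamental * k for k in range(1, count + 1)}


def shared_harmonics(ratio, count=12):
    """Frequencies, in units of the lower fundamental, that appear among the
    first `count` harmonics of both tones."""
    lower = harmonics(Fraction(1), count)
    upper = harmonics(Fraction(ratio), count)
    return sorted(lower & upper)


simple = shared_harmonics(Fraction(3, 2))     # a simple ratio
complex_ = shared_harmonics(Fraction(45, 32))  # a much less simple ratio
print([int(f) for f in simple])
print(complex_)
```

With the simple ratio, every third harmonic of the lower tone coincides with a harmonic of the upper one; with the complicated ratio, none of the first twelve do.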
With this simple spectral model of sounds in mind, we can now develop some of the fundamental techniques for operating on sounds. These techniques recur constantly in efforts to synthesize and process musical sounds electronically, for example with a computer.
Adding a sound to a time-delayed copy of itself sets up an interference pattern in the spectrum of the sound. This is the acoustic analog of a diffraction grating in optics. If the delay between two copies is $\tau$, the two will interfere constructively at frequencies $0$, $2\pi/\tau$, $4\pi/\tau$, $\ldots$, and destructively halfway between these points. Figure 2 shows a spectrum of an input signal, and the resulting spectrum from adding two copies, one delayed in time compared to the other.
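A short sketch makes the interference pattern concrete. Adding a complex sinusoid $e^{i \omega t}$ to a copy delayed by $\tau$ multiplies it by $1 + e^{-i \omega \tau}$, so the magnitude of that factor is the gain at frequency $\omega$. The 1 millisecond delay and the function name are assumptions for illustration.

```python
import cmath
import math


def comb_gain(omega, tau):
    """Gain applied to a sinusoid of angular frequency omega when the signal
    is added to a copy of itself delayed by tau."""
    return abs(1 + cmath.exp(-1j * omega * tau))


tau = 0.001  # assumed delay of 1 millisecond

# Gains at multiples of 2*pi/tau (constructive interference) ...
peaks = [comb_gain(2 * math.pi * k / tau, tau) for k in range(3)]
# ... and halfway between them (destructive interference).
notches = [comb_gain(2 * math.pi * (k + 0.5) / tau, tau) for k in range(3)]

print([round(g, 6) for g in peaks])
print([round(g, 6) for g in notches])
```

The gain is 2 at the constructive frequencies and falls to zero at the destructive ones, up to rounding error.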
Linear combinations of many differently delayed copies of a signal give rise to more complicated interference patterns in the spectrum. A huge field of study is concerned with choosing particular linear combinations so that the interference pattern has desired properties, such as enhancing one frequency range compared to another [2]. Electrical engineers and electronic musicians call this technique filtering.
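In discrete time, a linear combination of delayed copies might be sketched as follows. The fragment assumes a sampled signal, delays of whole samples, and two hypothetical three-coefficient filters; it computes both the filtered signal and the gain the filter applies at a given frequency (in radians per sample).

```python
import cmath
import math


def apply_filter(coefficients, signal):
    """Output y[n] = sum over k of coefficients[k] * signal[n - k], treating
    samples before the start of the signal as zero."""
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:
                acc += c * signal[n - k]
        output.append(acc)
    return output


def frequency_response(coefficients, omega):
    """Complex gain applied to exp(i*omega*n): each copy delayed by k samples
    contributes coefficients[k] * exp(-i*omega*k)."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coefficients))


lowpass = [0.25, 0.5, 0.25]    # hypothetical: favors low frequencies
highpass = [0.25, -0.5, 0.25]  # hypothetical: favors high frequencies

for name, taps in (("lowpass", lowpass), ("highpass", highpass)):
    low = abs(frequency_response(taps, 0.0))
    high = abs(frequency_response(taps, math.pi))
    print(name, round(low, 6), round(high, 6))
```

Changing only the signs of the coefficients turns a filter that passes low frequencies into one that passes high frequencies, which is the sense in which the interference pattern is designed by choosing the linear combination.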
Filters arise in the natural world whenever sound encounters a cavity or barrier; for instance, the human vocal tract can be thought of as filtering the raw output of the glottis (vocal fold). To see why this is so, imagine the sound of the glottis scattering, separately, off each point of the surface of the vocal cavity. At the output (the mouth and nose) you get the superposition of all the scattered (and hence delayed) copies: a filter. This is essentially the same picture as Feynman used to describe quantum scattering as a superposition of all possible paths of the particles in a system.
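Returning to our complex sinusoid,
$e^{i \omega t}$, we
try multiplying it by another one, say
$e^{i \xi t}$.
We get a third one,
$$e^{i \omega t} e^{i \xi t} = e^{i (\omega + \xi) t},$$
whose frequency is the sum of the other two. Multiplying a signal by a sinusoid (modulation) therefore shifts the frequencies of its components by the modulating frequency $\xi$; a real-valued modulating sinusoid, being a sum of two complex ones, shifts them both up and down.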
The resulting spectrum can again be that of a periodic signal (if the ratio of the frequency of the modulating signal to the original fundamental frequency is a fraction with small numerator and denominator), or otherwise it might have no audible fundamental. Both possibilities can be musically useful, depending on the context.
The spectral envelope of the result resembles that of the original, unmodulated signal as long as the modulating frequency is small compared to the original fundamental; otherwise it can be greatly distorted. So the one operation can affect both tuning and spectral envelope with some degree of independent control.
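The effect of modulation on a spectrum can be sketched by bookkeeping alone. The fragment below assumes a signal written as a sum of cosines and a real-valued cosine as modulator, and uses the identity $\cos a \cos b = \frac{1}{2} \cos(a + b) + \frac{1}{2} \cos(a - b)$. The partial frequencies and amplitudes are invented for illustration.

```python
def ring_modulate(partials, mod_freq):
    """Given {frequency: amplitude} for a sum of cosines, return the partials
    of that signal multiplied by cos(mod_freq * t). Each partial splits into
    two half-amplitude partials, at the sum and difference frequencies."""
    out = {}
    for freq, amp in partials.items():
        # cos is even, so a negative difference frequency folds to its
        # absolute value.
        for shifted in (freq + mod_freq, abs(freq - mod_freq)):
            out[shifted] = out.get(shifted, 0.0) + amp / 2
    return dict(sorted(out.items()))


tone = {100: 1.0, 200: 0.5, 300: 0.25}  # hypothetical harmonic tone

harmonic = ring_modulate(tone, 50)    # modulator at half the fundamental
inharmonic = ring_modulate(tone, 41)  # modulator in no simple ratio to it

print(harmonic)
print(list(inharmonic))
```

With the modulator at half the fundamental, all resulting frequencies are odd multiples of 50, so the result is again periodic. With the modulator at 41, the resulting frequencies share no obvious fundamental.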
A technique popularized in Rock and Roll music of the sixties, but with antecedents in electronic music, is simply to distort sounds to change their timbres. If $x(t)$ is an incoming signal, we compose it with a non-linear transfer function $f$, and listen to the result, $f(x(t))$. In the R&R tradition, $x(t)$ is the electric guitar signal and $f$ is the transfer function of the overdriven amplifier.
For example, let $x(t)$ be a real-valued ``sinusoid" with time-varying amplitude:
$$x(t) = a(t) \cos(\omega t).$$
Possible input and output functions are graphed in Figure 4. If the transfer function $f$ is chosen
to be a polynomial or a convergent power series, the actions of the monomials
in $f$ will be mixed in ratios depending on $a(t)$: changing amplitude in
the input changes timbre in the output. Skillfully chosen input and transfer
functions can give rise to ``nice" spectra.
Using primarily these three fundamental techniques, electronic musicians operate on a starting palette of sinusoids, white noise, and/or recorded sounds, to produce a huge variety of electronic sounds for use as raw materials in making new forms of music.
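To illustrate the third of these techniques, distortion by a non-linear transfer function, here is a sketch that measures the harmonics produced when a cosine of a given amplitude is passed through a polynomial. It assumes the cubic $4x^3 - 3x$ as the transfer function, chosen only because the identity $\cos 3\theta = 4 \cos^3 \theta - 3 \cos \theta$ makes the result easy to check; it is not necessarily the example the text has in mind.

```python
import math


def cubic_transfer(x):
    """Assumed transfer function: the Chebyshev polynomial 4x^3 - 3x."""
    return 4 * x ** 3 - 3 * x


def harmonic_amplitudes(transfer, amplitude, n_harmonics=4, steps=1024):
    """Amplitudes of harmonics 1..n_harmonics of transfer(amplitude*cos(theta)),
    measured by correlating one period of the output with cos(k*theta).
    Cosines suffice because the output is an even function of theta."""
    result = []
    for k in range(1, n_harmonics + 1):
        total = 0.0
        for n in range(steps):
            theta = 2 * math.pi * n / steps
            total += transfer(amplitude * math.cos(theta)) * math.cos(k * theta)
        result.append(abs(2 * total / steps))
    return result


loud = harmonic_amplitudes(cubic_transfer, 1.0)
soft = harmonic_amplitudes(cubic_transfer, 0.5)
print([round(a, 6) for a in loud])
print([round(a, 6) for a in soft])
```

At full amplitude the output is a pure third harmonic; at half amplitude the first harmonic dominates. The same transfer function thus yields different timbres at different input amplitudes.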
It is frequently desirable, in dealing with sounds electronically, to analyze the frequency content of a sound. In the simplest situation we assume the sound is a finite sum of sinusoids and we would like to know their frequencies and (complex) amplitudes. This would be easy except for the fact that the frequencies and amplitudes in question are usually changing, sometimes quite rapidly. Very few sounds in nature or of electronic origin are well modeled as a sum of eternally unchanging sinusoids.
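The most frequently taken approach to this problem is to extract a short segment
of the signal to be analyzed, hoping that whatever components are present
haven't had time to vary much within the segment. For example, if the function
to analyze is $x(t)$, one could first multiply it by a windowing function
such as the raised cosine (Hann window) of length $L$:
$$w(t) = \frac{1}{2} \left( 1 + \cos(2 \pi t / L) \right) \mbox{ for } |t| \le L/2, \qquad w(t) = 0 \mbox{ otherwise},$$
and then take the Fourier transform of the product $w(t) x(t)$.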
The main peak of Figure 6 is $8\pi/L$ in width, centered about $\omega$, the frequency of the sinusoid being analyzed. The shorter we make the time segment $L$, the more spread-out the frequency-domain peak will appear. This is the Heisenberg uncertainty principle in action [5, p. 126].
Since the Fourier transform is linear, a superposition of sinusoids would give a superposition of peaks on the frequency axis. To fully resolve them we would need the peaks to be separated by the peak width, $8\pi/L$. If the sound is periodic with period $\tau$, its components lie $2\pi/\tau$ apart, so the analysis should be done over a length of time containing at least four periods of the sound. (In practice we can often allow some overlap, reducing this to about three.)
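The peak width and the four-period rule can be checked on sampled data. The sketch below assumes a Hann (raised-cosine) window, which may differ from the window the text has in mind, a segment of 64 samples, and a complex sinusoid that completes exactly 10 cycles in the segment. It uses a direct, unoptimized discrete Fourier transform so that it needs no external library.

```python
import cmath
import math


def hann(n, size):
    """Sampled raised-cosine window over a segment of `size` samples."""
    return 0.5 * (1 - math.cos(2 * math.pi * n / size))


def windowed_spectrum(cycles, size=64):
    """Magnitudes of the discrete Fourier transform of a Hann-windowed complex
    sinusoid that completes `cycles` cycles within the segment."""
    magnitudes = []
    for k in range(size):
        total = 0j
        for n in range(size):
            sample = cmath.exp(2j * math.pi * cycles * n / size)
            total += hann(n, size) * sample * cmath.exp(-2j * math.pi * k * n / size)
        magnitudes.append(abs(total) / size)
    return magnitudes


spectrum = windowed_spectrum(10)
for k in range(7, 14):
    print(k, round(spectrum[k], 6))
```

The peak occupies bins 9 through 11 and reaches zero at bins 8 and 12, a width of four bins. One bin corresponds to $2\pi/L$ for a segment of length $L$, so two sinusoids must lie at least four bins apart to be fully separated, which for a periodic sound means at least four periods per segment.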
Looking at a sound's Fourier transform, we can determine the frequencies and amplitudes of its sinusoidal components by fitting the observed peaks with their known theoretical shape. The spectral envelope can be estimated as well.
Furthermore, a given audio signal can be modified in interesting ways by taking its Fourier transform (using a sequence of overlapping analysis segments called windows), performing some operation, and then taking the inverse Fourier transforms to reconstruct a modified signal. For example, the spectral envelope of one signal can be ``stamped" on another by modifying the magnitude of the latter's Fourier transform non-uniformly.
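A much simplified sketch of this analysis, modification, and resynthesis loop follows. It processes a single segment with a direct discrete Fourier transform, omitting the overlapping windows a practical system would use, and the gain curve applied to the magnitudes is a hypothetical stand-in for a spectral envelope.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def reshape_spectrum(x, gain):
    """Scale the magnitude of each frequency bin by gain(bin), leaving phases
    alone, and resynthesize. Bin k and bin n-k receive the same gain so that
    a real input gives a real output."""
    n = len(x)
    spectrum = dft(x)
    shaped = [spectrum[k] * gain(min(k, n - k)) for k in range(n)]
    return [value.real for value in idft(shaped)]


size = 32
low = [math.cos(2 * math.pi * 2 * t / size) for t in range(size)]
high = [math.cos(2 * math.pi * 8 * t / size) for t in range(size)]
mixture = [a + b for a, b in zip(low, high)]

# Hypothetical envelope: keep bins below 5, silence everything above.
result = reshape_spectrum(mixture, lambda k: 1.0 if k < 5 else 0.0)
error = max(abs(r - a) for r, a in zip(result, low))
print(error < 1e-9)
```

Here the non-uniform gain removes the higher component and leaves the lower one intact. Stamping one sound's spectral envelope on another would replace the fixed gain curve with one estimated from the first sound's spectrum.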
Many details have been glossed over in this very brief introduction; moreover, only that part of electronic music which deals with realization of pieces of music has been treated. Present-day research also touches on the design of real-time software systems and human-friendly controls for music making; computer understanding of musical form and computer-aided composition; music perception; and music in multimedia applications. Within the narrower field described here, many problems remain open and improvements are constantly sought in our existing repertoire of techniques. Mathematicians can find excellent work here.