Historically, interactivity has played a central role in electronic music, since electronic musical instruments have usually used acoustical musical instruments as points of departure, and musical instruments are all designed to be played. In the design of electronic musical instruments, the way the instrument is played has often been at least as important as the method of sound generation, and in many important cases (the Theremin, for example), the means of playing the instrument is the instrument's defining characteristic.
In the electronic arts, too, the way sound is generated (if at all) may be less important than the way the art object responds to the changing situation, either by sensing people's voluntary or incidental actions, or by responding to environmental changes, which often arrive in the form of signals. Even if some signal (temperature, for example) doesn't have an acoustical origin, operations on signals that we've presented as acoustical can still be helpful. In any situation in which the quantities of interest are functions of time, we can act as if they were sounds.
To make an object that allows interactivity on the basis of signals, we will need tools for measurement and for controlling signal generators as functions of measured quantities.
From a traditional viewpoint, the object of control is to find the inputs to a system that achieve a desired output. For instance, if you're heating a house, you have a desired ideal temperature and you can turn the heater on and off in order to try to keep the actual temperature as close as possible to the desired one. There is a whole theory about this, called, appropriately enough, control theory. This might be a reasonable way to think about the situation in which a violin player, for example, is trying to control the pitch the instrument is putting out. Assuming you can measure (hear) the violin's output, you then move your finger in one direction or the other to correct whatever the error might be. In other situations there isn't a specifically desired output; it might instead be desirable that the object behave in some complicated or unpredictable way.
Central to control theory is the possibility of feedback: a human (or a machine) measures what the output is at a moment in time and adjusts the input as a function of the measured output. If you can make an accurate measurement of the output, and if there aren't other external forces generating inputs to a system (noise, for instance, or variations in time of the system's characteristics), you should be able, by trial and error, to get to any desired output that is in the range of the system. But of course, it's often the possibility of other external forces (such as an antagonistic player) or time variation (the ice sculpture is gradually melting) that makes the whole thing interesting.
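To make the feedback idea concrete, here is a minimal sketch of the house-heating example. The thermal model is made up for illustration (heat leaks toward the outside temperature at a fixed rate, and the heater adds a fixed amount per step); the function name and every constant are assumptions, not something taken from these notes.

```python
def simulate_thermostat(target=20.0, outside=5.0, start=10.0, steps=500,
                        heater_gain=0.5, leak=0.02):
    """On/off feedback: measure the output, then choose the next input."""
    temperature = start
    history = []
    for _ in range(steps):
        # The measured output decides the next input (heater on or off).
        heater_on = temperature < target
        # Hypothetical model: heat leaks toward the outside temperature,
        # and the heater adds a fixed amount per step while it is on.
        temperature += leak * (outside - temperature)
        if heater_on:
            temperature += heater_gain
        history.append(temperature)
    return history

history = simulate_thermostat()
print(round(min(history[-100:]), 2), round(max(history[-100:]), 2))
```

With these particular numbers the temperature climbs to the target and then hovers a fraction of a degree around it. The measurement is perfect and nothing else disturbs the system, which is exactly the easy case described above.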
Even in the absence of such interference, it may be important not only to be able to reach a desired result but to be able to do so in a fixed amount of time, or make the output reach a succession of values at specified times. Or perhaps it's not important to arrive at specific pitches at specific times, but rather to make any of a huge number of possible pitch curves that will plant the desired perceived pitches in the listener's head. Such things are sometimes made easier through an understanding of control theory but are often only accessible through a long learning process.
In some situations measurements are best made directly by hardware (photoresistors, accelerometers, etc.), and appear to the computer as voltages that can be converted to signals using either the computer's audio ADCs or other types of ADCs that are more appropriately adapted to ``control" inputs. (ADCs that are optimized for audio input often have built-in high-pass filters that reject constant offsets, such as the DC output of a stationary accelerometer, that might be important in a non-audio context, whereas many kinds of physical measurements can be transmitted at much lower bit rates, and hence more efficiently, than audio ADCs use.)
Other types of measurement can be done on audio (or other) signals to generate other signals. An example we've already mentioned is measuring the average power of a signal. A more sophisticated (and trickier) example is measuring an audio signal's pitch. In measurements either of power or pitch, one usually wants not a single answer but, instead, a series of estimates made at different times. Each estimate then depends on values of the signal over an interval of time called a window, as shown:
For each sample of the output, a separate analysis is run over the window consisting of the most recent $N$ samples of input. The parameter $N$ is called the window size. (In practice, it's often not necessary to recompute an analysis such as power or pitch for every sample of output but at longer intervals, since they don't tend to change as fast as the samples of the signal being analyzed.)
Most algorithms for making measurements on signals do some form of averaging to make aggregate estimates about the signal's behavior over the span of a window. The larger the window, the more accurate such a measurement may potentially be. On the other hand, it may be desirable to keep a window size small if a high time resolution is needed. There is often a trade-off between time resolution and accuracy.
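The trade-off can be seen in a small sketch of windowed average power. The function name, the `hop` parameter (how many samples to skip between estimates), and the test tone are all assumptions chosen for illustration.

```python
import math

def windowed_power(signal, window_size, hop=1):
    """Average power over the most recent `window_size` samples.

    Produces one estimate every `hop` samples, starting as soon as a
    full window of input is available.
    """
    estimates = []
    for end in range(window_size, len(signal) + 1, hop):
        window = signal[end - window_size:end]
        estimates.append(sum(x * x for x in window) / window_size)
    return estimates

sample_rate = 8000
# One second of a 100 Hz sinusoid with amplitude 1 (average power 1/2).
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

short_est = windowed_power(tone, window_size=20, hop=10)    # a quarter period
long_est = windowed_power(tone, window_size=800, hop=10)    # ten full periods

print(min(short_est), max(short_est))
print(min(long_est), max(long_est))
```

The long window reports a steady value very close to 1/2, while the short window's estimates swing widely depending on which part of the cycle they happen to cover. The short window would, however, follow a sudden change in level much more quickly.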
As a matter of efficiency, many sorts of measurements (whether directly harvested from the outside world or measured from other signals) can be represented as signals at a lower sample rate than the audio sample rate, but on the other hand some require higher ones than the audio sample rate. In practice audio software often maintains multiple sample rates to address these situations, but we won't worry about that for now.
A signal may be used to control the generation or processing of another signal. The distinction between an ``audio signal" and a ``control signal" is almost a purely psychological one; for instance, one might control the amplitude of a signal $x[n]$ by multiplying it by a (more slowly changing) signal $y[n]$. But we might instead regard $x[n]$ as the ``control" signal and $y[n]$ the ``audio" one. But even though the difference has no substance, it is a useful one to maintain because certain operations are more likely to come up in ``control" usage than ``audio" usage. And signals that are thought of as ``measurements" in the senses described above are likely to be used in ``control" contexts.
The most frequent, and perhaps the most fundamental, issue that comes up in control is scaling. If you have a measurement whose natural range is from $a$ to $b$ (for instance, a thermometer outside in San Diego might give outputs ranging from 40 to 80), and if you want to control something whose range is from $c$ to $d$ (for instance, you might want the frequency of an oscillator to range from 110 to 440), you will at the very least want to be able to convert one range to another. In real situations there will typically be many different such ranges to convert between, and the most efficient way to manage this is often to standardize on a range (most conveniently, the unit interval, which reaches from 0 to 1) and to be able to convert signals with other ranges to and from that one.
To convert a signal ranging from $a$ to $b$ to one whose range is the unit interval, first subtract $a$ (so that the range is now from $0$ to $b-a$), and then divide by $b-a$ to re-scale the upper value to $1$, as shown:
And supposing that we have a signal whose range is the unit interval and that we wish to make it range from $c$ to $d$, we do the reverse: first re-scale it so that it ranges from $0$ to $d-c$, and then add $c$ to slide the range to where it should start:
In these drawings we have chosen $a<b$ and $c<d$ but we could have chosen the reverse, in which case the conversion would invert the direction of the signal.
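The two conversions can be sketched in a few lines, using the thermometer (40 to 80) and oscillator (110 to 440) ranges from the example above. The function and parameter names (`to_unit`, `from_unit`, `rescale`, `in_lo`, and so on) are ours, chosen for illustration.

```python
def to_unit(x, in_lo, in_hi):
    """Map the range in_lo..in_hi onto 0..1: subtract, then divide."""
    return (x - in_lo) / (in_hi - in_lo)

def from_unit(u, out_lo, out_hi):
    """Map 0..1 onto out_lo..out_hi: re-scale, then add the offset."""
    return u * (out_hi - out_lo) + out_lo

def rescale(x, in_lo, in_hi, out_lo, out_hi):
    """Convert one range to another by passing through the unit interval."""
    return from_unit(to_unit(x, in_lo, in_hi), out_lo, out_hi)

temperatures = [40, 50, 60, 80]
frequencies = [rescale(t, 40, 80, 110, 440) for t in temperatures]
print(frequencies)   # [110.0, 192.5, 275.0, 440.0]

# Swapping the output endpoints inverts the direction of the signal.
inverted = [rescale(t, 40, 80, 440, 110) for t in temperatures]
print(inverted)      # [440.0, 357.5, 275.0, 110.0]
```

Going through the unit interval costs one extra step, but it means each source and each destination needs only its own pair of endpoints, rather than one conversion for every source-destination pair.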
Going back to the thermometer example, it might be a good thing to deal with cases where the thermometer strays outside of its intended range. The easiest way to do this is to clip the signal to its desired range. Assuming that we have scaled the signal so that its desired range is the unit interval, we can then clip the signal by replacing values outside the range with the appropriate endpoint (0 for negative numbers, and 1 for numbers greater than one). That is equivalent to applying this function to the signal:
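In code, clipping to the unit interval is a one-liner. This is a minimal sketch; the name `clip_unit` is ours, and the readings are made-up values for a thermometer whose intended range is 40 to 80.

```python
def clip_unit(u):
    """Replace values outside 0..1 with the nearest endpoint."""
    return min(1.0, max(0.0, u))

# Readings of 30, 60 and 90 on a thermometer meant to range from 40 to 80,
# first scaled to the unit interval and then clipped.
readings = [30, 60, 90]
scaled = [(r - 40) / (80 - 40) for r in readings]
clipped = [clip_unit(u) for u in scaled]
print(scaled)    # [-0.25, 0.5, 1.25]
print(clipped)   # [0.0, 0.5, 1.0]
```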
Another class of operations on signals deals with their behavior in time. Some of these, such as delays, are already familiar as audio operations. In particular, filtering, which is useful in audio processing as a way to modify the spectrum of a signal, may be used on control signals for a different purpose: smoothing a control that changes too abruptly or noisily. For example, if we wish to control the amplitude of an audio signal using a switch (which appears to us as a signal that jumps between 0 and 1), we might wish to alter the switch's output so that it ramps between 0 and 1 over an interval of time on the order of 1/20 second. One conceptually simple way to do this is to low-pass filter the signal at 20 Hz. Doing this allows us to avoid the click that sounds when a signal changes discontinuously.
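Here is a sketch of the switch example, assuming a one-pole low-pass filter (a common simple design; the notes don't say which kind of low-pass filter is meant). The function name and the coefficient formula are part of that assumption.

```python
import math

def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """Smooth `signal` with a one-pole low-pass filter (an assumed design)."""
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y = 0.0
    out = []
    for x in signal:
        y += k * (x - y)   # move a fraction k of the way toward the input
        out.append(y)
    return out

sample_rate = 44100
# A switch that is off for 100 samples and then on for 0.2 seconds.
switch = [0.0] * 100 + [1.0] * (sample_rate // 5)
smoothed = one_pole_lowpass(switch, cutoff_hz=20, sample_rate=sample_rate)

# Largest change from one sample to the next, before and after smoothing.
largest_jump_raw = max(abs(b - a) for a, b in zip(switch, switch[1:]))
largest_jump_smooth = max(abs(b - a) for a, b in zip(smoothed, smoothed[1:]))
print(largest_jump_raw, largest_jump_smooth)
```

The raw switch jumps by a full unit in a single sample, whereas the smoothed version never moves by more than a small fraction of that between adjacent samples, yet still settles at 1 well within the 0.2 seconds.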
In computer audio applications (and, more generally, in the electronic arts), it is frequently desirable to detect a natural event or a human action and to make a causal response to it. An event can be loosely defined as the knowledge that a certain thing has happened at a certain time, often accompanied by some data to further describe it. For example, if you press a key on a musical keyboard, this can be made to generate an event in software that might be accompanied by data specifying which key was pressed and how hard.
Other things treated as events in interactive computer software might include user input on a computer, arriving network packets, or the ringing of a virtual alarm clock. Here, since we're focusing on audio signals, we'll only worry about a specific class of events that occur when we detect some feature in an audio signal. One might wish to be able to detect features such as the presence of speech, or the arrival of a specific pitch from an instrument, or whatnot, and the detection of the event might require sophisticated software.
Here, we'll just look at what might be the simplest example, which is threshold detection, in which we generate an event whenever a signal exceeds a fixed threshold, such as the temperature of a room rising above 70. Since the thing being measured could itself be the result of many different possible calculations, even though the notion of threshold detection is very simple, it can be very powerful.
Although in most software events are treated as a completely different type of data from signals, for our purposes, since we've only manipulated signals so far in these course notes, we'll offer a notion of threshold detection that results in a pulsed signal, as shown:
The output is a rectangular pulse. Unlike the pulses we've seen before that are for listening to, and that are rounded to control the audio bandwidth, a rectangular pulse changes discontinuously from one sample to the next to mark an event. (This is the way an event would be marked in an analog synthesizer; they are not much used in computer audio but we're using one here so that we can stay in the framework of audio signals.)
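A minimal sketch of threshold detection on the room-temperature example follows. It assumes the pulse stays high for as long as the input remains above the threshold; the function names and the sample readings are made up for illustration.

```python
def threshold_pulse(signal, threshold):
    """Return 1 where the signal exceeds the threshold, 0 elsewhere."""
    return [1 if x > threshold else 0 for x in signal]

def rising_edges(pulse):
    """Sample indices at which the pulse jumps from 0 to 1 (the events)."""
    return [n for n in range(1, len(pulse))
            if pulse[n] == 1 and pulse[n - 1] == 0]

room_temperature = [68, 69, 71, 72, 70, 69, 71, 73, 69]
pulse = threshold_pulse(room_temperature, threshold=70)
print(pulse)                 # [0, 0, 1, 1, 0, 0, 1, 1, 0]
print(rising_edges(pulse))   # [2, 6]
```

The pulse itself stays within the framework of signals, while its rising edges give the times at which a piece of event-based software would report that the threshold had been crossed.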
We could now design signal operations that are triggered by pulses, in the way an analog sequencer or envelope generator would; but for now we'll content ourselves with just obtaining one.
[These are all review problems.]
1. A square plate is vibrating sinusoidally to create a `beam' of sound. (Idealize this as in chapter 7 to a 1-foot line segment). At what frequency must it vibrate so that the beam spreads 30 degrees to either side (that is, so that the intensity drops to zero 30 degrees off axis)?
2. If you wish to form a beam with the same dispersion (spread), at a frequency two octaves lower, by what factor would you have to increase the dimensions of the square plate?
3. How many watts should you emit from a speaker (assuming the sound goes equally in all directions) to reach a sound level of 80 dB at a distance of 10 meters? (Assume you're away from any reflecting objects so that you only need consider the direct sound.)
4. What is the wavelength, in air, of the musical F above A440 (the musical A defined as 440 Hz.)? (You can answer in feet with c=1000 feet per second, or in meters at 343 m/sec.) Assume we're using the tempered scale.
5. If a critical band is 300 Hz. wide, what, approximately, is its frequency range (given by its bottom and top frequencies)?
6. How fast must a sound source be moving away from you so that Doppler shift makes the pitch of the sound decrease by one octave?
Project: low-pass filtering as a smoothing operation.
Perhaps the most fundamental and important tool in dealing with sounds is controlling amplitudes by applying a gain to a signal. You do this any time you change the volume on your phone, for example. It's not as simple as it sounds. If you change the gain of an amplifier too quickly the sound will not just change amplitude but will often make an audible clicking sound as it does so. This is a major problem if the quality of the sound matters.
Here is a patch to demonstrate/test this idea:
The three objects on the left are a straightforward sinusoid with amplitude control via a ``multiply" object. On the left, we're generating a signal to turn the sinusoid on and off, by thresholding another, slow sinusoid (the one at top left.) The threshold signal is a series of rectangular pulses, three per second, each one 0.1 seconds long.
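For readers without the patch in front of them, here is a rough sketch of the gating part in code. The notes don't give the slow sinusoid's frequency or the threshold value; one choice that reproduces the stated pulses (three per second, 0.1 seconds each) is a 3 Hz sinusoid with a threshold of cos(0.3 pi), about 0.59. Both values, and all the variable names, are assumptions.

```python
import math

sample_rate = 44100
slow_hz = 3.0                           # assumed: one pulse per cycle
pulse_seconds = 0.1
duty = pulse_seconds * slow_hz          # fraction of each cycle spent "on"
threshold = math.cos(math.pi * duty)    # a sinusoid exceeds this for `duty` of a cycle

# One second of the thresholded slow sinusoid (the on/off signal).
gate = [1.0 if math.sin(2 * math.pi * slow_hz * n / sample_rate) > threshold else 0.0
        for n in range(sample_rate)]

# The "multiply" object: the gate controls the amplitude of the audible sinusoid.
tone_hz = 2000.0
gated = [g * math.sin(2 * math.pi * tone_hz * n / sample_rate)
         for n, g in enumerate(gate)]

pulses = sum(1 for a, b in zip(gate, gate[1:]) if a == 0.0 and b == 1.0)
on_seconds = sum(gate) / sample_rate
print(pulses, on_seconds / pulses)
```

Because the gate jumps between 0 and 1 in a single sample, `gated` starts and stops abruptly at each pulse edge, which is the unsmoothed situation the project begins from.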
We're using a ``lowpass" object to smooth the edges of the pulses. The cutoff frequency of the low-pass filter determines the sharpness of the edges. In the picture above, the cutoff frequency is set very low to exaggerate this effect so that you can see it. You might want to try values between 2 and 20 Hz. to see how they affect the picture you get from the ``record" object at bottom.
The assignment is to find out how much smoothing you need to be able to turn the sinusoid on and off without hearing an audible "click" or "pop". This will turn out to depend on the frequency of the sinusoid.
First, set the frequency of the sinusoid (at upper left) to 2000. Adjust the low-pass filter's cutoff frequency to 20000 Hz. (essentially no filtering at all) and enjoy the clicks. Then drop the cutoff frequency to 5000, 2000, 1000, 500, etc., until you find the value at which you just hear a sinusoid turning on and off without artifacts. (Don't be a perfectionist... you can always convince yourself you hear a click or pop, just get it so that it's not easily audible.)
Now do the same thing with the sinusoid set to 500 Hz, and finally repeat the experiment after replacing the sinusoid with a ``noise" object. What are the three values you had to set the low-pass filter to to hear ``clean" turn-ons and turn-offs for the three situations (2000, 500, noise)?