Elucidating Human Expressive Elements Endowable to Computers in Improvisational Environments
Derek L. Keller - Mus 209 with Miller Puckette - Spring 1999
Computers are the greatest expression of man's desire to control. They are a pure representation of authority. They are constructed of the utterly ambiguous "elementary particle" of presence and absence, on and off, one and zero. Computers are metatechnology, almost infinitely flexible and bristling with potential.
- David Rokeby
Just a few years ago I was
in the University of Georgia New Music Studios discussing my next
project with my professor, Leonard Ball. I told him that I wished
to create a largely improvisational work that would allow a
computer and me to interact, or improvise, together. He referred me
to MAX, an object-oriented programming language developed by Miller
Puckette in the late 80s. For me, this program would be the impetus
towards developing improvisational environments with computers.
Though my experience with MAX has been and continues to be very
rewarding, I have an increasing desire to instill a greater sense
of human presence or intelligence in my computer partner's
understanding of improvisation. Since my experience with
interactive systems has been relatively brief thus far, this desire
has required research to begin to understand how this human quality
of expression is invoked. This research reveals that I am by no
means alone in this endeavor, and it has provided a wealth of
information to distill. The following discourse is an effort
towards defining the parameters required for developing interactive
music systems endowed with an intelligent presence predominantly in
an improvisational musical space. In doing so, established
performance paradigms will be compared and examples of their
implementation presented. A brief survey of opinions on why
interaction of this nature benefits the artistic world will serve
as a conclusion.
When thinking of improvisation, a number of terms and ideas come to
mind: listening, reacting, dialogue, convergence, divergence,
spontaneous creativity, communication. These ideas take on
different meanings when one considers the different contexts in
which improvisation is experienced as a listener or a performer.
These contexts can be defined by setting or ensemble, such as a
jazz combo or something resembling a Fluxus Happening. Within these
settings and others, the issue of what is being improvised is
brought to bear: is it a jazz standard? Is it free? If it is free,
is there a script or formal outline to follow? Is it tonal or
atonal? Is it textural or noise oriented? As these questions are
weighed, different performance settings and corresponding models
of interaction can be established. Considering these different
settings so as to apply them to performance paradigms involving
computers can be very beneficial in developing successful and
effective environments for improvisation. Most importantly, as
these paradigms are gradually made manifest in a developing work,
one continually reconsiders modes of interaction with the computer
with respect to interaction with human performers, thus instilling,
purposely or not, a human touch in the computer's artistic
contribution to the work.
These paradigms have been evolving over the last 10 to 15 years,
paralleling the development of real-time interactive improvisation
with computers. Over the course of this evolution, authors of the
research and literature in this field have codified these
performance paradigms in progressively refined ways that are worth
examining.
In his text Interactive Music Systems (1993), Robert Rowe proposed
a classification system "built on a combination of three dimensions
whose attributes help identify the musical motivations behind types
of input, interpretations, and methods of response" (Rowe, p.6).
These dimensions were meant to be permeable, where "any particular
system may show some combination of the attributes" (Rowe, p.6).
The first of Rowe's dimensions in which systems are distinguished
separates those systems that are score-driven from those that are
performance-driven. Score-driven programs are designed to execute
pre-programmed musical events representative of more musically
traditional accompaniment patterns like regularized beat, tempo,
and meter. Performance-driven programs do not have pre-determined
material to match against performer input. Further, temporal flow
in these systems is defined by a more conceptual understanding of
musical time and space.
The second dimension categorizes the basic methods in which a
system responds to musical input: transformative, generative, and
sequenced. Quite simply, transformative methods produce variations
on incoming musical data. Transformation may take place
simultaneously as the musical material is being performed or may be
transformed then stored for later execution. A system that employs
generative algorithms may use small fragments of input or nothing
at all to generate a musical texture, thus utilizing more
randomized techniques. A sequenced musical response is such that
musical material is pre-programmed and then executed by cues
supplied by the performer in real time.
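To make Rowe's three response methods concrete, the following is a minimal sketch in Python (a hypothetical stand-in for a MAX patch; the note lists, cue numbers, and transposition choices are my own assumptions, not drawn from Rowe):

```python
import random

def transformative(input_notes):
    """Transformative response: a variation on incoming material,
    here a simple transposition up a perfect fifth."""
    return [n + 7 for n in input_notes]

def generative(fragment=None, length=8):
    """Generative response: a texture built from a small fragment
    of input -- or from nothing at all, via randomized choices."""
    pool = fragment if fragment else [random.randint(48, 72)]
    return [random.choice(pool) + random.choice([-2, 0, 3]) for _ in range(length)]

# Pre-programmed material, keyed by cue number (illustrative values)
SEQUENCES = {1: [60, 64, 67], 2: [62, 65, 69]}

def sequenced(cue):
    """Sequenced response: pre-programmed material executed by a
    cue supplied by the performer in real time."""
    return SEQUENCES.get(cue, [])
```

Each function takes the role of one of Rowe's methods; a real system would of course apply them to timed MIDI events rather than bare pitch lists.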
Finally, the last dimension differentiates instrument and player
paradigms. An instrument paradigm is one in which the system
augments or elaborates the gestures of a human performer in some
way. The player paradigm is central to this discourse in that it
incorporates the system as an additional performer with a character
of its own (Rowe, p.7-8). It is in this situation where dialogue
between human and machine players is most prominent, most flexible
and, subsequently, the most difficult to develop from a programming
standpoint.
In his article Transforming Mirrors (1995), David Rokeby presents
four paradigms that an interactive work can embody: a navigable
structure or world, a [self-sustaining] creative medium, a
transforming mirror, or an automaton (Rokeby, p.138). Though
Rokeby's article deals with interactive art works in
general, these paradigms can be applied to interactive music
systems as well.
A navigable structure "can be thought of as an articulation of a
space... with a sort of architecture... and a method of navigation.
The navigable structure and its system of navigation together make
up a guidance system through which the trajectory of the user
throughout the work may be subtly controlled" (Rokeby, p.138-141).
Using Rowe's distinguishable attributes of his system of
classification, a navigable structure can be assembled. A
score-driven program with its predetermined or pre-programmed
sequences of musical events can be navigated in real time with cues
provided by the performer's input, improvised or meticulously
performed from a part. These cues, most effectively transmitted via
MIDI, would allow the program to advance from one state or passage
to the next only when the performer saw fit. Hence, the performer
is the navigator.
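A cue-driven navigable structure of this kind can be sketched very simply. The class below is an illustration of the idea, not an actual MAX implementation; the section names and cue note numbers are invented for the example:

```python
class NavigableScore:
    """Score-driven navigation: advance through pre-programmed
    sections only when the performer supplies the expected cue."""

    def __init__(self, sections, cues):
        self.sections = sections   # pre-programmed musical events
        self.cues = cues           # MIDI notes that trigger advances
        self.index = 0

    def on_midi_note(self, note):
        # Advance only when the performer plays this section's cue,
        # whether improvised or meticulously performed from a part.
        if self.index < len(self.cues) and note == self.cues[self.index]:
            self.index += 1
        return self.sections[min(self.index, len(self.sections) - 1)]
```

The performer remains the navigator: any non-cue input leaves the program in its current state.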
Rokeby's second paradigm, a creative medium, is interesting in
that, "the artist enables the interacters to express themselves
creatively" (Rokeby, p.143). This paradigm is slightly more
difficult to translate into a combination of Rowe's dimensions
owing to the fact that Rokeby is speaking of creative mediums such
as MacPaint, a software package designed for the masses that
allowed artists, professional and novice, to express themselves
through a computer medium. However, if this creative medium model
could be applied to the idea of using the system to enhance,
elaborate, or augment a performer's creative or musical gestures,
then a paradigm using Rowe's dimensions is applicable. For
instance, a performance-driven program could incorporate a player
paradigm, using the system to enhance a solo improvisation,
ultimately using the system to push the envelope of human capacity.
This could cause two results: a superhuman performance and the
possibility of using the computer as an expressive musical
instrument.
When defining his model of transforming mirrors, Rokeby uses
interactive video installations as his example. In this model, "the
spectator sees some representation of himself or herself on the
video screen... which follows [their] movement like a mirror image
or shadow, transformed by the potentials with which the artist has
endowed the computer. While unmediated feedback of exact mirroring
produces the closed system (the reflection of the self reabsorbed),
transformed reflections are a dialogue between the self and the
world beyond" (Rokeby, p.146). Rowe's system of classification is
easily transferable: a performance-driven program is endowed with
transformative methods to alter and reflect back (via loudspeakers)
to the performer his or her transformed musical image in
dialogue.
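The musical analogue of the transformed (rather than exact) reflection might look like the following sketch, where the "mirror" is a melodic inversion; the choice of inversion about an axis pitch is my own example of a transformative method, not one prescribed by Rokeby or Rowe:

```python
def transforming_mirror(pitches, axis=60):
    """Reflect the performer's line back transformed: inverted
    about an axis pitch rather than echoed exactly, so the player
    hears a recognizable but altered musical image of themselves."""
    return [2 * axis - p for p in pitches]
```

An exact echo (axis mapping each pitch to itself) would be Rokeby's closed system; the inversion opens the dialogue between the self and its transformed image.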
To define the automaton model, Rokeby refers to Norman White's
robotic creations. Here the machine is an autonomous entity endowed
with the ability to "make sense of its environment," (Rokeby,
p.151) or to adapt to its surroundings. At first this, like the
creative medium model, may seem difficult to apply to Rowe's
system. However, as interactive music systems have evolved,
programming methods (owing largely to the pioneering research of
David Cope) such as pattern matching or pattern induction have been
developed to allow a system to "adapt" to a performer's musical
input (Rowe, p.237). Pattern induction requires a system capable of
learning to recognize important sequential structures from repeated
exposure to musical examples. Pattern matching requires the system
to match incoming musical material to the newly recognized or
"learned" material (Rowe, p.237). A pattern worthy of recognition
could be a beat pattern, a chord progression, a repeated melodic
fragment, etc. This type of programming could be implemented in
Rowe's performance-driven model, with these semi-generative
induction and matching algorithms, where the computer represents
the player paradigm.
The most lucid codification system of interactive performance
paradigms has been developed by Todd Winkler. In his text Composing
Interactive Music (1998), Winkler describes three models of
interaction. However, he directly relates them to three very
distinctive human ensemble settings:
1) conductor model/symphony orchestra - predetermined score and
predetermined computer sequences
2) chamber music model/string quartet - predetermined score and
predetermined or indeterminate computer actions
3) improvisation model/jazz combo/free improvisation - performer
improvisation and indeterminate computer actions
In the first correlation above, it can be understood that the
conductor, from his score, conducts the orchestra, through their
parts, to create a fully preconceived musical space. Thus the
conductor interprets the music through his use of gesture. Winkler
states that, "Good orchestras are interactive when orchestra
members are responsive to a conductor's gestures, when they listen
to each other, and when the conductor listens to the orchestra as a
collection of individuals" (Winkler, p.23). This correlates to the
idea that in a work for solo and computer, "the composer [calls]
upon the performer to supply tempo and possibly dynamic
information, the 'conductor model,' to impart some human expression
to an expected computer accompaniment" (my emphasis) (Winkler,
p.292). Further still, "the computer contains some information
about the performer's score, and matches incoming events to a
predetermined sequence of actions" (Winkler, p.292).
Concerning the second example, Winkler presents the following idea
behind the correlation. Each member of a string quartet at some
point exercises control over the ensemble where "each are capable
of exhibiting character and independence" (Winkler, p.25). This
paradigm can be made manifest in an interactive setting with a
computer where the soloist gives control or responsibility to the
computer to execute a sequence of events via a cue. This sequence
of events could involve both predetermined actions or the computer
accompaniment could utilize more generative algorithms. As this
period of independence draws to a close, the computer yields
control back to the soloist by waiting for the next cue or by
allowing the performer control over dynamic parameters or tempo
fluctuations.
In the traditional sense of a jazz combo, most aspects of the
typical jazz standard are available to the performers for
improvisational interpretation. Here too there is a taking and
yielding of control. Winkler states that, "relationships change
frequently, as two members trade riffs, or a third jumps
momentarily to the conversational foreground" (Winkler, p.25). As
solos are frequent in jazz combos, it is particularly exciting to
hear how the supporting accompaniment reacts, embellishes, or
answers a soloist. Winkler correlates this scenario to one where
the soloist supplies musical fragments on which the computer
improvises or where the computer adapts to the performer through
pattern induction/matching as described earlier. This paradigm is
the most variable, most stimulating, and requires the most
sophisticated programming methods. The reason for this results from
the fact that as the computer is imparted with more independent
responsibilities of developing accompaniment or supporting
dialogue, composers strive to instill the computer with more human
or intelligent capabilities in its capacity to coexist in an
improvisational musical space.
Through the study of the different human-to-human paradigms,
particularly the improvisational settings, one will begin to
understand the multitude of dimensions to consider when developing
an interactive system of response embodying human-like
characteristics. In my own work developing improvisational
environments with computers, I have found this type of study
particularly rewarding. At the time, Winkler's text was not yet
available. However, my consideration of human-human interaction
in a free jazz or jazz fusion setting was paramount in developing
my first interactive work, Holtranix II. In this work, the
computer assumes two roles. First, it supplies an accompaniment
that is largely improvised through the use of randomized
parameters. In essence, the large structure of the work remains
intact while local events within each section are realized anew
each time the piece is performed. Second, the computer also
responds to input from the performer. The computer's output
consists of both self-generative music and that which is derived
from the performer through the use of real-time multi-track
sequence recording and playback. It was my goal in this work to
give the impression that the computer was contributing in some
artistic capacity to the musical event. I did this largely in two
ways, one in the form of audible response to the soloist's
improvisation and the other a space actually designed strictly for
a computer "solo."
As mentioned earlier, hearing a musical fragment or phrase migrate
throughout an improvisational ensemble is exciting owing to its
spontaneity of passage and its availability to transformation. In
Holtranix II, there are a number of places where the
soloist's material is recorded and played back in dialogue with the
performer. The input is recorded in fragments of various time
lengths. When played back, the fragments undergo no rhythmic
transformation; however, the medium through which each fragment is
reiterated into the improvisation is quite different. I programmed
the computer to
play these fragments through a MIDI software sampler (SampleCell
II). The sampler is loaded with sounds of my own creation
consisting of those resembling percussion instruments, those useful
for creating complex textures, and those which would lend
themselves to chordal sonorities or melodic passages. To break up
this form of exact response, the computer is programmed to switch
back and forth between this and another type of response
largely composed of randomized events that in turn influence what
the improvisor is developing. As the improvisation progresses, the
listener can hear rhythmic and melodic fragments answering or
echoing the performer much like call and response in a jazz combo
setting.
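The switching logic just described can be sketched as follows. This is an after-the-fact illustration in Python, not the actual MAX patch from Holtranix II; the fragment length, transposition choices, and switching probability are invented for the example:

```python
import random

def respond(fragment, echo_mode):
    """Answer a recorded fragment: either echo it (re-voiced, in the
    piece, through the sampler) or emit a randomized variation."""
    if echo_mode:
        return list(fragment)          # exact playback
    # Randomized response: reorder and transpose the fragment's notes
    notes = random.sample(fragment, k=len(fragment))
    return [n + random.choice([-12, -5, 0, 7, 12]) for n in notes]

def call_and_response(performance, fragment_len=4, p_echo=0.5):
    """Chop the performer's input into fragments and answer each one,
    switching unpredictably between exact and randomized responses."""
    answers = []
    for i in range(0, len(performance), fragment_len):
        fragment = performance[i:i + fragment_len]
        answers.append(respond(fragment, random.random() < p_echo))
    return answers
```

The unpredictability of which mode answers a given fragment is what keeps the exchange from feeling like a mechanical echo.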
To enhance further this quality of contribution, I specifically
programmed a section of the work where the computer actually
improvises a solo and the performer improvises a chordal
accompaniment in support of that solo. As I constructed this bit of
programming, I considered what was most important in a human
improviser's development of a solo. There is more than one answer
to this query, of course. However, the aspect I settled on here
was the performer's breath. Not only does the performer have to
breathe; in my opinion, the manipulation of breath is one of the
most defining elements of phrasing in a solo. With this in mind, I
simply constructed an element into the program that allowed for a
space or "breath" in between phrases. The phrase lengths themselves
are also varied. This ultimately endows the computer's solo with a
variable sense of pacing, expansion, and contraction.
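The breath element reduces to a simple alternation of variably-sized phrases and rests. The sketch below captures the idea in Python; the beat counts and ranges are illustrative, not the values used in Holtranix II:

```python
import random

def breathing_solo(total_beats=32):
    """Build a solo as (kind, length) events, inserting a variable
    'breath' rest after each variably-sized phrase, so the solo
    expands and contracts the way a wind player's phrasing does."""
    events, elapsed = [], 0
    while elapsed < total_beats:
        phrase = random.randint(3, 8)   # phrase length in beats
        breath = random.randint(1, 3)   # breath length in beats
        events.append(("phrase", phrase))
        events.append(("breath", breath))
        elapsed += phrase + breath
    return events
```

Mapping each "phrase" event to generated notes and each "breath" to silence is all that is needed to give the computer's solo its pacing.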
When one experiences being "moved" by expressive qualities in
music, what is inherent in performance that causes that state?
Where does a passionate performance come from? In his Ph.D.
dissertation (UCSD, 1992), Michael Pelz-Sherman presents evidence
from performance studies showing differing opinions regarding the
source of expression in performance. He states that,
"at one extreme, expression in music is seen to be almost entirely
a result of the conscious formulation of performance plans. At the
other, expression is seen as a product of unconscious, emotional,
and physical responses acquired through conditioning, heredity, and
evolution" (Pelz-Sherman, p.19). Both statements are true to an
extent; however, neither is going to make a bit of sense to a
computer.
Like any developing musician, before a computer is involved in a
musical experience of any kind, it must be able to listen or hear
and it must have some capacity to play or create. Through the
standard protocol of MIDI, computers can begin to hear, play, and
interpret music. These three abilities become highly stylized as
humans develop into fine musicians. Winkler explains that,
"musicians best known for their passionate performances exhibit
their personal style in the control of minute variations in timing,
articulation, and dynamics. These are the aspects of performance
that best impart human interpretive skills to the computer"
(Winkler, p.137). These aspects are implemented in computers via
MIDI in terms of its understanding of pitch (MIDI pitch number),
loudness (MIDI velocity number), and time (a durational value
usually presented in milliseconds). A fourth aspect of great
importance, timbre, has recently been made more manageable for
implementation in interactive systems as a source of expression,
thanks to programs such as Puckette's Pd (Pure Data), which can
perform real-time audio signal analysis. With these
basic raw materials in combination with software like Puckette's
MAX and Pd, Rowe's Cypher, and others, a composer can begin to
formulate ways of endowing an interactive system with human
capabilities such as intelligent decision making, short and long
term predictions, and stylistically informed creativity. With
brevity in mind, a few examples of implementation will be presented
varying from localized programming methods to techniques devoted to
entire musical works.
The first example involves the implementation of expressive timing
in performance as applied in Robert Rowe's software, Cypher. Cypher
is a program consisting of two main components: a listener and a
player. "The listener analyzes, groups, and classifies input events
[from the performer] as they arrive in real-time without matching
them against any preregistered representation of any particular
piece of music. The player uses various algorithmic styles to
produce a musical response [based on the information gleaned by the
listener]" (Rowe, p.139). On a very basic level, expressive timing
in music consists of slight tempo fluctuations or shadings,
ritardandi, and accelerandi. In response to a performer's input,
Cypher's listener can analyze these variables at the micro and
macro levels. Using this information, Cypher can perform two
different temporal affectations: 1) directed temporal operations
[which] perform a linear transformation of the offset times
separating events, either lengthening them (decelerando/ritardando)
or shortening them (accelerando), and 2) static operations [which]
add small, non-directional changes to the length of event offsets
(Pelz-Sherman, p.114). Cypher's player can perform these
affectations on its own musical response at the micro and macro
level as well. This means that temporal affectations to single
phrases or sub-phrases can affect the way temporal affectations are
performed over a longer expanse of time such as phrase group or an
entire section, for example.
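The two operations are easy to sketch on a list of inter-onset offsets. This is an interpretation of the description above, not Cypher's source code; the step size and jitter bound are arbitrary example values:

```python
import random

def directed(offsets, step):
    """Directed operation: linearly lengthen successive offsets
    (step > 0, decelerando/ritardando) or shorten them
    (step < 0, accelerando)."""
    return [max(0.01, dt + i * step) for i, dt in enumerate(offsets)]

def static_shading(offsets, jitter=0.02):
    """Static operation: small, non-directional changes to each
    offset, giving expressive shading without a tempo trend."""
    return [dt * (1 + random.uniform(-jitter, jitter)) for dt in offsets]

eighths = [0.5] * 8                 # steady eighth notes at 120 bpm
rit = directed(eighths, 0.05)       # offsets grow: a gradual ritardando
```

Applying `directed` over a phrase and again over a phrase group is one way the micro and macro levels could interact.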
Pelz-Sherman's use of intensity curves and pitchvolurations is an
interesting and valuable example of implementation. Pelz-Sherman defines
musical intensity "as a percentage of deviation from the median or
default values... of an [incoming] musical event." This intensity
can be traced through time represented as a curve on a graph or
table where time is on the horizontal axis and intensity level is
on the vertical (Pelz-Sherman, p.53). These intensity curves can
represent incoming data such as pitch number, velocity number, and
duration. In fact, Pelz-Sherman coins the term pitchvoluration. This term
is a percentage value that controls intensity deviations affecting
the system's outgoing response at the macro level which is a sum of
the intensity curves analyzed at the system's input on the micro
level: pitch, velocity, duration. Pitchvoluration values can be
"plugged" into other parameters that control, for example, simple
tone production in a synthesizer's output (through MIDI system
exclusive commands).
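A rough rendering of these ideas in Python follows. The exact formulation is Pelz-Sherman's; this sketch interprets "percentage of deviation from the median" literally and sums the three micro-level curves, which may differ from his actual weighting:

```python
from statistics import median

def intensity(values, reference=None):
    """Intensity curve: each event's fractional deviation from the
    median (or a supplied default value) of its parameter stream."""
    ref = reference if reference is not None else median(values)
    return [(v - ref) / ref for v in values]

def pitchvoluration(pitches, velocities, durations):
    """Macro-level control value per event: the sum of the
    micro-level pitch, velocity, and duration intensity curves."""
    return [p + v + d for p, v, d in
            zip(intensity(pitches), intensity(velocities), intensity(durations))]
```

The resulting values could then be "plugged" into any outgoing parameter, such as a synthesizer timbre control.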
The third example is the program Voyager, specifically written for
an improvisational piece of the same name developed by trombonist,
improviser, and composer George Lewis. It is a continually
developing work in that it has changed mediums (computer operating
systems) more than once. However, the paradigm implemented has
remained fixed from the point of its inception in 1985. The
paradigm is a result of careful consideration of what the
relationship is between the computer and the performer. Winkler
sheds light on this relationship explaining that, "Lewis's strategy
allows for a great deal of independence between [the] computer and
the performer, establishing musical personalities that do not
directly control each other, but rather have mutual influence that
contributes to the final outcome of an improvisation. His goal is
to make the computer 'playing' listenable as music on its own by
viewing its behavior as separate from and independent of the
performer" (Winkler, p.27). In his own words, Lewis explains that,
"the interaction takes place in the manner of two improvisers that
have their own 'personalities.' The program's extraction of
important features from my activity is not reintroduced directly,
but used to condition and guide a separate process of real-time
[generative] composition" (Winkler, p.27). In this model, both the
performers, human and machine, are listening and responding to each
other's behavior. In the latest version of the work, at least 30
parameters are considered by the program, accommodating input and
generating output, as an improvisation unfolds. Considered
parameters include volume, sounding duration, octave, register,
interval width, pitches used, volume range, frequency of silence,
and articulation. These parameters are monitored and averaged over
time; the values of which are transferred to the music-generating
process which consists largely of the same governing parameters at
the input. During an improvisation, the human performer may stop
playing at some point. Hence, the system will not receive input. At
this moment the parameters described above at the output are
controlled by random number generators. All of the processes create
what Lewis refers to as the "personality" of the system (Lewis,
1993).
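The listening-and-averaging scheme, with its random fallback during silence, can be outlined schematically. This is emphatically not Lewis's code (Voyager tracks dozens of parameters); the parameter subset, normalization to a 0-1 range, and exponential smoothing constant are all my own assumptions:

```python
import random

PARAMS = ["volume", "duration", "register", "interval_width"]

class VoyagerLikeAgent:
    """Track running averages of performance parameters and use them
    to condition generated output; fall back to random values when
    the performer stops playing."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.averages = {p: 0.5 for p in PARAMS}   # normalized 0..1

    def listen(self, observation):
        """Fold one observation (dict of normalized values) into the
        running averages; ignore silence (None)."""
        if observation is None:
            return
        for p in PARAMS:
            old = self.averages[p]
            self.averages[p] = self.smoothing * old + (1 - self.smoothing) * observation[p]

    def generate(self, hearing_input):
        if hearing_input:
            # Condition output on the performer's averaged behavior,
            # perturbed slightly so the response is guided, not copied.
            return {p: min(1.0, max(0.0, v + random.uniform(-0.1, 0.1)))
                    for p, v in self.averages.items()}
        # Performer silent: parameters driven by random number generators
        return {p: random.random() for p in PARAMS}
```

The key design point, as in Lewis's description, is that the performer's features condition a separate generative process rather than being reintroduced directly.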
In a presentation given at UCSD in the Winter of 1999, composer
Gerhard Winkler presented a similar paradigm designed for one of
his works where both the performing ensemble and the computer
system inform each other. However, the computer's output is not just
transformed musical output amplified through loudspeakers. Winkler
also developed an aspect of the program that produces a graphic
representation of output from which the performers would play in a
semi-improvisational manner. In essence all of the performers have
Macintosh Powerbook computers in front of them presenting various
screens of information. The sound produced by the individually
miked members of the ensemble is captured by the system, which
calls upon a library of the screens as a response to the input.
This produces a constantly evolving, cyclically unfolding musical
space like Lewis's Voyager, yet the performers are working from
graphical notation. Another difference is that Winkler's
system blurs the distinctive relationship between human and machine
players. Finally, the human and machine players are inextricably
bound together (borg music?), where if one fails, the musical space
has the potential to collapse.
In this quest for the endowment of computer systems with cognitive
abilities, one must consider the goals or end results of such a
task. Why does one want to interact with computers? Pelz-Sherman has
the simple goal of producing not necessarily "realism... but rather
plausibility." In other words, "a listener, upon learning these
nuances, should be able to construct a mental image of the virtual
performer based on a degree of predictability in its response to
input" (Pelz-Sherman, p.21). Rowe believes that, "interactive music
programs can change the way we think about machines and how we use
them" (Rowe, p.262). This idea of the computer causing one to think
differently is expressed by David Rokeby in terms of a reflection
of the self. In a concluding manner, he states that, "to the degree
that technology reflects ourselves back recognizably, it provides
us with a self image, a sense of self. To the degree that
technology transforms our image in the act of reflection, it
provides us with a sense of the relation between this self and the
experienced world" (Rokeby, p.133). For myself, I simply desire to
experience another enticing setting for improvisation utilizing the
wildly unlimited potentials of the computer.
Sources Cited
Lewis, George E. Voyager. CD. Avant 014. 1993.
Pelz-Sherman, Michael. "On the Formalization of Expression in Music
Performed by Computers." Ph.D. dissertation, University of
California, San Diego, 1992.
Rokeby, David. "Transforming Mirrors: Subjectivity and control in
interactive media." In Critical Issues in Electronic Media,
edited by Simon Penny. Albany: State University of New York Press,
1995.
Rowe, Robert. Interactive Music Systems: Machine Listening and
Composing. Cambridge: MIT Press, 1993.
Winkler, Todd. Composing Interactive Music: Techniques and Ideas
Using MAX. Cambridge: MIT Press, 1998.
Additional References
Baggi, Denis, ed. Computer-Generated Music. Los Alamitos: IEEE
Computer Society Press, 1992.
Boden, Margaret A. "Artificial Genius." Discover, vol. 17,
No. 10, October 1996.
Cope, David. Experiments in Musical Intelligence. The
Computer Music and Digital Audio Series, vol. 12. Madison,
Wisconsin: A-R Editions, Inc., 1996.
Cope, David. Computers and Musical Style. The Computer Music
and Digital Audio Series, vol. 6. Madison, Wisconsin: A-R Editions,
Inc., 1992.
Col Legno. Gerhard E. Winkler, Edgard Varèse, Morton
Feldman. CD. WWE 1CD31872. 1994.
Rosenboom, David. "Extended Musical Interface with the Human
Nervous System." In Leonardo, Leonardo Monograph Series, No.
1. Berkeley: 1990.