Elucidating Human Expressive Elements Endowable to Computers in Improvisational Environments
Derek L. Keller - Mus 209 with Miller Puckette - Spring 1999
Computers are the greatest expression of man's desire to control. They are a pure representation of authority. They are constructed of the utterly ambiguous "elementary particle" of presence and absence, on and off, one and zero. Computers are metatechnology, almost infinitely flexible and bristling with potential.
- David Rokeby
Just a few years ago I was
in the University of Georgia New Music Studios discussing my next
project with my professor, Leonard Ball. I told him that I wished
to create a largely improvisational work that would allow a
computer and me to interact, or improvise, together. He referred me
to MAX, an object-oriented programming language developed by Miller
Puckette in the late 80s. For me, this program would be the impetus
towards developing improvisational environments with computers.
Though my experience with MAX has been and continues to be very
rewarding, I have an increasing desire to instill a greater sense
of human presence or intelligence in my computer partner's
understanding of improvisation. Since my experience with
interactive systems has been relatively brief thus far, this desire
has required research to begin to understand how this human quality
of expression is invoked. This research reveals that I am by no
means alone in this endeavor, and it has provided a wealth of
information to distill. The following discourse is an effort
towards defining the parameters required for developing interactive
music systems endowed with an intelligent presence predominantly in
an improvisational musical space. In doing so, established
performance paradigms will be compared and examples of their
implementation presented. A brief survey of opinions on why
interaction of this nature benefits the artistic world will serve
as a conclusion.
When thinking of improvisation, a number of terms and ideas come to
mind: listening, reacting, dialogue, convergence, divergence,
spontaneous creativity, communication. These ideas take on
different meanings when one considers the different contexts in
which improvisation is experienced as a listener or a performer.
These contexts can be defined by setting or ensemble, such as a
jazz combo or something resembling a Fluxus Happening. Within these
settings and others, the issue of what is being improvised is
brought to bear: is it a jazz standard? Is it free? If it is free,
is there a script or formal outline to follow? Is it tonal or
atonal? Is it textural or noise oriented? As these questions are
weighed, different performance settings and corresponding models
of interaction can be established. Considering these different
settings so as to apply them to performance paradigms involving
computers can be very beneficial in developing successful and
effective environments for improvisation. Most importantly, as
these paradigms are gradually made manifest in a developing work,
one continually reconsiders modes of interaction with the computer
with respect to interaction with human performers, thus instilling,
purposely or not, a human touch in the computer's artistic
contribution to the work.
These paradigms have been evolving over the last 10 to 15 years,
paralleling the development of real-time interactive improvisation
with computers. Over the course of this evolution, authors of the
research and literature in this field have codified these
performance paradigms in progressively refined ways that are worth
examining.
In his text Interactive Music Systems (1993), Robert Rowe proposed
a classification system "built on a combination of three dimensions
whose attributes help identify the musical motivations behind types
of input, interpretations, and methods of response" (Rowe, p.6).
These dimensions were meant to be permeable, where "any particular
system may show some combination of the attributes" (Rowe, p.6).
The first of Rowe's dimensions in which systems are distinguished
separates those systems that are score-driven from those that are
performance-driven. Score-driven programs are designed to execute
pre-programmed musical events representative of more musically
traditional accompaniment patterns like regularized beat, tempo,
and meter. Performance-driven programs do not have pre-determined
material to match against performer input. Further, temporal flow
in these systems is defined by a more conceptual understanding of
musical time and space.
The second dimension categorizes the basic methods in which a
system responds to musical input: transformative, generative, and
sequenced. Quite simply, transformative methods produce variations
on incoming musical data. Transformation may take place
simultaneously as the musical material is being performed or may be
transformed then stored for later execution. A system that employs
generative algorithms may use small fragments of input or nothing
at all to generate a musical texture, thus utilizing more
randomized techniques. A sequenced musical response is such that
musical material is pre-programmed and then executed by cues
supplied by the performer in real time.
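To make Rowe's three response methods concrete, the following is a minimal sketch in Python (a hypothetical stand-in for a MAX patch; the note lists, cue numbers, and transposition choices are my own assumptions, not drawn from Rowe):

```python
import random

def transformative(input_notes):
    """Transformative response: a variation on incoming material,
    here a simple transposition up a perfect fifth."""
    return [n + 7 for n in input_notes]

def generative(fragment=None, length=8):
    """Generative response: a texture built from a small fragment
    of input -- or from nothing at all, via randomized choices."""
    pool = fragment if fragment else [random.randint(48, 72)]
    return [random.choice(pool) + random.choice([-2, 0, 3]) for _ in range(length)]

# Pre-programmed material, keyed by cue number (illustrative values)
SEQUENCES = {1: [60, 64, 67], 2: [62, 65, 69]}

def sequenced(cue):
    """Sequenced response: pre-programmed material executed by a
    cue supplied by the performer in real time."""
    return SEQUENCES.get(cue, [])
```

Each function takes the role of one of Rowe's methods; a real system would of course apply them to timed MIDI events rather than bare pitch lists.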
Finally, the last dimension differentiates instrument and player
paradigms. An instrument paradigm is one in which the system
augments or elaborates the gestures of a human performer in some
way. The player paradigm is central to this discourse in that it
incorporates the system as an additional performer with a character
of its own (Rowe, p.7-8). It is in this situation where dialogue
between human and machine players is most prominent, most flexible
and, subsequently, the most difficult to develop from a programming
standpoint.
In his article Transforming Mirrors (1995), David Rokeby presents
four paradigms that an interactive work can embody: a navigable
structure or world, a [self-sustaining] creative medium, a
transforming mirror, or an automaton (Rokeby, p.138). Though
Rokeby's article deals with interactive art works in
general, these paradigms can be applied to interactive music
systems as well.
A navigable structure "can be thought of as an articulation of a
space... with a sort of architecture... and a method of navigation.
The navigable structure and its system of navigation together make
up a guidance system through which the trajectory of the user
throughout the work may be subtly controlled" (Rokeby, p.138-141).
Using Rowe's distinguishable attributes of his system of
classification, a navigable structure can be assembled. A
score-driven program with its predetermined or pre-programmed
sequences of musical events can be navigated in real time with cues
provided by the performer's input, improvised or meticulously
performed from a part. These cues, most effectively transmitted via
MIDI, would allow the program to advance from one state or passage
to the next only when the performer saw fit. Hence, the performer
is the navigator.
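A cue-driven navigable structure of this kind can be sketched very simply. The class below is an illustration of the idea, not an actual MAX implementation; the section names and cue note numbers are invented for the example:

```python
class NavigableScore:
    """Score-driven navigation: advance through pre-programmed
    sections only when the performer supplies the expected cue."""

    def __init__(self, sections, cues):
        self.sections = sections   # pre-programmed musical events
        self.cues = cues           # MIDI notes that trigger advances
        self.index = 0

    def on_midi_note(self, note):
        # Advance only when the performer plays this section's cue,
        # whether improvised or meticulously performed from a part.
        if self.index < len(self.cues) and note == self.cues[self.index]:
            self.index += 1
        return self.sections[min(self.index, len(self.sections) - 1)]
```

The performer remains the navigator: any non-cue input leaves the program in its current state.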
Rokeby's second paradigm, a creative medium, is interesting in
that, "the artist enables the interacters to express themselves
creatively" (Rokeby, p.143). This paradigm is slightly more
difficult to translate into a combination of Rowe's dimensions
owing to the fact that Rokeby is speaking of creative mediums such
as MacPaint, a software package designed for the masses that
allowed artists, professional and novice, to express themselves
through a computer medium. However, if this creative medium model
could be applied to the idea of using the system to enhance,
elaborate, or augment a performer's creative or musical gestures,
then a paradigm using Rowe's dimensions is applicable. For
instance, a performance-driven program could incorporate a player
paradigm, using the system to enhance a solo improvisation,
ultimately using the system to push the envelope of human capacity.
This could cause two results: a superhuman performance and the
possibility of using the computer as an expressive musical
instrument.
When defining his model of transforming mirrors, Rokeby uses
interactive video installations as his example. In this model, "the
spectator sees some representation of himself or herself on the
video screen... which follows [their] movement like a mirror image
or shadow, transformed by the potentials with which the artist has
endowed the computer. While unmediated feedback of exact mirroring
produces the closed system (the reflection of the self reabsorbed),
transformed reflections are a dialogue between the self and the
world beyond" (Rokeby, p.146). Rowe's system of classification is
easily transferable: a performance-driven program is endowed with
transformative methods to alter and reflect back (via loudspeakers)
to the performer his or her transformed musical image in
dialogue.
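The musical analogue of the transformed (rather than exact) reflection might look like the following sketch, where the "mirror" is a melodic inversion; the choice of inversion about an axis pitch is my own example of a transformative method, not one prescribed by Rokeby or Rowe:

```python
def transforming_mirror(pitches, axis=60):
    """Reflect the performer's line back transformed: inverted
    about an axis pitch rather than echoed exactly, so the player
    hears a recognizable but altered musical image of themselves."""
    return [2 * axis - p for p in pitches]
```

An exact echo (axis mapping each pitch to itself) would be Rokeby's closed system; the inversion opens the dialogue between the self and its transformed image.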
To define the automaton model, Rokeby refers to Norman White's
robotic creations. Here the machine is an autonomous entity endowed
with the ability to "make sense of its environment," (Rokeby,
p.151) or to adapt to its surroundings. At first this, like the
creative medium model, may seem difficult to apply to Rowe's
system. However, as interactive music systems have evolved,
programming methods (owing largely to the pioneering research of
David Cope) such as pattern matching or pattern induction have been
developed to allow a system to "adapt" to a performer's musical
input (Rowe, p.237). Pattern induction requires a system capable of
learning to recognize important sequential structures from repeated
exposure to musical examples. Pattern matching requires the system
to match incoming musical material to the newly recognized or
"learned" material (Rowe, p.237). A pattern worthy of recognition
could be a beat pattern, a chord progression, a repeated melodic
fragment, etc. This type of programming could be implemented in
Rowe's performance-driven model, with these semi-generative
induction and matching algorithms, where the computer represents
the player paradigm.
The most lucid codification system of interactive performance
paradigms has been developed by Todd Winkler. In his text Composing
Interactive Music (1998), Winkler describes three models of
interaction. However, he directly relates them to three very
distinctive human ensemble settings:
1) conductor model/symphony orchestra - predetermined score and
predetermined computer sequences
2) chamber music model/string quartet - predetermined score and
predetermined or indeterminate computer actions
3) improvisation model/jazz combo/free improvisation - performer
improvisation and indeterminate computer actions
In the first correlation above, it can be understood that the
conductor, from his score, conducts the orchestra, through their
parts, to create a fully preconceived musical space. Thus the
conductor interprets the music through his use of gesture. Winkler
states that, "Good orchestras are interactive when orchestra
members are responsive to a conductor's gestures, when they listen
to each other, and when the conductor listens to the orchestra as a
collection of individuals" (Winkler, p.23). This correlates to the
idea that in a work for solo and computer, "the composer [calls]
upon the performer to supply tempo and possibly dynamic
information, the 'conductor model,' to impart some human expression
to an expected computer accompaniment" (my emphasis) (Winkler,
p.292). Further still, "the computer contains some information
about the performer's score, and matches incoming events to a
predetermined sequence of actions" (Winkler, p.292).
Concerning the second example, Winkler presents the following idea
behind the correlation. Each member of a string quartet at some
point exercises control over the ensemble where "each are capable
of exhibiting character and independence" (Winkler, p.25). This
paradigm can be made manifest in an interactive setting with a
computer where the soloist gives control or responsibility to the
computer to execute a sequence of events via a cue. This sequence
of events could involve both predetermined actions or the computer
accompaniment could utilize more generative algorithms. As this
period of independence draws to a close, the computer yields
control back to the soloist by waiting for the next cue or by
allowing the performer control over dynamic parameters or tempo
fluctuations.
In the traditional sense of a jazz combo, most aspects of the
typical jazz standard are available to the performers for
improvisational interpretation. Here too there is a taking and
yielding of control. Winkler states that, "relationships change
frequently, as two members trade riffs, or a third jumps
momentarily to the conversational foreground" (Winkler, p.25). As
solos are frequent in jazz combos, it is particularly exciting to
hear how the supporting accompaniment reacts, embellishes, or
answers a soloist. Winkler correlates this scenario to one where
the soloist supplies musical fragments on which the computer
improvises or where the computer adapts to the performer through
pattern induction/matching as described earlier. This paradigm is
the most variable, most stimulating, and requires the most
sophisticated programming methods. The reason for this results from
the fact that as the computer is imparted with more independent
responsibilities of developing accompaniment or supporting
dialogue, composers strive to instill the computer with more human
or intelligent capabilities in its capacity to coexist in an
improvisational musical space.
Through the study of the different human-to-human paradigms,
particularly the improvisational settings, one will begin to
understand the multitude of dimensions to consider when developing
an interactive system of response embodying human-like
characteristics. In my own work developing improvisational
environments with computers, I have found this type of study
particularly rewarding. At the time, Winkler's text was not yet
available. However, my consideration of human-human interaction
in a free jazz or jazz fusion setting was paramount in developing
my first interactive work, Holtranix II. In this work, the
computer assumes two roles. First, it supplies an accompaniment
that is largely improvised through the use of randomized
parameters. In essence, the large structure of the work remains
intact while local events within each section are realized anew
each time the piece is performed. Second, the computer also
responds to input from the performer. The computer's output
consists of both self-generative music and that which is derived
from the performer through the use of real-time multi-track
sequence recording and playback. It was my goal in this work to
give the impression that the computer was contributing in some
artistic capacity to the musical event. I did this largely in two
ways, one in the form of audible response to the soloist's
improvisation and the other a space actually designed strictly for
a computer "solo."
As mentioned earlier, hearing a musical fragment or phrase migrate
throughout an improvisational ensemble is exciting owing to its
spontaneity of passage and its availability to transformation. In
Holtranix II, there are a number of places where the
soloist's material is recorded and played back in dialogue with the
performer. The input is recorded in fragments of various time
lengths. When played back, the fragments undergo no rhythmic
transformation; however, the medium through which each fragment is
reiterated into the improvisation is quite different. I programmed
the computer to
play these fragments through a MIDI software sampler (SampleCell
II). The sampler is loaded with sounds of my own creation
consisting of those resembling percussion instruments, those useful
for creating complex textures, and those which would lend
themselves to chordal sonorities or melodic passages. To break up
this form of exact response, the computer is programmed to switch
back and forth between this and another type of response
largely composed of randomized events that in turn influence what
the improvisor is developing. As the improvisation progresses, the
listener can hear rhythmic and melodic fragments answering or
echoing the performer much like call and response in a jazz combo
setting.
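The switching logic just described can be sketched as follows. This is an after-the-fact illustration in Python, not the actual MAX patch from Holtranix II; the fragment length, transposition choices, and switching probability are invented for the example:

```python
import random

def respond(fragment, echo_mode):
    """Answer a recorded fragment: either echo it (re-voiced, in the
    piece, through the sampler) or emit a randomized variation."""
    if echo_mode:
        return list(fragment)          # exact playback
    # Randomized response: reorder and transpose the fragment's notes
    notes = random.sample(fragment, k=len(fragment))
    return [n + random.choice([-12, -5, 0, 7, 12]) for n in notes]

def call_and_response(performance, fragment_len=4, p_echo=0.5):
    """Chop the performer's input into fragments and answer each one,
    switching unpredictably between exact and randomized responses."""
    answers = []
    for i in range(0, len(performance), fragment_len):
        fragment = performance[i:i + fragment_len]
        answers.append(respond(fragment, random.random() < p_echo))
    return answers
```

The unpredictability of which mode answers a given fragment is what keeps the exchange from feeling like a mechanical echo.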
To enhance further this quality of contribution, I specifically
programmed a section of the work where the computer actually
improvises a solo and the performer improvises a chordal
accompaniment in support of that solo. As I constructed this bit of
programming, I considered what was most important in a human
improviser's development of a solo. There is more than one answer
to this query, of course. However, the aspect I settled on here
was the performer's breath. Not only does the performer have to
breathe; in my opinion, the manipulation of breath is one of the
most defining elements of phrasing in a solo. With this in mind, I
simply constructed an element into the program that allowed for a
space or "breath" in between phrases. The phrase lengths themselves
are also varied. This ultimately endows the computer's solo with a
variable sense of pacing, expansion, and contraction.
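The breath element reduces to a simple alternation of variably-sized phrases and rests. The sketch below captures the idea in Python; the beat counts and ranges are illustrative, not the values used in Holtranix II:

```python
import random

def breathing_solo(total_beats=32):
    """Build a solo as (kind, length) events, inserting a variable
    'breath' rest after each variably-sized phrase, so the solo
    expands and contracts the way a wind player's phrasing does."""
    events, elapsed = [], 0
    while elapsed < total_beats:
        phrase = random.randint(3, 8)   # phrase length in beats
        breath = random.randint(1, 3)   # breath length in beats
        events.append(("phrase", phrase))
        events.append(("breath", breath))
        elapsed += phrase + breath
    return events
```

Mapping each "phrase" event to generated notes and each "breath" to silence is all that is needed to give the computer's solo its pacing.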
When one experiences being "moved" by expressive qualities in
music, what is inherent in performance that causes that state?
Where does a passionate performance come from? In his Ph.D.
dissertation (UCSD, 1992), Michael Pelz-Sherman presents evidence
from performance studies showing differing opinions regarding the
source of expression in performance. He states that,
"at one extreme, expression in music is seen to be almost entirely
a result of the conscious formulation of performance plans. At the
other, expression is seen as a product of unconscious, emotional,
and physical responses acquired through conditioning, heredity, and
evolution" (Pelz-Sherman, p.19). Both statements are true to an
extent; however, neither is going to make a bit of sense to a
computer.
Like any developing musician, before a computer is involved in a
musical experience of any kind, it must be able to listen or hear
and it must have some capacity to play or create. Through the
standard protocol of MIDI, computers can begin to hear, play, and
interpret music. These three abilities become highly stylized as
humans develop into fine musicians. Winkler explains that,
"musicians best known for their passionate performances exhibit
their personal style in the control of minute variations in timing,
articulation, and dynamics. These are the aspects of performance
that best impart human interpretive skills to the computer"
(Winkler, p.137). These aspects are implemented in computers via
MIDI in terms of its understanding of pitch (MIDI pitch number),
loudness (MIDI velocity number), and time (a durational value
usually presented in milliseconds). A fourth aspect of great
importance, timbre, has recently been made more manageable for
implementation in interactive systems as a source of expression,
thanks to programs such as Puckette's Pd (Pure Data), which can
perform real-time audio signal analysis. With these
basic raw materials in combination with software like Puckette's
MAX and Pd, Rowe's Cypher, and others, a composer can begin to
formulate ways of endowing an interactive system with human
capabilities such as intelligent decision making, short and long
term predictions, and stylistically informed creativity. With
brevity in mind, a few examples of implementation will be presented
varying from localized programming methods to techniques devoted to
entire musical works.
The first example involves the implementation of expressive timing
in performance as applied in Robert Rowe's software, Cypher. Cypher
is a program consisting of two main components: a listener and a
player. "The listener analyzes, groups, and classifies input events
[from the performer] as they arrive in real-time without matching
them against any preregistered representation of any particular
piece of music. The player uses various algorithmic styles to
produce a musical response [based on the information gleaned by the
listener]" (Rowe, p.139). On a very basic level, expressive timing
in music consists of slight tempo fluctuations or shadings,
ritardandi, and accelerandi. In response to a performer's input,
Cypher's listener can analyze these variables at the micro and
macro levels. Using this information, Cypher can perform two
different temporal affectations: 1) directed temporal operations
[which] perform a linear transformation of the offset times
separating events, either lengthening them (decelerando/ritardando)
or shortening them (accelerando), and 2) static operations [which]
add small, non-directional changes to the length of event offsets
(Pelz-Sherman, p.114). Cypher's player can perform these
affectations on its own musical response at the micro and macro
level as well. This means that temporal affectations to single
phrases or sub-phrases can affect the way temporal affectations are
performed over a longer expanse of time such as phrase group or an
entire section, for example.
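The two operations are easy to sketch on a list of inter-onset offsets. This is an interpretation of the description above, not Cypher's source code; the step size and jitter bound are arbitrary example values:

```python
import random

def directed(offsets, step):
    """Directed operation: linearly lengthen successive offsets
    (step > 0, decelerando/ritardando) or shorten them
    (step < 0, accelerando)."""
    return [max(0.01, dt + i * step) for i, dt in enumerate(offsets)]

def static_shading(offsets, jitter=0.02):
    """Static operation: small, non-directional changes to each
    offset, giving expressive shading without a tempo trend."""
    return [dt * (1 + random.uniform(-jitter, jitter)) for dt in offsets]

eighths = [0.5] * 8                 # steady eighth notes at 120 bpm
rit = directed(eighths, 0.05)       # offsets grow: a gradual ritardando
```

Applying `directed` over a phrase and again over a phrase group is one way the micro and macro levels could interact.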
Pelz-Sherman's use of intensity curves and pitchvolurations is an
interesting and valuable example of implementation. Pelz-Sherman defines
musical intensity "as a percentage of deviation from the median or
default values... of an [incoming] musical event." This intensity
can be traced through time represented as a curve on a graph or
table where time is on the horizontal axis and intensity level is
on the vertical (Pelz-Sherman, p.53). These intensity curves can
represent incoming data such as pitch number, velocity number, and
duration. In fact, Pelz-Sherman coins the term pitchvoluration. This term
is a percentage value that controls intensity deviations affecting
the system's outgoing response at the macro level which is a sum of
the intensity curves analyzed at the system's input on the micro
level: pitch, velocity, duration. Pitchvoluration values can be
"plugged" into other parameters that control, for example, simple
tone production in a synthesizer's output (through MIDI system
exclusive commands).
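A rough rendering of these ideas in Python follows. The exact formulation is Pelz-Sherman's; this sketch interprets "percentage of deviation from the median" literally and sums the three micro-level curves, which may differ from his actual weighting:

```python
from statistics import median

def intensity(values, reference=None):
    """Intensity curve: each event's fractional deviation from the
    median (or a supplied default value) of its parameter stream."""
    ref = reference if reference is not None else median(values)
    return [(v - ref) / ref for v in values]

def pitchvoluration(pitches, velocities, durations):
    """Macro-level control value per event: the sum of the
    micro-level pitch, velocity, and duration intensity curves."""
    return [p + v + d for p, v, d in
            zip(intensity(pitches), intensity(velocities), intensity(durations))]
```

The resulting values could then be "plugged" into any outgoing parameter, such as a synthesizer timbre control.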
The third example is the program Voyager, specifically written for
an improvisational piece of the same name developed by trombonist,
improviser, and composer George Lewis. It is a continually
developing work in that it has changed mediums (computer operating
systems) more than once. However, the paradigm implemented has
remained fixed from the point of its inception in 1985. The
paradigm is a result of careful consideration of what the
relationship is between the computer and the performer. Winkler
sheds light on this relationship explaining that, "Lewis's strategy
allows for a great deal of independence between [the] computer and
the performer, establishing musical personalities that do not
directly control each other, but rather have mutual influence that
contributes to the final outcome of an improvisation. His goal is
to make the computer 'playing' listenable as music on its own by
viewing its behavior as separate from and independent of the
performer" (Winkler, p.27). In his own words, Lewis explains that,
"the interaction takes place in the manner of two improvisers that
have their own 'personalities.' The program's extraction of
important features from my activity is not reintroduced directly,
but used to condition and guide a separate process of real-time
[generative] composition" (Winkler, p.27). In this model, both the
performers, human and machine, are listening and responding to each
other's behavior. In the latest version of the work, at least 30
parameters are considered by the program, accommodating input and
generating output, as an improvisation unfolds. Considered
parameters include volume, sounding duration, octave, register,
interval width, pitches used, volume range, frequency of silence,
and articulation. These parameters are monitored and averaged over
time; the values of which are transferred to the music-generating
process which consists largely of the same governing parameters at
the input. During an improvisation, the human performer may stop
playing at some point. Hence, the system will not receive input. At
this moment the parameters described above at the output are
controlled by random number generators. All of the processes create
what Lewis refers to as the "personality" of the system (Lewis,
1993).
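The listening-and-averaging scheme, with its random fallback during silence, can be outlined schematically. This is emphatically not Lewis's code (Voyager tracks dozens of parameters); the parameter subset, normalization to a 0-1 range, and exponential smoothing constant are all my own assumptions:

```python
import random

PARAMS = ["volume", "duration", "register", "interval_width"]

class VoyagerLikeAgent:
    """Track running averages of performance parameters and use them
    to condition generated output; fall back to random values when
    the performer stops playing."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.averages = {p: 0.5 for p in PARAMS}   # normalized 0..1

    def listen(self, observation):
        """Fold one observation (dict of normalized values) into the
        running averages; ignore silence (None)."""
        if observation is None:
            return
        for p in PARAMS:
            old = self.averages[p]
            self.averages[p] = self.smoothing * old + (1 - self.smoothing) * observation[p]

    def generate(self, hearing_input):
        if hearing_input:
            # Condition output on the performer's averaged behavior,
            # perturbed slightly so the response is guided, not copied.
            return {p: min(1.0, max(0.0, v + random.uniform(-0.1, 0.1)))
                    for p, v in self.averages.items()}
        # Performer silent: parameters driven by random number generators
        return {p: random.random() for p in PARAMS}
```

The key design point, as in Lewis's description, is that the performer's features condition a separate generative process rather than being reintroduced directly.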
In a presentation given at UCSD in the Winter of 1999, composer
Gerhard Winkler presented a similar paradigm designed for one of
his works where both the performing ensemble and the computer
system inform each other. However, the computer's output is not just
transformed musical output amplified through loudspeakers. Winkler
also developed an aspect of the program that produces a graphic
representation of output from which the performers would play in a
semi-improvisational manner. In essence all of the performers have
Macintosh Powerbook computers in front of them presenting various
screens of information. The sound produced by the individually
miked members of the ensemble is captured by the system, which
calls upon a library of the screens as a response to the input.
This produces a constantly evolving, cyclically unfolding musical
space like Lewis's Voyager, yet the performers are working from
graphical notation. Another difference is that Winkler's
system blurs the distinctive relationship between human and machine
players. Finally, the human and machine players are inextricably
bound together (borg music?), where if one fails, the musical space
has the potential to collapse.
In this quest for the endowment of computer systems with cognitive
abilities, one must consider the goals or end results of such a
task. Why does one want to interact with computers? Pelz-Sherman has
the simple goal of producing not necessarily "realism... but rather
plausibility." In other words, "a listener, upon learning these
nuances, should be able to construct a mental image of the virtual
performer based on a degree of predictability in its response to
input" (Pelz-Sherman, p.21). Rowe believes that, "interactive music
programs can change the way we think about machines and how we use
them" (Rowe, p.262). This idea of the computer causing one to think
differently is expressed by David Rokeby in terms of a reflection
of the self. In a concluding manner, he states that, "to the degree
that technology reflects ourselves back recognizably, it provides
us with a self image, a sense of self. To the degree that
technology transforms our image in the act of reflection, it
provides us with a sense of the relation between this self and the
experienced world" (Rokeby, p.133). For myself, I simply desire to
experience another enticing setting for improvisation utilizing the
wildly unlimited potentials of the computer.
Sources Cited
Lewis, George E. Voyager. CD. Avant 014. 1993.
Pelz-Sherman, Michael. "On the Formalization of Expression in Music
Performed by Computers." Ph.D. dissertation, University of
California, San Diego, 1992.
Rokeby, David. "Transforming Mirrors: Subjectivity and control in
interactive media." In Critical Issues in Electronic Media,
edited by Simon Penny. Albany: State University of New York Press,
1995.
Rowe, Robert. Interactive Music Systems: Machine Listening and
Composing. Cambridge: MIT Press, 1993.
Winkler, Todd. Composing Interactive Music: Techniques and Ideas
Using MAX. Cambridge: MIT Press, 1998.
Additional References
Baggi, Denis, ed. Computer-Generated Music. Los Alamitos: IEEE
Computer Society Press, 1992.
Boden, Margaret A. "Artificial Genius." Discover, vol. 17,
No. 10, October 1996.
Cope, David. Experiments in Musical Intelligence. The
Computer Music and Digital Audio Series, vol. 12. Madison,
Wisconsin: A-R Editions, Inc., 1996.
Cope, David. Computers and Musical Style. The Computer Music
and Digital Audio Series, vol. 6. Madison, Wisconsin: A-R Editions,
Inc., 1992.
Col Legno. Gerhard E. Winkler, Edgard Varèse, Morton
Feldman. CD. WWE 1CD31872. 1994.
Rosenboom, David. "Extended Musical Interface with the Human
Nervous System." In Leonardo, Leonardo Monograph Series, No.
1. Berkeley: 1990.