Discussion
In a series of behavioral and
electrophysiological experiments, this dissertation has shown
that humans can rapidly learn a new musical system. The
knowledge acquired from exposure includes rote memory for
individual items, grammatical rules for large sets of items,
and sensitivity to the frequency structure underlying the
musical system. Moreover, following repeated exposure to
melodies in the new musical system, humans demonstrate
increased preference for familiar melodies. This converges with
prior reports of the Mere Exposure Effect (Zajonc, 1968; 2001),
although the purported Structural Mere Exposure Effect (Gordon
& Holyoak, 1983) for novel items in the same grammar was
not observed in the present studies.
The effects of learning were observed
for individuals with and without formal musical training,
suggesting that these effects are not a result of explicit
music education but rather of a general human capacity for learning and
sensitivity to sounds and sound patterns. While the effects of
learning were diminished across different transpositions,
participants still performed above chance at
recognition and generalization when presented with transposed
melodies, suggesting that absolute pitch and, more
importantly, relative pitch and transpositional invariance are all
at work in our mental representation of the new musical system.
Eliminating harmony as a cue, by warping the Bohlen-Pierce
scale so that it became inharmonic, resulted in levels of
learning equal to those observed for the original Bohlen-Pierce
scale melodies. This suggests that participants were not
spontaneously inferring harmony from melody; rather, they
learned the grammatical structure of the new musical system by
becoming accustomed to the pathways connecting individual nodes
of the finite-state grammar, or in other words, the melodic
intervals that occurred between successive notes within each
melody.
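To make the notion of grammar pathways concrete, the sketch below generates melodies from a toy finite-state grammar over Bohlen-Pierce scale degrees. The specific states, transitions, melody length, and the 220 Hz reference frequency are illustrative assumptions, not the grammars used in the experiments; only the equal-tempered Bohlen-Pierce frequency formula (13 steps per 3:1 tritave) follows the standard description of the scale.

```python
import random

# Toy finite-state grammar over Bohlen-Pierce scale degrees.
# States and transitions are illustrative, not those of Chapters 3-4.
GRAMMAR = {
    "S": [0],        # melodies start on the tonic
    0:   [4, 6],     # pathways (legal transitions) out of each node
    4:   [6, 10],
    6:   [0, 10],
    10:  [0, "E"],   # "E" marks a permissible end of melody
}

def bp_freq(degree, f0=220.0):
    """Equal-tempered Bohlen-Pierce: 13 steps per tritave (3:1)."""
    return f0 * 3.0 ** (degree / 13.0)

def generate_melody(max_len=8):
    """Walk the grammar's pathways from the start state to an end state."""
    state, melody = "S", []
    while len(melody) < max_len:
        state = random.choice(GRAMMAR[state])
        if state == "E":
            break
        melody.append(state)
    return melody

melody = generate_melody()
print(melody, [round(bp_freq(d), 1) for d in melody])
```

On this view, learning the grammar amounts to becoming familiar with which transitions out of each node occur, i.e. the melodic intervals between successive notes.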
One possible strategy participants
could have employed to perform the two-alternative forced
choice task was to notice transitions between notes (i.e.
first-order statistics) that were most common within the set of
melodies. For instance, if participants noticed that large
intervals were most likely at the end of the melodies within
their exposure set, then they could use this information to
help them in solving the two-alternative forced choice tasks of
recognition and generalization. Further support for the
possibility that participants were using mainly first-order
statistics or melodic intervals stems from the fact that when
some of the pathways of the finite-state grammars were
eliminated, both recognition and generalization dropped to
chance levels.
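A minimal sketch of this strategy, under the assumption that listeners simply accumulate transition counts during exposure and choose whichever test melody has the more familiar transitions; the scale-degree sequences here are made up for illustration:

```python
from collections import Counter

def transition_counts(melodies):
    """Tally note-to-note transitions (first-order statistics)."""
    counts = Counter()
    for m in melodies:
        counts.update(zip(m, m[1:]))
    return counts

def familiarity(melody, counts):
    """Summed exposure counts over a melody's transitions."""
    return sum(counts[t] for t in zip(melody, melody[1:]))

def choose_2afc(a, b, counts):
    """Two-alternative forced choice: pick the melody whose
    transitions were more common during exposure."""
    return a if familiarity(a, counts) >= familiarity(b, counts) else b

exposure = [(0, 4, 6, 10), (0, 6, 10, 0), (0, 4, 10, 0)]  # made-up sequences
counts = transition_counts(exposure)
print(choose_2afc((0, 4, 6, 10), (10, 6, 4, 0), counts))
```

Under this strategy, removing pathways from the grammar would degrade exactly the statistics the listener relies on, consistent with the drop to chance described above.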
Although the manipulation of harmonic
relations between chords did not change the pattern of learning
results, changes in timbre (manipulation of the frequency
components of tones) resulted in some differences in learning.
This difference was observed in probe tone ratings but not in
two-alternative forced choice tests of recognition and
generalization. Preference-rating tests showed marginal
differences; however, the effects were small and difficult to
interpret, perhaps because timbre was manipulated as a
between-subjects variable: as each group of participants was
exposed to only one of the two timbres, they could use the full
rating scale only to differentiate aspects of the stimuli other
than timbre. Nonetheless, probe tone ratings showed
that hearing timbres congruent with the Bohlen-Pierce
scale (Shepard tones with components spaced at tritave
multiples in frequency) resulted in goodness-of-fit
ratings that were more highly correlated with the tritave-based
system than did hearing timbres more congruent with the
traditional Western scale ("original" Shepard tones with
components spaced at octave multiples in frequency). Furthermore,
the finding that probe tone ratings showed different patterns of
results than the two-alternative forced choice grammar learning
tests suggests that the two kinds of tests are sensitive to
different kinds of knowledge.
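For concreteness, the sketch below synthesizes simplified Shepard-style tones whose partials are spaced at octave (2:1) or tritave (3:1) multiples. The spectral envelope parameters (a Gaussian over log-frequency centered near 1 kHz) are illustrative assumptions, not the exact synthesis parameters used in the experiments.

```python
import numpy as np

def shepard_tone(f0, ratio, sr=44100, dur=0.5):
    """Simplified Shepard-style tone: partials spaced by `ratio`
    (2.0 = octave-spaced, 3.0 = tritave-spaced) under a fixed
    bell-shaped spectral envelope."""
    t = np.arange(int(sr * dur)) / sr
    tone = np.zeros_like(t)
    for k in range(-4, 8):              # partials above and below f0
        f = f0 * ratio ** k
        if not 20.0 <= f <= sr / 2:     # keep partials in the audible band
            continue
        # Gaussian amplitude envelope over log-frequency, centered near 1 kHz
        amp = np.exp(-0.5 * (np.log2(f / 1000.0) / 1.5) ** 2)
        tone += amp * np.sin(2 * np.pi * f * t)
    return tone / np.max(np.abs(tone))

octave_timbre = shepard_tone(220.0, ratio=2.0)   # congruent with the octave
tritave_timbre = shepard_tone(220.0, ratio=3.0)  # congruent with Bohlen-Pierce
```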
Based on behavioral data from Chapters
3 and 4, it seems that the two-alternative generalization tests,
as used in the present context, reflected conditional
probabilities or melodic intervals, whereas probe tone ratings
were more reflective of the frequencies of individual tones, and
thus of the key of particular contexts. Both kinds of statistics,
i.e. frequencies and transitional probabilities, must be encoded
in memory as a result of exposure, but the learning trajectories
of the two kinds of statistics may differ.
Recognition tests seem to tap into similar knowledge as
generalization tests when the number of melodies presented is
large; however, when small numbers of melodies are presented
repeatedly, recognition tests seem to reflect a memorization
strategy, and performance becomes distinct from that on
generalization tests.
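The distinction between the two statistics can be stated concretely: from the same exposure corpus one can tabulate how often each tone occurs (frequencies) and how likely each tone is given the previous tone (transitional probabilities). A sketch, again using made-up scale-degree sequences, with the mapping to the two test types stated as it appears above:

```python
from collections import Counter

exposure = [(0, 4, 6, 10), (0, 6, 10, 0), (0, 4, 10, 0)]  # hypothetical

# Tone frequencies: the statistic probe tone ratings appear to track.
freqs = Counter(n for m in exposure for n in m)

# Transitional (conditional) probabilities: the statistic the
# generalization tests appear to track.
bigrams = Counter(p for m in exposure for p in zip(m, m[1:]))
context = Counter(a for (a, b) in bigrams.elements())
trans_prob = {(a, b): c / context[a] for (a, b), c in bigrams.items()}

print(freqs.most_common(3))
print(trans_prob[(0, 4)])  # P(next = 4 | current = 0)
```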
Findings from prolonged exposure are
consistent with results obtained from meta-analyses across
multiple experiments. When exposure includes many melodies
presented with different numbers of repetitions, recognition is
highly accurate for repeatedly presented melodies, but for
melodies presented only a few times, performance on recognition
tests is only slightly, though still significantly, above chance,
converging with the levels of performance obtained from
generalization tests. In addition, probe tone ratings revealed
clear sensitivity to frequencies as a result of exposure, and
preference ratings were selectively improved for melodies that
had been presented repeatedly. Over 10 experiments, preference
ratings were consistently higher for repeatedly presented
melodies but not for novel or sparsely-presented melodies
composed according to the participant's exposure grammar,
suggesting that preference, as assessed in the manner of the
Mere Exposure Effect (Zajonc, 1968; 2001), operated only over
rote items and did not generalize. As recognition and
generalization both demonstrated learning of grammatical
structure following exposure to large numbers of melodies,
whereas preference tests never showed sensitivity to the
grammar, it would seem that preference formation and
generalization are dissociable mechanisms in the human brain,
even in the domain of music where cognition and emotion seem so
intricately connected.
One interesting question concerns why
we see no generalization of the Mere Exposure Effect. If the
Mere Exposure Effect (hereafter abbreviated as MEE) is truly
reflecting implicit processes, then it should exhibit
generalizability towards new items with the same underlying
grammatical structure as the exposure items. Regarding the
generalization of the MEE towards new stimuli in the same category,
Monahan et al (2000) reported generalization of the MEE by
showing increased preference for previously unencountered
Chinese ideographs after repeated exposure to another set of
Chinese ideographs, suggesting that the MEE can extend to a
general category of items rather than only to the specific
exposed items. This generalized MEE, however, was demonstrated
for a broad category rather than for a grammar or an underlying
structure: the ratings demonstrating the generalized MEE were
collected in comparison to ratings of simple geometric shapes
such as circles and triangles (Monahan et al, 2000). The contrast
between fake Chinese characters and simple geometric shapes was
perhaps too large for a true demonstration of an MEE for the
grammatical structure underlying stimuli; rather, these results
should be seen as an MEE for a broad general category. Gordon
and Holyoak (1983) were the
first to report a true structural Mere Exposure Effect, where
novel items belonging to the exposure grammar were rated as
preferable over items belonging to an unfamiliar grammar.
However, effects of the structural MEE were small (Newell &
Bright, 2003) and have not been replicated successfully under
different experimental conditions (Zizak & Reber, 2004;
Newell & Bright, 2003), suggesting that preference for a
learned grammatical structure may not be easily observable,
perhaps because preference and grammar learning are in fact
different mechanisms. Experiments in this dissertation are
consistent with this idea, showing double dissociations
between grammar learning and preference change.
The ERP findings also provide evidence
that the new musical system was implicitly learned. Patterns of
neural activity for frequent and infrequent chord progressions
in this new system are very similar to neural responses to
expected and unexpected chords according to traditional Western
music theory. The Early Anterior Negativity, closely related
to the Mismatch Negativity but seemingly sensitive to musical
rules above and beyond simple sound acoustics (Koelsch et al,
2001; Leino et al, 2007), may be a window onto the kinds of
learning required in the experience of sounds and sound
patterns. While manipulating the direction of pitch changes
(Koelsch et al, 2001) or the consonance or dissonance of chords
(Leino et al, 2007) leads to a kind of MMN component that is
independent of the position of the violations and seemingly
more reflective of the surface features of sounds, the violation
of deeper underlying rules, such as the position of an unexpected
chord within a context of chord progressions, results in an EAN
component that is more sensitive to the degree or position of
the violation.
Here we have shown that musical rules
such as chord progressions, while probably reflective of a
deeper level of musical knowledge, are represented in the human
brain in a manner that is dictated by probability. When the
probability of chords within chord progressions is manipulated,
the human brain spontaneously develops sensitivity to the
different probabilities, even without prior exposure to the new
musical system. This probability learning mechanism becomes
more sensitive as exposure increases; furthermore, it is
modulated by individual differences in learning. Taken together,
these results suggest that the EAN component is effectively a
neural correlate of probability learning.
The Late Negativity observed in Chapter
5 may reflect further cognitive analysis of improbable chords.
In addition, this Late Negativity is very similar to the
waveform found in many context integration experiments (Coulson
et al, 1998), which is suggested to reflect activity of the
Anterior Cingulate Cortex (Pizzagalli, 2003). Thus, our
findings address not only music learning but also context
learning in a more general sense. The analogy between LN and
the N400 component in language is interesting and invites
future investigations. While the LN peaks at 500ms, it is
significant over the time window of 400-600ms, an epoch similar
to the N400 component. In addition, we observe a small decrease
in latency of the LN throughout the course of the experiment.
This may be reflective of the development of expertise in the
new sound system, for the same reason that the EAN was larger in
the late phase than in the early phase. Furthermore, the LN in the
familiar Western musical system, even in nonmusicians, is
significant at the earlier time window of 380-550ms (Loui et
al, 2005), slightly earlier than the presently observed
400-600ms. It is conceivable that given sufficient experience
with the new musical system, the LN would be earlier in
latency. Perhaps more exciting is the possibility that the N400
and the LN in fact reflect the same neural mechanisms, with the
N400 being earlier and slightly different in topography
(although the topography of the N400 is itself inconsistent;
see e.g. Van Petten & Luka, 2006) because the normal human
brain, as a result of prolonged experience and active engagement
in linguistic functioning, is an extremely expert language
processor.
Implications
The present dissertation offers a
viable method to test posited theories of music such as the
Generative Theory of Tonal Music (Lerdahl & Jackendoff,
1983) and Lerdahl's proposed constraints of musical pitch
systems (Lerdahl, 1992; also see Krumhansl, 1987). A
theoretical account of the constraints on compositional systems
(Lerdahl, 1992) posits that music must be groupable, parsable,
and hierarchically organized; it must contain salient
transitions, adhere to stability conditions in rhythm and
meter, operate over fixed collections of pitches and harmonies,
adhere to a stable psychoacoustic basis, and be
multidimensionally represented, equal-tempered, and based on
the octave. The new musical system used in initial experiments
from Chapter 3 and early experiments in Chapter 4 is groupable,
parsable, hierarchically organized, statistically predictable,
and adheres to principles of psychoacoustics and memory
function. As the new musical system adheres to many of these
constraints initially, it is not surprising that it could be
learned. It is the subsequent experiments, such as the ones
investigating timbre, harmony, set size, and melodic pathways,
that systematically attempt to test these constraints one by
one. Although most of Lerdahl's constraints are followed, the
present musical system does go against some of the proposed
constraints, and thus the ability to learn the new musical
system questions the validity of these constraints. Notably,
the constraint stating that musical systems must be based on
the octave is strongly questioned by the ability to learn a
musical system based on the tritave instead of the octave. The
importance of the octave is supported by work on octave
equivalence (e.g. Demany & Armand, 1984). It should be
noted, however, that although octave generalization and octave
equivalence are different concepts, more recent data
demonstrating octave generalization in monkeys (Wright et al,
2000) have been interpreted as demonstrating octave
equivalence, or the universality of the octave in music (see
Hauser & McDermott, 2003). While at this point we have not
demonstrated tritave equivalence in the same way as octave
equivalence, it remains to be tested whether prolonged exposure
to a tritave-based musical system could lead to tritave
equivalence, thus further questioning assumptions on the
necessity of the octave (see also Sethares, 2004).
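The tritave's role as the interval of repetition can be made explicit by comparing the two equal temperaments; the 220 Hz reference frequency is an illustrative assumption:

```python
def equal_tempered(f0, ratio, steps, n):
    """Frequency of step n when `ratio` is divided into `steps` equal steps."""
    return f0 * ratio ** (n / steps)

f0 = 220.0  # illustrative reference frequency
# Western equal temperament repeats at the octave (2:1) in 12 steps...
western = [equal_tempered(f0, 2.0, 12, n) for n in range(13)]
# ...whereas the Bohlen-Pierce scale repeats at the tritave (3:1) in 13 steps.
bohlen_pierce = [equal_tempered(f0, 3.0, 13, n) for n in range(14)]
print(western[-1] / f0, bohlen_pierce[-1] / f0)  # 2.0 vs. 3.0
```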
Regarding the constraints on musical
systems, another interesting result from the current
dissertation is that when timbre was varied so that tones
adhered to the tritave-based rather than the octave-based
system, frequency sensitivity (as assessed by probe tone
ratings) was enhanced, while grammar learning (as assessed by
two-alternative forced choice tasks) was unaffected. This is
important in light of the claim that music is constrained by
the frequency components of sounds (Schwarz et al, 2003;
Sethares, 2004). Specifically, the claim is that because
periodic sounds often contain energy at integer multiples of
the fundamental (the second harmonic is an octave above the
fundamental, the third a twelfth above, the fourth two octaves
above, and the fifth two octaves plus a major third above),
tones standing in these relations would be perceived as most
stable and most central to the reference
point or key of a musical piece. The finding that
tritave-component tones elicited more highly correlated ratings
suggests that the frequency structure of sounds does affect our
perception of and sensitivity to musical structures. This may
be the first experimental evidence supporting the claim that
frequency distributions of sounds shape our expectations for
musical systems. While probe tone ratings were sensitive to
timbre, two-alternative forced choice tests assessing grammar
learning did not show a differentiation between tritave- and
octave-Shepard tones. This single dissociation between probe
tone ratings and two-alternative forced choice tests suggests
again that frequency sensitivity and grammar learning are
distinct mechanisms.
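As a concrete check on the harmonic-series claim above, the sketch below computes the intervals formed by the first few harmonics of a periodic tone; the interval names are the conventional approximations (e.g. the fifth harmonic is about 14 cents flat of an equal-tempered two octaves plus a major third), and the fundamental frequency is an illustrative choice.

```python
import math

f0 = 220.0  # illustrative fundamental frequency
names = {2: "octave", 3: "twelfth (octave + fifth)",
         4: "two octaves", 5: "two octaves + major third"}
for n, name in names.items():
    semitones = 12 * math.log2(n)  # interval size above the fundamental
    print(f"harmonic {n}: {n * f0:6.0f} Hz, {semitones:5.2f} semitones, ~{name}")
```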
Results from this dissertation are
consistent with the following model of musical experience:
Figure 1. A model of musical
experience
We have shown that psychoacoustic factors, such as harmony and
frequency distributions of tones, can influence and constrain
the learnability of musical systems. Perceiving sounds
repeatedly leads to various kinds of expertise, and what kinds
of expertise are formed depends on the statistical properties,
e.g. frequency and transitional probability, of the stimulus.
While work has shown that attention enhances the effects of
statistical learning (Toro et al, 2005) and music processing
(Loui et al, 2005), the present results show that these effects
may be modulated by one's level of expertise, which enhances or
hones one's expectation for frequent or probable events.
Violations of musical expectation lead to arousal at the neural
level, as shown by the electrophysiological evidence presented in
Chapter 5. This change in arousal levels may have a causal
effect on musical affect, although only correlational evidence
is available at this point: for example, our prior work on
Western music (Loui & Wessel, in press) has shown that
unexpected chord progressions elicit lower preference ratings
compared to expected chord progressions, suggesting a link
between expectation and affect which may be mediated by arousal
as proposed by Meyer (1956). Another link to musical
preference, shown repeatedly in Chapters 3 and 4, is the effect
of mere exposure to repeated items.
Together, musical expectation seems to influence arousal in at
least two disparate ways: early learning of a musical system
leads to the initial buildup of expectations for lower-order
statistical regularities, e.g. the frequencies of pitches and
knowledge of key. Items that follow the expected grammar, i.e.
melodies that had been repeatedly presented, fulfill or satisfy
expectations, leading to increased preference or "goodness"
ratings for in-key tones and repeatedly presented melodies.
On the other hand, slight
violations of expectations in an overlearned system, e.g.
unusual chord progressions or subtle changes in timbre, may
lead to special moments of subtle but important changes in
affect that are on a different level of granularity compared to
what is tested in the traditional laboratory setting. Studies
of "chills" experienced during music (e.g. Sloboda, 1989; Blood
& Zatorre, 2001) show significant physiological indices of
arousal (increased heart rate, muscle tension, and breathing
rate) as well as activation of the limbic system (orbitofrontal
cortex, amygdala, and anterior cingulate cortex) during these
moments of intensely pleasurable music. While these kinds of
experiences are probably not captured in the present
dissertation using the new musical system, we begin to observe
some preference changes, and it is interesting to speculate
whether highly extended exposure to music in the Bohlen-Pierce
scale, on a scale comparable to the brain's lifetime experience
of music in traditional musical systems, could lead to an
overlearned, expressive, and flexible percept, such that slight
but systematic violations of expectations could elicit the kinds
of highly subjective, intensely pleasurable experiences that
individuals report from hearing music by the grand masters,
from Bach to Thelonious Monk.
Work presented here converges with
other artificial grammar learning studies (Reber, 1989; Gomez
& Gerken, 1999) but the items used to form our finite-state
grammar are more difficult to classify iconically. Instead of
letters or strings of syllables, here we use tones from a scale
that listeners had never previously encountered. This makes it
harder to engage an explicit system for learning the rules of
the grammar through passive exposure. The artificial musical
system, then, provides
a viable new method for explorations in nonlinguistic,
non-verbalizable domains that can be used to test for various
aspects of learning and memory, and their interface with
perception, cognition, and emotional functioning.
In the clinical domain, the use of this
new musical system may enhance therapy in special populations
such as individuals with neurological disease. Patients
recovering from stroke, neurosurgery, or brain trauma often
suffer from aphasia, an impairment of language function due to
neurological damage. By using the music
learning methods explored in this thesis, it may be possible to
facilitate recovery in these patients via the intervention of a
musical grammar training program (Ozdemir et al, 2006).
Future Directions
The current research raises questions
for four potential directions of study:
1. Brain Networks
Which brain networks are at work in the
learning of artificial grammars? Based on our ERP source
localization data as well as other neuroimaging data (Muller
& Basho, 2003; Levitin & Menon, 2003; Brown et al,
2006), several brain networks may be functionally connected to
subserve music learning. Candidate brain networks include the
bilateral auditory cortices, traditionally motoric areas such
as the premotor cortex, and the lateral prefrontal cortex,
including response competition and conflict monitoring areas
(executive control areas) such as the anterior cingulate.
Methods of cognitive neuroscience could be employed to further
investigate these regional hypotheses. In addition,
neuroimaging studies have found that the limbic system
responds significantly to pieces that people find
intensely pleasurable (Blood & Zatorre, 2001).
Correlational studies relying on self-report have also found
that melodically interesting materials tend to elicit tears in
listeners, whereas harmonic changes elicit heart-rate changes
and chills (Sloboda, 1989). Further work may attempt to identify
the specific aspects of musical stimuli that elicit these
emotional responses, and the different neural underpinnings of
emotional processing in music (Krumhansl, 2002; Koelsch,
2005).
2. Attention and Expertise
To what extent is attention required or
beneficial in these kinds of learning? Toro et al (2005) have
shown that attention enhances statistical learning. ERPs
elicited by ungrammatical chords are larger when attention is
allocated towards the auditory modality (Loui et al, 2005).
Together, these lines of evidence would predict that learning
of a new musical system would be enhanced when listeners are
attentively engaged in the music during exposure. Future
experiments could test this hypothesis, and perhaps more
interestingly, investigate the interactions between attention
and expertise in learning.
3. Developmental Trajectories
Where does the statistical learning
mechanism come from, and what is its developmental trajectory?
The "less is more" hypothesis (Newport, 1990) suggests that
children possess maturational constraints which dictate their
paradoxically superior and more flexible representation of
language; this hypothesis may also account for critical
period effects in language learning. This idea invites many
parallels in music, notably in pitch perception and the
emergence of harmonic knowledge. Implied harmony, which is
analogous to pragmatics in language in its late emergence and
variable use (Trainor & Trehub, 1994), is one candidate for
a gradually acquired skill that emerges only through repeated
associations of melodies with chordal accompaniments across
many musical genres. In contrast, pitch
perception, timbre perception, and melodic interval and contour
perception are relatively early emerging (Trainor, Wu, &
Sang, 2004). By testing for the learning of the new musical
system in infants, future studies can attempt to map out the
developmental trajectory of the many components of musical
knowledge.
Another interesting issue on which an
infant study of novel music learning could shed light is
absolute pitch (AP). One theory of absolute pitch (Saffran, 2003)
posits that AP is part of the innate auditory system in infants
which is subsequently pruned away in environments not requiring
absolute pitch perception (e.g. non-tonal languages). Due to
its novel set of pitch categories, the Bohlen-Pierce scale is
an excellent candidate for testing these theories.
4. Action in Perception
How does being actively engaged with
playing music in the novel system shape the nature of knowledge
or experience acquired? The idea of enactive listening, or
action within perception (Noë, 2004), can be explored at
both the behavioral and the neural level, specifically by
investigating mirror neuron function in music learning (Lahav
et al, 2007). Future studies may bring the existing artificial
music studies to a more realistic context, by using more
expressive, performed musical pieces (especially those written
in novel scales, e.g. Boulanger, 1998; Blackwood, 2004) and
studying the ways in which musicians express themselves in a
real performance environment. The design of a novel musical instrument, using
sound-generating software coupled with MIDI interface
hardware (e.g. the continuum fingerboard by hakenaudio.com),
could test these ideas in a creative manner.
In summary, the present
dissertation began with a review of experiments from the music
cognition literature, showing that robust knowledge of harmonic
relations and melodic processes seems ubiquitous, but that a
controlled study systematically manipulating input (via the
use of a completely novel musical system) had not yet been
conducted. After reviewing studies on learning novel systems in
non-musical domains such as artificial grammar learning and
statistical learning in language, we described an artificial
musical system based on the Bohlen-Pierce scale, from which we
derived two novel musical grammars. A series of behavioral and
electrophysiological experiments showed that humans can, given
limited exposure, rapidly acquire knowledge, sensitivity, and
preference in the novel music system. What is acquired depends
on the acoustic properties of the input, as well as statistical
and distributional characteristics of the sound patterns
presented. Electrophysiological measures offer additional
support as well as mechanistic evidence for rapid probability
learning in the human cortex.