The previous chapters have
shown conclusive behavioral evidence for the human ability to
learn the new musical system. In order to further characterize
this learning ability, this chapter moves beyond behavior and
into an investigation of neural mechanisms employed in the
learning of new music. By studying the temporal and spectral
properties of the neural processing of expected and unexpected
harmonies in the new musical system, we hope to gain insight
into the mechanistic processes underlying learning, which
encompass the brain's capacities for expectation formation and
context integration.

The perception of sound patterns and structures
is crucial for music and speech.  Knowledge of these sound
patterns can be observed in humans and other animals using
various brain signatures. From the animal literature, we know
that tuning for frequency and harmonicity in the cortex is
dependent on exposure to and interaction with sounds, as in song
learning in birds (Grace et al, 2003) and noise exposure in
rats (Zhang et al, 2002). Violations of sound patterns have
been shown to elicit the Mismatch Negativity
(Näätänen, 1982), a negative event-related
potential (ERP) waveform onsetting 150-210ms after a pattern
violation (Alain et al, 1998; Woldorff et al, 1998) that is
dependent on NMDA receptors (Javitt et al, 1996). This
mismatch-type activity has been shown in violations of auditory
patterns in various dimensions: tone frequency (Näätänen, 1982;
Deouell & Bentin, 1998), pitch changes (Jacobsen et al,
2003), loudness, duration (Näätänen, 1989), spatial location
(Deouell et al, 2006) and phonemes (citation?), although it is
most robust to tone frequency deviations (Deouell & Bentin,
1998). The extant literature suggests that the MMN provides an
index of echoic memory in neural ensembles in the superior
temporal plane.

Language studies have shown that
violations of syntactical structure in auditory sentences
elicit the ELAN (Early Left Anterior Negativity), a
left-lateralized negativity around 200ms (Hahne &
Friederici, 1999). At a later time window, semantically
incongruous words generate the N400, a negative waveform
largest central-parietally around 400ms after word onset when
presented in both auditory and visual modalities (Kutas &
Hillyard, 1980; McCallum et al, 1984; Bentin et al, 1993; but
see Holcomb & Neville, 1990). In the domain of music,
violations of musical rules or expectations have been shown to
elicit a negative-going event-related potential at 150-210ms
(Koelsch et al, 2000, 2002, 2005; Loui et al, 2005; Leino et
al, 2007). This waveform is largest frontally; it was
originally observed to be right-lateralized and thus termed the
ERAN (Early Right Anterior Negativity) (Koelsch et al, 2000,
2002, 2005), but has since been observed to be bilateral and
renamed the EAN (Early Anterior Negativity) (Loui et al, 2005;
Leino et al, 2007). Musical violations also generate a Late
Negativity (LN; probably similar to the N5 reported in other
studies), a negative-going waveform occurring around 400-600ms
after the unexpected chord that is largest over prefrontal
sites.

Because of their different topographies
and time courses, the EAN and LN are thought to have different
neural generators.  The relation between music-related
ERPs and language-related ERPs is unclear. Their similar time
courses and partially overlapping scalp topographies suggest
that these brain responses related to deviations of expected
patterns in sound, syntax, semantics, and music may share some
common neural generators (MMN vs. ELAN and EAN; N400 vs.
LN).

While the EAN and LN components are
elicited by unexpected chords in standard Western music, little
is known about the neural processing of non-Western music.
Thus, it is unclear whether the results observed in
electrophysiological studies of Western music reflect specific
rules in Western musical harmony, or the more general
processing of sound patterns in the environment related to
one's culture. Moreover, as Western music is overlearned by
most subjects in Westernized cultures, even individuals with no
formal musical training show robust ERPs to unexpected Western
music (Koelsch et al, 2000; Loui et al, 2005); thus, it is
difficult to observe how the brain develops expertise for a
pattern of sounds using a Western musical context.

We investigated the physiological bases
of learning in a novel musical context employing an artificial
musical system, which was completely unfamiliar to all
participants. We examined musical expectations independently
of all pre-existing associations by manipulating the
probabilities with which participants heard constituents of the
novel music. We tested the hypothesis that sound patterns in
the novel musical system elicit the same brain potentials as
Western music, which would indicate a general mechanism for
pattern detection in musical systems. Furthermore, we sought to
trace the source and development of these effects, as well as
to identify behavioral correlates of individual differences in
electrophysiological results.

In the experiment, patterns of
simultaneous and sequentially presented tones, forming chord
progressions with four chords each in the new musical system,
were played to participants while their electroencephalogram
was recorded. Patterns of chords were presented at different
probabilities, with the Standard sounds being presented with
70% probability and the Deviant sounds at 20%. The remaining
10% of the sounds were identical to the Standard except for a
rapid fading-out of amplitude (see supporting online
materials for diagrams and audio examples of stimuli).
Participants' task was to press a button upon detecting the
fade-out sounds; this ensured that they attended to auditory
stimuli, but that decision-making was dissociated from the
perception of musical sounds. Using this method, we observed
rapid and reliable sensitivity to the new musical system,
providing evidence for the recruitment of extensive neural
mechanisms which rapidly integrate sounds into new
contexts.

Method

Participants.  Twelve
healthy adults (8 females, 4 males, mean age 23.5 years,
age range 19-29) participated in this study.  Participants
had an average of 12 years of musical training (range 6-20
years) outside of normal school education. All subjects were
right-handed and reported having normal hearing, normal or
corrected-to-normal vision, and no history of neurological or
psychiatric disorder.  All subjects were recruited as
volunteers from the University of California at Berkeley
community; each subject gave written informed consent prior to
the experiment, and was paid $10 per hour for their
participation.  Subjects had no prior exposure to the
musical system used in the present study. All research was
approved by the Committee for the Protection of Human Subjects
at UC Berkeley.

Stimuli. The new musical system is based on the
Bohlen-Pierce scale, in which one tritave (replacing an octave
in traditional Western music) consists of 13 divisions of a 3:1
frequency ratio. (See Chapter 2 for details on the new musical
system.)

Three sets of chord progressions were
generated, each consisting of four chords in the Bohlen-Pierce
scale; they were labeled Standard, Deviant, and Fadeout chord
progressions. The only difference between Standard and Deviant
progressions was that the third chord in the Deviant
progression was composed of different pitches from the third
chord in the Standard progression. The Fadeout progressions
were identical to the Standard progressions, except that one of
the four chords contained a significant decrease in amplitude
(fadeout). Three versions of chord progressions were generated
for each of the three conditions, with different versions
corresponding to different keys in the new musical scale, such
that the actual frequencies of the auditory stimuli were
varied, but the relations between the chords remained the same
within each condition. The Fadeout chord progressions contained
the task-relevant target amplitude change in any one of the
four chords. Altogether, we generated 18 sound stimuli:
three Standard chord progressions, three Deviant
chord progressions, and 12 Fadeout chord progressions. All
stimuli were created using Adobe Audition 1.5. Each chord
lasted 600ms; thus each chord progression lasted
2400ms.

Fig. 1.
Pseudospectrogram representations of the three stimulus
conditions.

a.
Standard chord progressions

b.
Deviant chord progressions

c. Fadeout chord progression

Procedure. During the experiment, Standard
progressions were presented for 70% of all chord progressions;
Deviant progressions were presented at 20%, and Fadeout
progressions at 10%. Participants were instructed to make a
button-press response on a joystick immediately upon detecting
each fadeout chord. Stimuli were presented at a level of 70dB
on a PC using Presentation 9.90 software (Neurobehavioral
Systems, Inc.) with a pair of Altec Lansing computer speakers
placed 100cm away from the closer ear of each subject. Each
experiment included 10 runs, with each run containing 100 chord
progressions in total. Thus, each participant heard 1000 chord
progressions overall, with 700 being Standard, 200 being
Deviant, and 100 being Fadeout targets.
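The probability structure of this procedure (10 runs of 100 chord progressions, with a 70/20/10 split) can be sketched as follows. This is an illustration only: stimulus delivery was actually handled by Presentation 9.90, and the assumption that proportions were exact within each run is ours.

```python
import random

def make_run(n_trials=100, seed=None):
    # One run's trial list: 70% Standard, 20% Deviant, 10% Fadeout.
    # Proportions are exact within a run (an assumption for illustration);
    # presentation order is shuffled.
    trials = (["Standard"] * (n_trials * 70 // 100)
              + ["Deviant"] * (n_trials * 20 // 100)
              + ["Fadeout"] * (n_trials * 10 // 100))
    random.Random(seed).shuffle(trials)
    return trials

# A full session: 10 runs of 100 progressions = 1000 trials.
session = [trial for run in range(10) for trial in make_run(seed=run)]
```

Across the session this yields the 700/200/100 trial counts described above.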

EEG recording. EEGs were recorded from a 64-channel
electrode cap which corresponded to the international 10-20
system, with six additional external electrodes placed at the
outer canthi of the eyes, below the left eye, on the nose, and
on each mastoid. EEGs and behavioral data were acquired using a
BioSemi system with ActiView 5.1 software. Electrode impedances
were kept below 25 kΩ for all
electrodes. All channels were continuously recorded with a
bandpass filter of 0.01-100 Hz and referenced to the right
mastoid during recording. The raw signal was digitized with a
sampling rate of 512 Hz. Recordings took place in an
electrically shielded, sound-attenuated chamber. A video zoom
lens camera was used to monitor participants' movements during
recording.

Data analysis. Raw EEG data were imported into
BrainVision Analyzer software for analysis. Raw data were
referenced to the averaged signal of the left and right
mastoids and high-pass filtered at 0.5Hz to eliminate
low-frequency drift. EEG epochs containing fluctuations of more
than 100µV within 200ms were rejected to eliminate noise due
to eye blinks, eye movements, excessive muscle activity, and
other artifacts. ERPs were segmented and averaged separately
for each condition (Standard, Deviant, and Fadeout) over the
time window of 200ms prestimulus to 1000ms poststimulus, and
then band-pass filtered at 0.5-20Hz and baseline-corrected
relative to a period of 200ms prestimulus to 0ms (stimulus
onset). Grand averages were computed across the 12 subjects,
and mean amplitudes were measured over latency windows of
150-210ms (EAN) and 400-600ms (LN). Peak and latency ANOVAs
were conducted over the
most activated site for each time epoch: FCz (EAN) and Fpz
(LN). Scalp topography statistics were calculated by clustering
electrodes into five regions: prefrontal (Fpz, Fp1, Fp2, AFz,
AF3, AF4, AF7, AF8), frontal (Fz, F1, F2, F3, F4, F5, F6, F7,
F8, FCz, FC1, FC2, FC3, FC4, FC5, FC6, FT7, FT8), central (Cz,
C1, C2, C3, C4, C5, C6, CPz, CP1, CP2, CP3, CP4, CP5, CP6, T7,
T8, TP7, TP8), parietal (Pz, P1, P10, P2, P3, P4, P5, P6, P7,
P8, P9, POz, PO3, PO4, PO7, PO8), and occipital (Oz, O1, O2,
Iz).
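As a rough illustration of the epoching steps described above (mastoid re-referencing, ±100µV artifact rejection, baseline correction, and per-condition averaging), a NumPy sketch might look like the following. This is a simplified stand-in for the BrainVision Analyzer pipeline, and the rejection criterion here (peak-to-peak amplitude over the whole epoch) is only an approximation of the 100µV-within-200ms rule.

```python
import numpy as np

FS = 512               # sampling rate (Hz)
PRE, POST = 0.2, 1.0   # epoch window: 200ms prestimulus to 1000ms poststimulus

def epoch_and_average(eeg, onsets, left_mast, right_mast, reject_uv=100.0):
    """eeg: (n_channels, n_samples) array in microvolts;
    onsets: stimulus-onset sample indices for one condition."""
    eeg = eeg - (eeg[left_mast] + eeg[right_mast]) / 2.0  # mastoid re-reference
    n_pre, n_post = int(PRE * FS), int(POST * FS)
    kept = []
    for s in onsets:
        ep = eeg[:, s - n_pre:s + n_post]
        if np.ptp(ep, axis=1).max() > reject_uv:   # crude artifact criterion
            continue
        ep = ep - ep[:, :n_pre].mean(axis=1, keepdims=True)  # baseline-correct
        kept.append(ep)
    return np.mean(kept, axis=0)   # condition ERP: (n_channels, n_times)
```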

Source localization was conducted on
difference waves (Deviant ERPs minus Standard ERPs) using
LORETA (Low-Resolution Electromagnetic Tomography;
Pascual-Marqui et al, 1994). Time-frequency analysis was
performed using EEGLAB 5.00 (Delorme & Makeig, 2004) on
segmented EEG data (prior to signal-averaging and band-pass
filtering) for each condition across participants.
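For intuition, an induced-power analysis of this kind can be approximated with SciPy: compute a spectrogram for each unaveraged epoch and average the power across trials, so that non-phase-locked ("induced") activity such as Gamma survives the average rather than cancelling out. This is a generic sketch, not the EEGLAB 5.00 routine used in the study; the window length is illustrative.

```python
import numpy as np
from scipy import signal

FS = 512  # sampling rate (Hz)

def induced_power(epochs, fmax=100):
    """epochs: (n_trials, n_samples) single-channel data for one condition.
    Returns frequencies (<= fmax), times, and trial-averaged power."""
    powers = []
    for ep in epochs:
        f, t, S = signal.spectrogram(ep, fs=FS, nperseg=128, noverlap=96)
        powers.append(S)                   # power is averaged AFTER the
    mean_power = np.mean(powers, axis=0)   # transform, retaining induced activity
    keep = f <= fmax
    return f[keep], t, mean_power[keep]
```

Averaging the ERP first and then transforming would instead isolate evoked (phase-locked) activity; the order of operations is the key design choice here.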

Behavioral tests of
learning.
Recognition and generalization tests of
memory and grammar learning were conducted on the same group of
participants in a separate session after EEG recording. These
were the same kinds of tests as in Experiment 1 of Chapter
4.

Results

Behavioral data.

Behavioral results obtained
from the amplitude detection task were at ceiling. The average
hit rate across all subjects was 94.8% and the average false
alarm rate was 0.1%. Average reaction time to detect the
amplitude-change targets was 606ms. Our behavioral data confirm
that subjects were able to perform the task while attending to
the auditory stimuli.

Standard vs. Deviants

In our analyses of EEG data, we first
compared Event-Related Potentials elicited by the
high-probability Standard sounds with the low-probability
Deviant sounds. ERPs elicited by Deviant sounds showed both the
Early Anterior Negativity (EAN) and the Late Negativity (LN)
effects when compared to ERPs of Standard sounds (Fig. 2).

Figure 2. ERPs and topography for
Standard versus Deviant chords in the new musical system.

The Early Anterior Negativity (EAN) was
a negative-going waveform significant at a time window of
150-210ms after stimulus onset, with a bilateral frontal scalp
distribution. A one-way ANOVA conducted at the midline frontal
site FCz, comparing ERPs to standard versus deviant chords,
confirmed that the difference was significant at a time window
of 150-210ms (F(1,22) = 5.70, p < 0.05). The
bilateral frontal scalp distribution was confirmed by a two-way
ANOVA on the average amplitude over time window 150-210ms on
all electrodes, with factors of sound type (Standard vs.
Deviant) and electrode region (clustered into prefrontal,
frontal, central, parietal, and occipital), yielding a
significant interaction between condition and electrode region
(F(4,118) = 13.87, p < 0.001). The Late Negativity (LN) was
a negative-going waveform significant at 400-600ms post
stimulus onset, and was largest bilaterally over prefrontal
channels. A one-way ANOVA conducted at the midline prefrontal
site Fpz confirmed that ERPs for deviant chords were
significantly more negative than for standard chords at
400-600ms (F(1,22) = 13.91, p = 0.001), and the
scalp topography was confirmed by a two-way interaction between
sound  type (Standard vs. Deviant) and electrode region
(prefrontal, frontal, central, parietal, occipital): F(4,118) =
17.08, p < 0.001. EAN vs. LN topographies were significantly
different as indicated by a significant three-way interaction
between time course (150-210 vs. 400-600), condition (Standard
vs. Deviant), and electrode region (prefrontal, frontal,
central, parietal, occipital): F(4, 236) = 2.59, p <
0.05. 
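For readers following the statistics, the window-mean extraction behind these ANOVAs can be sketched in a few lines. This is a generic SciPy stand-in, not the software actually used; the indexing assumes the 200ms prestimulus epochs described in Methods.

```python
import numpy as np
from scipy import stats

FS, PRE = 512, 0.2   # sampling rate (Hz), prestimulus interval (s)

def window_mean(erp, t0, t1):
    # Mean amplitude of a single-channel ERP over a poststimulus window (s).
    return erp[int((PRE + t0) * FS):int((PRE + t1) * FS)].mean()

def ean_anova(standard_erps, deviant_erps):
    # One window mean per subject per condition (e.g. at FCz), then a
    # one-way ANOVA over the 150-210ms EAN window.
    std = [window_mean(e, 0.150, 0.210) for e in standard_erps]
    dev = [window_mean(e, 0.150, 0.210) for e in deviant_erps]
    return stats.f_oneway(std, dev)
```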

The topographies and timescales of
these two components were similar to findings from studies
using traditional Western music (Loui et al, 2005), suggesting
that perceiving novel sound patterns recruits the same neural
systems as are engaged in the perception of Western music (see
figure 3).

Figure 3. A comparison of ERP, scalp
topos, and difference waves for Western music (Loui et al,
2005) and the new musical system.

Standard vs. Fadeout Target

ERPs in response to target fadeout
chords showed a large, parietally-centered positive waveform
around 600ms after target stimulus onset (One-way ANOVA over
parietal site Pz comparing ERP amplitude over time window
300-800ms poststimulus for each participant: F(1, 22) = 49.7, p
< 0.0001, see figure 4). This result is in accordance with
the classic P300 effect associated with target detection,
showing that participants successfully identified the fadeout
chords as task-related targets.

Figure 4. ERPs and scalp topographies
for task-relevant target chords in the Fadeout condition.

Source Localization.

In order to investigate neural networks
contributing to the effects of the EAN and the LN, we used
LORETA to localize the sources of the early and late waveforms.
LORETA (Low-Resolution Electromagnetic Tomography) is an
algorithm which obtains a three-dimensional solution from the
two-dimensional scalp topography of EEG or ERP epochs, and maps
the solution onto standard brain slices obtained from the
Montreal Neurological Institute (Pascual-Marqui et al, 1994).
Results from LORETA point to different neural generators for
the EAN and the LN. The EAN appears to be generated by the
temporal lobe, whereas the sources of the LN are most probably
from the lateral prefrontal cortex.

Figure 5a. LORETA source localization
for the EAN.

Figure 5b. LORETA source localization
for LN.

Time-frequency analysis

Time-frequency analysis of these
electrophysiological signals, notably Gamma band analysis, was
conducted to investigate electrophysiological activity at high
frequencies during the perception of expected and unexpected
chords. Recent studies have reported that Gamma band activity
is induced during states of rhythmic expectation (Zanto &
Large, 2005; Snyder et al, 2005). In addition, violations of
expectation in mismatch-type studies (Pantev et al,
1991) and word-nonword oddball paradigms (Kaiser et al, 2002)
have been shown to induce low Gamma band or high Beta band
activity (in the region of 20-40Hz) over left frontal
electrodes. Based on time-frequency plots of data
concatenated for all 12 participants, we observed high Gamma
band activity above 50Hz at anterior frontal sites (F3) for
Standard chords. In response to Deviant stimuli, this high
Gamma activity was absent, but increased power was observed in
the high Beta to low Gamma frequency range (20-40Hz) (see
figure 6). These effects were significant at the p < 0.05
level when corrected relative to a 200ms pre-stimulus baseline.
Results suggest that activity in the high Gamma range may be
sensitive to the formation of expectation, whereas activity in
the low Gamma or high Beta range may be sensitive to the
disruption or violation of expectation.

Figure 6. Time-frequency plots for
channel AFz in Standard and Deviant conditions.

The Development of Expertise

To investigate the effects of learning
during the course of the experiment, data from the EEG
recording sessions were divided evenly into three sections in
time. The earliest and latest phases of the EEG data were
isolated, analyzed, and compared separately in order to
investigate the evolution of the EAN and LN effects over time.
ERPs for the Standard condition in the early and late phases
were indistinguishable. However, a comparison of the Deviant
conditions in the early and late phases showed an enhanced
negativity in the late phase. The EAN was significantly larger
in the late phase compared to the early phase, as confirmed by
a one-way ANOVA over site FCz (F(1, 22) = 4.99, p = 0.036)
(Fig. 7). The latency of the late-phase EAN decreased slightly
but not significantly compared to the early-phase EAN latency
(peak latencies of EAN over site FCz: early phase = 191ms, late
phase = 165ms; one-way ANOVA: F(1,22) = 2.13, p = 0.16). The LN
showed a similar, nonsignificant trend toward increased
amplitude in the late phase (LN amplitude: early phase =
-3.20µV, late phase = -3.42µV, F(1,22) = 0.11, p = 0.75; LN
latency: early phase = 582ms, late phase = 598ms, F(1,22) =
0.13, p = 0.72).

Fig. 7. Increase in EAN amplitude as a
function of the course of the experiment.

Probability Learning

To investigate the possibility that the
EAN and the LN reflect surface features of the stimuli used
(e.g. dissonance or harmony arising from interactions between
simultaneously presented tone frequencies - see Kameoka &
Kuriyagawa, 1969), rather than their relative probabilities of
occurrence, we implemented an additional control condition for
half of our participants. Prior to the main experiment, a
baseline condition was run in which Standard and Deviant sounds
were presented with equal probability (45% each). The remaining
10% of sounds contained a rapid fade-out in amplitude, as in
the rest of the experiment, and participants' task was to
indicate when they detected these amplitude fade-outs. No
significant difference
between standard and deviant chords was observed when the
sounds were played at equal probabilities, suggesting that the
EAN and LN effects were driven by the relative probabilities of
sound patterns, rather than a surface property of the sound
stimulus (Fig. 8).

Figure 8. ERPs for equal and unequal
probabilities showing the lack of EAN and LN effects when
sounds were played at equal probabilities.

Behavioral Correlates of Electrophysiological
Effects

To investigate the correlation between
EAN amplitude and behavioral performance, we conducted a
follow-up behavioral experiment of music learning in another
session using the same participants. Melodies were composed
from the tones that formed the sound patterns used in the ERP
experiment. These melodies were composed by applying the chord
progressions as the rules of a finite-state grammar (Reber,
1989; see Chapter 2 for details on how melodies were composed
using a finite-state grammar). Participants listened to
melodies for an exposure period of 30 minutes.  At test,
they were given a two-alternative forced choice (2AFC) task
in which one choice was a melody from the same finite-state
grammar and the other was from a different finite-state
grammar; their task was to identify the more familiar melody.
This grammar-learning test has been used in previous studies of
implicit learning (Reber, 1989) and has been shown to index
generalization and grammar-learning ability.

Behavioral results confirmed that
participants were generally above chance at learning the
grammar that generated the melodies they had heard. When
individual participants' generalization scores were correlated
with the amplitude of their Early Anterior Negativity, a
significant positive correlation was observed (Pearson's r =
0.75; p = 0.015, two-tailed; see Fig. 9).
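The individual-differences analysis reduces to a per-subject pairing of EAN amplitude with 2AFC generalization score. A minimal sketch follows; the sign convention (negating amplitudes so that a larger, i.e. more negative, EAN yields a positive r) is our assumption about how the reported positive correlation was expressed.

```python
from scipy import stats

def ean_behavior_correlation(ean_amplitudes_uv, scores):
    """Pearson correlation between per-subject EAN amplitude (in µV; more
    negative = larger EAN) and 2AFC generalization score. Amplitudes are
    negated so that a positive r means larger EAN -> better learning."""
    r, p = stats.pearsonr([-a for a in ean_amplitudes_uv], scores)
    return r, p
```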

Figure 9. Positive correlation between
the amplitude of the Early Anterior Negativity and behavioral
performance.

Discussion

Our data show that the human brain
rapidly and flexibly integrates novel sound patterns to form a
musical context. The EAN and LN, electrophysiological
signatures of deviance in Western music, are elicited by
deviations in the novel sound patterns. The time courses and
scalp topographies of these waveforms parallel findings from
Western music, strongly supporting a common core neural
mechanism for processing well-known and novel musical
patterns.

The increase in the amplitude of the EAN
over the course of music presentation reflects the gradual
development of expertise with exposure, suggesting that the EAN
is an effective index of expertise in auditory patterns. The
new musical system thus affords a vehicle for observing the
brain online as it develops musical expertise.

Interestingly, the EAN is generally
larger for Western music than for the novel sound patterns (as
shown in Figure 3). This observation lends support to the claim
that the EAN is a neural correlate of expertise for sound
patterns. The results provide support for a neural system of
probability learning which rapidly adapts to the statistical
probabilities of sound occurrences in a new auditory
context. The Late Negativity also showed a nonsignificant
increase in amplitude late in the experiment; this trend may
reflect carryover effects of the EAN.

Another piece of evidence for the claim
that the EAN is an index of expertise is in the positive
correlation between the EAN amplitude and behavioral
performance in grammar learning tasks. Our results show that
the electrophysiological signatures of sound pattern processing
reflect individual differences in learning, and suggest that
the EAN may be an effective neural correlate of individual
differences in learning sound patterns.

Furthermore, both the EAN and the LN
components were eliminated when the sound patterns presented
were equated in probability, suggesting that these effects were
dictated by the relative probabilities of sounds, and that the
learning of music may require sensitivity to the differential
probabilities of sound patterns.

Results suggest that the EAN may
reflect perceptual mechanisms of expectation violation whereas
the LN may reflect further cognitive analysis or the
integration of an unexpected event into its context. EAN and LN
effects seem to have different neural generators. MEG and
patient data (Woldorff et al, 1998; Alain et al, 1998) have
implicated the superior temporal planes as probable sources of
the EAN with top-down modulation from lateral prefrontal
cortex. The PFC has been implicated in maintaining contextual
information (Huettel et al, 2002; Barcelo & Knight, in
press) and the local context of the chord patterns may be
maintained in a network in which prefrontal areas couple with
auditory cortices. These findings are also consistent with
prior MEG studies using Western chord progressions (Maess et
al, 2002) as well as fMRI results from the perception of
melodies and tone sequences (Levitin & Menon, 2003; Janata,
2003), which have identified the inferior frontal gyrus,
including Broca's area, as a brain region engaged in the
processing of music as well as language. Findings also converge
with other results showing that the EAN is larger in musicians
than in nonmusicians (Koelsch et al, 2000) and that EAN is
sensitive to the violation of rules in musical harmony (Leino
et al, in press), as Western music follows rules of musical
harmony to which most individuals - especially musicians - are
sensitive.

Taken together, our results suggest
that music perception engages a flexible set of neural
mechanisms which rapidly develop expectations and integrate
sounds into new contexts. Such learning mechanisms are dictated
by the probabilities of sounds and may include neural networks
that also subserve language acquisition (Huron, 2004), which
couple with the sensory cortices in the development of
sensitivity towards probable events in the auditory
environment.