The previous chapter demonstrated the
possibility that humans can learn grammatical and frequency
structures of sounds from limited exposure. Results suggest
that the human brain is efficient at learning relationships
between sounds and deriving a novel musical experience. Many
questions can be raised regarding this learning ability: what
are the constraints of learning? What are the kinds of sounds,
and relationships between sounds, that facilitate learning?
This chapter reports tests with a larger corpus of melodies,
as well as experiments investigating the contributions of
various aspects of sounds and sound relationships to this
learning ability. Aspects of sounds tested include timbre,
harmony, and melodic intervals. Sound relationships
investigated here are melodic intervals, amount of exposure,
and set size of melodies. As in the previous chapter, all of
the following experiments will test for four specific kinds of
knowledge in music: recognition, generalization, frequency
sensitivity, and preference change.

Experiment 1. Baseline Learning Experiment

Chapter 3 showed that
presenting a relatively large number of melodies (15 melodies
in Experiment 2, compared to five in Experiment 1) had a
positive effect on the ability to generalize. To replicate and
extend this result, we first need an experiment that
establishes conclusively that humans can learn the new musical
system when exposed to a sufficiently large number of
exemplars.

In this experiment,
participants were given exposure to 400 melodies once each,
over the course of 30 minutes. As in experiments from the
previous chapter, participants were given various listening
tests before and after exposure to the melodies. These tests
included pre-exposure and post-exposure probe tone ratings to
assess sensitivity to frequencies, two-alternative forced
choice tests of recognition and generalization to assess memory
and grammar learning, and subjective ratings tests to assess
preference change. Given previous results showing some learning
with 15 melodies compared to five, we believe that increasing
the number of exposure melodies will induce learning; thus we
expect to see significantly above-chance levels of
generalization as well as recognition following exposure to 400
melodies. Additionally, this experiment will be used to compare
against other experiments to assess the relative contributions
of timbre, training, and melodic processes.

Methods

Participants. Twenty-four
undergraduates from UC Berkeley participated for course credit.
All participants reported having normal hearing and five or
more years of musical training. Subjects were randomly assigned
to one of two groups; one group was exposed to Grammar I and
the other group to Grammar II.

Stimuli. All experiments
were run in a sound-attenuated chamber, using a Dell PC desktop
computer coupled with AKAI K301 headphones. All stimuli were
generated and presented using Max/MSP (Zicarelli, 1998) in the
Windows environment. The 1000 melodies were generated
automatically from the chord-progression finite state grammars
described in Chapter 2, with 500 melodies in each of the two
grammars. Each melody consisted of eight notes, where each note
was represented as a number n that was substituted into the
Bohlen-Pierce scale formula:

Frequency (Hz) = 220 * 3 ^ (n/13).
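
As a minimal sketch of how note numbers map onto frequencies, the
formula above can be written in Python. The function name
bp_frequency and the example note numbers are illustrative
assumptions; the actual stimuli were generated in Max/MSP from the
grammars described in Chapter 2.

```python
def bp_frequency(n, base_hz=220.0):
    """Frequency in Hz of note number n: 220 * 3 ^ (n/13)."""
    return base_hz * 3 ** (n / 13)

# Hypothetical eight-note melody written as note numbers
# (an assumption for illustration, not an actual stimulus).
example_melody = [0, 6, 10, 6, 4, 7, 10, 13]

for n in example_melody:
    print(f"n = {n:2d} -> {bp_frequency(n):7.2f} Hz")
```

Note number 0 corresponds to 220 Hz and note number 13 to 660 Hz, so
thirteen steps span the 3:1 frequency ratio of the tritave.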

Procedure. The experiment
was run in five phases:

  1. Pre-exposure probe tone ratings:
    This test was similar to the probe tone method used by
    Krumhansl and others (Krumhansl, 1990). Thirteen trials were
    administered. In each trial, a melody was presented, followed
    by a probe tone. The participants' task was to rate how well
    the probe tone fit with the preceding melody on a scale of 1
    to 7, with 7 being the best fit. The melody was the same in
    every trial, but the probe tone varied from trial to trial.
  2. Exposure: 400 melodies in one of the
    two grammars were presented with no repeats for half an hour.
    Melodies were presented in pure tones; each tone was 500ms in
    duration, including rise time and fall times of 10ms each.
    There was a 500ms silence between two successive melodies.
    Participants were told to simply listen to the melodies. To
    alleviate boredom during the exposure phase, participants
    were given the option of drawing with provided pencil crayons
    on paper during the exposure phase.
  3. Two-alternative forced choice tests:
    Twenty trials were presented. The first ten trials tested for
    recognition whereas the last ten tested for generalization.
    In both types of trials, participants' task was to identify
    the melody (the first or the second) that sounded more
    familiar. Responses were coded as correct if the chosen
    melody was from the participant's exposure grammar. The same
    test was used on participants exposed to both grammars; thus
    the correct choice for one group was the incorrect choice for
    the other group. The number of recognition and generalization
    test trials were equated in the present experiments so as to
    allow direct comparisons between recognition and
    generalization performance. 
    1. Recognition: in each trial, two
      melodies were presented sequentially, one of which
      belonged to the set that had been presented during the
      exposure phase (and therefore belonged to the
      participant's exposure grammar) and another which
      belonged to the other grammar.
    2. Generalization: in each trial,
      two melodies were presented sequentially. Neither of the
      melodies had been presented during the exposure phase,
      but one melody belonged to the participants' exposure
      grammar whereas the other melody belonged to the other
      grammar.
  4. Post-exposure probe tone ratings:
    the same test as phase 1 was administered. The purpose of
    this test was to compare the post-exposure probe tone ratings
    with the distribution of exposure frequencies and the
    pre-exposure probe tone ratings in order to assess
    participants' sensitivity to the frequency structure of the
    new musical system.
  5. Preference ratings: 40 trials were
    administered. In each trial, one melody was played and
    participants' task was to rate their preference of the melody
    on a scale of 1 to 7, 7 being most preferable. The 40
    melodies rated were drawn randomly from the melodies
    participants had heard which belonged to their grammar (Old
    Grammatical), melodies they had not heard but which belonged
    to their exposure grammar (New Grammatical), and melodies
    they had not heard that belonged to the other grammar
    (Ungrammatical). Thus, each participant rated a different set
    of melodies, but all trials were drawn from the same pool of
    melodies overall.
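
The timing parameters given for the exposure phase can be checked
against the stated half-hour duration. The sketch below assumes that
the eight tones of a melody were presented back to back, with silence
only between melodies; the procedure does not state this explicitly.
The constant and function names are illustrative.

```python
N_MELODIES = 400        # melodies heard once each
NOTES_PER_MELODY = 8    # eight notes per melody
TONE_S = 0.5            # 500 ms per tone, including rise and fall
GAP_S = 0.5             # 500 ms silence between successive melodies

def exposure_seconds(n_melodies=N_MELODIES, notes=NOTES_PER_MELODY,
                     tone_s=TONE_S, gap_s=GAP_S):
    """Total exposure time, assuming no silence within a melody."""
    sounding = n_melodies * notes * tone_s
    silence = (n_melodies - 1) * gap_s   # gaps fall between melodies
    return sounding + silence

total_s = exposure_seconds()
print(f"{total_s:.1f} s = {total_s / 60:.2f} min")
```

Under these assumptions the exposure phase lasts 1799.5 seconds,
which agrees with the reported 30 minutes.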

Results

Two-alternative forced-choice tests revealed that
participants could still recognize old melodies (average
accuracy 63%, SD = 10%, t(22) = 5.77, p < 0.01, d = 1.17,
p_rep > .99), but now they could also generalize
to new melodies (average accuracy 69%, SD = 25%, t(22) = 4.26,
p < 0.01, d = 0.87, p_rep = .99; see Figure 1). No
significant difference was observed between recognition and
generalization levels of accuracy (t(23) = 0.91, n.s.).
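
The scoring rule and the test against chance can be sketched as
follows. This is an illustration under stated assumptions, not the
analysis script used here: the function names and the example data
are hypothetical, chance is taken to be 50% for a two-alternative
task, and the p-value (which requires the t distribution) is not
computed.

```python
import math
from statistics import mean, stdev

def accuracy(chosen, exposure_grammar):
    """Proportion of trials on which the chosen melody came from
    the participant's exposure grammar."""
    return sum(g == exposure_grammar for g in chosen) / len(chosen)

def t_against_chance(scores, chance=0.5):
    """One-sample t statistic, degrees of freedom, and Cohen's d."""
    n = len(scores)
    diff = mean(scores) - chance
    sd = stdev(scores)               # sample SD (n - 1 denominator)
    t = diff / (sd / math.sqrt(n))
    d = diff / sd
    return t, n - 1, d

# Hypothetical choices of one participant exposed to Grammar I:
# each entry is the grammar of the melody chosen on that trial.
choices = ["I", "I", "II", "I", "I", "II", "I", "I", "II", "I"]
print(accuracy(choices, "I"))

# Hypothetical accuracies for six participants (not the reported data).
scores = [0.6, 0.7, 0.5, 0.8, 0.6, 0.7]
t, df, d = t_against_chance(scores)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

With this definition of effect size, t and d are related by
t = d * sqrt(n), where n is the number of participants.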

Figure 1. Results from two-alternative forced choice tests,
Experiment 1.

In the
preference ratings phase, participants rated old melodies in
the same grammar similarly to new melodies in the same
grammar and new melodies in the other grammar (average ratings:
old melodies M = 3.3, SD = 0.98; new melodies same grammar M =
3.6, SD = 0.98; new melodies other grammar M = 3.3, SD = 0.98;
one-way ANOVA comparing average ratings: F(2,65) = 0.36, p =
0.7; see Figure 2). The familiarity preference observed in
Experiment 1 of Chapter 3 (similar to the mere exposure effect
first reported by Zajonc, 1968) was not present in the current
experiment.
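
For readers unfamiliar with the one-way ANOVA used to compare the
three melody types, the F statistic can be sketched as below. The
function name and the example ratings are hypothetical, and the
sketch treats the three sets of ratings as independent groups, which
is an assumption about how the comparison was set up.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical mean ratings (1 to 7 scale) for three melody types.
old_grammatical = [1, 2, 3]
new_grammatical = [2, 3, 4]
ungrammatical = [3, 4, 5]
f, df1, df2 = one_way_anova([old_grammatical, new_grammatical,
                             ungrammatical])
print(f"F({df1},{df2}) = {f:.2f}")
```

With k groups and N observations in total, the degrees of freedom
are (k - 1, N - k); for example, three melody types with 24 mean
ratings each would give (2, 69).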

Figure 2. Preference ratings results from Experiment 1.

Pre-exposure probe tone ratings and post-exposure probe
tone ratings were both correlated with exposure frequencies
(pre-exposure r = 0.46, s.e. = 0.07; post-exposure r = 0.65,
s.e. = 0.06; ratings in Figure 3a, correlations in Figure 3b).
When effects of the melody used to obtain ratings were
partialled out (see Appendix 1 for details on partial
correlations), pre-exposure correlations dropped to chance
levels (r_xy|z = 0.076, s.e. = 0.07) whereas post-exposure
correlations remained significantly above chance (r_xy|z =
0.56, s.e. = 0.07, t(23) = 7.7, p < 0.0001, d = 1.57, p_rep
> 0.99; Figure 3b), suggesting that after brief exposure to a
new musical system, participants became sensitive to its
underlying frequency structure.
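
The partial correlation procedure can be illustrated with a short
sketch. It assumes the standard first-order partial correlation
formula, r_xy|z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 -
r_yz^2)); Appendix 1 gives the procedure actually used. Here x
stands for the probe tone ratings, y for the exposure frequencies of
the 13 tones, and z for the frequencies of those tones in the melody
that preceded the probe. All function names and data values are
hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical values for the 13 probe tones (not the reported data).
ratings = [6, 2, 3, 5, 3, 3, 6, 2, 2, 5, 3, 3, 4]
exposure = [40, 5, 10, 30, 8, 12, 35, 6, 9, 28, 7, 11, 20]
melody = [2, 0, 0, 1, 0, 1, 2, 0, 0, 1, 0, 0, 1]

print(f"r_xy   = {pearson_r(ratings, exposure):.2f}")
print(f"r_xy|z = {partial_r(ratings, exposure, melody):.2f}")
```

If the ratings only reflected the tones just heard in the probe
melody, r_xy|z would fall toward zero; a partial correlation that
stays above zero indicates sensitivity to the exposure set beyond the
single melody.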

Figure 3a. Probe tone ratings, Experiment 1.

Figure 3b. Correlations and partial correlations between pre- and
post-exposure probe tone ratings and exposure frequencies,
Experiment 1.

Conclusion

We observed
significantly above-chance levels of recognition and
generalization, extending findings from the previous chapter
that exposure to a large set of exemplars leads to successful
recognition of previously-encountered melodies, as well as
generalized knowledge of an artificial musical grammar.
Importantly, participants were never exposed to chords or chord
progressions in the new scale; the fact that they chose new
items composed from the same chord progressions that generated
exposure melodies as being more familiar provides evidence for
the learning of musical structure. As the foil melody (the
incorrect choice) in recognition and generalization trials was
always drawn from the other grammar, the two choices in
two-alternative forced choice trials were always separable
based on their underlying grammars. To perform correctly in
both recognition and generalization trials, participants could
have employed the same strategy of grammar generalization. This
experimental design, along with the observation that performance
levels on recognition and generalization trials were
statistically indistinguishable, suggests that after exposure to
large numbers of melodic exemplars (as in the present
experiment), recognition and generalization trials could tap
into the same underlying type of knowledge.

Post-exposure probe tone ratings were significantly
more correlated with the distribution of exposure frequencies
than pre-exposure ratings, suggesting that participants
acquired sensitivity to the frequencies of tones as a result of
exposure. When the effects of the melody used to obtain these
ratings were partialled out, the resulting correlation was
around zero for pre-exposure ratings but significantly above
chance for post-exposure ratings, suggesting that participants
had no knowledge of the new musical system prior to being
exposed to it during the course of the experiment.

Preference
ratings were not significantly different for Old Grammatical,
New Grammatical, and Ungrammatical melodies. This replicates
the previous finding in Experiment 2 of Chapter 3 that
decreasing the number of repetitions of each individual melody
led to no preference change for previously encountered
melodies. Thus it seems that the preference increase reported
in Experiment 1 of Chapter 3 was driven by repeated
exposure to individual melody exemplars.

Experiment 1b. Baseline Control Experiment

This experiment was the same as
Experiment 1, except that there was no exposure phase. We
ran it as a control to ensure that these experiments
were truly testing learning as a result of exposure, rather
than pre-existing knowledge.

Methods

Subjects. Twenty-four
undergraduates from UC Berkeley participated for course credit.
All subjects reported having normal hearing. Participants were
not selected for musical training.

Stimuli. All methods were
the same as Experiment 1 in this chapter, except the exposure
phase was omitted. Instead of listening to melodies for 30
minutes, participants simply sat in the experiment room. During
the half hour participants were given colored pencils and paper
and given the option of drawing to alleviate boredom.

Results

Two-alternative forced choice tests
showed that recognition and generalization tests were both at
chance: recognition: t(23) = -0.87, n.s.; generalization: t(23)
= 1.19, n.s.; see Figure 4. These results confirmed that, as
expected, participants did not possess knowledge of the novel
musical system without exposure.

Figure 4. Experiment 1b two-alternative
forced choice tests.

In the probe tone ratings test, the
correlation between exposure frequencies and post-exposure
ratings was indistinguishable from the correlation between
exposure frequencies and pre-exposure ratings (average
correlation between pre-exposure ratings and exposure set
across subjects: r = 0.24; average correlation between
post-exposure ratings and exposure set: r = 0.34; t(23) = 1.1,
n.s.). When effects of the melody used to obtain the ratings
were partialled out, the resulting partial correlation dropped
to chance levels for both pre- and post-exposure ratings
(pre-exposure mean partial correlation: r_xy|z = -0.0066,
two-tailed t-test against chance: t(23) = -0.98, n.s.;
post-exposure mean partial correlation: r_xy|z = 0.12,
t(23) = 1.54, n.s.). This suggests that any correlations with
exposure frequencies before the partial correlation procedure
were a result of familiarity with the single melody used to
obtain ratings, not of any knowledge of the new musical system
(Figure 5).

Figure 5. Correlations and partial
correlations between pre- and post-exposure probe tone ratings
and exposure frequencies, Experiment 1b.

Preference ratings (Figure 6) were
similar across all three groups, as confirmed by a one-way
ANOVA comparing ratings across the three conditions: F(2,68) =
0.69, n.s.

Figure 6. Preference ratings,
Experiment 1b.

Conclusion

This control condition confirms
that without exposure to specific exemplars, individuals did
not spontaneously demonstrate any knowledge of or preference
for the new musical system. Therefore we can conclude that
successful recognition and generalization, frequency
sensitivity, and preference changes observed in other
experiments of this dissertation are truly due to exposure
rather than pre-existing knowledge.

Experiment 2. Effects of Timbre

As discussed in Chapter 1, music can be
broadly construed as being organized in vertical and horizontal
dimensions (Sethares, 2004; Tramo et al., 2003). Concerning the
learnability of music, one obvious question is how one
dimension can influence the learning of another. One
possibility is that the two dimensions are orthogonal and
unrelated. Another possibility is that vertical organization of
sound, i.e. timbral and harmonic qualities, can aid the
learning of the horizontal organization (such as the direction
in which a melody is likely to go) by cueing the system towards
optimal sound organization. For instance, consonant harmonies,
and timbres with harmonically related partials, predict
melodies with more consonant intervals, and dissonant harmonies
and inharmonic timbres predict more dissonant intervals. This
hypothesis is implicit in one argument for the naturalness of
harmony in Western music (e.g. Schwartz et al., 2003), which
states that the chords used in Western music reflect the
harmonic series found in the timbre of natural periodic sounds
such as speech. Adapted to our new musical system, this
predicts that melodies that are presented in a timbre that is
congruent with the tuning system should help in learning the
tuning system.

In an effort to better understand the
constraints on musical learning, the next two experiments
investigate the relative contributions of psychoacoustical
factors (specifically timbre and harmony) to the learnability
of music. To assess the contribution of timbre to learning, we
constructed new timbres with harmonic partials based on a 3:1
frequency ratio (instead of the 2:1 ratio found in many natural
sounds). We make use of Shepard tones (Shepard, 1964), complex
tones with harmonically related partials, as bases for timbres
in this experiment. By comparing the degree of learning using
pure tones (no harmonic partials), octave-based Shepard tones
(complex tones with partials related to the fundamental in 2:1
multiples of frequency), and tritave-based Shepard tones
(complex tones where partials were related to the fundamental
in 3:1 multiples in frequency), we hope to address the question
of the importance of timbre and harmony to our acquisition of
musical knowledge.

Figure 7. Tritave-based and
Octave-based Shepard tones used in Experiment 2.

Methods

Participants.
Forty-eight undergraduates at the University of California at
Berkeley participated in this experiment in return for course
credit. All participants reported having normal hearing and
five or more years of musical training. Each participant was
randomly assigned to an exposure group and an exposure
condition. Group 1 listened to melodies composed in Grammar I
during the exposure phase. Group 2 listened to melodies
composed in Grammar II during the exposure phase.

Condition A. Four hundred
melodies in tritave Shepard complex tones were presented with
no repeats for half an hour during the exposure phase.
Tritave-based Shepard tones were computer-generated complex
tones with six partials, where the partials were related to the
fundamental in 3:1 ratios in frequency; thus the timbre of
tones in this condition was congruent with a tritave-based
musical system. All other tests were the same as in Experiment
1 in this chapter.

Condition B. Four hundred
melodies in octave Shepard complex tones were presented with no
repeats for 30 minutes during the exposure phase. Octave-based
Shepard tones were computer-generated complex tones with six
partials, where the partials were related to the fundamental in
2:1 ratios in frequency; thus these tones were incongruous with
the musical system which is based on the 3:1 ratio of the
tritave. All other tests were the same as Experiment 1 in this
chapter.
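
One way to make the notion of timbre congruence concrete is to ask
where the partials of each tone type fall relative to the steps of
the Bohlen-Pierce scale. The sketch below assumes that successive
partials differ by the stated ratio (3:1 or 2:1), as in conventional
Shepard tones; the placement of the partials around the nominal pitch
and their amplitude envelope are not specified here and are not
modeled. The names are illustrative.

```python
import math

BP_STEPS_PER_TRITAVE = 13

def partial_positions(ratio, n_partials=6):
    """Position of each partial, in Bohlen-Pierce scale steps above
    the lowest partial, when successive partials differ by `ratio`."""
    steps_per_ratio = BP_STEPS_PER_TRITAVE * math.log(ratio) / math.log(3)
    return [k * steps_per_ratio for k in range(n_partials)]

tritave_positions = partial_positions(3)   # Condition A
octave_positions = partial_positions(2)    # Condition B

print([round(p, 2) for p in tritave_positions])
print([round(p, 2) for p in octave_positions])
```

Under this assumption the tritave-based partials land exactly on
scale steps (multiples of 13), whereas the octave-based partials are
spaced about 8.2 steps apart and so generally fall between scale
steps.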

Results

Forced-choice recognition tests showed
significantly above-chance levels of performance in both
conditions (Tritave Shepard tones (timbre-congruent) condition:
M = 60%, SD = 9.8%, two-tailed t-test against chance: t(23) =
4.63, p < 0.001; Octave Shepard tones (timbre-incongruent)
condition: M = 67%, SD = 15%, t(23) = 5.54, p < 0.001). No
significant difference was observed between tritave Shepard
tone and octave Shepard tone conditions. Forced-choice
generalization tests also revealed significantly above-chance
levels of performance in both conditions (Tritave: M = 62.5%,
SD = 25%, t(23) = 2.6535, p < 0.05; Octave: M = 62.5%, SD =
29%, t(23) = 2.15, p < 0.05), again with no differences
between conditions. Furthermore, no significant differences
were observed between either timbre condition and pure tone
performance (Experiment 1 of this chapter).

Figure 8. Two-alternative forced choice
results for Experiment 2.

Probe tone ratings conducted before and
after exposure showed that for all conditions, both pre- and
post-exposure ratings were significantly correlated with the
profile of exposure frequencies, with the ratings being
significantly higher in correlation post-exposure than
pre-exposure (Tritave Shepard tones condition: pre-exposure
average r = 0.5382, two-tailed t-test against chance level of
zero: t(23) = 8.75, p < 0.001; post-exposure average r =
0.79, t(23) = 35.6, p < 0.001. T-test comparing pre-exposure
and post-exposure correlations: t(23) = 3.81, p < 0.001.
Octave Shepard tones condition: pre-exposure average r = 0.381,
t(23) = 8.70, p < 0.001; post-exposure average r = 0.61,
t(23) = 13.14, p < 0.001; two-tailed t-test comparing pre-
and post-exposure ratings: t(23) = 3.81, p < 0.001). Partial
correlations were also obtained by partialling out the effects
of the melody used to obtain probe tone ratings (see Appendix
1). For both conditions, partial correlations of pre-exposure
ratings were not significantly above chance (Tritave Shepard
tones: pre-exposure partial correlation: r_xy|z =
0.027, t(23) = 0.42, n.s.; Octave Shepard tones: pre-exposure
partial correlation: r_xy|z = -0.009, t(23) =
0.14, n.s.) whereas post-exposure partial correlations were
significantly above chance (Tritave: post-exposure partial
correlation = 0.38, t(23) = 7.17, p < 0.001; Octave:
post-exposure partial correlation = 0.27, t(23) = 4.86, p <
0.001). Two-tailed t-tests comparing pre-exposure and
post-exposure partial correlations were significant for both
timbre conditions (Tritave: t(23) = 4.62, p < 0.001; Octave:
t(23) = 3.63, p = 0.001). Importantly, a direct comparison
between the two timbre conditions revealed that ratings in the
Tritave Shepard tone condition, or tones with timbres whose
frequency components were congruent with the new musical
system, were more highly correlated with the exposure profiles
than ratings in the timbre-incongruent Octave Shepard tone
condition (see figure 9). The effect of higher correlations for
the Tritave condition was significant for pre-exposure ratings
(t(23) = 2.14, p < 0.05) as well as for post-exposure
ratings (t(23) = 3.87, p < 0.001). When effects of the
melody used to obtain the ratings were partialled out,
pre-exposure correlations did not differ across conditions
(t(23) = 0.69, n.s.); post-exposure ratings in the Tritave
condition were slightly more correlated with exposure
frequencies than ratings in the Octave condition, but this
effect was not significant with 24 participants in each
condition (t(23) = 1.34, n.s.).

Figure 9. Correlations and partial
correlations between probe tone ratings and exposure
frequencies in Experiment 2.

Preference ratings did not differ
significantly among Old Grammatical, New Grammatical, and
Ungrammatical melodies in either condition (Tritave Shepard
tones: F(2,69) = 0.52, n.s.; Octave Shepard tones: F(2,69) =
0.80, n.s.). A direct comparison of preference ratings for
Tritave and Octave Shepard tone melodies revealed no
significant difference (t(23) = 0.48, n.s.; Figure 10).

Figure 10. Preference ratings results from
Experiments 1 and 2.

Conclusion

The new musical system was
learned regardless of the timbre of the sounds. While timbral
differences did not affect recognition, generalization, or
preference ratings, probe tone ratings were significantly more
accurate (i.e. more highly correlated with exposure) for the
Tritave Shepard tone condition, which was more congruent with
the tritave-based tuning system. The more accurate ratings for
congruent timbres compared to incongruent timbres suggest that
the arrangement of frequency components in pitches affected
participants' sensitivity to frequencies of tones in their
input. However, the forced choice performance and preference
ratings tasks remained unaffected, suggesting that the
different types of listening tests used in these experiments
may tap into different cognitive processes.

Experiment 3. Testing for Effects of Long-Term Training

So far, all
experiments have been conducted with participants with five or
more years of musical training. To test the novelty of this new
musical system, and to assess the degree to which the new
musical system is truly different from the existing Western
system of tonal harmony, we asked whether prior musical
training made a difference in participants' ability to learn
the musical structure. We replicated Experiment 1 of this
chapter with individuals having no formal musical training
outside of normal school education.

Methods

Participants. Twenty-four
participants were recruited in the same way as in Experiments 1
and 2. These participants had no musical training outside of
normal school education.

Stimuli and procedure.
Stimuli and procedures were the same as Experiment 1 of this
chapter. Pre-exposure probe tone ratings were collected at the
start of the experiment. Participants were then presented with
400 melodies; this was followed by two-alternative forced
choice tests of recognition and generalization, post-exposure
probe tone ratings, and preference ratings. Results from this
experiment were compared to results from Experiment 1 of this
chapter to assess the effects of prior musical training.

Results

Musically untrained participants
performed above chance in forced-choice recognition tests
(t(23) = 4.97, p < 0.01) as well as in forced-choice
generalization tests (t(23) = 2.35, p < 0.05). A comparison
between musically trained and untrained participants revealed
no effect of musical training (F(1, 92) = 1.98, n.s.) and no
interaction between recognition and generalization abilities
and musical training (F(1, 92) = 2.53, n.s., see Fig. 11).
Thus, musically untrained participants appear to perform as
well as participants with significant formal musical
training.

Figure 11. Two-alternative forced
choice results for musicians (Experiment 1) and nonmusicians
(Experiment 3).

Nonmusician participants' pre-exposure and
post-exposure ratings were both correlated with exposure
frequencies (pre-exposure r = 0.34, s.e. = 0.07; post-exposure
r = 0.49, s.e. = 0.07). Similar to musically trained
participants in Experiment 1, ratings of nonmusician
participants showed an increase in correlation with exposure
after hearing the set of melodies. When effects of the melody
presented to obtain the probe tone ratings were partialled out
of the pre-exposure ratings, the resulting correlation dropped
to chance levels (r = -0.03, s.e. = 0.08), whereas the same
procedure applied to post-exposure ratings still resulted in
highly significant correlations (r = 0.42, s.e. = 0.08, t(23) =
6.34, p < 0.0001, d = 1.23, p_rep > 0.99; see Figure 12).

Figure 12. Correlations and
partial correlations for probe tone ratings with exposure
frequencies pre- and post-exposure for musicians (Experiment 1)
and nonmusicians (Experiment 3).

There were no significant
differences between preference ratings for Old Grammatical, New
Grammatical, and Ungrammatical melodies (F(2,69) = 0.15, n.s.).
Musically untrained participants' ratings were very similar to
musically trained participants' preference ratings: a
two-way repeated-measures ANOVA revealed no significant effect
of musical training (F(1,92) = 0.39, n.s.) and no interaction
between musical training and melody type (F(1,92) = 0.03,
n.s.).

Figure 13. Preference ratings
for musicians (Experiment 1) and nonmusicians (Experiment 3).

Conclusion

Results for nonmusicians in the current
experiment showed the same pattern as for musically trained
participants. Both groups were able to recognize individual
melodies and to generalize their knowledge of the melodies to
new melodies composed from the same grammar. Participants
showed no systematic preference for familiar or grammatical
melodies. This pattern was similar to musically trained
participants' data and provides support for the claim that
repeated exposure to small numbers of items leads to increased
preference, but limited exposure to large sets of melodies
(i.e. 400 melodies only once each) does not significantly
influence preference.

An increase in correlation was observed
for nonmusicians' post-exposure correlations compared to
pre-exposure correlations; this shows that after limited
exposure to the new musical system, participants became
sensitive to its underlying statistical structure. When effects
of the melody used to obtain the ratings were partialled out,
pre-exposure correlations dropped to zero whereas post-exposure
ratings were significantly above chance.  This shows that
participants rated based only on what they learned from
exposure to melodies in the new musical system, without relying
on prior musical knowledge. This was true of both musicians and
nonmusicians, suggesting that participants were able to learn
the new musical system regardless of prior training in Western
music. We conclude that the effects of learning and preference
shown here are independent of long-term training; rather, they
may reflect mental processes that are more general than
knowledge acquired from explicit training in music.

Experiment 4. Testing for Harmonicity

Based on the results so far on grammar
learning, one may conclude that participants have acquired a
finite-state grammar, expressed as a network of relationships
between items. Another way to interpret these results is that
because chords in the new musical system are chosen to adhere
to psychoacoustical principles by approximating low-integer
ratios (as detailed in Chapter 2, a major chord is designated
as tones approximating a 3:5:7 ratio in frequency),
participants have inferred harmony from melody. In other words,
participants may be recognizing and generalizing melodies by
hearing the fundamental frequency of each tone, and then
expecting the next tone to be related in integer multiples of
frequency. Possible support for this hypothesis comes from
psychoacoustical evidence that humans can be trained to listen
for tones that are harmonically related to a
previously-presented tone in frequency (Hafter, Schlauch, &
Tang, 1993).

To distinguish between the
grammar-learning hypothesis and the harmonics-hearing
hypothesis, we replicated the experiment using another scale
that did not contain harmonic relationships within its chords,
but the melodic principles, including all intervals and
contours of the melodies, remained the same. This was the
force-fitted octave scale, using the formula:

Frequency (Hz) = 220 * 2 ^
(n/13),

where all increments of the
Bohlen-Pierce scale were fitted into the octave, such that the
relative interval sizes were similar and all the melodies were
the same but the tones chosen to be chords did not form
low-integer harmonic ratios together. The forced-octave scale
is used in contrast to previous experiments in this
dissertation using a new musical system based on the
Bohlen-Pierce scale, where tones that formed chord progressions
were chosen to approximate low-integer harmonic ratios (see
Chapter 2 for details).

Figure 14. The Forced-octave scale,
which was used for this experiment, in comparison to the
Western and Bohlen-Pierce scales.

Methods

Stimuli. In previous
experiments, all melodies were represented as numbers that fit
into n in the Bohlen-Pierce scale formula: F = 220 * 3 ^
(n/13). As mentioned in Chapter 2, this is in contrast to the
Western scale, F = 220 * 2 ^ (n/12). In testing for
harmonicity, the scale used in the present experiment is a
newly defined forced-octave scale, in which the number of steps
is 13, the same as in the Bohlen-Pierce scale, but the
steps are fitted into a 2:1 frequency ratio. Formally, the
forced-octave scale is defined as:

F = 220 * 2 ^ (n/13).
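
The three scale formulas can be compared numerically. The sketch
below computes the step size of each scale in cents and the frequency
ratios of one three-note chord under the Bohlen-Pierce and
forced-octave tunings. The chord steps (0, 6, 10) are an assumption
chosen because they are a commonly cited Bohlen-Pierce approximation
of 3:5:7; the chords actually used are defined in Chapter 2. Function
names are illustrative.

```python
import math

def scale_hz(n, period, steps, base_hz=220.0):
    """F = base * period ^ (n / steps)."""
    return base_hz * period ** (n / steps)

def step_cents(period, steps):
    """Size of one scale step in cents (1200 cents = one octave)."""
    return 1200 * math.log2(period) / steps

scales = {
    "Western": (2, 12),
    "Bohlen-Pierce": (3, 13),
    "Forced-octave": (2, 13),
}
for name, (period, steps) in scales.items():
    print(f"{name:14s} step = {step_cents(period, steps):6.2f} cents")

# Hypothetical chord steps (an assumption; see Chapter 2 for the
# chords actually used).
chord_steps = (0, 6, 10)
target = (1.0, 5 / 3, 7 / 3)     # the 3:5:7 ratio relative to the root
for name in ("Bohlen-Pierce", "Forced-octave"):
    period, steps = scales[name]
    ratios = [scale_hz(n, period, steps) / 220.0 for n in chord_steps]
    print(name, [round(r, 3) for r in ratios],
          "target", [round(t, 3) for t in target])
```

Under these assumptions the Bohlen-Pierce chord lies within about
0.01 of the 3:5:7 target ratios, while the same note numbers in the
forced-octave scale do not, which is the contrast this experiment
relies on.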

Participants and Procedure.
Twenty-four UC Berkeley undergraduates participated in return
for course credit. All participants had normal hearing. Because
Experiment 3 in this chapter demonstrated no effect of musical
training, participants were unselected for prior musical
training in this experiment. Procedures included pre-exposure
probe tone ratings, exposure to 400 melodies, two-alternative
forced choice, post-exposure probe tone ratings, and preference
tests, identical to Experiment 1 of this chapter.

Results

Recognition and generalization tests
revealed significant recognition and generalization
(recognition: t(23) = 2.48, p < 0.05; generalization: t(23) =
4.02, p < 0.01). Figure 15 shows the significantly
above-chance levels of recognition and generalization for the
inharmonic scale.

Figure 15. Two-alternative forced
choice test in Experiment 4.

Pre-exposure and post-exposure probe
tone ratings were both significantly correlated with the
exposure frequencies. Correlations for pre-exposure and
post-exposure ratings were both significantly above chance
level (pre-exposure r = 0.32, two-tailed t-test against chance:
t(23) = 5.37, p < 0.01; post-exposure r = 0.39, t(23) =
6.06, p < 0.01, see figure 16a). A two-tailed t-test
comparing the pre-exposure and post-exposure correlations was
not significant (t(23) = 0.82); however, when the effects of
the melody used to obtain probe tone ratings were partialled
out, the resulting partial correlations were significantly
higher for post-exposure ratings compared to pre-exposure
ratings (t(23) = 2.31, p < 0.05). Furthermore, partial
correlations for post-exposure ratings were significantly above
chance (rxy|z = 0.21; t(23) = 3.48, p < 0.01)
whereas partial correlations for pre-exposure ratings were not
significantly above chance level of zero (rxy|z =
0.020; t(23) = 0.34, n.s.; see figure 16b), suggesting that
participants became sensitive to the statistical structure in
the new musical system as a result of exposure and not as a
result of pre-existing knowledge independent of the
experiment.
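
The partial correlations reported above can be illustrated with the standard first-order partial correlation formula. This is a sketch under stated assumptions: x stands for a participant's probe tone ratings, y for the exposure frequencies, and z for the tone profile of the melody used to obtain the ratings. The exact procedure used in this dissertation is described in Appendix 1 and may differ; the function names here are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed.

    Uses the standard formula
    r_xy|z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

If ratings and exposure frequencies are correlated only because both resemble the context melody, the partial correlation falls toward zero even when the raw correlation is substantial.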

Figure 16. Correlations and partial
correlations for probe tone ratings.

Preference ratings were
undifferentiated across conditions (F(2, 69) = 0.1, n.s., see
figure 17).

 

Figure 17. Preference ratings for
Experiment 4.

Conclusion

Participants still learned the
grammatical structure of the musical system when it was forced
into an inharmonic scale. Thus it was not the case that
participants were using a spontaneous inference of harmonic
process to guess at the melody; rather, participants were
learning a structure of sequential probabilities in the style
of finite-state grammar learning.

Experiment 5. Effects of Melodic
Intervals

While harmony is considered the
vertical dimension of music, melody, or the succession of
pitches that identify a tune, is considered the horizontal
dimension of music (Piston & DeVoto, 1987). A melody can be
identified by its contour (the successive up-down patterns
between notes) and its intervals (the distances between
successive notes). Past researchers have proposed that the
sizes of melodic intervals play an important role in the
perception of melody. The Implication-Realization Model
(Narmour, 1989) and regression-to-the-mean model (Huron, 2006)
have both related melody to Gestalt perceptual processes by
proposing that the ability to perceive a melody as a holistic
object, or its Gestalt, depends mostly on interval sizes. The
rule of thumb governing successive intervals is a gap-fill
model, where a large interval in a melody is usually followed
by small intervals in the opposite direction (Meyer, 1973; also
see Krumhansl, 1995). Gap fill and other melodic processes
generally refer to transitions between sizes of melodic
intervals, which themselves are transitions between successive
notes; thus the melodic processes can be conceptualized
statistically as second-order transitions between notes (Huron,
2006).

 

Figure 14. An illustration of
the gap fill principle: large leaps are usually followed by
small steps in the opposite direction.

In all the experiments we have
reported so far, melodic processes were not incorporated in the
melodies used; rather, they emerged from successive melodic
intervals specified by the differences between any two
consecutive pitches, which are illustrated by pathways
connecting the nodes in the finite-state grammar (see Chapter 2
for details). In order to investigate the effects of melodic
intervals on learning, the present experiment tests for
first-order transitions between notes by constraining possible
sizes of melodic intervals.

This experiment will test for
effects of melodic intervals by changing some of the
small-interval steps in the grammar into illegal pathways, such
that the exposure phase consists mostly of either large
intervals or constant intervals. If melodic processes and small
interval sizes are important for learning, we expect
recognition and generalization performance to be
disrupted after exposure to disjunct melodies that contain
large gaps but no small intervals.

Figure 15. The finite-state
grammar with some of the pathways blocked. These blocked
pathways, shown here in thick red arrows, correspond to the
relatively small steps (less than six increments) in
melodies.

Methods

Participants.
Twenty-four participants were recruited in the same manner as
in previous experiments. Participants were not selected for
musical training; i.e. individuals with all levels of musical
training participated in the experiment.

Stimuli, Materials &
Procedure.
One thousand melodies were composed as
before, but with the horizontal pathways of the finite-state
grammar blocked out such that no melody presented during the
exposure phase contained any of the horizontal pathways of the
grammar. The horizontal pathways corresponded to small
intervals in the melodies; thus, by eliminating these pathways,
the resulting melodies that were presented during exposure
contained mostly large intervals (six or more increments) or
flat contours (repeated notes, i.e. interval sizes of one). The
pathways that were not used in exposure melodies, i.e. the
horizontal pathways, were used to generate melodies for
generalization tests. Participants were exposed to 400 melodies
in one of the two grammars, and pre-tests and post-tests were
conducted in the same manner as in previous experiments.

Results

Forced choice tests of
recognition and generalization were both at chance levels
(recognition: average = 52.5%, two-tailed t-test against
chance: t(23) = 0.84, n.s.; generalization: average = 55%,
t(23) = 0.73, n.s.). No significant difference in performance
was observed between recognition and generalization trials
(t(23) = 0.74, n.s.; see figure 16).
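
The t-tests against chance reported above can be illustrated with a minimal sketch. It assumes that each participant contributes one proportion-correct score and that chance is 0.5 in a two-alternative task. The function name and the example scores are hypothetical and are not data from this experiment.

```python
import math


def one_sample_t(scores, chance=0.5):
    """Return (t, df) for a one-sample t-test of `scores` against `chance`."""
    n = len(scores)
    mean = sum(scores) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return (mean - chance) / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical proportion-correct scores for four participants.
    example_scores = [0.4, 0.5, 0.6, 0.7]
    t, df = one_sample_t(example_scores)
    print(t, df)
```

The two-tailed p-value is then read from the t distribution with df degrees of freedom; with 24 participants, df = 23, as in the tests reported here.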

Figure 16. Two-alternative
forced choice tests of recognition and generalization.

Probe tone profiles revealed
significant post-exposure probe tone correlations with the
frequencies of exposure (see figure 17). Correlations with
exposure frequencies were significantly above chance both
before exposure (average r = 0.20; two-tailed t-test against
chance: t(23) = 2.74, p < 0.05) and after exposure (average
r = 0.46; t(23) = 7.82, p < 0.01). A comparison between
pre-exposure and post-exposure correlations showed a
significant difference (t(23) = 2.91, p < 0.01), suggesting
increased sensitivity to the frequencies of the musical system
following exposure. In addition, when effects of the melody
used to obtain probe tone ratings were partialled out,
pre-exposure correlations dropped to chance (average
rxy|z = 0.011, t(23) = 0.15, n.s.), suggesting no
prior knowledge of the musical system. Post-exposure
partial correlations showed a trend toward being above chance
(average rxy|z = 0.13, t(23) = 1.84, p = 0.078).

Figure 17. Correlations and
partial correlations of probe tone ratings before and after
exposure.

Preference data showed
significantly different ratings for the three types of melodies
(F(2, 69) = 5.39, p < 0.01). In particular, preference
ratings were significantly higher for Old Grammatical melodies
compared to New Grammatical melodies (t(23) = 4.52, p <
0.001) and melodies belonging to the other grammar (t(23) =
2.47, p < 0.05). 

 

Figure 18. Preference
ratings.

Conclusion

           
In this experiment, selected pathways of the finite-state
grammar were eliminated during exposure such that certain
melodic intervals were not available. This manipulation
disrupted learning such that both recognition and
generalization dropped to chance levels of performance. The
simultaneous disruption of recognition as well as
generalization suggests that participants were using a single,
indistinguishable cognitive system to perform both recognition
and generalization tasks when the set size of the melodies
during exposure became sufficiently large.

Another possible reason for the
disruption of learning arises from Gestalt theories of melody
perception (Meyer, 1956). When pathways corresponding to
smaller intervals are eliminated, the resulting melodies
consist of either large intervals or repeated notes. Such
disjunct melodies have a disrupted grouping structure and do
not form a gestalt percept. Past studies (e.g. Creel et al.,
2004) have shown that statistical learning tends to be
facilitated in items that can be easily perceived as gestalts,
and relatively difficult for sounds that do not readily form
auditory streams. Thus the disjunct character of these
melodies, and the resulting disruption of their grouping
structure, may have hindered learning.

The rather surprising result of
significantly higher preference for Old Grammatical melodies
coincides with the lack of generalization in this experiment.
This seems to indicate that preference formation and
generalization reflect different mental systems.

Experiment 6. Testing for Transposition

           
The experiments reported thus far have relied exclusively on
the Bohlen-Pierce scale with 220Hz as the constant reference
point k in the formula F = k * 3 ^ (n/13),
where n spanned the range of one tritave (from 0 to 12) with
actual pitches used in the grammars falling into six possible
"chord tones" or settings of n = {0, 3, 4, 6, 7, 10}. The
assignment of 220Hz as the reference point was a somewhat
arbitrary decision based on the fact that musical pitch
generally ranges from 100Hz to 5kHz (Attneave & Olson,
1971). The choices of n = {0, 3, 4, 6, 7, 10} were based on
theoretical and empirical observations that tones chosen to
approximate low-integer ratios of frequency would be perceived
as more consonant (Krumhansl, 1987; Mathews, 1988; see Chapter
2). In real music, however, pitches not only contain more
variability, but are also organized in many different keys with
distinct reference points. Furthermore, sequences of pitches
that form a melody should exhibit transpositional invariance;
that is, melodies should be transposable in log-frequency space
such that adding or subtracting a constant number to each
member within the set of pitches should result in the same
percept of a melody. Represented formally in the Bohlen-Pierce
scale, a melody, expressed as a set of numbers n =
{n_1, n_2, …, n_i}, should
retain its identity such that

F = k * 3 ^ (n/13) ∝ F = k * 3 ^ ((n + x)/13)

           
We set out to test this identity using the new musical system.
In the following experiment, the exposure phase involved
presenting the same 400 melodies transposed with a fixed
constant x of +1. At test, recognition and
generalization tasks involved an increased number of trials
with melodies transposed to different "keys" by adding
different numbers to n in the generative formula.
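
The transpositional invariance described above follows directly from the scale formula: adding a constant x to every n multiplies every frequency by the same factor, 3 ^ (x/13), so the frequency ratios between successive notes are unchanged. The sketch below checks this numerically. The example melody is hypothetical (it only uses the chord-tone values of n listed above), and the function names are illustrative.

```python
def bp_frequency(n, k=220.0):
    """Bohlen-Pierce frequency for step n: F = k * 3 ^ (n/13)."""
    return k * 3 ** (n / 13)


def transpose(melody, x):
    """Add the constant x to every scale step in the melody."""
    return [n + x for n in melody]


def successive_ratios(melody, k=220.0):
    """Frequency ratios between each pair of consecutive notes."""
    freqs = [bp_frequency(n, k) for n in melody]
    return [later / earlier for earlier, later in zip(freqs, freqs[1:])]


if __name__ == "__main__":
    melody = [0, 3, 6, 10, 7]  # hypothetical melody, not an actual stimulus
    print(successive_ratios(melody))
    print(successive_ratios(transpose(melody, 1)))
```

The two printed lists agree up to floating-point error, which is the sense in which the transposed melody preserves its interval structure in log-frequency space.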

Methods

Participants. 24
undergraduates participated in return for course credit in the
same manner as experiments described above. Participants were
not selected for musical training.

Stimuli. Exposure melodies were
presented with increased frequencies created by adding 1 to n,
such that each note in the melody was transposed one increment
up along the Bohlen-Pierce scale relative to all the previous
experiments. Test melodies were further transposed into two
keys, 8 and 13 increments above the exposure melodies.

Procedure. The experiment
was run in four phases. Preference ratings were omitted for
this experiment due to constraints on the length of the
experiment.

  1. Pre-exposure probe tone ratings were
    identical to Experiment 1 of this chapter.
  2. Exposure. 400 melodies were
    presented once each. The melodies were shifted one increment
    up compared to previous exposure sets, such that the profile
    of tone frequencies presented to participants was transposed
    up by one increment (see figure 19).

 

 

Figure 19. A profile of the exposure
frequencies for this experiment ("New exposure"), compared to
previous experiments ("Exposure").

  3. Two-alternative forced choice
    tests
    1. Recognition tests included 20
      trials. As in previous experiments, participants were
      presented with two melodies sequentially, one belonging
      to the set that they had previously heard, and the other
      belonging to the other grammar, and were asked to choose
      the melody that sounded more familiar. In the first 10 of
      these trials, both melodies were shifted 8 increments up
      relative to the exposure phase (and therefore 9
      increments up relative to the probe tone ratings phases).
      In the last 10 trials, both melodies were transposed 13
      increments up relative to the exposure phase (and
      therefore 14 increments up relative to previous
      experiments and the probe tone phase of this
      experiment).
    2. Generalization tests also
      included 20 trials. Each trial consisted of two melodies,
      neither of which had been presented during exposure, but
      one of which belonged to the participant's grammar
      whereas the other belonged to the other grammar.
      Parameters of the melodies were identical to the
      recognition trials, with the first 10 trials being
      transposed 8 increments up and the last 10 trials
      transposed 13 increments up.
  4. Post-exposure probe tone ratings
    were identical to pre-exposure probe tone ratings. 
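
The transposition offsets used across the phases above can be summarized as simple arithmetic on n. The sketch below assumes the Bohlen-Pierce formula F = k * 3 ^ (n/13) given earlier; the constant and function names are illustrative.

```python
EXPOSURE_SHIFT = 1     # exposure melodies: one increment above the probe tone phases
TEST_SHIFTS = (8, 13)  # test melodies: increments above the exposure melodies


def shift_relative_to_probe(test_shift):
    """Total shift of a test melody relative to the probe tone phases."""
    return EXPOSURE_SHIFT + test_shift


def frequency_multiplier(steps):
    """Factor by which every frequency changes after a shift of `steps` increments."""
    return 3 ** (steps / 13)


if __name__ == "__main__":
    for shift in TEST_SHIFTS:
        print(shift, shift_relative_to_probe(shift), frequency_multiplier(shift))
```

A shift of 13 increments corresponds to exactly one tritave, so each frequency in those test melodies is three times the corresponding exposure frequency.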

Results

           
Forced choice tests revealed marginally significant success in
recognition (t(23) = 2.06, p = 0.051) and significantly
above-chance performance in generalization (t(23) = 2.60, p =
0.016, see figure 20).

Figure 20. Two-alternative forced
choice tests of recognition and generalization.

           
Probe tone ratings were significantly correlated with the old
profile of exposure frequencies which was used in probe tone
ratings (pre-exposure r = 0.33, t(23) = 5.28, p < 0.01;
post-exposure r = 0.28, t(23) = 4.37, p < 0.01). Unlike
previous experiments, however, the correlation was numerically
higher for pre-exposure ratings than for post-exposure ratings,
although this difference was not significant (t(23) = 0.45,
n.s.). When
effects of the melody used to obtain ratings were partialled
out (see Appendix 1 for details), the resulting partial
correlation dropped to zero for both pre-exposure and
post-exposure ratings (pre-exposure partial correlation =
-0.073, t(23) = -1.472, n.s.; post-exposure partial correlation
= -0.010, t(23) = 0.17, n.s., see figure 21), suggesting that
participants did not gain sensitivity to the previously used
exposure profile when the experimental procedure was taken into
account.

Figure 21. Correlations and partial
correlations for probe tone ratings with both the profile used
in previous experiments ("old exposure") and the transposed
profile used in the present experiment ("new exposure").

           
When pre- and post-exposure probe tone ratings were compared
against the new profile of exposure frequencies, pre-exposure
ratings were only marginally correlated with the new
exposure profile (rxy|z = 0.13, t(23) = 1.89, p =
0.071) whereas post-exposure ratings were significantly
correlated with the new exposure profile (rxy|z =
0.16, t(23) = 2.28, p < 0.05; see figure 21). This increase
in correlation suggests that participants acquired sensitivity
to the new exposure frequencies when confronted with a
transposed set of tones.

Conclusion

The current experiment tested for
transpositional invariance of melodies in the novel musical
grammars. Participants were able to recognize and generalize
their knowledge to novel melodies when melodies were
transposed, suggesting that the mental representation of
melodies in the new musical system exhibited the same kind of
transpositional invariance as characterizes the common-practice
Western musical system. Effects were small and relatively
subtle compared to previous experiments, especially for
recognition scores in the two-alternative forced choice task in
which performance was only marginally above chance (p = 0.051).
This was perhaps due to the fact that although the identity of
the melody retains its invariance across transpositions,
changing the absolute frequencies of the melodies disrupts part
of the percept, in the same way that modulating the key of a
tune changes the surface characteristics of the melody without
altering the identity of the melody.

Perhaps an even more sensitive measure
of the transposable mental representation of the new musical
system was obtained from the probe tone ratings, which showed
that participants were increasingly sensitive to profiles of
exposed frequencies but decreased in sensitivity to the key of
the melody used in the testing procedure. This provides
compelling evidence that participants' development of
sensitivity to tone frequencies depended crucially on the
nature of the input.

Experiment 7. Further Exploring Effects of Set
Size

From experiments in the previous
chapter, we observed that repeated exposure to a small number
of melodies (i.e. hearing five melodies 100 times, as in
Experiment 1 of Chapter 3) led to ceiling levels of recognition
but no generalization, coupled with stronger preferences for
familiar items. In contrast, non-repeated exposure to a larger
number of melodies (hearing 400 melodies once each, as in
Experiment 1 of Chapter 4) led to significantly above-chance
(but not at ceiling) performance in both recognition and
generalization, but no significant change in preference. These
results suggest a double dissociation between generalization
and preference, where large numbers of exemplars lead to
generalization whereas increased repetition leads to
preference. Recognition and generalization seem to be at odds
when given small numbers of melodies, but approach similar
levels of performance as the number of exemplars increases,
suggesting that the same mechanism may be used for both
recognition and generalization after exposure to large numbers
of exemplars. That is, in the recognition task, items may
actually be treated as generalization items and judged on the
basis of their fit to the participant's representation of the
underlying grammar, rather than on the basis of item
recognition.

To date we have only shown experiments
using five, fifteen, and four hundred exemplars of melodies. As
a follow-up experiment and for the sake of obtaining a clearer
picture based on the different set sizes of exemplar melodies,
we replicated the experiment using ten melodies repeated 40
times each. 

Method

Participants. Twelve
university undergraduates from the University of California at
Berkeley were recruited using the same criteria as Experiment 1
of this chapter.

Stimuli. All stimuli and
materials were the same as the rest of the experiments in this
chapter except that only ten melodies were used. These ten
melodies were randomly selected out of the 400 from Experiment
1. The same ten melodies were used as exposure stimuli for all
participants.

Procedure. Experiments
included four phases: exposure, forced choice recognition test,
forced choice generalization test, and preference ratings
test.

Results

Participants performed significantly
above chance in the two-alternative forced choice recognition
test (t(11) = 20.1, p < 0.01), yielding near-ceiling levels
of performance (average = 91% correct; SE = 2%).

Generalization tests, however, yielded
only chance levels of performance (average = 49% correct; SE =
5%; t(11) = 0.16, n.s.). Thus, participants did not generalize
their knowledge to new melodies in the grammar. Figure 22 shows
recognition and generalization results from Experiment 7.

Figure 22. Two-alternative forced
choice results for Experiment 7. 

In the preference rating test, ratings
for old melodies were significantly higher than new melodies in
either grammar (figure 23). An omnibus analysis of variance
comparing ratings with the fixed factor of melody type (old
melodies same grammar, new melodies same grammar, different
grammar) and the random factor of individual subjects showed a
significant effect of melody type (F(2,22) = 17.4, p <
0.001). In addition, paired samples t-tests showed that ratings
for old melodies in the same grammar were significantly
different from ratings for the other two groups of melodies
(old versus new melodies in the same grammar: t(11) = 4.76, p =
0.001; old melodies versus melodies in the other grammar: t(11)
= 4.31, p = 0.001), whereas ratings for new melodies in the
same grammar were undifferentiated from ratings for melodies in
the other grammar (t(11) = 0.011, p = 0.99).

Figure 23. Preference ratings following
repeated exposure to 10 melodies.

Conclusion

When exposed to 10 melodies in a new
musical grammar, each repeated 40 times, participants
unambiguously recognized melodies they had heard before, but
did not generalize their knowledge to new instances of the same
grammar. This suggests that participants were engaging in rote
memorization of the smaller number of melodic exemplars, rather
than acquiring knowledge of the grammar. Old melodies in the
same grammar were rated as more preferable than new melodies
from the same grammar or new melodies from a different grammar,
suggesting that repeated listening to melodies can result in
changes in preference for those melodies.

Comparing this experiment to Experiment
2 from Chapter 3, we note that decreasing the number of
exposure melodies from 15 to 10 led to increased performance in
recognition but decreased performance in generalization.
However, with the increase in number of repetitions of each
melody, participants rated familiar melodies as more
preferable. The increased exposure to each melody resulted in
increased preference for familiar items, replicating the Mere
Exposure Effect, which was first reported following repeated
exposure to words, visual shapes, and nonsense Chinese
characters (Zajonc, 1968), but which has also been demonstrated
in music using both familiar and unfamiliar melodies (Peretz et
al., 1998; Tan et al., 2006).

Experiment 8. Effects of Prolonging
Exposure

Having shown various effects of
learning as a result of short (30-minute) exposure, it would be
interesting to investigate the effects of extended exposure on
learning and preference. This is especially important
considering that the normal human musical experience consists
of extensive exposure to music in the environment over the
period of years of living in one's culture. In this experiment
we presented participants with melodies over a two-day exposure
period, where exposure melodies from Day 1 consisted of 400
melodies once each (like Experiment 1 in this chapter) whereas
melodies from Day 2 consisted of 10 melodies 40 times each
(like Experiment 7 in this chapter). Pre-exposure probe tone
tests were administered before the two-day exposure phases,
whereas two-alternative forced choice, post-exposure probe
tone, and preference rating tests were administered after the
second day of exposure. We expect that following repeated
exposure to a large number of melodies, some of which are
repeated more than others, participants would show both
recognition and generalization, with higher recognition
accuracy for repeated melodies than for non-repeated ones. In
addition, we expect preference ratings to be higher for
repeated melodies, replicating the Mere Exposure Effect for
repeated melodies.

Method

Participants. Twenty-four
participants were recruited from the community via UC
Berkeley's Research Subject Volunteer Program. These
participants were paid $10 per hour to come back over two days
for the experiment. Participants were recruited based on having
normal hearing, but were not screened for musical training.

Stimuli. The same 400
melodies were used as in Experiment 1 of this chapter.

Procedure. Over the course
of two days, participants were presented with two separate sets
of exposure melodies. Day 1 of exposure consisted of 400
melodies once each in one of the two grammars, in the same
manner as Experiment 1 of this chapter. Day 2 consisted of 10
melodies 40 times each in the same manner as Experiment 7 of
this chapter. Pre-exposure probe tone ratings were obtained
before exposure to Day 1, whereas two-alternative forced choice
tests of recognition and generalization, post-exposure probe
tone ratings, and preference ratings tests were performed after
the 10-melody exposure phase on Day 2.

Results

           
Results from two-alternative forced choice tests are shown in
figure 24. Participants performed above chance in recognition
of frequent items and in generalization, and marginally above
chance in recognition of rare items (recognition for frequent
items on Day 2: 73%, t(23) = 5.70, p < 0.001; recognition
for rare items from Day 1: 58%, t(23) = 1.94, p < 0.06;
generalization: 59%, t(23) = 3.14, p < 0.01). Recognition
scores for frequently presented (Day 2) items were
significantly higher than either recognition for rarely
presented (Day 1) items (t(23) = 2.65, p < 0.05) or
generalization scores (t(23) = 2.92, p < 0.01), but
recognition for rare (Day 1) items and generalization scores
were not significantly different from each other (t(23) = 0.19,
n.s.).

Figure 24. Two-alternative forced
choice results for Experiment 8. 

Probe tone ratings were significantly
correlated with exposure frequencies both before and after
exposure (pre-exposure average r = 0.45, two-tailed t-test
against chance: t(23) = 9.88, p < 0.001; post-exposure
average r = 0.60, two-tailed t-test against chance: t(23) =
16.34, p < 0.001), but the post-exposure ratings were
significantly more highly correlated (paired t-test
comparing pre-exposure and post-exposure correlations: t(23) =
2.65, p = 0.014). When effects of the melody used to obtain
probe tone ratings were partialled out, the resulting
correlations were still significantly above chance for both
pre- and post-exposure ratings (pre-exposure partial
correlation = 0.21, t(23) = 4.13, p < 0.001; post-exposure
partial correlation = 0.35, t(23) = 7.63, p < 0.001), but
post-exposure partial correlations were significantly higher
than pre-exposure partial correlations (paired t-test:
t(23) = 2.16, p < 0.05).

Figure 25. Correlations and partial
correlations between probe tone ratings and exposure
frequencies before and after exposure.

Preference ratings provided some evidence that
frequently presented melodies were rated as more preferable.
Ratings for Old Grammatical, New Grammatical, and Ungrammatical
melodies were not significantly different (F(2, 69) = 0.59,
n.s.; see figure 26a); however, the difference between ratings
for Old Grammatical items and Ungrammatical items (i.e. the
Mere Exposure Effect) was significantly greater than zero
(average = 0.25, s.e. = 0.11, t-test against chance: t(23) =
2.22, p < 0.05, see figure 26b), suggesting that repeated
exposure to the melodies resulted in some increased preference
for rote items.
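
The Mere Exposure Effect measure used above is a per-participant difference score that is then tested against zero. The sketch below shows that computation under the assumption that each participant contributes one mean rating for Old Grammatical melodies and one for Ungrammatical melodies. The function name and the example ratings are hypothetical and are not data from this experiment.

```python
import math


def mere_exposure_effect(old_ratings, ungrammatical_ratings):
    """Return (mean difference, standard error, t, df) for paired ratings.

    Each position in the two lists belongs to the same participant.
    """
    diffs = [old - other for old, other in zip(old_ratings, ungrammatical_ratings)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff, standard_error, mean_diff / standard_error, n - 1


if __name__ == "__main__":
    # Hypothetical mean ratings on a 1 to 7 scale for four participants.
    old = [5, 4, 6, 5]
    ungrammatical = [4, 4, 5, 3]
    print(mere_exposure_effect(old, ungrammatical))
```

Because the t statistic is the mean difference divided by its standard error, the reported average of 0.25 and s.e. of 0.11 give a ratio of about 2.3, in line with the reported t(23) = 2.22 once rounding of the summary values is taken into account.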

 

Figure 26a. Preference ratings for
different melodies.

Figure 26b. Mere Exposure Effect,
plotted as differences in preference ratings across different
conditions.

Conclusions

           
By presenting participants with multiple days of exposure to
the new musical system, where exposure consisted of various
repetitions of a large number of melodies, we observed both
recognition and generalization, as well as some level of
preference change. Consistent with previous experiments,
increased numbers of repetitions led to highly accurate
recognition for individual items, whereas high variability in
the exposure set led to generalization as well as above-chance
(but not extremely accurate) performance on recognition trials
for melodies that were presented without repetition. This
experiment suggests that given sufficient exposure, it may be
possible to induce preference change as well as generalization
within the same musical system, thus bringing these experiments
into a more realistic musical context, such as the case where
much prolonged exposure to variable instances of the same
underlying grammar, e.g. hearing many songs based on the same
chord progressions such as in pop music, may lead to knowledge
as well as preference for certain musical genres.

Meta-Analysis of Behavioral Data

The experiments reported thus far
include data from more than 300 participants. As a final
analysis procedure, all the data on recognition,
generalization, and preference ratings were pooled in order to
perform meta-analyses on all behavioral data in this
dissertation.

Using concatenated two-alternative
forced-choice and preference data, we performed direct
comparisons across different experiments. Stable trends of
results across different experimental manipulations can
elucidate the role of statistical properties (e.g.
manipulations of set size and number of repetitions of
melodies) and various musical attributes (i.e. key, timbre,
melody, and harmony) on the learning and liking of the new
musical system.

Results

Two-alternative forced choice test
results showed significant interactions between recognition and
generalization scores as a function of experiment, as
summarized in figure 27. A two-way ANOVA with factors of task
(recognition and generalization) by experiment (experiments 1-2
of previous chapter and experiments 1-8 of this chapter)
revealed significant differences between recognition and
generalization scores as indicated by main effect of task (F(1,
460) = 7.15, p < 0.05 Bonferroni-corrected). A main effect
of experiments was also found (F(9, 460) = 4.29, p < 0.001
Bonferroni-corrected). In addition, a significant interaction
between task and experiment was observed (F(9, 460) = 4.35, p
< 0.001 Bonferroni-corrected), suggesting that recognition
and generalization tapped into different mechanisms across
different experiments. Main effects of experiment were dictated
by different manipulations of sounds in that changing melodic
processes (i.e. blocking certain pathways of the finite-state
grammar) led to unsuccessful recognition and generalization and
transpositions led to smaller effect sizes of learning as
indicated by marginally above-chance levels of performance.
Disjoint sequences do not form melodic gestalts as well as
melodies with smaller intervals do (Meyer, 1956), which may
account for the difficulty of learning with disrupted melodic
intervals.

Figure 27a. Main effects of experiment
were observed for different experiments manipulating timbre,
transpositions, harmony, and disjoint melodic intervals.

Figure 27b. Interactions between task
performance (recognition vs. generalization) and experiment
(different numbers of melodies presented).

           
Preference ratings, on average, were similar across all the
different experiments (average rating = 3.75 on a 1 to 7 scale;
s.e. = 0.18; F-test comparing average ratings for different
experiments: F(10, 246) = 1.62, n.s.). Perhaps more
interesting, however, the difference between ratings for
familiar melodies and unfamiliar melodies, defined as MEE (Mere
Exposure Effect), was significantly different across
experiments. An ANOVA comparing the difference in preference
ratings (ratings for old grammatical melodies minus ratings for
ungrammatical melodies) across 11 different experiments yielded
significant differences between experiment (F(10,246) = 3.29, p
< 0.001). The effect of experiments was still significant
after applying Bonferroni post-hoc correction. The size of the
MEE was most sensitive to the number of repetitions across
different experiments. As shown in figure 28, the Mere Exposure
Effect increased as number of repetitions for old melodies
increased.

Figure 28. Mere Exposure Effect
(difference between ratings for familiar and unfamiliar
melodies for each subject) plotted as a function of the number
of times each melody was presented across different
experiments.

A comparison among recognition scores,
generalization scores, and the MEE revealed that the MEE
followed the pattern of recognition performance, such that
greater familiarity with individual items was typically
associated with increased preference. In contrast, the size of
the MEE covaried negatively with generalization scores:
generalization was best given large sets of presented melodies,
whereas the MEE was larger given more repetitions. Figure 29
shows the double dissociation between the MEE and
generalization.

Figure 29. Comparisons between
two-alternative forced choice tasks of recognition and
generalization against preference change (MEE).

General Discussion

In eight experiments, this
chapter explored the learning and liking of new musical systems
in detail. After conducting the baseline learning and control
experiments to show consistent learning of the Bohlen-Pierce
scale-derived artificial musical system as a result of
exposure, various statistical and musical properties of the
stimuli were manipulated while memory, frequency-sensitivity,
and preference were observed using a battery of behavioral
tests. Results showed that participants could learn the new
musical system regardless of timbre, harmony, and musical
training, but that changing melodic intervals disrupted the
ability to learn, possibly because the disjoint melodic
intervals created by large leaps make poor gestalt percepts
for melodies (Meyer, 1956). Exposure to large sets of
melodies led to the generalization of knowledge to novel
instances of the musical grammar, whereas repeated exposure led
to superior recognition performance as well as increased
preference for individual exemplars of the new musical
system.

Explicit prior musical training
did not significantly influence the learning of the new musical
system, but it is conceivable that other individual
differences, e.g., working memory capacity and attention span,
could influence the learning of such a system.

Melodic intervals played a large role in
the learnability of new music, whereas harmony had relatively
little effect. When a subset of legal pathways was eliminated
from the finite state grammar, participants were unable to
recognize old melodies, nor could they generalize their
knowledge of the musical grammar. Manipulating harmony by
force-fitting the Bohlen-Pierce scale into an octave (forming
the new, forced-octave scale) had no effect on the learning of
the new musical system, suggesting that the vertical dimension
of harmony and the horizontal dimension of melody are
functionally dissociated at this stage of learning.
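
For readers unfamiliar with the two tuning systems, the
following sketch computes their tone frequencies. The
equal-tempered Bohlen-Pierce scale divides the 3:1 frequency
ratio (the tritave) into 13 equal steps. Two details here are
assumptions for illustration only: the 220 Hz reference pitch,
and the reading of the forced-octave scale as the same 13 steps
fitted into a 2:1 ratio. The function name is hypothetical.

```python
def equal_tempered_scale(base_hz, period_ratio, steps, count):
    """Return base_hz * period_ratio ** (n / steps) for n = 0 .. count - 1."""
    return [base_hz * period_ratio ** (n / steps) for n in range(count)]


BASE_HZ = 220.0  # assumed reference pitch; not stated in this section

# Bohlen-Pierce: 13 equal divisions of the 3:1 tritave.
bohlen_pierce = equal_tempered_scale(BASE_HZ, 3.0, 13, 14)

# Forced-octave variant (assumed form): 13 equal divisions of the 2:1 octave.
forced_octave = equal_tempered_scale(BASE_HZ, 2.0, 13, 14)

for n, (bp, fo) in enumerate(zip(bohlen_pierce, forced_octave)):
    print(f"step {n:2d}: BP {bp:7.2f} Hz   forced-octave {fo:7.2f} Hz")
```

Under these assumptions the two scales share the same number of
steps and the same melodic contours, but the forced-octave
steps are narrower, which changes the frequency ratios between
simultaneous tones.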

According to many theorists (e.g.,
Sethares, 2004; Schwartz et al., 2003), musical harmony is
constrained by the distribution of frequencies of harmonic
sounds in the natural world (e.g., speech and the timbres of
most periodic instruments). In the preceding series of
experiments, we investigated the contribution of timbre
(manipulated here only as a function of frequency or spectral
distribution) to the learnability of the new musical system.
Results showed that humans were able to learn a new musical
system regardless of whether the timbre of the musical tones
was congruent with the musical scale.

Together, the vertical and horizontal
dimensions of music seem functionally separate, where vertical
factors refer to timbre and harmony and horizontal factors
refer to melody. In all of these experiments, which involved
the presentation of melodies followed by listening tests,
manipulating factors in the vertical dimension did not seem to
have a large influence on learning, whereas manipulating
factors in the horizontal dimension disrupted learning. In an
experiment where the listening tests involve harmony, the
manipulation of vertical-dimension factors may play a more
crucial role.

We have introduced artificial grammars
derived from a new musical scale, and shown that, given limited
exposure, humans can learn many aspects of new music. The
knowledge acquired includes sensitivity to event frequencies of
tones in the new scale, recognition of melodies, generalized
familiarity with chordal grammars underlying large sets of
melodies, and preference increases after repeated exposure to
small numbers of melodies. The current research presents
various possibilities for music, psychology, and cognitive
science. It allows testing for learning and memory using a
system free from prior exposure and semantic associations, thus
affording a new approach for investigations of auditory
perception and music cognition that avoids the confounding
factors of long-term memory. The observed double dissociation
between grammar learning and preference formation needs to be
accounted for in models of cognition and emotion. As an
application of artificial grammar learning, the present
research converges with recent work on language acquisition
and implicit learning more generally (e.g.,
Newport & Aslin, 2000; Reber, 1989) in providing evidence
for a rapid, domain-general learning mechanism in humans.
Furthermore, the finding that repeated exposure can lead to
increased preference for new music has encouraging implications
for contemporary music theory and composition. Finally,
research on the Bohlen-Pierce scale offers a new way of
thinking about the structure of music and sounds. Musical
systems of the world have traditionally been bound by prior
assumptions regarding the necessity of the octave. By
questioning these assumptions through experimenting with an
alternative musical system, the present research redefines our
understanding of the human musical experience.