Perception Lecture Notes: Frequency Tuning and Pitch Perception

Professor David Heeger

What you should know from this lecture

Frequency Tuning

von Békésy used linear systems theory and Fourier analysis to characterize the motion of the basilar membrane. He found that the basilar membrane acts as a shift-invariant linear system (by testing shift-invariance, the scalar rule, and superposition). He then used sinusoidal stimuli (pure tones) to measure the frequency response at different points along the basilar membrane. With those measurements, he was able (using linear systems theory) to predict the response (that is, the motion of the basilar membrane) for any sound.

This is a schematic diagram of the uncurled cochlea. The cochlea is not a homogeneous piece of tissue. It varies in thickness and elasticity as it curls from the oval window out to the helicotrema. The effect of this is that different parts of the basilar membrane respond more strongly to some sounds than others. For sinusoidal (pure tone) sounds, each point on the basilar membrane oscillates up and down at the same frequency as the sound. What differs from point to point is the size of the oscillation.

This figure shows the displacement of the basilar membrane over time, in response to a pure tone stimulus. Some points are displaced up and others down. Over time, the different points on the membrane move up and down (indicated by the 3 curves in the bottom panel of the figure). The entire motion that occurs on the basilar membrane in response to a sound stimulus is called a traveling wave. Each point moves up and down sinusoidally; different points move up and down slightly delayed (out of phase) with respect to one another, yielding the traveling wave. The wave begins at the oval window, rises to a crescendo somewhere along the basilar membrane, and finally falls off with the energy being absorbed around the helicotrema. The dashed lines indicate the envelope of the membrane modulation, the maximum excursion of that bit of membrane throughout the duration of the traveling wave.
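To make the description of the traveling wave concrete, here is a minimal toy sketch. It is not a model of the real cochlea: the Gaussian-shaped envelope, the linearly growing phase lag, and all parameter values are assumptions chosen only for illustration. Each point oscillates sinusoidally at the tone frequency, points farther along the membrane lag behind, and the envelope is the largest excursion reached at each point over a full cycle.

```python
import math

def displacement(x, t, freq=1000.0, peak=0.4, width=0.1, lag=20.0):
    """Toy displacement at position x (0 = oval window, 1 = helicotrema)
    and time t in seconds. Envelope shape and phase lag are assumptions."""
    envelope = math.exp(-((x - peak) ** 2) / (2 * width ** 2))
    phase_lag = lag * x  # points farther along the membrane move later
    return envelope * math.sin(2 * math.pi * freq * t - phase_lag)

def envelope_at(x, freq=1000.0, n=1000):
    """Maximum excursion at x, sampled over one full cycle of the tone."""
    period = 1.0 / freq
    return max(abs(displacement(x, k * period / n, freq)) for k in range(n))

positions = [i / 10 for i in range(11)]
envelope = [envelope_at(x) for x in positions]
peak_position = positions[envelope.index(max(envelope))]
print("largest excursion at position", peak_position)
```

In this sketch the position of the largest excursion is set by hand through the hypothetical `peak` parameter; in the real cochlea it is determined by the frequency of the tone.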

This movie shows a simulation of the traveling wave motion along the basilar membrane, again in response to a pure tone stimulus.

Envelope for several frequencies

Each point along the basilar membrane oscillates a different amount, depending on the frequency of the sound. Points near the oval window, at the start, oscillate the largest amount in response to high frequency tones. Points near the helicotrema oscillate by the largest amount in response to low frequency tones.

This graph shows the location of peak excursion for different tone frequencies. These measurements were made on post mortem human ears. This simply summarizes what I've already said. The location of the biggest oscillation depends in a systematic way on the frequency of the tone.
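A systematic relationship like this can be summarized by a formula. The sketch below uses Greenwood's place-frequency function, which is not part of this lecture and is assumed here only for illustration; the constants are the values commonly quoted for the human cochlea, and they are not read off the graph described above. Position is expressed as a fraction of the membrane's length measured from the helicotrema.

```python
import math

# Commonly quoted constants for a Greenwood-style human place-frequency map.
# They are assumptions for this sketch, not values taken from the lecture.
A, a, k = 165.4, 2.1, 0.88

def place_to_frequency(x_from_helicotrema):
    """Frequency (Hz) producing peak excursion at fractional position x."""
    return A * (10 ** (a * x_from_helicotrema) - k)

def frequency_to_place(freq_hz):
    """Fractional distance from the helicotrema of the peak excursion."""
    return math.log10(freq_hz / A + k) / a

for f in (100, 1000, 10000):
    from_oval_window = 1 - frequency_to_place(f)
    print(f"{f:>6} Hz: peak at fraction {from_oval_window:.2f} from the oval window")
```

Under these assumptions, higher frequencies peak closer to the oval window and lower frequencies peak closer to the helicotrema, as described above.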

The traveling wave appears along the basilar membrane because the stimulus begins with a push at the oval window, which forces the part of the cochlea nearest the oval window to begin oscillating, and it then takes time for that oscillation to propagate down the length of the cochlea. The traveling wave peaks at one location because the different points of the basilar membrane oscillate by different amounts - different amplitudes - in response to different tone frequencies.

But what about sounds that are not simple pure tones? Because the motion of the basilar membrane behaves like a shift-invariant linear system, we can readily predict its motion in response to a complex sound, just by knowing its motion in response to pure tones. The motion in response to a complex sound is just the sum of the responses to the pure tone components of that complex sound.
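The superposition argument can be checked numerically. The following is a minimal sketch in which a short filter stands in for the basilar membrane at one location; the filter weights, tone frequencies, and sampling rate are arbitrary assumptions, not measured values. It compares the response to a complex sound with the weighted sum of the responses to its pure tone components.

```python
import math

def respond(signal, impulse_response=(0.5, 0.3, 0.2)):
    """Stand-in shift-invariant linear system: a short causal filter.
    The weights are arbitrary and chosen only for illustration."""
    out = []
    for n in range(len(signal)):
        out.append(sum(h * signal[n - k]
                       for k, h in enumerate(impulse_response) if n - k >= 0))
    return out

def tone(freq, n_samples=200, rate=8000.0):
    """Samples of a pure tone at the given frequency in Hz."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(n_samples)]

low, high = tone(400), tone(1200)
complex_sound = [2.0 * lo + 0.5 * hi for lo, hi in zip(low, high)]

# Response measured directly versus response predicted from the components.
direct = respond(complex_sound)
predicted = [2.0 * lo + 0.5 * hi for lo, hi in zip(respond(low), respond(high))]
max_error = max(abs(d - p) for d, p in zip(direct, predicted))
print("largest difference:", max_error)
```

The two agree up to floating-point rounding, which is all that the scalar and additivity rules require.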

Cochlea movie

This animation (make sure to have the volume up high enough to hear the sound track) shows a simulation of the basilar membrane for some complex sounds.

Each auditory nerve fiber is connected to a small number of hair cells, near one another, on the basilar membrane. The nerve fiber's response is governed, therefore, by the motion of a small region of the basilar membrane. And the basilar membrane in any small region undergoes its largest motion only for a limited range of frequencies.

This graph plots the sensitivities of each of three auditory nerve fibers, in response to pure tones of different frequencies. The horizontal axis plots the frequency of the input stimulus. The vertical axis plots the threshold stimulus intensity, the minimum sound pressure level (in dB) needed to evoke a response. Notice that each neuron is most sensitive over a narrow range of frequencies (about 700 Hz, 1300 Hz, and near 10,000 Hz). The most sensitive frequency for an auditory nerve fiber is called the neuron's characteristic frequency. Different auditory nerve fibers attach to different portions of the basilar membrane. The nerve fiber with characteristic frequency of 10,000 Hz must be connected to the section of the basilar membrane near the oval window because it is tuned to very high frequencies. The nerve fiber with characteristic frequency of 700 Hz must be attached near the helicotrema.

Summary: The representation of information on the basilar membrane and in the 8th nerve is very different from the representation at the tympanic membrane. The tympanic membrane displaces to a 100 Hz tone, a 1000 Hz tone, and to a 10,000 Hz tone. It responds to all auditory stimuli, come what may, and faithfully reproduces the changing air pressure by a displacement. By watching any part of the tympanic membrane, we can discriminate between these stimuli.

By the time we have reached the cochlea and beyond, the physical signal is no longer represented by a single mechanism. Rather, we now have forty thousand mechanisms - the 8th nerve fibers coming from the cochlea to the brain - which each encode different portions of the stimulus. And each component is deaf to most of the range of auditory frequencies. This decomposition of the response into different neural channels is very similar to what we saw with the swinging pendulum. There are lots of motions that cause no response whatsoever in the long pendulum. From its point of view, it is as though nothing is happening.

Place and Temporal Code Theories of Pitch Perception

Pitch is a perceptual attribute, not a property of the physical stimulus. In a loose and imprecise way, the pitch we perceive is related to the frequency of the sound.

Place Code Theory: Helmholtz's theory of pitch is based on observations of the anatomy of the ear. It has been the most important theory of hearing for 100 years. Sensation of a low frequency pitch derives exclusively from the motion of a particular group of hair cells, while the sensation of a high pitch derives from the motion of a different group of hair cells. Each sensation is perfectly identified with the action of an anatomical location along the basilar membrane. The place code theory is given that name because it identifies each pitch with a particular place along the basilar membrane. It assumes that any excitation of that particular place gives rise to a specific pitch.

This figure shows an illustration of how place code theory relates to what we have learned about the frequency tuning in the cochlea. For a low frequency tone (top row), the largest motion is at position 1 along the basilar membrane. Hence, there are action potentials in auditory nerve fibers connected to position 1. For a high frequency tone, the largest motion is at position 2 so there are action potentials in auditory nerve fibers connected to position 2.

Temporal Code Theory: According to temporal code theory, the location of activity along the basilar membrane is irrelevant. Rather, pitch is coded by the firing rates of nerve cells in the auditory nerve. In principle, this makes a lot of sense. A low frequency tone causes slow waves of motion in the basilar membrane and that might give rise to low firing rates in the auditory nerve. A high frequency tone causes fast waves of motion in the basilar membrane and that might give rise to high firing rates.

This figure shows an illustration of how temporal code theory relates to the cochlea. Both the low and high frequencies evoke responses at both positions, but there are more action potentials in response to the high frequency.

However, there's a problem with temporal code. The ear is sensitive to frequencies from about 20 Hz up to 20,000 Hz. But a single nerve cell cannot signal at a rate of 20,000 Hz. Therefore, a temporal code carried by a single nerve cell cannot account for the pitch of a 20,000 Hz tone, because no nerve cell can conduct that many impulses per second. And, in fact, Hallowell Davis, in the 1930s, showed that the maximum response rate of auditory neurons in the cat is about 1000 action potentials per second.

Cochlear Microphonic: The cochlear microphonic is a discovery that cast doubt on Helmholtz's place code and supported the temporal code theory. It was discovered by Wever. The cochlear microphonic is a small electrical signal that can be measured by an electrode placed near the hair cells of the cochlea. We now know that the cochlear microphonic arises from the sum of electrical potentials in the hair cells of the cochlea. It mimics the form of the sound pressure waves that arrive at the ear. Low frequency tones result in low frequency modulations of the cochlear microphonic electrical signal. High frequency tones result in high frequency modulations of the electrical signal. Combinations (sums) of high plus low frequency tones result in sums of high and low frequency modulations in the cochlear microphonic electrical signal. In fact, the cochlear microphonic behaves like a shift-invariant linear system that obeys the scalar, additivity, and shift-invariance rules.

Volley Principle: The volley principle reconciles the fact that the cochlear microphonic mimics the sound pressure waves with the implausibility of the temporal code. Wever suggested that while one neuron alone could not carry the temporal code for a 20,000 Hz tone, 20 neurons with staggered firing times could. Each neuron would respond on average to every 20th cycle of the pure tone, and the pooled neural responses would jointly contain the information that a 20,000 Hz tone was being presented.
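The arithmetic behind the volley principle is easy to verify. The following is an idealized sketch, assuming perfectly regular firing with no jitter or missed cycles (real neurons are not this tidy): 20 neurons each fire on every 20th cycle of a 20,000 Hz tone, staggered by one cycle from one neuron to the next.

```python
TONE_HZ = 20_000
N_NEURONS = 20
N_CYCLES = 2_000  # 0.1 s of the tone

# Neuron i fires on cycles i, i + 20, i + 40, ... (idealized staggering).
spike_cycles = [list(range(i, N_CYCLES, N_NEURONS)) for i in range(N_NEURONS)]

# Firing rate of each neuron, in action potentials per second.
rate_per_neuron = [len(cycles) * TONE_HZ / N_CYCLES for cycles in spike_cycles]

# Pool the spikes from all neurons and measure the pooled rate.
pooled = sorted(c for cycles in spike_cycles for c in cycles)
pooled_rate = len(pooled) * TONE_HZ / N_CYCLES

print(rate_per_neuron[0])   # 1000.0
print(pooled_rate)          # 20000.0
```

Each neuron stays at 1000 action potentials per second, the maximum rate reported by Davis, yet the pooled response contains one spike on every cycle of the tone.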

Phase Locking is an empirical observation that supports the volley principle.  When auditory nerve neurons fire action potentials, they tend to respond at times corresponding to a peak in the sound pressure waveform, i.e., when the basilar membrane moves up. The result of this is that there are a bunch of neurons firing near the peak of each and every cycle of a pure tone. No individual neuron can respond to every cycle of a sound signal, so different neurons fire on successive cycles. Nonetheless, when they do respond they tend to fire together.
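Phase locking can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the tone frequency, the firing probability, and the amount of timing scatter are made-up values, and vector strength (a standard summary statistic that is not mentioned in this lecture) is used to quantify how tightly the pooled spikes cluster at one phase of the cycle. A value near 1 means tight clustering; a value near 0 means spikes fall at all phases equally.

```python
import cmath
import math
import random

random.seed(0)
TONE_HZ = 500.0
N_NEURONS, N_CYCLES = 20, 500
FIRE_PROB = 0.1       # assumed chance that a neuron fires on a given cycle
JITTER_CYCLES = 0.1   # assumed timing scatter, as a fraction of a cycle

def spike_times(locked):
    """Spike times for one neuron that skips most cycles of the tone."""
    times = []
    for cycle in range(N_CYCLES):
        if random.random() < FIRE_PROB:
            if locked:
                # Fire near the pressure peak (a quarter of the way into the cycle).
                phase = 0.25 + random.gauss(0.0, JITTER_CYCLES)
            else:
                phase = random.random()  # no preferred phase
            times.append((cycle + phase) / TONE_HZ)
    return times

def vector_strength(times):
    """Length of the mean unit phasor of the spike phases (0 to 1)."""
    phasors = [cmath.exp(2j * math.pi * TONE_HZ * t) for t in times]
    return abs(sum(phasors)) / len(phasors)

locked_pool = [t for _ in range(N_NEURONS) for t in spike_times(True)]
unlocked_pool = [t for _ in range(N_NEURONS) for t in spike_times(False)]

print("phase locked:", round(vector_strength(locked_pool), 2))
print("not locked:  ", round(vector_strength(unlocked_pool), 2))
```

No simulated neuron fires on every cycle, but when the neurons do fire they fire near the same phase, so the pooled activity rises and falls with each cycle of the tone.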

Why is phase locking important? What you need (for temporal code theory, and to explain the cochlear microphonic) is for the neural activity to look just like the sound pressure waveform. The response (across the whole population of hair cells/8th nerve fibers) must follow each rise and fall of sound pressure level in the sound signal.

Wever's temporal code theory (based on the volley principle) was a clear rejection of Helmholtz's Place Code Theory, and it was backed up by compelling data (cochlear microphonic and phase locking). Wever said that the particular neuron that was signalling was not important, but instead, the way in which the neurons signalled together contained the information as to the pitch of the sound.

How might you test these two alternative hypotheses? Discussion...

White's Cochlear Implants: Professor John White of the Electrical Engineering Department at Stanford did some experiments that directly addressed these two alternative hypotheses. The ultimate goal of his research was to produce cochlear implants to make up for some kinds of hearing loss. There are many diseases, including one fairly common one called Meniere's disease, that can poison and destroy the hair cells in the inner ear, while leaving the auditory nerve and the rest of the auditory system intact. What we would like to do for these patients is to send a signal directly to the auditory nerve that will effectively substitute for the signal that the auditory nerve would be receiving were the system fully intact.

White's early experiments with cochlear implants were designed to test the place and temporal code theories of pitch perception. White implanted four electrodes located at different positions along the basilar membrane. He tested the two theories by delivering different types of electrical stimuli to his observer and asking the observer to estimate the pitch of the signal delivered by the prosthetic device. He varied the signal in two ways. First, he varied which of the four electrodes was used for stimulation. By measuring the dependence of pitch on which electrode was being stimulated he could test the place code theory. Second, White varied the rate of the electrical stimulation. He stimulated either with a low frequency series of electrical pulses through one of the electrodes, or with high frequency series of pulses through the same electrode. By measuring the dependence of pitch on the frequency of electrical stimulation he could test the temporal code theory.

As it turns out, both mechanisms play a role in pitch perception. As the stimulating frequency is increased, the subject tends to report a higher pitch. This continues over a significant range, up to a maximum of about 300 Hz. Beyond that point, the rate of stimulation on the electrode does not seem to influence the subject's judgement. The perceived pitch also depends on which electrode was doing the stimulating, i.e., the place that is being stimulated is also important information.

In fact, this makes sense. Place coding is weak below 300 Hz because of a broad pattern of oscillation of the basilar membrane at low frequencies (look back at the figure near the beginning of the lecture showing the envelope of basilar membrane motion for low frequencies). The temporal code works best at low frequencies because fibers can phase lock most easily for low frequencies.

Caveat: the patient reported that these electrical stimulations did not sound particularly like tones, but rather they sounded like a noisy kind of buzzing. The buzzing could appear to be at different pitches. But it was, nonetheless, a buzz rather than a clear tone with a distinct pitch.

Virtual Pitch

Construct a sound that is made by adding pure tones with frequencies 400, 800, 1200, and so on. The 400 Hz component is called the base or fundamental frequency of the tone complex, and the other frequencies are called the higher harmonics. Most sound sources (your vocal tract, musical instruments) produce sounds like this. The higher harmonics come along for the ride.

Imagine that you perform the following experiment. Present the tone complex pictured in (a), then present a pure tone, and ask the observer to set the frequency of the pure tone so that its pitch matches the pitch of the tone complex. This is an example of a matching experiment. The perceived pitch of this tone complex is very much the same as that of a pure tone at the fundamental frequency (400 Hz). Next repeat this experiment using the tone complex pictured in (b), which has a fundamental frequency of 800 Hz and harmonics at multiples of that fundamental (1600 Hz, 2400 Hz). The perceived pitch of this tone complex is again the same as that of a pure tone at the fundamental frequency (800 Hz this time).

Now take the original tone complex with 400 Hz fundamental and harmonics and remove (subtract out) the 400 Hz component, as pictured in (c). The lowest frequency is now 800 Hz, so you might think that perceived pitch of this new tone complex would match that of an 800 Hz pure tone. Surprisingly, observers still match the complex with a pure tone of 400 Hz.

This is a challenge to Helmholtz's place theory because the tone complex in (c) does not contain any energy that would stimulate the auditory nerve at the point where a tone of 400 Hz would stimulate the nerve. If pitch is encoded by position alone, then how can these two yield the same pitch? This is also a challenge to Wever's volley theory, because there is no energy (or oscillation) in the tone complex at 400 Hz, i.e., there is no 400 Hz component in the cochlear microphonic. Both the place theory and the temporal code/volley theory play roles in pitch perception. However, neither theory provides a complete explanation of pitch perception. Even though it is a seemingly simple perceptual attribute, pitch is not currently fully understood.
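The claim that the tone complex in (c) contains no energy at 400 Hz can be checked directly. The following is a minimal sketch: the sampling rate, the duration, the number of harmonics, and the equal component amplitudes are assumptions made for illustration and are not taken from the figure. It builds the two tone complexes and measures the amplitude of the sinusoidal component at a given frequency.

```python
import cmath
import math

RATE = 8000   # samples per second (assumed)
N = 800       # 0.1 s: a whole number of cycles for every component used

def tone_complex(freqs):
    """Sum of equal-amplitude pure tones at the given frequencies in Hz."""
    return [sum(math.sin(2 * math.pi * f * n / RATE) for f in freqs)
            for n in range(N)]

def amplitude_at(signal, freq):
    """Amplitude of the sinusoidal component at freq (one Fourier coefficient)."""
    s = sum(x * cmath.exp(-2j * math.pi * freq * n / RATE)
            for n, x in enumerate(signal))
    return 2 * abs(s) / len(signal)

with_fundamental = tone_complex([400, 800, 1200, 1600])   # as in (a)
missing_fundamental = tone_complex([800, 1200, 1600])     # as in (c)

for name, sound in [("(a)", with_fundamental), ("(c)", missing_fundamental)]:
    print(name, [round(amplitude_at(sound, f), 3) for f in (400, 800, 1200)])
```

Removing the 400 Hz component leaves the amplitudes at 800 Hz and 1200 Hz unchanged and leaves nothing at 400 Hz, which is the place along the basilar membrane where a 400 Hz pure tone would produce its peak.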

If you shift the tone complex to higher frequencies (e.g., from 400, 800, 1200,... up to 500, 900, 1300,...) it is perceived at a slightly higher pitch. Note that this manipulation is a bit odd in that the tones of the new complex are no longer exact harmonics of 400 Hz or any frequency near 400 Hz. The auditory system accepts them as "nearly harmonic" and identifies/assigns a virtual pitch.
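A quick way to see why the shifted complex is no longer harmonic is to compute the largest frequency of which every component is an exact multiple. The following is a small sketch using only the first three components listed above; the helper name `common_fundamental` is made up for this example.

```python
from functools import reduce
from math import gcd

def common_fundamental(freqs_hz):
    """Largest frequency (Hz) of which every component is an exact multiple."""
    return reduce(gcd, freqs_hz)

original = [400, 800, 1200]
shifted = [f + 100 for f in original]   # 500, 900, 1300

print(common_fundamental(original))  # 400
print(common_fundamental(shifted))   # 100
```

The shifted components are still spaced 400 Hz apart, but they are exact harmonics only of 100 Hz (the 5th, 9th, and 13th), not of any frequency near 400 Hz.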

Roger Shepard and others have taken advantage of residual pitch to produce an auditory illusion that gives the sensation of a sound that continuously changes in pitch, rising or falling forever.

The intensity of each component is specified by an amplitude envelope that tapers off at very high and low frequencies. The frequency components then shift upward gradually, increasing in frequency over time, but with the amplitude of each component constrained to be that specified by the fixed, non-shifting envelope. As a result, the low frequency tones gradually increase in amplitude and the high frequency tones gradually decrease in amplitude, as they all shift up in frequency. When one tone falls off the top (note that its amplitude has been reduced to zero by then), a new one is added down at the bottom (initially with zero amplitude, but gradually increasing). The result sounds like it is rising in pitch forever, but never manages to get much higher than where it started, much like an M. C. Escher print.
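The construction just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original stimulus: the components are taken to be spaced one octave apart, the envelope is a raised cosine over log frequency, and the lowest frequency, number of components, and glide time are arbitrary choices.

```python
import math

F_MIN = 27.5      # lowest component frequency in Hz (assumed)
N_OCTAVES = 8     # number of octave-spaced components (assumed)
CYCLE_S = 10.0    # seconds for every component to rise one octave (assumed)
PHASE_SCALE = 2 * math.pi * F_MIN * CYCLE_S / math.log(2)

def envelope(position):
    """Fixed amplitude envelope over log frequency; zero at both ends."""
    return 0.5 * (1 - math.cos(2 * math.pi * position / N_OCTAVES))

def components(t):
    """(frequency in Hz, amplitude) of every component at time t."""
    shift = (t / CYCLE_S) % 1.0   # wraps: the top tone re-enters at the bottom
    return [(F_MIN * 2 ** (k + shift), envelope(k + shift))
            for k in range(N_OCTAVES)]

def sample(t):
    """Sound pressure at time t, scaled to lie between -1 and 1."""
    shift = (t / CYCLE_S) % 1.0
    total = sum(envelope(k + shift) * math.sin(PHASE_SCALE * 2 ** (k + shift))
                for k in range(N_OCTAVES))
    return total / (N_OCTAVES / 2)   # the amplitudes always sum to N_OCTAVES / 2

def render(seconds, rate=22050):
    """List of samples covering the requested duration."""
    return [sample(n / rate) for n in range(int(seconds * rate))]

print(components(0.0)[:2])
print(components(CYCLE_S / 2)[:2])
```

Every component glides upward, yet after `CYCLE_S` seconds the set of frequencies and amplitudes is exactly what it was at the start, so the sound can rise indefinitely without ever getting higher. The samples returned by `render` could be written to a sound file for listening.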


Copyright © 2006, Department of Psychology, New York University
David Heeger