Perception Lecture Notes: Visual Motion Perception

What you should know about this lecture

Motion aftereffect and other motion illusions
What is motion good for?
- Motion-based segmentation
- Depth from motion and motion parallax
- Navigation and collision avoidance
- Shape and recognition
Optic flow
Motion blindness
Cortical area MT is functionally specialized for motion
- Neurons in MT are selective for motion direction.
- Neural responses in MT are correlated with the perception of motion.
- Damage to MT or temporary inactivation causes deficits in visual motion perception.
- Electrical stimulation in MT causes changes in visual motion perception.
- Computational theory quantitatively explains both the responses of MT neurons and the perception of visual motion.
- Well-defined pathway of brain areas (cascade of neural computations) underlying motion specialization in MT.
Cortical area MST is functionally specialized for optic flow
STS is specialized for recognizing biological motion
Corollary discharge explains why eye movements do not evoke a perception of motion.

Motion is a perceptual attribute: the visual system infers motion from the changing pattern of light in the retinal image. Often the inference is correct. Sometimes it is not. In class I showed you a number of demonstrations in which motion is misperceived. Below is one example of a visual illusion of motion that I made. It is a tribute to Duchamp's cubist painting titled "Nude Descending a Staircase" in which the changing pattern of light gives the illusion of motion even though she never gets anywhere (you may need to double-click on the image below or reload the page for the animation to play).

Below is yet another example of a motion illusion.

Role of motion perception: Motion perception serves lots of helpful functions.

Simply detecting that something is moving draws your attention to it.
Segmentation of foreground from background.
Compute the 3D shape of an object.
Compute the distance to various objects in the scene and estimate the direction in which you are heading within the scene. For example, hold up two fingers (one on each hand) at different distances, and move your head slowly from side to side while fixating an object on a far wall. Things that are further away slide across the retina more slowly. When there is strong motion on your retina, especially in peripheral regions, you can misattribute that motion and perceive yourself as moving (called "vection"). Movies (especially with large screens as in an IMAX theater) can give this illusion that you are moving.
Recognizing actions, such as movements of a human (in the "point light displays" shown in class of people walking, dancing, etc., displayed as the motion of a small number of dots attached to the joints of the person).

The image above is unrecognizable. But when set into motion, it is easy to interpret. Download and play the animation (4 Mb mpeg movie).

Download and play the movie (3.8 Mb QuickTime movie with sound).

Download and play the movie (1.9 Mb QuickTime movie with sound).

Optical Flow: The diagram below is a representation of the physical motion in an image while an observer moves through the environment.

Each arrow represents the speed and direction of motion for each little patch of the visual field. Near points move fast (long arrows), far points move slowly (short arrows). In this example, the arrows point away from a single point called the focus of expansion that corresponds to where the observer is heading. The first step in motion perception is for the visual system to estimate optical flow from the changing pattern of light in the retinal image. Then, the 3D motions of observer and objects can be inferred from the optical flow. The optic flow then provides information about the observer's heading and the relative distance to each surface in the world. J. J. Gibson hypothesized that there's sufficient information in the visual stimulus to specify a unique, unambiguous interpretation of 3D motion and depth. Recently, mathematicians have proven that this hypothesis is basically correct. There is a caveat, however: distance and speed are ambiguous (i.e., they trade off). That is, a small, close object when you are moving slowly creates the identical retinal images over time as a large, distant object when you are moving quickly. That's why you need a speedometer in your car. You are lousy at making absolute speed and distance judgements. But, you are very good at relative speed/direction and relative distance.
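The speed/distance trade-off can be checked with a small numerical sketch. It assumes a simplified pinhole eye with focal length 1, a stationary scene, and an observer who only translates (no eye or head rotation). The function names are made up for illustration and are not from the lecture.

```python
import math


def translational_flow(x, y, depth, velocity):
    """Image velocity (u, v) at image point (x, y).

    Assumes a pinhole eye with focal length 1, a stationary scene point at
    distance `depth` along the line of sight, and an observer translating
    with velocity (tx, ty, tz), where tz is the forward speed.
    """
    tx, ty, tz = velocity
    u = (x * tz - tx) / depth
    v = (y * tz - ty) / depth
    return u, v


def focus_of_expansion(velocity):
    """Image point with zero flow: the direction in which the observer is heading."""
    tx, ty, tz = velocity
    return tx / tz, ty / tz


heading = (0.0, 0.0, 1.0)                                 # moving straight ahead
near = translational_flow(0.2, 0.1, 2.0, heading)         # nearby surface
far = translational_flow(0.2, 0.1, 20.0, heading)         # distant surface
fast_far = translational_flow(0.2, 0.1, 20.0, (0.0, 0.0, 10.0))

print(math.hypot(*near) > math.hypot(*far))  # near points move faster than far points
print(near, fast_far)  # ten times farther at ten times the speed gives the same flow
```

Because depth and observer speed enter only through their ratio, the flow field by itself cannot separate them, which is the ambiguity described above.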

Object motion: Two different kinds of events can cause visual motion. When an observer moves through an otherwise stationary environment, the entire retinal image changes over time as discussed above. Or an object can move while the observer is stationary so that a small region of the retinal image changes over time. Often, of course, both of these things happen at once, but it is helpful to consider the two separately. Motion provides information about shape even in the complete absence of other shape cues (you may need to double-click on the image below or reload the page for the animation to play).

Here's another example.

Biological motion: Motion can provide enough information to allow the recognition of human movements. This is true even for cases in which there is very limited information provided at only a few select points (you may need to double-click on the images below or reload the page for the animations to play).

Visual Motion in the Brain

Functional specialization hypothesis: There are specific brain area(s) that are involved in visual motion perception.

Evidence for this includes a patient known as LM who, following a stroke, had great difficulty perceiving certain types of motion. Color vision and acuity remained normal, and there was no difficulty recognizing faces or objects, no difficulty with stereo. But LM cannot see coffee flowing into a cup: the coffee appears frozen like a glacier, LM does not perceive the fluid rising, and often lets the cup overflow. LM feels uncomfortable in a crowded room or on a street. "People were suddenly here or there, but I have not seen them moving... When I'm looking at the car first it seems far away, but then when I want to cross the road suddenly the car is very near". LM's lesion extends over a substantial region of visual cortex, so one cannot sharply localize the regions relevant to LM's motion deficit. This makes it particularly surprising that loss of motion perception can be so cleanly dissociated from other visual abilities.

Area MT is one of the most studied regions of the cortex of the brain, probably second only to V1. Current opinion is that the optical flow field is computed and represented by neurons in area MT.

Neurons in MT are selective for motion direction.
Neural responses in MT are correlated with the perception of motion.
Damage to MT or temporary inactivation causes deficits in visual motion perception.
Electrical stimulation in MT causes changes in visual motion perception.
Computational theory quantitatively explains both the responses of MT neurons and the perception of visual motion.
Well-defined pathway of brain areas (cascade of neural computations) underlying motion specialization in MT.

MT neurons receive inputs from direction-selective neurons in V1. MT neurons are velocity selective: each responds best to a preferred velocity (speed and direction) within its receptive field, pretty much independent of stimulus pattern. By contrast, a direction-selective V1 neuron confounds motion with pattern. A typical V1 neuron responds to a particular orientation (edge or bar) moving in a particular direction. The response of the V1 neuron also increases with contrast. A typical MT neuron, on the other hand, responds to almost any pattern with almost any contrast, as long as it moves with the right velocity.

MT physiology movie (6.7 Mb QuickTime movie)

In class we viewed a video that demonstrates direction-selectivity of MT neurons. You can download this movie (by clicking the link above). The video shows visual stimuli while recording from an MT neuron. The electrode was connected to an amplifier, and output to a loudspeaker. The audio track allows you to hear the loudspeaker - each click corresponds to an action potential. This example MT neuron responds strongly to down-left motion, not at all to up-right motion, and with intermediate firing rates to intermediate directions.

How would one build a model of a direction-selective neuron? The first step is to understand what motion is:

This is a plot that represents a moving bar. In this plot, the x-axis represents space (in particular, horizontal location). The vertical dimension of space is not represented here. Rather, the y-axis represents time, with later times represented lower on the plot. The plot shows a red bar that in the beginning (top of the plot) is not visible, then suddenly appears near the left. As time progresses, it moves smoothly rightward until, near the end (bottom of the plot), it disappears. Orientation in this plot indicates motion direction and speed:

These icons represent five different motion paths. From left to right they are: fast rightward motion, slow rightward motion, static/unmoving, slow leftward motion and fast leftward motion. To build a direction-selective neuron, we first recognize that, in this plot, a neuron tuned for speed and direction is effectively (again: with respect to this plot) orientation tuned:

The receptive field indicated above by the black outlines looks like a garden-variety orientation-tuned simple cell. But there's a big difference: these excitatory and inhibitory receptive field regions are plotted in space-time! For example, the slant of the excitatory region indicates that the neuron will be excited if a stimulus arrived near the right side of the plot a little while ago but was further to the left at an earlier time. One way to build such a detector is to take detectors of light at the two locations and combine their outputs with different time delays. Effectively, one tries to detect the coincidence of stimulation at the left from the past and stimulation at the right from the more recent past. Click on the link below to see a movie of such a model:
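The delay-and-compare idea can also be sketched in a few lines of code. This is a minimal sketch in the spirit of a Reichardt-style correlator, not necessarily the model shown in the movie. The two "sensors", the time step, and the delay of 3 steps are all illustrative assumptions.

```python
def delayed_coincidence(left, right, delay):
    """Sum over time of (left sensor, `delay` steps ago) * (right sensor, now).

    A large value means the left location was stimulated first and the right
    location was stimulated `delay` time steps later, i.e. rightward motion
    at the detector's preferred speed.
    """
    return sum(left[t - delay] * right[t] for t in range(delay, len(right)))


def moving_spot(n_steps, t_left, t_right):
    """Sensor signals for a spot that crosses the left sensor at time t_left
    and the right sensor at time t_right."""
    left = [1.0 if t == t_left else 0.0 for t in range(n_steps)]
    right = [1.0 if t == t_right else 0.0 for t in range(n_steps)]
    return left, right


def detector_response(t_left, t_right, delay=3, n_steps=10):
    left, right = moving_spot(n_steps, t_left, t_right)
    return delayed_coincidence(left, right, delay)


rightward = detector_response(t_left=2, t_right=5)   # left first, right 3 steps later
leftward = detector_response(t_left=5, t_right=2)    # right first, left later
too_fast = detector_response(t_left=2, t_right=4)    # rightward, but only 2 steps apart

print(rightward, leftward, too_fast)
```

Only the spot that reaches the right sensor exactly `delay` steps after the left sensor produces a coincidence, so this toy detector is selective for both direction and speed.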

Consider a perceptual effect known as the motion aftereffect. Stare at the center of the following animation for about a minute, as it expands continuously (you may need to reload the page to get it moving again after it stops), then fix your gaze on the colorful texture pattern next to it.

After viewing continuous motion in the same direction for a long time, if you look at a stationary object, it appears to move in the direction opposite to the one you were viewing. This is sometimes called the "waterfall illusion" - if you look at a waterfall for a while, then look at a tree next to it, the tree appears to move upward. The demonstration above shows that this adaptation is local in the retina (to the right of where you were looking, you were adapting to rightward motion, to the left you adapted to leftward, and so on). We take this as evidence for the existence of neurons that are sensitive to motion and selective for the direction of motion, which adapt to the stimulus (analogous to size, tilt and color adaptation after-effects):

This slide shows three panels, each indicating the activity in neurons selective for upward and downward motion. In the first panel, both fire relatively weakly in response to a static, unmoving scene. The middle panel shows these same neurons responding to downward motion (as seen, for example, in neurons with receptive fields overlying the lower part of the adaptation stimulus above); the downward-preferring neurons fire strongly. After adaptation, a static stimulus leads to weaker responses from the adapted, downward-preferring neurons. The unadapted upward-preferring neurons now respond more strongly than the downward-preferring ones, so the static scene appears to drift upward.
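The three panels can be mimicked with a toy opponent model. This is only a sketch under stated assumptions: the firing rates, the adaptation gain of 0.6, and the rule that the percept follows the sign of the difference between the two populations are all hypothetical choices made for illustration.

```python
BASELINE = 1.0  # hypothetical response of each population to a static scene


def responses(stimulus, down_gain=1.0, up_gain=1.0, drive=9.0):
    """Responses of the downward- and upward-preferring populations.

    `stimulus` is "static", "down", or "up". Adaptation is modeled as a
    reduced gain on the population that was strongly driven.
    """
    down = down_gain * (BASELINE + (drive if stimulus == "down" else 0.0))
    up = up_gain * (BASELINE + (drive if stimulus == "up" else 0.0))
    return down, up


def percept(down, up):
    """Perceived direction from the balance of the two populations."""
    if down > up:
        return "down"
    if up > down:
        return "up"
    return "static"


before = percept(*responses("static"))                  # panel 1: balanced, weak responses
during = percept(*responses("down"))                    # panel 2: downward neurons dominate
after = percept(*responses("static", down_gain=0.6))    # panel 3: downward neurons adapted

print(before, during, after)
```

After adaptation the static scene drives the upward-preferring population harder than the weakened downward-preferring one, which is the imbalance the waterfall illusion is taken to reflect.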

Motion analysis is a bit more complicated than this. Consider a long, black, diagonal line on a white background, tilted up-and-to-the-right. If we move such a stimulus up and to the right, then a neuron with a receptive field lying over a middle section of the line sees an unchanging stimulus; the only indication of motion comes from the ends of the line, not from the middle. Even if the line does change position, say by moving rightward, from the middle of the line one would not be able to distinguish rightward, downward, down-and-to-the-right and many other directions of motion. This is made clear in the following movie, which shows a series of such lines moving all together, but seen through apertures of various shapes:

The central portions of each line do not give unambiguous motion information (because motion in the direction along the line is invisible). Thus, perceived direction of motion is controlled by the collection of line endings (in this case occlusion boundaries at the edges of each aperture). In the vertical rectangle, most line endings move downward and the whole grating appears to move downward. In the horizontal rectangle, most line endings move rightward and so the grating appears to move rightward. In the circle, neither dominates, so only motion perpendicular to the contours is perceived, and the grating appears to move down-and-to-the-right. This ambiguity of motion direction for one-dimensional stimuli is called the aperture problem.
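The aperture problem can be stated numerically: through a small aperture, only the component of velocity perpendicular to the line is measurable. The sketch below assumes image coordinates with x rightward and y upward, and lines tilted at 45 degrees. The helper name and the candidate velocities are illustrative.

```python
import math


def normal_speed(velocity, normal):
    """Component of a 2D velocity along the unit vector perpendicular to the line."""
    return velocity[0] * normal[0] + velocity[1] * normal[1]


# Unit vector perpendicular to lines tilted up-and-to-the-right (y axis points up),
# so it points down-and-to-the-right.
normal = (math.sqrt(0.5), -math.sqrt(0.5))

candidates = {
    "rightward": (1.0, 0.0),
    "downward": (0.0, -1.0),
    "down-and-to-the-right": (0.5, -0.5),
}

# All three candidate motions produce the same measurement inside the aperture.
for name, velocity in candidates.items():
    print(name, round(normal_speed(velocity, normal), 6))

# Motion along the line is invisible: its perpendicular component is zero.
print(normal_speed((1.0, 1.0), normal))
```

The three candidates differ only in how much they slide along the line, and that sliding component contributes nothing to the measurement.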

If a stimulus is not one-dimensional, and includes contours or other components with varying spatial orientation, motion is no longer ambiguous:

The pattern, shown in the middle, is a cross-hatching (two diagonal gratings) moving horizontally to the right. We show the analysis by two orientation-selective neurons shown at the top. Because of their orientation selectivity, each neuron "sees" only one of the two component gratings in the cross-hatched stimulus (e.g., the first neuron with direction preference indicated by the green arrow only sees the green grating in the cross-hatched stimulus). Because of this, the outputs of the neurons suffer the aperture problem: from the response of the first neuron alone, we don't know if the stimulus is moving down-and-to-the-right, downward, rightward, etc. Effectively, any component of motion along the green grating is invisible to this neuron. The lower panel indicates that all of the velocities indicated by the green arrows are consistent with the motion preference of this neuron (each arrow indicates velocity, i.e., both direction, indicated by the orientation of the arrow, and speed, indicated by its length). The second neuron prefers motion up-and-to-the-right, but can only see the red grating. The set of motions consistent with its preferred direction of motion is indicated below in red. If both neurons fire strongly, the only motion direction consistent with both neurons' sets of preferred velocities is horizontal, rightward motion (indicated by the yellow arrow). Thus, a neuron that had inputs from both of these two neurons and fired only in response to inputs from both of them would solve the aperture problem. This form of solution is called the intersection of constraints and is thought to occur in area MT, an area of the brain in which almost all neurons are direction-selective.
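Geometrically, each component grating constrains the pattern velocity to a line in velocity space, and the intersection of the two lines is the pattern velocity. The sketch below solves for that intersection as a pair of linear equations. It illustrates the geometry only and is not a model of how MT neurons compute it. The coordinate convention (x rightward, y upward) and the function name are assumptions.

```python
import math


def intersection_of_constraints(n1, s1, n2, s2):
    """Velocity (vx, vy) consistent with two component measurements.

    Each grating i has a unit normal n_i and a measured speed s_i along that
    normal, giving the constraint vx * n_i[0] + vy * n_i[1] = s_i.
    The two equations are solved with Cramer's rule.
    """
    det = n1[0] * n2[1] - n1[1] * n2[0]
    if abs(det) < 1e-12:
        raise ValueError("gratings are parallel, so the motion stays ambiguous")
    vx = (s1 * n2[1] - n1[1] * s2) / det
    vy = (n1[0] * s2 - s1 * n2[0]) / det
    return vx, vy


r = math.sqrt(0.5)
green_normal = (r, -r)   # first neuron: prefers down-and-to-the-right
red_normal = (r, r)      # second neuron: prefers up-and-to-the-right

# Both components report the same normal speed, as for a plaid drifting rightward.
vx, vy = intersection_of_constraints(green_normal, r, red_normal, r)
print(vx, vy)
```

With equal normal speeds on the two diagonal components, the only velocity satisfying both constraints is purely horizontal and rightward, matching the yellow arrow in the figure.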

Human cortical area MT can be readily identified with fMRI (you may need to double-click on the image below or reload the page for the animations to play).

Top right panel above: Flickering checkerboard stimulus alternating with a blank uniform gray field. Bottom right: Axial (horizontal) slice through the brain with functional activity superimposed in color. Flickering checkerboards "light up", evoking strong activity, in the primary visual cortex (V1) at the very back of the brain. Top left: Moving versus stationary dots stimuli. Bottom left: Moving dots again "light up" V1, but also evoke strong activity in area MT, a lateral area of the occipital lobe (just behind your ears) involved in visual motion perception.

There also appears to be a columnar architecture in MT for stimulus motion; neurons with similar motion preferences lie near one another, with an orderly progression from one motion direction to the next as you move through MT, analogous to orientation columns in V1.

MT and motion perception: The experiments described above show that the activity of MT neurons is correlated with motion perception. Bill Newsome (Stanford neurobiologist) and colleagues took this an important step further, demonstrating a causal relationship between neural activity in MT and motion perception. They used electrical microstimulation to change the responses of a small number of MT neurons and showed that this affected the perception of motion.

First, they trained monkeys to perform a difficult motion discrimination task. The monkeys viewed moving dots and decided which way they went, e.g., either up or down. In the stimulus, only a subset of the dots moved in the indicated direction; the others moved randomly. The percentage of dots moving in the given direction was varied (in the graph, a negative percentage means the dots were moved in the opposite direction).
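A stimulus of this kind can be sketched as follows. This is an illustrative generator, not the one used in the experiments. The dot count, the direction convention (degrees, 90 = up, 270 = down), and the function name are assumptions.

```python
import random


def dot_directions(n_dots, coherence, rng, preferred_deg=90.0):
    """Motion direction (in degrees) for each dot on one trial.

    `coherence` is a signed fraction: +0.2 means 20% of the dots move in the
    preferred direction, -0.2 means 20% move in the opposite direction.
    All remaining dots move in random directions.
    """
    n_signal = round(abs(coherence) * n_dots)
    signal_deg = preferred_deg if coherence >= 0 else (preferred_deg + 180.0) % 360.0
    directions = [signal_deg] * n_signal
    directions += [rng.uniform(0.0, 360.0) for _ in range(n_dots - n_signal)]
    rng.shuffle(directions)  # signal and noise dots are mixed together
    return directions


rng = random.Random(0)  # fixed seed so the example is repeatable
upward_trial = dot_directions(100, 0.2, rng)
downward_trial = dot_directions(100, -0.2, rng)

print(upward_trial.count(90.0), downward_trial.count(270.0))
```

Lowering the coherence toward zero makes the task harder, because fewer dots carry the direction signal.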

Then they characterized how MT neurons responded during this task. For a neuron's preferred direction of motion (upward in this case) the neuron responds more as coherence is increased.

Next they inserted stimulating electrodes into a particular column of MT neurons (in this case neurons preferring downward motion) and injected pulses of current while the monkeys performed the motion discrimination task. On half the trials, the motion was upward. On the other half, the motion was downward. On some trials the motion was high coherence and on others it was low coherence. All of these stimulus conditions were randomly interleaved. Critically, on half the trials (also randomly interleaved), small pulses of current were injected which caused the neurons near the electrode tip to fire more action potentials (on top of those that were evoked by the motion stimulus by itself).

When the electrode was in a column selective for downward motion, electrical stimulation biased the monkey to report "down" more often. This is a very cool result: stimulating MT neurons directly influences behavior, presumably because it influences the monkeys' conscious percepts.

The blue curve in the graph indicates what the monkey perceived with no electrical stimulation. The graph plots the proportion of trials on which the monkey indicated the dots were moving in the preferred (downward) direction, as a function of the percentage of coherently moving dots. On this graph, 20% coherence means that 20% of the dots really did move downward while the other 80% moved randomly, whereas -20% means that 20% of the dots moved in the opposite (upward) direction. When there are a lot of downward dots (20%), the monkey always reports seeing them as moving downward. When there are a lot of upward dots (-20%), the monkey never reports seeing them as moving downward.

The yellow curve shows the effect of electrical stimulation. On these trials, the monkey has a stronger tendency to report downward motion (the yellow curve is above the blue curve).
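The relationship between the two curves can be sketched with a logistic psychometric function in which microstimulation acts like extra coherence in the preferred direction. This is a common descriptive way to summarize such data, used here as an assumption. The slope and the size of the shift are hypothetical numbers, not values from the experiment.

```python
import math


def p_report_preferred(coherence, stimulation_shift=0.0, slope=0.05):
    """Probability of reporting the preferred (downward) direction.

    `coherence` is a signed fraction (positive = preferred direction).
    `stimulation_shift` is the hypothetical amount of equivalent coherence
    added by microstimulation; 0.0 means no stimulation.
    """
    return 1.0 / (1.0 + math.exp(-(coherence + stimulation_shift) / slope))


coherences = [-0.2, -0.1, 0.0, 0.1, 0.2]
no_stim = [p_report_preferred(c) for c in coherences]
with_stim = [p_report_preferred(c, stimulation_shift=0.05) for c in coherences]

for c, blue, yellow in zip(coherences, no_stim, with_stim):
    print(f"{c:+.1f}  no stimulation: {blue:.2f}  stimulation: {yellow:.2f}")
```

In this sketch the stimulated curve lies above the unstimulated one at every coherence, which is the qualitative pattern described for the yellow and blue curves.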

Cortical area MST is right next to MT. MST neurons have very large receptive fields and respond selectively to complex optical flow fields: expansion, contraction, rotation. It is believed that MST is involved in 3D motion perception, inferring 3D motion of objects/observer from optical flow.

Neurons in area STS respond selectively to biological motion, like the point-light walkers (above).

Eye Movements and Motion Perception

When an observer is moving, the visual system uses the changing retinal image to infer the observer's trajectory and 3D structure. When an observer is stationary, but the object is moving, the visual system infers the motion and structure of the object. How does the visual system keep track of what's moving? Are you moving or is it the scene that you're looking at that is moving? Sometimes your visual system gets it wrong (sitting in a train at the station when the neighboring train pulls away, an example of vection) but most of the time your visual system gets it right.

Answer: Visual images are combined with other information to inform you about the motion of your eyes, head, and body. The vestibular system provides information about the motion of your head and body. A copy of the eye movement command from the eye movement centers in the brain stem provides information about eye movements. Vision is combined with vestibular and eye movement signals in Area MST.

Push gently on the side of your eye and the world appears to jiggle around. Helmholtz did this experiment and came up with a theory of how eye movement information is combined with change in the retinal image to yield the motion percept. Generally speaking, the brain is divided into motor areas and sensory areas. The corollary discharge is a copy of the motor signal that is transmitted to a comparator - a hypothetical structure that receives both the corollary discharge and the sensory movement signal (maybe in MST). If the visual motion signal is the same as the eye movement command, then you don't "see" motion. If the visual motion signal is different from the eye movement command, then you do see motion.
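The comparator can be written as a one-line subtraction. The sketch below assumes one-dimensional velocities in degrees per second (positive = rightward) and assumes that an eye movement of velocity e makes a stationary scene slide across the retina at -e. The function name and the numbers are illustrative.

```python
def perceived_world_motion(retinal_motion, eye_command):
    """Comparator: retinal motion minus the retinal motion that the eye
    movement command (corollary discharge) predicts for a stationary world."""
    predicted_retinal_motion = -eye_command
    return retinal_motion - predicted_retinal_motion


# Normal eye movement over a static scene: the image slides, as predicted.
print(perceived_world_motion(retinal_motion=-10.0, eye_command=10.0))

# Pushing on the eye: the image slides but no command was issued.
print(perceived_world_motion(retinal_motion=-10.0, eye_command=0.0))

# Object moves while the eye is still.
print(perceived_world_motion(retinal_motion=5.0, eye_command=0.0))

# Eye tracks a moving object: the image is stable on the retina, yet motion is seen.
print(perceived_world_motion(retinal_motion=0.0, eye_command=5.0))
```

Only the first case yields zero: the retinal motion matches what the command predicts, so no motion is attributed to the world.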

David Heeger