Perception Lecture Notes: Depth, Size, and Shape

What you should know about this lecture

Pictorial depth cues (texture, shading, perspective, etc.)
Size constancy
Monocular, physiological cues (blur, accommodation, etc.)
Movement cues (parallax, kinetic depth effect)
Binocular cues (vergence, disparity)

Binocular disparity: definition, crossed, uncrossed, dependence on depth and distance, horopter
stereoscope, stereogram
random-dot stereogram and the correspondence problem
fusion, suppression, diplopia, binocular rivalry
disparity selectivity of binocular neurons in V1

Pictorial Depth Cues

Two objects of the same size at different distances subtend different visual angles.
Two objects at different distances that subtend the same visual angle have different physical sizes.

With only one eye open, you still see with a sense of depth, but there is an inherent ambiguity between size and distance. What cues does the visual system use? In class we reviewed a large set of such cues: relative size, occlusion, cast shadows, shading, dynamic shadows (shadow motion), aerial perspective, linear perspective, texture perspective, and height within the image. Most of these are based on the concept illustrated above: the size of the retinal image of an object is proportional to the object's size, but inversely proportional to the distance to the object.
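
The size/distance trade-off can be checked numerically. This is a minimal sketch, assuming an object viewed head-on so that its visual angle is 2*atan(size / (2*distance)); the function name and the example sizes and distances are made up for illustration.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle (degrees) subtended by an object of the given size,
    viewed head-on at the given distance (same units for both)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

# Same physical size at two distances: the visual angles differ.
near = visual_angle_deg(size=1.8, distance=5.0)
far = visual_angle_deg(size=1.8, distance=10.0)

# Different sizes at different distances: the visual angles can match.
small_near = visual_angle_deg(size=0.9, distance=5.0)
large_far = visual_angle_deg(size=1.8, distance=10.0)

print(near, far, small_near, large_far)
```

Doubling the distance roughly halves the visual angle, and a half-size object at half the distance is indistinguishable (by visual angle alone) from the full-size object.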

Texture made up of little circular texture elements on curved surface

Texture provides 3 cues about shape/distance:

Texture elements become denser with distance.
Texture elements become smaller with distance.
Foreshortening: circles become ovals when the surface is tilted away.
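
The foreshortening cue can be made concrete with a small sketch. It assumes orthographic projection (the texture element is small compared with its distance), in which case a circle on a surface slanted by some angle projects to an ellipse whose height-to-width ratio is the cosine of that angle. The function name is hypothetical.

```python
import math

def foreshortened_aspect_ratio(slant_deg):
    """Height-to-width ratio of the image of a circular texture element
    on a surface slanted away from the line of sight by slant_deg.
    Assumes orthographic projection (an approximation)."""
    return math.cos(math.radians(slant_deg))

for slant in (0, 30, 60, 80):
    print(slant, foreshortened_aspect_ratio(slant))
```

A ratio of 1 means the circle is seen face-on; the ratio shrinks toward 0 as the surface tilts toward edge-on.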

Shading: crater vs mound

Brightness of a surface depends on its orientation with respect to the light source. The visual system assumes that the light comes from above. Brighter patches appear to be tilted up facing the light.

Shading and contour

The interpretation of shape from shading interacts with the interpretation of shape from contours. These two images have the same shading, but different bounding contours, and you see different shapes.

Railroad tracks

Linear perspective is another monocular depth cue. The distance between the rails is constant in the 3D scene but gets smaller and smaller in the image. This is a cue for distance. The visual system uses this to compare the sizes of objects. The two lines are the same length but the one on top appears bigger because it is seen as being further away and the visual system is compensating for the perspective. This compensation for distance in interpreting size is known as "size constancy".

Ames Room

The man on the left is actually almost twice as far away from the observer as the man on the right. However, when the room is viewed through the peephole, the actual distances cannot be seen. Since you perceive the two people to be at the same distance from you, the one who has the larger visual angle appears larger.
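
Size constancy and the Ames room can both be described as scaling the visual angle by an assumed distance. The sketch below is one simple way to write that down, not a model of what the brain actually computes; the heights, distances, and function names are all hypothetical.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle (degrees) of an object viewed head-on."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def perceived_size(angle_deg, assumed_distance):
    """Size inferred by scaling a visual angle by the distance the
    observer assumes the object to be at."""
    return 2 * assumed_distance * math.tan(math.radians(angle_deg) / 2)

height = 1.8               # both men have the same height (hypothetical, metres)
near_d, far_d = 3.0, 6.0   # hypothetical true distances; one is twice the other
angle_near = visual_angle_deg(height, near_d)
angle_far = visual_angle_deg(height, far_d)

# Correct distance estimates: both men are judged to be the same size.
constancy = (perceived_size(angle_near, near_d), perceived_size(angle_far, far_d))

# Ames room: both men are assumed to be at the same distance (4 m here).
ames = (perceived_size(angle_near, 4.0), perceived_size(angle_far, 4.0))

print(constancy, ames)
```

With the correct distances, both inferred sizes come out at 1.8. With a shared assumed distance, the nearer man is inferred to be twice as tall as the farther one.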

Size constancy: Hallway

The visual system compensates for perspective in making judgements about size. It is striking that we are so unaware of this. We have a tendency to interpret shape and size in 3D - often unaware of 2D size.

Shepard Tables

A central premise of object perception is that we see objects in a three-dimensional world. If there is an opportunity to interpret a drawing or an image as a three-dimensional object, we do. The two table tops above have precisely the same two-dimensional shape on the page, except for a rigid rotation. Nobody believes this when they first look at the illusion. The illusion shows that we don't see the two-dimensional shape drawn on the page, but instead we see the three-dimensional shape of the object in space.

Monocular Physiological Cues

When we fixate an object, we typically accommodate to the object, i.e., change the power of the lens in our eyes to bring that object into focus. The accommodative effort is a weak cue to depth. Once we've accommodated to that distance, objects that are much closer or further from us than that distance are out of focus on our retina. Thus, blur is a cue that objects are at a different distance than the accommodative distance, although the cue is ambiguous as to whether the objects are closer or more distant. Even weaker still as depth cues (although theoretically useful) are the image distortions resulting from astigmatism (the cornea isn't a perfect sphere) and chromatic aberration (when yellow light is in focus, blue light is out of focus, from a given distance to the object).
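
The ambiguity of blur can be illustrated with a small sketch. It assumes the usual optics convention of measuring defocus in diopters (1 / distance in metres), with blur growing with the magnitude of the defocus; the function name and distances are hypothetical.

```python
def defocus_diopters(object_distance_m, focus_distance_m):
    """Magnitude of defocus (diopters) for an object when the eye is
    accommodated to focus_distance_m."""
    return abs(1.0 / object_distance_m - 1.0 / focus_distance_m)

focus = 1.0                               # eye accommodated to 1 m
nearer = defocus_diopters(0.5, focus)     # object at 0.5 m
farther = defocus_diopters(1e9, focus)    # object effectively at infinity

print(nearer, farther)
```

Both objects are out of focus by about 1 diopter, so the amount of blur alone does not say whether the object is nearer or farther than the accommodative distance.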

Movement Cues

The next set of cues involves movement on the retinal image. There are two cases. If the observer moves through a stationary environment, the resulting movement is called motion parallax (discussed briefly in the previous chapter). Objects will move at different speeds on your retina (for a particular speed of observer movement and choice of fixation object) depending on their distance from the observer. In the second case, the observer is stationary but the object is in motion (e.g., it is rotating and/or moving in a straight path relative to the observer). The resulting retinal velocities will depend on the relative distance of each object feature from the observer, resulting in the kinetic depth effect (the calculation of depth here is called structure-from-motion).
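
For motion parallax, a rough sketch of the geometry: assume the observer translates sideways at a constant speed while fixating a point straight ahead, and consider another point on the same line of sight at the instant it is straight ahead. Under those assumptions its angular velocity relative to the fixated point is speed * (1/distance - 1/fixation_distance). The function name and the sign convention are choices made for this example.

```python
import math

def parallax_deg_per_s(observer_speed, distance, fixation_distance):
    """Angular velocity (deg/s) of a point relative to the fixated point,
    for an observer translating sideways at observer_speed.
    Positive: image moves opposite to the observer's motion (nearer than
    fixation). Negative: image moves with the observer (farther)."""
    omega = observer_speed * (1.0 / distance - 1.0 / fixation_distance)
    return math.degrees(omega)

speed = 1.0  # metres per second (hypothetical)
print(parallax_deg_per_s(speed, 1.0, 2.0))  # nearer than fixation
print(parallax_deg_per_s(speed, 2.0, 2.0))  # at the fixation distance
print(parallax_deg_per_s(speed, 4.0, 2.0))  # farther than fixation
```

The direction of retinal motion tells you which side of the fixation point an object is on, and the speed tells you how far from it.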

Stereo Vision

Stereopsis: Greek for "solid sight".

Close one eye, and hold up your two index fingers, one fairly close to your face and one as far as you can reach. Fixate the more distant finger, and alternately view the scene with your left eye and then your right. As you can see, the distance between the two fingers is different in your left eye's view than in your right eye's; their relative positions on the two retinae are disparate.

Illustration of binocular disparity

Binocular disparity is defined as the difference in the location of a feature between the right eye's and left eye's image. The amount of disparity depends on the depth (i.e., the difference between the distance to the object and the distance to the point of fixation), and hence it is a cue that the visual system uses to infer depth. Wheatstone (1838) was the first to figure this out. Before that, people thought that having two eyes posed a problem, because they couldn't figure out how you could see only one image when viewing the world with two eyes. Wheatstone correctly pointed out the advantage of having two eyes: seeing objects in 3D depth. However, the disparity also depends on the distance to the fixation point, so disparities must be further interpreted using estimates of the fixation distance.
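
The dependence of disparity on both depth and fixation distance can be sketched for points on the midline, straight ahead of the observer. The sketch assumes an interocular distance of 6.5 cm (a typical value, not taken from these notes) and defines disparity as the difference between the vergence angles of the object and of the fixation point. The function names are hypothetical.

```python
import math

def vergence_angle_deg(distance, interocular=0.065):
    """Angle (degrees) between the two eyes' lines of sight to a point
    straight ahead at the given distance (metres)."""
    return math.degrees(2 * math.atan(interocular / (2 * distance)))

def disparity_deg(object_distance, fixation_distance, interocular=0.065):
    """Disparity of an object relative to fixation.
    Positive: object nearer than fixation. Negative: object farther."""
    return (vergence_angle_deg(object_distance, interocular)
            - vergence_angle_deg(fixation_distance, interocular))

# The same 10 cm depth step behind fixation, at two fixation distances.
step_at_half_metre = disparity_deg(0.6, 0.5)
step_at_two_metres = disparity_deg(2.1, 2.0)

print(step_at_half_metre, step_at_two_metres)
```

The same physical depth step produces a much smaller disparity at the larger fixation distance, which is why a disparity has to be interpreted together with an estimate of the fixation distance.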

Horopter, crossed- and uncrossed disparity

Horopter: imaginary 3D surface in the room in front of you that includes the object you are fixating on and all other points in 3D space that project to corresponding positions in the two retinae. The above picture is very misleading, however, because the geometric horopter (the set of points with zero disparity) is a circle that includes the fixation point and the optical centers of the two eyes (in the above picture, the labeled horopter should be much more curved, and curve back to pass through the two lenses).

Uncrossed disparity: An object farther away from you than the horopter has uncrossed disparities. You'd have to uncross (diverge) your eyes to fixate on it. It lies further to the right from the right eye's viewpoint than from the left eye's viewpoint.

Crossed disparity: An object closer than the horopter has crossed disparities. You'd have to cross (converge) your eyes to fixate on it. It lies further to the left from the right eye's viewpoint than from the left eye's viewpoint.
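
The geometric horopter and the crossed/uncrossed distinction can be checked with a little plane geometry. This sketch places the eyes at (-a, 0) and (+a, 0) in a top-down view, with fixation at (0, f), and measures the angle that the two eyes subtend at a point. Points where that angle equals the angle at fixation have zero disparity; a larger angle means crossed, a smaller angle means uncrossed. The half-interocular distance a = 3.25 cm and all function names are assumptions for this example.

```python
import math

def binocular_subtense(x, y, half_ipd=0.0325):
    """Angle (radians) between the two eyes' lines of sight to (x, y)."""
    return math.atan2(x + half_ipd, y) - math.atan2(x - half_ipd, y)

def classify(x, y, fixation_distance, half_ipd=0.0325, tol=1e-9):
    """Label a point as 'crossed', 'uncrossed', or 'zero' disparity."""
    d = (binocular_subtense(x, y, half_ipd)
         - binocular_subtense(0.0, fixation_distance, half_ipd))
    if d > tol:
        return "crossed"
    if d < -tol:
        return "uncrossed"
    return "zero"

def point_on_horopter_circle(theta, fixation_distance, half_ipd=0.0325):
    """Point on the circle through both eyes and the fixation point,
    at angle theta (radians) around the circle from the fixation point."""
    centre_y = (fixation_distance ** 2 - half_ipd ** 2) / (2 * fixation_distance)
    radius = fixation_distance - centre_y
    return (radius * math.sin(theta), centre_y + radius * math.cos(theta))

px, py = point_on_horopter_circle(0.5, 1.0)
print(classify(px, py, 1.0))       # on the circle
print(classify(0.0, 0.5, 1.0))     # nearer than fixation
print(classify(0.0, 2.0, 1.0))     # farther than fixation
print(classify(0.24, 1.0, 1.0))    # off to the side, same frontal plane
```

The last case shows why the horopter has to curve toward the observer: a point off to the side in the same frontal plane as the fixation point lies outside the circle, so it has uncrossed disparity.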

Wheatstone stereoscope

Stereoscope: One way to view stereo image pairs is to use a mirror stereoscope. If you put your face in front of a pair of angled mirrors, and put two slightly different pictures off to the sides, your left eye will see the left picture (E') and your right eye will view the right-hand picture (E).

Stereogram: A pair of images (such as E/E' above) that are viewed using a stereoscope (or a red-green anaglyph). The two images in a stereogram are slightly different, with features in one image shifted to slightly different positions in the other image. The shifts mimic differences which ordinarily would exist between the views of genuine 3D objects.

Red-green anaglyph and stereo glasses

There are lots of ways to make and view stereograms. The basic concept is to present slightly different images to the two eyes. One way is to superimpose two half images, one in red and one in green. Viewed through red-green glasses, one eye sees the red image and the other eye sees the green image.

Stereograms have been part of popular culture in each generation since Wheatstone. Brewster stereoscopes (different design, but same in concept) were popular around 1900 with photographed stereo pairs. 3D movies were viewed with red-green anaglyph glasses in the 1950s. Current 3D movies are usually viewed with polarized glasses instead of red-green so the movies can be in color. Another recent technique is the "magic eye" autostereograms.

Random-dot stereogram: The random-dot stereogram was invented by Bela Julesz, a perceptual psychologist who was very influential over the past 30 years. In the example below, with anaglyph glasses you would see a square-shaped surface floating in depth in front of a background. Both the foreground square and the background have little dots painted on them in random locations.

Random dot stereogram example

This has important consequences. It indicates that you

can see depth without any other depth cue present (motion parallax, perspective, etc.),
can see depth without first extracting delineated form, or a recognizable object. This was a surprising result when Julesz first discovered it. It implies that binocular combination is early; it precedes processing of recognizable forms/shapes/objects.
can determine which dot in the left eye goes with (corresponds to) which dot in the right eye in the presence of many potential false matches. Presumably matches are chosen so that the inferred 3D scene is relatively smooth and continuous.

How to make a random-dot stereogram

To construct a random-dot stereogram, you first place a bunch of dots randomly in an image. Then make two copies of it. In one copy shift a central square region to the left and in the other copy shift the same central square region to the right. This leaves holes in each of the images (left over from where the square shifted from). Fill the holes with new random dots. Why do you see it in 3D? The shift mimics differences which ordinarily exist between the views of genuine 3D objects. The extra dots (X and Y above) correspond to those parts of the background that one eye can see, but which are occluded from the view of the other eye by the foreground square.
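
The recipe above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, with images stored as lists of rows of 0/1 dots. The image size, square size, shift, and function name are arbitrary choices. The square is shifted right in the left eye's image and left in the right eye's image, which gives crossed disparity, so the square should appear in front of the background.

```python
import random

def make_rds(size=64, square=24, shift=2, seed=0):
    """Return (left, right) half-images of a random-dot stereogram."""
    rng = random.Random(seed)
    # Step 1: place dots randomly, then make two copies.
    base = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    left = [row[:] for row in base]
    right = [row[:] for row in base]
    lo = (size - square) // 2   # first row/column of the central square
    hi = lo + square            # one past its last row/column
    for y in range(lo, hi):
        # Step 2: shift the central square in opposite directions.
        for x in range(lo, hi):
            left[y][x + shift] = base[y][x]
            right[y][x - shift] = base[y][x]
        # Step 3: fill the vacated strips ("holes") with new random dots.
        for x in range(lo, lo + shift):
            left[y][x] = rng.randint(0, 1)
        for x in range(hi - shift, hi):
            right[y][x] = rng.randint(0, 1)
    return left, right

left, right = make_rds()
for row in left[:4]:
    print("".join("#" if dot else "." for dot in row))
```

Each half-image on its own is just random dots. The square exists only in the relationship between the two images: inside it, a dot at column x + shift in the left image matches the dot at column x - shift in the right image.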

How does the visual system see depth in a random-dot stereogram? One hypothesis is that the visual system matches up features of similar shape, size, contrast, etc. to estimate disparity. But there can be lots of potential matches. In principle, each dot present in one row of one half-image could have a large number of matches in the other half-image.

The problem of resolving this ambiguity is known as the correspondence problem, or the problem of global stereopsis, because the brain must find the correct overall (global) set of matches. It can't just try to find a mate for each feature independently. Global stereopsis is not just an issue for random-dot stereograms. Natural scenes (e.g., tree with leaves, carpet, etc.) have similar features. The visual system "solves" the global stereopsis problem by using additional constraints. For example, nearby points in the image are usually at nearby positions in depth, hence have nearly the same disparity.
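
A toy version of the matching problem shows why the smoothness constraint helps. The sketch below works on a single row of random dots. It is not a model of how the brain does it: window matching is just one simple way to use the assumption that neighbouring dots share a disparity. All names and numbers are hypothetical. The convention is that a dot at position x in the left row appears at x - d in the right row when its disparity is d.

```python
import random

def make_rows(n=400, lo=150, hi=250, disparity=3, seed=0):
    """One row of a stereogram: dots lo..hi-1 are shifted left by
    `disparity` in the right eye's row, and the hole is refilled."""
    rng = random.Random(seed)
    left = [rng.randint(0, 1) for _ in range(n)]
    right = left[:]
    for x in range(lo, hi):
        right[x - disparity] = left[x]
    for x in range(hi - disparity, hi):
        right[x] = rng.randint(0, 1)
    return left, right

def candidate_matches(left, right, x, max_disp=8):
    """Every disparity at which the single dot left[x] finds an
    identical dot in the right row. Usually there are many."""
    return [d for d in range(-max_disp, max_disp + 1)
            if right[x - d] == left[x]]

def window_disparity(left, right, x, half_width=15, max_disp=8):
    """Disparity that best matches a whole window of dots around x,
    i.e. assuming neighbouring dots have the same disparity.
    x must be at least half_width + max_disp away from either end."""
    best = None
    for d in range(-max_disp, max_disp + 1):
        mismatches = sum(left[x + k] != right[x - d + k]
                         for k in range(-half_width, half_width + 1))
        if best is None or mismatches < best[0]:
            best = (mismatches, d)
    return best[1]

left, right = make_rows()
print(candidate_matches(left, right, 200))  # ambiguous: several candidates
print(window_disparity(left, right, 200))   # inside the shifted region
print(window_disparity(left, right, 50))    # in the background
```

A single dot is ambiguous because about half of the dots in the other row look identical to it. A window of neighbouring dots is very unlikely to match anywhere except at the true disparity.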

Autostereogram: The autostereogram is also known as a "magic eye" stimulus. As with other stereograms, the trick is to display slightly different images to the two eyes. The autostereogram does this with repetitive patterns. To see depth in an autostereogram, you need to either cross or diverge your eyes so that they fixate separately on two different repeats of the pattern. In this way, you effectively get two different images to the two eyes. A simple example is the wallpaper illusion. If you view vertically striped wallpaper and fixate one eye on one stripe and the other eye on another stripe, the stripes appear to pop out in depth in front of the wall. This is hard to do because you need to fixate on a point that is effectively in front of the picture while focusing/accommodating on the picture itself. Note that the depth you perceive when you cross-fuse an autostereogram is the opposite of the depth you perceive when you diverge-fuse it.
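
The idea of encoding depth in the repeat distance of a pattern can be sketched for a single image row. This is a simplified illustration, not the algorithm used for commercial "magic eye" pictures. Each dot copies the dot one repeat distance to its left, and the repeat distance is shortened wherever the surface should look nearer (for diverged viewing). The function name, the period, and the depth values are hypothetical.

```python
import random

def autostereogram_row(depth_row, period=40, seed=0):
    """Build one row of 0/1 dots from a row of integer depths
    (0 = background, larger = nearer when the eyes are diverged)."""
    rng = random.Random(seed)
    row = []
    for x, depth in enumerate(depth_row):
        separation = period - depth   # shorter repeat distance = nearer
        if x < separation:
            row.append(rng.randint(0, 1))      # start with random dots
        else:
            row.append(row[x - separation])    # then repeat the pattern
    return row

flat = autostereogram_row([0] * 200)
raised = autostereogram_row([0] * 80 + [4] * 80 + [0] * 80)
print("".join("#" if dot else "." for dot in raised))
```

In the flat row the pattern repeats every 40 dots. In the raised section it repeats every 36 dots, so when each eye fixates a different repeat, that section carries a different disparity from the rest of the row.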

Fusion: You have two eyes, and hence two visual fields. A big question in the early 19th century (before Wheatstone): we have two eyes, so why don't we always see two views of the world? Answer: the two images are combined in the brain to yield a single unified perceptual experience.

Panum's fusional area is the range of disparities, or equivalently the range of depths in 3D space on either side of the horopter, over which the visual system can successfully fuse the two views. If disparity is small enough, within Panum's fusional area, then the visual system succeeds in fusing the two views. If disparity is too large, then the neurons in the brain cannot combine the two views into single vision, and you get diplopia, suppression, or binocular rivalry.

Diplopia (double vision): Look at a distant object with both eyes open. While fixating that object, put your index finger about 6 inches in front of your face. You will see two index fingers (one from the left eye's image and one from the right).

Suppression: This is what normally happens when the retinal disparity is too big (outside of Panum's fusional area). One eye's view dominates. That one is perceived. And the other eye's view is suppressed from awareness.

Binocular rivalry is a phenomenon we experience when the two eyes' views are very different from one another. One eye's view dominates for several seconds and is then replaced by that of the other eye. For example, if a horizontal grating is presented to one eye and a vertical grating to the other eye, one might first see the horizontal grating for a few seconds, then a mixture, then the vertical grating for a few seconds, etc. The phenomenon of binocular rivalry is of particular interest in studying consciousness/visual awareness because the physical stimuli (the two gratings) do not change, yet the conscious percept changes dramatically over time. Moreover, we have no conscious control over the percept; you cannot by force of will cause the percept to switch from one to the other.

There are two ways to have single vision: (1) small disparity yields fusion and stereopsis, (2) large disparity often causes one eye's view to be suppressed. Binocular rivalry is a special case of suppression in which the suppression switches back and forth between the two views.

Note that you can have stereopsis in part of the visual field, diplopia in another part of the visual field and rivalry/suppression in yet another part of the visual field, all simultaneously.

Stereoblindness: 10% of people are stereoblind. Some are totally stereoblind, some are blind only to either crossed or uncrossed disparities. Some stereoblindness is caused by strabismus (wandering eye). If not treated/fixed at a very early age (infancy), binocular vision never develops properly. Some people with strabismus end up with amblyopia (sometimes called lazy eye). Amblyopia is a cortical blindness. Amblyopia is a general term for a visual deficit that has nothing to do with the optics or structure of eye and retina. In amblyopia, the brain basically ignores inputs from one eye. Other people with untreated strabismus end up as alternate fixators who can see with either eye, but never use them both at the same time. That is, they first look at you with their left eye (while the right eye is diverged), then switch and look at you with their right eye (while the left eye is diverged). In either case, there is no binocular vision and no stereopsis.

Stereovision in the brain: If you record with a micro-electrode from a V1 neuron while an animal views oriented lines presented separately to the two eyes and vary the disparity, some neurons are selective for particular disparities.

Disparity tuning of V1 neuron

This neuron does not respond at all when a line is shown to one eye at a time. To get a response, the line must be presented simultaneously to both eyes, it must have the correct orientation, direction of motion and the correct binocular disparity, in this case a disparity of about 1/2 deg of visual angle.

Distribution of disparity tunings

Neurons differ in the binocular disparities to which they are tuned. This is a histogram graphing the distribution of disparity preferences. Many neurons are tuned for 0 disparity (on the horopter), but in addition there are neurons tuned for a range of crossed and uncrossed disparities. The picture that emerges is that for each location in the visual field, there is a collection of neurons selective for all different orientations, directions of motion, and binocular disparities. One way this might happen is for a neuron to have a binocular receptive field which sums inputs from slightly displaced positions in the two eyes.
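
The idea of summing inputs from displaced positions in the two eyes can be turned into a toy model. This is only an illustrative sketch under made-up assumptions, not a fit to real V1 data. Each eye has a Gaussian receptive field, the right eye's field is displaced by the preferred disparity, the two inputs are summed, and a threshold is set so that one eye alone can never drive the cell. Disparity is taken here as the right-eye position minus the left-eye position, a convention chosen for this example.

```python
import math

def gaussian(x, sigma=0.25):
    """Receptive-field sensitivity at offset x (deg) from its centre."""
    return math.exp(-x * x / (2 * sigma * sigma))

def binocular_response(x_left, x_right, preferred_disparity, threshold=1.0):
    """Response to a bar at x_left in the left eye and x_right in the
    right eye (deg). Pass None for an eye that sees no bar."""
    left_in = gaussian(x_left) if x_left is not None else 0.0
    right_in = (gaussian(x_right - preferred_disparity)
                if x_right is not None else 0.0)
    # Summed input must exceed what a single eye can provide at best.
    return max(0.0, left_in + right_in - threshold)

def tuning_curve(preferred_disparity, disparities):
    """Peak response while a bar sweeps across the field, per disparity."""
    positions = [i * 0.01 - 1.0 for i in range(201)]   # -1 to +1 deg
    return [max(binocular_response(x, x + d, preferred_disparity)
                for x in positions)
            for d in disparities]

disparities = [-1.0 + 0.25 * i for i in range(9)]      # -1 to +1 deg
curve = tuning_curve(0.5, disparities)
for d, r in zip(disparities, curve):
    print(d, round(r, 3))
```

The model cell is silent for a bar shown to one eye alone and responds best when the bar's disparity equals the displacement between its two receptive fields (0.5 deg here). A population of such cells with different displacements would cover a range of crossed and uncrossed disparities.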

Binocular rivalry in the brain: Logothetis and colleagues recorded from neurons in the inferior temporal lobe, an area of the brain believed to be involved in recognition (see lecture notes on recognition). Neurons in IT are very selective for stimulus patterns. Monkeys were trained to report their percepts during rivalry. At the same time, Logothetis recorded the responses of IT neurons. The neurons tracked the alternations in the monkey's reported percept during rivalry even though the physical stimulus never changed.

The top (A) part of the figure shows the responses of a neuron to several different images. This neuron responded strongly to pictures of butterflies, but not to any other pictures. In particular, the sunburst pattern (top-right) evoked almost no action potentials. The monkey was trained to press the left bar for the sunburst, the right bar for the butterfly image. The bottom (B) part of the figure shows an example of the neuron's responses. The rivalry period is shown by the shaded (gray) background. During this period, the stimuli were presented simultaneously, one to the left eye and the other to the right eye. The unshaded region, on the other hand, corresponds to a non-rivalry period when only one stimulus was shown at a time, either to the left eye or to the right eye. The dotted vertical lines mark transitions between stimulus conditions. The horizontal light and dark bars show the time periods for which the monkey reported exclusive visibility of the left-lever (sunburst) and right-lever (e.g., butterfly) objects. Note that during rivalry the monkey reports changes in the perceived stimulus with no concomitant changes of the displayed images. Such perceptual alternations regularly followed a significant change in the neuron's activity, as shown by the individual spikes in the middle of each plot and by the firing rate graph below the spikes. Note the similarity of the responses elicited by the unambiguous presentation of the effective and ineffective stimuli (white, unshaded region) with those responses elicited before either stimulus becomes perceptually salient during rivalrous stimulation (gray-shaded region).

David Heeger