Metamers of the ventral stream

J Freeman and E P Simoncelli

Published in Computational and Systems Neuroscience (CoSyNe), (T-28), Feb 2010.

DOI: 10.3389/conf.fnins.2010.03.00053

This paper has been superseded by:
Metamers of the ventral stream
J Freeman and E P Simoncelli.
Nature Neuroscience, vol. 14(9), pp. 1195–1201, Sep 2011.

  • Official (pdf)
  • Supplementary Materials

  • How is image structure encoded in the extrastriate ventral visual pathway? Direct characterization of the stimulus selectivity of individual extrastriate cells has proven difficult. However, one robust population-level property of all visual areas is that receptive field sizes grow with eccentricity. It has also been reported (Gattass et al., 1988) that the rate of growth increases along the ventral stream. We hypothesize that this successive increase in pooling region size causes information loss. A well known example occurs in the retina, where spatial pooling in the periphery means that high spatial frequency information is lost. In general, stimuli that differ only in terms of information discarded by the visual system will be indistinguishable to a human observer. Such stimuli are called metamers. Here, we probe the population-level computations of the ventral stream using novel metameric stimuli.
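To make the eccentricity scaling concrete, the toy calculation below computes pooling-region diameters that grow linearly with eccentricity; the area labels and slope values are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

def pooling_diameter(eccentricity_deg, scaling):
    """Linear growth of pooling-region diameter with eccentricity.

    `scaling` is the slope (diameter / eccentricity); steeper slopes
    stand in for areas further along the ventral stream. The slope
    values used below are illustrative, not measured.
    """
    return scaling * np.asarray(eccentricity_deg, dtype=float)

ecc = np.array([2.0, 8.0, 32.0])  # degrees of visual angle from the fovea
for area, slope in [("earlier area", 0.25), ("mid-level area", 0.5)]:
    print(area, pooling_diameter(ecc, slope))
```

Under this linear model, doubling either the eccentricity or the slope doubles the pooling diameter, which is why later areas discard proportionally more peripheral detail.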

    Starting from any prototype image, we generate stimuli that are matched in terms of the responses of a simple model of extrastriate ventral computation. The model is based on measurements previously used to characterize visual texture (Portilla & Simoncelli, 2000). It decomposes an image using a bank of V1-like filters tuned for local orientation and spatial frequency, computing both simple- and complex-cell responses. Extrastriate responses are then computed by taking pairwise products amongst these V1 responses and averaging them within overlapping spatial regions that grow with eccentricity. Stimuli are generated by using gradient descent to adjust a random (white noise) image until it matches the model responses of the original prototype. Previous work showed that the same statistics, averaged over an entire image, allow for the analysis and synthesis of homogeneous visual textures.
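The synthesis procedure can be sketched in miniature. The toy code below matches only block means and mean-squares within non-overlapping square windows by gradient descent, standing in for the actual model's oriented-filter responses, pairwise products, and overlapping eccentricity-scaled pooling regions; the window size, step count, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def block_stats(img, b):
    """Mean and mean-square within non-overlapping b x b blocks."""
    h, w = img.shape
    blocks = img.reshape(h // b, b, w // b, b)
    return blocks.mean(axis=(1, 3)), (blocks ** 2).mean(axis=(1, 3))

def synthesize(target, b=8, steps=500, lr=1.0):
    """Adjust a white-noise image until its block statistics match the target's.

    Loss = sum over blocks of squared error in (mean, mean-square);
    the gradient of each block statistic is broadcast back to its pixels.
    """
    t_mean, t_pow = block_stats(target, b)
    x = rng.standard_normal(target.shape)  # white-noise starting point
    n = b * b
    for _ in range(steps):
        m, p = block_stats(x, b)
        g_mean = np.kron(2.0 * (m - t_mean) / n, np.ones((b, b)))
        g_pow = np.kron(2.0 * (p - t_pow) / n, np.ones((b, b))) * 2.0 * x
        x -= lr * (g_mean + g_pow)
    return x
```

Because only pooled statistics are constrained, any pixel arrangement with the right block averages is an equally valid solution, which is what makes the samples "as random as possible" given the matched statistics.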

    If this model accurately reflects representations in early extrastriate areas, then images synthesized to produce identical model responses should be metameric to a human observer. For each of several natural images and pooling region sizes, we generate multiple samples that are statistically matched but otherwise as random as possible. We use a standard psychophysical task to measure observers' ability to discriminate between image samples, as a function of the rate at which the statistical pooling regions grow with eccentricity. When image samples are statistically matched within small pooling regions, observers perform at chance (50%), failing to notice substantial differences in the periphery. When images are matched within larger pooling regions, discriminability approaches 100%. We fit a psychometric function to these data to estimate the size of the pooling regions over which observers compute statistics. The result is consistent with receptive field sizes in macaque mid-ventral areas (particularly V2).
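The fitting step can be sketched as follows. The Weibull-style function rising from chance (50%) to 100%, and all of the data values, are illustrative assumptions rather than the study's actual measurements; the fitted threshold parameter plays the role of the critical pooling scaling.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(scaling, alpha, beta):
    """Weibull-style curve from chance (0.5) to perfect (1.0) performance.

    alpha is the threshold scaling; beta controls the slope.
    """
    return 0.5 + 0.5 * (1.0 - np.exp(-(scaling / alpha) ** beta))

# Hypothetical proportion-correct data at several pooling-region scalings
scalings = np.array([0.1, 0.2, 0.4, 0.7, 1.0, 1.4])
p_correct = np.array([0.51, 0.55, 0.72, 0.90, 0.97, 0.99])

(alpha, beta), _ = curve_fit(psychometric, scalings, p_correct, p0=[0.5, 2.0])
print(f"estimated critical scaling: {alpha:.2f}")
```

Comparing the fitted threshold against the known growth rates of receptive fields across visual areas is what links the behavioral measurement to a candidate cortical locus.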

    Our model also fully instantiates a recently proposed explanation (Balas et al., 2009) of the phenomenon of "visual crowding", in which humans fail to recognize a peripheral target object surrounded by background clutter. In our model, crowding occurs because multiple objects fall within the same pooling region, so the model responses cannot uniquely identify the target object. We synthesize images that are metameric to classic crowding stimuli (e.g. groups of letters), and find that stimulus configurations that produce crowding yield synthesized images with jumbled, unidentifiable objects.


    References:

  • Balas B, Nakano L and Rosenholtz R (2009). A summary-statistic representation in peripheral vision explains visual crowding. Journal of Vision, 9(12):13, 1–18.
  • Gattass R, Sousa APB and Gross CG (1988). Visuotopic organization and extent of V3 and V4 of the macaque. Journal of Neuroscience, 8(6):1831–1845.
  • Portilla J and Simoncelli EP (2000). A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1):49–70.
  • Listing of all publications