A spatial subunit model for V2 receptive fields reveals heterogeneous receptive field structure

B Vintch, E P Simoncelli and J A Movshon

Published in Annual Meeting, Neuroscience, Nov 2011.

Primate visual area V2 is often thought to process higher-order visual information that cannot be captured by the locally oriented receptive fields in area V1. Nonetheless, the receptive field structure of neurons in macaque V2 is still imperfectly understood. We have found that many of the receptive fields in V2 are spatially heterogeneous, and propose that this property could underlie the analysis of complex visual features.

We recorded the single-unit activity of 102 V2 neurons and 42 V1 neurons in the macaque cortex and sought to characterize the spatial structure of their receptive fields. We presented a sequence of images, refreshed every 100 ms, that were designed to drive a population of V1 neurons independently. Each frame consisted of multiple locally oriented elements covering a region roughly twice the diameter of the RF, and these elements varied in contrast and orientation over position. Overall, stimuli evoked strong responses from both V1 and V2 neurons.

V2 neurons receive their primary input from area V1, and we used this knowledge to fit a feed-forward receptive field model to the V2 responses. First, we computed responses of a spatial array of 2300 diversely tuned model V1 cells (squared outputs of directional derivative filters) for each frame of the stimulus. Then, we found the best-fitting linear combination of these responses that could explain the firing rate of each recorded V2 cell. Given the number of parameters in the model and the amount of data we collected for each neuron, we solved for the weights by optimizing an objective function that favored sparse solutions. The resulting model yielded good cross-validated performance ( = 0.34).
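
The two-stage fit described above can be illustrated with a minimal sketch on simulated data. Everything specific here is an assumption made for illustration and is not taken from the abstract: the white-noise frames, the small filter bank, the L1 penalty solved by iterative soft thresholding (the abstract says only that the objective favored sparse solutions), and the names `derivative_filter`, `v1_responses`, and `fit_sparse_weights`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def derivative_filter(size, theta):
    """Unit-norm first derivative of a Gaussian, taken along direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    sigma = size / 4.0
    gauss = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    u = x * np.cos(theta) + y * np.sin(theta)
    filt = -u * gauss
    return filt / np.linalg.norm(filt)


def v1_responses(frames, size=5, n_orient=4):
    """Stage 1: squared filter outputs at every valid position and orientation.

    frames has shape (n_frames, H, W); the result has shape (n_frames, n_units).
    """
    patches = sliding_window_view(frames, (size, size), axis=(1, 2))
    columns = []
    for k in range(n_orient):
        filt = derivative_filter(size, np.pi * k / n_orient)
        linear = np.einsum("nijab,ab->nij", patches, filt)
        columns.append((linear ** 2).reshape(len(frames), -1))
    return np.concatenate(columns, axis=1)


def fit_sparse_weights(X, y, lam=0.2, n_iter=500):
    """Stage 2: L1-penalized least squares via iterative soft thresholding.

    The L1 penalty is an assumed stand-in for the sparse objective.
    """
    n = len(y)
    step = n / np.linalg.norm(X, 2) ** 2  # inverse Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = w - step * (X.T @ (X @ w - y) / n)
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w


# Simulated experiment (hypothetical): noise frames and a "V2 cell" that
# sums five model V1 units, plus additive response noise.
rng = np.random.default_rng(0)
frames = rng.standard_normal((800, 16, 16))
X = v1_responses(frames)
w_true = np.zeros(X.shape[1])
w_true[rng.choice(X.shape[1], size=5, replace=False)] = 1.0
rate = X @ w_true + rng.standard_normal(len(X))

# Fit on the first 600 frames, then score predictions on the held-out 200.
train, test = slice(0, 600), slice(600, 800)
mu_x, mu_y = X[train].mean(axis=0), rate[train].mean()
w = fit_sparse_weights(X[train] - mu_x, rate[train] - mu_y)
pred = (X[test] - mu_x) @ w + mu_y
r = np.corrcoef(pred, rate[test])[0, 1]
print(f"nonzero weights: {np.count_nonzero(w)} of {w.size}; held-out r = {r:.2f}")
```

Scoring on held-out frames mirrors the cross-validation mentioned in the text: with far more candidate weights than a recording can constrain, performance on the training frames alone would overstate how well the model describes the cell.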

Neurons in both V2 and V1 were better fit by our 2-stage model than by simpler spatial models. In V2, our model outperformed the spike-triggered average or a best-fitting single complex cell by about 24%. In V1, this value was about 26%. The cells in V2 that were least well fit by a single V1 model cell often exhibited spatial receptive field structures that could possibly support mid-level feature analysis. Cells with receptive fields that curved across space were common, and a few cells appeared to signal T-junctions. For a subset of cells we used a "playback" procedure to obtain independent verification that the model explained relevant aspects of the receptive fields. We used the fitted model to generate a set of images predicted to maximally excite or suppress the cell's firing rate. On average, the model could account for 90% of the explainable variance of repeated V2 responses to these simple stimuli.

