Responses of neural populations in macaque V4 to object and texture images

J D Lieber, T D Oleskiw, L Palmieri, E P Simoncelli and J A Movshon

Published in Proc. AREADNE: Research in Encoding And Decoding of Neural Ensembles, Jun 2024.

Humans and monkeys can effortlessly recognize objects in natural scenes. This ability relies on neural computations in the ventral stream of visual cortex. The intermediate computations that lead to object selectivity are not well understood, but previous studies implicate V4 as an early site of selectivity for object shape. To explore the mechanisms of this selectivity, we transformed two sets of images taken from photographs to create, for each image, a continuum spanning the space between the original image and "scrambled" textures. We did this with the Portilla & Simoncelli texture synthesis procedure, which preserves the local statistics of the original image while discarding information about scene and shape. To make each continuum (an "image family") vary smoothly between fully scrambled textures and natural images, we varied the size of the scrambling regions from small localized regions to the whole image. For all sizes, these scrambling regions seamlessly covered the whole image, with modest overlap.

Using single electrodes, linear multielectrode arrays, and chronically implanted multielectrode arrays, we measured the responses of both well-isolated single units and multi-unit channels in awake macaque V4 to these scrambled images. On average, V4 neurons were slightly more active in response to the original photographs than to their scrambled counterparts. However, responses in V4 varied widely across both cells and image sets. An important determinant of this variation was how effectively an image family drove strong neural responses. Across the full V4 population, a cell's average evoked firing rate for an image family reliably predicted its preference for natural over scrambled images within that family. Accordingly, the cells that responded most strongly to each image family showed a much stronger difference between natural and scrambled images, and a graded level of modulation for images with intermediate scrambling-region sizes. This preference for natural images was delayed until about 50 ms after the onset of neuronal activity and did not peak in strength until 130 ms after activity onset.
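As a rough illustration of the relationship described above, the sketch below computes a natural-versus-scrambled modulation index per cell and correlates it with mean evoked rate. The index definition, the function name `modulation_index`, and the firing rates are assumptions made for illustration; they are not the analysis or data from this study.

```python
import numpy as np


def modulation_index(natural_rate, scrambled_rate):
    """Contrast index (nat - scr) / (nat + scr), set to 0 where the sum is 0.

    Assumes non-negative firing rates; this definition is an illustrative
    choice, not necessarily the measure used by the authors.
    """
    natural_rate = np.asarray(natural_rate, dtype=float)
    scrambled_rate = np.asarray(scrambled_rate, dtype=float)
    total = natural_rate + scrambled_rate
    diff = natural_rate - scrambled_rate
    # Divide only where the denominator is positive; elsewhere keep 0.
    return np.divide(diff, total, out=np.zeros_like(diff), where=total > 0)


# Hypothetical evoked rates (spikes/s) for four cells and one image family.
natural = np.array([30.0, 12.0, 5.0, 2.0])
scrambled = np.array([10.0, 8.0, 4.0, 2.0])

index = modulation_index(natural, scrambled)
mean_rate = (natural + scrambled) / 2.0

# Pearson correlation between how strongly a family drives each cell
# and how much that cell prefers the natural image.
r = np.corrcoef(mean_rate, index)[0, 1]
print(index, r)
```

In this toy example the more strongly driven cells have larger indices, so the correlation is positive. Real data would need a noise-robust estimate, for instance separate trials for the rate and the index.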

Finally, V4 neural responses strongly separated natural images from all partial and full scrambling conditions, even though the least scrambled images in our set appear similar to the original natural images. We hypothesized that this separation might be explained by the exquisite sensitivity of observers to minor degradations in the structure of natural images. To test this, we analyzed our image set with the Deep Image Structure and Texture Similarity metric (DISTS), an image-computable similarity measure that predicts human judgements of image degradation. Distances measured with DISTS also showed a categorical separation of natural images from all scrambling conditions, and predicted distances measured from V4 neural responses better than simpler metrics such as image pixel distance or the Structural Similarity Index Measure (SSIM). This suggests that V4 responses are highly sensitive to small deviations from natural image structure.
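One way to picture the comparison between image metrics and neural distances is sketched below. It assumes Euclidean distance between population response vectors and Pearson correlation between distance matrices. These choices, the function names, and all numbers are hypothetical. The two metric matrices are hand-made stand-ins for a "categorical" and a "graded" metric, not outputs of DISTS, SSIM, or pixel distance.

```python
import numpy as np


def pairwise_distances(responses):
    """Euclidean distance between every pair of rows (images x units)."""
    responses = np.asarray(responses, dtype=float)
    diff = responses[:, None, :] - responses[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_agreement(neural_dist, metric_dist):
    """Pearson correlation between the upper triangles of two distance matrices."""
    neural_dist = np.asarray(neural_dist, dtype=float)
    metric_dist = np.asarray(metric_dist, dtype=float)
    rows, cols = np.triu_indices_from(neural_dist, k=1)
    return np.corrcoef(neural_dist[rows, cols], metric_dist[rows, cols])[0, 1]


# Hypothetical responses: row 0 is the natural image, rows 1-3 are
# increasingly scrambled versions; columns are three units.
responses = np.array([
    [10.0, 8.0, 6.0],
    [4.0, 3.0, 2.0],
    [4.0, 3.0, 3.0],
    [3.0, 3.0, 2.0],
])
neural = pairwise_distances(responses)

# Stand-in metric that separates the natural image from all scrambled ones.
categorical = np.full((4, 4), 0.1)
categorical[0, :] = categorical[:, 0] = 1.0
np.fill_diagonal(categorical, 0.0)

# Stand-in metric that grows steadily with the difference in scrambling level.
levels = np.arange(4)
graded = np.abs(levels[:, None] - levels[None, :]).astype(float)

agree_categorical = distance_agreement(neural, categorical)
agree_graded = distance_agreement(neural, graded)
print(agree_categorical, agree_graded)
```

With these toy responses the categorical stand-in agrees with the neural distances more closely than the graded one, mirroring the kind of comparison described above without reproducing its result.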

