Modeling visual cortex by maximizing layerwise multiscale manifold capacity

T E Yerxa, S Y Chung and E P Simoncelli

Published in 9th Annual meeting, Computational Cognitive Neuroscience, Aug 2026.

Download:
  • Reprint (pdf)

  • Deep neural networks provide strong predictive models of primate visual cortex, but are typically trained with end-to-end objectives requiring global backpropagation. We introduce ST-MMCR, a layerwise self-supervised learning scheme that trains successive stages of a convolutional hierarchy with local objectives matched to their receptive-field scale. The objective is based on maximum manifold capacity representations: temporally nearby views are pooled into compact, discriminable manifolds using projection, local spatial pooling, temporal pooling, and a nuclear-norm loss. This implements complexity matching through the architecture rather than through separate hand-crafted augmentation streams, as was done in previous work. Evaluated on macaque V1, V2, and V4 datasets and human-aligned object-classification benchmarks, ST-MMCR matches or exceeds architecture-matched supervised and self-supervised baselines in neural predictivity, approaches adversarially robust models, and improves out-of-distribution and human-aligned behavior when used as a visual front end.
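The four ingredients named in the abstract (projection, local spatial pooling, temporal pooling, nuclear-norm loss) can be illustrated with a minimal NumPy sketch of a single layer's local objective. This is not the authors' implementation: the abstract does not give tensor shapes, pooling sizes, normalization, or the exact ordering, so the function name `st_mmcr_loss_sketch`, the `(N, T, H, W, C)` layout, the linear projection `proj`, the non-overlapping average pooling, and the unit-norm step are all assumptions chosen for illustration.

```python
import numpy as np


def st_mmcr_loss_sketch(features, proj, pool=2):
    """Hypothetical single-layer objective (an assumption, not the paper's code).

    features: array of shape (N, T, H, W, C)
        N clips, T temporally nearby frames, H x W spatial map, C channels.
    proj: array of shape (C, D), an assumed linear projection head.
    pool: side length of the assumed non-overlapping spatial pooling window.

    Returns the negative nuclear norm of the matrix of manifold centroids,
    so that minimizing the loss maximizes the nuclear norm.
    """
    n, t, h, w, _ = features.shape

    # 1) Projection: apply the same linear map at every location and frame.
    z = features @ proj  # (N, T, H, W, D)
    d = z.shape[-1]

    # 2) Local spatial pooling: average over non-overlapping pool x pool windows.
    hp, wp = h // pool, w // pool
    z = z[:, :, : hp * pool, : wp * pool, :]
    z = z.reshape(n, t, hp, pool, wp, pool, d).mean(axis=(3, 5))  # (N, T, hp, wp, D)

    # Assumption: each pooled location in each clip defines one manifold,
    # whose points are the T temporally nearby views.
    z = z.transpose(0, 2, 3, 1, 4).reshape(n * hp * wp, t, d)  # (M, T, D)

    # Place every view on the unit sphere before averaging.
    z = z / (np.linalg.norm(z, axis=-1, keepdims=True) + 1e-8)

    # 3) Temporal pooling: the centroid of each manifold across its T views.
    centroids = z.mean(axis=1)  # (M, D)

    # 4) Nuclear-norm loss: sum of singular values of the centroid matrix.
    return -np.linalg.norm(centroids, ord="nuc")
```

Under these assumptions, the loss is lowest when views of the same manifold agree (long centroids) and centroids of different manifolds point in different directions (many large singular values). For example, four clips whose centroids are orthogonal unit vectors give a loss near -4, while four clips collapsed onto one direction give a loss near -2.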