Learning a visual representation by maximizing manifold capacity

T Yerxa, Y Kuang, E P Simoncelli and SY Chung

Published in Computational and Systems Neuroscience (CoSyNe), Mar 2023.

This paper has been superseded by:
Efficient coding of natural images using maximum manifold capacity representations
T E Yerxa, Y Kuang, E P Simoncelli and SY Chung.
Adv. Neural Information Processing Systems (NeurIPS), vol. 36, Dec 2023.


Biological visual systems learn complex representations of the world that support a wide range of cognitive behaviors without using a large number of labelled examples. The efficient coding hypothesis suggests that this is accomplished by adapting the sensory representation to the statistics of the input signal, i.e., in a way that facilitates redundancy reduction. Visual signals have several clear sources of redundancy. They evolve slowly in time: temporally adjacent inputs typically correspond to different views of the same scene, and such views are usually more similar to each other than views of distinct scenes. Moreover, the variations within individual scenes often correspond to variations in a small number of parameters, such as those controlling viewing and lighting conditions. Motivated by these observations, we seek to learn a function that represents different views of the same scene with compact, low-dimensional manifolds while simultaneously maximizing the separation between manifolds representing distinct scenes. Recent theoretical advances describe how these two notions of extent (dimensionality and size) can be combined to measure "manifold capacity", a measure of the number of manifolds that can be linearly separated from each other in the representation space. In this work we demonstrate that optimizing a network for manifold capacity results in a representation that supports near state-of-the-art object recognition on several datasets and is robust to adversarial stimulus perturbations. Both results are consistent with properties of biological networks, since (1) performance on object classification has been shown to be correlated with fits to neural data, and (2) vulnerability to adversarial stimulus perturbations is one of the hallmark differences between artificial and biological perception. This is an important first step in demonstrating that coding efficiency can serve as a normative principle underlying robust object recognition.
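To make the two notions of extent (dimensionality and size) a little more concrete, the sketch below computes simple proxies for both on a point cloud of embedded views of a single scene. It is a minimal illustration under stated assumptions, not the capacity measure or training objective used in this work: dimension is approximated by the participation ratio of the covariance eigenvalues, and size by a hypothetical "relative radius" (spread around the centroid divided by the centroid's distance from the origin). The function names are illustrative only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Mean of the embedded views of one scene (one manifold)."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def covariance(points: List[Vector]) -> List[List[float]]:
    """Population covariance of the views around their centroid."""
    c = centroid(points)
    n, d = len(points), len(c)
    centered = [[p[j] - c[j] for j in range(d)] for p in points]
    return [[sum(x[i] * x[j] for x in centered) / n for j in range(d)]
            for i in range(d)]


def participation_ratio(cov: List[List[float]]) -> float:
    """Dimension proxy: (sum of eigenvalues)^2 / sum of squared eigenvalues.

    For a symmetric matrix this equals trace(C)^2 / ||C||_F^2, so no
    eigendecomposition is needed.
    """
    trace = sum(cov[i][i] for i in range(len(cov)))
    frob_sq = sum(v * v for row in cov for v in row)
    return trace * trace / frob_sq if frob_sq > 0 else 0.0


def relative_radius(points: List[Vector]) -> float:
    """Hypothetical size proxy: RMS spread divided by the centroid norm."""
    c = centroid(points)
    cov = covariance(points)
    spread = math.sqrt(sum(cov[i][i] for i in range(len(cov))))
    norm = math.sqrt(sum(v * v for v in c))
    return spread / norm if norm > 0 else float("inf")


if __name__ == "__main__":
    # Views varying along a single direction: a 1-D manifold.
    line = [[1.0, 1.0], [3.0, 1.0]]
    # Views spread equally along two directions around (5, 5).
    disk = [[6.0, 5.0], [4.0, 5.0], [5.0, 6.0], [5.0, 4.0]]
    print(participation_ratio(covariance(line)))  # 1.0
    print(participation_ratio(covariance(disk)))  # 2.0
    print(relative_radius(line), relative_radius(disk))
```

Under these proxies, a representation in the spirit described above would map views of the same scene to clouds with small participation ratio and small relative radius, while keeping the centroids of different scenes well separated.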