Published in Computational and Systems Neuroscience (CoSyNe), Mar 2022.
The efficient coding hypothesis posits that sensory systems are adapted to the statistics of their inputs, capturing essential structure while minimizing the use of resources (neurons, spikes, etc.). A variety of formulations have been developed, differing primarily in their definition of efficiency. For example, Independent Components Analysis [Bell and Sejnowski, 1997] seeks a complete set of axes along which the data distribution is heavy-tailed. Sparse Coding [Olshausen and Field, 1996] learns a set of basis functions that can sparsely reconstruct natural image patches (i.e., using a small subset of the basis functions). More recent work generalizes this to local low dimensionality, by seeking a spatially adaptive set of axes (as opposed to a subset of a fixed basis) in which the data lie [Hénaff et al., 2015]. For each of these, the efficiency objective by itself is insufficient -- minimization leads to a trivial solution in which all inputs are mapped to zero -- and this type of ``representational collapse'' is typically avoided by imposing a constraint that the signal can be reconstructed from the representation. While this has proven effective from an optimization perspective, there is little evidence to suggest that image reconstruction occurs in biology, or that such complete information preservation is either necessary or desirable. Here, we develop a novel contrastive objective that avoids the need to reconstruct the input from the representation. Specifically, we minimize the dimensionality of encodings of spatially local image patches relative to their global dimensionality (measured across all image patches). We construct the objective as a continuous relaxation of the discrete dimensionality, allowing for gradient-based optimization and a plausible biological implementation.
Although our method does not involve image reconstruction or any other proxy for mutual information between the signal and representation, it is able to generate a rich set of receptive fields that better capture the diversity of tuning properties found in V1 than either Sparse Coding or Independent Components Analysis.
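
As an illustration only, the local-versus-global objective described above might be sketched as follows. The abstract does not specify the relaxation or the encoder, so this sketch assumes a linear encoder and uses the participation ratio, tr(C)^2 / tr(C^2), as one possible continuous stand-in for dimensionality. The names `participation_ratio`, `contrastive_dim_objective`, `patch_groups`, and `weights` are hypothetical.

```python
import numpy as np


def participation_ratio(responses):
    """Soft dimensionality of a set of response vectors (one per row).

    Computes tr(C)^2 / tr(C^2) for the sample covariance C. This is an
    assumed relaxation, not necessarily the one used in the paper.
    """
    centered = responses - responses.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(responses) - 1, 1)
    # Small constant guards against division by zero for constant responses.
    return np.trace(cov) ** 2 / (np.trace(cov @ cov) + 1e-12)


def contrastive_dim_objective(patch_groups, weights):
    """Mean local dimensionality divided by global dimensionality.

    patch_groups: array (n_groups, n_patches, n_pixels); each group holds
                  spatially neighboring patches (assumed data layout).
    weights:      array (n_pixels, n_units); a hypothetical linear encoder.
    """
    responses = patch_groups @ weights  # shape (n_groups, n_patches, n_units)
    local = np.mean([participation_ratio(r) for r in responses])
    pooled = responses.reshape(-1, responses.shape[-1])
    return local / participation_ratio(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: each group lies along its own random line in 8 dimensions,
    # so it is locally one-dimensional but globally spread out.
    directions = rng.normal(size=(50, 1, 8))
    scalars = rng.normal(size=(50, 20, 1))
    groups = scalars * directions
    print(contrastive_dim_objective(groups, np.eye(8)))
```

In this sketch the ratio is unchanged when all responses are rescaled by a common factor, so shrinking the responses toward zero does not lower it; whether the paper's objective avoids collapse in the same way is not stated in the abstract.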