Published in Computational and Systems Neuroscience (CoSyNe), Mar 2023.
Sensory neurons in many organisms and brain areas exhibit a form of local gain control that has been described as ``divisive normalization'' (Carandini & Heeger, 2011), in which the stimulus-driven activity of each neuron is divided by a factor involving the summed activity of a group of neurons. Redundancy reduction has been proposed as a normative explanation of divisive normalization, and previous work has used this principle to learn normalization models (Schwartz & Simoncelli, 2000). Specifically, redundancy in filtered representations of images can be reduced by dividing each response by an estimate of the local standard deviation (the square root of a weighted average of squared responses). Notably, local standard deviation estimation is also used to remove noise with spatially varying variance (``heteroskedastic'' noise) from image signals. Based on this connection, we introduce a joint normalization/denoising model and optimize both the input filters and the normalization weights to minimize estimation error (the squared distance between the input images and the denoised images).
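To make the model structure concrete, the following is a minimal sketch (in JAX) of a joint normalization/denoising objective of the kind described above; it is not the authors' implementation. It assumes flattened image patches, additive Gaussian noise, a linear readout from the normalized responses back to pixel space, and that estimation error is measured against the clean images; the names `normalize`, `W`, `C_raw`, and `R` are illustrative.

```python
# Sketch of a joint normalization/denoising model (illustrative, not the
# authors' code). Filters W, normalization weights C, and readout R are
# all learned by gradient descent on the denoising (squared-error) loss.
import jax
import jax.numpy as jnp

def normalize(W, C_raw, x):
    """Divisive normalization: each filter response is divided by the square
    root of a weighted sum of squared responses (a local standard-deviation
    estimate). Softplus keeps the normalization weights positive."""
    r = x @ W.T                                   # linear filter responses, (batch, K)
    C = jax.nn.softplus(C_raw)                    # positive normalization weights
    denom = jnp.sqrt(1e-3 + (r ** 2) @ C.T)       # small constant for stability
    return r / denom

def loss(params, x_clean, x_noisy):
    """Denoising objective: squared distance between clean images and the
    denoised estimates (here, a linear readout of normalized responses --
    an assumption, since the readout is not specified in the abstract)."""
    W, C_raw, R = params
    y = normalize(W, C_raw, x_noisy)
    x_hat = y @ R                                 # linear readout to pixel space
    return jnp.mean((x_hat - x_clean) ** 2)

# Toy optimization loop over filters, normalization weights, and readout.
key = jax.random.PRNGKey(0)
D, K, B = 64, 32, 256                             # pixels, filters, batch size
kW, kC, kR, kx, kn = jax.random.split(key, 5)
params = (0.1 * jax.random.normal(kW, (K, D)),
          0.1 * jax.random.normal(kC, (K, K)),
          0.1 * jax.random.normal(kR, (K, D)))
x_clean = jax.random.normal(kx, (B, D))           # stand-in for image patches
x_noisy = x_clean + 0.5 * jax.random.normal(kn, (B, D))

grad_fn = jax.jit(jax.grad(loss))
for step in range(200):
    grads = grad_fn(params, x_clean, x_noisy)
    params = tuple(p - 0.05 * g for p, g in zip(params, grads))
```

The sketch only illustrates how both the input filters and the normalization weights can be driven by a single denoising loss; the full model presumably uses a richer readout and noise model.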
We find that the learned input filters are oriented and well-matched to the distribution of receptive field shapes found in macaque V1 (not shown). In addition, the learned normalization weights allow the model to reproduce the variations in surround suppression strength across spatial locations found in V1. Finally, the normalized representation in our model provides high-quality predictions of human perception of image quality. These results show that a denoising objective is capable of driving the learning of a divisive normalization model (including both its input filters and its normalization weights). Given the ubiquity of noise in the brain, we expect these principles to be applicable to other stages of the visual hierarchy, and to other sensory systems.