Self-supervised learning of a visual texture representation for cortical area V2
N Parthasarathy and E P Simoncelli
Published in From Neuroscience to Artificially Intelligent Systems (NAISys), Nov 2020.
In this work, we develop a parametric functional model for V2. The model uses a first stage of oriented linear filters (corresponding to cortical area V1), consisting of both rectified units (simple cells) and pooled phase-invariant units (complex cells). These responses are provided as input to a V2 stage consisting of a set of learned convolutional filters, followed by half-wave rectification and pooling to generate V2 'complex cell' responses.
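To make the architecture concrete, here is a minimal PyTorch sketch of the two-stage cascade. The filter counts, kernel sizes, and pooling windows are illustrative assumptions, not the values used in the paper; the V1 filter bank is assumed to be a fixed set of quadrature-pair (even/odd phase) oriented filters supplied by the caller.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class V1V2Model(nn.Module):
    """Sketch of the model: fixed V1 front end, single learned V2 stage.
    Hyperparameters here are hypothetical placeholders."""

    def __init__(self, v1_filters, n_v2_channels=32, v2_ksize=7, pool=4):
        super().__init__()
        # v1_filters: (n, 1, k, k) oriented filters, arranged as even-phase
        # filters followed by their odd-phase quadrature partners; held fixed.
        self.register_buffer("v1_filters", v1_filters)
        n_v1 = v1_filters.shape[0]
        # V1 output channels = on/off rectified simple cells (2 * n_v1)
        # plus phase-invariant complex cells (n_v1 // 2 quadrature pairs).
        n_v1_out = 2 * n_v1 + n_v1 // 2
        # Learned V2 convolutional filters (the only trained parameters).
        self.v2_conv = nn.Conv2d(n_v1_out, n_v2_channels, v2_ksize, padding="same")
        self.pool = pool

    def forward(self, img):
        # --- V1 stage (fixed) ---
        drive = F.conv2d(img, self.v1_filters, padding="same")
        # Rectified simple cells: separate on/off half-wave responses.
        simple = torch.cat([F.relu(drive), F.relu(-drive)], dim=1)
        # Complex cells: phase-invariant energy from quadrature pairs.
        even, odd = drive.chunk(2, dim=1)
        complex_ = torch.sqrt(even**2 + odd**2 + 1e-8)
        v1 = torch.cat([simple, complex_], dim=1)
        # --- V2 stage (learned) ---
        v2_simple = F.relu(self.v2_conv(v1))             # half-wave rectification
        v2_complex = F.avg_pool2d(v2_simple, self.pool)  # spatial pooling
        return v2_complex
```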
We optimize the filters in the V2 stage over a dataset of homogeneous texture images, using a novel learning objective that aims to separate texture image families in the V2 response space. Rather than use texture class labels as a supervision signal, we develop a more biologically plausible self-supervised objective function, inspired by contrastive learning methods developed in machine learning. The objective aims to maximize the distance between the distribution of V2 responses to each individual image and the distribution of responses across all images. We use this method to learn a single layer of V2 filters, but the layer-wise nature of the objective provides the potential to learn filters in multiple stages of a hierarchical model without requiring backpropagation of gradients across multiple layers.
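The sketch below shows one plausible, Fisher-style instantiation of this objective; the paper's exact loss may differ. It assumes each texture image contributes several V2 response samples (e.g., from spatial crops), and contrasts each image's response distribution against the pooled distribution over all images, normalized by the image's own spread. Because the loss is computed directly on V2 responses, gradients need only flow into the V2 stage, which is what makes the objective layer-local.

```python
import torch

def separation_loss(v2_responses):
    """Hypothetical instantiation of the self-supervised separation objective.

    v2_responses: (n_images, n_samples, d) V2 response vectors, with
    n_samples draws (e.g., spatial crops) per texture image.
    """
    per_image_mean = v2_responses.mean(dim=1)               # (n_images, d)
    global_mean = per_image_mean.mean(dim=0, keepdim=True)  # (1, d)
    # Between-image term: squared distance of each image's mean response
    # from the mean response across all images.
    between = ((per_image_mean - global_mean) ** 2).sum(dim=1)
    # Within-image term: spread of each image's samples about its own mean.
    within = ((v2_responses - per_image_mean.unsqueeze(1)) ** 2).sum(dim=2).mean(dim=1)
    # Maximize the between/within ratio, i.e., minimize its negative log.
    return -(torch.log(between + 1e-8) - torch.log(within + 1e-8)).mean()
```

Since this loss touches only the V2 responses, the same recipe could in principle be reapplied stage by stage to train deeper layers, consistent with the layer-wise training noted above.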
Our trained model successfully captures texture family invariances in a low-dimensional representation, surpassing the texture classification performance of deep supervised networks trained in small-data regimes. Moreover, we show that, relative to deep networks, the learned model exhibits stronger representational similarity to both the texture responses of neural populations (recorded in primate V2) and human texture perception.