Learning predictive neural representations by straightening natural videos

X Niu, C Savin and E P Simoncelli

Published in Computational and Systems Neuroscience (CoSyNe), Mar 2023.

Recent experiments demonstrate that the brain transforms visual inputs into representations that follow straighter temporal trajectories than their initial photoreceptor encoding, facilitating prediction by linear extrapolation (Hénaff et al., 2019, 2021). Can the brain use this principle to learn visual representations? Here, we develop an objective that quantifies straightening, augment it with a regularizer to prevent collapse to trivial solutions, and use it to train deep feedforward neural networks on video sequences. Decoding the learned representation with a separately trained readout network reveals that it preserves the visual information in video frames and supports accurate next-frame prediction. Separate SVM decoders reveal that the representation isolates visual and physical attributes of the videos, including object category, position, shape, and type of motion. When the straightening objective is applied at different levels of the hierarchy, over correspondingly longer temporal scales, the same learning procedure yields hierarchical temporal representations that predict future inputs at multiple time scales. The local, fast representations learned by our model encode and predict fine details of local motion, while the global, slow representations encode visual attributes that persist over longer durations. Overall, our model provides a potential mechanism by which the visual system can partition and represent features at different spatial and temporal resolutions along the visual hierarchy.
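The abstract does not spell out the exact form of the objective. The sketch below is a minimal illustration, assuming the discrete curvature measure of Hénaff et al. (2019) — the angle between successive displacement vectors of the representation trajectory — as the straightening term, and a variance-based penalty (our own illustrative choice, not from the paper) as the anti-collapse regularizer. The function name, tensor shapes, and hyperparameters are assumptions, not the authors' implementation.

```python
import torch

def straightening_loss(z: torch.Tensor, reg_weight: float = 1.0,
                       eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical straightening objective for a trajectory of frame
    representations z, shaped (T, D): T time steps, D feature dims.

    The straightening term is the mean discrete curvature of the
    trajectory (angle between successive displacement vectors, as in
    Hénaff et al 2019); the regularizer penalizes vanishing feature
    variance, which would otherwise let the loss collapse the whole
    trajectory to a single point.
    """
    v = z[1:] - z[:-1]                           # displacements v_t = z_{t+1} - z_t
    v = v / (v.norm(dim=1, keepdim=True) + eps)  # unit-normalize each step
    cos = (v[1:] * v[:-1]).sum(dim=1)            # cosine between successive steps
    curvature = torch.arccos(cos.clamp(-1 + eps, 1 - eps)).mean()

    # Anti-collapse regularizer (illustrative choice, not from the paper):
    # push each feature dimension's std over the sequence toward >= 1.
    collapse_penalty = torch.relu(1.0 - z.std(dim=0)).mean()

    return curvature + reg_weight * collapse_penalty


# Usage sketch: `encoder` is any feedforward network applied per frame.
# frames: (T, C, H, W) video clip; z: (T, D) trajectory in feature space.
# z = encoder(frames)
# loss = straightening_loss(z)
# loss.backward()
```

Minimizing the mean curvature drives the learned trajectory toward a straight line in feature space, which is exactly what makes linear extrapolation a good next-frame predictor; the variance term stands in for whatever collapse-prevention regularizer the authors actually used.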