We present a novel method for self-supervised learning of representations that are equivariant to a set of transformations. We demonstrate that, when trained on images, the learned representations effectively factorize sources of variability in their inputs and provide improved prediction of responses of cells in macaque visual area IT across four different datasets.