Unsupervised learning of image manifolds with mutual information
D A Klindt, J Ballé, J Shlens and E P Simoncelli
Published in From Neuroscience to Artificially Intelligent Systems (NAISys), Nov 2020.
We propose a model layer implementing an overcomplete, convolutional linear expansion of the input signal, followed by divisive normalization (a form of local competition), and then a projection onto a low-dimensional embedding space. A topological organization in the embedding space is learned by maximizing a lower bound on the mutual information between the input and the representation (Oord, Li & Vinyals, 2018, arXiv). We also include a term that maximizes the marginal entropy of the normalized responses. Together, these terms encourage neurons that are equally utilized but have locally sparse responses within the embedding space. When model layers are stacked and the system is optimized end-to-end, it learns a representation of the tangent directions of the data manifold at increasing scales and levels of abstraction. The objective can also be combined with a supervised task loss.
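A minimal NumPy sketch of the two computations described above: divisive normalization of a linear expansion, and the InfoNCE lower bound on mutual information (Oord et al., 2018). The shapes, the normalization pool (all channels at a location), and the temperature are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def divisive_normalization(z, sigma=0.1):
    """Divide each unit's response by the pooled magnitude of all units
    at the same location (pooling scope is an assumption here).
    z: (batch, channels) linear responses."""
    pool = np.sqrt(sigma**2 + np.sum(z**2, axis=1, keepdims=True))
    return z / pool

def info_nce_lower_bound(u, v, temperature=0.1):
    """InfoNCE lower bound on I(u; v) for paired rows of u, v
    (batch, dim). Matched rows are positives; all other pairings
    in the batch act as negatives."""
    logits = (u @ v.T) / temperature                    # (batch, batch)
    m = logits.max(axis=1, keepdims=True)               # stable log-softmax
    log_probs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))
    n = len(u)
    # bound = log N + mean log-softmax score of the matched pairs (<= log N)
    return np.log(n) + np.mean(np.diag(log_probs))

# Toy forward pass: overcomplete expansion -> normalization -> 2-D embedding.
x = rng.normal(size=(8, 16))        # batch of inputs
W = rng.normal(size=(16, 64))       # overcomplete linear expansion (16 -> 64)
P = rng.normal(size=(64, 2))        # projection onto a 2-D embedding space
emb = divisive_normalization(x @ W) @ P
bound = info_nce_lower_bound(emb, emb)   # with identical views, bound <= log N
```

In training, `u` and `v` would come from two views (e.g. input and representation) and the bound would be maximized by gradient ascent; the marginal-entropy term acts on the normalized responses to keep all units in use.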
We train a model with 3 layers on both MNIST and CIFAR10. We fix a two-dimensional embedding space for visualization and, through a series of ablation experiments, demonstrate that: (1) Training with only the classification objective yields somewhat unstructured filters that are not organized in any discernible pattern; (2) Including the mutual information term produces more structured filters, but many of them lie in the same location in the embedding space, and thus do not fully utilize the capacity of the system; and (3) Additionally maximizing the marginal entropy of the normalized responses encourages full use of all neurons, and yields a solution with highly structured filters that approximately uniformly sample the data manifold, with clearly evident continuity of feature attributes. For example, the first layer contains oriented filters laid out topologically, similar to the orientation tuning maps found in primate V1.