Eigen-Distortions of Hierarchical Representations

Deep Neural Networks trained for object recognition

We begin by examining discrimination predictions derived from the deep convolutional network known as VGG16 (Simonyan and Zisserman [2015]). Johnson et al. [2016] trained a neural network to generate super-resolution images using the representation of an intermediate layer of VGG16 as a perceptual loss function, and showed that the images this network produced looked significantly better than images generated with simpler loss functions (e.g., pixel-domain mean squared error). Henaff and Simoncelli [2016] used VGG16 as an image metric to synthesize minimal-length paths (geodesics) between images modified by simple global transformations (rotation, dilation, etc.). They found that a modified version of the network produced geodesics that captured these global transformations well (as measured perceptually), especially in deeper layers. Implicit in both of these studies, and others like them (e.g., Dosovitskiy and Brox [2016]), is the idea that training a deep neural network to recognize objects may also endow it with other human-like perceptual qualities.

Here, we compare VGG16’s sensitivity to distortions directly to human perceptual sensitivity to the same distortions. We computed eigen-distortions of VGG16 at 6 different layers: the rectified convolutional layer immediately prior to the first max-pooling operation (Front), as well as each subsequent layer following a pooling operation (Layer2 through Layer6).
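As a rough illustration of how such eigen-distortions can be obtained, the sketch below approximates the most-noticeable distortion direction for a single VGG16 layer by power iteration on the Fisher information matrix F = JᵀJ, where J is the Jacobian of the layer response with respect to the input image. The layer index, iteration count, and image size are illustrative assumptions, not the exact settings used in our experiments.

import torch
import torchvision.models as models

# Assumption: index 3 of vgg.features is the rectified conv layer just before
# the first max-pooling operation (the "Front" layer described above).
FRONT_LAYER = 3

vgg = models.vgg16(weights="IMAGENET1K_V1").features[:FRONT_LAYER + 1].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def fisher_vector_product(f, x, v):
    # Compute F v = J^T (J v) without forming the Jacobian J explicitly.
    _, jv = torch.autograd.functional.jvp(f, x, v)      # J v
    _, jtjv = torch.autograd.functional.vjp(f, x, jv)   # J^T (J v)
    return jtjv

def max_eigendistortion(f, x, n_iter=50):
    # Power iteration on F to approximate its leading eigenvector,
    # i.e. the most-noticeable distortion direction for image x.
    v = torch.randn_like(x)
    v /= v.norm()
    for _ in range(n_iter):
        v = fisher_vector_product(f, x, v)
        v /= v.norm()
    return v

# Usage (illustrative image size and input range):
x = torch.rand(1, 3, 128, 128)
distortion = max_eigendistortion(lambda im: vgg(im), x)

The least-noticeable eigen-distortion can be approximated in the same spirit, e.g. by running power iteration on a shifted matrix whose leading eigenvector corresponds to the smallest eigenvalue of F.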

Example Distortions
Click on any of the images below to see the most- and least-noticeable eigen-distortions for each of the models we tested.

Parrot

Hats

Bikes

Houses

Boats

Door
