Learning and utilizing a prior for natural images with deep neural networks

Z Kadkhodaie and E P Simoncelli

Published in From Neuroscience to Artificially Intelligent Systems (NAISys), Nov 2020.

Many theories of biological visual representation, as well as applications in artificial vision and image processing, rely on statistical prior models for natural images. Describing the full density of natural images, p(x), is a daunting problem: images are complex, and simple parametric models (e.g., Gaussian) do not capture that complexity. Images are also high-dimensional, so learning a nonparametric model from samples (e.g., by constructing a histogram) is infeasible. How can the visual system (or a vision researcher) learn such a prior by observing images of the natural world?

Recent advances in machine learning have demonstrated impressive solutions to many problems that implicitly rely on prior probability models. For example, deep neural networks (DNNs) can be trained to map noisy images to their clean counterparts ("denoising") by optimizing their parameters over a training set of (noisy, clean) image pairs. Such a network has some implicit knowledge of what a clean image is supposed to look like. In other words, the network implicitly knows what the density of natural images is. How can we access this implicit density model?

The seed of a solution can be found in the statistics literature on Empirical Bayes estimation. In the case of Gaussian noise, the relationship between the denoising mapping and the underlying density was made explicit by Miyasawa (1961). He developed an exact expression for the least-squares estimator for measurements corrupted by Gaussian noise, in which the noisy observation is modified by adding the gradient of the log of the observation density (which is a Gaussian-blurred version of the signal density), scaled by the noise variance. Thus, a denoiser provides direct access to the gradient of the log of its implicit density of noisy images.
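As a small numerical illustration of this relationship, the sketch below checks it in one dimension. It assumes a toy signal density (a two-component Gaussian mixture) for which the least-squares estimator is known in closed form, and compares that estimator with the noisy observation plus the noise variance times the gradient of the log observation density. The mixture parameters and all function names are hypothetical choices made for this example; nothing here comes from the authors' code.

```python
import math

# Toy signal density (an assumption for illustration): a 1-D mixture of two Gaussians.
MEANS = (-2.0, 2.0)
WEIGHTS = (0.3, 0.7)
PRIOR_VAR = 0.5   # variance of each mixture component
NOISE_VAR = 1.0   # variance of the additive Gaussian noise


def gauss_pdf(y, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def observation_density(y):
    """Density of noisy observations: the signal density blurred by the noise."""
    total_var = PRIOR_VAR + NOISE_VAR
    return sum(w * gauss_pdf(y, m, total_var) for w, m in zip(WEIGHTS, MEANS))


def posterior_mean(y):
    """Closed-form least-squares estimate E[x | y] for the toy mixture."""
    total_var = PRIOR_VAR + NOISE_VAR
    resp = [w * gauss_pdf(y, m, total_var) for w, m in zip(WEIGHTS, MEANS)]
    norm = sum(resp)
    shrink = PRIOR_VAR / total_var
    return sum(r / norm * (m + shrink * (y - m)) for r, m in zip(resp, MEANS))


def miyasawa_estimate(y, eps=1e-5):
    """y + noise_var * d/dy log p(y), with the derivative taken by central differences."""
    grad_log = (
        math.log(observation_density(y + eps)) - math.log(observation_density(y - eps))
    ) / (2 * eps)
    return y + NOISE_VAR * grad_log
```

For any observation y, `miyasawa_estimate(y)` and `posterior_mean(y)` should agree up to finite-difference error, which is the sense in which a least-squares denoiser exposes the gradient of the log density of noisy observations.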

To exploit this, we take a current state-of-the-art denoising DNN and develop an iterative procedure using Langevin dynamics to draw samples from p(x). As a form of Turing test, we demonstrate that image patches sampled from this density are difficult to distinguish from patches taken from photographs. We also use a constrained version of this sampling procedure to obtain high-quality solutions for linear inverse problems such as inpainting, deblurring, and super-resolution.
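The idea of sampling with a denoiser can be sketched on a toy problem. The code below is a simplified illustration and not the authors' procedure: it runs plain (unadjusted) Langevin dynamics at a single fixed noise level, so it samples the noise-blurred density and not p(x) itself, and it replaces the trained DNN with a closed-form denoiser for an assumed 1-D Gaussian mixture. The step size, number of steps, mixture parameters, and all names are hypothetical.

```python
import math
import random

# Toy signal density (an assumption for illustration): a 1-D mixture of two Gaussians.
MEANS = (-1.0, 1.0)
WEIGHTS = (0.3, 0.7)
PRIOR_VAR = 0.5    # variance of each mixture component
NOISE_VAR = 0.25   # noise variance the denoiser is matched to


def toy_denoiser(y):
    """Exact least-squares denoiser for the toy mixture (stand-in for a trained DNN)."""
    total_var = PRIOR_VAR + NOISE_VAR
    resp = [w * math.exp(-(y - m) ** 2 / (2 * total_var)) for w, m in zip(WEIGHTS, MEANS)]
    norm = sum(resp)
    shrink = PRIOR_VAR / total_var
    return sum(r / norm * (m + shrink * (y - m)) for r, m in zip(resp, MEANS))


def score_from_denoiser(y):
    """Gradient of the log noisy density, read off from the denoiser residual."""
    return (toy_denoiser(y) - y) / NOISE_VAR


def langevin_sample(rng, n_steps=300, step=0.05):
    """Run one unadjusted Langevin chain from a random starting point."""
    y = rng.gauss(0.0, 2.0)
    for _ in range(n_steps):
        # Drift along the score, plus injected Gaussian noise.
        y += 0.5 * step * score_from_denoiser(y) + math.sqrt(step) * rng.gauss(0.0, 1.0)
    return y


def draw_samples(n_samples=1000, seed=0):
    """Draw independent samples, one Langevin chain per sample."""
    rng = random.Random(seed)
    return [langevin_sample(rng) for _ in range(n_samples)]
```

In this toy setting the samples should be distributed approximately as the mixture blurred by the noise, whose mean is 0.3 * (-1) + 0.7 * 1 = 0.4. Approaching the clean signal density would additionally require reducing the noise level over the course of the iterations.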

