Natural Image Densities: Learning, Understanding and Utilizing
Zahra Kadkhodaie. PhD thesis, Sep 2024.
Deep neural networks have provided state-of-the-art solutions for problems such as denoising, which implicitly rely on a prior probability model of natural images. Here, we first develop a robust and general methodology for extracting this prior. We rely on a statistical result due to Tweedie (1956) and Miyasawa (1961), who showed that the least-squares solution for removing additive Gaussian noise can be written directly in terms of the gradient of the log of the noisy signal density. We use this fact to develop a stochastic coarse-to-fine gradient ascent procedure for drawing high-probability samples from the implicit prior embedded within a neural network trained to perform blind (i.e., unknown noise level) least-squares denoising. This algorithm is similar to the score-based diffusion framework, yet differs from it in several ways.
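As an illustration of this procedure, the following is a minimal sketch in PyTorch, assuming a pretrained blind least-squares denoiser `denoiser` (hypothetical here). The Miyasawa relation implies that the denoiser residual equals σ² times the score of the noisy density, so repeatedly stepping along the residual while injecting shrinking amounts of fresh noise performs a stochastic coarse-to-fine ascent. The step size `h`, the parameter `beta`, and the noise schedule below are illustrative choices, not the exact settings used in the thesis.

```python
import torch

def coarse_to_fine_sample(denoiser, shape, sigma0=1.0, sigma_end=0.01,
                          h=0.05, beta=0.5):
    """Stochastic coarse-to-fine gradient ascent on log p(y).

    Miyasawa/Tweedie: denoiser(y) - y = sigma^2 * grad_y log p(y),
    so the denoiser residual points toward higher-probability images.
    Illustrative sketch; the schedule and step sizes are assumptions.
    """
    y = torch.randn(shape) * sigma0                  # start from a coarse, high-noise image
    sigma = sigma0
    while sigma > sigma_end:
        with torch.no_grad():
            residual = denoiser(y) - y               # proportional to sigma^2 * score
        sigma = residual.pow(2).mean().sqrt().item() # estimate of the effective noise level
        # Partial step toward the denoised image ...
        step = h * residual
        # ... plus fresh noise whose amplitude shrinks with sigma,
        # so the effective noise level is reduced gradually (coarse to fine).
        gamma = ((1 - beta * h) ** 2 - (1 - h) ** 2) ** 0.5 * sigma
        y = y + step + gamma * torch.randn_like(y)
    return denoiser(y)                               # final denoising step

# Usage (with a trained blind denoiser, e.g. a bias-free CNN):
# sample = coarse_to_fine_sample(trained_denoiser, shape=(1, 1, 64, 64))
```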
Unlike in the classical framework, we do not have direct access to the learned density, which raises a crucial question: what is the prior? The rest of the thesis focuses on understanding and using this prior.
At the core of our coarse-to-fine gradient ascent sampling algorithm is a deep neural network (DNN) denoiser. Despite their success, the mechanisms of DNN denoisers, and more importantly the priors they learn, remain poorly understood. To make the DNN denoiser interpretable, we remove all network biases (i.e., additive constants), which forces the denoising mapping to be locally linear. This architecture lends itself to local linear-algebraic analysis through the Jacobian of the denoising map, which provides a high degree of interpretability. A desirable side effect of locally linear models is that they generalize automatically across noise levels.
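The sketch below illustrates this local linear-algebraic analysis on a toy bias-free CNN (a hypothetical stand-in, not the thesis architecture): because the network contains no additive constants, the denoising map satisfies D(y) = A(y) y exactly within a linear region, where A(y) is the Jacobian at y, and the singular vectors of A(y) expose the adaptive basis in which the denoiser shrinks its input.

```python
import torch
import torch.nn as nn

# Hypothetical bias-free CNN denoiser: no additive constants anywhere
# (bias=False in every conv), so the input-output map is locally linear.
class BiasFreeDenoiser(nn.Module):
    def __init__(self, channels=16, layers=4):
        super().__init__()
        blocks, c_in = [], 1
        for _ in range(layers - 1):
            blocks += [nn.Conv2d(c_in, channels, 3, padding=1, bias=False), nn.ReLU()]
            c_in = channels
        blocks += [nn.Conv2d(c_in, 1, 3, padding=1, bias=False)]
        self.net = nn.Sequential(*blocks)

    def forward(self, y):
        return self.net(y)

def denoiser_jacobian(model, y):
    """Jacobian of the denoising map at noisy image y (flattened pixels)."""
    flat = lambda x: model(x.view_as(y)).flatten()
    return torch.autograd.functional.jacobian(flat, y.flatten())

model = BiasFreeDenoiser()
y = torch.randn(1, 1, 16, 16)          # small image keeps the Jacobian tractable
J = denoiser_jacobian(model, y)        # (256, 256) local linear operator
# For a bias-free net, D(y) == J @ y (up to numerical precision), and the
# SVD of J reveals the signal-adaptive basis and shrinkage factors.
U, S, Vt = torch.linalg.svd(J)
print(torch.allclose(model(y).flatten(), J @ y.flatten(), atol=1e-4))
```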
Next, we study the continuity of the implicit image prior. We design an experiment to investigate whether the prior interpolates between the training examples or consists of a discrete set of delta functions corresponding to a memorized set of training examples. We find that for small datasets, the latter is the case. But with large enough datasets, the network generalizes beyond the training examples, as evidenced by high-quality novel generated samples. Surprisingly, we observe that, for large enough datasets, two models trained on non-overlapping subsets of a dataset learn nearly the same density. We analyze the learned denoising functions and show that the network's inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases.
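The following sketch, using placeholder tensors, illustrates the two diagnostics this experiment relies on: the distance from a generated sample to its nearest training example (a test for memorization), and the agreement between samples produced from the same initial noise by two models trained on disjoint halves of the data (a test for learning the same density). The tensor shapes and data below are illustrative only.

```python
import torch

def nearest_train_distance(sample, train_images):
    """Distance from a generated sample to its closest training example.
    A memorizing model yields samples that nearly coincide with one of its
    training images; a generalizing model yields genuinely novel samples."""
    diffs = train_images.flatten(1) - sample.flatten()
    d = diffs.pow(2).sum(dim=1).sqrt()
    return d.min(), d.argmin()

def sample_agreement(sample_a, sample_b):
    """Cosine similarity between samples drawn from the same initial noise
    by two models trained on non-overlapping halves of the dataset.
    Values near 1 indicate the models learned nearly the same density."""
    return torch.nn.functional.cosine_similarity(
        sample_a.flatten(), sample_b.flatten(), dim=0)

# Illustrative usage with placeholder tensors (real samples would come from
# coarse_to_fine_sample run with two denoisers and a shared initial noise):
train_images = torch.rand(1000, 1, 40, 40)
sample = torch.rand(1, 1, 40, 40)
print(nearest_train_distance(sample, train_images))
print(sample_agreement(sample, torch.rand(1, 1, 40, 40)))
```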
Having established that a DNN denoiser can generalize, we employ the learned image density to study the question of the low-dimensionality of image priors. The goal is to exploit image properties to factorize the density into low-dimensional densities, thereby reducing the number of parameters and training examples. To this end, we develop a low-dimensional probability model for images decomposed into multi-scale wavelet sub-bands. The image probability distribution is factorized as a product of conditional probabilities of its wavelet coefficients conditioned on coarser-scale coefficients. We assume that these conditional probabilities are local and stationary, and hence can be captured with low-dimensional Markov models. Each conditional score can thus be estimated with a conditional CNN (cCNN) with a small receptive field (RF). The effective size of the Markov neighborhoods (i.e., their size relative to the grid size) grows from fine to coarse scales. The score of the coarse-scale low-pass band (a low-resolution version of the image) is modeled using a CNN with a global RF, enabling representation of large-scale image structures and organization. We evaluate our model and show that the locality and stationarity assumptions hold for conditional RF sizes as small as 9 × 9 without harming performance. Thus, high-dimensional score estimation for images can be reduced to low-dimensional Markov conditional models, alleviating the curse of dimensionality.
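A minimal sketch of the two ingredients is given below: a one-level orthonormal Haar split of the image into a coarse band and three detail bands, and a conditional score network whose four 3 × 3 convolutional layers yield a 9 × 9 receptive field. The architecture is illustrative, not the exact cCNN used in the thesis.

```python
import torch
import torch.nn as nn

class ConditionalScoreCNN(nn.Module):
    """Small-receptive-field CNN estimating the score of detail wavelet
    coefficients conditioned on the coarser-scale band. Four 3x3 conv
    layers give a 9x9 receptive field, matching the locality assumption.
    (Illustrative architecture, assumed here for the sketch.)"""
    def __init__(self, detail_channels=3, cond_channels=1, width=64):
        super().__init__()
        c_in = detail_channels + cond_channels      # noisy details + conditioning band
        self.net = nn.Sequential(
            nn.Conv2d(c_in, width, 3, padding=1, bias=False), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1, bias=False), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1, bias=False), nn.ReLU(),
            nn.Conv2d(width, detail_channels, 3, padding=1, bias=False),
        )

    def forward(self, noisy_details, coarse_band):
        return self.net(torch.cat([noisy_details, coarse_band], dim=1))

def haar_split(x):
    """One level of an orthonormal Haar transform: returns the low-pass band
    and the three detail bands (horizontal, vertical, diagonal)."""
    a, b = x[..., 0::2, :], x[..., 1::2, :]
    lo, hi = (a + b) / 2 ** 0.5, (a - b) / 2 ** 0.5
    ll = (lo[..., :, 0::2] + lo[..., :, 1::2]) / 2 ** 0.5
    lh = (lo[..., :, 0::2] - lo[..., :, 1::2]) / 2 ** 0.5
    hl = (hi[..., :, 0::2] + hi[..., :, 1::2]) / 2 ** 0.5
    hh = (hi[..., :, 0::2] - hi[..., :, 1::2]) / 2 ** 0.5
    return ll, torch.cat([lh, hl, hh], dim=1)

x = torch.rand(1, 1, 64, 64)
coarse, details = haar_split(x)
score_net = ConditionalScoreCNN()
print(score_net(details + 0.1 * torch.randn_like(details), coarse).shape)
```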
Finally, we put the denoiser prior to use. A generalization of the coarse-to-fine gradient ascent sampling algorithm to constrained sampling provides a method for using the implicit prior to solve any linear inverse problem, with no additional training. We demonstrate the generality of the algorithm by using it to produce high-quality solutions in multiple applications, such as deblurring, colorization, compressive sensing, and super-resolution.
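A minimal sketch of this constrained sampler is shown below, assuming the measurement operator is given as a pair of functions `M` (measurements to image space) and `Mt` (its adjoint) such that `M(Mt(·))` is an orthogonal projection; the step sizes and noise schedule are illustrative, as in the unconstrained sketch above.

```python
import torch

def constrained_sample(denoiser, M, Mt, x_c, shape, sigma0=1.0, sigma_end=0.01,
                       h=0.05, beta=0.5):
    """Draw a sample from the implicit prior subject to linear measurements
    x_c = Mt(x). M maps measurements back to image space, Mt is its adjoint,
    and M(Mt(.)) is assumed to be an orthogonal projection (e.g. known pixels
    for inpainting, a low-pass band for super-resolution). Minimal sketch;
    step sizes and the noise schedule are illustrative choices."""
    y = torch.randn(shape) * sigma0 + M(x_c)       # noise consistent with the measurements
    sigma = sigma0
    while sigma > sigma_end:
        with torch.no_grad():
            f = denoiser(y) - y                    # residual ~ sigma^2 * grad log p(y)
        sigma = f.pow(2).mean().sqrt().item()
        # Prior term acts in the unmeasured subspace; the measured subspace
        # is pulled toward consistency with the observations.
        d = f - M(Mt(f)) + M(x_c - Mt(y))
        gamma = ((1 - beta * h) ** 2 - (1 - h) ** 2) ** 0.5 * sigma
        y = y + h * d + gamma * torch.randn_like(y)
    return denoiser(y)

# Example: inpainting, where Mt selects the observed pixels via a binary mask
# (trained_denoiser and clean_image are hypothetical):
# mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
# M, Mt = (lambda v: v * mask), (lambda v: v * mask)
# restored = constrained_sample(trained_denoiser, M, Mt, mask * clean_image,
#                               shape=(1, 1, 64, 64))
```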