Hierarchically normalized models of visual distortion sensitivity: Physiology, perception, and application

Berardino, Alexander

Alexander Berardino.

PhD thesis, May 2018.

How does the visual system determine when changes to an image are unnatural (image distortions), how does it weight different types of distortions, and where in the brain are these computations carried out? These questions have plagued neuroscientists, psychologists, and engineers alike for several decades. Different academic communities have approached the problem from different directions, with varying degrees of success, but all agree that the answers would be valuable. Models that accurately capture human sensitivity to image distortions can serve as a stand-in for human observers when optimizing any algorithm in which fidelity to human perception is necessary (e.g., image and video compression).

In this thesis, we approach the problem by building models that are informed and constrained by both visual physiology and the statistics of natural images, and we train them to match human psychophysical judgments about image distortions. We then develop a novel synthesis method that forces the models to make testable predictions, and we quantify the quality of those predictions with human psychophysics. Because our approach links physiology and perception, it allows us to pinpoint which elements of physiology are necessary to capture human sensitivity to image distortions. We consider several different models of the visual system, some developed from known neural physiology and some inspired by recent breakthroughs in artificial intelligence (deep neural networks trained to recognize objects within images at human performance levels). We show that models inspired by early stages of the visual system (the retina and the lateral geniculate nucleus, LGN) consistently capture human sensitivity to image distortions better than both the state of the art and competing models of the visual system. We argue that divisive normalization, a ubiquitous computation in the visual system, is integral to correctly capturing human sensitivity.
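
To make the idea of divisive normalization concrete, here is a minimal sketch in plain Python. It is not the model from the thesis: the function name, the one-dimensional signal, the absolute-value pooling, and the parameters `sigma` and `radius` are all illustrative assumptions. It only shows the generic form of the computation, in which each response is divided by a constant plus a pooled measure of nearby activity.

```python
def divisive_normalization(responses, sigma=0.1, radius=1):
    """Divide each response by a constant plus the local mean activity.

    Hypothetical illustration: sigma and radius are assumed values,
    not parameters taken from the thesis.
    """
    n = len(responses)
    normalized = []
    for i in range(n):
        # Pool absolute responses in a window around position i,
        # clipping the window at the ends of the signal.
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = responses[lo:hi]
        pool = sum(abs(r) for r in window) / len(window)
        # sigma > 0 keeps the denominator strictly positive.
        normalized.append(responses[i] / (sigma + pool))
    return normalized


low_contrast = [0.0, 0.1, -0.1, 0.1, 0.0]
high_contrast = [10 * r for r in low_contrast]

print(divisive_normalization(low_contrast))
print(divisive_normalization(high_contrast))
```

In this toy setting, multiplying the input contrast by ten raises the normalized responses by much less than a factor of ten, because the pooled activity in the denominator grows along with the input. This kind of local gain control is the property the argument above attributes to divisive normalization.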

After establishing that our models of the retina and the LGN outperform all other tested models, we develop a novel framework for optimally rendering images on any display for human observers. We show that a model of this kind can be used as a stand-in for human observers within this optimization framework, and that it produces images that are better than those produced by other state-of-the-art algorithms. We also show that the other tested models fail as stand-ins for human observers within this framework.

Finally, we propose and test a normative framework for thinking about human sensitivity to image distortions. In this framework, we hypothesize that the human visual system decomposes changes to an image into structural changes (those that change the identity of objects and scenes) and non-structural changes (those that preserve object and scene identity), and weights the two differently. We test human sensitivity to distortions that fall into each of these categories, and use these data to identify potential weaknesses of our model that can be addressed in future work.
