Hierarchically normalized models of visual distortion sensitivity: Physiology, perception, and application
Alexander Berardino.
PhD thesis, May 2018.
In this thesis, we approach the problem by building models that are informed and constrained by both visual physiology and the statistics of natural images, and by training them to match human psychophysical judgments about image distortions. We then develop a novel synthesis method that forces the models to make testable predictions, and we quantify the quality of those predictions with human psychophysics. Because our approach links physiology and perception, it allows us to pinpoint which elements of physiology are necessary to capture human sensitivity to image distortions. We consider several models of the visual system, some developed from known neural physiology and some inspired by recent breakthroughs in artificial intelligence (deep neural networks trained to recognize objects within images at human performance levels). We show that models inspired by early brain areas (retina and LGN) consistently capture human sensitivity to image distortions better than both the state of the art and competing models of the visual system. We argue that divisive normalization, a ubiquitous computation in the visual system, is integral to correctly capturing human sensitivity.
After establishing that our models of the retina and the LGN outperform all other tested models, we develop a novel framework for optimally rendering images on any display for human observers. We show that a model of this kind can serve as a stand-in for human observers within this optimization framework, and that it produces images that are better than those produced by other state-of-the-art algorithms. We also show that the other tested models fail as stand-ins for human observers within this framework.
Finally, we propose and test a normative framework for thinking about human sensitivity to image distortions. In this framework, we hypothesize that the human visual system decomposes changes to an image into structural changes (those that alter the identity of objects and scenes) and non-structural changes (those that preserve object and scene identity), and weights the two kinds of change differently. We test human sensitivity to distortions that fall into each of these categories, and use these data to identify potential weaknesses of our model that can be addressed in future work.