Sound texture perception via statistics of peripheral auditory representations

J H McDermott and E P Simoncelli

Published in 34th MidWinter Meeting, Assoc. for Research in Otolaryngology, Feb 2011.

This paper has been superseded by:
Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis
J H McDermott and E P Simoncelli.
Neuron, vol. 71(5), pp. 926--940, Sep 2011.


The sounds of rainstorms, fires, swarms of insects, and galloping horses result from the superposition of many acoustic events. A defining characteristic of these "auditory textures" is stationarity, and this reduction in complexity makes them a useful starting point for understanding sound representation. We have previously proposed (McDermott, Oxenham, & Simoncelli, 2009) that the auditory system encodes and recognizes sound textures using statistics -- time averages of the simple acoustic measurements made in the early auditory system. We have explored this hypothesis with synthesis algorithms, on the grounds that statistics responsible for perception should be sufficient to synthesize realistic-sounding signals. Here we extend our statistical model to be compatible with the known structure of the auditory system, and test the role of different statistics and representational properties with experiments on human listeners.

Natural sounds were processed with a cascade of two filter banks, representing cochlear channels and modulation frequency bands. We measured marginal moments and pairwise correlations of these filter responses, capturing spectral structure, temporal structure, and sparsity. Our synthesis algorithm then imposed these statistics on samples of noise. Although the statistics in our model were not hand-tuned to specific natural sounds, their imposition produced compelling synthetic examples of a large set of real-world sound textures. Omitting any individual class of statistics audibly impaired the results. Moreover, sounds synthesized using filters qualitatively distinct from those in the auditory system generally did not resemble their real-world counterparts, indicating that successful synthesis depends on, and reflects, the use of a biologically plausible representation. The results show that simple statistics can underlie sound texture percepts, and illustrate how sound textures and their synthesis can serve as engines for investigations of audition.
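
The measurement step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes NumPy, replaces the cochlear filter bank with ideal rectangular FFT-domain bands, omits the second (modulation) filter bank entirely, and uses hypothetical function names and band edges chosen only for illustration. It shows what "marginal moments" and "pairwise correlations" of subband envelopes look like in code.

```python
import numpy as np


def bandpass_envelopes(signal, sample_rate, band_edges):
    """Return the amplitude envelope of each frequency band.

    Assumption: ideal rectangular bands stand in for cochlear filters.
    Each band must lie strictly between 0 Hz and the Nyquist frequency.
    """
    n = len(signal)
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)
    envelopes = []
    for low, high in band_edges:
        # Keep only the positive frequencies in the band and double them,
        # so the inverse FFT is the analytic signal of the band-limited sound.
        mask = (freqs >= low) & (freqs < high)
        analytic = np.fft.ifft(2.0 * spectrum * mask)
        envelopes.append(np.abs(analytic))
    return np.array(envelopes)


def marginal_moments(envelopes):
    """Time-averaged mean, variance, skewness, and kurtosis per band."""
    mean = envelopes.mean(axis=1)
    centered = envelopes - mean[:, None]
    var = (centered ** 2).mean(axis=1)
    skew = (centered ** 3).mean(axis=1) / var ** 1.5
    kurt = (centered ** 4).mean(axis=1) / var ** 2
    return mean, var, skew, kurt


def pairwise_correlations(envelopes):
    """Correlation coefficient between every pair of band envelopes."""
    return np.corrcoef(envelopes)


# Hypothetical input: one second of Gaussian noise at 16 kHz.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)
bands = [(100.0, 400.0), (400.0, 1600.0), (1600.0, 6400.0)]

env = bandpass_envelopes(noise, 16000, bands)
moments = marginal_moments(env)
corr = pairwise_correlations(env)
```

A synthesis procedure in the spirit of the abstract would measure these quantities on a natural recording and then adjust a noise signal until its statistics match. That adjustment step is not shown here.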

