Sound texture perception via statistics of peripheral auditory representations
J H McDermott and E P Simoncelli
Published in Computational and Systems Neuroscience (CoSyNe), (III-95), Feb 2011.
This paper has been superseded by:
Natural sounds were processed with a cascade of two filter banks, representing cochlear channels and modulation bands computed from their compressed envelopes. We measured marginal moments and pair-wise correlations of these filter responses, capturing spectral and temporal structure, and sparsity. Our synthesis algorithm then imposed these statistics on samples of noise. Although the statistics in our model were not hand-tuned to specific natural sounds, their imposition produced compelling synthetic examples of a large set of real-world sound textures (available at http://www.cns.nyu.edu/~jhm/texture_examples/). Omitting any individual class of statistic audibly impaired the results. Moreover, sounds synthesized using representations qualitatively distinct from those in the auditory system (linear- instead of log-spaced filter banks, or without cochlear compression) generally did not resemble their real-world counterparts, indicating that successful synthesis depends on, and reflects, the use of a biologically plausible representation. The results show that relatively simple statistics can underlie sound texture percepts, and illustrate how sound textures and their synthesis can serve as engines for the investigation of audition.
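To make the measurement stage concrete, here is a minimal Python sketch (not the authors' code) of the kinds of statistics described above: marginal moments and pair-wise correlations of compressed cochlear envelopes, plus power in modulation bands computed from those envelopes. The filter spacing, compression exponent, envelope sampling rate, and choice of moments are illustrative assumptions, not the published parameters.

```python
# Illustrative sketch of texture statistics: compressed cochlear envelopes,
# their marginal moments and correlations, and modulation-band power.
# All parameter values below are assumptions for demonstration only.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def bandpass(x, lo, hi, fs, order=2):
    """Band-pass filter a 1-D signal between lo and hi Hz."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def texture_statistics(sound, fs, coch_edges, mod_edges,
                       compression=0.3, env_fs=400):
    """Marginal moments and pair-wise correlations of compressed
    'cochlear' envelopes, plus power in modulation bands."""
    # Stage 1: cochlear-like band-pass filter bank, envelope, compression.
    envs = []
    for lo, hi in coch_edges:
        sub = bandpass(sound, lo, hi, fs)
        env = np.abs(hilbert(sub)) ** compression      # compressed envelope
        envs.append(resample_poly(env, env_fs, fs))    # downsample envelope
    envs = np.array(envs)                              # (n_coch, n_env_samples)

    # Marginal moments of each envelope (mean, variance, skew, kurtosis);
    # the higher moments capture the sparsity of natural textures.
    mu = envs.mean(axis=1)
    cen = envs - mu[:, None]
    var = (cen ** 2).mean(axis=1)
    skew = (cen ** 3).mean(axis=1) / var ** 1.5
    kurt = (cen ** 4).mean(axis=1) / var ** 2

    # Pair-wise correlations between cochlear envelopes (spectral structure).
    env_corr = np.corrcoef(envs)

    # Stage 2: modulation filter bank applied to each envelope
    # (temporal structure), summarized here by the power in each band.
    mod_power = np.array([[np.var(bandpass(e, lo, hi, env_fs))
                           for lo, hi in mod_edges] for e in envs])

    return {"env_moments": np.stack([mu, var, skew, kurt], axis=1),
            "env_corr": env_corr,
            "mod_power": mod_power}

if __name__ == "__main__":
    fs = 20000
    noise = np.random.randn(fs * 2)  # 2 s of white noise as a stand-in signal
    # Log-spaced (octave-wide) bands, mimicking a biologically plausible spacing.
    coch_edges = [(100 * 2 ** i, 100 * 2 ** (i + 1)) for i in range(6)]
    mod_edges = [(0.5 * 2 ** i, 0.5 * 2 ** (i + 1)) for i in range(6)]
    stats = texture_statistics(noise, fs, coch_edges, mod_edges)
    print(stats["env_moments"].shape, stats["env_corr"].shape,
          stats["mod_power"].shape)
```

In the approach described above, synthesis would then iteratively adjust a noise signal until its statistics match those measured from the target texture; the sketch covers only the measurement side.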