Sound texture perception via synthesis

J H McDermott, E P Simoncelli and A J Oxenham.

Published in Annual Meeting, ARO, Feb 2008.

This paper has been superseded by:
Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis
J H McDermott and E P Simoncelli.
Neuron, vol. 71(5), pp. 926-940, Sep 2011.


Many natural sounds, such as those produced by rainstorms, fires, insects at night, or birds in a forest, are the result of large numbers of superimposed acoustic events occurring rapidly and randomly. Such "sound textures" are temporally homogeneous, and in many cases their character does not depend much on the precise arrangement of the component events, suggesting that they might be represented statistically. To test this idea and to explore the statistics that might characterize natural sound textures, we designed an algorithm that synthesizes sound textures from statistics extracted from real sounds. The algorithm is inspired by those used to synthesize visual textures: a set of statistical measurements taken from a real sound is imposed on a sample of noise. This process is iterated and, over successive iterations, converges to a sound that obeys the chosen constraints. If the statistics capture the perceptually important properties of the texture in question, the synthesized result ought to sound like the original.
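
The "impose statistics on noise, then iterate" loop can be illustrated with a minimal sketch, assuming NumPy. It is not the authors' algorithm: it swaps in a much simpler pair of constraints (the whole-signal Fourier magnitudes and the amplitude histogram), alternately imposed on Gaussian noise in the style of IAAFT (iterative amplitude-adjusted Fourier transform) surrogate generation. The function name `synthesize_texture`, the iteration count, and the click-train test signal are hypothetical choices made for illustration.

```python
import numpy as np

def synthesize_texture(target, n_iter=50, seed=0):
    """Alternately impose the target's Fourier magnitudes and its amplitude
    histogram on Gaussian noise (IAAFT-style iteration, not the paper's model)."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    target_mag = np.abs(np.fft.rfft(target))  # spectral constraint
    target_sorted = np.sort(target)            # marginal (histogram) constraint

    x = rng.standard_normal(target.size)       # start from a sample of noise
    for _ in range(n_iter):
        # Impose the target power spectrum while keeping the current phases.
        phases = np.angle(np.fft.rfft(x))
        x = np.fft.irfft(target_mag * np.exp(1j * phases), n=target.size)
        # Impose the target histogram: the k-th smallest sample of x is
        # replaced by the k-th smallest target value.
        ranks = np.argsort(np.argsort(x))
        x = target_sorted[ranks]
    return x

# Hypothetical test signal: a train of random clicks, smoothed so that its
# spectrum is not flat.
rng = np.random.default_rng(1)
clicks = (rng.random(4096) < 0.05) * rng.standard_normal(4096)
original = np.convolve(clicks, np.hanning(32), mode="same")
synth = synthesize_texture(original)
```

Because the histogram step runs last, `synth` matches the original's amplitude distribution exactly and its spectrum only approximately; richer constraint sets (per-band and envelope statistics) would replace these two steps in a model closer to the one described here.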

We tested whether rudimentary statistics computed from the responses of a bank of bandpass filters could produce compelling synthetic textures. Simply matching the marginal statistics (variance, kurtosis) of individual filter responses was generally insufficient to yield good results, but imposing various joint envelope statistics (cross-band correlations, autocorrelations within each band, and cross-band correlations across time) greatly improved the results, frequently producing synthetic textures that sounded natural and recognizable. Synthesizing some classes of textures may require complex "feature detectors", but in many cases, textures with distinct audible features (raindrops, crackles, insect or bird calls) emerge from the imposition of much simpler statistical constraints. The results suggest that the auditory system may rely on surprisingly simple statistics to recognize real-world sound textures.
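
The statistics listed above can be computed roughly as follows. This is a sketch under stated assumptions, using NumPy only: the filter bank here is a set of ideal (brick-wall) FFT-domain bandpass filters rather than a cochlear-like bank, the band edges and lags are hypothetical, envelopes are taken as the magnitude of the analytic signal without downsampling or compression, and "cross-band correlations across time" is interpreted as the correlation between one band's envelope and another band's envelope delayed by a fixed lag. All function names are illustrative.

```python
import numpy as np

def bandpass_bank(x, edges_hz, sr):
    """Split x into adjacent bands with ideal FFT-domain bandpass filters."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sr)
    bands = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=x.size))
    return np.array(bands)

def envelope(band):
    """Amplitude envelope: magnitude of the FFT-based analytic signal."""
    n = band.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(band) * h))

def texture_stats(x, sr, edges_hz, lags=(40, 160, 640)):
    """Marginal band statistics plus joint envelope statistics (lags in samples)."""
    bands = bandpass_bank(np.asarray(x, dtype=float), edges_hz, sr)
    env = np.array([envelope(b) for b in bands])
    nb = len(bands)

    # Marginal statistics of each filter response.
    centered = bands - bands.mean(axis=1, keepdims=True)
    var = (centered ** 2).mean(axis=1)
    kurt = (centered ** 4).mean(axis=1) / var ** 2

    # Zero-lag correlations between the envelopes of different bands.
    cross = np.corrcoef(env)
    # Correlation of each envelope with a delayed copy of itself.
    auto = np.array([[np.corrcoef(e[:-L], e[L:])[0, 1] for L in lags] for e in env])
    # lagged[k, i, j]: band i's envelope vs. band j's envelope lags[k] samples later.
    lagged = np.array([np.corrcoef(np.vstack([env[:, :-L], env[:, L:]]))[:nb, nb:]
                       for L in lags])
    return {"variance": var, "kurtosis": kurt, "env_cross_corr": cross,
            "env_autocorr": auto, "env_lagged_cross_corr": lagged}

# Usage on 1 s of Gaussian noise with three hypothetical bands.
rng = np.random.default_rng(0)
sr = 16000
noise = rng.standard_normal(sr)
stats = texture_stats(noise, sr, edges_hz=[100, 400, 1600, 6400])
```

For Gaussian noise each band's kurtosis should come out near 3 and the envelope correlations near 0; sparse, crackle-like sounds would instead show higher kurtosis and stronger cross-band envelope correlations, which is the kind of structure these statistics are meant to capture.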

[Supported by NIH grant R01DC07657 and the Howard Hughes Medical Institute].



  • Visual texture model: Portilla99