Sound texture perception via synthesis
J H McDermott, A Oxenham and E P Simoncelli
Published in Computational and Systems Neuroscience (CoSyNe), (I-74), Feb 2010.
DOI: 10.3389/conf.fnins.2010.03.00122
This paper has been superseded by:
We found that simply matching the marginal statistics (variance, skew, kurtosis) of individual filter responses and their envelopes was generally insufficient to yield perceptually satisfactory results; it produced compelling synthetic examples only for certain water sounds. We observed that many sound textures contained structure in frequency and time, evident in pairwise envelope correlations (between different subbands, and between different time points within each band). Imposing these envelope correlations greatly improved the results, frequently producing synthetic textures that sounded natural and that listeners could reliably recognize. Sounds successfully synthesized in this way included bubbling water, thunder, insect, frog, and bird choruses, applause, running animals, and frying eggs, among many others.
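The two families of statistics described above (per-band marginal moments, and envelope correlations across bands and across time within a band) can be illustrated with a short sketch. This is a hypothetical illustration, not the authors' implementation: the abstract does not specify the filter bank, how envelopes are extracted, or which time lags were used. The function names below (`marginal_moments`, `correlation`, `cross_band_correlations`, `envelope_autocorrelation`) are hypothetical, the envelopes are assumed to be already computed, and plain sample moments and Pearson correlations are used as stand-ins.

```python
import math
from typing import Dict, List, Sequence


def marginal_moments(x: Sequence[float]) -> Dict[str, float]:
    """Sample mean, variance, skewness and kurtosis of one signal,
    e.g. a subband filter response or its envelope (assumed precomputed)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n
    if var == 0.0:
        raise ValueError("skew and kurtosis are undefined for a constant signal")
    skew = sum(d ** 3 for d in dev) / n / var ** 1.5
    # Non-excess kurtosis: a Gaussian signal gives a value near 3.
    kurtosis = sum(d ** 4 for d in dev) / n / var ** 2
    return {"mean": mean, "variance": var, "skew": skew, "kurtosis": kurtosis}


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have equal length")
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    if va == 0.0 or vb == 0.0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(va * vb)


def cross_band_correlations(envelopes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Matrix of pairwise correlations between subband envelopes
    (structure across frequency)."""
    k = len(envelopes)
    return [[correlation(envelopes[i], envelopes[j]) for j in range(k)]
            for i in range(k)]


def envelope_autocorrelation(env: Sequence[float], lags: Sequence[int]) -> List[float]:
    """Correlation of one envelope with a delayed copy of itself at each lag
    (structure across time within a band), using only the overlapping samples."""
    n = len(env)
    return [correlation(env[: n - lag], env[lag:]) for lag in lags]


if __name__ == "__main__":
    # Toy "envelopes" for three hypothetical subbands.
    bands = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
    print(marginal_moments(bands[0]))
    print(cross_band_correlations(bands))
    print(envelope_autocorrelation([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], [0, 1, 2]))
```

These functions only measure statistics. Synthesis would additionally require adjusting a signal until its measurements match those of the original recording; that step is not sketched here.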
Despite these successes, there were cases in which the synthetic sounds were noticeably different from the corresponding original, even though they had the same marginal statistics and envelope correlations. Failures included sounds with abrupt broadband onsets, harmonic structure with varying pitch, or strong reverberation. These failures indicate that the statistics we imposed are insufficient to capture these sound qualities, and that the auditory system must be using additional measurements. Our current efforts are directed toward identifying new statistics that account for these properties.
Our results suggest that statistical representations could underlie sound texture perception, and that in many cases the auditory system may rely on fairly simple statistics. Although we lack definitive evidence that the precise set of statistics used in our model is instantiated in the auditory system, we note that these statistics are of a form that could plausibly be computed with simple neural circuitry. Our method provides a means of testing the perceptual importance of such statistics, and of generating new experimental stimuli that are precisely characterized yet share important properties with real-world sounds.