Empirical Derivation of Acoustic Grouping Cues from Natural Sound Statistics
J H McDermott, D P W Ellis and E P Simoncelli
Published in 34th MidWinter Meeting, Assoc. for Research in Otolaryngology, Feb 2011.
Our approach stems from the observation that the signals in incorrect mixture decompositions will themselves tend to be partial mixtures of the true sources. Grouping cues might thus be sound properties that have different values for individual sources compared to mixtures. Using databases of natural sound source recordings, we can evaluate sound statistics of individual sources and their mixtures, and search for statistics that should be useful for segregation.
We processed thousands of speech excerpts and their mixtures with an auditory model filter bank. From these filter responses we measured a large set of simple statistics that we have shown to be perceptually relevant in the analysis and synthesis of natural sound textures (McDermott & Simoncelli, 2011). The statistics included marginal statistics, capturing sparsity and modulation power, and correlations between filter responses. We found that most statistics, some of which relate to conventional grouping cues, helped to discriminate sources from mixtures. The results suggest that acoustic grouping cues are more diverse than has previously been suspected, and point the way to new perceptual experiments and machine algorithms for sound segregation.
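
The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline, and everything specific in it is an assumption made for illustration: gated noise stands in for speech excerpts, eight rectangular log-spaced FFT bands stand in for the auditory model filter bank, and only two statistics are computed, namely an envelope sparsity measure (standard deviation divided by mean) and the mean correlation between band envelopes. The names `toy_source`, `band_envelopes` and `statistics` are hypothetical.

```python
import numpy as np

FS = 8000                                # sample rate in Hz (assumed)
N = 8000                                 # excerpt length in samples (1 s)
EDGES = np.geomspace(100.0, 3500.0, 9)   # edges of 8 log-spaced bands (assumed)


def toy_source(rng):
    """Noise gated on in 5 of 20 segments: a sparse stand-in for speech."""
    gate = np.zeros(20)
    gate[rng.choice(20, size=5, replace=False)] = 1.0
    return rng.standard_normal(N) * np.repeat(gate, N // 20)


def band_envelopes(x):
    """Envelope of each band, taken as the magnitude of the analytic signal."""
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(x.size, d=1.0 / FS)
    envs = []
    for lo, hi in zip(EDGES[:-1], EDGES[1:]):
        # Keep only positive frequencies inside the band and double them,
        # which yields the analytic signal of the band-limited waveform.
        mask = (freqs >= lo) & (freqs < hi)
        envs.append(np.abs(np.fft.ifft(2.0 * spectrum * mask)))
    return np.array(envs)                # shape (bands, samples)


def statistics(x):
    """Return [mean envelope sparsity, mean cross-band envelope correlation]."""
    env = band_envelopes(x)
    sparsity = np.mean(env.std(axis=1) / env.mean(axis=1))
    corr = np.corrcoef(env)
    off_diag = corr[~np.eye(len(corr), dtype=bool)]
    return np.array([sparsity, off_diag.mean()])


rng = np.random.default_rng(0)
sources = [toy_source(rng) for _ in range(40)]
# Each mixture is the sum of two distinct sources.
mixtures = [sources[i] + sources[i + 1] for i in range(0, 40, 2)]

source_stats = np.array([statistics(s) for s in sources])
mixture_stats = np.array([statistics(m) for m in mixtures])

print("sources  (sparsity, cross-band corr):", source_stats.mean(axis=0))
print("mixtures (sparsity, cross-band corr):", mixture_stats.mean(axis=0))
```

In this toy setup the sparsity measure tends to be higher for single sources than for mixtures, because a mixture is active during more of the excerpt. Whether a given statistic separates real speech from speech mixtures is an empirical question that the sketch does not address.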