Proc Natl Acad Sci U S A. 2005 Feb 15;102(7):2293-8. Epub 2005 Jan 27.
Speech recognition with amplitude and frequency modulations.

Zeng FG, Nie K, Stickney GS, Kong YY, Vongphoe M, Bhargave A, Wei C, Cao K.

Department of Anatomy and Neurobiology, University of California, Irvine, CA 92697, USA. fzeng@uci.edu

Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances speech recognition in noise, as well as speaker and tone recognition. Additional speech reception threshold measures revealed that FM is particularly critical for speech recognition with a competing voice and is independent of spectral resolution and similarity. These results suggest that AM and FM provide independent yet complementary contributions to support robust speech recognition under realistic listening situations. Encoding FM may improve auditory scene analysis, cochlear-implant, and audiocoding performance.
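
As an illustration of what deriving AM and FM from a sound can mean, the sketch below splits a single band-limited signal into an amplitude envelope and an instantaneous frequency using the analytic signal (a Hilbert-transform approach). This is an assumed, textbook technique and not necessarily the procedure used in the study. The function names `analytic_signal` and `am_fm` are hypothetical, and the low-pass smoothing that would make the AM and FM "slowly varying" is omitted for brevity.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero the negative frequencies and
    double the positive ones (same idea as a Hilbert transform)."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                      # keep DC as-is
    if n % 2 == 0:
        h[n // 2] = 1.0             # keep the Nyquist bin as-is
        h[1:n // 2] = 2.0           # double positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spec * h)

def am_fm(x, fs):
    """Return (AM, FM) for a band-limited signal x sampled at fs Hz.
    AM is the magnitude of the analytic signal; FM is the instantaneous
    frequency in Hz, taken from the derivative of the unwrapped phase.
    Note that FM has one sample fewer than AM because of the difference."""
    z = analytic_signal(x)
    am = np.abs(z)
    phase = np.unwrap(np.angle(z))
    fm = np.diff(phase) * fs / (2.0 * np.pi)
    return am, fm

# Example: a 1000 Hz carrier whose amplitude varies at 5 Hz.
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 1000 * t)
am, fm = am_fm(x, fs)
```

In this example the recovered AM follows the 5 Hz amplitude variation while the FM stays at the carrier frequency, which shows the two cues being separated from one waveform. For speech, the same step would be applied to each band-pass filtered channel.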

Science. 1995 Oct 13;270(5234):303-4.
Speech recognition with primarily temporal cues.

Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M.

House Ear Institute, Los Angeles, CA 90057, USA.

Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants, vowels, and words in simple sentences improved markedly as the number of bands increased; high speech recognition performance was obtained with only three bands of modulated noise. Thus, the presentation of a dynamic temporal pattern in only a few broad spectral regions is sufficient for the recognition of speech.
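
The manipulation described above (extract the temporal envelope in each broad band, then use it to modulate noise of the same bandwidth) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the brick-wall FFT filters, the 50 Hz envelope cutoff, the band edges in the example, and the names `bandpass_fft`, `envelope`, and `noise_vocode` are all hypothetical choices made for this sketch.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Brick-wall band-pass keeping lo <= f < hi (Hz) by FFT masking.
    An illustrative stand-in for whatever filters a real study would use."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x, fs, cutoff):
    """Temporal envelope: full-wave rectification, then low-pass smoothing.
    Negative values produced by filter ringing are clipped to zero."""
    return np.maximum(bandpass_fft(np.abs(x), fs, 0.0, cutoff), 0.0)

def noise_vocode(x, fs, edges, env_cutoff=50.0, seed=0):
    """Replace the fine structure in each band with band-limited noise
    while keeping that band's temporal envelope."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass_fft(x, fs, lo, hi)               # analysis band
        env = envelope(band, fs, env_cutoff)             # slow envelope
        noise = rng.standard_normal(len(x))
        carrier = bandpass_fft(noise, fs, lo, hi)        # same-bandwidth noise
        out += bandpass_fft(env * carrier, fs, lo, hi)   # keep result in band
    return out

# Example: three bands applied to a tone that switches on halfway through.
fs = 8000
t = np.arange(fs) / fs
x = np.where(t >= 0.5, np.sin(2 * np.pi * 1000 * t), 0.0)
y = noise_vocode(x, fs, [100, 800, 1500, 4000])
```

The output `y` contains only noise, yet it is quiet where the input was silent and loud where the tone was present, so the timing pattern survives while the detailed spectral content within each band is discarded. Increasing the number of entries in `edges` corresponds to increasing the number of bands in the experiment.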