Speech sounds vary depending on the characteristics of the talker's voice, coarticulation from adjacent speech sounds, and differences in speaking rate. As a result, speech perception is highly context-dependent. The broader linguistic context (e.g., which word to expect in a sentence) could influence speech perception as well. There is considerable debate over how listeners integrate context with information from the speech signal, including whether higher-level linguistic information feeds back down to influence early speech perception.

One way to address this debate is to study context effects in real time as the speech signal unfolds. For instance, listeners adjust their perception of temporal cues in speech, such as voice onset time (VOT), based on the speaking rate of the talker. Using the visual-world eye-tracking paradigm, we have demonstrated that listeners use preceding sentence rate to interpret VOT, but they do not wait for subsequent rate information (indicated by the length of the following vowel) before doing so.

The N1 response, which tracks early perceptual encoding of speech sounds, varies based on semantic expectation (Getz & Toscano, 2019).

In other work, we have shown that higher-level linguistic information feeds back to influence low-level speech perception, indicating that the broader linguistic context shapes perception as well. In particular, the auditory N1, an event-related potential (ERP) component that provides a measure of listeners' early perceptual encoding, varies depending on semantic context. For example, listeners expect the word "park" to follow "amusement". If presented with a word that has an ambiguous VOT between "bark" and "park", listeners encode the sound similarly to an unambiguous "park", demonstrating an influence of lexical feedback on brain responses at the earliest stages of perception.

Together, these results shed light on how listeners perceive speech in context, demonstrating an important role for both acoustic context and top-down feedback from higher-level linguistic representations.
