Speech sounds vary with the characteristics of the talker's voice, coarticulation from adjacent speech sounds, and differences in speaking rate. As a result, speech perception is highly context-dependent. The broader linguistic context (e.g., which word a listener expects in a sentence) can also influence speech perception. There is considerable debate over how listeners integrate context with information from the speech signal, including whether higher-level linguistic information feeds back down to influence early speech perception.
One way to address this debate is to study context effects in real time as the speech signal unfolds. For instance, listeners adjust their perception of temporal cues in speech, such as voice onset time (VOT), based on the talker's speaking rate. Using the visual-world eye-tracking paradigm, we have demonstrated that listeners use the preceding sentence's rate to interpret VOT, but they do not wait for subsequent rate information (indicated by the length of the following vowel). Electrophysiological work shows that the N1 response, which tracks early perceptual encoding of speech sounds, varies with semantic expectations (Getz & Toscano, 2019).
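As a toy illustration only (not the model tested in these studies), rate-dependent VOT categorization can be sketched as a logistic function whose voicing boundary shifts with the preceding sentence's speaking rate; the function name, baseline rate, and all parameter values below are hypothetical.

```python
import math

def p_voiceless(vot_ms, rate_sylls_per_sec,
                base_boundary_ms=30.0, rate_coef_ms=-2.5,
                slope=0.4, baseline_rate=4.0):
    """Toy model: probability of categorizing a stop as voiceless
    (e.g., /p/ rather than /b/) given its VOT and the preceding
    sentence's speaking rate. A faster preceding rate shifts the
    category boundary toward shorter VOTs, so the same VOT is more
    likely to be heard as voiceless. All parameters are illustrative,
    not fitted to data."""
    boundary_ms = base_boundary_ms + rate_coef_ms * (rate_sylls_per_sec - baseline_rate)
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

# The same 30 ms VOT after slow, medium, and fast preceding speech:
for rate in (2.5, 4.0, 5.5):
    print(f"rate={rate} syll/s -> P(voiceless | VOT=30 ms) = "
          f"{p_voiceless(30.0, rate):.2f}")
```

With these illustrative parameters, a 30 ms VOT is heard mostly as voiced after slow speech and mostly as voiceless after fast speech, mirroring the direction of the rate effect described above.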
Together, these results shed light on how listeners perceive speech in context, demonstrating an important role for both acoustic context and top-down feedback from higher-level linguistic representations.