Listeners do not need an F1/F2 target to perceive vowel quality or regional accentedness

A poster presented at Midphon 30, Bloomington, IN 2025 (ASC-Midphon30.pdf)

Authors

Kevin B. McGowan

Stella Takvoryan

Published

October 8, 2025

Background

Silent centers (SC): listeners can identify vowel quality in a CVC syllable even with the middle 65% of tense vowels and the middle 50% of lax vowels removed (Strange & Jenkins, 2013). However, listeners may still require vowel centers to hear social information. Three complementary ideas in the literature suggest that social information in vowel centers may be essential:


  1. Primacy of F1/F2 at the vowel midpoint, sometimes taken along with duration, in, e.g., sociophonetics, sound change, and second language acquisition (Kelley & Tucker, 2020; Labov et al., 1972; Nycz & Hall-Lew, 2013; Thomas, 2014)

  2. Hybrid silent centers (Rakerd & Verbrugge, 1987; Verbrugge & Rakerd, 1986): pairing SC syllable edges from different talkers does not undermine vowel perception, so the authors argue that vowel edges do not carry social information

  3. Vowel normalization (Johnson, 2005; Johnson & Sjerps, 2021) assumes that variation is problematic for listeners, so models typically operate on vowel centers, where contextual variation is least (cf. Barreda, 2025; Fruehwald, 2024)

Methodology

  • Talkers: Three non-Southern talkers from the Wildcat corpus (Van Engen et al., 2010) and two Southern talkers (KY)

  • Stimuli: BVT syllables with [i, ɪ, e, ɛ, æ, u, ʊ, o, ʌ, ɔ, a]; middle 50% for lax vowels & middle 65% for tense vowels (Strange et al., 1983) excised with a custom Praat script (see Figure 7)

Figure 1: Southern talker producing ‘bit’, full vowel
Figure 2: Southern talker producing ‘bit’, SC vowel
Figure 3: Non-Southern talker producing ‘bit’, full vowel
Figure 4: Non-Southern talker producing ‘bit’, SC vowel
Left word   Right word
bait        bat
beat        bait
beat        bit
beat        boot
bet         bat
bit         bet
boat        boot
boat        but
bought      but
put         pot
Table 1: Word Pairings for ‘Who?’ Trials
  • Procedure: 2AFC; listeners heard a CVC word and answered either “what did you hear?” by choosing from a pair of words or “who did you hear?” by choosing between the maps in Figure 5. ‘What?’ trials displayed a map congruent with the talker; ‘who?’ trials displayed the word that was being spoken. There were 240 ‘what?’ trials and 144 ‘who?’ trials.
Figure 5: “Non-Southern” and “Southern” stimuli


  • Participants: 60 US participants recruited via Prolific

  • Analysis: Bayesian logistic regression with brms in R (Bürkner, 2017), NHST with bayestestR (Arel-Bundock et al., 2024; Makowski et al., 2019)

  • Many studies have found that listeners perform poorly when asked to label regional accents (Campbell-Kibler, 2025; Clopper & Pisoni, 2004; Milroy & McClenaghan, 1977). Our simplified maps are intended to represent Clopper & Pisoni’s “dialect clusters”.

  • While it is clear that listeners do not need vowel centers to perceive vowel quality accurately, it is not yet known whether listeners can perceive, for example, regional accent without the vowel center.
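The excision described in the Stimuli bullet above can be sketched as follows. This is a minimal illustration, not the authors' Praat script: the function names, the use of zeroed samples for the silent portion, and the assumption that the excised portion is centered on the vowel midpoint are all assumptions made for the example.

```python
from typing import List, Tuple

# Proportion of the vowel removed, as stated in the Stimuli bullet.
EXCISED_PROPORTION = {"lax": 0.50, "tense": 0.65}


def excision_window(vowel_start: float, vowel_end: float,
                    vowel_class: str) -> Tuple[float, float]:
    """Return (start, end) times in seconds of the centered portion to silence."""
    duration = vowel_end - vowel_start
    # Share of the vowel kept at each edge (assumes a centered excision).
    margin = (1.0 - EXCISED_PROPORTION[vowel_class]) / 2.0
    return (vowel_start + margin * duration, vowel_end - margin * duration)


def silence_center(samples: List[float], sample_rate: int,
                   vowel_start: float, vowel_end: float,
                   vowel_class: str) -> List[float]:
    """Copy `samples`, replacing the excised window with zeros (hypothetical choice)."""
    t0, t1 = excision_window(vowel_start, vowel_end, vowel_class)
    i0, i1 = round(t0 * sample_rate), round(t1 * sample_rate)
    return [0.0 if i0 <= i < i1 else s for i, s in enumerate(samples)]
```

Under these assumptions, a lax vowel spanning 0.10 s to 0.30 s would be silenced from roughly 0.15 s to 0.25 s, leaving 25% of the vowel at each edge.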


Predictions

Figure 6: Predictions under two assumptions about social information

Silent Centers Visualized

Figure 7: Unnormalized F1/F2 DCTs of the vowel stimuli, with excised portions indicated
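For readers unfamiliar with the discrete cosine transform (DCT) representation of formant tracks, the sketch below computes the first few DCT-II coefficients of a single track in plain Python. The function name, the 1/N scaling, and the choice of three coefficients are assumptions for illustration; they may differ from the conventions used to produce Figure 7 or described in Fruehwald (2024).

```python
import math
from typing import List


def dct_coefficients(track: List[float], n_coef: int = 3) -> List[float]:
    """First `n_coef` DCT-II coefficients of a formant track, scaled by 1/N.

    With this (assumed) scaling, coefficient 0 is the mean of the track,
    coefficient 1 reflects overall slope, and coefficient 2 reflects curvature.
    """
    n = len(track)
    return [
        sum(x * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
            for j, x in enumerate(track)) / n
        for k in range(n_coef)
    ]
```

A flat track yields a zeroth coefficient equal to its formant value and higher coefficients near zero, while a steadily rising track yields a negative first coefficient under this sign convention.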

Results

Summary: While Southern talkers were perceived less accurately overall, there is no significant difference between listeners’ ability to perceive either vowel quality or region in the Full Vowel and Silent Centers vowel manipulation conditions.

See Figure 9 for model parameters
Figure 8: ‘What?’ (top left) and ‘Who?’ (bottom left) model predictions (95% HDI) and accuracy differences for responses to Non-Southern (top row) and Southern (bottom row) talkers
Figure 9: Model coefficient parameter estimates for ‘what?’ and ‘who?’ trials
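Figure 8 reports 95% highest density intervals (HDIs). As a rough sketch of what such an interval is, the function below finds the narrowest interval containing a given share of a set of posterior draws. The function name and the sliding-window approach are assumptions for illustration; this is not necessarily the algorithm bayestestR uses, and any draws passed to it here would be synthetic, not the study's posterior.

```python
import math
from typing import List, Tuple


def hdi(draws: List[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval that contains `mass` of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    width = math.ceil(mass * n)  # number of draws the interval must cover
    # Slide a window of `width` draws and keep the one with the smallest range.
    best = min(range(n - width + 1),
               key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[best], ordered[best + width - 1]
```

An effect whose HDI for the Full Vowel versus Silent Centers difference comfortably includes zero would be in line with the summary above, though the decision rule actually applied in the analysis is not specified here.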

Discussion

  • Listeners do not need the vowel center to perceive vowel quality (replicated): Listener accuracy on the ‘what?’ trials is a straightforward, successful replication of the silent centers effect (Strange et al., 1983; Strange & Jenkins, 2013)
  • Listeners do not need the vowel center to perceive regional accentedness: Contra hybrid silent-centers work that paired incongruous syllable edges (Rakerd & Verbrugge, 1987; Verbrugge & Rakerd, 1986), listeners in the ‘who?’ trials can indeed perceive regional accent from SC vowels
  • Tense and lax vowel qualities: in this study, vowel qualities differ in how they encode regional variation, particularly along the tense/lax dimension

Conclusions

  • A single point measure used to characterize the vowels of a talker, a community, or a time period is missing information that listeners use. Even multiple points per vowel measure only the vowel center if they begin at 20% (or later) and end at 80%

  • Tense/Lax: listeners may need a greater percentage of a more dynamic vowel quality to perceive regional variation, or the difference may reflect varying levels of awareness of particular vowel qualities (Babel, 2025)

  • These results are inconsistent with models of sound change or normalization that operate on a single F1/F2 measure (with or without duration) and more consistent with, e.g., Beddor (2009) (sound change) or Fruehwald (2025) (normalization)
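The point about measurement windows can be checked with simple arithmetic. The sketch below computes how much of a 20% to 80% measurement window falls inside the portion that was excised in this study (the middle 50% of lax vowels and the middle 65% of tense vowels). The function names are hypothetical, and the calculation assumes the excised portion is centered on the vowel midpoint.

```python
from typing import Tuple


def excised_span(proportion_removed: float) -> Tuple[float, float]:
    """Centered excised span, as proportions of vowel duration."""
    margin = (1.0 - proportion_removed) / 2.0
    return margin, 1.0 - margin


def share_inside(measure_start: float, measure_end: float,
                 proportion_removed: float) -> float:
    """Share of the measurement window that lies inside the excised span."""
    lo, hi = excised_span(proportion_removed)
    overlap = max(0.0, min(measure_end, hi) - max(measure_start, lo))
    return overlap / (measure_end - measure_start)
```

Under these assumptions, `share_inside(0.2, 0.8, 0.65)` is 1.0: for tense vowels, every measurement point between 20% and 80% lies in the region listeners did without. For lax vowels, `share_inside(0.2, 0.8, 0.50)` is about 0.83.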

References

Arel-Bundock, V., Greifer, N., & Heiss, A. (2024). How to Interpret Statistical Models Using marginaleffects for R and Python. Journal of Statistical Software, 111(9). https://doi.org/10.18637/jss.v111.i09
Babel, A. (2025). A semiotic approach to awareness and control. Journal of Sociolinguistics, 29(1).
Barreda, S. (2025). Normalization, essentialization, and the erasure of social and linguistic variation. Journal of Phonetics, 110, 101409. https://doi.org/10.1016/j.wocn.2025.101409
Beddor, P. S. (2009). A Coarticulatory Path to Sound Change. Language, 85(4), 785–821. https://doi.org/10.1353/lan.0.0165
Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1). https://doi.org/10.18637/jss.v080.i01
Campbell-Kibler, K. (2025). Place-based accentedness ratings do not predict sensitivity to regional features. Journal of Sociolinguistics, 29(1), 74–92.
Carignan, C., Coretta, S., Frahm, J., Harrington, J., Hoole, P., Joseph, A., Kunay, E., & Voit, D. (2021). Planting the seed for sound change: Evidence from real-time MRI of velum kinematics in German. Language, 97(2), 333–364. https://doi.org/10.1353/lan.2021.0020
Clapp, W., Vaughn, C., Todd, S., & Sumner, M. (2023). Talker-specificity and token-specificity in recognition memory. Cognition, 237, 105450. https://doi.org/10.1016/j.cognition.2023.105450
Clopper, C. G., & Pisoni, D. B. (2004). Effects of talker variability on perceptual learning of dialects. Language and Speech, 47(3), 207–238.
Fruehwald, J. (2024). Working with the discrete cosine transform in R.
Fruehwald, J. (2025). Vowel formant track normalization using discrete cosine transform coefficients. Linguistics Vanguard, 0.
Johnson, K. (2005). Speaker Normalization in Speech Perception. 27.
Johnson, K., & Sjerps, M. J. (2021). Speaker normalization in speech perception. The Handbook of Speech Perception, 145–176.
Kelley, M. C., & Tucker, B. V. (2020). A comparison of four vowel overlap measures. The Journal of the Acoustical Society of America, 147(1), 137–145. https://doi.org/10.1121/10.0000494
Labov, W., Yaeger, M., & Steiner, R. (1972). A quantitative study of sound change in progress. Vol. 1. U.S. Regional Survey.
Makowski, D., Ben-Shachar, M., & Lüdecke, D. (2019). bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework. Journal of Open Source Software, 4(40), 1541. https://doi.org/10.21105/joss.01541
Milroy, L., & McClenaghan, P. (1977). Stereotyped reactions to four educated accents in Ulster. Belfast Working Papers in Language and Linguistics, 2(4), 111.
Nycz, J., & Hall-Lew, L. (2013). Best practices in measuring vowel merger. 20, 060008.
Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5(1), 42–46.
Nygaard, L. C., & Tzeng, C. Y. (2021). Perceptual integration of linguistic and non-linguistic properties of speech. The Handbook of Speech Perception, 398–427.
Pisoni, D. B. (1997). Some thoughts on normalization in speech perception. Talker Variability in Speech Processing, 6(2), 9–32.
Preston, D. (2016). Whaddayaknow now? (A. M. Babel, Ed.). Cambridge University Press.
Preston, D. R. (1996). Whaddayaknow: The modes of folk linguistic awareness. Language Awareness, 5(1), 40–74.
Rakerd, B., & Verbrugge, R. R. (1987). Evidence that the dynamic information for vowels is talker independent in form. Journal of Memory and Language, 26(5), 558.
Strange, W., Bohn, O.-S., Nishi, K., & Trent, S. A. (2005). Contextual variation in the acoustic and perceptual similarity of North German and American English vowels. The Journal of the Acoustical Society of America, 118(3), 1751–1762. https://doi.org/10.1121/1.1992688
Strange, W., & Jenkins, J. J. (2013). Dynamic Specification of Coarticulated Vowels (G. S. Morrison & P. F. Assmann, Eds.; pp. 87–115). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-14209-3_5
Strange, W., Jenkins, J. J., & Johnson, T. L. (1983). Dynamic specification of coarticulated vowels. The Journal of the Acoustical Society of America, 74(3), 695–705. https://doi.org/10.1121/1.389855
Sumner, M. (2011). The role of variation in the perception of accented speech. Cognition, 119(1), 131–136. https://doi.org/10.1016/j.cognition.2010.10.018
Sumner, M., Kim, S. K., King, E., & McGowan, K. B. (2014). The socially weighted encoding of spoken words: A dual-route approach to speech perception. Frontiers in Psychology, 4, 1015.
Thomas, E. R. (2014). Phonetic analysis in sociolinguistics (pp. 119–135).
Van Engen, K. J., Baese-Berk, M., Baker, R. E., Choi, A., Kim, M., & Bradlow, A. R. (2010). The Wildcat corpus of native- and foreign-accented English: Communicative efficiency across conversational dyads with varying language alignment profiles. Language and Speech.
Verbrugge, R. R., & Rakerd, B. (1986). Evidence of talker-independent information for vowels. Language and Speech, 29(1), 39–57.

Acknowledgements

We are grateful to Josef Fruehwald, Jennifer Cramer, Kendal Smith, Austin Coleman, Kyler Laycock, and Shane O’Nan for their invaluable assistance with this project.