Adjacent and Non‐Adjacent Word Contexts Both Predict Age of Acquisition of English Words: A Distributional Corpus Analysis of Child‐Directed Speech

Published on November 8, 2020

Abstract
Children show a remarkable degree of consistency in learning some words earlier than others. What patterns of word usage predict variations among words in age of acquisition? We use distributional analysis of a naturalistic corpus of child‐directed speech to create quantitative features representing natural variability in word contexts. We evaluate two sets of features: One set is generated from the distribution of words into frames defined by the two adjacent words. These features primarily encode syntactic aspects of word usage. The other set is generated from non‐adjacent co‐occurrences between words. These features encode complementary thematic aspects of word usage. Regression models using these distributional features to predict age of acquisition of 656 early‐acquired English words indicate that both types of features improve predictions over simpler models based on frequency and appearance in salient or simple utterance contexts. Syntactic features were stronger predictors of children’s production than comprehension, whereas thematic features were stronger predictors of comprehension. Overall, earlier acquisition was predicted by features representing frames that select for nouns and verbs, and by thematic content related to food and face‐to‐face play topics; later acquisition was predicted by features representing frames that select for pronouns and question words, and by content related to narratives and object play.

Read Full Article (External Site)