Uncovering the Daily Dance of Language: Analyzing Word Frequencies

Published on December 14, 2022

Imagine language as a lively dance floor, with words moving and grooving to the rhythm of conversation. Standard models in linguistics capture only a static view, missing the variations and shifts that occur from day to day. To understand how language use changes over time, scientists have built a corpus of word frequency data collected from online news sources every 20 minutes for more than 2 years. They’ve developed a time-varying model that lets them infer parameters of word usage and track the relative mobility and drift of words on a day-to-day basis. Through this analysis, they present evidence against the idea that word use follows a ‘rich-get-richer’ pattern, suggesting that there’s more complexity at play. So next time you’re chatting away, remember that each word has its own dance moves and contributes to the dynamic tapestry of language!

Abstract
Standard models in quantitative linguistics assume that word usage follows a fixed frequency distribution, often Zipf’s law or a close relative. This view, however, does not capture the near daily variations in topics of conversation, nor the short-term dynamics of language change. In order to understand the dynamics of human language use, we present a corpus of daily word frequency variation scraped from online news sources every 20 min for more than 2 years. We construct a simple time-varying model with a latent state, which is observed via word frequency counts. We use Bayesian techniques to infer the parameters of this model for 20,000 words, allowing us to convert complex word-frequency trajectories into low-dimensional parameters in word usage. By analyzing the inferred parameters of this model, we quantify the relative mobility and drift of words on a day-to-day basis, while accounting for sampling error. We quantify this variation and show evidence against “rich-get-richer” models of word use, which have been previously hypothesized to explain statistical patterns in language.
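
To make the idea of a latent state observed through word counts more concrete, here is a minimal sketch in Python. It is not the authors' model, which this summary does not specify. It assumes a generic random walk with drift on a word's log-frequency, observed through binomial daily counts. The names `simulate_word`, `step_stats`, `drift`, and `mobility`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_word(days, tokens_per_day, start_freq, drift, mobility, seed=0):
    """Simulate a latent log-frequency and the noisy daily counts it produces.

    Hypothetical model, for illustration only:
      latent:   x[t+1] = x[t] + drift + mobility * Normal(0, 1)
      observed: count[t] ~ Binomial(tokens_per_day, exp(x[t]))
    """
    rng = random.Random(seed)
    x = math.log(start_freq)
    latent, counts = [], []
    for _ in range(days):
        p = min(math.exp(x), 1.0)
        # Binomial draw as a sum of Bernoulli trials (standard library only).
        count = sum(1 for _ in range(tokens_per_day) if rng.random() < p)
        latent.append(x)
        counts.append(count)
        # Advance the latent state by one day.
        x += drift + mobility * rng.gauss(0.0, 1.0)
    return latent, counts


def step_stats(series):
    """Return the mean and standard deviation of day-to-day differences."""
    steps = [b - a for a, b in zip(series, series[1:])]
    mean = sum(steps) / len(steps)
    var = sum((s - mean) ** 2 for s in steps) / (len(steps) - 1)
    return mean, math.sqrt(var)


TOKENS = 5000  # assumed number of word tokens sampled per day
latent, counts = simulate_word(
    days=200, tokens_per_day=TOKENS, start_freq=0.01,
    drift=0.0, mobility=0.05, seed=1,
)

# Naive log-frequency estimate from raw counts (0.5 added to avoid log(0)).
observed = [math.log((c + 0.5) / TOKENS) for c in counts]

latent_mean, latent_sd = step_stats(latent)
naive_mean, naive_sd = step_stats(observed)
print(f"latent day-to-day sd: {latent_sd:.3f}")
print(f"naive  day-to-day sd: {naive_sd:.3f}")
```

In this toy setup the naive day-to-day variability computed from raw counts is larger than the variability of the latent state, because sampling noise is added on top of the true movement. This is one way to see why the abstract stresses accounting for sampling error when estimating mobility and drift.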

