Christopher Ahern
Data scientist at Janus Health

Conflict, cheap talk, and Jespersen's cycle

A revised version of a paper with Robin Clark, Conflict, cheap talk, and Jespersen’s cycle, is available online along with code. Here’s the abstract:

Game theory has found broad application in modeling meaning in both the classical Gricean case of common interests between interlocutors and, more recently, in cases of conflicting interests. Here we consider how conflicting interests between speakers and hearers can be used to explain language change. We use tools from evolutionary game theory to characterize the effect of conflicting interests in the case of Jespersen’s cycle. We show how the cycle can be modeled as an inflationary process due to signaling with costless signals under conflicting interests. We fit the resulting dynamic model to time series data drawn from a historical corpus of Middle English.

Distinguishing drift and selection

A new paper with Mitchell Newberry, Robin Clark, and Josh Plotkin is available on arXiv. In it we use statistical techniques from population genetics to distinguish between drift and selection in linguistic time series drawn from English corpora spanning the 12th to the 21st century:

Languages and genes are both transmitted from generation to generation, with opportunity for differential reproduction and survivorship of forms. Here we apply a rigorous inference framework, drawn from population genetics, to distinguish between two broad mechanisms of language change: drift and selection. Drift is change that results from stochasticity in transmission and it may occur in the absence of any intrinsic difference between linguistic forms; whereas selection is truly an evolutionary force arising from intrinsic differences – for example, when one form is preferred by members of the population. Using large corpora of parsed texts spanning the 12th century to the 21st century, we analyze three examples of grammatical changes in English: the regularization of past-tense verbs, the rise of the periphrastic ‘do’, and syntactic variation in verbal negation. We show that we can reject stochastic drift in favor of a selective force driving some of these language changes, but not others. The strength of drift depends on a word’s frequency, and so drift provides an alternative explanation for why some words are more prone to change than others. Our results suggest an important role for stochasticity in language change, and they provide a null model against which selective theories of language evolution must be compared.
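
To make the contrast between drift and selection concrete, here is a minimal sketch of a Wright-Fisher-style simulation, a standard model from population genetics. It is an illustration only and not the inference procedure used in the paper; the function name, the parameters n_pop, p0 and s, and the values in the usage lines are all assumptions chosen for the example.

```python
import random


def wright_fisher(n_pop, p0, s, generations, rng):
    """Track the frequency of one linguistic form over discrete generations.

    n_pop: number of tokens sampled per generation (assumed constant)
    p0:    starting frequency of the form, between 0 and 1
    s:     selection coefficient (must be > -1); s = 0 means pure drift
    rng:   a random.Random instance, so runs can be reproduced
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Selection tilts the copying probability toward the favored form.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: every token in the next generation is an independent copy.
        count = sum(1 for _ in range(n_pop) if rng.random() < p_sel)
        p = count / n_pop
        freqs.append(p)
    return freqs


# Hypothetical usage: same starting point, with and without selection.
drift_only = wright_fisher(200, 0.5, 0.0, 100, random.Random(1))
with_selection = wright_fisher(200, 0.5, 0.1, 100, random.Random(1))
```

With s = 0 the trajectory wanders purely by sampling noise, and the wandering is larger when n_pop is small. This is the intuition behind treating drift as the null model that a selective account has to beat.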

Check dat out

Meredith Tamminga, Aaron Ecay, and I have updated the code for, and renamed, a paper: Generalized Additive Mixed Models for intraspeaker variation, which has been accepted for publication at Linguistics Vanguard. We suggest GAMMs as a useful tool for separating out different sources of repetitiveness in naturalistic speech data, taking DH-stopping (‘that’ vs. ‘dat’) in the Philadelphia Neighborhood Corpus as an example.

For each interview we want to distinguish two potential sources of repetitiveness in the use of the different DH variants like ‘that’ and ‘dat’.

  • The tendency of speakers to repeat themselves. Using ‘that’ primes subsequent use of ‘that’.
  • The tendency of speakers to use ‘that’ in more formal styles, and ‘dat’ in less formal styles.

We expect priming to be a fact about how brains and cognition work, but we don’t have any expectations about how styles change over a given interview. GAMMs allow us to model priming as the effect of the previous token on the current token and model style as a smooth function of elapsed time.
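
As a rough illustration of how these two sources combine, the sketch below generates tokens from a toy model in which the log-odds of ‘dat’ are the sum of a priming term and a smooth style term. It is not the GAMM fit from the paper: the sine-shaped style curve, the priming coefficient, and all function names are made up for the example.

```python
import math
import random


def style(t):
    """Hypothetical smooth style trend (log-odds of 'dat') at elapsed time t in [0, 1]."""
    return 1.5 * math.sin(2 * math.pi * t)


def prob_dat(prev_is_dat, t, priming):
    """Probability that the current token is 'dat'.

    prev_is_dat: 1 if the previous token was 'dat', else 0
    priming:     assumed boost in log-odds after a previous 'dat'
    """
    log_odds = priming * prev_is_dat + style(t)
    return 1.0 / (1.0 + math.exp(-log_odds))


def simulate_interview(n_tokens, priming, rng):
    """Generate a sequence of tokens (1 = 'dat', 0 = 'that') for one interview."""
    tokens = []
    prev = 0
    for i in range(n_tokens):
        t = i / max(n_tokens - 1, 1)  # elapsed time rescaled to [0, 1]
        token = 1 if rng.random() < prob_dat(prev, t, priming) else 0
        tokens.append(token)
        prev = token
    return tokens


# Hypothetical usage: one simulated interview of 200 tokens.
tokens = simulate_interview(200, 1.0, random.Random(0))
```

Both terms produce runs of the same variant, which is why they are hard to tell apart by eye. A model with a term for the previous token and a smooth over time estimates them jointly.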

Below is a plot of the random smooths by speaker. Each curve, roughly speaking, represents an estimate of the style for a given speaker in an interview. See the data, code, and paper for more details, but intuitively, the wigglier or more non-linear the smooth, the more varied the style over the course of the interview.