Multi-agent models and social media data: Collective dynamics and individual trajectories in linguistic populations

The interdisciplinary MACDIT project brings together researchers from the DDL, ICAR and Lidilem laboratories. Our goal is to study the interactions between the individual and collective levels of language variation and change. We will combine analyses of Twitter and Wikipedia data with multi-agent modeling to identify how linguistic innovations emerge and which factors favor or hinder them.

Download the full project

Context

Change and variation are fundamental properties of language. While it is recognized that the structure of social interactions influences these properties, we are far from understanding the full complexity of this phenomenon and its dynamics. How can successive generations of speakers share a language closely enough to remain mutually intelligible while speaking differently enough for that language to change over time? This paradox goes to the heart of linguistic theories, questioning the very essence of what language is.

Theoretical framework

Variationist sociolinguistics has taken up these questions and has shown that linguistic variation, far from being random, is conditioned by the sociodemographic structure of populations. Because different sub-populations use linguistic variants differently, these variants mark the sub-populations, and choosing one variant over another allows a speaker to assert an identity. Linguistic change therefore results from the dynamics of variant use, which are subject to the dynamics of population structure, to changes in social representations, and to the internal constraints of the language and of the speakers' cognitive system. Language thus appears as a complex dynamic system interacting with other cognitive and social systems.

Method

We will follow a "dual-track" approach to deepen this understanding:

  1. We will model the structure and dynamics of interactions using multi-agent networks that are diverse in their internal properties and speaker characteristics.
  2. We will integrate real-world data: Twitter messages from the SoSweet corpus and the online exchanges through which Wikipedia articles are constructed.
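
To make the first track more concrete, here is a minimal illustrative sketch of a multi-agent model in which a single linguistic innovation spreads (or fails to spread) through a network of speakers. It is not the project's model: the ring-shaped network, the function name `simulate`, and parameters such as `adopt_bias` are all hypothetical choices made for illustration only.

```python
import random


def simulate(n_agents=50, n_neighbors=4, n_steps=20000, adopt_bias=0.6, seed=0):
    """Toy model (illustrative assumption): one innovation on a ring network.

    Returns the share of agents using the innovation after each step.
    """
    rng = random.Random(seed)

    # Ring lattice: each agent is linked to its n_neighbors nearest agents.
    half = n_neighbors // 2
    neighbors = {
        i: [(i + d) % n_agents for d in range(-half, half + 1) if d != 0]
        for i in range(n_agents)
    }

    # State: True if the agent uses the innovative variant.
    uses_innovation = [False] * n_agents
    uses_innovation[0] = True  # a single innovator starts the process

    history = []
    for _ in range(n_steps):
        # One interaction: a random listener hears one of its neighbors.
        listener = rng.randrange(n_agents)
        speaker = rng.choice(neighbors[listener])
        if uses_innovation[speaker] != uses_innovation[listener]:
            # adopt_bias > 0.5 favors the innovation,
            # adopt_bias < 0.5 favors the established variant.
            p = adopt_bias if uses_innovation[speaker] else 1 - adopt_bias
            if rng.random() < p:
                uses_innovation[listener] = uses_innovation[speaker]
        history.append(sum(uses_innovation) / n_agents)
    return history


if __name__ == "__main__":
    shares = simulate()
    print(f"Final share of innovators: {shares[-1]:.2f}")
```

Varying `adopt_bias`, `n_neighbors` or the network shape in such a sketch is one simple way to explore which conditions let an innovation generalize; in the project itself, the second track would constrain these choices with observed data.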

Goals

  1. How are collective linguistic conventions constructed through interindividual interactions, as observed in social media data? For example, under what circumstances does an innovation generalize across the network?
  2. How do collective linguistic conventions influence individuals? Are people affected by the majority or established sociolinguistic conventions when they enter a network, or do they resist adopting them? And what factors influence these dynamics?

Expected results

Twitter data provide information on the variation of linguistic usage (in French in particular) according to network structure, socio-geographical factors and linguistic domains. Wikipedia data document user interactions on circumscribed topics, and the emergence and evolution of a text genre: the online collaborative encyclopedic article. A major contribution of this project is the close integration of the modeling and data-driven approaches (modeling is constrained by the data and in turn informs data collection and analysis), drawing on a wide range of expertise (sociolinguistics, dialectology, computational modeling, data science and complexity).