Linguistic variation among speakers makes it possible to study the drivers of language change. It also allows us to define and study speaker groups. Traditionally, speaker groups have been defined on the basis of background information about the speakers (e.g. Labov 1972), but here we define speaker groups solely on the basis of the observed linguistic variation. We study variation and change in Finnish by applying quantitative and statistical methods, including methods originally developed for biological data. Previous studies that have applied biological methods to language data have focused mostly on language families (Gavin et al. 2013). In contrast, we focus on a shorter time scale and study the Finnish spoken in Helsinki over the past few decades. Our data come from the Longitudinal Corpus of Finnish Spoken in Helsinki, which was collected in three decades: the 1970s, the 1990s and the 2010s (Helsinki 2014). The corpus consists of interviews with speakers who differ in, e.g., age, sex and social class. We will present new computational methods for sociolinguistics, including Bayesian clustering methods and probabilistic topic models from machine learning. We treat speakers as individuals and speaker groups as populations, which we call ‘linguistic populations’. We will discuss how the language spoken in Helsinki has changed over this short time scale and show what kinds of linguistic populations the Finnish speakers in Helsinki have formed.
Gavin, M. C., Botero, C. A., Bowern, C., et al. 2013. Toward a mechanistic understanding of linguistic diversity. BioScience 63: 524-535.
Helsinki 2014 = The Longitudinal Corpus of Finnish Spoken in Helsinki (1970, 1990, 2010). University of Helsinki, Department of Finnish, Finno-Ugrian and Scandinavian Studies; Institute for the Languages of Finland; and Heikki Paunonen. URN: http://urn.fi/urn:nbn:fi:lb-2014073041.
Labov, W. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.