Comparing corpus data (naturally occurring spoken language samples) from different social groups is problematic, because different groups may normally experience interactions in different situations, and may have different language norms and evaluations in what is superficially the "same" situation. "Casual conversation" is usually assumed to be a style roughly comparable for all groups in a community. However, naturally occurring conversational data representing different social groups in a corpus may still not be exactly matched for situational features such as topics, settings, or social distance between interlocutors. The present paper argues that the stylistic comparability of such data should be explicitly established before analyses of specific linguistic variables can be fully interpreted, and recommends the use of a general stylistic measure as an initial baseline for that purpose. The Conversation section of the Wellington Corpus of Spoken New Zealand English (WSC) is analysed as a case study.
A lexically based formality index was derived from principal components analysis (PCA) of 25 sets of word counts selected to capture various aspects of (in)formality. The index was defined using the WSC and the Wellington Corpus of Written NZE (WWC). Index scores were then calculated for the data representing each individual speaker in the Conversation section of the WSC. Comparisons were limited to 504 speakers who were each represented by over 200 words, for whom biographical information was available, and who belonged to one of the two largest ethnic groupings (Pakeha, Maori/Pacific). Formality index scores were compared using sign tests across all matched cells representing the social groupings of gender, ethnicity, education, and age. Recording situations were additionally controlled for whether the interlocutors were matched for gender and for generational age group.
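The sign-test comparison across matched cells can be illustrated with a minimal sketch. It assumes that each matched cell contributes one pair of formality scores (for example, one value per gender within the same ethnicity, education, and age cell) and that ties are discarded; the function name `sign_test` and the scores in `example_cells` are hypothetical and are not taken from the WSC.

```python
from math import comb


def sign_test(pairs):
    """Two-sided exact sign test on (a, b) score pairs.

    Returns (n_positive, n_negative, p_value). Pairs with a == b are
    treated as ties and dropped, which is one common convention.
    """
    pos = sum(1 for a, b in pairs if a > b)
    neg = sum(1 for a, b in pairs if a < b)
    n = pos + neg
    if n == 0:
        # No informative pairs: no evidence against the null hypothesis.
        return pos, neg, 1.0
    k = min(pos, neg)
    # Probability of k or fewer of the rarer sign under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return pos, neg, min(1.0, 2 * tail)


# Hypothetical formality scores: (group A, group B) for each matched cell.
example_cells = [
    (0.42, 0.10), (0.35, 0.20), (0.15, 0.05), (0.60, 0.41),
    (0.22, 0.30), (0.51, 0.33), (0.48, 0.12), (0.27, 0.19),
]

pos, neg, p_value = sign_test(example_cells)
print(f"A > B in {pos} cells, A < B in {neg} cells, p = {p_value:.4f}")
```

Because the sign test uses only the direction of each within-cell difference, it makes no assumption about how the index scores are distributed, at the cost of ignoring the size of the differences.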
Data representing older speakers tended to be more formal. In addition, data representing Pakeha males were significantly more formal than data representing Pakeha females, whereas gender differences among Maori/Pacific individuals were nonsignificant. By contrast, there were no consistently significant differences in formality level for ethnicity or education, and no main effect of differences in interlocutor age or gender, although there were signs of stylistic accommodation among individuals recorded in mixed-gender and mixed-age groups.