Entrepôts, Représentation et Ingénierie des Connaissances
Laboratory publications

- Text Mining and Twitter to Analyze British Swearing Habits

Author(s): Gauthier Michael (Corresp.), Guille A. (Corresp.), Rico F., Deseille Anthony

Book chapter: Handbook Of Twitter For Research, vol. p.27--46 (2016)

DOI: 10.5281/zenodo.44882

The way women and men are expected to behave is frequently discussed. For example, women are sometimes described as speaking more than men, and men as swearing more than women. These stereotypes can alter people's expectations concerning the way we should behave. Indeed, if the idea that females generally swear less frequently than males is widespread, women who swear may be perceived as deviating from the norm, and thus be stigmatized. Clearly understanding what is true and what is not in these studies and reports is not an easy task, because there is a considerable number of differing opinions on the topic. The way swear words are used by women and men is one of those topics that remain vague but whose stakes are high, since swearing is often considered an act of power and a way of asserting oneself. This article introduces data gathered from a corpus of tweets in order to shed new light on ways of analyzing specific sociolinguistic features, such as gendered uses of swear words on Twitter. Analyzing the linguistic behaviour of users of these media can be an interesting way of generating a highly contemporary corpus representative of general trends, and computational linguistics can offer an accurate and powerful method of analyzing the different uses people make of certain speech patterns. In order to carry out the study, we used several tools taken from both computer science and linguistics. These tools may represent innovative methods to analyze the effect of social parameters on speech patterns displayed in Twitter corpora. Thanks to this data, we analyze both quantitative and qualitative instances of swear words in the corpus, to see how gendered linguistic preferences may differ when being vulgar but, just as importantly, how comparable they can be. Indeed, very often when dealing with gender in corpus linguistics, small differences tend to be focused on, whereas they are actually minor compared to the similarities.
As with every study, the methods used here have certain limits, which we also present. Without claiming to be representative of interactions other than the computer-mediated ones present in this corpus, we hope that this data can shed an up-to-date and neutral light on the way women and men use swear words on Twitter, on the implications these results may have, and on new tools researchers can use in various areas of research. We believe that this study can also be useful to computational linguists and sociologists thanks to the methods used to access data that users do not directly make available or display (e.g. age or sex).