Impoliteness and incivility in online discussions have recently been discussed as relevant issues in communication science. However, detecting these phenomena automatically with computational methods is challenging. In this study, we develop supervised classification models to predict impoliteness and incivility in German user comments. Using a sample of 10,000 hand-coded user comments and a theory-grounded coding scheme, we train and test classifiers that combine unigram, bigram, and trigram feature models with Naïve Bayes and Support Vector Machine algorithms. Our classification models, which are based on word frequency distributions in the user comments, predict both impoliteness and incivility with an accuracy of about 80 percent. The models also reveal predictive features, including obviously offensive language, uncivil rhetoric, and topic- and context-related words. Our study thereby contributes both to the understanding of impolite and uncivil communication in user comment sections and to the development and application of text classification using machine learning.
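To illustrate the general technique the abstract names, here is a minimal sketch of an n-gram, word-frequency classifier using multinomial Naïve Bayes with Laplace smoothing. It is not the authors' implementation. The tokenizer (lowercasing and splitting on non-word characters), the smoothing parameter `alpha`, the class name `NaiveBayes`, and the four toy comments with their "civil"/"uncivil" labels are all hypothetical assumptions made for illustration, not data or settings from the study. The Support Vector Machine variant is not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of word characters (\w matches umlauts and ß)
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    # All contiguous n-token sequences, joined by spaces
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=1):
    # Frequency counts of all 1..max_n-grams (max_n=3 gives unigram+bigram+trigram features)
    tokens = tokenize(text)
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(tokens, n))
    return Counter(feats)


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over n-gram counts, with Laplace smoothing."""

    def __init__(self, max_n=1, alpha=1.0):
        self.max_n = max_n
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            f = features(text, self.max_n)
            self.feature_counts[label].update(f)
            self.vocab.update(f)
        # Total number of n-gram tokens observed per class
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        f = features(text, self.max_n)
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Log prior plus sum of smoothed log likelihoods of known n-grams
            score = math.log(self.class_counts[c] / n_docs)
            for feat, count in f.items():
                if feat not in self.vocab:
                    continue  # n-grams never seen in training are ignored
                p = (self.feature_counts[c][feat] + self.alpha) / (
                    self.totals[c] + self.alpha * vocab_size
                )
                score += count * math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy comments; not from the study's hand-coded sample
train_texts = [
    "Du bist ein Idiot",
    "Halt die Klappe, du Idiot",
    "Danke für den guten Artikel",
    "Interessanter Beitrag, danke",
]
train_labels = ["uncivil", "uncivil", "civil", "civil"]

model = NaiveBayes(max_n=1).fit(train_texts, train_labels)
print(model.predict("Was für ein Idiot"))      # → uncivil
print(model.predict("Danke für den Artikel"))  # → civil
```

Raising `max_n` to 2 or 3 adds bigram and trigram features to the same frequency counts, which mirrors the unigram, bigram, and trigram feature models described above; in practice a library implementation would typically replace this hand-rolled version.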
This work is licensed under a Creative Commons Attribution 4.0 International License.