Talking politics: Building data-driven lexica to measure political discussion quality

  • Kokil Jaidka
Keywords: deliberation, constructiveness, justification, political talk, comments, Twitter, Facebook


Social media data offers computational social scientists the opportunity to understand how ordinary citizens engage in political activities, such as expressing their ideological stances and engaging in policy discussions. This study curates and develops discussion quality lexica from the Corpus for the Linguistic Analysis of Political Talk ONline (CLAPTON).

Supervised machine learning classifiers that characterize political talk are evaluated on out-of-sample label prediction and on generalizability to new contexts. The approach yields data-driven lexica, or dictionaries, that can be applied to measure the constructiveness, justification, relevance, reciprocity, empathy, and incivility of political discussions. In addition, the findings illustrate how choices made in training such classifiers, namely the heterogeneity of the data, the feature sets used, and the classification approach, affect their generalizability. The article concludes by summarizing the strengths and weaknesses of applying machine learning methods to social media posts and by offering theoretical insights into the quality and structure of online political discussions.
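
As a rough illustration of how a weighted lexicon of this kind might be applied to a single post, the following minimal Python sketch scores a text by summing the weights of matched terms and normalizing by token count. The terms, weights, tokenizer, and function names are hypothetical and chosen only for illustration; they are not taken from the published lexica, and the article's actual scoring procedure may differ.

```python
import re
from collections import Counter

# Hypothetical term weights for illustration only;
# these are NOT entries from the published lexica.
TOY_LEXICON = {
    "because": 0.8,
    "evidence": 0.6,
    "therefore": 0.5,
    "idiot": -0.9,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon):
    """Sum the weights of matched terms, normalized by the number of tokens.

    Returns 0.0 for texts with no tokens.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = sum(weight * counts[term] for term, weight in lexicon.items())
    return total / len(tokens)


if __name__ == "__main__":
    comment = "I disagree because the evidence points the other way."
    print(lexicon_score(comment, TOY_LEXICON))
```

Normalizing by token count keeps scores comparable across posts of different lengths, which matters when comparing short tweets with longer comments; other normalizations are equally plausible.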

How to Cite
Jaidka, K. (2022). Talking politics: Building data-driven lexica to measure political discussion quality. Computational Communication Research, 4(2), 486-527. Retrieved from