Beyond Counting Words:

Assessing Performance of Dictionaries, Supervised Machine Learning, and Embeddings in Topic and Frame Classification

Authors

  • A.C. Kroon
  • Toni G. L. A. Van der Meer
  • Rens Vliegenthart

Keywords:

automated text analysis, dictionaries, supervised machine learning, word embeddings, frames, policy topics.

Abstract

Topics and frames are at the heart of various theories in communication science and other social sciences, making their measurement of key interest to many scholars. The current study compares and contrasts two main deductive computational approaches to measuring policy topics and frames: dictionary-based (lexicon-based) identification and supervised machine learning. Additionally, we introduce domain-specific word embeddings to these classification tasks. Drawing on a manually coded dataset of Dutch news articles and parliamentary questions, our results indicate that supervised machine learning outperforms dictionary-based classification for both tasks. Furthermore, results show that word embeddings may boost performance at relatively low cost by introducing relevant, domain-specific semantic information into the classification model.
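To make the contrast between the two approaches concrete, the sketch below pairs a keyword-counting dictionary classifier with a small supervised classifier. It is a minimal illustration under stated assumptions, not the authors' pipeline: the lexicon, the English training sentences, and the labels are all hypothetical toy data, and multinomial Naive Bayes is simply one standard supervised learner chosen for brevity. Word embeddings are not shown. The example illustrates why a dictionary can return nothing for a text that uses none of its exact keywords, while a trained model can still assign a label from words it saw during training.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_classify(text, lexicon):
    """Dictionary approach: pick the label whose keywords match the most tokens.

    Returns None when no keyword from any label occurs in the text.
    """
    tokens = tokenize(text)
    counts = {label: sum(tok in words for tok in tokens) for label, words in lexicon.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


class NaiveBayes:
    """Supervised approach: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.label_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / (total + v))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy lexicon and training data (not from the study).
lexicon = {
    "economy": {"tax", "budget", "unemployment"},
    "health": {"hospital", "doctor", "vaccine"},
}
train_texts = [
    "minister raises tax on fuel",
    "budget deficit grows",
    "hospital waiting lists grow",
    "new vaccine approved by regulator",
]
train_labels = ["economy", "economy", "health", "health"]
model = NaiveBayes().fit(train_texts, train_labels)

print(dictionary_classify("the budget and tax plan", lexicon))      # economy
print(dictionary_classify("regulator approves new drug", lexicon))  # None
print(model.predict("regulator approves new drug"))                 # health
```

Here the dictionary finds no keyword in "regulator approves new drug", whereas the trained model labels it from co-occurring training words ("regulator", "new"). Real evaluations, as in the study, compare such classifiers against manually coded data rather than a handful of invented sentences.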

Published

2022-09-28

How to Cite

Kroon, A. C., Van der Meer, T. G. L. A., & Vliegenthart, R. (2022). Beyond Counting Words: Assessing Performance of Dictionaries, Supervised Machine Learning, and Embeddings in Topic and Frame Classification. Computational Communication Research, 4(2), 528–570. Retrieved from https://computationalcommunication.org/ccr/article/view/41

Issue

Vol. 4 No. 2 (2022)

Section

Articles