Beyond Counting Words: Assessing Performance of Dictionaries, Supervised Machine Learning, and Embeddings in Topic and Frame Classification

  • A.C. Kroon
  • Toni G. L. A. Van der Meer
  • Rens Vliegenthart
Keywords: automated text analysis, dictionaries, supervised machine learning, word embeddings, frames, policy topics.

Abstract

Topics and frames are at the heart of various theories in communication science and other social sciences, making their measurement of key interest to many scholars. The current study compares two main deductive computational approaches to measuring policy topics and frames: dictionary-based (lexicon-based) identification and supervised machine learning. Additionally, we introduce domain-specific word embeddings to these classification tasks. Drawing on a manually coded dataset of Dutch news articles and parliamentary questions, we find that supervised machine learning outperforms dictionary-based classification for both tasks. Furthermore, results show that word embeddings may boost performance at relatively low cost by introducing relevant and domain-specific semantic information to the classification model.

Published
2022-09-28
How to Cite
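Kroon, A., Van der Meer, T., & Vliegenthart, R. (2022). Beyond Counting Words: Assessing Performance of Dictionaries, Supervised Machine Learning, and Embeddings in Topic and Frame Classification. Computational Communication Research, 4(2), 528-570. Retrieved from https://computationalcommunication.org/ccr/article/view/41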
Section
Articles