Beyond Counting Words:
Assessing Performance of Dictionaries, Supervised Machine Learning, and Embeddings in Topic and Frame Classification
Keywords:
automated text analysis, dictionaries, supervised machine learning, word embeddings, frames, policy topics.
Abstract
Topics and frames are at the heart of various theories in communication science and other social sciences, making their measurement of key interest to many scholars. The current study compares two main deductive computational approaches to measuring policy topics and frames: dictionary-based (lexicon-based) identification and supervised machine learning. Additionally, we introduce domain-specific word embeddings to these classification tasks. Drawing on a manually coded dataset of Dutch news articles and parliamentary questions, our results indicate that supervised machine learning outperforms dictionary-based classification for both tasks. Furthermore, the results show that word embeddings may boost performance at relatively low cost by introducing relevant, domain-specific semantic information to the classification model.