Word Embedding Enrichment for Dictionary Construction
An Example of Incivility in Cantonese
DOI:
https://doi.org/10.5117/CCR2023.1.10.LIAN
Keywords:
political incivility, machine learning, dictionary construction, Cantonese, swearing
Abstract
Dictionary-based methods remain valuable for measuring concepts in text, even though supervised machine learning has become widespread in recent communication research. The present study proposes a semi-automatic, easily implemented method for building and enriching dictionaries with word embeddings. As an example, we create a Cantonese dictionary of political incivility that contains vulgarity and name-calling words. The study shows that dictionary-based classification outperforms supervised machine learning methods, including deep neural network models. Furthermore, a small number of random seed words can generate a highly accurate dictionary. However, as we demonstrate in a population-based survey experiment, the uncivil content detected is only weakly correlated with perceptions of incivility. The strengths and limitations of dictionary-based methods are discussed.
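The general idea of enriching a dictionary from seed words can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine`, `enrich_dictionary`), the similarity threshold, the number of rounds, and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a word embedding model trained on the target corpus, and the candidate words would presumably be reviewed by hand before being accepted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def enrich_dictionary(seeds, embeddings, threshold=0.8, max_rounds=3):
    """Grow a word list by adding embedding neighbours of its current entries.

    `threshold` and `max_rounds` are hypothetical settings, not values
    taken from the study.
    """
    dictionary = {w for w in seeds if w in embeddings}
    for _ in range(max_rounds):
        candidates = set()
        for word, vec in embeddings.items():
            if word in dictionary:
                continue
            # Keep a word if it is close to any entry already in the dictionary.
            if any(cosine(vec, embeddings[s]) >= threshold for s in dictionary):
                candidates.add(word)
        if not candidates:
            break  # nothing new was found, so stop early
        dictionary |= candidates
    return dictionary


# Toy embeddings standing in for a trained model (illustrative values only).
toy_embeddings = {
    "insult_a": [0.9, 0.1],
    "insult_b": [0.85, 0.2],
    "insult_c": [0.8, 0.3],
    "weather": [0.1, 0.9],
    "budget": [0.0, 1.0],
}

expanded = enrich_dictionary(["insult_a"], toy_embeddings)
print(sorted(expanded))
```

Starting from a single seed, the sketch pulls in the two nearby placeholder words and leaves the unrelated ones out. A lower threshold widens coverage at the cost of more false candidates.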
License
Copyright (c) 2023 Hai Liang, Yee Man Margaret Ng, Nathan L.T. Tsang
This work is licensed under a Creative Commons Attribution 4.0 International License.