Lowering the Language Barrier

Investigating Deep Transfer Learning and Machine Translation for Multilingual Analyses of Political Texts

Authors

  • Moritz Laurer Vrije Universiteit Amsterdam
  • Wouter van Atteveldt Vrije Universiteit Amsterdam
  • Andreu Casas Vrije Universiteit Amsterdam
  • Kasper Welbers Vrije Universiteit Amsterdam

DOI:

https://doi.org/10.5117/CCR2023.2.7.LAUR

Keywords:

text-as-data, machine learning, multilingualism, computational social sciences

Abstract

The social science toolkit for computational text analysis is still very much in the making. We know surprisingly little about how to produce valid insights from large amounts of multilingual texts for comparative social science research. In this paper, we test several recent innovations from deep transfer learning to help advance the computational toolkit for social science research in multilingual settings. We investigate the extent to which prior language and task knowledge stored in the parameters of modern language models is useful for enabling multilingual research; we investigate the extent to which these algorithms can be fruitfully combined with machine translation; and we investigate whether these methods are accurate, practical and valid in multilingual settings – three essential conditions for lowering the language barrier in practice. We use two datasets with texts in 12 languages from 27 countries for our investigation. Our analysis shows that, based on these innovations, supervised machine learning can produce substantively meaningful outputs. Our BERT-NLI model, trained on only 674 or 1,674 texts in one or two languages, can validly predict political party families’ stances towards immigration in eight other languages and ten other countries.
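NLI-based classifiers such as BERT-NLI recast a classification task as natural language inference: each text serves as the premise, each class is expressed as a hypothesis sentence, and the model judges whether the premise entails the hypothesis. The following is a minimal sketch of that data reformulation only, assuming three stance classes and hypothesis wordings that are hypothetical (they are not the paper's templates). It leaves out the language model itself and shows how one labelled text could be expanded into premise-hypothesis pairs and how a class could be read off per-hypothesis entailment scores.

```python
# Hypothetical hypothesis templates for a three-way immigration-stance task.
# The wording is illustrative only, not the templates used in the paper.
HYPOTHESES = {
    "supportive": "The quote is supportive towards immigration.",
    "sceptical": "The quote is sceptical of immigration.",
    "neutral": "The quote is neutral towards immigration or does not mention it.",
}


def to_nli_pairs(text, label=None):
    """Expand one classification example into (premise, hypothesis, nli_label) rows.

    With a gold label (training), the hypothesis for the true class is marked
    "entailment" and all others "not_entailment". Without a label (inference),
    every hypothesis is paired with the text so a model can score each one.
    """
    rows = []
    for cls, hypothesis in HYPOTHESES.items():
        if label is None:
            nli_label = None  # inference: no target, just score the pair
        elif cls == label:
            nli_label = "entailment"
        else:
            nli_label = "not_entailment"
        rows.append((text, hypothesis, nli_label))
    return rows


def pick_class(entailment_probs):
    """Return the class whose hypothesis received the highest entailment probability."""
    return max(entailment_probs, key=entailment_probs.get)


if __name__ == "__main__":
    for row in to_nli_pairs("We must secure our borders.", label="sceptical"):
        print(row)
    # Entailment probabilities here are made-up numbers standing in for model output.
    print(pick_class({"supportive": 0.05, "sceptical": 0.90, "neutral": 0.05}))
```

Because every class becomes a sentence in natural language, the same entailment model can in principle be applied to new classes or tasks by writing new hypotheses, which is one reason this framing is attractive when labelled training data is scarce.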


Published

2023-09-28

How to Cite

Laurer, M., van Atteveldt, W., Casas, A., & Welbers, K. (2023). Lowering the Language Barrier: Investigating Deep Transfer Learning and Machine Translation for Multilingual Analyses of Political Texts. Computational Communication Research, 5(2). https://doi.org/10.5117/CCR2023.2.7.LAUR