WU, P. Y.; MEBANE, W. R. MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks. Computational Communication Research, [S. l.], v. 4, n. 1, p. 275–322, 2022. Available at: https://computationalcommunication.org/ccr/article/view/102. Accessed on: 19 Apr. 2024.