Wu, P. Y., & Mebane, W. R. (2022). MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks. Computational Communication Research, 4(1), 275–322. Retrieved from https://computationalcommunication.org/ccr/article/view/102