Wu, Patrick Y., and Walter R. Mebane. “MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks”. Computational Communication Research 4, no. 1 (May 3, 2022): 275–322. Accessed April 16, 2024. https://computationalcommunication.org/ccr/article/view/102.