Wu, P. Y., and W. R. Mebane. “MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks”. Computational Communication Research, vol. 4, no. 1, May 2022, pp. 275-322, https://computationalcommunication.org/ccr/article/view/102.