(1) Wu, P. Y.; Mebane, W. R. MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks. CCR 2022, 4, 275–322.