X2-VLM: All-in-One Pre-Trained Model for Vision-Language Tasks

Work

Year: 2023

Type: article

Abstract: Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained o...

Source: IEEE Transactions on Pattern Analysis and Machine Intelligence

Authors: Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang +1 more

Institutions: Hong Kong University of Science and Technology, University of Hong Kong, ETH Zurich

Cites: 70

Cited by: 22

Related to: 10

FWCI: 5.122

Citation percentile (by year/subfield): 100

Topic: Multimodal Machine Learning Applications

Subfield: Computer Vision and Pattern Recognition

Field: Computer Science

Domain: Physical Sciences

Open Access status: green