Abstract: Image-text matching is a vital task in multi-modal intelligence. Recently, researchers have moved beyond simply aligning fragments between image regions and text words at a low level. They ...
Abstract: Recently, generative adversarial networks (GANs) have made remarkable progress, particularly with the advent of Contrastive Language-Image Pretraining (CLIP), which takes image and text into a ...