The official implementation of NarVid — a framework that enhances text-video retrieval by leveraging frame-level captions (narration) to improve semantic understanding and retrieval accuracy. NarVid ...
Abstract: Controllability of video generation has been recently concerned in addition to the quality of generated videos. The main challenge to controllable video generation is to synthesize videos ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results