Abstract: Video-text retrieval is a crucial task in numerous computer vision applications. In this paper, we focus on video-text retrieval involving complex action compositions, where a single video ...
Abstract: Text-to-video retrieval (T2VR) aims to identify the most semantically relevant video based on a text query. Text queries typically involve diverse visual elements and events in video, making ...