Semantic Entity Alignment and Non-Corresponding Reasoning for Text-to-Image Person Re-identification
Abstract: With the rapid development of intelligent surveillance technology, the massive amount of multimodal data (e.g., videos, images, and text) has imposed higher demands on efficient information ...
Abstract: Recent CLIP-guided 3D generation methods have achieved promising results but struggle with generating faithful 3D shapes that conform with input text due to the gap between text and image ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results