Our framework consists of two main pipelines: (1) Test-Time Augmentation: Given an input image and text prompt, we apply various transformations to create multiple augmented versions. The VLM processes ...
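
The augmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the source does not say which transformations are used, so the image transforms (`hflip`, `vflip`) and prompt transforms (`keep_prompt`, `lowercase_prompt`) below are hypothetical stand-ins, and the image is a toy nested list rather than a real tensor. The sketch stops at producing the augmented versions, since the text describing what the VLM does with them is truncated.

```python
from typing import Callable, List, Tuple

# Toy grayscale image: a list of rows of pixel values (an assumption for
# illustration; a real pipeline would use an image or tensor type).
Image = List[List[int]]


def identity(img: Image) -> Image:
    """Return an unchanged copy of the image."""
    return [row[:] for row in img]


def hflip(img: Image) -> Image:
    """Hypothetical transform: mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Hypothetical transform: mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def keep_prompt(prompt: str) -> str:
    """Return the prompt unchanged."""
    return prompt


def lowercase_prompt(prompt: str) -> str:
    """Hypothetical text transform: lowercase the prompt."""
    return prompt.lower()


def augment(
    image: Image,
    prompt: str,
    image_transforms: List[Callable[[Image], Image]],
    prompt_transforms: List[Callable[[str], str]],
) -> List[Tuple[Image, str]]:
    """Build every (augmented image, augmented prompt) pair."""
    return [
        (f(image), g(prompt))
        for f in image_transforms
        for g in prompt_transforms
    ]
```

Calling `augment(img, prompt, [identity, hflip, vflip], [keep_prompt, lowercase_prompt])` yields six image and prompt pairs, each of which would then be passed to the model separately. The input image is never modified in place, so the original stays available as one of the versions.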