SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

Publication: UniReps: the First Workshop on Unifying Representations in Neural Models