Starred repositories
Official PyTorch Implementation of ParGo: Bridging Vision-Language with Partial and Global Views.
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)
Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment, CVPR, 2024
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
Ambiguity-Aware and High-Order Relation Learning for Multi-Grained Image-Text Alignment
Easily compute CLIP embeddings and build a CLIP retrieval system with them
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
ESA: External Space Attention Aggregation for Image-Text Retrieval
A Framework of Small-scale Large Multimodal Models
A family of highly capable yet efficient large multimodal models
Code for the paper "GraDual: Graph-based Dual-modal Representation for Image-Text Matching" (WACV 2022)
Code for the paper "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval" (NeurIPS 2022)
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024
PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Enhanced Citation Counts Manager for Zotero 7
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"
📖 Official Code for “PIR-CLIP: Remote Sensing Image-text Retrieval with Prior Instruction Representation Learning”
An open-source implementation of CLIP
The official source code for the paper Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (ECCV 2020)