Stars
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
The official repository of Continuous Memory Representation for Anomaly Detection
Official repository for EXAONE built by LG AI Research
[AAAI 2024 Oral] AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
Vector (and Scalar) Quantization, in PyTorch
A high-throughput and memory-efficient inference and serving engine for LLMs
Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)
Tips for releasing research code in Machine Learning (with official NeurIPS 2020 recommendations)
1-Click is all you need.
tabtoyou / VL-DINO
Forked from facebookresearch/dino. Verifying Vision-Language alignment using DINO visualization techniques on cross-attention maps
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
chongzhou96 / MaskCLIP
Forked from open-mmlab/mmsegmentation. Official PyTorch implementation of "Extract Free Dense Labels from CLIP" (ECCV 2022 Oral)
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Official implementation of the paper 'CLIP-DINOiser: Teaching CLIP a few DINO tricks'.
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Official code for paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models, ICML2024"
A curated list of publications and resources on open-vocabulary semantic segmentation and related areas (e.g., zero-shot semantic segmentation).
An open source implementation of CLIP.
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
A state-of-the-art open visual language model | multimodal pretrained model
[CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
Accurate reimplementation of WinCLIP (PyTorch version)