- OGQ
- South Korea
- https://scholar.google.com/citations?user=xgP6q2YAAAAJ&hl=en

Stars
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
MagicAvatar: Multimodal Avatar Generation and Animation
Collection of ICCV 2023 papers and open-source projects
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
🔊 Text-Prompted Generative Audio Model
Image to prompt with BLIP and CLIP
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Official PyTorch implementation of GroupViT: Semantic Segmentation Emerges from Text Supervision, CVPR 2022.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
IFSeg: Image-free Semantic Segmentation via Vision-Language Model (CVPR 2023)
The pytorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
An open source implementation of CLIP.
An open-source framework for training large multimodal models.
[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language
4 bits quantization of LLaMA using GPTQ
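The 4-bit idea behind that repo can be sketched without the full GPTQ machinery: round-to-nearest quantization with a per-group scale, in plain NumPy. This is illustrative only (GPTQ itself adds Hessian-based error correction, omitted here), and all function names are hypothetical:

```python
import numpy as np

def quantize_4bit(weights: np.ndarray, group_size: int = 8):
    """Round-to-nearest 4-bit quantization with per-group scales.

    Illustrative sketch only: real GPTQ also applies second-order
    (Hessian-weighted) error correction, which is omitted here.
    """
    w = weights.reshape(-1, group_size)
    # Per-group scale maps the largest magnitude onto the int4 range [-8, 7].
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: np.ndarray, shape):
    """Recover an approximate float tensor from int4 codes and scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

w = np.random.default_rng(0).normal(size=(4, 16)).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step
```

Round-to-nearest keeps the per-element error within half a quantization step (0.5 × scale); GPTQ's contribution is redistributing that error across remaining columns so accuracy survives at 4 bits.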
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and video, up to 5x faster than OpenAI CLIP and LLaVA
[CVPR 2022] CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
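CLIP's retrieval rule — normalize the image and text embeddings, then pick the text with the highest cosine similarity — can be sketched with toy NumPy vectors standing in for the encoder outputs. The function name and embeddings below are illustrative, not CLIP's actual API:

```python
import numpy as np

def most_relevant_text(image_emb: np.ndarray, text_embs: np.ndarray):
    """Return the index of the text embedding most similar to the image
    embedding under cosine similarity (the CLIP retrieval rule)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarities, one per candidate text
    return int(np.argmax(sims)), sims

# Toy embeddings standing in for CLIP's image/text encoder outputs.
rng = np.random.default_rng(1)
texts = rng.normal(size=(3, 8))
image = texts[2] + 0.05 * rng.normal(size=8)  # nearly aligned with text #2
best, sims = most_relevant_text(image, texts)
```

In the real model the two encoders are trained so that matching image-text pairs land near each other in this shared space; the ranking step itself is just this normalized dot product.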
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion