[ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user-provided control signals. An image generation model that lets users freely supply control signals and produces natural, harmonious results from multiple controls!

Python 105 3 Updated Jul 5, 2024

KwaiVGI / LivePortrait

Bring portraits to life!

Python 11,874 1,244 Updated Sep 6, 2024

ollama / ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Go 90,373 7,100 Updated Sep 22, 2024

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.

Python 5,565 434 Updated Sep 19, 2024

FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,205 48 Updated Aug 15, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,689 112 Updated Sep 19, 2024

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python 6,379 572 Updated Sep 19, 2024

krennic999 / STAR

STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

111 1 Updated Jun 18, 2024

LLaVA-VL / LLaVA-NeXT

Python 2,459 178 Updated Sep 19, 2024

om-ai-lab / RS5M

RS5M: a large-scale vision language dataset for remote sensing

Python 192 7 Updated Aug 28, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

28,821 1,576 Updated Aug 1, 2024

lucidrains / titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Python 159 3 Updated Jun 20, 2024

Luo-Z13 / SkySenseGPT

A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

Python 52 4 Updated Aug 3, 2024

fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 9,194 1,261 Updated Sep 14, 2024

zytx121 / Awesome-VLGFM

A Survey on Vision-Language Geo-Foundation Models (VLGFMs)

106 7 Updated Aug 31, 2024

OpenGVLab / OmniCorpus

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Python 250 5 Updated Aug 29, 2024

MingTao(陶明) tobran

Starred repositories

text-to-image