Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
High-resolution models for human tasks.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Official code for "RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control"
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vi…
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
A high-throughput and memory-efficient inference and serving engine for LLMs
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[WIP] Layer Diffusion for WebUI (via Forge)
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Transparent Image Layer Diffusion using Latent Transparency
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Official Code for MotionCtrl [SIGGRAPH 2024]
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Code for the paper "Pix2Video: Video Editing using Image Diffusion"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A feature-rich command-line audio/video downloader
Official and maintained implementation of the paper "Differentiable JPEG: The Devil is in the Details" [WACV 2024].
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.
CoTracker is a model for tracking any point (pixel) on a video.
[CVPR 2024 Highlight] Official PyTorch implementation of CoDeF: Content Deformation Fields for Temporally Consistent Video Processing