- Santa Barbara, CA
- weixi-feng.github.io
- @weixi_feng
Stars
Text-and-image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Official inference repo for FLUX.1 models
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Official implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation
Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation
Official PyTorch implementation of TrackDiffusion (https://arxiv.org/abs/2312.00651)
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) for segmentation.
(CVPR 2024) 🧩 TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
[ECCV'20] Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
[NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation
[ICLR 2024] LLM-grounded Video Diffusion Models (LVD): official implementation of the LVD paper
Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
[ICLR 2024 spotlight] Official implementation of "InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior".
Dromedary: towards helpful, ethical and reliable LLMs.
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation