- Beijing University of Chemical Technology
- China
- https://fistyee.github.io
Stars
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
[Survey] Awesome List of Mixup Augmentation and Beyond (https://arxiv.org/abs/2409.05202)
[ACL 2024 Best Paper] Deciphering Oracle Bone Language with Diffusion Models
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Long Context Transfer from Language to Vision
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis (ECCV 2024 Oral) - Official Implementation
Open-Sora: Democratizing Efficient Video Production for All
This may be the simplest implementation of DDPM. You can run Main.py directly to train the UNet on the CIFAR-10 dataset and watch the denoising process.
Training and Evaluation Code for "Mixture of Volumetric Primitives for Efficient Neural Rendering"
[CVPR'24] Group Anything with Radiance Fields
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
Code release for Image Sculpting: Precise Object Editing with 3D Geometry Control [CVPR 2024]
App showcasing multiple real-time diffusion models pipelines with Diffusers
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"
[NeurIPS 2023] LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app with one click.