- Beijing University of Chemical Technology
- China
- https://fistyee.github.io
Stars
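Provides a practical interactive interface for GPT, GLM, and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, analysis and self-explanation of Python, C++, and other projects, PDF/LaTeX paper translation and summarization, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…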
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Open-Sora: Democratizing Efficient Video Production for All
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Official PyTorch implementation of "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" (ICLR 2024)
This may be the simplest implementation of DDPM. You can run Main.py directly to train the UNet on the CIFAR-10 dataset and watch the denoising process.
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Official Code for "SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation"
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ suppo…
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
App showcasing multiple real-time diffusion model pipelines built with Diffusers
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
[CVPR'24] Group Anything with Radiance Fields
Video-P2P: Video Editing with Cross-attention Control
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high …
Long Context Transfer from Language to Vision
Code release for Image Sculpting: Precise Object Editing with 3D Geometry Control [CVPR 2024]
LiveBench: A Challenging, Contamination-Free LLM Benchmark
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
Training and Evaluation Code for "Mixture of Volumetric Primitives for Efficient Neural Rendering"