Starred repositories
Zero Bubble Pipeline Parallelism
A lecture note for understanding deep learning
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
FlagScale is a large model toolkit based on open-sourced projects.
A library to analyze PyTorch traces.
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
Tutel MoE: An Optimized Mixture-of-Experts Implementation
A fast communication-overlapping library for tensor parallelism on GPUs.
Implementation of a parallel least squares support vector machine using multiple backends for different GPU vendors.
An OAI-compatible exllamav2 API that's both lightweight and fast
A fast inference library for running LLMs locally on modern consumer-class GPUs
Assistant for Gakuen Idolmaster (学園アイドルマスター / 学マス)
ThunderSVM: A Fast SVM Library on GPUs and CPUs
User-friendly WebUI for LLMs (Formerly Ollama WebUI)
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Making eBPF programming easier via build env and examples
A curated list of awesome projects related to eBPF.
Mitsuba 3: A Retargetable Forward and Inverse Renderer