- NVIDIA
- Hangzhou, Zhejiang
- https://fanshiqing.github.io/
Stars
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"
Visualize expert firing frequencies across sentences in the Mixtral MoE model
Zero Bubble Pipeline Parallelism
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Source code examples from the Parallel Forall Blog
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
High Performance Grouped GEMM in PyTorch
RDMA and SHARP plugins for nccl library
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
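As a rough illustration of why einops is listed here for readable tensor operations, a minimal sketch (the shapes and pattern strings below are arbitrary examples, not taken from any repository above):

```python
import torch
from einops import rearrange, reduce

x = torch.randn(2, 3, 32, 32)  # batch, channels, height, width

# Split each image into 8x8 patches and flatten each patch into a vector.
patches = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=8, p2=8)

# Global average pooling expressed as a named reduction.
means = reduce(x, 'b c h w -> b c', 'mean')

print(patches.shape, means.shape)  # torch.Size([2, 16, 192]) torch.Size([2, 3])
```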
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
Fast and memory-efficient exact attention
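A minimal usage sketch, assuming the flash-attn package is installed with CUDA support and exposes the flash_attn_func entry point; tensor sizes are illustrative only:

```python
import torch
from flash_attn import flash_attn_func

# q, k, v: (batch, seqlen, num_heads, head_dim) in fp16/bf16 on the GPU
q = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device='cuda')
k = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device='cuda')
v = torch.randn(2, 1024, 16, 64, dtype=torch.float16, device='cuda')

# Fused, memory-efficient exact attention with a causal mask.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 1024, 16, 64])
```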
Monitor Memory usage of Python code
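A sketch of line-by-line memory profiling with the memory_profiler package (assumes `pip install memory_profiler`; the function and allocation sizes are hypothetical):

```python
from memory_profiler import profile

@profile
def build_lists():
    a = [0] * (10 ** 6)       # roughly 8 MB of integers
    b = [0] * (2 * 10 ** 7)   # roughly 160 MB
    del b                     # the release shows up as a negative increment in the report
    return a

if __name__ == '__main__':
    # Run as: python -m memory_profiler this_script.py
    build_lists()
```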
A latent text-to-image diffusion model
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Development repository for the Triton language and compiler
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways
This repository contains the results and code for the MLPerf™ Training v2.0 benchmark.
Example models using DeepSpeed
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Tutel MoE: An Optimized Mixture-of-Experts Implementation
PKU-DAIR / Hetu
Forked from Hsword/Hetu. A high-performance distributed deep learning system targeting large-scale and automated distributed training.