
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python · 8,315 stars · 1,384 forks · Updated Aug 30, 2024

Provides end-to-end model development pipelines for LLMs and multimodal models that can be launched on-prem or cloud-native.

Python · 445 stars · 134 forks · Updated Sep 6, 2024

Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"

Python · 31 stars · 3 forks · Updated Nov 29, 2023

Visualize expert firing frequencies across sentences in the Mixtral MoE model

Python · 17 stars · 2 forks · Updated Dec 22, 2023

Zero Bubble Pipeline Parallelism

Python · 256 stars · 13 forks · Updated Sep 4, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,432 stars · 138 forks · Updated Aug 10, 2024

Inference code for Llama models

Python · 55,535 stars · 9,476 forks · Updated Aug 18, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python · 972 stars · 47 forks · Updated Jan 16, 2024
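Several of the entries above (HyperRouter, the Mixtral expert visualizer, DeepSeekMoE) revolve around sparse Mixture-of-Experts models, whose core operation is a gating step: softmax over per-expert logits, keep the top-k experts, and renormalize their weights. A minimal pure-Python sketch of that routing step (function names and shapes are illustrative, not any of these libraries' APIs):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(expert_logits, k=2):
    """MoE gating: pick the k highest-probability experts and
    renormalize their gate weights so they sum to 1."""
    probs = softmax(expert_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]

# A token with logits favoring expert 2, then expert 1.
route = top_k_route([0.0, 1.0, 2.0], k=2)
```

Each token's output is then the gate-weighted sum of the chosen experts' outputs; routing only k of N experts is what makes the model "sparse".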

The Art of Debugging

C · 790 stars · 31 forks · Updated Aug 3, 2024

Source code examples from the Parallel Forall Blog

HTML · 1,224 stars · 632 forks · Updated Jul 23, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda · 57 stars · 22 forks · Updated Jul 18, 2024

High Performance Grouped GEMM in PyTorch

Cuda · 20 stars · 2 forks · Updated May 10, 2022
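The grouped-GEMM projects above all expose the same primitive: run many independent matrix multiplications, each with its own shape, in a single launch instead of one kernel per problem. A pure-Python sketch of the semantics (the real libraries dispatch to fused CUTLASS kernels on GPU; this only shows what one "grouped" call computes):

```python
def matmul(a, b):
    """Naive dense matmul: a is m-by-k, b is k-by-n, as lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def grouped_gemm(a_list, b_list):
    """Grouped GEMM semantics: one call computes every A_i @ B_i.

    Each problem may have different (m, k, n); a GPU implementation
    schedules all groups inside a single kernel launch."""
    return [matmul(a, b) for a, b in zip(a_list, b_list)]

# Two problems with different shapes handled by one "launch":
out = grouped_gemm(
    [[[1, 2], [3, 4]], [[1, 0, 2]]],       # A_0 is 2x2, A_1 is 1x3
    [[[5, 6], [7, 8]], [[1], [1], [1]]],   # B_0 is 2x2, B_1 is 3x1
)
```

This shape flexibility is why grouped GEMM pairs naturally with MoE layers, where each expert sees a different number of routed tokens.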

PyTorch bindings for CUTLASS grouped GEMM.

Cuda · 41 stars · 35 forks · Updated Aug 26, 2024

Lepton Examples

Jupyter Notebook · 139 stars · 18 forks · Updated Jul 25, 2024

Python · 1,167 stars · 170 forks · Updated Sep 19, 2024

RDMA and SHARP plugins for the NCCL library

C · 154 stars · 32 forks · Updated Sep 16, 2024

Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TF and others)

Python · 8,371 stars · 347 forks · Updated Sep 17, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization…

Python · 1,820 stars · 303 forks · Updated Sep 19, 2024

Fast and memory-efficient exact attention

Python · 13,445 stars · 1,225 forks · Updated Sep 20, 2024
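"Fast and memory-efficient exact attention" refers to computing softmax(QKᵀ/√d)·V without ever materializing the full score matrix, by streaming over tiles with a running max and denominator. A pure-Python sketch of that online-softmax recurrence for a single query against scalar values (the real kernel tiles this over blocks on GPU; names here are illustrative):

```python
import math

def online_softmax_weighted_sum(scores, values):
    """One-pass softmax(scores) dot values using a running-max recurrence.

    Each new score rescales the accumulated denominator and sum by
    exp(old_max - new_max), so no full score vector is ever stored."""
    m = float('-inf')  # running maximum score
    d = 0.0            # running softmax denominator
    acc = 0.0          # running weighted sum of values
    for s, v in zip(scores, values):
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != float('-inf') else 0.0
        w = math.exp(s - m_new)
        d = d * scale + w
        acc = acc * scale + w * v
        m = m_new
    return acc / d
```

The result matches a two-pass softmax exactly (hence "exact attention"); the win is that memory stays constant in the sequence length being reduced over.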

Foundation Architecture for (M)LLMs

Python · 3,003 stars · 202 forks · Updated Apr 11, 2024

Monitor memory usage of Python code

Python · 4,329 stars · 376 forks · Updated Apr 29, 2024

A latent text-to-image diffusion model

Jupyter Notebook · 67,569 stars · 10,085 forks · Updated Jun 18, 2024

A scalable generative AI framework built for researchers and developers working on large language models, multimodal, and speech AI (automatic speech recognition and text-to-speech)

Python · 11,552 stars · 2,420 forks · Updated Sep 20, 2024

Development repository for the Triton language and compiler

C++ · 12,766 stars · 1,539 forks · Updated Sep 20, 2024

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways

Python · 818 stars · 82 forks · Updated Nov 9, 2022

This repository contains the results and code for the MLPerf™ Training v2.0 benchmark.

C++ · 27 stars · 24 forks · Updated Feb 23, 2024

Example models using DeepSpeed

Python · 5,992 stars · 1,017 forks · Updated Sep 17, 2024

🧑‍🏫 60+ implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python · 54,001 stars · 5,583 forks · Updated Aug 24, 2024

Tutel MoE: An Optimized Mixture-of-Experts Implementation

Python · 712 stars · 88 forks · Updated Sep 13, 2024

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Python · 256 stars · 29 forks · Updated Dec 18, 2023