Gang Liao (gangliao)

🏠 Working from home

ML/LLM Serving at Scale

169 followers · 1 following

Lists (1)

🚀 My stack

Stars

mynameisfiber / high_performance_python_2e

Code for the book "High Performance Python 2e" by Micha Gorelick and Ian Ozsvald with O'Reilly

Python · 408 stars · 135 forks · Updated Jan 18, 2023

karpathy / LLM101n

LLM101n: Let's build a Storyteller

28,911 stars · 1,582 forks · Updated Aug 1, 2024

google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

C++ · 10,112 stars · 1,165 forks · Updated Sep 1, 2024

facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python · 8,404 stars · 596 forks · Updated Sep 20, 2024

facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python · 4,536 stars · 363 forks · Updated Sep 13, 2024

google / gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Python · 5,242 stars · 502 forks · Updated Jul 31, 2024

karpathy / minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python · 9,065 stars · 836 forks · Updated Jul 1, 2024

facebook / buck2

Build system, successor to Buck

Rust · 3,523 stars · 215 forks · Updated Sep 26, 2024

zwegner / zp7

ZP7: Zach's Peppy Parallel-Prefix-Popcountin' PEXT/PDEP Polyfill

C · 43 stars · 3 forks · Updated Aug 14, 2024

S-LoRA / S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python · 1,714 stars · 89 forks · Updated Jan 21, 2024

harvardnlp / annotated-transformer

An annotated implementation of the Transformer paper.

Jupyter Notebook · 5,604 stars · 1,212 forks · Updated Apr 7, 2024

meta-llama / codellama

Inference code for CodeLlama models

Python · 15,899 stars · 1,848 forks · Updated Aug 12, 2024

PKU-YuanGroup / ChatLaw

ChatLaw: A Powerful LLM Tailored for Chinese Law (a Chinese-language legal large model)

6,839 stars · 540 forks · Updated Jun 4, 2024

facebook / squangle

SQuangLe is a C++ API for accessing MySQL servers

C++ · 123 stars · 54 forks · Updated Sep 26, 2024

google / re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

C++ · 8,900 stars · 1,123 forks · Updated Aug 6, 2024

oprecomp / FloatX

Header-only C++ library for low precision floating point type emulation.

C++ · 163 stars · 26 forks · Updated Jan 24, 2020

karpathy / llama2.c

Inference Llama 2 in one file of pure C

C · 17,214 stars · 2,053 forks · Updated Aug 6, 2024

cwida / FastLanes

Towards a New File Format

C++ · 151 stars · 8 forks · Updated Sep 16, 2024

powturbo / TurboPFor-Integer-Compression

Fastest Integer Compression

C · 762 stars · 111 forks · Updated Mar 1, 2024

erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python

Python · 4,871 stars · 734 forks · Updated Sep 2, 2024

facebook / wdt

Warp speed Data Transfer (WDT) is an embeddable library (and command-line tool) aiming to transfer data between two systems as fast as possible over multiple TCP paths.

C++ · 2,862 stars · 391 forks · Updated Aug 23, 2024

TsinghuaDatabaseGroup / AIDB

ai4db and db4ai work

647 stars · 87 forks · Updated Aug 16, 2024

weaviate / weaviate

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of …

Go · 10,939 stars · 749 forks · Updated Sep 26, 2024