hogura99 · Showing results

FlashInfer: Kernel Library for LLM Serving

Cuda · 1,223 stars · 115 forks · Updated Oct 7, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ · 235 stars · 33 forks · Updated Sep 25, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda · 568 stars · 23 forks · Updated Sep 21, 2024

A C++11 library for serialization

C++ · 4,187 stars · 751 forks · Updated Aug 19, 2024

Header-only C++ binding for libzmq

C++ · 1,951 stars · 757 forks · Updated Aug 12, 2024

Grok open release

Python · 49,461 stars · 8,325 forks · Updated Aug 30, 2024

🚴 Call stack profiler for Python. Shows you why your code is slow!

Python · 6,486 stars · 230 forks · Updated Sep 29, 2024

A large-scale simulation framework for LLM inference

Python · 246 stars · 29 forks · Updated Oct 1, 2024

Machnet gives applications such as databases and financial services easy access to low-latency DPDK-based messaging on public-cloud VMs: 750K RPS on Azure at 61 µs P99.9.

C++ · 71 stars · 19 forks · Updated Sep 29, 2024

Efficient RPCs for datacenter networks

C++ · 851 stars · 138 forks · Updated May 9, 2024

NumPy & SciPy for GPU

Python · 9,297 stars · 837 forks · Updated Oct 3, 2024

A tensor-aware point-to-point communication primitive for machine learning

C++ · 248 stars · 77 forks · Updated Dec 17, 2022

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python · 1,854 stars · 174 forks · Updated Sep 11, 2024

Alpaca dataset from Stanford, cleaned and curated

Python · 1,503 stars · 149 forks · Updated Apr 14, 2023

A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python · 114 stars · 7 forks · Updated Aug 17, 2024

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ · 1,667 stars · 224 forks · Updated Oct 7, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python · 1,720 stars · 91 forks · Updated Jan 21, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python · 1,112 stars · 65 forks · Updated Feb 14, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.

C++ · 8,342 stars · 937 forks · Updated Oct 1, 2024

Mamba SSM architecture

Python · 12,750 stars · 1,075 forks · Updated Sep 26, 2024

Free downloads of English-language magazines such as The Economist (with audio), The New Yorker, The Guardian, Wired, and The Atlantic, in epub, mobi, and pdf formats; updated weekly

CSS · 21,355 stars · 1,656 forks · Updated Oct 4, 2024

libcubwt is a library for GPU-accelerated suffix array and Burrows-Wheeler transform construction.

Cuda · 30 stars · 1 fork · Updated Feb 8, 2024

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python · 36,591 stars · 4,518 forks · Updated Oct 6, 2024

Extending JAX with custom C++ and CUDA code

Python · 372 stars · 21 forks · Updated Aug 18, 2024

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Python · 456 stars · 12 forks · Updated Sep 16, 2024

Open-source software for volunteer computing and grid computing.

PHP · 2,004 stars · 445 forks · Updated Oct 6, 2024

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Python · 9,125 stars · 513 forks · Updated Sep 7, 2024