-
gpgpu-sim_distribution Public
Forked from gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
C++ Other Updated Aug 9, 2024 -
Time-Series-Library Public
Forked from thuml/Time-Series-Library
A Library for Advanced Deep Time Series Models.
Python MIT License Updated Jul 23, 2024 -
GenZ-LLM-Analyzer Public
Forked from abhibambhaniya/GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
Jupyter Notebook MIT License Updated Jul 15, 2024 -
ramulator2 Public
Forked from CMU-SAFARI/ramulator2
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…
C++ MIT License Updated Jul 10, 2024 -
galois Public
Forked from mhostetter/galois
A performant NumPy extension for Galois fields and their applications
Python MIT License Updated Jul 7, 2024 -
tiny-gpu Public
Forked from adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
SystemVerilog Updated May 1, 2024 -
VAR Public
Forked from FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …
Python MIT License Updated Apr 30, 2024 -
wanda Public
Forked from locuslab/wanda
A simple and effective LLM pruning approach.
Python MIT License Updated Apr 29, 2024 -
llmperf Public
Forked from ray-project/llmperf
LLMPerf is a library for validating and benchmarking LLMs
Python Apache License 2.0 Updated Apr 28, 2024 -
Awesome-LLM-Inference Public
Forked from DefTruth/Awesome-LLM-Inference
📖 A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
GNU General Public License v3.0 Updated Apr 21, 2024 -
LLM-Viewer Public
Forked from hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
Python MIT License Updated Apr 12, 2024 -
sparsegpt Public
Forked from IST-DASLab/sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
Python Apache License 2.0 Updated Apr 6, 2024 -
C(UDA) accelerated language model inference
C MIT License Updated Apr 3, 2024 -
gpu-benches Public
Forked from te42kyfo/gpu-benches
collection of benchmarks to measure basic GPU capabilities
Jupyter Notebook GNU General Public License v3.0 Updated Apr 3, 2024 -
sot Public
Forked from imagination-research/sot
[ICLR 2024] Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Python MIT License Updated Mar 1, 2024 -
mixbench Public
Forked from ekondis/mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
C++ GNU General Public License v2.0 Updated Feb 23, 2024 -
pdf_annotation_fix Public
Forked from julihoh/pdf_annotation_fix
Fixes macOS Preview garbled annotations
Rust MIT License Updated Feb 23, 2024 -
model_analyzer Public
Forked from triton-inference-server/model_analyzer
Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.
Python Apache License 2.0 Updated Feb 20, 2024 -
llm-analysis Public
Forked from cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
Python Apache License 2.0 Updated Feb 3, 2024 -
llm_profiler Public
Forked from HarleysZhang/llm_profiler
llm theoretical performance analysis tools and support params, flops, memory and latency analysis.
Python Updated Feb 2, 2024 -
gemmini Public
Forked from ucb-bar/gemmini
Berkeley's Spatial Array Generator
Scala Other Updated Jan 29, 2024 -
BladeDISC Public
Forked from alibaba/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
C++ Apache License 2.0 Updated Jan 22, 2024 -
zigzag Public
Forked from KULeuven-MICAS/zigzag
HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators
C++ BSD 3-Clause "New" or "Revised" License Updated Jan 19, 2024 -
scale-sim-v2 Public
Forked from scalesim-project/scale-sim-v2
Repository to host and maintain scale-sim-v2 code
Python Updated Jan 2, 2024 -
ECC-exercise Public
Forked from scalable-arch/ECC-exercise
Verilog CERN Open Hardware Licence Version 2 - Strongly Reciprocal Updated Dec 15, 2023 -
nnfusion Public
Forked from microsoft/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
C++ MIT License Updated Nov 22, 2023 -
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
HTML Updated Oct 22, 2023 -
Computer-Science-Textbooks Public
Forked from kaitoukito/Computer-Science-Textbooks
Collect some CS textbooks for learning.
Updated Oct 17, 2023