Stars
4 stars written in CUDA
🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.
A new TensorRT integration that is easy to adapt to many tasks.
Efficient deployment: TRT inference for YOLO X, V3, V4, V5, V6, V7, V8, and EdgeYOLO ™️ 🔝, with pre- and post-processing both implemented as CUDA kernels. CPP/CUDA 🚀