This folder contains two kinds of tutorials:
- end-to-end examples of how to use the library
- guides to onboard team members and contributors to this project
- XNLI classification: classification with / without optimizations (RoBERTa + XNLI classification task); a minimal baseline is sketched after this list
- text generation: text generation with / without optimizations (T5)
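To give a flavor of what the end-to-end examples cover, below is a minimal, *unoptimized* baseline sketch using the Hugging Face `transformers` API. The checkpoint name is an illustrative assumption, and the library's own optimization step is intentionally not shown here.

```python
# Hypothetical unoptimized baseline for XNLI classification with a RoBERTa
# family model; this sketch only shows the plain transformers path, not the
# library's optimizations.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "joeddav/xlm-roberta-large-xnli"  # illustrative XNLI checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

premise = "The cat sat on the mat."
hypothesis = "An animal is resting."
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.inference_mode():
    logits = model(**inputs).logits  # shape: (1, num_labels)

print(logits.softmax(dim=-1))  # probabilities over entailment classes
```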
The tutorials below show you how to implement a GPU kernel.
They require basic knowledge of how a GPU works, in particular its memory hierarchy.
If you are not familiar with that, check this article first.
To ease learning, the tutorials below are written in PyTorch in the style of Triton (rewriting them in Triton is trivial).
- tiled matmul: matrix multiplication implemented in CUDA style (sketched after this list)
- online softmax: parallelized softmax computation, a key ingredient of Flash Attention (also sketched after this list)
- Flash Attention: attention computation without saving the attention matrix to global memory
- matmul offsets: detailed explanations of a performance trick used in the Triton matmul tutorial
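As a taste of the tiled matmul tutorial, here is a minimal PyTorch sketch of the blocking idea: each output tile is computed independently (one tile per CUDA/Triton program instance) by accumulating over tiles of the shared dimension. The block sizes and divisibility assumption are simplifications; the tutorial covers the general case.

```python
# Minimal sketch of tiled (blocked) matrix multiplication in PyTorch,
# mimicking what each GPU program instance computes: one output tile,
# accumulated over tiles of the shared K dimension. Assumes M, N, K are
# multiples of the block sizes; real kernels mask the edges.
import torch

def tiled_matmul(a, b, bm=16, bn=16, bk=16):
    M, K = a.shape
    K2, N = b.shape
    assert K == K2 and M % bm == 0 and N % bn == 0 and K % bk == 0
    c = torch.zeros(M, N, dtype=a.dtype)
    for i in range(0, M, bm):          # each (i, j) pair is one "program"
        for j in range(0, N, bn):
            acc = torch.zeros(bm, bn, dtype=a.dtype)
            for k in range(0, K, bk):  # walk the K dimension tile by tile
                acc += a[i:i + bm, k:k + bk] @ b[k:k + bk, j:j + bn]
            c[i:i + bm, j:j + bn] = acc
    return c

a, b = torch.randn(64, 32), torch.randn(32, 48)
assert torch.allclose(tiled_matmul(a, b), a @ b, atol=1e-4)
```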
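Similarly, the online softmax trick can be sketched in a few lines of PyTorch: the running maximum and normalizer are maintained incrementally while streaming over chunks, so the full row never has to be materialized before exponentiation. This is a simplification of what the tutorial covers.

```python
# Minimal sketch of online softmax in PyTorch: the max and normalizer are
# computed together in a single streaming pass, rescaling past work whenever
# a larger maximum appears — the key trick reused by Flash Attention.
import torch

def online_softmax(x, chunk=4):
    m = torch.tensor(float("-inf"))  # running max
    d = torch.tensor(0.0)            # running normalizer (sum of exp)
    for start in range(0, x.numel(), chunk):
        block = x[start:start + chunk]
        m_new = torch.maximum(m, block.max())
        # rescale the old normalizer to the new max, then add the new terms
        d = d * torch.exp(m - m_new) + torch.exp(block - m_new).sum()
        m = m_new
    return torch.exp(x - m) / d  # final pass uses the converged (m, d)

x = torch.randn(10)
assert torch.allclose(online_softmax(x), torch.softmax(x, dim=0), atol=1e-6)
```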
The Flash Attention tutorial covers most of what you need to know; a condensed sketch of the core recurrence follows.
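The sketch below combines the tiling and online softmax ideas above into a minimal single-head attention; it assumes no masking or dropout and a block size that divides the sequence length, all of which the tutorial relaxes.

```python
# Minimal single-head sketch of the Flash Attention recurrence in PyTorch:
# keys/values are processed in blocks, and the output is rescaled with an
# online softmax so the full attention matrix is never stored. Simplified:
# no masking, no dropout, block size assumed to divide the sequence length.
import math
import torch

def flash_attention(q, k, v, block=16):
    n, d = q.shape
    scale = 1.0 / math.sqrt(d)
    o = torch.zeros_like(q)
    m = torch.full((n, 1), float("-inf"))  # running row maxima
    l = torch.zeros(n, 1)                  # running row normalizers
    for start in range(0, n, block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = (q @ kb.T) * scale                         # scores for this block
        m_new = torch.maximum(m, s.max(dim=1, keepdim=True).values)
        p = torch.exp(s - m_new)                       # block softmax numerator
        alpha = torch.exp(m - m_new)                   # rescale old statistics
        l = l * alpha + p.sum(dim=1, keepdim=True)
        o = o * alpha + p @ vb                         # unnormalized output
        m = m_new
    return o / l

q, k, v = (torch.randn(64, 32) for _ in range(3))
expected = torch.softmax((q @ k.T) / math.sqrt(32), dim=-1) @ v
assert torch.allclose(flash_attention(q, k, v), expected, atol=1e-5)
```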