Quantization of Models: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT)


Quantization

Quantization is a model size reduction technique that converts model weights from a high-precision floating-point representation to a lower-precision floating-point (FP) or integer (INT) representation, such as 16-bit or 8-bit.
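
To make the idea concrete, below is a minimal sketch of 8-bit affine quantization; the function names and the min/max range estimation are illustrative, not a library API.

```python
import torch

def quantize_int8(x: torch.Tensor):
    # Derive scale and zero-point from the observed min/max range of the tensor.
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = qmin - torch.round(x.min() / scale)
    # Map float values to int8: scale, round, and clip to the int8 range.
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale, zero_point

def dequantize(q: torch.Tensor, scale, zero_point):
    # Recover an approximation of the original float values.
    return (q.to(torch.float32) - zero_point) * scale

x = torch.randn(4)
q, s, z = quantize_int8(x)
print(x, dequantize(q, s, z))  # matches x up to quantization error
```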

Post-Training Quantization (PTQ)

Post-training quantization (PTQ) quantizes a model after it has already been trained, with no further training; the value ranges of weights and activations are often estimated on a small calibration dataset. A minimal sketch of one common PTQ flow is shown below.
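
The sketch below uses post-training dynamic quantization from PyTorch's torch.ao.quantization API; the toy model and layer choice are illustrative.

```python
import torch
import torch.nn as nn

# A small, already-trained model stands in for the real network here.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()  # PTQ is applied to a trained model in eval mode

# Replace Linear layers with int8-weight versions; activations are
# quantized dynamically at runtime, so no calibration data is needed.
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized_model)  # Linear layers become DynamicQuantizedLinear
```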

Quantization-Aware Training (QAT)

Quantization-aware training (QAT) fine-tunes the PTQ model with quantization in mind: the quantization operations (scaling, clipping, and rounding) are simulated during training, so the model learns to retain its accuracy even after quantization. A sketch of this flow follows.
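
Below is a minimal sketch of eager-mode QAT with torch.ao.quantization; the toy model, optimizer settings, and single training step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # entry point for quantization
        self.fc1 = nn.Linear(128, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        self.dequant = torch.ao.quantization.DeQuantStub()  # exit point back to float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = Net()
model.train()
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
torch.ao.quantization.prepare_qat(model, inplace=True)  # insert fake-quant modules

# Fine-tune with fake quantization in the forward pass (one step shown).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

# After fine-tuning, convert to a real int8 model for inference.
model.eval()
quantized_model = torch.ao.quantization.convert(model)
```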

References:

  1. QAT PyTorch
  2. QAT Details
