P2-ViT

This repo contains the official implementation of "P2-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer".

Abstract

Vision Transformers (ViTs) have achieved competitive, and often superior, performance compared to CNNs across various computer vision tasks, but they are memory-consuming and computation-intensive, challenging their deployment on resource-constrained devices. To tackle this limitation, prior works have explored ViT-tailored quantization algorithms but retained floating-point scaling factors, which incur non-negligible re-quantization overhead, limiting ViTs' hardware efficiency and motivating more hardware-friendly solutions.

To this end, we propose P$^2$-ViT, the first Power-of-Two (PoT) post-training quantization and acceleration framework to accelerate fully quantized ViTs. Specifically, as for quantization, we explore a dedicated quantization scheme to effectively quantize ViTs with PoT scaling factors, thus minimizing the re-quantization overhead. Furthermore, we propose coarse-to-fine automatic mixed-precision quantization to enable better accuracy-efficiency trade-offs. In terms of hardware, we develop a dedicated chunk-based accelerator featuring multiple tailored sub-processors that individually handle ViTs' different types of operations, alleviating reconfiguration overhead. Additionally, we design a tailored row-stationary dataflow to seize the pipeline processing opportunity introduced by our PoT scaling factors, thereby enhancing throughput.
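To make the benefit concrete, here is a minimal sketch (our illustration, not the repo's actual implementation; `pot_round` and `requantize_pot` are hypothetical helpers) of why PoT scaling factors minimize re-quantization overhead: once the combined scale is a power of two, the floating-point multiply in re-quantization collapses to an integer bit shift.

```python
import torch

def pot_round(scale: torch.Tensor) -> torch.Tensor:
    """Round a positive floating-point scaling factor to the nearest power of two."""
    return torch.exp2(torch.round(torch.log2(scale)))

def requantize_pot(acc: torch.Tensor, k: int) -> torch.Tensor:
    """Re-quantize an integer accumulator by a PoT factor 2^k via bit shifts.

    With floating-point scales, mapping an INT32 accumulator back to INT8
    requires y = round(acc * s_x * s_w / s_y), i.e., a float multiply per
    element. With PoT scales, s_x * s_w / s_y = 2^k, so the multiply
    becomes a shift.
    """
    if k >= 0:
        return acc << k                      # multiply by 2^k
    return (acc + (1 << (-k - 1))) >> (-k)   # divide by 2^(-k), rounding half up

# Example: a combined scale of 2^-7 maps a 32-bit accumulator to the INT8 grid.
acc = torch.randint(-2**15, 2**15, (4,), dtype=torch.int32)
y = requantize_pot(acc, k=-7).clamp(-128, 127)
```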

Extensive experiments consistently validate P$^2$-ViT's effectiveness. Particularly, with PoT scaling factors we achieve comparable or even superior quantization performance compared to counterparts with floating-point scaling factors. Besides, we achieve up to $\mathbf{10.1\times}$ speedup and $\mathbf{36.8\times}$ energy saving over GPU Turing Tensor Cores, and up to $\mathbf{1.84\times}$ higher computation utilization efficiency against SOTA quantization-based ViT accelerators.

Quantization

Run

Example: evaluate the quantized DeiT-S model with the MinMax quantizer.

python test_quant.py deit_small <YOUR_DATA_DIR> --quant --quant-method minmax

  • deit_small: the model architecture; can be replaced by deit_tiny, deit_base, vit_base, vit_large, swin_tiny, swin_small, or swin_base.

  • --quant: enables quantization of the model.

  • --quant-method: the calibration method for activation quantization; one of minmax, ema, percentile, or omse (a simplified minmax sketch follows this list).
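For intuition, here is a minimal sketch of what the minmax calibration method does (our simplified illustration, not the repo's code; `minmax_quantize` is a hypothetical helper): the quantization range is set directly from the observed extrema of the activation tensor.

```python
import torch

def minmax_quantize(x: torch.Tensor, n_bits: int = 8):
    """Uniform asymmetric quantization using the tensor's min/max range.

    The observed extrema define the quantization range; scale and
    zero-point follow directly from them.
    """
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / (qmax - qmin)
    zero_point = (qmin - torch.round(x_min / scale)).clamp(qmin, qmax)
    x_q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    x_dq = (x_q - zero_point) * scale  # de-quantized values for accuracy evaluation
    return x_q, x_dq, scale, zero_point
```

The other options differ only in how this range is estimated from calibration data: ema tracks exponential moving averages of the extrema, percentile clips outliers at a chosen percentile, and omse picks the range minimizing the quantization mean-squared error.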

Citation

If you find this repo useful in your research, please consider citing the following paper:

@article{shi2024p,
  title={P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer},
  author={Shi, Huihong and Cheng, Xin and Mao, Wendong and Wang, Zhongfeng},
  journal={IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
  year={2024}
}

Acknowledgment

Our code implementation is based on FQ-ViT.
