
AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression (ACL 2023)

Installation

To set up the environment, run:

pip install -r requirements.txt

Download GLUE Data

Download the GLUE data using this repository or from the GLUE benchmark website, unpack it into the datas/glue directory, and rename the CoLA folder to COLA.
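
If you prefer to do the rename step programmatically, here is a minimal sketch (not part of the repository), assuming the data has already been unpacked into datas/glue as described above:

# Sketch only: rename the CoLA folder to COLA after unpacking GLUE into datas/glue.
import os

glue_dir = "datas/glue"                       # assumed target directory from the step above
src = os.path.join(glue_dir, "CoLA")
dst = os.path.join(glue_dir, "COLA")
if os.path.isdir(src) and not os.path.exists(dst):
    os.rename(src, dst)
print(sorted(os.listdir(glue_dir)))           # sanity check of the task folders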

Download Pre-trained BERT

Download bert_uncased_L-12_H-768_A-12 (BERT-base) and bert_uncased_L-6_H-768_A-12 from this repository for the teacher model and student model, respectively, and use the Hugging Face conversion API to convert them into PyTorch checkpoints.
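
One way to perform the conversion is sketched below. This assumes TensorFlow is installed alongside transformers, and the module path may differ between transformers versions; the checkpoint paths are placeholders for wherever you unpacked the Google checkpoints:

# Sketch of the TF -> PyTorch conversion (requires TensorFlow; module path may vary by version).
from transformers.models.bert.convert_bert_original_tf_checkpoint_to_pytorch import (
    convert_tf_checkpoint_to_pytorch,
)

convert_tf_checkpoint_to_pytorch(
    tf_checkpoint_path="bert_uncased_L-6_H-768_A-12/bert_model.ckpt",   # assumed path
    bert_config_file="bert_uncased_L-6_H-768_A-12/bert_config.json",    # assumed path
    pytorch_dump_path="bert_uncased_L-6_H-768_A-12/pytorch_model.bin",  # assumed output
)

In many transformers versions the transformers-cli convert command provides an equivalent command-line route.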

Task-specific Teacher Model Training

We provide a training script for each task in script/teacher/, where $TEACHER_PATH is the path to the teacher model.

Task-specific Student Model Distillation

AD-KD can be run on a single GPU or on multiple GPUs; when using multiple GPUs, make sure to use DistributedDataParallel instead of DataParallel in PyTorch (see the sketch below). Here we provide single-GPU scripts in script/student/, where $TEACHER_PATH and $STUDENT_PATH are the paths to the teacher model and student model, respectively.
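
For reference, the multi-GPU caveat above amounts to wrapping the model in DistributedDataParallel under a per-GPU process launcher such as torchrun. The snippet below is an illustration of that general pattern, not the repository's actual launch code:

# Illustration only: wrap a model for multi-GPU training with DistributedDataParallel.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_for_ddp(model):
    # torchrun sets LOCAL_RANK for each process; one process per GPU.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # DistributedDataParallel, not torch.nn.DataParallel
    return DDP(model, device_ids=[local_rank])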

Student Checkpoints

The distilled student models for the tasks reported in the paper can be downloaded as follows:

from transformers import BertForSequenceClassification
task_name = 'cola'  # task name in lower case
model = BertForSequenceClassification.from_pretrained("Brucewsy/AD-KD_bert_uncased_L-6_H-768_A-12_" + task_name)
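
To sanity-check a downloaded checkpoint, a minimal inference sketch is shown below. It assumes the standard uncased BERT tokenizer matches the student; if the Hub repository ships its own tokenizer files, load the tokenizer from the same model name instead:

# Quick inference check with the downloaded student (tokenizer choice is an assumption).
from transformers import BertTokenizer
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("the movie was surprisingly good .", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted label id for the chosen task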
