To install the environment, run:
pip install -r requirements.txt
Download the GLUE data using this repository or from the GLUE benchmark website, unpack it into the directory datas/glue, and rename the folder CoLA to COLA.
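The rename step can also be scripted. The following is a minimal sketch, assuming the data was unpacked to datas/glue as described above; the function name `rename_cola` is hypothetical and not part of this repository. It compares directory names exactly, so it also behaves sensibly on case-insensitive file systems.

```python
from pathlib import Path


def rename_cola(glue_dir="datas/glue"):
    """Rename the CoLA folder to COLA and return the resulting path."""
    glue_dir = Path(glue_dir)
    # Compare exact names from the directory listing, because Path.exists()
    # cannot tell CoLA from COLA on a case-insensitive file system.
    names = {p.name for p in glue_dir.iterdir()}
    if "CoLA" in names and "COLA" not in names:
        (glue_dir / "CoLA").rename(glue_dir / "COLA")
    return glue_dir / "COLA"


if __name__ == "__main__":
    # Only attempt the rename if the data directory is actually present.
    if Path("datas/glue").is_dir():
        print(rename_cola())
```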
Download bert_uncased_L-12_H-768_A-12 (BERT-base) and bert_uncased_L-6_H-768_A-12 from this repository as the teacher and student models, respectively, then use the Hugging Face API to convert them to PyTorch checkpoints.
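One possible way to do the conversion is sketched below. It is not necessarily the procedure the authors used. It assumes each unpacked archive follows the usual Google BERT layout (bert_config.json, bert_model.ckpt.*, vocab.txt), that both `transformers` and `tensorflow` are installed, and that the installed `transformers` version still ships `load_tf_weights_in_bert` (present in 4.x releases). The helper names `checkpoint_files` and `convert_to_pytorch` are hypothetical.

```python
import shutil
from pathlib import Path


def checkpoint_files(model_dir):
    """Return (config file, TF checkpoint prefix, vocab file) for an unpacked archive."""
    model_dir = Path(model_dir)
    return (
        model_dir / "bert_config.json",
        model_dir / "bert_model.ckpt",
        model_dir / "vocab.txt",
    )


def convert_to_pytorch(model_dir, output_dir):
    """Load TensorFlow BERT weights and save them as a Hugging Face PyTorch checkpoint."""
    # Imported lazily: the conversion needs transformers, torch and tensorflow.
    from transformers import BertConfig, BertForPreTraining
    from transformers.models.bert.modeling_bert import load_tf_weights_in_bert

    config_file, tf_checkpoint, vocab_file = checkpoint_files(model_dir)
    config = BertConfig.from_json_file(str(config_file))
    model = BertForPreTraining(config)
    # Copies the TensorFlow variables into the matching PyTorch parameters.
    load_tf_weights_in_bert(model, config, str(tf_checkpoint))

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    model.save_pretrained(str(output_dir))
    # Keep the vocabulary next to the weights so a tokenizer can be loaded later.
    shutil.copy(vocab_file, output_dir / "vocab.txt")
```

A call such as `convert_to_pytorch("bert_uncased_L-6_H-768_A-12", "student_pt")` would then write the converted student checkpoint; both paths here are placeholders.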
We provide a training script for each task in script/teacher/, where $TEACHER_PATH is the path to the teacher model.
AD-KD can be run on a single GPU or on multiple GPUs; for multi-GPU training, make sure to use DistributedDataParallel rather than DataParallel in PyTorch. Single-GPU scripts are provided in script/student/, where $TEACHER_PATH and $STUDENT_PATH are the paths to the teacher and student models, respectively.
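For the multi-GPU case, wrapping a model in DistributedDataParallel typically looks like the sketch below. This is a generic PyTorch pattern, not code taken from this repository. It assumes one process per GPU started by a launcher such as `torchrun`, which exports the `LOCAL_RANK` environment variable, and it assumes the NCCL backend. The helper names are hypothetical.

```python
import os


def local_rank_from_env(default=0):
    """Read the per-process GPU index that the launcher exports as LOCAL_RANK."""
    return int(os.environ.get("LOCAL_RANK", default))


def wrap_with_ddp(model):
    """Move `model` to this process's GPU and wrap it in DistributedDataParallel."""
    # Imported lazily so the helper above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel

    local_rank = local_rank_from_env()
    if not dist.is_initialized():
        # Reads rank and world size from the environment set by the launcher.
        dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    model = model.to(torch.device("cuda", local_rank))
    return DistributedDataParallel(model, device_ids=[local_rank])
```

Unlike DataParallel, which replicates the model from a single process on every forward pass, this runs one process per GPU and synchronizes gradients during the backward pass.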
The distilled student model for each task reported in the paper can be downloaded as follows:
from transformers import BertForSequenceClassification
task_name = 'cola'  # GLUE task name in lowercase
model = BertForSequenceClassification.from_pretrained("Brucewsy/AD-KD_bert_uncased_L-6_H-768_A-12_" + task_name)