CSS-LM
CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models
- WWW-Workshop 2021: Accepted.
- IEEE/TASLP 2021: Accepted.
Overview
CSS-LM improves the fine-tuning phase of PLMs via contrastive semi-supervised learning. Specifically, given a specific task, we retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to the task. By performing contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances, CSS-LM can help PLMs capture crucial task-related semantic features and achieve better performance in low-resource scenarios.
Setups
- python>=3.6
- torch>=2.0.0+cu118
Requirements
pip install -r requirement.sh
Prepare the data
Download the open-domain corpus (openwebtext) and the backbone models (roberta-base, bert-base-uncased), and move them to the corresponding directories.
wget https://cloud.tsinghua.edu.cn/f/690e78d324ee44068857/?dl=1
mv 'index.html?dl=1' download.zip
unzip download.zip
rm -rf __MACOSX
scp -r download/openwebtext data
scp -r download/roberta-base script/roberta-base-768
scp -r download/bert-base-uncased script/bert-base-768
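After these steps, the following directories should exist (paths inferred from the copy commands above; adjust them if you store the files elsewhere):

# Quick check that the corpus and backbone checkpoints are in place
ls data/openwebtext          # open-domain corpus used for retrieval
ls script/roberta-base-768   # RoBERTa-base backbone
ls script/bert-base-768      # BERT-base-uncased backbone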
Semi-supervised Contrastive Fine-tuning (CSS-LM)
CSS-LM (run_${DATASET}_sscl_dt_k.sh and run_bert_${DATASET}_sscl_dt_k.sh) is our main method. Users can run the example in script/semeval_example.sh:
for i_th in {1..5};
do
#RoBERTa-base Model
bash run_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
#BERT-base Model
bash run_bert_semeval_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_bert_semeval_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
done
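To launch it, set the variables used in the loop above (GPU ids, instance counts, epochs, batch size, max length) and run the example script from the script directory:

cd script
bash semeval_example.sh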
We introduce the whole training pipeline and provide details of the arguments in the following parts.
Run All the Experiments
Execute script/run1.sh:
cd script
bash run1.sh
The run1.sh script:
for i_th in {1..5};
do
#RoBERTa-based Model
bash run_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th

#BERT-based Model
bash run_bert_${DATASET}_finetune.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_bert_${DATASET}_sscl_dt_k.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_bert_${DATASET}_st.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
bash run_bert_${DATASET}_sscl.sh $gpu_0 $gpu_1 $gpu_2 $gpu_3 $N_1 $N_2 $N_3 $N_times_1 $N_times_2 $batch_size $max_length $i_th
done
In run1.sh, we have two kinds of backbone models (BERT and RoBERTa).
RoBERTa-based
- run_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
- run_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
- run_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
- run_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)
BERT-based
- run_bert_${DATASET}_finetune.sh: Few-shot Fine-tuning (Standard)
- run_bert_${DATASET}_sscl_dt_k.sh: Semi-supervised Contrastive Fine-tuning (CSS-LM)
- run_bert_${DATASET}_st.sh: Supervised Contrastive Fine-tuning (SCF)
- run_bert_${DATASET}_sscl.sh: Semi-supervised Contrastive Pseudo Labeling Fine-tuning (CSS-LM-ST)
Arguments
- ${DATASET}: Can be semeval, sst5, scicite, aclintent, sciie, and chemprot.
- $gpu_0 $gpu_1 $gpu_2 $gpu_3: The ids of the GPUs you want to use; assign as many as you need.
- $N_1 $N_2 $N_3: The number of annotated instances.
- $N_times_1 $N_times_2: The number of training epochs.
- $batch_size: Training batch size.
- $max_length: The max length of the input sentence.
- $i_th: The models are trained with 5 random seeds; each $i_th indicates a different random seed.
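For instance, a single CSS-LM run on SemEval could be launched as follows. The concrete values below (GPU ids, instance counts, epochs, batch size, max length, seed index) are only illustrative placeholders, not the settings used in the paper:

# Illustrative placeholder values; replace them with your own setup.
# Argument order: gpu_0 gpu_1 gpu_2 gpu_3 N_1 N_2 N_3 N_times_1 N_times_2 batch_size max_length i_th
bash run_semeval_sscl_dt_k.sh 0 1 2 3 16 16 16 20 20 4 100 1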
Citation
Please cite our paper if you use CSS-LM in your work:
@article{su2021csslm,
title={CSS-LM: A Contrastive Framework for Semi-Supervised Fine-Tuning of Pre-Trained Language Models},
volume={29},
ISSN={2329-9304},
url={http://dx.doi.org/10.1109/TASLP.2021.3105013},
DOI={10.1109/taslp.2021.3105013},
journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Su, Yusheng and Han, Xu and Lin, Yankai and Zhang, Zhengyan and Liu, Zhiyuan and Li, Peng and Zhou, Jie and Sun, Maosong},
year={2021},
pages={2930–2941}
}
Contact