Residual Attention Layer Transformers (RealFormer)
This repository contains the RealFormer model and pre-trained checkpoints for "RealFormer: Transformer Likes Residual Attention" (https://arxiv.org/abs/2012.11747), published in ACL-IJCNLP 2021.
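The core idea of RealFormer is a residual edge over attention scores: each layer adds the previous layer's raw (pre-softmax) attention logits to its own before applying softmax, and passes the summed scores on to the next layer. The snippet below is a minimal single-head NumPy sketch of this idea for illustration only; it is not the code used to train the released checkpoints, and all names in it are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(q, k, v, prev_scores=None):
    """Single attention head with a residual edge on the attention scores.

    Illustrative sketch: the previous layer's pre-softmax scores are added
    to this layer's scores, and the summed scores are returned so the next
    layer can do the same.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # standard scaled dot-product scores
    if prev_scores is not None:
        scores = scores + prev_scores   # residual edge over attention scores
    return softmax(scores) @ v, scores  # layer output and scores for the next layer

# Toy usage: chain two "layers" so the second reuses the first's scores.
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(4, 8))
out1, s1 = residual_attention(q, k, v)
out2, s2 = residual_attention(q, k, v, prev_scores=s1)
```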
To cite this work, please use:
```
@inproceedings{he2021realformer,
  title={RealFormer: Transformer Likes Residual Attention},
  author={Ruining He and Anirudh Ravula and Bhargav Kanagal and Joshua Ainslie},
  booktitle={Findings of The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)},
  year={2021}
}
```
Pre-trained BERT Models with RealFormer
We release pre-trained checkpoints as follows.
Model | #Layers | #Heads | Hidden Size | Intermediate Size | #Parameters | Checkpoint |
---|---|---|---|---|---|---|
BERT-Small | 4 | 8 | 512 | 2048 | 30M | Download |
BERT-Base | 12 | 12 | 768 | 3072 | 110M | Download |
BERT-Large | 24 | 16 | 1024 | 4096 | 340M | Download |
BERT-xLarge | 36 | 24 | 1536 | 6144 | 1B | Download |
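After downloading a checkpoint, you can sanity-check its contents against the table above by listing the stored variables and their shapes. The snippet below is only an illustration; the path is a placeholder for wherever you unpack the checkpoint.

```python
import tensorflow as tf

# Placeholder path to an unpacked checkpoint from the table above (assumption).
CKPT = "/path/to/realformer_bert_small/bert_model.ckpt"

# List every variable stored in the checkpoint with its shape, e.g. to
# confirm the hidden size and number of layers match the table above.
for name, shape in tf.train.list_variables(CKPT):
    print(name, shape)
```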
BERT Fine-tuning
Please follow the standard BERT fine-tuning procedure using the above pre-trained checkpoints. Hyper-parameter configuration can be found in the Appendix of the RealFormer paper (https://arxiv.org/abs/2012.11747).
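As an illustration only (the checkpoint path and variable scopes below are assumptions, and the exact workflow follows the standard BERT fine-tuning scripts), warm-starting a fine-tuning model from one of the released checkpoints might look like:

```python
import tensorflow as tf

# Placeholder path to a downloaded checkpoint (assumption).
INIT_CHECKPOINT = "/path/to/realformer_bert_base/bert_model.ckpt"

def warm_start_bert_variables():
    # The standard BERT fine-tuning scripts warm-start by calling
    # init_from_checkpoint while building the graph, mapping the "bert/"
    # variable scope in the checkpoint onto the freshly built model.
    # The scope name assumes standard BERT variable naming; inspect the
    # checkpoint first (see the listing snippet above) if unsure.
    tf.compat.v1.train.init_from_checkpoint(INIT_CHECKPOINT, {"bert/": "bert/"})
```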