Residual Attention Layer Transformers (RealFormer)

This repository contains the RealFormer model and pre-trained checkpoints for "RealFormer: Transformer Likes Residual Attention" (https://arxiv.org/abs/2012.11747), published in ACL-IJCNLP 2021.

To cite this work, please use:

@inproceedings{he2021realformer,
  title={RealFormer: Transformer Likes Residual Attention},
  author={Ruining He and Anirudh Ravula and Bhargav Kanagal and Joshua Ainslie},
  booktitle={Findings of The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)},
  year={2021}
}

Pre-trained BERT Models with RealFormer

We release pre-trained checkpoints as follows.

| Model | #Layers | #Heads | Hidden Size | Intermediate Size | #Parameters | Checkpoint |
|-------------|----|----|------|------|------|----------|
| BERT-Small  | 4  | 8  | 512  | 2048 | 30M  | Download |
| BERT-Base   | 12 | 12 | 768  | 3072 | 110M | Download |
| BERT-Large  | 24 | 16 | 1024 | 4096 | 340M | Download |
| BERT-xLarge | 36 | 24 | 1536 | 6144 | 1B   | Download |
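
For reference, the BERT-Base row above maps onto the usual BERT config fields as in the sketch below. Only the four architecture numbers come from the table; the remaining values are assumptions mirroring the public BERT-Base defaults, not values confirmed by this repository.

```python
# A minimal sketch: the BERT-Base row expressed in standard BERT config fields.
# The architecture numbers are from the table above; everything else is an
# assumed default carried over from the public BERT-Base release.
bert_base_config = {
    "num_hidden_layers": 12,              # "#Layers" column
    "num_attention_heads": 12,            # "#Heads" column
    "hidden_size": 768,                   # "Hidden Size" column
    "intermediate_size": 3072,            # "Intermediate Size" column
    "hidden_act": "gelu",                 # assumed BERT default
    "hidden_dropout_prob": 0.1,           # assumed BERT default
    "attention_probs_dropout_prob": 0.1,  # assumed BERT default
    "max_position_embeddings": 512,       # assumed BERT default
    "type_vocab_size": 2,                 # assumed BERT default
    "vocab_size": 30522,                  # assumed BERT default
    "initializer_range": 0.02,            # assumed BERT default
}
```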

BERT Fine-tuning

Please follow the standard BERT fine-tuning procedure using the above pre-trained checkpoints. Hyper-parameter configuration can be found in the Appendix of the RealFormer paper (https://arxiv.org/abs/2012.11747).
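
Before fine-tuning, it can help to confirm that a downloaded checkpoint contains the expected variables. The sketch below is not part of the RealFormer release; it simply lists the tensors stored in a checkpoint (the path is a placeholder) so you can check the layer count against the table above before passing the checkpoint to the standard BERT fine-tuning scripts.

```python
# A minimal sketch, assuming TensorFlow is installed and the checkpoint
# has been downloaded locally. The path below is hypothetical.
import tensorflow as tf

checkpoint_path = "/path/to/realformer_bert_base/bert_model.ckpt"  # placeholder

# Print each variable name and shape stored in the checkpoint, e.g. to
# confirm that a BERT-Base checkpoint contains 12 encoder layers.
for name, shape in tf.train.list_variables(checkpoint_path):
    print(name, shape)
```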
