MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
This repository contains the checkpoints and vocabularies from MADLAD-400: A Multilingual And Document-Level Large Audited Dataset.
Checkpoints
Model | Checkpoint |
---|---|
8B parameter LM | link |
3B parameter MT model | link |
7.2B parameter MT model | link |
7.2B parameter MT model (finetuned on backtranslated data) | link |
10.7B parameter MT model | link |
Vocabulary
The vocabularies used to train the models listed above are here.
Example usage
We provide a simple Colab example that shows how to use the released checkpoints for translation.
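As a rough illustration of input preparation, the sketch below assumes the MT checkpoints select the output language through a tag of the form `<2xx>` prepended to the source text (for example `<2pt>` for Portuguese). That convention is not stated in this README, so confirm it against the Colab before relying on it. The helper names `add_target_tag` and `prepare_batch` are hypothetical and not part of the release.

```python
from typing import Iterable, List


def add_target_tag(text: str, target_lang: str) -> str:
    """Prefix `text` with a target-language tag such as "<2pt>".

    Assumption: the MT checkpoints expect a "<2xx>" token in front of the
    source sentence; verify this against the Colab example.
    """
    code = target_lang.strip()
    if not code or any(ch.isspace() for ch in code):
        raise ValueError(f"invalid language code: {target_lang!r}")
    return f"<2{code}> {text.strip()}"


def prepare_batch(sentences: Iterable[str], target_lang: str) -> List[str]:
    """Tag every sentence in a batch with the same target language."""
    return [add_target_tag(s, target_lang) for s in sentences]


if __name__ == "__main__":
    # The tagged strings would then be tokenized and passed to the model.
    for line in prepare_batch(["I love pizza!", "Where is the library?"], "pt"):
        print(line)
```

Keeping the tagging step in a small function makes it easy to change if the checkpoints turn out to use a different tag format.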
Contact
Please reach out to {snehakudugunta, icaswell}@google.com for any questions or observed issues. Issues will be listed on this page to aid future users. For questions about the canaries, reach out to cchoquette@google.com.