MixtralKit
A Toolkit for Mixtral Model
📊Performance • ✨Resources • 📖Architecture • 📂Weights • 🔨 Install • 🚀Inference • 🤝 Acknowledgement
[!Important]
📢 Welcome to try OpenCompass for model evaluation 📢
🤗 Requests to update your Mixtral-related projects are open!
🙏 This repo is an **experimental** implementation of inference code.
📊 Performance
Comparison with Other Models
- All data generated from OpenCompass
Results from different evaluation toolkits differ because of differences in prompts, settings, and implementation details.
Datasets | Mode | Mistral-7B-v0.1 | Mixtral-8x7B(MoE) | Llama2-70B | DeepSeek-67B-Base | Qwen-72B |
---|---|---|---|---|---|---|
Active Params | - | 7B | 12B | 70B | 67B | 72B |
MMLU | PPL | 64.1 | 71.3 | 69.7 | 71.9 | 77.3 |
BIG-Bench-Hard | GEN | 56.7 | 67.1 | 64.9 | 71.7 | 63.7 |
GSM-8K | GEN | 47.5 | 65.7 | 63.4 | 66.5 | 77.6 |
MATH | GEN | 11.3 | 22.7 | 12.0 | 15.9 | 35.1 |
HumanEval | GEN | 27.4 | 32.3 | 26.2 | 40.9 | 33.5 |
MBPP | GEN | 38.6 | 47.8 | 39.6 | 55.2 | 51.6 |
ARC-c | PPL | 74.2 | 85.1 | 78.3 | 86.8 | 92.2 |
ARC-e | PPL | 83.6 | 91.4 | 85.9 | 93.7 | 96.8 |
CommonSenseQA | PPL | 67.4 | 70.4 | 78.3 | 70.7 | 73.9 |
NaturalQuestion | GEN | 24.6 | 29.4 | 34.2 | 29.9 | 27.1 |
TriviaQA | GEN | 56.5 | 66.1 | 70.7 | 67.4 | 60.1 |
HellaSwag | PPL | 78.9 | 82.0 | 82.3 | 82.3 | 85.4 |
PIQA | PPL | 81.6 | 82.9 | 82.5 | 82.6 | 85.2 |
SIQA | GEN | 60.2 | 64.3 | 64.8 | 62.6 | 78.2 |
Performance Mixtral-8x7b
dataset | version | metric | mode | mixtral-8x7b-32k |
---|---|---|---|---|
mmlu | - | naive_average | ppl | 71.34 |
ARC-c | 2ef631 | accuracy | ppl | 85.08 |
ARC-e | 2ef631 | accuracy | ppl | 91.36 |
BoolQ | 314797 | accuracy | ppl | 86.27 |
commonsense_qa | 5545e2 | accuracy | ppl | 70.43 |
triviaqa | 2121ce | score | gen | 66.05 |
nq | 2121ce | score | gen | 29.36 |
openbookqa_fact | 6aac9e | accuracy | ppl | 85.40 |
AX_b | 6db806 | accuracy | ppl | 48.28 |
AX_g | 66caf3 | accuracy | ppl | 48.60 |
hellaswag | a6e128 | accuracy | ppl | 82.01 |
piqa | 0cfff2 | accuracy | ppl | 82.86 |
siqa | e8d8c5 | accuracy | ppl | 64.28 |
math | 265cce | accuracy | gen | 22.74 |
gsm8k | 1d7fe4 | accuracy | gen | 65.66 |
openai_humaneval | a82cae | humaneval_pass@1 | gen | 32.32 |
mbpp | 1e1056 | score | gen | 47.80 |
bbh | - | naive_average | gen | 67.14 |
✨ Resources
Blog
- MoE Blog from Hugging Face
- Enhanced MoE Parallelism, Open-source MoE Model Training Can Be 9 Times More Efficient
Papers
Evaluation
- Evaluation Toolkit: OpenCompass
Training
- Megablocks: https://github.com/stanford-futuredata/megablocks
- FairSeq: https://github.com/facebookresearch/fairseq/tree/main/examples/moe_lm
- OpenMoE: https://github.com/XueFuzhao/OpenMoE
- ColossalAI MoE: https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/openmoe
- FastMoE(FasterMoE): https://github.com/laekov/FastMoE
- SmartMoE: https://github.com/zms1999/SmartMoE
Fine-tuning
- Finetuning script (Full-parameters or QLoRA) from XTuner
- Finetuned Mixtral-8x7B from DiscoResearch: DiscoLM-mixtral-8x7b-v2
Deployment
📖 Model Architecture
The Mixtral-8x7B-32K MoE model consists mainly of 32 identical MoE transformer blocks. An MoE transformer block differs from an ordinary transformer block in that the FFN layer is replaced by an MoE FFN layer. In the MoE FFN layer, the input tensor first passes through a gate layer that computes a score for each expert. The top-k of the 8 experts are then selected by score, and their outputs are aggregated to produce the final output of the MoE FFN layer. Each expert consists of 3 linear layers. Note that all norm layers in Mixtral MoE use RMSNorm, as in LLaMA. In the attention layer, the Q projection matrix has shape (4096, 4096), while the K and V projection matrices have shape (4096, 1024).
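The routing described above can be sketched in plain Python for a single token vector. This is only an illustration, not the code used in this repo: the function names, the toy dimensions, the SwiGLU-style wiring of the three linear layers, `top_k=2`, and the softmax over the selected scores are all assumptions made for the example.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def silu(v):
    return v / (1.0 + math.exp(-v))


def expert_ffn(x, w1, w2, w3):
    """One expert with three linear layers (assumed wiring): w2(silu(w1 x) * (w3 x))."""
    gate = [silu(v) for v in matvec(w1, x)]
    up = matvec(w3, x)
    return matvec(w2, [g * u for g, u in zip(gate, up)])


def moe_ffn(x, gate_w, experts, top_k=2):
    """Score all experts, keep the top-k, and mix their outputs by softmax weight."""
    scores = matvec(gate_w, x)  # one score per expert
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])  # normalise over the selected experts only
    out = [0.0] * len(x)
    for weight, idx in zip(weights, top):
        y = expert_ffn(x, *experts[idx])
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top, weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes for illustration; the real model is far larger.
rng = random.Random(0)
dim, hidden, n_experts = 4, 8, 8
experts = [
    (random_matrix(hidden, dim, rng), random_matrix(dim, hidden, rng), random_matrix(hidden, dim, rng))
    for _ in range(n_experts)
]
gate_w = random_matrix(n_experts, dim, rng)
out, top, weights = moe_ffn([0.5, -1.0, 0.25, 2.0], gate_w, experts)
print("selected experts:", top, "weights:", weights)
```

Only the selected experts are evaluated for a given token, which is why the number of active parameters is much smaller than the total parameter count.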
The architecture is illustrated in the following figure:
📂 Model Weights
Hugging Face Format
Raw Format
You can download the checkpoints via magnet link or from Hugging Face.
Download via HF
If you are unable to access Hugging Face, please try hf-mirror
# Download from Hugging Face
git lfs install
git clone https://huggingface.co/someone13574/mixtral-8x7b-32kseqlen

# Merge files (only for HF)
cd mixtral-8x7b-32kseqlen/

# Merge the checkpoints
cat consolidated.00.pth-split0 consolidated.00.pth-split1 consolidated.00.pth-split2 consolidated.00.pth-split3 consolidated.00.pth-split4 consolidated.00.pth-split5 consolidated.00.pth-split6 consolidated.00.pth-split7 consolidated.00.pth-split8 consolidated.00.pth-split9 consolidated.00.pth-split10 > consolidated.00.pth
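The split files must be concatenated in numeric order. A plain `consolidated.00.pth-split*` glob sorts `split10` before `split2`, which is presumably why the command above lists every file explicitly. If you would rather merge from Python, the following is a minimal sketch of the same step. `merge_splits` and `split_index` are hypothetical helper names and are not part of MixtralKit.

```python
import re
import shutil
from pathlib import Path


def split_index(path):
    """Return the numeric suffix N of a '...-splitN' file name."""
    match = re.search(r"-split(\d+)$", path.name)
    if match is None:
        raise ValueError(f"not a split file: {path.name}")
    return int(match.group(1))


def merge_splits(folder, target_name="consolidated.00.pth"):
    """Concatenate '<target_name>-splitN' files in numeric order into target_name."""
    folder = Path(folder)
    parts = sorted(folder.glob(target_name + "-split*"), key=split_index)
    if not parts:
        raise FileNotFoundError(f"no split files found in {folder}")
    with open(folder / target_name, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, so large files are fine
    return [p.name for p in parts]
```

Calling `merge_splits("mixtral-8x7b-32kseqlen")` returns the file names in the order they were merged, so you can confirm that `split10` came last.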
Download via Magnet Link
Please use this link to download the original files
magnet:?xt=urn:btih:5546272da9065eddeb6fcd7ffddeef5b75be79a7&dn=mixtral-8x7b-32kseqlen&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=http%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce
MD5 Validation
Please check the MD5 checksums to make sure the files are complete.
md5sum consolidated.00.pth
md5sum tokenizer.model

# Once verified, you can delete the split files.
rm consolidated.00.pth-split*
Official MD5
1faa9bc9b20fcfe81fcd4eb7166a79e6  consolidated.00.pth
37974873eb68a7ab30c4912fc36264ae  tokenizer.model
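On systems without `md5sum`, the same check can be done with Python's standard `hashlib`. The sketch below compares each file against the official checksums listed above. `file_md5` and `verify_checkpoints` are hypothetical helper names for this example, not functions provided by MixtralKit.

```python
import hashlib
import os

# Official checksums, copied from the list above.
EXPECTED_MD5 = {
    "consolidated.00.pth": "1faa9bc9b20fcfe81fcd4eb7166a79e6",
    "tokenizer.model": "37974873eb68a7ab30c4912fc36264ae",
}


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoints(folder, expected=EXPECTED_MD5):
    """Return {file name: True/False} for each expected file in folder."""
    return {
        name: file_md5(os.path.join(folder, name)) == want
        for name, want in expected.items()
    }
```

For example, `verify_checkpoints("ckpts")` should map both file names to `True` before you delete the split files.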
🔨 Install
conda create --name mixtralkit python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate mixtralkit

git clone https://github.com/open-compass/MixtralKit
cd MixtralKit/
pip install -r requirements.txt
pip install -e .

ln -s path/to/checkpoints_folder/ ckpts
🚀 Inference
Text Completion
python tools/example.py -m ./ckpts -t ckpts/tokenizer.model --num-gpus 2
Expected Results:
==============================Example START==============================
[Prompt]:Who are you?
[Response]:I am a designer and theorist; a lecturer at the University of Malta and a partner in the firm Barbagallo and Baressi Design, which won the prestigious Compasso d’Oro award in 2004. I was educated in industrial and interior design in the United States
==============================Example END==============================
==============================Example START==============================
[Prompt]:1 + 1 -> 3
2 + 2 -> 5
3 + 3 -> 7
4 + 4 ->
[Response]:9
5 + 5 -> 11
6 + 6 -> 13
#include <iostream>
using namespace std;
int addNumbers(int x, int y){ return x + y;}
int main(){
==============================Example END==============================
🏗️ Evaluation
Step-1: Setup OpenCompass
- Clone and Install OpenCompass
# assume you have already created the conda env named mixtralkit
conda activate mixtralkit

git clone https://github.com/open-compass/opencompass opencompass
cd opencompass

pip install -e .
- Prepare Evaluation Dataset
# Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
unzip OpenCompassData-core-20231110.zip
To evaluate on HumanEval, please see the Installation Guide for more information.
Step-2: Prepare evaluation config and weights
cd opencompass/
# link the example config into opencompass
ln -s path/to/MixtralKit/playground playground

# link the model weights into opencompass
mkdir -p ./models/mixtral/
ln -s path/to/checkpoints_folder/ ./models/mixtral/mixtral-8x7b-32kseqlen
You should now have a file structure like:
opencompass/
├── configs
│   ├── .....
│   └── .....
├── models
│   └── mixtral
│       └── mixtral-8x7b-32kseqlen
├── data/
├── playground
│   └── eval_mixtral.py
│── ......
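Before launching the evaluation, it can help to confirm that the symlinks resolve. The following is a small sketch, assuming it is run from the parent of the `opencompass/` folder. `missing_paths` is a hypothetical helper, and the path list simply mirrors the layout shown above.

```python
from pathlib import Path

# Paths taken from the layout above, relative to the opencompass/ folder.
EXPECTED_PATHS = [
    "configs",
    "models/mixtral/mixtral-8x7b-32kseqlen",
    "data",
    "playground/eval_mixtral.py",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under root.

    Path.exists() follows symlinks, so a dangling link is reported as missing.
    """
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("opencompass")
    print("missing:", missing if missing else "nothing")
```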
Step-3: Run evaluation experiments
HF_EVALUATE_OFFLINE=1 HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python run.py playground/eval_mixtral.py
🤝 Acknowledgement
🖊️ Citation
@misc{2023opencompass,
    title={OpenCompass: A Universal Evaluation Platform for Foundation Models},
    author={OpenCompass Contributors},
    howpublished = {\url{https://github.com/open-compass/opencompass}},
    year={2023}
}