GigaChat
GigaChat-20B-A3B-base
A large language model based on a Mixture of Experts (MoE) architecture, trained from scratch specifically for the Russian language. The model has 20B parameters in total, with 3B active per token. Supported context length: 131k tokens.
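A minimal usage sketch with Hugging Face Transformers is shown below. The repository id `ai-sage/GigaChat-20B-A3B-base` and the dtype/device settings are assumptions; check the model card for the exact loading options (e.g. whether `trust_remote_code` is required).

```python
# Minimal sketch: loading the base model with Hugging Face Transformers.
# The repo id below is an assumption; adjust it to the published checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sage/GigaChat-20B-A3B-base"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 20B total parameters; bf16 keeps memory reasonable
    device_map="auto",
)

inputs = tokenizer("Столица России —", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```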
Resources
Benchmarks
Common LLM metrics, measured with LM Evaluation Harness.
| Bench | T-lite-0.1 (Llama 3.1 8B based) | Llama-3.1-8B | GigaChat-20B-A3B-base | Gemma-9B |
|---|---|---|---|---|
| MMLU (5-shot) | 62.56 | 65.21 | 63.02 | 70.6 |
| MMLU-Pro (5-shot) | 32.19 | 35.7 | 31.41 | 42.85 |
| MMLU-ru (5-shot) | 55.51 | 54.1 | 58.38 | 62.57 |
| BBH (3-shot) | 62.36 | 62.79 | 53.54 | 70.48 |
| ARC-C (25-shot) | 58.19 | 54.69 | 61.69 | 68.34 |
| TruthfulQA (0-shot, ROUGE-L) | 46.51 | 34.52 | 31.82 | 41.49 |
| Winogrande (5-shot) | 78.45 | 77.43 | 75.85 | 79.4 |
| HellaSwag (10-shot) | 82.21 | 81.85 | 81.91 | 82.5 |
| GPQA (5-shot) | 0.25 | 23.44 | 25.22 | 30.36 |
| MATH (4-shot) | 12.9 | 14.04 | 15.04 | 20.06 |
| GSM8K (4-shot, strict-match) | 67.93 | 51.4 | 59.06 | 68.99 |
| HumanEval | 16.46 | 25.61 | 32.32 | 37.2 |
| AVG | 47.96 | 48.4 | 49.11 | 56.24 |
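The scores above can in principle be reproduced with the LM Evaluation Harness Python API. The sketch below assumes the v0.4+ `simple_evaluate` entry point, the assumed repo id, and the MMLU 5-shot setting from the table; the exact harness version, task names, and few-shot counts used for the other rows would need to be matched per benchmark.

```python
# Sketch: evaluating MMLU (5-shot) with LM Evaluation Harness (v0.4+ API assumed).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ai-sage/GigaChat-20B-A3B-base,dtype=bfloat16",  # assumed repo id
    tasks=["mmlu"],       # one benchmark from the table above
    num_fewshot=5,        # the table reports MMLU 5-shot
    batch_size="auto",
)
print(results["results"])
```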
GigaChat-20B-A3B-instruct
A dialogue GigaChat model, based on GigaChat-20B-A3B-base. Supported context length: 131k tokens.
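For the instruct model, generation normally goes through the tokenizer's chat template, as in the sketch below. The repo id `ai-sage/GigaChat-20B-A3B-instruct` and the sampling parameters are assumptions; the chat template is expected to ship with the tokenizer.

```python
# Minimal sketch: chat-style generation with the instruct model (assumed repo id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai-sage/GigaChat-20B-A3B-instruct"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Кратко объясни, что такое Mixture of Experts."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```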
Resources
Benchmarks
| Bench | T-lite-instruct-0.1 (Llama 3.1 8B based) | gemma-2-9b-it | GigaChat-20B-A3B-instruct |
|---|---|---|---|
| MERA | 0.335 | 0.392 | 0.513 |
| ru-MMLU (5-shot) | 0.555 | 0.626 | 0.598 |
| Shlepa | 0.36 | 0.388 | 0.482 |
Mechanism of concentration
A mechanism that forces GigaChat MoE to answer within a certain domain. It relies on the specialization of GigaChat MoE experts.
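The text does not specify how concentration is implemented. Purely as an illustration of how expert specialization could be exploited, the sketch below biases the logits of a standard top-k softmax router toward a hypothetical set of domain-associated experts; this is an assumption for illustration, not GigaChat's actual mechanism.

```python
# Illustrative sketch only (assumption): biasing a top-k MoE router toward a
# subset of "domain" experts. This is NOT GigaChat's actual concentration code.
import torch
import torch.nn.functional as F

def concentrated_routing(router_logits: torch.Tensor,
                         domain_experts: torch.Tensor,
                         bias: float = 2.0,
                         top_k: int = 2):
    """router_logits: [tokens, num_experts]; domain_experts: indices of experts
    assumed to be associated with the target domain (hypothetical)."""
    biased = router_logits.clone()
    biased[:, domain_experts] += bias                 # push probability mass toward domain experts
    weights = F.softmax(biased, dim=-1)
    top_w, top_idx = weights.topk(top_k, dim=-1)      # standard top-k expert selection
    top_w = top_w / top_w.sum(dim=-1, keepdim=True)   # renormalize the selected weights
    return top_w, top_idx

# Toy usage: 4 tokens routed over 16 experts, experts {3, 7} tied to the target domain.
logits = torch.randn(4, 16)
weights, indices = concentrated_routing(logits, torch.tensor([3, 7]))
print(indices)
```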