GigaChat


Table of contents

GigaChat-20B-A3B-base

A large language model based on a Mixture of Experts (MoE) architecture, trained from scratch specifically for the Russian language. The model has 20B parameters in total, with 3B active. Supported context length: 131k tokens.
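The gap between 20B total and 3B active parameters comes from top-k expert routing: for each token, a router picks a small subset of experts, and only those run. A minimal toy sketch of this idea (not GigaChat's actual code; all names and sizes here are illustrative):

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, top_k=2):
    """Route a token through only top_k of the available experts.

    Toy illustration: each expert is a single linear map. Only the
    top_k experts chosen by the router execute, which is why active
    parameters stay far below total parameters.
    """
    logits = x @ router_weights                  # router score per expert
    top = np.argsort(logits)[-top_k:]            # indices of selected experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over winners
    # Only the selected experts contribute any compute.
    return sum(g * (x @ expert_weights[e]) for g, e in zip(gates, top))

rng = np.random.default_rng(0)
dim, num_experts = 8, 16
experts = rng.normal(size=(num_experts, dim, dim))
router = rng.normal(size=(dim, num_experts))
x = rng.normal(size=dim)
y = moe_forward(x, experts, router, top_k=2)
print(y.shape)  # (8,)
```

Here 2 of 16 experts run per token; in GigaChat-20B-A3B the analogous ratio yields roughly 3B active out of 20B total parameters.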

Resources

Benchmarks

Common LLM metrics, measured with LM Evaluation Harness.

| Bench | T-lite-0.1 (llama 3.1 8B based) | Llama-3.1-8B | GigaChat-20B-A3B-base | Gemma-9B |
|---|---|---|---|---|
| MMLU (5-shot) | 62.56 | 65.21 | 63.02 | 70.6 |
| MMLU-pro (5-shot) | 32.19 | 35.7 | 31.41 | 42.85 |
| MMLU-ru (5-shot) | 55.51 | 54.1 | 58.38 | 62.57 |
| BBH (3-shot) | 62.36 | 62.79 | 53.54 | 70.48 |
| ARC-C (25-shot) | 58.19 | 54.69 | 61.69 | 68.34 |
| TruthfulQA (0-shot) (rougeL) | 46.51 | 34.52 | 31.82 | 41.49 |
| Winogrande (5-shot) | 78.45 | 77.43 | 75.85 | 79.4 |
| Hellaswag (10-shot) | 82.21 | 81.85 | 81.91 | 82.5 |
| GPQA (5-shot) | 0.25 | 23.44 | 25.22 | 30.36 |
| MATH (4-shot) | 12.9 | 14.04 | 15.04 | 20.06 |
| GSM8K (4-shot) (strict-match) | 67.93 | 51.4 | 59.06 | 68.99 |
| HumanEval | 16.46 | 25.61 | 32.32 | 37.2 |
| AVG | 47.96 | 48.4 | 49.11 | 56.24 |

GigaChat-20B-A3B-instruct

A dialog GigaChat model based on GigaChat-20B-A3B-base. Supported context length: 131k tokens.
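With a 131k-token window, long dialogs can still overflow. A minimal sketch of fitting a conversation into the window by dropping the oldest turns first; the whitespace `count_tokens` is a placeholder for the model's real tokenizer, and `MAX_CONTEXT` is taken from the "131k" figure above:

```python
MAX_CONTEXT = 131_072  # "131k" context window from the model card

def count_tokens(text):
    # Placeholder tokenizer: real usage would call the model's tokenizer.
    return len(text.split())

def fit_dialog(turns, max_tokens=MAX_CONTEXT):
    """Drop the oldest turns until the dialog fits the context window.

    `turns` is a list of strings, oldest first; the latest turn is
    always kept.
    """
    kept = list(turns)
    while len(kept) > 1 and sum(count_tokens(t) for t in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn
    return kept

history = ["word " * 200_000, "short question"]
trimmed = fit_dialog(history)
print(len(trimmed))  # 1 -- the oversized old turn was dropped
```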

Resources

Benchmarks

| Bench | T-lite-instruct-0.1 (llama 3.1 8B based) | gemma-2-9b-it | GigaChat-20B-A3B-instruct |
|---|---|---|---|
| MERA | 0.335 | 0.392 | 0.513 |
| ru-MMLU (5-shot) | 0.555 | 0.626 | 0.598 |
| Shlepa | 0.36 | 0.388 | 0.482 |

Mechanism of concentration

A mechanism that forces GigaChat MoE to answer within a given domain. It relies on the expert-specialization property of GigaChat MoE.
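The README does not spell out the mechanism, so the following is a speculative sketch, assuming concentration works by restricting the router to a whitelist of domain-specialized experts before the usual top-k selection. The expert indices and sizes are illustrative:

```python
import numpy as np

def concentrated_routing(router_logits, allowed_experts, top_k=2):
    """Restrict routing to a whitelist of domain experts.

    Speculative sketch of a "concentration" mechanism: mask out the
    router logits of all experts outside the allowed set, so every
    token is served only by the domain specialists.
    """
    masked = np.full_like(router_logits, -np.inf)
    masked[allowed_experts] = router_logits[allowed_experts]
    top = np.argsort(masked)[-top_k:]            # top-k among allowed experts
    gates = np.exp(masked[top]) / np.exp(masked[top]).sum()
    return top, gates

rng = np.random.default_rng(1)
logits = rng.normal(size=16)
code_experts = [3, 7, 11, 12]  # hypothetical code-specialized experts
top, gates = concentrated_routing(logits, code_experts)
print(sorted(top.tolist()))    # always a subset of code_experts
```

Because disallowed experts get a score of negative infinity, they can never win the top-k selection, which matches the idea of exploiting expert specialization to pin answers to one domain.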

Resources