GPTQ-for-Bloom & LLaMa

8-bit quantization of Bloom using GPTQ

GPTQ is a state-of-the-art one-shot weight quantization method.

This code is based on GPTQ-for-LLaMa.

Huggingface models

model name                  file size   GPU memory usage
base                        27G         ~28.2G
bloom7b-2m-8bit-128g.pt     9.7G        ~11.4G
bloom7b-2m-4bit-128g.pt     6.9G        ~8.4G
bloom7b-0.2m-8bit-128g.pt   9.7G        ~11.4G
bloom7b-0.2m-4bit-128g.pt   6.9G        ~8.4G

All experiments were run on a single NVIDIA A100.

Installation

If you don't have conda, install it first.

conda create --name gptq python=3.9 -y
conda activate gptq
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
# Or, if you're having trouble with conda, use pip with python3.9:
# pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117

pip install -r requirements.txt
python setup_cuda.py install

# Benchmark performance for FC2 layer of LLaMa-7B
CUDA_VISIBLE_DEVICES=0 python test_kernel.py
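
If the build succeeded, the compiled extension should be importable from Python. Below is a minimal sanity check; it assumes the extension module is named quant_cuda and exposes one fused matmul kernel per bit width, as in upstream GPTQ-for-LLaMa (both the module name and the kernel names are assumptions; check setup_cuda.py for the actual name used by this fork):

# Sanity check for the compiled CUDA kernels.
# Assumption: setup_cuda.py installs a module named quant_cuda, as in
# upstream GPTQ-for-LLaMa; adjust the import if this fork renames it.
import torch
import quant_cuda  # raises ImportError if the build/install failed

print("CUDA available:", torch.cuda.is_available())
# One fused matmul per supported bit width (names assumed from upstream):
for name in ("vecquant2matmul", "vecquant3matmul",
             "vecquant4matmul", "vecquant8matmul"):
    print(name, "->", hasattr(quant_cuda, name))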

Dependencies

The Python dependencies are listed in requirements.txt and installed by the pip install -r requirements.txt step above.

Model inference with the saved model

# BELLE-7B-gptq: local path to the model downloaded from Huggingface
git lfs install
git clone https://huggingface.co/BelleGroup/BELLE-7B-gptq
# Run inference with the saved quantized checkpoint
CUDA_VISIBLE_DEVICES=0 python bloom_inference.py BELLE-7B-gptq --wbits 8 --groupsize 128 --load BELLE-7B-gptq/bloom7b-2m-8bit-128g.pt --text "hello"
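
The same run can be expressed programmatically. The sketch below assumes bloom_inference.py exposes a load_quant helper with the same signature as in upstream GPTQ-for-LLaMa; both the helper name and its argument order are assumptions, so check bloom_inference.py for the actual API:

# Hypothetical programmatic equivalent of the CLI call above.
# Assumption: load_quant(model, checkpoint, wbits, groupsize) exists in
# bloom_inference.py, following the GPTQ-for-LLaMa convention.
import torch
from transformers import AutoTokenizer
from bloom_inference import load_quant

model_dir = "BELLE-7B-gptq"
checkpoint = f"{model_dir}/bloom7b-2m-8bit-128g.pt"

model = load_quant(model_dir, checkpoint, 8, 128).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

input_ids = tokenizer("hello", return_tensors="pt").input_ids.to("cuda")
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=64,
                            do_sample=True, top_p=0.95)
print(tokenizer.decode(output[0], skip_special_tokens=True))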

Model quantization

# BELLE-7B-gptq: local directory where the compressed model is saved
# Quantize the base model and save the compressed checkpoint
CUDA_VISIBLE_DEVICES=0 python bloom.py BelleGroup/BELLE-7B-2M wikitext2 --wbits 8 --groupsize 128 --save BELLE-7B-gptq/bloom7b-2m-8bit-128g.pt

The CUDA kernels support 2-, 3-, 4-, and 8-bit quantization.

8-bit quantization with a groupsize of 128 is recommended.
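
For intuition, the sketch below shows what a group-wise quantized checkpoint stores: for each group of 128 weights, a scale and zero-point plus low-bit integer codes. This is plain round-to-nearest for illustration only; GPTQ itself selects the codes with a second-order procedure to minimize layer output error, which this sketch omits.

# Illustration of group-wise low-bit quantization (round-to-nearest).
# GPTQ stores the same per-group scale/zero + integer codes, but picks
# the codes far more carefully than the plain rounding shown here.
import torch

def quantize_groupwise(w: torch.Tensor, wbits: int = 8, groupsize: int = 128):
    qmax = 2 ** wbits - 1
    groups = w.reshape(-1, groupsize)                 # one row per group
    wmin = groups.min(dim=1, keepdim=True).values
    wmax = groups.max(dim=1, keepdim=True).values
    scale = (wmax - wmin).clamp(min=1e-8) / qmax      # per-group scale
    zero = torch.round(-wmin / scale)                 # per-group zero-point
    q = torch.clamp(torch.round(groups / scale) + zero, 0, qmax)
    return q.to(torch.uint8), scale, zero

w = torch.randn(4096, 4096)
q, scale, zero = quantize_groupwise(w)
w_hat = ((q.float() - zero) * scale).reshape(w.shape) # dequantize
print("max abs error:", (w - w_hat).abs().max().item())

At 8 bits each weight costs one byte plus a small per-group overhead for the scales and zero-points, which is where the file-size reductions in the table above come from.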

Acknowledgements

This code is based on GPTQ-for-LLaMa.

Thanks to Bloom, a powerful LLM.
