vllm
Description
A high-throughput and memory-efficient inference and serving engine for LLMs
Languages
- Python
- Dockerfile
- CUDA
- C
- C++
- Shell
- Jinja