tensorrt-llm

Description

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
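As an illustration, here is a minimal sketch of what defining a model and running inference through the Python API can look like, assuming the high-level `LLM` entry point and `SamplingParams` shipped with recent releases; the model name and generation parameters are placeholders and should be adjusted to the installed version.

```python
# Minimal sketch of the high-level Python API (names follow the LLM API in
# recent TensorRT-LLM releases; verify against your installed version).
from tensorrt_llm import LLM, SamplingParams

# Builds or loads a TensorRT engine for the model and prepares the runtime
# so that generation runs on NVIDIA GPUs.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model

prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Run batched generation and print the first completion for each prompt.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```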

Languages

  • C++
  • C
  • Makefile
  • Shell
  • PowerShell
  • CMake
  • Python
  • Cuda
  • Smarty
  • Dockerfile