Examples with OpenLLM
The following examples demonstrate how to interact with OpenLLM features.
Features
The following notebook demonstrates general OpenLLM features and how to start running any open-source model in production.
OpenAI-compatible endpoints
The openai_completion_client.py script demonstrates how to use the OpenAI-compatible /v1/completions endpoint to generate text.
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_completion_client.py

# For streaming, set STREAM=True
STREAM=True python openai_completion_client.py
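The same /v1/completions call can also be made with only the Python standard library, which is a useful sanity check against the example script. This is a minimal sketch: the localhost default endpoint, prompt, and max_tokens value are assumptions, not taken from the example.

```python
import json
import os
import urllib.request

# Same environment variable the example scripts read; the default is an assumption.
endpoint = os.environ.get("OPENLLM_ENDPOINT", "http://localhost:3000")

def build_completion_request(prompt: str, stream: bool = False) -> dict:
    """Build a minimal OpenAI-compatible /v1/completions payload."""
    return {
        "prompt": prompt,
        "max_tokens": 128,  # assumed limit for illustration
        "stream": stream,
    }

payload = build_completion_request("Explain what OpenLLM does in one sentence.")
request = urllib.request.Request(
    f"{endpoint}/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a running server: body = urllib.request.urlopen(request).read()
```

Setting stream=True in the payload has the same effect as the STREAM=True toggle above: the server then returns server-sent events instead of a single JSON body.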
The openai_chat_completion_client.py script demonstrates how to use the OpenAI-compatible /v1/chat/completions endpoint to chat with a model.
export OPENLLM_ENDPOINT=https://api.openllm.com
python openai_chat_completion_client.py

# For streaming, set STREAM=True
STREAM=True python openai_chat_completion_client.py
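The chat endpoint differs from plain completions in that it takes a list of role/content messages rather than a bare prompt. A standard-library sketch of the request, with an assumed localhost default and assumed message contents:

```python
import json
import os
import urllib.request

endpoint = os.environ.get("OPENLLM_ENDPOINT", "http://localhost:3000")
# Mirrors the STREAM environment toggle used by the example script.
stream = os.environ.get("STREAM", "False") == "True"

# A chat request carries role/content messages instead of a single prompt string.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is OpenLLM?"},
    ],
    "stream": stream,
}
request = urllib.request.Request(
    f"{endpoint}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a running server: body = urllib.request.urlopen(request).read()
```

The message history is cumulative: to continue a conversation, append the assistant's reply and the next user turn to the same list and send it again.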
TinyLLM
The api_server.py script demonstrates how to write a production-ready BentoML service with OpenLLM and vLLM.
Install requirements:
pip install -U "openllm[vllm]"
To serve the Bento (given you have access to a GPU):
bentoml serve api_server:svc
To build the Bento, run:
bentoml build -f bentofile.yaml .
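For reference, a minimal bentofile.yaml for this service could look like the following sketch. The include patterns and package specifier are assumptions; match them against the example's actual bentofile.yaml.

```yaml
# Minimal sketch of a bentofile.yaml (field values are assumptions).
service: "api_server:svc"   # matches the bentoml serve target above
include:
  - "*.py"
python:
  packages:
    - "openllm[vllm]"
```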