{
"cells": [
{
"cell_type": "markdown",
"id": "a6231285",
"metadata": {},
"source": [
"# Press Release Chat Bot\n",
"\n",
"As part of this generative AI workflow, we create an NVIDIA PR chatbot that answers questions about NVIDIA news and blog posts from 2022 and 2023. For this, we have created a FastAPI REST server that wraps llama-index. The API server has two methods, ```upload_document``` and ```generate```. The ```upload_document``` method takes a document from the user's computer and uploads it to a Milvus vector database after splitting, chunking, and embedding it. The ```generate``` method generates an answer to the provided prompt, optionally sourcing information from the vector database."
]
},
{
"cell_type": "markdown",
"id": "4c74eaf2",
"metadata": {},
"source": [
"#### Step 1: Load the PDF files from the dataset folder\n",
"\n",
"You can upload the PDF files containing the NVIDIA blogs to the ```chain-server:8081/documents``` API endpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "263a7a8b",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"!unzip dataset.zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2244b8c",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import mimetypes\n",
"\n",
"def upload_document(file_path, url):\n",
"    headers = {\n",
"        'accept': 'application/json'\n",
"    }\n",
"    mime_type, _ = mimetypes.guess_type(file_path)\n",
"    # Use a context manager so the file handle is closed after the request\n",
"    with open(file_path, 'rb') as f:\n",
"        files = {\n",
"            'file': (file_path, f, mime_type)\n",
"        }\n",
"        response = requests.post(url, headers=headers, files=files)\n",
"    return response.text\n",
"\n",
"def upload_pdf_files(folder_path, upload_url, num_files):\n",
"    i = 0\n",
"    for file_name in os.listdir(folder_path):\n",
"        _, ext = os.path.splitext(file_name)\n",
"        # Ingest only pdf files\n",
"        if ext.lower() == \".pdf\":\n",
"            file_path = os.path.join(folder_path, file_name)\n",
"            print(upload_document(file_path, upload_url))\n",
"            i += 1\n",
"            # Stop once num_files documents have been uploaded\n",
"            if i >= num_files:\n",
"                break"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f5c99ac",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"start_time = time.time()\n",
"NUM_DOCS_TO_UPLOAD = 100\n",
"upload_pdf_files(\"dataset\", \"http://chain-server:8081/documents\", NUM_DOCS_TO_UPLOAD)\n",
"print(f\"--- {time.time() - start_time} seconds ---\")"
]
},
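{
"cell_type": "markdown",
"id": "f2a91c04",
"metadata": {},
"source": [
"Before embedding, the server splits and chunks each uploaded document. The exact splitter is internal to the chain server; the cell below is only a minimal sketch of fixed-size chunking with overlap, and its ```chunk_size``` and ```overlap``` values are illustrative, not the server's actual settings."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7d35e92",
"metadata": {},
"outputs": [],
"source": [
"def chunk_text(text, chunk_size=512, overlap=64):\n",
"    \"\"\"Split text into fixed-size character chunks with overlap (illustrative only).\"\"\"\n",
"    chunks = []\n",
"    start = 0\n",
"    while start < len(text):\n",
"        chunks.append(text[start:start + chunk_size])\n",
"        if start + chunk_size >= len(text):\n",
"            break\n",
"        # Step forward by less than a full chunk so adjacent chunks share context\n",
"        start += chunk_size - overlap\n",
"    return chunks\n",
"\n",
"print(len(chunk_text(\"word \" * 300)))"
]
},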
{
"cell_type": "markdown",
"id": "830882ef",
"metadata": {},
"source": [
"#### Step 2: Ask a question without referring to the knowledge base\n",
"Ask the TensorRT-LLM Llama 2 13B model a question about \"the nvidia grace superchip\" without consulting the vector database/knowledge base by setting ```use_knowledge_base``` to ```false```."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4eb862fd",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import json\n",
"\n",
"data = {\n",
"    \"messages\": [\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": \"how many cores are on the nvidia grace superchip?\"\n",
"        }\n",
"    ],\n",
"    \"use_knowledge_base\": False,\n",
"    \"max_tokens\": 256\n",
"}\n",
"\n",
"url = \"http://chain-server:8081/generate\"\n",
"\n",
"start_time = time.time()\n",
"with requests.post(url, stream=True, json=data) as req:\n",
"    for chunk in req.iter_lines():\n",
"        raw_resp = chunk.decode(\"UTF-8\")\n",
"        if not raw_resp:\n",
"            continue\n",
"        # Strip the \"data: \" server-sent-events prefix before parsing the JSON payload\n",
"        resp_dict = json.loads(raw_resp[6:])\n",
"        resp_choices = resp_dict.get(\"choices\", [])\n",
"        if len(resp_choices):\n",
"            resp_str = resp_choices[0].get(\"message\", {}).get(\"content\", \"\")\n",
"            print(resp_str, end=\"\")\n",
"\n",
"print(f\"--- {time.time() - start_time} seconds ---\")"
]
},
{
"cell_type": "markdown",
"id": "fcf37ee9",
"metadata": {},
"source": [
"Now ask it the same question by setting ```use_knowledge_base``` to ```true```."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e904a658",
"metadata": {},
"outputs": [],
"source": [
"data = {\n",
"    \"messages\": [\n",
"        {\n",
"            \"role\": \"user\",\n",
"            \"content\": \"how many cores are on the nvidia grace superchip?\"\n",
"        }\n",
"    ],\n",
"    \"use_knowledge_base\": True,\n",
"    \"max_tokens\": 50\n",
"}\n",
"\n",
"url = \"http://chain-server:8081/generate\"\n",
"\n",
"start_time = time.time()\n",
"tokens_generated = 0\n",
"with requests.post(url, stream=True, json=data) as req:\n",
"    for chunk in req.iter_lines():\n",
"        raw_resp = chunk.decode(\"UTF-8\")\n",
"        if not raw_resp:\n",
"            continue\n",
"        resp_dict = json.loads(raw_resp[6:])\n",
"        resp_choices = resp_dict.get(\"choices\", [])\n",
"        if len(resp_choices):\n",
"            resp_str = resp_choices[0].get(\"message\", {}).get(\"content\", \"\")\n",
"            # Approximate throughput: count one token per streamed chunk\n",
"            tokens_generated += 1\n",
"            print(resp_str, end=\"\")\n",
"\n",
"total_time = time.time() - start_time\n",
"print(f\"\\n--- Generated {tokens_generated} tokens in {total_time} seconds ---\")\n",
"print(f\"--- {tokens_generated/total_time} tokens/sec\")"
]
},
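{
"cell_type": "markdown",
"id": "b8c40d17",
"metadata": {},
"source": [
"Both generate cells above repeat the same stream-parsing loop. Each streamed line follows the server-sent-events convention, i.e. a ```data: ``` prefix followed by a JSON payload, which is why the code slices off the first six characters. A small helper (a sketch that assumes the response shape used in this notebook) makes the parsing explicit and reusable:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5f61a23",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"def parse_sse_line(raw_resp):\n",
"    \"\"\"Extract the message content from one 'data: {...}' stream line.\n",
"\n",
"    Returns an empty string for blank lines, lines without the expected\n",
"    prefix, or payloads without choices.\n",
"    \"\"\"\n",
"    if not raw_resp or not raw_resp.startswith(\"data: \"):\n",
"        return \"\"\n",
"    resp_dict = json.loads(raw_resp[len(\"data: \"):])\n",
"    resp_choices = resp_dict.get(\"choices\", [])\n",
"    if resp_choices:\n",
"        return resp_choices[0].get(\"message\", {}).get(\"content\", \"\")\n",
"    return \"\"\n",
"\n",
"print(parse_sse_line('data: {\"choices\": [{\"message\": {\"content\": \"hi\"}}]}'))"
]
},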
{
"cell_type": "markdown",
"id": "58954d15",
"metadata": {},
"source": [
"#### Next steps\n",
"\n",
"We have set up a playground UI for you to upload files and get answers from. The UI is available on the same IP address as the notebooks: `host_ip:8090/converse`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}