GenerativeAIExamples

02_filling_RAG_outputs_for_Evaluation.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4afa980c-21be-44b8-807e-710b5de56198",
   "metadata": {},
   "source": [
    "## Notebook 2: Filling RAG Outputs for Evaluation\n",
    "\n",
    "In this notebook, we will use the example RAG pipeline to populate the RAG outputs: `contexts` (the retrieved relevant documents) and `answer` (generated by the RAG pipeline).\n",
    "\n",
    "The example RAG pipeline provided as part of this repository uses [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/) to build a chatbot that references a custom knowledge base.\n",
    "\n",
    "If you want to learn more about how the example RAG works, please see [03_llama_index_simple.ipynb](../notebooks/03_llama_index_simple.ipynb).\n",
    "\n",
    "- **Steps 1-5**: Build the RAG pipeline.\n",
    "- **Step 6**: Build the Query Engine, exposing the Retriever and Generator outputs.\n",
    "- **Step 7**: Fill the RAG outputs."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "191e7b90-128e-4432-82ab-897426389d06",
   "metadata": {},
   "source": [
    "### Steps 1-5: Build the RAG pipeline\n",
    "\n",
    "#### Define the LLM\n",
    "Here we use a local LLM served by Triton, specifying the address and gRPC port on which the Triton server is available.\n",
    "\n",
    "***If you are using AI Playground (no local GPU), replace the code in the cell two cells below with the following:***\n",
    "\n",
    "```\n",
    "import os\n",
    "from nv_aiplay import GeneralLLM\n",
    "os.environ['NVAPI_KEY'] = \"REPLACE_WITH_YOUR_API_KEY\"\n",
    "\n",
    "llm = GeneralLLM(\n",
    "    model=\"llama2_70b\",\n",
    "    temperature=0.2,\n",
    "    max_tokens=300\n",
    ")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "a18dfc7b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture\n",
    "!test -d dataset || unzip dataset.zip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a80987e-1ddb-4248-b76c-f3ce16745ca3",
   "metadata": {},
   "outputs": [],
   "source": [
    "from triton_trt_llm import TensorRTLLM\n",
    "from llama_index.llms.langchain import LangChainLLM\n",
    "\n",
    "# Connect to the TensorRT-LLM model served by Triton over gRPC, then wrap it for LlamaIndex\n",
    "trtllm = TensorRTLLM(server_url=\"llm:8001\", model_name=\"ensemble\", tokens=300)\n",
    "llm = LangChainLLM(llm=trtllm)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc57b68d-afd5-4a0c-832c-0ad8f3f475d5",
   "metadata": {},
   "source": [
    "#### Create a Prompt Template\n",
    "\n",
    "A [**prompt template**](https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/prompts.html) is a common paradigm in LLM development.\n",
    "\n",
    "Prompt templates are predefined sets of instructions that are provided to the LLM and guide the output produced by the model. They can contain few-shot examples and guidance, and are a quick way to engineer the responses from the LLM. Llama 2 accepts the [prompt format](https://huggingface.co/blog/llama2#how-to-prompt-llama-2) shown in `LLAMA_PROMPT_TEMPLATE`, which we construct from:\n",
    "- The system prompt\n",
    "- The context\n",
    "- The user's question\n",
    "\n",
    "Much like LangChain's abstraction of prompts, LlamaIndex has similar abstractions for you to create prompts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "682ec812-33be-430f-8bb1-ae3d68690198",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from llama_index.core import Prompt\n",
    "\n",
    "LLAMA_PROMPT_TEMPLATE = (\n",
    " \"<s>[INST] <<SYS>>\"\n",
    " \"Use the following context to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer.\"\n",
    " \"<</SYS>>\"\n",
    " \"<s>[INST] Context: {context_str} Question: {query_str} Only return the helpful answer below and nothing else. Helpful answer:[/INST]\"\n",
    ")\n",
    "\n",
    "qa_template = Prompt(LLAMA_PROMPT_TEMPLATE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0af7922",
   "metadata": {},
   "source": [
    "#### Load Documents\n",
    "Follow step 1 [defined here](../notebooks/05_dataloader.ipynb) to upload the PDFs to the Milvus server.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7bb75ad",
   "metadata": {},
   "source": [
    "In the rest of this section, we will load and split the PDFs of NVIDIA blogs. We will use the `SentenceTransformersTokenTextSplitter`.\n",
    "Additionally, we use a LlamaIndex [``PromptHelper``](https://gpt-index.readthedocs.io/en/latest/api_reference/service_context/prompt_helper.html) to help deal with LLM context window token limitations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa366250-108e-45a0-88ce-e6f7274da8e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from langchain.text_splitter import SentenceTransformersTokenTextSplitter\n",
    "from llama_index.core.node_parser import LangchainNodeParser\n",
    "from llama_index.core import PromptHelper\n",
    "\n",
    "# setup the text splitter\n",
    "TEXT_SPLITTER_MODEL = \"intfloat/e5-large-v2\"\n",
    "TEXT_SPLITTER_TOKENS_PER_CHUNK = 510\n",
    "TEXT_SPLITTER_CHUNK_OVERLAP = 200\n",
    "\n",
    "text_splitter = SentenceTransformersTokenTextSplitter(\n",
    "    model_name=TEXT_SPLITTER_MODEL,\n",
    "    tokens_per_chunk=TEXT_SPLITTER_TOKENS_PER_CHUNK,\n",
    "    chunk_overlap=TEXT_SPLITTER_CHUNK_OVERLAP,\n",
    ")\n",
    "\n",
    "node_parser = LangchainNodeParser(text_splitter)\n",
    "\n",
    "\n",
    "# Use the PromptHelper\n",
    "\n",
    "prompt_helper = PromptHelper(\n",
    "  context_window=4096,\n",
    "  num_output=256,\n",
    "  chunk_overlap_ratio=0.1,\n",
    "  chunk_size_limit=None\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b8dab583-a12d-4fb1-a9eb-3a1b1f04075d",
   "metadata": {},
   "source": [
    "#### Generate and Store Embeddings\n",
    "##### a) Generate Embeddings\n",
    "[Embeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding/) for documents are created by vectorizing the document text; this vectorization captures the semantic meaning of the text.\n",
    "\n",
    "We will use [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) for the embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e9011ba0-f3f6-41f0-8a15-48f264743545",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from langchain.embeddings import HuggingFaceEmbeddings\n",
    "from llama_index.embeddings.langchain import LangchainEmbedding\n",
    "\n",
    "# Run the embedding model on the first GPU (cuda:0); set \"device\" to \"cpu\" to conserve GPU memory.\n",
    "# In the production deployment (the API server shown as part of the 5th notebook), the model runs on GPU.\n",
    "model_name=\"intfloat/e5-large-v2\"\n",
    "model_kwargs = {\"device\": \"cuda:0\"}\n",
    "encode_kwargs = {\"normalize_embeddings\": False}\n",
    "hf_embeddings = HuggingFaceEmbeddings(\n",
    "    model_name=model_name,\n",
    "    model_kwargs=model_kwargs,\n",
    "    encode_kwargs=encode_kwargs,\n",
    ")\n",
    "# Load in a specific embedding model\n",
    "embed_model = LangchainEmbedding(hf_embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8db99124-e438-406d-880d-557501a461d3",
   "metadata": {},
   "source": [
    "##### b) Store Embeddings\n",
    "\n",
    "We will use the LlamaIndex module [`Settings`](https://docs.llamaindex.ai/en/stable/module_guides/supporting_modules/settings/?h=settings) to bundle commonly used resources during the indexing and querying stages.\n",
    "\n",
    "\n",
    "In this example, we bundle the build resources: the LLM, the embedding model, the node parser, and the prompt helper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e493f9d-589a-4820-902d-f68932bfb0d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from llama_index.core import Settings\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = embed_model\n",
    "Settings.node_parser = node_parser\n",
    "Settings.prompt_helper = prompt_helper"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44e10c13",
   "metadata": {},
   "source": [
    "Ingest the dataset using the `/documents` endpoint of the chain server."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acdc51db",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import requests\n",
    "import mimetypes\n",
    "\n",
    "def upload_document(file_path, url):\n",
    "    \"\"\"Upload a single file to the given endpoint and return the response body.\"\"\"\n",
    "    headers = {\n",
    "        'accept': 'application/json'\n",
    "    }\n",
    "    mime_type, _ = mimetypes.guess_type(file_path)\n",
    "    # Open the file in a context manager so the handle is closed after the upload\n",
    "    with open(file_path, 'rb') as f:\n",
    "        files = {\n",
    "            'file': (file_path, f, mime_type)\n",
    "        }\n",
    "        response = requests.post(url, headers=headers, files=files)\n",
    "\n",
    "    return response.text\n",
    "\n",
    "def upload_pdf_files(folder_path, upload_url):\n",
    "    \"\"\"Upload every PDF file found directly in folder_path.\"\"\"\n",
    "    for file_name in os.listdir(folder_path):\n",
    "        _, ext = os.path.splitext(file_name)\n",
    "        # Ingest only pdf files\n",
    "        if ext.lower() == \".pdf\":\n",
    "            file_path = os.path.join(folder_path, file_name)\n",
    "            print(upload_document(file_path, upload_url))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "823c89f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "start_time = time.time()\n",
    "upload_pdf_files(\"dataset\", \"http://chain-server:8081/documents\")\n",
    "print(f\"--- {time.time() - start_time} seconds ---\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "79c7923c-d778-4f32-be37-4314063ecd2f",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "    \n",
    "⚠️ In the deployment of this workflow, [Milvus](https://milvus.io/) runs as a vector database microservice.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e94e53e-41a9-47d3-a9d3-7c0af4c07f76",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.core.storage.storage_context import StorageContext\n",
    "from llama_index.vector_stores.milvus import MilvusVectorStore\n",
    "\n",
    "# store\n",
    "vector_store = MilvusVectorStore(uri=\"http://milvus:19530\",\n",
    "    dim=1024,\n",
    "    collection_name=\"developer_rag\",\n",
    "    index_config={\"index_type\": \"GPU_IVF_FLAT\", \"nlist\": 64},\n",
    "    search_config={\"nprobe\": 16},\n",
    "    overwrite=False\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_vector_store(vector_store)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3b58028-04fa-4050-9ec4-6526817fd9cf",
   "metadata": {},
   "source": [
    "### Step 6: Build the Query Engine, exposing the Retriever and Generator outputs\n",
    "\n",
    "#### a) Limit the Retriever Total Output Length\n",
    "\n",
    "First, we need to restrict the output of the Retriever to a reasonable length so that the prompt fits within the context length of the LLM.\n",
    "In this notebook, we will restrict it to 1000 tokens (retrieved nodes that would push the total beyond 1000 tokens are ignored).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6efc410c-f488-43aa-af65-c39376bd7ba5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from typing import List, Optional\n",
    "from llama_index.core.postprocessor.types import BaseNodePostprocessor\n",
    "from llama_index.core.schema import MetadataMode, NodeWithScore, QueryBundle\n",
    "from llama_index.core.utils import get_tokenizer\n",
    "\n",
    "# maximum total number of tokens kept across the retrieved nodes\n",
    "DEFAULT_MAX_CONTEXT = 1000\n",
    "\n",
    "# limit the Retriever total outputs length\n",
    "class LimitRetrievedNodesLength(BaseNodePostprocessor):\n",
    "    \"\"\"Llama Index chain filter to limit token lengths.\"\"\"\n",
    "\n",
    "    def _postprocess_nodes(\n",
    "        self, nodes: List[\"NodeWithScore\"], query_bundle: Optional[\"QueryBundle\"] = None\n",
    "    ) -> List[\"NodeWithScore\"]:\n",
    "        \"\"\"Filter function.\"\"\"\n",
    "        included_nodes = []\n",
    "        current_length = 0\n",
    "        limit = DEFAULT_MAX_CONTEXT\n",
    "\n",
    "        tokenizer = get_tokenizer()\n",
    "        for node in nodes:\n",
    "            # accumulate the token count of each node in retrieval order\n",
    "            current_length += len(\n",
    "                tokenizer(\n",
    "                    node.node.get_content(metadata_mode=MetadataMode.LLM)\n",
    "                )\n",
    "            )\n",
    "            # stop at the first node that would exceed the limit\n",
    "            if current_length > limit:\n",
    "                break\n",
    "            included_nodes.append(node)\n",
    "\n",
    "        return included_nodes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e33cfed2-2a63-40be-8a7d-787ba04d2af9",
   "metadata": {},
   "source": [
    "#### b) Build the Query Engine\n",
    "\n",
    "Now, let's build the query engine that takes a query and returns a response. Each vector index has a default corresponding query engine; for example, the default query engine for a vector index performs a standard top-k retrieval over the vector store.\n",
    "We will use `RetrieverQueryEngine` to get the outputs of the Retriever and the Generator. Learn more about the `RetrieverQueryEngine` in the [documentation](https://gpt-index.readthedocs.io/en/latest/examples/query_engine/CustomRetrievers.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f56f37e0-341e-4d7d-b282-f374a16f55b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "from llama_index.core.query_engine import RetrieverQueryEngine\n",
    "from llama_index.core.schema import MetadataMode\n",
    "\n",
    "# Expose the retriever\n",
    "retriever = index.as_retriever(similarity_top_k=2)\n",
    "\n",
    "query_engine = RetrieverQueryEngine.from_args(\n",
    "    retriever,\n",
    "    text_qa_template=qa_template,\n",
    "    node_postprocessors=[LimitRetrievedNodesLength()]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6a58983-2069-450e-adf9-24b0f8736498",
   "metadata": {},
   "source": [
    "### Step 7: Fill the RAG outputs\n",
    "\n",
    "Let's now query the RAG pipeline and fill the outputs `contexts` and `answer` in the evaluation JSON file.\n",
    "\n",
    "First, we need to load the previously generated dataset. So far, the RAG output fields are empty.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82f0f304-3476-42e3-9be7-1ab38f9e14cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import the relevant libraries\n",
    "import json\n",
    "from IPython.display import JSON\n",
    "\n",
    "# load the evaluation data (the context manager closes the file afterwards)\n",
    "with open('qa_generation.json') as f:\n",
    "    data = json.load(f)\n",
    "\n",
    "# show the first element\n",
    "JSON(data[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4b4321b-dfce-4c72-a8f1-2e2264b3c59d",
   "metadata": {},
   "source": [
    "Let's now query the RAG pipeline and populate the `contexts` and `answer` fields."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f238d58-071a-4bb9-956c-d014748c15ab",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# reuse one postprocessor to apply the same length limit as the query engine\n",
    "limited_retrieval_length = LimitRetrievedNodesLength()\n",
    "\n",
    "for entry in data:\n",
    "    # generate the answer with the full RAG pipeline\n",
    "    response = query_engine.query(entry[\"question\"])\n",
    "    entry[\"answer\"] = response.response\n",
    "    print(entry[\"answer\"])\n",
    "    # retrieve the contexts separately and apply the length limit\n",
    "    nodes = retriever.retrieve(entry[\"question\"])\n",
    "    included_nodes = limited_retrieval_length.postprocess_nodes(nodes)\n",
    "    retrieved_text = \"\"\n",
    "    for node in included_nodes:\n",
    "        retrieved_text = retrieved_text + \" \" + node.text\n",
    "    entry[\"contexts\"] = [retrieved_text]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14407673-a8f1-4245-8748-d6885e08f06d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# json_list_string=json.dumps(data)\n",
    "\n",
    "# show again the first element\n",
    "JSON(data[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfa9f140-5989-4c3c-98af-18ec63a954b9",
   "metadata": {},
   "source": [
    "Let's now save the new evaluation dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "958653ba-4228-4c81-8f65-81ead7c8254f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "with open('eval.json', 'w') as f:\n",
    "    json.dump(data, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "248982b8-9f9e-4021-a326-657e2e82d43d",
   "metadata": {},
   "source": [
    "In the next notebook, we will evaluate the [Corp Comms Copilot](https://gitlab-master.nvidia.com/chat-labs/rag-demos/corp-comms-copilot) RAG pipeline."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}