milvus-io_bootcamp

custom_RAG_workflow.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "369c3444",
   "metadata": {},
   "source": [
    "# ReadtheDocs Retrieval Augmented Generation (RAG) using Zilliz Free Tier"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6ffd11a",
   "metadata": {},
   "source": [
    "In this notebook, we are going to use Milvus documentation pages to create a chatbot about our product.  The chatbot follows the RAG steps: it retrieves chunks of data using Semantic Vector Search, then feeds the Question + Context as a Prompt to an LLM to generate an answer.\n",
    "\n",
    "Many RAG demos use OpenAI for the Embedding Model and ChatGPT for the Generative AI model.  **In this notebook, we will demo a RAG stack that uses open-source components for embedding and retrieval.**\n",
    "\n",
    "Using open-source Q&A with retrieval saves money, since almost all of our calls (retrieval, evaluation, and development iterations) are free calls against our own data.  We only make a paid call to OpenAI once, for the final chat generation step. \n",
    "\n",
    "<div>\n",
    "<img src=\"../../images/rag_image.png\" width=\"80%\"/>\n",
    "</div>\n",
    "\n",
    "Let's get started!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d7570b2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# For colab install these libraries in this order:\n",
    "# !pip install pymilvus langchain torch transformers sentence-transformers python-dotenv\n",
    "\n",
    "# Import common libraries.\n",
    "import sys, os, time, pprint\n",
    "import numpy as np\n",
    "\n",
    "# Import custom functions for splitting and search.\n",
    "sys.path.append(\"..\")  # Adds higher directory to python modules path.\n",
    "import milvus_utilities as _utils"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb844837",
   "metadata": {},
   "source": [
    "## Start up a Zilliz free tier cluster.\n",
    "\n",
    "Code in this notebook uses fully-managed Milvus on [Zilliz Cloud free trial](https://cloud.zilliz.com/login).  \n",
    "  1. Choose the default \"Starter\" option when you provision > Create collection > Give it a name > Create cluster and collection.  \n",
    "  2. On the Cluster main page, copy your `API Key` and store it locally in a .env variable.  See the note below for how to do that.\n",
    "  3. Also on the Cluster main page, copy the `Public Endpoint URI`.\n",
    "\n",
    "💡 Note: To keep your tokens private, best practice is to use an **env variable**.  See [how to save api key in env variable](https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety). <br>\n",
    "\n",
    "In Jupyter, you also need a .env file (in the same directory as the notebooks) containing lines like this:\n",
    "- VARIABLE_NAME=value\n"
   ]
  },
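  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the env-variable practice described above, the helper below reads a secret from the environment and fails early with a clear message when it is missing.  The name `require_env` is hypothetical (it is not part of this notebook or of `python-dotenv`), and the sketch assumes the variable has already been exported or loaded from the `.env` file.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "def require_env(name):\n",
    "    # Return the value of the environment variable, or raise if it is unset or empty.\n",
    "    value = os.environ.get(name, '')\n",
    "    if not value:\n",
    "        raise RuntimeError(f'Environment variable {name} is not set; add it to your .env file.')\n",
    "    return value\n",
    "```\n",
    "\n",
    "For example, `TOKEN = require_env('ZILLIZ_API_KEY')` would stop the notebook immediately if the key was never loaded."
   ]
  },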
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "0806d2db",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Type of server: Zilliz Cloud Vector Database(Compatible with Milvus 2.3)\n"
     ]
    }
   ],
   "source": [
    "# STEP 1. CONNECT TO MILVUS\n",
    "\n",
    "# !pip install pymilvus #python sdk for milvus\n",
    "from pymilvus import connections, utility\n",
    "\n",
    "# Jupyter notebooks:\n",
    "# from dotenv import load_dotenv\n",
    "# load_dotenv()\n",
    "# TOKEN = os.getenv(\"ZILLIZ_API_KEY\")\n",
    "\n",
    "# Usual way:\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "_ = load_dotenv(find_dotenv()) # read local .env file\n",
    "TOKEN = os.environ[\"ZILLIZ_API_KEY\"]\n",
    "\n",
    "# Connect to Zilliz cloud using endpoint URI and API key TOKEN.\n",
    "# TODO: replace with your own cluster endpoint.\n",
    "CLUSTER_ENDPOINT=\"https://in03-xxxx.api.gcp-us-west1.zillizcloud.com:443\"\n",
    "connections.connect(\n",
    "  alias='default',\n",
    "  #  Public endpoint obtained from Zilliz Cloud\n",
    "  uri=CLUSTER_ENDPOINT,\n",
    "  # API key or a colon-separated cluster username and password\n",
    "  token=TOKEN,\n",
    ")\n",
    "\n",
    "# Check that the server is reachable by printing its version.\n",
    "print(f\"Type of server: {utility.get_server_version()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b01d6622",
   "metadata": {},
   "source": [
    "## Load the Embedding Model checkpoint and use it to create vector embeddings\n",
    "**Embedding model:**  We will use the open-source [sentence transformers](https://www.sbert.net/docs/pretrained_models.html) available on HuggingFace to encode the documentation text.  We will download the model from HuggingFace and run it locally. \n",
    "\n",
    "Two model parameters of note below:\n",
    "1. EMBEDDING_DIM refers to the dimensionality or length of the embedding vector. In this case, every token in the input text gets an embedding of the same length, 1024, and these token embeddings are mean-pooled into a single 1024-dimensional vector per input text. This embedding size is typical of BERT-large-based models, whose embeddings are used for downstream tasks such as classification, question answering, or text generation. <br><br>\n",
    "2. MAX_SEQ_LENGTH is the maximum length the encoder model can handle for input sequences. In this case, if a sequence longer than 512 tokens is given to the model, everything past the limit is (silently!) truncated.  This is why a chunking strategy is needed to segment input texts into chunks that fit in the model's input."
   ]
  },
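  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the chunking idea concrete, here is a simplified sketch of fixed-size chunking with overlap.  It counts characters for brevity, whereas `MAX_SEQ_LENGTH` is measured in tokens; the function name `chunk_text` and its parameters are assumptions for illustration, not part of any library.\n",
    "\n",
    "```python\n",
    "def chunk_text(text, chunk_size=512, overlap=50):\n",
    "    # Slide a window of chunk_size characters, advancing by chunk_size - overlap.\n",
    "    if not 0 <= overlap < chunk_size:\n",
    "        raise ValueError('overlap must be non-negative and smaller than chunk_size')\n",
    "    step = chunk_size - overlap\n",
    "    return [text[i:i + chunk_size] for i in range(0, len(text), step)]\n",
    "\n",
    "chunks = chunk_text('a' * 1000, chunk_size=512, overlap=50)\n",
    "print([len(c) for c in chunks])\n",
    "```\n",
    "\n",
    "Consecutive chunks share `overlap` characters, so a phrase cut at a chunk boundary still appears intact in one of the two neighbors, provided it is no longer than the overlap."
   ]
  },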
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "dd2be7fd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "device: cpu\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "No sentence-transformers model found with name /Users/christybergman/.cache/torch/sentence_transformers/WhereIsAI_UAE-Large-V1. Creating a new one with MEAN pooling.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'sentence_transformers.SentenceTransformer.SentenceTransformer'>\n",
      "SentenceTransformer(\n",
      "  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel \n",
      "  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})\n",
      ")\n",
      "model_name: WhereIsAI/UAE-Large-V1\n",
      "EMBEDDING_DIM: 1024\n",
      "MAX_SEQ_LENGTH: 512\n"
     ]
    }
   ],
   "source": [
    "# STEP 2. DOWNLOAD AN OPEN SOURCE EMBEDDING MODEL.\n",
    "\n",
    "# Import torch.\n",
    "import torch\n",
    "from torch.nn import functional as F\n",
    "from sentence_transformers import SentenceTransformer\n",
    "\n",
    "# Initialize torch settings.\n",
    "torch.backends.cudnn.deterministic = True\n",
    "# Use the default GPU if one is available, otherwise fall back to the CPU.\n",
    "DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "print(f\"device: {DEVICE}\")\n",
    "\n",
    "# Load the model from huggingface model hub.\n",
    "# python -m pip install -U angle-emb\n",
    "model_name = \"WhereIsAI/UAE-Large-V1\"\n",
    "encoder = SentenceTransformer(model_name, device=DEVICE)\n",
    "print(type(encoder))\n",
    "print(encoder)\n",
    "\n",
    "# Get the model parameters and save for later.\n",
    "EMBEDDING_DIM = encoder.get_sentence_embedding_dimension()\n",
    "MAX_SEQ_LENGTH_IN_TOKENS = encoder.get_max_seq_length()\n",
    "# Alternative (disabled): express the limit in characters, assuming tokens are 3 characters long.\n",
    "# MAX_SEQ_LENGTH = MAX_SEQ_LENGTH_IN_TOKENS * 3\n",
    "# HF_EOS_TOKEN_LENGTH = 1 * 3\n",
    "# Test with 512 sequence length.\n",
    "MAX_SEQ_LENGTH = MAX_SEQ_LENGTH_IN_TOKENS\n",
    "HF_EOS_TOKEN_LENGTH = 1\n",
    "\n",
    "# Inspect model parameters.\n",
    "print(f\"model_name: {model_name}\")\n",
    "print(f\"EMBEDDING_DIM: {EMBEDDING_DIM}\")\n",
    "print(f\"MAX_SEQ_LENGTH: {MAX_SEQ_LENGTH}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a Milvus collection\n",
    "\n",
    "You can think of a collection in Milvus like a \"table\" in SQL databases.  The **collection** will contain the following:\n",
    "- **Schema** (or [no-schema Milvus client](https://milvus.io/docs/using_milvusclient.md)).  \n",
    "💡 You'll need the vector `EMBEDDING_DIM` parameter from your embedding model.\n",
    "Typical values are:\n",
    "   - 1024 for large sbert embedding models such as the one used here\n",
    "   - 1536 for ada-002 OpenAI embedding models\n",
    "- **Vector index** for efficient vector search\n",
    "- **Vector distance metric** for measuring nearest neighbor vectors\n",
    "- **Consistency level**\n",
    "In Milvus, transactional consistency is possible; however, according to the [CAP theorem](https://en.wikipedia.org/wiki/CAP_theorem), some latency must be sacrificed. 💡 Searching documentation and Wikipedia pages is not mission-critical, so [`eventually`](https://milvus.io/docs/consistency.md) consistent is fine here.\n",
    "\n",
    "## Add a Vector Index\n",
    "\n",
    "The vector index determines the vector **search algorithm** used to find the closest vectors in your data to the query a user submits.  \n",
    "\n",
    "Most vector indexes use different sets of parameters depending on whether the database is:\n",
    "- **inserting vectors** (creation mode) - vs - \n",
    "- **searching vectors** (search mode) \n",
    "\n",
    "Scroll down the [docs page](https://milvus.io/docs/index.md) to see a table listing different vector indexes available on Milvus.  For example:\n",
    "- FLAT - deterministic exhaustive search\n",
    "- IVF_FLAT or IVF_SQ8 - Inverted-file (cluster-based) index (stochastic approximate search)\n",
    "- HNSW - Graph index (stochastic approximate search)\n",
    "- AUTOINDEX - Automatically determined based on OSS vs [Zilliz cloud](https://docs.zilliz.com/docs/autoindex-explained), type of GPU, size of data.\n",
    "\n",
    "Besides a search algorithm, we also need to specify a **distance metric**, that is, a definition of what is considered \"close\" in vector space.  In the cell below, the [`HNSW`](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md) search index is chosen.  Its supported distance metrics are:\n",
    "- L2 - Euclidean (L2) distance\n",
    "- IP - Inner (dot) product\n",
    "- COSINE - Cosine similarity\n",
    "\n",
    "💡 Most use cases work better with normalized embeddings.  For unit-length vectors, IP and COSINE give identical scores, and L2 produces the same ranking (squared L2 distance = 2 - 2 × cosine similarity).  L2 only behaves differently if you keep your embeddings unnormalized."
   ]
  },
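  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "How the three metrics relate on normalized vectors can be checked numerically.  The sketch below uses plain Python lists and hypothetical helper names (`normalize`, `dot`, `cosine`, `l2_squared`); it is independent of Milvus and only illustrates the arithmetic.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def dot(a, b):\n",
    "    # Inner product (the IP metric).\n",
    "    return sum(x * y for x, y in zip(a, b))\n",
    "\n",
    "def normalize(v):\n",
    "    # Scale a vector to unit length.\n",
    "    norm = math.sqrt(dot(v, v))\n",
    "    return [x / norm for x in v]\n",
    "\n",
    "def cosine(a, b):\n",
    "    # Cosine similarity (the COSINE metric).\n",
    "    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))\n",
    "\n",
    "def l2_squared(a, b):\n",
    "    # Squared Euclidean distance (the L2 metric, before the square root).\n",
    "    return sum((x - y) ** 2 for x, y in zip(a, b))\n",
    "\n",
    "a = normalize([3.0, 4.0])\n",
    "b = normalize([1.0, 2.0])\n",
    "print(dot(a, b), cosine(a, b), 1 - l2_squared(a, b) / 2)\n",
    "```\n",
    "\n",
    "All three printed values agree up to floating-point error, which is why the metric choice mostly matters for unnormalized embeddings."
   ]
  },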
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Successfully dropped collection: `wikipedia`\n",
      "Successfully created collection: `wikipedia`\n"
     ]
    }
   ],
   "source": [
    "# STEP 3. CREATE A NO-SCHEMA MILVUS COLLECTION AND DEFINE THE DATABASE INDEX.\n",
    "\n",
    "from pymilvus import MilvusClient\n",
    "\n",
    "# Set the Milvus collection name.\n",
    "COLLECTION_NAME = \"wikipedia\"\n",
    "\n",
    "# Add custom HNSW search index to the collection.\n",
    "# M = max number of graph connections per layer. Large M = denser graph.\n",
    "# Choice of M: 4~64, larger M for larger data and larger embedding lengths.\n",
    "M = 16\n",
    "# efConstruction = number of candidate nearest neighbors per layer.\n",
    "# Rule of thumb: integer in 8~512, efConstruction = M * 2.\n",
    "efConstruction = M * 2\n",
    "# Define the HNSW index parameters.\n",
    "INDEX_PARAMS = {\n",
    "    \"M\": M,\n",
    "    \"efConstruction\": efConstruction}\n",
    "index_params = {\n",
    "    \"index_type\": \"HNSW\", \n",
    "    \"metric_type\": \"COSINE\", \n",
    "    \"params\": INDEX_PARAMS\n",
    "    }\n",
    "\n",
    "# The no-schema Milvus client uses a flexible JSON key:value format.\n",
    "# https://milvus.io/docs/using_milvusclient.md\n",
    "mc = MilvusClient(\n",
    "    uri=CLUSTER_ENDPOINT,\n",
    "    # API key or a colon-separated cluster username and password\n",
    "    token=TOKEN)\n",
    "\n",
    "# Check if collection already exists, if so drop it.\n",
    "has = utility.has_collection(COLLECTION_NAME)\n",
    "if has:\n",
    "    drop_result = utility.drop_collection(COLLECTION_NAME)\n",
    "    print(f\"Successfully dropped collection: `{COLLECTION_NAME}`\")\n",
    "\n",
    "# Create the collection.\n",
    "mc.create_collection(COLLECTION_NAME, \n",
    "                     EMBEDDING_DIM,\n",
    "                     consistency_level=\"Eventually\", \n",
    "                     auto_id=True,\n",
    "                     # skip setting params below, if using AUTOINDEX\n",
    "                     params=index_params\n",
    "                    )\n",
    "\n",
    "print(f\"Successfully created collection: `{COLLECTION_NAME}`\")\n",
    "# pprint.pprint(mc.describe_collection(COLLECTION_NAME))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9bd8153",
   "metadata": {},
   "source": [
    "## Insert data into Milvus\n",
    "\n",
    "For each original text chunk, we'll write the quadruplet (`vector, chunk, source, h1`) into the database.\n",
    "\n",
    "<div>\n",
    "<img src=\"../../images/db_insert.png\" width=\"80%\"/>\n",
    "</div>\n",
    "\n",
    "**The Milvus Client wrapper can only handle loading data from a list of dictionaries.**\n",
    "\n",
    "Otherwise, in general, Milvus supports loading data from:\n",
    "- pandas dataframes \n",
    "- list of dictionaries\n",
    "\n",
    "Below, we use the embedding model provided by HuggingFace, download its checkpoint, and run it locally as the encoder.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "454e8348",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Num docs: 1\n",
      "Num chunks: 704\n",
      "Start inserting entities\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:03<00:00,  3.95s/it]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Milvus Client insert time for 704 vectors: 3.9572505950927734 seconds\n"
     ]
    }
   ],
   "source": [
    "# INSERT WIKIPEDIA CHUNKS INTO THE COLLECTION.\n",
    "from langchain.document_loaders import WebBaseLoader\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "\n",
    "# Load the Wikipedia page.\n",
    "loader = WebBaseLoader(\"https://en.wikipedia.org/wiki/New_York_City\")\n",
    "docs = loader.load()\n",
    "\n",
    "# Split the documents into smaller chunks.\n",
    "# Note: chunk_size and chunk_overlap are measured in characters, not tokens.\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)\n",
    "print(f\"Num docs: {len(docs)}\")\n",
    "chunks = text_splitter.split_documents(docs)\n",
    "print(f\"Num chunks: {len(chunks)}\")\n",
    "\n",
    "# Convert chunks to a list of dictionaries.\n",
    "chunk_list = []\n",
    "for chunk in chunks:\n",
    "    # pprint.pprint(chunk)\n",
    "    # Generate a unit-length embedding using the encoder from HuggingFace.\n",
    "    embedding = encoder.encode(chunk.page_content)\n",
    "    embedding = embedding / np.linalg.norm(embedding)\n",
    "    converted_values = embedding.astype(np.float32)\n",
    "    \n",
    "    # Assemble embedding vector, original text chunk, metadata.\n",
    "    chunk_dict = {\n",
    "        'vector': converted_values,\n",
    "        'chunk': chunk.page_content,\n",
    "        'source': chunk.metadata['source'],\n",
    "        'h1': chunk.metadata['title'][:50],\n",
    "    }\n",
    "    chunk_list.append(chunk_dict)\n",
    "\n",
    "# Insert data into the Milvus collection.\n",
    "print(\"Start inserting entities\")\n",
    "start_time = time.time()\n",
    "insert_result = mc.insert(\n",
    "    COLLECTION_NAME,\n",
    "    data=chunk_list,\n",
    "    append=True,\n",
    "    progress_bar=True)\n",
    "end_time = time.time()\n",
    "print(f\"Milvus Client insert time for {len(chunk_list)} vectors: {end_time - start_time} seconds\")\n",
    "# Milvus Client insert time for 646 vectors: 4.732278823852539 seconds\n",
    "\n",
    "# After the final entity is inserted, call flush to seal any growing segments still in memory.\n",
    "mc.flush(COLLECTION_NAME)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1746f937",
   "metadata": {},
   "source": [
    "## Define Evaluation Metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8b3d75c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openai, pprint\n",
    "from openai import OpenAI\n",
    "\n",
    "# Define the generation llm model to use.\n",
    "LLM_NAME = \"gpt-3.5-turbo-1106\"\n",
    "TEMPERATURE = 0.1\n",
    "RANDOM_SEED = 415\n",
    "\n",
    "# Reasonable values for the penalty coefficients are around 0.1 to 1 if the aim is to just reduce repetition \n",
    "# somewhat. To strongly suppress repetition, set coefficients = 2.\n",
    "FREQUENCY_PENALTY = 2\n",
    "\n",
    "# See how to save api key in env variable.\n",
    "# https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety\n",
    "openai_client = OpenAI(\n",
    "    # This is the default and can be omitted\n",
    "    api_key=os.environ.get(\"OPENAI_API_KEY\"),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "b5b6da85",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/christybergman/mambaforge/envs/py311new/lib/python3.11/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The class `langchain_community.chat_models.openai.ChatOpenAI` was deprecated in langchain-community 0.0.10 and will be removed in 0.2.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import ChatOpenAI`.\n",
      "  warn_deprecated(\n"
     ]
    }
   ],
   "source": [
    "# Ragas default uses HuggingFace Datasets.\n",
    "# https://docs.ragas.io/en/latest/getstarted/evaluation.html\n",
    "from datasets import Dataset\n",
    "# Ragas default uses OpenAI through LangChain.\n",
    "from langchain.chat_models import ChatOpenAI\n",
    "from ragas.llms import LangchainLLM\n",
    "from ragas import evaluate\n",
    "\n",
    "# Choose the metrics you want to see.\n",
    "from ragas.metrics import (\n",
    "    # Question -> Context metrics\n",
    "    context_recall, \n",
    "    context_precision, \n",
    "    # Context -> Answer metrics\n",
    "    faithfulness, \n",
    "    # Question -> Answer metrics\n",
    "    answer_similarity,\n",
    "    answer_relevancy, \n",
    "    answer_correctness\n",
    "    )\n",
    "metrics = ['context_recall', 'context_precision', 'answer_relevancy', 'faithfulness', 'answer_similarity', 'answer_correctness']\n",
    "\n",
    "# Customize LLM used by Ragas (uses LangChain OpenAI `gpt-3.5-turbo-16k` by default).\n",
    "# Possible to switch out a HuggingFace open LLM here if you want.\n",
    "# https://docs.ragas.io/en/latest/howtos/customisations/llms.html\n",
    "llm_langchain = ChatOpenAI(model_name=LLM_NAME, temperature=TEMPERATURE)\n",
    "gpt3_wrapper = LangchainLLM(llm=llm_langchain)\n",
    "# Change the default llm for each metric (looked up by name among the imports above).\n",
    "for metric in metrics:\n",
    "    globals()[metric].llm = gpt3_wrapper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5e2db9c0",
   "metadata": {},
   "outputs": [],
   "source": [
    "def assemble_ragas_dataset(input_df, answer_col_name=\"OpenAI_RAG_answer\", context_exists=False, row_number=-9999):\n",
    "    \"\"\"Assemble a RAGAS HuggingFace Dataset from lists of values.\"\"\"\n",
    "\n",
    "    # Subset input_df to the row number; a negative row_number means use all rows.\n",
    "    if row_number >= 0:\n",
    "        subset_df = input_df.iloc[row_number:row_number+1, :]\n",
    "    else:\n",
    "        subset_df = input_df.copy()\n",
    "\n",
    "    question_list = subset_df.Question.to_list()\n",
    "    answer_list = subset_df[answer_col_name].to_list()\n",
    "\n",
    "    # contexts: list[list[str]] - The contexts which were passed into the LLM to answer the question.\n",
    "    if context_exists:\n",
    "        context_list = subset_df.Custom_RAG_context.to_list()\n",
    "        context_list = [[context] for context in context_list]\n",
    "    else:\n",
    "        context_list = [ [\"\"] for _ in question_list]\n",
    "\n",
    "    # ground_truths: list[list[str]] - The ground truth answer to the questions. \n",
    "    truth_list = subset_df.ground_truth_answer.to_list()\n",
    "    truth_list = [[truth] for truth in truth_list]\n",
    "\n",
    "    # Create a HuggingFace Dataset from the ground truth lists.\n",
    "    ragas_ds = Dataset.from_dict({\"question\": question_list,\n",
    "                            \"contexts\": context_list,\n",
    "                            \"answer\": answer_list,\n",
    "                            \"ground_truths\": truth_list})\n",
    "    \n",
    "    return ragas_ds\n",
    "\n",
    "def evaluate_ragas(input_df, answer_col_name=\"OpenAI_RAG_answer\", context_exists=False, row_number=-9999, metrics=\"final_only\"):\n",
    "    \"\"\"Evaluate answers with Ragas: answer metrics only if metrics == \"final_only\", otherwise all metrics.\"\"\"\n",
    "\n",
    "    # Create a ragas dataset.\n",
    "    ragas_input_ds = assemble_ragas_dataset(input_df, answer_col_name, context_exists, row_number)\n",
    "\n",
    "    # Evaluate the dataset.\n",
    "    if metrics == \"final_only\":\n",
    "        ragas_result = evaluate(\n",
    "            ragas_input_ds,\n",
    "            metrics=[\n",
    "                answer_similarity,\n",
    "                answer_relevancy,\n",
    "                answer_correctness,])\n",
    "    else:\n",
    "        # calculate all metrics\n",
    "        ragas_result = evaluate(\n",
    "            ragas_input_ds,\n",
    "            metrics=[\n",
    "                # Question -> Context metrics\n",
    "                context_recall, \n",
    "                context_precision, \n",
    "                # Context -> Answer metrics\n",
    "                faithfulness, \n",
    "                # Question -> Answer metrics\n",
    "                answer_similarity,\n",
    "                answer_relevancy,\n",
    "                answer_correctness,])\n",
    "        \n",
    "    return ragas_result"
   ]
  },
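  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The column layout that `assemble_ragas_dataset` builds can be illustrated in plain Python, without pandas or `datasets`.  This is a simplified sketch: the function name `rows_to_ragas_columns` and the input row keys are hypothetical, and only the four output column names are taken from the cell above.\n",
    "\n",
    "```python\n",
    "def rows_to_ragas_columns(rows, context_key=None):\n",
    "    # Build one list per column; contexts and ground_truths are lists of lists.\n",
    "    columns = {'question': [], 'contexts': [], 'answer': [], 'ground_truths': []}\n",
    "    for row in rows:\n",
    "        columns['question'].append(row['question'])\n",
    "        columns['answer'].append(row['answer'])\n",
    "        # Fall back to an empty-string context when no retrieved context is given.\n",
    "        context = row[context_key] if context_key else ''\n",
    "        columns['contexts'].append([context])\n",
    "        columns['ground_truths'].append([row['ground_truth']])\n",
    "    return columns\n",
    "\n",
    "rows = [{'question': 'q1', 'answer': 'a1', 'ground_truth': 't1', 'context': 'c1'}]\n",
    "print(rows_to_ragas_columns(rows, context_key='context'))\n",
    "```\n",
    "\n",
    "Each question carries a list of contexts and a list of ground truths, which is why the single strings are wrapped in one-element lists."
   ]
  },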
552
  {
553
   "cell_type": "code",
554
   "execution_count": 9,
555
   "id": "5d9124c2",
556
   "metadata": {},
557
   "outputs": [
558
    {
559
     "data": {
560
      "text/html": [
561
       "<div>\n",
562
       "<style scoped>\n",
563
       "    .dataframe tbody tr th:only-of-type {\n",
564
       "        vertical-align: middle;\n",
565
       "    }\n",
566
       "\n",
567
       "    .dataframe tbody tr th {\n",
568
       "        vertical-align: top;\n",
569
       "    }\n",
570
       "\n",
571
       "    .dataframe thead th {\n",
572
       "        text-align: right;\n",
573
       "    }\n",
574
       "</style>\n",
575
       "<table border=\"1\" class=\"dataframe\">\n",
576
       "  <thead>\n",
577
       "    <tr style=\"text-align: right;\">\n",
578
       "      <th></th>\n",
579
       "      <th>Question</th>\n",
580
       "      <th>ground_truth_answer</th>\n",
581
       "      <th>OpenAI_RAG_answer</th>\n",
582
       "      <th>Custom_RAG_answer</th>\n",
583
       "      <th>Custom_RAG_context</th>\n",
584
       "      <th>Uri</th>\n",
585
       "      <th>H1</th>\n",
586
       "      <th>H2</th>\n",
587
       "      <th>Score</th>\n",
588
       "      <th>Reason</th>\n",
589
       "    </tr>\n",
590
       "  </thead>\n",
591
       "  <tbody>\n",
592
       "    <tr>\n",
593
       "      <th>0</th>\n",
594
       "      <td>What do the parameters for HNSW mean?\\n</td>\n",
595
       "      <td>- M: maximum degree of nodes in a layer of the...</td>\n",
596
       "      <td>The HNSW parameters include the “nlist” which ...</td>\n",
597
       "      <td>The parameters for HNSW have the following mea...</td>\n",
598
       "      <td>performance, HNSW limits the maximum degree of...</td>\n",
599
       "      <td>https://pymilvus.readthedocs.io/en/latest/para...</td>\n",
600
       "      <td>Index</td>\n",
601
       "      <td>Milvus support to create index to accelerate v...</td>\n",
602
       "      <td>NaN</td>\n",
603
       "      <td>NaN</td>\n",
604
       "    </tr>\n",
605
       "    <tr>\n",
606
       "      <th>1</th>\n",
607
       "      <td>What are HNSW good default parameters when dat...</td>\n",
608
       "      <td>M=16, efConstruction=32, ef=32</td>\n",
609
       "      <td>The default HNSW parameters for data size of 2...</td>\n",
610
       "      <td>For a data size of 25K vectors with a dimensio...</td>\n",
611
       "      <td>Metrics. Vector Index¶ FLAT IVF_FLAT IVF_SQ8 I...</td>\n",
612
       "      <td>https://pymilvus.readthedocs.io/en/latest/para...</td>\n",
613
       "      <td>NaN</td>\n",
614
       "      <td>NaN</td>\n",
615
       "      <td>NaN</td>\n",
616
       "      <td>NaN</td>\n",
617
       "    </tr>\n",
618
       "    <tr>\n",
619
       "      <th>2</th>\n",
620
       "      <td>what is the default distance metric used in AU...</td>\n",
621
       "      <td>Trick answer:  IP inner product, not yet updat...</td>\n",
622
       "      <td>The default distance metric used in AUTOINDEX ...</td>\n",
623
       "      <td>The default distance metric used in AUTOINDEX ...</td>\n",
624
       "      <td>The attributes of collection can be extracted ...</td>\n",
625
       "      <td>https://pymilvus.readthedocs.io/en/latest/tuto...</td>\n",
626
       "      <td>NaN</td>\n",
627
       "      <td>NaN</td>\n",
628
       "      <td>NaN</td>\n",
629
       "      <td>NaN</td>\n",
630
       "    </tr>\n",
631
       "    <tr>\n",
632
       "      <th>3</th>\n",
633
       "      <td>How did New York City get its name?</td>\n",
634
       "      <td>In the 1600’s, the Dutch planted a trading pos...</td>\n",
635
       "      <td>I'm sorry, but I couldn't find any information...</td>\n",
636
       "      <td>New York City was originally named New Amsterd...</td>\n",
637
       "      <td>Etymology\\nSee also: Nicknames of New York Cit...</td>\n",
638
       "      <td>https://en.wikipedia.org/wiki/New_York_City</td>\n",
639
       "      <td>NaN</td>\n",
640
       "      <td>NaN</td>\n",
641
       "      <td>NaN</td>\n",
642
       "      <td>NaN</td>\n",
643
       "    </tr>\n",
644
       "  </tbody>\n",
645
       "</table>\n",
646
       "</div>"
647
      ],
648
      "text/plain": [
649
       "                                            Question  \\\n",
650
       "0            What do the parameters for HNSW mean?\\n   \n",
651
       "1  What are HNSW good default parameters when dat...   \n",
652
       "2  what is the default distance metric used in AU...   \n",
653
       "3                How did New York City get its name?   \n",
654
       "\n",
655
       "                                 ground_truth_answer  \\\n",
656
       "0  - M: maximum degree of nodes in a layer of the...   \n",
657
       "1                     M=16, efConstruction=32, ef=32   \n",
658
       "2  Trick answer:  IP inner product, not yet updat...   \n",
659
       "3  In the 1600’s, the Dutch planted a trading pos...   \n",
660
       "\n",
661
       "                                   OpenAI_RAG_answer  \\\n",
662
       "0  The HNSW parameters include the “nlist” which ...   \n",
663
       "1  The default HNSW parameters for data size of 2...   \n",
664
       "2  The default distance metric used in AUTOINDEX ...   \n",
665
       "3  I'm sorry, but I couldn't find any information...   \n",
666
       "\n",
667
       "                                   Custom_RAG_answer  \\\n",
668
       "0  The parameters for HNSW have the following mea...   \n",
669
       "1  For a data size of 25K vectors with a dimensio...   \n",
670
       "2  The default distance metric used in AUTOINDEX ...   \n",
671
       "3  New York City was originally named New Amsterd...   \n",
672
       "\n",
673
       "                                  Custom_RAG_context  \\\n",
674
       "0  performance, HNSW limits the maximum degree of...   \n",
675
       "1  Metrics. Vector Index¶ FLAT IVF_FLAT IVF_SQ8 I...   \n",
676
       "2  The attributes of collection can be extracted ...   \n",
677
       "3  Etymology\\nSee also: Nicknames of New York Cit...   \n",
678
       "\n",
679
       "                                                 Uri     H1  \\\n",
680
       "0  https://pymilvus.readthedocs.io/en/latest/para...  Index   \n",
681
       "1  https://pymilvus.readthedocs.io/en/latest/para...    NaN   \n",
682
       "2  https://pymilvus.readthedocs.io/en/latest/tuto...    NaN   \n",
683
       "3        https://en.wikipedia.org/wiki/New_York_City    NaN   \n",
684
       "\n",
685
       "                                                  H2  Score  Reason  \n",
686
       "0  Milvus support to create index to accelerate v...    NaN     NaN  \n",
687
       "1                                                NaN    NaN     NaN  \n",
688
       "2                                                NaN    NaN     NaN  \n",
689
       "3                                                NaN    NaN     NaN  "
690
      ]
691
     },
692
     "metadata": {},
693
     "output_type": "display_data"
694
    }
695
   ],
696
   "source": [
    "# Read questions and ground truth answers into a pandas dataframe.\n",
    "import pandas as pd\n",
    "\n",
    "# Read ground truth answers from file.\n",
    "eval_df = pd.read_csv(\"../../../christy_coding_scratch/data/milvus_ground_truth.csv\",\n",
    "                      header=0, skip_blank_lines=True)\n",
    "display(eval_df.head())\n",
    "\n",
    "# Get all the questions.\n",
    "question_list = eval_df.Question.to_list()\n",
    "\n",
    "# Get all the ground truth answers.\n",
    "truth_list = eval_df.ground_truth_answer.to_list()\n",
    "\n",
    "# Get all the ground truth sources.\n",
    "uri_list = eval_df.Uri.to_list()\n",
    "\n",
    "# Get all the OpenAI Answers.\n",
    "openai_answer_list = eval_df.OpenAI_RAG_answer.to_list()"
716
   ]
717
  },
718
  {
719
   "cell_type": "markdown",
720
   "id": "bb69c50d",
721
   "metadata": {},
722
   "source": [
    "## Define a Custom Execution Loop for RAG"
724
   ]
725
  },
726
  {
727
   "cell_type": "code",
728
   "execution_count": 10,
729
   "id": "9b6aca9b",
730
   "metadata": {},
731
   "outputs": [],
732
   "source": [
    "import requests, json, pprint\n",
    "\n",
    "# Milvus search, define how many retrieval results to return.\n",
    "# Milvus automatically sorts results descending by distance score.\n",
    "TOP_K = 3\n",
    "\n",
    "# Search a collection containing Milvus Documentation.\n",
    "def zilliz_pipeline_collection_search(token, question):\n",
    "    # Define the URL, headers, and data\n",
    "    url = \"https://controller.api.gcp-us-west1.zillizcloud.com/v1/pipelines/pipe-3de3fb4a9bc3c2a64a786b/run\"\n",
    "    headers = {\n",
    "        \"Content-Type\": \"application/json\",\n",
    "        \"Authorization\": f\"Bearer {token}\",\n",
    "    }\n",
    "    data = {\n",
    "        \"data\": {\n",
    "            \"query_text\": question\n",
    "        },\n",
    "        \"params\": {\n",
    "            \"limit\": TOP_K,\n",
    "            \"offset\": 0,\n",
    "            \"outputFields\": [\"chunk_text\", \"chunk_id\", \"doc_name\", \"source\"],\n",
    "            \"filter\": \"chunk_id >= 0 && doc_name == 'param.html'\",\n",
    "        }\n",
    "    }\n",
    "\n",
    "    # Send the POST request and stop early on an HTTP error status.\n",
    "    response = requests.post(url, headers=headers, json=data)\n",
    "    response.raise_for_status()\n",
    "\n",
    "    # # Print the response\n",
    "    # pprint.pprint(response.json())\n",
    "    return response.json()\n",
    "\n",
    "# Search a collection containing Wikipedia articles about New York City.\n",
    "def wikipedia_search(mc, collection_name, collection_encoder, question, output_fields=None, top_k=3):\n",
    "    # Embed the query\n",
    "    query_embeddings = _utils.embed_query(collection_encoder, [question])\n",
    "\n",
    "    # Define search parameters.\n",
    "    # M and efConstruction are notebook-level globals (the HNSW index settings).\n",
    "    INDEX_PARAMS = {\n",
    "        \"M\": M,\n",
    "        \"efConstruction\": efConstruction,\n",
    "    }\n",
    "    SEARCH_PARAMS = {\n",
    "        \"ef\": INDEX_PARAMS[\"efConstruction\"]\n",
    "    }\n",
    "\n",
    "    # Define default output fields to return if the caller passed none.\n",
    "    if output_fields is None:\n",
    "        output_fields = [\"h1\", \"source\", \"chunk\"]\n",
    "\n",
    "    # Perform the search\n",
    "    answers = mc.search(\n",
    "        collection_name,\n",
    "        data=query_embeddings,\n",
    "        search_params=SEARCH_PARAMS,\n",
    "        output_fields=output_fields,\n",
    "        filter=\"(source like 'https://en.wikipedia.org%')\",\n",
    "        limit=top_k,\n",
    "        consistency_level=\"Eventually\"\n",
    "    )\n",
    "\n",
    "    return answers"
794
   ]
795
  },
796
  {
797
   "cell_type": "code",
798
   "execution_count": 11,
799
   "id": "cfb1f303",
800
   "metadata": {},
801
   "outputs": [],
802
   "source": [
    "# Function to get OpenAI response and token usage.\n",
    "def get_openai_chat(llm_name, user_prompt, retrieval_context, retrieval_source, message_history,\n",
    "                    temperature=0.0, random_seed=415, frequency_penalty=2):\n",
    "    \"\"\"\n",
    "    Returns 2 pandas dataframes: response, token_use.\n",
    "    \"\"\"\n",
    "\n",
    "    system_message = f\"\"\"\n",
    "    Use the Context to answer the user's question. Be clear, factual, complete, concise.\n",
    "    If the answer is not in the Context, say \"I don't know\".  Otherwise answer using this format:\n",
    "    Context: {retrieval_context}\n",
    "    Answer: The answer to the question.\n",
    "    Grounding source: {retrieval_source}\n",
    "    \"\"\"\n",
    "    messages = [\n",
    "        {'role': 'system', 'content': system_message},\n",
    "        {'role': 'user', 'content': f\"{user_prompt}\"},\n",
    "        {'role': 'assistant', 'content': f\"Relevant context:\\n{retrieval_context}\"}\n",
    "    ]\n",
    "\n",
    "    # Call the OpenAI chat completions API in JSON mode.\n",
    "    # Note: in JSON mode the prompt itself should also ask for JSON output.\n",
    "    responses = openai_client.chat.completions.create(\n",
    "        response_format={\n",
    "            \"type\": \"json_object\",\n",
    "            # \"schema\": Result.schema_json()\n",
    "        },\n",
    "        messages=message_history + messages,\n",
    "        model=llm_name,\n",
    "        temperature=temperature, # the degree of randomness of the model's output\n",
    "        seed=random_seed,  # for reproducibility\n",
    "        frequency_penalty=frequency_penalty, # penalizes tokens that already appear often, reducing repetition\n",
    "        # max_tokens=max_tokens # maximum number of tokens the model can output\n",
    "    )\n",
    "    # Note: this rebinding is local, the caller's history is not modified.\n",
    "    message_history = message_history + messages[1:]\n",
    "\n",
    "    # Make sure total_tokens < 4096.\n",
    "    token_dict = {\n",
    "        'prompt_tokens': responses.usage.prompt_tokens,\n",
    "        'completion_tokens': responses.usage.completion_tokens,\n",
    "        'total_tokens': responses.usage.total_tokens,\n",
    "    }\n",
    "\n",
    "    # Return answer as a JSON object.\n",
    "    openai_response = responses.choices[0].message.content\n",
    "    json_response = json.loads(openai_response)  # single json object with 3 fields\n",
    "\n",
    "    # Create a DataFrame from a list of dictionaries.\n",
    "    response_df = pd.DataFrame([json_response])\n",
    "    token_use_df = pd.DataFrame([token_dict])\n",
    "\n",
    "    return response_df, token_use_df\n",
    "\n",
    "def get_answer_from_openai_chat_response(chat_response):\n",
    "    # Extract the answer from the 0th choice's message content\n",
    "    answer = chat_response.choices[0].message.content\n",
    "    return answer"
860
   ]
861
  },
862
  {
863
   "cell_type": "code",
864
   "execution_count": 12,
865
   "id": "d671601b",
866
   "metadata": {},
867
   "outputs": [],
868
   "source": [
    "# STEP1: Moderation check of user question.  If pass, continue. (Currently commented out.)\n",
    "# STEP2: Retrieve closest chunk to question from default collection.\n",
    "#        Check distance score of the retrieved chunk.\n",
    "#   STEP3:  If score is too low, get the intent from the question.\n",
    "#   STEP4:  Based on question intent, retrieve from a different collection containing that data.\n",
    "# STEP5: Generate answer to the user's question, using context in the ASSISTANT PROMPT.\n",
    "# STEP6: Evaluate whether the generated answer addresses the user's question.\n",
    "# STEP7: Return final answer to user.\n",
    "\n",
    "# Define a custom execution loop for RAG.\n",
    "def process_user_message(user_input, question_number, message_history, top_k=3, debug=False):\n",
    "    delimiter = \"```\"\n",
    "    retrieval_done = False\n",
    "    threshold_retrieval_score = 0.6\n",
    "    ragas_metrics = ['answer_relevancy', 'faithfulness']\n",
    "    # No intent is known until Step 3 detects one.\n",
    "    intent = None\n",
    "\n",
    "    # # Step 1: Check input to see if it flags the Moderation API or is a prompt injection\n",
    "    # if debug:\n",
    "    #    print()\n",
    "    #    print(\"STEP 1: Check input to see if it flags the Moderation API or is a prompt injection\")\n",
    "    # response = openai_client.moderations.create(input=user_input)\n",
    "    # moderation_output = response.results[0]\n",
    "    # print(moderation_output.flagged) # False\n",
    "\n",
    "    # if moderation_output.flagged:\n",
    "    #     print(\"Step 1: Input flagged by Moderation API.\")\n",
    "    #     return \"Sorry, we cannot process this request.\", message_history\n",
    "\n",
    "    # Step 2: Retrieval from collection #1.\n",
    "    if debug:\n",
    "        print()\n",
    "        print(\"STEP 2: Retrieval from collection #1 MilvusDocs.\")\n",
    "    response = zilliz_pipeline_collection_search(TOKEN, user_input)\n",
    "    distance_score = response['data']['result'][0]['distance']\n",
    "\n",
    "    # Branching logic based on distance score.\n",
    "    if distance_score >= threshold_retrieval_score:\n",
    "        # Extract the retrieval context.\n",
    "        retrieval_context = response['data']['result'][0]['chunk_text']\n",
    "        retrieval_source = response['data']['result'][0]['source']\n",
    "        if debug:\n",
    "            print(f\"DISTANCE SCORE: {distance_score} branch logic.\")\n",
    "            print(f\"chunk_answer: {retrieval_context[:150]}\")\n",
    "        retrieval_done = True\n",
    "\n",
    "    if not retrieval_done and distance_score < threshold_retrieval_score:\n",
    "        # Step 3: If score is too low, get the intent from the prompt.\n",
    "        if debug:\n",
    "            print(f\"DISTANCE SCORE: {distance_score} branching logic...\")\n",
    "            print()\n",
    "            print(\"STEP 3: Score is too low, GET INTENT from the user's question.\")\n",
    "        if \"New York City\" in user_input:\n",
    "            intent = \"new_york\"\n",
    "            print(f\"intent = {intent}\")\n",
    "        # elif could check for other intents here...\n",
    "\n",
    "        # Step 4: Based on question intent, retrieve from collection containing that data.\n",
    "        if intent == \"new_york\":\n",
    "            if debug:\n",
    "                print()\n",
    "                print(\"STEP 4: Based on question intent, retrieve from collection #2 Wikipedia.\")\n",
    "            OUTPUT_FIELDS = [\"h1\", \"source\", \"chunk\"]\n",
    "            response = wikipedia_search(mc, COLLECTION_NAME, encoder, user_input, OUTPUT_FIELDS, top_k)\n",
    "            # Extract the retrieval score, context, source citation.\n",
    "            distance_score = response[0][0]['distance']\n",
    "            retrieval_context = response[0][0]['entity']['chunk']\n",
    "            retrieval_source = response[0][0]['entity']['source']\n",
    "            if debug:\n",
    "                print(f\"chunk_answer: {retrieval_context[:150]}\")\n",
    "        else:\n",
    "            print(f\"STEP 4: No matching collection for intent {intent}.\")\n",
    "            return \"Sorry, we cannot process this request.\", message_history\n",
    "\n",
    "    # Branching logic based on distance score.\n",
    "    if debug:\n",
    "        print(f\"DISTANCE SCORE: {distance_score} branch logic...\")\n",
    "    if distance_score < threshold_retrieval_score:\n",
    "        print(\"UNABLE TO MATCH INTENT WITH ANY INTERNAL DOC STORE.\")\n",
    "        return \"Sorry, we cannot process this request.\", message_history\n",
    "    else:\n",
    "        print()\n",
    "        print(\"Score from custom RAG Retrieval is above threshold, proceed to answer generation step.\")\n",
    "        # STEP 5: Generating GPT3.5 answer from the custom execution loop for RAG in the ASSISTANT PROMPT.\n",
    "        if debug:\n",
    "            print()\n",
    "            print(\"STEP 5: Generating GPT3.5 answer from the custom execution loop for RAG in the ASSISTANT PROMPT.\")\n",
    "        system_message = f\"\"\"\n",
    "        Use the Context below to answer the user's question. Be clear, factual, complete, concise.\n",
    "        If the answer is not in the Context, say \"I don't know\".  Otherwise answer using this format:\n",
    "        Context: {retrieval_context}\n",
    "        Answer: The answer to the question.\n",
    "        Grounding source: {retrieval_source}\n",
    "        \"\"\"\n",
    "        messages = [\n",
    "            {'role': 'system', 'content': system_message},\n",
    "            {'role': 'user', 'content': f\"{delimiter}{user_input}{delimiter}\"},\n",
    "            {'role': 'assistant', 'content': f\"Relevant context:\\n{retrieval_context}\"}\n",
    "        ]\n",
    "        final_response = openai_client.chat.completions.create(\n",
    "            messages=message_history + messages,\n",
    "            model=LLM_NAME,\n",
    "            temperature=TEMPERATURE,\n",
    "            seed=RANDOM_SEED,\n",
    "        )\n",
    "        message_history = message_history + messages[1:]\n",
    "        answer = get_answer_from_openai_chat_response(final_response)\n",
    "\n",
    "        # STEP 6: Evaluate whether the chatbot response answers the initial user query well.\n",
    "        # Default to a pass so the function also works when debug=False.\n",
    "        evaluation_response = \"Y\"\n",
    "        if debug:\n",
    "            print()\n",
    "            print(\"STEP 6: Evaluate whether the chatbot response answers the initial user query well.\")\n",
    "            ragas_result = evaluate_ragas(eval_df, \"Custom_RAG_answer\", True, question_number, \"final_only\")\n",
    "            ragas_df = ragas_result.to_pandas()\n",
    "            print(f\"Ragas evaluation: answer similarity: {ragas_df.answer_similarity[0]}, answer relevancy: {np.round(ragas_df.answer_relevancy[0],3)}, answer correctness: {np.round(ragas_df.answer_correctness[0],3)}\")\n",
    "            # could also check for other metrics here...\n",
    "\n",
    "        # STEP 7: If LLM answer passed Evaluation, return it to the user.\n",
    "        if evaluation_response == \"Y\":\n",
    "            if debug:\n",
    "                print()\n",
    "                print(\"STEP 7: LLM answer passed Evaluation, return it to the user.\")\n",
    "            return answer, message_history\n",
    "        else:\n",
    "            if debug:\n",
    "                print()\n",
    "                print(\"STEP 7: The LLM answer does not pass Evaluation.\")\n",
    "            return answer, message_history\n"
997
   ]
998
  },
999
  {
1000
   "cell_type": "code",
1001
   "execution_count": 13,
1002
   "id": "bb1a52ca",
1003
   "metadata": {},
1004
   "outputs": [
1005
    {
1006
     "name": "stdout",
1007
     "output_type": "stream",
1008
     "text": [
1009
      "question = How did New York City get its name?\n",
1010
      "\n",
1011
      "STEP 2: Retrieval from collection #1 MilvusDocs.\n",
1012
      "DISTANCE SCORE: 0.39108937978744507 branching logic...\n",
1013
      "\n",
1014
      "STEP 3: Score is too low, GET INTENT from the user's question.\n",
1015
      "intent = new_york\n",
1016
      "\n",
1017
      "STEP 4: Based on question intent, retrieve from collection #2 Wikipedia.\n",
1018
      "chunk_answer: New York City traces its origins to Fort Amsterdam and a trading post founded on the southern tip of Manhattan Island by Dutch colonists in approximat\n",
1019
      "DISTANCE SCORE: 0.7961502075195312 branch logic...\n",
1020
      "\n",
1021
      "Score from custom RAG Retrieval is above threshold, proceed to answer generation step.\n",
1022
      "\n",
1023
      "STEP 5: Generating GPT3.5 answer from the custom execution loop for RAG in the ASSISTANT PROMPT.\n"
1024
     ]
1025
    },
1026
    {
1027
     "name": "stderr",
1028
     "output_type": "stream",
1029
     "text": [
1030
      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
1031
      "To disable this warning, you can either:\n",
1032
      "\t- Avoid using `tokenizers` before the fork if possible\n",
1033
      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
1034
     ]
1035
    },
1036
    {
1037
     "name": "stdout",
1038
     "output_type": "stream",
1039
     "text": [
1040
      "\n",
1041
      "STEP 6: Evaluate whether the chatbot response answers the initial user query well.\n",
1042
      "evaluating with [answer_similarity]\n"
1043
     ]
1044
    },
1045
    {
1046
     "name": "stderr",
1047
     "output_type": "stream",
1048
     "text": [
1049
      "100%|██████████| 1/1 [00:00<00:00,  1.49it/s]\n"
1050
     ]
1051
    },
1052
    {
1053
     "name": "stdout",
1054
     "output_type": "stream",
1055
     "text": [
1056
      "evaluating with [answer_relevancy]\n"
1057
     ]
1058
    },
1059
    {
1060
     "name": "stderr",
1061
     "output_type": "stream",
1062
     "text": [
1063
      "100%|██████████| 1/1 [00:01<00:00,  1.73s/it]\n"
1064
     ]
1065
    },
1066
    {
1067
     "name": "stdout",
1068
     "output_type": "stream",
1069
     "text": [
1070
      "evaluating with [answer_correctness]\n"
1071
     ]
1072
    },
1073
    {
1074
     "name": "stderr",
1075
     "output_type": "stream",
1076
     "text": [
1077
      "100%|██████████| 1/1 [00:05<00:00,  5.98s/it]\n"
1078
     ]
1079
    },
1080
    {
1081
     "name": "stdout",
1082
     "output_type": "stream",
1083
     "text": [
1084
      "Ragas evaluation: answer similarity: 0.9421961714808575, answer relevancy: 0.894, answer correctness: 0.664\n",
1085
      "\n",
1086
      "STEP 7: LLM answer passed Evaluation, return it to the user.\n",
1087
      "('Answer: New York City was originally named New Amsterdam by Dutch colonists '\n",
1088
      " 'in 1626. However, it was renamed New York in 1664 after King Charles II '\n",
1089
      " 'granted the lands to his brother, the Duke of York, when the city came under '\n",
1090
      " 'British control.')\n"
1091
     ]
1092
    }
1093
   ],
1094
   "source": [
    "# Test the custom RAG execution loop using a question.\n",
    "\n",
    "QUESTION_NUMBER = 3 #2 or 3\n",
    "SAMPLE_QUESTION = question_list[QUESTION_NUMBER]\n",
    "print(f\"question = {SAMPLE_QUESTION}\")\n",
    "\n",
    "truth_answer = truth_list[QUESTION_NUMBER]\n",
    "\n",
    "# Run the custom RAG loop and collect its answer.\n",
    "all_messages = []\n",
    "answer_history = []\n",
    "openai_answer, messages = process_user_message(SAMPLE_QUESTION, QUESTION_NUMBER, all_messages, debug=True)\n",
    "# process_user_message returns the full updated history, so replace rather than append.\n",
    "all_messages = messages\n",
    "answer_history.append(openai_answer)\n",
    "pprint.pprint(f\"Answer: {openai_answer}\")"
1110
   ]
1111
  },
1112
  {
1113
   "cell_type": "markdown",
1114
   "id": "67fa1791",
1115
   "metadata": {},
1116
   "source": [
    "## Final Eval Comparison: Custom RAG vs. OpenAI RAG"
1118
   ]
1119
  },
1120
  {
1121
   "cell_type": "code",
1122
   "execution_count": 14,
1123
   "id": "aa9a35cd",
1124
   "metadata": {},
1125
   "outputs": [
1126
    {
1127
     "name": "stdout",
1128
     "output_type": "stream",
1129
     "text": [
1130
      "evaluating with [context_recall]\n"
1131
     ]
1132
    },
1133
    {
1134
     "name": "stderr",
1135
     "output_type": "stream",
1136
     "text": [
1137
      "100%|██████████| 1/1 [00:14<00:00, 14.62s/it]\n"
1138
     ]
1139
    },
1140
    {
1141
     "name": "stdout",
1142
     "output_type": "stream",
1143
     "text": [
1144
      "evaluating with [context_precision]\n"
1145
     ]
1146
    },
1147
    {
1148
     "name": "stderr",
1149
     "output_type": "stream",
1150
     "text": [
1151
      "100%|██████████| 1/1 [00:07<00:00,  7.86s/it]\n"
1152
     ]
1153
    },
1154
    {
1155
     "name": "stdout",
1156
     "output_type": "stream",
1157
     "text": [
1158
      "evaluating with [faithfulness]\n"
1159
     ]
1160
    },
1161
    {
1162
     "name": "stderr",
1163
     "output_type": "stream",
1164
     "text": [
1165
      "100%|██████████| 1/1 [00:29<00:00, 29.35s/it]\n"
1166
     ]
1167
    },
1168
    {
1169
     "name": "stdout",
1170
     "output_type": "stream",
1171
     "text": [
1172
      "evaluating with [answer_similarity]\n"
1173
     ]
1174
    },
1175
    {
1176
     "name": "stderr",
1177
     "output_type": "stream",
1178
     "text": [
1179
      "100%|██████████| 1/1 [00:01<00:00,  1.20s/it]\n"
1180
     ]
1181
    },
1182
    {
1183
     "name": "stdout",
1184
     "output_type": "stream",
1185
     "text": [
1186
      "evaluating with [answer_relevancy]\n"
1187
     ]
1188
    },
1189
    {
1190
     "name": "stderr",
1191
     "output_type": "stream",
1192
     "text": [
1193
      "100%|██████████| 1/1 [00:07<00:00,  7.96s/it]\n"
1194
     ]
1195
    },
1196
    {
1197
     "name": "stdout",
1198
     "output_type": "stream",
1199
     "text": [
1200
      "evaluating with [answer_correctness]\n"
1201
     ]
1202
    },
1203
    {
1204
     "name": "stderr",
1205
     "output_type": "stream",
1206
     "text": [
1207
      "100%|██████████| 1/1 [00:20<00:00, 20.12s/it]\n"
1208
     ]
1209
    },
1210
    {
1211
     "data": {
1212
      "text/html": [
1213
       "<div>\n",
1214
       "<style scoped>\n",
1215
       "    .dataframe tbody tr th:only-of-type {\n",
1216
       "        vertical-align: middle;\n",
1217
       "    }\n",
1218
       "\n",
1219
       "    .dataframe tbody tr th {\n",
1220
       "        vertical-align: top;\n",
1221
       "    }\n",
1222
       "\n",
1223
       "    .dataframe thead th {\n",
1224
       "        text-align: right;\n",
1225
       "    }\n",
1226
       "</style>\n",
1227
       "<table border=\"1\" class=\"dataframe\">\n",
1228
       "  <thead>\n",
1229
       "    <tr style=\"text-align: right;\">\n",
1230
       "      <th></th>\n",
1231
       "      <th>question</th>\n",
1232
       "      <th>ground_truths</th>\n",
1233
       "      <th>contexts_Custom_RAG</th>\n",
1234
       "      <th>answer_Custom_RAG</th>\n",
1235
       "      <th>context_recall</th>\n",
1236
       "      <th>context_precision</th>\n",
1237
       "      <th>faithfulness</th>\n",
1238
       "      <th>answer_similarity_Custom_RAG</th>\n",
1239
       "      <th>answer_relevancy_Custom_RAG</th>\n",
1240
       "      <th>answer_correctness_Custom_RAG</th>\n",
1241
       "    </tr>\n",
1242
       "  </thead>\n",
1243
       "  <tbody>\n",
1244
       "    <tr>\n",
1245
       "      <th>0</th>\n",
1246
       "      <td>What do the parameters for HNSW mean?\\n</td>\n",
1247
       "      <td>[- M: maximum degree of nodes in a layer of th...</td>\n",
1248
       "      <td>[performance, HNSW limits the maximum degree o...</td>\n",
1249
       "      <td>The parameters for HNSW have the following mea...</td>\n",
1250
       "      <td>1.0</td>\n",
1251
       "      <td>1.0</td>\n",
1252
       "      <td>0.8</td>\n",
1253
       "      <td>0.844867</td>\n",
1254
       "      <td>0.979217</td>\n",
1255
       "      <td>0.620304</td>\n",
1256
       "    </tr>\n",
1257
       "    <tr>\n",
1258
       "      <th>1</th>\n",
1259
       "      <td>What are HNSW good default parameters when dat...</td>\n",
1260
       "      <td>[M=16, efConstruction=32, ef=32]</td>\n",
1261
       "      <td>[Metrics. Vector Index¶ FLAT IVF_FLAT IVF_SQ8 ...</td>\n",
1262
       "      <td>For a data size of 25K vectors with a dimensio...</td>\n",
1263
       "      <td>0.0</td>\n",
1264
       "      <td>0.0</td>\n",
1265
       "      <td>0.0</td>\n",
1266
       "      <td>0.776006</td>\n",
1267
       "      <td>0.977902</td>\n",
1268
       "      <td>0.622550</td>\n",
1269
       "    </tr>\n",
1270
       "    <tr>\n",
1271
       "      <th>2</th>\n",
1272
       "      <td>what is the default distance metric used in AU...</td>\n",
1273
       "      <td>[Trick answer:  IP inner product, not yet upda...</td>\n",
1274
       "      <td>[The attributes of collection can be extracted...</td>\n",
1275
       "      <td>The default distance metric used in AUTOINDEX ...</td>\n",
1276
       "      <td>0.0</td>\n",
1277
       "      <td>0.0</td>\n",
1278
       "      <td>0.0</td>\n",
1279
       "      <td>0.738060</td>\n",
1280
       "      <td>0.990814</td>\n",
1281
       "      <td>0.484557</td>\n",
1282
       "    </tr>\n",
1283
       "    <tr>\n",
1284
       "      <th>3</th>\n",
1285
       "      <td>How did New York City get its name?</td>\n",
1286
       "      <td>[In the 1600’s, the Dutch planted a trading po...</td>\n",
1287
       "      <td>[Etymology\\nSee also: Nicknames of New York Ci...</td>\n",
1288
       "      <td>New York City was originally named New Amsterd...</td>\n",
1289
       "      <td>1.0</td>\n",
1290
       "      <td>1.0</td>\n",
1291
       "      <td>0.5</td>\n",
1292
       "      <td>0.942196</td>\n",
1293
       "      <td>0.894259</td>\n",
1294
       "      <td>0.664120</td>\n",
1295
       "    </tr>\n",
1296
       "  </tbody>\n",
1297
       "</table>\n",
1298
       "</div>"
1299
      ],
1300
      "text/plain": [
1301
       "                                            question  \\\n",
1302
       "0            What do the parameters for HNSW mean?\\n   \n",
1303
       "1  What are HNSW good default parameters when dat...   \n",
1304
       "2  what is the default distance metric used in AU...   \n",
1305
       "3                How did New York City get its name?   \n",
1306
       "\n",
1307
       "                                       ground_truths  \\\n",
1308
       "0  [- M: maximum degree of nodes in a layer of th...   \n",
1309
       "1                   [M=16, efConstruction=32, ef=32]   \n",
1310
       "2  [Trick answer:  IP inner product, not yet upda...   \n",
1311
       "3  [In the 1600’s, the Dutch planted a trading po...   \n",
1312
       "\n",
1313
       "                                 contexts_Custom_RAG  \\\n",
1314
       "0  [performance, HNSW limits the maximum degree o...   \n",
1315
       "1  [Metrics. Vector Index¶ FLAT IVF_FLAT IVF_SQ8 ...   \n",
1316
       "2  [The attributes of collection can be extracted...   \n",
1317
       "3  [Etymology\\nSee also: Nicknames of New York Ci...   \n",
1318
       "\n",
1319
       "                                   answer_Custom_RAG  context_recall  \\\n",
1320
       "0  The parameters for HNSW have the following mea...             1.0   \n",
1321
       "1  For a data size of 25K vectors with a dimensio...             0.0   \n",
1322
       "2  The default distance metric used in AUTOINDEX ...             0.0   \n",
1323
       "3  New York City was originally named New Amsterd...             1.0   \n",
1324
       "\n",
1325
       "   context_precision  faithfulness  answer_similarity_Custom_RAG  \\\n",
1326
       "0                1.0           0.8                      0.844867   \n",
1327
       "1                0.0           0.0                      0.776006   \n",
1328
       "2                0.0           0.0                      0.738060   \n",
1329
       "3                1.0           0.5                      0.942196   \n",
1330
       "\n",
1331
       "   answer_relevancy_Custom_RAG  answer_correctness_Custom_RAG  \n",
1332
       "0                     0.979217                       0.620304  \n",
1333
       "1                     0.977902                       0.622550  \n",
1334
       "2                     0.990814                       0.484557  \n",
1335
       "3                     0.894259                       0.664120  "
1336
      ]
1337
     },
1338
     "metadata": {},
1339
     "output_type": "display_data"
1340
    }
1341
   ],
1342
   "source": [
    "# Run Ragas Eval for all Questions, all Custom RAG Answers.\n",
    "\n",
    "# def evaluate_ragas(input_df, answer_col_name=\"OpenAI_RAG_answer\", context_exists=False, row_number=-9999, metrics=\"final_only\"):\n",
    "ragas_result = evaluate_ragas(eval_df, \"Custom_RAG_answer\", True, -9999, \"all\")\n",
    "ragas_df_Custom_RAG = ragas_result.to_pandas()\n",
    "\n",
    "# Rename the columns.\n",
    "rename_dict = {\n",
    "    \"contexts\": \"contexts_Custom_RAG\",\n",
    "    \"answer\": \"answer_Custom_RAG\",\n",
    "    \"answer_similarity\": \"answer_similarity_Custom_RAG\",\n",
    "    \"answer_relevancy\": \"answer_relevancy_Custom_RAG\",\n",
    "    \"answer_correctness\": \"answer_correctness_Custom_RAG\"\n",
    "}\n",
    "ragas_df_Custom_RAG.rename(columns=rename_dict, inplace=True)\n",
    "# Reorder the columns.\n",
    "ragas_df_Custom_RAG = ragas_df_Custom_RAG.iloc[:, [0, 3, 1, 2, 4, 5, 6, 7, 8, 9]]\n",
    "display(ragas_df_Custom_RAG.head())"
1361
   ]
1362
  },
1363
  {
1364
   "cell_type": "code",
1365
   "execution_count": 15,
1366
   "id": "1f1b1f4e",
1367
   "metadata": {},
1368
   "outputs": [
1369
    {
1370
     "name": "stdout",
1371
     "output_type": "stream",
1372
     "text": [
1373
      "evaluating with [answer_similarity]\n"
1374
     ]
1375
    },
1376
    {
1377
     "name": "stderr",
1378
     "output_type": "stream",
1379
     "text": [
1380
      "100%|██████████| 1/1 [00:00<00:00,  2.01it/s]\n"
1381
     ]
1382
    },
1383
    {
1384
     "name": "stdout",
1385
     "output_type": "stream",
1386
     "text": [
1387
      "evaluating with [answer_relevancy]\n"
1388
     ]
1389
    },
1390
    {
1391
     "name": "stderr",
1392
     "output_type": "stream",
1393
     "text": [
1394
      "100%|██████████| 1/1 [00:07<00:00,  7.85s/it]\n"
1395
     ]
1396
    },
1397
    {
1398
     "name": "stdout",
1399
     "output_type": "stream",
1400
     "text": [
1401
      "evaluating with [answer_correctness]\n"
1402
     ]
1403
    },
1404
    {
1405
     "name": "stderr",
1406
     "output_type": "stream",
1407
     "text": [
1408
      "100%|██████████| 1/1 [00:14<00:00, 14.49s/it]\n"
1409
     ]
1410
    },
1411
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>ground_truths</th>\n",
       "      <th>contexts_OpenAI_RAG</th>\n",
       "      <th>answer_OpenAI_RAG</th>\n",
       "      <th>answer_similarity_OpenAI_RAG</th>\n",
       "      <th>answer_relevancy_OpenAI_RAG</th>\n",
       "      <th>answer_correctness_OpenAI_RAG</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What do the parameters for HNSW mean?\\n</td>\n",
       "      <td>[- M: maximum degree of nodes in a layer of th...</td>\n",
       "      <td>[]</td>\n",
       "      <td>The HNSW parameters include the “nlist” which ...</td>\n",
       "      <td>0.747939</td>\n",
       "      <td>0.936005</td>\n",
       "      <td>0.186985</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What are HNSW good default parameters when dat...</td>\n",
       "      <td>[M=16, efConstruction=32, ef=32]</td>\n",
       "      <td>[]</td>\n",
       "      <td>The default HNSW parameters for data size of 2...</td>\n",
       "      <td>0.824929</td>\n",
       "      <td>0.981672</td>\n",
       "      <td>0.206232</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>what is the default distance metric used in AU...</td>\n",
       "      <td>[Trick answer:  IP inner product, not yet upda...</td>\n",
       "      <td>[]</td>\n",
       "      <td>The default distance metric used in AUTOINDEX ...</td>\n",
       "      <td>0.770590</td>\n",
       "      <td>0.990814</td>\n",
       "      <td>0.692648</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>How did New York City get its name?</td>\n",
       "      <td>[In the 1600’s, the Dutch planted a trading po...</td>\n",
       "      <td>[]</td>\n",
       "      <td>I'm sorry, but I couldn't find any information...</td>\n",
       "      <td>0.777967</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.194492</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            question  \\\n",
       "0            What do the parameters for HNSW mean?\\n   \n",
       "1  What are HNSW good default parameters when dat...   \n",
       "2  what is the default distance metric used in AU...   \n",
       "3                How did New York City get its name?   \n",
       "\n",
       "                                       ground_truths contexts_OpenAI_RAG  \\\n",
       "0  [- M: maximum degree of nodes in a layer of th...                  []   \n",
       "1                   [M=16, efConstruction=32, ef=32]                  []   \n",
       "2  [Trick answer:  IP inner product, not yet upda...                  []   \n",
       "3  [In the 1600’s, the Dutch planted a trading po...                  []   \n",
       "\n",
       "                                   answer_OpenAI_RAG  \\\n",
       "0  The HNSW parameters include the “nlist” which ...   \n",
       "1  The default HNSW parameters for data size of 2...   \n",
       "2  The default distance metric used in AUTOINDEX ...   \n",
       "3  I'm sorry, but I couldn't find any information...   \n",
       "\n",
       "   answer_similarity_OpenAI_RAG  answer_relevancy_OpenAI_RAG  \\\n",
       "0                      0.747939                     0.936005   \n",
       "1                      0.824929                     0.981672   \n",
       "2                      0.770590                     0.990814   \n",
       "3                      0.777967                     0.000000   \n",
       "\n",
       "   answer_correctness_OpenAI_RAG  \n",
       "0                       0.186985  \n",
       "1                       0.206232  \n",
       "2                       0.692648  \n",
       "3                       0.194492  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Run the Ragas evaluation on all questions, using the OpenAI RAG answers.\n",
    "\n",
    "ragas_result = evaluate_ragas(eval_df, \"OpenAI_RAG_answer\", False, -9999)\n",
    "ragas_df_OpenAI_RAG = ragas_result.to_pandas()\n",
    "\n",
    "# Rename the columns so they stay distinguishable after merging with the Custom RAG results.\n",
    "rename_dict = {\n",
    "    \"contexts\": \"contexts_OpenAI_RAG\",\n",
    "    \"answer\": \"answer_OpenAI_RAG\",\n",
    "    \"answer_similarity\": \"answer_similarity_OpenAI_RAG\",\n",
    "    \"answer_relevancy\": \"answer_relevancy_OpenAI_RAG\",\n",
    "    \"answer_correctness\": \"answer_correctness_OpenAI_RAG\"\n",
    "}\n",
    "ragas_df_OpenAI_RAG.rename(columns=rename_dict, inplace=True)\n",
    "\n",
    "# Reorder the columns: question, ground_truths, contexts, answer, then the three scores.\n",
    "ragas_df_OpenAI_RAG = ragas_df_OpenAI_RAG.iloc[:, [0, 3, 1, 2, 4, 5, 6]]\n",
    "display(ragas_df_OpenAI_RAG)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "c19bc0a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>ground_truths</th>\n",
       "      <th>contexts_Custom_RAG</th>\n",
       "      <th>answer_Custom_RAG</th>\n",
       "      <th>contexts_OpenAI_RAG</th>\n",
       "      <th>answer_OpenAI_RAG</th>\n",
       "      <th>answer_similarity_Custom_RAG</th>\n",
       "      <th>answer_relevancy_Custom_RAG</th>\n",
       "      <th>answer_correctness_Custom_RAG</th>\n",
       "      <th>answer_similarity_OpenAI_RAG</th>\n",
       "      <th>answer_relevancy_OpenAI_RAG</th>\n",
       "      <th>answer_correctness_OpenAI_RAG</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What do the parameters for HNSW mean?\\n</td>\n",
       "      <td>[- M: maximum degree of nodes in a layer of th...</td>\n",
       "      <td>[performance, HNSW limits the maximum degree o...</td>\n",
       "      <td>The parameters for HNSW have the following mea...</td>\n",
       "      <td>[]</td>\n",
       "      <td>The HNSW parameters include the “nlist” which ...</td>\n",
       "      <td>0.844867</td>\n",
       "      <td>0.979217</td>\n",
       "      <td>0.620304</td>\n",
       "      <td>0.747939</td>\n",
       "      <td>0.936005</td>\n",
       "      <td>0.186985</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>What are HNSW good default parameters when dat...</td>\n",
       "      <td>[M=16, efConstruction=32, ef=32]</td>\n",
       "      <td>[Metrics. Vector Index¶ FLAT IVF_FLAT IVF_SQ8 ...</td>\n",
       "      <td>For a data size of 25K vectors with a dimensio...</td>\n",
       "      <td>[]</td>\n",
       "      <td>The default HNSW parameters for data size of 2...</td>\n",
       "      <td>0.776006</td>\n",
       "      <td>0.977902</td>\n",
       "      <td>0.622550</td>\n",
       "      <td>0.824929</td>\n",
       "      <td>0.981672</td>\n",
       "      <td>0.206232</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>what is the default distance metric used in AU...</td>\n",
       "      <td>[Trick answer:  IP inner product, not yet upda...</td>\n",
       "      <td>[The attributes of collection can be extracted...</td>\n",
       "      <td>The default distance metric used in AUTOINDEX ...</td>\n",
       "      <td>[]</td>\n",
       "      <td>The default distance metric used in AUTOINDEX ...</td>\n",
       "      <td>0.738060</td>\n",
       "      <td>0.990814</td>\n",
       "      <td>0.484557</td>\n",
       "      <td>0.770590</td>\n",
       "      <td>0.990814</td>\n",
       "      <td>0.692648</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>How did New York City get its name?</td>\n",
       "      <td>[In the 1600’s, the Dutch planted a trading po...</td>\n",
       "      <td>[Etymology\\nSee also: Nicknames of New York Ci...</td>\n",
       "      <td>New York City was originally named New Amsterd...</td>\n",
       "      <td>[]</td>\n",
       "      <td>I'm sorry, but I couldn't find any information...</td>\n",
       "      <td>0.942196</td>\n",
       "      <td>0.894259</td>\n",
       "      <td>0.664120</td>\n",
       "      <td>0.777967</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.194492</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            question  \\\n",
       "0            What do the parameters for HNSW mean?\\n   \n",
       "1  What are HNSW good default parameters when dat...   \n",
       "2  what is the default distance metric used in AU...   \n",
       "3                How did New York City get its name?   \n",
       "\n",
       "                                       ground_truths  \\\n",
       "0  [- M: maximum degree of nodes in a layer of th...   \n",
       "1                   [M=16, efConstruction=32, ef=32]   \n",
       "2  [Trick answer:  IP inner product, not yet upda...   \n",
       "3  [In the 1600’s, the Dutch planted a trading po...   \n",
       "\n",
       "                                 contexts_Custom_RAG  \\\n",
       "0  [performance, HNSW limits the maximum degree o...   \n",
       "1  [Metrics. Vector Index¶ FLAT IVF_FLAT IVF_SQ8 ...   \n",
       "2  [The attributes of collection can be extracted...   \n",
       "3  [Etymology\\nSee also: Nicknames of New York Ci...   \n",
       "\n",
       "                                   answer_Custom_RAG contexts_OpenAI_RAG  \\\n",
       "0  The parameters for HNSW have the following mea...                  []   \n",
       "1  For a data size of 25K vectors with a dimensio...                  []   \n",
       "2  The default distance metric used in AUTOINDEX ...                  []   \n",
       "3  New York City was originally named New Amsterd...                  []   \n",
       "\n",
       "                                   answer_OpenAI_RAG  \\\n",
       "0  The HNSW parameters include the “nlist” which ...   \n",
       "1  The default HNSW parameters for data size of 2...   \n",
       "2  The default distance metric used in AUTOINDEX ...   \n",
       "3  I'm sorry, but I couldn't find any information...   \n",
       "\n",
       "   answer_similarity_Custom_RAG  answer_relevancy_Custom_RAG  \\\n",
       "0                      0.844867                     0.979217   \n",
       "1                      0.776006                     0.977902   \n",
       "2                      0.738060                     0.990814   \n",
       "3                      0.942196                     0.894259   \n",
       "\n",
       "   answer_correctness_Custom_RAG  answer_similarity_OpenAI_RAG  \\\n",
       "0                       0.620304                      0.747939   \n",
       "1                       0.622550                      0.824929   \n",
       "2                       0.484557                      0.770590   \n",
       "3                       0.664120                      0.777967   \n",
       "\n",
       "   answer_relevancy_OpenAI_RAG  answer_correctness_OpenAI_RAG  \n",
       "0                     0.936005                       0.186985  \n",
       "1                     0.981672                       0.206232  \n",
       "2                     0.990814                       0.692648  \n",
       "3                     0.000000                       0.194492  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "####### FINAL SCORES OPENAI RAG vs MILVUS CUSTOM RAG #########\n",
      "LLM as judge model: gpt-3.5-turbo-1106 with temperature: 0.1 scores:\n",
      "# Truth vs RAG answers: 4\n",
      "\n",
      "avg_similarity_Custom_RAG: 0.83\n",
      "avg_similarity_OpenAI_RAG: 0.78\n",
      "\n",
      "avg_relevancy_Custom_RAG: 0.96\n",
      "avg_relevancy_OpenAI_RAG: 0.73\n",
      "\n",
      "avg_correctness_Custom_RAG: 0.6\n",
      "avg_correctness_OpenAI_RAG: 0.32\n"
     ]
    }
   ],
   "source": [
    "# Merge the two Ragas DataFrames on their index so they are easier to compare side by side.\n",
    "ragas_merged_df = ragas_df_Custom_RAG.iloc[:, [0, 1, 2, 3, 7, 8, 9]].merge(\n",
    "    ragas_df_OpenAI_RAG.iloc[:, 2:], how='inner', left_index=True, right_index=True\n",
    ")\n",
    "# Reorder the columns: question and ground truths, then contexts and answers, then scores.\n",
    "ragas_merged_df = ragas_merged_df.iloc[:, [0, 1, 2, 3, 7, 8, 4, 5, 6, 9, 10, 11]]\n",
    "display(ragas_merged_df.head())\n",
    "\n",
    "# Print the mean of each metric for both RAG pipelines.\n",
    "print()\n",
    "print(\"####### FINAL SCORES OPENAI RAG vs MILVUS CUSTOM RAG #########\")\n",
    "print(f\"LLM as judge model: {LLM_NAME} with temperature: {TEMPERATURE} scores:\")\n",
    "print(f\"# Truth vs RAG answers: {len(ragas_merged_df)}\")\n",
    "print()\n",
    "print(f\"avg_similarity_Custom_RAG: {np.round(ragas_merged_df.answer_similarity_Custom_RAG.mean(), 2)}\")\n",
    "print(f\"avg_similarity_OpenAI_RAG: {np.round(ragas_merged_df.answer_similarity_OpenAI_RAG.mean(), 2)}\")\n",
    "print()\n",
    "print(f\"avg_relevancy_Custom_RAG: {np.round(ragas_merged_df.answer_relevancy_Custom_RAG.mean(), 2)}\")\n",
    "print(f\"avg_relevancy_OpenAI_RAG: {np.round(ragas_merged_df.answer_relevancy_OpenAI_RAG.mean(), 2)}\")\n",
    "print()\n",
    "print(f\"avg_correctness_Custom_RAG: {np.round(ragas_merged_df.answer_correctness_Custom_RAG.mean(), 2)}\")\n",
    "print(f\"avg_correctness_OpenAI_RAG: {np.round(ragas_merged_df.answer_correctness_OpenAI_RAG.mean(), 2)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "d0e81e68",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop collection\n",
    "utility.drop_collection(COLLECTION_NAME)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "c777937e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Author: Christy Bergman\n",
      "\n",
      "Python implementation: CPython\n",
      "Python version       : 3.11.6\n",
      "IPython version      : 8.18.1\n",
      "\n",
      "torch                : 2.1.1\n",
      "transformers         : 4.35.2\n",
      "sentence_transformers: 2.2.2\n",
      "pymilvus             : 2.3.4\n",
      "langchain            : 0.1.0\n",
      "openai               : 1.7.2\n",
      "\n",
      "conda environment: py311new\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Props to Sebastian Raschka for this handy watermark.\n",
    "# !pip install watermark\n",
    "\n",
    "%load_ext watermark\n",
    "%watermark -a 'Christy Bergman' -v -p torch,transformers,sentence_transformers,pymilvus,langchain,openai --conda"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}