1
{
2
 "cells": [
3
  {
4
   "attachments": {},
5
   "cell_type": "markdown",
6
   "metadata": {},
7
   "source": [
8
    "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/pinecone-io/examples/blob/master/learn/generation/aws/sagemaker/sagemaker-llama-2-rag.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/aws/sagemaker/sagemaker-llama-2-rag.ipynb)"
9
   ]
10
  },
11
  {
12
   "attachments": {},
13
   "cell_type": "markdown",
14
   "metadata": {},
15
   "source": [
16
    "# Retrieval-Augmented Generation: Question Answering using LLama-2, Pinecone & Custom Dataset\n"
17
   ]
18
  },
19
  {
20
   "attachments": {},
21
   "cell_type": "markdown",
22
   "metadata": {},
23
   "source": [
24
    "In this notebook we will demonstrate how to use [**Llama-2-7b**](https://ai.meta.com/llama/) to answer questions using a library of documents as a reference, by using document embeddings and retrieval. The embeddings are generated from **MiniLM** embedding model and retrieved from [**Pinecone Vector Database**](https://www.pinecone.io/). \n",
25
    "Access to a Pinecone environment is a prerequisite to run this notebook fully. \n",
26
    "\n",
27
    "**You can start by using the [Free Tier on Pinecone](https://www.pinecone.io/pricing/). This notebook serves a template such that you can easily replace the example dataset by your own to build a custom question and asnwering application.**\n",
28
    "\n",
29
    "To perform inference on the [Llama models](https://ai.meta.com/llama/), you need to pass `custom_attributes='accept_eula=true'` as part of header. This means you have read and accept the end-user-license-agreement (EULA) of the model. EULA can be found in model card description or from this [webpage](https://ai.meta.com/resources/models-and-libraries/llama-downloads/). By default, this notebook sets `custom_attributes='accept_eula=false'`, so all inference requests will fail until you explicitly change this custom attribute.\n",
30
    "\n",
31
    "Note: Custom_attributes used to pass EULA are key/value pairs. The key and value are separated by '=' and pairs are separated by ';'. If the user passes the same key more than once, the last value is kept and passed to the script handler (i.e., in this case, used for conditional logic). For example, if 'accept_eula=false; accept_eula=true' is passed to the server, then 'accept_eula=true' is kept and passed to the script handler."
32
   ]
33
  },
34
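  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the key/value convention described above (an illustration, not the actual JumpStart handler code), parsing `custom_attributes` with last-value-wins semantics could look like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_custom_attributes(header: str) -> dict:\n",
    "    # split \"k=v; k=v\" pairs on ';'; later duplicates overwrite earlier ones\n",
    "    attrs = {}\n",
    "    for pair in header.split(\";\"):\n",
    "        if \"=\" in pair:\n",
    "            key, value = pair.split(\"=\", 1)\n",
    "            attrs[key.strip()] = value.strip()\n",
    "    return attrs\n",
    "\n",
    "\n",
    "parse_custom_attributes(\"accept_eula=false; accept_eula=true\")  # -> {'accept_eula': 'true'}"
   ]
  },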
  {
35
   "attachments": {},
36
   "cell_type": "markdown",
37
   "metadata": {},
38
   "source": [
39
    "## Step 1. Deploy Llama-2 7 Billion Chat Model in SageMaker JumpStart"
40
   ]
41
  },
42
  {
43
   "cell_type": "code",
44
   "execution_count": 18,
45
   "metadata": {
46
    "collapsed": false,
47
    "jupyter": {
48
     "outputs_hidden": false
49
    },
50
    "pycharm": {
51
     "name": "#%%\n"
52
    },
53
    "tags": []
54
   },
55
   "outputs": [
56
    {
57
     "name": "stdout",
58
     "output_type": "stream",
59
     "text": [
60
      "\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
61
      "\u001b[0m\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
62
      "\u001b[0m"
63
     ]
64
    }
65
   ],
66
   "source": [
67
    "!pip install -qU \\\n",
68
    "    sagemaker \\\n",
69
    "    pinecone-client==2.2.1 \\\n",
70
    "    ipywidgets==7.0.0"
71
   ]
72
  },
73
  {
74
   "attachments": {},
75
   "cell_type": "markdown",
76
   "metadata": {},
77
   "source": [
78
    "To begin, we will initialize all of the SageMaker session variables we'll need to use throughout the walkthrough."
79
   ]
80
  },
81
  {
82
   "cell_type": "code",
83
   "execution_count": 24,
84
   "metadata": {},
85
   "outputs": [],
86
   "source": [
87
    "import sagemaker\n",
88
    "from sagemaker.jumpstart.model import JumpStartModel\n",
89
    "from sagemaker.huggingface import HuggingFaceModel\n",
90
    "\n",
91
    "role = sagemaker.get_execution_role()\n",
92
    "\n",
93
    "my_model = JumpStartModel(model_id=\"meta-textgeneration-llama-2-7b-f\")"
94
   ]
95
  },
96
  {
97
   "attachments": {},
98
   "cell_type": "markdown",
99
   "metadata": {},
100
   "source": [
101
    "We will use a `ml.g5.4xlarge` instance to deploy our Llama-2-7 billion model. We can find pricing for all instances [here](https://aws.amazon.com/sagemaker/pricing/)."
102
   ]
103
  },
104
  {
105
   "cell_type": "code",
106
   "execution_count": 21,
107
   "metadata": {
108
    "tags": []
109
   },
110
   "outputs": [
111
    {
112
     "name": "stdout",
113
     "output_type": "stream",
114
     "text": [
115
      "---------------!"
116
     ]
117
    }
118
   ],
119
   "source": [
120
    "predictor = my_model.deploy(\n",
121
    "    initial_instance_count=1,\n",
122
    "    instance_type=\"ml.g5.4xlarge\",\n",
123
    "    endpoint_name=\"llama-2-generator\"\n",
124
    ")"
125
   ]
126
  },
127
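  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your kernel restarts after deployment you don't need to redeploy. As a sketch (assuming the `llama-2-generator` endpoint name used above), you can reattach to the running endpoint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.deserializers import JSONDeserializer\n",
    "from sagemaker.predictor import Predictor\n",
    "from sagemaker.serializers import JSONSerializer\n",
    "\n",
    "# reattach to the already-running endpoint by name\n",
    "predictor = Predictor(\n",
    "    endpoint_name=\"llama-2-generator\",\n",
    "    serializer=JSONSerializer(),\n",
    "    deserializer=JSONDeserializer(),\n",
    ")"
   ]
  },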
  {
128
   "attachments": {},
129
   "cell_type": "markdown",
130
   "metadata": {},
131
   "source": [
132
    "## Step 2. Ask a question to LLM without providing the context\n",
133
    "\n",
134
    "To better illustrate why we need retrieval-augmented generation (RAG) based approach to solve the question and anwering problem. Let's directly ask the model a question and see how they respond."
135
   ]
136
  },
137
  {
138
   "cell_type": "code",
139
   "execution_count": 22,
140
   "metadata": {
141
    "tags": []
142
   },
143
   "outputs": [],
144
   "source": [
145
    "question = \"Which instances can I use with Managed Spot Training in SageMaker?\""
146
   ]
147
  },
148
  {
149
   "cell_type": "code",
150
   "execution_count": 104,
151
   "metadata": {
152
    "tags": []
153
   },
154
   "outputs": [
155
    {
156
     "data": {
157
      "text/plain": [
158
       "' Based on the context provided, Managed Spot Training in SageMaker allows you to use the following instances:\\n\\n* m5.xlarge\\n* m5.2xlarge\\n* m5.4xlarge\\n* m5.8xlarge\\n* m5.16x'"
159
      ]
160
     },
161
     "execution_count": 104,
162
     "metadata": {},
163
     "output_type": "execute_result"
164
    }
165
   ],
166
   "source": [
167
    "# https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/\n",
168
    "\n",
169
    "prompt = \"\"\"Answer the following QUESTION based on the CONTEXT\n",
170
    "given. If you do not know the answer and the CONTEXT doesn't\n",
171
    "contain the answer truthfully say \"I don't know\n",
172
    "\n",
173
    "ANSWER:\n",
174
    "\n",
175
    "\"\"\"\n",
176
    "\n",
177
    "\n",
178
    "payload = {\n",
179
    "    \"inputs\":  \n",
180
    "      [\n",
181
    "        [\n",
182
    "         {\"role\": \"system\", \"content\": prompt},\n",
183
    "         {\"role\": \"user\", \"content\": question},\n",
184
    "        ]   \n",
185
    "      ],\n",
186
    "   \"parameters\":{\"max_new_tokens\": 64, \"top_p\": 0.9, \"temperature\": 0.6, \"return_full_text\": False}\n",
187
    "}\n",
188
    "\n",
189
    "out = predictor.predict(payload, custom_attributes='accept_eula=true')\n",
190
    "out[0]['generation']['content']"
191
   ]
192
  },
193
  {
194
   "attachments": {},
195
   "cell_type": "markdown",
196
   "metadata": {},
197
   "source": [
198
    "You can see the generated answer is wrong or doesn't make much sense. "
199
   ]
200
  },
201
  {
202
   "attachments": {},
203
   "cell_type": "markdown",
204
   "metadata": {},
205
   "source": [
206
    "## Step 3. Improve the answer to the same question using **prompt engineering** with insightful context\n",
207
    "\n",
208
    "\n",
209
    "To better answer the question well, we provide extra contextual information, combine it with a prompt, and send it to model together with the question. Below is an example."
210
   ]
211
  },
212
  {
213
   "cell_type": "code",
214
   "execution_count": 76,
215
   "metadata": {
216
    "tags": []
217
   },
218
   "outputs": [],
219
   "source": [
220
    "context = \"\"\"Managed Spot Training can be used with all instances\n",
221
    "supported in Amazon SageMaker. Managed Spot Training is supported\n",
222
    "in all AWS Regions where Amazon SageMaker is currently available.\"\"\""
223
   ]
224
  },
225
  {
226
   "cell_type": "code",
227
   "execution_count": 105,
228
   "metadata": {
229
    "tags": []
230
   },
231
   "outputs": [
232
    {
233
     "name": "stdout",
234
     "output_type": "stream",
235
     "text": [
236
      "[Input]: Which instances can I use with Managed Spot Training in SageMaker?\n",
237
      "[Output]:  Based on the given context, you can use Managed Spot Training with all instances supported in Amazon SageMaker. Therefore, the answer is:\n",
238
      "\n",
239
      "All instances supported in Amazon SageMaker.\n"
240
     ]
241
    }
242
   ],
243
   "source": [
244
    "prompt_template = \"\"\"Answer the following QUESTION based on the CONTEXT\n",
245
    "given. If you do not know the answer and the CONTEXT doesn't\n",
246
    "contain the answer truthfully say \"I don't know\".\n",
247
    "\n",
248
    "CONTEXT:\n",
249
    "{context}\n",
250
    "\n",
251
    "\n",
252
    "ANSWER:\n",
253
    "\"\"\"\n",
254
    "\n",
255
    "text_input = prompt_template.replace(\"{context}\", context).replace(\"{question}\", question)\n",
256
    "\n",
257
    "payload = {\n",
258
    "    \"inputs\":  \n",
259
    "      [\n",
260
    "        [\n",
261
    "         {\"role\": \"system\", \"content\": text_input},\n",
262
    "         {\"role\": \"user\", \"content\": question},\n",
263
    "        ]   \n",
264
    "      ],\n",
265
    "   \"parameters\":{\"max_new_tokens\": 64, \"top_p\": 0.9, \"temperature\": 0.6, \"return_full_text\": False}\n",
266
    "}\n",
267
    "\n",
268
    "out = predictor.predict(payload, custom_attributes='accept_eula=true')\n",
269
    "generated_text = out[0]['generation']['content']\n",
270
    "print(f\"[Input]: {question}\\n[Output]: {generated_text}\")"
271
   ]
272
  },
273
  {
274
   "attachments": {},
275
   "cell_type": "markdown",
276
   "metadata": {},
277
   "source": [
278
    "Let's see if our LLM is capable of following our instructions..."
279
   ]
280
  },
281
  {
282
   "cell_type": "code",
283
   "execution_count": 82,
284
   "metadata": {
285
    "tags": []
286
   },
287
   "outputs": [
288
    {
289
     "name": "stdout",
290
     "output_type": "stream",
291
     "text": [
292
      "[Input]: What color is my desk?\n",
293
      "[Output]:  I don't know the answer to your question about the color of your desk as it is not related to the context provided, which is about Amazon SageMaker and its supported instances and regions.\n"
294
     ]
295
    }
296
   ],
297
   "source": [
298
    "unanswerable_question = \"What color is my desk?\"\n",
299
    "\n",
300
    "text_input = prompt_template.replace(\"{context}\", context).replace(\"{question}\", question)\n",
301
    "\n",
302
    "payload = {\n",
303
    "    \"inputs\":  \n",
304
    "      [\n",
305
    "        [\n",
306
    "         {\"role\": \"system\", \"content\": text_input},\n",
307
    "         {\"role\": \"user\", \"content\": unanswerable_question},\n",
308
    "        ]   \n",
309
    "      ],\n",
310
    "   \"parameters\":{\"max_new_tokens\":256, \"top_p\":0.9, \"temperature\":0.6}\n",
311
    "}\n",
312
    "\n",
313
    "\n",
314
    "out = predictor.predict(payload, custom_attributes='accept_eula=true')\n",
315
    "generated_text = out[0]['generation']['content']\n",
316
    "print(f\"[Input]: {unanswerable_question}\\n[Output]: {generated_text}\")"
317
   ]
318
  },
319
  {
320
   "attachments": {},
321
   "cell_type": "markdown",
322
   "metadata": {},
323
   "source": [
324
    "Looks great! The LLM is following instructions and we've also demonstrated how contexts can help our LLM answer questions accurately. However, we're unlikely to be inserting a context directly into a prompt like this unless we already know the answer — and if we already know the answer why would we be asking the question at all?\n",
325
    "\n",
326
    "We need a way of extracting _relevant contexts_ from huge bases of information. For that we need **R**etrieval **A**ugmented **G**eneration (RAG)."
327
   ]
328
  },
329
  {
330
   "attachments": {},
331
   "cell_type": "markdown",
332
   "metadata": {},
333
   "source": [
334
    "## Step 4. Use RAG based approach to identify the correct documents, and use them along with prompt and question to query LLM\n",
335
    "\n",
336
    "\n",
337
    "We plan to use document embeddings to fetch the most relevant documents in our document knowledge library and combine them with the prompt that we provide to LLM.\n",
338
    "\n",
339
    "To achieve that, we will do following.\n",
340
    "\n",
341
    "* Generate embedings for each of document in the knowledge library with the MiniLM embedding model.\n",
342
    "* Identify top K most relevant documents based on user query.\n",
343
    "    * For a query of your interest, generate the embedding of the query using the same embedding model.\n",
344
    "    * Search the indexes of top K most relevant documents in the embedding space using the SageMaker KNN algorithm.\n",
345
    "    * Use the indexes to retrieve the corresponded documents.\n",
346
    "* Combine the retrieved documents with prompt and question and send them into LLM.\n",
347
    "\n",
348
    "\n",
349
    "\n",
350
    "Note: The retrieved document/text should be large enough to contain enough information to answer a question; but small enough to fit into the LLM prompt -- maximum sequence length of 1024 tokens. "
351
   ]
352
  },
353
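  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of that length check (a real tokenizer, e.g. the Llama tokenizer, would be more accurate), we can estimate token counts from word counts, assuming roughly 0.75 words per token for English text:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rough_token_estimate(text: str) -> int:\n",
    "    # crude heuristic: ~0.75 words per token for English text\n",
    "    return int(len(text.split()) / 0.75)\n",
    "\n",
    "\n",
    "rough_token_estimate(\"Managed Spot Training can be used with all instances supported in Amazon SageMaker.\")"
   ]
  },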
  {
354
   "attachments": {},
355
   "cell_type": "markdown",
356
   "metadata": {},
357
   "source": [
358
    "### 4.1 Deploying the model endpoint for Sentence Transformer embedding model"
359
   ]
360
  },
361
  {
362
   "cell_type": "code",
363
   "execution_count": 25,
364
   "metadata": {
365
    "tags": []
366
   },
367
   "outputs": [],
368
   "source": [
369
    "hub_config = {\n",
370
    "    \"HF_MODEL_ID\": \"sentence-transformers/all-MiniLM-L6-v2\",  # model_id from hf.co/models\n",
371
    "    \"HF_TASK\": \"feature-extraction\",\n",
372
    "}\n",
373
    "\n",
374
    "huggingface_model = HuggingFaceModel(\n",
375
    "    env=hub_config,\n",
376
    "    role=role,\n",
377
    "    transformers_version=\"4.6\",  # transformers version used\n",
378
    "    pytorch_version=\"1.7\",  # pytorch version used\n",
379
    "    py_version=\"py36\",  # python version of the DLC\n",
380
    ")"
381
   ]
382
  },
383
  {
384
   "attachments": {},
385
   "cell_type": "markdown",
386
   "metadata": {},
387
   "source": [
388
    "Then we deploy the model as we did earlier for our generative LLM:"
389
   ]
390
  },
391
  {
392
   "cell_type": "code",
393
   "execution_count": 26,
394
   "metadata": {
395
    "tags": []
396
   },
397
   "outputs": [
398
    {
399
     "name": "stdout",
400
     "output_type": "stream",
401
     "text": [
402
      "----!"
403
     ]
404
    }
405
   ],
406
   "source": [
407
    "encoder = huggingface_model.deploy(\n",
408
    "    initial_instance_count=1, instance_type=\"ml.t2.large\", endpoint_name=\"minilm-embedding\"\n",
409
    ")"
410
   ]
411
  },
412
  {
413
   "attachments": {},
414
   "cell_type": "markdown",
415
   "metadata": {},
416
   "source": [
417
    "We can then create the embeddings like so:"
418
   ]
419
  },
420
  {
421
   "cell_type": "code",
422
   "execution_count": 27,
423
   "metadata": {
424
    "tags": []
425
   },
426
   "outputs": [],
427
   "source": [
428
    "out = encoder.predict({\"inputs\": [\"some text here\", \"some more text goes here too\"]})"
429
   ]
430
  },
431
  {
432
   "attachments": {},
433
   "cell_type": "markdown",
434
   "metadata": {},
435
   "source": [
436
    "We will see that we have two outputs (one for each of our input sentences):"
437
   ]
438
  },
439
  {
440
   "cell_type": "code",
441
   "execution_count": 12,
442
   "metadata": {
443
    "tags": []
444
   },
445
   "outputs": [
446
    {
447
     "data": {
448
      "text/plain": [
449
       "2"
450
      ]
451
     },
452
     "execution_count": 12,
453
     "metadata": {},
454
     "output_type": "execute_result"
455
    }
456
   ],
457
   "source": [
458
    "len(out)"
459
   ]
460
  },
461
  {
462
   "attachments": {},
463
   "cell_type": "markdown",
464
   "metadata": {},
465
   "source": [
466
    "But if we look at each of these outputs we see something strange..."
467
   ]
468
  },
469
  {
470
   "cell_type": "code",
471
   "execution_count": 13,
472
   "metadata": {
473
    "tags": []
474
   },
475
   "outputs": [
476
    {
477
     "data": {
478
      "text/plain": [
479
       "(8, 8)"
480
      ]
481
     },
482
     "execution_count": 13,
483
     "metadata": {},
484
     "output_type": "execute_result"
485
    }
486
   ],
487
   "source": [
488
    "len(out[0]), len(out[1])"
489
   ]
490
  },
491
  {
492
   "attachments": {},
493
   "cell_type": "markdown",
494
   "metadata": {},
495
   "source": [
496
    "We would expect the embeddings to be of dimensionality *384*, but we're seeing two lists containing _eight_ items each? What is happening here?\n",
497
    "\n",
498
    "When we output feature embeddings from the MiniLM model we're actually outputting a single 384-dimensional vector for every _token_ contained in the inputs we provided. Our second text `\"some more text goes here too\"` contains _eight_ tokens, and so this is where the value `8` is coming from.\n",
499
    "\n",
500
    "So, if we were to take a look at one of these vectors we should find the dimensionality of `384`:"
501
   ]
502
  },
503
  {
504
   "cell_type": "code",
505
   "execution_count": 14,
506
   "metadata": {
507
    "tags": []
508
   },
509
   "outputs": [
510
    {
511
     "data": {
512
      "text/plain": [
513
       "384"
514
      ]
515
     },
516
     "execution_count": 14,
517
     "metadata": {},
518
     "output_type": "execute_result"
519
    }
520
   ],
521
   "source": [
522
    "len(out[0][0])"
523
   ]
524
  },
525
  {
526
   "attachments": {},
527
   "cell_type": "markdown",
528
   "metadata": {},
529
   "source": [
530
    "Perfect! There's just one problem, how do we transform these eight vector embeddings into a single _sentence embedding_? For this, we simply take the mean value across each vector dimension, like so:"
531
   ]
532
  },
533
  {
534
   "cell_type": "code",
535
   "execution_count": 28,
536
   "metadata": {
537
    "tags": []
538
   },
539
   "outputs": [
540
    {
541
     "data": {
542
      "text/plain": [
543
       "(2, 384)"
544
      ]
545
     },
546
     "execution_count": 28,
547
     "metadata": {},
548
     "output_type": "execute_result"
549
    }
550
   ],
551
   "source": [
552
    "import numpy as np\n",
553
    "\n",
554
    "embeddings = np.mean(np.array(out), axis=1)\n",
555
    "embeddings.shape"
556
   ]
557
  },
558
  {
559
   "attachments": {},
560
   "cell_type": "markdown",
561
   "metadata": {},
562
   "source": [
563
    "Now we have two 384-dimensional vector embeddings, one for each of our input texts. To make our lives easier later, we will wrap this encoding process into a single function:"
564
   ]
565
  },
566
  {
567
   "cell_type": "code",
568
   "execution_count": 29,
569
   "metadata": {
570
    "tags": []
571
   },
572
   "outputs": [],
573
   "source": [
574
    "from typing import List\n",
575
    "\n",
576
    "\n",
577
    "def embed_docs(docs: List[str]) -> List[List[float]]:\n",
578
    "    out = encoder.predict({\"inputs\": docs})\n",
579
    "    embeddings = np.mean(np.array(out), axis=1)\n",
580
    "    return embeddings.tolist()"
581
   ]
582
  },
583
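  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (assuming the encoder endpoint above is live), each returned embedding should be 384-dimensional:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sample = embed_docs([\"a quick sanity check\"])\n",
    "len(sample), len(sample[0])  # expect (1, 384)"
   ]
  },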
  {
584
   "attachments": {},
585
   "cell_type": "markdown",
586
   "metadata": {},
587
   "source": [
588
    "### 4.2. Generate embeddings for each of document in the knowledge library with the Sentence Transformer model.\n",
589
    "\n",
590
    "For the purpose of the demo we will use [Amazon SageMaker FAQs](https://aws.amazon.com/sagemaker/faqs/) as knowledge library. The data are formatted in a CSV file with two columns Question and Answer. We use **only** the Answer column as the documents of knowledge library, from which relevant documents are retrieved based on a query. \n",
591
    "\n",
592
    "**Each row in the CSV format dataset corresponds to a textual document. \n",
593
    "We will iterate each document to get its embedding vector via the MiniLM embedding model. \n",
594
    "For your purpose, you can replace the example dataset of your own to build a custom question and answering application.**\n"
595
   ]
596
  },
597
  {
598
   "attachments": {},
599
   "cell_type": "markdown",
600
   "metadata": {},
601
   "source": [
602
    "First, we download the dataset from our S3 bucket to the local."
603
   ]
604
  },
605
  {
606
   "cell_type": "code",
607
   "execution_count": 10,
608
   "metadata": {
609
    "tags": []
610
   },
611
   "outputs": [],
612
   "source": [
613
    "s3_path = f\"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv\""
614
   ]
615
  },
616
  {
617
   "cell_type": "code",
618
   "execution_count": 11,
619
   "metadata": {
620
    "tags": []
621
   },
622
   "outputs": [
623
    {
624
     "name": "stdout",
625
     "output_type": "stream",
626
     "text": [
627
      "download: s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to ./Amazon_SageMaker_FAQs.csv\n"
628
     ]
629
    }
630
   ],
631
   "source": [
632
    "# Downloading the Database\n",
633
    "!aws s3 cp $s3_path Amazon_SageMaker_FAQs.csv"
634
   ]
635
  },
636
  {
637
   "attachments": {},
638
   "cell_type": "markdown",
639
   "metadata": {},
640
   "source": [
641
    "Open the dataset with Pandas:"
642
   ]
643
  },
644
  {
645
   "cell_type": "code",
646
   "execution_count": 12,
647
   "metadata": {
648
    "tags": []
649
   },
650
   "outputs": [
651
    {
652
     "data": {
653
      "text/html": [
654
       "<div>\n",
655
       "<style scoped>\n",
656
       "    .dataframe tbody tr th:only-of-type {\n",
657
       "        vertical-align: middle;\n",
658
       "    }\n",
659
       "\n",
660
       "    .dataframe tbody tr th {\n",
661
       "        vertical-align: top;\n",
662
       "    }\n",
663
       "\n",
664
       "    .dataframe thead th {\n",
665
       "        text-align: right;\n",
666
       "    }\n",
667
       "</style>\n",
668
       "<table border=\"1\" class=\"dataframe\">\n",
669
       "  <thead>\n",
670
       "    <tr style=\"text-align: right;\">\n",
671
       "      <th></th>\n",
672
       "      <th>Question</th>\n",
673
       "      <th>Answer</th>\n",
674
       "    </tr>\n",
675
       "  </thead>\n",
676
       "  <tbody>\n",
677
       "    <tr>\n",
678
       "      <th>0</th>\n",
679
       "      <td>What is Amazon SageMaker?</td>\n",
680
       "      <td>Amazon SageMaker is a fully managed service to...</td>\n",
681
       "    </tr>\n",
682
       "    <tr>\n",
683
       "      <th>1</th>\n",
684
       "      <td>In which Regions is Amazon SageMaker available...</td>\n",
685
       "      <td>For a list of the supported Amazon SageMaker A...</td>\n",
686
       "    </tr>\n",
687
       "    <tr>\n",
688
       "      <th>2</th>\n",
689
       "      <td>What is the service availability of Amazon Sag...</td>\n",
690
       "      <td>Amazon SageMaker is designed for high availabi...</td>\n",
691
       "    </tr>\n",
692
       "    <tr>\n",
693
       "      <th>3</th>\n",
694
       "      <td>How does Amazon SageMaker secure my code?</td>\n",
695
       "      <td>Amazon SageMaker stores code in ML storage vol...</td>\n",
696
       "    </tr>\n",
697
       "    <tr>\n",
698
       "      <th>4</th>\n",
699
       "      <td>What security measures does Amazon SageMaker h...</td>\n",
700
       "      <td>Amazon SageMaker ensures that ML model artifac...</td>\n",
701
       "    </tr>\n",
702
       "  </tbody>\n",
703
       "</table>\n",
704
       "</div>"
705
      ],
706
      "text/plain": [
707
       "                                            Question  \\\n",
708
       "0                          What is Amazon SageMaker?   \n",
709
       "1  In which Regions is Amazon SageMaker available...   \n",
710
       "2  What is the service availability of Amazon Sag...   \n",
711
       "3          How does Amazon SageMaker secure my code?   \n",
712
       "4  What security measures does Amazon SageMaker h...   \n",
713
       "\n",
714
       "                                              Answer  \n",
715
       "0  Amazon SageMaker is a fully managed service to...  \n",
716
       "1  For a list of the supported Amazon SageMaker A...  \n",
717
       "2  Amazon SageMaker is designed for high availabi...  \n",
718
       "3  Amazon SageMaker stores code in ML storage vol...  \n",
719
       "4  Amazon SageMaker ensures that ML model artifac...  "
720
      ]
721
     },
722
     "execution_count": 12,
723
     "metadata": {},
724
     "output_type": "execute_result"
725
    }
726
   ],
727
   "source": [
728
    "import pandas as pd\n",
729
    "\n",
730
    "df_knowledge = pd.read_csv(\"Amazon_SageMaker_FAQs.csv\", header=None, names=[\"Question\", \"Answer\"])\n",
731
    "df_knowledge.head()"
732
   ]
733
  },
734
  {
735
   "attachments": {},
736
   "cell_type": "markdown",
737
   "metadata": {},
738
   "source": [
739
    "Drop the `Question` column since it is not used in this notebook."
740
   ]
741
  },
742
  {
743
   "cell_type": "code",
744
   "execution_count": 13,
745
   "metadata": {
746
    "tags": []
747
   },
748
   "outputs": [
749
    {
750
     "data": {
751
      "text/html": [
752
       "<div>\n",
753
       "<style scoped>\n",
754
       "    .dataframe tbody tr th:only-of-type {\n",
755
       "        vertical-align: middle;\n",
756
       "    }\n",
757
       "\n",
758
       "    .dataframe tbody tr th {\n",
759
       "        vertical-align: top;\n",
760
       "    }\n",
761
       "\n",
762
       "    .dataframe thead th {\n",
763
       "        text-align: right;\n",
764
       "    }\n",
765
       "</style>\n",
766
       "<table border=\"1\" class=\"dataframe\">\n",
767
       "  <thead>\n",
768
       "    <tr style=\"text-align: right;\">\n",
769
       "      <th></th>\n",
770
       "      <th>Answer</th>\n",
771
       "    </tr>\n",
772
       "  </thead>\n",
773
       "  <tbody>\n",
774
       "    <tr>\n",
775
       "      <th>0</th>\n",
776
       "      <td>Amazon SageMaker is a fully managed service to...</td>\n",
777
       "    </tr>\n",
778
       "    <tr>\n",
779
       "      <th>1</th>\n",
780
       "      <td>For a list of the supported Amazon SageMaker A...</td>\n",
781
       "    </tr>\n",
782
       "    <tr>\n",
783
       "      <th>2</th>\n",
784
       "      <td>Amazon SageMaker is designed for high availabi...</td>\n",
785
       "    </tr>\n",
786
       "    <tr>\n",
787
       "      <th>3</th>\n",
788
       "      <td>Amazon SageMaker stores code in ML storage vol...</td>\n",
789
       "    </tr>\n",
790
       "    <tr>\n",
791
       "      <th>4</th>\n",
792
       "      <td>Amazon SageMaker ensures that ML model artifac...</td>\n",
793
       "    </tr>\n",
794
       "  </tbody>\n",
795
       "</table>\n",
796
       "</div>"
797
      ],
798
      "text/plain": [
799
       "                                              Answer\n",
800
       "0  Amazon SageMaker is a fully managed service to...\n",
801
       "1  For a list of the supported Amazon SageMaker A...\n",
802
       "2  Amazon SageMaker is designed for high availabi...\n",
803
       "3  Amazon SageMaker stores code in ML storage vol...\n",
804
       "4  Amazon SageMaker ensures that ML model artifac..."
805
      ]
806
     },
807
     "execution_count": 13,
808
     "metadata": {},
809
     "output_type": "execute_result"
810
    }
811
   ],
812
   "source": [
813
    "df_knowledge.drop([\"Question\"], axis=1, inplace=True)\n",
814
    "df_knowledge.head()"
815
   ]
816
  },
817
  {
818
   "attachments": {},
819
   "cell_type": "markdown",
820
   "metadata": {
821
    "tags": []
822
   },
823
   "source": [
824
    "Next we can initialize our connection to **Pinecone**. To do this we need a [free API key](https://app.pinecone.io)."
825
   ]
826
  },
827
  {
828
   "cell_type": "code",
829
   "execution_count": 21,
830
   "metadata": {
831
    "tags": []
832
   },
833
   "outputs": [
834
    {
835
     "name": "stderr",
836
     "output_type": "stream",
837
     "text": [
838
      "/opt/conda/lib/python3.7/site-packages/pinecone/index.py:4: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
839
      "  from tqdm.autonotebook import tqdm\n"
840
     ]
841
    }
842
   ],
843
   "source": [
844
    "from pinecone import Pinecone\n",
845
    "import os\n",
846
    "\n",
847
    "# add Pinecone API key from app.pinecone.io\n",
848
    "api_key = os.environ.get(\"PINECONE_API_KEY\") or \"YOUR_API_KEY\"\n",
849
    "# set Pinecone environment - find next to API key in console\n",
850
    "env = os.environ.get(\"PINECONE_ENVIRONMENT\") or \"YOUR_ENV\"\n",
851
    "\n",
852
    "pc = Pinecone(api_key=api_key)"
853
   ]
854
  },
855
  {
856
   "attachments": {},
857
   "cell_type": "markdown",
858
   "metadata": {},
859
   "source": [
860
    "List all present indexes associated with your key, should be empty on the first run"
861
   ]
862
  },
863
  {
864
   "cell_type": "code",
865
   "execution_count": 15,
866
   "metadata": {
867
    "tags": []
868
   },
869
   "outputs": [
870
    {
871
     "data": {
872
      "text/plain": [
873
       "['jumpstart-minilm-l6',\n",
874
       " 'retrieval-augmentation-aws-6j',\n",
875
       " 'retrieval-augmentation-aws']"
876
      ]
877
     },
878
     "execution_count": 15,
879
     "metadata": {},
880
     "output_type": "execute_result"
881
    }
882
   ],
883
   "source": [
884
    "pinecone.list_indexes().names()"
885
   ]
886
  },
887
  {
888
   "attachments": {},
889
   "cell_type": "markdown",
890
   "metadata": {},
891
   "source": [
892
    "Now we create a new index called `retrieval-augmentation-aws`. It's important that we align the index `dimension` and `metric` parameters with those required by the MiniLM model."
893
   ]
894
  },
895
  {
896
   "cell_type": "code",
897
   "execution_count": 30,
898
   "metadata": {
899
    "tags": []
900
   },
901
   "outputs": [],
902
   "source": [
903
    "import time\n",
904
    "\n",
905
    "index_name = \"llama-2-7b-example\"\n",
906
    "\n",
907
    "if index_name in pinecone.list_indexes().names():\n",
908
    "    pinecone.delete_index(index_name)\n",
909
    "\n",
910
    "pinecone.create_index(name=index_name, dimension=embeddings.shape[1], metric=\"cosine\")\n",
911
    "# wait for index to finish initialization\n",
912
    "while not pinecone.describe_index(index_name).status[\"ready\"]:\n",
913
    "    time.sleep(1)"
914
   ]
915
  },
916
  {
917
   "cell_type": "code",
918
   "execution_count": 31,
919
   "metadata": {
920
    "tags": []
921
   },
922
   "outputs": [
923
    {
924
     "data": {
925
      "text/plain": [
926
       "['jumpstart-minilm-l6',\n",
927
       " 'llama-2-7b-example',\n",
928
       " 'retrieval-augmentation-aws-6j',\n",
929
       " 'retrieval-augmentation-aws']"
930
      ]
931
     },
932
     "execution_count": 31,
933
     "metadata": {},
934
     "output_type": "execute_result"
935
    }
936
   ],
937
   "source": [
938
    "pinecone.list_indexes().names()"
939
   ]
940
  },
941
  {
942
   "attachments": {},
943
   "cell_type": "markdown",
944
   "metadata": {},
945
   "source": [
946
    "Now we upsert the data, we will do this in batches of `128`."
947
   ]
948
  },
949
  {
950
   "cell_type": "code",
951
   "execution_count": 32,
952
   "metadata": {
953
    "scrolled": true,
954
    "tags": []
955
   },
956
   "outputs": [
957
    {
958
     "data": {
959
      "application/vnd.jupyter.widget-view+json": {
960
       "model_id": "aa50e0ad54eb4b35b685f27b27874c5d",
961
       "version_major": 2,
962
       "version_minor": 0
963
      },
964
      "text/plain": [
965
       "A Jupyter Widget"
966
      ]
967
     },
968
     "metadata": {},
969
     "output_type": "display_data"
970
    }
971
   ],
972
   "source": [
973
    "from tqdm.auto import tqdm\n",
974
    "\n",
975
    "batch_size = 2  # can increase but needs larger instance size otherwise instance runs out of memory\n",
976
    "vector_limit = 1000\n",
977
    "\n",
978
    "answers = df_knowledge[:vector_limit]\n",
979
    "index = pinecone.Index(index_name)\n",
980
    "\n",
981
    "for i in tqdm(range(0, len(answers), batch_size)):\n",
982
    "    # find end of batch\n",
983
    "    i_end = min(i + batch_size, len(answers))\n",
984
    "    # create IDs batch\n",
985
    "    ids = [str(x) for x in range(i, i_end)]\n",
986
    "    # create metadata batch\n",
987
    "    metadatas = [{\"text\": text} for text in answers[\"Answer\"][i:i_end]]\n",
988
    "    # create embeddings\n",
989
    "    texts = answers[\"Answer\"][i:i_end].tolist()\n",
990
    "    embeddings = embed_docs(texts)\n",
991
    "    # create records list for upsert\n",
992
    "    records = zip(ids, embeddings, metadatas)\n",
993
    "    # upsert to Pinecone\n",
994
    "    index.upsert(vectors=records)"
995
   ]
996
  },
997
  {
998
   "cell_type": "code",
999
   "execution_count": 33,
1000
   "metadata": {
1001
    "tags": []
1002
   },
1003
   "outputs": [
1004
    {
1005
     "data": {
1006
      "text/plain": [
1007
       "{'dimension': 384,\n",
1008
       " 'index_fullness': 0.0,\n",
1009
       " 'namespaces': {'': {'vector_count': 154}},\n",
1010
       " 'total_vector_count': 154}"
1011
      ]
1012
     },
1013
     "execution_count": 33,
1014
     "metadata": {},
1015
     "output_type": "execute_result"
1016
    }
1017
   ],
1018
   "source": [
1019
    "# check number of records in the index\n",
1020
    "index.describe_index_stats()"
1021
   ]
1022
  },
1023
  {
1024
   "attachments": {},
1025
   "cell_type": "markdown",
1026
   "metadata": {
1027
    "tags": []
1028
   },
1029
   "source": [
1030
    "### 4.3 Combine the retrieved documents, prompt, and question to query the LLM"
1031
   ]
1032
  },
1033
  {
1034
   "attachments": {},
1035
   "cell_type": "markdown",
1036
   "metadata": {},
1037
   "source": [
1038
    "Now we're ready begin querying our LLM with a **R**etrieval **A**ugmented **G**eneration (RAG) pipeline. Let's see how this will work step-by-step first."
1039
   ]
1040
  },
1041
  {
1042
   "attachments": {},
1043
   "cell_type": "markdown",
1044
   "metadata": {},
1045
   "source": [
1046
    "First we create our _query embedding_ and use it to query Pinecone:"
1047
   ]
1048
  },
1049
  {
1050
   "cell_type": "code",
1051
   "execution_count": 65,
1052
   "metadata": {
1053
    "scrolled": true,
1054
    "tags": []
1055
   },
1056
   "outputs": [
1057
    {
1058
     "data": {
1059
      "text/plain": [
1060
       "{'matches': [{'id': '90',\n",
1061
       "              'metadata': {'text': 'Managed Spot Training can be used with all '\n",
1062
       "                                   'instances supported in Amazon '\n",
1063
       "                                   'SageMaker.\\r\\n'},\n",
1064
       "              'score': 0.881181657,\n",
1065
       "              'values': []}],\n",
1066
       " 'namespace': ''}"
1067
      ]
1068
     },
1069
     "execution_count": 65,
1070
     "metadata": {},
1071
     "output_type": "execute_result"
1072
    }
1073
   ],
1074
   "source": [
1075
    "# extract embeddings for the questions\n",
1076
    "query_vec = embed_docs(question)[0]\n",
1077
    "\n",
1078
    "# query pinecone\n",
1079
    "res = index.query(vector=query_vec, top_k=1, include_metadata=True)\n",
1080
    "\n",
1081
    "# show the results\n",
1082
    "res"
1083
   ]
1084
  },
1085
  {
1086
   "attachments": {},
1087
   "cell_type": "markdown",
1088
   "metadata": {},
1089
   "source": [
1090
    "We get multiple relevant contexts here. We can use these to contruct a single `context` to feed into our LLM prompt."
1091
   ]
1092
  },
1093
  {
1094
   "cell_type": "code",
1095
   "execution_count": 66,
1096
   "metadata": {
1097
    "tags": []
1098
   },
1099
   "outputs": [],
1100
   "source": [
1101
    "contexts = [match.metadata[\"text\"] for match in res.matches]"
1102
   ]
1103
  },
1104
  {
1105
   "cell_type": "code",
1106
   "execution_count": 67,
1107
   "metadata": {
1108
    "tags": []
1109
   },
1110
   "outputs": [],
1111
   "source": [
1112
    "max_section_len = 1000\n",
1113
    "separator = \"\\n\"\n",
1114
    "\n",
1115
    "\n",
1116
    "def construct_context(contexts: List[str]) -> str:\n",
1117
    "    chosen_sections = []\n",
1118
    "    chosen_sections_len = 0\n",
1119
    "\n",
1120
    "    for text in contexts:\n",
1121
    "        text = text.strip()\n",
1122
    "        # Add contexts until we run out of space.\n",
1123
    "        chosen_sections_len += len(text) + 2\n",
1124
    "        if chosen_sections_len > max_section_len:\n",
1125
    "            break\n",
1126
    "        chosen_sections.append(text)\n",
1127
    "    concatenated_doc = separator.join(chosen_sections)\n",
1128
    "    print(\n",
1129
    "        f\"With maximum sequence length {max_section_len}, selected top {len(chosen_sections)} document sections: \\n{concatenated_doc}\"\n",
1130
    "    )\n",
1131
    "    return concatenated_doc"
1132
   ]
1133
  },
1134
  {
1135
   "cell_type": "code",
1136
   "execution_count": 68,
1137
   "metadata": {
1138
    "tags": []
1139
   },
1140
   "outputs": [
1141
    {
1142
     "name": "stdout",
1143
     "output_type": "stream",
1144
     "text": [
1145
      "With maximum sequence length 1000, selected top 1 document sections: \n",
1146
      "Managed Spot Training can be used with all instances supported in Amazon SageMaker.\n"
1147
     ]
1148
    }
1149
   ],
1150
   "source": [
1151
    "context_str = construct_context(contexts=contexts)"
1152
   ]
1153
  },
1154
  {
1155
   "attachments": {},
1156
   "cell_type": "markdown",
1157
   "metadata": {},
1158
   "source": [
1159
    "We would then feed this `context_str` into our LLama-2 prompt:"
1160
   ]
1161
  },
1162
  {
1163
   "cell_type": "code",
1164
   "execution_count": 78,
1165
   "metadata": {
1166
    "tags": []
1167
   },
1168
   "outputs": [],
1169
   "source": [
1170
    "def create_payload(question, context_str) -> dict:\n",
1171
    "    prompt_template = \"\"\"Answer the following QUESTION based on the CONTEXT\n",
1172
    "    given. If you do not know the answer and the CONTEXT doesn't\n",
1173
    "    contain the answer truthfully say \"I don't know\".\n",
1174
    "\n",
1175
    "    CONTEXT:\n",
1176
    "    {context}\n",
1177
    "\n",
1178
    "\n",
1179
    "    ANSWER:\n",
1180
    "    \"\"\"\n",
1181
    "\n",
1182
    "    text_input = prompt_template.replace(\"{context}\", context_str).replace(\"{question}\", question)\n",
1183
    "\n",
1184
    "    payload = {\n",
1185
    "        \"inputs\":  \n",
1186
    "          [\n",
1187
    "            [\n",
1188
    "             {\"role\": \"system\", \"content\": text_input},\n",
1189
    "             {\"role\": \"user\", \"content\": question},\n",
1190
    "            ]   \n",
1191
    "          ],\n",
1192
    "       \"parameters\":{\"max_new_tokens\": 256, \"top_p\": 0.9, \"temperature\": 0.6, \"return_full_text\": False}\n",
1193
    "    }\n",
1194
    "    return(payload)"
1195
   ]
1196
  },
1197
  {
1198
   "cell_type": "code",
1199
   "execution_count": 79,
1200
   "metadata": {
1201
    "tags": []
1202
   },
1203
   "outputs": [
1204
    {
1205
     "name": "stdout",
1206
     "output_type": "stream",
1207
     "text": [
1208
      "[Input]: Which instances can I use with Managed Spot Training in SageMaker?\n",
1209
      "[Output]:  Based on the context provided, you can use Managed Spot Training with all instances supported in Amazon SageMaker. Therefore, the answer is:\n",
1210
      "\n",
1211
      "All instances supported in Amazon SageMaker.\n"
1212
     ]
1213
    }
1214
   ],
1215
   "source": [
1216
    "payload = create_payload(question, context_str)\n",
1217
    "out = predictor.predict(payload, custom_attributes='accept_eula=true')\n",
1218
    "generated_text = out[0]['generation']['content']\n",
1219
    "print(f\"[Input]: {question}\\n[Output]: {generated_text}\")"
1220
   ]
1221
  },
1222
  {
1223
   "attachments": {},
1224
   "cell_type": "markdown",
1225
   "metadata": {},
1226
   "source": [
1227
    "Let's place all of this logic into a single RAG query function:"
1228
   ]
1229
  },
1230
  {
1231
   "cell_type": "code",
1232
   "execution_count": 80,
1233
   "metadata": {
1234
    "tags": []
1235
   },
1236
   "outputs": [],
1237
   "source": [
1238
    "def rag_query(question: str) -> str:\n",
1239
    "    # create query vec\n",
1240
    "    query_vec = embed_docs(question)[0]\n",
1241
    "    # query pinecone\n",
1242
    "    res = index.query(vector=query_vec, top_k=5, include_metadata=True)\n",
1243
    "    # get contexts\n",
1244
    "    contexts = [match.metadata[\"text\"] for match in res.matches]\n",
1245
    "    # build the multiple contexts string\n",
1246
    "    context_str = construct_context(contexts=contexts)\n",
1247
    "    # create our retrieval augmented prompt\n",
1248
    "    payload = create_payload(question, context_str)\n",
1249
    "    # make prediction\n",
1250
    "    out = predictor.predict(payload, custom_attributes='accept_eula=true')\n",
1251
    "    return out[0][\"generation\"][\"content\"]"
1252
   ]
1253
  },
1254
  {
1255
   "attachments": {},
1256
   "cell_type": "markdown",
1257
   "metadata": {},
1258
   "source": [
1259
    "We can now ask the question:"
1260
   ]
1261
  },
1262
  {
1263
   "cell_type": "code",
1264
   "execution_count": 85,
1265
   "metadata": {},
1266
   "outputs": [
1267
    {
1268
     "name": "stdout",
1269
     "output_type": "stream",
1270
     "text": [
1271
      "With maximum sequence length 1000, selected top 5 document sections: \n",
1272
      "Managed Spot Training can be used with all instances supported in Amazon SageMaker.\n",
1273
      "Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.\n",
1274
      "Managed Spot Training with Amazon SageMaker lets you train your ML models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.\n",
1275
      "For a list of the supported Amazon SageMaker AWS Regions, please visit the AWS Regional Services page. Also, for more information, see Regional endpoints in the AWS general reference guide.\n",
1276
      "At launch, we will support all Regions supported by Amazon SageMaker, except the AWS China Regions.\n"
1277
     ]
1278
    },
1279
    {
1280
     "data": {
1281
      "text/plain": [
1282
       "' Yes, Amazon SageMaker supports spot instances for managed spot training. According to the provided context, Managed Spot Training can be used with all instances supported in Amazon SageMaker, and Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.\\n\\nTherefore, the answer to your question is:\\n\\nYes, SageMaker supports spot instances in all regions where Amazon SageMaker is available.'"
1283
      ]
1284
     },
1285
     "execution_count": 85,
1286
     "metadata": {},
1287
     "output_type": "execute_result"
1288
    }
1289
   ],
1290
   "source": [
1291
    "rag_query(\"Does SageMaker support spot instances?\")"
1292
   ]
1293
  },
1294
  {
1295
   "attachments": {},
1296
   "cell_type": "markdown",
1297
   "metadata": {},
1298
   "source": [
1299
    "We can also ask questions about things that are out of context (not contained within our dataset). From this we expect the model to *not* hallucinate and honestly tell us that it does not know the answer:"
1300
   ]
1301
  },
1302
  {
1303
   "cell_type": "code",
1304
   "execution_count": 87,
1305
   "metadata": {
1306
    "tags": []
1307
   },
1308
   "outputs": [
1309
    {
1310
     "name": "stdout",
1311
     "output_type": "stream",
1312
     "text": [
1313
      "With maximum sequence length 1000, selected top 2 document sections: \n",
1314
      "No. Amazon SageMaker operates the compute infrastructure on your behalf, allowing it to perform health checks, apply security patches, and do other routine maintenance. You can also deploy the model artifacts from training with custom inference code in your own hosting environment.\n",
1315
      "Amazon SageMaker Data Wrangler provides a unified experience enabling you to prepare data and seamlessly train a machine learning model in Amazon SageMaker Autopilot. SageMaker Autopilot automatically builds, trains, and tunes the best ML models based on your data. With SageMaker Autopilot, you still maintain full control and visibility of your data and model. You can also use features prepared in SageMaker Data Wrangler with your existing models. You can configure Amazon SageMaker Data Wrangler processing jobs to run as part of your SageMaker training pipeline either by configuring the job in the user interface (UI) or exporting a notebook with the orchestration code.\n"
1316
     ]
1317
    },
1318
    {
1319
     "data": {
1320
      "text/plain": [
1321
       "' Based on the context provided, the answer is \"Yes, you can deploy a model trained outside of Amazon SageMaker.\"\\n\\nAccording to the text, Amazon SageMaker Data Wrangler provides a unified experience for preparing data and training machine learning models, including models trained outside of SageMaker. This suggests that SageMaker Autopilot can be used to deploy models trained outside of the platform, as long as they are in a format that can be processed by SageMaker.\\n\\nAdditionally, the text states that you can configure Amazon SageMaker Data Wrangler processing jobs to run as part of your SageMaker training pipeline, either through the user interface or by exporting a notebook with the orchestration code. This suggests that you can integrate models trained outside of SageMaker into your SageMaker workflows and deploy them using the platform\\'s infrastructure.\\n\\nTherefore, based on the context provided, the answer is \"Yes, you can deploy a model trained outside of Amazon SageMaker.\"'"
1322
      ]
1323
     },
1324
     "execution_count": 87,
1325
     "metadata": {},
1326
     "output_type": "execute_result"
1327
    }
1328
   ],
1329
   "source": [
1330
    "rag_query(\"Can I deploy a model trained outside of SageMaker?\")"
1331
   ]
1332
  },
1333
  {
1334
   "attachments": {},
1335
   "cell_type": "markdown",
1336
   "metadata": {},
1337
   "source": [
1338
    "---"
1339
   ]
1340
  },
1341
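  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, to avoid ongoing charges, delete the two SageMaker endpoints (and their models) along with the Pinecone index created in this walkthrough:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# clean up the SageMaker endpoints and model artifacts\n",
    "predictor.delete_model()\n",
    "predictor.delete_endpoint()\n",
    "encoder.delete_model()\n",
    "encoder.delete_endpoint()\n",
    "\n",
    "# clean up the Pinecone index\n",
    "pc.delete_index(index_name)"
   ]
  }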
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}
1805
    "vcpuNum": 64
1806
   },
1807
   {
1808
    "_defaultOrder": 46,
1809
    "_isFastLaunch": false,
1810
    "category": "Memory Optimized",
1811
    "gpuNum": 0,
1812
    "hideHardwareSpecs": false,
1813
    "memoryGiB": 768,
1814
    "name": "ml.r5.24xlarge",
1815
    "vcpuNum": 96
1816
   },
1817
   {
1818
    "_defaultOrder": 47,
1819
    "_isFastLaunch": false,
1820
    "category": "Accelerated computing",
1821
    "gpuNum": 1,
1822
    "hideHardwareSpecs": false,
1823
    "memoryGiB": 16,
1824
    "name": "ml.g5.xlarge",
1825
    "vcpuNum": 4
1826
   },
1827
   {
1828
    "_defaultOrder": 48,
1829
    "_isFastLaunch": false,
1830
    "category": "Accelerated computing",
1831
    "gpuNum": 1,
1832
    "hideHardwareSpecs": false,
1833
    "memoryGiB": 32,
1834
    "name": "ml.g5.2xlarge",
1835
    "vcpuNum": 8
1836
   },
1837
   {
1838
    "_defaultOrder": 49,
1839
    "_isFastLaunch": false,
1840
    "category": "Accelerated computing",
1841
    "gpuNum": 1,
1842
    "hideHardwareSpecs": false,
1843
    "memoryGiB": 64,
1844
    "name": "ml.g5.4xlarge",
1845
    "vcpuNum": 16
1846
   },
1847
   {
1848
    "_defaultOrder": 50,
1849
    "_isFastLaunch": false,
1850
    "category": "Accelerated computing",
1851
    "gpuNum": 1,
1852
    "hideHardwareSpecs": false,
1853
    "memoryGiB": 128,
1854
    "name": "ml.g5.8xlarge",
1855
    "vcpuNum": 32
1856
   },
1857
   {
1858
    "_defaultOrder": 51,
1859
    "_isFastLaunch": false,
1860
    "category": "Accelerated computing",
1861
    "gpuNum": 1,
1862
    "hideHardwareSpecs": false,
1863
    "memoryGiB": 256,
1864
    "name": "ml.g5.16xlarge",
1865
    "vcpuNum": 64
1866
   },
1867
   {
1868
    "_defaultOrder": 52,
1869
    "_isFastLaunch": false,
1870
    "category": "Accelerated computing",
1871
    "gpuNum": 4,
1872
    "hideHardwareSpecs": false,
1873
    "memoryGiB": 192,
1874
    "name": "ml.g5.12xlarge",
1875
    "vcpuNum": 48
1876
   },
1877
   {
1878
    "_defaultOrder": 53,
1879
    "_isFastLaunch": false,
1880
    "category": "Accelerated computing",
1881
    "gpuNum": 4,
1882
    "hideHardwareSpecs": false,
1883
    "memoryGiB": 384,
1884
    "name": "ml.g5.24xlarge",
1885
    "vcpuNum": 96
1886
   },
1887
   {
1888
    "_defaultOrder": 54,
1889
    "_isFastLaunch": false,
1890
    "category": "Accelerated computing",
1891
    "gpuNum": 8,
1892
    "hideHardwareSpecs": false,
1893
    "memoryGiB": 768,
1894
    "name": "ml.g5.48xlarge",
1895
    "vcpuNum": 192
1896
   },
1897
   {
1898
    "_defaultOrder": 55,
1899
    "_isFastLaunch": false,
1900
    "category": "Accelerated computing",
1901
    "gpuNum": 8,
1902
    "hideHardwareSpecs": false,
1903
    "memoryGiB": 1152,
1904
    "name": "ml.p4d.24xlarge",
1905
    "vcpuNum": 96
1906
   },
1907
   {
1908
    "_defaultOrder": 56,
1909
    "_isFastLaunch": false,
1910
    "category": "Accelerated computing",
1911
    "gpuNum": 8,
1912
    "hideHardwareSpecs": false,
1913
    "memoryGiB": 1152,
1914
    "name": "ml.p4de.24xlarge",
1915
    "vcpuNum": 96
1916
   }
1917
  ],
1918
  "instance_type": "ml.t3.medium",
1919
  "kernelspec": {
1920
   "display_name": "Python 3 (Data Science)",
1921
   "language": "python",
1922
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
1923
  },
1924
  "language_info": {
1925
   "codemirror_mode": {
1926
    "name": "ipython",
1927
    "version": 3
1928
   },
1929
   "file_extension": ".py",
1930
   "mimetype": "text/x-python",
1931
   "name": "python",
1932
   "nbconvert_exporter": "python",
1933
   "pygments_lexer": "ipython3",
1934
   "version": "3.7.10"
1935
  }
1936
 },
1937
 "nbformat": 4,
1938
 "nbformat_minor": 4
1939
}