{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/pinecone-io/examples/blob/master/learn/generation/aws/sagemaker/sagemaker-huggingface-rag.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/aws/sagemaker/sagemaker-huggingface-rag.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Retrieval-Augmented Generation: Question Answering based on Custom Dataset\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Many use cases, such as building a chatbot, require text (text2text) generation models like **[BloomZ 7B1](https://huggingface.co/bigscience/bloomz-7b1)**, **[Flan T5 XL](https://huggingface.co/google/flan-t5-xl)**, and **[Flan T5 UL2](https://huggingface.co/google/flan-ul2)** to respond to user questions with insightful answers. These models have picked up a lot of general knowledge in training, but we often need to ingest and use a large library of more specific information.\n",
    "\n",
    "In this notebook we will demonstrate how to use **Flan T5 XL** to answer questions using a library of documents as a reference, by using document embeddings and retrieval. The embeddings are generated by the **MiniLM** embedding model.\n",
    "\n",
    "**This notebook serves as a template: you can easily replace the example dataset with your own to build a custom question answering application.**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1. Deploy the large language model (LLM) on SageMaker\n",
    "\n",
    "To better illustrate the idea, let's first deploy the model required for the demo. This notebook deploys **Flan T5 XL** as the large language model (LLM). To try a different model, such as BloomZ 7B1 or Flan UL2, you would change the `HF_MODEL_ID` value in the `hub_config` Python dictionary defined below; a larger model may also need a larger instance type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    },
    "pycharm": {
     "name": "#%%\n"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n",
      "\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m23.1.2\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m23.2.1\u001b[0m\n",
      "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "!pip install -qU \\\n",
    "    sagemaker==2.173.0 \\\n",
    "    pinecone-client==2.2.1 \\\n",
    "    ipywidgets==7.0.0"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To begin, we will initialize all of the SageMaker session variables we'll need to use throughout the walkthrough."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import sagemaker\n",
    "from sagemaker.huggingface import (\n",
    "    HuggingFaceModel,\n",
    "    get_huggingface_llm_image_uri\n",
    ")\n",
    "\n",
    "role = sagemaker.get_execution_role()\n",
    "\n",
    "hub_config = {\n",
    "    'HF_MODEL_ID':'google/flan-t5-xl', # model_id from hf.co/models\n",
    "    'HF_TASK':'text-generation' # NLP task you want to use for predictions\n",
    "}\n",
    "\n",
    "# retrieve the llm image uri\n",
    "llm_image = get_huggingface_llm_image_uri(\n",
    "  \"huggingface\",\n",
    "  version=\"0.8.2\"\n",
    ")\n",
    "\n",
    "huggingface_model = HuggingFaceModel(\n",
    "    env=hub_config,\n",
    "    role=role, # iam role with permissions to create an Endpoint\n",
    "    image_uri=llm_image\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use a `ml.g5.4xlarge` instance to deploy our Flan T5 XL model. We can find pricing for all instances [here](https://aws.amazon.com/sagemaker/pricing/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-----------!"
     ]
    }
   ],
   "source": [
    "llm = huggingface_model.deploy(\n",
    "    initial_instance_count=1,\n",
    "    instance_type=\"ml.g5.4xlarge\",\n",
    "    endpoint_name=\"flan-t5-demo\"\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2. Ask the LLM a question without providing context\n",
    "\n",
    "To illustrate why we need a retrieval-augmented generation (RAG) approach to question answering, let's ask the model a question directly and see how it responds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'generated_text': 'SageMaker and SageMaker XL.'}]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "question = \"Which instances can I use with Managed Spot Training in SageMaker?\"\n",
    "\n",
    "out = llm.predict({\"inputs\": question})\n",
    "out"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see the generated answer is wrong and doesn't make much sense."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3. Improve the answer to the same question using **prompt engineering** with insightful context\n",
    "\n",
    "\n",
    "To answer the question well, we provide extra contextual information, combine it with a prompt, and send it to the model together with the question. Below is an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "context = \"\"\"Managed Spot Training can be used with all instances\n",
    "supported in Amazon SageMaker. Managed Spot Training is supported\n",
    "in all AWS Regions where Amazon SageMaker is currently available.\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Input]: Which instances can I use with Managed Spot Training in SageMaker?\n",
      "[Output]: all instances supported in Amazon SageMaker\n"
     ]
    }
   ],
   "source": [
    "prompt_template = \"\"\"Answer the following QUESTION based on the CONTEXT\n",
    "given. If you do not know the answer and the CONTEXT doesn't\n",
    "contain the answer truthfully say \"I don't know\".\n",
    "\n",
    "CONTEXT:\n",
    "{context}\n",
    "\n",
    "QUESTION:\n",
    "{question}\n",
    "\n",
    "ANSWER:\n",
    "\"\"\"\n",
    "\n",
    "text_input = prompt_template.replace(\"{context}\", context).replace(\"{question}\", question)\n",
    "\n",
    "out = llm.predict({\"inputs\": text_input})\n",
    "generated_text = out[0][\"generated_text\"]\n",
    "print(f\"[Input]: {question}\\n[Output]: {generated_text}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's see if our LLM is capable of following our instructions..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Input]: What color is my desk?\n",
      "[Output]: I don't know\n"
     ]
    }
   ],
   "source": [
    "unanswerable_question = \"What color is my desk?\"\n",
    "\n",
    "text_input = prompt_template.replace(\"{context}\", context).replace(\"{question}\", unanswerable_question)\n",
    "\n",
    "out = llm.predict({\"inputs\": text_input})\n",
    "generated_text = out[0][\"generated_text\"]\n",
    "print(f\"[Input]: {unanswerable_question}\\n[Output]: {generated_text}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looks great! The LLM is following instructions, and we've also demonstrated how contexts can help our LLM answer questions accurately. However, we're unlikely to insert a context directly into a prompt like this unless we already know the answer, and if we already know the answer, why would we be asking the question at all?\n",
    "\n",
    "We need a way of extracting _relevant contexts_ from huge bases of information. For that we need **R**etrieval **A**ugmented **G**eneration (RAG)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4. Use a RAG-based approach to identify the correct documents, and use them along with the prompt and question to query the LLM\n",
    "\n",
    "\n",
    "We plan to use document embeddings to fetch the most relevant documents in our document knowledge library and combine them with the prompt that we provide to the LLM.\n",
    "\n",
    "To achieve that, we will do the following:\n",
    "\n",
    "* Generate embeddings for each document in the knowledge library with the MiniLM embedding model.\n",
    "* Identify the top K most relevant documents based on the user query.\n",
    "    * For a query of interest, generate the embedding of the query using the same embedding model.\n",
    "    * Search for the indexes of the top K most relevant documents in the embedding space using a Pinecone vector index.\n",
    "    * Use the indexes to retrieve the corresponding documents.\n",
    "* Combine the retrieved documents with the prompt and question and send them to the LLM.\n",
    "\n",
    "\n",
    "\n",
    "Note: The retrieved document/text should be large enough to contain enough information to answer a question, but small enough to fit into the LLM prompt (maximum sequence length of 1024 tokens)."
   ]
  },
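  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of the retrieval steps listed above, the next cell is a minimal sketch that ranks a few toy vectors by cosine similarity with NumPy. It is not the retrieval code used in the rest of this notebook: the three-dimensional vectors, the toy documents, and the helper name `top_k_by_cosine` are made up for illustration. A vector database performs the same kind of nearest-neighbor search over real embeddings at a much larger scale."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def top_k_by_cosine(query_vec, doc_vecs, k=2):\n",
    "    # hypothetical helper: return the indexes of the k most similar rows\n",
    "    q = np.asarray(query_vec, dtype=float)\n",
    "    d = np.asarray(doc_vecs, dtype=float)\n",
    "    # cosine similarity = dot product divided by the product of the norms\n",
    "    sims = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))\n",
    "    # sort by descending similarity and keep the first k indexes\n",
    "    return np.argsort(-sims)[:k].tolist()\n",
    "\n",
    "# toy data, invented for this sketch\n",
    "toy_docs = ['doc about spot training', 'doc about pricing', 'doc about security']\n",
    "toy_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]\n",
    "toy_query = [0.9, 0.1, 0.0]\n",
    "\n",
    "top_idx = top_k_by_cosine(toy_query, toy_vecs, k=2)\n",
    "# use the indexes to look up the corresponding documents\n",
    "[toy_docs[i] for i in top_idx]"
   ]
  },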
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.1 Deploying the model endpoint for Sentence Transformer embedding model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "hub_config = {\n",
    "    'HF_MODEL_ID': 'sentence-transformers/all-MiniLM-L6-v2', # model_id from hf.co/models\n",
    "    'HF_TASK': 'feature-extraction'\n",
    "}\n",
    "\n",
    "huggingface_model = HuggingFaceModel(\n",
    "    env=hub_config,\n",
    "    role=role,\n",
    "    transformers_version=\"4.6\", # transformers version used\n",
    "    pytorch_version=\"1.7\", # pytorch version used\n",
    "    py_version=\"py36\", # python version of the DLC\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we deploy the model as we did earlier for our generative LLM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----!"
     ]
    }
   ],
   "source": [
    "encoder = huggingface_model.deploy(\n",
    "    initial_instance_count=1,\n",
    "    instance_type=\"ml.t2.large\",\n",
    "    endpoint_name=\"minilm-demo\"\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then create the embeddings like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "out = encoder.predict({\"inputs\": [\"some text here\", \"some more text goes here too\"]})"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will see that we have two outputs (one for each of our input sentences):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(out)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But if we look at each of these outputs we see something strange..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(8, 8)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(out[0]), len(out[1])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We would expect the embeddings to be of dimensionality *384*, but we're seeing two lists containing _eight_ items each. What is happening here?\n",
    "\n",
    "When we output feature embeddings from the MiniLM model we're actually outputting a single 384-dimensional vector for every _token_ contained in the inputs we provided. Counting the special start and end tokens, our second text `\"some more text goes here too\"` contains _eight_ tokens, and so this is where the value `8` is coming from. The shorter first text also returns eight vectors, which suggests it is padded to the length of the longest text in the batch.\n",
    "\n",
    "So, if we were to take a look at one of these vectors we should find the dimensionality of `384`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "384"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(out[0][0])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perfect! There's just one problem: how do we transform these eight token embeddings into a single _sentence embedding_? For this, we simply take the mean value across each vector dimension, like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 384)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "embeddings = np.mean(np.array(out), axis=1)\n",
    "embeddings.shape"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we have two 384-dimensional vector embeddings, one for each of our input texts. To make our lives easier later, we will wrap this encoding process into a single function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from typing import List\n",
    "\n",
    "def embed_docs(docs: List[str]) -> List[List[float]]:\n",
    "    \"\"\"Return one mean-pooled sentence embedding per input text.\"\"\"\n",
    "    # the endpoint returns one vector per token for each text\n",
    "    out = encoder.predict({'inputs': docs})\n",
    "    # average over the token axis to get a single vector per text\n",
    "    embeddings = np.mean(np.array(out), axis=1)\n",
    "    return embeddings.tolist()"
   ]
  },
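  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the pooling step concrete, here is a toy example with invented numbers (an assumption for illustration, not output from the endpoint): two texts, three token vectors each, two dimensions per vector. Averaging over `axis=1` collapses the token axis and leaves one vector per text. Note that if the endpoint pads shorter texts, as the equal lengths seen earlier suggest, the padded positions are included in this simple mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# shape (2 texts, 3 tokens, 2 dimensions); the values are invented\n",
    "toy_out = np.array([\n",
    "    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],\n",
    "    [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]],\n",
    "])\n",
    "\n",
    "# average across the token axis (axis=1), one vector per text remains\n",
    "toy_embeddings = np.mean(toy_out, axis=1)\n",
    "print(toy_embeddings.shape)\n",
    "print(toy_embeddings)"
   ]
  },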
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2 Generate embeddings for each document in the knowledge library with the Sentence Transformer model\n",
    "\n",
    "For the purpose of the demo we will use [Amazon SageMaker FAQs](https://aws.amazon.com/sagemaker/faqs/) as the knowledge library. The data is formatted as a CSV file with two columns, Question and Answer. We use **only** the Answer column as the documents of the knowledge library, from which relevant documents are retrieved based on a query. \n",
    "\n",
    "**Each row in the CSV dataset corresponds to a textual document. \n",
    "We will iterate over the documents to get each one's embedding vector via the MiniLM embedding model. \n",
    "To build a custom question answering application, you can replace the example dataset with your own.**\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, we download the dataset from our S3 bucket to the local machine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "s3_path = \"s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "download: s3://jumpstart-cache-prod-us-east-2/training-datasets/Amazon_SageMaker_FAQs/Amazon_SageMaker_FAQs.csv to ./Amazon_SageMaker_FAQs.csv\n"
     ]
    }
   ],
   "source": [
    "# download the dataset to the local working directory\n",
    "!aws s3 cp $s3_path Amazon_SageMaker_FAQs.csv"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Open the dataset with Pandas:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Question</th>\n",
       "      <th>Answer</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>What is Amazon SageMaker?</td>\n",
       "      <td>Amazon SageMaker is a fully managed service to...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>In which Regions is Amazon SageMaker available...</td>\n",
       "      <td>For a list of the supported Amazon SageMaker A...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>What is the service availability of Amazon Sag...</td>\n",
       "      <td>Amazon SageMaker is designed for high availabi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>How does Amazon SageMaker secure my code?</td>\n",
       "      <td>Amazon SageMaker stores code in ML storage vol...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>What security measures does Amazon SageMaker h...</td>\n",
       "      <td>Amazon SageMaker ensures that ML model artifac...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            Question  \\\n",
       "0                          What is Amazon SageMaker?   \n",
       "1  In which Regions is Amazon SageMaker available...   \n",
       "2  What is the service availability of Amazon Sag...   \n",
       "3          How does Amazon SageMaker secure my code?   \n",
       "4  What security measures does Amazon SageMaker h...   \n",
       "\n",
       "                                              Answer  \n",
       "0  Amazon SageMaker is a fully managed service to...  \n",
       "1  For a list of the supported Amazon SageMaker A...  \n",
       "2  Amazon SageMaker is designed for high availabi...  \n",
       "3  Amazon SageMaker stores code in ML storage vol...  \n",
       "4  Amazon SageMaker ensures that ML model artifac...  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df_knowledge = pd.read_csv(\"Amazon_SageMaker_FAQs.csv\", header=None, names=[\"Question\", \"Answer\"])\n",
    "df_knowledge.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Drop the `Question` column since it is not used in this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Answer</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Amazon SageMaker is a fully managed service to...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>For a list of the supported Amazon SageMaker A...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Amazon SageMaker is designed for high availabi...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Amazon SageMaker stores code in ML storage vol...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Amazon SageMaker ensures that ML model artifac...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                              Answer\n",
       "0  Amazon SageMaker is a fully managed service to...\n",
       "1  For a list of the supported Amazon SageMaker A...\n",
       "2  Amazon SageMaker is designed for high availabi...\n",
       "3  Amazon SageMaker stores code in ML storage vol...\n",
       "4  Amazon SageMaker ensures that ML model artifac..."
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_knowledge.drop([\"Question\"], axis=1, inplace=True)\n",
    "df_knowledge.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "Next we can initialize our connection to **Pinecone**. To do this we need a [free API key](https://app.pinecone.io)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/pinecone/index.py:4: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console)\n",
      "  from tqdm.autonotebook import tqdm\n"
     ]
    }
   ],
   "source": [
    "# module-level client API, matching the pinned pinecone-client==2.2.1\n",
    "import pinecone\n",
    "import os\n",
    "\n",
    "# add Pinecone API key from app.pinecone.io\n",
    "api_key = os.environ.get(\"PINECONE_API_KEY\") or \"YOUR_API_KEY\"\n",
    "# set Pinecone environment - find next to API key in console\n",
    "env = os.environ.get(\"PINECONE_ENVIRONMENT\") or \"YOUR_ENV\"\n",
    "\n",
    "pinecone.init(\n",
    "    api_key=api_key,\n",
    "    environment=env\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "List all indexes currently associated with your API key. The list should be empty on the first run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['jumpstart-minilm-l6',\n",
       " 'retrieval-augmentation-aws-6j',\n",
       " 'retrieval-augmentation-aws']"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pinecone.list_indexes()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we create a new index called `retrieval-augmentation-aws`. It's important that we align the index `dimension` and `metric` parameters with those required by the MiniLM model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "index_name = 'retrieval-augmentation-aws'\n",
    "\n",
    "# start from a clean index if one with this name already exists\n",
    "if index_name in pinecone.list_indexes():\n",
    "    pinecone.delete_index(index_name)\n",
    "    \n",
    "pinecone.create_index(\n",
    "    name=index_name,\n",
    "    dimension=embeddings.shape[1],\n",
    "    metric='cosine'\n",
    ")\n",
    "# wait for index to finish initialization\n",
    "while not pinecone.describe_index(index_name).status['ready']:\n",
    "    time.sleep(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['jumpstart-minilm-l6',\n",
       " 'retrieval-augmentation-aws-6j',\n",
       " 'retrieval-augmentation-aws']"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pinecone.list_indexes()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we upsert the data. We do this in small batches (`batch_size = 2` below) so that the embedding endpoint does not run out of memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "9adb133d7847415a97322d3d107a9926",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "A Jupyter Widget"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from tqdm.auto import tqdm\n",
    "\n",
    "batch_size = 2  # can increase but needs larger instance size otherwise instance runs out of memory\n",
    "vector_limit = 1000\n",
    "\n",
    "answers = df_knowledge[:vector_limit]\n",
    "index = pinecone.Index(index_name)\n",
    "\n",
    "for i in tqdm(range(0, len(answers), batch_size)):\n",
    "    # find end of batch\n",
    "    i_end = min(i+batch_size, len(answers))\n",
    "    # create IDs batch\n",
    "    ids = [str(x) for x in range(i, i_end)]\n",
    "    # create metadata batch\n",
    "    metadatas = [{'text': text} for text in answers[\"Answer\"][i:i_end]]\n",
    "    # create embeddings\n",
    "    texts = answers[\"Answer\"][i:i_end].tolist()\n",
    "    embeddings = embed_docs(texts)\n",
    "    # create records list for upsert\n",
    "    records = zip(ids, embeddings, metadatas)\n",
    "    # upsert to Pinecone\n",
    "    index.upsert(vectors=records)"
   ]
  },
970
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'dimension': 384,\n",
       " 'index_fullness': 0.0,\n",
       " 'namespaces': {'': {'vector_count': 154}},\n",
       " 'total_vector_count': 154}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# check number of records in the index\n",
    "index.describe_index_stats()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 4.5 Combine the retrieved documents, prompt, and question to query the LLM"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we're ready to begin querying our LLM with a **R**etrieval **A**ugmented **G**eneration (RAG) pipeline. Let's first see how this works step by step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Which instances can I use with Managed Spot Training in SageMaker?'"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "question"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we create our _query embedding_ and use it to query Pinecone:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'matches': [{'id': '90',\n",
       "              'metadata': {'text': 'Managed Spot Training can be used with all '\n",
       "                                   'instances supported in Amazon '\n",
       "                                   'SageMaker.\\r\\n'},\n",
       "              'score': 0.881181657,\n",
       "              'values': []},\n",
       "             {'id': '91',\n",
       "              'metadata': {'text': 'Managed Spot Training is supported in all '\n",
       "                                   'AWS Regions where Amazon SageMaker is '\n",
       "                                   'currently available.\\r\\n'},\n",
       "              'score': 0.799186468,\n",
       "              'values': []},\n",
       "             {'id': '85',\n",
       "              'metadata': {'text': 'You enable the Managed Spot Training '\n",
       "                                   'option when submitting your training jobs '\n",
       "                                   'and you also specify how long you want to '\n",
       "                                   'wait for Spot capacity. Amazon SageMaker '\n",
       "                                   'will then use Amazon EC2 Spot instances to '\n",
       "                                   'run your job and manages the Spot '\n",
       "                                   'capacity. You have full visibility into '\n",
       "                                   'the status of your training jobs, both '\n",
       "                                   'while they are running and while they are '\n",
       "                                   'waiting for capacity.'},\n",
       "              'score': 0.733643115,\n",
       "              'values': []},\n",
       "             {'id': '84',\n",
       "              'metadata': {'text': 'Managed Spot Training with Amazon '\n",
       "                                   'SageMaker lets you train your ML models '\n",
       "                                   'using Amazon EC2 Spot instances, while '\n",
       "                                   'reducing the cost of training your models '\n",
       "                                   'by up to 90%.'},\n",
       "              'score': 0.722585857,\n",
       "              'values': []},\n",
       "             {'id': '87',\n",
       "              'metadata': {'text': 'Managed Spot Training uses Amazon EC2 Spot '\n",
       "                                   'instances for training, and these '\n",
       "                                   'instances can be pre-empted when AWS needs '\n",
       "                                   'capacity. As a result, Managed Spot '\n",
       "                                   'Training jobs can run in small increments '\n",
       "                                   'as and when capacity becomes available. '\n",
       "                                   'The training jobs need not be restarted '\n",
       "                                   'from scratch when there is an '\n",
       "                                   'interruption, as Amazon SageMaker can '\n",
       "                                   'resume the training jobs using the latest '\n",
       "                                   'model checkpoint. The built-in frameworks '\n",
       "                                   'and the built-in computer vision '\n",
       "                                   'algorithms with SageMaker enable periodic '\n",
       "                                   'checkpoints, and you can enable '\n",
       "                                   'checkpoints with custom models.'},\n",
       "              'score': 0.7210114,\n",
       "              'values': []}],\n",
       " 'namespace': ''}"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# embed the question to get the query vector\n",
    "query_vec = embed_docs(question)[0]\n",
    "\n",
    "# query Pinecone for the 5 most similar records, returning their metadata (the answer text)\n",
    "res = index.query(vector=query_vec, top_k=5, include_metadata=True)\n",
    "\n",
    "# show the results\n",
    "res"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We get multiple relevant contexts here. We can use these to construct a single `context` to feed into our LLM prompt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "contexts = [match.metadata['text'] for match in res.matches]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "max_section_len = 1000  # context budget, measured in characters (not tokens)\n",
    "separator = \"\\n\"\n",
    "\n",
    "def construct_context(contexts: List[str]) -> str:\n",
    "    \"\"\"Join the top-ranked contexts, in order, until the character budget is used up.\"\"\"\n",
    "    chosen_sections = []\n",
    "    chosen_sections_len = 0\n",
    "\n",
    "    for text in contexts:\n",
    "        text = text.strip()\n",
    "        # Add contexts until we run out of space.\n",
    "        # The extra 2 is a small allowance for the separator.\n",
    "        chosen_sections_len += len(text) + 2\n",
    "        if chosen_sections_len > max_section_len:\n",
    "            break\n",
    "        chosen_sections.append(text)\n",
    "    concatenated_doc = separator.join(chosen_sections)\n",
    "    print(\n",
    "        f\"With maximum sequence length {max_section_len}, selected top {len(chosen_sections)} document sections: \\n{concatenated_doc}\"\n",
    "    )\n",
    "    return concatenated_doc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "With maximum sequence length 1000, selected top 4 document sections: \n",
      "Managed Spot Training can be used with all instances supported in Amazon SageMaker.\n",
      "Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.\n",
      "You enable the Managed Spot Training option when submitting your training jobs and you also specify how long you want to wait for Spot capacity. Amazon SageMaker will then use Amazon EC2 Spot instances to run your job and manages the Spot capacity. You have full visibility into the status of your training jobs, both while they are running and while they are waiting for capacity.\n",
      "Managed Spot Training with Amazon SageMaker lets you train your ML models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.\n"
     ]
    }
   ],
   "source": [
    "context_str = construct_context(contexts=contexts)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then insert this `context_str`, together with the question, into our LLM prompt template and send the result to the LLM:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Input]: Which instances can I use with Managed Spot Training in SageMaker?\n",
      "[Output]: all instances supported in Amazon SageMaker\n"
     ]
    }
   ],
   "source": [
    "text_input = prompt_template.replace(\"{context}\", context_str).replace(\"{question}\", question)\n",
    "\n",
    "out = llm.predict({\"inputs\": text_input})\n",
    "generated_text = out[0][\"generated_text\"]\n",
    "print(f\"[Input]: {question}\\n[Output]: {generated_text}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's place all of this logic into a single RAG query function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rag_query(question: str) -> str:\n",
    "    # create query vec\n",
    "    query_vec = embed_docs(question)[0]\n",
    "    # query pinecone\n",
    "    res = index.query(vector=query_vec, top_k=5, include_metadata=True)\n",
    "    # get contexts\n",
    "    contexts = [match.metadata['text'] for match in res.matches]\n",
    "    # build the multiple contexts string\n",
    "    context_str = construct_context(contexts=contexts)\n",
    "    # create our retrieval augmented prompt\n",
    "    text_input = prompt_template.replace(\"{context}\", context_str).replace(\"{question}\", question)\n",
    "    # make prediction\n",
    "    out = llm.predict({\"inputs\": text_input})\n",
    "    return out[0][\"generated_text\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now ask the question:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "With maximum sequence length 1000, selected top 4 document sections: \n",
      "Managed Spot Training can be used with all instances supported in Amazon SageMaker.\n",
      "Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.\n",
      "You enable the Managed Spot Training option when submitting your training jobs and you also specify how long you want to wait for Spot capacity. Amazon SageMaker will then use Amazon EC2 Spot instances to run your job and manages the Spot capacity. You have full visibility into the status of your training jobs, both while they are running and while they are waiting for capacity.\n",
      "Managed Spot Training with Amazon SageMaker lets you train your ML models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"all instances supported in Amazon SageMaker\""
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rag_query(\"Which instances can I use with Managed Spot Training in SageMaker?\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also ask about things that are out of context (not covered by our dataset). In that case we expect the model to *not* hallucinate and instead tell us honestly that it does not know the answer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "With maximum sequence length 1000, selected top 1 document sections: \n",
      "To get started with Amazon SageMaker Edge Manager, you need to compile and package your trained ML models in the cloud, register your devices, and prepare your devices with the SageMaker Edge Manager SDK. To prepare your model for deployment, SageMaker Edge Manager uses SageMaker Neo to compile your model for your target edge hardware. Once a model is compiled, SageMaker Edge Manager signs the model with an AWS generated key, then packages the model with its runtime and your necessary credentials to get it ready for deployment. On the device side, you register your device with SageMaker Edge Manager, download the SageMaker Edge Manager SDK, and then follow the instructions to install the SageMaker Edge Manager agent on your devices. The tutorial notebook provides a step-by-step example of how you can prepare the models and connect your models on edge devices with SageMaker Edge Manager.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "\"I don't know\""
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rag_query(\"How do I create a Hugging Face instance on Sagemaker?\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
   "display_name": "Python 3 (Data Science)",
   "language": "python",
   "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}