05-langchain-retrieval-augmentation.ipynb 
1
{
2
  "cells": [
3
    {
4
      "attachments": {},
5
      "cell_type": "markdown",
6
      "metadata": {},
7
      "source": [
8
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/05-langchain-retrieval-augmentation.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/05-langchain-retrieval-augmentation.ipynb)"
9
      ]
10
    },
11
    {
12
      "attachments": {},
13
      "cell_type": "markdown",
14
      "metadata": {},
15
      "source": [
16
        "#### [LangChain Handbook](https://pinecone.io/learn/langchain)\n",
17
        "\n",
18
        "# Retrieval Augmentation\n",
19
        "\n",
20
        "**L**arge **L**anguage **M**odels (LLMs) have a data freshness problem. The most powerful LLMs in the world, like GPT-4, have no idea about recent world events.\n",
21
        "\n",
22
        "The world of LLMs is frozen in time. Their world exists as a static snapshot of the world as it was within their training data.\n",
23
        "\n",
24
        "A solution to this problem is *retrieval augmentation*. The idea behind this is that we retrieve relevant information from an external knowledge base and give that information to our LLM. In this notebook we will learn how to do that.\n",
25
        "\n",
26
        "[![Open fast notebook](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/fast-link.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/generation/langchain/handbook/05-langchain-retrieval-augmentation-fast.ipynb)\n",
27
        "\n",
28
        "To begin, we must install the prerequisite libraries used in this notebook. Installing all of them at once causes a conflict with the Hugging Face `datasets` library, so we install them in a specific order:"
29
      ]
30
    },
31
    {
32
      "cell_type": "code",
33
      "execution_count": null,
34
      "metadata": {},
35
      "outputs": [],
36
      "source": [
37
        "!pip install -qU \\\n",
38
        "    datasets==2.12.0 \\\n",
39
        "    apache_beam \\\n",
40
        "    mwparserfromhell"
41
      ]
42
    },
43
    {
44
      "attachments": {},
45
      "cell_type": "markdown",
46
      "metadata": {},
47
      "source": [
48
        "## Building the Knowledge Base"
49
      ]
50
    },
51
    {
52
      "cell_type": "code",
53
      "execution_count": 1,
54
      "metadata": {
55
        "colab": {
56
          "base_uri": "https://localhost:8080/"
57
        },
58
        "id": "DiRWzKh0mMGv",
59
        "outputId": "5bfa8cb2-5c9f-40ba-f832-edc51dafbef4"
60
      },
61
      "outputs": [
62
        {
63
          "data": {
64
            "application/vnd.jupyter.widget-view+json": {
65
              "model_id": "b259cf6a00884cf7af61b7582eee001f",
66
              "version_major": 2,
67
              "version_minor": 0
68
            },
69
            "text/plain": [
70
              "Downloading readme:   0%|          | 0.00/16.3k [00:00<?, ?B/s]"
71
            ]
72
          },
73
          "metadata": {},
74
          "output_type": "display_data"
75
        },
76
        {
77
          "data": {
78
            "text/plain": [
79
              "Dataset({\n",
80
              "    features: ['id', 'url', 'title', 'text'],\n",
81
              "    num_rows: 10000\n",
82
              "})"
83
            ]
84
          },
85
          "execution_count": 1,
86
          "metadata": {},
87
          "output_type": "execute_result"
88
        }
89
      ],
90
      "source": [
91
        "from datasets import load_dataset\n",
92
        "\n",
93
        "data = load_dataset(\"wikipedia\", \"20220301.simple\", split='train[:10000]')\n",
94
        "data"
95
      ]
96
    },
97
    {
98
      "cell_type": "code",
99
      "execution_count": 2,
100
      "metadata": {
101
        "colab": {
102
          "base_uri": "https://localhost:8080/"
103
        },
104
        "id": "LarkabZgtbhQ",
105
        "outputId": "30a76a4d-c40c-4a9b-fc58-822c499dbbc3"
106
      },
107
      "outputs": [
108
        {
109
          "data": {
110
            "text/plain": [
111
              "{'id': '13',\n",
112
              " 'url': 'https://simple.wikipedia.org/wiki/Alan%20Turing',\n",
113
              " 'title': 'Alan Turing',\n",
114
              " 'text': 'Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\\n\\nEarly life and family \\nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.\\n\\nEducation \\nTuring went to St. Michael\\'s, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\\n\"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.\\n\\nThe Stoney family were once prominent landlords, here in North Tipperary. His mother Ethel Sara Stoney (1881–1976) was daughter of Edward Waller Stoney (Borrisokane, North Tipperary) and Sarah Crawford (Cartron Abbey, Co. Longford); Protestant Anglo-Irish gentry.\\n\\nEducated in Dublin at Alexandra School and College; on October 1st 1907 she married Julius Mathison Turing, latter son of Reverend John Robert Turing and Fanny Boyd, in Dublin. Born on June 23rd 1912, Alan Turing would go on to be regarded as one of the greatest figures of the twentieth century.\\n\\nA brilliant mathematician and cryptographer Alan was to become the founder of modern-day computer science and artificial intelligence; designing a machine at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine to protect sensitive commercial, diplomatic and military communications during World War 2. Thus, Turing made the single biggest contribution to the Allied victory in the war against Nazi Germany, possibly saving the lives of an estimated 2 million people, through his effort in shortening World War II.\\n\\nIn 2013, almost 60 years later, Turing received a posthumous Royal Pardon from Queen Elizabeth II. 
Today, the “Turing law” grants an automatic pardon to men who died before the law came into force, making it possible for living convicted gay men to seek pardons for offences now no longer on the statute book.\\n\\nAlas, Turing accidentally or otherwise lost his life in 1954, having been subjected by a British court to chemical castration, thus avoiding a custodial sentence. He is known to have ended his life at the age of 41 years, by eating an apple laced with cyanide.\\n\\nCareer \\nTuring was one of the people who worked on the first computers. He created the theoretical  Turing machine in 1936. The machine was imaginary, but it included the idea of a computer program.\\n\\nTuring was interested in artificial intelligence. He proposed the Turing test, to say when a machine could be called \"intelligent\". A computer could be said to \"think\" if a human talking with it could not tell it was a machine.\\n\\nDuring World War II, Turing worked with others to break German ciphers (secret messages). He  worked for the Government Code and Cypher School (GC&CS) at Bletchley Park, Britain\\'s codebreaking centre that produced Ultra intelligence.\\nUsing cryptanalysis, he helped to break the codes of the Enigma machine. After that, he worked on other German codes.\\n\\nFrom 1945 to 1947, Turing worked on the design of the ACE (Automatic Computing Engine) at the National Physical Laboratory. He presented a paper on 19 February 1946. That paper was \"the first detailed design of a stored-program computer\". Although it was possible to build ACE, there were delays in starting the project. In late 1947 he returned to Cambridge for a sabbatical year. While he was at Cambridge, the Pilot ACE was built without him. It ran its first program on 10\\xa0May 1950.\\n\\nPrivate life \\nTuring was a homosexual man. In 1952, he admitted having had sex with a man in England. At that time, homosexual acts were illegal. Turing was convicted. 
He had to choose between going to jail and taking hormones to lower his sex drive. He decided to take the hormones. After his punishment, he became impotent. He also grew breasts.\\n\\nIn May 2012, a private member\\'s bill was put before the House of Lords to grant Turing a statutory pardon. In July 2013, the government supported it. A royal pardon was granted on 24 December 2013.\\n\\nDeath \\nIn 1954, Turing died from cyanide poisoning. The cyanide came from either an apple which was poisoned with cyanide, or from water that had cyanide in it. The reason for the confusion is that the police never tested the apple for cyanide. It is also suspected that he committed suicide.\\n\\nThe treatment forced on him is now believed to be very wrong. It is against medical ethics and international laws of human rights. In August 2009, a petition asking the British Government to apologise to Turing for punishing him for being a homosexual was started. The petition received thousands of signatures. Prime Minister Gordon Brown acknowledged the petition. He called Turing\\'s treatment \"appalling\".\\n\\nReferences\\n\\nOther websites \\nJack Copeland 2012. Alan Turing: The codebreaker who saved \\'millions of lives\\'. BBC News / Technology \\n\\nEnglish computer scientists\\nEnglish LGBT people\\nEnglish mathematicians\\nGay men\\nLGBT scientists\\nScientists from London\\nSuicides by poison\\nSuicides in the United Kingdom\\n1912 births\\n1954 deaths\\nOfficers of the Order of the British Empire'}"
115
            ]
116
          },
117
          "execution_count": 2,
118
          "metadata": {},
119
          "output_type": "execute_result"
120
        }
121
      ],
122
      "source": [
123
        "data[6]"
124
      ]
125
    },
126
    {
127
      "attachments": {},
128
      "cell_type": "markdown",
129
      "metadata": {},
130
      "source": [
131
        "Now we install the remaining libraries:"
132
      ]
133
    },
134
    {
135
      "cell_type": "code",
136
      "execution_count": 1,
137
      "metadata": {
138
        "id": "0_4wHAWtmAvJ"
139
      },
140
      "outputs": [],
141
      "source": [
142
        "!pip install -qU \\\n",
143
        "  langchain==0.0.355 \\\n",
144
        "  openai==1.6.1 \\\n",
145
        "  pinecone-client==3.1.0 \\\n",
146
        "  tiktoken==0.5.2"
147
      ]
148
    },
149
    {
150
      "attachments": {},
151
      "cell_type": "markdown",
152
      "metadata": {},
153
      "source": [
154
        "---\n",
155
        "\n",
156
        "🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._\n",
157
        "\n",
158
        "---"
159
      ]
160
    },
161
    {
162
      "attachments": {},
163
      "cell_type": "markdown",
164
      "metadata": {
165
        "id": "OPpcO-TwuQwD"
166
      },
167
      "source": [
168
        "Every record contains *a lot* of text. Our first task is therefore to identify a good preprocessing methodology for chunking these articles into more \"concise\" chunks to later be embedded and stored in our Pinecone vector database.\n",
169
        "\n",
170
        "For this we use LangChain's `RecursiveCharacterTextSplitter` to split our text into chunks of a specified max length."
171
      ]
172
    },
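As a rough illustration of the idea behind recursive splitting, the sketch below tries coarse separators first and falls back to finer ones only for pieces that are still too long. It is a simplified stand-in, not LangChain's implementation: `recursive_split` is a hypothetical helper, it measures length in characters rather than tokens, and it ignores `chunk_overlap`.

```python
def recursive_split(text, max_len, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most max_len characters (hypothetical helper)."""
    if len(text) <= max_len:
        return [text] if text else []
    # last resort: no separators left (or the empty separator), so cut by characters
    if not separators or separators[0] == "":
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    sep, finer = separators[0], separators[1:]
    pieces = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            # keep merging parts while the merged piece still fits
            current = candidate
            continue
        if current:
            pieces.append(current)
        if len(part) <= max_len:
            current = part
        else:
            # this part alone is too long: retry it with finer separators
            pieces.extend(recursive_split(part, max_len, finer))
            current = ""
    if current:
        pieces.append(current)
    return pieces


sample = "aaa bbb\n\nccc ddd eee\nfff"
toy_chunks = recursive_split(sample, max_len=7)
print(toy_chunks)
```

Paragraph breaks are preferred as split points, so a piece is only broken on single newlines, spaces, or raw characters when nothing coarser fits within the limit.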
173
    {
174
      "cell_type": "code",
175
      "execution_count": 3,
176
      "metadata": {},
177
      "outputs": [
178
        {
179
          "data": {
180
            "text/plain": [
181
              "<Encoding 'cl100k_base'>"
182
            ]
183
          },
184
          "execution_count": 3,
185
          "metadata": {},
186
          "output_type": "execute_result"
187
        }
188
      ],
189
      "source": [
190
        "import tiktoken\n",
191
        "\n",
192
        "tiktoken.encoding_for_model('gpt-3.5-turbo')"
193
      ]
194
    },
195
    {
196
      "cell_type": "code",
197
      "execution_count": 4,
198
      "metadata": {
199
        "id": "a3ChSxlcwX8n"
200
      },
201
      "outputs": [
202
        {
203
          "data": {
204
            "text/plain": [
205
              "26"
206
            ]
207
          },
208
          "execution_count": 4,
209
          "metadata": {},
210
          "output_type": "execute_result"
211
        }
212
      ],
213
      "source": [
214
        "import tiktoken\n",
215
        "\n",
216
        "tokenizer = tiktoken.get_encoding('cl100k_base')\n",
217
        "\n",
218
        "# create the length function\n",
219
        "def tiktoken_len(text):\n",
220
        "    tokens = tokenizer.encode(\n",
221
        "        text,\n",
222
        "        disallowed_special=()\n",
223
        "    )\n",
224
        "    return len(tokens)\n",
225
        "\n",
226
        "tiktoken_len(\"hello I am a chunk of text and using the tiktoken_len function \"\n",
227
        "             \"we can find the length of this chunk of text in tokens\")"
228
      ]
229
    },
230
    {
231
      "cell_type": "code",
232
      "execution_count": 5,
233
      "metadata": {
234
        "id": "58J-y6GHtvQP"
235
      },
236
      "outputs": [],
237
      "source": [
238
        "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
239
        "\n",
240
        "text_splitter = RecursiveCharacterTextSplitter(\n",
241
        "    chunk_size=400,\n",
242
        "    chunk_overlap=20,\n",
243
        "    length_function=tiktoken_len,\n",
244
        "    separators=[\"\\n\\n\", \"\\n\", \" \", \"\"]\n",
245
        ")"
246
      ]
247
    },
248
    {
249
      "cell_type": "code",
250
      "execution_count": 6,
251
      "metadata": {
252
        "colab": {
253
          "base_uri": "https://localhost:8080/"
254
        },
255
        "id": "W8KGqv-rzEgH",
256
        "outputId": "b8a954b2-038c-4e00-8081-7f1c3934afb5"
257
      },
258
      "outputs": [
259
        {
260
          "data": {
261
            "text/plain": [
262
              "['Alan Mathison Turing OBE FRS (London, 23 June 1912 – Wilmslow, Cheshire, 7 June 1954) was an English mathematician and computer scientist. He was born in Maida Vale, London.\\n\\nEarly life and family \\nAlan Turing was born in Maida Vale, London on 23 June 1912. His father was part of a family of merchants from Scotland. His mother, Ethel Sara, was the daughter of an engineer.\\n\\nEducation \\nTuring went to St. Michael\\'s, a school at 20 Charles Road, St Leonards-on-sea, when he was five years old.\\n\"This is only a foretaste of what is to come, and only the shadow of what is going to be.” – Alan Turing.\\n\\nThe Stoney family were once prominent landlords, here in North Tipperary. His mother Ethel Sara Stoney (1881–1976) was daughter of Edward Waller Stoney (Borrisokane, North Tipperary) and Sarah Crawford (Cartron Abbey, Co. Longford); Protestant Anglo-Irish gentry.\\n\\nEducated in Dublin at Alexandra School and College; on October 1st 1907 she married Julius Mathison Turing, latter son of Reverend John Robert Turing and Fanny Boyd, in Dublin. Born on June 23rd 1912, Alan Turing would go on to be regarded as one of the greatest figures of the twentieth century.',\n",
263
              " 'A brilliant mathematician and cryptographer Alan was to become the founder of modern-day computer science and artificial intelligence; designing a machine at Bletchley Park to break secret Enigma encrypted messages used by the Nazi German war machine to protect sensitive commercial, diplomatic and military communications during World War 2. Thus, Turing made the single biggest contribution to the Allied victory in the war against Nazi Germany, possibly saving the lives of an estimated 2 million people, through his effort in shortening World War II.\\n\\nIn 2013, almost 60 years later, Turing received a posthumous Royal Pardon from Queen Elizabeth II. Today, the “Turing law” grants an automatic pardon to men who died before the law came into force, making it possible for living convicted gay men to seek pardons for offences now no longer on the statute book.\\n\\nAlas, Turing accidentally or otherwise lost his life in 1954, having been subjected by a British court to chemical castration, thus avoiding a custodial sentence. He is known to have ended his life at the age of 41 years, by eating an apple laced with cyanide.\\n\\nCareer \\nTuring was one of the people who worked on the first computers. He created the theoretical  Turing machine in 1936. The machine was imaginary, but it included the idea of a computer program.\\n\\nTuring was interested in artificial intelligence. He proposed the Turing test, to say when a machine could be called \"intelligent\". A computer could be said to \"think\" if a human talking with it could not tell it was a machine.',\n",
264
              " 'During World War II, Turing worked with others to break German ciphers (secret messages). He  worked for the Government Code and Cypher School (GC&CS) at Bletchley Park, Britain\\'s codebreaking centre that produced Ultra intelligence.\\nUsing cryptanalysis, he helped to break the codes of the Enigma machine. After that, he worked on other German codes.\\n\\nFrom 1945 to 1947, Turing worked on the design of the ACE (Automatic Computing Engine) at the National Physical Laboratory. He presented a paper on 19 February 1946. That paper was \"the first detailed design of a stored-program computer\". Although it was possible to build ACE, there were delays in starting the project. In late 1947 he returned to Cambridge for a sabbatical year. While he was at Cambridge, the Pilot ACE was built without him. It ran its first program on 10\\xa0May 1950.\\n\\nPrivate life \\nTuring was a homosexual man. In 1952, he admitted having had sex with a man in England. At that time, homosexual acts were illegal. Turing was convicted. He had to choose between going to jail and taking hormones to lower his sex drive. He decided to take the hormones. After his punishment, he became impotent. He also grew breasts.\\n\\nIn May 2012, a private member\\'s bill was put before the House of Lords to grant Turing a statutory pardon. In July 2013, the government supported it. A royal pardon was granted on 24 December 2013.\\n\\nDeath \\nIn 1954, Turing died from cyanide poisoning. The cyanide came from either an apple which was poisoned with cyanide, or from water that had cyanide in it. The reason for the confusion is that the police never tested the apple for cyanide. It is also suspected that he committed suicide.']"
265
            ]
266
          },
267
          "execution_count": 6,
268
          "metadata": {},
269
          "output_type": "execute_result"
270
        }
271
      ],
272
      "source": [
273
        "chunks = text_splitter.split_text(data[6]['text'])[:3]\n",
274
        "chunks"
275
      ]
276
    },
277
    {
278
      "cell_type": "code",
279
      "execution_count": 7,
280
      "metadata": {
281
        "colab": {
282
          "base_uri": "https://localhost:8080/"
283
        },
284
        "id": "K9hdjy22zVuJ",
285
        "outputId": "0989fc50-6b31-4109-9a9f-a3445d607fcd"
286
      },
287
      "outputs": [
288
        {
289
          "data": {
290
            "text/plain": [
291
              "(299, 323, 382)"
292
            ]
293
          },
294
          "execution_count": 7,
295
          "metadata": {},
296
          "output_type": "execute_result"
297
        }
298
      ],
299
      "source": [
300
        "tiktoken_len(chunks[0]), tiktoken_len(chunks[1]), tiktoken_len(chunks[2])"
301
      ]
302
    },
303
    {
304
      "attachments": {},
305
      "cell_type": "markdown",
306
      "metadata": {
307
        "id": "SvApQNma0K8u"
308
      },
309
      "source": [
310
        "Using the `text_splitter` we get much better sized chunks of text. We'll use this functionality during the indexing process later. Now let's take a look at embedding.\n",
311
        "\n",
312
        "## Creating Embeddings\n",
313
        "\n",
314
        "Building embeddings using LangChain's OpenAI embedding support is fairly straightforward. We first need to add our [OpenAI API key](https://platform.openai.com) by running the next cell:"
315
      ]
316
    },
317
    {
318
      "cell_type": "code",
319
      "execution_count": 8,
320
      "metadata": {
321
        "colab": {
322
          "base_uri": "https://localhost:8080/"
323
        },
324
        "id": "dphi6CC33p62",
325
        "outputId": "b8a95521-bd7f-476e-c643-c712ee8dcc43"
326
      },
327
      "outputs": [],
328
      "source": [
329
        "import os\n",
330
        "\n",
331
        "# get openai api key from platform.openai.com\n",
332
        "OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or 'YOUR_API_KEY'"
333
      ]
334
    },
335
    {
336
      "attachments": {},
337
      "cell_type": "markdown",
338
      "metadata": {
339
        "id": "49hoj_ZS3wAr"
340
      },
341
      "source": [
342
        "*(Note that OpenAI is a paid service and so running the remainder of this notebook may incur some small cost)*\n",
343
        "\n",
344
        "After initializing the API key we can initialize our `text-embedding-ada-002` embedding model like so:"
345
      ]
346
    },
347
    {
348
      "cell_type": "code",
349
      "execution_count": 9,
350
      "metadata": {
351
        "id": "mBLIWLkLzyGi"
352
      },
353
      "outputs": [],
354
      "source": [
355
        "from langchain.embeddings.openai import OpenAIEmbeddings\n",
356
        "\n",
357
        "model_name = 'text-embedding-ada-002'\n",
358
        "\n",
359
        "embed = OpenAIEmbeddings(\n",
360
        "    model=model_name,\n",
361
        "    openai_api_key=OPENAI_API_KEY\n",
362
        ")"
363
      ]
364
    },
365
    {
366
      "attachments": {},
367
      "cell_type": "markdown",
368
      "metadata": {
369
        "id": "SwbZGT-v4iMi"
370
      },
371
      "source": [
372
        "Now we embed some text like so:"
373
      ]
374
    },
375
    {
376
      "cell_type": "code",
377
      "execution_count": 10,
378
      "metadata": {
379
        "colab": {
380
          "base_uri": "https://localhost:8080/"
381
        },
382
        "id": "vM-HuKtl4cyt",
383
        "outputId": "45e64ca2-ac56-42fc-ae57-098497ab645c"
384
      },
385
      "outputs": [
386
        {
387
          "data": {
388
            "text/plain": [
389
              "(2, 1536)"
390
            ]
391
          },
392
          "execution_count": 10,
393
          "metadata": {},
394
          "output_type": "execute_result"
395
        }
396
      ],
397
      "source": [
398
        "texts = [\n",
399
        "    'this is the first chunk of text',\n",
400
        "    'then another second chunk of text is here'\n",
401
        "]\n",
402
        "\n",
403
        "res = embed.embed_documents(texts)\n",
404
        "len(res), len(res[0])"
405
      ]
406
    },
407
    {
408
      "attachments": {},
409
      "cell_type": "markdown",
410
      "metadata": {
411
        "id": "QPUmWYSA43eC"
412
      },
413
      "source": [
414
        "From this we get *two* (aligning to our two chunks of text) 1536-dimensional embeddings.\n",
415
        "\n",
416
        "Now we move on to initializing our Pinecone vector database.\n",
417
        "\n",
418
        "## Vector Database\n",
419
        "\n",
420
        "To create our vector database we first need a [free API key from Pinecone](https://app.pinecone.io). Then we initialize like so:"
421
      ]
422
    },
423
    {
424
      "cell_type": "code",
425
      "execution_count": 11,
426
      "metadata": {},
427
      "outputs": [],
428
      "source": [
429
        "from pinecone import Pinecone\n",
430
        "\n",
431
        "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
432
        "api_key = os.getenv(\"PINECONE_API_KEY\") or \"YOUR_API_KEY\"\n",
433
        "\n",
434
        "# configure client\n",
435
        "pc = Pinecone(api_key=api_key)"
436
      ]
437
    },
438
    {
439
      "cell_type": "markdown",
440
      "metadata": {},
441
      "source": [
442
        "Now we set up our index specification, which lets us define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
443
      ]
444
    },
445
    {
446
      "cell_type": "code",
447
      "execution_count": 12,
448
      "metadata": {},
449
      "outputs": [],
450
      "source": [
451
        "from pinecone import ServerlessSpec\n",
452
        "\n",
453
        "spec = ServerlessSpec(\n",
454
        "    cloud=\"aws\", region=\"us-west-2\"\n",
455
        ")"
456
      ]
457
    },
458
    {
459
      "cell_type": "markdown",
460
      "metadata": {},
461
      "source": [
462
        "Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`."
463
      ]
464
    },
465
    {
466
      "cell_type": "code",
467
      "execution_count": 13,
468
      "metadata": {
469
        "colab": {
470
          "base_uri": "https://localhost:8080/"
471
        },
472
        "id": "9pT9C4nW4vwo",
473
        "outputId": "f4ae4545-6c50-4db5-8ce5-5e7e9840f512"
474
      },
475
      "outputs": [
476
        {
477
          "data": {
478
            "text/plain": [
479
              "{'dimension': 1536,\n",
480
              " 'index_fullness': 0.0,\n",
481
              " 'namespaces': {},\n",
482
              " 'total_vector_count': 0}"
483
            ]
484
          },
485
          "execution_count": 13,
486
          "metadata": {},
487
          "output_type": "execute_result"
488
        }
489
      ],
490
      "source": [
491
        "import time\n",
492
        "\n",
493
        "index_name = 'langchain-retrieval-augmentation'\n",
494
        "existing_indexes = [\n",
495
        "    index_info[\"name\"] for index_info in pc.list_indexes()\n",
496
        "]\n",
497
        "\n",
498
        "# check if index already exists (it shouldn't if this is first time)\n",
499
        "if index_name not in existing_indexes:\n",
500
        "    # if does not exist, create index\n",
501
        "    pc.create_index(\n",
502
        "        index_name,\n",
503
        "        dimension=1536,  # dimensionality of ada 002\n",
504
        "        metric='dotproduct',\n",
505
        "        spec=spec\n",
506
        "    )\n",
507
        "    # wait for index to be initialized\n",
508
        "    while not pc.describe_index(index_name).status['ready']:\n",
509
        "        time.sleep(1)\n",
510
        "\n",
511
        "# connect to index\n",
512
        "index = pc.Index(index_name)\n",
513
        "time.sleep(1)\n",
514
        "# view index stats\n",
515
        "index.describe_index_stats()"
516
      ]
517
    },
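The index above is created with `metric='dotproduct'`. As a minimal sketch of what ranking by dot product means, the toy example below scores a query against a handful of 3-dimensional vectors in plain Python. The names `dot`, `top_k`, and `toy_vectors` are hypothetical and only for illustration; this is not how Pinecone performs the search internally.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, vectors, k=2):
    """Return the k (id, score) pairs with the highest dot-product score."""
    scored = [(vec_id, dot(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# toy 3-dimensional "embeddings" (real ada-002 vectors have 1536 dimensions)
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.6, 0.8, 0.0],
    "c": [0.0, 0.0, 1.0],
}

results = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(results)
```

A higher score means the stored vector points in a direction more similar to the query, which is the ordering a similarity search returns.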
518
    {
519
      "attachments": {},
520
      "cell_type": "markdown",
521
      "metadata": {
522
        "id": "0RqIF2mIDwFu"
523
      },
524
      "source": [
525
        "We should see that the new Pinecone index has a `total_vector_count` of `0`, as we haven't added any vectors yet.\n",
526
        "\n",
527
        "## Indexing\n",
528
        "\n",
529
        "We can perform the indexing task using the LangChain vector store object, but for now it is much faster to do it via the Pinecone Python client directly. We will embed and upsert in batches of at least `100` chunks."
530
      ]
531
    },
532
    {
533
      "cell_type": "code",
534
      "execution_count": 14,
535
      "metadata": {
536
        "colab": {
537
          "base_uri": "https://localhost:8080/",
538
          "height": 49,
539
          "referenced_widgets": [
540
            "28a553d3a3704b3aa8b061b71b1fe2ee",
541
            "ee030d62f3a54f5288cccf954caa7d85",
542
            "55cdb4e0b33a48b298f760e7ff2af0f9",
543
            "9de7f27011b346f8b7a13fa649164ee7",
544
            "f362a565ff90457f904233d4fc625119",
545
            "059918bb59744634aaa181dc4ec256a2",
546
            "f762e8d37ab6441d87b2a66bfddd5239",
547
            "83ac28af70074e998663f6f247278a83",
548
            "3c6290e0ee42461eb47dfcc5d5cd0629",
549
            "88a2b48b3b4f415797bab96eaa925aa7",
550
            "c241146f1475404282c35bc09e7cc945"
551
          ]
552
        },
553
        "id": "W-cIOoTWGY1R",
554
        "outputId": "93e3a0b2-f00c-4872-bdf6-740a2d628735"
555
      },
556
      "outputs": [
557
        {
558
          "data": {
559
            "application/vnd.jupyter.widget-view+json": {
560
              "model_id": "afdf3e7e63554b59b4156e54fb607877",
561
              "version_major": 2,
562
              "version_minor": 0
563
            },
564
            "text/plain": [
565
              "  0%|          | 0/10000 [00:00<?, ?it/s]"
566
            ]
567
          },
568
          "metadata": {},
569
          "output_type": "display_data"
570
        }
571
      ],
572
      "source": [
573
        "from tqdm.auto import tqdm\n",
574
        "from uuid import uuid4\n",
575
        "\n",
576
        "batch_limit = 100\n",
577
        "\n",
578
        "texts = []\n",
579
        "metadatas = []\n",
580
        "\n",
581
        "for i, record in enumerate(tqdm(data)):\n",
582
        "    # first get metadata fields for this record\n",
583
        "    metadata = {\n",
584
        "        'wiki-id': str(record['id']),\n",
585
        "        'source': record['url'],\n",
586
        "        'title': record['title']\n",
587
        "    }\n",
588
        "    # now we create chunks from the record text\n",
589
        "    record_texts = text_splitter.split_text(record['text'])\n",
590
        "    # create individual metadata dicts for each chunk\n",
591
        "    record_metadatas = [{\n",
592
        "        \"chunk\": j, \"text\": text, **metadata\n",
593
        "    } for j, text in enumerate(record_texts)]\n",
594
        "    # append these to current batches\n",
595
        "    texts.extend(record_texts)\n",
596
        "    metadatas.extend(record_metadatas)\n",
597
        "    # if we have reached the batch_limit we can add texts\n",
598
        "    if len(texts) >= batch_limit:\n",
599
        "        ids = [str(uuid4()) for _ in range(len(texts))]\n",
600
        "        embeds = embed.embed_documents(texts)\n",
601
        "        index.upsert(vectors=zip(ids, embeds, metadatas))\n",
602
        "        texts = []\n",
603
        "        metadatas = []\n",
604
        "\n",
605
        "if len(texts) > 0:\n",
606
        "    ids = [str(uuid4()) for _ in range(len(texts))]\n",
607
        "    embeds = embed.embed_documents(texts)\n",
608
        "    index.upsert(vectors=zip(ids, embeds, metadatas))"
609
      ]
610
    },
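The loop above flushes a batch whenever at least `batch_limit` chunks have accumulated, so batch sizes vary with article length. If fixed-size batches are preferred, one possible alternative is a small generator like the one sketched below. `batched` is a hypothetical helper, not part of the Pinecone or LangChain APIs.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 250 dummy items split into batches of 100
sizes = [len(batch) for batch in batched(range(250), 100)]
print(sizes)
```

Each yielded batch could then be embedded and upserted in one call, with the final, smaller batch handled by the same code path instead of a separate block after the loop.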
611
    {
612
      "attachments": {},
613
      "cell_type": "markdown",
614
      "metadata": {
615
        "id": "XaF3daSxyCwB"
616
      },
617
      "source": [
618
        "We've now indexed everything. We can check the number of vectors in our index like so:"
619
      ]
620
    },
621
    {
622
      "cell_type": "code",
623
      "execution_count": 15,
624
      "metadata": {
625
        "colab": {
626
          "base_uri": "https://localhost:8080/"
627
        },
628
        "id": "CaEBhsAM22M3",
629
        "outputId": "b647b1d1-809d-40d1-ff24-0772bc2506fc"
630
      },
631
      "outputs": [
632
        {
633
          "data": {
634
            "text/plain": [
635
              "{'dimension': 1536,\n",
636
              " 'index_fullness': 0.0,\n",
637
              " 'namespaces': {'': {'vector_count': 28422}},\n",
638
              " 'total_vector_count': 28422}"
639
            ]
640
          },
641
          "execution_count": 15,
642
          "metadata": {},
643
          "output_type": "execute_result"
644
        }
645
      ],
646
      "source": [
647
        "index.describe_index_stats()"
648
      ]
649
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "-8P2PryCy8W3"
      },
      "source": [
        "## Creating a Vector Store and Querying\n",
        "\n",
        "Now that we've built our index we can switch back over to LangChain. We start by initializing a vector store using the same index we just built. We do that like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "qMXlvXOAyJHy"
      },
      "outputs": [],
      "source": [
        "from langchain.vectorstores import Pinecone\n",
        "\n",
        "text_field = \"text\"  # the metadata field that contains our text\n",
        "\n",
        "# initialize the vector store object; pass the Embeddings object itself,\n",
        "# since passing a callable such as embed.embed_query is deprecated\n",
        "vectorstore = Pinecone(\n",
        "    index, embed, text_field\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "COT5s7hcyPiq",
        "outputId": "29dfe2c3-2cc7-473d-f702-ad5c4e1fa32c"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[Document(page_content='Benito Amilcare Andrea Mussolini KSMOM GCTE (29 July 1883 – 28 April 1945) was an Italian politician and journalist. He was also the Prime Minister of Italy from 1922 until 1943. He was the leader of the National Fascist Party.\\n\\nBiography\\n\\nEarly life\\nBenito Mussolini was named after Benito Juarez, a Mexican opponent of the political power of the Roman Catholic Church, by his anticlerical (a person who opposes the political interference of the Roman Catholic Church in secular affairs) father. Mussolini\\'s father was a blacksmith. Before being involved in politics, Mussolini was a newspaper editor (where he learned all his propaganda skills) and elementary school teacher.\\n\\nAt first, Mussolini was a socialist, but when he wanted Italy to join the First World War, he was thrown out of the socialist party. He \\'invented\\' a new ideology, Fascism, much out of Nationalist\\xa0and Conservative views.\\n\\nRise to power and becoming dictator\\nIn 1922, he took power by having a large group of men, \"Black Shirts,\" march on Rome and threaten to take over the government. King Vittorio Emanuele III gave in, allowed him to form a government, and made him prime minister. In the following five years, he gained power, and in 1927 created the OVRA, his personal secret police force. Using the agency to arrest, scare, or murder people against his regime, Mussolini was dictator\\xa0of Italy by the end of 1927. Only the King and his own Fascist party could challenge his power.', metadata={'chunk': 0.0, 'source': 'https://simple.wikipedia.org/wiki/Benito%20Mussolini', 'title': 'Benito Mussolini', 'wiki-id': '6754'}),\n",
              " Document(page_content='Fascism as practiced by Mussolini\\nMussolini\\'s form of Fascism, \"Italian Fascism\"- unlike Nazism, the racist ideology that Adolf Hitler followed- was different and less destructive than Hitler\\'s. Although a believer in the superiority of the Italian nation and national unity, Mussolini, unlike Hitler, is quoted \"Race? It is a feeling, not a reality. Nothing will ever make me believe that biologically pure races can be shown to exist today\".\\n\\nMussolini wanted Italy to become a new Roman Empire. In 1923, he attacked the island of Corfu, and in 1924, he occupied the city state of Fiume. In 1935, he attacked the African country Abyssinia (now called Ethiopia). His forces occupied it in 1936. Italy was thrown out of the League of Nations because of this aggression. In 1939, he occupied the country Albania. In 1936, Mussolini signed an alliance with Adolf Hitler, the dictator of Germany.\\n\\nFall from power and death\\nIn 1940, he sent Italy into the Second World War on the side of the Axis countries. Mussolini attacked Greece, but he failed to conquer it. In 1943, the Allies landed in Southern Italy. The Fascist party and King Vittorio Emanuel III deposed Mussolini and put him in jail, but he was set free by the Germans, who made him ruler of the Italian Social Republic puppet state which was in a small part of Central Italy. When the war was almost over, Mussolini tried to escape to Switzerland with his mistress, Clara Petacci, but they were both captured and shot by partisans. Mussolini\\'s dead body was hanged upside-down, together with his mistress and some of Mussolini\\'s helpers, on a pole at a gas station in the village of Millan, which is near the border  between Italy and Switzerland.', metadata={'chunk': 1.0, 'source': 'https://simple.wikipedia.org/wiki/Benito%20Mussolini', 'title': 'Benito Mussolini', 'wiki-id': '6754'}),\n",
              " Document(page_content='Veneto was made part of Italy in 1866 after a war with Austria. Italian soldiers won Latium in 1870. That was when they took away the Pope\\'s power. The Pope, who was angry, said that he was a prisoner to keep Catholic people from being active in politics. That was the year of Italian unification.\\n\\nItaly participated in World War I. It was an ally of Great Britain, France, and Russia against the Central Powers. Almost all of Italy\\'s fighting was on the Eastern border, near Austria. After the \"Caporetto defeat\", Italy thought they would lose the war. But, in 1918, the Central Powers surrendered. Italy gained the Trentino-South Tyrol, which once was owned by Austria.\\n\\nFascist Italy \\nIn 1922, a new Italian government started. It was ruled by Benito Mussolini, the leader of Fascism in Italy. He became head of government and dictator, calling himself \"Il Duce\" (which means \"leader\" in Italian). He became friends with German dictator Adolf Hitler. Germany, Japan, and Italy became the Axis Powers. In 1940, they entered World War II together against France, Great Britain, and later the Soviet Union. During the war, Italy controlled most of the Mediterranean Sea.', metadata={'chunk': 5.0, 'source': 'https://simple.wikipedia.org/wiki/Italy', 'title': 'Italy', 'wiki-id': '363'})]"
            ]
          },
          "execution_count": 19,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "query = \"who was Benito Mussolini?\"\n",
        "\n",
        "vectorstore.similarity_search(\n",
        "    query,  # our search query\n",
        "    k=3  # return 3 most relevant docs\n",
        ")"
      ]
    },
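    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Conceptually, `similarity_search` embeds the query and returns the `k` stored vectors that score highest against it. A toy version of that ranking step is sketched below. It assumes cosine similarity and made-up three-dimensional vectors, whereas the real embeddings here have 1536 dimensions and the index may use a different metric. The `cosine` and `top_k` helpers and the document ids are hypothetical and are not part of LangChain or Pinecone.\n",
        "\n",
        "```python\n",
        "import math\n",
        "\n",
        "def cosine(a, b):\n",
        "    # Cosine similarity between two equal-length vectors.\n",
        "    dot = sum(x * y for x, y in zip(a, b))\n",
        "    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))\n",
        "    return dot / norm\n",
        "\n",
        "def top_k(query_vec, store, k):\n",
        "    # store maps a document id to its vector; return the k best-scoring ids.\n",
        "    ranked = sorted(store, key=lambda doc_id: cosine(query_vec, store[doc_id]), reverse=True)\n",
        "    return ranked[:k]\n",
        "\n",
        "store = {\n",
        "    'mussolini-0': [0.9, 0.1, 0.0],\n",
        "    'italy-5': [0.7, 0.3, 0.1],\n",
        "    'cooking-2': [0.0, 0.2, 0.9],\n",
        "}\n",
        "print(top_k([1.0, 0.0, 0.0], store, k=2))  # ['mussolini-0', 'italy-5']\n",
        "```\n",
        "\n",
        "A managed vector index avoids scoring every stored vector this way, but the result has the same shape: the ids (and metadata) of the closest matches."
      ]
    },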
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "ZCvtmREd0pdo"
      },
      "source": [
        "All of these are relevant results. But what can we do with them? There are many possible tasks; one of the most interesting (and well supported by LangChain) is _\"Generative Question-Answering\"_, or GQA.\n",
        "\n",
        "## Generative Question-Answering\n",
        "\n",
        "In GQA we pass the query to an LLM as a question, but the LLM must base its answer on the information returned from the `vectorstore`.\n",
        "\n",
        "To do this we initialize a `RetrievalQA` object like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "moCvQR-p0Zsb"
      },
      "outputs": [],
      "source": [
        "from langchain.chat_models import ChatOpenAI\n",
        "from langchain.chains import RetrievalQA\n",
        "\n",
        "# completion llm\n",
        "llm = ChatOpenAI(\n",
        "    openai_api_key=OPENAI_API_KEY,\n",
        "    model_name='gpt-3.5-turbo',\n",
        "    temperature=0.0\n",
        ")\n",
        "\n",
        "qa = RetrievalQA.from_chain_type(\n",
        "    llm=llm,\n",
        "    chain_type=\"stuff\",\n",
        "    retriever=vectorstore.as_retriever()\n",
        ")"
      ]
    },
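    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "With the `stuff` chain type, all retrieved documents are placed (\"stuffed\") into a single prompt alongside the question. The sketch below illustrates only that idea. The prompt wording and the `build_stuff_prompt` helper are assumptions made for this example; they are not LangChain's actual template or API.\n",
        "\n",
        "```python\n",
        "def build_stuff_prompt(question, docs):\n",
        "    # Number each retrieved chunk and place all of them in one context string.\n",
        "    context = ' '.join('[{}] {}'.format(i + 1, doc) for i, doc in enumerate(docs))\n",
        "    return 'Context: {} Question: {} Answer:'.format(context, question)\n",
        "\n",
        "docs = [\n",
        "    'Mussolini was the Prime Minister of Italy from 1922 until 1943.',\n",
        "    'He was the leader of the National Fascist Party.',\n",
        "]\n",
        "print(build_stuff_prompt('who was Benito Mussolini?', docs))\n",
        "```\n",
        "\n",
        "Because every chunk goes into one prompt, this approach is simple but limited by the context window of the model, so the number and size of retrieved chunks matter."
      ]
    },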
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 71
        },
        "id": "KS9sa19K3LkQ",
        "outputId": "e8bc7b0a-1e41-4efb-e383-549ea42ac525"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "'Benito Mussolini was an Italian politician and journalist who served as the Prime Minister of Italy from 1922 until 1943. He was the leader of the National Fascist Party and played a significant role in the rise of fascism in Italy. Mussolini established a dictatorship and implemented policies that aimed to create a new Roman Empire. He allied with Adolf Hitler and led Italy into World War II as part of the Axis Powers. Mussolini was eventually deposed and captured by Italian partisans in 1945, and he was executed by firing squad.'"
779
            ]
780
          },
781
          "execution_count": 21,
782
          "metadata": {},
783
          "output_type": "execute_result"
784
        }
785
      ],
786
      "source": [
787
        "qa.run(query)"
788
      ]
789
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "0qf5e3xf3ggq"
      },
      "source": [
        "We can also return the sources of information that the LLM used to answer our question. To do this we use a slightly different version of `RetrievalQA` called `RetrievalQAWithSourcesChain`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "id": "aYVMGDA13cTz"
      },
      "outputs": [],
      "source": [
        "from langchain.chains import RetrievalQAWithSourcesChain\n",
        "\n",
        "qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(\n",
        "    llm=llm,\n",
        "    chain_type=\"stuff\",\n",
        "    retriever=vectorstore.as_retriever()\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "RXsVEh3S4ZJO",
        "outputId": "c8677998-ddc1-485b-d8a5-85bc9b7a3af7"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'question': 'who was Benito Mussolini?',\n",
832
              " 'answer': \"Benito Mussolini was an Italian politician and journalist who served as the Prime Minister of Italy from 1922 until 1943. He was the leader of the National Fascist Party and played a significant role in the rise of fascism in Italy. Mussolini's form of fascism, known as Italian Fascism, differed from Hitler's Nazism and was focused on the idea of creating a new Roman Empire. He pursued expansionist policies, including the occupation of Abyssinia (Ethiopia) and Albania. Mussolini aligned Italy with Nazi Germany and entered World War II as part of the Axis Powers. However, Italy faced defeats and internal unrest, leading to Mussolini's deposition and imprisonment in 1943. He was later freed by the Germans and became the ruler of the Italian Social Republic puppet state. Towards the end of the war, Mussolini attempted to escape but was captured and executed by partisans. His body was publicly displayed in Milan. Mussolini's granddaughter, Alessandra Mussolini, has been involved in Neo-Fascist movements in Italy. \\n\",\n",
833
              " 'sources': 'https://simple.wikipedia.org/wiki/Benito%20Mussolini'}"
834
            ]
835
          },
836
          "execution_count": 23,
837
          "metadata": {},
838
          "output_type": "execute_result"
839
        }
840
      ],
841
      "source": [
842
        "qa_with_sources(query)"
843
      ]
844
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "nMRk_P3Q7l5J"
      },
      "source": [
        "Now we answer the question *and* return the source of the information the LLM used."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Delete the index to save resources when you're done!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {},
      "outputs": [],
      "source": [
        "pc.delete_index(index_name)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {
        "id": "ehJEn68qADoH"
      },
      "source": [
        "---"
      ]
    }
  ],
882
  "metadata": {
883
    "colab": {
884
      "provenance": []
885
    },
886
    "kernelspec": {
887
      "display_name": "Python 3",
888
      "name": "python3"
889
    },
890
    "language_info": {
891
      "codemirror_mode": {
892
        "name": "ipython",
893
        "version": 3
894
      },
895
      "file_extension": ".py",
896
      "mimetype": "text/x-python",
897
      "name": "python",
898
      "nbconvert_exporter": "python",
899
      "pygments_lexer": "ipython3",
900
      "version": "3.9.12"
901
    },
902
    "widgets": {
903
      "application/vnd.jupyter.widget-state+json": {
        "059918bb59744634aaa181dc4ec256a2": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "28a553d3a3704b3aa8b061b71b1fe2ee": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HBoxModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_ee030d62f3a54f5288cccf954caa7d85",
              "IPY_MODEL_55cdb4e0b33a48b298f760e7ff2af0f9",
              "IPY_MODEL_9de7f27011b346f8b7a13fa649164ee7"
            ],
            "layout": "IPY_MODEL_f362a565ff90457f904233d4fc625119"
          }
        },
        "3c6290e0ee42461eb47dfcc5d5cd0629": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "ProgressStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "55cdb4e0b33a48b298f760e7ff2af0f9": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "FloatProgressModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_83ac28af70074e998663f6f247278a83",
            "max": 10000,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_3c6290e0ee42461eb47dfcc5d5cd0629",
            "value": 10000
          }
        },
        "83ac28af70074e998663f6f247278a83": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "88a2b48b3b4f415797bab96eaa925aa7": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "9de7f27011b346f8b7a13fa649164ee7": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_88a2b48b3b4f415797bab96eaa925aa7",
            "placeholder": "\u200b",
            "style": "IPY_MODEL_c241146f1475404282c35bc09e7cc945",
            "value": " 10000/10000 [03:52&lt;00:00, 79.57it/s]"
          }
        },
        "c241146f1475404282c35bc09e7cc945": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "ee030d62f3a54f5288cccf954caa7d85": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "HTMLModel",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_059918bb59744634aaa181dc4ec256a2",
            "placeholder": "\u200b",
            "style": "IPY_MODEL_f762e8d37ab6441d87b2a66bfddd5239",
            "value": "100%"
          }
        },
        "f362a565ff90457f904233d4fc625119": {
          "model_module": "@jupyter-widgets/base",
          "model_module_version": "1.2.0",
          "model_name": "LayoutModel",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "f762e8d37ab6441d87b2a66bfddd5239": {
          "model_module": "@jupyter-widgets/controls",
          "model_module_version": "1.5.0",
          "model_name": "DescriptionStyleModel",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        }
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}