1
{
2
 "cells": [
3
  {
4
   "attachments": {},
5
   "cell_type": "markdown",
6
   "metadata": {},
7
   "source": [
8
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/rag-chatbot.ipynb)"
9
   ]
10
  },
11
  {
12
   "attachments": {},
13
   "cell_type": "markdown",
14
   "metadata": {},
15
   "source": [
16
    "# Building RAG Chatbots with LangChain"
17
   ]
18
  },
19
  {
20
   "attachments": {},
21
   "cell_type": "markdown",
22
   "metadata": {},
23
   "source": [
24
    "In this example, we'll work on building an AI chatbot from start-to-finish. We will be using LangChain, OpenAI, and Pinecone vector DB, to build a chatbot capable of learning from the external world using **R**etrieval **A**ugmented **G**eneration (RAG).\n",
25
    "\n",
26
    "We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.\n",
27
    "\n",
28
    "By the end of the example we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.\n",
29
    "\n",
30
    "### Before you begin\n",
31
    "\n",
32
    "You'll need to get an [OpenAI API key](https://platform.openai.com/account/api-keys) and [Pinecone API key](https://app.pinecone.io)."
33
   ]
34
  },
35
  {
36
   "attachments": {},
37
   "cell_type": "markdown",
38
   "metadata": {},
39
   "source": [
40
    "### Prerequisites"
41
   ]
42
  },
43
  {
44
   "attachments": {},
45
   "cell_type": "markdown",
46
   "metadata": {},
47
   "source": [
48
    "Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:\n",
49
    "\n",
50
    "- **langchain**: This is a library for GenAI. We'll use it to chain together different language models and components for our chatbot.\n",
51
    "- **openai**: This is the official OpenAI Python client. We'll use it to interact with the OpenAI API and generate responses for our chatbot.\n",
52
    "- **datasets**: This library provides a vast array of datasets for machine learning. We'll use it to load our knowledge base for the chatbot.\n",
53
    "- **pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.\n",
54
    "\n",
55
    "You can install these libraries using pip like so:"
56
   ]
57
  },
58
  {
59
   "cell_type": "code",
60
   "execution_count": null,
61
   "metadata": {},
62
   "outputs": [],
63
   "source": [
64
    "!pip install -qU \\\n",
65
    "    langchain==0.0.354 \\\n",
66
    "    openai==1.6.1 \\\n",
67
    "    datasets==2.10.1 \\\n",
68
    "    pinecone-client==3.1.0 \\\n",
69
    "    tiktoken==0.5.2"
70
   ]
71
  },
72
  {
73
   "attachments": {},
74
   "cell_type": "markdown",
75
   "metadata": {},
76
   "source": [
77
    "### Building a Chatbot (no RAG)"
78
   ]
79
  },
80
  {
81
   "attachments": {},
82
   "cell_type": "markdown",
83
   "metadata": {},
84
   "source": [
85
    "We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a `ChatOpenAI` object. For this we do need an [OpenAI API key](https://platform.openai.com/account/api-keys)."
86
   ]
87
  },
88
  {
89
   "cell_type": "code",
90
   "execution_count": 1,
91
   "metadata": {},
92
   "outputs": [],
93
   "source": [
94
    "import os\n",
95
    "from langchain.chat_models import ChatOpenAI\n",
96
    "\n",
97
    "os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"OPENAI_API_KEY\") or \"YOUR_API_KEY\"\n",
98
    "\n",
99
    "chat = ChatOpenAI(\n",
100
    "    openai_api_key=os.environ[\"OPENAI_API_KEY\"],\n",
101
    "    model='gpt-3.5-turbo'\n",
102
    ")"
103
   ]
104
  },
105
  {
106
   "attachments": {},
107
   "cell_type": "markdown",
108
   "metadata": {},
109
   "source": [
110
    "Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically structured (in plain text) like this:\n",
111
    "\n",
112
    "```\n",
113
    "System: You are a helpful assistant.\n",
114
    "\n",
115
    "User: Hi AI, how are you today?\n",
116
    "\n",
117
    "Assistant: I'm great thank you. How can I help you?\n",
118
    "\n",
119
    "User: I'd like to understand string theory.\n",
120
    "\n",
121
    "Assistant:\n",
122
    "```\n",
123
    "\n",
124
    "The final `\"Assistant:\"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:\n",
125
    "\n",
126
    "```python\n",
127
    "[\n",
128
    "    {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
129
    "    {\"role\": \"user\", \"content\": \"Hi AI, how are you today?\"},\n",
130
    "    {\"role\": \"assistant\", \"content\": \"I'm great thank you. How can I help you?\"}\n",
131
    "    {\"role\": \"user\", \"content\": \"I'd like to understand string theory.\"}\n",
132
    "]\n",
133
    "```\n",
134
    "\n",
135
    "In LangChain there is a slightly different format. We use three _message_ objects like so:"
136
   ]
137
  },
138
  {
139
   "cell_type": "code",
140
   "execution_count": 2,
141
   "metadata": {},
142
   "outputs": [],
143
   "source": [
144
    "from langchain.schema import (\n",
145
    "    SystemMessage,\n",
146
    "    HumanMessage,\n",
147
    "    AIMessage\n",
148
    ")\n",
149
    "\n",
150
    "messages = [\n",
151
    "    SystemMessage(content=\"You are a helpful assistant.\"),\n",
152
    "    HumanMessage(content=\"Hi AI, how are you today?\"),\n",
153
    "    AIMessage(content=\"I'm great thank you. How can I help you?\"),\n",
154
    "    HumanMessage(content=\"I'd like to understand string theory.\")\n",
155
    "]"
156
   ]
157
  },
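  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For comparison, the same conversation could be sent directly with the `openai` v1 client installed earlier. This is just a reference sketch (it assumes the `OPENAI_API_KEY` environment variable set above); we will stick with the LangChain abstractions for the rest of this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from openai import OpenAI\n",
    "\n",
    "# reference sketch: the raw OpenAI client equivalent of the LangChain messages above\n",
    "client = OpenAI()  # reads OPENAI_API_KEY from the environment\n",
    "\n",
    "completion = client.chat.completions.create(\n",
    "    model=\"gpt-3.5-turbo\",\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
    "        {\"role\": \"user\", \"content\": \"Hi AI, how are you today?\"},\n",
    "        {\"role\": \"assistant\", \"content\": \"I'm great thank you. How can I help you?\"},\n",
    "        {\"role\": \"user\", \"content\": \"I'd like to understand string theory.\"}\n",
    "    ]\n",
    ")\n",
    "print(completion.choices[0].message.content)"
   ]
  },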
158
  {
159
   "attachments": {},
160
   "cell_type": "markdown",
161
   "metadata": {},
162
   "source": [
163
    "The format is very similar, we're just swapped the role of `\"user\"` for `HumanMessage`, and the role of `\"assistant\"` for `AIMessage`.\n",
164
    "\n",
165
    "We generate the next response from the AI by passing these messages to the `ChatOpenAI` object."
166
   ]
167
  },
168
  {
169
   "cell_type": "code",
170
   "execution_count": 3,
171
   "metadata": {},
172
   "outputs": [
173
    {
174
     "data": {
175
      "text/plain": [
176
       "AIMessage(content='String theory is a theoretical framework in physics that aims to explain the fundamental nature of particles and their interactions. According to string theory, the basic building blocks of the universe are not particles, but tiny vibrating strings.\\n\\nHere are some key points to help you understand string theory:\\n\\n1. Fundamental Particles: In traditional particle physics, elementary particles are considered point-like entities. In string theory, they are replaced by tiny, one-dimensional strings. These strings can vibrate in different modes, producing different particle properties such as mass and charge.\\n\\n2. Extra Dimensions: String theory suggests that there may be extra dimensions beyond the three spatial dimensions (length, width, and height) that we experience. These additional dimensions are believed to be curled up or compactified at incredibly small scales, making them invisible to our current observations.\\n\\n3. Unification of Forces: String theory has the potential to unify the four fundamental forces in physics: gravity, electromagnetism, and the strong and weak nuclear forces. By considering the vibrational patterns of strings, it becomes possible to describe these forces as different manifestations of a single underlying theory.\\n\\n4. Multiverse and String Landscape: String theory proposes the existence of multiple possible solutions or configurations, known as the \"string landscape.\" Each configuration corresponds to a possible universe with its own physical laws and properties. This idea leads to the concept of a multiverse, where our universe is just one among many.\\n\\n5. Challenges and Research: Despite its promising potential, string theory is still a work in progress and faces several challenges. For example, it has yet to make testable predictions that can be experimentally verified. Researchers continue to explore and refine the theory through mathematical calculations and simulations.\\n\\nUnderstanding string theory requires a strong background in physics and mathematics, as it is a complex and mathematically rigorous subject. It is an active area of research, and scientists are continually working to uncover its implications and validate its predictions.\\n\\nPlease note that this is a simplified overview of string theory, and there are many more intricate details and concepts involved.')"
177
      ]
178
     },
179
     "execution_count": 3,
180
     "metadata": {},
181
     "output_type": "execute_result"
182
    }
183
   ],
184
   "source": [
185
    "res = chat(messages)\n",
186
    "res"
187
   ]
188
  },
189
  {
190
   "attachments": {},
191
   "cell_type": "markdown",
192
   "metadata": {},
193
   "source": [
194
    "In response we get another AI message object. We can print it more clearly like so:"
195
   ]
196
  },
197
  {
198
   "cell_type": "code",
199
   "execution_count": 4,
200
   "metadata": {},
201
   "outputs": [
202
    {
203
     "name": "stdout",
204
     "output_type": "stream",
205
     "text": [
206
      "String theory is a theoretical framework in physics that aims to explain the fundamental nature of particles and their interactions. According to string theory, the basic building blocks of the universe are not particles, but tiny vibrating strings.\n",
207
      "\n",
208
      "Here are some key points to help you understand string theory:\n",
209
      "\n",
210
      "1. Fundamental Particles: In traditional particle physics, elementary particles are considered point-like entities. In string theory, they are replaced by tiny, one-dimensional strings. These strings can vibrate in different modes, producing different particle properties such as mass and charge.\n",
211
      "\n",
212
      "2. Extra Dimensions: String theory suggests that there may be extra dimensions beyond the three spatial dimensions (length, width, and height) that we experience. These additional dimensions are believed to be curled up or compactified at incredibly small scales, making them invisible to our current observations.\n",
213
      "\n",
214
      "3. Unification of Forces: String theory has the potential to unify the four fundamental forces in physics: gravity, electromagnetism, and the strong and weak nuclear forces. By considering the vibrational patterns of strings, it becomes possible to describe these forces as different manifestations of a single underlying theory.\n",
215
      "\n",
216
      "4. Multiverse and String Landscape: String theory proposes the existence of multiple possible solutions or configurations, known as the \"string landscape.\" Each configuration corresponds to a possible universe with its own physical laws and properties. This idea leads to the concept of a multiverse, where our universe is just one among many.\n",
217
      "\n",
218
      "5. Challenges and Research: Despite its promising potential, string theory is still a work in progress and faces several challenges. For example, it has yet to make testable predictions that can be experimentally verified. Researchers continue to explore and refine the theory through mathematical calculations and simulations.\n",
219
      "\n",
220
      "Understanding string theory requires a strong background in physics and mathematics, as it is a complex and mathematically rigorous subject. It is an active area of research, and scientists are continually working to uncover its implications and validate its predictions.\n",
221
      "\n",
222
      "Please note that this is a simplified overview of string theory, and there are many more intricate details and concepts involved.\n"
223
     ]
224
    }
225
   ],
226
   "source": [
227
    "print(res.content)"
228
   ]
229
  },
230
  {
231
   "attachments": {},
232
   "cell_type": "markdown",
233
   "metadata": {},
234
   "source": [
235
    "Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation."
236
   ]
237
  },
238
  {
239
   "cell_type": "code",
240
   "execution_count": 5,
241
   "metadata": {},
242
   "outputs": [
243
    {
244
     "name": "stdout",
245
     "output_type": "stream",
246
     "text": [
247
      "Physicists believe that string theory has the potential to produce a unified theory because it incorporates all the fundamental forces of nature within a single framework. The four fundamental forces in physics are gravity, electromagnetism, and the strong and weak nuclear forces. \n",
248
      "\n",
249
      "In the standard model of particle physics, these forces are described by different theories that are not fully compatible with each other. For example, gravity is described by general relativity, which explains the behavior of massive objects on large scales, while the other forces are described by quantum field theories.\n",
250
      "\n",
251
      "String theory, on the other hand, provides a framework that can potentially reconcile and unify these forces. The vibrations of the tiny strings in string theory can give rise to different types of particles, and the way these particles interact can reproduce the behavior of the various fundamental forces. This means that, in principle, all the forces of nature could be described by a single underlying theory.\n",
252
      "\n",
253
      "Additionally, string theory incorporates gravity naturally, unlike the other forces, which have been challenging to reconcile with gravity in other theoretical frameworks. By including gravity within the framework of string theory, physicists hope to achieve a consistent and unified description of all the fundamental forces.\n",
254
      "\n",
255
      "However, it is important to note that achieving a complete and fully validated unified theory based on string theory is still a work in progress. Many challenges and open questions remain, and further research and experimentation are needed to test and refine the theory.\n"
256
     ]
257
    }
258
   ],
259
   "source": [
260
    "# add latest AI response to messages\n",
261
    "messages.append(res)\n",
262
    "\n",
263
    "# now create a new user prompt\n",
264
    "prompt = HumanMessage(\n",
265
    "    content=\"Why do physicists believe it can produce a 'unified theory'?\"\n",
266
    ")\n",
267
    "# add to messages\n",
268
    "messages.append(prompt)\n",
269
    "\n",
270
    "# send to chat-gpt\n",
271
    "res = chat(messages)\n",
272
    "\n",
273
    "print(res.content)"
274
   ]
275
  },
276
  {
277
   "attachments": {},
278
   "cell_type": "markdown",
279
   "metadata": {},
280
   "source": [
281
    "### Dealing with Hallucinations"
282
   ]
283
  },
284
  {
285
   "attachments": {},
286
   "cell_type": "markdown",
287
   "metadata": {},
288
   "source": [
289
    "We have our chatbot, but as mentioned — the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the \"world\" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.\n",
290
    "\n",
291
    "By default, LLMs have no access to the external world.\n",
292
    "\n",
293
    "The result of this is very clear when we ask LLMs about more recent information, like about the new (and very popular) Llama 2 LLM."
294
   ]
295
  },
296
  {
297
   "cell_type": "code",
298
   "execution_count": 6,
299
   "metadata": {},
300
   "outputs": [],
301
   "source": [
302
    "# add latest AI response to messages\n",
303
    "messages.append(res)\n",
304
    "\n",
305
    "# now create a new user prompt\n",
306
    "prompt = HumanMessage(\n",
307
    "    content=\"What is so special about Llama 2?\"\n",
308
    ")\n",
309
    "# add to messages\n",
310
    "messages.append(prompt)\n",
311
    "\n",
312
    "# send to OpenAI\n",
313
    "res = chat(messages)"
314
   ]
315
  },
316
  {
317
   "cell_type": "code",
318
   "execution_count": 7,
319
   "metadata": {},
320
   "outputs": [
321
    {
322
     "name": "stdout",
323
     "output_type": "stream",
324
     "text": [
325
      "I'm sorry, but I don't have any specific information about \"Llama 2.\" It could be a reference to something specific that I am not aware of. Could you please provide more context or clarify your question?\n"
326
     ]
327
    }
328
   ],
329
   "source": [
330
    "print(res.content)"
331
   ]
332
  },
333
  {
334
   "attachments": {},
335
   "cell_type": "markdown",
336
   "metadata": {},
337
   "source": [
338
    "Our chatbot can no longer help us, it doesn't contain the information we need to answer the question. It was very clear from this answer that the LLM doesn't know the informaiton, but sometimes an LLM may respond like it _does_ know the answer — and this can be very hard to detect.\n",
339
    "\n",
340
    "OpenAI have since adjusted the behavior for this particular example as we can see below:"
341
   ]
342
  },
343
  {
344
   "cell_type": "code",
345
   "execution_count": 8,
346
   "metadata": {},
347
   "outputs": [],
348
   "source": [
349
    "# add latest AI response to messages\n",
350
    "messages.append(res)\n",
351
    "\n",
352
    "# now create a new user prompt\n",
353
    "prompt = HumanMessage(\n",
354
    "    content=\"Can you tell me about the LLMChain in LangChain?\"\n",
355
    ")\n",
356
    "# add to messages\n",
357
    "messages.append(prompt)\n",
358
    "\n",
359
    "# send to OpenAI\n",
360
    "res = chat(messages)"
361
   ]
362
  },
363
  {
364
   "cell_type": "code",
365
   "execution_count": 9,
366
   "metadata": {},
367
   "outputs": [
368
    {
369
     "name": "stdout",
370
     "output_type": "stream",
371
     "text": [
372
      "I apologize, but I couldn't find any specific information about \"LLMChain\" or \"LangChain.\" It's possible that these terms are specific to a particular context or project that I am not familiar with. If you can provide more details or clarify your question, I'll do my best to assist you.\n"
373
     ]
374
    }
375
   ],
376
   "source": [
377
    "print(res.content)"
378
   ]
379
  },
380
  {
381
   "attachments": {},
382
   "cell_type": "markdown",
383
   "metadata": {},
384
   "source": [
385
    "There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation."
386
   ]
387
  },
388
  {
389
   "cell_type": "code",
390
   "execution_count": 10,
391
   "metadata": {},
392
   "outputs": [],
393
   "source": [
394
    "llmchain_information = [\n",
395
    "    \"A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.\",\n",
396
    "    \"Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.\",\n",
397
    "    \"LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications.\"\n",
398
    "]\n",
399
    "\n",
400
    "source_knowledge = \"\\n\".join(llmchain_information)"
401
   ]
402
  },
403
  {
404
   "attachments": {},
405
   "cell_type": "markdown",
406
   "metadata": {},
407
   "source": [
408
    "We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query."
409
   ]
410
  },
411
  {
412
   "cell_type": "code",
413
   "execution_count": 11,
414
   "metadata": {},
415
   "outputs": [],
416
   "source": [
417
    "query = \"Can you tell me about the LLMChain in LangChain?\"\n",
418
    "\n",
419
    "augmented_prompt = f\"\"\"Using the contexts below, answer the query.\n",
420
    "\n",
421
    "Contexts:\n",
422
    "{source_knowledge}\n",
423
    "\n",
424
    "Query: {query}\"\"\""
425
   ]
426
  },
427
  {
428
   "attachments": {},
429
   "cell_type": "markdown",
430
   "metadata": {},
431
   "source": [
432
    "Now we feed this into our chatbot as we were before."
433
   ]
434
  },
435
  {
436
   "cell_type": "code",
437
   "execution_count": 12,
438
   "metadata": {},
439
   "outputs": [],
440
   "source": [
441
    "# create a new user prompt\n",
442
    "prompt = HumanMessage(\n",
443
    "    content=augmented_prompt\n",
444
    ")\n",
445
    "# add to messages\n",
446
    "messages.append(prompt)\n",
447
    "\n",
448
    "# send to OpenAI\n",
449
    "res = chat(messages)"
450
   ]
451
  },
452
  {
453
   "cell_type": "code",
454
   "execution_count": 13,
455
   "metadata": {},
456
   "outputs": [
457
    {
458
     "name": "stdout",
459
     "output_type": "stream",
460
     "text": [
461
      "The LLMChain is a specific type of chain within the LangChain framework. It is designed to facilitate the development of applications powered by language models. \n",
462
      "\n",
463
      "A LLMChain consists of three main components: a PromptTemplate, a language model (either an LLM or a ChatModel), and an optional output parser. \n",
464
      "\n",
465
      "The PromptTemplate takes multiple input variables and formats them into a prompt. This prompt is then passed to the language model, which generates a response based on the provided input. \n",
466
      "\n",
467
      "The optional output parser, if provided, is used to parse the output of the language model into a final format that suits the specific application's needs. This allows for customization and flexibility in processing the model's output.\n",
468
      "\n",
469
      "The LangChain framework, of which the LLMChain is a part, aims to enable powerful and differentiated applications by not only connecting to a language model via an API but also by being data-aware and agentic. Being data-aware means that the framework allows for integration with other sources of data, enhancing the capabilities of the language model. Being agentic means that the framework enables the language model to interact with its environment, making it more dynamic and adaptable.\n",
470
      "\n",
471
      "In summary, the LLMChain is a key component within the LangChain framework, providing a structure for integrating language models into applications and enabling data-aware and agentic functionality.\n"
472
     ]
473
    }
474
   ],
475
   "source": [
476
    "print(res.content)"
477
   ]
478
  },
479
  {
480
   "attachments": {},
481
   "cell_type": "markdown",
482
   "metadata": {},
483
   "source": [
484
    "The quality of this answer is phenomenal. This is made possible thanks to the idea of augmented our query with external knowledge (source knowledge). There's just one problem — how do we get this information in the first place?\n",
485
    "\n",
486
    "We learned in the previous chapters about Pinecone and vector databases. Well, they can help us here too. But first, we'll need a dataset."
487
   ]
488
  },
489
  {
490
   "attachments": {},
491
   "cell_type": "markdown",
492
   "metadata": {},
493
   "source": [
494
    "### Importing the Data"
495
   ]
496
  },
497
  {
498
   "attachments": {},
499
   "cell_type": "markdown",
500
   "metadata": {},
501
   "source": [
502
    "In this task, we will be importing our data. We will be using the Hugging Face Datasets library to load our data. Specifically, we will be using the `\"jamescalam/llama-2-arxiv-papers\"` dataset. This dataset contains a collection of ArXiv papers which will serve as the external knowledge base for our chatbot."
503
   ]
504
  },
505
  {
506
   "cell_type": "code",
507
   "execution_count": 14,
508
   "metadata": {},
509
   "outputs": [
510
    {
511
     "data": {
512
      "text/plain": [
513
       "Dataset({\n",
514
       "    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],\n",
515
       "    num_rows: 4838\n",
516
       "})"
517
      ]
518
     },
519
     "execution_count": 14,
520
     "metadata": {},
521
     "output_type": "execute_result"
522
    }
523
   ],
524
   "source": [
525
    "from datasets import load_dataset\n",
526
    "\n",
527
    "dataset = load_dataset(\n",
528
    "    \"jamescalam/llama-2-arxiv-papers-chunked\",\n",
529
    "    split=\"train\"\n",
530
    ")\n",
531
    "\n",
532
    "dataset"
533
   ]
534
  },
535
  {
536
   "cell_type": "code",
537
   "execution_count": 15,
538
   "metadata": {},
539
   "outputs": [
540
    {
541
     "data": {
542
      "text/plain": [
543
       "{'doi': '1102.0183',\n",
544
       " 'chunk-id': '0',\n",
545
       " 'chunk': 'High-Performance Neural Networks\\nfor Visual Object Classi\\x0ccation\\nDan C. Cire\\x18 san, Ueli Meier, Jonathan Masci,\\nLuca M. Gambardella and J\\x7f urgen Schmidhuber\\nTechnical Report No. IDSIA-01-11\\nJanuary 2011\\nIDSIA / USI-SUPSI\\nDalle Molle Institute for Arti\\x0ccial Intelligence\\nGalleria 2, 6928 Manno, Switzerland\\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\\nTechnical Report No. IDSIA-01-11 1\\nHigh-Performance Neural Networks\\nfor Visual Object Classi\\x0ccation\\nDan C. Cire\\x18 san, Ueli Meier, Jonathan Masci,\\nLuca M. Gambardella and J\\x7f urgen Schmidhuber\\nJanuary 2011\\nAbstract\\nWe present a fast, fully parameterizable GPU implementation of Convolutional Neural\\nNetwork variants. Our feature extractors are neither carefully designed nor pre-wired, but',\n",
546
       " 'id': '1102.0183',\n",
547
       " 'title': 'High-Performance Neural Networks for Visual Object Classification',\n",
548
       " 'summary': 'We present a fast, fully parameterizable GPU implementation of Convolutional\\nNeural Network variants. Our feature extractors are neither carefully designed\\nnor pre-wired, but rather learned in a supervised way. Our deep hierarchical\\narchitectures achieve the best published results on benchmarks for object\\nclassification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with\\nerror rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple\\nback-propagation perform better than more shallow ones. Learning is\\nsurprisingly rapid. NORB is completely trained within five epochs. Test error\\nrates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs,\\nrespectively.',\n",
549
       " 'source': 'http://arxiv.org/pdf/1102.0183',\n",
550
       " 'authors': ['Dan C. Cireşan',\n",
551
       "  'Ueli Meier',\n",
552
       "  'Jonathan Masci',\n",
553
       "  'Luca M. Gambardella',\n",
554
       "  'Jürgen Schmidhuber'],\n",
555
       " 'categories': ['cs.AI', 'cs.NE'],\n",
556
       " 'comment': '12 pages, 2 figures, 5 tables',\n",
557
       " 'journal_ref': None,\n",
558
       " 'primary_category': 'cs.AI',\n",
559
       " 'published': '20110201',\n",
560
       " 'updated': '20110201',\n",
561
       " 'references': []}"
562
      ]
563
     },
564
     "execution_count": 15,
565
     "metadata": {},
566
     "output_type": "execute_result"
567
    }
568
   ],
569
   "source": [
570
    "dataset[0]"
571
   ]
572
  },
573
  {
574
   "attachments": {},
575
   "cell_type": "markdown",
576
   "metadata": {},
577
   "source": [
578
    "#### Dataset Overview\n",
579
    "\n",
580
    "The dataset we are using is sourced from the Llama 2 ArXiv papers. It is a collection of academic papers from ArXiv, a repository of electronic preprints approved for publication after moderation. Each entry in the dataset represents a \"chunk\" of text from these papers.\n",
581
    "\n",
582
    "Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2 — at least not without this data."
583
   ]
584
  },
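  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional sanity check, we can see how large these pre-chunked passages are in tokens. Below is a minimal sketch using the `tiktoken` library installed earlier, with the `cl100k_base` encoding used by `gpt-3.5-turbo` and `text-embedding-ada-002`; it is not required for the rest of the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken\n",
    "\n",
    "# tokenizer used by gpt-3.5-turbo and text-embedding-ada-002\n",
    "encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
    "\n",
    "# token counts for a small sample of chunks\n",
    "for chunk in dataset[:5][\"chunk\"]:\n",
    "    print(len(encoding.encode(chunk)))"
   ]
  },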
585
  {
586
   "attachments": {},
587
   "cell_type": "markdown",
588
   "metadata": {},
589
   "source": [
590
    "### Task 4: Building the Knowledge Base"
591
   ]
592
  },
593
  {
594
   "attachments": {},
595
   "cell_type": "markdown",
596
   "metadata": {},
597
   "source": [
598
    "We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.\n",
599
    "\n",
600
    "We begin by initializing our connection to Pinecone, this requires a [free API key](https://app.pinecone.io)."
601
   ]
602
  },
603
  {
604
   "cell_type": "code",
605
   "execution_count": 16,
606
   "metadata": {},
607
   "outputs": [],
608
   "source": [
609
    "from pinecone import Pinecone\n",
610
    "\n",
611
    "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
612
    "api_key = os.getenv(\"PINECONE_API_KEY\") or \"YOUR_API_KEY\"\n",
613
    "\n",
614
    "# configure client\n",
615
    "pc = Pinecone(api_key=api_key)"
616
   ]
617
  },
618
  {
619
   "cell_type": "markdown",
620
   "metadata": {},
621
   "source": [
622
    "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
623
   ]
624
  },
625
  {
626
   "cell_type": "code",
627
   "execution_count": 17,
628
   "metadata": {},
629
   "outputs": [],
630
   "source": [
631
    "from pinecone import ServerlessSpec\n",
632
    "\n",
633
    "spec = ServerlessSpec(\n",
634
    "    cloud=\"aws\", region=\"us-west-2\"\n",
635
    ")"
636
   ]
637
  },
638
  {
639
   "attachments": {},
640
   "cell_type": "markdown",
641
   "metadata": {},
642
   "source": [
643
    "Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`."
644
   ]
645
  },
646
  {
647
   "cell_type": "code",
648
   "execution_count": 18,
649
   "metadata": {},
650
   "outputs": [
651
    {
652
     "data": {
653
      "text/plain": [
654
       "{'dimension': 1536,\n",
655
       " 'index_fullness': 0.0,\n",
656
       " 'namespaces': {},\n",
657
       " 'total_vector_count': 0}"
658
      ]
659
     },
660
     "execution_count": 18,
661
     "metadata": {},
662
     "output_type": "execute_result"
663
    }
664
   ],
665
   "source": [
666
    "import time\n",
667
    "\n",
668
    "index_name = 'llama-2-rag'\n",
669
    "existing_indexes = [\n",
670
    "    index_info[\"name\"] for index_info in pc.list_indexes()\n",
671
    "]\n",
672
    "\n",
673
    "# check if index already exists (it shouldn't if this is first time)\n",
674
    "if index_name not in existing_indexes:\n",
675
    "    # if does not exist, create index\n",
676
    "    pc.create_index(\n",
677
    "        index_name,\n",
678
    "        dimension=1536,  # dimensionality of ada 002\n",
679
    "        metric='dotproduct',\n",
680
    "        spec=spec\n",
681
    "    )\n",
682
    "    # wait for index to be initialized\n",
683
    "    while not pc.describe_index(index_name).status['ready']:\n",
684
    "        time.sleep(1)\n",
685
    "\n",
686
    "# connect to index\n",
687
    "index = pc.Index(index_name)\n",
688
    "time.sleep(1)\n",
689
    "# view index stats\n",
690
    "index.describe_index_stats()"
691
   ]
692
  },
693
  {
694
   "attachments": {},
695
   "cell_type": "markdown",
696
   "metadata": {},
697
   "source": [
698
    "Our index is now ready but it's empty. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will OpenAI's `text-embedding-ada-002` model — we can access it via LangChain like so:"
699
   ]
700
  },
701
  {
702
   "cell_type": "code",
703
   "execution_count": 19,
704
   "metadata": {},
705
   "outputs": [],
706
   "source": [
707
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
708
    "\n",
709
    "embed_model = OpenAIEmbeddings(model=\"text-embedding-ada-002\")"
710
   ]
711
  },
712
  {
713
   "attachments": {},
714
   "cell_type": "markdown",
715
   "metadata": {},
716
   "source": [
717
    "Using this model we can create embeddings like so:"
718
   ]
719
  },
720
  {
721
   "cell_type": "code",
722
   "execution_count": 20,
723
   "metadata": {},
724
   "outputs": [
725
    {
726
     "data": {
727
      "text/plain": [
728
       "(2, 1536)"
729
      ]
730
     },
731
     "execution_count": 20,
732
     "metadata": {},
733
     "output_type": "execute_result"
734
    }
735
   ],
736
   "source": [
737
    "texts = [\n",
738
    "    'this is the first chunk of text',\n",
739
    "    'then another second chunk of text is here'\n",
740
    "]\n",
741
    "\n",
742
    "res = embed_model.embed_documents(texts)\n",
743
    "len(res), len(res[0])"
744
   ]
745
  },
746
  {
747
   "attachments": {},
748
   "cell_type": "markdown",
749
   "metadata": {},
750
   "source": [
751
    "From this we get two (aligning to our two chunks of text) 1536-dimensional embeddings.\n",
752
    "\n",
753
    "We're now ready to embed and index all our our data! We do this by looping through our dataset and embedding and inserting everything in batches."
754
   ]
755
  },
756
  {
757
   "cell_type": "code",
758
   "execution_count": 21,
759
   "metadata": {},
760
   "outputs": [
761
    {
762
     "data": {
763
      "application/vnd.jupyter.widget-view+json": {
764
       "model_id": "26087e3b0d2e4b8cb7c3e9fb12dc92a7",
765
       "version_major": 2,
766
       "version_minor": 0
767
      },
768
      "text/plain": [
769
       "  0%|          | 0/49 [00:00<?, ?it/s]"
770
      ]
771
     },
772
     "metadata": {},
773
     "output_type": "display_data"
774
    }
775
   ],
776
   "source": [
777
    "from tqdm.auto import tqdm  # for progress bar\n",
778
    "\n",
779
    "data = dataset.to_pandas()  # this makes it easier to iterate over the dataset\n",
780
    "\n",
781
    "batch_size = 100\n",
782
    "\n",
783
    "for i in tqdm(range(0, len(data), batch_size)):\n",
784
    "    i_end = min(len(data), i+batch_size)\n",
785
    "    # get batch of data\n",
786
    "    batch = data.iloc[i:i_end]\n",
787
    "    # generate unique ids for each chunk\n",
788
    "    ids = [f\"{x['doi']}-{x['chunk-id']}\" for i, x in batch.iterrows()]\n",
789
    "    # get text to embed\n",
790
    "    texts = [x['chunk'] for _, x in batch.iterrows()]\n",
791
    "    # embed text\n",
792
    "    embeds = embed_model.embed_documents(texts)\n",
793
    "    # get metadata to store in Pinecone\n",
794
    "    metadata = [\n",
795
    "        {'text': x['chunk'],\n",
796
    "         'source': x['source'],\n",
797
    "         'title': x['title']} for i, x in batch.iterrows()\n",
798
    "    ]\n",
799
    "    # add to Pinecone\n",
800
    "    index.upsert(vectors=zip(ids, embeds, metadata))"
801
   ]
802
  },
803
  {
804
   "attachments": {},
805
   "cell_type": "markdown",
806
   "metadata": {},
807
   "source": [
808
    "We can check that the vector index has been populated using `describe_index_stats` like before:"
809
   ]
810
  },
811
  {
812
   "cell_type": "code",
813
   "execution_count": 22,
814
   "metadata": {},
815
   "outputs": [
816
    {
817
     "data": {
818
      "text/plain": [
819
       "{'dimension': 1536,\n",
820
       " 'index_fullness': 0.0,\n",
821
       " 'namespaces': {'': {'vector_count': 4838}},\n",
822
       " 'total_vector_count': 4838}"
823
      ]
824
     },
825
     "execution_count": 22,
826
     "metadata": {},
827
     "output_type": "execute_result"
828
    }
829
   ],
830
   "source": [
831
    "index.describe_index_stats()"
832
   ]
833
  },
834
  {
835
   "attachments": {},
836
   "cell_type": "markdown",
837
   "metadata": {},
838
   "source": [
839
    "#### Retrieval Augmented Generation"
840
   ]
841
  },
842
  {
843
   "attachments": {},
844
   "cell_type": "markdown",
845
   "metadata": {},
846
   "source": [
847
    "We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier."
848
   ]
849
  },
850
  {
851
   "attachments": {},
852
   "cell_type": "markdown",
853
   "metadata": {},
854
   "source": [
855
    "To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object."
856
   ]
857
  },
858
  {
859
   "cell_type": "code",
860
   "execution_count": 23,
861
   "metadata": {},
862
   "outputs": [
863
    {
864
     "name": "stderr",
865
     "output_type": "stream",
866
     "text": [
867
      "/Users/jamesbriggs/opt/anaconda3/envs/ml/lib/python3.9/site-packages/langchain_community/vectorstores/pinecone.py:74: UserWarning: Passing in `embedding` as a Callable is deprecated. Please pass in an Embeddings object instead.\n",
868
      "  warnings.warn(\n"
869
     ]
870
    }
871
   ],
872
   "source": [
873
    "from langchain.vectorstores import Pinecone\n",
874
    "\n",
875
    "text_field = \"text\"  # the metadata field that contains our text\n",
876
    "\n",
877
    "# initialize the vector store object\n",
878
    "vectorstore = Pinecone(\n",
879
    "    index, embed_model.embed_query, text_field\n",
880
    ")"
881
   ]
882
  },
883
  {
884
   "attachments": {},
885
   "cell_type": "markdown",
886
   "metadata": {},
887
   "source": [
888
    "Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Llama 2."
889
   ]
890
  },
891
  {
892
   "cell_type": "code",
893
   "execution_count": 24,
894
   "metadata": {},
895
   "outputs": [
896
    {
897
     "data": {
898
      "text/plain": [
899
       "[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\\nSergey Edunov Thomas Scialom\\x03\\nGenAI, Meta\\nAbstract\\nIn this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\\nOur fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\\nmodels outperform open-source chat models on most benchmarks we tested, and based on\\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to fine-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),\n",
900
       " Document(page_content='asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyfine-tunedtoalignwithhuman\\npreferences, which greatly enhances their usability and safety. This step can require significant costs in\\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\\nthe community to advance AI alignment research.\\nIn this work, we develop and release Llama 2, a family of pretrained and fine-tuned LLMs, L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle and\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc models generally perform better than existing open-source models. They also appear to\\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),\n",
901
       " Document(page_content='Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aur’elien Rodriguez, Armand Joulin, Edouard\\nGrave, and Guillaume Lample. Llama: Open and efficient foundation language models. arXiv preprint\\narXiv:2302.13971 , 2023.\\nAshish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser,\\nand Illia Polosukhin. Attention is all you need, 2017.\\nOriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung,\\nDavid H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in starcraft ii using\\nmulti-agent reinforcement learning. Nature, 575(7782):350–354, 2019.\\nYizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A Smith, Daniel Khashabi, and HannanehHajishirzi. Self-instruct: Aligninglanguagemodel withselfgeneratedinstructions. arXivpreprint', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'})]"
902
      ]
903
     },
904
     "execution_count": 24,
905
     "metadata": {},
906
     "output_type": "execute_result"
907
    }
908
   ],
909
   "source": [
910
    "query = \"What is so special about Llama 2?\"\n",
911
    "\n",
912
    "vectorstore.similarity_search(query, k=3)"
913
   ]
914
  },
915
  {
916
   "attachments": {},
917
   "cell_type": "markdown",
918
   "metadata": {},
919
   "source": [
920
    "We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to connect the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier."
921
   ]
922
  },
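  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before wiring the retrieval step into a prompt, a quick optional sanity check: we can print just the `title` and `source` metadata of the retrieved documents (the fields we stored during the upsert) to see which papers the chunks come from."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# optional: peek at which papers the retrieved chunks come from\n",
    "for doc in vectorstore.similarity_search(query, k=3):\n",
    "    print(doc.metadata[\"title\"], \"|\", doc.metadata[\"source\"])"
   ]
  },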
923
  {
924
   "cell_type": "code",
925
   "execution_count": 25,
926
   "metadata": {},
927
   "outputs": [],
928
   "source": [
929
    "def augment_prompt(query: str):\n",
930
    "    # get top 3 results from knowledge base\n",
931
    "    results = vectorstore.similarity_search(query, k=3)\n",
932
    "    # get the text from the results\n",
933
    "    source_knowledge = \"\\n\".join([x.page_content for x in results])\n",
934
    "    # feed into an augmented prompt\n",
935
    "    augmented_prompt = f\"\"\"Using the contexts below, answer the query.\n",
936
    "\n",
937
    "    Contexts:\n",
938
    "    {source_knowledge}\n",
939
    "\n",
940
    "    Query: {query}\"\"\"\n",
941
    "    return augmented_prompt"
942
   ]
943
  },
944
  {
945
   "attachments": {},
946
   "cell_type": "markdown",
947
   "metadata": {},
948
   "source": [
949
    "Using this we produce an augmented prompt:"
950
   ]
951
  },
952
  {
953
   "cell_type": "code",
954
   "execution_count": 26,
955
   "metadata": {},
956
   "outputs": [
957
    {
958
     "name": "stdout",
959
     "output_type": "stream",
960
     "text": [
961
      "Using the contexts below, answer the query.\n",
962
      "\n",
963
      "    Contexts:\n",
964
      "    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\n",
965
      "Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\n",
966
      "Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\n",
967
      "Sergey Edunov Thomas Scialom\u0003\n",
968
      "GenAI, Meta\n",
969
      "Abstract\n",
970
      "In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\n",
971
      "large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\n",
972
      "Our fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\n",
973
      "models outperform open-source chat models on most benchmarks we tested, and based on\n",
974
      "ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to fine-tuning and safety\n",
975
      "asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyfine-tunedtoalignwithhuman\n",
976
      "preferences, which greatly enhances their usability and safety. This step can require significant costs in\n",
977
      "computeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\n",
978
      "the community to advance AI alignment research.\n",
979
      "In this work, we develop and release Llama 2, a family of pretrained and fine-tuned LLMs, L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle and\n",
980
      "L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\n",
981
      "L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc models generally perform better than existing open-source models. They also appear to\n",
982
      "be on par with some of the closed-source models, at least on the human evaluations we performed (see\n",
983
      "Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aur’elien Rodriguez, Armand Joulin, Edouard\n",
984
      "Grave, and Guillaume Lample. Llama: Open and efficient foundation language models. arXiv preprint\n",
985
      "arXiv:2302.13971 , 2023.\n",
986
      "Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser,\n",
987
      "and Illia Polosukhin. Attention is all you need, 2017.\n",
988
      "Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung,\n",
989
      "David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in starcraft ii using\n",
990
      "multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.\n",
991
      "Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A Smith, Daniel Khashabi, and HannanehHajishirzi. Self-instruct: Aligninglanguagemodel withselfgeneratedinstructions. arXivpreprint\n",
992
      "\n",
993
      "    Query: What is so special about Llama 2?\n"
994
     ]
995
    }
996
   ],
997
   "source": [
998
    "print(augment_prompt(query))"
999
   ]
1000
  },
1001
  {
1002
   "attachments": {},
1003
   "cell_type": "markdown",
1004
   "metadata": {},
1005
   "source": [
1006
    "There is still a lot of text here, so let's pass it onto our chat model to see how it performs."
1007
   ]
1008
  },
1009
  {
1010
   "cell_type": "code",
1011
   "execution_count": 27,
1012
   "metadata": {},
1013
   "outputs": [
1014
    {
1015
     "name": "stdout",
1016
     "output_type": "stream",
1017
     "text": [
1018
      "Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed in a work. These LLMs range in scale from 7 billion to 70 billion parameters. They are optimized specifically for dialogue use cases.\n",
1019
      "\n",
1020
      "What makes Llama 2 special is that its fine-tuned LLMs, such as L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, have shown better performance compared to existing open-source chat models on various benchmarks. They have also been evaluated in terms of helpfulness and safety, and they appear to be on par with some closed-source models.\n",
1021
      "\n",
1022
      "This work aims to provide a detailed description of the approach used for fine-tuning and ensuring safety in Llama 2, similar to other closed-source LLMs like ChatGPT, BARD, and Claude. The closed-source LLMs are extensively fine-tuned to align with human preferences, enhancing their usability and safety. However, this fine-tuning process often involves significant costs in terms of computation and human annotation, and it is not always transparent or easily reproducible, which can limit progress in AI alignment research within the community.\n",
1023
      "\n",
1024
      "In summary, Llama 2 offers pretrained and fine-tuned LLMs optimized for dialogue use cases, outperforming open-source chat models on benchmarks and potentially serving as substitutes for closed-source models in terms of helpfulness and safety.\n"
1025
     ]
1026
    }
1027
   ],
1028
   "source": [
1029
    "# create a new user prompt\n",
1030
    "prompt = HumanMessage(\n",
1031
    "    content=augment_prompt(query)\n",
1032
    ")\n",
1033
    "# add to messages\n",
1034
    "messages.append(prompt)\n",
1035
    "\n",
1036
    "res = chat(messages)\n",
1037
    "\n",
1038
    "print(res.content)"
1039
   ]
1040
  },
1041
  {
1042
   "attachments": {},
1043
   "cell_type": "markdown",
1044
   "metadata": {},
1045
   "source": [
1046
    "We can continue with more Llama 2 questions. Let's try _without_ RAG first:"
1047
   ]
1048
  },
1049
  {
1050
   "cell_type": "code",
1051
   "execution_count": 28,
1052
   "metadata": {},
1053
   "outputs": [
1054
    {
1055
     "name": "stdout",
1056
     "output_type": "stream",
1057
     "text": [
1058
      "In the provided context, it is mentioned that Llama 2, the collection of pretrained and fine-tuned large language models (LLMs), includes safety measures in its development. However, the specific safety measures used in the development of Llama 2 are not detailed in the given text. It only mentions that safety is a consideration in their approach to fine-tuning and that closed-source models are heavily fine-tuned to align with human preferences, enhancing their usability and safety.\n",
1059
      "\n",
1060
      "To obtain more specific information regarding the safety measures implemented in the development of Llama 2, it would be necessary to refer to the original research paper or documentation related to Llama 2.\n"
1061
     ]
1062
    }
1063
   ],
1064
   "source": [
1065
    "prompt = HumanMessage(\n",
1066
    "    content=\"what safety measures were used in the development of llama 2?\"\n",
1067
    ")\n",
1068
    "\n",
1069
    "res = chat(messages + [prompt])\n",
1070
    "print(res.content)"
1071
   ]
1072
  },
1073
  {
1074
   "attachments": {},
1075
   "cell_type": "markdown",
1076
   "metadata": {},
1077
   "source": [
1078
    "The chatbot is able to respond about Llama 2 thanks to it's conversational history stored in `messages`. However, it doesn't know anything about the safety measures themselves as we have not provided it with that information via the RAG pipeline. Let's try again but with RAG."
1079
   ]
1080
  },
1081
  {
1082
   "cell_type": "code",
1083
   "execution_count": 29,
1084
   "metadata": {},
1085
   "outputs": [
1086
    {
1087
     "name": "stdout",
1088
     "output_type": "stream",
1089
     "text": [
1090
      "Based on the provided contexts, the development of Llama 2 involved several safety measures. The authors of Llama 2 took steps to increase the safety of the models by using safety-specific data annotation and tuning. They also conducted red-teaming and employed iterative evaluations to ensure safety. These measures were implemented to reduce potential risks and enhance the safety of the Llama 2 models.\n",
1091
      "\n",
1092
      "The authors of Llama 2 also emphasized the importance of improving the safety of language models in their work. They provided a detailed description of their approach to fine-tuning and safety, aiming to contribute to the responsible development of large language models (LLMs). By sharing their methodology and observations, they hope to enable the community to reproduce fine-tuned LLMs and continue to enhance their safety.\n",
1093
      "\n",
1094
      "It is worth noting that the specific details of the safety measures used in the development of Llama 2 may require further examination of the referenced papers and resources.\n"
1095
     ]
1096
    }
1097
   ],
1098
   "source": [
1099
    "prompt = HumanMessage(\n",
1100
    "    content=augment_prompt(\n",
1101
    "        \"what safety measures were used in the development of llama 2?\"\n",
1102
    "    )\n",
1103
    ")\n",
1104
    "\n",
1105
    "res = chat(messages + [prompt])\n",
1106
    "print(res.content)"
1107
   ]
1108
  },
1109
  {
1110
   "attachments": {},
1111
   "cell_type": "markdown",
1112
   "metadata": {},
1113
   "source": [
1114
    "We get a much more informed response that includes several items missing in the previous non-RAG response, such as \"red-teaming\", \"iterative evaluations\", and the intention of the researchers to share this research to help \"improve their safety, promoting responsible development in the field\"."
1115
   ]
1116
  },
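  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an optional recap, the retrieval and generation steps can be wrapped into a single helper. This is just a minimal sketch reusing the `augment_prompt`, `chat`, and `messages` objects defined above; the example question is illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rag_chat(query: str) -> str:\n",
    "    # retrieve context and build the augmented prompt\n",
    "    prompt = HumanMessage(content=augment_prompt(query))\n",
    "    messages.append(prompt)\n",
    "    # generate a response and keep it in the conversation history\n",
    "    res = chat(messages)\n",
    "    messages.append(res)\n",
    "    return res.content\n",
    "\n",
    "print(rag_chat(\"How does Llama 2 compare to open-source chat models?\"))"
   ]
  },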
1117
  {
1118
   "cell_type": "markdown",
1119
   "metadata": {},
1120
   "source": [
1121
    "Delete the index to save resources:"
1122
   ]
1123
  },
1124
  {
1125
   "cell_type": "code",
1126
   "execution_count": null,
1127
   "metadata": {},
1128
   "outputs": [],
1129
   "source": [
1130
    "pc.delete_index(index_name)"
1131
   ]
1132
  },
1133
  {
1134
   "cell_type": "markdown",
1135
   "metadata": {},
1136
   "source": [
1137
    "---"
1138
   ]
1139
  }
1140
 ],
1141
 "metadata": {
1142
  "kernelspec": {
1143
   "display_name": "redacre",
1144
   "language": "python",
1145
   "name": "python3"
1146
  },
1147
  "language_info": {
1148
   "codemirror_mode": {
1149
    "name": "ipython",
1150
    "version": 3
1151
   },
1152
   "file_extension": ".py",
1153
   "mimetype": "text/x-python",
1154
   "name": "python",
1155
   "nbconvert_exporter": "python",
1156
   "pygments_lexer": "ipython3",
1157
   "version": "3.9.12"
1158
  },
1159
  "orig_nbformat": 4
1160
 },
1161
 "nbformat": 4,
1162
 "nbformat_minor": 2
1163
}
1164
