{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8bb7d0d0-2b7f-4e9e-8565-467dc5c6fd22",
   "metadata": {},
   "source": [
    "# General Tips on Prompting\n",
    "\n",
    "Before we get into some big applications of schema engineering, I want to equip you with the tools for success.\n",
    "This notebook shares some general advice on using prompts to get the most out of your models.\n",
    "\n",
    "Before, you might have thought of prompt engineering as massaging a wall of text, almost like coding in a notepad. With schema engineering, you can get a lot more out of your prompts with a lot less work.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a785c25-b08d-4ab4-bbd7-22e3b090c2ed",
   "metadata": {},
   "source": [
    "## Classification\n",
    "\n",
    "For classification, we've found there are generally two methods of modeling:\n",
    "\n",
    "1. Using Enums\n",
    "2. Using Literals\n",
    "\n",
    "Use an enum in Python when you need a set of related named constants and you want to ensure type safety, improve readability, and prevent invalid values. Enums are helpful for grouping and iterating over these constants.\n",
    "\n",
    "Use literals when you have a small, unchanging set of values that you don't need to group or iterate over, and when type safety and preventing invalid values are less of a concern. Literals are simpler and more direct for basic, one-off values.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "fdf5e1d9-31ad-4e8a-a55e-e2e70fff598d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'age': 17, 'name': 'Harry Potter', 'house': <House.Gryffindor: 'gryffindor'>}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import instructor\n",
    "from openai import OpenAI\n",
    "\n",
    "from enum import Enum\n",
    "from pydantic import BaseModel, Field\n",
    "from typing_extensions import Literal\n",
    "\n",
    "\n",
    "client = instructor.patch(OpenAI())\n",
    "\n",
    "\n",
    "# Tip: avoid auto() here; it assigns the integers 1, 2, 3, 4 instead of descriptive strings\n",
    "class House(Enum):\n",
    "    Gryffindor = \"gryffindor\"\n",
    "    Hufflepuff = \"hufflepuff\"\n",
    "    Ravenclaw = \"ravenclaw\"\n",
    "    Slytherin = \"slytherin\"\n",
    "\n",
    "\n",
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "    house: House\n",
    "\n",
    "    def say_hello(self):\n",
    "        print(\n",
    "            f\"Hello, I'm {self.name}, I'm {self.age} years old and I'm from {self.house.value.title()}\"\n",
    "        )\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Harry Potter\"}],\n",
    "    response_model=Character,\n",
    ")\n",
    "resp.model_dump()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c609eb44",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, I'm Harry Potter, I'm 17 years old and I'm from Gryffindor\n"
     ]
    }
   ],
   "source": [
    "resp.say_hello()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "03db160c-81e9-4373-bfec-7a107224b6dd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'age': 11, 'name': 'Harry Potter', 'house': 'Gryffindor'}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "    house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Harry Potter\"}],\n",
    "    response_model=Character,\n",
    ")\n",
    "resp.model_dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "803e0ce6-6e7e-4d86-a7a8-49ebaad0a40b",
   "metadata": {},
   "source": [
    "## Arbitrary properties\n",
    "\n",
    "Often there are many properties you might want to extract from data that cannot be specified in advance. We can get around this by defining an arbitrary key-value store, like so:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0e7938b8-4666-4df4-bd80-f53e8baf7550",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'age': 38,\n",
       " 'name': 'Severus Snape',\n",
       " 'house': 'Slytherin',\n",
       " 'properties': [{'key': 'role', 'value': 'Potions Master'},\n",
       "  {'key': 'patronus', 'value': 'Doe'},\n",
       "  {'key': 'loyalty', 'value': 'Dumbledore'},\n",
       "  {'key': 'played_by', 'value': 'Alan Rickman'}]}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from typing import List\n",
    "\n",
    "\n",
    "class Property(BaseModel):\n",
    "    key: str = Field(description=\"Must be snake case\")\n",
    "    value: str\n",
    "\n",
    "\n",
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "    house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
    "    properties: List[Property]\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Snape from Harry Potter\"}],\n",
    "    response_model=Character,\n",
    ")\n",
    "resp.model_dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3e62f68-a79f-4f65-9c1f-726e4e2d340a",
   "metadata": {},
   "source": [
    "## Limiting the length of lists\n",
    "\n",
    "In later chapters we'll talk about how to use validators to assert the length of lists, but we can also use prompting tricks to enumerate values. Here we'll define an index to count the properties.\n",
    "\n",
    "In the following example, we're going to work on generation instead of extraction.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "69a58d01-ab6f-41b6-bc0c-b0e55fdb6fe4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'age': 38,\n",
       " 'name': 'Severus Snape',\n",
       " 'house': 'Slytherin',\n",
       " 'properties': [{'index': '1',\n",
       "   'key': 'position_at_hogwarts',\n",
       "   'value': 'Potions Master'},\n",
       "  {'index': '2', 'key': 'patronus_form', 'value': 'Doe'},\n",
       "  {'index': '3', 'key': 'loyalty', 'value': 'Albus Dumbledore'},\n",
       "  {'index': '4', 'key': 'played_by', 'value': 'Alan Rickman'},\n",
       "  {'index': '5', 'key': 'final_act', 'value': 'Protecting Harry Potter'}]}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "class Property(BaseModel):\n",
    "    index: str = Field(..., description=\"Monotonically increasing ID\")\n",
    "    key: str = Field(description=\"Must be snake case\")\n",
    "    value: str\n",
    "\n",
    "\n",
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "    house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
    "    properties: List[Property] = Field(\n",
    "        ...,\n",
    "        description=\"Numbered list of arbitrary extracted properties, should be exactly 5\",\n",
    "    )\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Snape from Harry Potter\"}],\n",
    "    response_model=Character,\n",
    ")\n",
    "resp.model_dump()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbc1d900-617a-4e4d-a401-6d10a5153cda",
   "metadata": {},
   "source": [
    "## Defining Multiple Entities\n",
    "\n",
    "Now that we've seen a single entity with many properties, we can go a step further and extract many entities at once.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "1f2a2b14-a956-4f96-90c9-e11ca04ab7d1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age=11 name='Harry Potter' house='Gryffindor'\n",
      "age=11 name='Hermione Granger' house='Gryffindor'\n",
      "age=11 name='Ron Weasley' house='Gryffindor'\n",
      "age=11 name='Draco Malfoy' house='Slytherin'\n",
      "age=11 name='Neville Longbottom' house='Gryffindor'\n"
     ]
    }
   ],
   "source": [
    "from typing import Iterable\n",
    "\n",
    "\n",
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "    house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Five characters from Harry Potter\"}],\n",
    "    response_model=Iterable[Character],\n",
    ")\n",
    "\n",
    "for character in resp:\n",
    "    print(character)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a3091aba",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "age=11 name='Harry Potter' house='Gryffindor'\n",
      "age=11 name='Hermione Granger' house='Gryffindor'\n",
      "age=11 name='Ron Weasley' house='Gryffindor'\n",
      "age=17 name='Draco Malfoy' house='Slytherin'\n",
      "age=11 name='Luna Lovegood' house='Ravenclaw'\n"
     ]
    }
   ],
   "source": [
    "from typing import Iterable\n",
    "\n",
    "\n",
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "    house: Literal[\"Gryffindor\", \"Hufflepuff\", \"Ravenclaw\", \"Slytherin\"]\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"Five characters from Harry Potter\"}],\n",
    "    stream=True,\n",
    "    response_model=Iterable[Character],\n",
    ")\n",
    "\n",
    "for character in resp:\n",
    "    print(character)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6ed3144-bde1-4033-9c94-a6926fa079d2",
   "metadata": {},
   "source": [
    "## Defining Relationships\n",
    "\n",
    "Not only can we define lists of users, each with a list of properties. One of the more interesting things I've learned about prompting is that we can also easily define lists of references.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "6de8768e-b36a-4a51-9cf9-940d178552f6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id=1 name='Harry Potter' friends_array=[2, 3, 4, 5, 6]\n",
      "id=2 name='Hermione Granger' friends_array=[1, 3, 4, 5]\n",
      "id=3 name='Ron Weasley' friends_array=[1, 2, 4, 6]\n",
      "id=4 name='Neville Longbottom' friends_array=[1, 2, 3, 5]\n",
      "id=5 name='Luna Lovegood' friends_array=[1, 2, 4, 6]\n",
      "id=6 name='Draco Malfoy' friends_array=[1, 3, 5]\n"
     ]
    }
   ],
   "source": [
    "class Character(BaseModel):\n",
    "    id: int\n",
    "    name: str\n",
    "    friends_array: List[int] = Field(description=\"Relationships to their friends using the id\")\n",
    "\n",
    "\n",
    "resp = client.chat.completions.create(\n",
    "    model=\"gpt-4-1106-preview\",\n",
    "    messages=[{\"role\": \"user\", \"content\": \"5 kids from Harry Potter\"}],\n",
    "    stream=True,\n",
    "    response_model=Iterable[Character],\n",
    ")\n",
    "\n",
    "for character in resp:\n",
    "    print(character)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "523b5797-71a5-4a96-a4b7-21280fb73015",
   "metadata": {},
   "source": [
    "With the tools we've discussed, we can find numerous real-world applications in production settings. These include extracting action items from transcripts, generating fake data, filling out forms, and creating objects that correspond to generative UI. These simple tricks will be highly useful.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9d20fd9-0cd0-4300-a8c1-d16388969e8e",
   "metadata": {},
   "source": [
    "# Missing Data\n",
    "\n",
    "The Maybe pattern is a concept from functional programming used for error handling. Instead of raising exceptions or returning `None`, you can use a Maybe type to encapsulate both the result and any potential errors.\n",
    "\n",
    "This pattern is particularly useful when making LLM calls, as giving the language model an escape hatch can effectively reduce hallucinations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c04f44aa-dc4b-4499-a151-e812512e77e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Optional\n",
    "\n",
    "class Character(BaseModel):\n",
    "    age: int\n",
    "    name: str\n",
    "\n",
    "class MaybeCharacter(BaseModel):\n",
    "    # The extracted character, or None when nothing could be extracted\n",
    "    result: Optional[Character] = Field(default=None)\n",
    "    # Set to True when the extraction failed\n",
    "    error: bool = Field(default=False)\n",
    "    # Explanation of the failure; defaults to None so the field is not required\n",
    "    message: Optional[str] = Field(default=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a2155190-e104-4ed6-a17f-e0732499dd51",
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract(content: str) -> MaybeCharacter:\n",
    "    return client.chat.completions.create(\n",
    "        model=\"gpt-3.5-turbo\",\n",
    "        response_model=MaybeCharacter,\n",
    "        messages=[\n",
    "            {\"role\": \"user\", \"content\": f\"Extract `{content}`\"},\n",
    "        ],\n",
    "    )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "a7b59afa-9bf0-4dc0-a5ca-de584514f33b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MaybeCharacter(result=Character(age=17, name='Harry Potter'), error=False, message=None)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "extract(\"Harry Potter\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "b5ddd5c1-ca75-49a9-95ad-181170435291",
   "metadata": {},
   "outputs": [
    {
     "ename": "ValueError",
     "evalue": "404 Error",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[1;32m/Users/jasonliu/dev/instructor/docs/tutorials/2-tips.ipynb Cell 20\u001b[0m line \u001b[0;36m4\n\u001b[1;32m      <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/docs/tutorials/2-tips.ipynb#X25sZmlsZQ%3D%3D?line=0'>1</a>\u001b[0m user \u001b[39m=\u001b[39m extract(\u001b[39m\"\u001b[39m\u001b[39m404 Error\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[1;32m      <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/docs/tutorials/2-tips.ipynb#X25sZmlsZQ%3D%3D?line=2'>3</a>\u001b[0m \u001b[39mif\u001b[39;00m user\u001b[39m.\u001b[39merror:\n\u001b[0;32m----> <a href='vscode-notebook-cell:/Users/jasonliu/dev/instructor/docs/tutorials/2-tips.ipynb#X25sZmlsZQ%3D%3D?line=3'>4</a>\u001b[0m     \u001b[39mraise\u001b[39;00m \u001b[39mValueError\u001b[39;00m(user\u001b[39m.\u001b[39mmessage)\n",
      "\u001b[0;31mValueError\u001b[0m: 404 Error"
     ]
    }
   ],
   "source": [
    "user = extract(\"404 Error\")\n",
    "\n",
    "if user.error:\n",
    "    raise ValueError(user.message)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}