{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to count tokens with tiktoken\n",
    "\n",
    "[`tiktoken`](https://github.com/openai/tiktoken/blob/main/README.md) is a fast open-source tokenizer by OpenAI.\n",
    "\n",
    "Given a text string (e.g., `\"tiktoken is great!\"`) and an encoding (e.g., `\"cl100k_base\"`), a tokenizer can split the text string into a list of tokens (e.g., `[\"t\", \"ik\", \"token\", \" is\", \" great\", \"!\"]`).\n",
    "\n",
    "Splitting text strings into tokens is useful because GPT models see text in the form of tokens. Knowing how many tokens are in a text string can tell you (a) whether the string is too long for a text model to process and (b) how much an OpenAI API call costs (as usage is priced by token).\n",
    "\n",
    "\n",
    "## Encodings\n",
    "\n",
    "Encodings specify how text is converted into tokens. Different models use different encodings.\n",
    "\n",
    "`tiktoken` supports three encodings used by OpenAI models:\n",
    "\n",
    "| Encoding name           | OpenAI models                                       |\n",
    "|-------------------------|-----------------------------------------------------|\n",
    "| `cl100k_base`           | `gpt-4`, `gpt-3.5-turbo`, `text-embedding-ada-002`, `text-embedding-3-small`, `text-embedding-3-large`  |\n",
    "| `p50k_base`             | Codex models, `text-davinci-002`, `text-davinci-003`|\n",
    "| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci`                         |\n",
    "\n",
    "You can retrieve the encoding for a model using `tiktoken.encoding_for_model()` as follows:\n",
    "```python\n",
    "encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')\n",
    "```\n",
    "\n",
    "Note that `p50k_base` overlaps substantially with `r50k_base`, and for non-code applications, they will usually give the same tokens.\n",
    "\n",
    "## Tokenizer libraries by language\n",
    "\n",
    "For `cl100k_base` and `p50k_base` encodings:\n",
    "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
    "- .NET / C#: [SharpToken](https://github.com/dmitry-brazhenko/SharpToken), [TiktokenSharp](https://github.com/aiqinxuancai/TiktokenSharp)\n",
    "- Java: [jtokkit](https://github.com/knuddelsgmbh/jtokkit)\n",
    "- Golang: [tiktoken-go](https://github.com/pkoukk/tiktoken-go)\n",
    "- Rust: [tiktoken-rs](https://github.com/zurawiki/tiktoken-rs)\n",
    "\n",
    "For `r50k_base` (`gpt2`) encodings, tokenizers are available in many languages:\n",
    "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (or alternatively [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))\n",
    "- JavaScript: [gpt-3-encoder](https://www.npmjs.com/package/gpt-3-encoder)\n",
    "- .NET / C#: [GPT Tokenizer](https://github.com/dluc/openai-tools)\n",
    "- Java: [gpt2-tokenizer-java](https://github.com/hyunwoongko/gpt2-tokenizer-java)\n",
    "- PHP: [GPT-3-Encoder-PHP](https://github.com/CodeRevolutionPlugins/GPT-3-Encoder-PHP)\n",
    "- Golang: [tiktoken-go](https://github.com/pkoukk/tiktoken-go)\n",
    "- Rust: [tiktoken-rs](https://github.com/zurawiki/tiktoken-rs)\n",
    "\n",
    "(OpenAI makes no endorsements or guarantees of third-party libraries.)\n",
    "\n",
    "\n",
    "## How strings are typically tokenized\n",
    "\n",
    "In English, tokens commonly range in length from one character to one word (e.g., `\"t\"` or `\" great\"`), though in some languages tokens can be shorter than one character or longer than one word. Spaces are usually grouped with the starts of words (e.g., `\" is\"` instead of `\"is \"` or `\" \"`+`\"is\"`). You can quickly check how a string is tokenized at the [OpenAI Tokenizer](https://beta.openai.com/tokenizer), or the third-party [Tiktokenizer](https://tiktokenizer.vercel.app/) webapp."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Install `tiktoken`\n",
    "\n",
    "If needed, install `tiktoken` with `pip`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --upgrade tiktoken\n",
    "%pip install --upgrade openai"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Import `tiktoken`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load an encoding\n",
    "\n",
    "Use `tiktoken.get_encoding()` to load an encoding by name.\n",
    "\n",
    "The first time this runs, it will require an internet connection to download the encoding. Later runs won't need an internet connection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "encoding = tiktoken.get_encoding(\"cl100k_base\")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `tiktoken.encoding_for_model()` to automatically load the correct encoding for a given model name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "encoding = tiktoken.encoding_for_model(\"gpt-3.5-turbo\")"
   ]
  },
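  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (an addition to the original walkthrough), you can inspect the loaded encoding's `name` attribute; per the table above, `gpt-3.5-turbo` should map to `cl100k_base`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# confirm which encoding encoding_for_model() selected\n",
    "encoding.name  # expected: 'cl100k_base' for gpt-3.5-turbo"
   ]
  },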
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Turn text into tokens with `encoding.encode()`\n",
    "\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `.encode()` method converts a text string into a list of token integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[83, 1609, 5963, 374, 2294, 0]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "encoding.encode(\"tiktoken is great!\")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Count tokens by taking the length of the list returned by `.encode()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def num_tokens_from_string(string: str, encoding_name: str) -> int:\n",
    "    \"\"\"Returns the number of tokens in a text string.\"\"\"\n",
    "    encoding = tiktoken.get_encoding(encoding_name)\n",
    "    num_tokens = len(encoding.encode(string))\n",
    "    return num_tokens\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "num_tokens_from_string(\"tiktoken is great!\", \"cl100k_base\")\n"
   ]
  },
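  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because API usage is priced per token, a token count converts directly into a rough cost estimate. The sketch below is illustrative only: the per-token price is a placeholder, not a real rate, so check the current pricing page before relying on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# rough cost estimate; the price below is a placeholder, not a real rate\n",
    "hypothetical_price_per_1k_tokens = 0.0005  # illustrative USD price per 1,000 input tokens\n",
    "\n",
    "n = num_tokens_from_string(\"tiktoken is great!\", \"cl100k_base\")\n",
    "print(f\"{n} tokens -> ${n / 1000 * hypothetical_price_per_1k_tokens:.6f}\")"
   ]
  },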
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Turn tokens into text with `encoding.decode()`"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`.decode()` converts a list of token integers to a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'tiktoken is great!'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "encoding.decode([83, 1609, 5963, 374, 2294, 0])\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Warning: although `.decode()` can be applied to single tokens, beware that it can be lossy for tokens that don't fall on UTF-8 boundaries."
   ]
  },
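  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For example (a small illustration added here, using a token id taken from the Japanese string in section 5), a token whose bytes are only part of a multi-byte UTF-8 character decodes to the replacement character:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# in cl100k_base, token 45918 represents b'\\xe8\\xaa', an incomplete UTF-8 sequence,\n",
    "# so decoding it on its own yields the replacement character '\\ufffd'\n",
    "encoding.decode([45918])"
   ]
  },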
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For single tokens, `.decode_single_token_bytes()` safely converts a single integer token to the bytes it represents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[b't', b'ik', b'token', b' is', b' great', b'!']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[encoding.decode_single_token_bytes(token) for token in [83, 1609, 5963, 374, 2294, 0]]\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(The `b` in front of the strings indicates that the strings are byte strings.)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Comparing encodings\n",
    "\n",
    "Different encodings vary in how they split words, group spaces, and handle non-English characters. Using the methods above, we can compare different encodings on a few example strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compare_encodings(example_string: str) -> None:\n",
    "    \"\"\"Prints a comparison of three string encodings.\"\"\"\n",
    "    # print the example string\n",
    "    print(f'\\nExample string: \"{example_string}\"')\n",
    "    # for each encoding, print the # of tokens, the token integers, and the token bytes\n",
    "    for encoding_name in [\"r50k_base\", \"p50k_base\", \"cl100k_base\"]:\n",
    "        encoding = tiktoken.get_encoding(encoding_name)\n",
    "        token_integers = encoding.encode(example_string)\n",
    "        num_tokens = len(token_integers)\n",
    "        token_bytes = [encoding.decode_single_token_bytes(token) for token in token_integers]\n",
    "        print()\n",
    "        print(f\"{encoding_name}: {num_tokens} tokens\")\n",
    "        print(f\"token integers: {token_integers}\")\n",
    "        print(f\"token bytes: {token_bytes}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Example string: \"antidisestablishmentarianism\"\n",
      "\n",
      "r50k_base: 5 tokens\n",
      "token integers: [415, 29207, 44390, 3699, 1042]\n",
      "token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']\n",
      "\n",
      "p50k_base: 5 tokens\n",
      "token integers: [415, 29207, 44390, 3699, 1042]\n",
      "token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']\n",
      "\n",
      "cl100k_base: 6 tokens\n",
      "token integers: [519, 85342, 34500, 479, 8997, 2191]\n",
      "token bytes: [b'ant', b'idis', b'establish', b'ment', b'arian', b'ism']\n"
     ]
    }
   ],
   "source": [
    "compare_encodings(\"antidisestablishmentarianism\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Example string: \"2 + 2 = 4\"\n",
      "\n",
      "r50k_base: 5 tokens\n",
      "token integers: [17, 1343, 362, 796, 604]\n",
      "token bytes: [b'2', b' +', b' 2', b' =', b' 4']\n",
      "\n",
      "p50k_base: 5 tokens\n",
      "token integers: [17, 1343, 362, 796, 604]\n",
      "token bytes: [b'2', b' +', b' 2', b' =', b' 4']\n",
      "\n",
      "cl100k_base: 7 tokens\n",
      "token integers: [17, 489, 220, 17, 284, 220, 19]\n",
      "token bytes: [b'2', b' +', b' ', b'2', b' =', b' ', b'4']\n"
     ]
    }
   ],
   "source": [
    "compare_encodings(\"2 + 2 = 4\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Example string: \"お誕生日おめでとう\"\n",
      "\n",
      "r50k_base: 14 tokens\n",
      "token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 29557]\n",
      "token bytes: [b'\\xe3\\x81', b'\\x8a', b'\\xe8\\xaa', b'\\x95', b'\\xe7\\x94\\x9f', b'\\xe6\\x97', b'\\xa5', b'\\xe3\\x81', b'\\x8a', b'\\xe3\\x82', b'\\x81', b'\\xe3\\x81\\xa7', b'\\xe3\\x81\\xa8', b'\\xe3\\x81\\x86']\n",
      "\n",
      "p50k_base: 14 tokens\n",
      "token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 29557]\n",
      "token bytes: [b'\\xe3\\x81', b'\\x8a', b'\\xe8\\xaa', b'\\x95', b'\\xe7\\x94\\x9f', b'\\xe6\\x97', b'\\xa5', b'\\xe3\\x81', b'\\x8a', b'\\xe3\\x82', b'\\x81', b'\\xe3\\x81\\xa7', b'\\xe3\\x81\\xa8', b'\\xe3\\x81\\x86']\n",
      "\n",
      "cl100k_base: 9 tokens\n",
      "token integers: [33334, 45918, 243, 21990, 9080, 33334, 62004, 16556, 78699]\n",
      "token bytes: [b'\\xe3\\x81\\x8a', b'\\xe8\\xaa', b'\\x95', b'\\xe7\\x94\\x9f', b'\\xe6\\x97\\xa5', b'\\xe3\\x81\\x8a', b'\\xe3\\x82\\x81', b'\\xe3\\x81\\xa7', b'\\xe3\\x81\\xa8\\xe3\\x81\\x86']\n"
     ]
    }
   ],
   "source": [
    "compare_encodings(\"お誕生日おめでとう\")\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Counting tokens for chat completions API calls\n",
    "\n",
    "ChatGPT models like `gpt-3.5-turbo` and `gpt-4` use tokens in the same way as older completions models, but because of their message-based formatting, it's more difficult to count how many tokens will be used by a conversation.\n",
    "\n",
    "Below is an example function for counting tokens for messages passed to `gpt-3.5-turbo` or `gpt-4`.\n",
    "\n",
    "Note that the exact way that tokens are counted from messages may change from model to model. Consider the counts from the function below an estimate, not a timeless guarantee.\n",
    "\n",
    "In particular, requests that use the optional `functions` input will consume extra tokens on top of the estimates calculated below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0613\"):\n",
    "    \"\"\"Return the number of tokens used by a list of messages.\"\"\"\n",
    "    try:\n",
    "        encoding = tiktoken.encoding_for_model(model)\n",
    "    except KeyError:\n",
    "        print(\"Warning: model not found. Using cl100k_base encoding.\")\n",
    "        encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
    "    if model in {\n",
    "        \"gpt-3.5-turbo-0613\",\n",
    "        \"gpt-3.5-turbo-16k-0613\",\n",
    "        \"gpt-4-0314\",\n",
    "        \"gpt-4-32k-0314\",\n",
    "        \"gpt-4-0613\",\n",
    "        \"gpt-4-32k-0613\",\n",
    "        }:\n",
    "        tokens_per_message = 3\n",
    "        tokens_per_name = 1\n",
    "    elif model == \"gpt-3.5-turbo-0301\":\n",
    "        tokens_per_message = 4  # every message follows <|start|>{role/name}\\n{content}<|end|>\\n\n",
    "        tokens_per_name = -1  # if there's a name, the role is omitted\n",
    "    elif \"gpt-3.5-turbo\" in model:\n",
    "        print(\"Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.\")\n",
    "        return num_tokens_from_messages(messages, model=\"gpt-3.5-turbo-0613\")\n",
    "    elif \"gpt-4\" in model:\n",
    "        print(\"Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.\")\n",
    "        return num_tokens_from_messages(messages, model=\"gpt-4-0613\")\n",
    "    else:\n",
    "        raise NotImplementedError(\n",
    "            f\"\"\"num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.\"\"\"\n",
    "        )\n",
    "    num_tokens = 0\n",
    "    for message in messages:\n",
    "        num_tokens += tokens_per_message\n",
    "        for key, value in message.items():\n",
    "            num_tokens += len(encoding.encode(value))\n",
    "            if key == \"name\":\n",
    "                num_tokens += tokens_per_name\n",
    "    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>\n",
    "    return num_tokens\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "gpt-3.5-turbo-0301\n",
      "127 prompt tokens counted by num_tokens_from_messages().\n",
      "127 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-3.5-turbo-0613\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-3.5-turbo\n",
      "Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-4-0314\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-4-0613\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n",
      "gpt-4\n",
      "Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.\n",
      "129 prompt tokens counted by num_tokens_from_messages().\n",
      "129 prompt tokens counted by the OpenAI API.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# let's verify the function above matches the OpenAI API response\n",
    "\n",
    "from openai import OpenAI\n",
    "import os\n",
    "\n",
    "client = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"<your OpenAI API key if not set as env var>\"))\n",
    "\n",
    "example_messages = [\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"content\": \"You are a helpful, pattern-following assistant that translates corporate jargon into plain English.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_user\",\n",
    "        \"content\": \"New synergies will help drive top-line growth.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_assistant\",\n",
    "        \"content\": \"Things working well together will increase revenue.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_user\",\n",
    "        \"content\": \"Let's circle back when we have more bandwidth to touch base on opportunities for increased leverage.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"system\",\n",
    "        \"name\": \"example_assistant\",\n",
    "        \"content\": \"Let's talk later when we're less busy about how to do better.\",\n",
    "    },\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": \"This late pivot means we don't have time to boil the ocean for the client deliverable.\",\n",
    "    },\n",
    "]\n",
    "\n",
    "for model in [\n",
    "    \"gpt-3.5-turbo-0301\",\n",
    "    \"gpt-3.5-turbo-0613\",\n",
    "    \"gpt-3.5-turbo\",\n",
    "    \"gpt-4-0314\",\n",
    "    \"gpt-4-0613\",\n",
    "    \"gpt-4\",\n",
    "    ]:\n",
    "    print(model)\n",
    "    # example token count from the function defined above\n",
    "    print(f\"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_from_messages().\")\n",
    "    # example token count from the OpenAI API\n",
    "    response = client.chat.completions.create(\n",
    "        model=model,\n",
    "        messages=example_messages,\n",
    "        temperature=0,\n",
    "        max_tokens=1,\n",
    "    )\n",
    "    print(f'{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.')\n",
    "    print()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  },
  "vscode": {
   "interpreter": {
    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
