{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/recommendation/product-recommender/product_recommender.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/recommendation/product-recommender/product_recommender.ipynb)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "YmdWGrw4t5G2"
   },
   "source": [
    "# Product Recommendation Engine"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "lXd46ecEt5G7"
   },
   "source": [
    "Learn how to build a product recommendation engine using collaborative filtering and Pinecone.\n",
    "\n",
    "In this example, we will generate product recommendations for e-commerce customers based on their previous orders and on trending items. The example covers preparing the vector embeddings, creating and deploying the Pinecone service, writing data to Pinecone, and finally querying Pinecone to receive a ranked list of recommended products."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "🚨 _Note that running this on CPU is slow! If you are running on Google Colab, go to **Runtime > Change runtime type > Hardware accelerator > GPU** to switch to a GPU._\n",
    "\n",
    "---"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "XvrvUTLvt5G7"
   },
   "source": [
    "## Data Preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU numpy pandas scipy"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "dG4733cIt5G8"
   },
   "source": [
    "**Import Python Libraries**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "emp_MSXZt5G8"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import scipy.sparse as sparse\n",
    "import itertools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU kaggle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    import kaggle\n",
    "except OSError as e:\n",
    "    print(e)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first time you run `import kaggle`, you will see an `OSError`. This is because the Kaggle API looks for your credentials in `~/.kaggle/kaggle.json` (`/root/.kaggle/kaggle.json` on Google Colab), and that file does not exist yet. To get your credentials, open your profile on [Kaggle](https://kaggle.com) from the top-right corner of the page, go to your account settings, and create a new API token in the **API** section. This downloads a `kaggle.json` file containing your username and secret key. Enter them below:"
   ]
  },
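  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "\n",
    "kaggle_dir = os.path.expanduser('~/.kaggle')  # resolves to /root/.kaggle on Colab\n",
    "os.makedirs(kaggle_dir, exist_ok=True)\n",
    "with open(os.path.join(kaggle_dir, 'kaggle.json'), 'w') as fp:\n",
    "    fp.write(json.dumps({\"username\":\"YOUR_USERNAME\",\"key\":\"YOUR_SECRET_KEY\"}))\n",
    "os.chmod(os.path.join(kaggle_dir, 'kaggle.json'), 0o600)  # keep the API key private"
   ]
  },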
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can download the dataset. If Kaggle refuses the download, you may first need to accept the competition rules on the competition's Kaggle page."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading instacart-market-basket-analysis.zip to /home/jelena/Projects/Pinecone/examples/recommendation/product-recommender\n",
169
      "100%|███████████████████████████████████████▉| 196M/196M [00:27<00:00, 9.01MB/s]\n",
170
      "100%|████████████████████████████████████████| 196M/196M [00:27<00:00, 7.36MB/s]\n"
171
     ]
172
    }
173
   ],
174
   "source": [
175
    "!kaggle competitions download -c instacart-market-basket-analysis"
176
   ]
177
  },
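  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This downloads a single archive, `instacart-market-basket-analysis.zip`, which in turn contains a zip file for each CSV table. We extract the outer archive first and then the four inner archives we need:"
   ]
  },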
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import zipfile\n",
    "\n",
    "files = [\n",
    "    'instacart-market-basket-analysis.zip',\n",
    "    'order_products__train.csv.zip',\n",
    "    'order_products__prior.csv.zip',\n",
    "    'products.csv.zip',\n",
    "    'orders.csv.zip'\n",
    "]\n",
    "\n",
    "for filename in files:\n",
    "    with zipfile.ZipFile(filename, 'r') as zip_ref:\n",
    "        zip_ref.extractall('./')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can move on to loading the dataset."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "xYHyJO15t5G-"
   },
   "source": [
    "**Load the (Example) Instacart Data**"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "Oxgd8tjXt5G-"
   },
   "source": [
    "We are going to use the [Instacart Market Basket Analysis](https://www.kaggle.com/c/instacart-market-basket-analysis/data) dataset for this task.\n",
    "\n",
    "The data used throughout this example is a set of files describing customers' orders over time. The main focus is on the `orders.csv` file, where each row links a user to one of their orders: it contains `user_id` (the user who placed the order) and `order_id`. Note that this table has no information about products. The products in each order are stored in the `order_products__prior.csv` and `order_products__train.csv` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "cBbbR7Rut5G_"
   },
   "outputs": [],
   "source": [
    "order_products_train = pd.read_csv('order_products__train.csv')\n",
    "order_products_prior = pd.read_csv('order_products__prior.csv')\n",
    "products = pd.read_csv('products.csv')\n",
    "orders = pd.read_csv('orders.csv')\n",
    "\n",
    "order_products = pd.concat([order_products_train, order_products_prior])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "XecuCyNlt5HA"
   },
   "source": [
    "**Preparing data for the model**\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "5FV_GGjst5HA"
   },
   "source": [
    "The collaborative filtering model used in this example requires only users’ historical preferences for a set of items. As there are no explicit ratings in this data, we use the number of orders in which a user bought a product as a “confidence” measure of how strong the interaction between that user and product is.\n",
    "\n",
    "The `data` dataframe stores these counts and serves as the basis for the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "id": "ZjRh7RYpt5HB"
   },
   "outputs": [],
   "source": [
    "customer_order_products = pd.merge(orders, order_products, how='inner',on='order_id')\n",
    "\n",
    "# count the orders per (user, product) pair; these counts act as \"confidences\"\n",
    "data = customer_order_products.groupby(['user_id', 'product_id'])[['order_id']].count().reset_index()\n",
    "data.columns=[\"user_id\", \"product_id\", \"total_orders\"]\n",
    "data.product_id = data.product_id.astype('int64')\n",
    "\n",
    "# Create a lookup frame so we can get the product names back in readable form later.\n",
    "products_lookup = products[['product_id', 'product_name']].drop_duplicates()\n",
    "products_lookup['product_id'] = products_lookup.product_id.astype('int64')"
   ]
  },
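  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the “confidence” idea concrete, here is a minimal sketch on a tiny, made-up dataset (the order, user, and product IDs are invented for illustration and do not come from Instacart). It repeats the merge and `groupby(...).count()` steps from the cell above, so each `total_orders` value is the number of orders in which a user bought a product: user 10 bought product 100 in two separate orders and therefore gets a confidence of 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# hypothetical toy data: users 10 and 20 place three orders between them\n",
    "toy_orders = pd.DataFrame({'order_id': [1, 2, 3], 'user_id': [10, 10, 20]})\n",
    "toy_order_products = pd.DataFrame({'order_id': [1, 1, 2, 3],\n",
    "                                   'product_id': [100, 200, 100, 200]})\n",
    "\n",
    "# join each order to its products, then count the orders per (user, product) pair\n",
    "toy = pd.merge(toy_orders, toy_order_products, how='inner', on='order_id')\n",
    "toy_conf = toy.groupby(['user_id', 'product_id'])[['order_id']].count().reset_index()\n",
    "toy_conf.columns = ['user_id', 'product_id', 'total_orders']\n",
    "toy_conf  # (10, 100) -> 2, (10, 200) -> 1, (20, 200) -> 1"
   ]
  },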
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "77lvwm0St5HC"
   },
   "source": [
    "We will create two prototype users and add them to our `data` dataframe. Each user buys only specific products:\n",
    "- The first user buys only **Mineral Water**\n",
    "- The second user buys only baby products: **No More Tears Baby Shampoo** and **Baby Wash & Shampoo**\n",
    "\n",
    "We will use these users later to query the model and examine its results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "id": "A06EfAf-t5HC",
    "outputId": "d040560e-4401-47d4-8749-fb1bb9397a29"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>product_id</th>\n",
       "      <th>total_orders</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>206210</td>\n",
       "      <td>22802</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>206211</td>\n",
       "      <td>26834</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>206211</td>\n",
       "      <td>12590</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id  product_id  total_orders\n",
       "0   206210       22802            97\n",
       "1   206211       26834            89\n",
       "2   206211       12590            77"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_new = pd.DataFrame([[data.user_id.max() + 1, 22802, 97],\n",
    "                         [data.user_id.max() + 2, 26834, 89],\n",
    "                         [data.user_id.max() + 2, 12590, 77]\n",
    "                        ], columns=['user_id', 'product_id', 'total_orders'])\n",
    "data_new"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "id": "mNIJ2hq6t5HD",
    "outputId": "cf155dc8-82ba-4f29-9cba-49d8a5052db9"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>product_id</th>\n",
       "      <th>total_orders</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13863744</th>\n",
       "      <td>206209</td>\n",
       "      <td>48697</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863745</th>\n",
       "      <td>206209</td>\n",
       "      <td>48742</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863746</th>\n",
       "      <td>206210</td>\n",
       "      <td>22802</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863747</th>\n",
       "      <td>206211</td>\n",
       "      <td>26834</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863748</th>\n",
       "      <td>206211</td>\n",
       "      <td>12590</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          user_id  product_id  total_orders\n",
       "13863744   206209       48697             1\n",
       "13863745   206209       48742             2\n",
       "13863746   206210       22802            97\n",
       "13863747   206211       26834            89\n",
       "13863748   206211       12590            77"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.concat([data, data_new]).reset_index(drop = True)\n",
    "data.tail()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "xBC-8PFTt5HD"
   },
   "source": [
    "Next, we extract the unique user and item IDs and build two CSR (Compressed Sparse Row) matrices of purchase counts: one with products as rows and users as columns, and its transpose with users as rows and products as columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "id": "v2_2R7zmt5HE"
   },
   "outputs": [],
   "source": [
    "users = list(np.sort(data.user_id.unique()))\n",
    "items = list(np.sort(products.product_id.unique()))\n",
    "purchases = list(data.total_orders)\n",
    "\n",
    "# map zero-based positions to user IDs\n",
    "index_to_user = pd.Series(users)\n",
    "\n",
    "# reverse mapping: user ID -> 1-based position\n",
    "user_to_index = pd.Series(data=index_to_user.index + 1, index=index_to_user.values)\n",
    "\n",
    "# map zero-based positions to product IDs\n",
    "index_to_item = pd.Series(items)\n",
    "\n",
    "# reverse mapping: product ID -> zero-based position\n",
    "item_to_index = pd.Series(data=index_to_item.index, index=index_to_item.values)\n",
    "\n",
    "# the raw product and user IDs are used directly as matrix row/column indices\n",
    "products_rows = data.product_id.astype(int)\n",
    "users_cols = data.user_id.astype(int)\n",
    "\n",
    "# product x user matrix of purchase counts; +1 because the IDs are 1-based, so the largest ID must be a valid index\n",
    "sparse_product_user = sparse.csr_matrix((purchases, (products_rows, users_cols)), shape=(len(items) + 1, len(users) + 1))\n",
    "sparse_product_user.data = np.nan_to_num(sparse_product_user.data, copy=False)\n",
    "\n",
    "# the same counts as a user x product matrix (the transpose of the matrix above)\n",
    "sparse_user_product = sparse.csr_matrix((purchases, (users_cols, products_rows)), shape=(len(users) + 1, len(items) + 1))\n",
    "sparse_user_product.data = np.nan_to_num(sparse_user_product.data, copy=False)"
   ]
  },
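  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The matrices above use the raw user and product IDs directly as row and column indices. The minimal sketch below, with made-up IDs and counts, illustrates how `scipy.sparse.csr_matrix` behaves when built from `(data, (row, col))` triplets: entries that share the same coordinates are summed, and with 1-based IDs each dimension must be the largest ID + 1, which leaves row 0 and column 0 empty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.sparse as sparse\n",
    "\n",
    "# hypothetical (user_id, product_id, count) triplets with 1-based IDs\n",
    "toy_user_ids = np.array([1, 1, 2, 3, 3])\n",
    "toy_product_ids = np.array([2, 3, 1, 3, 3])  # the pair (3, 3) appears twice\n",
    "toy_counts = np.array([4, 1, 2, 5, 1])\n",
    "\n",
    "# rows = users, columns = products; +1 so the largest ID is a valid index\n",
    "toy_matrix = sparse.csr_matrix((toy_counts, (toy_user_ids, toy_product_ids)),\n",
    "                               shape=(toy_user_ids.max() + 1, toy_product_ids.max() + 1))\n",
    "\n",
    "# the duplicate (3, 3) entries are summed to 6; row 0 and column 0 contain only zeros\n",
    "print(toy_matrix.toarray())"
   ]
  },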
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "VrHSgtvht5HE"
   },
   "source": [
    "## Implicit Model"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "II6wOH96t5HF"
   },
   "source": [
    "In this section, we create and train a recommender model using the **implicit** library. Its Alternating Least Squares (ALS) model is based on the algorithm described in the paper [Collaborative Filtering for Implicit Feedback Datasets](https://www.researchgate.net/publication/220765111_Collaborative_Filtering_for_Implicit_Feedback_Datasets), with the performance optimizations described in [Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.379.6473&rep=rep1&type=pdf).\n"
   ]
  },
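  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For intuition, the cell below sketches the core idea of the first paper in plain NumPy on a tiny random matrix. Following the paper's formulation, each purchase count `r_ui` becomes a binary preference `p_ui` (1 if the user bought the item, else 0) and a confidence `c_ui = 1 + alpha * r_ui`. User and item factors are then updated in turn with the closed-form least-squares solution `x_u = (Y^T C^u Y + reg * I)^-1 Y^T C^u p(u)`, and symmetrically for the items. This is a dense, unoptimized illustration, not how **implicit** implements ALS (the library works on sparse matrices and can use the conjugate gradient optimizations from the second paper); the matrix size, `alpha`, `reg`, and number of factors are arbitrary assumptions. Because each half-step exactly minimizes the objective for one side, the printed loss should not increase from one iteration to the next."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# hypothetical toy data: purchase counts for 6 users x 5 items (mostly zeros)\n",
    "R = rng.poisson(0.6, size=(6, 5)).astype(float)\n",
    "alpha, reg, n_factors = 15.0, 0.05, 3  # arbitrary values for illustration\n",
    "\n",
    "P = (R > 0).astype(float)  # binary preferences p_ui\n",
    "C = 1.0 + alpha * R        # confidences c_ui = 1 + alpha * r_ui\n",
    "\n",
    "X = rng.normal(scale=0.1, size=(R.shape[0], n_factors))  # user factors\n",
    "Y = rng.normal(scale=0.1, size=(R.shape[1], n_factors))  # item factors\n",
    "\n",
    "# solve each row's factors in closed form while the other side stays fixed\n",
    "def solve_side(fixed, conf, pref):\n",
    "    k = fixed.shape[1]\n",
    "    out = np.empty((conf.shape[0], k))\n",
    "    for u in range(conf.shape[0]):\n",
    "        Cu = np.diag(conf[u])                       # confidence weights for this row\n",
    "        A = fixed.T @ Cu @ fixed + reg * np.eye(k)  # Y^T C^u Y + reg * I\n",
    "        b = fixed.T @ Cu @ pref[u]                  # Y^T C^u p(u)\n",
    "        out[u] = np.linalg.solve(A, b)\n",
    "    return out\n",
    "\n",
    "# confidence-weighted squared error plus L2 regularization (the objective ALS minimizes)\n",
    "def loss(X, Y):\n",
    "    err = (C * (P - X @ Y.T) ** 2).sum()\n",
    "    return err + reg * ((X ** 2).sum() + (Y ** 2).sum())\n",
    "\n",
    "for it in range(10):\n",
    "    X = solve_side(Y, C, P)      # update users with the items fixed\n",
    "    Y = solve_side(X, C.T, P.T)  # update items with the users fixed\n",
    "    print(it, float(loss(X, Y)))\n",
    "\n",
    "toy_predictions = X @ Y.T  # predicted preferences; higher means a stronger recommendation"
   ]
  },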
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "id": "OFHfWKD9t5HF"
   },
   "outputs": [],
   "source": [
    "!pip install -qU implicit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "f9f40f34ce41471e9385b2745c39df66"
     ]
    },
    "id": "k0GW99kxt5HF",
    "outputId": "6dccdd29-eea1-421e-d2b3-cd87c588c606"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 50/50 [12:30<00:00, 15.02s/it]\n"
     ]
    }
   ],
   "source": [
    "import implicit\n",
    "from implicit import evaluation\n",
    "\n",
    "# split the user x product matrix into train and test sets (90% / 10% of the interactions)\n",
    "train_set, test_set = evaluation.train_test_split(sparse_user_product, train_percentage=0.9)\n",
    "\n",
    "# initialize a model (num_threads=0 would use all available CPU cores)\n",
    "model = implicit.als.AlternatingLeastSquares(factors=100,\n",
    "                                             regularization=0.05,\n",
    "                                             iterations=50,\n",
    "                                             num_threads=1)\n",
    "\n",
    "# scale the purchase counts into confidence weights\n",
    "alpha_val = 15\n",
    "train_set = (train_set * alpha_val).astype('double')\n",
    "\n",
    "# train on the user x item matrix of confidence weights (implicit >= 0.5 expects users as rows)\n",
    "model.fit(train_set, show_progress = True)"
   ]
  },
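  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "yN80hSojt5HF"
   },
   "source": [
    "We evaluate the model on the held-out test set using the library's built-in `evaluation.ranking_metrics_at_k` function, which reports precision, mean average precision (MAP), NDCG, and AUC over the top K recommendations (here, K=100)."
   ]
  },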
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "referenced_widgets": [
      "9611edb2a0584f3e88f2c8f0aac1c86d"
     ]
    },
    "id": "BbD8of_nt5HG",
    "outputId": "7fdb7da2-a4b2-47ba-d93e-2463c9a40004"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 193027/193027 [01:22<00:00, 2331.67it/s]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'precision': 0.2749977629022807,\n",
       " 'map': 0.04447513449736459,\n",
       " 'ndcg': 0.1442493819257263,\n",
       " 'auc': 0.6550438975250414}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_set = (test_set * alpha_val).astype('double')\n",
    "evaluation.ranking_metrics_at_k(model, train_set, test_set, K=100,\n",
    "                         show_progress=True, num_threads=1)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "LNmva3Dlt5HG"
   },
   "source": [
    "This is what the item and user factors look like: each row is a 100-dimensional vector (`factors=100`). These vectors will later be stored in our vector index and used for recommendations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "id": "JUtCROQKt5HG",
    "outputId": "af356153-eed8-4457-fe64-94c1607f1abc"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 7.62302289e-03, -1.87499879e-03, -2.67132535e-04,\n",
       "        -1.98393618e-03,  1.25893755e-02,  2.26164684e-02,\n",
       "        -2.98452517e-03,  1.48304272e-03,  4.13032155e-03,\n",
       "         1.40623152e-02,  7.02867238e-03,  1.75751895e-02,\n",
       "        -8.48765578e-03,  7.23443553e-03,  1.19221059e-03,\n",
       "         2.13307124e-02,  1.94959883e-02,  5.55894384e-03,\n",
       "         5.24600095e-04,  4.55727894e-03,  8.32605269e-03,\n",
       "         2.58112140e-03, -4.82219504e-04, -1.20439762e-02,\n",
       "         2.47381651e-03,  1.28009180e-02,  2.08530622e-03,\n",
       "        -3.33145494e-03,  1.92555189e-02,  4.34594555e-03,\n",
       "        -6.63968595e-03,  5.56971226e-03,  6.72743656e-03,\n",
       "         2.78714020e-03, -2.83625396e-03, -6.03283104e-03,\n",
       "         8.45496822e-03,  1.18931178e-02,  2.10034065e-02,\n",
       "        -1.68929249e-02,  1.79267656e-02, -9.86911077e-03,\n",
       "         1.26773408e-02, -1.57724898e-02,  8.18107463e-03,\n",
       "        -3.72606446e-03, -1.64890327e-02,  1.08041009e-02,\n",
       "         1.01122679e-02,  1.74866407e-03, -1.00692408e-02,\n",
       "         3.72917973e-03,  1.77870430e-02,  1.15388604e-02,\n",
       "         1.66979171e-02,  2.30163876e-02,  1.45245846e-02,\n",
       "         1.09790787e-02,  2.25082710e-02,  8.88522156e-03,\n",
       "         3.84999276e-03,  6.78812014e-03, -6.29206235e-03,\n",
       "         1.20011866e-02,  5.73713623e-04, -3.58484423e-04,\n",
       "         2.25704573e-02,  1.97128039e-02,  2.48618261e-03,\n",
       "         1.00341532e-02,  3.30620557e-02,  4.67582047e-03,\n",
       "         1.25671867e-02,  9.63101815e-03, -6.98126294e-03,\n",
       "         4.13466897e-03,  7.89871905e-03, -1.68388349e-03,\n",
       "         4.41470556e-03,  2.16695480e-02,  3.36539634e-02,\n",
       "         1.15288170e-02,  1.78786926e-02,  1.03035588e-02,\n",
       "         8.54070950e-03,  2.61212088e-04, -1.41171052e-03,\n",
       "        -1.05745839e-02, -8.94313399e-03,  1.24474419e-02,\n",
       "        -1.49678264e-03,  1.30350972e-02, -3.78952269e-03,\n",
       "         1.29853217e-02, -5.84876491e-03,  8.83117318e-03,\n",
       "         1.09084947e-02,  1.75822247e-02, -2.42153485e-03,\n",
       "         2.69188061e-02],\n",
       "       [ 5.21759735e-03,  1.47629611e-03,  4.69345972e-03,\n",
       "         2.87870294e-03,  4.24716249e-03,  4.83599026e-03,\n",
       "        -1.84578449e-03,  3.33843078e-03,  9.51172377e-04,\n",
       "         7.68936379e-03,  1.14221417e-03,  5.26204892e-03,\n",
       "         5.94653701e-03,  5.13354549e-03,  5.08742314e-03,\n",
       "         1.38968031e-03,  2.90560676e-03,  4.72711818e-03,\n",
       "         3.81981768e-03,  8.63695331e-03,  1.98779325e-03,\n",
       "         4.14589746e-03,  2.45648017e-03,  8.72450136e-03,\n",
       "         3.72051424e-03,  3.63018154e-03,  8.17742944e-03,\n",
       "         2.83566094e-03,  1.01597689e-03,  2.12685089e-03,\n",
       "         6.25073735e-05,  7.25640962e-03,  3.82022094e-03,\n",
       "         4.34174994e-03,  5.85072301e-03,  9.47047374e-04,\n",
       "         2.79816077e-03,  4.49727383e-03,  4.60178824e-03,\n",
       "         1.36057427e-03,  4.69741132e-03,  9.56356467e-04,\n",
       "         1.61343289e-03,  6.29134802e-03,  2.78463890e-03,\n",
       "         5.11198118e-03, -7.20114913e-04,  3.37949023e-03,\n",
       "         2.91588088e-03,  5.00901928e-03,  4.65172669e-03,\n",
       "         5.20490110e-03,  3.70147056e-03,  6.90481300e-03,\n",
       "         2.34888913e-03,  2.18196819e-03,  3.43826832e-03,\n",
       "         5.50945289e-03,  1.69402442e-03,  7.94254523e-03,\n",
       "         4.06751060e-04,  2.06253398e-03,  5.51156874e-04,\n",
       "         3.54768243e-03,  4.68675978e-03,  4.41560568e-03,\n",
       "         3.10626742e-03,  6.76616607e-03,  2.97088153e-03,\n",
       "         5.23018744e-03,  8.21463531e-04,  3.55466595e-03,\n",
       "         3.55025590e-03,  4.21751942e-03,  7.53766426e-06,\n",
       "         3.38028139e-03,  6.64595002e-03,  1.43511093e-03,\n",
       "         2.78131408e-03,  7.84743484e-03,  1.02839316e-03,\n",
       "         3.32346838e-03,  3.19992891e-03,  1.24205020e-04,\n",
       "         6.94080070e-03,  5.62512595e-03,  4.24749590e-03,\n",
       "         4.09889780e-03,  8.02688207e-03,  7.17388117e-04,\n",
       "         4.47497796e-03,  4.97345300e-03,  4.65388224e-03,\n",
       "         3.28435283e-03,  5.25846845e-03,  7.61173666e-03,\n",
       "         3.32462339e-04,  3.95298610e-03,  4.66898642e-03,\n",
       "         2.08678166e-03]], dtype=float32)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.item_factors[1:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "id": "O3onbJmnt5HG",
    "outputId": "0c7cbbbc-eb8c-4a8f-da75-a16e62693cc3"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-1.1535235e-01, -1.0716660e+00,  1.9001467e+00, -2.8103048e-01,\n",
       "         2.3507787e-01, -1.3524084e-01, -9.7265404e-01, -2.4441715e-01,\n",
       "        -1.3157841e+00, -9.3033111e-01,  1.3205792e+00,  1.5934669e+00,\n",
       "        -8.7007034e-01,  8.2152611e-01, -1.1706644e+00,  1.0730673e+00,\n",
       "         1.8356223e+00, -6.4666368e-02, -3.6772355e-01,  9.4636500e-01,\n",
       "         2.3708563e+00, -1.0900316e+00, -1.9363825e-01, -1.2721621e+00,\n",
       "        -3.8839546e-01,  1.5143086e+00, -8.0761343e-01, -9.5621413e-01,\n",
       "         1.4447758e+00, -3.5387911e-02,  1.3933902e+00,  8.5588759e-01,\n",
       "         4.9890396e-01,  3.4040824e-01, -3.7110674e-01,  3.9586887e-01,\n",
       "        -4.4984985e-02,  1.0853641e+00,  1.0416827e+00, -2.0454068e+00,\n",
       "         2.1352434e+00, -1.4058716e+00,  8.2302362e-01,  2.8762394e-01,\n",
       "        -8.2267590e-02, -9.1558552e-01,  5.0190097e-01,  2.0039406e-01,\n",
       "         1.2773964e+00, -4.9887997e-01, -1.5381695e-02, -7.4103689e-01,\n",
       "        -4.8554859e-01, -6.2013620e-01,  1.2602403e+00,  3.7264615e-01,\n",
       "         1.6895621e-01, -1.3288108e+00,  7.5886017e-01,  3.5683268e-01,\n",
       "        -3.4131634e-01,  1.8973587e-01,  2.9325524e-01, -7.4078739e-01,\n",
       "        -1.2132510e+00, -1.0879766e+00, -1.3704516e+00, -1.2282028e+00,\n",
       "        -9.8826337e-01,  4.5458439e-01,  2.5428922e+00,  1.1061906e+00,\n",
       "         9.2474401e-01,  1.5637769e+00, -1.2372558e+00, -9.5266867e-01,\n",
       "         3.3413932e-01,  8.3561289e-01, -5.5887586e-01,  7.6054811e-02,\n",
       "         1.9956827e+00,  9.4774431e-01,  3.6637652e-01,  8.2821363e-01,\n",
       "         2.8309381e-01, -9.9692512e-01,  9.8618686e-01,  8.0894047e-01,\n",
       "        -4.6432391e-01, -7.3879558e-01, -1.4049910e+00,  3.1375384e-01,\n",
       "         7.0995593e-01, -3.4533828e-01,  1.0314369e+00, -7.4499923e-01,\n",
       "        -1.0274127e+00, -5.7535094e-01, -1.9709327e+00,  1.4919670e+00],\n",
       "       [-1.5587401e+00,  1.5053092e-01,  3.7092552e-02,  2.1527238e+00,\n",
       "        -2.2145588e+00, -6.1379347e-02,  1.2396275e+00, -8.3214724e-01,\n",
       "        -3.2939258e+00, -5.2348149e-01,  9.3661207e-01, -8.3603543e-01,\n",
       "        -2.0481503e+00,  3.6520573e-01,  2.1556497e+00,  1.6638597e+00,\n",
       "         1.0686793e+00,  1.4715923e+00, -1.3041624e+00, -2.5738211e+00,\n",
       "         1.6351489e+00,  2.6861623e-01,  1.4985343e+00, -3.6418435e-01,\n",
       "         6.7551094e-01,  1.4531764e+00, -9.3211579e-01,  1.6065925e+00,\n",
       "        -5.6125438e-01,  1.5840977e+00,  1.1768103e+00,  2.9195817e+00,\n",
       "         3.0073445e+00, -2.6590629e-03, -6.4741218e-01, -1.6547465e-01,\n",
       "        -3.2860899e-01, -1.3454194e+00, -1.3172647e-01, -1.9245317e+00,\n",
       "        -3.2763949e-01,  1.6461329e+00,  1.2353618e+00,  8.4044081e-01,\n",
       "         1.9554125e+00,  1.3067812e+00,  3.1975982e+00, -1.4149319e-01,\n",
       "        -1.8368236e+00, -1.0103904e+00,  5.0911957e-01,  5.5677569e-01,\n",
       "         6.9721764e-01, -5.6430489e-01,  2.2003100e+00, -1.0744971e+00,\n",
       "         1.6468230e+00, -2.3111634e+00,  1.3065448e+00, -8.5009754e-01,\n",
       "         2.0129523e+00, -1.7714916e-01, -9.0549275e-02, -7.5850022e-01,\n",
       "        -6.3388091e-01,  1.2075551e+00, -3.2875051e+00, -1.0517368e+00,\n",
       "        -4.8032269e-01,  1.6501316e-01, -3.2218367e-01,  1.8979825e+00,\n",
       "         1.3875642e+00,  1.9148558e+00,  9.3831289e-01,  9.1719218e-02,\n",
       "        -3.4196714e-01,  1.0367199e+00,  2.6459680e+00, -2.4039178e+00,\n",
       "        -3.1851164e-01, -4.1336271e-01, -5.2495323e-02, -4.7513500e-01,\n",
       "        -2.1291137e+00, -2.8807511e+00,  1.7764181e+00,  2.3209753e+00,\n",
       "        -2.4233973e+00,  2.3937972e+00, -5.7148945e-01,  6.0630590e-01,\n",
       "         5.6338543e-01,  9.7472930e-01, -6.0403675e-02, -5.1794976e-01,\n",
       "         1.6027211e+00,  1.0213125e-03,  1.5301393e+00, -2.9542151e-01]],\n",
       "      dtype=float32)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.user_factors[1:3]"
   ]
  },
827
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "n2ymVmqct5HH"
   },
   "source": [
    "## Configure Pinecone"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "YSRAKA56t5HH"
   },
   "source": [
    "Install and set up Pinecone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "id": "oxZDkCjht5HH"
   },
   "outputs": [],
   "source": [
    "!pip install -qU pinecone-client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "id": "mOEmO-yft5HH"
   },
   "outputs": [],
   "source": [
    "from pinecone import Pinecone"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "3_ykVLT6t5HH"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Load the Pinecone API key\n",
    "api_key = os.getenv('PINECONE_API_KEY') or 'YOUR_API_KEY'\n",
    "# Set the Pinecone environment (listed next to the API key in the console)\n",
    "env = os.getenv('PINECONE_ENVIRONMENT') or \"YOUR_ENV\"\n",
    "\n",
    "pc = Pinecone(api_key=api_key)"
   ]
  },
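The configuration cell above falls back to the placeholder `'YOUR_API_KEY'` when `PINECONE_API_KEY` is unset, so a missing key only shows up at the first API call. A minimal sketch of a fail-fast alternative in plain Python; `require_env` is a hypothetical helper, not part of the Pinecone client:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if it is unset or empty.

    Hypothetical helper: fails immediately instead of continuing with a placeholder.
    """
    value = os.environ.get(name, '')
    if not value:
        raise RuntimeError(f'Set the {name} environment variable before running this notebook.')
    return value
```

It could replace the fallback, for example `api_key = require_env('PINECONE_API_KEY')`.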
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "KpWkJa-Dt5HH"
   },
   "source": [
    "[Get a Pinecone API key](http://app.pinecone.io/) if you don't have one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "id": "By8wu8B3t5HH",
    "outputId": "57b1aa59-d440-4f83-e3bd-3170d4f39a94"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# List all indexes associated with your API key; this should be empty on the first run\n",
    "pc.list_indexes().names()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "YQx7UIDOt5HI"
   },
   "source": [
    "**Create an Index**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "id": "4VDr7Wqst5HI"
   },
   "outputs": [],
   "source": [
    "# Set a name for your index\n",
    "index_name = 'product-recommender'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "id": "ufHsF0o4t5HI"
   },
   "outputs": [],
   "source": [
    "from pinecone import PodSpec\n",
    "\n",
    "# Delete any existing index with the same name\n",
    "if index_name in pc.list_indexes().names():\n",
    "    pc.delete_index(index_name)\n",
    "\n",
    "# The dimension must match the number of ALS factors (100)\n",
    "pc.create_index(name=index_name, dimension=100, spec=PodSpec(environment=env))"
   ]
  },
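`dimension=100` has to match the length of every vector that is upserted, which here is the number of ALS factors; Pinecone rejects vectors whose length differs from the index dimension. A small pre-upsert check, sketched in plain Python under the assumption that records are `(id, values, metadata)` tuples like the ones this notebook builds for `items_to_insert`; `check_dimensions` is our own helper name:

```python
from typing import Sequence, Tuple

Record = Tuple[str, Sequence[float], dict]


def check_dimensions(records: Sequence[Record], dimension: int) -> None:
    """Raise ValueError naming up to five records whose vector length != dimension."""
    bad_ids = [record_id for record_id, values, _ in records if len(values) != dimension]
    if bad_ids:
        raise ValueError(f'{len(bad_ids)} vector(s) do not have dimension {dimension}: {bad_ids[:5]}')


records = [('1', [0.1] * 100, {'title': 'a'}), ('2', [0.2] * 100, {'title': 'b'})]
check_dimensions(records, dimension=100)  # passes silently
```

Running a check like this locally is cheaper than discovering a mismatch partway through a long batched upload.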
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "v50hrlJNt5HI"
   },
   "source": [
    "**Connect to the new index**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "id": "ONg-J3ost5HI"
   },
   "outputs": [],
   "source": [
    "index = pc.Index(index_name)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "8BjfHb2Ht5HI"
   },
   "source": [
    "## Load Data"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "0-2K-g4-t5HJ"
   },
   "source": [
    "Upload all items (products that can be bought) and display a few example products along with their vector representations.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -qU torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "id": "NOOPF9zOt5HJ",
    "outputId": "397b0227-0394-4fd6-e97c-bbd0ba7422ea"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('1',\n",
       "  [0.007623022887855768,\n",
       "   -0.0018749987939372659,\n",
       "   -0.0002671325346454978,\n",
       "   -0.0019839361775666475,\n",
       "   0.012589375488460064,\n",
       "   0.022616468369960785,\n",
       "   -0.0029845251701772213,\n",
       "   0.001483042724430561,\n",
       "   0.004130321554839611,\n",
       "   0.014062315225601196,\n",
       "   0.007028672378510237,\n",
       "   0.017575189471244812,\n",
       "   -0.008487655781209469,\n",
       "   0.007234435528516769,\n",
       "   0.0011922105913981795,\n",
       "   0.021330712363123894,\n",
       "   0.01949598826467991,\n",
       "   0.0055589438416063786,\n",
       "   0.0005246000946499407,\n",
       "   0.00455727893859148,\n",
       "   0.008326052688062191,\n",
       "   0.002581121399998665,\n",
       "   -0.0004822195041924715,\n",
       "   -0.012043976224958897,\n",
       "   0.0024738165084272623,\n",
       "   0.012800917960703373,\n",
       "   0.002085306216031313,\n",
       "   -0.003331454936414957,\n",
       "   0.019255518913269043,\n",
       "   0.004345945548266172,\n",
       "   -0.006639685947448015,\n",
       "   0.00556971225887537,\n",
       "   0.0067274365574121475,\n",
       "   0.0027871401980519295,\n",
       "   -0.002836253959685564,\n",
       "   -0.006032831035554409,\n",
       "   0.008454968221485615,\n",
       "   0.011893117800354958,\n",
       "   0.02100340649485588,\n",
       "   -0.016892924904823303,\n",
       "   0.017926765605807304,\n",
       "   -0.009869110770523548,\n",
       "   0.012677340768277645,\n",
       "   -0.015772489830851555,\n",
       "   0.008181074634194374,\n",
       "   -0.003726064460352063,\n",
       "   -0.01648903265595436,\n",
       "   0.010804100893437862,\n",
       "   0.010112267918884754,\n",
       "   0.0017486640717834234,\n",
       "   -0.010069240815937519,\n",
       "   0.003729179734364152,\n",
       "   0.017787043005228043,\n",
       "   0.011538860388100147,\n",
       "   0.016697917133569717,\n",
       "   0.023016387596726418,\n",
       "   0.014524584636092186,\n",
       "   0.010979078710079193,\n",
       "   0.022508271038532257,\n",
       "   0.008885221555829048,\n",
       "   0.003849992761388421,\n",
       "   0.00678812013939023,\n",
       "   -0.006292062345892191,\n",
       "   0.012001186609268188,\n",
       "   0.0005737136234529316,\n",
       "   -0.0003584844234865159,\n",
       "   0.022570457309484482,\n",
       "   0.01971280388534069,\n",
       "   0.0024861826095730066,\n",
       "   0.010034153237938881,\n",
       "   0.033062055706977844,\n",
       "   0.004675820469856262,\n",
       "   0.01256718672811985,\n",
       "   0.009631018154323101,\n",
       "   -0.0069812629371881485,\n",
       "   0.004134668968617916,\n",
       "   0.007898719049990177,\n",
       "   -0.0016838834853842854,\n",
       "   0.0044147055596113205,\n",
       "   0.021669548004865646,\n",
       "   0.03365396335721016,\n",
       "   0.011528817005455494,\n",
       "   0.017878692597150803,\n",
       "   0.01030355878174305,\n",
       "   0.008540709502995014,\n",
       "   0.00026121208793483675,\n",
       "   -0.0014117105165496469,\n",
       "   -0.010574583895504475,\n",
       "   -0.008943133987486362,\n",
       "   0.012447441928088665,\n",
       "   -0.001496782642789185,\n",
       "   0.013035097159445286,\n",
       "   -0.003789522685110569,\n",
       "   0.012985321693122387,\n",
       "   -0.005848764907568693,\n",
       "   0.008831173181533813,\n",
       "   0.01090849470347166,\n",
       "   0.017582224681973457,\n",
       "   -0.002421534853056073,\n",
       "   0.02691880613565445],\n",
       "  {'title': 'Chocolate Sandwich Cookies'}),\n",
       " ('2',\n",
       "  [0.005217597354203463,\n",
       "   0.0014762961072847247,\n",
       "   0.004693459719419479,\n",
       "   0.0028787029441446066,\n",
       "   0.004247162491083145,\n",
       "   0.004835990257561207,\n",
       "   -0.0018457844853401184,\n",
       "   0.0033384307753294706,\n",
       "   0.0009511723765172064,\n",
       "   0.007689363788813353,\n",
       "   0.0011422141687944531,\n",
       "   0.005262048915028572,\n",
       "   0.005946537014096975,\n",
       "   0.005133545491844416,\n",
       "   0.005087423138320446,\n",
       "   0.0013896803138777614,\n",
       "   0.002905606757849455,\n",
       "   0.00472711818292737,\n",
       "   0.003819817677140236,\n",
       "   0.008636953309178352,\n",
       "   0.0019877932500094175,\n",
       "   0.004145897459238768,\n",
       "   0.002456480171531439,\n",
       "   0.008724501356482506,\n",
       "   0.0037205142434686422,\n",
       "   0.0036301815416663885,\n",
       "   0.008177429437637329,\n",
       "   0.0028356609400361776,\n",
       "   0.0010159768862649798,\n",
       "   0.002126850886270404,\n",
       "   6.250737351365387e-05,\n",
       "   0.007256409619003534,\n",
       "   0.0038202209398150444,\n",
       "   0.00434174994006753,\n",
       "   0.005850723013281822,\n",
       "   0.0009470473742112517,\n",
       "   0.002798160770907998,\n",
       "   0.004497273825109005,\n",
       "   0.004601788241416216,\n",
       "   0.001360574271529913,\n",
       "   0.004697411321103573,\n",
       "   0.0009563564672134817,\n",
       "   0.0016134328907355666,\n",
       "   0.006291348021477461,\n",
       "   0.002784638898447156,\n",
       "   0.005111981183290482,\n",
       "   -0.0007201149128377438,\n",
       "   0.003379490226507187,\n",
       "   0.002915880875661969,\n",
       "   0.005009019281715155,\n",
       "   0.0046517266891896725,\n",
       "   0.005204901099205017,\n",
       "   0.003701470559462905,\n",
       "   0.0069048129953444,\n",
       "   0.0023488891310989857,\n",
       "   0.0021819681860506535,\n",
       "   0.0034382683224976063,\n",
       "   0.005509452894330025,\n",
       "   0.0016940244240686297,\n",
       "   0.007942545227706432,\n",
       "   0.0004067510599270463,\n",
       "   0.0020625339820981026,\n",
       "   0.0005511568742804229,\n",
       "   0.0035476824268698692,\n",
       "   0.004686759784817696,\n",
       "   0.004415605682879686,\n",
       "   0.003106267424300313,\n",
       "   0.0067661660723388195,\n",
       "   0.002970881527289748,\n",
       "   0.005230187438428402,\n",
       "   0.000821463530883193,\n",
       "   0.0035546659491956234,\n",
       "   0.0035502559039741755,\n",
       "   0.004217519424855709,\n",
       "   7.537664259871235e-06,\n",
       "   0.0033802813850343227,\n",
       "   0.006645950023084879,\n",
       "   0.0014351109275594354,\n",
       "   0.0027813140768557787,\n",
       "   0.007847434841096401,\n",
       "   0.0010283931624144316,\n",
       "   0.003323468379676342,\n",
       "   0.003199928905814886,\n",
       "   0.0001242050202563405,\n",
       "   0.006940800696611404,\n",
       "   0.005625125952064991,\n",
       "   0.0042474959045648575,\n",
       "   0.00409889779984951,\n",
       "   0.008026882074773312,\n",
       "   0.0007173881167545915,\n",
       "   0.00447497796267271,\n",
       "   0.004973453003913164,\n",
       "   0.00465388223528862,\n",
       "   0.003284352831542492,\n",
       "   0.005258468445390463,\n",
       "   0.0076117366552352905,\n",
       "   0.0003324623394291848,\n",
       "   0.003952986095100641,\n",
       "   0.004668986424803734,\n",
       "   0.0020867816638201475],\n",
       "  {'title': 'All-Seasons Salt'})]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Get all of the items\n",
    "all_items_titles = [{'title': title} for title in products_lookup['product_name']]\n",
    "all_items_ids = [str(product_id) for product_id in products_lookup['product_id']]\n",
    "\n",
    "# Transform items into factors\n",
    "items_factors = model.item_factors\n",
    "\n",
    "# GPU-trained implicit models keep their factors on the device; copy them to NumPy first.\n",
    "# Checking the factors directly avoids calling .to_numpy() on a CPU (NumPy) model.\n",
    "if hasattr(items_factors, 'to_numpy'):\n",
    "    items_factors = items_factors.to_numpy()\n",
    "item_embeddings = items_factors[1:].tolist()\n",
    "\n",
    "# Prepare item factors for upload\n",
    "items_to_insert = list(zip(all_items_ids, item_embeddings, all_items_titles))\n",
    "display(items_to_insert[:2])"
   ]
  },
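The cell above pairs each product ID and title with one row of `model.item_factors`, skipping row 0, and relies on the rows being in the same order as `products_lookup`. A self-contained sketch of the same pairing on toy data (the arrays and names below are made up for illustration), with an explicit length check added because `zip` silently stops at the shortest input:

```python
import numpy as np

# Toy stand-ins for model.item_factors and products_lookup (illustrative values only)
item_factors = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 rows, 3 factors
product_ids = [1, 2, 3]
product_names = ['Cookies', 'Salt', 'Water']

embeddings = item_factors[1:].tolist()  # drop row 0, as the notebook does
if len(embeddings) != len(product_ids):
    raise ValueError(f'{len(embeddings)} factor rows vs {len(product_ids)} products')

records = [
    (str(pid), vec, {'title': name})
    for pid, vec, name in zip(product_ids, embeddings, product_names)
]
print(records[0])  # ('1', [3.0, 4.0, 5.0], {'title': 'Cookies'})
```

Without the check, a mismatch of even one row would shift every ID onto the wrong vector without raising any error.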
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "YnEHPvuTt5HJ"
   },
   "source": [
    "**Insert items into the index**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "id": "BYwN6MI1t5HJ",
    "outputId": "ddcbfcd4-f620-414c-e59d-220dd28d9224"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Index statistics before upsert: {'dimension': 100,\n",
      " 'index_fullness': 0.0,\n",
      " 'namespaces': {},\n",
      " 'total_vector_count': 0}\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 497/497 [05:36<00:00,  1.48it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Index statistics after upsert: {'dimension': 100,\n",
      " 'index_fullness': 0.0,\n",
      " 'namespaces': {'': {'vector_count': 49688}},\n",
      " 'total_vector_count': 49688}\n"
     ]
    }
   ],
   "source": [
    "from tqdm.auto import tqdm\n",
    "\n",
    "BATCH_SIZE = 100\n",
    "\n",
    "print('Index statistics before upsert:', index.describe_index_stats())\n",
    "\n",
    "for i in tqdm(range(0, len(items_to_insert), BATCH_SIZE)):\n",
    "    index.upsert(vectors=items_to_insert[i:i+BATCH_SIZE])\n",
    "\n",
    "print('Index statistics after upsert:', index.describe_index_stats())"
   ]
  },
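The upsert loop slices `items_to_insert` into chunks of `BATCH_SIZE`. The same slicing can be written as a reusable generator; this is a sketch, and `batched` is our own helper name rather than a Pinecone API (Python 3.12 ships a similar `itertools.batched`, but this notebook runs on Python 3.8). With 49,688 items and a batch size of 100 it yields 497 batches, matching the 497 iterations in the progress bar above:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar('T')


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


batches = list(batched(range(49688), 100))
print(len(batches), len(batches[-1]))  # 497 88
```

In the notebook this would read `for batch in batched(items_to_insert, BATCH_SIZE): index.upsert(vectors=batch)`; pass `total=` to `tqdm` if you want a progress bar, since a generator has no length.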
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "ibNMrxyRt5HK"
   },
   "source": [
    "This helper method is used later to analyze recommendations.\n",
    "It returns the top N products a user bought in the past, ranked by the number of times each product was ordered (`total_orders`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "id": "Uzgk5Od0t5HK"
   },
   "outputs": [],
   "source": [
    "def products_bought_by_user_in_the_past(user_id: int, top: int = 10):\n",
    "    # Select this user's rows, most frequently ordered products first\n",
    "    selected = data[data.user_id == user_id].sort_values(by=['total_orders'], ascending=False)\n",
    "\n",
    "    # Attach product names and keep only the columns needed for analysis\n",
    "    selected['product_name'] = selected['product_id'].map(products_lookup.set_index('product_id')['product_name'])\n",
    "    selected = selected[['product_id', 'product_name', 'total_orders']].reset_index(drop=True)\n",
    "\n",
    "    # head() returns every row when there are fewer than `top`\n",
    "    return selected.head(top)"
   ]
  },
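`products_bought_by_user_in_the_past` filters one user's rows, sorts them by `total_orders` in descending order, and keeps the first `top`. The same idea on plain Python tuples, as a sketch; `top_products` is a hypothetical helper and the rows are the five shown in the `data.tail()` output in this notebook (`user_id`, `product_id`, `total_orders`):

```python
from typing import List, Tuple

Row = Tuple[int, int, int]  # (user_id, product_id, total_orders)


def top_products(rows: List[Row], user_id: int, top: int = 10) -> List[Tuple[int, int]]:
    """Return up to `top` (product_id, total_orders) pairs for `user_id`, most-ordered first."""
    user_rows = [(product_id, orders) for uid, product_id, orders in rows if uid == user_id]
    # sorted() is stable even with reverse=True, so ties keep their original row order
    return sorted(user_rows, key=lambda pair: pair[1], reverse=True)[:top]


rows = [
    (206209, 48697, 1),
    (206209, 48742, 2),
    (206210, 22802, 97),
    (206211, 26834, 89),
    (206211, 12590, 77),
]
print(top_products(rows, 206209))  # [(48742, 2), (48697, 1)]
```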
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "id": "1Gthi5Dkt5HK",
    "outputId": "2acc3a2f-2c14-41ae-e93e-b2b3190055e2"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>product_id</th>\n",
       "      <th>total_orders</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13863744</th>\n",
       "      <td>206209</td>\n",
       "      <td>48697</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863745</th>\n",
       "      <td>206209</td>\n",
       "      <td>48742</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863746</th>\n",
       "      <td>206210</td>\n",
       "      <td>22802</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863747</th>\n",
       "      <td>206211</td>\n",
       "      <td>26834</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13863748</th>\n",
       "      <td>206211</td>\n",
       "      <td>12590</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          user_id  product_id  total_orders\n",
       "13863744   206209       48697             1\n",
       "13863745   206209       48742             2\n",
       "13863746   206210       22802            97\n",
       "13863747   206211       26834            89\n",
       "13863748   206211       12590            77"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.tail()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "Xah1FIs0t5HK"
   },
   "source": [
    "## Query for Recommendations"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "ULyVnHEXt5HK"
   },
   "source": [
    "Next, we retrieve the user factors for the test users we created manually earlier, plus one randomly chosen existing user. We then display the factors of the last two users so you can see what these factors look like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "id": "Wwl7yFKTt5HK",
    "outputId": "b3baef91-2d00-4231-e546-2ee966f62d4d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0.313707  , -0.27664858, -1.2828674 ,  1.7485007 , -0.99290836,\n",
       "         2.2764893 , -0.9494704 ,  1.1454551 , -2.0393448 , -1.6061454 ,\n",
       "         0.5251323 , -1.3181114 ,  1.0427406 ,  1.8733109 , -2.549778  ,\n",
       "         1.3321211 , -0.3133029 ,  2.208259  , -2.3020349 ,  0.5815489 ,\n",
       "        -0.37377855,  0.7690419 , -0.40076098, -2.4527717 ,  1.2445227 ,\n",
       "        -0.46194887, -2.7551808 , -1.1535347 ,  0.10361311,  2.6116688 ,\n",
       "        -1.0924482 ,  1.6089348 , -2.4051344 , -2.931227  ,  1.8216416 ,\n",
       "         1.0059888 ,  1.7596883 , -2.4735103 , -1.3774238 ,  1.2440904 ,\n",
       "        -2.4833293 , -0.39517453, -0.00621076,  3.1343987 , -0.8446826 ,\n",
       "        -1.5209638 ,  0.83655167,  1.7331113 , -0.35771912,  1.4926167 ,\n",
       "         0.42455527,  0.6774385 , -0.63032067,  0.14995454, -1.2719532 ,\n",
       "         1.0312878 ,  1.2454114 , -2.1102207 , -2.3025262 ,  0.55968195,\n",
       "        -0.18068558,  0.99221903,  0.8804258 ,  0.9581545 , -0.7357688 ,\n",
       "        -1.3392165 , -0.37945822,  1.0634457 , -0.10247187,  0.22213912,\n",
       "         0.10265829, -1.0366185 , -0.6038233 ,  0.47490817,  2.7055511 ,\n",
       "         2.5368512 , -0.5959401 ,  0.13280135,  0.67197865,  0.12398392,\n",
       "        -0.04903562,  0.81579864,  0.2347543 , -0.9798626 ,  0.2947345 ,\n",
       "         0.2806802 ,  1.1976005 ,  2.1994352 ,  0.26454824,  1.1742004 ,\n",
       "         0.11709324,  1.6044425 , -0.5988262 ,  0.15046799,  0.46880785,\n",
       "        -0.35391572, -0.08479875,  1.7974043 , -2.7378693 , -0.89177626],\n",
       "       [-1.113506  ,  0.646045  ,  0.91660035,  3.427136  , -1.8506187 ,\n",
       "        -2.001958  , -1.167917  , -0.09067711, -1.2732714 ,  0.72084534,\n",
       "         0.92990404, -0.2793406 ,  2.5933106 ,  0.2095659 ,  0.20489155,\n",
       "         2.6076956 ,  1.4788582 ,  0.55408037,  0.7565834 , -0.6248087 ,\n",
       "        -1.0415096 , -1.0918716 ,  1.6078566 ,  1.0329262 , -2.4481158 ,\n",
       "         1.5047243 , -0.5709003 , -1.8746959 ,  0.2135229 ,  2.4907224 ,\n",
       "         0.19642703,  1.4449238 ,  1.6527305 , -0.17461014,  0.11057296,\n",
       "        -1.7177945 , -0.04621175,  1.2366726 ,  0.8969147 , -1.5486971 ,\n",
       "        -0.3412704 , -2.034647  ,  0.6454408 ,  0.14385808,  2.1624665 ,\n",
       "         0.7950704 ,  1.4016353 , -0.1812926 ,  0.5354293 , -1.378532  ,\n",
       "         0.01187539, -0.00947361, -1.6387362 , -1.3973973 ,  0.31152153,\n",
       "         1.370239  , -0.3467708 , -1.2985501 ,  2.492775  , -0.75529194,\n",
       "        -0.21149556,  1.1047683 ,  2.1180177 , -2.7552977 , -0.14160857,\n",
       "         0.7567414 ,  0.12927827, -1.1602536 , -1.2509437 , -1.0854826 ,\n",
       "        -0.18008146, -0.00589125,  0.5193577 ,  0.80703855,  1.7236317 ,\n",
       "        -0.07526795, -0.41325498,  1.6021116 ,  5.0426197 , -0.05752076,\n",
       "         1.3786825 , -2.10659   , -0.37388846,  1.1092849 , -3.7238026 ,\n",
       "         0.04807726,  0.6400418 ,  2.1282876 , -2.405108  ,  2.343005  ,\n",
       "         0.02623899, -1.3713418 ,  0.13835156, -0.68146217,  1.8052037 ,\n",
       "         0.5837893 , -1.3996686 , -1.2638658 ,  1.2598401 ,  0.02347608]],\n",
       "      dtype=float32)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "user_ids = [206210, 206211, 103593]\n",
    "user_factors = model.user_factors[user_to_index[user_ids]]\n",
    "\n",
    "display(user_factors[1:])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "d2rlTIuyt5HK"
   },
   "source": [
    "### Model recommendations\n",
    "\n",
    "First, we retrieve recommendations directly from the model to serve as a baseline."
   ]
  },
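Conceptually, an ALS model scores every item for a user as the dot product of the user's factor vector with each item's factor vector, leaves out items the user has already interacted with (in `implicit`, this is what the `filter_already_liked_items` option of `recommend` controls), and returns the highest-scoring items. A NumPy sketch of that idea on toy factors; `recommend_dot` is a hypothetical helper, not `implicit`'s actual implementation:

```python
import numpy as np


def recommend_dot(user_vec: np.ndarray, item_factors: np.ndarray, liked: set, n: int = 10) -> np.ndarray:
    """Return indices of the top-n items by dot-product score, excluding already-liked items."""
    scores = (item_factors @ user_vec).astype(np.float64)
    if liked:
        scores[list(liked)] = -np.inf  # never recommend items the user already has
    n = min(n, len(scores) - len(liked))
    top = np.argpartition(-scores, n - 1)[:n]  # the n best items, in arbitrary order
    return top[np.argsort(-scores[top])]  # sort those n by descending score


item_factors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
user_vec = np.array([1.0, 0.5])
print(recommend_dot(user_vec, item_factors, liked={3}, n=2))  # [2 0]
```

`argpartition` avoids fully sorting all item scores when only the top few are needed, which matters with tens of thousands of products.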
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "id": "IM9lyHTVt5HL",
    "outputId": "fdb5f8e4-57a0-41c8-9ffb-34c629fec7e5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model recommendations\n",
      "\n",
      "Time needed for retrieving recommended products: 0.08775425699968764 seconds.\n",
      "\n",
      "\n",
      "Recommendations for person 0:\n",
      "['Sparkling Water']\n",
      "['Mineral Water']\n",
      "['Sparkling Natural Mineral Water']\n",
      "['Soda']\n",
      "['Spring Water']\n",
      "['Smartwater']\n",
      "['Zero Calorie Cola']\n",
      "['Sparkling Mineral Water']\n",
      "['Drinking Water']\n",
      "['Distilled Water']\n",
      "\n",
      "Recommendations for person 1:\n",
      "['Baby Wipes Sensitive']\n",
      "['Organic Garbanzo Beans']\n",
      "['Ezekiel 4:9 Bread Organic Sprouted Whole Grain']\n",
      "['Chocolate Ice Cream']\n",
      "['YoBaby Peach Pear Yogurt']\n",
      "['Free and Gentle High Efficiency Liquid Laundry Detergent']\n",
      "['Baby Wash & Shampoo']\n",
      "['No More Tears Baby Shampoo']\n",
      "['White Buttermints']\n",
      "['Eggo Pancakes Minis']\n",
      "\n",
      "Recommendations for person 2:\n",
      "['Organic Blackberries']\n",
      "['Peach']\n",
      "['Organic Bosc Pear']\n",
      "['Organic Strawberries']\n",
      "['Organic Blueberries']\n",
      "['Red Plums']\n",
      "['Clementines, Bag']\n",
      "['Organic Bartlett Pear']\n",
      "['Blood Oranges']\n",
      "['Organic Fuji Apple']\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "print(\"Model recommendations\\n\")\n",
    "\n",
    "start_time = time.process_time()\n",
    "recommendations = [\n",
    "    model.recommend(userid=user_id, user_items=sparse_user_product[i])\n",
    "    for i, user_id in enumerate(user_ids)\n",
    "]\n",
    "print(\"Time needed for retrieving recommended products: \" + str(time.process_time() - start_time) + ' seconds.\\n')\n",
    "\n",
    "# Each result is a tuple of (item IDs, scores)\n",
    "for i, (item_ids, _scores) in enumerate(recommendations):\n",
    "    print(f'\\nRecommendations for person {i}:')\n",
    "    for recommendation in item_ids:\n",
    "        print(products_lookup[products_lookup.product_id == recommendation]['product_name'].values)"
   ]
  },
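The timing above uses `time.process_time()`, which counts only the CPU time consumed by the current process. Time spent waiting, for example on the network round trip of a remote query, is largely excluded, so `time.perf_counter()` (wall-clock time) is the more appropriate clock for end-to-end latency comparisons. A small sketch that measures both; `measure` is our own helper name, and `time.sleep` stands in for waiting on I/O:

```python
import time


def measure(fn):
    """Run fn() once and return (cpu_seconds, wall_seconds)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    fn()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


# Sleeping uses almost no CPU, so the two clocks diverge sharply
cpu, wall = measure(lambda: time.sleep(0.2))
print(f'CPU: {cpu:.3f}s, wall-clock: {wall:.3f}s')
```

Reporting both numbers makes it clear whether a speedup comes from less computation or from less waiting.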
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "wTh61ou3t5HL"
   },
   "source": [
    "### Query the index\n",
    "\n",
    "Let's now query the index to check how quickly we retrieve results. Please note that query speed depends in part on your internet connection."
   ]
  },
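The Pinecone ranking can differ from the model baseline: for person 0 (user 206210) the baseline lists 'Sparkling Water' first, while the Pinecone results start with 'Mineral Water'. One likely contributor is the similarity metric. The index is created without an explicit `metric`, so it uses the client's default, cosine similarity, whereas ALS ranks items by the raw dot product, which also rewards item vectors with large norms. A toy NumPy sketch (made-up vectors) showing that the two metrics can order the same items differently:

```python
import numpy as np

user = np.array([1.0, 1.0])
items = np.array([
    [3.0, 0.0],  # long vector, less aligned with the user
    [1.0, 1.0],  # short vector, perfectly aligned with the user
])

dot = items @ user
cosine = dot / (np.linalg.norm(items, axis=1) * np.linalg.norm(user))

print('dot ranking:   ', np.argsort(-dot))     # [0 1]
print('cosine ranking:', np.argsort(-cosine))  # [1 0]
```

To rank by dot product in Pinecone as well, the index can be created with `metric='dotproduct'`.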
  {
1628
   "cell_type": "code",
1629
   "execution_count": 47,
1630
   "metadata": {
1631
    "id": "UiZg4Iset5HL",
1632
    "outputId": "06898130-2d66-4860-d7ea-e7ac6c9c92f6"
1633
   },
1634
   "outputs": [
1635
    {
1636
     "name": "stdout",
1637
     "output_type": "stream",
1638
     "text": [
1639
      "Time needed for retrieving recommended products using Pinecone: 0.016210021000006236 seconds.\n",
1640
      "\n",
1641
      "user_id=206210\n",
1642
      "Recommendation: \n"
1643
     ]
1644
    },
1645
    {
1646
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>products</th>\n",
       "      <th>scores</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mineral Water</td>\n",
       "      <td>0.927818</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Zero Calorie Cola</td>\n",
       "      <td>0.660643</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Orange &amp; Lemon Flavor Variety Pack Sparkling F...</td>\n",
       "      <td>0.640357</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Sparkling Water</td>\n",
       "      <td>0.617592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Extra Fancy Unsalted Mixed Nuts</td>\n",
       "      <td>0.596714</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Popcorn</td>\n",
       "      <td>0.592487</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Organic Variety Pack</td>\n",
       "      <td>0.586585</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Drinking Water</td>\n",
       "      <td>0.585892</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tall Kitchen Bag With Febreze Odor Shield</td>\n",
       "      <td>0.569821</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Milk Chocolate Almonds</td>\n",
       "      <td>0.561412</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            products    scores\n",
       "0                                      Mineral Water  0.927818\n",
       "1                                  Zero Calorie Cola  0.660643\n",
       "2  Orange & Lemon Flavor Variety Pack Sparkling F...  0.640357\n",
       "3                                    Sparkling Water  0.617592\n",
       "4                    Extra Fancy Unsalted Mixed Nuts  0.596714\n",
       "5                                            Popcorn  0.592487\n",
       "6                               Organic Variety Pack  0.586585\n",
       "7                                     Drinking Water  0.585892\n",
       "8          Tall Kitchen Bag With Febreze Odor Shield  0.569821\n",
       "9                             Milk Chocolate Almonds  0.561412"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top buys from the past: \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_id</th>\n",
       "      <th>product_name</th>\n",
       "      <th>total_orders</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>22802</td>\n",
       "      <td>Mineral Water</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   product_id   product_name  total_orders\n",
       "0       22802  Mineral Water            97"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "user_id=206211\n",
      "Recommendation: \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>products</th>\n",
       "      <th>scores</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Baby Wash &amp; Shampoo</td>\n",
       "      <td>0.705169</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>No More Tears Baby Shampoo</td>\n",
       "      <td>0.675753</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Baby Wipes Sensitive</td>\n",
       "      <td>0.563419</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Size 6 Baby Dry Diapers</td>\n",
       "      <td>0.525253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Graduates Lil' Crunchies Mild Cheddar Corn Snacks</td>\n",
       "      <td>0.502849</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Sensitive Infant Formula for Fussiness and Gas</td>\n",
       "      <td>0.494810</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Size 5 Cruisers Diapers Super Pack</td>\n",
       "      <td>0.494335</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Grow &amp; Gain Chocolate Shake Nutritional Drink</td>\n",
       "      <td>0.494275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Stage 1 Newborn Hypoallergenic Liquid Detergent</td>\n",
       "      <td>0.492494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Strawberry Yogurt Melts</td>\n",
       "      <td>0.490631</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                            products    scores\n",
       "0                                Baby Wash & Shampoo  0.705169\n",
       "1                         No More Tears Baby Shampoo  0.675753\n",
       "2                               Baby Wipes Sensitive  0.563419\n",
       "3                            Size 6 Baby Dry Diapers  0.525253\n",
       "4  Graduates Lil' Crunchies Mild Cheddar Corn Snacks  0.502849\n",
       "5     Sensitive Infant Formula for Fussiness and Gas  0.494810\n",
       "6                 Size 5 Cruisers Diapers Super Pack  0.494335\n",
       "7      Grow & Gain Chocolate Shake Nutritional Drink  0.494275\n",
       "8    Stage 1 Newborn Hypoallergenic Liquid Detergent  0.492494\n",
       "9                            Strawberry Yogurt Melts  0.490631"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Top buys from the past: \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_id</th>\n",
       "      <th>product_name</th>\n",
       "      <th>total_orders</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>26834</td>\n",
       "      <td>No More Tears Baby Shampoo</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>12590</td>\n",
       "      <td>Baby Wash &amp; Shampoo</td>\n",
       "      <td>77</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   product_id                product_name  total_orders\n",
       "0       26834  No More Tears Baby Shampoo            89\n",
       "1       12590         Baby Wash & Shampoo            77"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Query by user factors: each user's embedding is used as a query vector\n",
    "# (factors from a GPU-trained model are converted to NumPy first)\n",
    "user_embeddings = user_factors.to_numpy()[:-1].tolist() if device == \"cuda\" else user_factors[:-1].tolist()\n",
    "\n",
    "# Measure wall-clock time with perf_counter so that network latency is included;\n",
    "# process_time would only count the CPU time used by this Python process\n",
    "start_time = time.perf_counter()\n",
    "query_results = index.query(queries=user_embeddings, top_k=10, include_metadata=True)\n",
    "print(\"Time needed for retrieving recommended products using Pinecone: \" + str(time.perf_counter() - start_time) + ' seconds.\\n')\n",
    "\n",
    "# For each user, show the recommended products with their similarity scores,\n",
    "# followed by the products this user bought in the past\n",
    "for _id, res in zip(user_ids, query_results.results):\n",
    "    print(f'user_id={_id}')\n",
    "    df = pd.DataFrame(\n",
    "        {\n",
    "            'products': [match.metadata['title'] for match in res.matches],\n",
    "            'scores': [match.score for match in res.matches]\n",
    "        }\n",
    "    )\n",
    "    print(\"Recommendation: \")\n",
    "    display(df)\n",
    "    print(\"Top buys from the past: \")\n",
    "    display(products_bought_by_user_in_the_past(_id, top=15))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "Dkxi6IYbt5HL"
   },
   "source": [
    "*Note:* Retrieving recommendations from Pinecone is much faster than generating them directly from the model. As with the query above, the measured time also depends on your internet connection."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "Jl64xOvKt5HL"
   },
   "source": [
    "All that’s left to do is surface these recommendations on the shopping site, or feed them into other applications."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "HUvu0FG9t5HM"
   },
   "source": [
    "## Clean up"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "QTpsePdJt5HM"
   },
   "source": [
    "Delete the index once you are sure you no longer need it. Deletion is permanent: the index and the vectors stored in it cannot be recovered."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "id": "7gCA-7kbt5HM"
   },
   "outputs": [],
   "source": [
    "pinecone.delete_index(index_name)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "Ece-xKrYt5HM"
   },
   "source": [
    "## Summary"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "FXywcRrYt5HM"
   },
   "source": [
    "In this example, we used [Pinecone](https://www.pinecone.io/) to quickly build and deploy a product recommendation engine based on collaborative filtering.\n",
    "\n",
    "Once deployed, the recommendation engine can index new data, retrieve recommendations in milliseconds, and serve the results to production applications."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "name": "product_recommender.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "5fe10bf018ef3e697f9035d60bf60847932a12bface18908407fd371fe880db9"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}