{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/ayush-09/K-Means-Clustering/blob/master/Unsupervised_Learning_.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WdGfBpUVoyJ_"
   },
   "source": [
    "# Unsupervised Learning Algorithm\n",
    "- Salary dataset from Kaggle (salary.csv): https://www.kaggle.com/rsadiq/salary\n",
    "- K-means++ clustering algorithm (k=3). I chose K-means++ instead of plain K-means because it does not pick the initial centroids purely at random.\n",
    "k=3 was chosen after checking the WCSS and silhouette score for each value of k.\n",
    "- I plot both graphs, the **Elbow Method** and the **Silhouette** score, to find the best number of clusters.\n",
    "- After building the model, I check the sum of squared errors over the first 10 iterations of the model by plotting a graph."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f2jVcjGfGpuA"
   },
   "source": [
    "## Connect with drive\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "iqQ8Pt17GpH7",
    "outputId": "ddb95c5b-fae4-427c-9167-4abc76fc538d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mounted at /content/gdrive\n"
     ]
    }
   ],
   "source": [
    "from google.colab import drive\n",
    "\n",
    "drive.mount(\"/content/gdrive\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ALFYwu6oHUVU"
   },
   "source": [
    "## Import the libraries\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "4jK_6EZqEnCv"
   },
   "outputs": [],
   "source": [
    "from sklearn.cluster import KMeans\n",
    "from sklearn.metrics import silhouette_score,silhouette_samples\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import MinMaxScaler\n",
    "from matplotlib import pyplot as plt\n",
    "import matplotlib.cm as cm\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9rM8vR3kXzzy"
   },
   "source": [
    "## Visualize the data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 195
    },
    "id": "uedCNPgSEnCw",
    "outputId": "dd3e2dac-e7e4-4975-fb6a-36180ee68f51"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YearsExperience</th>\n",
       "      <th>Salary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.1</td>\n",
       "      <td>39343</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.3</td>\n",
       "      <td>46205</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.5</td>\n",
       "      <td>37731</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.0</td>\n",
       "      <td>43525</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.2</td>\n",
       "      <td>39891</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YearsExperience  Salary\n",
       "0              1.1   39343\n",
       "1              1.3   46205\n",
       "2              1.5   37731\n",
       "3              2.0   43525\n",
       "4              2.2   39891"
      ]
     },
     "execution_count": 4,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df=pd.read_csv(\"/content/gdrive/MyDrive/archive/Salary.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 282
    },
    "id": "BRp_Z0-6EnCx",
    "outputId": "aaaf49c0-a056-422b-a7aa-4348a7a17d3b"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.collections.PathCollection at 0x7f9eb3baab10>"
      ]
     },
     "execution_count": 5,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.scatter(df['YearsExperience'],df['Salary'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "id": "k8kzph36Rj71"
   },
   "outputs": [],
   "source": [
    "X=df[['YearsExperience','Salary']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "q4Xgg5R_rcKM"
   },
   "source": [
    "## Calculate the WCSS and Silhouette Score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8Y6xQrl8RbXU",
    "outputId": "ada348b3-660f-4534-d784-bc39080779ce"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cluster = 2, wcss=6062232833.744474, Silhouette= 0.7028572890853004\n",
      "Cluster = 3, wcss=2903864662.5549126, Silhouette= 0.6331618019952167\n",
      "Cluster = 4, wcss=1663816733.7989414, Silhouette= 0.6368549820565115\n",
      "Cluster = 5, wcss=894398887.721595, Silhouette= 0.6173381925348792\n",
      "Cluster = 6, wcss=652477856.6519287, Silhouette= 0.627089162952214\n",
      "Cluster = 7, wcss=441789859.79249996, Silhouette= 0.6421588941076305\n"
     ]
    }
   ],
   "source": [
    "wcss = []\n",
    "silhouette = []\n",
    "# Fit K-means++ for k = 2..7 and record the WCSS (inertia) and silhouette score\n",
    "for i in range(2, 8):\n",
    "    k_means = KMeans(n_clusters=i, init='k-means++', random_state=20)\n",
    "    k_means.fit(X)\n",
    "    wcss.append(k_means.inertia_)\n",
    "    pred = k_means.predict(X)\n",
    "    score = silhouette_score(X, pred)  # compute once and reuse below\n",
    "    silhouette.append(score)\n",
    "    print(\"Cluster = {0}, wcss={1}, Silhouette= {2}\".format(i, k_means.inertia_, score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "NGdx-zSDsJXU"
   },
   "source": [
    "## Plotting the Graphs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 295
    },
    "id": "_-31zovzLY2x",
    "outputId": "3c47cd93-c177-42f4-8178-c501177cf402"
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(range(2,8),wcss)\n",
    "plt.title('Elbow Method')\n",
    "plt.xlabel('No. of clusters')\n",
    "plt.ylabel('wcss score')\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 295
    },
    "id": "bgAtKjOAR1aF",
    "outputId": "9e23b406-b881-4076-f94f-95ad5db99442"
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(range(2,8),silhouette)\n",
    "plt.title('Silhouette')\n",
    "plt.xlabel('No. of clusters')\n",
    "plt.ylabel('Silhouette score')\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rIeAHgsJsP-n"
   },
   "source": [
    "## Train the model\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Zbzrurv3EnCx",
    "outputId": "213d54db-2fdd-43ea-99be-59445f77f141"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=100,\n",
       "       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',\n",
       "       random_state=None, tol=0.0001, verbose=True)"
      ]
     },
     "execution_count": 10,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "km=KMeans(n_clusters=5, init='k-means++',max_iter=100,verbose=True)\n",
    "km"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Rtf4lXtDsUtB"
   },
   "source": [
    "## Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "a_rjkpRYEnCy",
    "outputId": "6296e2f8-7255-46cd-f7e1-40d0e8d56389"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 944865214.6304046\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 944865214.6304046\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 1504371017.23425\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 1504371017.23425\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 1198190608.6271667\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 1198190608.6271667\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 1251859328.2879996\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 1158509896.5262141\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 1158509896.5262141\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 919084793.4247856\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 919084793.4247856\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 917097459.1652617\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 917097459.1652617\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 1570578612.1835585\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 1392384670.572329\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 1248208083.0671272\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 3, inertia 944865214.6304046\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 4, inertia 944865214.6304046\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 1053324728.9333895\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 944865214.6304046\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 944865214.6304046\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 944865214.6304046\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 944865214.6304046\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 1198190608.6271667\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 1198190608.6271667\n",
      "center shift 0.000000e+00 within tolerance 5.024411e+04\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2,\n",
       "       2, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3], dtype=int32)"
      ]
     },
     "execution_count": 11,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_predicted=km.fit_predict(df[['YearsExperience','Salary']])\n",
    "y_predicted"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "rS6hTWw2EnCy",
    "outputId": "9067a9a4-d0ee-43d9-9d7d-e5d927a07606"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YearsExperience</th>\n",
       "      <th>Salary</th>\n",
       "      <th>cluster</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.1</td>\n",
       "      <td>39343</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.3</td>\n",
       "      <td>46205</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.5</td>\n",
       "      <td>37731</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2.0</td>\n",
       "      <td>43525</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2.2</td>\n",
       "      <td>39891</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2.9</td>\n",
       "      <td>56642</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3.0</td>\n",
       "      <td>60150</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>3.2</td>\n",
       "      <td>54445</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>3.2</td>\n",
       "      <td>64445</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>3.7</td>\n",
       "      <td>57189</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>3.9</td>\n",
       "      <td>63218</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>4.0</td>\n",
       "      <td>55794</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>4.0</td>\n",
       "      <td>56957</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>4.1</td>\n",
       "      <td>57081</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>4.5</td>\n",
       "      <td>61111</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>4.9</td>\n",
       "      <td>67938</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>5.1</td>\n",
       "      <td>66029</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>5.3</td>\n",
       "      <td>83088</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>5.9</td>\n",
       "      <td>81363</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>6.0</td>\n",
       "      <td>93940</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>6.8</td>\n",
       "      <td>91738</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>7.1</td>\n",
       "      <td>98273</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>7.9</td>\n",
       "      <td>101302</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>8.2</td>\n",
       "      <td>113812</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>8.7</td>\n",
       "      <td>109431</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>9.0</td>\n",
       "      <td>105582</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>9.5</td>\n",
       "      <td>116969</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>9.6</td>\n",
       "      <td>112635</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>10.3</td>\n",
       "      <td>122391</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>10.5</td>\n",
       "      <td>121872</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>11.2</td>\n",
       "      <td>127345</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>11.5</td>\n",
       "      <td>126756</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>12.3</td>\n",
       "      <td>128765</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>12.9</td>\n",
       "      <td>135675</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>13.5</td>\n",
       "      <td>139465</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YearsExperience  Salary  cluster\n",
       "0               1.1   39343        4\n",
       "1               1.3   46205        4\n",
       "2               1.5   37731        4\n",
       "3               2.0   43525        4\n",
       "4               2.2   39891        4\n",
       "5               2.9   56642        0\n",
       "6               3.0   60150        0\n",
       "7               3.2   54445        0\n",
       "8               3.2   64445        0\n",
       "9               3.7   57189        0\n",
       "10              3.9   63218        0\n",
       "11              4.0   55794        0\n",
       "12              4.0   56957        0\n",
       "13              4.1   57081        0\n",
       "14              4.5   61111        0\n",
       "15              4.9   67938        0\n",
       "16              5.1   66029        0\n",
       "17              5.3   83088        2\n",
       "18              5.9   81363        2\n",
       "19              6.0   93940        2\n",
       "20              6.8   91738        2\n",
       "21              7.1   98273        2\n",
       "22              7.9  101302        2\n",
       "23              8.2  113812        1\n",
       "24              8.7  109431        1\n",
       "25              9.0  105582        1\n",
       "26              9.5  116969        1\n",
       "27              9.6  112635        1\n",
       "28             10.3  122391        3\n",
       "29             10.5  121872        3\n",
       "30             11.2  127345        3\n",
       "31             11.5  126756        3\n",
       "32             12.3  128765        3\n",
       "33             12.9  135675        3\n",
       "34             13.5  139465        3"
      ]
     },
     "execution_count": 12,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['cluster']=y_predicted\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "OauQMQkrNiBO",
    "outputId": "e25a0b6e-b30a-49d6-9af4-984ea626cc49"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(array([0, 1, 2, 3, 4], dtype=int32), array([12,  5,  6,  7,  5]))\n"
     ]
    }
   ],
   "source": [
    "print(np.unique(km.labels_,return_counts= True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 297
    },
    "id": "wJQphFQNEnCy",
    "outputId": "731e78f4-24e4-4dd2-b94b-b196ca5dc08d"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0, 0.5, 'Salary')"
      ]
     },
     "execution_count": 14,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "df1=df[df.cluster==0]\n",
    "df2=df[df.cluster==1]\n",
    "df3=df[df.cluster==2]\n",
    "df4=df[df.cluster==3]\n",
    "df5=df[df.cluster==4]\n",
    "plt.scatter(df1.YearsExperience, df1['Salary'],color='green')\n",
    "plt.scatter(df2.YearsExperience, df2['Salary'],color='red')\n",
    "plt.scatter(df3.YearsExperience, df3['Salary'],color='black')\n",
    "plt.scatter(df4.YearsExperience, df4['Salary'],color='blue')\n",
    "plt.scatter(df5.YearsExperience, df5['Salary'],color='purple')\n",
    "plt.xlabel('YearsExperience')\n",
    "plt.ylabel('Salary')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f81w__fxsbdp"
   },
   "source": [
    "## Normalize the data and check again"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "D6H77wGlEnCz",
    "outputId": "8d876e6a-3533-4ae3-effa-684345613a62"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YearsExperience</th>\n",
       "      <th>Salary</th>\n",
       "      <th>cluster</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.015845</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.016129</td>\n",
       "      <td>0.083296</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.032258</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.072581</td>\n",
       "      <td>0.056952</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.088710</td>\n",
       "      <td>0.021232</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.145161</td>\n",
       "      <td>0.185887</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>0.153226</td>\n",
       "      <td>0.220369</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0.169355</td>\n",
       "      <td>0.164291</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>0.169355</td>\n",
       "      <td>0.262587</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>0.209677</td>\n",
       "      <td>0.191263</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
1053
       "    <tr>\n",
1054
       "      <th>10</th>\n",
1055
       "      <td>0.225806</td>\n",
1056
       "      <td>0.250526</td>\n",
1057
       "      <td>0</td>\n",
1058
       "    </tr>\n",
1059
       "    <tr>\n",
1060
       "      <th>11</th>\n",
1061
       "      <td>0.233871</td>\n",
1062
       "      <td>0.177551</td>\n",
1063
       "      <td>0</td>\n",
1064
       "    </tr>\n",
1065
       "    <tr>\n",
1066
       "      <th>12</th>\n",
1067
       "      <td>0.233871</td>\n",
1068
       "      <td>0.188983</td>\n",
1069
       "      <td>0</td>\n",
1070
       "    </tr>\n",
1071
       "    <tr>\n",
1072
       "      <th>13</th>\n",
1073
       "      <td>0.241935</td>\n",
1074
       "      <td>0.190202</td>\n",
1075
       "      <td>0</td>\n",
1076
       "    </tr>\n",
1077
       "    <tr>\n",
1078
       "      <th>14</th>\n",
1079
       "      <td>0.274194</td>\n",
1080
       "      <td>0.229815</td>\n",
1081
       "      <td>0</td>\n",
1082
       "    </tr>\n",
1083
       "    <tr>\n",
1084
       "      <th>15</th>\n",
1085
       "      <td>0.306452</td>\n",
1086
       "      <td>0.296921</td>\n",
1087
       "      <td>0</td>\n",
1088
       "    </tr>\n",
1089
       "    <tr>\n",
1090
       "      <th>16</th>\n",
1091
       "      <td>0.322581</td>\n",
1092
       "      <td>0.278157</td>\n",
1093
       "      <td>0</td>\n",
1094
       "    </tr>\n",
1095
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>0.338710</td>\n",
       "      <td>0.445839</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>0.387097</td>\n",
       "      <td>0.428883</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>0.395161</td>\n",
       "      <td>0.552509</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>0.459677</td>\n",
       "      <td>0.530865</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>0.483871</td>\n",
       "      <td>0.595101</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>0.548387</td>\n",
       "      <td>0.624875</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>0.572581</td>\n",
       "      <td>0.747842</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>0.612903</td>\n",
       "      <td>0.704779</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>0.637097</td>\n",
       "      <td>0.666945</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>0.677419</td>\n",
       "      <td>0.778874</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>0.685484</td>\n",
       "      <td>0.736273</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>0.741935</td>\n",
       "      <td>0.832170</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>0.758065</td>\n",
       "      <td>0.827069</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>0.814516</td>\n",
       "      <td>0.880866</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>0.838710</td>\n",
       "      <td>0.875076</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>0.903226</td>\n",
       "      <td>0.894824</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>0.951613</td>\n",
       "      <td>0.962746</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YearsExperience    Salary  cluster\n",
       "0          0.000000  0.015845        4\n",
       "1          0.016129  0.083296        4\n",
       "2          0.032258  0.000000        4\n",
       "3          0.072581  0.056952        4\n",
       "4          0.088710  0.021232        4\n",
       "5          0.145161  0.185887        0\n",
       "6          0.153226  0.220369        0\n",
       "7          0.169355  0.164291        0\n",
       "8          0.169355  0.262587        0\n",
       "9          0.209677  0.191263        0\n",
       "10         0.225806  0.250526        0\n",
       "11         0.233871  0.177551        0\n",
       "12         0.233871  0.188983        0\n",
       "13         0.241935  0.190202        0\n",
       "14         0.274194  0.229815        0\n",
       "15         0.306452  0.296921        0\n",
       "16         0.322581  0.278157        0\n",
       "17         0.338710  0.445839        2\n",
       "18         0.387097  0.428883        2\n",
       "19         0.395161  0.552509        2\n",
       "20         0.459677  0.530865        2\n",
       "21         0.483871  0.595101        2\n",
       "22         0.548387  0.624875        2\n",
       "23         0.572581  0.747842        1\n",
       "24         0.612903  0.704779        1\n",
       "25         0.637097  0.666945        1\n",
       "26         0.677419  0.778874        1\n",
       "27         0.685484  0.736273        1\n",
       "28         0.741935  0.832170        3\n",
       "29         0.758065  0.827069        3\n",
       "30         0.814516  0.880866        3\n",
       "31         0.838710  0.875076        3\n",
       "32         0.903226  0.894824        3\n",
       "33         0.951613  0.962746        3\n",
       "34         1.000000  1.000000        3"
      ]
     },
     "execution_count": 15,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Rescale both features to the [0, 1] range so that neither one\n",
    "# dominates the Euclidean distances used by k-means.\n",
    "scaler = MinMaxScaler()\n",
    "df['Salary'] = scaler.fit_transform(df[['Salary']])\n",
    "df['YearsExperience'] = scaler.fit_transform(df[['YearsExperience']])\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "0s7kGdAGEnCz",
    "outputId": "07278e7f-cfa7-4121-962f-71f7c9daaac7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.24599877930393588\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.21469464378831662\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.21469464378831662\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.24976734481600707\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.2340310975743367\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.21368181551259902\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 3, inertia 0.21368181551259902\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.276907403808879\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.24463440390283883\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.2154232730708973\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 3, inertia 0.2154232730708973\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.21469464378831662\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.21469464378831662\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.24463440390283883\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.2154232730708973\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.2154232730708973\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.2154232730708973\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.2154232730708973\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.2154232730708973\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.2154232730708973\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.4307209532375872\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.4233023757955806\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.4233023757955806\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.2608751054662088\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.21469464378831662\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.21469464378831662\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n",
      "Initialization complete\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 0, inertia 0.2451278190383588\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 1, inertia 0.21368181551259902\n",
      "start iteration\n",
      "done sorting\n",
      "end inner loop\n",
      "Iteration 2, inertia 0.21368181551259902\n",
      "center shift 0.000000e+00 within tolerance 8.990979e-06\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array([3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2,\n",
       "       2, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4], dtype=int32)"
      ]
     },
     "execution_count": 16,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
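    "# Fit k-means++ (5 clusters) on the scaled features; verbose=True logs each iteration's inertia.\n",
    "km = KMeans(n_clusters=5, init='k-means++', max_iter=100, verbose=True)\n",
    "y_predicted = km.fit_predict(df[['YearsExperience', 'Salary']])\n",
    "df['cluster'] = y_predicted  # keep the cluster column in sync with this fit\n",
    "y_predicted"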
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 297
    },
    "id": "NSLvhEcuEnCz",
    "outputId": "63aae2f7-f85e-48de-9625-d5dcd1c0ff24"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0, 0.5, 'Salary')"
      ]
     },
     "execution_count": 17,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot each cluster (labels 0-4 in df.cluster) in its own colour.\n",
    "colors = ['green', 'red', 'black', 'blue', 'purple']\n",
    "for label, color in enumerate(colors):\n",
    "    members = df[df.cluster == label]\n",
    "    plt.scatter(members.YearsExperience, members['Salary'], color=color)\n",
    "plt.xlabel('YearsExperience')\n",
    "plt.ylabel('Salary')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0XcrjJDtsnhX"
   },
   "source": [
    "## Cluster Centroids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "bJ3J8xpSEnC0",
    "outputId": "e7c176fb-93f3-47a5-a4ab-599c1b9849d0"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0.22379032, 0.21971268],\n",
       "       [0.66935484, 0.75627898],\n",
       "       [0.43548387, 0.5296787 ],\n",
       "       [0.04193548, 0.03546504],\n",
       "       [0.9016129 , 0.92270234]])"
      ]
     },
     "execution_count": 18,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "km.cluster_centers_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 297
    },
    "id": "qXxAiISrEnC0",
    "outputId": "61732b04-0857-4436-9c72-e8b5616ceb08"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.collections.PathCollection at 0x7f9eb3b23650>"
      ]
     },
     "execution_count": 19,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot each cluster (labels 0-4 in df.cluster) in its own colour.\n",
    "colors = ['green', 'red', 'black', 'blue', 'purple']\n",
    "for label, color in enumerate(colors):\n",
    "    members = df[df.cluster == label]\n",
    "    plt.scatter(members.YearsExperience, members['Salary'], color=color)\n",
    "plt.xlabel('YearsExperience')\n",
    "plt.ylabel('Salary')\n",
    "# Overlay the fitted centroids in yellow.\n",
    "plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], color='yellow')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EvZ0OYekstVb"
   },
   "source": [
    "## Sum of Squared Errors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "id": "EDenEChVEnC0"
   },
   "outputs": [],
   "source": [
    "# Elbow method: record the inertia (sum of squared errors) for k = 1..9.\n",
    "# A separate name keeps the 5-cluster model in `km` from being overwritten.\n",
    "k_rng = range(1, 10)\n",
    "sse = []\n",
    "for k in k_rng:\n",
    "    model = KMeans(n_clusters=k, init='k-means++')\n",
    "    model.fit(df[['YearsExperience', 'Salary']])\n",
    "    sse.append(model.inertia_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "JXPVlnE-EnC1",
    "outputId": "2bd21e27-d926-4b71-c6a5-bef652c4d04e"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[6.293685484541874,\n",
       " 1.292690919141886,\n",
       " 0.6527227565513628,\n",
       " 0.4101141952961812,\n",
       " 0.21368181551259902,\n",
       " 0.15226682764323596,\n",
       " 0.12182725552641988,\n",
       " 0.08849338117507208,\n",
       " 0.0759704313741079]"
      ]
     },
     "execution_count": 21,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sse"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "N0ED_rACs4b8"
   },
   "source": [
    "## Plot the SSE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 297
    },
    "id": "jxbEV7RoEnC1",
    "outputId": "5a4bb2c7-260e-4d38-9437-75213cf0abb0"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x7f9ea9a34e10>]"
      ]
     },
     "execution_count": 22,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light",
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Elbow plot: SSE against the number of clusters k.\n",
    "plt.xlabel('k')\n",
    "plt.ylabel('Sum of squared errors')\n",
    "plt.plot(k_rng, sse)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8na1sMNvEnC3",
    "outputId": "2a9b215e-5140-40dd-b8a2-9fec0cc7cf97"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['K-means++_model.pkl']"
      ]
     },
     "execution_count": 23,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import joblib\n",
    "# Persist the fitted KMeans estimator held in `km` to disk.\n",
    "joblib.dump(km, 'K-means++_model.pkl')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fv1I8H1mq53L"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "include_colab_link": true,
   "name": "Unsupervised Learning .ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}