K-means Method - Finding the Best Clustering

Machine Learning Statistical Methods K-means

We will create the best clustering using the k-means method.

Irem Dastan
2022-08-31

# importing data

wholesale <- read_excel("C:/Users/irem/Desktop/wholesale.xlsx")

First, let’s get to know the variables:

FRESH: Annual expenditure on fresh produce

MILK: Annual milk expenditure

GROCERY: Grocery products annual expenditure

FROZEN: Frozen products annual expenditure

DETERGENTS_PAPER: Detergent and paper annual expenditure

DELICATESSEN: Delicatessen annual expenditure

CHANNEL: (Hotel/Restoran/Cafe) or Retail channel (Nominal)

REGION: Customer area Lisnon, Oporto or other (Nominal)

I scale units to get rid of outliers.

scaleModel <- preProcess(wholesale, method = c("center","scale"))
modelData <- predict(scaleModel, wholesale)

I’m checking to see if there are any missing observations.

md.pattern(modelData) #missing value does not exist.
 /\     /\
{  `---'  }
{  O   O  }
==>  V <==  No need for mice. This data set is completely observed.
 \  \|/  /
  `-----'

    Channel Region Fresh Milk Grocery Frozen Detergents_Paper
440       1      1     1    1       1      1                1
          0      0     0    0       0      0                0
    Delicassen  
440          1 0
             0 0
#Creating model
clusterModel <- kmeans(modelData, centers = 4, 
                iter.max = 15, nstart = 15)
clusterModel
K-means clustering with 4 clusters of sizes 296, 3, 131, 10

Cluster means:
     Channel      Region       Fresh       Milk    Grocery
1 -0.6822942 -0.04704424  0.07979916 -0.3554503 -0.4344436
2 -0.6895122  0.15948501  3.84044484  3.2957757  0.9852919
3  1.4470045  0.10690343 -0.29218794  0.4286373  0.6330682
4  1.4470045 -0.05577083  0.31347349  3.9174467  4.2707490
        Frozen Detergents_Paper  Delicassen
1  0.073138067       -0.4417690 -0.10544775
2  7.204892918       -0.1527927  6.79967230
3 -0.329983553        0.6495639  0.04416479
4 -0.003570131        4.6129149  0.50279301

Clustering vector:
  [1] 3 3 3 1 3 3 3 3 1 3 3 3 3 3 3 1 3 1 3 1 3 1 1 3 3 3 1 1 3 1 1 1
 [33] 1 1 1 3 1 3 3 1 1 1 3 3 3 3 3 4 3 3 1 1 3 3 1 1 4 3 1 1 3 4 3 3
 [65] 1 4 1 3 1 1 1 1 1 3 3 1 1 3 1 1 1 3 3 1 3 4 4 1 1 1 1 1 4 1 3 1
 [97] 3 1 1 1 3 3 3 1 1 1 3 3 3 3 1 3 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3
[129] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 3 3 1 3 3
[161] 3 1 1 3 3 3 3 1 1 1 3 3 1 3 1 3 1 1 1 1 1 2 1 2 1 1 1 1 3 3 1 1
[193] 1 3 1 1 1 3 1 1 3 3 1 1 1 3 1 3 1 3 1 4 1 1 3 1 3 1 3 1 1 1 1 3
[225] 1 1 3 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 4 1 1 1 1
[257] 1 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1 1 1 1
[289] 1 1 1 1 1 3 1 3 1 3 3 1 3 3 3 3 3 3 3 1 1 3 1 1 3 1 1 3 1 1 1 3
[321] 1 1 1 1 1 2 1 1 1 1 1 3 1 4 1 3 1 1 1 1 3 3 1 3 1 1 3 3 1 3 1 3
[353] 1 3 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 1
[385] 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 3
[417] 3 1 3 1 1 3 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1

Within cluster sum of squares by cluster:
[1] 1020.2318  215.6517  437.3645  160.2905
 (between_SS / total_SS =  47.8 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"    
[5] "tot.withinss" "betweenss"    "size"         "iter"        
[9] "ifault"      
fittedCluster <- fitted(clusterModel)

clusterModel$cluster
  [1] 3 3 3 1 3 3 3 3 1 3 3 3 3 3 3 1 3 1 3 1 3 1 1 3 3 3 1 1 3 1 1 1
 [33] 1 1 1 3 1 3 3 1 1 1 3 3 3 3 3 4 3 3 1 1 3 3 1 1 4 3 1 1 3 4 3 3
 [65] 1 4 1 3 1 1 1 1 1 3 3 1 1 3 1 1 1 3 3 1 3 4 4 1 1 1 1 1 4 1 3 1
 [97] 3 1 1 1 3 3 3 1 1 1 3 3 3 3 1 3 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 3
[129] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 3 3 1 3 3
[161] 3 1 1 3 3 3 3 1 1 1 3 3 1 3 1 3 1 1 1 1 1 2 1 2 1 1 1 1 3 3 1 1
[193] 1 3 1 1 1 3 1 1 3 3 1 1 1 3 1 3 1 3 1 4 1 1 3 1 3 1 3 1 1 1 1 3
[225] 1 1 3 1 1 1 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 4 1 1 1 1
[257] 1 1 1 1 1 1 1 1 3 1 3 1 3 1 1 1 1 1 1 1 1 1 1 3 1 3 1 1 1 1 1 1
[289] 1 1 1 1 1 3 1 3 1 3 3 1 3 3 3 3 3 3 3 1 1 3 1 1 3 1 1 3 1 1 1 3
[321] 1 1 1 1 1 2 1 1 1 1 1 3 1 4 1 3 1 1 1 1 3 3 1 3 1 1 3 3 1 3 1 3
[353] 1 3 1 1 1 3 1 1 1 1 1 1 1 3 1 1 1 1 3 1 1 3 1 1 3 1 1 3 1 1 1 1
[385] 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 1 1 1 3 3 1 1 1 1 1 1 3
[417] 3 1 3 1 1 3 1 3 3 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1

I’m doing a reverse scale here.

reversedData <- modelData %>% select(one_of(scaleModel$mean %>% names)) %>%
                  map2_df(scaleModel$std, function(sd, var){var*sd})  %>%
                  map2_df(scaleModel$mean, function(mu, var){var+mu})

scaleModel$mean 
         Channel           Region            Fresh             Milk 
        1.322727         2.543182     12000.297727      5796.265909 
         Grocery           Frozen Detergents_Paper       Delicassen 
     7951.277273      3071.931818      2881.493182      1524.870455 
reversedData$cluster <- clusterModel$cluster


reversedData %>% group_by(cluster) %>% summarise_all(mean)
# A tibble: 4 x 9
  cluster Channel Region  Fresh   Milk Grocery Frozen Detergents_Paper
    <int>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>            <dbl>
1       1    1.00   2.51 13010.  3173.   3823.  3427.             775.
2       2    1      2.67 60572. 30120.  17315. 38049.            2153 
3       3    2      2.63  8305.  8960.  13967.  1470.            5979.
4       4    2      2.5  15965. 34708.  48537.  3055.           24875.
# ... with 1 more variable: Delicassen <dbl>
boxplot(Fresh ~ cluster, data = reversedData,
        col="#80d7cc",border="pink")

The range in cluster 1 and the range in cluster 4 are almost the same. The median of the 3rd cluster and the 4th cluster is similar.

fviz_cluster(clusterModel, data = modelData,
             ggtheme = theme_minimal()
             )

Optimum “k”

K Calculation with Elbow Method

clusterModel$withinss #cost function
[1] 1020.2318  215.6517  437.3645  160.2905
clusterModel$tot.withinss 
[1] 1833.538
wss <- sapply(2:10, FUN = function(x){kmeans(modelData, centers = x, 
                        nstart = 10, iter.max = 15)$tot.withinss})

plot(2:10, wss, type = "b")
fviz_nbclust(modelData, kmeans, method = "wss")

K Calculation with Silhouette Method

silhouette(clusterModel$cluster, dist(modelData))
       cluster neighbor   sil_width
  [1,]       3        1  0.37355819
  [2,]       3        1  0.42920620
  [3,]       3        1  0.23506491
  [4,]       1        3  0.45914352
  [5,]       3        1  0.17530370
  [6,]       3        1  0.31555646
  [7,]       3        1  0.30676907
  [8,]       3        1  0.38282887
  [9,]       1        3  0.41154373
 [10,]       3        1  0.50823261
 [11,]       3        1  0.42750335
 [12,]       3        1  0.13984907
 [13,]       3        1  0.27744045
 [14,]       3        1  0.38694142
 [15,]       3        1  0.36442282
 [16,]       1        3  0.46915071
 [17,]       3        1  0.46288391
 [18,]       1        3  0.33310271
 [19,]       3        1  0.32248382
 [20,]       1        3  0.36150808
 [21,]       3        1  0.24156874
 [22,]       1        3  0.49248284
 [23,]       1        3  0.29687637
 [24,]       3        4  0.12069406
 [25,]       3        1  0.30097041
 [26,]       3        1  0.31868219
 [27,]       1        3  0.50662730
 [28,]       1        3  0.47953941
 [29,]       3        1  0.38716473
 [30,]       1        3  0.29738327
 [31,]       1        3  0.31004413
 [32,]       1        3  0.44539988
 [33,]       1        3  0.43473774
 [34,]       1        3  0.36141265
 [35,]       1        3  0.44200170
 [36,]       3        1  0.41374814
 [37,]       1        3  0.30346113
 [38,]       3        1  0.45887854
 [39,]       3        1  0.47304671
 [40,]       1        3  0.23760603
 [41,]       1        3  0.27984785
 [42,]       1        3  0.38206340
 [43,]       3        1  0.47398083
 [44,]       3        1  0.46348721
 [45,]       3        1  0.43018426
 [46,]       3        1  0.38214253
 [47,]       3        1  0.48087479
 [48,]       4        3  0.38141847
 [49,]       3        1  0.44822227
 [50,]       3        4  0.33332161
 [51,]       1        3  0.47219973
 [52,]       1        3  0.38008365
 [53,]       3        1  0.08958941
 [54,]       3        1  0.46123988
 [55,]       1        3  0.42858355
 [56,]       1        3  0.42272643
 [57,]       4        3 -0.10182080
 [58,]       3        1  0.48629465
 [59,]       1        3  0.48482054
 [60,]       1        3  0.31874080
 [61,]       3        1  0.34269869
 [62,]       4        3  0.41550515
 [63,]       3        1  0.34333119
 [64,]       3        1  0.49659390
 [65,]       1        3  0.47025066
 [66,]       4        3  0.17164203
 [67,]       1        3  0.31549916
 [68,]       3        1  0.40087008
 [69,]       1        3  0.34849739
 [70,]       1        3  0.47845096
 [71,]       1        3  0.39379264
 [72,]       1        3  0.02676125
 [73,]       1        3  0.37467800
 [74,]       3        1  0.12862857
 [75,]       3        1  0.36989976
 [76,]       1        3  0.47992034
 [77,]       1        3  0.37150145
 [78,]       3        1  0.42843697
 [79,]       1        3  0.50248514
 [80,]       1        3  0.41912887
 [81,]       1        3  0.47355546
 [82,]       3        1  0.48247178
 [83,]       3        1  0.43931840
 [84,]       1        3  0.47195726
 [85,]       3        1  0.28161978
 [86,]       4        3  0.39089340
 [87,]       4        3  0.25492925
 [88,]       1        3  0.08954934
 [89,]       1        3  0.39768825
 [90,]       1        3  0.38708226
 [91,]       1        3  0.50758707
 [92,]       1        3  0.42921254
 [93,]       4        3  0.02147292
 [94,]       1        3  0.11956719
 [95,]       3        1  0.45923962
 [96,]       1        3  0.39607519
 [97,]       3        1  0.32373037
 [98,]       1        3  0.44041210
 [99,]       1        3  0.44322499
[100,]       1        3  0.48246085
[101,]       3        1  0.44585231
[102,]       3        1  0.50273386
[103,]       3        1  0.40145904
[104,]       1        3  0.17004264
[105,]       1        3  0.47285853
[106,]       1        3  0.47492519
[107,]       3        1  0.46159765
[108,]       3        1  0.49645456
[109,]       3        1  0.36214506
[110,]       3        1  0.33799707
[111,]       1        3  0.50119125
[112,]       3        1  0.49384178
[113,]       1        3  0.40266365
[114,]       1        3  0.49598957
[115,]       1        3  0.49019488
[116,]       1        3  0.50597195
[117,]       1        3  0.48993422
[118,]       1        3  0.45952277
[119,]       1        3  0.46916745
[120,]       1        3  0.49935322
[121,]       1        3  0.48901930
[122,]       1        3  0.47524012
[123,]       1        3  0.50594792
[124,]       3        1  0.34928222
[125,]       1        3  0.36133157
[126,]       1        3  0.14334739
[127,]       1        3  0.43680761
[128,]       3        1  0.31611115
[129,]       1        3  0.31883136
[130,]       1        3  0.31507302
[131,]       1        3  0.46721527
[132,]       1        3  0.45158071
[133,]       1        3  0.47075941
[134,]       1        3  0.48617362
[135,]       1        3  0.48719072
[136,]       1        3  0.48621986
[137,]       1        3  0.31737489
[138,]       1        3  0.31815141
[139,]       1        3  0.39252367
[140,]       1        3  0.44420355
[141,]       1        3  0.40064496
[142,]       1        3  0.25040416
[143,]       1        3  0.29036292
[144,]       1        3  0.43869937
[145,]       1        3  0.44203643
[146,]       3        1  0.33668747
[147,]       1        3  0.49122514
[148,]       1        3  0.47864334
[149,]       1        3  0.49041734
[150,]       1        3  0.38304781
[151,]       1        3  0.48273994
[152,]       1        3  0.42773284
[153,]       1        3  0.47855459
[154,]       1        3  0.35893712
[155,]       1        3  0.43116762
[156,]       3        1  0.47326145
[157,]       3        1  0.45644839
[158,]       1        3  0.48504542
[159,]       3        1  0.42862056
[160,]       3        1  0.47950315
[161,]       3        1  0.41434722
[162,]       1        3  0.47694610
[163,]       1        3  0.49371048
[164,]       3        1  0.41658677
[165,]       3        1  0.39233591
[166,]       3        1  0.43955047
[167,]       3        1  0.40781484
[168,]       1        3  0.42490263
[169,]       1        3  0.49065356
[170,]       1        3  0.49470375
[171,]       3        1  0.48029363
[172,]       3        1  0.33401847
[173,]       1        3  0.32309458
[174,]       3        1  0.47668439
[175,]       1        3  0.43008664
[176,]       3        1  0.45803485
[177,]       1        3  0.24718967
[178,]       1        3  0.40098216
[179,]       1        3  0.41971621
[180,]       1        3  0.42986330
[181,]       1        3  0.33191953
[182,]       2        1 -0.28528892
[183,]       1        3  0.08729351
[184,]       2        3  0.15008921
[185,]       1        3  0.40861617
[186,]       1        3  0.45415628
[187,]       1        3  0.47099913
[188,]       1        3  0.24802548
[189,]       3        1  0.46540330
[190,]       3        1  0.44936267
[191,]       1        3  0.42257825
[192,]       1        3  0.48645058
[193,]       1        3  0.47536258
[194,]       3        1  0.42518881
[195,]       1        3  0.48199920
[196,]       1        3  0.40763001
[197,]       1        3  0.19696039
[198,]       3        1  0.24343766
[199,]       1        3  0.33669081
[200,]       1        3  0.34529822
[201,]       3        1  0.33402418
[202,]       3        1  0.32827953
[203,]       1        3  0.16534868
[204,]       1        3  0.31148831
[205,]       1        3  0.31826241
[206,]       3        1  0.34325373
[207,]       1        3  0.34897816
[208,]       3        1  0.18240688
[209,]       1        3  0.25943629
[210,]       3        1  0.33501539
[211,]       1        3  0.35158878
[212,]       4        3  0.15269972
[213,]       1        3  0.34698152
[214,]       1        3  0.21927604
[215,]       3        1  0.26738101
[216,]       1        3  0.04176371
[217,]       3        4  0.25115801
[218,]       1        3  0.36311709
[219,]       3        1  0.23345203
[220,]       1        3  0.35018521
[221,]       1        3  0.36676086
[222,]       1        3  0.16820371
[223,]       1        3  0.33490570
[224,]       3        1  0.05608619
[225,]       1        3  0.35181231
[226,]       1        3  0.33361678
[227,]       3        1  0.14311093
[228,]       1        3  0.34603227
[229,]       1        3  0.26894367
[230,]       1        3  0.34260346
[231,]       3        1  0.05508568
[232,]       1        3  0.23551341
[233,]       1        3  0.32202494
[234,]       1        3  0.29840220
[235,]       1        3  0.34266211
[236,]       1        3  0.28836771
[237,]       1        3  0.35551388
[238,]       1        3  0.35899210
[239,]       1        3  0.35567233
[240,]       1        3  0.24184457
[241,]       1        3  0.27515792
[242,]       1        3  0.34002688
[243,]       1        3  0.36047111
[244,]       1        3  0.29985147
[245,]       1        3  0.18897605
[246,]       3        1  0.30325475
[247,]       1        3  0.35515569
[248,]       1        3  0.37100729
[249,]       1        3  0.32873256
[250,]       1        3  0.33807492
[251,]       1        3  0.33992660
[252,]       4        3 -0.02490560
[253,]       1        3  0.32540169
[254,]       1        3  0.17201201
[255,]       1        3  0.15502567
[256,]       1        3  0.32557325
[257,]       1        3  0.31505877
[258,]       1        3  0.31618367
[259,]       1        3  0.20573178
[260,]       1        3  0.19858004
[261,]       1        3  0.32480945
[262,]       1        3  0.35387762
[263,]       1        3  0.34113578
[264,]       1        3  0.28379501
[265,]       3        1  0.30995183
[266,]       1        3  0.05634869
[267,]       3        1  0.30534220
[268,]       1        3  0.32486555
[269,]       3        1  0.30401755
[270,]       1        3  0.36759146
[271,]       1        3  0.35042011
[272,]       1        3  0.31292883
[273,]       1        3  0.24451429
[274,]       1        3  0.35699129
[275,]       1        3  0.44023157
[276,]       1        3  0.44731244
[277,]       1        3  0.37217089
[278,]       1        3  0.28375269
[279,]       1        3  0.47970342
[280,]       3        1  0.37070234
[281,]       1        3  0.46340626
[282,]       3        1  0.33195287
[283,]       1        3  0.26158848
[284,]       1        3  0.37382786
[285,]       1        3  0.14926568
[286,]       1        3  0.31783015
[287,]       1        3  0.48056051
[288,]       1        3  0.43837402
[289,]       1        3  0.47872790
[290,]       1        3  0.31685902
[291,]       1        3  0.44027919
[292,]       1        3  0.48454673
[293,]       1        3  0.44839086
[294,]       3        1  0.43151571
[295,]       1        3  0.39926672
[296,]       3        1  0.18459551
[297,]       1        3  0.45351092
[298,]       3        1  0.26672514
[299,]       3        1  0.34973423
[300,]       1        3  0.40504691
[301,]       3        1  0.24952735
[302,]       3        1  0.44791954
[303,]       3        1  0.37609132
[304,]       3        1  0.38798736
[305,]       3        1  0.41746796
[306,]       3        1  0.35473358
[307,]       3        1  0.45509056
[308,]       1        3  0.45019105
[309,]       1        3  0.44087176
[310,]       3        1  0.38387608
[311,]       1        3  0.34819467
[312,]       1        3  0.36231478
[313,]       3        1  0.33987396
[314,]       1        3  0.46857455
[315,]       1        3  0.41719492
[316,]       3        1  0.37223257
[317,]       1        3  0.46567554
[318,]       1        3  0.37593054
[319,]       1        3  0.45261073
[320,]       3        1  0.36073020
[321,]       1        3  0.40699767
[322,]       1        3  0.45399578
[323,]       1        3  0.45291391
[324,]       1        3  0.35010494
[325,]       1        3  0.41699408
[326,]       2        1 -0.10719072
[327,]       1        3  0.45393530
[328,]       1        3  0.41186050
[329,]       1        3  0.42793573
[330,]       1        3  0.43675500
[331,]       1        3  0.45803029
[332,]       3        1  0.42418446
[333,]       1        3  0.42285518
[334,]       4        3  0.23113100
[335,]       1        3  0.02381463
[336,]       3        1  0.25390623
[337,]       1        3  0.45862834
[338,]       1        3  0.42658799
[339,]       1        3  0.26491718
[340,]       1        3  0.35987323
[341,]       3        1  0.42039587
[342,]       3        1  0.44941579
[343,]       1        3  0.26250322
[344,]       3        1  0.38812809
[345,]       1        3  0.47708626
[346,]       1        3  0.29995958
[347,]       3        1  0.48360358
[348,]       3        1  0.30395956
[349,]       1        3  0.46211930
[350,]       3        1  0.45343196
[351,]       1        3  0.47981236
[352,]       3        1  0.44251389
[353,]       1        3  0.44424632
[354,]       3        1  0.46951044
[355,]       1        3  0.39691871
[356,]       1        3  0.42908542
[357,]       1        3  0.46706597
[358,]       3        1  0.38997030
[359,]       1        3  0.16374206
[360,]       1        3  0.40151451
[361,]       1        3  0.47427999
[362,]       1        3  0.48215429
[363,]       1        3  0.45370753
[364,]       1        3  0.43597284
[365,]       1        3  0.46448225
[366,]       3        1  0.39443789
[367,]       1        3  0.49668354
[368,]       1        3  0.47576168
[369,]       1        3  0.46179696
[370,]       1        3  0.48643111
[371,]       3        1  0.04959055
[372,]       1        3  0.43186364
[373,]       1        3  0.33771000
[374,]       3        1  0.29743331
[375,]       1        3  0.49271607
[376,]       1        3  0.47283265
[377,]       3        1  0.40934267
[378,]       1        3  0.32817315
[379,]       1        3  0.45024435
[380,]       3        1  0.30550501
[381,]       1        3  0.42085188
[382,]       1        3  0.39855879
[383,]       1        3  0.26474702
[384,]       1        3  0.45417802
[385,]       1        3  0.02685022
[386,]       1        3  0.51014398
[387,]       1        3  0.44606705
[388,]       1        3  0.47131037
[389,]       1        3  0.50392199
[390,]       1        3  0.50993138
[391,]       1        3  0.46105024
[392,]       1        3  0.44672237
[393,]       1        3  0.41859897
[394,]       1        3  0.37568104
[395,]       1        3  0.49539363
[396,]       1        3  0.50165925
[397,]       3        1  0.41186807
[398,]       1        3  0.47246180
[399,]       1        3  0.49078885
[400,]       1        3  0.50024330
[401,]       1        3  0.48770085
[402,]       1        3  0.32728814
[403,]       1        3  0.37941534
[404,]       1        3  0.29826480
[405,]       1        3  0.45830981
[406,]       1        3  0.49681944
[407,]       1        3  0.35589657
[408,]       3        1  0.48810598
[409,]       3        1  0.30050860
[410,]       1        3  0.30030960
[411,]       1        3  0.43914490
[412,]       1        3  0.35098296
[413,]       1        3  0.23906312
[414,]       1        3  0.22090343
[415,]       1        3  0.45393505
[416,]       3        1  0.40075454
[417,]       3        1  0.48834329
[418,]       1        3  0.25842826
[419,]       3        1  0.48995046
[420,]       1        3  0.45220436
[421,]       1        3  0.09906878
[422,]       3        1  0.34567647
[423,]       1        3  0.40509502
[424,]       3        1  0.20862012
[425,]       3        1  0.32274230
[426,]       1        3  0.30417541
[427,]       1        3  0.13106038
[428,]       1        3  0.18633007
[429,]       1        3  0.41000999
[430,]       1        3  0.43997321
[431,]       1        3  0.22854768
[432,]       1        3  0.29402515
[433,]       1        3  0.42134485
[434,]       1        3  0.45315172
[435,]       1        3  0.37824588
[436,]       1        3  0.17185246
[437,]       1        3  0.34343919
[438,]       3        4  0.36682210
[439,]       1        3  0.47554004
[440,]       1        3  0.44101326
attr(,"Ordered")
[1] FALSE
attr(,"call")
silhouette.default(x = clusterModel$cluster, dist = dist(modelData))
attr(,"class")
[1] "silhouette"
# The model with silhouette score max will be the best k model.

silScore <- function(x){
             model <- kmeans(modelData, centers = x , nstart = 10, iter.max = 15)
             sil <- silhouette(model$cluster, dist(modelData))[,3]
             score <- mean(sil)
             return(score)
             }

scores <- sapply(2:10, FUN = silScore)

scores
[1] 0.3732334 0.3598428 0.3674071 0.3562636 0.3515995 0.3541891
[7] 0.3499041 0.3293357 0.3599715
plot(2:10, scores, type = "b") 

Max score is best so “2”

Let’s sort.

fviz_nbclust(modelData, kmeans, method = "silhouette")

According to the code above, the optimal 3 point came out because we gave nstart a random number.

clusterModelK2 <- kmeans(modelData, centers = 2, nstart = 50, iter.max = 20)
clusterModelK3 <- kmeans(modelData, centers = 3, nstart = 50, iter.max = 20)

fviz_cluster(clusterModelK2, modelData)

In the code above there are outliers in the blue region at the extremes, the density is at the top.

fviz_cluster(clusterModelK3, modelData)

In the above code, it created a cluster of outliers on the left.

reversedData$clusterK2 <- clusterModelK2$cluster
reversedData$clusterK3 <- clusterModelK3$cluster

reversedData %>% group_by(clusterK2) %>% summarise_all(mean) 
# A tibble: 2 x 11
  clusterK2 Channel Region  Fresh   Milk Grocery Frozen
      <int>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
1         1    1.03   2.50 13576.  3318.   3944.  3667.
2         2    1.99   2.63  8441. 11394.  17005.  1727.
# ... with 4 more variables: Detergents_Paper <dbl>,
#   Delicassen <dbl>, cluster <dbl>, clusterK3 <dbl>

More milk intake in cluster 2.

reversedData %>% group_by(clusterK3) %>% summarise_all(mean)
# A tibble: 3 x 11
  clusterK3 Channel Region  Fresh   Milk Grocery Frozen
      <int>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
1         1    1.79   2.57 26267. 33848.  39952. 10703.
2         2    1.00   2.51 13010.  3173.   3823.  3427.
3         3    2      2.62  8166.  8749.  13905.  1442.
# ... with 4 more variables: Detergents_Paper <dbl>,
#   Delicassen <dbl>, cluster <dbl>, clusterK2 <dbl>

When we look at the region, we see that there is no distinction according to this. Meat intake is higher in cluster 3.

1 and 3 restaurants 2nd cluster hotel.

As a result, we can segment customers according to this cluster and organize campaigns accordingly. The optimum number of clusters is 2 and 3. Whichever is more meaningful for the company, that is, profitable, we can use that cluster as customer segmentation.