Cluster Quality Analysis Using Silhouette Score

dc.contributor.author: Shahapure, Ketan Rajshekhar
dc.contributor.author: Nicholas, Charles
dc.date.accessioned: 2020-12-14T16:10:51Z
dc.date.available: 2020-12-14T16:10:51Z
dc.date.issued: 2020-11-20
dc.description: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), 6-9 Oct. 2020, Sydney, Australia (en_US)
dc.description.abstract: Clustering is an important phase in data mining. Selecting the number of clusters in a clustering algorithm, e.g., choosing the best value of k in the various k-means algorithms [1], can be difficult. We studied the use of silhouette scores and scatter plots to suggest, and then validate, the number of clusters we specified in running the k-means clustering algorithm on two publicly available data sets. Scikit-learn's [4] silhouette score method, which is a measure of the quality of a cluster, was used to find the mean silhouette coefficient of all the samples for different numbers of clusters. The highest silhouette score indicates the optimal number of clusters. We present several instances of utilizing the silhouette score to determine the best value of k for those data sets. (en_US)
dc.description.uri: https://ieeexplore.ieee.org/document/9260048/authors#authors (en_US)
dc.format.extent: 2 pages (en_US)
dc.genre: conference papers and proceedings postprints (en_US)
dc.identifier: doi:10.13016/m2zwe2-2w49
dc.identifier.citation: K. R. Shahapure and C. Nicholas, "Cluster Quality Analysis Using Silhouette Score," 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 2020, pp. 747-748, doi: 10.1109/DSAA49011.2020.00096. (en_US)
dc.identifier.uri: https://doi.org/10.1109/DSAA49011.2020.00096
dc.identifier.uri: http://hdl.handle.net/11603/20251
dc.language.iso: en_US (en_US)
dc.publisher: IEEE (en_US)
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.title: Cluster Quality Analysis Using Silhouette Score (en_US)
dc.type: Text (en_US)
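
Illustrative sketch (not taken from the paper): the abstract describes scoring k-means results with the mean silhouette coefficient and preferring the k with the highest score. The authors used scikit-learn's silhouette score method (in scikit-learn this is sklearn.metrics.silhouette_score(X, labels)); the version below is a standard-library-only stand-in so that the arithmetic is visible. The function names mean_silhouette and kmeans, the toy data, and the deterministic farthest-first seeding are all assumptions made for this example and do not reflect the authors' data sets or settings.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster: s(i) is taken to be 0
        # a: mean distance to the other members of the sample's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means; farthest-first seeding keeps the run deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster went empty
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


# Toy data (hypothetical): three tight 2x2 groups of points, far apart.
points = [
    (x + dx, y + dy)
    for (x, y) in [(0, 0), (10, 0), (0, 10)]
    for dx in (0, 1)
    for dy in (0, 1)
]

# Score each candidate k and keep the one with the highest mean silhouette.
scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the highest score falls at k = 3, matching the three groups that were constructed. With real data the scores are usually less clear-cut, which is presumably why the abstract pairs them with scatter plots for validation.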

Files

Original bundle
Name: Cluster_Quality_Analysis_Using_Silhouette_Score_DSAA (2).pdf
Size: 436.87 KB
Format: Adobe Portable Document Format