Optimization of the K-means Clustering Algorithm through Initialized Principal Direction Divisive Partitioning

James, Bruce. “Optimization of the K-Means Clustering Algorithm through Initialized Principal Direction Divisive Partitioning.” UMBC Review: Journal of Undergraduate Research 19 (2018): 37–54. https://ur.umbc.edu/wp-content/uploads/sites/354/2019/05/umbc_review_2018_vol19.pdf#page=38

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless it is under a Creative Commons license, contact the copyright holder or the author for uses protected by Copyright Law.

Abstract

Data clustering is invaluable to the automated analysis of large document sets. Documents are converted into vectors in a finite-dimensional space, and the resulting collection of salient features is then processed by a clustering algorithm of one's choice, such as the classic k-means algorithm. Because the feature space is large, different algorithms trade accuracy against computational efficiency. This study investigates the Principal Direction Divisive Partitioning (PDDP) algorithm, a top-down hierarchical technique, as a plug-in to the k-means algorithm. Because k-means relies on a random initial partition, it builds computational cost into the analysis. The study uses a PDDP-initialized partition to seed k-means and compares the resulting computational efficiency with that of a k-means trial without PDDP.
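
The seeding idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it makes several assumptions: the cluster with the largest scatter is the one split at each step, the principal direction is approximated by power iteration rather than a full singular value decomposition, and all function names (`principal_direction`, `pddp`, `kmeans`) are hypothetical. The sketch uses only the Python standard library.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def scatter(points):
    """Sum of squared distances to the centroid."""
    c = mean(points)
    return sum(dist2(p, c) for p in points)

def principal_direction(points, iters=200):
    """Approximate the leading principal direction of the centered data.

    Assumption: power iteration on A^T A stands in for an SVD. The start
    vector is fixed, so a cluster whose principal direction happens to be
    orthogonal to it would not be handled correctly.
    """
    c = mean(points)
    centered = [[x - m for x, m in zip(p, c)] for p in points]
    d = len(c)
    v = [1.0 + j for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(pr * row[j] for pr, row in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return c, v

def pddp(points, k):
    """Divisively partition points into at most k clusters."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Assumption: split the cluster with the largest scatter.
        idx = max(range(len(clusters)), key=lambda i: scatter(clusters[i]))
        target = clusters.pop(idx)
        c, v = principal_direction(target)

        def side(p):
            return sum((x - m) * vi for x, m, vi in zip(p, c, v))

        left = [p for p in target if side(p) <= 0]
        right = [p for p in target if side(p) > 0]
        if not left or not right:
            # The cluster could not be split; stop early.
            clusters.append(target)
            break
        clusters += [left, right]
    return clusters

def kmeans(points, centroids, max_iter=100):
    """Lloyd's iteration from the given centroids.

    Returns (labels, centroids, iterations used).
    """
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster loses all its points.
            updated.append(mean(members) if members else old)
        centroids = updated
    return labels, centroids, max_iter
```

To seed k-means, one would compute `seeds = [mean(c) for c in pddp(points, k)]` and pass `seeds` to `kmeans`. Comparing the returned iteration count against a run started from randomly chosen centroids gives a rough sense of the efficiency question the abstract raises, although the study's own measurements may be defined differently.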
