Optimization of the K-means Clustering Algorithm through Initialized Principal Direction Divisive Partitioning
Citation of Original Publication
James, Bruce. “Optimization of the K-Means Clustering Algorithm through Initialized Principal Direction Divisive Partitioning.” UMBC Review: Journal of Undergraduate Research 19 (2018): 37–54. https://ur.umbc.edu/wp-content/uploads/sites/354/2019/05/umbc_review_2018_vol19.pdf#page=38
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless it is under a Creative Commons license, contact the copyright holder or the author for uses protected by Copyright Law.
Abstract
Data clustering is invaluable to the automated analysis of large document sets. Documents are converted into vectors in a finite-dimensional space, and the resulting collection of salient features is then processed by a clustering algorithm of one's choice, such as the classic k-means algorithm. Because the feature space is large, different algorithms trade off accuracy against computational efficiency. This study investigates the Principal Direction Divisive Partitioning (PDDP) algorithm, a top-down hierarchical technique, as a plug-in to the k-means algorithm. K-means relies on an initial random partition, which builds computational cost into the analysis. The study seeds k-means with a PDDP-initialized partition and compares its computational efficiency with that of a k-means trial without PDDP.
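
The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation. The function names (`pddp_partition`, `centroids_from_labels`, `kmeans`), the rule of splitting the cluster with the largest scatter, the synthetic three-blob data, and the use of iteration count as a stand-in for computational cost are all assumptions made for illustration.

```python
import numpy as np


def pddp_partition(X, k):
    """Split X into k clusters by repeated principal-direction bisection."""
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        # Assumption: split the cluster with the largest total scatter next.
        scatter = [((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                   for c in range(new_label)]
        target = int(np.argmax(scatter))
        idx = np.flatnonzero(labels == target)
        centered = X[idx] - X[idx].mean(axis=0)
        # Principal direction: leading right singular vector of centered data.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        projection = centered @ vt[0]
        # Points on the positive side of the centroid form the new cluster.
        labels[idx[projection > 0]] = new_label
    return labels


def centroids_from_labels(X, labels, k):
    """Mean of each cluster; used to turn a partition into k-means seeds."""
    return np.array([X[labels == c].mean(axis=0) for c in range(k)])


def kmeans(X, centroids, max_iter=100):
    """Lloyd's algorithm from given seeds; returns labels, centroids, iterations."""
    centroids = centroids.copy()
    labels = None
    for iteration in range(1, max_iter + 1):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centroids, iteration  # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids, max_iter


# Hypothetical data: three well-separated 2-D blobs standing in for document vectors.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [0.0, 6.0], [12.0, 3.0]])
X = np.vstack([center + 0.5 * rng.standard_normal((50, 2)) for center in blob_centers])
k = 3

# k-means seeded with the PDDP partition.
pddp_labels = pddp_partition(X, k)
pddp_seeds = centroids_from_labels(X, pddp_labels, k)
seeded_labels, _, seeded_iters = kmeans(X, pddp_seeds)

# k-means seeded with k randomly chosen data points.
random_seeds = X[rng.choice(len(X), size=k, replace=False)]
_, _, random_iters = kmeans(X, random_seeds)

print("iterations with PDDP seeding:  ", seeded_iters)
print("iterations with random seeding:", random_iters)
```

On data this cleanly separated, PDDP alone may already recover the clusters, so the sketch shows the mechanics of seeding rather than reproducing the study's results. A fair comparison on real document vectors would also count the cost of the PDDP step itself.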
