Center-Based Clustering with Divergence

dc.contributor.advisor: Kogan, Jacob
dc.contributor.author: Su, Ziqiu
dc.contributor.department: Mathematics and Statistics
dc.contributor.program: Mathematics, Applied
dc.date.accessioned: 2015-10-14T03:11:22Z
dc.date.available: 2015-10-14T03:11:22Z
dc.date.issued: 2010-01-01
dc.description.abstract: Families of center-based clustering methods are capable of handling the high-dimensional sparse data arising in text mining applications. All current center-based algorithms seek to minimize a particular objective function in an attempt to improve upon the standard k-means algorithm. This dissertation links the traditional partition-based approach to k-means with the optimization treatment of the clustering problem. The interplay between the two approaches sheds light on solutions generated by optimization techniques. We investigate the Hessian of the objective function, i.e., the second-order optimality condition. When an algorithm leads to a critical point where the Hessian fails to be positive definite, we propose a way to move away from the critical point; such a move may lead to further improvement. In many cases additional information about the desired type of clusters is available, and incorporating this information into the clustering process may lead to better clustering results. We consider pairwise constrained clustering, in which we may have information about pairs of vectors that must not belong to the same cluster (cannot-links), pairs of vectors that must belong to the same cluster (must-links), or both. We focus on two k-means-type clustering algorithms, k-means and spherical k-means, and two distance-like functions, reverse Bregman divergence and cosine similarity. We show that for these algorithms and distance-like functions the pairwise constrained clustering problem can be reduced to clustering with cannot-link constraints only. We substitute the cannot-link constraints with a penalty and propose clustering algorithms that tackle clustering with penalties.
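For readers unfamiliar with the penalty-based treatment of cannot-link constraints mentioned in the abstract, the Python sketch below illustrates the general idea only: a k-means-style loop with a pluggable distance-like function (squared Euclidean, the simplest Bregman divergence) and a fixed penalty added to the cost of any assignment that would violate a cannot-link pair. The function names and the penalty weight are illustrative assumptions and do not reproduce the dissertation's algorithms or its treatment of spherical k-means and cosine similarity.

# Minimal illustrative sketch (not the dissertation's algorithm): k-means-style
# clustering with a distance-like function and a penalty for cannot-link violations.
import numpy as np

def squared_euclidean(x, c):
    # Squared Euclidean distance, the simplest Bregman divergence.
    return float(np.sum((x - c) ** 2))

def penalized_kmeans(X, k, cannot_links, penalty=10.0, n_iter=50, seed=0):
    # Each assignment that would place both endpoints of a cannot-link pair
    # in the same cluster has `penalty` (a hypothetical weight) added to its cost.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: distance-like cost plus cannot-link penalties.
        for i, x in enumerate(X):
            costs = np.array([squared_euclidean(x, c) for c in centers])
            for a, b in cannot_links:
                other = b if i == a else (a if i == b else None)
                if other is not None:
                    costs[labels[other]] += penalty
            labels[i] = int(np.argmin(costs))
        # Update step: each center becomes the mean of its current cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
    labels, centers = penalized_kmeans(X, k=2, cannot_links=[(0, 1)])
    print(labels)

In this sketch the constraint is soft: raising the penalty weight makes violations progressively more expensive rather than forbidding them outright, which mirrors the penalty substitution described in the abstract at a high level.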
dc.format: application/pdf
dc.genre: dissertations
dc.identifier: doi:10.13016/M2JM3Z
dc.identifier.other: 10355
dc.identifier.uri: http://hdl.handle.net/11603/1000
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartof: UMBC Mathematics and Statistics Department Collection
dc.rights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu.
dc.source: Original File Name: Su_umbc_0434D_10355.pdf
dc.subject: bregman
dc.subject: clustering
dc.subject: constrained
dc.subject: k-means
dc.subject: smoothing
dc.title: Center-Based Clustering with Divergence
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Files

Original bundle (1 of 1)
Name: 24322.pdf
Size: 609.05 KB
Format: Adobe Portable Document Format