Novel Categories Discovery Via Constraints on Empirical Prediction Statistics

dc.contributor.author: Hasan, Zahid
dc.contributor.author: Faridee, Abu Zaher Md
dc.contributor.author: Ahmed, Masud
dc.contributor.author: Purushotham, Sanjay
dc.contributor.author: Kwon, Heesung
dc.contributor.author: Lee, Hyungtae
dc.contributor.author: Roy, Nirmalya
dc.date.accessioned: 2023-07-21T20:19:40Z
dc.date.available: 2023-07-21T20:19:40Z
dc.date.issued: 2023-12-17
dc.description.abstract: Novel Categories Discovery (NCD) aims to cluster novel data based on the class semantics of known classes, using an open-world dataset in which only part of the class space is annotated. As an alternative to the traditional pseudo-labeling-based approaches, we leverage the connection between the data sampling and the provided multinoulli (categorical) distribution of novel classes. We introduce constraints on individual and collective statistics of predicted novel class probabilities to implicitly achieve semantic-based clustering. More specifically, we align the class neuron activation distributions under Monte-Carlo sampling of novel classes in large batches by matching their empirical first-order (mean) and second-order (covariance) statistics with the multinoulli distribution of the labels, while applying instance information constraints and prediction consistency under label-preserving augmentations. We then explore a directional statistics-based probability formation that learns a mixture of von Mises-Fisher distributions of class labels on a unit hypersphere. We demonstrate the discriminative ability of our approach to realize semantic clustering of novel samples in image, video, and time-series modalities. We perform extensive ablation studies regarding data, networks, and framework components to provide better insights. Our approach maintains approximately 94%, 93%, 85%, and 93% classification accuracy on labeled data while achieving approximately 90%, 84%, 72%, and 75% clustering accuracy for novel categories on the CIFAR-10, UCF101, MPSC-ARL, and SHAR datasets, respectively, matching state-of-the-art approaches without any external clustering. [en_US]
dc.description.sponsorship: This research is supported by the NSF CAREER grant #1750936, REU Site grant #2050999 and U.S. Army grant #W911NF2120076. [en_US]
dc.description.uri: https://arxiv.org/abs/2307.03856 [en_US]
dc.format.extent: 13 pages [en_US]
dc.genre: journal articles [en_US]
dc.genre: preprints [en_US]
dc.identifier: doi:10.13016/m2pztg-rqdd
dc.identifier.uri: https://doi.org/10.48550/arXiv.2307.03856
dc.identifier.uri: http://hdl.handle.net/11603/28834
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law. [en_US]
dc.rights: Public Domain Mark 1.0 [*]
dc.rights.uri: http://creativecommons.org/publicdomain/mark/1.0/ [*]
dc.title: Novel Categories Discovery Via Constraints on Empirical Prediction Statistics [en_US]
dc.title.alternative: Novel Categories Discovery from probability matrix perspective
dc.type: Text [en_US]
dcterms.creator: https://orcid.org/0000-0002-8495-0948 [en_US]
dcterms.creator: https://orcid.org/0000-0002-8324-1197
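
The statistics-matching idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the uniform class prior, and the squared-error penalty are assumptions made here for illustration, and the instance information constraints, augmentation consistency, and von Mises-Fisher formulation mentioned in the abstract are omitted. The sketch relies only on the standard fact that a one-hot label drawn from a multinoulli distribution with prior p has mean p and covariance diag(p) - p p^T.

```python
# Illustrative sketch only; names and the squared-error penalty are assumptions,
# not the method as implemented in the paper.

def multinoulli_moments(prior):
    """Mean and covariance of a one-hot label drawn from `prior`."""
    k = len(prior)
    mean = list(prior)
    # Cov[i][j] = p_i * (1 if i == j else 0) - p_i * p_j
    cov = [[(prior[i] if i == j else 0.0) - prior[i] * prior[j]
            for j in range(k)] for i in range(k)]
    return mean, cov


def empirical_moments(probs):
    """Batch mean and (biased) covariance of predicted probability rows."""
    n, k = len(probs), len(probs[0])
    mean = [sum(row[j] for row in probs) / n for j in range(k)]
    cov = [[sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in probs) / n
            for j in range(k)] for i in range(k)]
    return mean, cov


def statistics_penalty(probs, prior):
    """Squared distance between empirical and target first/second moments."""
    m_hat, c_hat = empirical_moments(probs)
    m, c = multinoulli_moments(prior)
    k = len(prior)
    mean_term = sum((m_hat[j] - m[j]) ** 2 for j in range(k))
    cov_term = sum((c_hat[i][j] - c[i][j]) ** 2
                   for i in range(k) for j in range(k))
    return mean_term + cov_term


# Hypothetical two-class batches with an assumed uniform prior.
prior = [0.5, 0.5]
confident = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]  # balanced, one-hot
uncertain = [[0.5, 0.5]] * 4                                  # right mean, no spread
collapsed = [[1.0, 0.0]] * 4                                  # everything in one class

for name, batch in [("confident", confident),
                    ("uncertain", uncertain),
                    ("collapsed", collapsed)]:
    print(name, statistics_penalty(batch, prior))
```

In this toy setting the penalty vanishes only for the balanced, confident batch. The uniformly uncertain batch already matches the mean but is penalized through the covariance term, and the collapsed batch is penalized on both terms, which suggests why matching second-order statistics can encourage confident and balanced cluster assignments.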

Files

Original bundle

Name: 2307.03856v2.pdf
Size: 14.81 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission