Statistical Methods for Detecting Anomalous Model Behavior with Unlabeled Data

dc.contributor.advisor: Oates, Tim
dc.contributor.author: Lagnese, Joseph Anthony
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-09-01T13:55:06Z
dc.date.available: 2021-09-01T13:55:06Z
dc.date.issued: 2020-01-20
dc.description.abstract: As the applications and number of production-level machine learning models continue to increase, so too does the need for appropriate monitoring frameworks for these models. Models applied to ever-changing real-world data will inevitably experience a shift in the distribution of incoming data, referred to as concept drift. The quick and accurate detection of concept drift is critical to the efficient and effective use of these models. While previous approaches to this problem have required partially or fully labeled testing data or have focused on monitoring a single metric, we propose a model- and metric-independent approach that can detect concept drift in unlabeled data streams. We use symmetrized Kullback-Leibler divergence in combination with statistical randomization testing to detect drift with tunable sensitivity. To demonstrate the utility of our approach, we apply it to logistic regression models tested on a variety of problems using the reduced-resolution MNIST dataset from UCI [1], the National Weather Service's CF6 climate dataset [2], and Blitzer et al.'s multi-domain sentiment analysis dataset [3]. Our results show that the approach reliably detects both sudden and gradual drift using a sliding window.
dc.format: application/pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2sawm-qlsd
dc.identifier.other: 12199
dc.identifier.uri: http://hdl.handle.net/11603/22789
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Lagnese_umbc_0434M_12199.pdf
dc.subject: concept drift
dc.subject: detection
dc.subject: Kullback-Leibler divergence
dc.subject: machine learning
dc.subject: randomization testing
dc.subject: unlabeled
dc.title: Statistical Methods for Detecting Anomalous Model Behavior with Unlabeled Data
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
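
The abstract names two ingredients, symmetrized Kullback-Leibler divergence and statistical randomization (permutation) testing, applied over a sliding window to an unlabeled stream. The Python sketch below shows one plausible way to combine them; it is not the thesis's implementation. It assumes the monitored quantity is a binary classifier's predicted positive-class probability in [0, 1], summarized as a 10-bin histogram, and it takes the symmetrized divergence to be the sum D_KL(P || Q) + D_KL(Q || P) (some authors average the two directions instead). The function names (symmetrized_kl, randomization_test, sliding_window_alarms), the window size, step, significance level alpha, and permutation count are all hypothetical choices, and the data are synthetic.

```python
# Sketch only: function names, binning, window size, and thresholds are
# hypothetical assumptions, not taken from the thesis.
import numpy as np


def symmetrized_kl(p, q, eps=1e-10):
    """D_KL(p || q) + D_KL(q || p) for two histograms (counts or probabilities).

    A small eps keeps empty bins from producing log(0) or division by zero.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # normalize counts to probability distributions
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def randomization_test(reference, window, n_bins=10, n_permutations=200, seed=None):
    """Return (observed divergence, p-value) for the null hypothesis that
    `reference` and `window` scores come from the same distribution."""
    rng = np.random.default_rng(seed)
    bins = np.linspace(0.0, 1.0, n_bins + 1)  # assumes scores lie in [0, 1]

    def divergence(a, b):
        return symmetrized_kl(np.histogram(a, bins)[0], np.histogram(b, bins)[0])

    observed = divergence(reference, window)
    pooled = np.concatenate([reference, window])
    n_ref = len(reference)
    exceed = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # reassign samples to the two groups at random
        if divergence(shuffled[:n_ref], shuffled[n_ref:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_permutations + 1)


def sliding_window_alarms(reference, stream, window=200, step=50, alpha=0.01, seed=0):
    """Start indices of sliding windows whose p-value falls below alpha."""
    alarms = []
    for start in range(0, len(stream) - window + 1, step):
        _, p_value = randomization_test(reference, stream[start:start + window], seed=seed)
        if p_value < alpha:
            alarms.append(start)
    return alarms


# Synthetic demo: positive-class probabilities from a hypothetical classifier.
rng = np.random.default_rng(42)
reference = rng.beta(2, 5, size=500)  # scores observed when the model was deployed
stream = np.concatenate([
    rng.beta(2, 5, size=600),  # no drift
    rng.beta(5, 2, size=600),  # sudden drift starting at index 600
])
alarms = sliding_window_alarms(reference, stream)
print("drift flagged at window starts:", alarms)
```

In this sketch alpha is the sensitivity knob: lowering it yields fewer false alarms at the cost of slower detection. alpha must exceed 1/(n_permutations + 1), the smallest p-value the test can return, or no window can ever be flagged. Because every window is tested separately, repeated testing inflates the false-alarm rate over a long stream, which a production monitor would need to account for.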

Files

Original bundle

Name: Lagnese_umbc_0434M_12199.pdf
Size: 827.38 KB
Format: Adobe Portable Document Format

License bundle

Name: Lagnese-Joseph_Open.pdf
Size: 489.98 KB
Format: Adobe Portable Document Format