
    Efficient Recovery from Repeated Domain Shifts in Streaming Data

    Files
    Gandhewar_umbc_0434M_11482.pdf (886.6Kb)
    Permanent Link
    http://hdl.handle.net/11603/15474
    Collections
    • UMBC Theses and Dissertations
    Metadata
    Author/Creator
    Unknown author
    Date
    2016-01-01
    Type of Work
    Text
    thesis
    Department
    Computer Science and Electrical Engineering
    Program
    Computer Science
    Rights
    This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
    Distribution Rights granted to UMBC by the author.
    Abstract
    Humans have a remarkable ability to learn how to learn, what to learn, and when to learn. We can assess the utility of learned knowledge for achieving an objective and adapt our learning strategies accordingly. Likewise, we want machine learning systems trained in one domain to adapt well to different domains. If a classifier encounters a distribution that it has seen previously, it should recall the previously learned knowledge and classify accordingly. This thesis addresses the problem of recovering efficiently from repeated domain shifts in streaming data for a classifier system. The problem can be divided into two sub-problems. The first is detecting a domain shift in a data stream representing learned knowledge. Following Dredze, Oates, & Piatko (2010), we use the A-distance (Kifer, Ben-David, & Gehrke 2004) over the absolute value of the classification margin of support vector machines for this task. The second is deciding what action to take after a domain shift is detected. We propose and evaluate approaches to training new models and deciding when to reuse old models, in order to minimize cost and maximize accuracy in the face of repeated domain shifts. We evaluate our algorithm on the Amazon product reviews dataset.
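The two sub-problems in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes a linear model, so the margin of an example is `w . x + b`, and it takes the family of sets in the A-distance to be all intervals on the real line, using the definition d_A(P, P') = 2 sup_A |P(A) - P'(A)| from Kifer et al. The function names, the `threshold` value of 0.5, and the reuse rule in `pick_stored_model` are hypothetical choices made for illustration only; the abstract does not specify them.

```python
from typing import Dict, Optional, Sequence


def abs_margin(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Absolute margin |w . x + b| of one example under a linear model (assumed form)."""
    return abs(sum(w * xi for w, xi in zip(weights, x)) + bias)


def a_distance_intervals(ref: Sequence[float], cur: Sequence[float]) -> float:
    """Empirical A-distance between two samples, with A = all intervals on the line.

    Computes 2 * max over intervals of |P_ref(A) - P_cur(A)| using a
    maximum-sum contiguous run over the sorted distinct values.
    """
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    # Signed mass per distinct value: positive from ref, negative from cur.
    mass: Dict[float, float] = {}
    for v in ref:
        mass[v] = mass.get(v, 0.0) + 1.0 / len(ref)
    for v in cur:
        mass[v] = mass.get(v, 0.0) - 1.0 / len(cur)
    best = 0.0
    run_pos = 0.0  # largest signed mass of an interval ending at the current value
    run_neg = 0.0  # smallest signed mass of an interval ending at the current value
    for v in sorted(mass):
        m = mass[v]
        run_pos = max(run_pos + m, m)
        run_neg = min(run_neg + m, m)
        best = max(best, run_pos, -run_neg)
    return 2.0 * best


def shift_detected(ref_margins: Sequence[float], cur_margins: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Flag a domain shift when the A-distance exceeds a threshold (hypothetical value)."""
    return a_distance_intervals(ref_margins, cur_margins) > threshold


def pick_stored_model(ref_margins_by_model: Dict[str, Sequence[float]],
                      cur_margins_by_model: Dict[str, Sequence[float]],
                      threshold: float = 0.5) -> Optional[str]:
    """Hypothetical reuse rule, not taken from the thesis.

    Returns the stored model whose reference margins are closest to the margins
    it produces on the current window, or None if no model is within the
    threshold (the caller would then train a new model).
    """
    best_name: Optional[str] = None
    best_dist = threshold
    for name, ref in ref_margins_by_model.items():
        dist = a_distance_intervals(ref, cur_margins_by_model[name])
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```

In a streaming setting, `ref_margins` would hold absolute margins from a window the current model was trained or validated on, and `cur_margins` those from the most recent window. On small windows the empirical distance is noisy, so both the window size and the threshold would need tuning.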


    Albin O. Kuhn Library & Gallery
    University of Maryland, Baltimore County
    1000 Hilltop Circle
    Baltimore, MD 21250
    www.umbc.edu/scholarworks

    Contact information:
    Email: scholarworks-group@umbc.edu
    Phone: 410-455-3544


    If you wish to submit a copyright complaint or withdrawal request, please email mdsoar-help@umd.edu.

     

     
