
    Deep Understanding of a Document's Structure

    Files
    857.pd.pdf (1.398Mb)
    Links to Files
    https://dl.acm.org/citation.cfm?id=3148055.3148080
    Permanent Link
    https://doi.org/10.1145/3148055.3148080
    http://hdl.handle.net/11603/11581
    Collections
    • UMBC Computer Science and Electrical Engineering Department
    • UMBC Faculty Collection
    • UMBC Student Collection
    Metadata
    Author/Creator
    Rahman, Muhammad Mahbubur
    Finin, Tim
    Date
    2017-12-05
    Type of Work
    11 pages
    Text
    conference paper pre-print
    Rights
    This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
    Subjects
    deep learning
    learning
    natural language processing
    UMBC Ebiquity Research Group
    Abstract
    Current language understanding approaches focus on small documents, such as newswire articles, blog posts, product reviews and forum discussions. Understanding and extracting information from large documents like legal briefs, proposals, technical manuals and research articles is still a challenging task. We describe a framework that can analyze a large document and help people to locate desired information in it. We aim to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of our research is modeling and extracting the logical structure of electronic documents using machine learning techniques, including deep learning. We also make available a dataset of information about a collection of scholarly articles from the arXiv e-print collection that includes a wide range of metadata for each article, including a table of contents, section labels, section summaries and more. We hope that this dataset will be a useful resource for the machine learning and language understanding communities for information retrieval, content-based question answering and language modeling tasks.


    Albin O. Kuhn Library & Gallery
    University of Maryland, Baltimore County
    1000 Hilltop Circle
    Baltimore, MD 21250
    www.umbc.edu/scholarworks

    Contact information:
    Email: scholarworks-group@umbc.edu
    Phone: 410-455-3021


    If you wish to submit a copyright complaint or withdrawal request, please email mdsoar-help@umd.edu.

     

     


