Understanding the Logical and Semantic Structure of Large Documents

dc.contributor.author: Rahman, Muhammad Mahbubur
dc.date.accessioned: 2018-10-19T13:52:36Z
dc.date.available: 2018-10-19T13:52:36Z
dc.date.issued: 2017-04-27
dc.description: SDM 2016 Doctoral Forum, SIAM International Conference on Data Mining [en_US]
dc.description.abstract: Current language understanding approaches focus mostly on small documents such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents such as legal documents, reports, proposals, technical manuals and research articles is still a challenging task, because these documents may be multi-themed, complex and cover diverse topics. For example, business opportunities may contain information on the background of the business, its products or services, plans, team management, financial or budget-related data, competitors, logistics, compliance, legal information and boilerplate content that is repeated across documents. The content can be split into multiple files or aggregated into one large file. As a result, different parts of the document may have different structures and formats. Furthermore, the information is expressed in different forms such as paragraphs of text, headers, data forms, tables, images, mathematical equations, lists or a nested combination of these structures. [en_US]
dc.description.uri: https://ebiquity.umbc.edu/paper/html/id/786/Understanding-the-Logical-and-Semantic-Structure-of-Large-Documents [en_US]
dc.format.extent: 5 pages [en_US]
dc.genre: conference paper pre-print [en_US]
dc.identifier: doi:10.13016/M2S756P6F
dc.identifier.citation: Muhammad Mahbubur Rahman, Understanding the Logical and Semantic Structure of Large Documents, SDM 2016 Doctoral Forum, SIAM International Conference on Data Mining, 2017, https://ebiquity.umbc.edu/paper/html/id/786/Understanding-the-Logical-and-Semantic-Structure-of-Large-Documents [en_US]
dc.identifier.uri: http://hdl.handle.net/11603/11615
dc.language.iso: en_US [en_US]
dc.publisher: SIAM [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: natural language processing [en_US]
dc.subject: learning [en_US]
dc.subject: Semantic Structure [en_US]
dc.subject: UMBC Ebiquity Research Group [en_US]
dc.title: Understanding the Logical and Semantic Structure of Large Documents [en_US]
dc.type: Text [en_US]
