Unfolding the Structure of a Document using Deep Learning

Rahman, Muhammad Mahbubur; Finin, Tim

dc.contributor.author	Rahman, Muhammad Mahbubur
dc.contributor.author	Finin, Tim
dc.date.accessioned	2019-11-22T18:08:49Z
dc.date.available	2019-11-22T18:08:49Z
dc.date.issued	2019-09-29
dc.description.abstract	Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be multi-themed, complex, noisy and cover diverse topics. We describe a framework that can analyze large documents and help people and computer systems locate desired information in them. We aim to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of our research is modeling and extracting the logical and semantic structure of electronic documents using deep learning techniques. We evaluate the effectiveness and robustness of our framework through extensive experiments on two collections: more than one million scholarly articles from arXiv and a collection of requests for proposal documents from government sources.	en
dc.description.sponsorship	This work was partially supported by National Science Foundation grant 1549697 and gifts from IBM and Northrop Grumman.	en
dc.description.uri	https://arxiv.org/abs/1910.03678	en
dc.format.extent	16 pages	en
dc.genre	journal articles preprints	en
dc.identifier	doi:10.13016/m2b3fr-iigx
dc.identifier.citation	Rahman, Muhammad Mahbubur; Finin, Tim; Unfolding the Structure of a Document using Deep Learning; Computation and Language; https://arxiv.org/abs/1910.03678;	en
dc.identifier.uri	http://hdl.handle.net/11603/16515
dc.language.iso	en	en
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject	document structure	en
dc.subject	deep learning	en
dc.subject	document understanding	en
dc.subject	semantic annotation	en
dc.title	Unfolding the Structure of a Document using Deep Learning	en
dc.type	Text	en

Files

Original bundle

Name: 1910.03678.pdf
Size: 2.91 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission

Collections

UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection