LIBFLOW: A PLATFORM TO SCHEDULE AND MANAGE WORKFLOWS USING DAGS

dc.contributor.advisor: Nicholas, Charles
dc.contributor.author: Panhalkar, Shreyas
dc.contributor.department: Computer Science and Electrical Engineering
dc.contributor.program: Computer Science
dc.date.accessioned: 2021-01-29T18:12:29Z
dc.date.available: 2021-01-29T18:12:29Z
dc.date.issued: 2019-01-01
dc.description.abstract: With continuous user growth year-on-year, Internet companies are collecting user data on a massive scale. This raw data is in turn used to generate insights that help these companies perform better. Because use cases vary, companies typically use different data stores for different kinds of data. For example, Apache Hive is often used for large-scale bulk data processing, while Amazon Redshift is used for fast, real-time analytical queries. Thus, owing to varied business needs and the increasing complexity of the underlying data, companies are moving away from the traditional one-for-all data warehousing solution. The heterogeneous nature of these platforms' APIs makes it difficult for data engineers to write a series of transformations that process data from various sources. In this work, we propose a platform that helps data engineers easily write workflows to process large-scale data involving multiple data warehouses, without much rudimentary work. To address data dependency issues, this platform uses Directed Acyclic Graphs to define workflows and Johnson's algorithm to detect elementary cycles.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2pzan-ngpe
dc.identifier.other: 12043
dc.identifier.uri: http://hdl.handle.net/11603/20711
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Panhalkar_umbc_0434M_12043.pdf
dc.subject: analytics
dc.subject: big data
dc.subject: graphs
dc.subject: workflows
dc.title: LIBFLOW: A PLATFORM TO SCHEDULE AND MANAGE WORKFLOWS USING DAGS
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
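
The abstract describes workflows defined as Directed Acyclic Graphs, with cycle detection guarding against circular data dependencies. The Python sketch below illustrates that general idea only and is not the thesis's implementation. It uses Kahn's topological sort, which reports whether a cycle exists, rather than Johnson's algorithm, which enumerates every elementary cycle. The function `execution_order` and the task names are hypothetical.

```python
from collections import deque


def execution_order(workflow):
    """Return tasks in a valid run order using Kahn's algorithm.

    `workflow` maps each task name to the list of tasks it depends on.
    Raises ValueError if the dependency graph is not a DAG.
    """
    # Collect every task, including ones that appear only as dependencies.
    tasks = set(workflow)
    for deps in workflow.values():
        tasks.update(deps)

    indegree = {t: 0 for t in tasks}      # number of unmet dependencies
    dependents = {t: [] for t in tasks}   # reverse edges: dep -> tasks waiting on it
    for task, deps in workflow.items():
        for dep in set(deps):
            indegree[task] += 1
            dependents[dep].append(task)

    # Start with tasks that have no dependencies (sorted for a stable order).
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(dependents[task]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Any task never released sits on a cycle or depends on one.
    if len(order) != len(tasks):
        stuck = sorted(t for t in tasks if indegree[t] > 0)
        raise ValueError(f"workflow is not a DAG; blocked tasks: {stuck}")
    return order


# Hypothetical workflow: two extracts feed a join, which feeds a report.
workflow = {
    "join": ["extract_hive", "extract_redshift"],
    "report": ["join"],
}
print(execution_order(workflow))
```

A scheduler built this way can reject a workflow definition before running anything. Enumerating the offending cycles themselves, as the abstract describes, would need an algorithm such as Johnson's on top of this check.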

Files

Original bundle

Name: Panhalkar_umbc_0434M_12043.pdf
Size: 449.18 KB
Format: Adobe Portable Document Format

License bundle

Name: PanhalkarSLIBFLOW_Open.pdf
Size: 39.77 KB
Format: Adobe Portable Document Format