Benchmarking Resource Usage of Underlying Datatypes of Apache Spark

dc.contributor.author: Nicholls, Brittany
dc.contributor.author: Adangwa, Mariama
dc.contributor.author: Estes, Rachel
dc.contributor.author: Iradukunda, Hugues Nelson
dc.contributor.author: Zhang, Qingquan
dc.contributor.author: Zhu, Ting
dc.date.accessioned: 2021-05-20T15:13:26Z
dc.date.available: 2021-05-20T15:13:26Z
dc.date.issued: 2020-12-08
dc.description.abstract: This paper examines how the resource usage of an analytic is affected by the underlying datatypes of Spark analytics: Resilient Distributed Datasets (RDDs), Datasets, and DataFrames. Resource usage is explored as a viable, and preferred, alternative to the currently common practice of benchmarking big data analytics by execution time. The run time of an analytic is shown not to be a reliably reproducible metric, since many factors external to the job can affect execution time. Instead, metrics readily available through Spark, including peak execution memory, are used to benchmark the resource usage of these datatypes in common applications of Spark analytics, such as counting, caching, repartitioning, and KMeans. [en]
dc.description.uri: https://arxiv.org/abs/2012.04192 [en]
dc.format.extent: 10 pages [en]
dc.genre: journal articles preprints [en]
dc.identifier: doi:10.13016/m2mypb-lapj
dc.identifier.citation: Nicholls, Brittany; Adangwa, Mariama; Estes, Rachel; Iradukunda, Hugues Nelson; Zhang, Qingquan; Zhu, Ting; Benchmarking Resource Usage of Underlying Datatypes of Apache Spark; Systems and Control (2020); https://arxiv.org/abs/2012.04192 [en]
dc.identifier.uri: http://hdl.handle.net/11603/21573
dc.language.iso: en [en]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: Apache Spark analytics [en]
dc.subject: resource usage for datatypes [en]
dc.subject: benchmarking big data analytics [en]
dc.title: Benchmarking Resource Usage of Underlying Datatypes of Apache Spark [en]
dc.type: Text [en]
