Benchmarking of Parallel Climate Data Aggregation in a Distributed Environment

dc.contributor.advisor: Wang, Jianwu
dc.contributor.author: Prakash, Deepak
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2021-09-01T13:55:51Z
dc.date.available: 2021-09-01T13:55:51Z
dc.date.issued: 2019-01-01
dc.description.abstract: In atmospheric physics, cloud fraction is derived from the coverage of clouds, the frequency of their occurrence, and the evaluation of different cloud properties. Climate data obtained from the MODIS (Moderate Resolution Imaging Spectroradiometer) instrument on satellites are averaged to produce cloud fraction on daily and monthly scales in order to determine cloud properties. The volume of data is vast, so computing cloud fraction involves tedious calculations and long run times. Big data platforms, with features such as data aggregation and data parallelization, can produce results faster and reduce computation time. In this thesis, we use Python frameworks such as Pandas and Dask to perform level-2 to level-3 data aggregation and compute the cloud property results. The computations are run on parallel nodes, gradually increasing the number of nodes used (1, 2, 3, etc.), while we monitor performance and compare the time taken by the different frameworks. Our day-level experiments use close to 576 MODIS dataset files, while all other experiments use 100 files, and we use Dask for parallel processing. Several Dask components, such as dask.dataframe, dask.delayed, and the Dask distributed cluster, are used to achieve parallelization. Our results demonstrate effective approaches to parallel computing across distributed clusters and its importance.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2y9wf-gfbv
dc.identifier.other: 12121
dc.identifier.uri: http://hdl.handle.net/11603/22907
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Prakash_umbc_0434M_12121.pdf
dc.title: Benchmarking of Parallel Climate Data Aggregation in a Distributed Environment
dc.type: Text
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Files

Original bundle
Name:
Prakash_umbc_0434M_12121.pdf
Size:
3.22 MB
Format:
Adobe Portable Document Format