Scalable Aggregation Service for Satellite Remote Sensing Data

Author/Creator ORCID

Date

2020-09-29

Department

Program

Citation of Original Publication

Wang, Jianwu; Huang, Xin; Zheng, Jianyu; Rajapakshe, Chamara; Kay, Savio; Kandoor, Lakshmi; Maxwell, Thomas; Zhang, Zhibo; Scalable Aggregation Service for Satellite Remote Sensing Data; International Conference on Algorithms and Architectures for Parallel Processing; ICA3PP 2020: Algorithms and Architectures for Parallel Processing, pp 184-199; https://link.springer.com/chapter/10.1007/978-3-030-60239-0_13

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Public Domain Mark 1.0
This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.

Subjects

Abstract

With the advances of satellite remote sensing techniques, we are receiving a huge amount of satellite observation data for the Earth. While the data greatly helps Earth scientists in their research, processing and analyzing the data is becoming increasingly time-consuming and complicated. One common data processing task is to aggregate satellite observation data from the original pixel level to the latitude-longitude grid level, which makes it easier to obtain global information and to work with global climate models. This paper focuses on how best to aggregate NASA MODIS satellite data products from pixel level to grid level in a distributed environment, and how to provision the aggregation capability as a service that Earth scientists can use easily. We propose three different approaches to parallel data aggregation and employ three parallel platforms (Spark, Dask and MPI) to implement the approaches. We run extensive experiments based on these parallel approaches and platforms on a local cluster to benchmark their differences in execution performance and discuss key factors for achieving good speedup. We also study how to make the provisioned service adaptable to different service libraries and protocols via a unified framework.
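As an illustration of the pixel-to-grid aggregation task described in the abstract, the following is a minimal NumPy sketch and not the paper's implementation. The function names `aggregate_to_grid` and `merge_partials`, the 1-degree default cell size, and the choice of a simple mean as the grid statistic are all assumptions made for this example. Each call returns per-cell sums and counts rather than means, so partial results computed independently (for example, one per input file or per worker) can be added together before the final division.

```python
import numpy as np


def aggregate_to_grid(lat, lon, values, cell_deg=1.0):
    """Return per-cell (sum, count) arrays on a regular lat-lon grid.

    Hypothetical helper: lat in [-90, 90], lon in [-180, 180], all inputs
    are arrays of the same shape. Non-finite values are skipped.
    """
    n_lat = int(round(180.0 / cell_deg))
    n_lon = int(round(360.0 / cell_deg))

    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    values = np.asarray(values, dtype=float).ravel()

    # Drop missing pixels (assumed here to be encoded as NaN or inf).
    valid = np.isfinite(values)
    lat, lon, values = lat[valid], lon[valid], values[valid]

    # Map each pixel to a grid cell; clip so lat == 90 and lon == 180
    # fall into the last row and column instead of out of range.
    row = np.clip(np.floor((lat + 90.0) / cell_deg).astype(int), 0, n_lat - 1)
    col = np.clip(np.floor((lon + 180.0) / cell_deg).astype(int), 0, n_lon - 1)
    flat = row * n_lon + col

    total = np.bincount(flat, weights=values, minlength=n_lat * n_lon)
    count = np.bincount(flat, minlength=n_lat * n_lon)
    return total.reshape(n_lat, n_lon), count.reshape(n_lat, n_lon)


def merge_partials(partials):
    """Combine (sum, count) pairs and return the per-cell mean (NaN if empty)."""
    totals, counts = zip(*partials)
    total = np.sum(totals, axis=0)
    count = np.sum(counts, axis=0)
    mean = np.full(total.shape, np.nan)
    np.divide(total, count, out=mean, where=count > 0)
    return mean


# Two "files" of pixels aggregated independently, then merged.
part_a = aggregate_to_grid([0.5, 10.2], [0.5, -20.1], [1.0, 5.0])
part_b = aggregate_to_grid([0.7], [0.3], [3.0])
grid_mean = merge_partials([part_a, part_b])
```

Because sums and counts are additive, the per-file calls could be distributed across workers on any parallel platform and reduced at the end; how the paper's three approaches actually partition the work is not specified in this record.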