Benchmarking Parallel K-Means Cloud Type Clustering from Satellite Data
Type of Work: 27 pages
Citation of Original Publication: Barajas, Carlos; Guo, Pei; Mukherjee, Lipi; Hoban, Susan; Wang, Jianwu; Jin, Daeho; Gangopadhyay, Aryya; Gobbert, Matthias K.; Benchmarking Parallel K-Means Cloud Type Clustering from Satellite Data; International Symposium on Benchmarking, Measuring and Optimization Journal; Benchmarking, Measuring, and Optimizing pp 248-260; https://link.springer.com/chapter/10.1007/978-3-030-32813-9_20#citeas
Rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Public Domain Mark 1.0
This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
The study of clouds, i.e., where they occur and what their characteristics are, plays a key role in the understanding of climate change. Clustering is a common machine learning technique used in atmospheric science to classify cloud types. Many parallelism techniques, e.g., MPI, OpenMP, and Spark, could achieve efficient and scalable clustering of large-scale satellite observation data. To understand their differences, this paper studies and compares three approaches to parallel clustering of satellite observation data. Benchmarking experiments with k-means clustering are conducted with three parallelism techniques, namely OpenMP, OpenMP+MPI, and Spark, on an HPC cluster using up to 16 nodes.
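To make concrete what is being parallelized, the following is a minimal sketch of standard (Lloyd's) k-means, arranged so that the data-parallel structure is visible. It is not the authors' implementation. The function names, the chunking scheme, and the `n_chunks` parameter are assumptions made for illustration. Each chunk stands in for the share of the data held by one thread, MPI rank, or Spark partition, and the loop that combines the per-chunk results stands in for the reduction step that any of those frameworks would perform.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partial_sums(chunk: List[Point], centroids: List[Point]):
    """Independent work on one chunk: assign each point to its nearest
    centroid and accumulate per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans(points: List[Point], centroids: List[Point],
           n_chunks: int = 4, max_iter: int = 100) -> List[Point]:
    """Lloyd's k-means; chunks are processed serially here, but each call
    to partial_sums is independent and could run in parallel."""
    centroids = [tuple(float(x) for x in c) for c in centroids]
    # Hypothetical partitioning: a strided split of the input points.
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        total = [[0.0] * dim for _ in range(k)]
        count = [0] * k
        # Reduction step: combine the per-chunk sums and counts.
        for sums, counts in (partial_sums(c, centroids) for c in chunks):
            for j in range(k):
                count[j] += counts[j]
                for d in range(dim):
                    total[j][d] += sums[j][d]
        # New centroid = mean of assigned points; keep old one if empty.
        new = [tuple(t / count[j] for t in total[j]) if count[j]
               else centroids[j] for j in range(k)]
        if new == centroids:  # assignments have stabilized
            break
        centroids = new
    return centroids
```

Only the small per-cluster sums and counts cross chunk boundaries in each iteration, which is why the same algorithm maps onto shared-memory, message-passing, and dataflow frameworks alike.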
The following license files are associated with this item:
- Creative Commons