Distributed Deep Learning Techniques for Remote Sensing Applications

Author/Creator

Author/Creator ORCID

Date

2023-01-01

Department

Information Systems

Program

Information Systems

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.

Subjects

Abstract

Distributed deep learning is a rapidly growing field concerned with training deep neural networks on multiple GPUs or even across multiple nodes. PyTorch is a popular deep learning framework known for its ease of use, flexible architecture, and support for distributed training. PyTorch's Distributed Data-Parallel (DDP) module provides a straightforward way to apply data parallelism, enabling users to train models on multiple GPUs or nodes. In DDP, the model is replicated across the participating GPUs or nodes, and each replica processes its own mini-batch of data. The gradients from the replicas are accumulated and averaged, producing a single synchronized update to the model parameters. This yields faster training and improved throughput, since the model is effectively trained on a much larger batch size. DDP requires minimal changes to an existing codebase: the model is wrapped with the DDP class, while a distributed sampler partitions the data across the available GPUs or nodes. Additionally, DDP can be combined with model parallelism, allowing users to scale even complex models with non-trivial architectures across multiple GPUs or nodes.
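
The sketch below illustrates the training pattern the abstract describes: wrapping a model with DDP, partitioning the data with a distributed sampler, and letting gradient averaging happen during the backward pass. It is a minimal illustrative example, not code from the thesis; the toy linear model, synthetic dataset, and hyperparameters are placeholder assumptions.

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each spawned process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and synthetic data as placeholders for a real network/dataset.
    model = nn.Linear(32, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])   # wrap the model (not the optimizer)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)         # partitions the data per rank
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    for epoch in range(2):
        sampler.set_epoch(epoch)                  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()                       # gradients are all-reduced/averaged here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> this_script.py

Because DDP hooks the all-reduce into backward(), each process runs an ordinary single-GPU training loop; the only DDP-specific additions are the process-group setup, the model wrapper, and the sampler.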