Distributed Deep Learning Techniques for Remote Sensing Applications
Wang, Jianwu, Dr.; Gobbert, Matthias, Dr.; Kumari, Garima
2023-07-31; 2023-01-01
http://hdl.handle.net/11603/28956

Distributed deep learning is a rapidly growing field concerned with training deep neural networks on multiple GPUs, or even across multiple nodes. PyTorch is a popular deep learning framework known for its ease of use, flexible architecture, and support for distributed training. The Distributed Data-Parallel (DDP) module in PyTorch provides a straightforward way to apply data parallelism, enabling users to train models on multiple GPUs or nodes. In DDP, the model is replicated across the GPUs or nodes, and each replica processes its own mini-batch of the data. The gradients from all replicas are accumulated and averaged, producing a single, consistent update to the model parameters. Because each step effectively trains on a much larger global batch, this yields faster training times and improved scalability. PyTorch's DDP requires minimal changes to an existing codebase: the model is wrapped with the DDP class, and a distributed sampler partitions the data across the available GPUs or nodes. Additionally, DDP can be combined with model parallelism, allowing users to scale even complex models with non-trivial architectures across multiple GPUs or nodes.
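As a rough illustration of the workflow described above, the sketch below wraps a toy model with PyTorch's DDP class and shards a synthetic dataset with a DistributedSampler. The model, dataset, and hyperparameters are placeholders chosen for illustration only and are not drawn from the thesis itself.

```python
# Minimal DDP training sketch (assumes launch via torchrun, which sets
# RANK, LOCAL_RANK, and WORLD_SIZE for each process).
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and synthetic data standing in for a real
    # remote-sensing dataset.
    model = nn.Linear(32, 2).to(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    # DistributedSampler gives each process a disjoint shard of the data.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.to(local_rank), y.to(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            loss.backward()   # DDP all-reduces (averages) gradients here
            optimizer.step()  # every replica applies the same update

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched, for example, with `torchrun --nproc_per_node=<num_gpus> train.py`, each process trains one replica on its own data shard while gradient averaging during the backward pass keeps all replicas synchronized, which is the data-parallel pattern the abstract describes.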