Deep Reinforcement Learning and NOMA-Based Multi-Objective RIS-Assisted IS-UAV-TNs: Trajectory Optimization and Beamforming Design

Date

2023-04-26

Citation of Original Publication

K. Guo, M. Wu, X. Li, H. Song and N. Kumar, "Deep Reinforcement Learning and NOMA-Based Multi-Objective RIS-Assisted IS-UAV-TNs: Trajectory Optimization and Beamforming Design," in IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2023.3267607.

Rights

© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Abstract

In this paper, we study the joint performance optimization of a multi-reconfigurable intelligent surface (RIS)-assisted integrated satellite-unmanned aerial vehicle-terrestrial network (IS-UAV-TN) that serves multiple vehicle users. Performance optimization of IS-UAV-TNs faces two major challenges: obstacles in the transmission path, and the highly dynamic communication environment that UAV movement creates for the multiple ground vehicle users. To tackle these issues efficiently, we install the RIS on the UAV to reshape the wireless transmission path. In addition, non-orthogonal multiple access (NOMA) protocols are adopted as a new paradigm to address spectrum shortage and enhance connection quality. Considering the UAV energy consumption, the satellite transmit beamforming matrix, and the RIS phase shift configuration, we formulate a multi-objective optimization problem that maximizes the system achievable rate and minimizes the UAV energy consumption during a specific mission. On this basis, to handle the online decision problem, a deep reinforcement learning (DRL) algorithm is used to interact with the communication environment in real time. A multi-objective deep deterministic policy gradient (MO-DDPG) algorithm is proposed to search for sub-optimal solutions to the problem of learning multi-objective control policies in IS-UAV-TNs. Experimental results show that the method can simultaneously consider three optimization objectives and effectively adjust the optimal update policy according to the settings of different weight parameters.
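
The abstract mentions weight parameters that trade off the optimization objectives and an RIS phase shift configuration. As a rough illustration only, the sketch below shows one common way such a trade-off can be expressed: a weighted-sum reward over achievable rate and UAV energy, together with a diagonal matrix of unit-modulus RIS reflection coefficients. The function names, the normalization constants `rate_ref` and `energy_ref`, and the weighted-sum form itself are assumptions made for this example; the abstract does not state the reward actually used by MO-DDPG.

```python
import numpy as np


def scalarized_reward(rate, energy, w_rate=0.5, w_energy=0.5,
                      rate_ref=1.0, energy_ref=1.0):
    """Weighted-sum reward (hypothetical form, not taken from the paper).

    A higher achievable rate raises the reward and a higher UAV energy
    consumption lowers it. rate_ref and energy_ref are assumed
    normalization constants that put both terms on a comparable scale.
    """
    return w_rate * (rate / rate_ref) - w_energy * (energy / energy_ref)


def ris_reflection_matrix(phases):
    """Diagonal matrix of unit-modulus coefficients exp(j * theta_n).

    This is the usual ideal model of a passive RIS: each element only
    shifts the phase of the incident signal and leaves its amplitude
    unchanged.
    """
    phases = np.asarray(phases, dtype=float)
    return np.diag(np.exp(1j * phases))


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    print(scalarized_reward(rate=4.0, energy=2.0, w_rate=0.5, w_energy=0.5))
    print(scalarized_reward(rate=4.0, energy=2.0, w_rate=0.9, w_energy=0.1))
    print(np.abs(np.diag(ris_reflection_matrix([0.0, np.pi / 2, np.pi]))))
```

Under these assumptions, shifting weight from `w_energy` to `w_rate` makes the same (rate, energy) pair score higher, which is the kind of preference change an agent trained on such a reward would respond to.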