Kernel-Based Lifelong Policy Gradient Reinforcement Learning
Date
2021-05-13
Citation of Original Publication
Mowakeaa, Rami; Kim, Seung-Jun; Emge, Darren K.; Kernel-Based Lifelong Policy Gradient Reinforcement Learning; ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 13, 2021; https://doi.org/10.1109/ICASSP39728.2021.9414511
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Public Domain Mark 1.0
This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
Abstract
Policy gradient methods have been widely used in reinforcement learning (RL), thanks especially to their ability to handle continuous state spaces, their strong convergence guarantees, and their low-complexity updates. Training these methods on individual tasks, however, can still be taxing in terms of learning speed and sample trajectory collection. Lifelong learning aims to exploit the intrinsic structure shared among a suite of RL tasks, akin to multitask learning, but in an efficient online fashion. In this work, we propose a lifelong RL algorithm that uses the kernel method to leverage nonlinear features of the data, building on a popular union-of-subspaces model. Experimental results on a set of simple related tasks verify the advantage of the proposed strategy over its single-task and parametric counterparts.