Guiding Safe Reinforcement Learning Policies Using Structured Language Constraints

Prakash, Bharat; Waytowich, Nicholas; Ganesan, Ashwinkumar; Oates, Tim; Mohsenin, Tinoosh

dc.contributor.author	Prakash, Bharat
dc.contributor.author	Waytowich, Nicholas
dc.contributor.author	Ganesan, Ashwinkumar
dc.contributor.author	Oates, Tim
dc.contributor.author	Mohsenin, Tinoosh
dc.date.accessioned	2020-03-04T15:39:51Z
dc.date.available	2020-03-04T15:39:51Z
dc.description.abstract	Reinforcement learning (RL) has shown success in solving complex sequential decision-making tasks when a well-defined reward function is available. For agents acting in the real world, these reward functions need to be designed very carefully to make sure the agents act in a safe manner. This is especially true when these agents need to interact with humans and perform tasks in such settings. However, hand-crafting such a reward function often requires specialized expertise and quickly becomes difficult to scale with task complexity. This leads to the long-standing problem in reinforcement learning known as reward sparsity, where sparse or poorly specified reward functions slow down the learning process and lead to sub-optimal policies and unsafe behaviors. To make matters worse, reward functions often need to be adjusted or re-specified for each task the RL agent must learn. On the other hand, it’s relatively easy for people to specify using language what you should or shouldn’t do in order to do a task safely. Inspired by this, we propose a framework to train RL agents conditioned on constraints that are in the form of structured language, thus reducing effort to design and integrate specialized rewards into the environment. In our experiments, we show that this method can be used to ground the language to behaviors and enable the agent to solve tasks while following the constraints. We also show how the agent can transfer these skills to other tasks.	en_US
dc.description.sponsorship	This project was sponsored by the U.S. Army Research Laboratory under Cooperative Agreement Number W911NF10-2-0022. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. We also thank Sunil Gandhi for useful discussions during the course of this project.	en_US
dc.description.uri	http://eehpc.csee.umbc.edu/publications/pdf/2020/AAAI_RL_Workshop.pdf	en_US
dc.format.extent	9 pages	en_US
dc.genre	conference papers and proceedings preprints	en_US
dc.identifier	doi:10.13016/m2yg3c-gxs4
dc.identifier.citation	Prakash Bharat, Waytowich Nicholas, Ganesan Ashwinkumar, Oates Tim, Mohsenin Tinoosh, Guiding Safe Reinforcement Learning Policies Using Structured Language Constraints, http://eehpc.csee.umbc.edu/publications/pdf/2020/AAAI_RL_Workshop.pdf	en_US
dc.identifier.uri	http://hdl.handle.net/11603/17463
dc.language.iso	en_US	en_US
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.relation.ispartof	UMBC Student Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights	Public Domain Mark 1.0	*
dc.rights	This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights.uri	http://creativecommons.org/publicdomain/mark/1.0/	*
dc.subject	Energy Efficient High Performance Computing Lab
dc.title	Guiding Safe Reinforcement Learning Policies Using Structured Language Constraints	en_US
dc.type	Text	en_US

Files

Original bundle

Name: AAAI_RL_Workshop.pdf
Size: 502.25 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission

Collections

UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection
UMBC Student Collection