I-CPIE Virtual Seminar: Reinforcement Learning using Generative Models for Continuous State and Action Space Systems

Wednesday, September 9, 4:00pm to 5:00pm

Virtual Event

Reinforcement Learning using Generative Models for Continuous State and Action Space Systems

Presented by Rahul Jain, PhD, Kenneth C. Dahlberg Early Career Chair in Electrical Engineering and Associate Professor, University of Southern California

Organized by Lehigh University’s Institute for Cyber Physical Infrastructure and Energy (I-CPIE)

4:00 pm EDT

September 9th, 2020

Please register here: https://forms.gle/xdrFHUV8YNsGE4Vq5

Zoom link: https://lehigh.zoom.us/j/96036473995?pwd=V1RqeTZwbFJURVU3Z3pldURLSktvQT09  

Abstract

Reinforcement Learning (RL) problems for continuous state and action space systems are among the most challenging in RL. Recently, deep reinforcement learning methods have been shown to be quite effective for certain RL problems with very large or continuous state and action spaces. But such methods require extensive hyper-parameter tuning and huge amounts of data, and they come with no performance guarantees. We note that such methods are mostly trained ‘offline’ on experience replay buffers. In this talk, I will describe a series of simple reinforcement learning schemes for various settings. Our premise is that we have access to a generative model that can give us simulated samples of the next state. We will start with finite state and action space MDPs. An ‘empirical value learning’ (EVL) algorithm can be derived quite simply by replacing the expectation in the Bellman operator with an empirical estimate. We note that the EVL algorithm has remarkably good numerical performance for practical purposes. We next extend this to continuous state spaces by considering randomized function approximation in a reproducing kernel Hilbert space (RKHS). This allows for arbitrarily good approximation with high probability for any problem due to its universal function approximation property. Last, I will introduce the RANDPOL (randomized function approximation for policy iteration) algorithm, an actor-critic algorithm that uses randomized neural networks and can successfully solve a challenging robotics problem. We also provide theoretical performance guarantees for the algorithm. I will also touch upon the probabilistic contraction analysis framework for iterative stochastic algorithms that underpins the theoretical analysis.
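The EVL idea mentioned in the abstract, replacing the expectation in the Bellman operator with an empirical average over next states drawn from a generative model, can be summarized in a short sketch. The Python code below is a minimal illustration for a finite MDP, assuming only a user-supplied generative_model(s, a) function that returns a sampled next state and reward; the function names and the toy MDP are illustrative assumptions, not code from the talk.

import numpy as np

def evl(num_states, num_actions, generative_model, gamma=0.9,
        num_samples=20, num_iters=100):
    # Value iteration with the expectation in the Bellman operator
    # replaced by an empirical average over sampled next states.
    v = np.zeros(num_states)
    q = np.zeros((num_states, num_actions))
    for _ in range(num_iters):
        for s in range(num_states):
            for a in range(num_actions):
                # Draw num_samples (next state, reward) pairs from the
                # generative model and average the one-step backups.
                total = 0.0
                for _ in range(num_samples):
                    s_next, r = generative_model(s, a)
                    total += r + gamma * v[s_next]
                q[s, a] = total / num_samples
        v = q.max(axis=1)  # greedy backup over actions
    return v, q.argmax(axis=1)  # value estimates and greedy policy

# Toy two-state, two-action MDP used only to exercise the sketch.
rng = np.random.default_rng(0)

def toy_model(s, a):
    # Action 0 tends to keep the current state, action 1 tends to switch;
    # a reward of 1.0 is earned whenever the next state is state 1.
    stay_prob = 0.8 if a == 0 else 0.2
    s_next = s if rng.random() < stay_prob else 1 - s
    return s_next, float(s_next == 1)

values, policy = evl(num_states=2, num_actions=2, generative_model=toy_model)
print("value estimates:", values, "greedy policy:", policy)

The continuous-state extension discussed in the talk replaces the tabular value array with randomized function approximation in an RKHS; the empirical-backup structure above stays the same.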

Biosketch

Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical & Computer Engineering, Computer Science* and ISE* (*by courtesy) at the University of Southern California (USC). He received a B.Tech. from IIT Kanpur, and an MA in Statistics and a PhD in EECS from the University of California, Berkeley. Prior to joining USC, he was at the IBM T. J. Watson Research Center, Yorktown Heights, NY. He has received numerous awards, including the NSF CAREER Award, the ONR Young Investigator Award, an IBM Faculty Award, and the James H. Zumberge Faculty Research and Innovation Award, and he is a US Fulbright Scholar. His interests span reinforcement learning, statistical learning, stochastic control, stochastic networks, and game theory, with applications to power systems and healthcare. This talk is based on work with a number of collaborators, including Vivek Borkar (IIT Bombay), Peter Glynn (Stanford), Abhishek Gupta (Ohio State), William Haskell (Purdue), Dileep Kalathil (Texas A&M), and Hiteshi Sharma (USC).
