Wednesday Nov 29, 12pm-1pm, 1116-E Klaus
Solving Markov Decision Processes via Dual Embedding
Advisor: Prof. Le Song
Reinforcement learning aims to learn a policy that maximizes the long-term return by sequentially interacting with an unknown environment. The dominant framework for modeling such interaction is the Markov decision process (MDP). How to reliably evaluate and optimize a policy in MDPs with large state and action spaces, using nonlinear function approximators and off-policy data, has remained an open question for decades. In this talk, we take a substantial step toward solving this open problem. By leveraging Fenchel duality, we equivalently reformulate the Bellman equation as a saddle-point optimization problem and design stochastic algorithms to solve it. We analyze the convergence of the proposed algorithms for both policy evaluation and policy learning, and we provide a PAC learning bound on the number of samples needed from a single off-policy sample path for policy learning. Finally, we show that the algorithms achieve state-of-the-art empirical performance on several benchmark tasks.
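To give a flavor of the duality step mentioned in the abstract, here is a minimal sketch for the policy-evaluation case; the notation (V for the value function, nu for the dual function, gamma for the discount factor) is illustrative and not necessarily the speaker's exact formulation. Minimizing the mean-squared Bellman error

```latex
\min_{V} \; \mathbb{E}_{s}\!\left[\tfrac{1}{2}\Big(\mathbb{E}_{s'\mid s}\big[r(s,\pi(s)) + \gamma V(s')\big] - V(s)\Big)^{2}\right]
```

places a conditional expectation inside a square, so a naive stochastic gradient is biased (the well-known double-sampling issue). Applying the Fenchel dual of the square function, $\tfrac{1}{2}x^{2} = \max_{\nu}\,(\nu x - \tfrac{1}{2}\nu^{2})$, yields the equivalent saddle-point problem

```latex
\min_{V} \max_{\nu} \; \mathbb{E}_{s,s'}\!\left[\nu(s)\Big(r(s,\pi(s)) + \gamma V(s') - V(s)\Big) - \tfrac{1}{2}\,\nu(s)^{2}\right],
```

which is linear in the expectation and therefore admits unbiased stochastic gradient estimates from single transitions.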
Bo Dai is a Ph.D. candidate in Computational Science and Engineering at Georgia Tech, advised by Prof. Le Song. His research interests lie in developing effective statistical models and efficient algorithms for learning from a massive volume of complex and structured data, including large-scale nonparametric methods, reinforcement learning, and structured data modeling.