Reinforcement learning (RL) has advanced significantly in the past few years thanks to the incorporation of deep neural networks, and has been successfully applied to many areas of artificial intelligence, such as robotics and natural language processing. However, existing deep RL algorithms often require an excessive number of samples (i.e., interactions with the environment).
This project aims to reduce the sample complexity of RL algorithms and to make them more applicable to scenarios where environmental interactions are costly. We have the following three concrete goals:
- Goal 1: develop novel sample-efficient deep RL algorithms;
- Goal 2: learn dynamics models that predict the future, and exploit the learned models to achieve better long-term planning;
- Goal 3: apply the proposed methods to natural language processing tasks such as task-oriented dialogue.
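To make Goal 2 concrete, the core idea of model-based planning can be sketched as follows. This is a minimal illustration, not the project's proposed method: the dynamics model and reward below are hand-written stand-ins on a 1-D point-mass task (in practice the model would be learned from environment interactions), and the planner is simple random-shooting model-predictive control.

```python
import numpy as np

def dynamics_model(state, action):
    # Stand-in for a learned dynamics model: 1-D point mass, next_state = state + action.
    return state + action

def reward(state, action):
    # Stand-in reward: stay near the origin while using small actions.
    return -(state ** 2) - 0.1 * (action ** 2)

def plan_random_shooting(state, horizon=5, n_candidates=256, seed=None):
    """Sample random action sequences, roll each out through the model,
    and return the first action of the highest-return sequence (MPC-style)."""
    rng = np.random.default_rng(seed)
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    states = np.full(n_candidates, float(state))
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        returns += reward(states, actions[:, t])
        states = dynamics_model(states, actions[:, t])
    best = int(np.argmax(returns))
    return actions[best, 0]
```

Starting from `state=2.0`, the planner should select a first action that pushes the state toward the origin; re-planning at every step in this fashion yields closed-loop control driven entirely by the learned model, which is the sample-efficiency lever Goal 2 targets.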