Learning to think before acting
Following the success of DeepMind's agents playing Atari games and beating the world champion at Go, this project aims to merge the best of learning and planning. AlphaGo has two main components: a learning component that uses neural networks to learn a reward function and a state representation, and a planning component that uses Monte Carlo Tree Search to find the best course of action, with the learned state and reward functions guiding the search. The best planning algorithms for Atari are based on a different search algorithm called Iterative Width (IW), which on its own already outperforms the learning agents. Despite its strong performance, IW's main bottleneck is the computational cost of the transition function, as it must call the simulator engine to generate each possible successor. In this project we will explore how to learn approximate transition functions to guide the search, using different Deep Reinforcement Learning techniques and either the RAM, the screen, or high-level state features.
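To illustrate the planning side, here is a minimal sketch of IW(1): a breadth-first search that prunes any state which does not make at least one atom (a feature=value pair) true for the first time during the search. This is an illustrative toy implementation under simplifying assumptions, not the project's codebase; the function names and the one-dimensional corridor domain are invented for the example, and in the Atari setting the `successors` function is exactly the expensive simulator call the project aims to approximate.

```python
from collections import deque

def iw1(initial_state, successors, is_goal):
    """Breadth-first search with IW(1) novelty pruning.

    A state is enqueued only if it makes some atom (index, value)
    true for the first time in the whole search; all other states
    are pruned, which bounds the number of expansions by the number
    of distinct atoms.
    """
    seen_atoms = set()

    def is_novel(state):
        new_atoms = set(enumerate(state)) - seen_atoms
        if new_atoms:
            seen_atoms.update(new_atoms)
            return True
        return False

    is_novel(initial_state)  # register the initial state's atoms
    queue = deque([(initial_state, [])])
    while queue:
        state, plan = queue.popleft()
        if is_goal(state):
            return plan
        for action, next_state in successors(state):
            if is_novel(next_state):
                queue.append((next_state, plan + [action]))
    return None

# Toy domain: a corridor of 5 cells; move Left/Right to reach cell 4.
def successors(state):
    pos = state[0]
    return [(a, (max(0, min(4, pos + d)),)) for a, d in [("L", -1), ("R", 1)]]

plan = iw1((0,), successors, lambda s: s[0] == 4)
print(plan)  # a shortest plan of four "R" moves
```

In a real Atari setting the state tuple would be the RAM, screen, or high-level features mentioned above, and the novelty test is what lets IW explore widely with very few simulator calls per distinct atom.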
- Code showing the current performance of IW playing Atari games:
Leader: Nir Lipovetzky
Computing and Information Systems
Networks and data in society, Optimisation of resources and infrastructure
artificial intelligence; autonomous systems