Learning to think before acting

Project description


Following DeepMind's success in playing Atari games and beating the world champion in Go, this project aims to combine the best of learning and planning. AlphaGo has two main components: a learning component that uses neural networks to learn a reward function and a state representation, and a planning component that uses Monte Carlo Tree Search to find the best course of action, guided by the learned state and reward functions. The best planning algorithms for Atari are based on a different search algorithm called Iterative Width (IW), which on its own already outperforms learning agents. Despite its strong performance, IW's main bottleneck is the computational cost of the transition function, as it must call the simulator engine to generate each possible successor. In this project we will explore how to learn approximate transition functions to guide the search, using different deep reinforcement learning techniques applied to either the RAM, the screen, or high-level state features.
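To illustrate why the transition function dominates IW's cost, the following is a minimal sketch of IW(1): a breadth-first search that prunes any state which makes no new (feature, value) atom true. The state representation, action set, and `step` function here are illustrative assumptions, not part of the original project; `step` stands in for the expensive simulator call that this project aims to replace with a learned approximation.

```python
from collections import deque

def iw1_search(initial_state, actions, step, is_goal, max_nodes=10000):
    """Breadth-first search with IW(1) novelty pruning (sketch).

    `step(state, action)` plays the role of the simulator's transition
    function, the expensive call discussed above. A state is novel iff it
    makes at least one (feature_index, value) atom true for the first
    time in the search; non-novel states are pruned.
    """
    seen_atoms = set(enumerate(initial_state))  # atoms seen so far
    frontier = deque([(initial_state, [])])     # (state, plan) pairs
    expanded = 0
    while frontier and expanded < max_nodes:
        state, plan = frontier.popleft()
        if is_goal(state):
            return plan
        expanded += 1
        for a in actions:
            succ = step(state, a)        # simulator call: the bottleneck
            atoms = set(enumerate(succ))
            if atoms - seen_atoms:       # novel: keep; otherwise prune
                seen_atoms |= atoms
                frontier.append((succ, plan + [a]))
    return None                          # no plan found within budget
```

For example, on a toy one-dimensional domain where the state is a single counter clipped to [0, 3], IW(1) finds the shortest plan while pruning every revisited counter value after its first occurrence.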


- Code showing the current performance of IW playing Atari games:




Project team

Leader: Nir Lipovetzky



artificial intelligence; autonomous systems