Paper Title: Human-level control through deep reinforcement learning

Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg and Demis Hassabis

Problem:

The theory of reinforcement learning deals with how artificial agents choose actions in an environment in order to maximize their cumulative reward. To use this successfully in situations approaching real-world complexity, however, agents face the difficult task of deriving efficient representations of the environment from high-dimensional sensory inputs.
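
Formally, the paper frames this as learning the optimal action-value function Q*(s, a), the maximum expected return achievable from state s after taking action a. It obeys the Bellman equation below, where r is the immediate reward and γ the discount factor applied to future rewards:

    Q^*(s, a) = \mathbb{E}_{s'}\left[\, r + \gamma \max_{a'} Q^*(s', a') \;\middle|\; s, a \,\right]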

The goal of this project is to place an artificial agent in a gamified environment with a fixed set of rules for receiving rewards, and to demonstrate how the agent can learn, in real time, to take the actions that maximize those rewards.

Approach:

This project uses recent advances in training deep neural networks to develop an artificial agent capable of making intelligent decisions by learning successful policies through end-to-end reinforcement learning.

The deep Q-network (DQN) agent is placed in the environments of classic Atari 2600 games. At any particular state, the agent receives environment information only in the form of the raw pixels of the game screen and the game score at that state. The agent is challenged to maximize this score through a combination of techniques: a deep convolutional network architecture, experience replay and deep Q-learning.
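
To illustrate the experience-replay idea, below is a minimal sketch of a replay memory and of how training targets would be computed from a sampled batch. The class and function names, buffer capacity and batch size are illustrative assumptions, not the actual code of this repository.

    import random
    from collections import deque

    import numpy as np


    class ReplayMemory:
        """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

        def __init__(self, capacity=100000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

        def store(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            # Uniform random sampling breaks the correlation between consecutive frames.
            batch = random.sample(self.buffer, batch_size)
            states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
            return states, actions, rewards, next_states, dones


    def q_targets(rewards, dones, next_q_values, gamma=0.99):
        # Bootstrapped target r + gamma * max_a' Q(s', a'); no bootstrap on terminal states.
        return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)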

Through this project I aim to demonstrate that, using only the information mentioned above, the deep Q-network is able to reach, and in some cases surpass, the level of a professional human games tester.

Data:

The networks are trained on the environments available in the OpenAI Gym library. In this project I have trained the model on the following two environments:
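
For reference, a minimal sketch of how an Atari environment is created and stepped through the classic Gym API (the environment ID below is only a placeholder, not necessarily one of the two environments used in this project):

    import gym

    env = gym.make("BreakoutNoFrameskip-v4")   # placeholder ID; substitute the environment used here

    observation = env.reset()                  # raw RGB frame, 210 x 160 x 3 for Atari games
    done = False
    while not done:
        action = env.action_space.sample()     # random action, stands in for the epsilon-greedy agent
        observation, reward, done, info = env.step(action)
    env.close()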

Running the project:

Install the following Python packages before running the project:

Run the main.py file with the following arguments:

Implementation details:

Network Architecture:

Input (84 x 84 x 4)
Conv2D (32, (8, 8), Activation=relu)
Conv2D (64, (4, 4), Activation=relu)
Conv2D (64, (3, 3), Activation=relu)
Flatten()
Dense (512, Activation=relu)
Dense (number of valid actions, Activation=linear)
Output (action selection: epsilon-greedy policy; optimizer: RMSprop; metric: mae)
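
A minimal Keras/TensorFlow sketch of this architecture follows. The strides (4, 2, 1) are taken from the original DQN paper; the framework choice, the one-Q-value-per-action output size, the learning rate and the mean-squared-error loss are assumptions for illustration, not details taken from this repository.

    from tensorflow.keras import layers, models, optimizers


    def build_q_network(n_actions, input_shape=(84, 84, 4), learning_rate=0.00025):
        """Convolutional Q-network mirroring the listing above."""
        model = models.Sequential([
            layers.Conv2D(32, (8, 8), strides=4, activation="relu", input_shape=input_shape),
            layers.Conv2D(64, (4, 4), strides=2, activation="relu"),
            layers.Conv2D(64, (3, 3), strides=1, activation="relu"),
            layers.Flatten(),
            layers.Dense(512, activation="relu"),
            layers.Dense(n_actions, activation="linear"),  # one Q-value per valid action
        ])
        model.compile(optimizer=optimizers.RMSprop(learning_rate=learning_rate),
                      loss="mse", metrics=["mae"])
        return model

Action selection happens outside the network: with probability epsilon a random action is taken, otherwise the action with the largest predicted Q-value.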

Results:

Inferences:

References: