Reinforcement Learning

Pathway Intelligence believes that Reinforcement Learning, the subfield of Artificial Intelligence concerned with intelligent agents learning to act optimally, is a watershed technology that will ultimately transform the economy, politics, health care, transportation, education, and most other fields of human endeavour.

We are witnessing this in real time: RL is graduating from solving toy problems to solving real ones.

With expertise in top existing RL frameworks, a myriad of component technologies that can complement RL, and direct experience with contemporary RL theory and applications, Pathway Intelligence is well positioned at the forefront of this revolution.

Why focus on Reinforcement Learning?

In its 2018 analysis of Artificial Intelligence trends, McKinsey rightly identified Reinforcement Learning as arguably the most complex and advanced branch of Artificial Intelligence, and also as highly likely to be used in future AI applications.

To see why this is, consider that RL is specifically designed for Action. By comparison, other branches of Machine Learning are all about passive outputs. Only by combining them with RL can their classifications and predictions affect the real world.

Reinforcement Learning can combine the best of other kinds of machine learning and AI (unsupervised and supervised learning, including deep learning and Bayesian methods, plus classical algorithms and data structures) to solve difficult problems requiring intelligent behaviour:

  • learned intelligent action, co-ordination, planning

  • learned tactics

  • learned strategy



Robin Chauhan, Pathway Intelligence
Nov 2018

Submission to the 2018 NeurIPS Pommerman Deep Reinforcement Learning Competition. I used TensorFlow Slim, Keras, and a modified version of OpenAI Baselines to tackle this challenging problem.

For this project I explored many RL frameworks (including Google's Dopamine), gained strong practical experience with deep learning performance, throughput, and real-time AI, and produced original solutions combining supervised learning, model-based RL, and model-free RL.

Through this competition I attended NeurIPS 2018 in Montreal, where I met, exchanged ideas with, and learned from top reinforcement learning researchers and practitioners from around the world, including legendary Reinforcement Learning pioneer Dr. Rich Sutton, DeepMind's DQN inventor Volodymyr Mnih, OpenAI researcher Matthias Plappert, DeepMind RL researcher Hado van Hasselt, and many more.


OpenAI Five DOTA 2 agent teardown

Robin Chauhan, Pathway Intelligence
Sept 2018
Simon Fraser University VentureLabs in Harbour Centre, Vancouver BC Canada

Part of a set of lightning talks. My talk starts at about 58:00 in the video below; the other talks are also worth checking out.

I present a combination of OpenAI's content plus original analysis and explanations. This careful study of OpenAI Five has strongly influenced how I now look at RL problems.


Intro to Reinforcement Learning + Deep Q-Networks

Robin Chauhan, Pathway Intelligence
Jun 14, 2018
HiVE, Gastown, Vancouver BC Canada

These two talks together comprised 3 hours of content, designed for a technical audience unfamiliar with Reinforcement Learning.

After a broad overview of RL, I focused on the fundamentals of Q-learning, and then on Deep Q-Networks (DQN), the seminal research from DeepMind that ignited the field of Deep Reinforcement Learning. I then explained Rainbow DQN, the state of the art in value-function-based agents at the time of writing.
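Q-learning centers on a single update rule: nudge the estimate Q(s, a) toward the observed reward plus the discounted value of the best action in the next state. The sketch below is a generic textbook tabular version written for illustration, not code from the talks; the function name `q_learning_update` and the default values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions. DQN keeps the same target but replaces the lookup table with a neural network.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Hypothetical helper: one tabular Q-learning step, in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    # Bootstrap from the best next action, unless the episode has ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Usage: a table of zeros, one transition with reward 1.0.
q = defaultdict(float)
actions = ["left", "right"]
q_learning_update(q, "s0", "right", 1.0, "s1", actions)
# q[("s0", "right")] is now 0.1
```

Repeating this update over many transitions, with enough exploration, propagates reward information backward through the state space; the table becomes impractical for large state spaces, which is the gap DQN addresses.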

The talks combine well-cited authoritative sources, the best content from research papers and the internet, my own original explanations and simplifying diagrams, and insights from my own work with RL systems.


AlphaXos: Deep Reinforcement Learning with Self-Play

GitHub Repository

Robin Chauhan, Pathway Intelligence
April 11, 2018

An exploration of self-play principles, custom RL environment creation, and deep learning network design for RL. The first version uses DQN to explore the behaviour of Q-learning in board games and multi-agent/self-play systems.

My intent was to gain basic insight into the problem and solution space of Google DeepMind's AlphaZero. This project illuminated the similarities and differences between Go and simpler board games with smaller branching factors, between AlphaZero and other RL algorithms that do not involve planning phases, and among different ways of handling self-play.

After this project, I was much better able to appreciate the rationale for many of AlphaZero’s design decisions.
