Reinforcement Learning

Pathway Intelligence believes that Reinforcement Learning, the subfield of Artificial Intelligence concerned with intelligent agents learning optimal action, is a watershed technology that will ultimately transform the economy, politics, health care, transportation, education, and most other fields of human endeavour.

We are witnessing this in real time: RL is currently graduating from solving toy problems to solving real problems.

With expertise in top existing RL frameworks, a myriad of component technologies that can complement RL, and direct experience with contemporary RL theory and applications, Pathway Intelligence is well positioned at the forefront of this revolution.

Why focus on Reinforcement Learning?

In its 2018 analysis of Artificial Intelligence trends, McKinsey identified Reinforcement Learning as arguably the most complex and advanced branch of Artificial Intelligence, and one highly likely to be used in future AI applications.

To see why, consider that RL is specifically designed for action. Other branches of Machine Learning produce passive outputs: classifications and predictions. Only by combining them with RL can those outputs drive decisions that affect the real world over time.

Reinforcement Learning can combine the best of other kinds of machine learning and AI (unsupervised and supervised learning, including deep learning; Bayesian methods; and classical algorithms and data structures) to solve difficult problems requiring intelligent behaviour:

  • learned control

  • learned intelligent action, co-ordination, planning

  • learned tactics

  • learned strategy
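All of these capabilities rest on the same agent-environment loop: the agent observes a state, takes an action, and receives a reward, repeatedly, until the episode ends. A minimal sketch of that loop (the toy environment, its Gym-style method names, and the random stand-in policy are all illustrative, not from any specific framework):

```python
import random

# A toy environment with a Gym-style interface: the agent must walk
# right along a short corridor to reach the goal state.
class WalkEnv:
    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: 0 = left, 1 = right
        self.pos = max(0, min(self.length, self.pos + (1 if action == 1 else -1)))
        done = self.pos == self.length
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

random.seed(0)  # for reproducibility
env = WalkEnv()
state, done, total_reward = env.reset(), False, 0.0
while not done:                      # the agent-environment loop
    action = random.choice([0, 1])   # a random policy stands in for a learned one
    state, reward, done = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```

In a real RL system, the learned policy replaces `random.choice`, and the `(state, action, reward, next state)` transitions feed the learning algorithm.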



Aug 2019

Pathway is glad to sponsor TalkRL Podcast!

TalkRL Podcast is all Reinforcement Learning, all the time. In-depth interviews with brilliant people at the forefront of RL research and practice.


Robin Chauhan, Pathway Intelligence
Nov 2018

Submission to the 2018 NeurIPS Pommerman Deep Reinforcement Learning Competition. I used TensorFlow, Keras, and a modified version of OpenAI Baselines to solve this challenging problem. The network design was largely based on a design from Ross Wightman.

For this project I explored many RL frameworks (including Google’s Dopamine), gained hands-on experience with practical deep learning performance, throughput, and real-time AI, and produced original solutions combining supervised learning, model-based RL, and model-free RL.

I attended NeurIPS 2018 in Montreal, where I met and learned from top researchers and practitioners in reinforcement learning from around the world.


OpenAI Five DOTA 2 agent teardown

Robin Chauhan, Pathway Intelligence
Sept 2018
Simon Fraser University VentureLabs in Harbourfront Center, Vancouver BC Canada

Part of a set of lightning talks. My talk starts at about 58:00 below; the others are also worth checking out.

I present a combination of OpenAI’s published content plus original analysis and explanations. This careful study of OpenAI Five has influenced how I now look at RL problems.


Intro to Reinforcement Learning + Deep Q-Networks

Robin Chauhan, Pathway Intelligence
Jun 14, 2018
HiVE, Gastown, Vancouver BC Canada

These two talks comprised a combined 3 hours of content, designed for a technical audience not yet familiar with Reinforcement Learning.

After a broad overview of RL, I focused on the fundamentals of Q-learning and then Deep Q-Networks, the seminal research from DeepMind that ignited the field of Deep Reinforcement Learning. I then explained Rainbow DQN, the state of the art in value-function-based agents at the time.
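At the heart of Q-learning, and of DQN, is a simple bootstrapped update rule: nudge Q(s, a) toward the observed reward plus the discounted greedy value of the next state. A minimal tabular sketch (the tiny chain environment and hyperparameters are illustrative, not from the talk):

```python
import random

# Tabular Q-learning on a tiny deterministic chain MDP: states 0..4,
# actions 0 (left) / 1 (right), reward 1.0 on reaching state 4.
random.seed(0)  # for reproducibility
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1
Q = {(s, a): 0.0 for s in range(5) for a in (0, 1)}

def step(s, a):
    s2 = max(0, min(4, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

def greedy(s):
    best = max(Q[(s, 0)], Q[(s, 1)])
    return random.choice([a for a in (0, 1) if Q[(s, a)] == best])

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice((0, 1)) if random.random() < EPSILON else greedy(s)
        s2, r, done = step(s, a)
        # the Q-learning update: bootstrap from the greedy value of the next state
        target = r if done else r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# after training, the greedy policy heads right in every non-terminal state
print([greedy(s) for s in range(4)])
```

DQN replaces the table with a neural network and adds experience replay and a target network; Rainbow layers several further improvements on top of that same update.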

I combined well-cited authoritative sources, the best content from research papers and the internet, my own original explanations and simplifying diagrams, and insights from my own work with RL systems.


AlphaXos: Deep Reinforcement Learning with Self-Play

Github Repository

Robin Chauhan, Pathway Intelligence
April 11, 2018

An exploration of self-play principles, custom RL environment creation, and deep learning network design for RL. The initial version uses DQN to explore the behaviour of Q-learning in board games and multi-agent / self-play systems.

The intent was to gain basic insight into the problem and solution space of Google DeepMind’s AlphaZero. This project illuminated the differences between Go and simpler board games with smaller branching factors, and between AlphaZero and RL algorithms without a planning phase, as well as ways of handling self-play, input representations, reward function design, and how ideas from game theory can apply to Multi-Agent RL.

After this project, I was much better able to appreciate the rationale for many of AlphaZero’s design decisions.
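The core self-play idea explored here, one agent learning both sides of a zero-sum game, can be sketched with tabular Q-learning on a trivial game. This sketch uses one-pile Nim rather than the AlphaXos board game, and the negamax-style bootstrap, names, and hyperparameters are illustrative, not taken from the repository:

```python
import random

# Self-play tabular Q-learning on one-pile Nim: players alternate taking
# 1 or 2 stones; whoever takes the last stone wins. A single shared
# Q-table plays both sides; since the next state belongs to the opponent,
# the bootstrap target negates the opponent's best value (negamax-style).
random.seed(0)  # for reproducibility
N, ALPHA, GAMMA, EPSILON = 10, 0.2, 1.0, 0.2
Q = {(s, a): 0.0 for s in range(1, N + 1) for a in (1, 2) if a <= s}

def actions(s):
    return [a for a in (1, 2) if a <= s]

for game in range(20000):
    s = N
    while s > 0:
        acts = actions(s)
        a = random.choice(acts) if random.random() < EPSILON else max(acts, key=lambda a: Q[(s, a)])
        s2 = s - a
        if s2 == 0:
            target = 1.0  # taking the last stone wins
        else:
            # the next player to move is the opponent, so our value is minus theirs
            target = -GAMMA * max(Q[(s2, b)] for b in actions(s2))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# With these rules, positions where s % 3 == 0 are losing for the player to move.
for s in range(1, N + 1):
    best = max(actions(s), key=lambda a: Q[(s, a)])
    print(s, best, round(max(Q[(s, a)] for a in actions(s)), 2))
```

Even this toy version surfaces the design questions the project explored: how to represent the state, how to assign the terminal reward, and how a single agent can bootstrap against its own play.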
