Benchmarks

Reinforcement learning is a developing field, and so far it has been particularly difficult to reproduce some of the results published in the original papers. Some reasons for this are:

  • Reinforcement learning algorithms are notorious for having an unstable learning process. The data the neural network trains on is dynamic, and depends on the random seed defined for the environment.

  • Reinforcement learning algorithms have many moving parts. For some environments and agents, many "tricks" are needed to reproduce the exact behavior the paper authors reported. In addition, there are many hyper-parameters to set.

In order for a reinforcement learning implementation to be useful for research or for data science, it must be shown that it achieves the expected behavior. For this reason, we collected a set of benchmark results from most of the algorithms implemented in Coach. The algorithms were tested on a subset of the same environments that were used in the original papers, and with multiple seeds for each environment. Additionally, Coach uses some strict testing mechanisms to try and make sure the results we show for these benchmarks stay intact as Coach continues to develop.
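
In practice, reproducing a single benchmark entry amounts to running the same preset several times with different seeds and comparing the resulting training curves. The sketch below illustrates such a loop; the preset name and the `--seed` option are assumptions for illustration, so check `coach --help` for the exact CLI options available in your version of Coach.

```python
# A minimal sketch (not the exact benchmark harness): repeat one Coach
# preset over several seeds so that run-to-run variance becomes visible.
# The preset name and the --seed flag are assumptions; consult
# `coach --help` for the exact options in your Coach version.
import subprocess

for seed in range(5):
    subprocess.run(
        ["coach", "-p", "CartPole_DQN", "--seed", str(seed)],
        check=True,
    )
```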

To see the benchmark results, please visit the following GitHub page.