See a video presentation of this article here:

https://www.youtube.com/watch?v=52knyLvErIY&feature=youtu.be

Prepared by Shawn Anderson for the Ocean Protocol study group, hosted by the Token Engineering Community.

Google DeepMind's AlphaStar plays StarCraft II professionally.

Introduction

I have a theory that we should be able to combine two powerful frameworks that each revolve around agent-based modelling: one from the field of general artificial intelligence, and one from the domain of token engineering. The first is Stable Baselines, the product of decades of research, open-source development, and the gumption of Elon Musk: a high-level interface to well-established, well-tested reinforcement learning algorithms that ship with stable hyper-parameter tunings. The second, hailing from the domain of rigorous verification engineering and CAD, is TokenSpice2, the mind-boggling agent-based economic simulator and EVM interface built by Trent McConaghy for modelling the Ocean Protocol ecosystem. I believe these two frameworks can be combined to develop, test, and deploy intelligent agent-based economic networks. If my theory is correct, then watch out, world: the impact of this technology could be as profound as the Bitcoin whitepaper itself. Satoshi enabled web-based economies. We are enabling AI-based economies.
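To make the combination concrete, here is a minimal sketch of what the glue could look like. The TokenEconomySim class below is a hypothetical stand-in for a TokenSpice2-style simulator (the real TokenSpice2 API differs); the point is that once any agent-based economic simulation is wrapped in the standard Gym interface, every algorithm in Stable Baselines can train against it out of the box.

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2


class TokenEconomySim:
    """Hypothetical stand-in for a TokenSpice2-style simulator."""

    def reset(self):
        self.step_count = 0
        self.wallet = 100.0  # the learning agent's token balance
        return self._observe()

    def step(self, action):
        # Toy dynamics: the action nudges the agent's balance.
        self.wallet += float(action[0])
        self.step_count += 1
        return self._observe()

    def _observe(self):
        return np.array([self.wallet, self.step_count], dtype=np.float32)


class TokenEconomyEnv(gym.Env):
    """Wraps the simulator in the Gym interface Stable Baselines expects."""

    def __init__(self):
        self.sim = TokenEconomySim()
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(2,), dtype=np.float32
        )

    def reset(self):
        return self.sim.reset()

    def step(self, action):
        prev_wallet = self.sim.wallet
        obs = self.sim.step(action)
        reward = self.sim.wallet - prev_wallet  # profit made this step
        done = self.sim.step_count >= 200       # fixed episode horizon
        return obs, reward, done, {}


model = PPO2("MlpPolicy", TokenEconomyEnv(), verbose=1)
model.learn(total_timesteps=10000)
```

PPO2 is just one choice here; any Stable Baselines algorithm that handles continuous actions (SAC, TD3, and so on) would slot in the same way.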

The field of AI advances in tandem with the sophistication of the simulations in which it is tested. From Chess to Atari, from Go to StarCraft, an algorithm's capacity to learn tends to match the complexity of the simulation framing it. Standardized problem sets, or games, lead to reproducible results, so the scientific method can be applied to advance the field and disseminate working examples. The reason TokenSpice2 is so profound is its capacity to be a next-generation sandbox for AI research in economics. Just like StarCraft, it is an incredibly complex and meaningful environment with quantifiable outcomes: in StarCraft, "did we win the game?"; in economics, "did we make more money?"
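That quantifiable outcome is exactly what becomes the reward signal in RL. As a trivial, framework-agnostic illustration:

```python
def starcraft_reward(won_game: bool) -> float:
    # "Did we win the game?" -> +1 for a win, -1 for a loss.
    return 1.0 if won_game else -1.0


def economic_reward(start_balance: float, end_balance: float) -> float:
    # "Did we make more money?" -> signed profit over the episode.
    return end_balance - start_balance
```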

Since this talk is directed at a study group of token engineers, background on token engineering is not the focus. I'll give more background on reinforcement learning, since that is the lesser-known subject and, to be honest, my favourite subject of all.

David Silver's introductory RL lecture slides: https://www.davidsilver.uk/wp-content/uploads/2020/03/intro_RL.pdf

Deep Reinforcement Learning

Reinforcement Learning is as profound as it is powerful. There is so much to be learned about the human experience and the human mind from studying reinforcement learning. For example, a fundamental topic of RL is the exploration-exploitation trade-off. Exploration is trying new things, taking risks, making discoveries. Exploitation is using the information we already have, applying our understanding of the world to make the best decision we currently know of.
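The textbook formalization of this trade-off is the epsilon-greedy rule: with small probability the agent explores at random; otherwise it exploits its current best estimate. A minimal sketch:

```python
import random


def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action given a list of estimated action values.

    With probability epsilon: explore (a uniformly random action).
    Otherwise: exploit (the action with the highest estimated value).
    """
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    return q_values.index(max(q_values))        # exploit
```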

Another example is the case of constant negative reward, which was shown to be an essential piece of the recipe in basic RL tasks such as maze solving. If an agent does not receive a constant negative reward at every step, it has no incentive to finish the game. This speaks to the fundamental insight that came to the Buddha as the First Noble Truth, Duhkha: life is suffering.
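In code, that insight is typically expressed as a small constant penalty on every non-terminal step, so that the shortest path to the goal is also the highest-return policy; the numbers below are illustrative:

```python
def maze_reward(reached_goal: bool, step_penalty: float = -0.01) -> float:
    # Constant negative reward: every step spent wandering costs the
    # agent a little, so finishing the maze quickly maximizes return.
    return 1.0 if reached_goal else step_penalty
```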