DeepBrew is Live!

Connecting off-chain deep learning to on-chain Ethereum

Ashton Hettiarachi
5 min read · Jan 23, 2023

Background of the project

DeepBrew is a blockchain-native adaptation of the classic ‘The Beer Game’, a macroeconomic supply chain scenario. It serves as a proof of concept for Deeplink’s core vision: bringing off-chain machine learning into on-chain environments such as smart contracts.

The research project aims to advance the synthesis of machine learning and blockchain technologies. In particular, DeepBrew augments Ethereum smart contracts with deep reinforcement learning agency to enable intelligence and dynamism beyond the scope of traditional contracts, while still maintaining the decentralization and security of the EVM. Later in this article, we describe the methodology and workflow we used to build off-to-on-chain reinforcement learning systems.

In this article and thesis, we first discussed bringing machine learning systems to smart contracts and the blockchain. We described how the game runs entirely off-chain as a set of Python scripts that interface with a private Ethereum testnet via Web3.py (the ERC20 BEER token, however, is deployed to the testnet as a Solidity smart contract).

The latest version connects the model to the chain through an API, which broadcasts the model’s outputs, and an oracle, which takes these outputs for use in Solidity contracts.
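As a rough illustration of that hand-off, the sketch below shows how a model output might be serialized on the API side and converted into integer token units on the oracle side, since Solidity has no floating-point type. The function names, the JSON field names, and the 18-decimal convention are assumptions made for this example and are not taken from the DeepBrew code.

```python
import json
from decimal import Decimal
from typing import Tuple

# Assumption: the token follows the common 18-decimal ERC20 convention.
TOKEN_DECIMALS = 18


def build_payload(round_id: int, order_quantity: float) -> str:
    """API side (hypothetical): serialize one model output as a JSON body."""
    # The quantity is sent as a string so no precision is lost in transit.
    return json.dumps({"round": round_id, "order_quantity": str(order_quantity)})


def to_token_units(payload: str) -> Tuple[int, int]:
    """Oracle side (hypothetical): parse the payload into (round, integer units)."""
    data = json.loads(payload)
    # Negative quantities are clamped to zero before conversion.
    quantity = max(Decimal(data["order_quantity"]), Decimal(0))
    units = int(quantity * (10 ** TOKEN_DECIMALS))
    return data["round"], units
```

Passing integers rather than floats keeps the value exact once it reaches a contract, at the cost of fixing the scaling factor on both sides in advance.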

The game serves to display the Beer Game deep learning training results and the interactions between on-chain and off-chain components. It has no commercial application or utility.

Live Performance

Watch the game unfold on the Goerli Ethereum testnet in real time, as agents swap BEER for CASH tokens in accordance with instructions from the deep reinforcement learning environment and the model keeper script.

Link to DeepBrew Demo page

You will also be able to observe the game’s transactions pulled directly from the Goerli ledger.

Architecture

DeepBrew has been devised to provide a dynamic optimisation problem involving transactions and the management of complex systems. A soft actor-critic deep reinforcement learning algorithm is then trained against rule-based agents. Finally, variables from this model and environment are passed on to prompt the execution of an Ethereum smart contract via an oracle, demonstrating an intelligent ‘on-chain agent’.
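To make the optimisation problem concrete, here is a minimal sketch of a single Beer Game position together with a simple rule-based ordering policy of the kind a learning agent could be trained against. The costs, lead time, starting inventory, and the order-up-to rule are all illustrative assumptions, not the parameters or agents used in DeepBrew.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BeerGameStage:
    """One supply chain position with a fixed delivery delay (hypothetical values)."""
    inventory: int = 12
    backlog: int = 0
    lead_time: int = 2
    holding_cost: float = 0.5
    backlog_cost: float = 1.0
    pipeline: deque = field(default_factory=deque)

    def __post_init__(self):
        # Orders placed but not yet delivered, oldest first.
        self.pipeline = deque([0] * self.lead_time)

    def step(self, order: int, demand: int) -> float:
        """Advance one round and return the reward (the negative of the cost)."""
        self.inventory += self.pipeline.popleft()  # receive the oldest shipment
        self.pipeline.append(max(order, 0))        # place the new order
        owed = demand + self.backlog
        shipped = min(owed, self.inventory)
        self.inventory -= shipped
        self.backlog = owed - shipped
        cost = self.holding_cost * self.inventory + self.backlog_cost * self.backlog
        return -cost


def base_stock_order(stage: BeerGameStage, target: int = 20) -> int:
    """Rule-based policy: order enough to bring the inventory position up to a target."""
    position = stage.inventory + sum(stage.pipeline) - stage.backlog
    return max(target - position, 0)
```

A reinforcement learning agent would replace `base_stock_order` with its own choice of `order` each round and learn from the reward returned by `step`.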

The following architecture outlines the framework for connecting off-chain machine learning to on-chain smart contract execution.

Deep Reinforcement Learning Model (Soft Actor-critic)

The selected model for optimizing The Beer Game is a relatively lightweight soft actor-critic (SAC) deep reinforcement learning model. It builds on the traditional actor-critic framework of policy adjustment via Q-functions by making two estimates of each Q-value in an effort to avoid overestimating returns. The model is separated into an actor and a critic: the actor selects actions in the action space, and the critic evaluates the effectiveness of those actions. Overestimation is limited via clipped double Q-learning, which takes the minimum of two Q-value estimates using the following function:
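In its commonly published form, which is assumed here rather than taken from the DeepBrew code, the clipped double-Q target for soft actor-critic is y = r + γ (1 − d) [ min(Q1(s′, a′), Q2(s′, a′)) − α log π(a′ | s′) ], where r is the reward, d marks a terminal step, γ is the discount factor, and α is the entropy temperature. A minimal scalar sketch of that calculation, with illustrative default values for γ and α:

```python
def soft_q_target(reward, done, q1_next, q2_next, log_prob_next,
                  gamma=0.99, alpha=0.2):
    """Clipped double-Q target for one transition (scalar sketch).

    gamma and alpha defaults are illustrative assumptions, not DeepBrew's settings.
    """
    # Taking the smaller of the two critic estimates curbs overestimation.
    min_q = min(q1_next, q2_next)
    # The entropy bonus (-alpha * log-probability) rewards exploratory policies.
    soft_value = min_q - alpha * log_prob_next
    # No future value is bootstrapped from terminal transitions.
    return reward + gamma * (1.0 - float(done)) * soft_value
```

In practice the same expression is evaluated over batches of tensors, with the two Q-values coming from target copies of the critic networks.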

On-chain Machine Learning Workflow

This project was undertaken largely to develop a methodology for implementing more practical off-to-on-chain reinforcement learning systems. The workflow can be broken down into the following steps:

1. Recreate your Web3 problem as accurately as possible in a local testnet such as Ganache

  • Local testnets are recommended for this stage as they run dramatically faster than public testnets, and have admin functionality
  • This may be a less daunting task than it sounds, as the code on which Web3 ecosystems run is publicly available

2. Convert this problem into a reinforcement learning environment

3. Train a reinforcement learning model to optimize your problem

4. Deploy your system onto a public testnet such as Goerli for bug testing purposes

  • Sending transactions on a public blockchain is more involved than on a local one
  • This will also give you a sense for the real-world execution speeds you can expect

5. Once this model performs satisfactorily, leverage transfer learning techniques to deploy your model onto the mainnet

  • Transfer learning allows the model to begin training from where the prototypes left off, rather than deploying a randomly acting agent onto the mainnet with real funds
  • It is strongly advised that safety measures such as spending limits are put in place to keep the model from doing anything extreme
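The spending-limit safeguard mentioned in step 5 could be sketched as a small guard that every model-requested amount must pass through before a transaction is built. The class name, the per-transaction cap, and the overall budget are hypothetical and would need to be chosen for the specific deployment.

```python
from dataclasses import dataclass


@dataclass
class SpendingGuard:
    """Caps what an agent may spend per transaction and in total (hypothetical)."""
    per_tx_limit: int   # maximum amount allowed in a single transaction
    total_budget: int   # maximum cumulative amount across all transactions
    spent: int = 0

    def approve(self, requested: int) -> int:
        """Return the amount actually allowed for this transaction."""
        if requested <= 0:
            return 0
        remaining = self.total_budget - self.spent
        allowed = max(min(requested, self.per_tx_limit, remaining), 0)
        self.spent += allowed
        return allowed
```

For example, with `SpendingGuard(per_tx_limit=100, total_budget=250)` a request for 500 is cut down to 100, and once the cumulative budget is used up every further request is reduced to zero.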

Applications and future work

This technique is broad in scope and can be applied to any Web3-based use case that would benefit from dynamism and intelligence. Some examples include, but are not limited to:

  • DeFi Capital Efficiency
  • On-chain Algorithms
  • On-chain Credit Scores
  • Artificially Intelligent Smart Order Routing
  • DEX Aggregation

What’s next?

Stay tuned for Eta X V1.

Eta X is an open-source initiative to build an agnostic decentralized exchange (DEX) aggregator and price discovery engine/smart order router (SOR). It has been designed from first principles to be universal, scalable, and unbiased via an adaptable method of reverse engineering the conservation functions employed in the automated market maker (AMM) algorithms of DEX liquidity pools.

Originally published at https://medium.com on January 23, 2023.
