Revisiting Rainbow: Promoting More Insightful and Inclusive Deep Reinforcement Learning Research

Deeplink Labs · Jul 15, 2021

Workshop Participants: William, Stephanie, and Gavin

What’s the paper about?

  • Project by the Brain Team at Google Research.
  • The paper examines deep reinforcement learning methods and how they are evaluated on control environments such as Atari 2600 games.
  • These environments serve as standardized benchmarks for assessing an algorithm's performance.
  • However, while new methods raise the bar, they often carry high computational costs, widening the gap between researchers who can run these experiments and those who cannot.
  • The authors argue that small-scale environments can yield insights that compete with those from large-scale environments.

Why?

Benchmark tasks in the field are generally large-scale and require a huge amount of computational power, and anyone who wants their work accepted at reputable conferences typically needs to run experiments on at least one of these benchmarks.

Most individuals, especially newcomers who want a taste of deep learning, have few computational resources of their own.

Lower computational cost per model allows experiments to cover a wider range of hyperparameters. Since neural networks usually act as black boxes with little explainability, broader sweeps help in investigating a model's behavior.

Lower computational costs also allow more independent trials, which produce tighter and more reliable confidence intervals. Ensuring statistical significance is particularly important when comparing algorithms, as the sketch below illustrates.
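
To make the statistics point concrete, here is a minimal Python sketch (ours, not the paper's) that turns per-seed returns from independent trials into a mean and 95% confidence interval; the scores below are made-up placeholders:

    import numpy as np
    from scipy import stats

    # Hypothetical per-seed returns for two agents on the same game
    # (illustrative numbers only).
    agent_a = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7, 12.2, 12.5])
    agent_b = np.array([11.8, 12.9, 11.2, 12.0, 11.5, 12.4, 11.9, 12.1, 11.6, 11.3])

    def mean_ci(x, confidence=0.95):
        """Mean and t-based confidence interval over independent trials."""
        m = x.mean()
        half = stats.t.ppf((1 + confidence) / 2, df=len(x) - 1) * stats.sem(x)
        return m, (m - half, m + half)

    for name, scores in [("A", agent_a), ("B", agent_b)]:
        m, (lo, hi) = mean_ci(scores)
        print(f"agent {name}: mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")

More seeds shrink the interval, which is exactly what cheaper experiments buy you.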

How?

  • The Rainbow algorithm takes roughly five days to train on a single game from the ALE benchmark and requires specialized hardware, e.g., an NVIDIA Tesla P100 GPU.
  • DQN is a reinforcement learning algorithm that combines Q-learning with deep neural networks so it can handle high-dimensional environments such as video games.
  • The authors add components to DQN: double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL, and noisy nets.
  • The networks are multilayer perceptrons (MLPs) with two hidden layers of 512 units each.
  • They re-ran experiments on the MinAtar environment, a set of five miniaturized Atari games. It is a low-cost, small-scale environment and roughly ten times faster to train on than the original. Using simpler loss functions, they obtained similar performance scores (see the sketch after this list).
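
To ground the list above, here is a minimal PyTorch sketch (our own illustration, not the paper's code) of the kind of two-hidden-layer MLP Q-network described, together with a double Q-learning target; the observation and action sizes are placeholder assumptions:

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Small Q-network: an MLP with two hidden layers of 512 units."""
        def __init__(self, obs_dim=400, n_actions=6):  # sizes are illustrative
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, 512), nn.ReLU(),
                nn.Linear(512, 512), nn.ReLU(),
                nn.Linear(512, n_actions),
            )

        def forward(self, obs):
            return self.net(obs)

    def double_q_target(online, target, reward, next_obs, done, gamma=0.99):
        """Double Q-learning: the online net picks the next action,
        the target net evaluates it."""
        with torch.no_grad():
            best = online(next_obs).argmax(dim=1, keepdim=True)
            next_q = target(next_obs).gather(1, best).squeeze(1)
            return reward + gamma * (1.0 - done) * next_q

The other Rainbow components (prioritized replay, dueling heads, multi-step returns, distributional RL, noisy nets) slot in around this same target computation.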

Vault: Fast Bootstrapping for Cryptocurrencies

What?

This paper from MIT CSAIL describes a cryptocurrency (Vault) that minimizes storage and bootstrapping costs for participants. It uses Algorand’s proof-of-stake consensus protocol.

Why?

As blockchains such as Ethereum or Bitcoin grow, the storage cost of on-chain data grows at a rate that will soon, if it hasn't already, exclude many parties from participating in the protocol. If Ethereum wants to be the global computer it strives to be and compete with the likes of AWS and Google Cloud, it needs to store enormous amounts of data. Vault was designed to overcome this problem by reducing the data that must be stored and downloaded to participate.

How?

Vault employs three techniques to reduce storage and bootstrapping costs. First, it decouples the storage of recent transactions from the storage of account balances. Each transaction is valid only for a window of time, so nodes need to track recent transactions, rather than all transactions, to prevent double-spending. This also means account-balance state is not tied to past transactions, and zero-balance accounts can be safely evicted.
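
A toy Python sketch of the windowed-validity idea (our illustration; the fields first_valid and last_valid are assumptions, and Vault's actual data structures differ):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Tx:
        tx_id: str
        first_valid: int  # first block height at which the tx may appear
        last_valid: int   # last block height at which the tx may appear

    recent_tx_ids: set[str] = set()  # only IDs inside the current window

    def accept(tx: Tx, height: int) -> bool:
        """Accept a tx only if it is inside its validity window and not a replay."""
        if not (tx.first_valid <= height <= tx.last_valid):
            return False  # expired or not yet valid
        if tx.tx_id in recent_tx_ids:
            return False  # duplicate inside the window: double-spend attempt
        recent_tx_ids.add(tx.tx_id)
        return True  # IDs can be evicted once their window closes

Because an expired transaction can never be replayed, a node can safely forget everything older than the window.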

Second, Vault uses an “adaptive sharding scheme” that utilizes three properties:

  1. Sharding account state across nodes.
  2. A Merkle tree to store account balances allows all transactions to be validated by all nodes.
  3. Caching of the Merkle tree's upper layers, so the bandwidth cost of transferring Merkle proofs grows slowly with the number of accounts (see the sketch after this list).
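
For intuition, here is a minimal sketch of verifying a Merkle proof for an account record (the hash function and byte encoding are illustrative choices, not Vault's exact format):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def verify_merkle_proof(leaf, proof, root):
        """Walk from a leaf (e.g., a serialized account/balance record) up to
        the root; each proof step gives the sibling hash and its side."""
        node = h(leaf)
        for sibling, side in proof:
            node = h(sibling + node) if side == "left" else h(node + sibling)
        return node == root

Caching the top layers of the tree means nodes only ship the siblings below the cached frontier, which is why proof bandwidth grows slowly as the number of accounts increases.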

Finally, Vault introduces stamping certificates, which let new users validate the chain without processing every intermediate block.

Does it work?

Results from a prototype of this protocol show that its storage and bootstrapping costs were 477MB for 500 million transactions compared to 5GB and 143GB for Ethereum and Bitcoin, respectively.

Can this be used with Ethereum?

Sharding isn’t a new technique in computer science and is going to be incorporated with the recent upgrades to Ethereum. At the moment, each node connected to the EVM stores the entire transaction history. Shard chains are coming to Ethereum but not till 2022, at least. These shards will give Ethereum more capacity to store and access data, but they won’t be used for executing code. If someone wants to utilize the EVM as a decentralized computer to run ML training, sharding won’t yet benefit these users.


Deeplink Labs

Introducing offchain data, machine learning & deep learning layers into blockchains. https://linktr.ee/deeplinknetwork