AI Economist sums up Communism

Published in Vitalify Asia, Feb 24, 2021

AI Economist is a project that studies economic dynamics using reinforcement learning (RL).

After a quick introduction to AI Economist, I will show some of my experiments, which evaluate scenarios such as Communism.

What’s the AI Economist?

What the AI Economist is has been well described in their blog post. So I would like to focus on explaining it from the perspective of training RL agents.

The authors have published a Gym-style API. With this API, we can train RL agents in the same way as with CartPole and other RL environments. The API is configurable, so we can add our own game logic to represent our own scenario.
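To make "Gym-style" concrete, here is a minimal sketch of the reset/step loop. `ToyMultiAgentEnv` is a hypothetical stand-in written for this post, not the AI Economist API; the per-agent dictionaries and the single `done` flag are assumptions chosen for illustration.

```python
import random


class ToyMultiAgentEnv:
    """Hypothetical stand-in with a Gym-style reset/step interface."""

    def __init__(self, n_agents=4, n_actions=50, episode_length=5):
        self.agent_ids = [str(i) for i in range(n_agents)]
        self.n_actions = n_actions
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        # Return one (dummy) observation per agent.
        self.t = 0
        return {a: {"t": self.t} for a in self.agent_ids}

    def step(self, actions):
        # Advance one time step; rewards are all zero in this toy.
        self.t += 1
        obs = {a: {"t": self.t} for a in self.agent_ids}
        rewards = {a: 0.0 for a in self.agent_ids}
        done = self.t >= self.episode_length
        return obs, rewards, done, {}


def run_episode(env, policy):
    """Roll out one episode and return (number of steps, total reward per agent)."""
    obs = env.reset()
    totals = {a: 0.0 for a in obs}
    steps = 0
    done = False
    while not done:
        actions = {a: policy(o) for a, o in obs.items()}
        obs, rewards, done, _ = env.step(actions)
        for a, r in rewards.items():
            totals[a] += r
        steps += 1
    return steps, totals


env = ToyMultiAgentEnv()
rng = random.Random(0)
steps, totals = run_episode(env, lambda o: rng.randrange(env.n_actions))
```

Any RL library that works with this kind of loop can, in principle, be pointed at the real environment in place of the toy one.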

For RL, we need to understand the game rules, observations, rewards and actions. As a first step, it’s a good idea to reproduce the free market scenario to understand them.

Game rules

The rules themselves are customizable, but for now we focus on the rules used in the free market scenario.

Subjects and Objectives

There are two kinds of subjects in the game.

Planner (Government)
4 economic players

In the free market scenario, the Planner does not appear, because there is no taxation. So the 4 economic players just work to achieve their own objective, which is to maximize their own utility (happiness).

The utility is a function of the Labor and Coin that belong to the player. More labor makes the player less happy, and this effect is linear. On the other hand, more coin makes the player happier, but the effect diminishes as the coin increases; it follows an isoelastic function.
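The shape of this utility can be sketched as follows. The curvature `eta` and the `labor_weight` coefficient are illustrative assumptions, not values confirmed by this post, and the function names are my own.

```python
def isoelastic(coin, eta):
    """Isoelastic (CRRA) utility of coin for 0 <= eta < 1.

    Larger eta means the benefit of extra coin shrinks faster.
    """
    if not 0 <= eta < 1:
        raise ValueError("this sketch only covers 0 <= eta < 1")
    return (coin ** (1 - eta) - 1) / (1 - eta)


def utility(coin, labor, eta=0.23, labor_weight=1.0):
    """Concave in coin, linear in labor (eta and labor_weight are assumed values)."""
    return isoelastic(coin, eta) - labor_weight * labor


# The first 10 coins add more utility than the next 10 coins.
gain_first_10 = isoelastic(10, 0.23) - isoelastic(0, 0.23)
gain_next_10 = isoelastic(20, 0.23) - isoelastic(10, 0.23)
```

With any `eta` above 0, `gain_next_10` is smaller than `gain_first_10`, while each extra unit of labor always costs the same amount of utility.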

Entity

Entities represent the objects that appear in the world. Some of them are visible, but others are not.

Labor
SourceBlock
House
Water
Wood / Stone
Coin

Labor is the internal state of the player. When players do some work, their labor increases. Labor is not visible to other players or the planner.

SourceBlock is a block that generates Wood or Stone. It’s invisible. A House can be built by a player, and other players cannot pass through it. Water is a block that no one can pass through.

Wood and Stone are collectable resources that appear on SourceBlocks. Coin is generated by building a house. Thus, it’s a measure of how much value has been generated in the world.

Components

Here are the components used in the free market scenario. They are basic components, so they are usually used in other scenarios too.

Build
ContinuousDoubleAuction
Gather
PeriodicBracketTax

The Build component allows players to build houses. When a player builds a house:

Player gets about 11 ~ 22 coin. It depends on the skill of the player.
Player gets 10 labor.
Player loses 1 wood and 1 stone.

The ContinuousDoubleAuction component allows players to exchange their resources for coin through an auction system.

Player can ask or bid up to 10 coin for 1 resource (Wood or Stone).
Player gets 0.25 labor, when it asks or bids.

The Gather component allows players to move and collect resources.

Player gets 1 labor when it moves.
Player gets 1 labor when it collects a resource.
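The labor costs listed for Build, ContinuousDoubleAuction and Gather can be added up with a small bookkeeping sketch. The action names and the idea of treating moving and collecting as separate entries are assumptions for illustration; only the numbers come from the lists above.

```python
# Labor per action, taken from the component descriptions above.
LABOR_COST = {
    "build": 10.0,   # building a house
    "trade": 0.25,   # placing an ask or a bid
    "move": 1.0,     # moving one step
    "gather": 1.0,   # collecting one resource
}

# Coin earned per house, depending on skill (range from the text).
HOUSE_COIN_RANGE = (11.0, 22.0)


def total_labor(actions):
    """Sum the labor of a sequence of action names."""
    return sum(LABOR_COST[a] for a in actions)


# Walk 3 steps, collect wood, walk 2 steps, collect stone, then build.
actions = ["move"] * 3 + ["gather"] + ["move"] * 2 + ["gather"] + ["build"]
labor = total_labor(actions)  # 3 + 1 + 2 + 1 + 10 = 17

# Coin earned per unit of labor for the least and most skilled builder.
coin_per_labor = tuple(c / labor for c in HOUSE_COIN_RANGE)
```

This kind of back-of-the-envelope ratio is what a player implicitly weighs against the diminishing utility of coin.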

The PeriodicBracketTax component introduces taxation. It’s not used in the free market scenario. If the tax rate is set to 0 in all tax brackets, it’s equivalent to the free market.

Tax is applied every 100 steps.
Tax rate can be set independently in 7 different brackets (ranges of the income).
Collected coin will be evenly distributed.
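The taxation rules above can be sketched as follows. This assumes marginal rates (each slice of income is taxed at its own bracket's rate) applied to the income of one tax period; the bracket boundaries are hypothetical round numbers, since the post does not list the actual ones.

```python
# Hypothetical lower bounds of the 7 income brackets (not the real values).
LOWER_BOUNDS = [0, 10, 20, 40, 80, 160, 320]


def bracket_tax(income, lower_bounds, rates):
    """Marginal tax: each slice of income is taxed at its bracket's rate."""
    upper_bounds = lower_bounds[1:] + [float("inf")]
    tax = 0.0
    for lo, hi, rate in zip(lower_bounds, upper_bounds, rates):
        if income > lo:
            tax += (min(income, hi) - lo) * rate
    return tax


def tax_and_redistribute(incomes, lower_bounds, rates):
    """Collect tax from every player, then hand the total back in equal shares."""
    taxes = [bracket_tax(x, lower_bounds, rates) for x in incomes]
    share = sum(taxes) / len(incomes)
    return [x - t + share for x, t in zip(incomes, taxes)]


incomes = [5.0, 30.0, 100.0, 400.0]

# Rate 0 everywhere: nothing changes, which is the free market.
free_market = tax_and_redistribute(incomes, LOWER_BOUNDS, [0.0] * 7)

# Rate 1 everywhere: everyone ends the period with the average income.
full_tax = tax_and_redistribute(incomes, LOWER_BOUNDS, [1.0] * 7)
```

Under these assumptions, redistribution never changes the total amount of coin; it only changes who holds it.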

Players start with 10 coin and different skills. Their starting positions are fixed at the corners of the map. The total number of steps is 1000.

Observations, reward and actions

The definition of the observation space for players is available here. It contains spatial information, the player’s skill and information about the auction.

The action space is a discrete space of size 50. The action mask in the observation tells us which actions are allowed.
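As a minimal sketch of how an action mask is used, the helper below samples uniformly among the allowed actions. The mask layout (a list of 0/1 flags, one per action) and the function name are assumptions for illustration; a trained policy would typically apply the same mask to its action probabilities instead of sampling uniformly.

```python
import random

N_ACTIONS = 50  # size of the discrete action space mentioned above


def sample_masked_action(mask, rng):
    """Pick a random action index among those the mask allows."""
    valid = [i for i, allowed in enumerate(mask) if allowed]
    if not valid:
        raise ValueError("the mask allows no action")
    return rng.choice(valid)


# Example mask in which only actions 0, 3 and 7 are allowed.
mask = [0] * N_ACTIONS
for i in (0, 3, 7):
    mask[i] = 1

rng = random.Random(0)
samples = [sample_masked_action(mask, rng) for _ in range(100)]
```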

Experiments

I have tried 4 different scenarios.

Free Market (just to try reproducing the paper)
Communism
Machine
Dystopia

The result for the free market is described in the original paper. My result is almost the same as theirs, so I skip it here.

The 2nd scenario is Communism. Though I call it Communism, it is not Communism in the strict sense. I have just set the tax rate to 1 in all tax brackets. The ContinuousDoubleAuction is still available. Anyway, the point here is to see what happens when I try to maximize equality.

In the 3rd scenario, I have customized the players’ reward. The players no longer try to maximize their own utility. Instead, they try to maximize social productivity. Thus, I can estimate how high the maximum productivity is likely to be with RL, and furthermore see how it can be achieved.

The 4th scenario is similar to the 3rd one, but the players’ objective is to maximize “social productivity times equality”. It’s interesting to see how the equality term affects the result.
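The two social objectives can be sketched as below. The post does not spell out the formulas, so this assumes productivity is the total coin of all players and equality is one minus a normalized Gini index; the function names are my own.

```python
def productivity(coins):
    """Assumed definition: total coin held by all players."""
    return sum(coins)


def gini(coins):
    """Gini index from the mean absolute difference between all pairs."""
    n = len(coins)
    total = sum(coins)
    if total == 0:
        return 0.0
    diff = sum(abs(a - b) for a in coins for b in coins)
    return diff / (2 * n * total)


def equality(coins):
    """Assumed definition: 1 when all are equal, 0 when one player owns everything."""
    n = len(coins)
    return 1.0 - gini(coins) * n / (n - 1)


def productivity_times_equality(coins):
    """Objective of the 4th scenario under the assumptions above."""
    return productivity(coins) * equality(coins)


equal_world = [25.0, 25.0, 25.0, 25.0]
unequal_world = [100.0, 0.0, 0.0, 0.0]
```

Both example worlds have the same productivity, but only the equal one scores on the combined objective, which is why the equality term changes what the players learn to do.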

Results

Here is the summary of all 4 experiments.

Let’s take a closer look at each result.

Communism

It looks like a really boring world. Movement, trading and building are all inactive. Though this experiment is not enough to conclude what the cause is, it is consistent with the well-known explanation for the failure of communism.

The highest-skilled player lost its motivation. This player no longer builds many houses, because the gain (Coin) is not worth the labor. As a result, the other players stop collecting resources, because no one buys them.

Machine

This is close to the result we would expect. The most skillful player focuses on building houses, while the other players collect resources. The other players also build houses, probably when there are more resources than the most skillful player can use.

One interesting observation is the initial market price of wood and stone. They are exceptionally cheap compared with the others. This helps the skillful player build a house right after the start. After that the price becomes high, but I’m not sure about the reason (any idea?).

Dystopia

The productivity times equality is 2690, which is better than in the 3rd scenario (2044). The number of houses built by the players is similar to the 3rd scenario. So the question is: how did they achieve high equality?

When I take a closer look at the results, I notice that Agent 0 has bought stones at a very low price (4.03). Those stones are from Agent 2 and Agent 3. That is to say, Agent 0 is a reseller. This is Agent 0’s main way of getting coin. The agents collaborate in this way and keep equality high.

Summary

In this post, I have tried 3 original scenarios. Though they are simple scenarios that do not train the Planner, they have shown some interesting insights. RL seems to work pretty well across the different scenarios.

As the AI Economist is highly customizable, it gives us many ideas to try. For example:

Communism achieved by Planner
Adversarial players (e.g. maximize the margin of total profit)

The AI Economist team collaborates with expert economists. I look forward to seeing them bring us many interesting ideas and results.

All source code is available in my repository.