Crypto Portfolio Management

Joanes Espanol
amberdata
Published in
8 min readAug 12, 2019

Stay Calm, and DRY-ON (Don’t Run Your Own Node)

Today, I want to talk about and dig more into the services and the data provided by Amberdata, and more specifically the types of aggregations accessible via their API, and what makes these aggregations so powerful.

For this exercise, we will aim at providing financial information about an Ethereum account — think about it as your bank account information, but for crypto-currencies. At the very minimum, we will want to know:

  • current account and assets/tokens balances
  • the whole history of transactions and transfers of value for each

With public blockchains, all information is accessible and transparent but, not all information is easy or quick or cheap to actually retrieve. Blockchain data models and data structures were optimized for storage, point to point transactions, enforcing consensus, etc, but not for search and random retrievals*.

If you are familiar with the Ethereum data models, and how difficult it can be to retrieve some of this information, feel free to jump to the API Access section — otherwise, sit tight, we will take a quick look behind the curtains.

Account Balances

Back to the task at hand: given the address of an account, how would you retrieve all the Ether transfers to or out of the account, in order to show the history of transactions and changes? Things get complicated very quickly:

  • Simple Ether transfers from EOA (Externally Own Accounts or wallets) to EOA is straight forward, as those are directly included in the transaction data
  • But there is no direct way to know which transaction(s) our account was involved in, so we would have to replay all transactions from the beginning of time to figure out which ones participated in the balance of our account
  • Nor is there a way to directly access which blocks a miner confirmed and the amount of Ether it was rewarded with — these rewards are not stored on the blockchain as their own transaction like Bitcoin for example (nor is the value of the reward)
  • For Ether transfers from or to a contract, these operations are not recorded in the transaction data — we could look at contract events, but not all contracts emit them, and so these transfers would not be indexed and searchable in the first place
  • We now have to look at internal transactions (aka internal messages) to even know about these contract operations — but internal messages are not standardized across Ethereum clients, so you might get different results depending on how you extract them
  • Any other ideas? Drop me a note if you do!

The list above is not meant to be exhaustive, but is instead a small illustration of how difficult it can be to:

  • just retrieve this information — even though all of it is publicly available!
  • ensure that it is exhaustive — we wouldn’t want to miss an Ether transfer
  • not to mention archive it and make it available for every single account in the Ethereum network (54,859,325 EOAs and 17,504,474 contracts at the time of writing).

The only reliable source of truth, when it comes down to account balances, is the Ethereum node itself, via the getBalance(…) function call. Amberdata’s approach is to call getBalance(…) for all addresses of all transactions of every block, and record any changes from the previous state (i.e. previous block in this case), thus ensuring accurate and complete history of changes.** Note that this requires a full archive node, which at the time of writing uses just above 3 TB of disk space to hold all the different state changes.

Full Archive Node disk size over time

Token transfers & balances

A big part and use case of the Ethereum network is in its (ERC20) tokens. There are different types of tokens, used for different purposes, but at the very basic level, they hold a certain amount of value and they can be bought or sold. Which makes them a valuable trading asset, with their own historical prices, historical balances, etc.

The problem and solution here is the same as with transfers of Ether: the data is difficult to access and the only reliable source of truth is the Ethereum node itself, and the contract backing up the token. The only difference is in calling the balanceOf(…) function on the token contract for a given address (and block), instead of calling the getBalance(…) function mentioned above.

Side note: if you are not familiar with it, I encourage you to take a look at the on-chain BalanceChecker contract from DeltaBalances, which exposes two functions to retrieve a batch of account or token balances at once (note that you would need a full archive node in order to get at the historical balances here as well).

API Access

Thanks to Amberdata’s API, all these hassles have been taken care of already, and the information is consumable at a fraction of the time and cost it would take to do it “manually”:

Things that we will need are minimal:

brew install curl
brew install jq

(if you do not know jq, I highly recommend you give it a spin, you will wonder how you have survived without it until today!)

First let’s create a helper function that will ease our interaction with the API (you will be prompted to enter your API Key, available here):

read -p 'API Key: ' X_API_KEY && amberdata() {
local __URL_PATH=$1

[[ ${__URL_PATH} =~ ^/.* ]] || __URL_PATH="/${__URL_PATH}"

>&2 echo "curl -X \"GET\" -H \"x-api-key: ${X_API_KEY}\" \"https://web3api.io/api/v1${__URL_PATH}\""
>&2 echo ""

curl \
-X "GET" \
-H "x-api-key: ${X_API_KEY}" \
"https://web3api.io/api/v1${__URL_PATH}" \
2> /dev/null
}

And now we can easily call the API directly. The few examples below allow us to test that everything is configured properly and help us to get familiar with the different endpoints:

# Get a specific block
amberdata blocks/0 | jq -Cr .
# Get the most current block
amberdata blocks/latest | jq -Cr .
# Get a block in the future
amberdata blocks/8500000 | jq -Cr .
# Get the most current block and all the validation information
amberdata "blocks/latest?validationMethod=full" | jq -Cr .
# Get all transactions of a block
amberdata "blocks/7280000/transactions" | jq -Cr .

These were simple examples, the power of the API is the aggregated time series:

Let’s take a closer look.

Historical account balances:

# First define the wallet address of interest
WALLET_ADDRESS=0x3f5ce5fbfe3e9af3971dd833d26ba9b5c936f0be
# Then retrieve the historical balances
amberdata "addresses/${WALLET_ADDRESS}/account-balances/historical
# Now add historical Ether prices
amberdata "addresses/${WALLET_ADDRESS}/account-balances/historical?includePrice=true"
# And finally some jq and awk magic to convert to csv
amberdata "addresses/${WALLET_ADDRESS}/account-balances/historical?includePrice=true" \
| jq -r '.payload.data[] | [.[2], .[5]] | @csv' \
| gawk -F"," "{OFS=\",\"; \$1=strftime(\"%Y-%m-%dT%H:%M:%S+0000\", \$1 / 1000); print \$0}"

Plug the generated CSV data into Google Spreadsheet, and we get the following chart:

This particular account has seen its balance oscillate between $16.5 million and $17.25 million over the past 30 days!

Historical token balances:

For token balances, we can achieve very similar results in a few API calls:

# Retrieve top 20 tokens held in a specific wallet
amberdata "addresses/${WALLET_ADDRESS}/tokens?size=20" \
| jq -r '.payload.records[] | [.address, .symbol, .amount, .decimals] | @csv' \
| tr -d '\'r | tr -d '"' \
| awk -F"," '{OFS=","; $3 = $3 / 10 ** $4; print $0}' \
| tail -n +2 \
> tokens.csv
# Retrieve balances for each token
for line in $(<tokens.csv); do
IFS="," read address symbol amount decimals <<< "${line}"
amberdata "tokens/${address}/holders/historical?holderAddresses=${WALLET_ADDRESS}&timeFrame=30d&timeFormat=iso" \
| jq -r ".payload.data[] | [\"${symbol}\", .timestamp, .\"${WALLET_ADDRESS}\"] | @csv" \
| gawk -F"," "{OFS=\",\"; \$2=strftime(\"%Y-%m-%dT%H:%M:%S+0000\", \$2 / 1000); \$3 = \$3 / 10 ** ${decimals}; print \$0}" \
| tr -d '"'
done > token_balances.csv
# Transpose / Pivot on the token address
awk -v FS="," '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' token_balances.csv \
| tr ' ' ',' \
| gawk -F"," 'NR<=1{1}{ for (i=1;i<=NF;++i) { if ($i != "" && NR > 1) a[i] = $i; printf "%s,", a[i]?a[i]:$i } printf RS }'

Send the generated data to Google Spreadsheet, and we get a quick glance at the evolution of this particular token portfolio over time, any particular holding patterns (Thunder Token TT) or the big buy & sell events (Pundi X token NPXS) for example:

Holders and historical circulating & total supplies:

# First define the token of interest
TOKEN_ADDRESS="0xb8c77482e45f1f44de1745f52c74426c631bdd52"

# Set a first date: 2017-07-01
startDate=$(date -u -j -f "%Y%m%d%H%M%S" "20170701000000" "+%s")
lastDate=$(date +%s)

# Retrieve all historical values

(
echo timestamp,numHolders,circulatingSupply,totalSupply

while [ $startDate -lt $lastDate ]; do
endDate=$(date -r $startDate -v+3m +%s)
[ $endDate -gt $lastDate ] && endDate=$lastDate
amberdata "tokens/${TOKEN_ADDRESS}/supplies/historical?startDate=$startDate&endDate=$endDate" | jq -r '.payload.data[] | @csv'
startDate=$endDate
done | gawk -F"," "{OFS=\",\"; \$1=strftime(\"%Y-%m-%dT%H:%M:%S+0000\", \$1 / 1000); print \$0}"
) > supplies.csv

Final thoughts

I hope you have enjoyed the ride. As we saw, it is very easy and very quick with Amberdata’s API to retrieve a lot of information in just one or two calls.

From here on, the possibilities are endless :) As always, we are very interested in your comments, feedback & thoughts, and we are looking forward to what you will build on top of our APIs. Don’t hesitate to reach out!

* Yes, it is true that Ethereum has implemented some very neat tricks with bloom filters to enable global search, but I would argue that: it is very limited in its use cases, and still takes quite some time to run over the whole history of blocks & transactions — not something that can be used in a real-time setting.

** To be more specific, getBalance(…) is not evaluated on each known Ethereum address at every block — instead getBalance(…) is evaluated on all addresses that are involved in the current block, by looking at the to and from addresses of the transaction and internal messages.

--

--