Analysing 1.2M Ethereum contracts in 20 seconds using Eveem and BigQuery

Last month I released an API for Eveem.org that allows easy static analysis of Ethereum contracts.

If you go to:

http://eveem.org/code/0x06012c8cf97bead5deae237070f9587f8e7a266d.json

You will find a JSON file there, with all the functions decompiled into an intermediate language that is relatively easy to analyze. You can build, for example, a tool like https://showme-1389.appspot.com/ that shows you some fun facts about smart contracts.
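As a rough sketch of how one might pull that JSON from a script: the snippet below relies only on the URL pattern shown above and assumes nothing about the layout of the returned document beyond it being JSON. The helper names `eveem_url` and `fetch_contract` are made up for this example and are not part of the Eveem API; lowercasing the address is an assumption based on the example URL.

```python
import json
import re
import urllib.request

EVEEM_BASE = "http://eveem.org/code/"  # base of the example URL above


def eveem_url(address: str) -> str:
    """Build the JSON URL for a contract address (0x + 40 hex characters)."""
    if not re.fullmatch(r"0x[0-9a-fA-F]{40}", address):
        raise ValueError(f"not an Ethereum address: {address!r}")
    # Assumption: lowercase addresses, as in the example URL.
    return f"{EVEEM_BASE}{address.lower()}.json"


def fetch_contract(address: str, timeout: float = 30.0):
    """Download and parse one decompiled contract (needs network access)."""
    with urllib.request.urlopen(eveem_url(address), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_contract("0x06012c8cf97bead5deae237070f9587f8e7a266d")` should then return the parsed document for the contract linked above; inspecting its top-level keys is a reasonable first step before writing any analysis on top of it.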

There is one more cool thing about this intermediate representation — it makes security analysis of contracts quite easy and fast.

And, since all the contracts are decompiled already, it is possible to do it in bulk.

Originally, I was doing that from a console, using a Python script that just went through all the contracts one by one. It was convenient for me, but didn’t allow other people to play with the tech: accessing 1.2M contracts through the API would be crazy slow.

Fortunately, during EthSingapore, Allen Day from Google introduced me to BigQuery, which you may know from Google’s recent announcements about Ethereum datasets.

The cool thing about BigQuery is that it’s a database designed for sequential access to data and for bulk operations on it. In other words, if you have a dataset there, BigQuery won’t be super-good at fetching random rows, but it will be amazing if you want to go through all the rows and run some operation on each of them.

Even cooler: it supports UDFs (user-defined functions), and not just regular SQL ones. BigQuery fully supports JavaScript (and, by extension, WASM), so if you write a script to do some data analytics, it’s trivially easy to run it on the whole dataset in a matter of seconds.

How trivially, you ask?

It took me just a few hours to upload the dataset of all the decompiled contracts into BigQuery, and write Asterix — an example script that looks for all the open self-destructs in all the decompiled contracts.
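To give a feel for the general shape of such a query, here is a minimal sketch that assembles BigQuery SQL with a JavaScript UDF from Python. This is not Asterix itself: the function name `matches`, the columns `addr` and `contract_json`, and the table name in the usage note are all placeholders I made up, so check the real dataset schema before running anything like it.

```python
def build_scan_query(table: str, js_body: str) -> str:
    """Return BigQuery SQL that runs a JavaScript UDF over every row of `table`.

    `js_body` is the body of a JS function taking one string argument `code`
    and returning true for rows that should be reported.
    """
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    if '"""' in js_body:
        raise ValueError('JS body must not contain """')
    return (
        "CREATE TEMP FUNCTION matches(code STRING)\n"
        "RETURNS BOOL\n"
        'LANGUAGE js AS r"""\n'
        f"{js_body}\n"
        '""";\n'
        "\n"
        # Hypothetical column names: addr, contract_json.
        "SELECT addr\n"
        f"FROM `{table}`\n"
        "WHERE matches(contract_json);\n"
    )
```

For example, `build_scan_query("my-project.my_dataset.contracts", 'return code.indexOf("selfdestruct") !== -1;')` produces a query that flags every row whose JSON mentions a self-destruct at all. A real scanner would parse the JSON inside the UDF and inspect the structure rather than match on text.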

How fast, you ask?

It takes 23 seconds to run a query that returns all the contracts matching a given pattern. In the case of open self-destructs, it finds around 700 contracts active on the mainnet right now that anyone can kill at any moment, plus an additional 500 contracts that can perhaps also be killed, but only under some conditions.
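The split between "anyone can kill it" and "some conditions apply" can be illustrated with a toy classifier. The sketch below assumes a made-up representation in which a function body is a list of statements, each statement is a list whose first element is an opcode name, and a conditional looks like `["if", condition, then_branch, else_branch]`. That format is a simplification for illustration, not Eveem's actual schema, and the logic is not taken from Asterix.

```python
from typing import Any, List

# Severity order used to combine results from several statements.
_RANK = {"none": 0, "conditional": 1, "open": 2}


def _worst(a: str, b: str) -> str:
    """Return whichever of the two labels is more severe."""
    return a if _RANK[a] >= _RANK[b] else b


def classify_selfdestruct(trace: List[Any], guarded: bool = False) -> str:
    """Label a function body as 'open', 'conditional' or 'none'.

    'open'        -> a selfdestruct is reached with no enclosing condition
    'conditional' -> a selfdestruct exists, but only inside some `if` branch
    'none'        -> no selfdestruct found
    """
    result = "none"
    for stmt in trace:
        if not isinstance(stmt, list) or not stmt:
            continue
        op = stmt[0]
        if op == "selfdestruct":
            result = _worst(result, "conditional" if guarded else "open")
        elif op == "if":
            # Hypothetical layout: ["if", condition, then_branch, else_branch]
            for branch in stmt[2:4]:
                if isinstance(branch, list):
                    result = _worst(
                        result, classify_selfdestruct(branch, guarded=True)
                    )
    return result
```

With this toy format, `[["selfdestruct", "caller"]]` is labelled `open`, while the same statement wrapped in an owner check such as `["if", ["eq", "caller", ["storage", 0]], [["selfdestruct", "caller"]], [["revert"]]]` is labelled `conditional`. Deciding whether a given condition can actually be satisfied by an arbitrary caller is the hard part, and this sketch does not attempt it.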

So, what’s next?

I have made the dataset public:

And Asterix is fully open source here:

https://github.com/kolinko/asterix

If anyone’s interested — give it a try! You can begin by going through the BigQuery quickstarts first, which will take around 1–2 hours. After you get familiar with them, fetch Asterix and run it to get the whole list of contracts with self-destructs.

If you’re interested in some deeper analytics, check out the showme.js file. You will also need to familiarize yourself with the intermediate language Eveem provides; the showme Python demo should be helpful there.

Closing thoughts

Eveem is still a work in progress: many contracts and functions don’t decompile fully, and some checks get lost in decompilation. For example, with the safesub library it will often not show that there was a check done before the subtraction, so you’ll get a ton of false positives there.

To be frank — this is the most inspiring and scary tool I have ever written. Given a vulnerability, one can write a scanner similar to the one you’d do with Mythril or Manticore, and immediately get a list of all the active contracts with that vulnerability.

It also makes one wonder how to do reasonable exploit disclosure in this case. Finding the contracts that have any significant ether or traffic attached to them, and then finding their owners, seems like the right thing to do.

There are also other potentially interesting uses for the tool, such as figuring out patterns in the deployed contracts. If you were ever curious how many contracts use a certain programming pattern (like upgradeability, or managing an arbitrage between the exchanges, or a certain kind of loop), this is probably the first tool that will allow you to figure it out so easily.

Another interesting thing that could be built on top of this: it’s now trivially easy to figure out which other contracts a contract calls. One could write a script that does a static analysis of all the connections between them. Sites like Bloxy.info already do this dynamically, by analysing transactions, but nobody seems to have done this statically. I wonder what would come out of it.
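As a minimal sketch of what such a script could look like once the call targets are extracted: the code below assumes you already have a mapping from each contract address to the set of addresses it calls (how to extract that from the decompiled output is left out, and the short labels in the usage note are stand-ins for real addresses). It builds the reverse index and follows calls transitively.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def build_callers(calls: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Invert a 'contract -> callees' mapping into 'contract -> callers'."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    for src, targets in calls.items():
        for dst in targets:
            callers[dst].add(src)
    return dict(callers)


def reachable(calls: Dict[str, Iterable[str]], start: str) -> Set[str]:
    """Return every contract that `start` can reach through static calls."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in calls.get(node, ()):
            # Skip already-visited nodes and cycles back to the start.
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Given `calls = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}`, `reachable(calls, "A")` returns `{"B", "C"}` and `build_callers(calls)["C"]` returns `{"B", "D"}`. Calls whose target address is computed at runtime would not show up in a purely static graph like this.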

Have fun! :)