MolochDAO Interview with Antoine Toulme of Apache Tuweni
Grantee Interview Series
Antoine Toulme, of Apache Tuweni, met with Really Boring Guild over Zoom. His shirt displayed a silhouette of the Statue of Liberty, and behind him hung a California state flag.
As he spoke, he folded his hands beneath his chin, between moments of gesturing with them. Towards the end of the conversation, he smiled while he challenged himself with reflections about Apache Tuweni.
ReallyBoringGuild: To get started, how do I pronounce the name of the project?
Antoine Toulme: That’s a good one; it depends. The word is from the Lakota language. From my research, it means both “never” and “no one.” Naming the project was a long process. It took four months to settle on the name, because Apache already has so many project names. How we say it is “Ap-a-chee Twenty”, but a Lakota speaker would probably say “Ap-a-chee Duweni”, with a D.
RBG: Could you share with us an overview of Apache Tuweni and tell us a little bit more about the project?
AT: Sure. This project started as a ConsenSys project, back in 2018. It’s a set of libraries to help people develop for blockchain, because we found that, at least in Java and the JVM world, there wasn’t a good way to do low-level development for blockchain. A lot of primitives were not there and a couple of things were missing, in particular anything to do with the base-level activity for consensus, and anything for crypto as well.
We built the whole network layer and developed it as libraries. Today the project is used by Hyperledger Besu and also by Teku. It’s a very critical base layer: they use it to do all the computation for consensus. All the proofs you’re doing are based on those libraries.
The project didn’t stop there, because I was trying to get even more traction around it. I needed this to be an open-source project. There’s a whole story there, because the Apache Software Foundation makes it so that you don’t own the project; it’s owned by the foundation. Also, committership doesn’t belong to your employer; it becomes part of the project itself. I gave a whole presentation on that at one of the ChainSafe open source meetups in Toronto. In doing that, it became clear that we couldn’t just treat it as a commons; we had to actually go and build it up.
We started building out all sorts of interesting things, like an EVM, which is still in progress. We built an RPC proxy that was actually paid for and sponsored by MolochDAO. I’ve done other things, like a Secure Scuttlebutt server, and also some interesting middleware libraries.
The most advanced things we have are around the Ethereum discovery layer. We have a way to track all the nodes on the network, which we did for the Ethereum Foundation. We built a crawler that lets you see all the nodes on the network, connect with them, get all their stats, shove the data into a database, and keep doing that as often as necessary to build a timeline of all the nodes on the network.
RBG: What is your background and how did you begin working on Apache Tuweni?
AT: I’ve been in startups for over 12 years. I got my first job in Silicon Valley straight out of university. I worked at about four startups, and Acer was my last job before transitioning to blockchain. At Acer, we were building IoT and IoT for the cloud.
I guess you have to be in Silicon Valley to get the feeling I got at that time, which was that you could either do a startup (I built one and got to learn how VCs work) or work at one of the Big Four. Even if your startup works really well, you’ll end up working for the Big Four when they acquire you. That was the feeling around 2017: the market felt a little rigged.
My network was mostly in Silicon Valley, and it was hard to sustain because there’s so much competition and greed going on. I got a cold email from a guy called Dan Heyman, who was at ConsenSys at the time and was looking for Java developers to work on blockchain. I said, well, why not; my alternative was to go be a manager at Google. So I took the job.
I worked for ConsenSys from December 2017 to May 2019. I got to learn a bunch of stuff, meet a whole lot of people from all walks of life within crypto, and really have an impact.
The interesting part for me is that most people who work in crypto are young. I was walking around with all the lessons I had learned from working in open source. There are a lot of things you need to think about, like how you want to structure projects, how to be mindful of contributors, and so on. That turned out to be an asset I used to foster adoption of what I was doing.
This is what we’re trying to achieve with Apache Tuweni: a concentration of efforts to build a very complete software stack that you can build on top of and around blockchain.
When I left ConsenSys in May 2019, I did some consulting. I worked for WhiteBlock as CTO, and I’ve been at Splunk since the pandemic started. At Splunk, I’m the engineering manager for blockchain, with a team of about eight people. We’re doing an awesome job of using Splunk to index Web3 data.
RBG: Could you elaborate on the importance of the development of the transaction pool and signer with respect to present Ethereum protocol development?
AT: We didn’t do as much as we wanted, because we realized that this is a very active area of development that is moving really quickly with Eth2. I had discussions with some of the Besu committers about it, and that’s how I formed my opinion. Basically, it’s naive to think you can just have transactions handled for you: you send them to the client, the client has its own algorithm to sort which transactions get included in the block, and from there they get submitted to the actual blockchain. That changed with EIP-1559, which added more conditions that make the transaction pool a lot more intelligent about what should be included, where to start, and what fee applies. I was late to that, but there’s more coming up.
We want the ability to submit not just one transaction but a bundle of them, which then has to be executed together, so you can diminish MEV. On top of that, I’m seeing Flashbots doing all sorts of work, whereas we’d like to enable this natively without a whole lot of extra work; they’re pretty much running a tipping solution that allows people to do that. I might want to step away a little bit from that topic, because it involves multiple teams doing multiple things, but I can tell you a little bit, from my point of view, where this could become useful.
I was using Infura for a bunch of things, like pretty much everybody, and was very quickly running into timeouts because I was asking for too much data. Even at Splunk right now, when we want to get a lot of data from all those nodes, we know we’re going to hit limits pretty quickly. There are lots of ways to deal with that. Of course, there’s infrastructure to help you cache and do a better job, but if you’re a developer working at home, you run into those limits very quickly. You may also want to keep some of that data locally so that it’s always available to you. You know it’s there, and you can use it as part of your tests: you make that call and you get this back.
The idea is to create personal proxies for developers on their own machines. Why call Infura directly if you can put this in the middle and have your own cache layer? You can see what’s actually going on, you can get metrics out of the proxy, and you get a faster development cycle. If you’re on a train for half an hour, your development setup still works; you don’t need to connect to Infura all the time. It also limits what those providers can see of what you’re doing. That’s what the proxy is for.
The motivation for this piece was also for me to be able to say: I’m going to stick one transaction at a time into that tool, and it’s going to be pretty simple. When it’s ready, say when a timeout triggers or a certain number of transactions is queued, I can send them all at once, bundling the transactions together. There are some design decisions that still need to be made there, but the software itself is already useful. You can use it for all sorts of interesting use cases, with both proactive and passive caching. With passive caching, the first time, the proxy makes the request to the actual endpoint on your behalf, and the next time you ask for it, the response has already been cached.
Responses can be cached for a number of seconds or a number of minutes, but we can also have the proxy run a job: every five seconds it asks for the latest block, so we don’t need to do anything. It’s doing the caching for you, collecting the data for you.
This particular approach keeps the cache in memory by default, but you can use different key-value stores, and you can actually store it on disk. What was also interesting for me, from a security point of view, is that many clients can be supported by just one node.
There’s an HTTP port or a WebSocket port, and I’m finding out that a lot of tools in the space ask for the whole gamut of calls in the API. It’s not just what you want to be able to see; there’s also debug information, and you can get to all the admin endpoints you would want from a client. The problem is that this is a security hole: anyone who gets to your admin endpoints can ask the client to forget the chain. You can say, reset the head of the chain to block zero, and then it’s possible to lose six weeks' worth of syncing. To avoid that, you never want to expose those endpoints to the outside world.
One of the big functions of these proxies is to allow some of those calls, but not others. Now I can have my node deployed in one big central place, and around it a proxy that says: only allow this, or only allow that.
The last piece, of course, is that maybe you want some authentication or some throttling. One big thing at Infura back in 2017, when I was at ConsenSys, was requests for the latest block. Every millisecond, people would ask for the latest block and then complain when it brought the Infura servers down. There are two things you can do here. You can cache as much as you like, but at some point that’s not even enough. You can also throttle: past a certain rate, the proxy just returns a 429 (Too Many Requests) error, saying that this is currently not available.
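To make the EIP-1559 point concrete, here is a minimal Java sketch of fee-based selection from a pending pool, under a deliberately simplified model. Under EIP-1559 a block producer receives min(maxPriorityFeePerGas, maxFeePerGas - baseFee) per unit of gas, and a transaction whose maxFeePerGas is below the current base fee cannot be included. The `PendingTx` and `FeeOrdering` names are hypothetical; this is not Besu’s or Tuweni’s transaction-pool code, and it ignores per-sender nonce ordering and block gas limits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical pending transaction; fee fields are in wei per unit of gas.
record PendingTx(String hash, long maxFeePerGas, long maxPriorityFeePerGas) {}

final class FeeOrdering {
    // EIP-1559: the tip actually paid to the block producer is capped both by
    // the sender's priority fee and by what is left after the base fee.
    static long effectiveTip(PendingTx tx, long baseFee) {
        return Math.min(tx.maxPriorityFeePerGas(), tx.maxFeePerGas() - baseFee);
    }

    // Drops transactions that cannot cover the base fee, then orders the rest
    // by effective tip, highest first (List.sort is stable for equal tips).
    static List<PendingTx> selectForBlock(List<PendingTx> pool, long baseFee) {
        List<PendingTx> eligible = new ArrayList<>();
        for (PendingTx tx : pool) {
            if (tx.maxFeePerGas() >= baseFee) {
                eligible.add(tx);
            }
        }
        eligible.sort(Comparator.comparingLong((PendingTx tx) -> effectiveTip(tx, baseFee)).reversed());
        return eligible;
    }
}
```

Ordering by effective tip rather than by the raw priority fee matters because a transaction with a generous priority fee but a tight fee cap may end up paying less than it advertises.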
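The count-or-timeout trigger for bundling can be sketched in Java as below. `TxBatcher`, its parameters, and the use of plain strings for raw transactions are hypothetical illustrations, not the proxy’s actual code; the clock is passed in explicitly so the behavior is easy to test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batcher: collects raw transactions and releases them as one
// bundle when either maxSize transactions are queued or maxWaitMillis has
// elapsed since the first transaction of the current batch arrived.
final class TxBatcher {
    private final int maxSize;
    private final long maxWaitMillis;
    private final List<String> pending = new ArrayList<>();
    private long firstArrivalMillis = -1;

    TxBatcher(int maxSize, long maxWaitMillis) {
        this.maxSize = maxSize;
        this.maxWaitMillis = maxWaitMillis;
    }

    // Adds one transaction; returns the bundle if the size trigger fired,
    // otherwise an empty list.
    List<String> add(String rawTx, long nowMillis) {
        if (pending.isEmpty()) {
            firstArrivalMillis = nowMillis;
        }
        pending.add(rawTx);
        return pending.size() >= maxSize ? drain() : List.of();
    }

    // Meant to be called periodically; returns the bundle if the timeout fired.
    List<String> poll(long nowMillis) {
        if (!pending.isEmpty() && nowMillis - firstArrivalMillis >= maxWaitMillis) {
            return drain();
        }
        return List.of();
    }

    private List<String> drain() {
        List<String> bundle = new ArrayList<>(pending);
        pending.clear();
        firstArrivalMillis = -1;
        return bundle;
    }
}
```

In a real proxy, `poll` would be driven by a timer, and where the released bundle gets submitted is exactly the kind of open design decision mentioned in the interview.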
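Passive caching with a time-to-live can be sketched as follows, assuming the cache key is the JSON-RPC method plus its serialized params and that `upstream` stands in for the real call to Infura or a node. `TtlCache` and its fields are hypothetical names, and the in-memory `HashMap` plays the role of the default store; a disk-backed key-value store would slot in behind the same `get`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical passive cache for JSON-RPC responses: the first request for a
// key goes upstream, later ones are served locally until the entry expires.
final class TtlCache {
    private record Entry(String value, long expiresAtMillis) {}

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;
    private final Function<String, String> upstream;
    int upstreamCalls = 0; // exposed only so the behavior can be observed

    TtlCache(long ttlMillis, LongSupplier clock, Function<String, String> upstream) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        this.upstream = upstream;
    }

    String get(String requestKey) {
        long now = clock.getAsLong();
        Entry cached = entries.get(requestKey);
        if (cached != null && now < cached.expiresAtMillis()) {
            return cached.value(); // still fresh: no network round trip
        }
        upstreamCalls++;
        String fresh = upstream.apply(requestKey);
        entries.put(requestKey, new Entry(fresh, now + ttlMillis));
        return fresh;
    }
}
```

The proactive mode described above could be layered on top with `ScheduledExecutorService.scheduleAtFixedRate`, refreshing a key such as `eth_blockNumber` every five seconds so that callers never wait on the upstream.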
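A default-deny method filter is one simple way to express “only allow this, or only allow that.” The sketch below is hypothetical (`MethodFilter` is not Tuweni’s API); it assumes the proxy parses the JSON-RPC `method` field before forwarding, and it answers blocked calls with the JSON-RPC 2.0 “Method not found” error code, -32601. A dangerous call such as Geth’s `debug_setHead`, which rewinds the chain head, never reaches the node unless someone adds it on purpose.

```java
import java.util.Set;

// Hypothetical method filter for a JSON-RPC proxy placed in front of a node.
final class MethodFilter {
    private final Set<String> allowed;

    MethodFilter(Set<String> allowed) {
        this.allowed = Set.copyOf(allowed);
    }

    // Default-deny: only explicitly listed methods are forwarded, so admin_*
    // and debug_* calls are blocked without having to enumerate them.
    boolean isForwarded(String method) {
        return allowed.contains(method);
    }

    // JSON-RPC 2.0 reserves -32601 for "Method not found"; the proxy answers
    // with it instead of revealing that the method exists but is blocked.
    static String rejection(long requestId) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + requestId
                + ",\"error\":{\"code\":-32601,\"message\":\"Method not found\"}}";
    }
}
```

An allowlist is preferred here over a denylist because new admin or debug methods added in a client upgrade stay blocked by default.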
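Throttling can be sketched with a token bucket, one common rate-limiting technique; the interview does not say which algorithm the proxy actually uses, so treat this as an illustrative choice. Each request costs one token, tokens refill continuously up to a cap, and a request that finds the bucket empty gets a 429 instead of being forwarded. `TokenBucket` and its parameters are hypothetical.

```java
// Hypothetical per-client token bucket for a JSON-RPC proxy.
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so short bursts are allowed
        this.lastRefillMillis = nowMillis;
    }

    // Returns the HTTP status the proxy would send: 200 if the request is
    // forwarded, 429 (Too Many Requests) if the client is over its rate.
    int handle(long nowMillis) {
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return 200;
        }
        return 429;
    }
}
```

Combined with the cache, this means a client hammering the latest-block call is mostly served from memory, and only clients exceeding their budget see 429s.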
RBG: This piece of software you mentioned: do you think it’s being used more on the MEV side, because the Flashbots team is working on that side, or do you think it’s more useful for Infura and the network crawler side?
AT: The crawler came from a concern about visibility into the health of the Ethereum network. At the time, it became clear that we could be in deep trouble, because there was so much going on across different versions of the client software. It might be necessary to redeploy software quickly across participants.
An idea that came from Tim Beiko, from the Ethereum Foundation, was that we need a way to know what the population of our nodes is today. Is everybody running Geth? Do we have people running Besu? Do we have people running a derived version? Are they up to date? Do we need to think about how we’re going to work with all those people, wherever they’re located?
When I talked to them about this, they said things like, we don’t want to make all this information public. For example, if you collect the IP addresses of all those nodes and make them public, you might give people an incentive to go and hammer those IPs and kill the nodes. The problem is that this is already public: it’s published on GitHub and in DNS records, because that’s how we do discovery. We actually use a DNS propagation method, so it’s kind of a moot point.
The other thing to do is to see whether they’re running a secure OS, or something like that. Any meaningful stats help the All Core Developers make calls about the validity of moving between software versions or banning some aspects.
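Answering “is everybody running Geth?” from crawler output can be as simple as counting client names. This sketch assumes each crawled node reports a client ID string in the common slash-separated convention (for example `Geth/v1.10.8-stable/linux-amd64/go1.16.4`), where the first segment names the client; that format is a convention rather than a guarantee, and `ClientCensus` is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical census over client IDs collected by a crawler.
final class ClientCensus {
    // Takes the first slash-separated segment and normalizes its case, so
    // "Geth/..." and "geth/..." count as the same client.
    static String clientName(String clientId) {
        int slash = clientId.indexOf('/');
        String name = slash < 0 ? clientId : clientId.substring(0, slash);
        return name.toLowerCase(Locale.ROOT);
    }

    // Counts crawled nodes per client, sorted by client name.
    static Map<String, Integer> countByClient(List<String> clientIds) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : clientIds) {
            counts.merge(clientName(id), 1, Integer::sum);
        }
        return counts;
    }
}
```

The same split also yields the version segment, which is what you would group by to see whether nodes are up to date.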
One thing that’s been coming up has been requests for more data. We hear things like, we need more information. We are making a bunch of calls about the quality of our network. Are we actually seeing everybody being up to speed? Do we see a lot of nodes having trouble syncing? What’s really the health of the network?
For a while, there was a website doing this, called ethernodes. I built the software that lets you go and get all this data and display it per node. I even went as far as building the whole infrastructure and website into Apache Tuweni, all open source.
The idea was to allow people to connect to that website and see their own node’s information. What I wanted to do next is to have people sign up to get alerts and updates about their own nodes. The way it works is that every node has a key, which is private by default and is its identity on the network. If you can prove to me, by sending a signed message, that you own a particular node, then I can give you more access to that node’s data, for example how many peers you have over time, or a time series of your state.
The challenge is that there are two ways to connect to a node. The first is over UDP, which is a pretty lax exchange. The second is a real connection that goes through a handshake. There’s a limit to the number of peers, so very often what you get back is: thank you, but I’m already maxed out on peers, so I’m not even going to connect with you or give you much information about who I am.
When we’re able to connect, the first thing the software asks is what the latest block is and what capabilities the node exposes. During the handshake, we get the client’s name and version.
I get this information as a time series that can then be displayed over time. It’s useful when, say, you’re sleeping or having a good time on the weekend: wouldn’t you like to get an email telling you that your node is down? I think that’s pretty cool. With this, I can say: you gave your information and said you were willing to be part of the program, so here is more information about your node, and here are a bunch of monitoring alerts.
At this stage, I want to see if there are partners interested in picking that up, but the product is not complete yet.
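Proving node ownership is a classic challenge-response: the website issues a fresh random challenge, the operator signs it with the node’s private key, and the website verifies the signature against the public key the node already advertises. The sketch below uses only the JDK and is hypothetical throughout (`NodeOwnership` is not Tuweni code). One substitution is stated plainly: Ethereum node keys are secp256k1, which current JDKs no longer ship, so secp256r1 stands in to show the flow.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.Signature;
import java.security.SignatureException;
import java.security.spec.ECGenParameterSpec;

final class NodeOwnership {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in curve: real Ethereum node keys use secp256k1.
    static KeyPair newNodeKey() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("EC");
            gen.initialize(new ECGenParameterSpec("secp256r1"));
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Server side: a fresh random challenge per sign-up attempt, so an old
    // signature cannot be replayed.
    static byte[] newChallenge() {
        byte[] challenge = new byte[32];
        RANDOM.nextBytes(challenge);
        return challenge;
    }

    // Operator side: sign the challenge with the node's private key.
    static byte[] sign(PrivateKey nodeKey, byte[] challenge) {
        try {
            Signature s = Signature.getInstance("SHA256withECDSA");
            s.initSign(nodeKey);
            s.update(challenge);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Server side: ownership is proven only if the signature checks out
    // against the public key the node advertises on the network.
    static boolean verify(PublicKey nodeKey, byte[] challenge, byte[] signature) {
        try {
            Signature s = Signature.getInstance("SHA256withECDSA");
            s.initVerify(nodeKey);
            s.update(challenge);
            return s.verify(signature);
        } catch (SignatureException e) {
            return false; // malformed signature: ownership not proven
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```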
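The “your node is down” alert can be derived from the crawler’s time series. This hypothetical sketch (`NodeMonitor` is an illustrative name) records each successful contact as a timestamp plus best block, and flags a node when it has been silent longer than a threshold. It assumes observations arrive in time order, and it cannot distinguish a node that is down from one that is simply full on peers and refusing connections, which is the limitation described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical monitor fed by a crawler's successful handshakes.
final class NodeMonitor {
    record Observation(long atMillis, long bestBlock) {}

    private final Map<String, List<Observation>> series = new HashMap<>();
    private final long silenceMillis;

    NodeMonitor(long silenceMillis) {
        this.silenceMillis = silenceMillis;
    }

    // Appends one data point to the node's time series (assumed in time order).
    void record(String nodeId, long atMillis, long bestBlock) {
        series.computeIfAbsent(nodeId, k -> new ArrayList<>()).add(new Observation(atMillis, bestBlock));
    }

    List<Observation> history(String nodeId) {
        return series.getOrDefault(nodeId, List.of());
    }

    // Nodes whose latest observation is older than the silence threshold:
    // the candidates for an alert email.
    List<String> downNodes(long nowMillis) {
        List<String> down = new ArrayList<>();
        for (Map.Entry<String, List<Observation>> e : series.entrySet()) {
            List<Observation> obs = e.getValue();
            long lastSeen = obs.get(obs.size() - 1).atMillis();
            if (nowMillis - lastSeen > silenceMillis) {
                down.add(e.getKey());
            }
        }
        return down;
    }
}
```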
RBG: Does Apache Tuweni currently have any signup for alerts?
AT: Not yet. Currently, you can enter your key and go to the page of your node, but it’s not exposed by default. One of the big weaknesses of the project is that I am by no means a front-end developer. It looks like a five-year-old used a crayon and you should probably wear protective glasses when you open the website.
RBG: Is the implementation specific to the execution layer or is it specific to the consensus layer?
AT: All, actually. To give you an idea, those teams have a full agenda, and some of what they need is a reimplementation of things already available in Apache Tuweni. Tuweni has its own bytes library, which includes everything about handling bytes and making sure they’re okay. You can concatenate them, you can take a digest of them, you can hash them; you can use SHA-256 on them. All those things were not provided by Java, so we made them available. Hyperledger Besu is using that as well. I also did some additional work to use a native library to go faster.
The funny thing for me is that when I was starting this project, I was getting patches from the Besu and Teku developers. For about two weeks now, a developer, Adrian Sutton, has been a committer on both Besu and Apache Tuweni. Everyone has their own interests and wants faster hashing of bytes, yesterday. Many things are coming together in a real commons approach, to make sure the projects can go much faster and get to the interesting part of the work.
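To illustrate why a bytes library helps, here is a stand-alone Java sketch of an immutable byte-sequence value with concatenation, SHA-256, hex output, and value equality, none of which a raw `byte[]` gives you (arrays are mutable and compare by identity). `ByteValue` is a hypothetical class built on the JDK’s `MessageDigest`; it is not Tuweni’s actual bytes API, and it makes no attempt at the native acceleration mentioned above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical immutable byte-sequence value type.
final class ByteValue {
    private final byte[] data;

    private ByteValue(byte[] data) {
        this.data = data;
    }

    // Defensive copy so later changes to the caller's array cannot leak in.
    static ByteValue wrap(byte[] data) {
        return new ByteValue(data.clone());
    }

    static ByteValue concatenate(ByteValue... parts) {
        int total = 0;
        for (ByteValue p : parts) total += p.data.length;
        byte[] out = new byte[total];
        int offset = 0;
        for (ByteValue p : parts) {
            System.arraycopy(p.data, 0, out, offset, p.data.length);
            offset += p.data.length;
        }
        return new ByteValue(out);
    }

    ByteValue sha256() {
        try {
            return new ByteValue(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    String toHexString() {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : data) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    // Value semantics: two ByteValues are equal when their contents are equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof ByteValue other && Arrays.equals(data, other.data);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(data);
    }
}
```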
RBG: Will Apache Tuweni be completely open-source?
AT: Ah, there’s no choice there!
So you can do a bunch of things here, and maybe draw some parallels. There are a lot of Apache projects, like Kafka and Cassandra. Those projects have what’s called an open core: it’s open, and everybody can come and use it. Then companies make a premium edition and a corporate edition, with features like better security or better integration with some other system, things you wouldn’t get with the open-source version. What they sell on top usually adds support and patches, so you get an upstream version that is much better maintained and more secure. Some of the money would get kicked back to whoever is running it for these Ethereum clients.
I don’t think there’s money in it right away, and I don’t think it should be a business model; it needs to be a public good. Where things could start to become more interesting is when you start building all sorts of interesting services on top, which can become the commercial solution.
The open-source version can be just a website that you go to for all this information: you can get all that data and make sense of it. I think that’s one of those things to build on top of.
At my first job, we distributed some open source to the community in addition to our product. If you start with the open-source version, then the second you go to production you feel like, oh no, I actually need the real thing now: I need the enterprise edition, with all those features on top. Ethereum is a little different, but I think you can still apply some of that reasoning, even when everything is open source.
You could build this yourself at home, but why would you? I’m offering it on my website for a monthly fee, and I send you alerts when your stuff is down. Wouldn’t you want that kind of service?
One thing I learned from working with DeFi companies is that most of them are running so lean that they’ll happily buy a subscription to save time. They have so much VC money, but the thing that they fight for is time. They stitch together very complex APIs to get to market faster.
RBG: How did you hear about MolochDAO and what helped you to create a proposal for a grant?
AT: I learned about MolochDAO when it was formed, in 2019, because I was following Ameen Soleimani to some extent. I think there may have been a grants team with MolochDAO back then, because I got a grant from them to work on Eth2. The main idea was to do basic testing, or to have ideas about how we would go about testing those clients together. It was so early that most of this work fell by the wayside, but we built some metrics and some ideas about which metrics would be useful for Eth2. That was my first exposure to MolochDAO.
I think that was the first time that I was paid in tokens and it’s been quite the journey since then.
Later, a connection on Twitter was trying to prompt people, asking, remember MolochDAO? I said I do, and went to see what was going on. MolochDAO didn’t show much activity, and I thought of putting in a proposal. I wouldn’t take no for an answer, eventually found a way to submit something meaningful to the DAO, and amazingly it went through.
RBG: Where can people go to learn more about Apache Tuweni?
AT: You can go to the project’s main website, tuweni.apache.org.
If you’re interested in learning more about MolochDAO grants, visit our website.