RNP-004: Render Network Foundation Team Review and ANTBIT AMA Recap

Render Network
Published Jul 26, 2023 · 11 min read

Last Friday, the ANTBIT team sat down with the Render Network Foundation’s Andrew Hyde and Ryan Shea for a community AMA hosted on Twitter Spaces. The AMA gave community members the opportunity to interact directly with the ANTBIT team and gain more insight into the technology behind the ANTBIT protocol and how it would be integrated into the Render Network’s systems.

During the same week, RNP-004, the related RNP from the ANTBIT team, passed its initial community vote and has since moved onto internal review by the Render Network Foundation team. Published below are the Foundation team’s thoughts and comments on the proposal.

RNP-004 Render Network Team Review

The Render Network team (made up of core members of the Render Network engineering team, community moderators and infrastructure developers) has reviewed RNP-004 to make sure that it meets the criteria of an effective RNP: that it is economically and technically feasible, that project costs are accounted for, and that any additional issues or topics that may block the RNP have been identified.

After lengthy discussions in the Discord and the AMA on Twitter, the Render Network team supports RNP-004 and is putting it up for a final Community Vote starting July 26th at 5pm PST. The vote lasts six days, ending on August 1st at 5pm PST.

RNP-004: Summary

As summarized by the architects of the RNP, “RNP-004: Compute Clients” is a proposed integration between the Render Network infrastructure and an API designed by the ANTBIT team, which would allow outside users to access the Render Network’s node pool for compute jobs such as machine learning, AI model training and other computational tasks in those fields.

In practice, this would increase the general usage of the Render Network’s node framework, providing more work for Node Operators’ machines while simultaneously expanding the operational abilities of the Render Network ecosystem into new fields. Of particular note is that this API implementation would not be a requirement but an opt-in feature for Node Operators who would like their nodes to participate in the ANTBIT protocol. The RNP can be read in its entirety here.

Render Network Team Feedback

While reviewing the proposal, members of the Render Network team made the following comments. Among them are specific areas that the team believes need to be further defined prior to any implementation, if the RNP is to go forward.

Areas that will need to be defined in the implementation phase are:

  • How queueing jobs between Render Network and ANTBIT will function specifically
  • A node opt-in mechanism for GPUs that will be available on both clients (Render Network and ANTBIT’s API)
  • Pricing and fee structure for ANTBIT compute jobs on Render Network
  • Dashboard features for Node Operators to manage computing work between Render Network and ANTBIT

AMA Recap

For those who were unable to attend live, the following is a condensed recap that includes quotes from the AMA transcript. The interviewers/AMA hosts were the Render Network Foundation’s Andrew Hyde and Ryan Shea, abbreviated as R/A, and the interviewee was Ahmad Shadid, CTO/CEO of ANTBIT, abbreviated as AS.

R/A: “Ahmad, do you want to give a little bit of background on yourself and how ANTBIT came to be?”

AS: My name is Ahmad, and I’m the CTO/CEO of the project… I started in quant trading around five or six years ago. Mainly I’ve been building low-latency trading systems, and it evolved from normal computation of backtesting strategies to actually using machine learning models. Through that journey, the algorithms were just scaling more and more, and the compute power required to maintain them became extremely hectic. Just to give you a little background on our trading system ANTBIT, before it pivoted to ML GPU compute and so on, we were building trading systems capable of monitoring more than 1,000 stocks in real time, plus 150 cryptocurrencies. And the monitoring is not just normal prices, but tick data prices. We’re talking about a volume of, I don’t know, billions to trillions of ticks a day.

R/A: “… it seems like ANTBIT was a product of your own personal problems with high compute costs. From a technical problem perspective, what are some of the same traits that you might bring from trading and high performance, low latency systems in the ANTBIT product itself?”

AS: I think one of the most important things in serverless GPU model training or deployment is how many seconds it takes to deploy. Let’s say you are an engineer and you would like to use a stable diffusion model. Now, you don’t need it all the time; what you need is that the moment you need it, it’s there. It’s not running before you need it, but the moment you need it, in less than two or three seconds it’s there, ready for your inference on any open source model or whatever…. Now we bring this performance optimization to such types of distributed GPUs.

R/A: “Can you tell me kind of … what is the goal of RNP-004 as you see it?”

AS: As we all know, Render [Network] is the go-to for GPUs currently in the crypto industry. There are definitely multiple people trying to solve the same challenge, basically providing this petrol for the AI revolution…. And that’s the advantage Render [Network] has: GPUs coming from gamers, these underutilized GPUs that weren’t built at the beginning for enterprise use cases. But actually, if you look at the statistics of the usage of these GPUs, most current AI/ML engineers are actually using the gamers’ GPUs, the RTX 3090, 4090, and so on. Even my friends behind the stable diffusion algorithm behind Midjourney are using the RTX 4090….

Now, the advantage that Render [Network] has is these community GPUs, the gamers’ GPUs, the designers’ GPUs. As we all know, they have so far been used for rendering, whether in 3ds Max and so on, but with a little bit of modification, running our engine inside, the same GPUs can become ML compatible. They could serve just like cloud computing, just like the cloud providers, AWS or Lambda clouds and so on. So the reason for this RNP is that we would like to request a way to access the GPUs that are under the Render Network.

And if we do that, once that exposes an API, we can then have a two-way partnership where the Render [Network] GPUs can still do the rendering jobs, but the implementation we want to build on these Render [Network] GPUs is a router. This router basically knows that, okay, this GPU is currently busy rendering Render Network jobs. But 80% or 60% or 50% of the time it’s not. So why not switch to ML compute at the points when it’s not needed for rendering?
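The router Shadid describes — rendering work keeps priority, and a node only receives ML work while it sits idle — can be sketched roughly as follows. All names here are hypothetical illustrations; the actual ANTBIT implementation was not shown in the AMA.

```python
# Hypothetical sketch of the job router described above: rendering jobs
# always take priority, and a node only receives ML work while idle.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    node_id: str
    render_job: Optional[str] = None  # active Render Network job, if any
    ml_job: Optional[str] = None      # active ANTBIT ML job, if any

def route_ml_job(nodes, job_id):
    """Assign an ML job to the first node not busy with rendering."""
    for node in nodes:
        if node.render_job is None and node.ml_job is None:
            node.ml_job = job_id
            return node.node_id
    return None  # no idle capacity right now; the job stays queued

def start_render_job(node, job_id):
    """Render work preempts ML work: any running ML job is returned
    to the queue so rendering can start immediately."""
    preempted = node.ml_job
    node.ml_job = None
    node.render_job = job_id
    return preempted
```

In this toy model, a node rendering a scene is skipped when ML jobs arrive, and an incoming render job evicts a running ML job back to the queue — matching the “rendering first, ML in the gaps” behavior described above.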

R/A: “What is the team at ANTBIT like? How many people do you have and, you know, what’s the scale of that?”

AS: Well, the team is growing really fast. Currently, we’re mainly focused on hiring the A team: machine learning engineers, infrastructure engineers. We just on-boarded senior engineers from Intel who were handling all the performance and GPU work at Intel. And the team’s just scaling right now. I think by the end of August or mid-September, the team will be around 28 people. By the end of this year, I think the team will reach around 42, give or take.

R/A: “Can you clarify that 70% savings you mentioned in the Discord discussion — is that for the exact same compute time and result, or would Render [Network] be slower and have some other drawback compared to the providers ML engineers are currently using?”

AS: There are these savings, and definitely there are some drawbacks. We can’t just say there’s no drawback, because first of all, it’s distributed GPUs. So you have the latency between the GPUs distributed across the planet….

But if we just look at it without the solutions — just look at it unbiased here — then other than that, you have different connectivity tiers for each GPU. Two GPUs could be located in the same city, but one has a gigabit of internet speed and the other has just 30 megabits. So those are some of the drawbacks that would definitely affect a cluster of GPUs.

The other drawback comes from the comparison with data centers: when GPUs are located in one data center, the bandwidth for downloading and uploading is usually really high or unlimited, and the internet speed is usually matched between all the nodes in the same data center.

R/A: “So these are the different drawbacks that you know of [on] the ANTBIT team, and what are you doing to address these things?”

AS: We are trying to solve them, and have already solved multiple of them. There’s the security drawback, which means there is a potential for security threats on these machines since they are basically untrusted devices: they could do a man-in-the-middle attack on the memory, they could listen to the network traffic, and so on.

Another thing is that, with all the things we are presenting, there is still definitely a drawback. That drawback could be 25 to 30% lower speed than having something in a data center. But let’s start with how we’re addressing some of them. First of all, the way ANTBIT works is by clusters, so the engineers that are going to use these GPUs are not going to rent just one GPU somewhere; they are usually renting a cluster of a hundred GPUs, a thousand GPUs or even 20,000 GPUs all working together like an ant colony. Now this type of use case requires that all these GPUs be very well connected in terms of connectivity speed. We all know you are only as fast as the slowest person on the team.

What happens is that when the engineers create the cluster, they have multiple selections, and all these selections affect prices. Now the first selection is mainly choosing which model of GPUs you want. Do you want the RTX 4090? Do you want the 3090?

Next, you choose where these GPUs should be located. Maybe you want your GPUs co-located in Hong Kong near the Hong Kong stock exchange, because you need low-latency trading. Or maybe you want the GPUs to all be in San Francisco because you’re inferencing something like Instacart drivers, predicting with your model the best route for your drivers to pick up orders from different shops. This co-location of GPUs is one of the biggest advantages of having a distributed network of GPUs. Since you can choose the location of the GPUs, you can say that you want 100% of this 10,000-GPU cluster to be in Austin, or all of them in San Francisco.

Next is tier selection. The connectivity tier is basically how we group the GPUs based on their internet bandwidth. If they have 100 megabits download/upload, that’s the low or medium tier. If you want 1 gigabit upload/download, that’s the high tier, and you pay that price.

So as a miner, or as a provider of GPUs for the Render Network, you’re incentivized to invest in your connectivity infrastructure because you will have a higher utilization rate, and when someone hires you, you’re going to be paid more. So you’re incentivized to go to your internet provider and subscribe to a higher connectivity tier to make sure your device can be upgraded to that level. The connectivity tier itself is really crucial, because sometimes I would like to train a model but I’m not so urgent about it. It’s not low-latency trading, it’s not some order-delivery route tracking; I have a long time and I’m not urgent. So why would I have to pay for cloud-level connectivity when I could just choose 20 or 30 megabits of internet speed, get the GPUs that have that internet speed, and save 90% of the cost? So we price the tiers.
After these tiers, there’s still the communication between these nodes, these GPUs. The way this communication happens is really robust. We use what is called a mesh network. Basically the traffic routes over the shortest distance, with the lowest ping, to the nearest GPU. So communication doesn’t go through a central place. A node talks to the nearest GPU, which sends the traffic on to its nearest GPU, until it reaches the target GPU. This solves a big part of the latency problem.
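The selection flow described above — filter the GPU pool by model, then location, then connectivity tier, with price following the tier — can be illustrated with a small sketch. All names, tier boundaries and hourly rates below are hypothetical, not ANTBIT’s actual API or pricing.

```python
# Hypothetical sketch of cluster selection: filter a pool of GPU nodes
# by model, city and connectivity tier, then quote a price by tier.
from dataclasses import dataclass

@dataclass
class Gpu:
    model: str   # e.g. "RTX 4090"
    city: str    # where the node is located
    mbps: int    # advertised up/down bandwidth

def tier(gpu):
    """Group GPUs by bandwidth, as described in the AMA (illustrative cutoffs)."""
    if gpu.mbps >= 1000:
        return "high"
    if gpu.mbps >= 100:
        return "medium"
    return "low"

# Illustrative hourly rates per GPU — not real ANTBIT pricing.
RATE_PER_HOUR = {"high": 1.00, "medium": 0.40, "low": 0.10}

def build_cluster(pool, model, city, wanted_tier, count):
    """Pick up to `count` matching GPUs and quote an hourly price."""
    picked = [g for g in pool
              if g.model == model and g.city == city
              and tier(g) == wanted_tier][:count]
    quote = len(picked) * RATE_PER_HOUR[wanted_tier]
    return picked, quote
```

The point of the sketch is the pricing lever Shadid describes: the same model in the same city costs a tenth as much at the low tier, so a non-urgent training job can trade bandwidth for cost.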

R/A: “I was wondering if you were able to sort of just portray for, you know, some of the folks in the audience today who probably host nodes on the Render Network, at a high level for a node operator — what does it mean to use ANTBIT from a security perspective, and what’s a high-level overview of the work you guys are doing to interface with my local machine if I’m a node operator?”

AS: That’s a great question. I mean, mainly the way we made the mining client — or let’s say the ML libraries that are going to run on these operators’ machines — is that it’s natively built on Docker. And the reason we chose Docker is that it’s really useful here.

First of all, it’s compatible with everyone and everything and all the libraries on the planet. That’s mainly the reason we chose Docker. The other reason is that in Docker you can contain an operating system without having access to the file system of the host machine. You can also contain access to the GPU — say you only want one GPU to be passed through into this Docker container. You can eliminate file access to the host machine entirely, and anything that happens inside the container will always remain inside the container. So there’s no way anyone could run a virus — you know, pretend they’re running a model and then run a virus inside the container — and affect your file system. It will always stay inside the container, and only the container will get corrupted; it won’t pass through to your file system.
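The isolation Shadid describes maps onto standard Docker flags: passing through a single GPU, a read-only container filesystem, and no host paths mounted inside. Below is a generic illustration of such an invocation — this is ordinary Docker usage, not ANTBIT’s actual client, and the image name is made up.

```python
# Illustrative construction of a `docker run` command with the kind of
# isolation described above. Generic Docker flags only; the image name
# is hypothetical.
def isolated_ml_container(image="example/ml-worker:latest"):
    """Build a docker run command that passes through one GPU but
    grants no access to the host file system."""
    return [
        "docker", "run", "--rm",
        "--gpus", "device=0",   # pass exactly one GPU into the container
        "--read-only",          # container's own filesystem is read-only
        "--tmpfs", "/tmp",      # scratch space lives only in memory
        # note: no -v/--mount flags, so no host paths are visible inside
        image,
    ]
```

Because the command includes no volume mounts, a malicious workload inside the container can corrupt only the container itself, matching the failure mode described in the answer above.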

To keep up to date with news surrounding the RNP, including discussion channels, links to the voting platform and more, please follow the Render Network at any of the platforms below.

Join us in the Rendering Revolution at:

Website: https://render.x.io
Twitter: https://twitter.com/rendernetwork
Knowledge Base: https://know.rendernetwork.com/
Discord: https://discord.gg/rendernetwork
Render Network Foundation: https://renderfoundation.com/
