Why serve AI models on the edge?

Sanjay M
Published in Sparque labs
Aug 8, 2023

AI models on the edge network

Today, AI models are deployed primarily in the cloud. This article looks at some of the reasons and use cases for moving part of the AI compute to the edge of the network rather than leaving everything in the cloud.

Some reasons to decentralize the AI Compute Layer

Cost

The rush to obtain GPUs and centralize AI workloads in the cloud is an ongoing challenge for the hyperscalers and for their customers. Try to obtain a free GPU from one of the hyperscalers, or even a spot GPU instance to reduce your compute costs, and you will run into problems. With a single A100 or H100 costing tens of thousands of dollars, standing up a cluster of GPUs quickly runs into the hundreds of thousands, which is prohibitive for small companies and accessible mainly to well-funded corporations and hyperscalers.

Scarcity

The scarcity of GPUs is another cause for concern. With supply concentrated in effectively a single vendor, we are entering a period in which this new driver of business growth has a single point of failure.

Centralization

Similar to the scarcity issue, using the cloud centralizes your compute in the cloud provider's datacenter locations. While this may be necessary to consolidate your GPUs in a location that optimizes delivery of AI workloads, it also goes against the grain of recent trends in general-purpose compute, which has been moving closer to the user.

Technical and business risk, and unforeseen cost, of running “business critical” infrastructure in a centralized manner

Centralizing a critical business compute workload increases risk. If one critical part of your compute infrastructure becomes inaccessible, whether through a cyberattack, a network or power failure, a natural disaster, or other unforeseen events, the bulk of your business-critical infrastructure can be lost for an unknown amount of time, resulting in unforeseen financial cost.

Benefits of edge computing

As we get deeper into this new era of the “GPU wars”, we are possibly heading towards a “quadropoly” of compute.

We probably need to step back a little and consider the benefits of decentralizing AI compute workloads.

Doing so helps with:

  • Decentralization of the compute
  • Distribution of compute workloads
  • Greater failover mitigation
  • Higher availability for more users in worst-case scenarios
  • Reducing risk to a smaller part of your workload and customer base
  • Spreading compute workloads closer to the user
  • Not relying only on “quadropolies” for your business use cases

Privacy, GDPR, Geo restrictions

Another big reason to decentralize your AI compute workloads is to avoid running afoul of the recent increase in privacy and geo-location restrictions such as GDPR. Locating your AI compute workload closer to your users allays some of these concerns and lets you conduct business without legal complications.

Data privacy is another reason. As AI models get bigger and bigger, there will be additional restrictions on what data can be used and where it can be stored and processed. Decentralizing the AI compute workload allows you to locate it in the region where that data needs to be secured.

Factories, oil rigs, vehicles, office buildings, doctor’s offices

AI workloads will become so ubiquitous that they will appear in every area of business. There are compelling cases where whoever builds or deploys the application would not want data sent off to a centralized cloud and results sent back: factories, oil rigs, vehicles, even office buildings. All of these use cases are compelling reasons for decentralizing your AI compute workloads.

Recent advances for deploying “on the edge”

Of course, using an API from a centralized service like ChatGPT saves you the cost of maintaining your own cluster of compute. There are many benefits to doing this that we won’t elaborate on here, and most likely the vast majority will go down this road.

The same goes for those using the quadropolies’ datacenters for their AI compute workloads.

But, given the reasons above, there could be an opportunity in an alternative way of deploying AI workloads, one that enables companies to run smaller AI models “on the edge”.

It’s possible that not everyone needs a ChatGPT-level, über-scale model.

There have been enough advances in open source LLMs to be hopeful that alternatives exist, allowing companies with data, geo, privacy and other restrictions to decentralize their AI compute workloads.

With the Hugging Face LLM Leaderboard crowning a new model almost daily, we are seeing unprecedented advances in this area. Combined with other advances such as GGML-based models, the ease of choosing quantization levels for your LLM, and formats like ONNX that can optimize deployment, it is quite possible to pick a model that works “good enough” for your use case.

Hugging Face LLM Leaderboard
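As a concrete illustration, here is a minimal sketch (not a recommendation) of running one of these quantized open source models on a CPU-only edge box. It assumes the llama-cpp-python package is installed and a quantized GGML/GGUF model file has already been downloaded from the Hugging Face Hub; the model path, thread count and prompt are placeholders for your own setup.

```python
# Minimal sketch: run a quantized open source LLM on a CPU-only edge device.
# Assumes llama-cpp-python is installed and a quantized GGML/GGUF model file
# (the path below is a placeholder) has already been downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # placeholder local path
    n_ctx=2048,    # context window size
    n_threads=4,   # CPU threads available on the edge box
)

out = llm(
    "Summarize today's sensor anomalies in one sentence:",  # placeholder prompt
    max_tokens=128,
    temperature=0.2,
)
print(out["choices"][0]["text"])
```

The quantization level (Q4_K_M in the placeholder filename) is exactly the knob mentioned above: lower-bit quantization trades a little accuracy for a model that fits in the memory of a modest edge machine.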

But only by running through your business use cases can you answer whether “good enough” really is good enough.

What the new world of open source LLMs does provide is choice.

What we now need are easy ways to deploy and use these LLMs in a cluster that can run anywhere and everywhere.
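As a sketch of what that could look like, assuming the fastapi, uvicorn and llama-cpp-python packages and the same placeholder model file as above, the quantized model can be wrapped in a tiny HTTP service; packaged into a container image, the identical service could then be dropped into any edge location.

```python
# Minimal sketch: expose the quantized model as a small HTTP service that can be
# containerized and deployed to any edge location. Assumes fastapi, uvicorn and
# llama-cpp-python are installed; the model path is a placeholder.
from fastapi import FastAPI
from pydantic import BaseModel
from llama_cpp import Llama

app = FastAPI()
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)

class Prompt(BaseModel):
    text: str
    max_tokens: int = 128

@app.post("/generate")
def generate(req: Prompt):
    # Run the completion locally; the data never leaves the edge site.
    out = llm(req.text, max_tokens=req.max_tokens)
    return {"completion": out["choices"][0]["text"]}

# Run locally with: uvicorn edge_llm:app --host 0.0.0.0 --port 8080
```

The same image replicated across sites gives you the distribution, failover and data-locality benefits listed earlier, without each site needing datacenter-class GPUs.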

Summary

While this world of AI compute workloads is a new and exciting development, the above tries to point out some of the benefits of distributing compute capacity to the edges of the network.

While consumer devices like phones, computers and browsers will still be leveraged to run some local workloads, the opportunity provided by decentralizing your AI compute clusters would help for all the reasons outlined above.
