Edge-AI trends in 2024

Nati Shalom
7 min read · Jan 23, 2024


The convergence of AI and edge computing will continue to mature, allowing for more robust real-time analytics and decision-making at the edge. Enhanced edge AI capabilities will reduce the need for data transmission to central locations in the cloud, ensuring faster responses and better privacy preservation. In this post I’ll cover the key Edge-AI trends in 2024 that are worth watching.

Automating Edge Operations with an AI Assistant (DevEdgeOps):

Managing numerous edge deployments can quickly become overwhelming. DevEdgeOps advocates reducing that complexity using a shift-left approach, where production issues are identified earlier, during the development phase. That said, automating edge operations with Infrastructure as Code (IaC) is fairly complex, especially in highly distributed edge environments, and must be balanced against the requirements of edge operating environments, which differ from those of traditional IT. Gen-AI and co-pilot-based edge automation development tools can significantly shorten the development of that automation code and help meet the stringent requirements of edge operations workloads. A recent study by McKinsey indicates a potential productivity improvement of up to 56%.

Figure 1: The benefits of using Gen-AI as a coding assistant, a.k.a. co-pilot. (Source: McKinsey)

AI-Based Edge Orchestration:

Next-generation edge platforms will include AI-based, policy-driven deployments. These policies will include dynamic workload migration and resource-optimization algorithms to ensure seamless workload distribution and efficient task execution, matching each workload to the right infrastructure based on location, edge topology, application availability, software/model versioning, dependencies between training and inference environments, and/or SLA criteria. A minimal sketch of such a placement policy follows.
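To make this concrete, here is a minimal, illustrative Python sketch of a policy-driven placement decision. The site attributes, workload requirements, and scoring weights are hypothetical stand-ins for whatever an AI-based orchestrator would learn or be configured with:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Site:
    """A candidate edge or cloud location (attributes are illustrative)."""
    name: str
    region: str
    gpu_free: int                 # free accelerators at this site
    latency_ms: float             # measured latency to the data source
    model_versions: set = field(default_factory=set)

@dataclass
class Workload:
    """An inference or training job with its placement requirements."""
    name: str
    region: str                   # data-residency / locality constraint
    needs_gpu: bool
    max_latency_ms: float         # SLA criterion
    model_version: str            # must match the training environment

def score(site: Site, wl: Workload) -> float:
    """Return a placement score; higher is better, negative means ineligible."""
    if wl.needs_gpu and site.gpu_free == 0:
        return -1.0
    if site.latency_ms > wl.max_latency_ms:
        return -1.0
    if wl.model_version not in site.model_versions:
        return -1.0
    # Prefer in-region sites and lower latency (weights are arbitrary).
    return (2.0 if site.region == wl.region else 0.0) + 1.0 / (1.0 + site.latency_ms)

def place(wl: Workload, sites: List[Site]) -> Optional[Site]:
    """Pick the best eligible site for the workload, if any."""
    scored = [(score(s, wl), s) for s in sites]
    eligible = [(sc, s) for sc, s in scored if sc >= 0]
    return max(eligible, key=lambda t: t[0])[1] if eligible else None
```

In a real platform the hand-written scoring function would be replaced by learned policies and would also have to account for topology, inter-service dependencies, and the cost of migrating running workloads.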

AI Inferencing across Edge and Cloud:

The future isn’t a binary choice between edge and cloud; AI workloads will flow seamlessly between the two, depending on complexity and resource requirements. The cloud will provide the training ground for powerful models, while the edge will handle fast-paced inferencing, delivering lightning-quick responses. Next-generation edge platforms will need to support end-to-end automation for delivering vertical industry solutions that span multi-cloud and edge. One common way to split inference across the two is sketched below.
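One practical pattern for splitting inference is a confidence-based cascade: a small on-device model answers most requests, and only low-confidence inputs are escalated to a larger model in the cloud. The sketch below is illustrative; `edge_predict`, the endpoint URL, and the threshold are placeholders for whatever model and service you actually deploy:

```python
import json
import urllib.request

CONFIDENCE_THRESHOLD = 0.8  # illustrative cut-off for escalating to the cloud

def edge_predict(features):
    """Run the small on-device model (placeholder implementation)."""
    # In practice this would call a quantized/distilled local model.
    label, confidence = "anomaly", 0.62
    return label, confidence

def cloud_predict(features, endpoint="https://example.com/v1/predict"):
    """Escalate to a larger cloud-hosted model (hypothetical endpoint)."""
    body = json.dumps({"features": features}).encode()
    req = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=2.0) as resp:
        return json.load(resp)["label"]

def classify(features):
    """Answer locally when confident; otherwise fall back to the cloud."""
    label, confidence = edge_predict(features)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label, "edge"
    try:
        return cloud_predict(features), "cloud"
    except OSError:
        # If the cloud is unreachable, degrade gracefully to the edge answer.
        return label, "edge (cloud unavailable)"
```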

The Rise of Micro AI:

2024 will see the rise of lightweight, hyper-efficient AI models designed specifically for resource-constrained edge devices. Imagine tiny AI brains embedded in everything from smartwatches to drones, making real-time decisions without relying on the cloud. This is an area where we expect to see a lot of innovation in 2024, which can be broken down into the following categories.

Domain-Specific and Task-Focused Models:

Instead of aiming for general-purpose language understanding like LLMs, task-focused models are trained on specific tasks such as machine translation, text summarization, or question answering. They often outperform LLMs on these tasks due to their focused training. Similarly, domain-specific models are trained on data from specific domains such as healthcare, finance, or legal documents. They offer a deeper understanding of the domain and can deliver more accurate and relevant outputs.

Smaller Models:

These models distill the extensive knowledge encapsulated in a large pre-trained model, such as an LLM, into a more compact and efficient model. This process is particularly advantageous for deployment on devices with limited resources or for minimizing computational overhead. A prime example of tooling for this approach is TensorFlow Lite, which is specifically designed for such purposes and allows for resource-sensitive model development targeted at the edge.
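As an illustration, the snippet below shows the standard TensorFlow Lite flow for shrinking a trained Keras model with post-training quantization so it can run on a constrained edge device. The tiny model here is a throwaway stand-in, for example for a distilled "student" of a larger model:

```python
import tensorflow as tf

# Stand-in for a trained model (e.g. a distilled "student" of a large model).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert to TensorFlow Lite with default post-training optimizations
# (weight quantization), producing a much smaller flatbuffer for the edge.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# On the device, the lightweight interpreter runs the quantized model.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
```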

Introduction of a New Class of Edge-Optimized AI Frameworks and Models:

Large Language Models (LLMs), such as GPT, have been the backbone of many AI-based solutions. As their name suggests, LLMs are very large models built for general language processing and typically rely on substantial compute and memory. However, edge-based use cases are quite distinct, often requiring real-time, stream-based processing within more constrained environments. This necessitates the development of new models specifically designed to address the physical constraints and unique use cases of edge computing. Here, I highlight some recent advancements that have the potential to revolutionize Edge-AI models:

  • “LLM in a Flash”: Apple recently published a research paper titled “LLM in a Flash,” proposing a novel technique for running LLMs on devices with limited memory, such as smartphones. The paper suggests that Apple is striving to keep pace with its Silicon Valley competitors in generative artificial intelligence. The researchers claim their approach “paves the way for effective inference of LLMs on devices with limited memory,” offering a solution to a current computational bottleneck. The paper also indicates that Apple is focusing on AI that can operate directly on an iPhone, rather than delivering chatbots and other generative AI services over the internet from large cloud computing platforms. The technique uses flash memory to store model data and load it on demand on iPhones with limited memory.
  • Liquid Neural Networks: Adaptable Brains at the Edge: Liquid Neural Networks (LNNs) are a state-of-the-art type of time-continuous Recurrent Neural Network (RNN) designed for continuous learning and adaptation at the edge. Unlike traditional RNNs, which operate in discrete steps, LNNs function like a flexible stream, constantly processing and adapting to new data in real time. This makes them particularly well suited for tasks involving time-series data, such as predicting future traffic patterns, analyzing sensor data from IoT devices, and understanding and reacting to changing environments in robotics (a toy sketch of the continuous-time idea follows this list).
  • Different Types of Generative AI to Improve Reinforcement Learning: Reinforcement learning (RL) is used extensively at the edge for control and analytics use cases. It involves an agent learning, or being trained, to make control-based decisions. RL can operate in complex environments and make state-based decisions that maximize a reward or minimize a cost. However, RL can be limited to use cases where the state space is small enough to learn effectively. New forms of generative models (normalizing flows or generative flow networks) offer the ability to manage much more complex environments.
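To give a flavor of the "time-continuous" idea behind LNNs, here is a minimal NumPy sketch of a leaky, continuous-time recurrent cell whose hidden state evolves as an ODE and is integrated with a simple Euler step. It is a toy illustration of the concept, not the actual LNN formulation or any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

class ContinuousTimeRNNCell:
    """Toy continuous-time cell: dx/dt = (-x + tanh(W x + U u + b)) / tau."""

    def __init__(self, n_inputs: int, n_hidden: int, tau: float = 1.0):
        self.W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # recurrent weights
        self.U = rng.normal(scale=0.1, size=(n_hidden, n_inputs))  # input weights
        self.b = np.zeros(n_hidden)
        self.tau = tau                      # time constant of the dynamics
        self.x = np.zeros(n_hidden)         # hidden state

    def step(self, u: np.ndarray, dt: float) -> np.ndarray:
        """Advance the hidden state by dt using an explicit Euler step."""
        dxdt = (-self.x + np.tanh(self.W @ self.x + self.U @ u + self.b)) / self.tau
        self.x = self.x + dt * dxdt
        return self.x

# Feed an irregularly sampled sensor stream: because the update is
# parameterized by dt, the cell naturally handles uneven time gaps.
cell = ContinuousTimeRNNCell(n_inputs=3, n_hidden=8)
for dt in (0.1, 0.05, 0.3):              # uneven sampling intervals
    reading = rng.normal(size=3)          # stand-in for an IoT sensor reading
    state = cell.step(reading, dt)
```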

General-purpose GPUs:

The exponential demand for GPUs, combined with dependency on a small number of vendors (Nvidia and AMD) for such a central piece of AI infrastructure across data center and end-user computing solutions, has led to a supply-chain challenge and thus to a spike in GPU prices. In 2024, we will see significant demand for new players and approaches offering more efficient and cheaper general-purpose GPUs and edge-specific accelerators. Some of the notable players in this category are:

  • Intel: Focuses on integrating its Arc GPUs with its CPUs for better performance in specific workloads.
  • Edge-Specific AI Accelerators: Companies like Sima.ai and others are building edge-specific AI accelerators from the ground up.

Non-GPU-Based AI Accelerators:

Alternatives exist for running AI workloads that do not rely solely on GPUs. For instance, Arm Neoverse CPUs, designed for High-Performance Computing (HPC) and AI tasks, can deliver competitive performance with the added benefit of lower power consumption. Google Cloud TPUs offer custom-designed AI accelerators, and Qualcomm provides Snapdragon Neural Processing Units (NPUs) based on digital signal processing.

Moreover, we are witnessing the emergence of hybrid scalar/vector processing architectures that can effectively support many workloads within the x86/ARM64 framework. The open-source RISC-V CPU architecture is also gaining momentum and could potentially pave the way for a more diverse and cost-effective range of AI hardware options in the future. The question remains whether GPUs will continue to be standalone resources or become integrated with standard CPU-based models.

Final notes

The AI landscape, still in its nascent stage, is evolving at an unprecedented rate. Consequently, we anticipate considerable disruption and fragmentation in the coming years, affecting both AI models and AI infrastructure, as well as key players in the field. It is therefore crucial to adopt an open architecture approach to manage this level of fragmentation, by decoupling the AI workload from vendor-specific AI infrastructure.

This can be accomplished using a combination of application frameworks such as:

  • Cloud Native: While Kubernetes is not an AI abstraction platform per se, it can be used as a platform for containerized AI workloads. This framework provides a degree of abstraction between the application and the underlying infrastructure, allowing specific GPUs, AI accelerators, etc., to be integrated at runtime through device-plugin configuration. Kubernetes includes stable support for managing AMD, Intel, and NVIDIA GPUs (graphics processing units) across the nodes in a cluster using device plugins (see the sketch after this list).
  • OpenCL and SYCL: These open standards enable developers to write code that can run on various hardware platforms, including GPUs from different vendors.
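For example, once a vendor's device plugin is installed on the cluster, a containerized AI workload can request an accelerator as a generic, countable resource and let the scheduler pick a suitable node. The sketch below uses the official Kubernetes Python client and the `nvidia.com/gpu` extended resource name exposed by NVIDIA's device plugin; the pod, container, and image names are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="edge-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="example.com/edge-inference:latest",  # placeholder image
                # The device plugin advertises GPUs as an extended resource,
                # so the scheduler places the pod only on a node with a free GPU.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The same pattern applies to other accelerators: the application requests an abstract resource, and the cluster maps it to whatever hardware a given node actually exposes.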

However, managing these combinations can become an operational challenge. This is where next-generation edge platforms come into play. Platforms such as Dell NativeEdge, Azure Stack Edge, and Google Distributed Cloud Edge offer a pre-integrated and modular stack. This allows customers and industry-specific solution providers to concentrate more on their core business and less on delivering a generic edge-AI infrastructure. According to a report by Grand View Research, the global edge AI market was valued at USD 5 billion in 2022 and is projected to grow at a CAGR of 24.8% from 2023 to 2032, attributed to the rising adoption of cloud computing globally; the software component of that segment accounts for 52.5% of the market share.

Figure 2: Edge AI market share, by component, 2022 (%)


Nati Shalom is a Fellow at Dell NativeEdge (ex-CTO & Founder of Cloudify and GigaSpaces).