Migrating your Kafka workload on Cloud

Published in Storm Reply · 6 min read · Sep 12, 2023

In this article, we explore how AWS Lambda can be used to quickly integrate an on-premises Apache Kafka workload, and the microservices architecture that relies on it, with the cloud and to begin migrating it. With Lambda as the bridge, we can connect the workload to other AWS services and move selected microservices to serverless Lambda functions.

Apache Kafka is a distributed data streaming platform that allows its users to publish (produce) and retrieve (consume) continuous data flows in real time. It has been widely adopted over the last decade thanks to its high reliability, scalability, and performance. Moreover, its numerous plugins and off-the-shelf integration tools make Kafka a good fit for many use cases, spanning several domains from IoT to website activity tracking.

Among the extensive number of use cases, one of the most common is that of a data distribution element in microservices architectures. In this context, Apache Kafka allows decoupled communications between the various components of the system, enabling seamless scalability and flexibility. However, in 2014, when Kafka started gaining popularity, the majority of microservice architectures were (and plenty still are) hosted on legacy proprietary data centers. As a result, Apache Kafka itself was often hosted on these systems.

Nowadays, the benefits of cloud platforms over classical on-premises hosting are widely recognized. AWS, one of the leading cloud providers, highlights these advantages in its whitepaper on cloud adoption. Migrating legacy architectures to the cloud has therefore become important for keeping up with technological advancements and staying competitive, and cloud-based solutions often lead to substantial cost savings compared to their on-premises counterparts. However, many existing on-premises microservices architectures integrated with Kafka are highly intricate, so a complete migration to the cloud can be a time-consuming process that spans years. A more pragmatic approach is often a granular migration and integration with the cloud, which lets organizations benefit from the cloud while coping with the complexity of their systems.

In this article, we will explore a straightforward way to perform such a granular migration using one of the most widely used services offered by Amazon Web Services: AWS Lambda, a serverless, event-driven compute service. Specifically, we will look at migrating certain microservices to the cloud by refactoring each of them into a group of serverless functions, relying on Lambda's native support for Kafka. We will also show how the same method makes it easy to integrate on-premises Kafka workloads with other AWS services.

AWS Lambda integration for Apache Kafka

Lambda supports Apache Kafka as an event source. This means that we can define an event source mapping that links the production of new messages on a Kafka topic to the invocation of a function. Specifically, Lambda internally polls the topic for new messages and invokes the corresponding Lambda function with batches of records.
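To make this more concrete, here is a minimal sketch of a Python handler that Lambda could invoke with a batch of Kafka records. In the event payload, records are grouped under topic-partition keys and each value is base64-encoded. The assumption that messages are UTF-8 JSON and the `process_message` helper are illustrative only and not part of the setup described in this article.

```python
import base64
import json


def decode_records(event):
    """Flatten a Lambda Kafka event into a list of decoded messages."""
    messages = []
    # Records are grouped under keys such as "orders-0" (topic-partition).
    for records in event.get("records", {}).values():
        for record in records:
            raw_value = record.get("value")
            # Values arrive base64-encoded; a missing value is kept as None.
            value = json.loads(base64.b64decode(raw_value)) if raw_value else None
            messages.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                "value": value,
            })
    return messages


def process_message(message):
    """Hypothetical business logic; here it only logs the message."""
    print(f"{message['topic']}[{message['partition']}]@{message['offset']}: {message['value']}")


def lambda_handler(event, context):
    messages = decode_records(event)
    for message in messages:
        process_message(message)
    return {"processed": len(messages)}
```

Keeping the decoding in its own function makes it easy to test locally with a hand-written event before attaching the function to a real topic.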

Figure: Apache Kafka trigger configuration for Lambda in the AWS Console

As the image above shows, several parameters can be configured. The Kafka system to connect to is identified by the addresses of its brokers. These are followed by options related to data consumption, such as the topic name and the starting position. Lastly, the options at the end of the list govern how Lambda connects to the brokers. Here, Lambda supports different encryption and authentication methods, such as SASL/SCRAM, TLS, and mTLS, as well as VPC settings (for when Kafka is hosted in a VPC). Parameters that contain sensitive information, such as credentials and certificates, are stored in AWS Secrets Manager and referenced from the trigger configuration.
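The same trigger can also be defined programmatically. The sketch below builds the arguments for Lambda's `CreateEventSourceMapping` API for a self-managed Kafka cluster and passes them to boto3. The function name, broker addresses, topic, and secret ARN are placeholders we made up for illustration, and SASL/SCRAM-512 is assumed as the authentication method.

```python
def build_kafka_mapping_request(function_name, brokers, topic, secret_arn,
                                starting_position="LATEST", batch_size=100):
    """Build arguments for CreateEventSourceMapping (self-managed Kafka)."""
    return {
        "FunctionName": function_name,
        "Topics": [topic],
        # Where to start reading, e.g. "LATEST" or "TRIM_HORIZON".
        "StartingPosition": starting_position,
        "BatchSize": batch_size,
        "SelfManagedEventSource": {
            "Endpoints": {"KAFKA_BOOTSTRAP_SERVERS": list(brokers)}
        },
        # The URI points to a Secrets Manager secret holding the credentials.
        "SourceAccessConfigurations": [
            {"Type": "SASL_SCRAM_512_AUTH", "URI": secret_arn},
        ],
    }


def create_mapping(request):
    """Send the request to AWS; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without the AWS SDK
    return boto3.client("lambda").create_event_source_mapping(**request)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    request = build_kafka_mapping_request(
        function_name="orders-consumer",
        brokers=["broker1.example.internal:9092", "broker2.example.internal:9092"],
        topic="orders",
        secret_arn="arn:aws:secretsmanager:eu-west-1:123456789012:secret:kafka-credentials",
    )
    print(request)
    # create_mapping(request)  # uncomment to actually create the trigger
```

Other connection options, such as VPC subnets or a server root CA certificate, would be added as further entries in `SourceAccessConfigurations`.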

Thanks to these trigger parameters, Lambda can act as a consumer for one or multiple Kafka topics across various applications. As previously mentioned, this article focuses on a specific scenario: connecting to a system entirely hosted in a proprietary data center.

Architecture

To illustrate the advantages of Lambda's integration for Kafka, consider a microservices architecture hosted in a proprietary data center that employs Apache Kafka as its primary event distribution system.

The architecture depicted above illustrates the connection between the on-premises architecture (on the left) and AWS (on the right) through a series of Lambda functions. Specifically, two different applications are illustrated.

The first scenario concerns the migration of on-premises microservices (Services 1 and 2 in this case) to AWS. Specifically, each original service has been refactored into multiple serverless functions (Service 1.1–1.N and 2.1–2.N). This approach is necessary because a service in a classical microservices architecture is often composed of multiple functions, whereas each Lambda function implements a single one. Consequently, the refactoring typically produces a one-to-many (1:N) correspondence between services and functions.
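As a rough sketch of this 1:N refactoring, imagine that the original Service 1 handled both order creation and order cancellation. After the refactoring, each responsibility becomes its own handler with its own Kafka trigger. The topic names, field names, and statuses below are hypothetical and only serve to show the shape of the split.

```python
import base64
import json


def _decode_values(event):
    """Yield JSON-decoded values from a Lambda Kafka event."""
    for records in event.get("records", {}).values():
        for record in records:
            yield json.loads(base64.b64decode(record["value"]))


# Service 1.1: deployed as its own Lambda function, triggered by a
# mapping on a hypothetical "orders.created" topic.
def handle_order_created(event, context):
    return [{"order_id": v["order_id"], "status": "RESERVED"} for v in _decode_values(event)]


# Service 1.2: a second Lambda function, triggered by a separate
# mapping on a hypothetical "orders.cancelled" topic.
def handle_order_cancelled(event, context):
    return [{"order_id": v["order_id"], "status": "RELEASED"} for v in _decode_values(event)]
```

Each handler can then be deployed, scaled, and monitored independently, which is what allows only part of a service to be moved at a time.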

When performing this type of migration, it is crucial to identify the cases in which it is appropriate to do so. Lambda is particularly well-suited for applications with sporadic or variable workloads. Therefore, migrating certain on-premises microservices, which typically run on containers, to AWS Lambda functions requires a thorough examination of the workload handled. For a more in-depth analysis of the advantages and disadvantages of both AWS Lambda and containers, and to better understand the appropriate use cases for each, you can read the following article.

On the other hand, the second depicted scenario describes how the same approach (i.e. AWS Lambda support for Apache Kafka) can integrate the on-prem architecture with a wide range of AWS services, facilitated by Lambda's support for the majority of such services. Being able to integrate the proprietary data center with AWS by simply setting up some Lambda functions is one of the biggest advantages of this approach. The potential integrations are extensive, ranging from fundamental yet cost-efficient and high-performance computing and storage options to cutting-edge technologies like blockchain, quantum computing, and state-of-the-art machine learning models.
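As one possible example of such an integration, the sketch below archives every consumed Kafka record to Amazon S3. The bucket name and the key layout are assumptions chosen for illustration, and the S3 client is passed in as a parameter so that the logic can be exercised without AWS access.

```python
import base64


def archive_batch(event, s3_client, bucket):
    """Write each Kafka record value to S3 under topic/partition/offset.json."""
    keys = []
    for records in event.get("records", {}).values():
        for record in records:
            key = f"{record['topic']}/{record['partition']}/{record['offset']}.json"
            s3_client.put_object(
                Bucket=bucket,
                Key=key,
                Body=base64.b64decode(record["value"]),
            )
            keys.append(key)
    return keys


def lambda_handler(event, context):
    import boto3  # included in the Lambda Python runtime
    # "example-kafka-archive" is a hypothetical bucket name.
    return archive_batch(event, boto3.client("s3"), bucket="example-kafka-archive")
```

The same pattern applies to other targets: only the client and the call inside the loop change.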

Another aspect worth highlighting, relevant to both of the aforementioned use cases, is that AWS Lambda, despite being used primarily as a consumer so far, can also act as a producer. This versatility is essential in a microservices architecture, where services must be capable of both receiving events from and sending events to one another. Additionally, Lambda can be configured to interact with other on-premises resources, such as databases, providing a comprehensive and flexible solution.
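A minimal sketch of the producer side is shown below. It assumes the third-party `kafka-python` client is packaged with the function, since it is not part of the Lambda runtime, and that the brokers are reachable from Lambda. The broker address, topic name, and payload are placeholders.

```python
import json


def publish_event(producer, topic, payload):
    """Send one JSON-encoded event and wait until the client has flushed it."""
    producer.send(topic, value=json.dumps(payload).encode("utf-8"))
    producer.flush()


def lambda_handler(event, context):
    # kafka-python is a third-party package and must be bundled with the function.
    from kafka import KafkaProducer

    # Hypothetical broker address; authentication settings are omitted here.
    producer = KafkaProducer(bootstrap_servers=["broker1.example.internal:9092"])
    publish_event(producer, "orders.processed", {"source": "lambda", "status": "ok"})
    producer.close()
    return {"published": 1}
```

Flushing before the handler returns matters because the execution environment may be frozen right after the invocation, before buffered messages are sent.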

Enhancements

In the introduction of this article, we presented AWS Lambda as a viable solution for quickly initiating the migration of certain microservices to the cloud and integrating an on-prem system with cloud services. While this solution marks a significant stride towards cloud adoption, there are several other opportunities for further improvement.

A pivotal step in the migration journey involves migrating Apache Kafka, which can be accomplished using Amazon Managed Streaming for Apache Kafka (MSK), AWS’s fully managed version of the renowned event streaming platform. This migration promises enhanced performance, improved Kafka integration, and potential cost savings.

For on-premises microservices that were not migrated due to workload constraints with AWS Lambda, alternative AWS services such as ECS or EKS present viable migration options.
Furthermore, databases can also be migrated, and Amazon Web Services provides a range of versatile solutions for this purpose, including both SQL (e.g. RDS and Aurora) and NoSQL services (e.g. DynamoDB and DocumentDB).

Undoubtedly, these migration alternatives demand significant effort, but they offer many advantages, making them a compelling choice to consider, particularly if a partial and rapid integration and migration has already been accomplished using AWS Lambda.

We have discussed how AWS Lambda can be used for quickly initiating the migration of certain microservices to the cloud and integrating an on-prem system with other AWS services. Nonetheless, there are opportunities for further improvements, such as migrating Apache Kafka to Amazon Managed Streaming for Apache Kafka (MSK) and migrating on-premises microservices to other AWS services.


Written by Alessio Santangelo
