Enhancing Kafka Cluster Integrity: Introducing the Topic Creation Policy

Mehmetcan Güleşçi
Published in Trendyol Tech · Dec 7, 2023

In the CDN&Messaging team at Trendyol, our mission is clear: to redefine the standards of content delivery and messaging solutions. From fine-tuning the efficiency of our content distribution network to crafting seamless messaging experiences, our team is a driving force behind the technology that defines Trendyol.

Step into the shoes of a CDN&Messaging team member and you'll find a landscape of collaboration, innovation, and a shared passion for overcoming challenges. Our daily routine is a blend of strategic planning sessions, hands-on coding, and the thrill of solving complex problems. Every day brings new opportunities to learn, grow, and contribute to the technological evolution of our team and Trendyol as a whole.

Our portfolio of products reflects our commitment to excellence and our dedication to enhancing user experiences. At the core of our operations lies Kafka, a robust and scalable distributed streaming platform that fuels our messaging infrastructure. From handling massive data streams to ensuring real-time communication, Kafka is the backbone that empowers our team to deliver top-notch solutions to our users. In this article, we invite you to explore how we work with Kafka to uphold high quality standards, understand the significance of our products, and see firsthand how Kafka plays a pivotal role in shaping the future of messaging at Trendyol.

Introduction

In the ever-evolving landscape of distributed systems, Kafka remains a powerhouse for real-time data streaming. However, ensuring the integrity of a Kafka Cluster, especially a large, shared one, presents unique challenges. As the CDN&Messaging and Incubating teams, driven by a commitment to excellence, we introduce the Topic Creation Policy — a strategic initiative aimed at fortifying Kafka High Availability Configurations and addressing the pitfalls of inconsistent settings.

Understanding Kafka HA Configurations

Configuring Kafka — Because Data Deserves the VIP Treatment!

Before diving into the policy itself, it is useful to understand the two important Kafka topic settings it governs.

What is the Kafka Replication Factor Value?

The Kafka Replication Factor denotes the existence of multiple data copies distributed across several Kafka brokers. Configuring the Kafka Replication Factor is essential for enabling Kafka to ensure high data availability and prevent data loss in the event of broker failures or request handling issues. To enhance data security, it is always advisable to set the Kafka Replication Factor value to be greater than 1. This ensures that at least one data replica resides in another broker, accessible in case of a server failure.

The Importance of Replication Factor Value

When creating a topic, specifying the replication factor is crucial, as it directly affects system performance and durability. Changing it after the fact is costly: it increases network traffic and consumes additional space on brokers, which can cause unexpected performance decreases. It is therefore imperative to choose the right replication factor from the start.

What is the Kafka Minimum In-Sync Replicas Value?

The “min.insync.replicas” configuration, applicable at both the broker and topic levels, signifies the minimum number of in-sync replicas required for a broker to allow acks=all requests. For requests with acks=all, processing will be denied if the number of in-sync replicas falls below the configured minimum.

The Importance of Minimum In-Sync Replicas Value

The lead replica for a partition checks for a sufficient number of in-sync replicas to safely write a message, governed by the “min.insync.replicas” setting. The data remains in a buffer until the leader observes successful replication by follower replicas, at which point a confirmation is sent to the client.

The “min.insync.replicas” can be configured at both topic and broker levels. Data is considered committed when written to all in-sync replicas, with a value of 2 implying that at least two ISR (including the leader) brokers must confirm receipt of the data.

To ensure data is committed to more replicas, set the minimum number of in-sync replicas to a higher value. If, for example, “min.insync.replicas” is set to 2 for a topic with three replicas, writing to the partition requires at least two replicas to be in-sync. If two out of three replicas are unavailable, producers attempting to send data will receive a “NotEnoughReplicasException”.
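The acks=all behavior described above can be sketched as a simple gate on the leader. This is an illustrative Python model of the decision, not Kafka's actual code; the function and exception names are our own:

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for Kafka's NotEnoughReplicasException."""


def accept_write(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Decide whether the partition leader may accept an acks=all write.

    The leader counts itself among the in-sync replicas; if the ISR set
    has shrunk below min.insync.replicas, the request is rejected.
    """
    if in_sync_replicas < min_insync_replicas:
        raise NotEnoughReplicasError(
            f"ISR={in_sync_replicas} < min.insync.replicas={min_insync_replicas}"
        )
    return True


# Three replicas with min.insync.replicas=2: losing one replica is
# tolerated, but losing two blocks producers, as described above.
accept_write(3, 2)  # accepted
accept_write(2, 2)  # still accepted, exactly at the minimum
```

With only one replica left in-sync, the same call raises the error, which is the behavior producers observe as “NotEnoughReplicasException”.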

The Challenge

When Kafka Configurations Get Complicated — We Bring a Powerful Touch!

Despite clients diligently adhering to best practices, the inherent complexity of shared Kafka clusters posed significant challenges to Kafka Cluster integrity. Poor configurations persisted, degrading users’ experiences and triggering incidents. The consequences of inconsistent values for “replication.factor” and “min.insync.replicas” were especially noteworthy:

  1. Topic Unavailability during Broker Failures: Inconsistencies in “replication.factor” and “min.insync.replicas” values rendered topics unavailable when a broker went down. This unpredictability in the face of failure raised concerns about data accessibility and system reliability.
  2. Replication Factor vs. Min In-Sync Replicas: An imperative lesson learned was that the Replication Factor value must be greater than the Min In-Sync Replicas value. Failing to align these parameters properly risked undermining the fundamental principles of data redundancy and availability.
  3. Uselessness of Misconfigured Topics: Topics configured with the aforementioned inconsistencies became practically useless, as producing was disallowed. This not only disrupted data flow but also hindered the ability to utilize these topics effectively.

Past experience highlighted the vulnerability of Kafka Clusters to inconsistent values, impacting cluster health and operations. To address this, the CDN&Messaging and Incubating teams introduced the Topic Creation Policy.

Solution

Introducing Topic Creation Policy — Turning Kafka Chaos into Configuration Magic!

To address these challenges head-on, we developed the Topic Creation Policy. This initiative brings order to the chaos, providing a systematic approach to setting “replication.factor” and “min.insync.replicas” values based on a carefully crafted formula.

minIsr ≤ replicationFactor × (dataCenter - 1) / dataCenter

“dataCenter” is a parameter denoting the number of distinct locations across which brokers are evenly distributed; its value depends on the Kafka cluster architecture. Intuitively, the formula caps “minIsr” at the number of replicas that remain in-sync when one entire data center becomes unavailable.
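The largest safe “minIsr” falls out of the formula directly. The function name and the integer flooring are our own illustrative choices:

```python
def max_min_isr(replication_factor: int, data_centers: int) -> int:
    """Largest min.insync.replicas satisfying
    minIsr <= replicationFactor * (dataCenter - 1) / dataCenter,
    i.e. the replicas that survive the loss of one full data center."""
    return replication_factor * (data_centers - 1) // data_centers


# Three replicas spread over three data centers: losing one data center
# leaves two replicas, so min.insync.replicas may be at most 2.
print(max_min_isr(3, 3))  # 2
print(max_min_isr(3, 2))  # 1
```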

Key Rules of Topic Creation Policy:

Topics will be created successfully when the following criteria are met:

  1. The “replicationFactor” value must be greater than or equal to the dataCenter value.
  2. The “minIsr” value must be greater than 0.
  3. The “replicationFactor” value must be greater than the “minIsr” value.
  4. The “minIsr” value must be less than or equal to the result of the above formula.
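Taken together, the four rules amount to a single validation step at topic-creation time. This sketch mirrors the checks above; the function name and error messages are illustrative, not Trendyol's actual implementation:

```python
def validate_topic(replication_factor: int, min_isr: int, data_centers: int) -> None:
    """Raise ValueError if the settings violate the Topic Creation Policy."""
    # Rule 1: at least one replica per data center.
    if replication_factor < data_centers:
        raise ValueError("replicationFactor must be >= the dataCenter value")
    # Rule 2: a non-positive minIsr disables the durability guarantee.
    if min_isr <= 0:
        raise ValueError("minIsr must be greater than 0")
    # Rule 3: leave headroom so one replica loss does not block producers.
    if replication_factor <= min_isr:
        raise ValueError("replicationFactor must be greater than minIsr")
    # Rule 4: minIsr must survive the loss of one full data center.
    if min_isr > replication_factor * (data_centers - 1) / data_centers:
        raise ValueError(
            "minIsr exceeds replicationFactor * (dataCenter - 1) / dataCenter"
        )


# Accepted: replication.factor=3, min.insync.replicas=2
# in a three-data-center cluster.
validate_topic(3, 2, 3)
```

In a real deployment, a check like this could back a server-side `CreateTopicPolicy` plugin (configured via the broker's `create.topic.policy.class.name`), so misconfigured topics are rejected before they ever exist.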

Key Features of the Topic Creation Policy:

  1. Controlled Replication Factor: The Topic Creation Policy ensures that the “replication.factor” is set in a controlled manner, aligning it with the specific needs of the Kafka Cluster. This prevents scenarios where topics become inaccessible during broker failures, enhancing overall cluster robustness.
  2. Aligning Replication Factor and Min In-Sync Replicas: A crucial aspect of the Topic Creation Policy is the alignment of the Replication Factor with the Min In-Sync Replicas value. This strategic synchronization ensures that redundancy levels are maintained, and the Kafka Cluster operates seamlessly, even in challenging conditions.
  3. Formula-Based Configuration: Leveraging a well-thought-out formula, the Topic Creation Policy calculates the optimal values for “replication.factor” and “min.insync.replicas”. This formula considers the unique characteristics of the Kafka Cluster, adapting to the number of data centers and distribution of brokers.
  4. Protection Against Misconfigurations: The Topic Creation Policy acts as a safeguard against misconfigurations that might otherwise compromise the effectiveness of Kafka topics. By adhering to a standardized set of rules, topics are created with precision, reducing the risk that producing to them is disallowed due to inadequate configurations.

Conclusion: Fortifying Kafka for the Future

In the ever-evolving landscape of distributed systems, ensuring the integrity of Kafka Clusters is a shared responsibility. The Topic Creation Policy pioneered by the CDN&Messaging and Incubating teams represents a significant leap forward in achieving this goal. By instilling order in the topic-creation process and addressing the pitfalls of inconsistent configurations, this initiative fortifies Kafka Clusters, empowering users to harness the full potential of this powerful messaging system. As we embrace the future, let the Topic Creation Policy be a beacon of reliability and efficiency in the world of Kafka.

Thanks for reading 💛

More about us:

If you’re eager to join a team that embraces new technologies and thrives on facing fresh challenges every day, we invite you to join us.
