Apache Kafka Top 5 Problems and Solutions for Huawei Cloud

Hakan GÜVEZ
Published in Huawei Developers
6 min read · Sep 29, 2023

Introduction

Hello all, I’m going to introduce “Apache Kafka Top 5 Problems and Solutions for Huawei Cloud”.

Apache Kafka is one of the most widely used open-source distributed event streaming technologies. Its use cases range from building and maintaining high-performance data pipelines to powering mission-critical applications. Before deciding whether to use it in your next project, you should understand the advantages and disadvantages of adopting Apache Kafka.

In this Medium article, I will discuss only five problems and their solutions. I have picked the top issues with using Apache Kafka to help give an overall picture.

The article describes how to categorize and troubleshoot Kafka problems.

Problems for Apache Kafka on Huawei Cloud

Why do we diagnose problems?

When you experience difficulty with Apache Kafka® (for example, an increasing number of connections to your brokers or some odd record batching), it’s tempting to treat these concerns as problems in and of themselves. However, as you will see, these concerns are frequently only symptoms of a larger problem. Instead of treating specific symptoms, wouldn’t it be preferable to go to the root of the problem with a proper diagnosis?

This article is for you if you want to improve your Kafka debugging skills and understand common difficulties as well as the trouble that a single symptom could be pointing to.

1- Server Busy Problem

With AMQP clients, Event Hubs immediately returns a Server Busy exception upon service throttling. It’s equivalent to a “try again later” message. In Kafka, messages are instead delayed before being completed.

The length of the delay is returned in milliseconds as throttle_time_ms in the produce/fetch response. In most cases, these delayed requests are not logged as Server Busy exceptions in the Event Hubs dashboards. Instead, the throttle_time_ms value of the response should be used as an indicator that throughput has exceeded the allocated quota.

When the traffic is too heavy, the service behaves as follows:

  • If the delay of a produce request exceeds the request timeout (request.timeout.ms), Event Hubs returns a Policy Violation error code.
  • If the delay of a fetch request exceeds the request timeout, Event Hubs logs the request as throttled and responds with an empty set of records and no error code.

Dedicated clusters have no throttling mechanisms. You are free to use all cluster resources.
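The throttling behavior described above can be summarized as a small decision function. This is only a sketch for reasoning about a response, not part of any Kafka client API: the function name, the ThrottleOutcome type, and the "produce"/"fetch" labels (fetch being the consumer request that the text calls download) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThrottleOutcome:
    throttled: bool
    description: str


def interpret_throttle(kind: str, throttle_time_ms: int,
                       request_timeout_ms: int) -> ThrottleOutcome:
    """Map a throttle_time_ms value to the service behavior described in the text.

    kind is "produce" or "fetch" (hypothetical labels used by this sketch).
    """
    if kind not in ("produce", "fetch"):
        raise ValueError("kind must be 'produce' or 'fetch'")
    if throttle_time_ms <= 0:
        # No delay was applied, so throughput is within the allocated quota.
        return ThrottleOutcome(False, "within quota")
    if throttle_time_ms <= request_timeout_ms:
        # The request was delayed but still completed within request.timeout.ms.
        return ThrottleOutcome(True, "delayed; throughput exceeded the quota")
    if kind == "produce":
        return ThrottleOutcome(True, "policy violation error code expected")
    return ThrottleOutcome(True, "empty record set and no error code expected")
```

For example, interpret_throttle("fetch", 45000, 30000) reports a throttled request for which an empty record set is expected.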

2- Connection Exceptions Problem

To debug an abnormal connection to a Kafka instance, perform the following checks:

  • Examining the Network
  • Examining Consumer and Producer Settings
  • Examining Java Clients for Common Errors
  • Checking the Go Client for Common Errors

Examining the Network:

Check that the client and the Kafka instance can communicate. Check the network if they are unable to connect.

For example, assuming SASL has been enabled for the Kafka instance, execute the following command:

curl -kv {ip}:{port}

If the network is functioning normally, the following information is displayed:

Network Check

If the network is abnormal or disconnected, the following information is displayed:

Network Disconnection

Solution:

  • Verify that the Kafka instance and the client are in the same VPC. A VPC peering connection should be established if they are not in the same VPC.
  • Verify that the security group rules are set up properly.
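The same reachability check can be scripted when curl is not available on the client. The sketch below uses only the Python standard library; the function name can_reach and the default timeout are assumptions. It verifies plain TCP connectivity only and does not perform the SASL or TLS handshake.

```python
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


# Usage (replace the placeholders with the broker address of your instance):
#   can_reach("{ip}", {port})
```

A False result points to the VPC or security group checks listed above rather than to a client configuration problem.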

Examining Consumer and Producer Settings:

Check the logs to see if the consumer and producer’s initialization settings match those specified in the configuration files.

If they differ, check the parameters in the configuration files.
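As a rough illustration of this comparison, the sketch below parses key=value settings and reports the keys whose values differ. The helper names parse_properties and diff_settings are hypothetical, and it assumes the logged settings have been copied into the same key=value form; list values may be rendered differently in client logs and would need normalizing first.

```python
from typing import Dict, Optional, Tuple


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def diff_settings(expected: Dict[str, str],
                  actual: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {key: (expected, actual)} for keys that differ or are missing."""
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }
```

An empty result from diff_settings means every setting in the configuration file was found with the same value in the logged settings.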

Examining Java Clients for Common Errors:

Error-1: Domain name verification is not disabled

Domain name deactivation

Solution: To disable domain name verification, leave the following parameter empty in the consumer.properties and producer.properties files:

ssl.endpoint.identification.algorithm=

Error-2: The SSL certificates fail to load

SSL certificate fails

Solution:

  • Verify that the client.truststore.jks file is present at the specified path.
  • Verify the permissions of the file and of the process.
  • Make sure that the ssl.truststore.password option is correctly set in the consumer.properties and producer.properties files. ssl.truststore.password is the server certificate password, and it can only be set to dms@kafka.

ssl.truststore.password=dms@kafka
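The three checks above can be run together before starting the client. This is a minimal sketch: the function name check_truststore is hypothetical, the settings are assumed to be already loaded into a dictionary, and ssl.truststore.location is assumed to be the key that holds the path to client.truststore.jks.

```python
import os
from typing import Dict, List


def check_truststore(props: Dict[str, str],
                     expected_password: str = "dms@kafka") -> List[str]:
    """Return a list of problems found in the truststore settings."""
    problems: List[str] = []
    location = props.get("ssl.truststore.location")
    if not location:
        problems.append("ssl.truststore.location is not set")
    elif not os.path.isfile(location):
        problems.append(f"truststore not found at {location}")
    elif not os.access(location, os.R_OK):
        problems.append(f"truststore at {location} is not readable by this process")
    if props.get("ssl.truststore.password") != expected_password:
        problems.append("ssl.truststore.password does not match the expected value")
    return problems
```

An empty list means that the file exists, is readable, and the password matches the value stated above.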

Error-3: The topic name is incorrect

Incorrect Topic Name

Solution: Create a new topic or enable automatic topic creation.

Checking the Go Client for Common Errors:

The error “first record does not look like a TLS handshake” is returned when the Go client is unable to connect to Kafka over SSL.

Solution: Enable the TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 or TLS_RSA_WITH_AES_128_CBC_SHA256 cipher suite (both are disabled by default) if the instance was created before January 2021. Enable the TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 cipher suite if the instance was created on or after January 2021.
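The rule above can be written as a small lookup. This is a sketch only: the function name suites_to_enable is hypothetical, and treating "January 2021" as a cutoff of 1 January 2021 is an assumption. The cipher suite names are returned as strings exactly as written above; enabling them in a client is left to the client library.

```python
from datetime import date
from typing import List

# Assumption: "January 2021" is interpreted as the first day of that month.
CUTOFF = date(2021, 1, 1)


def suites_to_enable(instance_created: date) -> List[str]:
    """Return the cipher suites the text recommends for the instance creation date."""
    if instance_created < CUTOFF:
        # Either of these two suites is sufficient for older instances.
        return [
            "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
            "TLS_RSA_WITH_AES_128_CBC_SHA256",
        ]
    return ["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"]
```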

3- Message Composition Errors Problem

Problem: The error message “A disk error occurred while trying to access the log file on disk” is returned.

Root cause: Broker disk usage is too high.

Solution: Expand the storage capacity as described in “Change Instance Specifications”.

4- Version Mismatching Problem

Event Hubs for Kafka Ecosystems supports Kafka versions 1.0 and later. Some applications using Kafka version 0.10 and later may work intermittently because of the backward compatibility of the Kafka protocol. However, using old API versions is strongly discouraged. Kafka versions 0.9 and earlier do not support the required SASL protocols and cannot connect to Event Hubs.
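The version ranges above can be checked with a short helper before deploying a client. This is a sketch under the assumption that the version is given as a dotted string such as "0.10.2"; the function name and the three return labels are hypothetical.

```python
from typing import Tuple


def _major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a dotted version string such as '0.10.2'."""
    parts = [int(p) for p in version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return parts[0], parts[1]


def compatibility(version: str) -> str:
    """Classify a Kafka client version against the ranges given in the text."""
    mm = _major_minor(version)
    if mm >= (1, 0):
        return "supported"
    if mm >= (0, 10):
        return "may work intermittently"
    # 0.9 and earlier lack the required SASL protocols.
    return "cannot connect"
```

Comparing (major, minor) tuples rather than strings avoids ordering "0.10" before "0.9".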

5- Kafka Manager Login Problem on Windows

Problem: After entering the Kafka Manager address in the Windows browser address field, the login fails and an error is displayed.

Root cause:

  • Either the Windows server and Kafka instance are not in the same VPC and subnet, or the security groups are misconfigured.
  • Kafka Manager is abnormal.

Solution:

1- Ensure that your Windows server and Kafka instance are in the same VPC and subnet.

  • If they are in the same VPC and subnet, go to step 2.
  • If they are not, change the VPC and subnet of the Windows server to be the same as those of your Kafka instance.

2- Make sure your security groups are configured correctly.

  • If your security groups are configured correctly, go to step 3.
  • If they are not, correct the security group settings.

3- Restart Kafka Manager in Kafka Console.

Conclusion

This Medium article covered five Apache Kafka troubleshooting topics for Huawei Cloud.

If you have any thoughts or suggestions, please feel free to comment. You can also reach me at guvezhakan@gmail.com, and I will try to get back to you as soon as I can.

You can reach me through LinkedIn too.

Hit the clap button 👏👏👏 or share it ✍ if you like the post.
