Kafka Consumer Retry

Rob Golder
Published in Lydtech Consulting
11 min read · Nov 22, 2021


The ability for an application to retry is essential for recovering from transient errors, such as network connection failures, rather than simply failing the processing. When a flow is triggered by the consumption of an event, the consumer should be configured to retry on such retryable exceptions. However, there are a number of factors and pitfalls to consider with consumer retry, which this article explores.

Retryable Exceptions

There are many situations where retrying an action that threw an exception may succeed. Examples include:

  • A REST call to a third party service that returns a Bad Gateway (502) or a Service Unavailable (503) response, where the service may recover.
  • An optimistic locking exception on a database write, where another process has updated the entity.
  • A temporary database connection loss, or network connection error, both of which are considered transient errors.
  • Internal Kafka exceptions, such as an offset not yet being available because it is lagging, which will also usually recover on retry.
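
The distinction between retryable and non-retryable failures can be sketched in plain Java. This is a minimal illustration only, not the Kafka client or Spring Kafka retry mechanism: `RetryableException`, `RetrySketch`, `withRetry`, and the fixed back-off value are all hypothetical names and choices made for this example. It assumes the application wraps transient failures (a 502 or 503 response, an optimistic locking failure, a dropped connection) in a dedicated exception type, and treats everything else as not worth retrying.

```java
import java.util.concurrent.Callable;

// Hypothetical marker type for errors that may succeed on retry,
// e.g. a 502/503 response, an optimistic locking failure, a lost connection.
class RetryableException extends RuntimeException {
    RetryableException(String message) { super(message); }
}

final class RetrySketch {

    // Runs the action, retrying only on RetryableException, for at most
    // maxAttempts attempts in total. Any other exception propagates at once.
    static <T> T withRetry(Callable<T> action, int maxAttempts, long backoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (RetryableException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted: the caller decides what happens next
                }
                Thread.sleep(backoffMs); // fixed pause before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated downstream call that fails twice with a transient error.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new RetryableException("503 Service Unavailable");
            }
            return "processed";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

The retry is bounded by design: once the attempts are exhausted the exception is rethrown, so the caller still has to decide what to do with an event that never succeeds.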

If such exceptions are simply allowed to fail the process, perhaps writing an event to the dead-letter topic, then the system is brittle, and much time and effort will be spent subsequently trying to replay, or retrospectively fix…
