Topic conventions

Ricardo Jesus
Wellhub Tech Team (formerly Gympass)
6 min read · Jul 23, 2024

What is the best name for a topic? Is there a conventional structure? What should be included in an event?

Industry examples aren’t easy to come by. Asynchronous communication isn’t often the preferred choice for public APIs.

Articles about the topic, albeit few, converge on some recommendations and rules that have proven themselves. This article attempts to bring some consistency: a guideline to use when deciding on a new topic and its contents.

· Topic naming conventions
· Event payload convention

Topic naming conventions

Structure

The first thing to discuss is a naming structure. A good structure is something similar to <field1>.<field2>.<field3>.

Other delimiters can be used, like _, but dots (.) are preferred. They split fields unambiguously because, unlike _ and -, they do not appear inside words.

Text format

Secondly, choose the text format to use. The common options are kebab-case, camelCase, and snake_case.

Avoiding capitalization prevents philosophical questions like which spelling is correct.

Together with a dot (.) delimiter, kebab-case is usually the preferred format.
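As a rough illustration, the structure and text format described above can be checked with a regular expression. The sketch below assumes dot-delimited fields, each in lowercase kebab-case. The function name `is_valid_topic_name` and the exact rules (at least two fields, digits allowed) are assumptions made for this example, not part of any standard.

```python
import re

# One kebab-case field: lowercase letters/digits, words joined by single hyphens.
_FIELD = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# Two or more fields separated by dots (hypothetical minimum, adjust as needed).
_TOPIC_NAME = re.compile(rf"{_FIELD}(?:\.{_FIELD})+")


def is_valid_topic_name(name: str) -> bool:
    """Return True if `name` is dot-delimited and every field is kebab-case."""
    return _TOPIC_NAME.fullmatch(name) is not None
```

A check like this could run in code review or in whatever tooling creates topics, so that names with capital letters or other delimiters are rejected early.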

Version

Topic names can be suffixed with their version. Breaking changes to a topic’s payload require a new version. This can be made explicit with vX, starting with v1. The first version can be omitted from the name.

Having the version in the name can be controversial. The approach may prove problematic when topics are created faster than they can be deleted, which is an issue when the provider limits the number of supported topics. Multiple topic versions also increase complexity for consumers (choosing and managing versions) and increase the number of deployed client instances. Another option is to add the version number to the message header.

Versioning in the topic’s name can help with transparency. Cardinality and complexity can be managed with good definitions and specifications of each topic’s use case.
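For teams that do keep the version in the name, a small helper can make the "omitted means v1" rule explicit. This is only a sketch: `topic_version` and `bump_version` are hypothetical names, and the `.vX` suffix is assumed to always be the last field.

```python
import re

# A trailing ".vX" field, where X is a positive integer.
_VERSION_SUFFIX = re.compile(r"\.v([1-9][0-9]*)$")


def topic_version(name: str) -> int:
    """Return the version encoded in the name, treating a missing suffix as v1."""
    match = _VERSION_SUFFIX.search(name)
    return int(match.group(1)) if match else 1


def bump_version(name: str) -> str:
    """Return the topic name to use after a breaking payload change."""
    base = _VERSION_SUFFIX.sub("", name)
    return f"{base}.v{topic_version(name) + 1}"
```

With this rule, both an unversioned name and an explicit `.v1` name lead to the same `.v2` successor.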

Visibility

  • Public
  • Private

private.<field>.<field>

public.<field>.<field>

A visibility prefix can make sense to mark cross-domain topics or to help control access and use. It makes explicit whether data is intended for internal processing within an area (domain) or can be used by others as a reliable data source. This does not replace access rights management.

Fields

Lastly, we must define the meaning of fields.

Here are some good practices to use when naming a topic:

  • Avoid using field names based on things that change. This reduces the cardinality of topics, improves searchability, and reduces ambiguity.
  • Avoid tying the topic name to a specific application or to a service. When used internally an exception can be made, but even then try to avoid it.
  • Avoid topic names based on information that would be stored in other places.
  • Avoid topic names based on their planned consumers/producers. This is essentially a special case of the first point.

Topic names need to provide scope, meaning, and intention. Simplicity and objectivity are crucial.

The fields suggested in Erman Terciyanlı’s “Kafka Topic Naming” are a good starting point: domain, classification, and description.

domain.classification.description

Domain

The domain is the main owner of the name and should be descriptive of the topic.

The domain provides the general context of a topic: the area of business it belongs to. Some examples:

  • account
  • access
  • billing
  • permissions

Classification

The classification gives us the type of the topic, and all topics using the same classification should have similar data. Content can be different, but there should be consistency in data formats.

The classification provides insight into the type of data the topic holds and its expected use. Some suggested examples are:

  • cdc: Change data is used to share information about an instance/entity. It includes the latest data. Pre- and post-change data can optionally be present when there is a change.
    This is usually the primary use case.
  • fct: Fact data is immutable information about something that happened at a specific time. No information for other parties.
    A common example is data coming from devices or user actions.
  • cmd: Command topics are used to send operations to the system, following the request-response pattern.
    An example is sending a write command to a device, which then returns a response with the result of the command.
  • sys: System topics are internal topics used within a single system or microservice. They are operational topics and do not contain any information intended for use outside of the owning system.

Description

Description is the part that gives details of the event. It can be the name of the object for a cdc event, protocol name of the command, type of the data in a fct event, or action type in a system topic.

Other fields

Extra fields can be added if necessary, either to divide similar descriptions into more granular subjects or to add context (like version or visibility).

Examples

  • inventory.cdc.warehouse.v1
  • web.fct.account.actions
  • public.billing.cdc.payments.v2
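To show how the pieces fit together, here is a minimal sketch of a parser for names that follow this convention. It assumes an optional visibility prefix, then domain, classification, one or more description fields, and an optional trailing version. `TopicName` and `parse_topic_name` are hypothetical names, and the set of classifications is simply the list suggested above.

```python
import re
from dataclasses import dataclass
from typing import Optional

VISIBILITIES = {"public", "private"}
CLASSIFICATIONS = {"cdc", "fct", "cmd", "sys"}
_VERSION_FIELD = re.compile(r"v([1-9][0-9]*)")


@dataclass(frozen=True)
class TopicName:
    visibility: Optional[str]  # None when the name has no visibility prefix
    domain: str
    classification: str
    description: str  # extra fields stay dot-joined, e.g. "account.actions"
    version: int  # a missing version field is treated as 1


def parse_topic_name(name: str) -> TopicName:
    """Split a topic name into its fields, raising ValueError if it does not fit."""
    fields = name.split(".")
    visibility = fields.pop(0) if fields[0] in VISIBILITIES else None
    version = 1
    version_match = _VERSION_FIELD.fullmatch(fields[-1]) if fields else None
    if version_match:
        version = int(version_match.group(1))
        fields.pop()
    if len(fields) < 3:
        raise ValueError(f"expected domain.classification.description, got {name!r}")
    domain, classification, *description = fields
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown classification {classification!r} in {name!r}")
    return TopicName(visibility, domain, classification, ".".join(description), version)
```

Applied to the examples above, `web.fct.account.actions` parses with the description `account.actions` and version 1, while `public.billing.cdc.payments.v2` yields visibility `public` and version 2.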

Topic event payload convention

Data structure

Prioritize using an object structure rather than making the data an array at the top level. It’s future-proofing for when things change and extra fields are needed.
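A small sketch of why the object form ages better. The field names (`items`, `sku`, `warehouse_id`) are invented for illustration only.

```python
import json
from typing import List

# Harder to evolve: the top level is an array, so there is no place for extra fields.
array_payload = json.dumps([{"sku": "a-1"}, {"sku": "b-2"}])

# Easier to evolve: the array lives under a key of a top-level object.
object_payload = json.dumps({"items": [{"sku": "a-1"}, {"sku": "b-2"}]})

# Later, the producer adds a field without touching the existing one.
extended_payload = json.dumps(
    {"items": [{"sku": "a-1"}, {"sku": "b-2"}], "warehouse_id": "w-9"}
)


def read_skus(raw: str) -> List[str]:
    """A consumer that depends only on the `items` key."""
    return [item["sku"] for item in json.loads(raw)["items"]]
```

The consumer reads both the original and the extended object payload unchanged, whereas adding metadata to the array form would force a breaking change.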

It’s also helpful to group related fields together (instead of alphabetically, for example). This comes in extremely handy in long payloads.

Data format

The data format must be consistent within a topic. Common formats are JSON and Avro.

Text format

snake_case is often used, but some industry guidelines have adopted camelCase. Per Google’s JSON style guide “… property names must be camel-cased, ascii strings” ².

Fields

As a general rule of thumb, the following fields should always be included in all events. They are crucial for consumers because they:

  • Prevent re-feeding, allowing consumers to discard duplicated events
  • Facilitate consistency, helping consumers find the correct offset in case of a fault or sudden restart
  • Keep all necessary consumer info in the message (e.g. using Kafka headers for timestamps, then moving to a system without headers)
  1. ID
    Include some ID that references the subject of the event.
  2. Type
    Include the type of event applied to/by the subject.
  3. Timestamps
    Timestamps provide handy insight into the operation.
    Systems like Kafka guarantee messaging order inside a partition, but when correlating messages from different partitions, timestamps facilitate creating a timeline.
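As one possible illustration of the first benefit, the sketch below discards duplicated events. It assumes that the combination of ID, type, and timestamp identifies an event, which may not hold in every system. `Event` and `DedupingConsumer` are hypothetical names, and the in-memory set stands in for whatever persistent store a real consumer would use.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Event:
    id: str  # references the subject of the event
    type: str  # what happened to/by the subject
    created_at: str  # timestamp, kept as text here for simplicity


class DedupingConsumer:
    """Processes each (id, type, created_at) combination at most once."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, str, str]] = set()
        self.processed: List[Event] = []

    def handle(self, event: Event) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        key = (event.id, event.type, event.created_at)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.processed.append(event)
        return True
```

If the same event is delivered twice, for example after a consumer restart, the second delivery is ignored.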

A note on timestamps

Some systems, like Kafka, will add a publish time to the header of a message. However, it’s useful to include timestamps in the payload for some situations, such as when the data is gathered at a different time to when it is published, or when a retry implementation is needed. Also, if additional consumers reprocess records later, a timestamp can give a handy insight into progress through an existing data set.

Good formats are:

  • Seconds since the epoch: 1615910306
  • ISO 8601 format, including timezone information: 2021-05-11T10:58:26Z
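Both formats can be read into timezone-aware values with the standard library. The helper names below are made up for this sketch, and rejecting timestamps without timezone information is an assumption that follows from the recommendation above.

```python
from datetime import datetime, timezone


def from_epoch_seconds(seconds: int) -> datetime:
    """Interpret seconds since the epoch as a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def from_iso8601(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, insisting on timezone information."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError("timestamp must include timezone information")
    return parsed
```

Because both helpers return timezone-aware values, timestamps in either format can be compared directly when building a timeline.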

Version

Consider adding a version field referring to the payload’s schema. It may help consumers parse or discard specific payloads. For example, a topic may be published by two different services, each using a different version. A consumer may expect a field that is present only in the newest version, and fail when consuming a message without it.

Breaking changes to the schema should be reflected (version bump) at the topic level.
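A consumer could use the version field along these lines. The sketch assumes a `version` key holding a major.minor value and treats a missing version as unsupported. Both choices, and the names used, are assumptions rather than part of the convention.

```python
# Major schema versions this hypothetical consumer knows how to parse.
SUPPORTED_MAJOR_VERSIONS = {2}


def should_process(payload: dict) -> bool:
    """Return True if the payload's schema version is one the consumer supports."""
    version = payload.get("version")
    if version is None:
        return False  # assumption: unversioned payloads are discarded
    # Accept numbers (2.6) and strings ("2.6"); only the major part matters here.
    major = int(str(version).split(".")[0])
    return major in SUPPORTED_MAJOR_VERSIONS
```

Discarding (or parking) unsupported payloads up front is usually cheaper than failing halfway through processing because an expected field is missing.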

Traceability

Events are usually published in logic triggered by other external factors: an API request, a cron job, consuming another event, etc. Often the logic producing the event will do HTTP calls or database operations, and it’s great to correlate them with the trigger.

Adding a correlation ID to the event can greatly improve the system’s observability and help trace issues.
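One way to carry the correlation ID is to reuse the one from the trigger when it exists and generate one otherwise. `build_event` is a hypothetical helper, and the `correlationId` key simply mirrors the camelCase naming discussed earlier.

```python
import uuid
from typing import Optional


def build_event(subject_id: str, event_type: str, correlation_id: Optional[str] = None) -> dict:
    """Build an event payload, propagating the trigger's correlation ID if present."""
    return {
        "id": subject_id,
        "type": event_type,
        # Reuse the incoming ID (e.g. from an API request); otherwise start a new trace.
        "correlationId": correlation_id or str(uuid.uuid4()),
    }
```

Logging the same ID in the HTTP calls and database operations made by the producing logic lets them be tied back to the event and its trigger.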

Examples

billing.cdc.payments.v2
{
  "id": "748e446a-d008-445b-9a87-5376d7901f0e",
  "type": "debit",
  "createdAt": "2024-07-14T10:58:26+01:00",
  "correlationId": "2c7fbd69-b591-46de-a151-5c1be13c6fd5"
}

web.fct.account.actions
{
  "userId": "615b39ba-aa47-450b-846c-a7c3da7019f7",
  "type": "update-password",
  "startedAt": "2020-02-20T10:10:10Z",
  "version": 2.6
}

Conclusion

A topic is good if it’s objective, contextual, and explicit. Make them future-proof by avoiding dynamic, unstable concepts. These points will help your event-driven architecture be more robust and scalable.

Remember that a standard is only good if it’s followed. Use these guidelines as a starting point for your topics, and adapt them to your needs.

References

² Google JSON Style Guide, “Property Name Format”: https://google.github.io/styleguide/jsoncstyleguide.xml?showone=Property_Name_Format#Property_Name_Format
