How Wix manages Schemas for Kafka (and gRPC) used by 2000 microservices

Natan Silnitsky
Published in Wix Engineering
8 min read · Mar 15, 2023

Managing Kafka event schemas for 2000 microservices can be quite challenging and painful. Key questions include:

  1. How can you maintain safe schema evolution and avoid catastrophic poison pills that will bring down your Kafka consumers?
  2. How can you make schemas easily discoverable and avoid wasting precious dev time?
  3. How can you enforce good documentation to avoid usage mistakes and severe data corruption?
  4. How do you automatically sync changes in schemas across different tech stacks to avoid duplication mistakes and stale versions?

Over the past few years, Wix has experienced tremendous growth in both scale and complexity: from a WYSIWYG visual website editor to a complete ecosystem of features that empowers self-creators and design agencies to create professional websites, and allows anyone from individuals to large enterprises to manage their online presence and business.

Intro article: How Wix Accelerated Open Platform Dev with Standardized APIs & Schemas

However, this growth introduced a technical challenge in managing a growing list of event schemas (for Kafka) and API designs (for gRPC) from hundreds of services.

Wix developers and API experts have implemented various tools and practices to make sure all the above challenges are addressed, dev velocity stays high, and production issues are kept to a minimum.

Wix schemas are defined with protobuf

Previously, the standard schema definitions at the Wix backend were written as Scala case classes serialized to JSON. These were used mainly for RPC, as well as for Kafka payloads (at the time, Kafka was less heavily used for microservices). Custom JSON-based de/serializers were created for the JVM and Node.js services (as JavaScript became a second language for the Wix backend).

Enter gRPC and protobuf
As Wix's scale grew tremendously, not only in terms of traffic (500 billion HTTP requests per day) but also in terms of product areas (many verticals such as hotels, restaurants, bookings, etc.) and developers (>1000 devs), a standard microservice API architecture was introduced, based on gRPC communication and protocol buffers (aka protobuf) IDLs.

Why Protobuf
Tools that generate client and server code from protobuf definitions are available for many languages; the format offers fast parsing and allows forward compatibility in many cases.

In order to avoid different standards for gRPC and Kafka, protobuf was also used to describe Kafka event schemas, not only gRPC services.
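
For illustration, a Kafka event schema defined with the same protobuf IDL might look like this (a hypothetical example; the package and field names are made up):

```protobuf
syntax = "proto3";

package com.example.orders;

// A hypothetical Kafka event schema, defined with the same protobuf
// IDL that is used for gRPC service definitions.
message OrderCreated {
  string order_id = 1;
  string buyer_id = 2;
  int64 created_at = 3; // epoch millis
}
```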

Discovering Schemas

Schema Registries like Confluent’s or Red Hat’s allow you to store your schemas in one central location. This can provide many benefits, such as:

  • Central location to track all schemas used in production
  • Document the data format required for each topic
  • Centralized control of schema evolution

In addition, they offer an API for Kafka producers & consumers and other clients (such as CloudEvents) to fetch the current version of the schema in order to understand how to serialize or deserialize events.
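
For example, with Confluent’s registry, a producer can be configured with the registry-aware protobuf serializer, which registers and fetches schemas on first use. A minimal sketch in Scala (the OrderCreated message class and addresses are hypothetical):

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")
// Confluent's registry-aware serializer (kafka-protobuf-serializer artifact)
props.put("value.serializer",
  "io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer")
props.put("schema.registry.url", "http://schema-registry:8081")

// OrderCreated is a protobuf-generated (Java) message class — hypothetical
val event: OrderCreated = ??? // build the event here
val producer = new KafkaProducer[String, OrderCreated](props)
producer.send(new ProducerRecord("orders", "order-123", event))
```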

See Confluent’s Schema Registry overview: https://docs.confluent.io/platform/current/schema-registry/index.html#sr-overview

Wix Docs
At Wix, we have our own web product for discovering and presenting schemas called Wix Docs. It is used for Wix third-party application development.

Screenshot of Wix publicly available API (REST, webhooks, SDKs)

Wix Docs utilizes build-tool plugins (e.g. for Bazel), which collect API & schema information from .proto, documentation.yaml and .md files.

The plugins generate OpenAPI swagger.json files and send them to the Wix Docs services, which store the documentation and present it.

Automatic Schema discovery & updates
In addition to Wix Docs, Wix also has a service called Business Schema that provides a list of business entities and a machine-readable “API descriptor” for each domain entity, with its actions and events. The Business Schema API and events allow Wix developers to build client libraries, IDE plugins, and other tools that dynamically interact with Wix APIs.

Serializing Schemas at Wix

Wix Kafka Client “Greyhound” takes care of serialization out-of-the-box
Kafka producers and Kafka consumers are not used directly in Wix microservices.

Instead, an additional layer called Greyhound is used. Greyhound seeks to provide a higher-level interface to Kafka and to express richer semantics such as parallel message handling or retry policies with ease.

Open Source Greyhound Schema flexibility
The open-source Greyhound producer and Greyhound consumer RecordHandler are flexible and allow for any type of payload de/serializer, as the sketch below shows.
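
Here is a sketch adapted from the open-source Greyhound README (the Scala Futures API); exact package names and signatures may vary between versions:

```scala
import com.wixpress.dst.greyhound.core.Serdes
import com.wixpress.dst.greyhound.core.consumer.domain.ConsumerRecord
import com.wixpress.dst.greyhound.core.producer.ProducerRecord
import com.wixpress.dst.greyhound.future._
import com.wixpress.dst.greyhound.future.GreyhoundConsumer.aRecordHandler
import scala.concurrent.{ExecutionContext, Future}

val config = GreyhoundConfig(Set("localhost:9092"))

// Deserializers are plugged in per consumer — any SerDe can be used.
val consumers = GreyhoundConsumersBuilder(config)
  .withConsumer(GreyhoundConsumer(
    initialTopics = Set("some-topic"),
    group = "some-group",
    clientId = "client-id-1",
    handle = aRecordHandler {
      new RecordHandler[Int, String] {
        override def handle(record: ConsumerRecord[Int, String])
                           (implicit ec: ExecutionContext): Future[Any] =
          Future(println(record.value))
      }
    },
    keyDeserializer = Serdes.IntSerde,
    valueDeserializer = Serdes.StringSerde))
  .build

// The producer likewise takes explicit serializers for each record.
val producer = GreyhoundProducerBuilder(config).build
producer.produce(
  record = ProducerRecord("some-topic", "hello world", Some(123)),
  keySerializer = Serdes.IntSerde,
  valueSerializer = Serdes.StringSerde)
```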

With open-source Greyhound, a Schema Registry can be used the same way it would be when using Kafka producers and consumers directly, in order to fetch the current schema. The Kafka-provided SerDe can be wrapped inside a Greyhound SerDe and configured for the producer and RecordHandler seen above.

Wix Greyhound default JSON Serializer/Deserializer
For Wix’s needs, Greyhound has an additional dedicated layer (which is not open-sourced). The “Wix” producer and consumer APIs provide out-of-the-box support for SerDe (Serializer/Deserializer).

While it is possible to override the SerDe, defaults are used for 99.9% of use cases.

For the Wix producer, the default serializer accepts protobuf-derived Scala case classes and serializes them to bytes using the JSON protocol.

For the Wix consumer, the default deserializer does exactly the reverse: it accepts bytes and deserializes them, using the JSON protocol, into protobuf-derived Scala case classes.
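
Conceptually, the default SerDe behaves like the following sketch, which uses the scalapb-json4s library for the proto3 JSON mapping (an illustrative assumption — this is not Wix’s actual implementation, and type bounds differ slightly across ScalaPB versions):

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scalapb.{GeneratedMessage, GeneratedMessageCompanion}
import scalapb.json4s.JsonFormat

// Serialize a protobuf-derived Scala case class to JSON bytes.
def serialize[T <: GeneratedMessage](event: T): Array[Byte] =
  JsonFormat.toJsonString(event).getBytes(UTF_8)

// Deserialize JSON bytes back into the case class, given its companion.
def deserialize[T <: GeneratedMessage](bytes: Array[Byte])(
    implicit companion: GeneratedMessageCompanion[T]): T =
  JsonFormat.fromJsonString[T](new String(bytes, UTF_8))
```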


Schema Evolution & Validation

Confluent Schema Registry provides compatibility checks as schemas evolve. Each schema change is assigned a unique version, and validation is done according to the compatibility type configured for the schema. The main types are BACKWARD (the default) and FORWARD.

When validation is turned on for a topic, the registry will not allow producers to produce a message according to a schema that will fail to be consumed because of backward or forward compatibility issues.
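
In protobuf terms, for example, adding a new field with a previously unused tag number is typically a backward-compatible change, while re-using or re-typing an existing tag number is a breaking one (hypothetical schema for illustration):

```protobuf
message OrderCreated {
  string order_id = 1;
  string buyer_id = 2;
  int64 created_at = 3;
  // Safe evolution: a new field with a previously unused tag number.
  string currency = 4;
  // Breaking change (avoid): changing the type of an existing field,
  // e.g. `int64 created_at = 3;` -> `string created_at = 3;`
}
```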

Wix Validation
Most of the schema validation at Wix is the responsibility of the developer, who must make sure they don’t introduce breaking changes. There are extensive internal API lifecycle guidelines that describe how schemas should be introduced and changed (including backwards compatibility, maturity levels, and versioning).

Automatic validation
To help developers avoid compatibility issues, a dedicated tool was developed to make sure there are no breaking changes in proto files.

Building and Syncing Schemas inside CI/CD pipeline

There are two major build systems at Wix.
One is based on Bazel, for JVM languages and services; the other, an internal build tool called Falcon, is for JavaScript/TypeScript and Node services.

Each has its own ways of sharing and compiling protobuf schema definitions.

Intra Bazel/Scala dependencies
Bazel dependencies can potentially be compiled from scratch (no semantic versioning!). For proto file dependencies between two Scala services built by Bazel, all compiled artifacts (including Scala jars created from proto files) are stored in caches to improve build performance. Recompilation is avoided as long as the proto files haven’t changed.

Bazel build targets
Protobuf build targets that collect all of a service’s proto files together are defined as follows:
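
Below is a hypothetical reconstruction of such targets; the actual macro arguments are internal to Wix and may differ:

```python
# BUILD.bazel — hypothetical sketch
wix_proto_library(
    name = "orders-proto",
    srcs = glob(["**/*.proto"]),
    deps = ["//common:domain-proto"],
)

wix_scala_proto_library(
    name = "orders-scala-proto",
    deps = [":orders-proto"],
)
```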

wix_proto_library targets are actually macros that encapsulate the creation of the original proto_library, as well as a proto_npm_package target that creates NPM packages for interoperability with Node.js services.

Jars containing the generated Scala files are the output of wix_scala_proto_library.

Intra Node dependencies
Node services rely on an in-house library called wix-proto-codegen that scans proto NPM package dependencies (in package.json) and generates JavaScript and TypeScript classes for them.
It utilizes the protobufjs npm package as its encoding engine.

package.json declaration
Usually wix-proto-codegen is run via package.json as follows:
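
Here is a hypothetical sketch of such a declaration (the actual script and package names are internal to Wix):

```json
{
  "scripts": {
    "postinstall": "wix-proto-codegen"
  },
  "dependencies": {
    "@wix/orders-proto": "^1.2.0"
  }
}
```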

A gRPC client is automatically generated once a dev adds the proto npm package as a dependency.

Scala->Node.js dependencies
In order for proto files originating in the Scala CI system to be consumable by the Node.js CI system, a dedicated service called proto-sync was created.

proto-sync listens to build events coming from Bazel via another Wix service, filters for proto targets only, and triggers a build event in the Node.js CI server. (The build artifact will be created first, if it hasn’t been yet.)

This build event creates a standard NPM module (with an upgraded version) and uploads it to the private NPM registry.

Soon this flow will change, and NPM packages will be generated directly from the Bazel build.

Node.js->Scala dependencies
In order for proto files originating in the NPM registry to be consumable by the Bazel build tool, they have to be copied to a dedicated GitHub repository, and Bazel build files have to be created alongside them. They can then be consumed as source dependencies by other Bazel targets, as sketched below.
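
A minimal sketch of what such a generated BUILD file could contain, using the standard proto_library rule (names are hypothetical):

```python
# BUILD.bazel generated alongside the copied proto files — hypothetical
proto_library(
    name = "node-orders-proto",
    srcs = glob(["**/*.proto"]),
    visibility = ["//visibility:public"],
)
```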

Summary

While popular Schema Registry solutions for Kafka, like Confluent’s, can be very helpful with schema management, they didn’t fit Wix’s unique needs and circumstances (supporting gRPC, Kafka, and derived external APIs in one holistic, standardized way).

Wix infrastructure and DevEx teams have created an extensive array of frameworks and tools to support schema management and development velocity. These include automatic discovery, serialization, and validation.

Meeting the challenges
Challenges included multiple technology stacks and many legacy services that required support and complex migrations.

All challenges were handled successfully, helped by an organizational mindset change: every single schema is treated carefully and thoughtfully, as it will most probably also be consumed by developers external to Wix.

Thank you for reading!

If you’d like to get updates on my future software engineering blog posts, follow me on Twitter, LinkedIn, and Medium.

You can also visit my website, where you will find my previous blog posts, talks I gave at conferences and open-source projects I’m involved with.

If anything is unclear or you want to point out something, please comment down below.
