EXPEDIA GROUP TECHNOLOGY — DATA
Handling Incompatible Schema Changes with Avro
What should you do when you need to make a breaking change to your data model?
Apache Avro has a notion of schema compatibility that lets us determine whether a schema is compatible with one or more earlier or later schemas under a given compatibility constraint. That compatible changes exist necessarily implies that incompatible changes exist too. In such cases, what can we do to make these breaking changes while minimising disruption to consumers, be they stream or batch?
A breaking change means a carefully orchestrated migration and the disruption that comes with it. I therefore suggest avoiding breaking changes whenever possible, even if that means the desired end-state schema can only be achieved with compromise. It turns out that, depending on the compatibility mode, it is possible to reach at least a functionally equivalent schema, if not one that resembles the desired state, through a sequence of managed compatible changes.
This article demonstrates how an example incompatible change can be implemented as a sequence of compatible changes, with varying degrees of success.
A breaking change
Suppose we have a business requirement to change a string field containing a composite full name into a field holding a record that encapsulates the separate name elements. The transition from string to record is clearly a breaking change:
Current state
record Person {
  string name; // example: "Joan Smith"
}
Desired end-state
record Person {
  Name name;
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
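To see why this change is breaking in both directions, a toy model of record resolution can help. The sketch below is not the Avro library: it assumes a schema can be represented as a dict of field name to type and optional default, and `can_read` is a hypothetical helper that only checks field presence, type equality and defaults (no type promotions, unions or aliases).

```python
# Toy model, NOT the Avro library: a record schema is
# {field_name: {"type": ..., "default": ...}} with "default" optional.

def can_read(reader, writer):
    """Return True if data written with `writer` can be decoded using `reader`."""
    for field, spec in reader.items():
        if field in writer:
            # The field is present in the data: the types must match.
            if writer[field]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            # The field is absent from the data and the reader has no fallback.
            return False
    return True

current = {"name": {"type": "string"}}
desired = {"name": {"type": "Name"}}  # Name is the record with first_name/last_name

backward_ok = can_read(desired, current)  # new readers decoding old data
forward_ok = can_read(current, desired)   # old readers decoding new data
print(backward_ok, forward_ok)  # False False
```

Because the same field name carries two different types, neither old nor new readers can decode the other side's data, so no compatibility mode accepts the change as a single step.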
Compatible change sequence
Although the direct change is breaking, we now describe the steps required to arrive at a schema that is functionally equivalent to the desired end-state, if not aesthetically so. By avoiding a breaking change we minimise the interruptions to consumers that migrations between incompatible versions of a dataset would otherwise cause.
Step 1 — Add a default
Fields that have a default can be removed, so we add a default now in order to remove the field later. Choose a default value that holds no current meaning in consumer systems and that can later be used to identify the field as deprecated. Continue to populate the field with data for your consumers.
Note: this step is not required for BACKWARDS or BACKWARDS_TRANSITIVE, where fields may be removed without defaults.
record Person {
  string name = "<DEPRECATED>"; // example: "Joan Smith"
}
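The reason the default matters is that a reader still holding this schema falls back to it when the field is missing from the data. A minimal sketch of that behaviour follows, assuming records are plain dicts; `resolve` is a hypothetical helper written for illustration, not an Avro API.

```python
def resolve(reader, datum):
    """Decode `datum` against `reader`, filling defaults for absent fields (toy model)."""
    decoded = {}
    for field, spec in reader.items():
        if field in datum:
            decoded[field] = datum[field]
        elif "default" in spec:
            decoded[field] = spec["default"]
        else:
            raise ValueError(f"no value and no default for field {field!r}")
    return decoded

with_default = {"name": {"type": "string", "default": "<DEPRECATED>"}}
without_default = {"name": {"type": "string"}}

# A record written later in the migration, after `name` has been removed.
future_datum = {"person_name": {"first_name": "Joan", "last_name": "Smith"}}

decoded = resolve(with_default, future_datum)
print(decoded)  # {'name': '<DEPRECATED>'}

try:
    resolve(without_default, future_datum)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```

A consumer that still sees `<DEPRECATED>` at this point knows it is reading the placeholder rather than real data, which is why the default should be a value with no business meaning.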
Step 2 — Introduce the new field (possibly with a default)
Next we introduce the field we want in our end state. We cannot use the desired field name yet, however, because the existing field still occupies it. Additionally, for compatibility modes other than FORWARDS or FORWARDS_TRANSITIVE we must provide a default value. The producer should populate both fields, the existing and the new, with valid data. Now tell all consumers that they should start using the new field. Once they are all doing so, you can move on to the next step.
Note: If using FORWARDS_TRANSITIVE or FULL_TRANSITIVE, this is the best outcome you can expect.
record Person {
  string name = "<DEPRECATED>"; // example: "Joan Smith"
  Name person_name = {"first_name":"<NOT_IN_USE>","last_name":"<NOT_IN_USE>"};
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
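During this phase the producer writes both fields and consumers migrate at their own pace. The following sketch shows one way that could look, assuming records are plain dicts; `build_person` and `display_name` are hypothetical names, and joining first and last name with a space is an assumption made for the example.

```python
DEPRECATED = "<DEPRECATED>"
NOT_IN_USE = "<NOT_IN_USE>"

def build_person(first_name, last_name):
    """Producer side: populate both the old and the new field with valid data."""
    return {
        "name": f"{first_name} {last_name}",
        "person_name": {"first_name": first_name, "last_name": last_name},
    }

def display_name(person):
    """Consumer side: prefer the new field, fall back to the old one if it holds real data."""
    new = person.get("person_name")
    if new and new["first_name"] != NOT_IN_USE:
        return f'{new["first_name"]} {new["last_name"]}'
    old = person.get("name", DEPRECATED)
    if old != DEPRECATED:
        return old
    raise ValueError("record carries only placeholder defaults")

# A record written by the migrated producer.
print(display_name(build_person("Joan", "Smith")))  # Joan Smith

# An older record, decoded with the new schema so person_name holds its default.
legacy = {
    "name": "Joan Smith",
    "person_name": {"first_name": NOT_IN_USE, "last_name": NOT_IN_USE},
}
print(display_name(legacy))  # Joan Smith
```

Checking for the placeholder values lets a consumer handle records written both before and after the producer started populating the new field.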
Step 3 — Remove old field
Because our old field has a default and no consumers are using it any more, we can now remove it.
Note: If using BACKWARDS_TRANSITIVE, this is the best outcome you can expect.
record Person {
  Name person_name = {"first_name":"<NOT_IN_USE>","last_name":"<NOT_IN_USE>"};
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
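Whether this removal is allowed depends on which earlier versions the new schema is checked against. The sketch below illustrates the difference using a toy check, not the Avro library: schemas are assumed to be dicts of field name to type and optional default, and `can_read` is a hypothetical helper that ignores type promotions, unions and aliases.

```python
def can_read(reader, writer):
    """Toy check: can data written with `writer` be decoded using `reader`?"""
    for field, spec in reader.items():
        if field in writer:
            if writer[field]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True

NAME_DEFAULT = {"first_name": "<NOT_IN_USE>", "last_name": "<NOT_IN_USE>"}

v0 = {"name": {"type": "string"}}                                       # original schema
v1 = {"name": {"type": "string", "default": "<DEPRECATED>"}}            # after step 1
v2 = {**v1, "person_name": {"type": "Name", "default": NAME_DEFAULT}}   # after step 2
v3 = {"person_name": v2["person_name"]}                                 # after step 3

# Forward: earlier readers must decode data written with the new schema.
forwards = can_read(v2, v3)                                      # latest version only
forwards_transitive = all(can_read(old, v3) for old in (v0, v1, v2))

# Backward: the new schema must decode data written with earlier schemas.
backwards_transitive = all(can_read(v3, old) for old in (v0, v1, v2))

print(forwards, forwards_transitive, backwards_transitive)  # True False True
```

The transitive forward check fails because a reader still on the original schema has no default for `name`, which is why step 2 is the furthest the forward transitive modes can go.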
Step 4 — Remove default
Now we can remove the default from the new field. Note that this does not apply to FORWARDS, as the field can be declared in step 2 without a default.
record Person {
  Name person_name;
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
Step 5 — Rename field
And finally, we can effectively rename it by providing an alias with the desired name:
record Person {
  Name @aliases(["name"]) person_name;
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
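An alias is applied on the reading side: a reader field can pick up a value that the writer stored under one of its alternate names. The following toy sketch illustrates that lookup, assuming records are plain dicts; `resolve_with_aliases` is a hypothetical helper, not an Avro API, and the datum is an invented record written with the desired end-state field name.

```python
def resolve_with_aliases(reader, datum):
    """Decode `datum`, matching each reader field by name first, then by alias (toy model)."""
    decoded = {}
    for field, spec in reader.items():
        candidates = [field, *spec.get("aliases", [])]
        source = next((c for c in candidates if c in datum), None)
        if source is not None:
            decoded[field] = datum[source]
        elif "default" in spec:
            decoded[field] = spec["default"]
        else:
            raise ValueError(f"no value and no default for field {field!r}")
    return decoded

reader = {"person_name": {"type": "Name", "aliases": ["name"]}}

# Hypothetical record whose writer called the field `name`.
written_as_name = {"name": {"first_name": "Joan", "last_name": "Smith"}}
# Record whose writer used `person_name`.
written_as_person_name = {"person_name": {"first_name": "Joan", "last_name": "Smith"}}

print(resolve_with_aliases(reader, written_as_name))
# {'person_name': {'first_name': 'Joan', 'last_name': 'Smith'}}
print(resolve_with_aliases(reader, written_as_person_name))
# {'person_name': {'first_name': 'Joan', 'last_name': 'Smith'}}
```

Both records decode to the same shape, which is the sense in which the alias "effectively" renames the field without changing the name stored in the schema.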
Compatibility Modes
This section summarises the step sequences that can be applied in each compatibility mode, and what the best achievable outcome schema is in each case. Note that while the final schema may not be as succinct as the desired end-state schema, a great amount of disruption has been avoided that would otherwise have resulted from an incompatible change.
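One way to reason about how far each mode can progress is to replay the five steps against a simplified compatibility check and stop at the first step the mode rejects. The sketch below does this under stated assumptions: it is a toy model rather than the Avro library or any schema registry, schemas are dicts of field name to type, optional default and optional aliases, and `can_read`, `compatible` and `last_step_reached` are hypothetical helpers. It applies every step in every mode, including those the notes above mark as unnecessary for some modes.

```python
def can_read(reader, writer):
    """Toy check: can data written with `writer` be decoded using `reader`?"""
    for field, spec in reader.items():
        # Match by name first, then by any alias declared on the reader field.
        names = [field, *spec.get("aliases", [])]
        source = next((n for n in names if n in writer), None)
        if source is None:
            if "default" not in spec:
                return False
        elif writer[source]["type"] != spec["type"]:
            return False
    return True

def compatible(mode, new, history):
    """Check `new` against the latest schema, or all of `history` for *_TRANSITIVE modes."""
    others = history if mode.endswith("_TRANSITIVE") else history[-1:]
    backward = all(can_read(new, old) for old in others)
    forward = all(can_read(old, new) for old in others)
    direction = mode.split("_")[0]
    return {"BACKWARDS": backward, "FORWARDS": forward, "FULL": backward and forward}[direction]

NAME_DEFAULT = {"first_name": "<NOT_IN_USE>", "last_name": "<NOT_IN_USE>"}
OLD = {"type": "string", "default": "<DEPRECATED>"}
STEPS = [  # the schema as it stands after each step
    {"name": OLD},                                                          # 1: add a default
    {"name": OLD, "person_name": {"type": "Name", "default": NAME_DEFAULT}},  # 2: new field
    {"person_name": {"type": "Name", "default": NAME_DEFAULT}},             # 3: remove old field
    {"person_name": {"type": "Name"}},                                      # 4: remove default
    {"person_name": {"type": "Name", "aliases": ["name"]}},                 # 5: alias
]

def last_step_reached(mode):
    """Apply the steps in order and return the number of the last one the mode accepts."""
    history = [{"name": {"type": "string"}}]  # the current state
    reached = 0
    for number, schema in enumerate(STEPS, start=1):
        if not compatible(mode, schema, history):
            break
        history.append(schema)
        reached = number
    return reached

MODES = ["BACKWARDS", "BACKWARDS_TRANSITIVE", "FORWARDS",
         "FORWARDS_TRANSITIVE", "FULL", "FULL_TRANSITIVE"]
results = {mode: last_step_reached(mode) for mode in MODES}
for mode, step in results.items():
    print(mode, step)
# BACKWARDS 5
# BACKWARDS_TRANSITIVE 3
# FORWARDS 5
# FORWARDS_TRANSITIVE 2
# FULL 5
# FULL_TRANSITIVE 2
```

Under this model the non-transitive modes reach the aliased schema, while the transitive modes stop where the step notes say they should, because the original schema stays in the comparison set for the whole sequence.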
Final results
These are the best achievable outcomes available for each compatibility mode.
Backwards
record Person {
  Name @aliases(["name"]) person_name;
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
Backwards Transitive
record Person {
  Name person_name = {"first_name":"<NOT_IN_USE>","last_name":"<NOT_IN_USE>"};
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
Forwards
record Person {
  Name @aliases(["name"]) person_name;
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
Forwards Transitive
record Person {
  string name = "<DEPRECATED>"; // example: "Joan Smith"
  Name person_name = {"first_name":"<NOT_IN_USE>","last_name":"<NOT_IN_USE>"};
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
Full
record Person {
  Name @aliases(["name"]) person_name;
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
Full Transitive
record Person {
  string name = "<DEPRECATED>"; // example: "Joan Smith"
  Name person_name = {"first_name":"<NOT_IN_USE>","last_name":"<NOT_IN_USE>"};
}
record Name {
  string first_name; // example: "Joan"
  string last_name; // example: "Smith"
}
If you found this article useful or use Apache Avro in your projects, check out my post on the topic of Avro enums and other commonly asked Avro questions.