Scaling Services: Explicit Data Contracts Using Protocol Buffers

Explicit service data contracts provide safety, uniformity, and self-documentation. They remove the need for large classes of validation and decoding tests in an application by outsourcing that work to a separate library. Protocol Buffers (protobufs) provide a language-agnostic way to structure data: a schema is distributed to every service, which ensures that all of them can serialize and deserialize the same data. Such a language-agnostic representation also removes the need for large classes of contract testing and provides a foundation for RPC frameworks, automatic client generation, and automatic documentation generation.


Explicit contracts remove the need for much of the manual data validation inherent in implicit data schemas. Protocol Buffers minimize the amount of time necessary to write, update and reason about data serialization, structure and validation.

Problem

Consider two applications that communicate with each other using JSON over HTTP:

The only thing that strongly binds these services together is HTTP. If either service doesn’t speak HTTP, the two cannot communicate. The explicit relationship stops at the protocol level, meaning Service 1 can send ANY data it wants to Service 2, and the onus is on Service 2 to decide what is valid. When Service 1 sends data over HTTP to Service 2, it won’t receive feedback on the validity of that data until Service 2 tries to interpret (deserialize) it:

This results in an extremely long feedback loop and tight data coupling (with weak enforcement) between Service 1 and Service 2. The contract cannot be enforced until Service 2 is running, i.e. Service 2 is required to provide feedback on whether the data is valid. The goal of an explicit service payload is to extend the level of protection to the data level, drastically shortening the feedback loop and protecting the client as well as the server. Tools like Pact shorten the feedback loop, but at the expense of added CI and build complexity and a tighter coupling between Service 1 and Service 2 in CI. Pact also doesn’t protect the client at runtime.
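As a rough illustration of this feedback loop, the sketch below simulates both services in plain JavaScript, with no HTTP involved. The function names `buildPayload` and `parsePerson` and the specific validation rules are assumptions made for the example, not part of any real service.

```javascript
// Service 1: builds a payload. Nothing stops it from sending `id` as a string.
function buildPayload() {
  return JSON.stringify({ name: "hi", id: "1", email: "hi@auth0.com" });
}

// Service 2: hand-written validation, the only place the contract is enforced.
function parsePerson(body) {
  const data = JSON.parse(body);
  if (typeof data.name !== "string") throw new TypeError("name: string expected");
  if (!Number.isInteger(data.id)) throw new TypeError("id: integer expected");
  if (data.email !== undefined && typeof data.email !== "string") {
    throw new TypeError("email: string expected");
  }
  return data;
}

// Serialization on the Service 1 side succeeds without complaint...
const body = buildPayload();

// ...and the mistake only surfaces once Service 2 receives and parses the payload.
let rejection = null;
try {
  parsePerson(body);
} catch (e) {
  rejection = e.message;
}
console.log(rejection);
```

Service 1 has no way to learn about the bad `id` on its own; the error exists only inside Service 2, and only at runtime.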

The diagram of Service 1 and Service 2 shows no data-level connection between them, only an implicit relationship. Service 1 serializes and sends some data; Service 2 interprets and validates that data at runtime. The lack of a formal, enforceable contract that explicitly links the client and the server increases the chance of schema drift and contract errors: one service can introduce a backwards-incompatible change by mutating the schema, and the other service won’t discover it until runtime. Relying on an implicit structure also means a lack of shared tooling and the need for manual parsing and validation in potentially every service, language, and framework. The rest of the post looks at how Protocol Buffers can be used to address each of these issues.

Protocol Buffers

Protocol Buffers address the issues above by explicitly defining a data schema and having all dependent services serialize and deserialize data using that schema. With Protocol Buffers, Service 1 no longer depends on Service 2 to provide feedback on the data payload. Both depend on an enforceable schema:

This allows Service 1 to get immediate feedback when serializing data against the payload contract (modeled in a protobuf .proto file). The implicit, unenforceable data coupling between Service 1 and Service 2 has been removed. Because that coupling was runtime dependent, removing it lets Service 1 and Service 2 significantly reduce the feedback time on data schema validation:

The Service 1 client no longer depends on Service 2 at all to get feedback on whether its data honors the contract! Because the schema is represented concretely in a .proto file, that feedback is available as early as local testing during the development stage.

Protocol Buffers achieve the benefits above by requiring an explicit schema, an example of which is below:

syntax = "proto2";

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

One thing to note is that Protocol Buffers operate at the data level. They are only focused on enforcing data structure and types, not on how that data is used by the service.

Having an abstract data schema is foundational to providing advanced capabilities such as RPC frameworks, enforceable functional contracts (including request method types and responses), documentation generators, and automated client generators. By providing an explicit contract, Protocol Buffers pave the way for many features that support scaling services:

Runtime Client/Server Feedback

Protocol Buffers enable an explicit contract on both the client and the server at runtime. The client gets explicit feedback on validity during serialization (using protobufjs):

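const protobuf = require("protobufjs");

// Assumption: the Person schema shown earlier is saved as person.proto
const root = protobuf.loadSync("person.proto");
const Person = root.lookupType("Person");

const person = {
  name: "hi",
  id: 1,
  email: "hi@auth0.com"
};

// verify() returns null when the object matches the schema,
// otherwise a string describing why it does not
const err = Person.verify(person);
if (err) {
  // invalid data: enforceable on the client
  throw new Error(err);
}
const m = Person.create(person);
// encode for the over-the-wire payload
const payload = Person.encode(m).finish();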

The server achieves the same thing during decoding:

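let m;
try {
  m = Person.decode(payload);
} catch (e) {
  if (e instanceof protobuf.util.ProtocolError) {
    // required fields are missing: server enforceable,
    // no custom payload validation necessary
  } else {
    // the payload is not valid wire format (e.g. truncated or corrupted)
    throw e;
  }
}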

Protocol Buffers give the client a level of protection that is not achievable with an implicit data strategy. They also remove the need for hand-written payload parsing and structural validation on the server side.

Removal of Data Structure and Type Validation

The examples above illustrate the steps required to serialize and deserialize data. All logic around data types and parsing is outsourced to the language-specific protobuf libraries. This provides each client and service with free structure and type validation. One thing to note is that the contract is enforced at the payload level; it does not remove the need for application-specific validation. For example, the schema can guarantee that a certain field is a string, but the application may need additional assurances on top of that, perhaps around length or allowed characters.
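A minimal sketch of what that application-specific layer might look like, applied to a Person that has already been decoded. The length limit, the allowed-character pattern, the positive-id rule, and the `validatePerson` name are all hypothetical choices made for illustration.

```javascript
// Hypothetical application rules; the schema itself only guarantees the types.
const MAX_NAME_LENGTH = 64;
const NAME_PATTERN = /^[A-Za-z][A-Za-z '-]*$/;

// Takes an already-decoded Person and returns a list of rule violations.
function validatePerson(person) {
  const errors = [];
  if (person.name.length === 0 || person.name.length > MAX_NAME_LENGTH) {
    errors.push(`name: length must be between 1 and ${MAX_NAME_LENGTH}`);
  } else if (!NAME_PATTERN.test(person.name)) {
    errors.push("name: contains characters that are not allowed");
  }
  if (person.id <= 0) {
    errors.push("id: must be positive");
  }
  return errors;
}

const ok = validatePerson({ name: "hi", id: 1 });
const bad = validatePerson({ name: "hi<script>", id: 0 });
console.log(ok, bad);
```

Note that the function never checks `typeof`: by the time it runs, the protobuf library has already established that `name` is a string and `id` is an integer.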

Testing

Contract tests can be achieved by storing the protobuf schema (.proto) files in a shared location (e.g. a GitHub repo), which allows both client and server to reference the most recent version of the data schema. During tests, the client and server get immediate feedback on backwards-incompatible schema changes or a breach of the data contract. This gives teams feedback on data validity during local testing, greatly reducing the time it takes to get that feedback.

Backwards Compatibility / Contract Drift

Backwards compatibility is a concept baked into Protocol Buffers and achieved through field numbers and default values. Each field in a message has a unique number, conventionally assigned in increasing order. Always adding a new field with a new number, and never changing the meaning of an existing number, removes the chance that a field will change underneath a client. When a schema needs to be updated, a new field is added. The client can continue to write to both the old field and the new field until all dependencies are upgraded, at which point the client can stop writing to the old field. Having clear rules around updating data minimizes the risk of introducing backwards-incompatible changes.
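These rules are simple enough to check mechanically. The sketch below is one hedged way to do so in plain JavaScript: each schema version is modeled as a hand-written map from field number to name and type (a stand-in for a parsed .proto file, not a protobufjs structure). The `phone` field and the `findBreakingChanges` helper are hypothetical, and treating every removed field as breaking is a deliberately conservative assumption.

```javascript
// Each schema version: field number -> { name, type }.
const personV1 = {
  1: { name: "name", type: "string" },
  2: { name: "id", type: "int32" },
  3: { name: "email", type: "string" },
};

// Safe change: a new (hypothetical) field under a new number.
const personV2 = { ...personV1, 4: { name: "phone", type: "string" } };

// Unsafe change: field number 2 is kept but its type is mutated.
const personBroken = { ...personV1, 2: { name: "id", type: "string" } };

// Reports every existing field number that disappeared or changed type.
function findBreakingChanges(oldSchema, newSchema) {
  const problems = [];
  for (const [number, oldField] of Object.entries(oldSchema)) {
    const newField = newSchema[number];
    if (newField === undefined) {
      problems.push(`field ${number} (${oldField.name}) was removed`);
    } else if (newField.type !== oldField.type) {
      problems.push(`field ${number} changed type: ${oldField.type} -> ${newField.type}`);
    }
  }
  return problems;
}

const safe = findBreakingChanges(personV1, personV2);
const breaking = findBreakingChanges(personV1, personBroken);
console.log(safe, breaking);
```

Adding field 4 produces no findings, while mutating field 2 is flagged, which mirrors the "add, never mutate" rule described above.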

Documentation/Tooling

Protocol Buffers are self-descriptive, which lends itself well to documentation. Consider a JSON-based service. Often the rules for interpreting data live in the server code behind a framework-specific API. If the service doesn’t expose OpenAPI or Swagger artifacts, users have to look in the server code whenever there are questions about types, data structure, or validity. Contrast this with Protocol Buffers, where the data structure is explicitly available in the .proto file. Using the Person example above:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

It’s obvious what data is required and what data is optional to model a Person. The schema exists outside of both clients and servers and is unambiguous about which fields are required and what their types are.

Having a structured uniform data representation is also foundational for advanced tooling such as automated documentation generators, RPC frameworks, and automated client libraries.
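To give a feel for why a uniform representation makes such tooling straightforward, here is a minimal sketch of a documentation generator. It works on a hand-written description of the Person fields (a stand-in for a parsed schema, not a protobufjs structure), and `describeMessage` is a hypothetical helper, not an existing tool.

```javascript
// Hand-written stand-in for the parsed Person schema.
const personFields = [
  { number: 1, rule: "required", type: "string", name: "name" },
  { number: 2, rule: "required", type: "int32", name: "id" },
  { number: 3, rule: "optional", type: "string", name: "email" },
];

// Renders a heading line followed by one line of documentation per field.
function describeMessage(messageName, fields) {
  const lines = fields.map(
    (f) => `  ${f.name} (${f.type}, ${f.rule}, field ${f.number})`
  );
  return [`${messageName}:`, ...lines].join("\n");
}

const docs = describeMessage("Person", personFields);
console.log(docs);
```

Because every message follows the same shape, the same few lines would work for any schema, which is what makes generic generators practical.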

Conclusion

This post explored the advantages of explicit data structures using Protocol Buffers as compared to JSON. As with everything, there’s no silver bullet. Protocol Buffers provide both client and server with stronger guarantees about data structure and types, but they come with their own tradeoffs. What they do achieve is an explicit model of the data, which gives both client and server a contract on what constitutes valid data. This contract can then be used to significantly shorten the feedback loop on payload validity and help decouple client and server in terms of payload validation.

Resources