Combining a schema and schema-less model using MySQL

Published in

rezdy-engineering

4 min readNov 21, 2018

A schema-less database model can increase the speed of implementation, making changes to data models easier to manage in many cases. In this article, I will attempt to describe a solution to introduce a schema-less data model into an existing monolith database model.

Benefits of a schema-less document model compared with a traditional relational model

First of all, data design is simpler and defined in a single place only. In typed languages, we normally have a backend model with entities and their fields. It is extra work to manage a schema definition one more time for the database layer when creating or altering tables, e.g. mapping a String field to a VARCHAR column, creating a foreign key relation if we already have a list of child entities. We already have the business logic in place to validate these references and guard the data types.

A model manipulation and its format is more feasible to consume as we do not need to introduce an additional ORM layer to normalize the data to fit the relations and then reconstruct them back on load. Imagine having a PC with a relational model. Every time we stop using the PC, we would have to disassemble it into its component parts and store them separately, and to start using it, reassemble it from those parts. In many business scenarios, we might simply want to pass a tree model e.g. a JSON document from a client or business function instead.

Furthermore, a schema-less model is a natural fit for agile development practices and continuous integration, as it does not require pre-configured data types and risky schema migrations when downtime is not an option (e.g. blocking a table when altering its structure).

Choosing the right solution

MySQL might not seem like an obvious choice for this approach as the schema-less model is not a first-class citizen when compared to other solutions such as Mongo DB which is essentially built for document storage. However, MySQL offers a JSON data type with a solid set of functions for ongoing document migration, storage and retrieval. This data type allows us to store in a single table, a schema-less part of the model in JSON columns together with other relational columns. Let’s have a look at when it is practical to use this approach.

Benefits compared to a separated NoSQL storage

Firstly, consistency and transaction management with the rest of the relational model. We have an option to update both documents and the other relational data in one transaction.

Secondly, the flexibility in model design. All the indexed, performance critical and reference columns can be defined as standard columns, while everything else can be stored as a JSON tree.

If we have an existing MySQL database in place. We can reuse an existing model, existing libraries and utility classes. E.g. reuse the existing base classes with Long type Ids, instead of building everything from scratch for pure NoSQL storage.

Lastly, well not always, we have an option to introduce a new data structure into an existing project. The DevOps overhead to introduce a new database solution can be a blocker and in my case, our project already had MySQL database hosted in RDS AWS service with all the necessary security, replication and monitoring in place.

Implementation

Let’s have a look at an example Java implementation with JPA as ORM, to see how we might define and use a combined schema and schema-less model. The model is related to Rezdy and the tourism industry domain. The entity will store channel manager settings used for a company’s (a supplier of tours and/or activities) mapping purpose with various Online Travel Agents (OTAs). Such structures tend to change quite often and can contain deeply nested child entities, and therefore are good candidates to simplify development using a JSON document. For the simplicity of this example, I will use only one OTA referred to as “ExampleOta”.

CREATE TABLE company_channelmanager_settings (
    ID                    BIGINT       AUTO_INCREMENT    NOT NULL,
    COMPANY_ID            BIGINT(20)                     NOT NULL,
    EXAMPLE_OTA_SETTINGS  JSON                           NOT NULL,
    PRIMARY KEY (ID),
    CONSTRAINT FK_COMPANY_ID FOREIGN KEY (COMPANY_ID) 
                          REFERENCES company (ID)
);

JPA entity with converter

One option to map data to the JSON column is to use a converter from a backend model to JSON string. With the converter, we can still keep our backend model type-safe and just convert it to a JSON string upon entity persistence.

@Entity
@Table(name = "company_channelmanager_settings")
public class CompanyChannelManagerSettings extends AbstractJpaEntity {
  @OneToOne
  private Company company;

  @Convert(converter = ExampleOtaSettingsStringConverter.class)
  private ExampleOtaSettings exampleOtaSettings;

  // getters and setters
}

To implement the converter we can use an arbitrary JSON serializer. JacksonUtils in the example is a utility class which uses Jackson serializer under the hood with an additional cache of the JSON reader and writer instances for a better performance

@Converter
public class ExampleOtaSettingsStringConverter 
    implements AttributeConverter<ExampleOtaSettings, String> {

  @Override
  public String convertToDatabaseColumn(ExampleOtaSettings exampleOtaSettings) {
    try {
      return JacksonUtils.writerFor(ExampleOtaSettings.class)
          .writeValueAsString(ExampleOtaSettings);
    } catch (JsonProcessingException e) {
      throw new IllegalStateException(e);
    }
  }

  @Override
  public ExampleOtaSettings convertToEntityAttribute(String string) {
    try {
      return JacksonUtils.readerFor(ExampleOtaSettings.class)
          .readValue(string);
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}

The code snippet below updates this model. We can simply use merge on the JPA entity as we don’t need to care about merging any child relations as the exampleOtaSettings is schema-less from the JPA perspective.

@Stateless
public class CompanyChannelManagerSettingsService {

  public Response updateSettings(@Nonnull Long companyId,
    @Nonnull CompanyChannelManagerSettings settings) {

    ValidationContext ctx = new ValidationContext();

    if (validSettings(ctx, companyId, settings)) {
      companyChannelManagerSettingsDao.merge(settings);
    }

    return new Response().with(ctx, settings);
  }
}

To sum it up, a schema-less model can be beneficial for development and choosing MySQL with the JSON type compared to a new NoSQL data store can be easier, more flexible, more cost-effective and yet still preserves ACID benefits. It can be very useful mostly in cases when we have an existing infrastructure ready for or experience working with MySQL and the relational model.