Schema versioning and upgrade in document store — Implementation with Java, MongoDB and SpringData

Bernhard Ruch
Published in ELCA IT
7 min read · Dec 22, 2023

In our previous article, we showed how schema versioning and upgrading in a document store can work from a conceptual point of view. In this article, we show how this can be implemented using Java, MongoDB and Spring Data.


The source code of the sample application is available on GitHub: https://github.com/ELCAIT/document-store-schema-migration

Sample application

The sample application manages a set of trains that stop at or pass through a series of train stations at specific times:

Domain model (sample application)
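The domain model is shown as a diagram. As a rough orientation, it could be expressed with a few Java records as sketched below. The names Stop, StopType, station, time and stops are hypothetical; they only reflect the description above (a train has a number and a label and stops at or passes through stations at specific times). The actual classes are in the sample repository.

```java
import java.time.LocalTime;
import java.util.List;

// Hypothetical sketch of the domain model; the real classes are in the sample repository.
enum StopType { STOP, PASS }

// One timetable entry: the train stops at (or passes through) a station at a given time.
record Stop(String station, LocalTime time, StopType type) { }

// A train, identified by its number, with a label and its ordered timetable entries.
record Train(int number, String label, List<Stop> stops) {

    Train {
        stops = List.copyOf(stops); // defensive, immutable copy
    }

    // Names of the stations where the train actually halts (PASS entries are skipped).
    List<String> haltingStations() {
        return stops.stream()
                .filter(stop -> stop.type() == StopType.STOP)
                .map(Stop::station)
                .toList();
    }
}
```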

The UI of the application is a Swagger interface that offers the following operations:

Data setup operations:

  • PUT: Load the list of trains from a JSON file (included in the application)
  • DELETE: Delete all trains

Business operations:

  • GET: Get a list of all train numbers
  • GET: Get a train by train number
  • PUT: Update a train
  • PUT: Modify the label of a train by train number

Schema version operations:

  • GET: Count the number of documents per schema version
  • PUT: Upgrade the documents from a source to a target schema version
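Conceptually, the operation that counts the documents per schema version is a group-by on the attribute “schemaVersion”. The sketch below is not the sample application's implementation (which queries MongoDB); it performs the same grouping on documents held in memory as plain maps, using only the JDK. The class name SchemaVersionStatistics is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory stand-in for the "count documents per schema version" operation.
// Documents are modelled as plain maps; this illustrates the grouping, not the MongoDB query.
final class SchemaVersionStatistics {

    private SchemaVersionStatistics() { }

    // Groups the documents by their "schemaVersion" attribute and counts each group.
    // A TreeMap keeps the versions sorted (V1 before V2) for stable output.
    static Map<String, Long> countPerSchemaVersion(List<Map<String, Object>> documents) {
        return documents.stream()
                .collect(Collectors.groupingBy(
                        doc -> String.valueOf(doc.get("schemaVersion")),
                        TreeMap::new,
                        Collectors.counting()));
    }
}
```

In MongoDB itself, the same result can be obtained with an aggregation that groups on “schemaVersion” and counts the documents in each group.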

Document Store: MongoDB

In MongoDB, all data is stored in a single collection: “train”

The collection can be initialized with data from a JSON file and initially contains only documents in schema version V1.

The schema-compatibility-version configured in application.yml (field “schemaCompatibilityVersion”) determines whether the application writes documents in schema version V1 or V2. Documents of both schema versions can be read in either case.
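The service code in this article reads this setting through schemaCompatibilityVersionConfiguration.getSchemaCompatibilityVersion(). A minimal sketch of such a configuration holder could look as follows, assuming it receives the raw string value of “schemaCompatibilityVersion” (in a Spring Boot application, this value would typically be bound from application.yml). The constructor and its parsing rules are assumptions, not the sample's code.

```java
import java.util.Locale;
import java.util.Objects;

// The two schema versions used in the sample application.
enum SchemaVersion { V1, V2 }

// Hypothetical holder for the "schemaCompatibilityVersion" setting.
final class SchemaCompatibilityVersionConfiguration {

    private final SchemaVersion schemaCompatibilityVersion;

    SchemaCompatibilityVersionConfiguration(String rawValue) {
        Objects.requireNonNull(rawValue, "schemaCompatibilityVersion must be configured");
        // Accepts "v2" as well as "V2"; unknown values throw IllegalArgumentException (fail fast).
        this.schemaCompatibilityVersion =
                SchemaVersion.valueOf(rawValue.trim().toUpperCase(Locale.ROOT));
    }

    // Version in which documents are written; documents of both versions are still read.
    SchemaVersion getSchemaCompatibilityVersion() {
        return schemaCompatibilityVersion;
    }
}
```

Failing fast on an unknown value keeps the application from starting with a compatibility version it cannot write.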

Collection “Train” with schema versions V1/V2 and partial indexes

Schema versions

The schema version of each document is stored in the attribute “schemaVersion”, having the two possible values “V1” or “V2”.

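For the purpose of this example, the versions differ only in attribute names. For example, the train number is stored in the attribute “trainNumber” in V1 and in the attribute “number” in V2.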
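Because the two versions differ only in attribute names, upgrading a raw document from V1 to V2 amounts to renaming attributes and changing the schema version. The following sketch works on documents represented as maps. Only the rename from “trainNumber” to “number” is confirmed by the entity classes in this article; the rename table (V1_TO_V2, a hypothetical name) would have to be completed from the actual model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative document-level conversion from schema version V1 to V2 by renaming attributes.
final class DocumentRenamer {

    // V1 attribute name -> V2 attribute name. Only "trainNumber" -> "number" is confirmed;
    // entries for the other renamed attributes would have to be added.
    static final Map<String, String> V1_TO_V2 = Map.of("trainNumber", "number");

    private DocumentRenamer() { }

    static Map<String, Object> upgradeV1ToV2(Map<String, Object> v1Document) {
        if (!"V1".equals(v1Document.get("schemaVersion"))) {
            throw new IllegalArgumentException("not a V1 document: " + v1Document);
        }
        Map<String, Object> v2Document = new LinkedHashMap<>();
        // Copy every attribute, renaming those listed in the table.
        v1Document.forEach((name, value) ->
                v2Document.put(V1_TO_V2.getOrDefault(name, name), value));
        // Overwrite the copied "V1" marker.
        v2Document.put("schemaVersion", "V2");
        return v2Document;
    }
}
```

The sample application itself performs this conversion on entity classes (TrainV1Converter) rather than on raw documents.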

Indexes

The collection has two separate indexes, one per schema version. The indexes are defined directly in the entity classes (V2: Train, V1: TrainV1):

Current version (V2):

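import org.springframework.data.mongodb.core.index.CompoundIndex;
import org.springframework.data.mongodb.core.index.CompoundIndexes;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "train")
@CompoundIndexes({
    @CompoundIndex(
        name = "v2_number",
        def = "{'schemaVersion': 1, 'number': 1}",
        partialFilter = "{'schemaVersion': 'V2'}"
    )
})
public class Train {
    ...
}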

Previous version (V1):


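import org.springframework.data.mongodb.core.index.CompoundIndex;
import org.springframework.data.mongodb.core.index.CompoundIndexes;
import org.springframework.data.mongodb.core.mapping.Document;

@Document(collection = "train")
@CompoundIndexes({
    @CompoundIndex(
        name = "v1_trainNumber",
        def = "{'schemaVersion': 1, 'trainNumber': 1}",
        partialFilter = "{'schemaVersion': 'V1'}"
    )
})
public class TrainV1 {
    ...
}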

The indexes have the following properties:

  • The first attribute is the schema version
  • The second attribute is the train number. Note that the attribute name differs between the two versions (“trainNumber” in V1, “number” in V2)
  • They are partial indexes, containing only data from the respective schema version (using the property “partialFilter”). This limits the size of each index and ensures that the index field names correspond to the actual documents of the respective schema version. For example, the index “v1_trainNumber” will only reference documents with schema version “V1”, each having an attribute “trainNumber”.
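The effect of the partial filter can be illustrated with a small in-memory model. The sketch below (the class PartialIndex is hypothetical and greatly simplified compared with a real MongoDB B-tree index on the compound key) only captures which documents end up in an index: those matching the filter on “schemaVersion”, keyed by the version-specific train-number attribute.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a partial index such as "v1_trainNumber": only documents matching
// the partial filter {'schemaVersion': <version>} are indexed, keyed by keyField.
final class PartialIndex {

    private final Map<Object, List<String>> idsByKey = new HashMap<>();

    PartialIndex(String schemaVersion, String keyField, List<Map<String, Object>> documents) {
        for (Map<String, Object> doc : documents) {
            // Partial filter: documents of other schema versions are not part of this index.
            if (schemaVersion.equals(doc.get("schemaVersion"))) {
                idsByKey.computeIfAbsent(doc.get(keyField), key -> new ArrayList<>())
                        .add((String) doc.get("_id"));
            }
        }
    }

    // Ids of the indexed documents whose key attribute equals the given value.
    List<String> lookup(Object keyValue) {
        return idsByKey.getOrDefault(keyValue, List.of());
    }

    // Number of documents referenced by the index.
    int size() {
        return idsByKey.values().stream().mapToInt(List::size).sum();
    }
}
```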

Queries

In order to find data for the respective schema version and to make use of the indexes, the queries need to include the schema version and use the correct attribute name for the train number:

Current version (V2):

{'schemaVersion': 'V2', 'number': 701}

Previous version (V1):


{'schemaVersion': 'V1', 'trainNumber': 701}

Note: Each query only returns documents of the respective schema version. To find all documents matching the given criterion (train number 701), both queries have to be executed and their result sets combined.
It is also possible to perform a combined query, for example:


{ $or: [ {'schemaVersion' : 'V2', 'number': 701}, {'schemaVersion': 'V1', 'trainNumber': 701} ]}
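The equivalence of the two approaches can be illustrated in memory. In the sketch below, the two per-version queries and the combined $or query are expressed as Java predicates over documents represented as maps; TrainQueries and its methods are hypothetical names, not part of the sample application.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Stream;

// In-memory evaluation of the per-version queries and of the combined $or query.
final class TrainQueries {

    private TrainQueries() { }

    // {'schemaVersion': 'V2', 'number': n}
    static Predicate<Map<String, Object>> v2(int number) {
        return doc -> "V2".equals(doc.get("schemaVersion"))
                && Integer.valueOf(number).equals(doc.get("number"));
    }

    // {'schemaVersion': 'V1', 'trainNumber': n}
    static Predicate<Map<String, Object>> v1(int number) {
        return doc -> "V1".equals(doc.get("schemaVersion"))
                && Integer.valueOf(number).equals(doc.get("trainNumber"));
    }

    static List<Map<String, Object>> find(List<Map<String, Object>> docs,
                                          Predicate<Map<String, Object>> query) {
        return docs.stream().filter(query).toList();
    }

    // Two separate queries whose results are concatenated (V2 results first).
    static List<Map<String, Object>> findBothSeparately(List<Map<String, Object>> docs, int number) {
        return Stream.concat(find(docs, v2(number)).stream(), find(docs, v1(number)).stream())
                .toList();
    }

    // { $or: [ {V2 clause}, {V1 clause} ] } evaluated as a single query.
    static List<Map<String, Object>> findCombined(List<Map<String, Object>> docs, int number) {
        return find(docs, v2(number).or(v1(number)));
    }
}
```

Both variants return the same documents, possibly in a different order; MongoDB does not guarantee any particular result order unless a sort is specified.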

Note: For the application, having to include the schema version in the queries is not a problem, since the queries are hard-coded in the Spring Data MongoDB repositories. However, users who execute ad-hoc queries directly in the database must be aware that they have to include the schema version in their queries. Of course, it is also possible to execute queries without specifying the schema version, as long as they cover the attribute names of all schema versions, for example:

{ $or: [ { 'number': 701}, {'trainNumber': 701} ]}

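Such a query cannot use the partial indexes described above, because MongoDB only uses a partial index if the query condition includes its filter expression (here, the schema version). Good performance may therefore require additional indexes.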

Java Application

The following diagram illustrates the different classes involved in the Java application:

Separate data access services and spring data mongo repositories for each schema version

Entity classes and Spring Data MongoDB Repositories

To keep the application simple and take advantage of Spring Data MongoDB, the application uses a separate entity class and a separate Spring Data MongoDB repository for each schema version, all of them accessing the same MongoDB collection (train).

As for the naming conventions, the classes for older schema versions (V1) use the schema version as suffix (e.g. “TrainV1”), whereas the classes for the current schema version (V2) don’t use such a suffix. In that way, the business code can always use the current version and is not cluttered with schema version suffixes.

Current version (V2):

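import lombok.Builder;
import lombok.NonNull;
import lombok.Value;
import org.springframework.data.annotation.Version;
import org.springframework.data.mongodb.core.mapping.Document;

@Value
@Builder(toBuilder = true)
@Document(collection = "train")
public class Train {

    @NonNull
    String id;

    @NonNull
    SchemaVersion schemaVersion;

    @Version
    Integer optimisticLockingVersion;

    @NonNull
    Integer number;

    ...
}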

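import java.util.List;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;
import org.springframework.data.mongodb.repository.config.EnableMongoRepositories;
import org.springframework.stereotype.Repository;

@EnableMongoRepositories
@Repository
public interface TrainMongoRepository extends MongoRepository<Train, String> {

    // Matches only V2 documents, so the partial index "v2_number" can be used
    @Query("{'schemaVersion' : 'V2', 'number': ?0}")
    List<Train> findByNumber(int number);

    ...
}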

Previous version (V1):

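import lombok.Builder;
import lombok.NonNull;
import lombok.Value;
import org.springframework.data.annotation.Version;
import org.springframework.data.mongodb.core.mapping.Document;

@Value
@Builder(toBuilder = true)
@Document(collection = "train")
public class TrainV1 {

    @NonNull
    String id;

    @NonNull
    SchemaVersion schemaVersion;

    @Version
    Integer optimisticLockingVersion;

    @NonNull
    Integer trainNumber;

    ...
}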

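import java.util.List;
import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;
import org.springframework.data.mongodb.repository.config.EnableMongoRepositories;
import org.springframework.stereotype.Repository;

@EnableMongoRepositories
@Repository
public interface TrainV1MongoRepository extends MongoRepository<TrainV1, String> {

    // Matches only V1 documents, so the partial index "v1_trainNumber" can be used
    @Query("{'schemaVersion' : 'V1', 'trainNumber': ?0}")
    List<TrainV1> findByTrainNumber(int trainNumber);

    ...
}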

Data access services

Since data access is split over two separate Spring Data MongoDB repositories, it is encapsulated in data access services that handle the combination of the two versions.

There is a data access service for each schema version:

  • Current version (V2): TrainService
  • Previous version (V1): TrainV1Service

Both services provide the same methods for access to the documents, using the current version of the entity classes (Train):

  • TrainService provides access to documents of both schema versions V2 and V1
  • for documents having schema version V2, TrainService uses the Spring Data repository (TrainMongoRepository)
  • for documents having schema version V1, TrainService delegates to the access service for V1 (TrainV1Service)
  • TrainV1Service provides access only to documents having schema version V1 and uses the corresponding Spring Data repository (TrainV1MongoRepository)

For reading operations, two separate queries are performed on the MongoDB collection, one for each schema version, and the results are combined:

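public class TrainService {

    public List<Train> findByNumber(int trainNumber) {
        // Query 1: documents stored in the current schema version (V2)
        List<Train> trainsV2 = trainMongoRepository.findByNumber(trainNumber);
        // Query 2: documents still stored in V1, converted to the current model by TrainV1Service
        List<Train> trainsV1 = trainV1Service.findByTrainNumber(trainNumber);

        return Stream.concat(trainsV2.stream(), trainsV1.stream()).toList();
    }

    ...
}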

The access service for version V1 reads the documents from the repository and converts them into schema version V2:

public class TrainV1Service {

    public List<Train> findByTrainNumber(int trainNumber) {
        // Read the V1 documents and convert each of them into the current model (V2)
        return trainV1MongoRepository.findByTrainNumber(trainNumber).stream()
                .map(trainV1Converter::fromV1)
                .toList();
    }

    ...
}

Note: This approach requires the new schema version to be backward compatible with the previous one (e.g. each new mandatory field must either be derived from the content of the previous document or be set to a sensible default value). However, in the case of a non-backward-compatible schema change, any kind of update mechanism (e.g. an SQL upgrade script) would face the same problem. Therefore, schema versions should be kept backward compatible as far as possible.
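To illustrate the rule for new mandatory fields, the sketch below upgrades an older shape of a train to a newer one that requires an additional attribute “category”. Both records and the attribute are hypothetical and not part of the sample application. The converter derives the value from existing content (the first word of the label) where possible and otherwise falls back to a default.

```java
import java.util.Locale;

// Hypothetical older shape without "category".
record TrainOld(String id, int number, String label) { }

// Hypothetical newer shape in which "category" is mandatory.
record TrainNew(String id, int number, String label, String category) {
    TrainNew {
        if (category == null || category.isBlank()) {
            throw new IllegalArgumentException("category is mandatory");
        }
    }
}

final class BackwardCompatibleConverter {

    static final String DEFAULT_CATEGORY = "UNKNOWN";

    private BackwardCompatibleConverter() { }

    // Derives the new mandatory attribute from existing content where possible
    // (first word of the label, e.g. "IC" from "IC 701"), otherwise uses a default.
    static TrainNew upgrade(TrainOld old) {
        String label = old.label();
        String category = (label == null || label.isBlank())
                ? DEFAULT_CATEGORY
                : label.trim().split("\\s+")[0].toUpperCase(Locale.ROOT);
        return new TrainNew(old.id(), old.number(), label, category);
    }
}
```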

For writing operations, only one operation is performed on the MongoDB collection, depending on the schema-compatibility-version. In that way, the schema-compatibility-version determines in which schema version the documents are written:


public class TrainService {

    public Train save(Train train) {
        // The schema-compatibility-version decides in which schema version the document is written
        return switch (schemaCompatibilityVersionConfiguration.getSchemaCompatibilityVersion()) {
            case V1 -> trainV1Service.save(train);
            case V2 -> trainMongoRepository.save(train);
        };
    }

    ...
}

The access service for version V1 converts the entity from schema version V2 to V1 before storing it in the repository:

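public class TrainV1Service {

    public Train save(Train train) {
        // Convert to the V1 model, store it, and return the stored state in the current model
        TrainV1 trainV1 = trainV1Converter.toV1(train);
        return trainV1Converter.fromV1(trainV1MongoRepository.save(trainV1));
    }

    ...
}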
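The interplay of the read and write paths can be simulated without MongoDB. In the following sketch (all names hypothetical, the Train record reduced to an id and a number), a map stands in for the “train” collection: writes use the field names of the configured schema-compatibility-version, while reads query both versions and convert V1 hits to the current model, as TrainService and TrainV1Service do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum SchemaVersion { V1, V2 }

// Current (V2) model, reduced to the fields needed for this illustration.
record Train(String id, int number) { }

// In-memory stand-in for the "train" collection (documents keyed by _id).
final class InMemoryTrainStore {

    private final Map<String, Map<String, Object>> collection = new LinkedHashMap<>();
    private final SchemaVersion compatibilityVersion;

    InMemoryTrainStore(SchemaVersion compatibilityVersion) {
        this.compatibilityVersion = compatibilityVersion;
    }

    // Write path: the configured version decides the schemaVersion value and the field names.
    void save(Train train) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", train.id());
        switch (compatibilityVersion) {
            case V1 -> {
                doc.put("schemaVersion", "V1");
                doc.put("trainNumber", train.number()); // V1 attribute name
            }
            case V2 -> {
                doc.put("schemaVersion", "V2");
                doc.put("number", train.number());      // V2 attribute name
            }
        }
        collection.put(train.id(), doc); // same _id: the existing document is replaced
    }

    // Read path: one query per schema version; V1 hits are mapped to the V2 model.
    List<Train> findByNumber(int number) {
        List<Train> result = new ArrayList<>();
        result.addAll(query("V2", "number", number));
        result.addAll(query("V1", "trainNumber", number));
        return result;
    }

    Map<String, Object> rawDocument(String id) {
        return collection.get(id);
    }

    private List<Train> query(String version, String numberField, int number) {
        return collection.values().stream()
                .filter(doc -> version.equals(doc.get("schemaVersion")))
                .filter(doc -> Integer.valueOf(number).equals(doc.get(numberField)))
                .map(doc -> new Train((String) doc.get("_id"), (Integer) doc.get(numberField)))
                .toList();
    }
}
```

Saving a train with compatibility version V1 produces a document with “trainNumber”, and with V2 a document with “number”; in both cases findByNumber finds it.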

Note: the concept of “schema-compatibility-version” is described in the related article https://medium.com/@bernhard.ruch/d15a2cecd4e9

The conversion from schema version V1 to V2 and vice versa is implemented by TrainV1Converter:

public class TrainV1Converter {

    public Train fromV1(TrainV1 trainV1) {
        ...
    }

    public TrainV1 toV1(Train train) {
        ...
    }
}

Note: there are tools like MapStruct that could be used for converting objects from one model to another.
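A minimal hand-written converter could look like the sketch below. It assumes reduced entity shapes containing only the attributes shown in this article (the remaining attributes are elided there as well), and it assumes that fromV1 tags the result with schema version V2 and toV1 with V1, so that saving the converted object through the respective repository produces a document of the matching version.

```java
import java.util.Objects;

enum SchemaVersion { V1, V2 }

// Reduced entity shapes with only the attributes shown in this article.
record Train(String id, SchemaVersion schemaVersion, Integer optimisticLockingVersion, Integer number) { }

record TrainV1(String id, SchemaVersion schemaVersion, Integer optimisticLockingVersion, Integer trainNumber) { }

final class TrainV1Converter {

    // V1 -> V2: rename trainNumber to number and tag the result as V2 (assumption).
    Train fromV1(TrainV1 trainV1) {
        Objects.requireNonNull(trainV1, "trainV1");
        return new Train(trainV1.id(), SchemaVersion.V2,
                trainV1.optimisticLockingVersion(), trainV1.trainNumber());
    }

    // V2 -> V1: the reverse mapping, used when running with schema-compatibility-version V1.
    TrainV1 toV1(Train train) {
        Objects.requireNonNull(train, "train");
        return new TrainV1(train.id(), SchemaVersion.V1,
                train.optimisticLockingVersion(), train.number());
    }
}
```

Carrying over optimisticLockingVersion matters: when the converted object is saved, Spring Data uses the @Version attribute to detect concurrent modifications of the same document.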

Data migration: upgrade schema version

The upgrade of documents with older schema version (V1) to the current schema version (V2) is straightforward and can be done by the document access service (TrainService):

  • Find the ids of the documents having an older schema version
  • For each document: read the document and write it again

In that way, the document will be written in the schema version determined by the schema-compatibility-version. If the application is running with schema-compatibility-version V2, then the documents having schema version V1 will be upgraded to V2.

public class TrainService {

    public void upgradeDocuments(SchemaVersion sourceSchemaVersion) {
        // Read documents from source-schema-version and write them again in schema-compatibility-version
        List<String> ids = trainMongoRepository.findIdsBySchemaVersion(sourceSchemaVersion);
        for (String id : ids) {
            findById(id, sourceSchemaVersion)
                    .ifPresent(this::upgradeDocument);
        }
    }

    private Optional<Train> findById(String id, SchemaVersion sourceSchemaVersion) {
        return switch (sourceSchemaVersion) {
            case V1 -> trainV1Service.findById(id);
            case V2 -> trainMongoRepository.findById(id);
        };
    }

    private void upgradeDocument(Train train) {
        // Writing the document again stores it in the schema-compatibility-version
        switch (schemaCompatibilityVersionConfiguration.getSchemaCompatibilityVersion()) {
            case V1 -> trainV1Service.save(train);
            case V2 -> trainMongoRepository.save(train);
        }
    }

    ...
}
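The upgrade loop can likewise be simulated in memory. The sketch below (hypothetical names, Train reduced to an id and a number) first collects the ids of the documents in the source version, then reads each document and writes it back in the schema-compatibility-version, mirroring upgradeDocuments above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

enum SchemaVersion { V1, V2 }

// Current (V2) model, reduced to the fields needed here.
record Train(String id, int number) { }

// In-memory illustration of the upgrade loop over a map standing in for the "train" collection.
final class InMemoryUpgrader {

    private final Map<String, Map<String, Object>> collection;
    private final SchemaVersion compatibilityVersion;

    InMemoryUpgrader(Map<String, Map<String, Object>> collection, SchemaVersion compatibilityVersion) {
        this.collection = collection;
        this.compatibilityVersion = compatibilityVersion;
    }

    // Returns the number of documents that were rewritten.
    int upgradeDocuments(SchemaVersion sourceVersion) {
        // Collect the ids first so the collection is not modified while it is being iterated.
        List<String> ids = collection.entrySet().stream()
                .filter(e -> sourceVersion.name().equals(e.getValue().get("schemaVersion")))
                .map(Map.Entry::getKey)
                .toList();
        int upgraded = 0;
        for (String id : ids) {
            Optional<Train> train = findById(id, sourceVersion);
            if (train.isPresent()) {
                save(train.get());
                upgraded++;
            }
        }
        return upgraded;
    }

    // Reads a document of the given version and maps it to the current model.
    private Optional<Train> findById(String id, SchemaVersion version) {
        Map<String, Object> doc = collection.get(id);
        if (doc == null || !version.name().equals(doc.get("schemaVersion"))) {
            return Optional.empty();
        }
        String numberField = version == SchemaVersion.V1 ? "trainNumber" : "number";
        return Optional.of(new Train(id, (Integer) doc.get(numberField)));
    }

    // Writes the document in the schema-compatibility-version, replacing the old one.
    private void save(Train train) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", train.id());
        doc.put("schemaVersion", compatibilityVersion.name());
        doc.put(compatibilityVersion == SchemaVersion.V1 ? "trainNumber" : "number", train.number());
        collection.put(train.id(), doc);
    }
}
```

In this sketch, running the upgrade a second time finds no documents in the source version and therefore rewrites nothing, so the operation can safely be repeated.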

Bernhard has been working at ELCA since 1997 as a software architect, developer and project leader. https://www.linkedin.com/in/bernhard-ruch-692a6168/