Cassitory: redundancy tables within Cassandra

After working with Cassandra for a while without Apache Spark as a viable option, running different queries can become quite a challenge.

When you eventually need to execute a query that is not possible with the current table structure, Cassandra recommends keeping redundancy tables, because storage is cheaper than memory.

In pursuit of this goal you will end up writing a lot of code just to ensure that you persist to both tables, and let’s not even think about what happens when you have to bring a third table into the equation.

So I decided to tackle this problem by writing a library that gives you a Repository capable of persisting an entity to multiple tables in a transactional way, meaning that if one write operation fails, all of them are rolled back.

And the important part is that this can be done in an async or sync way, depending on your use case.
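To make the idea concrete, here is a minimal sketch of "write to every table, undo everything if one write fails", in both a sync and an async flavour. It is not Cassitory's implementation: `TableSketch`, `MultiTableRepositorySketch`, `save` and `saveAsync` are hypothetical names, the tables are plain in-memory maps, and the rollback is assumed to be a simple compensating delete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for one Cassandra table: an in-memory map keyed by row id.
class TableSketch {
    final String name;
    final Map<String, String> rows = new ConcurrentHashMap<>();
    final boolean failOnWrite; // lets the example simulate a failing write

    TableSketch(String name, boolean failOnWrite) {
        this.name = name;
        this.failOnWrite = failOnWrite;
    }

    void insert(String key, String value) {
        if (failOnWrite) {
            throw new IllegalStateException("write to " + name + " failed");
        }
        rows.put(key, value);
    }

    void delete(String key) {
        rows.remove(key);
    }
}

// Hypothetical repository: writes one value to every table and undoes earlier writes on failure.
class MultiTableRepositorySketch {
    private final List<TableSketch> tables;

    MultiTableRepositorySketch(List<TableSketch> tables) {
        this.tables = tables;
    }

    // Synchronous variant: returns true only if every table was written.
    boolean save(String key, String value) {
        List<TableSketch> written = new ArrayList<>();
        try {
            for (TableSketch table : tables) {
                table.insert(key, value);
                written.add(table);
            }
            return true;
        } catch (RuntimeException e) {
            // Assumed rollback strategy: delete the rows that were already written.
            for (TableSketch table : written) {
                table.delete(key);
            }
            return false;
        }
    }

    // Asynchronous variant: the same logic, executed on another thread.
    CompletableFuture<Boolean> saveAsync(String key, String value) {
        return CompletableFuture.supplyAsync(() -> save(key, value));
    }
}

class MultiTableDemo {
    public static void main(String[] args) {
        TableSketch users = new TableSketch("users", false);
        TableSketch usersByName = new TableSketch("users_by_name", true); // this write fails
        MultiTableRepositorySketch repo =
                new MultiTableRepositorySketch(Arrays.asList(users, usersByName));
        boolean saved = repo.save("42", "Ada");
        System.out.println("saved=" + saved + ", rows left in users=" + users.rows.size());
    }
}
```

Even in this toy form, the bookkeeping grows with every extra table, which is the boilerplate the library is meant to take off your hands.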

How does it work?

Cassitory has one big requirement in order to work, at least for now: you have to write your Cassandra entities using the Cassandra annotations and the Cassandra Object Mapper library.

This image illustrates the kind of entities that you will end up writing:

As you can see, the Cassandra entities have been annotated with Cassandra annotations.

Cassitory relies on the Object Mapper at the moment, and therefore on those annotations in your Cassandra entities.

Once you have all the Cassandra entities ready to use, the next step is to create an entity that holds all the values needed to build the Cassandra entities, so that Cassitory can create and persist them for you.

As you can see, the concept is very simple: one entity that holds all the values you need, and two annotations that do all the transformation magic from this entity to the Cassandra entities.

CassitoryEntity: defines which Cassandra entities will be persisted. It also provides an optional destinationPackage attribute where you can specify the package of the generated code; otherwise the code is generated in the same package as the annotated Cassitory entity.

@CassitoryEntity(target={UserByName.class, User.class}, destinationPackage="repositories")
class UserDto {
    // fields annotated with @Mapping go here
}

Mapping: defines how to map a field to each Cassandra entity. At this point you can annotate your fields in several ways, e.g.

// Map the field to the classes User and UserByName and populate
// the field "name" in each of them.
@Mapping(target={User.class, UserByName.class}, field="name")
private String name;

// Because the field name in this entity is the same as the one in
// the Cassandra entities, you can omit the field value.
@Mapping(target={User.class, UserByName.class})
private String name;

// When the field names differ between your Cassandra entities,
// you can use one @Mapping per target.
@Mapping(target={User.class}, field="name")
@Mapping(target={UserByName.class}, field="fullname")
private String name;

// And last, you can combine both approaches.
@Mapping(target={User.class}, field="name")
@Mapping(target={UserByName.class, UserByAge.class}, field="fullname")
private String name;
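To see what these annotations amount to, here is a hand-written sketch of the conversion that the third variant describes (name goes to User.name and to UserByName.fullname). This is only an illustration of the mapping's effect, not the code Cassitory generates: `UserDtoConverterSketch` and its methods are hypothetical names, and the entities are plain stand-ins without the Object Mapper annotations the real ones would carry.

```java
// Plain stand-ins for the Cassandra entities (real ones would be annotated for the Object Mapper).
class User {
    String name;
}

class UserByName {
    String fullname;
}

// The Cassitory entity from the example, reduced to the single mapped field.
class UserDto {
    final String name;

    UserDto(String name) {
        this.name = name;
    }
}

// Hypothetical hand-written equivalent of the third mapping variant.
class UserDtoConverterSketch {
    // Corresponds to @Mapping(target={User.class}, field="name")
    static User toUser(UserDto dto) {
        User user = new User();
        user.name = dto.name;
        return user;
    }

    // Corresponds to @Mapping(target={UserByName.class}, field="fullname")
    static UserByName toUserByName(UserDto dto) {
        UserByName byName = new UserByName();
        byName.fullname = dto.name;
        return byName;
    }

    public static void main(String[] args) {
        UserDto dto = new UserDto("Ada Lovelace");
        System.out.println(toUser(dto).name + " / " + toUserByName(dto).fullname);
    }
}
```

Writing one such method per target entity and per field is exactly the repetitive part that the annotations replace.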

Cassitory will generate a derived BaseRepository class following the convention of

<name of the cassitory entity class>BaseRepository

and you can use it just by creating an instance of it.

MappingManager mappingManager;

UserDtoBaseRepository repository = new UserDtoBaseRepository(mappingManager);

Conclusion

Cassitory uses annotation processing to generate code for you. The generated code does NOT use reflection, and it supports the execution of async queries. It also provides a few advantages:

  • Catch any mistake in the config at compile time.
  • Easy to add or remove a table.
  • Easy to integrate with your DI framework.
  • Lightweight
  • Mapping annotation provides a lot of flexibility in order to simplify your config or have more complex scenarios.
  • Use of transactions for persistence, and therefore rollback of all the operations if one of them fails.
  • Support for different numbers of fields between Cassandra entities.

Github: https://github.com/caelcs/cassitory
