I’m currently facing a decision about the Database technology I want to use for a personal project of mine. I thought I’d blog about it here, I usually write things down to try and figure my way through, this time I just thought I’d share it.
Currently I am using AWS DynamoDB which is a great tool and I really like using. It fits in the AWS tech stack very nicely and essentially means that I don’t have to care about scaling because AWS will handle most or all of that for me. It means for a very easy life. However, the problem that I have with it is that it is a NoSQL platform. This means that in order to be using it the way it should be used then you have to think about it as a very different beast from your typical RDBMS solution. Things like Joins and relationships in a NoSQL solution are completely out of place and generally will cause issues for you down the line, especially if you are expecting to hit incredibly high performance. It’s a little bit like trying to use the edge of a hammer to screw in a screw in a piece of wood, you could probably do it, but you really shouldn’t.
The reason I am thinking about this is that, I started of with DynamoDB and everything is fine, but the more I dive into the domain model, the more I find that things are not as simple as I would have liked. Or, more specifically, I find that my domain needs to have relationships between different domain components and I need a way of maintaining those relationships through updates.
The scenario I have is, I have a user who is subscribed. This user subscribes to a learning path. This learning path may be updated over time with either new materials or new entries. I want to be able to propagate that update to all users who are on that learning path so they can take advantage of it. In a NoSQL context this means that I need a job which will manually merge the new learning path into a users profile and hopefully I won’t blat the evidence/notes or anything else that they’ve logged against various pieces of that. In an RDBMS context this problem becomes much simpler, I can add entries to a learning path easily enough without any risk to anything being deleted, the only time this could occur is if I allow learning paths to have components deleted. I don’t think I need to allow this, and if all else fails I can do a boolean delete so that it isn’t really gone, referential integrity is maintained and the notes and evidence are still available through a more central aspect of the users dashboard.
Right now I’m still fairly early in the development of this, changing from NoSQL to RDBMS is going to be difficult but is achievable. The longer I leave it though then the more I cludge in and the harder it becomes to actually make the changes that I need to make.
Typically I’ve gone for NoSQL solutions simply for ease of use and ease of getting started, but honestly, right now, I’m not sure that is a good thing anymore.
I think I’m pretty much set on making the shift. To play devils advocate though, I could move any user added content out of the learning path and do a join in the front end code to match up entries with components. See I already don’t like this, I’m adding complexity to my code to accommodate the fact that my technology stack isn’t suitable for the domain. I’m sure there are a lot of ways I could make this “fit”, but just because I can doesn’t make it right, and if the cost is complexity when a simpler solution exists then I have got to choose simplicity.
In my minds eye I think part of the problem is that I view RDBMS solutions as intense solutions which are costly. And in many ways they can be, however they are absolutely more than performant for what I need and if needs must then I can scale these out. Tech has managed to come an incredible distance on RDBMS technology, I’m pretty certain it’s more than suited to my purposes. As long as I avoid doing catastrophically stupid things like putting logic in the database, that would be bad.
One of the other benefits of using an RDBMS solution is that currently DynamoDB cannot be setup for different environments without doing something different each time, by this I mean that it’s the same DynamoDB instance for local, development, test and live. This is terrible and requires a lot of work to deal with. At least with the RDBMS solution I’ll be tackling that pretty much right out of the box.
Ok, I’m pretty happy about where I am going with this now. I hope that was useful for you too. I find that writing out my thoughts like this can really help me to figure out the best way of moving forwards, the benefit here is that you get to see this too.
Edit — 22/02/2017
I’ve now started implementing this database change and there are a few things coming out that are probably worth mentioning. None of these changes the fact that I currently feel that using an RDBMS solution is the right way to go, but it’s just extra complexity to consider and further evidence for making these decisions sooner rather than later. Putting off this decision would cause more pain later on.
Some things that I have to now consider are the fact that in my NoSQL schema, if I have embedded entities of some form, these now need to go through the correct normalisation process. This means that I need joining tables for many things. This is in fact what I was expecting to happen, but it is a stark reminder that it isn’t as simple as changing a few keywords in your entity classes (I’m using JPA) and then creating just the basic tables. Now I need to worry about keys, referential integrity and what the right way of modelling these relationships is. It’s worth keeping this in mind because it’s the kind of thing that could be ignored.