Where am I going to store the data?
I stayed up pretty late last night looking at the various options for storing the data. There are a lot of them, and each has its pros and cons. First I looked at the offerings from the big guys, namely Google App Engine’s Datastore and AWS’s DynamoDB. I decided against both because they would tie me to their platforms. What if at some stage I wanted to host all of this in my own data center? Also, DynamoDB had little to no transaction support (there’s a Java client library that adds it). That alone is not a reason to rule it out, but it would mean much more coding on my end to keep the data consistent.
Next I looked at some of the current NoSQL options. I ruled these out too, as they felt a little immature or were not suited to the task at hand. Redis is a great option, but the cost of keeping everything in RAM is prohibitive (I will still use it for caching). MongoDB didn’t have transaction support, and again, while I could live with that, there is no need to deal with it now. There also appear to be a lot of forum posts about problems at scale.
At the end of the day I’m going to go with a straight-up RDBMS. Facebook and Instagram started this way, and they turned out OK. I also need some geo functionality, so Postgres is the one I’m going with. Initially I will stick to a single database (no sharding), as that will probably be enough unless things are wildly successful. Even though I won’t implement sharding at first, I will try to set things up so that it is easier to add when it is needed. In practice this means no joins (rows that end up on different shards can’t be joined in a single query), plus an ID-generation algorithm whose identifiers tell you which shard a row lives on and also sort by creation time. My next post will be on the algorithm I use to generate these IDs.
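To make those two properties concrete, here is a minimal sketch of one common way to get them, in the spirit of Twitter’s Snowflake and the layout Instagram has described publicly. It is not necessarily the algorithm the next post will cover. It packs a millisecond timestamp, a shard number and a per-millisecond sequence counter into a single 64-bit integer. The bit widths (41/13/10), the custom epoch, and the names IdGenerator, shard_of and timestamp_ms_of are all assumptions made for illustration.

```python
import threading
import time

# Hypothetical 64-bit layout (an assumption, not the post's final design):
#   [ 41 bits: ms since EPOCH_MS | 13 bits: shard ID | 10 bits: sequence ]
EPOCH_MS = 1_300_000_000_000  # arbitrary custom epoch (March 2011), assumed
SHARD_BITS = 13
SEQ_BITS = 10
SHARD_MASK = (1 << SHARD_BITS) - 1
SEQ_MASK = (1 << SEQ_BITS) - 1


def _now_ms() -> int:
    """Milliseconds elapsed since the custom epoch."""
    return int(time.time() * 1000) - EPOCH_MS


class IdGenerator:
    """Issues time-sortable IDs that embed the shard they belong to."""

    def __init__(self, shard_id: int) -> None:
        if not 0 <= shard_id <= SHARD_MASK:
            raise ValueError(f"shard_id must be in [0, {SHARD_MASK}]")
        self.shard_id = shard_id
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            # Never let the timestamp go backwards, even if the clock does.
            now = max(_now_ms(), self._last_ms)
            if now == self._last_ms:
                self._seq = (self._seq + 1) & SEQ_MASK
                if self._seq == 0:
                    # All 1024 sequence values used this millisecond:
                    # busy-wait until the clock moves past it.
                    while now <= self._last_ms:
                        now = _now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            return (now << (SHARD_BITS + SEQ_BITS)) | (self.shard_id << SEQ_BITS) | self._seq


def shard_of(id_: int) -> int:
    """Recover the shard number embedded in an ID."""
    return (id_ >> SEQ_BITS) & SHARD_MASK


def timestamp_ms_of(id_: int) -> int:
    """Recover the Unix time in milliseconds at which an ID was issued."""
    return (id_ >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS


if __name__ == "__main__":
    gen = IdGenerator(shard_id=5)
    ids = [gen.next_id() for _ in range(3)]
    print(ids == sorted(ids), [shard_of(i) for i in ids])  # True [5, 5, 5]
```

Because the timestamp sits in the high bits, sorting IDs numerically sorts them roughly by creation time (assuming the servers’ clocks agree), and shard_of() lets the application route a lookup to the right database without a central directory. With this epoch the top bit stays zero for roughly 35 years, so the value fits in a signed Postgres bigint for that long.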