Vulgar DBMS

Which DBMS?

What are we talking about exactly?

So, what do we mean by “the best DBMS for most projects”?

  1. We’re talking about a “general purpose” DBMS which is the “source of truth” for most or all system state.
  2. We’re focused entirely on managed, cloud-based systems. Most of the time, this is the best way to deal with operational costs, even if you happen to have in-house admins. Backups and vertical scaling options are a must. Horizontal scaling and replication are both expected.
  3. We’re focusing on initial development. Some projects might grow to be huge in time and benefit from special handling, but I’m talking about what you should use to develop 1.0. Do the easy thing first, figure out how it works, then transition to something more specialized and costly when you have more time, money, and experience.
  4. There’s not necessarily just the one database in the system. A separate DBMS may be used for text searches, high volume logs, etc.
  5. I’m going to require support for basic transactions. Since we’re talking about the “source of truth”, lack of basic transaction support can be problematic. Many applications might be able to get away without transactions, but transactions are often useful and sometimes necessary. Full ACID compliance is not a requirement, however.

How do we evaluate?

First off, let’s talk about the things that don’t make much difference. I’m not going to talk much about operational costs, performance, or scalability. Moore’s law and decades of engineering have made most DBMSs fast enough on cheap hardware. The upshot is every DBMS I looked at can handle more than enough data with good performance for not much money.

Why the RDBMS is still champ.

In short, the “relational” bit of the RDBMS is important and gives us many desirable features. “Relational” here refers to a branch of mathematics (if you’re interested, here’s a concise explanation). The upshot is data managed in such a way that makes it relatively easy to search and recombine.

To be specific…

One of the nice things about relational DBMSs is that they all, at the core, “work the same way”, so moving between them is relatively straightforward. that being said, I’m selecting Cloud SQL with MySQL. While I feel that Postgres is generally the better DBMS, MySQL has better tooling in general and supports more features in Cloud SQL and gets a grudging nod.

Let’s be honest about the ugly.

The interface to relational DBs is often clunky. SQL forces us to decompose/recompose and reformat the data into the (JSON) format we really want. PSQL stored procedures are ridiculous. Luckily, there are many tools and libraries which can help here. I will say, I find most “object-relational mapping” tools to be more trouble than they’re worth, but light-weight libraries that make things just a bit more aesthetically pleasing help a lot.

Other recommendations.

  • For a real-time DBMS—such as might back a chat system—use Cloud Firestore. This is Google’s 2nd generation cloud-based, document-oriented, real-time DBMS.
  • For powerful, user-driven text search, use Eltasticsearch. It uniquely (AFAIK) offers a mature, fully managed cloud option, and though it was a few years ago, Elatsicsearch came out on top of the last big evaluation I was involved with.
  • I see no reason not to use Stackdriver for logs whenever possible. I’ve used both Kibana and Stackdriver, and though I’ve never been involved with a comprehensive evaluation of alternatives, Stackdriver is easy to use and often automatically setup when working within GCP.
  • For the adventurous, try GraphGrid GDS as the source of truth. It’s Neo4j in the cloud. Neo4j is the hands-down market leading graph-oriented DBMS and, is the only graph-oriented DBMS I know of with a reasonable, fully managed option. Graph-oriented data modeling has a lot to recommend it, and Cypher (the query language) looks to stack up well against SQL. As with any newer technology, the maturity, tooling, knowledge, and support are not as good, which is why committing to Neo4j feels a little risky.

Selected notes on other DBMSs.

  • MongoDB Atlas is managed MongoDB available on GCP (as well as AWS and Azure). Stitch adds a Cloud Functions (or AWS Lamba) like layer on top of Atlas. Devs that just never got the hang of relational DBs love MongoDB, and for good reason. It’s a fast, flexible, and mature document-oriented “NoSQL” database.
  • Cloud Datastore is Google’s other cloud-based, documented oriented DBMS. The latest incarnation actually runs on Cloud Firestore. Like CF, queries are constrained compared with SQL and both designing and iterating the data model is generally much harder than with RDBMSs.
  • Cloud Spanner is Google’s relational DBMS at big data scale. It’s super cool but expensive. As of 2018–11–08, you’d pay $660 for 1-node with 10GB of storage. For a given performance level, Spanner may be cheaper “at scale”, and provides an intriguing transition option.
  • LokiJS is a super-simple all in-memory DBMS that uses indexed JSON directly. While simple, it currently lacks a managed solution and support for horizontal scaling and replication.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store