A Startup Journey: Wherefore Art Thou, Database?
When we last chatted, we touched on how ongoing infrastructure and solution innovations enable startup teams to make more progress much faster and cheaper than in prior years.
This discussion was in the context of solo entrepreneurs and small founding teams, so it was at a high level. Let’s get a little bit more technical and explore one aspect of those innovations — startup database selection.
Note: this is not professional advice, endorsement, or recommendation — I am not your CTO.
Part of my motivation for writing this piece is the conversations that I’ve had with founders and would-be/could-be founders. So, it errs on the side of being written for a new team, just getting started, with or without a technical background.
As such, it walks the line between startup and enterprise-grade solutions, and it is not a software architecture text. I also considered stuff-as-a-service in the prior article.
You can consider this article to be a slice of database life, aimed at a generally entrepreneurial audience, possibly building their first system from scratch.
Finally, a note on my own database biases and fanboyisms: I’ve used both SQL and NoSQL databases for years, in analytical, data science, and software engineering contexts. I’ve tried to be balanced in the discussion below.
In a startup, one of the last things you want to do is take time to make seemingly long, drawn-out decisions.
Every second just thinking about things can feel like an eternity of not doing things… not making progress.
How can you move on to growth hacking and raking in cash if you’re just frittering time away, babbling about silly things, like databases?
Jam it in there and scale, right? Did you read about what your competitor is doing? You missed that? Wow. They could be miles ahead of you, right now!
But, anyone who’s seen a startup sputter and plod forward as the weight of legacy systems is borne by the entire company may raise an eyebrow or two. (And guess who wants to rewrite at that point — pretty much everyone in engineering, but pretty much no one in any other department. It’s not a super harmonious situation.)
And it’s hard to tell the future, when the present gives you so much hope. This hope makes it hard to make the changes that your system may really need.
For example, you might start off fast with a quick pick, just like you do with the lottery, and make it through your first investor and beta customer pitches. Nothing crashed. People smiled. Things might be taking off, even.
But as you grow, add features, pivot, and put off refactoring, you can then find yourself neck-deep in spaghetti code and hacks, with a surprisingly tight database integration.
Then, you’ll start to hear suggestions for a little refactoring…
In this context, the phrase “a little refactoring” essentially means printing out your entire codebase, dragging it out into the middle of the street, torching it gloriously, then building a new Franken-Phoenix from its ashes.
But to really get the point, let’s consider a scenario.
One Way To Start
I know that some people are on the fence, so let’s dig into a more concrete example.
As many a startup has proven, you can force fit your problem into a particular database solution. Particularly, if the database sounds pretty awesome.
In fact, some databases just exude an aura of awesomeness. You can feel it. Their names just roll off the tongue and make you want to use them.
Especially if we’re talking NoSQL databases. Because SQL is outdated legacy technology that can’t achieve web scale, right? And you need web scale right now, for your 34 possible customers! OK, so they’re actually the friends and acquaintances that you thought of emailing, but you’re sure they’ll all sign up, and then they’ll tell all of their friends, which just has to be like, 200 people, or maybe even more, and then — viral city! You’ll need at least petabyte storage capacity, just to get started. Should you buy a Ferrari right away? Or stay humble for a few months?
Still on the fence?
Have you heard the conversation? The deep technical selection process?
Maybe it gets kicked off based on a tech article someone saw while eating a hand-crafted organic boysenberry and tulip scone and thumbing through feeds on their morning train ride, before catching a scooter to the local co-working space…
“Well, let’s see. Redis. Mongo. Postgres. Firebase. Impala. Details. Schmetails. They just sound like they’ll solve any problem that you throw at them. They just have to work. Right? Let’s go with… Hold it. Neo4j? Wow, that sounds advanced — we gotta use that for our new extreme realtime reverse-priority to-do list and pet CRM app, FizzBuzzle. I’m sure it’s the best option. Wait. No. Wait. Oh…ho ho ho, I just noticed HBase. Did you see the logo? It’s aggressive AF. But then there’s Cassandra. I think that the logo just winked at me! It did. It winked! Did you see it? Oh. Ooohhh. That’s the one! That’s the one we’ll use to dominate our industry! Scale it up. Scale it up, now! And let’s make t-shirts to celebrate this victory! Alert the VCs! Tell them to form an orderly line.”
Someone may question the decision, but then everyone will kind of agree that just in case something new and obviously better (or at least with a better logo) comes along, you can always migrate.
“Migration is not a problem — people do it all the time, right? There’s no way we’ll get locked into the choice we’re making today. Change is easy. Just use a wrapper class and run a migration script, with a few seconds downtime, if any. What could go wrong? And those t-shirts won’t make themselves…”
Yeah, … No.
You can probably already see that I’m going to argue for taking the time to think through your persistence layer (a fancy way to refer to the way that you store data).
I’m not saying take forever, but take enough time to debate reasonably, and move forward with the best choice. At least sleep on it.
As an ex-Intel employee, I’d say disagree and commit — don’t commit and complain later.
And yes, I understand MVPs, lean everything, startup manifestos, pair programming, trust falls, and so on. But I also know what constant firefighting and resistance to change looks, sounds, feels, smells, and tastes like.
If you think through your decision (possibly even *gasp* documenting it), then you’ll have a basis for knowing when you should make changes, instead of just picking something else based on some combination of pretty marketing, instincts, peer pressure, and groupthink. You’ll thank yourself if (when) you pivot.
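To make "persistence layer" and the "wrapper class" quip a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `GoalStore` interface, the `InMemoryGoalStore` backend, and the `mark_done` function are made-up names, not code from Next Mountain. The idea is that application logic talks to a small interface, and only the backend class knows which database sits underneath.

```python
from typing import Optional, Protocol


class GoalStore(Protocol):
    """Hypothetical persistence interface: the only storage API the app code sees."""

    def save(self, goal_id: str, goal: dict) -> None: ...

    def load(self, goal_id: str) -> Optional[dict]: ...


class InMemoryGoalStore:
    """Stand-in backend for prototypes and tests; a real one would wrap a database client."""

    def __init__(self) -> None:
        self._goals: dict = {}

    def save(self, goal_id: str, goal: dict) -> None:
        # Store a copy so callers cannot mutate stored state by accident.
        self._goals[goal_id] = dict(goal)

    def load(self, goal_id: str) -> Optional[dict]:
        goal = self._goals.get(goal_id)
        return dict(goal) if goal is not None else None


def mark_done(store: GoalStore, goal_id: str) -> bool:
    """Application logic written against the interface, not a specific database."""
    goal = store.load(goal_id)
    if goal is None:
        return False
    goal["done"] = True
    store.save(goal_id, goal)
    return True


store = InMemoryGoalStore()
store.save("g1", {"title": "Run 5k", "done": False})
mark_done(store, "g1")
print(store.load("g1"))  # {'title': 'Run 5k', 'done': True}
```

A wrapper like this can shrink the amount of code that touches the database directly, but it does not make migration free. Query patterns, data modeling, sync behavior, and pricing all tend to leak through any interface, which is why the up-front thinking still matters.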
Problem-First Database Selection
Before you select a database, you should figure out exactly what problem you’re banding together to solve.
What does or will your company do? Why? With what uptime guarantee? What data, in what format, will be needed to solve that particular problem?
At an abstract level, you can view your data as a list of sets of answers to questions. In other words, each record or entry into the database answers some set of questions, producing a list of sets of answers.
In case I just made no sense whatsoever, consider the following list of two records, where each row answers the questions:
- who made the purchases?
- how many purchases were made?
- how much was spent in total?
| CustomerId | Purchases | TotalSpent |
| --- | --- | --- |
| 1234 | 2 | 10.00 |
| 1235 | 1 | 10.00 |
And yes, I said list instead of set, in case there are duplicates. Ahhh, data science. Anyway…
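If the list-versus-set distinction seems like hair-splitting, here is a small Python sketch of why it matters. The `Record` type is a made-up name, and the third row is a hypothetical duplicate added purely for illustration; it is not part of the example above.

```python
from decimal import Decimal
from typing import NamedTuple


class Record(NamedTuple):
    """One row: a set of answers to the three questions above."""
    customer_id: int
    purchases: int
    total_spent: Decimal


records = [
    Record(1234, 2, Decimal("10.00")),
    Record(1235, 1, Decimal("10.00")),
    Record(1235, 1, Decimal("10.00")),  # hypothetical duplicate row
]

unique_records = set(records)

print(len(records))         # 3: the list keeps the duplicate
print(len(unique_records))  # 2: the set collapses it

# The choice changes the answers you get back.
print(sum(r.total_spent for r in records))         # 30.00
print(sum(r.total_spent for r in unique_records))  # 20.00
```

Whether that duplicate is a legitimate second entry or a bug is exactly the kind of question worth settling before you pick a storage model.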
So, in solving your particular problem, what questions are you asking? What should the answers look like? Who will consume the answers? When and where? And so on.
Once you clearly know the problem, then you can figure out the right tool or tools for the job — you may find out that you truly need more than one database solution, for example.
Finally, compromise where you must, while planning for the future, and implement. Reality will set in — some databases are too expensive for a startup to justify running out of the gate. And you may need to stage your implementation and flesh out an upgrade plan.
To tackle a concrete problem, let’s discuss why I founded Next Mountain in the first place. I’ll explain Next Mountain’s initial goals and what trade-offs I considered in late 2016, while briefly touching on how the company has pivoted.
Next Mountain’s Initial Focus
Many startups pivot. But, they all start somewhere, or else they’re not a startup, they’re a club.
Next Mountain’s initial product focus was AI-enhanced social personal optimization. Lots of buzzwords, I know. But let’s postpone judgment and keep going.
I’m tempted to tell you just how much I like optimization. And, if you’ve read the other posts in this series, then you already know that I’m a data guy; I’ve been working with data in educational and professional contexts since the 90s. I even attended the first international Quantified Self conference. But, I’ll leave that for another day. Suffice it to say, I really like making things better.
On the tech side, I was also familiar with using sensors to attempt to infer people’s internal states, from studying psychology in school.
And I paid attention to smart dust, smart grid, smart mesh, smart whatever, off and on since the 90s, when smart dust was the size of bricks.
I constantly wondered… Why don’t people make the most of their abilities? Why don’t people set and achieve more goals? What can be done to help those situations?
So, when I left my prior role and founded Next Mountain, this is what I wanted to work on. A tool that people could use to optimize their lives through technology.
The idea began as a combination of goal setting across multiple categories, making personal commitments, sharing progress publicly, a big sack of metrics, and an AI version of your conscience, to keep you on track and suggest what you could be doing. Additionally, you would be able to chat with and encourage friends and groups of friends, add tags to goals, and so on.
Furthermore, wherever you happened to be, and on whatever device you were using, you could input information regarding your goals and achievements.
Great. So, how should we choose a database solution?
Now that we have the broad brush strokes of a product specification, let’s consider what database might make the best sense.
We can pull out several key items as questions:
- exactly what data will need to be stored?
- how will that data be organized?
- how relational is that data?
- how will that data be queried?
- will data modifications be frequent?
- how will users view their data, and in what form?
- how will the data be aggregated or transformed?
- how will users interact with each other?
- how will users interact with the AI components of the system?
- how will users be alerted to things?
- how can we ensure data security?
- what happens when the user is offline?
- how do offline changes get folded into a user’s global data, and when?
- how will data sync across a user’s devices (or not)?
- should data sync be realtime, when online?
- should the user resolve sync differences?
- are there any key performance metrics that must be achieved?
And add some higher-level technical considerations:
- how much scale is needed, right now? and what’s a best guess about future needs?
- how skilled is the team, in terms of not only implementing the database solution, but not coding in a way that makes it catch fire (in the cloud)?
- what flexibility is needed for unanticipated future changes?
- how well does this solution integrate with the rest of the technology stack? (e.g. if your team is committed to the MEAN stack, then you are probably going to use MongoDB)
- what if the entire system needs to be changed for some reason (scaling up, new capabilities, pivoting, provider shutting down, etc.)?
But we also have more business-oriented considerations:
- what’s the financial budget for engineering?
- how fast do you need to implement, realistically?
- what resources do you have — do you have dedicated database administrators and operations professionals, or do you need a managed solution?
- same as above: what if the entire system needs to be changed?
- will you outsource any development, and how will that be permissioned to ensure data and system security?
There are certainly more considerations that we could list, and I’d encourage you to consider those as an exercise, with your own startup idea as the target for consideration.
Then, you can prioritize the questions in terms of how important they are to your own product. Let’s try that for Next Mountain, at a high level.
I won’t drag you through each question above, since that’s an exercise for the reader, but we can apply several of those questions to the Next Mountain example case.
You’ll see that answering these questions and prioritizing your answers can get you pretty far in terms of selecting a reasonable database solution.
So, as you consider each of those questions for Next Mountain or your own startup, you’ll begin to notice that the application needs certain key items.
I’ll list some of the more important considerations that I took into account:
- security—security should be at the top of every list; for example, some database solutions have a free version, but TLS/SSL isn’t included, which I consider to be insane to run with user data in production (to be clear, I’d put unencrypted data transmission up there with plain text password storage, in terms of bad decisions; these are both the sort of shortcuts that you’ll regret later)
- mobile first — the database should not get in the way of the app’s mobile device performance, because the user could be anywhere when using the app, and clunky interfaces discourage users; if the database solution somehow improves the mobile experience, then all the better
- offline first—along with being mobile first, the app should not fall over if the user is offline; instead, it should gracefully handle new data; kudos if the database solution can facilitate offline activity
- responsive to changes — it’s reasonable to expect that data will be added and modified frequently, on both the user side and system side, and any chats/notifications should be timely
- realtime data sync — data should synchronize across all of the user’s devices, including resolving any differences, whenever the user is online; of course, this could be accomplished using websockets and some custom difference resolution code, but if the database solution supported it out of the box, then that’s a bonus
- cost effective — as a bootstrapped startup, a beefy license fee is a nonstarter; I can afford incremental service, but not big fixed costs
- managed, scalable, and reliable — as a solo founder of a bootstrapped startup company, I don’t want to spend all of my time on ops tasks — scaling the database, debugging synchronization failures, and so on, if there’s a solution that does this well (sorry, devops friends…)
- ecosystem and community — I was reasonably agnostic, as long as there was a well-supported API, no imminent danger of deprecation, and a lively development/support pace; I had preferences, but I usually avoid esoteric programming languages in production
- timeline — come on, of course I wanted it immediately! (so, I prioritized selecting the database and technology stack above most other work, which you can argue was right or wrong, but it let me get a prototype to users reasonably rapidly)
For the more technically inclined and others who think in terms of the CAP Theorem, I clearly cared more about availability and partition tolerance than consistency, because that would facilitate the desired user experience.
For everyone else, I wanted the app to be functional at all times, and data differences across devices to be resolved eventually, because users would probably be happier that way.
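To give a feel for what "resolved eventually" can mean in code, here is a minimal Python sketch of one simple strategy, last-write-wins, applied field by field. It is an illustration under stated assumptions, not a description of how Firebase or any other product resolves conflicts. The `Versioned` type, the `merge_last_write_wins` function, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    """A field value plus the (hypothetical) timestamp of its last edit."""
    value: str
    updated_at: float


def merge_last_write_wins(local: dict, remote: dict) -> dict:
    """Merge two copies of a record field by field, keeping the newer edit.

    On an exact timestamp tie, the remote copy wins.
    """
    merged = dict(remote)
    for key, local_field in local.items():
        remote_field = merged.get(key)
        if remote_field is None or local_field.updated_at > remote_field.updated_at:
            merged[key] = local_field
    return merged


# The phone edited the goal title while offline; the laptop changed the status later.
phone = {
    "title": Versioned("Run a 10k", updated_at=200.0),
    "status": Versioned("active", updated_at=100.0),
}
laptop = {
    "title": Versioned("Run a 5k", updated_at=100.0),
    "status": Versioned("done", updated_at=300.0),
}

merged = merge_last_write_wins(phone, laptop)
print(merged["title"].value, merged["status"].value)  # Run a 10k done
```

Even this toy version hints at the hard parts: device clocks drift, ties need a rule, and some conflicts really should be shown to the user. That is a big reason why a database that handles sync out of the box counts as a bonus.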
Let’s survey the landscape and make a decision.
If you want to make an informed decision, then this is where you start your research.
DB-Engines is a good place to start, but it’s worth taking at least a day or two to digest the marketing messages, production war stories, and other gobs of information. You could reasonably spend several days doing this.
If you’re technically inclined, then keep in mind the pace of change in this industry. The solution you may have used a few years ago could be much clunkier than what’s currently available.
For example, my first experiences with MongoDB were around version 1.8 or so (not sure if I used 1.6, but definitely 1.8). Both the database itself and the company’s offerings have evolved very nicely over the past several years, now including a completely managed solution, Atlas, among other offerings, like Stitch. These are all worth considering by a startup.
So, if the MEAN/MERN stack makes sense for your application, or if you’re using Meteor (perhaps for MiniMongo), then be sure to consider how any critiques would apply to the latest version of the stack/framework in question.
After the research comes prototyping. Try some of the solutions to see how they work. The prototype doesn’t have to be extremely complex to give you an idea of how working with the database will proceed.
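One way to keep research and prototyping notes comparable is to score each candidate against your prioritized requirements. The Python sketch below shows the mechanics only. The weights, the scores, and the candidate names `candidate_a` and `candidate_b` are invented for illustration; they are not the evaluation behind the choice described in this article.

```python
# Hypothetical weights (1 = nice to have, 5 = critical) for a few requirements.
weights = {
    "security": 5,
    "offline_first": 4,
    "realtime_sync": 4,
    "cost": 3,
    "managed": 3,
}

# Hypothetical 0-5 scores for two made-up candidates, as if taken from prototyping notes.
candidates = {
    "candidate_a": {"security": 4, "offline_first": 5, "realtime_sync": 5, "cost": 4, "managed": 5},
    "candidate_b": {"security": 5, "offline_first": 1, "realtime_sync": 2, "cost": 3, "managed": 2},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Weighted average of the scores, on the same 0-5 scale."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


ranking = sorted(
    candidates,
    key=lambda name: weighted_score(candidates[name], weights),
    reverse=True,
)
for name in ranking:
    print(f"{name}: {weighted_score(candidates[name], weights):.2f}")
# candidate_a: 4.58
# candidate_b: 2.74
```

The numbers are only as good as the thinking behind them, but writing them down gives you the documented basis for revisiting the decision later, especially after a pivot changes the weights.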
What did I choose for Next Mountain?
After considering and prototyping both realtime database solutions, like RethinkDB (which is nice), and offline database solutions, I chose Firebase. Later, I migrated to Cloud Firestore, while it was still in beta (yeah, it’s a beta, but it’s a Google beta).
Keeping in mind that I was considering these database solutions in 2016, I’ll mention several others.
Apologies to any databases that I’ve omitted; I’m sure that I’ve missed several, but you’ll certainly find them as you do your research.
There are some real gems in this list, so it’s worth considering them, if your problem domain is similar to Next Mountain’s.
Overall, considering that Next Mountain has pivoted several times, both Firebase and Cloud Firestore have been very solid.
Publicly released pivots include Solo CRM, a career management platform, and a simulated cryptocurrency trading platform. The latter two are currently live.
In 2016, there was some uncertainty around mobile database acquisitions, notably Parse, which gave me pause.
However, I attended meetups and chatted with speakers and other developers, and I found that Google was investing substantially into Firebase. So, after a lot of consideration, I gave it a shot.
Also, there were some popular war stories and highly cautionary tales. However, I felt strongly enough that I could model the data in a way that would scale, given the problem. It’s worth paying attention to war stories, but do your own testing.
On the mobile side, through internal prototyping, I have had no problem with Firebase’s native mobile libraries, although Next Mountain has yet to release mobile apps to the general public as of this writing. I’ve pleasantly found that Google products integrate well with Google products, including Firebase with Android.
I’ll mention that I’ve found some brittleness in cross-platform mobile solution libraries that wrap native Firebase mobile libraries, and the one that I was using with React Native was deprecated a few months after I considered it. However, in fairness, it’s not a trivial problem and kudos to that team for even trying it. There is another version now, so if you’re using React Native and want to use Firebase, then you’re probably in luck.
I’ve also found that Firebase’s offerings have continued to expand, and if you haven’t tried going serverless, then you’ll be pretty excited when you do. [edit: Serverless works great, except for when it doesn’t, as I was reminded less than one day after writing this piece (code had been running hourly for weeks, and then, all of a sudden, no). Of course, Firebase Cloud Functions are still in beta, and servers can fail. But the timing was impeccable.]
Cost has been reasonable. You can find some discussion of Firebase’s cost over the past couple of years. I have read about some speed bumps, but if you’re careful, I think that the costs can be managed well. Currently, you can start free.
All in all, the solution fit the problem exceptionally well for Next Mountain. I’ve been frustrated by several things while building web and mobile apps at Next Mountain, but Firebase hasn’t been one of them.
I hope that this discussion has been helpful to those who are embarking on the journey to build a new and exciting startup from scratch.
Take your time on the critical pieces, though.
For database solutions, consider:
- how well does this database really solve your problem?
- what is this database good at?
- what features are really just marketing speak?
- what is the team behind the database working on?
- do you care about new/upcoming features?
- if the database solution was recently acquired by another company, is that company investing in the ecosystem? is it integral to their business model?
- how much can you afford to spend on database-related costs?
- how will you know if you’ve outgrown your database solution?
And so on.
In summary, after full consideration, choose the system that makes sense for you — choose it because it solves your problem, given your constraints.
Of course, someone has probably been exclaiming the whole time that they’re reading this — “wait, this article applies to selecting my entire tech stack.” And they’re right.
As an aside, I did leave out blockchain considerations, and blockchain as a storage medium is highly relevant in 2018. I’m working on a related article, but for a good discussion, you can start here.
Posts In This Series
- A Startup Journey: Deciding To Take The Journey
- A Startup Journey: I’m Going Solo
- A Startup Journey: Wherefore Art Thou, Database?
- A Startup Journey: Need A Little Help?
- A Startup Journey: AI-Powered Personal Optimization, Solo CRM, Career Management, Crypto Trading, And More? [Part 1]