Quite a few people have reached out since we announced that VideoAmp had finalized its migration from MongoDB to Postgres. For us, it was about data integrity. We had budgeted 6–8 weeks for migrating models and shoring up tests; we did it in 3.5 weeks end-to-end!
SELECT * FROM database_history;
To understand the decision-making context, I’ll briefly explain my position. Historically, I have been a relational database thinker and tinkerer. My mentor in SQL, Steve Young, was also a CPA and schooled me in the ways of 1000-line stored procedures and accounting-friendly data dumps that would make your head spin. When document stores such as MongoDB hit the scene, I was slow to react…
After reading this somewhat dated rant (http://pastebin.com/FD3xe6Jt), I became indifferent to those downsides until the app needed to scale. This is why we chose MongoDB early on at VideoAmp: to run faster with node.js.
Fast-forward to today: we have about 30 talented API, data platform, and data science engineers, and multiple Postgres experts on staff. They would wince, frown, and take pot-shots at Mongo’s weaknesses, and at its inability (sans a model layer in a programming language) to enforce strict data integrity. What about shops like ours that use node and Scala extensively? How does an app enforce model integrity without double-maintaining models in multiple languages?
It creates more problems than it solves.
Loose coupling, SOA, and microservices aside, we have already benefited from placing strict rules in the DB schema and letting any language (Python, node.js, Scala) talk to it.
Building the Migration
So as a hack project, we decided to build a migration app that moved data from Mongo to Postgres, using Bookshelf as the model layer for node.js. It started with a strict schema, which mapped array types in Mongo to enumerated types in Postgres. Child documents in Mongo became additional tables, with companion linkage tables and foreign keys. In about 3–4 days the migration was done, and the API team took over from there.
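The core of that mapping can be sketched as a pure function that splits a nested Mongo-style document into a parent row and child rows destined for separate Postgres tables. This is a minimal illustration, not our actual code: the collection shape, field names, and table layout here are invented for the example.

```javascript
// Hypothetical sketch: flatten a Mongo-style document into relational rows.
// The document shape and column names are invented for illustration.
function flattenCampaign(doc) {
  // Top-level scalar fields become one row in the parent table.
  const parent = {
    id: doc._id,
    name: doc.name,
    pricing_model: doc.pricingModel,
  };
  // Each embedded child document becomes a row in its own table,
  // linked back to the parent via a foreign key column.
  const children = (doc.placements || []).map((p, i) => ({
    campaign_id: doc._id, // foreign key to the parent row
    position: i,          // preserve the original array ordering
    url: p.url,
  }));
  return { parent, children };
}
```

A migration script would iterate the Mongo collection, run each document through a function like this, and insert the resulting rows inside a single transaction so a half-migrated document never lands in Postgres.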
It was amazing how much cruft our Mongo database had accumulated over 18 months of development. Granted, some of the cruft came from a lack of discipline among our engineers, but one could argue that the environment was so lax it allowed the data to drift into inconsistency over time.
For example, when you change enums in MongoDB, you have to build a migration app to correct the existing data… if you want to… or not… the DB itself will never complain. Strict typing and schema checking protect us from this problem. So instead of “CPM”, “cpm”, and “Cpm” in the data, it’s just “CPM”. Instead of fields that may or may not be present, we’re explicit about them now. And similarly, foreign keys protect us from orphaned relationships.
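To make the enum problem concrete, here is roughly what every application touching the data has to do when the database won’t enforce the value set itself. The set of pricing models and the function name are hypothetical, purely for illustration; with a Postgres enumerated type, an insert of a bad value fails at the database instead.

```javascript
// Without DB-level enums, each app must police values like
// "CPM" / "cpm" / "Cpm" on its own. These values are illustrative.
const PRICING_MODELS = ['CPM', 'CPV', 'FLAT'];

function normalizePricingModel(value) {
  const canonical = PRICING_MODELS.find(
    (m) => m === String(value).toUpperCase()
  );
  if (!canonical) {
    // A Postgres enum column would reject this row for us;
    // here the burden falls on every application, in every language.
    throw new Error(`invalid pricing model: ${value}`);
  }
  return canonical;
}
```

Multiply this little guard by every field, every service, and every language in the stack, and the appeal of declaring it once in the schema becomes obvious.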
The Morning After
Now we are moving full steam ahead, launching more features. Every engineer has a sense of understanding and confidence that our data has integrity. We use stored procedures and trigger hooks to form the basis of our audit trail, regardless of which application is making the changes; we are not reliant on the application layer to save our asses, nor do we wonder whether an engineer is manually tinkering with a record. We have layered on multiple new features with impunity, and most importantly, we’ve lifted morale across our engineering teams, since Postgres is much more aligned with their past experience.
I’m not hating on MongoDB; it helped us through the prototyping phases, and it’s been valuable for getting shit done on smaller-scale “one-offs”. But now that VideoAmp is scaling up its engineering team and its customer volume, it was vital to abandon it, because it promoted “Loosey Goosey” practices.
Originally published at https://www.linkedin.com on November 19, 2015.