All the databases

Dave Hulbert
Jul 18, 2018 · 4 min read

The database landscape was largely unchanging years ago. There was still innovation but it stayed tethered to SQL. SQL was king. DBAs wrote stored procedures and DevOps hadn’t been born. At this time there was a 90% chance you had a single database architecture that was either Microsoft SQL, Oracle or MySQL.

Times changed and people started talking about CAP instead of ACID. The idea of NoSQL came along and new database engines popped up almost as often as new JavaScript frameworks. Today, software teams have a vast choice of database systems to choose from.

Passenger is made up of many services (not quite microservices) and various database systems. When chatting about our tech stack to other PHP developers it takes a while to get through the list and explain the reasons behind them. We often forget about some of them too, as in many cases they’re sitting there in the background, doing reads and writes all day long without complaining.

If you’re interested in the “why” behind some of the tech at Passenger then read on…


Image for post
Image for post

Almost all our transport data is stored in MongoDB. Mongo falls under the “NoSQL” class of databases and is document-orientated. Mongo lends itself well here as public transport journeys are discrete objects with lots of data. To fetch a timetable, we look for all the journeys in that timetable. This is much simpler than storing individual times as rows in a SQL database as it means things like journeys crossing midnight aren’t a special case.

A benefit working with fixed timetables is that there is a whole sub-domain of data that rarely changes. This means we can focus on having simple and performant read structures. We do this by performing the more intensive writes as a series of background import jobs. Any time we want a new view on the data that existing documents don’t lend themselves to, we can add a new job to import data in a different way.


Image for post
Image for post

Ticket products and payments are stored in MariaDB. The relational nature of it allows for various different views and reports to be created and its atomicity is suitable for payment transactions.


Image for post
Image for post

Elasticsearch is a search engine. We, of course, use it for searching. This powers things like the auto-complete results in Passenger apps and websites. Google is very good at search too but Elastic allows us to fine tune what data goes in and what data is returned in search results, prioritising locations that can be reached via each transport network.


Image for post
Image for post

We use Redis for real time transport data, such as stop departures. Redis’ built in timed expiry of keys is perfect for automatically cleaning out data. Being in-memory, Redis is very fast, which allows us to serve departure boards for many thousands of stops in just a few milliseconds.


Image for post
Image for post

Amazon’s DynamoDB is used as Passenger’s configuration. Distributed services can access parts of the same config over the internet. We have a package used in each service that fetches the config for the service in a uniform way.


Image for post
Image for post

Getting more esoteric, we also have an in-memory directed graph of journey data. This allows us to plan journeys across nodes a transport network, querying based on geospatial and temporal constraints.

We’ve built 2 of our own journey planners from scratch in the past but Passenger now uses OpenTripPlanner behind the scenes, with our own data augmentation and processes going on too.

At first glance having so many different databases may look like overcomplexity. In reality, the storage engines have clear responsibilities and contexts in which they use the data. The isolation means we haven’t ended up with one monolithic SQL server that gets out of control.

Having 6* different databases can be a bit of a learning curve for new developers. Fortunately, PHP packages like Doctrine abstract away many of the differences. Ansible and Terraform (for provisioning servers and development environments) helps here too, as new instances can be created without having to configure anything.

There’s various bits we’re expecting to change in the future. There’s a good chance we’ll outgrow Redis for some of the real-time things we want to do and have to move that elsewhere. We’ve only scratched the surface of Elastic, so want to exploit more of its capabilities to provide a better UX. More of our storage is heading towards AWS too, to make scaling easier and so we can focus on the code.

*At the current count, not including storage used by websites and apps — that will have to be a separate post.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store