Scaling Search API
I’m going to tell you a story. But I’ll be honest, it doesn’t have a happy ending. In fact, it doesn’t have an ending at all… It’s about the problems API design can cause a startup.
So let’s get started. When we started Pygmalios, defining the product and finding market fit was the number one priority. I am not saying that was bad. That’s the way startups work. You have tons of work and you don’t finish features to a perfect state. So we started to accumulate technical debt.
We use MongoDB as the metadata database for our installations: which stores are installed, which sensors are allocated to them, user access and so on. On top of it sits a Node.js wrapper application built on the Express.js framework, providing a RESTful API to the rest of our ecosystem.
Let’s talk about the use case of searching and filtering stores by their IDs when we want to return information about more than one. We started using query parameters and defined multiple parameters with the same name. The request query string looks like this:
GET /operations?_id=54f82fa621ca568e34efe4b8&_id=54f82fa621ca568e34efe4b9
And it will return the following array of store metadata objects:
[ {store1}, {store2} ]
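Under the hood this was trivial to implement. Express parses the repeated _id parameters into an array, which went more or less straight into MongoDB. A minimal sketch (with an illustrative collection name and db handle, not our exact code) looks like this:
// Sketch of the old endpoint: repeated ?_id=... parameters arrive as an array in req.query._id
app.get("/operations", (req, res) => {
  const ids = [].concat(req.query._id || []); // normalize a single id into an array
  db.collection("operations")
    .find({ _id: { $in: ids } }) // depending on the schema, the ids may need wrapping in ObjectId
    .toArray()
    .then(stores => res.json(stores))
    .catch(err => res.status(500).json({ error: err.message }));
});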
This works great when you have a couple of installations. You don’t have to do a lot of work and you can focus on product development. But what happens when you grow and don’t pay off the technical debt early? Bad things…
You now have many other services accessing the bad API, many API endpoints designed the same way, and hundreds of installations in the field. The request becomes gigantic (tens of thousands of characters!) and you suddenly start getting errors. It’s bad, it’s really bad. You are running a production environment and can’t get any data, so you have to work fast. You have reached the limit.
You do some research and find out you can change the Express.js app config:
const bodyParser = require("body-parser"); // body-parser middleware for Express
app.use(bodyParser.json({ limit: "50mb" }));
app.use(bodyParser.urlencoded({ limit: "50mb", extended: false }));
Fantastic, it works! You are OK for a long time. The sky is the limit and you are far from it.
A “few” days later:
Not so great. We were collecting request logs in GCP Stackdriver, and the huge requests caused minor issues there as the logs grew enormous. The logs also contained the request headers used by our proxy server and kept growing, so we had to deal with that as well. We removed the headers from the logs. Problem solved, and we can go on!
However, there were other big surprises. We were using a Dockerfile with only the Node.js major version pinned, which was automatically updated to the latest release by the CI/CD process after every commit to our application.
FROM node:6
And guess what? Another problem arrived! Luckily not in production, as we deploy to the test environment first. A change to Node.js shipped in version 6.15.0 limits the http_parser max header size to 8 KB, and our oversized requests started failing! After a quick investigation of the errors we had to revert and pin Node.js to version 6.14, so we updated our Dockerfile.
FROM node:6.14
This Node.js issue has meanwhile been fixed in version 6.16.0, released on 2018-12-26, and the max header size can be adjusted. However, please avoid relying on that.
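If you really do need more header room on a release that supports it, the limit can be raised with a startup flag; for example (the 16 KB value and the server.js entry point are purely illustrative):
node --max-http-header-size=16384 server.js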
Another direct hit!
After some time another problem arrived! This time in our data processing Scala application and its Akka HTTP dependency. We used the same API format, hit another limit of 512 query parameters per request, and started getting request rejections. Luckily, updating to a newer version solved this issue.
After a couple of severe problems we had enough of hacks and finally decided to pay off the technical debt.
The current way
So we introduced a new API endpoint and changed the GET request with query parameters into a POST request with a request body payload.
POST /operations/search
Now we can perform a search by multiple IDs the following way:
{ "_id": ["54f82fa621ca568e34efe4b9"] }
We can even use MongoDB search operators in the request payload:
{ "name": { "$regex": "pygmalios", "$options": "i" }
And that’s it… a simple API change. However, it’s not so easy: our ecosystem had grown, the old API was used by many applications across many endpoints, and we didn’t have a unified request library we could reuse everywhere. We had to refactor them all. It’s a pain, but we had to do it. That is our second regret, that we didn’t build a unified API access library from the beginning. The catch is that we don’t use the same programming language for all applications, so we would have to develop several of them…
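Such a library does not have to be big, though. A hypothetical JavaScript client for the new search endpoint could be as small as the sketch below (the function name, base URL and node-fetch dependency are all illustrative):
// Hypothetical shared client for the search endpoint
const fetch = require("node-fetch");

function searchOperations(baseUrl, filter) {
  return fetch(baseUrl + "/operations/search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(filter),
  }).then(res => {
    if (!res.ok) throw new Error("Search failed: " + res.status);
    return res.json();
  });
}

// Usage: searchOperations("https://api.example.com", { _id: ["54f82fa621ca568e34efe4b8"] })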
As I said, this story doesn’t have an ending. Pygmalios continues to operate and is still growing. It’s a fantastic ride and we are looking forward to solving the next challenges. Meanwhile, new libraries have emerged and GraphQL looks great.