One of our core products is Nordcloud Insight, a cost visibility and optimisation tool that gives our customers information about their cloud spend. Over the last few months we went through a major rewrite of the tool: we moved from a Ruby on Rails application backed by Elasticsearch to a serverless application built on an AWS-powered stack, written in Go, with Redshift as our main data store.
Our data exchange model fits nicely into GraphQL — our UI application queries data, with multiple filters, drill-downs, and groupings, so it was easier for us to design a proper GraphQL schema instead of creating a RESTful API. Additionally, because we designed our schema as a first step in the product development process, we had a chance to start working on the UI and backend simultaneously.
Serverless GraphQL API? One way to create such an application on a public cloud is to use AWS AppSync, a really powerful API gateway. Think of it as GraphQL as a Service.
We have a lot of data to work with at runtime: billions of Redshift rows. Our queries are fairly complex because almost every one of them combines multiple filters. We quickly realised there was only a slim chance of optimising our schema and queries to the point where a synchronous API would be feasible, especially since the time an AppSync data source has to resolve the data is limited to 10 seconds. This limit is a good thing for the end-user experience, but it became a serious problem for us. Even after optimising the data schema, Redshift sometimes needed more than 10 seconds to respond, and up to about a minute for the most complex queries from our client app.
We decided to add a caching layer, so every query result is now cached and the next query for the same data returns in well under one second. But how do we communicate the slow first request to our client apps? We could, of course, build a short polling mechanism: the client asks for data matching a certain query, and if AppSync times the call out, it repeats the query until the result of the first request is available in our caching layer. That approach would be far from optimal.
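For a cache like this to help, two logically identical queries have to map to the same cache entry. The post does not say how the cache is keyed, so the following is only a minimal sketch under assumptions: the `SpendsQuery` type, its fields, and the `CacheKey` function are hypothetical names, and hashing a canonical form of the query is one possible approach rather than the one used in Nordcloud Insight.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// SpendsQuery is a hypothetical shape for a client query;
// the real schema is not shown in this post.
type SpendsQuery struct {
	Customer string
	Filters  map[string]string
	GroupBy  []string
}

// CacheKey builds a deterministic key so that the same query always
// maps to the same cache entry, regardless of map iteration order.
func CacheKey(q SpendsQuery) string {
	names := make([]string, 0, len(q.Filters))
	for name := range q.Filters {
		names = append(names, name)
	}
	sort.Strings(names) // canonical filter order

	var b strings.Builder
	fmt.Fprintf(&b, "customer=%q", q.Customer)
	for _, name := range names {
		// %q quotes the values so separators inside them cannot collide.
		fmt.Fprintf(&b, ";filter:%q=%q", name, q.Filters[name])
	}
	// GroupBy order is kept as-is because it changes the result shape.
	for _, g := range q.GroupBy {
		fmt.Fprintf(&b, ";group=%q", g)
	}

	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	q := SpendsQuery{
		Customer: "acme",
		Filters:  map[string]string{"service": "ec2", "region": "eu-west-1"},
		GroupBy:  []string{"month"},
	}
	fmt.Println(CacheKey(q))
}
```

Sorting the filter names before hashing is the important design choice here: Go randomises map iteration order, so without it the same query could produce different keys and miss the cache.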
We decided to lean on the GraphQL spec even more and use a special kind of GraphQL operation: a Subscription. Subscriptions let a client subscribe to a stream of events that are emitted every time the API data is mutated. In other words, if someone calls a mutation on our API, everyone subscribed to that mutation is notified.
So our API has two paths: one that returns data immediately from our caching service, and a second one that works asynchronously.
If the data matching the client's query is available in our caching layer, we can return it synchronously within a few hundred milliseconds. The tricky part is when the cache has no data matching the query.
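The subscription semantics described above can be modelled with a toy, in-process publish/subscribe hub. This is only an illustration of the idea, not how AppSync implements it: the `Hub` type and its methods are hypothetical, and real subscriptions are delivered over the network rather than through function calls.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub is a toy stand-in for the fan-out a GraphQL server performs
// when a mutation with subscribers is called.
type Hub struct {
	mu   sync.Mutex
	subs map[string][]func(payload string)
}

// NewHub returns an empty hub.
func NewHub() *Hub {
	return &Hub{subs: make(map[string][]func(string))}
}

// Subscribe registers a handler for the named mutation.
func (h *Hub) Subscribe(mutation string, handler func(payload string)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.subs[mutation] = append(h.subs[mutation], handler)
}

// Mutate notifies every handler subscribed to the named mutation
// and returns how many handlers were notified.
func (h *Hub) Mutate(mutation, payload string) int {
	h.mu.Lock()
	// Copy the slice so handlers run without holding the lock.
	handlers := append([]func(string){}, h.subs[mutation]...)
	h.mu.Unlock()

	for _, handler := range handlers {
		handler(payload)
	}
	return len(handlers)
}

func main() {
	hub := NewHub()
	hub.Subscribe("spendsReady", func(p string) { fmt.Println("client A got:", p) })
	hub.Subscribe("spendsReady", func(p string) { fmt.Println("client B got:", p) })
	hub.Mutate("spendsReady", "monthly totals")
}
```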
I’ve marked synchronous, blocking connections with white arrows and asynchronous, WebSocket-based connections with blue ones. As you can see, if our Resolver Lambda function gets no matching data from the cache, it responds to the query with information that the client should connect to our API via a Subscription and wait for the data there. At the same time, this Lambda calls another one, GetSpends, which receives the connection details from the Resolver Lambda and asks Redshift for the data requested by the client. As soon as Redshift returns the data, GetSpends calls the AppSync spendsReady mutation, which matches the subscription indicated by the Resolver Lambda, and sends the data through it. AppSync then sends an event to the client subscribed to that mutation. This way, even if the query takes a long time, our UI user is informed that we are still waiting for the data and that it may take longer than usual.
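The decision made by the Resolver Lambda can be sketched roughly as follows. This is a simplified model under assumptions, not the production code: the `Cache` and `Invoker` interfaces, the `Response` type, and the status values are hypothetical, and the in-memory fakes stand in for the real caching service and the asynchronous call to GetSpends.

```go
package main

import "fmt"

// Status tells the client which path the request took.
type Status string

const (
	StatusReady   Status = "READY"   // data returned synchronously
	StatusPending Status = "PENDING" // client should wait on the subscription
)

// Response is a hypothetical resolver result.
type Response struct {
	Status Status
	Data   string // cached payload; empty while pending
}

// Cache abstracts the caching layer (assumed interface).
type Cache interface {
	Get(key string) (string, bool)
}

// Invoker starts GetSpends without waiting for its result (assumed interface).
type Invoker interface {
	InvokeAsync(key, user string) error
}

// Resolve returns cached data when available; otherwise it triggers
// the asynchronous path and tells the client to wait for the event.
func Resolve(c Cache, inv Invoker, key, user string) (Response, error) {
	if data, ok := c.Get(key); ok {
		return Response{Status: StatusReady, Data: data}, nil
	}
	if err := inv.InvokeAsync(key, user); err != nil {
		return Response{}, fmt.Errorf("start GetSpends: %w", err)
	}
	return Response{Status: StatusPending}, nil
}

// mapCache is an in-memory fake of the caching layer.
type mapCache map[string]string

func (m mapCache) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// recordingInvoker is a fake that records calls instead of invoking a Lambda.
type recordingInvoker struct{ calls []string }

func (r *recordingInvoker) InvokeAsync(key, user string) error {
	r.calls = append(r.calls, user+":"+key)
	return nil
}

func main() {
	cache := mapCache{"k1": "cached spends"}
	inv := &recordingInvoker{}

	hit, _ := Resolve(cache, inv, "k1", "alice")
	miss, _ := Resolve(cache, inv, "k2", "alice")
	fmt.Println(hit.Status, miss.Status, len(inv.calls))
}
```

Keeping the cache and the invocation behind small interfaces makes the two paths easy to exercise in tests without touching AWS.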
The only outstanding task was to make sure that a user can subscribe only to mutations intended for them. We used simple VTL response mapping code and leveraged the Cognito integration with AppSync. Each spendsReady mutation is called with a user name, and in the subscription resolver we check whether that user name matches the user name of the subscribed user. If not, we throw an unauthorized exception.
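The actual check lives in a VTL mapping template, which is not reproduced here. The logic it expresses is roughly the following, shown in Go purely for illustration; `AuthorizeSubscription` and `ErrUnauthorized` are hypothetical names, and treating an empty subscriber name as unauthorized is an extra assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnauthorized mirrors the unauthorized exception thrown by the resolver.
var ErrUnauthorized = errors.New("unauthorized")

// AuthorizeSubscription allows a subscription only when the subscribing
// user is the user the spendsReady mutation is addressed to.
func AuthorizeSubscription(subscriber, mutationUser string) error {
	// Assumption: an empty identity is never allowed to subscribe.
	if subscriber == "" || subscriber != mutationUser {
		return ErrUnauthorized
	}
	return nil
}

func main() {
	fmt.Println(AuthorizeSubscription("alice", "alice"))
	fmt.Println(AuthorizeSubscription("mallory", "alice"))
}
```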
The rest of our API code is pretty straightforward, just a few API calls using AWS SDKs. AppSync helped us a lot: by leveraging its features, we managed to solve a very important problem of serverless architectures, namely asynchronous communication between the front end and back end of an application. The ability to create such APIs opens up a lot of new possibilities for serverless developers!