How Apollo GraphQL had a major impact on our microsite performance

Published in wehkamp-techblog · 6 min read · Jan 25, 2022

Our journey on reaching new performance heights

This is the story of a journey we made with one of our microsites. It was one of the first we built from the ground up at Wehkamp back in 2018, and its main purpose is to show overview pages filled with products, filters, navigation, and some related content. It is also a story of how, in the end, less turned out to be more performant.

The initial stack we started on back then was a React-based site, built with Webpack, running on Node, with Redux for state management. For us, this setup was a major improvement in performance, as we were migrating away (Pac-Manning, as we called it) from an on-premise monolith to a cloud-based microsite and microservice architecture.

A new kid in town

In 2017 GraphQL came along. It looked very promising and became hip and happening very quickly, and new and shiny is always enough for us frontenders to want it in our sites.

We decided to do a proof of concept with Apollo GraphQL. We created our own NodeJS server with the Apollo Server, connected Apollo Client to our React setup, and basically made the GraphQL endpoint part of the website. You can debate whether this was the right solution, but at that moment in time for us, it was the only way forward. From this moment on GraphQL was the preferred way of collecting and storing the data and it was also the basis for our newly created microsite.

GraphQL as part of the microsite calling the needed services

First cracks in performance

Over time the microsite evolved and had to handle more and more visitors. New features also added more service calls, which led to slower response times and, when the load became too high, broken pages. Of course, we could scale up the number of instances, but that would not fix the underlying problems.

We added caching, timeouts, and an AbortController on service calls, added caching on GraphQL queries, and even refactored resolvers to handle the service calls synchronously. All these changes greatly improved performance, and the site was once again robust and highly performant.
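To make the combination of timeouts, abort handling, and caching on service calls a bit more concrete, here is a minimal sketch in TypeScript. It is not our production code: the names `callWithTimeout`, `TtlCache`, and `cachedCall` are made up for this example, and the cache is a plain in-memory map with a time-to-live, which is only one of several ways to cache service responses.

```typescript
// Hypothetical helper names for illustration; not the actual Wehkamp code.

// A service call that accepts an AbortSignal so it can be cancelled.
type ServiceCall<T> = (signal: AbortSignal) => Promise<T>;

// Run a service call and abort it when it takes longer than timeoutMs.
// The call has to honour the signal (fetch does when given `{ signal }`).
async function callWithTimeout<T>(call: ServiceCall<T>, timeoutMs: number): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await call(controller.signal);
  } finally {
    clearTimeout(timer); // always clear, so the timer cannot fire late
  }
}

// Minimal in-memory cache where every entry expires after ttlMs.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Serve from the cache when possible, otherwise call the service with a timeout.
async function cachedCall<T>(
  cache: TtlCache<T>,
  key: string,
  call: ServiceCall<T>,
  timeoutMs: number
): Promise<T> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await callWithTimeout(call, timeoutMs);
  cache.set(key, value);
  return value;
}
```

A call site could then look like `cachedCall(cache, "products:shoes", (signal) => fetch(url, { signal }).then((r) => r.json()), 500)`, where the cache key, `url`, and the 500 ms budget are placeholders. A slow service then fails fast instead of keeping the request (and the Node instance) busy.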

What did it bring us?

While improving the performance, we challenged ourselves: what were the benefits of GraphQL in our setup, did we really need it, and what was the main reason we had implemented it? And why would we want to ship more clientside JavaScript to customers?

We couldn’t find any good answers to all these questions, so at that moment we decided to start a new proof of concept where we removed all the GraphQL logic from our NodeJS server. Our main hypothesis was that we should be more performant without it. We also needed to create a new content microsite at that moment (which also serves the homepage) and we made this microsite with the performance lessons we learned but without the GraphQL server.

We immediately saw that the performance of this newly set up microsite was way better than the one it replaced. It used fewer resources (CPU and memory), so fewer instances were needed to handle the load, and the clientside JavaScript bundle shrank by over 125 KB.

Our eyes were opened

After seeing these remarkable findings, we decided to also remove GraphQL from our overview microsite, as it is the most heavyweight site we have. It is also one of the most complex: it makes a lot of service calls, renders multiple types of pages, and renders clientside when filtering, paginating, or sorting the pages.

We managed to come up with a plan of attack and moved the fetching logic from the resolvers to an initial data service call on the server. We created our own clientside routes to fetch data and used React hooks for our state management.
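As an illustration of what hook-based state management for an overview page can look like, here is a small sketch. It is only an assumption of the general shape, not our actual implementation: the state fields, the action names, and the `/api/overview` route are all hypothetical. The reducer is a pure function, so in a component it could be handed to React's `useReducer`, and the URL helper shows how a clientside data route could be derived from the state.

```typescript
// Hypothetical shapes and names for illustration only.
interface Product {
  id: string;
  title: string;
}

interface OverviewState {
  products: Product[];
  activeFilters: string[];
  page: number;
  loading: boolean;
}

type OverviewAction =
  | { type: "filterToggled"; filter: string }
  | { type: "pageRequested"; page: number }
  | { type: "dataLoaded"; products: Product[] };

// Pure reducer: takes the current state and an action, returns the next state.
function overviewReducer(state: OverviewState, action: OverviewAction): OverviewState {
  switch (action.type) {
    case "filterToggled": {
      const isActive = state.activeFilters.indexOf(action.filter) !== -1;
      const activeFilters = isActive
        ? state.activeFilters.filter((f) => f !== action.filter)
        : [...state.activeFilters, action.filter];
      // Changing filters resets pagination and marks the page as loading.
      return { ...state, activeFilters, page: 1, loading: true };
    }
    case "pageRequested":
      return { ...state, page: action.page, loading: true };
    case "dataLoaded":
      return { ...state, products: action.products, loading: false };
  }
}

// Build the URL of a (hypothetical) clientside data route from the state.
function overviewDataUrl(state: OverviewState): string {
  const params = [`page=${state.page}`].concat(
    state.activeFilters.map((f) => `filter=${encodeURIComponent(f)}`)
  );
  return `/api/overview?${params.join("&")}`;
}
```

The initial state would come from the server-side data call, after which filtering, paginating, and sorting only dispatch actions and fetch from the data route. No GraphQL client or cache is needed in the bundle for this.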

Show me the numbers!

Enough talking, let's show the metrics. We deployed the updated version of the microsite at 07:30 in the morning and the drop in resources was amazing!

We also run load tests every morning, and there we saw an amazing drop in latency and a rise in test throughput.

        Threads     Tests         Latency (s)
Before  100         32,828        1.053
        200         43,568        1.794
        400         52,382        3.290
After   100         44,198        0.420
        200         58,070        0.479
        400         69,150        0.553
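To put the load test table into relative terms, the snippet below computes the percentage change per thread count. It is just a sketch of the arithmetic, with the table's values written as plain numbers (tests as whole counts, latency in seconds with a decimal point). The names `LoadTestRow` and `percentChange` are made up for this example.

```typescript
// Values copied from the load test table; names are illustrative only.
interface LoadTestRow {
  threads: number;
  tests: number;
  latencySeconds: number;
}

const before: LoadTestRow[] = [
  { threads: 100, tests: 32828, latencySeconds: 1.053 },
  { threads: 200, tests: 43568, latencySeconds: 1.794 },
  { threads: 400, tests: 52382, latencySeconds: 3.29 },
];

const after: LoadTestRow[] = [
  { threads: 100, tests: 44198, latencySeconds: 0.42 },
  { threads: 200, tests: 58070, latencySeconds: 0.479 },
  { threads: 400, tests: 69150, latencySeconds: 0.553 },
];

// Relative change from `from` to `to`, in percent, rounded to one decimal.
function percentChange(from: number, to: number): number {
  return Math.round(((to - from) / from) * 1000) / 10;
}

// Pair each "before" row with the "after" row for the same thread count.
const summary = before.map((row, i) => ({
  threads: row.threads,
  throughputChange: percentChange(row.tests, after[i].tests),
  latencyChange: percentChange(row.latencySeconds, after[i].latencySeconds),
}));

console.log(summary);
```

This works out to roughly 32% to 35% more tests completed at every thread count, while latency dropped by about 60% at 100 threads, 73% at 200 threads, and 83% at 400 threads. The gain gets bigger as the load goes up, which is exactly where the old setup started to break.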

Due to the gains on the server and the lighter JavaScript bundle, we also saw an uplift in the Core Web Vitals. The Lighthouse performance score went up by several points and we saw big drops in Time To First Byte, First Meaningful Paint, First CPU Idle, and Speed Index. All in all a great transition with phenomenal results!

Is this the end state?

Probably not, but we learned a lot over these past few years. We had a learning curve implementing Redux and later GraphQL. We really opened the hood, became more and more aware of site performance and its potential issues, and managed to solve a lot of them. Our current stack is almost the same as where we started our journey: we updated the basis to the latest versions and moved state management from Redux to GraphQL to React.

The main takeaway for us is to always challenge if you really need that package, solution, or approach. Does it really benefit you, can you prove it with reason, findings or metrics? Personally, I’m glad we made this journey because we learned a lot from it and take away all these learnings as the foundation for future endeavors.

This story is not a rant against Apollo or GraphQL. I still love it, but for this use case and our implementation it did more harm than good. I think it could still be beneficial if used more like a service layer between sites and services, which is the preferred setup. I can also imagine having more than one GraphQL layer so that it does not become a single point of failure, each layer tightly coupled to one microsite and the services it needs to call.

GraphQL layer between microsite and services


Thank you for taking the time to read this story!

I work at Wehkamp.nl, one of the biggest e-commerce companies in the Netherlands 🇳🇱.
We do have a Tech blog, check it out and subscribe if you want to read more stories like this one. Or look at our job offers if you are looking for a great job!