Why Did We Migrate to GraphQL Hive from Apollo Studio?

EESA
Published in Alef Education
Jan 18, 2023

Recently we had to move from Apollo Studio to GraphQL Hive. In this story, I will answer the big question of why we moved away from Apollo Studio. I will also share our experience during the migration & our findings with GraphQL Hive so far. I will not go into much detail, but if you would like to know anything specific, just leave a comment below & I will be happy to answer.

Problem with Apollo Studio

Apollo Studio (previously known as Apollo Engine) is Apollo's primary web interface, and it provides the following capabilities:

  • Schema registry storage (publishing & downloading)
  • Schema composition checks
  • Usage reporting & monitoring
  • Schema explorer

& much more…

But we were not using everything

Even with all these features available, we were using Apollo Studio for only two purposes:

  • To publish & download composed SDL schema
  • To perform schema checks

That’s it. Apollo Studio also offers operation metrics & operation tracing, but we weren’t using either because we didn’t need them.

Redundant usage reporting

Apollo Studio charges for each operation reported to it, which is fine in itself. However, as the number of users on our platform grew, the cost kept increasing every month. After some investigation we found that every operation was being reported to Apollo Studio twice, i.e. once from our GraphQL gateway & once from the subgraphs 🤦‍♂

This meant we were being charged double every month. To cut the cost, we had to explicitly disable usage reporting in all our subgraphs with the ApolloServerPluginUsageReportingDisabled plugin.
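To make the doubling concrete, here is a minimal TypeScript model of how many operations end up being reported. It is a simplification and not Apollo's billing formula. It assumes that the gateway reports each client operation once, and that every subgraph with reporting enabled reports each request it receives from the gateway. `subgraphRequestsPerOperation` is a hypothetical parameter, and a value of 1 corresponds to the "twice the amount" we observed.

```typescript
// Simplified model of reported operations (an assumption, not Apollo's billing formula).
interface ReportingSetup {
  clientOperations: number;             // operations received by the gateway
  subgraphRequestsPerOperation: number; // hypothetical: subgraph requests the gateway makes per operation
  subgraphReportingEnabled: boolean;    // false once subgraphs disable usage reporting
}

function reportedOperations(setup: ReportingSetup): number {
  // The gateway always reports each client operation once.
  const fromGateway = setup.clientOperations;
  // Subgraphs only add reports while their own usage reporting is still on.
  const fromSubgraphs = setup.subgraphReportingEnabled
    ? setup.clientOperations * setup.subgraphRequestsPerOperation
    : 0;
  return fromGateway + fromSubgraphs;
}

const before = reportedOperations({
  clientOperations: 1_000_000,
  subgraphRequestsPerOperation: 1,
  subgraphReportingEnabled: true,
});
const after = reportedOperations({
  clientOperations: 1_000_000,
  subgraphRequestsPerOperation: 1,
  subgraphReportingEnabled: false,
});
console.log(before, after); // 2000000 1000000
```

In our actual setup, adding `ApolloServerPluginUsageReportingDisabled()` to each subgraph's Apollo Server `plugins` array corresponds to setting `subgraphReportingEnabled` to false here. In this model, operations that fan out to several subgraphs would push the multiplier above 2.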

We believe this should have been disabled by default for all subgraphs, with users choosing whether to enable it, since the gateway alone is enough to report all operations to Apollo Studio. We are not sure why Apollo chose the opposite default.

No way to sample usage reporting

We then thought we could sample the operations reported to Apollo Studio to reduce the cost, but, oddly enough, there was no way to add sampling to the operations being reported.

Apollo does offer a fieldLevelInstrumentation option to sample field-level usage reporting, but there is no way to sample usage reporting of the client operations themselves. We reached out to Apollo’s support but couldn’t get a satisfactory answer.
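To clarify what we mean by sampling client operations, as opposed to fields: a single decision is made per operation about whether to report it at all. The sketch below is a generic TypeScript illustration and is not an Apollo or Hive API. `createOperationSampler` and its parameters are hypothetical. The random source can be injected so the behaviour is testable.

```typescript
// Hypothetical per-operation sampler: each operation is either reported or dropped as a whole.
interface OperationReport {
  operationName: string;
  weight: number; // how many real operations this single report stands for
}

function createOperationSampler(sampleRate: number, random: () => number = Math.random) {
  if (!(sampleRate > 0 && sampleRate <= 1)) {
    throw new RangeError("sampleRate must be in (0, 1]");
  }
  return (operationName: string): OperationReport | null => {
    // Keep roughly `sampleRate` of all operations; drop the rest before anything is sent.
    if (random() >= sampleRate) return null;
    // Scale kept reports so aggregated counts remain an unbiased estimate of real traffic.
    return { operationName, weight: 1 / sampleRate };
  };
}

// Usage: report about 10% of operations ("GetLessons" is a hypothetical operation name).
const sampleOperation = createOperationSampler(0.1);
const sampledReport = sampleOperation("GetLessons");
```

Because each kept report carries a weight of `1 / sampleRate`, dashboards can still show approximate totals while only a fraction of the traffic is reported and billed.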

The cost was already high because the number of users on our platform has increased significantly over the past couple of years. Billions of requests were being reported to Apollo Studio every month, and we were being charged for each of them.

To summarize, there was no way in Apollo Studio to:

  • Use Apollo’s schema registry & schema check capability only
  • Add sampling to usage reporting on client operations (not fields)
  • Add a limit to usage reporting per month (through a plan maybe)

Migration to GraphQL Hive

We started looking for alternatives & GraphQL Hive was the strongest candidate. Even though it was a new product, its pricing plans were flexible and covered what we needed and more, such as:

  • Add sampling to usage reporting
  • Limit usage reporting per month
  • Get quick support
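A monthly limit can be thought of as a quota that stops accepting usage reports once a cap is reached and resets when the calendar month changes. This is only an illustration of the idea and not how Hive enforces its plan limits. `MonthlyUsageQuota` is a hypothetical name, and months are counted in UTC as an assumption.

```typescript
// Hypothetical monthly cap on reported operations (illustration only, not Hive's implementation).
class MonthlyUsageQuota {
  private readonly limit: number;
  private month = "";
  private used = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records `count` reports and returns true if they fit in the current month's quota.
  tryConsume(now: Date, count = 1): boolean {
    const month = `${now.getUTCFullYear()}-${now.getUTCMonth() + 1}`;
    if (month !== this.month) {
      // A new calendar month (UTC) started: reset the counter.
      this.month = month;
      this.used = 0;
    }
    if (this.used + count > this.limit) return false; // over the cap: drop the report
    this.used += count;
    return true;
  }
}

// Usage: only forward a usage report while the monthly quota allows it.
const monthlyQuota = new MonthlyUsageQuota(1_000_000);
if (monthlyQuota.tryConsume(new Date())) {
  // hypothetical: send the usage report here
}
```

With a cap like this, the cost is bounded by the plan no matter how much traffic grows, which is the guarantee we were missing.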

In short, it provided everything that was missing from Apollo Studio, so we decided to migrate to GraphQL Hive.

Process of migration

We migrated to GraphQL Hive using a parallel implementation approach, i.e. we kept using Apollo Studio until we had fully migrated to GraphQL Hive & all the testing was done. In summary, we followed these basic steps:

  • Start publishing the schema from the subgraphs to the Hive schema registry
  • Point our gateway at the Hive schema registry

Of course, there were many more steps in between, e.g. adding toggles to turn off Apollo Studio in our production environments & then eventually removing the Apollo Studio integration altogether.
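The parallel implementation mostly comes down to letting each environment choose its schema source and reporting target through toggles. Below is a minimal TypeScript sketch. It assumes hypothetical environment variables `SCHEMA_REGISTRY`, `PUBLISH_TO_APOLLO` and `APOLLO_REPORTING_ENABLED` (these are not necessarily the names we used). The actual Apollo and Hive client wiring is left out.

```typescript
// Hypothetical toggles for running Apollo Studio and GraphQL Hive side by side during migration.
type Registry = "apollo" | "hive";

interface RegistryToggles {
  supergraphSource: Registry;      // where the gateway loads the composed schema from
  publishTo: Registry[];           // registries each subgraph publishes its SDL to
  apolloReportingEnabled: boolean; // whether usage is still reported to Apollo Studio
}

function readToggles(env: Record<string, string | undefined>): RegistryToggles {
  // Default to Apollo so environments without the flag keep their current behaviour.
  const supergraphSource: Registry = env["SCHEMA_REGISTRY"] === "hive" ? "hive" : "apollo";
  // Publish to both registries until Apollo is explicitly switched off.
  const publishTo: Registry[] = env["PUBLISH_TO_APOLLO"] === "false" ? ["hive"] : ["apollo", "hive"];
  // Keep Apollo usage reporting on unless a toggle turns it off.
  const apolloReportingEnabled = env["APOLLO_REPORTING_ENABLED"] !== "false";
  return { supergraphSource, publishTo, apolloReportingEnabled };
}

// Usage: pass process.env in Node, or a plain object in tests.
const migrationToggles = readToggles({ SCHEMA_REGISTRY: "hive", APOLLO_REPORTING_ENABLED: "false" });
```

Defaulting every toggle to the old behaviour means an environment only moves to Hive when someone sets the flag on purpose, which keeps the rollback path simple.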

Findings with GraphQL Hive so far

GraphQL Hive was introduced by The Guild in May 2022, so it is still pretty new in the market. Since migrating to it, we have found some critical bugs.

  • When we published the schema from our subgraphs to the Hive registry, we found that it was not being composed, so it wasn’t available to the gateway.
  • When we released multiple subgraphs at once, they all tried to publish their schemas to the Hive schema registry. Even though the logs confirmed each publish, the schemas of some subgraphs were not reflected in the Hive app. It turned out to be a race condition triggered when multiple subgraphs published their schemas at the same time.
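We don’t know how Hive stores schemas internally, so what follows is only a generic illustration of this class of bug. It is a lost update: concurrent publishes all read the same registry state, and the last write silently overwrites the others. The TypeScript sketch simulates that interleaving deterministically and contrasts it with optimistic concurrency (a version check plus retry). `RegistryState`, `publishUnsafe` and `publishWithVersionCheck` are hypothetical names, and so are the subgraph names in the usage example.

```typescript
// Hypothetical registry: one document holding every subgraph's SDL plus a version counter.
interface RegistryState {
  version: number;
  subgraphs: Record<string, string>;
}

interface Publish {
  name: string;
  sdl: string;
}

// Unsafe read-modify-write: all publishers read before anyone writes, then each writes its snapshot back.
function publishUnsafe(registry: RegistryState, publishes: Publish[]): void {
  const pending = publishes.map((p) => ({ p, snapshot: { ...registry.subgraphs } }));
  for (const { p, snapshot } of pending) {
    registry.subgraphs = { ...snapshot, [p.name]: p.sdl }; // discards earlier writes
    registry.version += 1;
  }
}

// Compare-and-set: the write succeeds only if nobody else wrote since our read.
function compareAndSet(registry: RegistryState, expectedVersion: number, next: Record<string, string>): boolean {
  if (registry.version !== expectedVersion) return false;
  registry.subgraphs = next;
  registry.version += 1;
  return true;
}

// Same interleaving (all reads first), but stale writes are rejected and retried on fresh state.
function publishWithVersionCheck(registry: RegistryState, publishes: Publish[]): void {
  const pending = publishes.map((p) => ({
    p,
    read: { version: registry.version, subgraphs: { ...registry.subgraphs } },
  }));
  for (const { p, read: firstRead } of pending) {
    let read = firstRead;
    while (!compareAndSet(registry, read.version, { ...read.subgraphs, [p.name]: p.sdl })) {
      // Conflict: someone published in between, so re-read the latest state and retry.
      read = { version: registry.version, subgraphs: { ...registry.subgraphs } };
    }
  }
}

const releases: Publish[] = [
  { name: "users", sdl: "type User { id: ID! }" },
  { name: "lessons", sdl: "type Lesson { id: ID! }" },
  { name: "content", sdl: "type Content { id: ID! }" },
];

const unsafeRegistry: RegistryState = { version: 0, subgraphs: {} };
publishUnsafe(unsafeRegistry, releases);
console.log(Object.keys(unsafeRegistry.subgraphs).join(", ")); // content

const safeRegistry: RegistryState = { version: 0, subgraphs: {} };
publishWithVersionCheck(safeRegistry, releases);
console.log(Object.keys(safeRegistry.subgraphs).join(", ")); // users, lessons, content
```

Each publish "succeeds" in both versions, which matches what we saw in the logs. Only the version check guarantees that no subgraph's schema is silently lost.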

We reported these bugs & The Guild team was quick to respond and fix them for us.

Summary

We moved to GraphQL Hive because Apollo Studio doesn’t provide a way to:

  • Add sampling to usage reporting on operations (not fields)
  • Limit usage reporting per month
  • Use schema registry & checks capability only

GraphQL Hive does solve all these problems, but it is still new in the market & has yet to mature.
