Migrating to Apollo 1.0 for iOS

Published in Thumbtack Engineering, Jul 12, 2023

Photo by NASA — Apollo 11 Saturn V lifts off

We’ve written previously about how we use GraphQL on mobile at Thumbtack. The well-defined, strongly typed schema is a great benefit on a native platform. We began adopting GraphQL on iOS back in August 2018, and we needed to choose a client library for our implementation. At the time, there weren’t many options, and the de facto library was Apollo iOS. With Apollo, you generally write GraphQL queries and have Apollo generate code that sends those queries, parses the responses, and turns them into strongly typed Swift objects. So, we added Apollo to our app and set out to prototype it and prove its viability. Things seemed to go well, so we decided to move ahead with more GraphQL-based APIs.

Around 3 ½ years later, in early 2022, we began to run into a myriad of problems. Developers noted that after building the app, it would hang for a very long time when launching in the simulator, and sometimes it would fail to launch at all with the vague error message “Terminated due to signal 9.” Initially, this was written off as newly introduced Xcode flakiness, but the error became more and more prevalent until it eventually happened after every code change. Mysteriously, though, the second launch of the same compiled code always started right away with no errors. Multiple engineers tried to determine the cause, and many difficult-to-prove theories were presented, some of which were tested and abandoned. We discovered that disabling the debugger got rid of the error, but for some time that was the only workaround we had.

Around the same time, we noticed that our compile times had gotten very slow. They had grown to about 8 minutes, in contrast to the ~3 minute build times we are generally accustomed to. We discovered that building our auto-generated GraphQL code took approximately 80% of this build time. We also noticed that some of these auto-generated files had gotten huge: many of them had grown to 80,000+ lines of code. While we generally have no reason to touch these files, since they are auto-generated, it’s no surprise that they are quite slow to compile.

As these issues piled up, Thumbtack looked for ways to alleviate the developer headaches and slowness. We migrated our GraphQL libraries from static libraries to dynamic libraries, which fixed the “Terminated due to signal 9” issue at the expense of a slight hit to startup time. We changed the workflow for GraphQL libraries to pre-compile them, but this created more problems than it solved, so it was reverted. We upgraded our iOS developer laptops to Apple silicon, which greatly improved compile time. We also investigated moving to other GraphQL libraries. Sadly, there still aren’t a lot of great, well-supported GraphQL libraries for iOS.

Problems With Code-Gen

At this point, it’s worth talking about why these files were so large. To do so we need to dive into how Apollo 0.5.x worked. Let’s say you have a schema that looks something like this:

type User { 
  firstName: String!
  lastName: String!
  userName: String!
}

type QueryAResult { 
   user: User!
} 

type QueryBResult { 
  user: User!
}

extend type Query {
  queryA: QueryAResult!
  queryB: QueryBResult!
}

We have two different queries which can effectively return the same kind of data. Simple enough, right? But this isn’t the whole story. The response returned from GraphQL depends on the fields the client asks for. For example, the client might only care about firstName and lastName from queryA but only about userName from queryB.

query queryA {
  queryA {
    user {
      firstName
      lastName
    }
  }
}

query queryB {
  queryB {
    user {
      userName
    }
  }
}

Because Swift is strongly typed, the Apollo code generator needs to create two separate classes with different fields here. This is a simplified example:

class QueryA {
  class User {
    var firstName: String { ... }
    var lastName: String { ... }
  }

  var user: User
}

class QueryB {
  class User {
    var userName: String { ... }
  }

  var user: User
}

So now each result has exactly the fields you asked for, in exactly the places you need them. But what if you actually do want the same fields in each case? You could put the same fields in each query, but the types generated on the client would still not be interoperable: you’d have two distinct types, QueryA.User and QueryB.User, which are very similar but not interchangeable. Is there a built-in way to write your queries so that you get an interchangeable type? Yes, the solution is called fragments. It looks something like this:

fragment UserFields on User { 
  firstName
  lastName
  userName
}

query queryA {
  queryA {
    user {
      ...UserFields
    }
  }
}

query queryB {
  queryB {
    user {
      ...UserFields
    }
  }
}

Both queries will now generate a new fragments property that exposes a common type, which makes shared data much nicer to work with on the client. However, that’s not the full story. The actual generated types look something like this:

class QueryA {
  class User { 
    var firstName: String { ... }
    var lastName: String { ... } 
    var userName: String { ... }
  }

  class Fragments { 
    var user: UserFields
  }
  
  var user: User
  var fragments: Fragments
}

class QueryB {
  class User { 
    var firstName: String { ... }
    var lastName: String { ... } 
    var userName: String { ... }
  }

  class Fragments { 
    var user: UserFields
  }
  
  var user: User
  var fragments: Fragments
}

So now, perhaps you see the problem. When we used a fragment to generate a common type, we didn’t consolidate from two types to one; we actually added a third (the shared UserFields type, alongside QueryA.User and QueryB.User). It’s also worth noting that you can nest fragments inside of fragments, which can make the problem even worse. As it turns out, Thumbtack relies quite heavily on shared types and fragments, which means that as our usage of GraphQL grew, our generated code grew at an unsustainable rate.
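To make the type count concrete, here is a minimal, self-contained Swift sketch that mirrors the simplified shape shown above. It is not Apollo's actual generated output: the stored properties, the way `fragments` is computed, and the `displayName(for:)` helper are assumptions made purely for illustration.

```swift
// Hand-written stand-ins for generated types; NOT real Apollo output.

/// The shared fragment type: the third type added by using a fragment.
struct UserFields {
    let firstName: String
    let lastName: String
    let userName: String
}

/// Stand-in for the response type of `queryA`.
struct QueryA {
    // The fragment's fields are duplicated on the per-query type...
    struct User {
        let firstName: String
        let lastName: String
        let userName: String
    }

    // ...and the shared type is only reachable through `fragments`.
    struct Fragments {
        let user: UserFields
    }

    let user: User
    var fragments: Fragments {
        Fragments(user: UserFields(firstName: user.firstName,
                                   lastName: user.lastName,
                                   userName: user.userName))
    }
}

/// Stand-in for `queryB`: same shape, but unrelated to `QueryA` in Swift's type system.
struct QueryB {
    struct User {
        let firstName: String
        let lastName: String
        let userName: String
    }

    struct Fragments {
        let user: UserFields
    }

    let user: User
    var fragments: Fragments {
        Fragments(user: UserFields(firstName: user.firstName,
                                   lastName: user.lastName,
                                   userName: user.userName))
    }
}

/// Hypothetical helper that accepts only the shared fragment type.
func displayName(for user: UserFields) -> String {
    "\(user.firstName) \(user.lastName) (@\(user.userName))"
}

let queryA = QueryA(user: QueryA.User(firstName: "Ada", lastName: "Lovelace", userName: "ada"))
let queryB = QueryB(user: QueryB.User(firstName: "Ada", lastName: "Lovelace", userName: "ada"))

// displayName(for: queryA.user) would not compile: QueryA.User is not UserFields.
let nameA = displayName(for: queryA.fragments.user)
let nameB = displayName(for: queryB.fragments.user)
print(nameA == nameB)
```

Even in this toy version, one fragment shared by two queries yields three user-shaped types, and every additional query that spreads the fragment adds another per-query copy.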

Apollo 1.0

Luckily, the folks at Apollo were aware of this problem and rewrote the code generator almost entirely for version 1.0. This took some time; we beta tested it along the way, and it was finally released in October 2022. We then set out to update our codebase to version 1.0, which would no longer generate additional types when fragments were used. However, it was not a straightforward endeavor. Because the upgrade was such a large change, we ran into several issues that required client refactors before our code was compatible:

Generated GraphQL response types no longer have strongly typed initializers: Instead, these were replaced by a new mocking mechanism for tests. This meant lots of our test setup code had to be updated to use the mocking framework. Additionally, there were a few places in our app code that directly instantiated these objects, usually to perform some mutation on the response under certain circumstances.
GraphQL response types are no longer mutable: Though it was probably already bad practice before, it is now no longer possible to mutate a response you get from Apollo.
The way nullability was handled changed.
The way enums were handled changed.
Several changes with naming and plurality of auto-generated types.
The types of fields in response objects that used fragments changed. This is what makes the generated code so much smaller, as there are now fewer types generated.
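The post does not describe how the call sites that mutated responses were refactored. One common pattern, sketched below under that caveat, is to copy the response into an app-owned model and mutate the copy instead. Both `UserResponse` (a stand-in for an immutable generated type) and `UserViewModel` are hypothetical names, not part of Apollo.

```swift
// Hypothetical stand-in for an immutable generated response type; NOT real Apollo output.
struct UserResponse {
    let firstName: String
    let lastName: String
    let userName: String
}

/// App-owned model that the client is free to mutate.
struct UserViewModel {
    var firstName: String
    var lastName: String
    var userName: String

    /// Copies the fields out of the response once, at the boundary.
    init(response: UserResponse) {
        self.firstName = response.firstName
        self.lastName = response.lastName
        self.userName = response.userName
    }
}

let response = UserResponse(firstName: "Ada", lastName: "Lovelace", userName: "ada")

// response.userName = "ada_l" would not compile: `userName` is a `let` constant.
var model = UserViewModel(response: response)
model.userName = "ada_l"  // mutate the local copy; the response is untouched

print(response.userName, model.userName)
```

Keeping the mutation in an app-owned type also means test code can build `UserViewModel` values directly, without depending on how the generated types are constructed.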

One Last Problem

By February of 2023 we were about ¾ of the way through our upgrade, but our production code was still using the older 0.5 library when disaster struck. Suddenly our app would no longer compile, and the compiler failure was seemingly within our auto-generated code. The error message was more or less undecipherable. Though we found several workarounds that let us hobble on for a day or so at a time, we never found a good solution. However, we noted that deleting some existing queries made the problem go away, so we hypothesized that we’d gone over some internal Swift compiler limit. This was particularly problematic because it made adding new features to the app very difficult. We therefore buckled down to finally get the upgrade over the finish line. This was a large endeavor that required the help of multiple teams, but within a few weeks we were ready to fully test it out.

Results

The results from the migration were incredible. For one of our apps we saw:

A >80% reduction in lines of code.
A 50% reduction in compile time in our CI system (building a debug build).
A 92% reduction in our release build time (full compiler optimization).
A 25% reduction in our app store binary size.
A decrease in the amount of time needed to run code generation.

And of course, most importantly, we were back to being able to ship new features reliably.

If you are using Apollo < 1.0 and have not yet begun to work on upgrading, I’d recommend you start as soon as possible. Depending on the size of your codebase, it could be a long process, but it is ultimately worth the investment. And if you are experiencing issues with your code size or compile time due to generated code from older versions of Apollo, you should certainly consider upping the priority of the upgrade.

Written by Scott Southerland