GraphQL: Intuit’s Path to ONE API System
Intuit is thrilled to increase our support for the GraphQL community by joining the GraphQL Foundation as a founding member. As our introduction to the community, we wanted to share our history with GraphQL, how we are using it today to power our ecosystem, what we’ve learned as early adopters of GraphQL, and our plans for the future.
Intuit’s API History
Intuit recognized more than 18 years ago that small business owners using QuickBooks were also using other applications to help run their businesses, causing an acute customer pain: having to enter business data twice. To address this challenge, we introduced the first API for QuickBooks Desktop in the first calendar quarter of 2001 and rapidly grew an ecosystem of several hundred business applications that integrated with QuickBooks.
For quite some time, for both QuickBooks Desktop and QuickBooks Online, QuickBooks was a product with an API. We bolted our 3rd party API layer onto the side of the product to enable developers to build the integrations our mutual customers were demanding.
In 2012, we moved QuickBooks Online from a Web 1.0 server-generated page architecture to a single-page application (SPA) architecture — for the first time, our own product became dependent on API services. We explored using our 3rd party API services, but they were not well-suited to key UI use-cases such as showing all the transactions eligible for inclusion in a bank deposit record (for example, a set of invoice payments, some sales receipts, and even some general journal entries), or all transactions that might be eligible for inclusion in a customer invoice (for example, estimates, billable time, billable expenses, and similar workflow transactions might result in line items in an invoice for a customer). The initial SPA of QuickBooks Online utilized approximately 80 unique API endpoints, each one custom-crafted to the needs of specific UI-driven use-cases.
That initial count very rapidly expanded as we continued to move more of the UI from the server-generated page implementation to a SPA implementation. In 2014, it became clear that we were on an unsustainable path of implementing more and more API endpoints. UI developers were forced to be full-stack developers as each new UI implementation drove the need for new, bespoke API services. Meanwhile, we continued to evolve product functionality and pay the cost of trying to keep our public (V2 and V3) APIs up-to-date with product functionality.
The need for a new, unified approach to sustainably serving both UI and general API use-cases became ever more evident. A small group was organized to explore approaches for delivering a new API system to address our needs. Many approaches were attempted, but none gained traction until…
GraphQL: A Language for the Client to Define the API it Needs
When Facebook introduced GraphQL to Open Source in 2015, the Intuit Small Business/Self Employed Group (SBSEG) was quick to realize its potential for solving our API needs. By correctly defining our domain objects, we could allow the server to describe its capabilities and let the client define the data it needed, eliminating — at the API layer at least — the REST over- and under-fetching problems that plague the performance of many SPA web apps. Initial prototypes with React and Relay were extremely promising; for the first time in our API history, engineers were genuinely excited by our API strategy and enthusiastically driving the design of the core entities of the API.
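As a minimal sketch of what made this compelling (the field names here are invented for illustration, not Intuit's actual schema), a client names exactly the fields it needs in a single request:

```graphql
# Hypothetical query: the client asks for precisely the fields it needs,
# avoiding the over-fetch of a full REST invoice payload and the
# under-fetch that would require a second call for customer details.
query InvoiceSummary($id: ID!) {
  invoice(id: $id) {
    id
    txnDate
    totalAmount
    customer {        # resolved in the same request, no extra round trip
      displayName
      email
    }
  }
}
```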
In the same time frame, we were seeking to decompose several monolithic services into a forest of microservices. We engineered what today would be called “schema stitching”, but using annotations in our schema to automate what typical schema stitching requires hand-written code for. Our system builds a query plan to fulfill graph requests (queries and mutations) that require orchestrating calls between multiple microservices. For example, if a request needs details about an invoice, the customer associated with that invoice, each of the lines in the invoice, and details about the item used on each line, we would need to communicate with the Transaction service, the Contact service (for the customer details), and the Item service (for the Item details). Similarly, our GMail invoice widget needs to create a customer object and an invoice object and potentially even item objects in a single mutation request, which gets orchestrated to the customer, transaction, and item microservices.
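A sketch of the idea, using hypothetical directives (`@owner` and `@resolve` are invented here to illustrate annotation-driven stitching; they are not Intuit's actual annotation names):

```graphql
# Hypothetical: annotations declare which microservice owns each type
# and which service resolves each cross-service edge, so the gateway's
# query planner can orchestrate calls without hand-written stitching code.
type Invoice @owner(service: "transactions") {
  id: ID!
  lines: [InvoiceLine!]!
  customer: Customer @resolve(service: "contacts")
}

type InvoiceLine {
  amount: Float!
  item: Item @resolve(service: "items")
}

type Customer @owner(service: "contacts") {
  id: ID!
  displayName: String
}

type Item @owner(service: "items") {
  id: ID!
  name: String
}
```

Given a query for an invoice with its customer, lines, and items, a planner reading these annotations would fan out to the transactions, contacts, and items services and assemble a single response.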
Many in the industry have implemented GraphQL on top of existing (typically REST-based) services. Since we did not have an existing service layer we were satisfied with, that kind of layering was not a viable option for us. Therefore, to accelerate development of services and enable the necessary orchestration, we built an extensive proprietary SDK library (with multiple patents pending) that wraps the graphql-java library and supports orchestration and query planning between services through a service registry. This query planning and orchestration layer is also embedded into our API gateway. This approach allows services to be decomposed from our monolith without directly affecting any API caller (UI or service-to-service), through a single orchestration endpoint for clients to call, while individual service endpoints are dynamically determined during request orchestration. A logical view of our architecture can be seen below.
From a GraphQL perspective and the needs of the web UI, we rapidly realized significant value. Our journey isn’t done, as we still call our original UI-specific services for some of the less frequently used parts of QuickBooks Online. As of approximately a year ago, 80% of the actual API traffic from QuickBooks itself is served via GraphQL, and other portions of QuickBooks (such as our Accountant experience) have been developed from the start to use GraphQL. Our current GraphQL API contains almost 600 nodes with nearly 3000 custom value types.
While our journey has been largely successful, it has not been without its missteps. For the purposes of this article, I’ll focus on our transition to GraphQL, and the mistakes we made there.
Commit to GraphQL at the API Layer
As an early adopter of GraphQL, there was significant internal concern around whether this new technology would be broadly adopted or become an orphan. To hedge our bets, we designed a multi-protocol approach to our API, focusing entirely on the nouns of the system. Our SDK enabled a REST projection, a GraphQL projection, and an internal “Batch” protocol to be generated from a single API definition authored as JSON Schema in YAML. The code generation driven by the JSON schema delivered a singular advantage for a system as complex as ours: we could ensure consistency of GraphQL input and output types based on the entity/type definitions and annotations indicating whether object properties were read-only, read/write, or create-only.
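To illustrate the consistency benefit (a hypothetical example; these type and field names are invented), the generator can emit a matched pair of output and input types from one entity definition, honoring the read-only annotations:

```graphql
# Hypothetical generated output: from a single annotated entity definition,
# the generator emits an output type containing all properties...
type Customer {
  id: ID!           # read-only: server-assigned
  displayName: String
  balance: Float    # read-only: derived from transactions
}

# ...and a consistent input type that omits the read-only properties,
# so clients cannot attempt to write server-owned fields.
input Customer_Input {
  displayName: String
}
```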
Unfortunately, the existence of the REST projection, our service engineers’ relatively better understanding of REST than of GraphQL, and the nouns-only approach that REST encourages meant that we were not fully utilizing the expressive capability of GraphQL. Rather than allowing services to define their API with GraphQL in mind from the start, and therefore support GraphQL-native constructs such as named query fields and mutations with custom inputs, we “dumbed down” our GraphQL schema to the REST equivalent: Create, Update, and Delete mutations for each core entity, plus simple value-access query fields and automatically generated Relay/GraphQL Connections for any reference from one entity to another with a cardinality of N. This has led to a “less than natural” GraphQL schema. For example, rather than having a simple mutation to request that an invoice be e-mailed to a customer, the client had to issue a Transaction_Update mutation and modify a field to indicate that the transaction was TO_BE_EMAILED. This placed a significant implementation burden on many services to support implicit capabilities derived by our schema generator rather than actual use-case-driven capabilities. We are actively working to solve this problem as we work toward a second generation of schema generation that leaves our REST and Batch protocols behind.
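The contrast can be sketched as follows (mutation and field names here are hypothetical, not our actual schema):

```graphql
# REST-equivalent shape: the client flips a flag on a generic update.
#
#   mutation {
#     Transaction_Update(input: { id: "123", deliveryStatus: TO_BE_EMAILED }) {
#       id
#     }
#   }
#
# Use-case-driven shape: a named mutation expresses the intent directly.
mutation SendInvoice($id: ID!) {
  sendInvoiceEmail(invoiceId: $id) {
    invoice { id }
    deliveryStatus
  }
}
```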
The beauty of GraphQL, when used properly, is that it allows the service implementor to defer the moment of understanding API consumption patterns: the service can design the GraphQL schema for flexibility, and then optimize for real-world usage. Unfortunately, in our eagerness to implement our new API system across our product lines, and driven by some versioning concerns forced by REST but largely irrelevant to GraphQL, we drove engineers to fully define their domains very early in the design cycle; we now recognize this as an anti-pattern. A service should define schema only insofar as it plans to implement. Yes, the design must be durable to ensure backward compatibility as the implementation gets richer, but individual properties and new types should be defined only as needed by the implementation, not all at once, because defining everything up front leaves clients with a confusing morass of fields that are unsupported, or supported only in certain use-cases. We are currently in the process of stripping unsupported fields out of our schema until they are actually needed.
Be Intentional about Edges Between Nodes
One of our design goals in the new system was to have not only an API description, but also a description of data that could be published on our event bus. Here, we soon recognized that our adoption of a complete graph of objects made it difficult to distinguish where the graph should “stop” when publishing any given node to a message bus. We also created a situation where it was difficult to constrain the depth and width of any given request from a client. We are now adding annotations to allow services to explicitly declare the “standard” view of any given node for publishing purposes and to more explicitly define the boundaries of responsibility for any given service in providing its part of any given graph request. We will also add annotations to explicitly indicate that a reference should not be reflected in the graph as a directly resolvable node; in the rare case that the client requires details about the referenced node, it should issue a separate query for it. This is obviously the exception, but it enables us to be much more intentional about the potential expansiveness of the graph.
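A sketch of what such annotations might look like (`@standardView` and `@reference` are invented directive names for illustration, not our actual annotations):

```graphql
# Hypothetical: @standardView marks the fields that form the node's
# "standard" view when published to the event bus; @reference marks an
# edge that is deliberately not resolvable in-graph, so clients needing
# its details must issue a separate query.
type Invoice {
  id: ID!            @standardView
  totalAmount: Float @standardView
  customer: Customer            # resolvable edge; crosses a service boundary
  auditLogId: ID     @reference # not expanded in-graph; query separately
}
```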
Large Schemas are Challenging for Mobile Tools
Our mobile teams, like our web UI team, rapidly recognized the value of GraphQL. In fact, the introduction of GraphQL enabled us to eliminate our “mobile gateway”, which had unique endpoints that exposed a “thin” version of our 3rd party APIs to reduce the network bandwidth required by mobile clients. However, as our schema grew to its current size, we began to encounter significant limitations in mobile GraphQL client libraries such as Apollo Client for iOS and Android. In some cases, our vast schema caused tooling to crash; in other cases, the tooling worked, but the resulting 40MB of code bloat imposed by the generated type objects was prohibitive. We recognized that our mobile clients were, in fact, utilizing a fairly “thin” view of the entire graph, so we built tools (which we plan to make available to the community) that ingest the actual GraphQL requests issued by the client and generate a “lean” schema.json file for use by the rest of the GraphQL toolchain. Our initial experiments here have been extremely promising, reducing the generated code size for our iOS app dramatically. There is further work to do here; queries are easy, but since mutation input objects can also be large graphs and we have no static insight into the variables portion of mutation requests, we have to default to a fairly deep graph realization that may be unnecessary.
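The trimming idea can be illustrated with a hypothetical example (invented query and type names; not our actual tooling output):

```graphql
# Hypothetical: given that the only query a mobile client actually
# issues is this one...
query MobileInvoiceList {
  invoices(first: 20) {
    edges {
      node {
        id
        totalAmount
      }
    }
  }
}

# ...a "lean" schema need only retain the types and fields reachable
# from it, e.g. roughly:
#
#   type Invoice {
#     id: ID!
#     totalAmount: Float
#   }
#
# so the mobile code generator emits a small fraction of the type
# objects that the full schema would produce.
```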
While our journey has not been without its missteps, Intuit has realized significant value from our adoption of GraphQL starting in 2015 with SBSEG, and expanding over the past year to other Intuit business units. We look forward to even more progress as we course-correct and begin to leverage the contributions of the broader GraphQL community and contribute our own innovations designed to support large schemas back to the broader community.