How to build a serverless GraphQL API with Cosmos DB

Cameron Tittle
Credera Engineering
14 min read · Jul 11, 2022

In this guide, we’ll look at how to build a GraphQL API in Node.js with Apollo Server. By running it on Azure Functions with Cosmos DB for data storage, we can build a highly scalable serverless GraphQL API, removing the need to manage server infrastructure.

As well as the basic setup, we’ll take a look at how to get the best performance out of your API by exploring the concepts of batching and memoization (also known as de-duplication).

The technologies

GraphQL is a powerful query and manipulation language which provides a complete description of your data, allowing clients to specify exactly what data they need and nothing more. We’ll assume since you’re here that you’re familiar with GraphQL. If not, check out GraphQL’s website.

Cosmos DB is a fully managed NoSQL database offered by Azure. It supports various query APIs including MongoDB and Cassandra, but the most popular is the Cosmos DB SQL API, which we’ll be using in this article. Cosmos DB offers a consumption mode, where you’re only billed for the actual requests made and data stored, allowing our API to be entirely serverless. Find out more in the Azure Cosmos DB documentation.

Azure Functions is Azure’s Function-as-a-Service offering. We provide code for individual functions which are triggered by incoming HTTP requests. Azure handles provisioning the compute resource to run our code in, billing us only for the actual compute resource and time consumed. Learn more in the Azure Functions documentation.

This article assumes basic familiarity with GraphQL schemas, NPM, and TypeScript. Some familiarity with Azure Functions and Cosmos DB will be useful but is not required. You’ll also need an Azure account (you can get started for free here).

Please note that the Azure resources used in this project can incur a cost.

One Azure function to rule them all

Our project will consist of a single Azure Function which will handle incoming HTTP requests to our GraphQL API.

We’ll start by setting up a basic Azure Functions project. First, we’ll install the Azure Functions Core Tools by following Microsoft’s instructions. This is a useful CLI tool for developing and deploying Azure Functions. We’ll also want to make sure we have Node.js and NPM installed.

Initialising our project is as simple as running the following command:

func init graphql-cosmosdb-example --worker-runtime typescript

This will set up the directory structure and required files for our project. Next, we can run the following command to create an HTTP trigger function which will handle incoming requests to the GraphQL API.

func new --template "Http Trigger" --name graphql

We won’t go into too much detail about the structure of Azure Functions projects as this is well documented elsewhere, but the main things to note are:

  • The function has its own directory, graphql, containing source code (index.ts) and configuration (function.json).
  • Within function.json, the bindings section defines how the function is triggered and what it outputs. In this case, it’s triggered by an HTTP request and outputs an HTTP response. scriptFile specifies the path to the code for the function to execute.

We can run our project locally to check everything’s working by running the usual npm install followed by npm run start. We should see some output in the console similar to:

graphql: [GET,POST] http://localhost:7071/api/graphql

Navigating to this URL in a browser should show a success message which will tell us that everything’s working. Now, let’s replace the sample code with something more useful.

Setting up Apollo Server

Apollo Server is an open-source GraphQL server for Node.js which will allow us to serve data via a GraphQL schema. It’s database agnostic, build tool agnostic, and GraphQL client agnostic, and has a powerful plugin API for extensibility.

Let’s get started by installing the required dependencies:

npm install apollo-server-azure-functions graphql

Note that we aren’t installing the apollo-server package directly — rather the apollo-server-azure-functions package which wraps Apollo Server. This allows us to easily run it on an Azure Function.

GraphQL servers need a schema to define the shape of the data which can be queried by clients. We’ll create our schema in a file named schema.ts within the graphql directory. We’ll start with the basic schema below. It declares a single root query named user which takes an argument, id, and returns an object of type User.

import { gql } from "apollo-server-azure-functions";

export const typeDefs = gql`
  type Query {
    user(id: String!): User
  }

  type User {
    id: String
    firstName: String
    lastName: String
    age: Int
  }
`;

For more details on the GraphQL schema language, take a look at this documentation.

We’ll replace the code in index.ts with the snippet below. It calls createHandler with a map of resolver functions and the schema we defined above. We’ll dive deeper into what resolvers are in the next section. For now, we just have a dummy function returning hardcoded data.

import { ApolloServer } from 'apollo-server-azure-functions';
import { typeDefs } from './schema';

// Resolver map.
const resolvers = {
  Query: {
    user: (_, params) => {
      return {
        id: params.id,
        firstName: 'John',
        lastName: 'Smith',
        age: 50
      };
    },
  },
};

// Create our server.
const server = new ApolloServer({ typeDefs, resolvers });

export const run = server.createHandler();

Configuring function.json

Before we run this code, we need to make some changes to function.json:

{
  "bindings": [
    {
      "authLevel": "anonymous",
      "type": "httpTrigger",
      "direction": "in",
      "name": "req",
      "route": "{*segments}",
      "methods": [
        "get",
        "post",
        "options"
      ]
    },
    {
      "type": "http",
      "direction": "out",
      "name": "$return"
    }
  ],
  "scriptFile": "../dist/graphql/index.js"
}

Things to note:

  • authLevel has been set to anonymous so we can call our GraphQL API without a key
  • route has been set to a wildcard value, so that all routes are handled by this function
  • options has been added to the methods array so that clients can check which HTTP methods are available
  • name in the output block has been set to $return which is a requirement of apollo-server-azure-functions

Configuring tsconfig.json

We need to update tsconfig.json. The esModuleInterop flag is required when using apollo-server with TypeScript to prevent errors such as the one below. You can read more about the flag in the TypeScript documentation.

node_modules/apollo-server-core/dist/plugin/drainHttpServer/index.d.ts:2:13 - error TS1192: Module '"http"' has no default export.

Below is the complete tsconfig.json used for the example project:

{
  "compilerOptions": {
    "module": "commonjs",
    "target": "es6",
    "outDir": "dist",
    "rootDir": ".",
    "sourceMap": true,
    "strict": false,
    "esModuleInterop": true
  }
}

Testing with Apollo Sandbox

Restarting the development server and navigating to the localhost address for the function in a browser, we are now greeted by the Apollo Sandbox. This is a handy web-based tool served up by Apollo in development environments that lets us easily query our GraphQL API.

To test our API, we’ll run the following query:

query($id: String!) {
  user(id: $id) {
    id
    firstName
    lastName
  }
}

We also need to provide a variables object, as the user query takes an ID parameter:

{
  "id": "testid123"
}

If all is working, the API will return the data we hardcoded in index.ts:

{
  "data": {
    "user": {
      "id": "testid123",
      "firstName": "John",
      "lastName": "Smith"
    }
  }
}

To understand how Apollo is handling our query, we need to understand resolvers.

Understanding resolvers

Apollo Server handles queries using resolver functions. For each field in the root Query type in our schema, we define a resolver function with the same name. We pass an object containing all of our resolver functions into Apollo server on startup. Apollo will execute this function whenever a client queries for that field. The function is responsible for fetching the required data (from wherever that may be) and returning an object that matches the return type in the GraphQL schema.

Let’s remind ourselves of our schema.

type Query {
  user(id: String!): User
}

type User {
  id: String
  firstName: String
  lastName: String
  age: Int
}

In the root Query type, we have a field named user, and therefore we must provide a corresponding resolver function named user to Apollo Server. The resolver function must return data that conforms to the User type.

Currently, our resolver function just returns hard-coded data. In the next section, we’ll use a data source to fetch data from Cosmos DB.
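Schematically, and independently of Apollo Server itself, a resolver is just a function that receives the parent value, the field arguments, and a shared context. The sketch below is our own simplified illustration of that contract, using an in-memory Map in place of a real data source — the names here are ours, not part of Apollo’s API:

```typescript
// A minimal sketch of the resolver contract, independent of Apollo Server.
// The "data source" is a hypothetical in-memory stub standing in for Cosmos DB.
interface User {
  id: string;
  firstName: string;
  lastName: string;
  age: number;
}

// Apollo calls resolvers as (parent, args, context, info).
type Resolver<TArgs, TResult> = (
  parent: unknown,
  args: TArgs,
  context: { dataSources: { user: Map<string, User> } }
) => TResult;

const userResolver: Resolver<{ id: string }, User | undefined> = (
  _parent,
  args,
  context
) => context.dataSources.user.get(args.id);

// Simulate what the server does when a client queries `user(id: "u1")`.
const context = {
  dataSources: {
    user: new Map([
      ["u1", { id: "u1", firstName: "John", lastName: "Smith", age: 50 }],
    ]),
  },
};

const result = userResolver(undefined, { id: "u1" }, context);
console.log(result?.firstName); // John
```

Apollo Server performs essentially this dispatch for us: it matches each queried field to the resolver of the same name and passes the arguments and context in.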

Integrating Apollo Server with Cosmos DB

To fetch data from Cosmos DB, we could simply make calls to Cosmos DB directly from our resolver function using the Cosmos DB SDK. While this would work, there is a better way. Apollo Server has the concept of data sources, which are classes that know how to fetch data of a particular type from a particular place. The advantage of data sources is that they allow for optimisations such as caching and memoization, which can make queries more efficient. We’ll cover these optimisations in more detail later.

Adding data to Cosmos DB

First, we’ll need some data in Cosmos DB to fetch. Cosmos DB holds objects inside containers, which themselves belong to databases. Cosmos containers each have a partition key, which is used to spread the data items across logical partitions. Choosing a good partition key is vital for getting good performance and scalability from Cosmos DB. The details of choosing a partition key are beyond the scope of this article, but you can learn more in the Cosmos DB documentation.

For this example, we need a Cosmos container named users which is partitioned by id. Creating a Cosmos DB Account is straightforward in the Azure Portal. Navigate to the Cosmos DB console and click “Create”. Select the Core (SQL) API. Select or create a resource group, provide a name for the account, and select Serverless under Capacity mode.

Once the database is created, click ‘Go to Resource’, then select Data Explorer on the left-hand side. Click ‘New Container’. Provide a name for the database and container — for example “exampleapp” and “users”. Click ‘OK’ to create.

We can populate our container with some sample data using the ‘New item’ button.

For the purposes of demonstration, we’ve created a few different user items with different IDs.
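Each item just needs to match the User type in our schema. For example, one of the sample items might look like the following (the values are illustrative):

```json
{
  "id": "testid123",
  "firstName": "John",
  "lastName": "Smith",
  "age": 50
}
```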

Creating a Cosmos DB Data Source

Now, we’ll hook our Apollo Server up to Cosmos DB.

Install the Cosmos DB Data Source and Cosmos DB SDK packages:

npm install apollo-datasource-cosmosdb @azure/cosmos

We’ll edit index.ts to add a helper function that builds a data source. It creates a Cosmos DB client, gets the container, and returns a new CosmosDataSource.

import { CosmosClient } from '@azure/cosmos';
import { CosmosDataSource } from 'apollo-datasource-cosmosdb';

const buildCosmosDataSource = <TData extends { id: string }>(
  containerId: string
) => {
  const client = new CosmosClient(
    process.env.COSMOS_CONNECTION_STRING
  );
  const container = client
    .database(process.env.COSMOS_DATABASE_NAME)
    .container(containerId);

  return new CosmosDataSource<TData, unknown>(container);
};

We’ll also need to define an interface for the items stored in the database. We’ll create an interface User in a file named user.ts within a new directory named models.

export interface User {
  id: string;
  firstName: string;
  lastName: string;
  age: number;
}

We can now update index.ts to provide the dataSources field when initialising Apollo Server. This is a function which returns a map of data sources that will be available to use in resolver functions.

// Create our server.
const server = new ApolloServer({
  typeDefs,
  resolvers,
  dataSources: () => ({
    user: buildCosmosDataSource<User>('users')
  })
});

export const run = server.createHandler();

The last step is to use the data source to fetch data from Cosmos DB in the resolver function. The CosmosDataSource class exposes various functions to query data, including by ID or by SQL query. For our basic example, we can use findOneById to query for a single item. We’ll update the resolver as follows:

// Resolver map.
const resolvers = {
  Query: {
    user: async (_, params, context) => {
      return context.dataSources.user.findOneById(params.id);
    },
  },
};

Here, we’re accessing the dataSources property on the context object. The context is passed into resolver functions as the third argument.

Local settings

The code assumes the connection string and database name are set in the environment variables COSMOS_CONNECTION_STRING and COSMOS_DATABASE_NAME respectively.

We can place these values in the file named local.settings.json in the root directory of our project. These values are available to our code as environment variables at runtime. The connection string can be found in the Cosmos DB portal in the Keys tab.

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "node",
    "AzureWebJobsStorage": "",
    "COSMOS_CONNECTION_STRING": "<your cosmos connection string>",
    "COSMOS_DATABASE_NAME": "<your cosmos DB name>"
  }
}

Testing the Cosmos DB integration

Running the same query as before — this time passing the ID of one of the items in our Cosmos DB container — we can now see the data coming from Cosmos DB.

Querying related objects together

Up until now, we have built a really convoluted way to fetch single items from a database. But GraphQL is a lot more powerful than that.

Let’s say we’re building an application to manage car rentals. Our database has a collection of user objects and a collection of car objects. Each user can rent one car at a time, the ID of which is stored in a field on the user named carId. We could define this in the GraphQL schema like so:

export const typeDefs = gql`
  type Query {
    user(id: String!): User
    car(id: String!): Car
  }

  type User {
    id: String
    firstName: String
    lastName: String
    age: Int
    carId: String
  }

  type Car {
    id: String
    make: String
    model: String
    miles: Int
  }
`;

Here’s the problem. If we want to know what car a user is currently renting, then we must first query for the user, and then make a subsequent query for the car. This would result in two network round trips.
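With this schema, a client needing the user’s current car would have to issue two dependent operations, for example:

```graphql
# First round trip: fetch the user and their carId.
query {
  user(id: "testid123") {
    firstName
    carId
  }
}

# Second round trip: fetch the car using the carId from the first response.
query {
  car(id: "<carId from the first response>") {
    make
    model
  }
}
```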

However, we can leverage the power of resolvers to avoid the extra network call. Rather than return the car ID in the user query, we can return the car object itself. Let’s update the schema as such:

type User {
  id: String
  firstName: String
  lastName: String
  age: Int
  car: Car # Return a Car object instead of ID
}

But the users in our database only store the car ID, so how can we return the full car object?

We can tell Apollo Server to automatically fetch the car item and inject it into the response whenever a query for a User requests the car field. We do this by adding an entry to our resolvers map called User (corresponding to the User type defined in the schema). We can then provide resolver functions for any of the fields within the User type. Apollo Server will execute them whenever it comes across that field in a query. This means that we can return fields which don’t actually exist. Rather than return carId, we can define a resolver function named car which will fetch the car object from Cosmos DB and return it. The result from this function will be assigned to the car field in the query response.

So, we’ll provide a resolver function for the car field of the User type. Here’s what our resolvers map will look like:

// Resolver map.
const resolvers = {
  Query: {
    user: async (_, params, context) => {
      return context.dataSources.user.findOneById(params.id);
    }
  },
  User: {
    car: async (parent, _, context) => {
      return context.dataSources.car.findOneById(parent.carId);
    }
  }
};

In the User.car resolver, we’re making use of the first argument passed into the resolver: parent. This will always be the parent object, allowing us to access the carId field on the user.

We also need to add an interface for the car items in models/car.ts:

export interface Car {
  id: string;
  make: string;
  model: string;
  miles: number;
}

And add a car data source:

const server = new ApolloServer({
  typeDefs,
  resolvers,
  dataSources: () => ({
    user: buildCosmosDataSource<User>('users'),
    car: buildCosmosDataSource<Car>('cars')
  })
});

In Cosmos DB, we’ll create a few items in a container named cars:

And set the carId field on our users:

Finally, we’ll query the user again, this time also requesting the car field.

And, like magic, Apollo Server populates the car field with the car item from the database, based on the ID stored in carId on the user.
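The query with the nested car field looks like this (reusing the same $id variable as before; the values returned will depend on the sample data you created):

```graphql
query($id: String!) {
  user(id: $id) {
    id
    firstName
    lastName
    car {
      make
      model
      miles
    }
  }
}
```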

Memoization and batching

So now we know how to harness nested resolvers to automatically fetch separate pieces of data as part of a single GraphQL query. This is very powerful, and can be taken a lot further than the example given.

So what’s the catch? Well, it can have a detrimental impact on performance, since our code is now making multiple database calls in order to fulfil a single request. We can mitigate this in a couple of ways.

Caching

Caching can be used to reduce the number of database calls by persisting recently fetched values in memory between requests. The CosmosDataSource supports in-memory caching out of the box. We can pass a ttl to findOneById and other functions on the CosmosDataSource to specify how long to cache the data. If a later request within the TTL needs the same piece of data, it can be served from the cache, avoiding a database call.

Caching values between requests introduces complexities such as data staleness and cache invalidation, and this needs to be considered when selecting a TTL value. Ultimately, this will be determined by the needs of your application and whether it can tolerate stale data. Even if your application cannot tolerate any stale data, memoization can still be used to optimise single requests.

Memoization

Memoization is a specific form of caching. Generally speaking, memoization is the practice of storing the results of expensive operations in a cache, to be reused next time the same input occurs. It is also known as de-duplication.

In the context of Apollo Server, memoization means caching the result from the data source to be used later within the same request. In other words, if the same piece of data needs to be fetched multiple times to fulfil a single query, it will be fetched once from the data source (Cosmos DB in our case), and the value cached to be used for the subsequent occurrences. This avoids making repeated database calls for the same piece of data within the same request, which would be very inefficient.

The CosmosDataSource package performs memoization automatically, so we get this optimisation for free.
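To make the idea concrete, here is a minimal sketch of per-request memoization in plain TypeScript. This is our own simplified illustration of the technique, not the actual CosmosDataSource implementation:

```typescript
// A simplified illustration of per-request memoization: repeated fetches for
// the same ID within one request hit the underlying store only once.
type Fetcher<T> = (id: string) => Promise<T>;

const memoize = <T>(fetch: Fetcher<T>) => {
  const cache = new Map<string, Promise<T>>();
  let underlyingCalls = 0;
  return {
    fetch(id: string): Promise<T> {
      let hit = cache.get(id);
      if (!hit) {
        underlyingCalls += 1;
        hit = fetch(id); // only reached on a cache miss
        cache.set(id, hit);
      }
      return hit;
    },
    calls: () => underlyingCalls,
  };
};

// A fake data source standing in for Cosmos DB.
const source = memoize(async (id: string) => ({ id, name: `user-${id}` }));

source.fetch("u1");
source.fetch("u1"); // de-duplicated: shares the first promise
source.fetch("u2");
console.log(source.calls()); // 2
```

Within a single GraphQL request, the data source applies the same principle: identical lookups resolve to one underlying Cosmos DB read.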

Deploying to Azure

So far, we’ve only run our project locally, but deploying it to Azure is straightforward.

Search for ‘Function Apps’ in the Azure Portal and click ‘Create’. Select a resource group, provide a name (for example apollo-cosmosdb-example), select Node.js as the runtime stack, and Linux as the operating system. Everything else can be left as defaults.

Once the Function App is created, we can deploy our project with:

func azure functionapp publish apollo-cosmosdb-example --publish-local-settings --typescript

This will deploy our function to the Function App and apply the environment variables we have in local.settings.json, so that the Cosmos DB connection string and database name are available to the deployed app.

The GraphQL API will now be publicly available at a URL like:

https://<function-app-name>.azurewebsites.net/api/graphql

In a nutshell

In this article, we have shown you how to:

  • Set up a basic Azure Functions project with Core Tools CLI
  • Add Apollo Server to an Azure Function
  • Get data from Cosmos DB using data sources
  • Optimise Apollo Server with caching and memoization
  • Deploy Azure Functions with Core Tools CLI

We hope you found this blog helpful!

Interested in joining us?

Credera is currently hiring! View our open positions and apply here.

Got a question?

Please get in touch to speak to a member of our team.
