A brief history of APIs at Dailymotion

Dailymotion started in 2005 with the goal of allowing users to upload, watch and share videos. Recently, Dailymotion decided to position itself as a top-down curated video experience focusing on high value publisher content rather than being a catch-all repository for short-form UGC (User Generated Content).

Fast forward many years, and Dailymotion’s product portfolio, user base and API usage have grown tremendously.

We have two kinds of data APIs:

  1. A Public Graph API that allows our API clients to access and manage videos, collections, uploads…
  2. A new internal GraphQL API for our brand-new Dailymotion experience.

Today we have more than 300 million unique users per month generating more than 1 million API requests per day, served from our hybrid stack: a main datacenter in Paris and a geo-distributed Kubernetes cloud.

This blog post is a brief overview of how I helped scale the Dailymotion APIs as an API Lead / Evangelist over the last 4.5 years, with a focus on API design and team organisation.


Scaling the API design for performance

Like many internet startups, Dailymotion faced an explosion of consumption models. We saw our users move quickly beyond browser-based web apps and use our product from multiple devices: smartphones, game consoles, set-top boxes…

Fig1. The explosion of consumption models

Our first mobile apps were created quickly in response to a pressing need. These apps used our first REST API version to get data from our monolith PHP backend hosted in Paris.

Legacy Dailymotion mobile apps

For example, to get data about a specific video, we used this kind of request, which returned the entire video resource:

GET /video/x68r2tt
{
"id": "x68r2tt",
"title": "Stranger Things - Brings 80's Fashion Back",
"owner": "x1x1syp",
"description" : "Stranger Things is causing a fashion storm as many of their fans are borrowing clothes from their parents and dressing like a blast from the past.",
...
...
}

This first implementation worked well for our initial user base. At scale, however, we encountered some issues that impacted our user experience:

  • Over-fetching issue: requesting the entire resource to get just a small piece of data.
  • Chattiness issue: multiple API requests were needed to build a single mobile view, which hurt the overall performance of our mobile apps.

Introducing the Graph API

To fix our performance issues, we took a lot of inspiration from graph databases and the Facebook Graph API to redesign an opinionated API.

Instead of thinking in terms of resources and endpoints, we decided to expose our data in terms of objects and connections (nodes and edges) :

  • Each object exposes multiple fields of different types (scalar, objects…)
  • An object field can be readable, writable and/or used as a filter.
  • Objects can be connected to each other.

Graph API Pattern

The over-fetching issue was fixed with field selection, for example:

GET /video/x68r2tt?fields=title,description,owner.username
{
"title": "Stranger Things - Brings 80's Fashion Back",
"description": "Stranger Things is causing a fashion storm as many of their fans are borrowing clothes from their parents and dressing like a blast from the past.",
"owner.username": "FYINews"
}
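
The field selection shown above can be sketched in a few lines of Python. This is only an illustration of the pattern, not our production code: the in-memory `USERS` and `VIDEOS` dictionaries, the `CONNECTIONS` map and the `select_fields` helper are all hypothetical names introduced for the example.

```python
from typing import Any

# Hypothetical in-memory data standing in for the real backend.
USERS = {"x1x1syp": {"id": "x1x1syp", "username": "FYINews"}}
VIDEOS = {
    "x68r2tt": {
        "id": "x68r2tt",
        "title": "Stranger Things - Brings 80's Fashion Back",
        "owner": "x1x1syp",
    }
}

# Edges of the graph: a field name mapped to the collection it points to (assumed layout).
CONNECTIONS = {"owner": USERS}


def select_fields(obj: dict[str, Any], fields: str) -> dict[str, Any]:
    """Return only the requested fields, following dotted paths across connections."""
    result: dict[str, Any] = {}
    for path in fields.split(","):
        parts = path.split(".")
        value: Any = obj
        for index, part in enumerate(parts):
            value = value[part]
            if index < len(parts) - 1:
                # More path remains: replace the stored id with the connected object.
                value = CONNECTIONS[part][value]
        result[path] = value
    return result


print(select_fields(VIDEOS["x68r2tt"], "title,owner.username"))
```

A dotted path such as `owner.username` is resolved server-side, so the client gets the owner's name in the same response instead of issuing a second request.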

Exposing fields as objects also allowed our API clients to limit chattiness, since a single request can return data from connected objects (such as owner.username above).

limit chattiness

This pattern has allowed us to deprecate fields more easily without breaking our API clients. We don’t use API versioning.

Field deprecation

We were also able to track the usage of our API fields with the ELK stack, to make decisions about our Graph API (deprecation, field/object cleanup, measuring the impact of a breaking change…).

Track API usage with Kibana
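
To give an idea of what field usage tracking involves, here is a minimal sketch in plain Python. It is a stand-in for the idea, not the ELK pipeline we actually ran: the `count_field_usage` helper and the sample request list are assumptions made for the example.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_field_usage(request_urls: list[str]) -> Counter:
    """Count how often each field appears in the `fields` query parameter."""
    usage: Counter = Counter()
    for url in request_urls:
        query = parse_qs(urlsplit(url).query)
        for value in query.get("fields", []):
            # Each `fields` value is a comma-separated list of field names.
            usage.update(name for name in value.split(",") if name)
    return usage


# Hypothetical sample of logged requests.
logged_requests = [
    "/video/x68r2tt?fields=title,description,owner.username",
    "/videos?fields=title",
    "/video/x68r2tt",
]

usage = count_field_usage(logged_requests)
for field, hits in usage.most_common():
    print(field, hits)
```

A field that never shows up in the counts is a good candidate for deprecation, and the count of a field you plan to change tells you how many requests a breaking change would touch.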

Having our API hosted in Paris while serving 300 million users coming from multiple locations, some far away, created some latency issues that we still had to address.

Instead of diving into a complex “bling-bling” migration to microservices in the cloud, we took our time to check what we could do with what we had. This is what we came up with:

Monolith PHP7 migration: this decreased our average API response time from 140ms to 75ms. For more details, read this blog post.

Using caching CDNs: some of our API calls can be cached easily, especially our video listings, which vary by country.

curl -I https://api.dailymotion.com/videos
HTTP/1.1 200 OK
Cache-Control: public, max-age=900
....
X-Cache: HIT
Vary: X-DM-EC-Geo-Country, X-DM-SSL,Accept-Encoding

Building the field selection feature gave us more granularity in our caching strategy: some fields can disable caching completely or reduce the TTL.

curl -I "https://api.dailymotion.com/videos?live&fields=onair,audience"
HTTP/1.1 200 OK
Cache-Control: public, max-age=10
....
X-Cache: MISS
Vary: X-DM-EC-Geo-Country, X-DM-SSL,Accept-Encoding
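
The per-field caching rule can be sketched as follows. The 900 and 10 second values come from the headers above; everything else is an assumption made for illustration, including the `FIELD_TTL` table, the `viewer_state` field name, the `cache_control` helper and the choice of `no-store` for uncacheable fields.

```python
DEFAULT_TTL = 900  # seconds, as in the /videos response above

# Assumed per-field overrides; 0 means "never cache". Field-to-TTL mapping is hypothetical.
FIELD_TTL = {
    "onair": 10,
    "audience": 10,
    "viewer_state": 0,  # hypothetical user-specific field
}


def cache_control(fields: list[str]) -> str:
    """Build a Cache-Control value from the most restrictive requested field."""
    ttl = DEFAULT_TTL
    for field in fields:
        override = FIELD_TTL.get(field)
        if override is None:
            continue  # field has no special rule, keep the current TTL
        if override == 0:
            return "no-store"  # one uncacheable field disables caching entirely
        ttl = min(ttl, override)
    return f"public, max-age={ttl}"


print(cache_control(["title"]))
print(cache_control(["onair", "audience"]))
```

The response TTL is the minimum over all requested fields, so a client that asks only for slow-changing fields still benefits from the long default.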

Implementing HTTP caching allowed us to achieve a cache hit ratio of around 30%!

Migrating to GraphQL

One year ago we decided to relaunch our product with a new strategy. For our team it was a great opportunity and a really exciting challenge, because:

  • We had exactly 6 months to build a new API.
  • The new API had to be geo-distributed at its core from the start.
  • It needed to be easy to consume for our front-end developers.
  • It needed to be easy to code and deploy for our core API developers.
  • It represented an opportunity to build from scratch a service architecture.

As our developers were already familiar with our Graph API, and the majority of our API clients are known developers, we decided to switch to GraphQL.

Why we chose GraphQL:

  • We already expose our data as a graph in our Graph HTTP API, and GraphQL already declares everything as a graph.
  • The GraphQL query operation is very interesting because it enables WYWIWYG (What You Want Is What You Get), giving our API clients more flexibility to express their data requirements.
  • The resolver feature brings a good design approach to using the underlying services.
  • Tools and SDKs are already built around this technology: GraphiQL, the Apollo SDKs, and more (check this list).

Now we are proud to operate GraphQL in production, served from a geo-distributed service stack in the cloud.

GraphQL powered experience

Lessons learned

  • Creating an API that is not based on an open standard has a cost, simply because you have to build every tool and the associated documentation from scratch to onboard your developers.
  • Treat your API as a product, and invest in making your API easy to use and understand.
  • Maintaining API SDKs is really hard if your API team is small.
  • Prioritise performance and scalability for your use case when picking an API design paradigm.
  • Use RUM (Real User Monitoring) to get insights into your API performance. It will also help you answer key questions: how does the API really perform from a specific country or on a specific device?
  • If you have a monolith stack and a performance issue, first check what can be done without a full migration.
  • Give your core team the freedom to break things. Each time someone breaks our production, they put money into a so-called “BeerBox” that we empty every 4 months to celebrate.
  • Don’t blindly migrate to GraphQL if you don’t know the consequences. This technology may fix some of your issues, but it brings other issues that you have to deal with (cacheability, API lifecycle, rate limiting, security…).
  • GraphQL is not a silver bullet to fix your API design. You really have to introduce an API Design guide from the beginning.
  • Dealing with mutations is a pain because of the unpredictability of mutation names and types. For instance, if you want to update a Video node, what is the mutation called without checking the documentation (VideoUpdate, UpdateVideo, PatchVideo…)?
  • Don’t look at the GraphQL spec with REST eyes; it’s a completely different approach with its own benefits and limits.
  • Don’t try to compare GraphQL with REST and just forget all REST vs GraphQL debates, unless you like to compare apples with oranges!
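
As an illustration of what an API design guide can enforce, here is a small sketch of a naming check for mutations. The convention itself (object name followed by verb), the `KNOWN_OBJECTS` and `ALLOWED_VERBS` lists and the `follows_convention` helper are assumptions chosen for the example, not a description of our actual guide.

```python
import re

# Assumed design-guide rule: mutations are named <Object><Verb>, e.g. VideoUpdate.
KNOWN_OBJECTS = ("Video", "Collection")
ALLOWED_VERBS = ("Create", "Update", "Delete")

_OBJECTS = "|".join(KNOWN_OBJECTS)
_VERBS = "|".join(ALLOWED_VERBS)
_PATTERN = re.compile(rf"^({_OBJECTS})({_VERBS})$")


def follows_convention(mutation_name: str) -> bool:
    """Return True if the mutation name matches the <Object><Verb> rule."""
    return _PATTERN.match(mutation_name) is not None


for name in ("VideoUpdate", "UpdateVideo", "PatchVideo"):
    print(name, follows_convention(name))
```

Run as part of schema review, a check like this makes mutation names guessable: once a developer knows the object, they know the name of the mutation.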

Scaling the API team

Our API was first created and operated by our core developers, who exposed an API Layer over our monolith backend to fetch data.

As we wanted to scale our API and treat it as a product, we built a small agile API Squad to take care of our API roadmap, build an API management platform and drive developer adoption of our API.

This team consisted of an API Lead/Evangelist, a senior core engineer and a support engineer. Yes, just 3, like the length of the “API” string!

The First API Squad

As our API was growing so fast, we saw our backlog grow like weeds in the backyard. It was a challenge to expose functional features, improve API performance and drive developer adoption all at once.

We decided to distribute the API development and stop being a bottleneck for other teams, so we created the API Badge program:

  • Every core developer can expose optimised features in the API, because they have the expertise to do it.
  • The API Squad provides training sessions for new API Badge members, so they can contribute to the API as quickly as possible.
  • Our API Squad only reviews the API design and the API contract.
  • We provided cool “API Badge” branded T-shirts for API Badge members ;)

The “API Badge program” was definitely what helped us think about our API strategy and how to achieve global API governance in our organisation.

Today the API squad is part of the Tribe Scale which is the union of the API, Core Platform and Tooling squads. Its mission is to provide tools to other teams, a scalable architecture and good APIs.

Lessons learned

  • If you have a dedicated API team, think twice and clearly define its mission statement.
  • Your API core developers need very good communication skills. Consider them API Evangelists first and coders second.
  • Enabling ownership of the API across teams was painful, but it was worth it in the end.
  • If you scale fast, you will encounter some chaos and complexity in your organisation. If you find yourself saying “Wow, things are different” or asking “Why are things not getting done?”, think about redesigning your organisation!

Thanks for reading this brief story :). If you have some feedback, reach out to me on Twitter or LinkedIn.