Replatforming the Lantern API

Background: What Is the Lantern API?

  1. Article pageviews (number of times article pages are requested)
  2. Median time on page (having opened the article page, how long does a typical user spend there before moving on?)
  3. Percentage ‘Quality Reads’ (what proportion of pageviews are deemed to be ‘quality’, ie. the user seems to have read a decent proportion of the article rather than just skimmed it)
  4. Number of significant reader interactions (eg. comments posted, social shares)
  5. Click-through-rate (CTR): [homepage only] for each view of the homepage, how many times is a particular article link clicked on.

  • Editorial ‘Desk’ (section of the website)
  • Content type (eg. regular article, video, special package)
  • Reader type (anonymous, subscriber)
  • Reader location (country/region)
  • Traffic source (eg. did the pageview originate from a link on a search engine results page or on social media?)
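
To make the metrics and filter dimensions listed above concrete, here is a minimal sketch of how a few of them could be computed from raw pageview records. The `PageView` record, its field names and the `summarise` helper are all hypothetical, invented for illustration; they are not the Lantern API's actual schema or code.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PageView:
    # Hypothetical record shape; the real tracking data will differ.
    article_id: str
    seconds_on_page: float
    quality_read: bool
    desk: str
    reader_type: str  # "anonymous" or "subscriber"


def summarise(views, **filters):
    """Return headline metrics for the views that match every filter."""
    selected = [
        v for v in views
        if all(getattr(v, field) == value for field, value in filters.items())
    ]
    if not selected:
        return {
            "pageviews": 0,
            "median_seconds_on_page": None,
            "pct_quality_reads": None,
        }
    return {
        "pageviews": len(selected),
        "median_seconds_on_page": median(v.seconds_on_page for v in selected),
        # bool is an int subclass, so sum() counts the quality reads.
        "pct_quality_reads": 100 * sum(v.quality_read for v in selected) / len(selected),
    }
```

Calling `summarise(views, reader_type="subscriber", desk="markets")` would restrict the metrics to subscriber pageviews on one desk; calling it with no filters summarises everything.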

Why Change It? Project Objectives

  1. Reduce costs. The infrastructure supporting this dedicated data pipeline/database was costing ~$17k per month.
  2. Reduce maintenance / complexity / key-person-dependency. The dedicated pipeline was prone to blips; only one engineer was fully au fait with the set-up; the technologies used were not well understood by the wider team.
  3. Eradicate duplication / divergence of business data logic. We wanted, as far as possible, to use the same data and logic as employed by our BI teams, to present a consistent view of key metrics to the business.

The Solution

Data Latency

Dependencies on Other Teams

Speed

  1. Very quick results served back to client
  2. Reduced BigQuery (BQ) query costs

  1. Caching query results.
  2. Pre-fetching data that is known to be commonly and frequently requested, eg. data for the main page.
  3. Fetching partial datasets in parallel, and then aggregating.
  4. Optimising the structure of database tables for the queries needed by our API.
  5. Pragmatism about the queries we support (eg. certain queries disregard tracking data for articles older than 5 years)
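
As an illustration of two of the techniques above (caching query results, and fetching partial datasets in parallel before aggregating), here is a minimal Python sketch. Everything in it is an assumption made for the example: `fetch_day` is a hypothetical stand-in for a per-day warehouse query, and `TTLCache`, `pageviews_for_range` and the 60-second expiry are invented names and values, not the Lantern API's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta


def fetch_day(day: date) -> dict[str, int]:
    """Hypothetical per-day query returning {article_id: pageviews}."""
    return {"article-1": day.day, "article-2": 1}


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the expensive query
        value = compute()
        self._entries[key] = (now, value)
        return value


def pageviews_for_range(start: date, end: date, fetch=fetch_day) -> dict[str, int]:
    """Fetch each day in [start, end] in parallel, then sum per article."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    totals: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map runs the fetches concurrently and yields results in day order.
        for partial in pool.map(fetch, days):
            for article_id, count in partial.items():
                totals[article_id] = totals.get(article_id, 0) + count
    return totals


cache = TTLCache(ttl_seconds=60)


def cached_pageviews(start: date, end: date) -> dict[str, int]:
    """Serve repeated requests for the same range from the cache."""
    return cache.get_or_compute(
        (start, end), lambda: pageviews_for_range(start, end)
    )
```

Threads suit this shape of work because each partial fetch spends its time waiting on the database rather than using the CPU. Pre-fetching could reuse the same cache by calling `cached_pageviews` on a schedule for ranges that are known to be popular.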

Cost

  1. Caching.
  2. Optimised structure of database tables.
  3. Pragmatism about the queries we support (eg. rounding time-ranges to the nearest minute)
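
The time-range rounding mentioned above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the suggestion that rounding helps by letting near-identical requests share one query or cache entry is an assumption on our part rather than a description of the actual implementation.

```python
from datetime import datetime, timedelta


def round_to_minute(ts: datetime) -> datetime:
    """Round a timestamp to the nearest whole minute (30s and above rounds up)."""
    floored = ts.replace(second=0, microsecond=0)
    if ts - floored >= timedelta(seconds=30):
        floored += timedelta(minutes=1)
    return floored


def normalise_range(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Round both ends so requests made seconds apart map to the same range."""
    return round_to_minute(start), round_to_minute(end)
```

Under that assumption, two dashboard requests arriving a few seconds apart would normalise to an identical range, so the second could be answered without running a new query.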

Aligning Business Data Definitions

The Result

A blog by the Financial Times Product & Technology department.
