Content Delivery at Refinery29

Maggie Love
Refinery29 Product & Engineering
Jun 15, 2018

What happens when you type a URL into your browser? In addition to understanding the big picture of how a website is rendered after you type it out and hit enter, it’s good to know which dependencies are used along the way. This lets you contribute to architectural discussions and ultimately optimize the user experience. In this blog post, I’ll go over our dependency tree at Refinery29.


For simplicity’s sake, I’m only covering what happens when a user hits our user-facing website, Refinery29.com, excluding Accelerated Mobile Pages (AMP). I’m also ignoring use cases for a few other codebases, including ones that power our Content Management System.

1. CDNs

When a user types Refinery29.com into their browser, the computer gets the website’s relevant CDN (content delivery network) IP address by way of a DNS (domain name system) lookup. CDNs help limit page load time by reducing the distance a request has to travel. If all of Refinery29’s content were served exclusively from New York, for instance, the site would get slower the farther a user was from New York. A CDN fixes this problem by adding servers between users and the origin servers (which are connected to Refinery29’s database). This way, a request hits the server closest to the user, which is called an edge server. These servers usually cache content. Refinery29’s CDNs, Fastly and Edgecast, store images for up to a year, and CSS, JavaScript, and fonts for up to five minutes. (More on CDNs here.) If the requested files and assets are not cached on the edge servers, the request hits the Varnish layer.
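
To make those TTLs concrete, here is a minimal sketch (not our actual code) of how an Express-style origin app might tell a Fastly-style CDN how long to keep assets at the edge, using the Surrogate-Control header that Fastly reads for edge cache lifetimes:

```javascript
// A minimal sketch, not Refinery29's actual configuration: Express middleware
// that sets Surrogate-Control, the header Fastly uses for edge cache TTLs,
// with the lifetimes mentioned above.
const express = require('express');
const app = express();

app.use((req, res, next) => {
  if (/\.(png|jpe?g|gif|webp)$/.test(req.path)) {
    // Images: cacheable at the edge for up to a year.
    res.set('Surrogate-Control', 'max-age=31536000');
  } else if (/\.(css|js|woff2?)$/.test(req.path)) {
    // CSS, JavaScript, and fonts: cacheable at the edge for up to five minutes.
    res.set('Surrogate-Control', 'max-age=300');
  }
  next();
});

app.listen(3000);
```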

2. Frontend accelerator: Varnish

According to its docs, Varnish is

a web application accelerator also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents.

So what does that mean? A forward proxy cache is the go-between for the client requesting the content and the origin server. A reverse proxy cache works the other way: it stands in for the origin server. We have Varnish configured to cache most content for 60 seconds, and images for up to an hour. In addition to decreasing page load times, Varnish adds a layer of security, since it runs behind a firewall. If the requested content is found in Varnish, the CDN will cache it and serve it. Otherwise, the request hits nginx, our routing layer.
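
Varnish itself is configured in its own language, VCL, but the core idea of a caching reverse proxy fits in a few lines. Here is a toy Node sketch of the concept, not how Varnish works internally or how ours is configured:

```javascript
// A conceptual sketch of a caching reverse proxy: serve from an in-memory
// cache when possible, otherwise fetch from the origin and remember the result.
const http = require('http');

const cache = new Map(); // url -> { body, headers, expiresAt }
const TTL_MS = 60 * 1000; // most content is cached for 60 seconds

http.createServer((clientReq, clientRes) => {
  const hit = cache.get(clientReq.url);
  if (hit && hit.expiresAt > Date.now()) {
    clientRes.writeHead(200, hit.headers);
    return clientRes.end(hit.body); // served from cache; the origin never sees it
  }

  // Cache miss: forward the request to the layer behind us (here, port 8080).
  http.get({ host: 'localhost', port: 8080, path: clientReq.url }, (originRes) => {
    let body = '';
    originRes.on('data', (chunk) => { body += chunk; });
    originRes.on('end', () => {
      cache.set(clientReq.url, {
        body,
        headers: originRes.headers,
        expiresAt: Date.now() + TTL_MS,
      });
      clientRes.writeHead(originRes.statusCode, originRes.headers);
      clientRes.end(body);
    });
  });
}).listen(8000);
```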

3. Routing layer: nginx

nginx’s docs describe how Refinery29 uses it nicely:

One of the frequent uses of nginx is setting it up as a proxy server, which means a server that receives requests, passes them to the proxied servers, retrieves responses from them, and sends them to the clients.

Typically, an nginx configuration contains blocks that correspond to routes, in the form of regexes, that point requests toward the correct application. Since we have nearly 200,000 articles, each with its own SEO-optimized basename, we store our routes as entries in a key-value store, Redis (described below). In nginx, we access that key-value store as part of the request. (This logic lives in Lua scripts, which we embed in our nginx configs.) Then, nginx passes the data for that route on to the front-end app (described below). This data is a small chunk of information: it does not include everything necessary to render the page, just enough for the Node app to make the correct API request.
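
In production that lookup happens in Lua inside nginx, but here is a rough sketch of the same idea in Node. The key prefix and value shape are made up for illustration:

```javascript
// A rough Node equivalent of the route lookup we do in Lua inside nginx.
// The "route:" key prefix and the value shape are assumptions, not our schema.
const Redis = require('ioredis');
const redis = new Redis();

async function lookupRoute(path) {
  // e.g. path = '/shop/clothing-c918'
  const raw = await redis.get(`route:${path}`);
  if (!raw) return null;

  // The stored value is small: just enough for the Node app to build the right
  // API request (e.g. { "entity": "shops_categories", "id": 918 }), not the
  // full page data.
  return JSON.parse(raw);
}
```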

4. Front-end app

In this Node app, we have a server.js file where we request the page configuration from nginx for each request the app receives. Assuming the HTTP response status code is not a 400- or 500-level, the request URL (ex: https://www.refinery29.com/shop/clothing-c918) is converted into an API URL (ex: https://www.refinery29.com/api/delivery/2/rosetta/us/shops_categories/918), and the front-end app fetches the data. Once the app has the data, it renders a React component tree as a string on the server side, then sends that to the user. On the client side, React hydrates the server-side markup, which just means it attaches event listeners to the container. This is more performant than rendering everything on the client side.
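
Here is a stripped-down sketch of what a server.js-style file like this does. The component name, the route-data header, and the URL-mapping helper are illustrative, not our real code:

```javascript
// A stripped-down sketch of a server.js-style render path. Page, buildApiUrl,
// and the x-route-data header are hypothetical names for illustration.
const express = require('express');
const fetch = require('node-fetch');
const React = require('react');
const { renderToString } = require('react-dom/server');
const Page = require('./components/Page'); // hypothetical root component

// Hypothetical helper: turns route data into the delivery API URL shown above.
const buildApiUrl = ({ entity, id }) =>
  `https://www.refinery29.com/api/delivery/2/rosetta/us/${entity}/${id}`;

const app = express();

app.get('*', async (req, res) => {
  const route = JSON.parse(req.get('x-route-data') || 'null'); // handed to us by nginx
  if (!route) return res.sendStatus(404);

  // Fetch the page data, render the React tree to a string, and send it down.
  const data = await (await fetch(buildApiUrl(route))).json();
  const markup = renderToString(React.createElement(Page, { data }));
  res.send(`<!doctype html><div id="root">${markup}</div>`);

  // On the client, ReactDOM.hydrate() attaches event listeners to this same
  // markup instead of re-rendering it from scratch.
});

app.listen(3000);
```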

5. APIs

Our content is currently served via two different APIs: one for individual articles or video episodes, and one for aggregation pages, like a channel page or a product list page. APIs are the bridge between the database and the front end. For performance reasons, we don’t let our content delivery APIs query our MySQL database. Instead, they request “compiled” data from our key-value store, Redis. (More on that later.) But first, let’s consider Solr and our process managers, which are also connected to our APIs.

Solr

A lot of the time, the user is requesting information in list form, like when she searches the site or selects various filters on shopping pages. In these cases, we can’t just rely on a simple key-value store. So we have the relevant API also query Solr, a search server.
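
Our APIs are mostly PHP, but the query pattern is the same in any language. Here is an illustrative Node sketch against Solr’s standard /select endpoint; the core name and field names are made up:

```javascript
// An illustrative query against Solr's /select endpoint. The core name
// ('content'), host, and fields are assumptions, not our real setup.
const fetch = require('node-fetch');
const { URLSearchParams } = require('url');

async function searchArticles(term) {
  const params = new URLSearchParams({
    q: `title:${term}`, // full-text match on a title field
    fq: 'type:article', // filter query: only return articles
    rows: '20',         // page size
    wt: 'json',         // ask for a JSON response
  });
  const res = await fetch(`http://solr.example.internal:8983/solr/content/select?${params}`);
  const { response } = await res.json();
  return response.docs; // the list of matching documents
}
```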

Process managers

We write our front-end code in JavaScript and our back-end code predominantly in PHP. We use process managers for both (PHP FastCGI Process Manager, or PHP-FPM, and PM2 for Node). These are basically programs running in the background that make deployment (the process of moving software from development to production) easier. They also automatically restart the apps that power our site if they crash. And they keep the app alive thanks to their load-balancing capabilities, which in a nutshell means that if one process goes down, others are still there to serve traffic. (More on process managers for Express and other Node apps.)
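
For the Node side, a minimal PM2 ecosystem file looks something like this. The name, script path, and instance count are illustrative, not our production config:

```javascript
// ecosystem.config.js -- a minimal PM2 config for a Node app like our front end.
// The name, script path, and numbers here are illustrative.
module.exports = {
  apps: [
    {
      name: 'frontend',
      script: './server.js',
      instances: 4,          // run several worker processes
      exec_mode: 'cluster',  // PM2 load-balances requests across them
      autorestart: true,     // bring a worker back up if it crashes
      max_memory_restart: '500M',
    },
  ],
};
```

Running pm2 start ecosystem.config.js boots the workers and keeps them alive.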

6. Key-value store: Redis

Redis is a NoSQL database we use as a key-value store. Compared to our MySQL database, whose relational organization can make lookups take longer, retrieving a single compiled (denormalized) entity from Redis is fast. Take an article: in MySQL it is made up of many different rows across many different tables, but in Redis it becomes a single key-value pair. This key-value pair is generated the first time the article is saved and is updated on subsequent saves.
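
Here is a sketch of that compile-on-save idea. The table names, key name, and the db helper are hypothetical:

```javascript
// A sketch of compile-on-save: flatten the rows that make up an article into
// one JSON blob keyed by the article's ID. The tables, key name, and the db
// helper (whose query() is assumed to resolve to plain rows) are hypothetical.
const Redis = require('ioredis');
const redis = new Redis();

async function compileArticle(articleId, db) {
  // In MySQL the article is spread across several tables...
  const [article] = await db.query('SELECT * FROM articles WHERE id = ?', [articleId]);
  const authors = await db.query(
    'SELECT a.* FROM authors a JOIN article_authors aa ON aa.author_id = a.id WHERE aa.article_id = ?',
    [articleId]
  );
  const images = await db.query('SELECT * FROM article_images WHERE article_id = ?', [articleId]);

  // ...but in Redis it becomes one key-value pair, overwritten on every save.
  await redis.set(`article:${articleId}`, JSON.stringify({ ...article, authors, images }));
}
```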

7. Origin servers

The origin servers are where our MySQL database and our network file system (NFS) storage live.

MySQL

This is where all the data that powers the site lives: articles, videos, contributor information, routes, product information, etc.

NFS

NFS stands for Network File System. With our current set-up, which we’re in the process of optimizing, we need both NFS and the database to render images, because we store height, width, and format information in the database.
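
As a rough illustration of why both stores are involved (the table, columns, and URL shape here are made up), rendering a single image today looks something like this:

```javascript
// An illustrative sketch: the image file lives on the NFS mount (served through
// the CDN), but its dimensions and format live in MySQL. The table, columns,
// URL shape, and db helper are assumptions.
async function renderImageTag(imageId, db) {
  const [meta] = await db.query(
    'SELECT path, width, height, format FROM images WHERE id = ?',
    [imageId]
  );
  // meta.path points at a file on NFS; width, height, and format come from MySQL.
  return `<img src="https://www.refinery29.com${meta.path}.${meta.format}"` +
    ` width="${meta.width}" height="${meta.height}" alt="">`;
}
```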

And that’s how content is delivered at Refinery29! A lot of these steps involve speed-related concerns or fail-safes that I never encountered before working at a company that serves millions of users. Whether you also work on a tech team or you mostly have experience working on side projects, I hope this post has provided some useful context!
