How does the USA TODAY NETWORK publish digital content so quickly?

Schuyler Cumbie
Published in USA TODAY NETWORK
Apr 10, 2018 · 6 min read

Co-authored by Devansh Dhutia and Aaron Thatcher.

When most people read a story on one of the many apps or websites that are part of the USA TODAY NETWORK, they probably don’t realize what that content went through to reach them. It’s not as simple as posting something to a popular social networking site. There’s a lot that happens between the beginning stages of a story, at the desk of an editor, and the story the general public sees. Our content delivery system can put a story through hundreds of steps, converting formats, applying rights management rules, indexing, and storing everything in a matter of seconds…and we do it at scale: the USA TODAY NETWORK consists of more than 100 sites.

Our system is divided into three main focus areas: Authoring, Presentation, and Syndication. Combined, these three areas make up a very powerful content management system that operates as the backbone of the USA TODAY NETWORK. One of the driving factors for structuring the pipeline this way is to ensure our journalists can distribute their content reliably and quickly.

Authoring

At the head of the system, we have the Authoring space, which is responsible for the authoring and organization of the content. In this space, a writer can begin a story, share it with colleagues, attach other assets like images, and send it to an editor for approval or corrections.

Every story written begins in Presto, which is the primary tool our 3,000+ journalists use to create, manage and publish content. This app allows newsrooms to discover and reuse existing stories, photos and videos from all sites across our Network, while managing permissions for who can actually post and edit content for a particular website. With support for multi-user editing, Presto is designed to streamline the workflow for delivering content to our readers.

What truly makes our Authoring space unique is its use of Command Query Responsibility Segregation (CQRS). At its core, CQRS is based on the idea that the model used to write data can be completely different from the model used to read it. So, rather than saving a singular version of a story in some kind of text or JSON format, we save the events that occurred to eventually produce that story. While this adds a small layer of complexity, it also gives us a great deal of flexibility on the consuming side of the Authoring space.
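
To make the write side concrete, here is a minimal event-sourcing sketch in Python. It is purely illustrative: the event names, fields, and in-memory storage are assumptions, not Presto’s actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    event_type: str   # e.g. "HeadlineSet", "BodyUpdated" (illustrative names)
    payload: dict     # the fields this event sets or changes
    author: str       # who made the change

@dataclass
class StoryEventStream:
    story_id: str
    events: list = field(default_factory=list)

    def append(self, event: Event) -> None:
        # The write side only ever appends; nothing is overwritten.
        self.events.append(event)

    def current_state(self) -> dict:
        # A read model is just a fold over the recorded events.
        state = {"id": self.story_id}
        for e in self.events:
            state.update(e.payload)
        return state

# Example: the story is the sum of its events.
stream = StoryEventStream("story-123")
stream.append(Event("HeadlineSet", {"headline": "Local team wins title"}, "jdoe"))
stream.append(Event("BodyUpdated", {"body": "The full story..."}, "jdoe"))
print(stream.current_state())
```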

As the events are recorded, they are validated against the domain model, which is designed to track the state of the story and enforce business logic. For example, if we wanted to enforce a rule where a field can only be set once and never changed, the domain model would be able to identify when that field was populated. On a much simpler level, this is where we would put other checks, such as requiring a headline to be under 120 characters. If the business logic conditions aren’t met, the API returns an error when the event is applied.
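
As a rough illustration of those checks, the sketch below rejects an event when a rule is violated. The field name ("source_id") and the error type are invented for the example; the real domain model’s rules and API are not shown here.

```python
class ValidationError(Exception):
    pass

class StoryDomainModel:
    """Hypothetical domain model enforcing the two example rules above."""
    HEADLINE_MAX = 120

    def __init__(self):
        self.state = {}

    def apply(self, event_type: str, payload: dict) -> dict:
        if event_type == "SourceIdSet":
            # Example of a write-once field: once populated, it can never change.
            if "source_id" in self.state:
                raise ValidationError("source_id may only be set once")
            self.state["source_id"] = payload["source_id"]
        elif event_type == "HeadlineSet":
            # Simpler check: headlines must stay under 120 characters.
            if len(payload["headline"]) >= self.HEADLINE_MAX:
                raise ValidationError("headline must be under 120 characters")
            self.state["headline"] = payload["headline"]
        return self.state
```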

On the consuming side, we use what we call snapshots: various models similar to the domain model, but not used for validation. These models can be custom-tailored to fit the needs of the data. For example, a story has a lite read model that contains only the specific fields we care about when rendering a lightweight view, while the full model contains every field and is used when editing the content.
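
A tiny sketch of the idea, with made-up field names: both snapshots are projected from the same data, but each is shaped for its consumer.

```python
LITE_FIELDS = ("id", "headline", "byline")   # assumed fields, for illustration

def lite_snapshot(story: dict) -> dict:
    """Only the fields needed to render a lightweight view."""
    return {k: story[k] for k in LITE_FIELDS if k in story}

def full_snapshot(story: dict) -> dict:
    """Every field, for use when editing the content."""
    return dict(story)

story = {
    "id": "story-123",
    "headline": "Local team wins title",
    "byline": "J. Doe",
    "body": "The full story...",
    "tags": ["sports"],
}
print(lite_snapshot(story))   # small payload for rendering
print(full_snapshot(story))   # everything, for editing
```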

With snapshots, we are able to have a highly available, read-only side of the API while still having a very flexible write system. This is used by Presto, as well as by other spaces in the system, for pulling data. We can roll a story back or forward by replaying events up to a specific point in time, and we have an automatic audit system that keeps track of who made what changes to an asset.
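
Replay and auditing follow directly from storing events. A sketch of both, assuming each event carries a timestamp, author, event type, and payload:

```python
from datetime import datetime

def replay_until(events: list, cutoff: datetime) -> dict:
    """Rebuild a story's state as of a given moment by replaying events."""
    state = {}
    for e in events:                 # events are assumed ordered by time
        if e["timestamp"] > cutoff:
            break
        state.update(e["payload"])
    return state

def audit_trail(events: list) -> list:
    # The audit log falls out of the event stream for free: every event
    # already records who did what, and when.
    return [(e["timestamp"], e["author"], e["event_type"]) for e in events]
```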

Presentation

The Presentation space is the next layer of the application. A piece of content only reaches this layer once it is published in the Authoring space. At a high level, this space focuses on making consumer-ready versions of content available in various formats to other platforms, such as websites and mobile apps, as well as to our Syndication space.

This space uses a queueing system to process large volumes of content. As each story is sent over from the Authoring space, it is copied into the various queues used for processing. At the other end of each queue, we have separate consumers, each with the sole responsibility of processing the messages it sees in its queue. Among the many consuming apps in this space, the most notable are indexing, storage, and caching.
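
The fan-out pattern itself is simple. Here is a sketch using in-process queues as a stand-in for the actual message broker, which isn’t named here:

```python
import queue

# One queue per consumer; a stand-in for the real messaging infrastructure.
CONSUMER_QUEUES = {
    "index": queue.Queue(),
    "storage": queue.Queue(),
    "cache": queue.Queue(),
}

def fan_out(published_story: dict) -> None:
    # Each published story is copied into every consumer's queue.
    for q in CONSUMER_QUEUES.values():
        q.put(dict(published_story))

def drain(name: str, handler) -> None:
    # Each consumer only ever processes messages from its own queue.
    q = CONSUMER_QUEUES[name]
    while not q.empty():
        handler(q.get())
```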

The cache consumer has a very important role. As content is published and sent out to the various APIs, it is also cached at the CDN to make the data highly available. The cache consumer makes sure that cache is busted once the asset is published, so that stale content isn’t served.
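
A sketch of the cache consumer’s job, with a hypothetical purge endpoint standing in for the CDN’s real purge API:

```python
import urllib.request

# Hypothetical purge endpoint; real CDNs each expose their own purge API,
# but the consumer's job is the same: on publish, purge anything stale.
CDN_PURGE_URL = "https://cdn.example.com/purge"

def handle_cache(story: dict) -> None:
    for url in story.get("cached_urls", []):
        req = urllib.request.Request(
            f"{CDN_PURGE_URL}?url={url}", method="POST")
        urllib.request.urlopen(req)   # bust the stale copy at the edge
```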

The storage consumer transforms content into the various models our API consumers need and stores them in Couchbase. This gives us extremely fast read speeds for downstream API consumers, since there is no need to further transform the data.
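
A sketch of what “transform, then store” means, with a plain dictionary standing in for Couchbase and made-up model names:

```python
# A dict stands in for Couchbase here; in the real system each
# pre-transformed model would be upserted into a Couchbase bucket keyed
# by asset id. The model names are assumptions.
DOCUMENT_STORE = {}

def to_lite_model(story: dict) -> dict:
    return {k: story[k] for k in ("id", "headline") if k in story}

def to_full_model(story: dict) -> dict:
    return dict(story)

def handle_storage(story: dict) -> None:
    # Pre-compute every model API consumers ask for, so reads need no
    # further transformation.
    DOCUMENT_STORE[f"lite::{story['id']}"] = to_lite_model(story)
    DOCUMENT_STORE[f"full::{story['id']}"] = to_full_model(story)
```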

The index consumer focuses on indexing the asset in Solr, a platform we use for search, and then forwards the message on to post-presentation queues.
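
Indexing a document in Solr can be done through its standard JSON update handler. A sketch, with the host, core name, and document fields assumed for illustration:

```python
import json
import urllib.request

# Solr's JSON update handler; host, core name, and fields are assumptions.
SOLR_UPDATE_URL = "http://solr.example.com:8983/solr/content/update?commit=true"

def handle_index(story: dict, post_presentation_queue) -> None:
    doc = {"id": story["id"], "headline": story.get("headline", "")}
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)          # index the asset in Solr
    post_presentation_queue.put(story)   # then forward the message on
```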

Syndication

Syndication is the final stage of our content system. This space is where we host many of our public-facing APIs and queue consumers that allow partners access to our content. As every licensing relationship is different, there are a variety of ways that we use this space to get our content out to third parties.

We also use a storage scheme in this space that limits results to content updated in the last eight weeks, since most partners aren’t interested in older content.
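
The storage scheme itself isn’t described here, but the effect partners see is equivalent to a simple date filter, sketched below:

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(weeks=8)

def within_window(last_updated: datetime) -> bool:
    # last_updated is assumed to be a timezone-aware datetime.
    return datetime.now(timezone.utc) - last_updated <= WINDOW
```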

The majority of our partners use the pull method: they periodically hit an API endpoint and consume content. This is done through what we call our syndication API, a versatile app that pulls content from our Solr index and provides it to the partner in the requested format. We support RSS, MRSS, and JSON, among other formats.
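
A simplified sketch of the format step: the same search results rendered into whichever format the partner asked for. The function and field names are assumptions, and the XML here is deliberately minimal.

```python
import json

def render_feed(stories: list, fmt: str) -> str:
    """Render the same stories in the partner's requested format."""
    if fmt == "json":
        return json.dumps({"items": stories})
    if fmt in ("rss", "mrss"):
        # A real MRSS feed would additionally carry media: elements; omitted here.
        items = "".join(
            f"<item><title>{s['headline']}</title><link>{s['url']}</link></item>"
            for s in stories
        )
        return f'<rss version="2.0"><channel>{items}</channel></rss>'
    raise ValueError(f"unsupported format: {fmt}")
```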

A select few of our content partners use a push method. This method is more complicated on our end, but it serves partners who prefer up-to-the-minute content without having to constantly check an API endpoint for new information. Once a story makes it through the Presentation space, it is copied and placed into a series of queues, one for each push partner.

While each queue consumer in Syndication has its own specific set of tasks to complete in order to push content to a partner, they all share some common elements: validation, submission, and verification. They typically start by validating the story to make sure it is eligible to be syndicated. Next up is the submission step: the content partner provides us with their own API endpoint and story format, and this part of the handler sends the story over. Finally, the verification step occurs. This step typically hits an additional endpoint provided by the content partner, which lets us check whether the asset was accepted or whether there was an error. In most cases the asset is accepted; if it is not, we might throw it back into the queue to try again.
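
A sketch of those three shared steps in one handler. The partner endpoints, the rights check, and the “accepted” response are all assumptions; each real handler differs in the details.

```python
import json
import urllib.request

def push_to_partner(story: dict, partner: dict, retry_queue) -> None:
    # 1. Validation: is this story eligible to be syndicated to this partner?
    if partner["name"] not in story.get("syndication_rights", []):
        return

    # 2. Submission: send the story, in the partner's format, to the
    #    endpoint the partner gave us.
    req = urllib.request.Request(
        partner["submit_url"],
        data=json.dumps(story).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)

    # 3. Verification: check a second partner-provided endpoint to see
    #    whether the asset was accepted; if not, requeue it to try again.
    status = urllib.request.urlopen(
        f"{partner['status_url']}?id={story['id']}").read().decode("utf-8")
    if "accepted" not in status:
        retry_queue.put(story)
```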

Overall, the process for publishing a story across the USA TODAY NETWORK is quite involved and ever-changing. We stay focused on being flexible and ahead of the curve. Through the hard work of everyone involved, we have developed software that simplifies what sounds like an extended pipeline of workflow systems, so that content publishing takes mere seconds from when a story is published in the CMS to when it shows up on a website, native app, or third-party platform.

Schuyler Cumbie
Senior Developer, Content Engineering, USA TODAY NETWORK