AMP Conversion: The Good, The Bad and The Ugly

Lessons learned while building a high-fidelity AMP converter for a wide range of publishers.

(Note: If you’re looking for a basic, not-too-technical AMP explainer, try our Procrastinator’s Guide to AMP.)

Converting your site to support Accelerated Mobile Pages is easy. Doing it well isn’t.

There are plugins and conversion services to help publishers make AMP pages. Unfortunately, most of them don’t do a good job. Making a good AMP (one that has all the same features as the regular page it’s sourced from) is hard.

We’ve been working on AMP conversion for a little over 10 months. In that time we’ve seen AMP evolve with more features and tighter validation requirements.

Here are some of the things we’ve had to address to make AMPs that truly reflect the design intent of the original.

HTML Conversion

Converting the base HTML is the easy part. There are open-source libraries and plugins that do a passable job given a clean source page. Turning an <img> into an <amp-img> isn’t that hard; the most annoying part is knowing the size of each image, because AMP requires images to carry explicit width and height.
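A minimal sketch of the <img> to <amp-img> step in Python, assuming the tag’s attributes have already been parsed into a dict and that image sizes were measured earlier and passed in as a lookup table (the function name and the `dimensions` table are hypothetical, not any particular library’s API):

```python
from html import escape


def img_to_amp_img(attrs: dict, dimensions: dict) -> str:
    """Convert the attributes of an <img> tag into an <amp-img> tag.

    `dimensions` is a hypothetical lookup mapping an image URL to its
    (width, height) in pixels. A real converter would fetch and measure
    the image; here the numbers are simply passed in.
    """
    src = attrs.get("src")
    if not src:
        raise ValueError("an <img> without src cannot be converted")

    # AMP needs explicit dimensions: use the ones already on the tag if both
    # are present, otherwise fall back to the measured size of the image file.
    width, height = attrs.get("width"), attrs.get("height")
    if width is None or height is None:
        if src not in dimensions:
            raise LookupError(f"unknown size for {src}; measure the image first")
        width, height = (str(n) for n in dimensions[src])

    out = {
        "src": src,
        "width": width,
        "height": height,
        # "responsive" scales the image to its container, keeping the aspect ratio.
        "layout": "responsive",
    }
    if "alt" in attrs:
        out["alt"] = attrs["alt"]

    rendered = " ".join(f'{name}="{escape(value, quote=True)}"' for name, value in out.items())
    return f"<amp-img {rendered}></amp-img>"
```

The lookup failure is raised rather than guessed on purpose: an <amp-img> with made-up dimensions renders with the wrong aspect ratio, which is worse than flagging the image for measurement.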

Once you progress beyond the basic elements, life gets more complicated. <iframe> elements become <amp-iframe>, which sounds easy enough, except that the framed content must be served over a secure HTTPS connection. You can proxy the iframe, but then you find that everything inside it also has to be served over HTTPS, which gets messy fast.
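A small sketch of the HTTPS rule for <amp-iframe> sources, assuming a hypothetical proxy at `proxy.example.com` that fetches plain-HTTP pages on the converter’s behalf. It only rewrites the top-level `src`; as noted above, a proxied page’s own scripts, images and styles would need the same treatment.

```python
from urllib.parse import quote, urlsplit

# Hypothetical HTTPS proxy endpoint; not a real service.
PROXY_BASE = "https://proxy.example.com/fetch?url="


def amp_iframe_src(src: str) -> str:
    """Return a URL that satisfies <amp-iframe>'s HTTPS requirement."""
    parts = urlsplit(src)
    if parts.scheme == "https":
        return src
    if parts.scheme == "" and src.startswith("//"):
        # Protocol-relative URL: pin it to HTTPS.
        return "https:" + src
    if parts.scheme == "http":
        # Assume the origin has no HTTPS version and route it through the proxy.
        return PROXY_BASE + quote(src, safe="")
    # Relative paths should be resolved against the page URL before calling this.
    raise ValueError(f"unsupported iframe src: {src!r}")
```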

How’s Your Sense of Style?

Perhaps the hardest part of the base conversion isn’t the HTML; it’s the CSS. Most auto-converted AMPs look like a beginner’s HTML class project. The reason is that AMP imposes a strict 50K limit on the size of the stylesheet (and forbids external stylesheets).
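To see how far a stylesheet is from the limit, a rough check like the following helps. It assumes the 50K limit is counted in bytes of the inlined CSS (50,000 here) and uses a deliberately naive minifier that does not understand CSS strings, so treat the number as an estimate rather than what the AMP validator will report.

```python
import re

# The 50K stylesheet limit, assumed here to mean 50,000 bytes.
AMP_CSS_LIMIT_BYTES = 50_000


def minify_css(css: str) -> str:
    """Naive minification: drop comments and collapse whitespace.
    It does not understand strings, so it is only a rough estimate."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    return re.sub(r"\s+", " ", css).strip()


def check_amp_css(css: str, limit: int = AMP_CSS_LIMIT_BYTES) -> tuple:
    """Return (fits, size_in_bytes) for the minified stylesheet."""
    size = len(minify_css(css).encode("utf-8"))
    return size <= limit, size
```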

50K is actually plenty if you’re starting from a blank sheet and you’re careful. However, if you’re converting an existing site (particularly one built with third-party components, a theme from an external designer whose contract has ended, or an internal designer who doesn’t have a month of free time), it gets hard. On most websites, CSS is like that rental storage unit your in-laws have: people keep putting things in, nobody quite knows what’s in it, and you’re afraid to remove anything for fear of what you might find.

For most sites, getting under the 50K limit means throwing away the site design and starting again from scratch. Without a month of available design time, they end up extracting the text and images, doing a very basic auto-conversion, and shoving the result into a simplified template. The results are badly designed, ugly, oversimplified AMPs.

We’ve spent more time on this part of the problem than on anything else, and we can now take a standard web page with as much as 800K of CSS and make it work. In extreme cases we might have to trim the responsive layouts intended for full desktop widths, but in the vast majority of cases we can keep the site’s full CSS functionality.
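One common way to shrink a large stylesheet is to drop rules whose selectors can never match the converted page. The sketch below shows that technique in a conservative form; it is an illustration, not necessarily how Relay Media’s converter works. It assumes the CSS has already been parsed into `(selector_list, declarations)` pairs, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser

CLASS_RE = re.compile(r"\.(-?[_a-zA-Z][-_a-zA-Z0-9]*)")
ID_RE = re.compile(r"#(-?[_a-zA-Z][-_a-zA-Z0-9]*)")


class UsageCollector(HTMLParser):
    """Collect every class name and id used in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes, self.ids = set(), set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
            elif name == "id" and value:
                self.ids.add(value)


def selector_may_match(selector, classes, ids):
    """Conservative check: keep a selector unless it names a class or id that
    never appears in the page. Tags and pseudo-classes are ignored."""
    return (all(c in classes for c in CLASS_RE.findall(selector))
            and all(i in ids for i in ID_RE.findall(selector)))


def prune_rules(rules, html):
    """Drop selectors (and whole rules) that cannot match anything in `html`.

    Splitting on "," is naive and breaks on :is() and :not() lists.
    """
    collector = UsageCollector()
    collector.feed(html)
    kept = []
    for selectors, declarations in rules:
        live = [s.strip() for s in selectors.split(",")
                if selector_may_match(s, collector.classes, collector.ids)]
        if live:
            kept.append((", ".join(live), declarations))
    return kept
```

The check errs on the side of keeping rules: a selector like `#main p` survives even if the page has no paragraphs. Classes added at runtime (for example through amp-bind) never appear in the static HTML, so a real pruner would need an allow-list for them.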

The image on the left is the original responsive article. The middle image is the output of one of the text extraction converters being marketed to publishers. The image on the right is the output of Relay Media’s AMP converter.

Widgets and Video and Plugins, Oh My!

If all you want to do is write stories, show pictures, and play YouTube or Vimeo videos, life is fairly easy. However, once you start embedding other things, it gets more complex. Facebook, Twitter, and others have AMP extensions (amp-facebook, amp-twitter); you just need to parse out their content and convert it. Imgur, Giphy, and a laundry list of other services are not directly supported and require a more creative approach: translating them into an iframe and proxying for HTTPS as needed.
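A sketch of the embed dispatch in Python: providers with a dedicated AMP component get that component, and everything else falls back to <amp-iframe>. The component names and their id attributes are AMP’s own; the provider keys, function names, and the assumption that the caller already has an HTTPS (possibly proxied) URL for unsupported services are ours.

```python
from html import escape

# Embeds with a dedicated AMP component: provider -> (component, id attribute).
AMP_EMBEDS = {
    "youtube": ("amp-youtube", "data-videoid"),
    "vimeo": ("amp-vimeo", "data-videoid"),
    "twitter": ("amp-twitter", "data-tweetid"),
    "facebook": ("amp-facebook", "data-href"),
}


def extension_script(component):
    """The <script> tag an AMP extension component needs in the page <head>."""
    return (f'<script async custom-element="{component}" '
            f'src="https://cdn.ampproject.org/v0/{component}-0.1.js"></script>')


def convert_embed(provider, ident, width, height):
    """Return (amp_markup, component) for one embed.

    `ident` is the video or tweet id, the post URL for Facebook, or, for
    unsupported providers, an HTTPS URL to frame (possibly our proxy).
    """
    size = f'width="{width}" height="{height}" layout="responsive"'
    if provider in AMP_EMBEDS:
        component, attr = AMP_EMBEDS[provider]
        return f'<{component} {attr}="{escape(ident)}" {size}></{component}>', component
    # No dedicated component: fall back to amp-iframe, which requires HTTPS.
    if not ident.startswith("https://"):
        raise ValueError(f"{provider} embed needs an HTTPS URL for amp-iframe")
    markup = (f'<amp-iframe src="{escape(ident)}" {size} '
              f'sandbox="allow-scripts allow-same-origin"></amp-iframe>')
    return markup, "amp-iframe"
```

Returning the component name alongside the markup lets the converter collect the set of components actually used and emit one `extension_script` per component in the head.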

And then there’s video. The basic <amp-video> works well if all you want to do is play video. However, if you’re a TV station or newspaper running pre-roll ads, then at the time of writing you have to go another route (ads are coming to <amp-video>, just not yet). If your player vendor supports AMP, then “all you have to do is convert the tag.” I put that in quotes because it’s not that simple: your ad tags and closed captions need to be served over HTTPS or they won’t work. If your vendor doesn’t support AMP directly, you need to figure out how to move the content into an <amp-iframe> and make it all HTTPS-clean. Pro tip: the video files themselves don’t need to be served over HTTPS; the browser will log a warning to the console, but playback will still work.
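The HTTPS rules for video can be turned into a quick audit. This sketch assumes each player asset has already been tagged with a hypothetical kind label (`ad_tag`, `captions`, `player_script`, `video_file`); only the first three must be HTTPS, following the pro tip above that video files themselves can stay on HTTP.

```python
from urllib.parse import urlsplit

# Asset kinds (hypothetical labels) that break inside an AMP page unless
# served over HTTPS. Video files are deliberately absent: per the pro tip,
# they only trigger a console warning.
MUST_BE_HTTPS = {"ad_tag", "captions", "player_script"}


def audit_video_assets(assets):
    """Return the (kind, url) pairs that need fixing before the player will work.

    Protocol-relative URLs ("//host/...") are flagged too, to be safe.
    """
    return [(kind, url) for kind, url in assets
            if kind in MUST_BE_HTTPS and urlsplit(url).scheme != "https"]
```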

Don’t Let it Go Stale

Oh, and you have to do all this in real time. In most cases the AMP cache serves pages to users almost instantly. However, you still have to render content on demand when a user requests a page that isn’t in the cache, and in that situation the conversion time becomes part of the page load time.

You also have to deal with page content changing: not just the article but every element of the page, such as weather alerts, related stories, and latest headlines. We’ve found that a lot of effort goes into dynamically managing the cache so that high-traffic stories lag the original page by no more than 60 seconds, without hammering the source server or our conversion servers. When the content does change, it’s important to purge and refresh the AMP cache so that users start seeing fresh pages.
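One way to balance freshness against load is to re-check a page on an interval that depends on its traffic, and to purge only when the content hash actually changes. In the sketch below, the 60-second lag for hot pages comes from the text above; the traffic threshold and the slower interval for quiet pages are hypothetical numbers, and the purge request itself (Google’s AMP Cache exposes an `update-cache` request for this) is left out.

```python
import hashlib
from dataclasses import dataclass

MAX_LAG_HOT = 60         # seconds: the lag target for high-traffic stories
MAX_LAG_COLD = 15 * 60   # hypothetical re-check interval for quiet pages
HOT_THRESHOLD = 100      # hypothetical requests/minute that makes a page "hot"


@dataclass
class CachedPage:
    url: str
    content_hash: str
    last_checked: float  # epoch seconds


def refresh_interval(requests_per_minute: float) -> float:
    """Hot pages are re-checked often; quiet ones rarely, to spare the origin."""
    return MAX_LAG_HOT if requests_per_minute >= HOT_THRESHOLD else MAX_LAG_COLD


def needs_check(page: CachedPage, requests_per_minute: float, now: float) -> bool:
    return now - page.last_checked >= refresh_interval(requests_per_minute)


def on_refetch(page: CachedPage, new_html: str, now: float) -> bool:
    """Record a re-fetch of the source page; return True if the AMP cache
    should be purged because the content changed."""
    new_hash = hashlib.sha256(new_html.encode("utf-8")).hexdigest()
    changed = new_hash != page.content_hash
    page.content_hash, page.last_checked = new_hash, now
    return changed
```

Hashing the fetched source before reconverting also saves conversion work: an unchanged page needs neither a new conversion nor a purge.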

Show Me the Money

Almost all the sites we support belong to ad-supported, for-profit news organizations. As such, making sure that AMP pages yield similar or better ad revenue is a key requirement. This is one place where we often intentionally diverge from the base site.

Many responsive layouts feature ads in the right rail; when the site collapses for mobile devices, those ads fold below the content and often go unseen. We took a different approach, using a dynamic ad map that lets us insert ad units based on paragraph or word count, as well as in fixed positions. A typical page has ads after the third paragraph and then every 3–5 paragraphs, but the spacing is configurable and independent of the page content.
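The paragraph-based part of an ad map can be sketched in a few lines. This is an illustration, not Relay Media’s implementation: the parameter names are hypothetical, the defaults mirror the “after the third paragraph, then every few paragraphs” pattern above, and the `amp-ad` slot path is a placeholder.

```python
# Placeholder ad unit; the data-slot path is hypothetical.
AD_UNIT = ('<amp-ad width="300" height="250" type="doubleclick" '
           'data-slot="/1234/example"></amp-ad>')


def ad_slots(num_paragraphs, first_after=3, every=4, max_ads=None):
    """Return 1-based paragraph numbers that should be followed by an ad.

    No ad is placed after the final paragraph, so the story never ends on an ad.
    """
    slots = []
    pos = first_after
    while pos < num_paragraphs and (max_ads is None or len(slots) < max_ads):
        slots.append(pos)
        pos += every
    return slots


def insert_ads(paragraphs, ad_html=AD_UNIT, **ad_map):
    """Interleave ad markup with the story's paragraphs."""
    slots = set(ad_slots(len(paragraphs), **ad_map))
    out = []
    for i, paragraph in enumerate(paragraphs, start=1):
        out.append(paragraph)
        if i in slots:
            out.append(ad_html)
    return out
```

Because the map is just parameters, each site (or each A/B variant) can use different spacing without touching the page content; a word-count mode would work the same way with a running word total in place of the paragraph index.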

We’re still learning how to monetize AMP inventory effectively, but we’ve already seen good results, with programmatic buyers recognizing the inherent quality of AMP content. We’re now running some large-scale A/B tests with amp-experiment to optimize yield; we’ll publish case studies as we go.
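For reference, amp-experiment is configured with a JSON block that maps an experiment name to its variants and their traffic percentages. The generator below is a sketch based on our reading of that format; the experiment and variant names are hypothetical, so check the current amp-experiment documentation before relying on it. As we understand the docs, the chosen variant is then exposed as an `amp-x-<experiment name>` attribute on `<body>`, which CSS can target to switch, for example, between two ad densities.

```python
import json


def amp_experiment_markup(name, variants, sticky=True):
    """Build an <amp-experiment> block.

    `variants` maps variant names to the percentage of traffic each receives;
    any remainder below 100 sees the page without a variant.
    """
    total = sum(variants.values())
    if total > 100:
        raise ValueError(f"variant percentages add up to {total}, more than 100")
    config = {name: {"sticky": sticky, "variants": variants}}
    return ("<amp-experiment>\n"
            '  <script type="application/json">' + json.dumps(config) + "</script>\n"
            "</amp-experiment>")
```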

So What Does it All Mean?

Like many things, the closer you get to a good outcome, the more complex it gets. We’re now at a point where we can automatically convert a site with 95% fidelity without any tweaks. Then we teach our system the site-specific tweaks, such as video players, lazy-loaded images, and carousels. The end result is a very high-fidelity AMP page.
