By: Dave Cole
It’s been almost two years since Development Seed deliberately stopped building websites with Drupal and moved away from CMS-driven applications altogether. Since then, our recent blog posts about investing in Prose.io, rebuilding our own websites with Jekyll, creating the MapBox Map Site templates, and launching new client-sponsored projects like MIX Maps have pointed to the new approach we’re taking. What follows is my reflection on the evolution of our process and how it leads to simple, flexible, and reliable websites that allow for a renewed focus on design and strategy. It is informed both by the lessons Development Seed learned over four years of leadership in the Drupal community and by my own experience going through two redesigns of WhiteHouse.gov and its migration to a Drupal backend.
The old way
In the past, building websites with features like consistent templates and lists of aggregated content meant setting up complex content management systems. These CMSs combined templating logic, application code, and a content database so they could assemble webpages each time a visitor requested them. They were complicated systems that depended on many separate applications working together: a web server routes page requests to a PHP application, which uses pre-defined layout templates to format content stored in a MySQL database. Serving a single page request required at least three separate applications working in concert, and any one of them failing would bring down the system.
Making this work at scale requires even more complexity: introducing new applications to cache the information in the database or the output of the PHP application, replicating content across several database servers while trying to keep new content in sync, spinning up new web servers to handle surges in traffic, and many other scalability tactics devised over the years to hold this model together. In the end, though, keeping the website “up” and serving content at pace with a deluge of requests comes down to the developers’ ability to turn on new servers and a reliance on caching schemes.
When we want to add functionality to a CMS beyond templating, it usually amounts to a tweak to the administrative interface. Features like adding an upload field for video or resizing images to fit the site’s layout are really administrative preprocessing of content: work that can be done ahead of time or handled by external services.
Despite the complexity of these systems and the lack of consensus on their implementation, they all need to produce the same formats as outputs. These are the web’s open standards. And that’s the agreement developers have with web browsers: developers build systems that deliver basic files conforming to established standards, and browsers render that content in a predictable way. Except, of course, Microsoft’s web browser Internet Explorer, whose irresponsible disregard for web standards, combined with its majority market share, has both slowed the evolution of the entire web by a matter of years and cost enterprises enormous sums in maintenance and additional development for compatibility. Nonetheless, when we have consensus on the output format of website content, one wonders whether there is a more elegant and simple way to generate it.
Back to the basics
By developing websites as “client-side” applications that consist only of the files directly usable by a web browser, with no extra work done by backend servers, we are able to pass substantial cost savings on to our clients while virtually eliminating the risk of the website going “down”. For additional functionality not available to client-side applications, we vet and integrate external services. We can deploy our projects on practically any web server and not worry about whether it has the right software to run our application or the technical resources to handle high traffic. In fact, we’ve deployed most of our projects for free using GitHub Pages, a service that hosts static files directly from a code repository. In more advanced cases, we can deploy sites on Amazon’s S3 service, which provides reliable and scalable static file hosting at high speed and low cost. Other times, we just zip up the files of our website and send them over to clients to host on their own internal web servers without any additional technical hurdles.
The basic stack is as simple as it gets:
- GitHub Pages static HTTP server
- Supplemented with external APIs where necessary
Of course, the simplicity and reliability of these websites’ architecture initially came at the expense of some of the dynamic features of CMSs. For instance, without a database we had no way to filter and visualize large datasets or accept and process user-generated content like comments. And without a server-side templating system to generate our pages, we limited ourselves to single-page sites to avoid the complications caused by trying to present content uniformly across several pages. And we had no web-based interface to offer clients an easy way to update and maintain the site content. Our workflow was specifically tailored toward developers and centered on the command line and git.
After dozens of projects and iterations on our process, we’re well on our way to overcoming most of these obstacles by devising a process by which we build static sites dynamically.
Embedding services for advanced functionality
First, we cover functionality that previously called for a database, like accepting user-generated content or visualizing data, by delegating it to external services and integrating with their APIs. The nature of the modern web makes this much easier than in the past. Many services exist that focus on specific problems. We use Flickr for managing and embedding photos, Vimeo for video, and Twitter and Disqus for replies and comments. These services have APIs or simple widgets that embed their content in our static webpages to fill in the holes in dynamic functionality.
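As a sketch of how little is involved, here is roughly what Disqus’s standard embed looks like (the shortname below is a placeholder, and details vary with Disqus’s current documentation). A small snippet like this is all a static page needs to pull in a full commenting backend:

```html
<!-- Comment thread served entirely by Disqus; the page itself stays static. -->
<!-- "example-site" is a placeholder Disqus shortname, not a real account. -->
<div id="disqus_thread"></div>
<script>
  var disqus_shortname = 'example-site';
  (function () {
    // Load Disqus's embed script asynchronously so a slow or failed
    // load never blocks the rest of the page from rendering.
    var d = document.createElement('script');
    d.src = 'https://' + disqus_shortname + '.disqus.com/embed.js';
    d.async = true;
    (document.head || document.body).appendChild(d);
  })();
</script>
```

If Disqus is unreachable, only the comment thread fails to appear; every other part of the page renders normally.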
We’re now at a point where we can build completely client-side websites that have advanced server-side applications embedded from dedicated external services. If one service fails, it does not impact others or the content of the website. This is a marked improvement in resilience over the previous model.
Templates and generating HTML content
It turns out we can invert the process of templating and generating HTML files too. In the old model, we built the shell of a webpage, its structure and layout, in one file, and when the browser requested that webpage, we inserted its content from the database into the appropriate places in that template to dynamically generate the HTML file the browser needs. This process enforces consistent page layout and design: all blog posts look the same, and the main blog index can list the title and teaser of each post in reverse chronological order. We achieve the flexibility we need, but it comes at the cost of an application constantly running to generate pages on request.
Specifically, this means that when a web browser requests http://example.com/blog/this-post, the web server application receives the request, routes it to another application for processing, and returns that application’s output to the browser for display. In the case of Drupal, the Apache web server routes all requests to the same file on the server, so /blog/this-post is passed as a parameter to index.php. The index file is the main processing application: when it’s called, it loads its core components and any additional contributed modules, processes the requested path (/blog/this-post), finds the appropriate template file, and loads the content record into that template (another part of the application looks up the content in the database by a URL alias for /blog/this-post). The resulting output is HTML that index.php passes back to the web server, which relays it to the browser.

The browser has no idea what’s happening on the server. It just expects that requesting a URL like http://example.com/blog/this-post will return content in an established standard format. In fact, the original assumption of the web was that requesting http://example.com/blog/this-post would return the file this-post (or this-post/index.html) in the directory blog on the server example.com, just like operating systems organize local files. The need for flexibility in formatting traded this simplicity of routing for complex server-side systems that interpret and process otherwise simple file requests.

The downsides are more applications introducing points of failure into the process of building the page, the additional time it takes the application to retrieve content from the database and generate the page on request, and the need to repeat the whole process on subsequent requests. Efforts to optimize this focus on caching various points of the process: faster access to the data, or circumventing the whole pipeline for repeated requests within a set window, speeds up delivery of the page and lessens the load on the server. When those optimizations are exhausted, add more servers. And add more servers in different places to account for the first servers failing.
After we deliberately stopped developing systems like this and focused again on building simple client-side websites, we eventually looked for solutions to the templating problem that avoid the need for on-demand rendering. This time, we were particularly interested in applications that generate content in advance instead of on request. There’s very little reason to build and run an entire CMS application to dynamically generate tens, hundreds, or even a few thousand pages. Instead, an application can bulk-generate all of the output files needed for the site, using the same kind of dynamic templating, in a matter of seconds to minutes. Then we can host the resulting static files in nearly any environment. A request for http://example.com/blog/this-post will actually return the file at the path blog/this-post/index.html. Routing and serving content is simple and fast, which is a prerequisite for reliable and scalable architectures.
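The generated output mirrors the URL structure directly. A sketch of what the built site might look like on disk, assuming Jekyll’s default _site output directory (the specific filenames are illustrative):

```
_site/
├── index.html
├── blog/
│   ├── index.html
│   └── this-post/
│       └── index.html
└── css/
    └── style.css
```

A request for /blog/this-post/ maps straight to blog/this-post/index.html; no application runs in between the request and the file.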
Topping off the stack with Jekyll:
- Jekyll for page templates and static file generation
- GitHub Pages static HTTP server
- Supplemented with external APIs where necessary
For templating and site generation, we’re using Jekyll, an open source project that started at GitHub almost four years ago. We use it for this website, developmentseed.org; for mapbox.com, the product page for our mapping work; and for dozens of other projects. Jekyll stores all of your content in simple text files. The metadata that describes a post sits atop the content in the same text file. This metadata associates the content with layout templates, allows for advanced formatting like filtering on categories or tags, and can store arbitrary structured data, like associating posts with authors, featured photos, or anything else the template developer lays out.
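A minimal sketch of such a text file (the field values are hypothetical; layout, title, and categories are standard Jekyll front matter fields, while a field like author is arbitrary metadata the templates can use however they want):

```markdown
---
layout: post
title: "Build CMS-free websites"
categories: blog
author: dave
---

The post body, written in Markdown, starts after the
closing `---` of the YAML front matter above.
```

At build time, Jekyll reads the front matter, renders the body through the named layout template, and writes out a plain HTML file.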
From straight-up blog and page content sites like this one to advanced map and data portals, we can use Jekyll to generate sites that rival the layout flexibility of our most complex Drupal sites with none of the development and maintenance challenges a dynamic CMS introduces. Where we need server-side functionality, we patch it in using external APIs. With some clever templating, we can configure our Jekyll sites to output content APIs like RSS and JSON feeds. For instance, we’re using a JSON template to generate the index we use for this site’s search feature. When we need to include large datasets in our sites, we use server-side scripts that break down the dataset into individual JSON files that serve as an API for our site. We can even convert datasets into YAML, Jekyll’s metadata format, and insert the data throughout our site templates. With embedded external APIs and a flexible templating and generation process, our sites look and feel dynamic with incredible performance and reliability.
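A JSON index like the one behind our search can be produced by an ordinary Jekyll template. A sketch, with search.json as a hypothetical filename (the empty front matter block tells Jekyll to process the file, and the jsonify filter assumes a reasonably recent Jekyll version; older versions would need manual escaping):

```liquid
---
---
[
{% for post in site.posts %}
  {
    "title": {{ post.title | jsonify }},
    "url": {{ post.url | jsonify }},
    "date": "{{ post.date | date: '%Y-%m-%d' }}"
  }{% unless forloop.last %},{% endunless %}
{% endfor %}
]
```

At build time this renders to a plain static file, so the “API” it exposes is just another pre-generated document served like any other page.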
Making it easier to manage a Jekyll site
Because of its reliance on editing text files instead of a web interface to add and edit content, Jekyll is often thought of as a blogging platform for hackers. But with a little investment, the rest of us can have nice things too.
To that end we’re investing heavily in Prose.io, a web-based content editor specifically designed to work well with Jekyll. Prose allows for editing text files hosted in GitHub, where we store all of our code. It provides an elegant interface that focuses on writing. Content producers can go to Prose.io to create new posts or edit existing ones through this simple interface. Their changes are saved directly to GitHub, which maintains a record of every version of every file. With GitHub Pages enabled, we can host Jekyll sites for free directly from the code we commit to GitHub. Changes made through Prose on a GitHub Pages site are automatically pushed live to the site. Or, the code edited with Prose can be copied from GitHub to any other hosting environment.
End to end, from creating content through a web interface to pushing changes to a live website, Prose mimics the workflow of traditional CMSs, but with much less bloat. Every Drupal and WordPress site has its own administrative functionality, and the code to generate webpages is usually intertwined with the code to serve webpages. With Prose and Jekyll, there is a clear and deliberate separation. For the administrative application, we can focus on making Prose the best place for writing content on the web. For the actual website components, all of our Jekyll templates and posts are stored separately from the application in a version-controlled service. And the actual pages of the website are all pre-generated and served directly from GitHub or another static hosting environment. If the administrative interface fails, our content is unaffected. If we need to turn on a new site, we focus only on designing the site, developing its templates, and creating content. We don’t have to think about the challenges of configuring an administrative interface or hosting infrastructure. We have flexibility at every point in the system, too: we can easily move the output of a site to another host, such as Amazon S3, if we’re looking for something truly scalable and reliable.
As with anything on the web, this new model is rapidly evolving, and there is much room left for optimization. As we are showing by building Prose.io, there needs to be much more investment and refinement in the tools that enable this new approach. Nonetheless, this marks a profound new direction and substantial improvement in the way we conceive of and build websites. We now spend much more of our time on what truly matters to a project’s success — design and strategy. We worry about the lasting impact the content of our projects will have, not whether they will withstand a link from a popular news site or need crucial security patches to their administrative interface. As one of our friends mentioned in a recent meeting, this renewed focus on simple HTML websites feels a little like history repeating. For us, getting back to the basics of building for web standards and applying new tools like static site generation and external API integration is an exciting and very modern path forward.
Cover photo by Yogendra174, CC-BY-SA.