A File Format For Static Websites

Everybody’s doing it wrong. Use this file format.

Ever heard of a “static website”? They’re the Next Big Thing in website hosting, and a return to yesteryear. Unlike today’s normal hosting companies, static-site hosting companies don’t run any of your code; they only send out responses you prepared in advance. Economies of scale make your website crazy-fast and crazy-cheap.

Static websites can’t write to a database on the server. No sign-in forms; no online shopping; no photo upload. With a bit of creative programming, a static site can handle just about anything else.

Jekyll (the engine behind GitHub Pages), Middleman and Roots all generate static websites. They all do it by “compiling” your website into a directory full of files.

And that’s wrong.

What Is A Static Page?

A static page is a web server’s response to a web browser’s request.

For instance, here’s the dialogue between a browser and a server for the world’s first web page:

GET /hypertext/WWW/TheProject.html HTTP/1.1
Host: info.cern.ch

HTTP/1.1 200 OK
Date: Tue, 04 Apr 2017 22:25:34 GMT
Server: Apache
Last-Modified: Thu, 03 Dec 1992 08:37:20 GMT
ETag: "40521e06-8a9-291e721905000"
Accept-Ranges: bytes
Content-Length: 2217
Connection: close
Content-Type: text/html

<HEADER>
<TITLE>The World Wide Web project</TITLE>
... (obsolete HTML continues...)

That’s a static page: an HTTP response. It has three components:

  1. path: the address the browser asks for (here, /hypertext/WWW/TheProject.html).
  2. headers: instructions the web browser needs to decode the content.
  3. body: the data.
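Those three components fit in a small record. Here’s a minimal Python sketch; the StaticPage name and its to_http helper are hypothetical, not part of any existing tool. Note that the path never appears in the response bytes: the server uses it to pick which response to send.

```python
from dataclasses import dataclass, field


@dataclass
class StaticPage:
    """One static page: the path it answers, its headers, and its body."""
    path: str                                   # e.g. "/hypertext/WWW/TheProject.html"
    headers: dict[str, str] = field(default_factory=dict)
    body: bytes = b""

    def to_http(self, status: str = "200 OK") -> bytes:
        """Serialize as an HTTP/1.1 response: status line, headers, blank line, body."""
        lines = [f"HTTP/1.1 {status}"]
        lines += [
            f"{name}: {value}"
            for name, value in self.headers.items()
            if name.lower() != "content-length"  # recomputed below
        ]
        # Derive Content-Length from the body so the two can never disagree.
        lines.append(f"Content-Length: {len(self.body)}")
        head = "\r\n".join(lines) + "\r\n\r\n"
        return head.encode("ascii") + self.body


page = StaticPage(
    path="/hypertext/WWW/TheProject.html",
    headers={"Content-Type": "text/html"},
    body=b"<HEADER>\n<TITLE>The World Wide Web project</TITLE>\n",
)
print(page.to_http().decode("ascii"))
```

Everything a browser needs is in the record; nothing has to be guessed later.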

Here’s what today’s static website generators would produce:

<HEADER>
<TITLE>The World Wide Web project</TITLE>
... (more really-old HTML stuff)

That’s the static page body. The site generator will store it at a certain path — but often the wrong path. And there’s no sign of headers.

Today’s site generators do this:

  1. Produce static pages
  2. Write the static pages to files
  3. Upload the files to a website

They should skip step 2, because files can’t hold everything a static page needs.
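Skipping step 2 means handing each response straight to the uploader. A minimal sketch, assuming S3 and boto3’s put_object parameter names (Bucket, Key, Body, CacheControl, ContentDisposition, ContentEncoding, ContentLanguage, ContentType); the put_object_kwargs helper and its header table are hypothetical.

```python
# Response headers that S3 PutObject accepts as parameters, keyed by
# lowercase header name. S3 computes its own ETag, so ETag isn't listed.
HEADER_TO_PARAM = {
    "cache-control": "CacheControl",
    "content-disposition": "ContentDisposition",
    "content-encoding": "ContentEncoding",
    "content-language": "ContentLanguage",
    "content-type": "ContentType",
}


def put_object_kwargs(bucket: str, path: str, headers: dict, body: bytes) -> dict:
    """Translate one in-memory response into keyword arguments for put_object."""
    key = path.lstrip("/")  # S3 keys don't start with "/"
    if not key:
        raise ValueError("this sketch needs a non-empty path to use as an S3 key")
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    for name, value in headers.items():
        param = HEADER_TO_PARAM.get(name.lower())
        if param is not None:  # headers S3 can't store are skipped in this sketch
            kwargs[param] = value
    return kwargs


kwargs = put_object_kwargs(
    "my-bucket", "/polls/2017.csv", {"Content-Type": "text/csv; charset=utf-8"}, b"a,b\n"
)
```

With boto3 installed and credentials configured, passing these as boto3.client("s3").put_object(**kwargs) would upload the page with its Content-Type intact, and no file is written along the way.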

The Price Of Files

Using files means you lose your Content-Type. Something needs to calculate Content-Type: text/html before serving the HTML to your web browser. Content-Type isn’t always easy to guess: a file named data.csv suggests text/csv, but nothing in the filename says the bytes are UTF-8, so Content-Type: text/csv; charset=utf-8 is tricky. So who guesses the Content-Type? Is it your static site generator (while it uploads)? Is it your hosting service (as it receives files)? Is it custom code you write? How do you debug it?
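Here’s why guessing loses information. This sketch maps extensions to types the way uploaders and hosts typically do (Python’s standard mimetypes module works on the same principle); the EXTENSION_TYPES table and guess_content_type are hypothetical stand-ins.

```python
import posixpath

# A hypothetical extension table, standing in for whatever a generator,
# uploader, or host consults when all it has is a filename.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".csv": "text/csv",
    ".png": "image/png",
}


def guess_content_type(filename: str) -> str:
    """Guess a Content-Type from a filename's extension alone."""
    _, ext = posixpath.splitext(filename)
    return EXTENSION_TYPES.get(ext.lower(), "application/octet-stream")


# The generator knew it wrote UTF-8 CSV, but the filename can't say so.
intended = "text/csv; charset=utf-8"
guessed = guess_content_type("polls/2017.csv")
print(guessed, guessed == intended)
```

The guess is "text/csv" with no charset. The generator had the right answer all along; writing a file threw it away.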

Files can’t store ETag and Cache-Control. These headers let you configure some responses to be fast and cheap (like images) and other responses to be easy to overwrite at a moment’s notice (like breaking news articles).
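When a page is a path, headers and a body, adding those headers is one small step. A minimal sketch: the ETag is a truncated SHA-256 of the body, and the caching policy (a year for anything under /images/, a minute for everything else) is a hypothetical example, not a recommendation.

```python
import hashlib


def etag_for(body: bytes) -> str:
    """A strong ETag derived from the body: same bytes, same tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def cache_control_for(path: str) -> str:
    """Hypothetical policy: images cache for a year, everything else for a minute."""
    if path.startswith("/images/"):
        return "public, max-age=31536000, immutable"
    return "public, max-age=60"


def add_cache_headers(path: str, headers: dict, body: bytes) -> dict:
    """Return a copy of headers with ETag and Cache-Control filled in."""
    out = dict(headers)
    out["ETag"] = etag_for(body)
    out["Cache-Control"] = cache_control_for(path)
    return out


logo = add_cache_headers("/images/logo.png", {"Content-Type": "image/png"}, b"...")
story = add_cache_headers("/news/breaking", {"Content-Type": "text/html"}, b"<h1>News</h1>")
```

A breaking-news page then goes stale within a minute, while the logo is fetched once and reused.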

Files aren’t aware of Content-Encoding. Some hosting companies (S3 springs to mind) won’t compress static pages for you, so pages load more slowly and cost more bandwidth than you’d like. You can solve that problem by compressing the body and setting Content-Encoding: gzip … but your filesystem has nowhere to record that header, so a gzipped page looks like any other file.
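In memory, compression and its header travel together. A minimal sketch (gzip_page is a hypothetical helper); passing mtime=0 keeps the gzip output byte-identical across runs, so a body-derived ETag stays stable.

```python
import gzip


def gzip_page(headers: dict, body: bytes) -> tuple[dict, bytes]:
    """Compress the body and record that fact in the headers, in one step."""
    if "Content-Encoding" in headers:
        return headers, body  # already encoded; leave it alone
    compressed = gzip.compress(body, mtime=0)  # mtime=0: reproducible output
    out = dict(headers)
    out["Content-Encoding"] = "gzip"
    return out, compressed


headers, body = gzip_page({"Content-Type": "text/html"}, b"<p>hello</p>" * 100)
```

Because the function returns the new headers and the new body together, there is no way to store one without the other.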

And with files, a single path can’t be both a file and a directory. For instance, https://github.com/huffpostdata is a web page, and https://github.com/huffpostdata/in-memory-website is a web page. But if your static site generator writes a file called output/huffpostdata, it can’t write another file called output/huffpostdata/in-memory-website: that would only work if output/huffpostdata were a directory … but it’s a file.

Maybe your site generator works around this by storing output/huffpostdata/in-memory-website and output/huffpostdata/index, where “index” is some special keyword. But then you can’t create a file called “index”. In general, files don’t map to web-server endpoints.
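The collision is easy to reproduce. This sketch keeps the two pages from the example in a plain dictionary (no conflict), then tries to write them out as files in a temporary directory; write_as_files is a hypothetical helper.

```python
import os
import tempfile

# In memory, a website maps paths to bodies, so a page and its "child"
# page coexist without trouble.
website = {
    "/huffpostdata": b"<h1>huffpostdata</h1>",
    "/huffpostdata/in-memory-website": b"<h1>in-memory-website</h1>",
}


def write_as_files(site: dict, root: str) -> bool:
    """Try to write each path as a file under root; report whether all succeeded."""
    try:
        for path, body in site.items():
            target = os.path.join(root, *path.strip("/").split("/"))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as f:
                f.write(body)
    except OSError:
        # e.g. root/huffpostdata is already a file, so it can't be a directory too.
        return False
    return True


with tempfile.TemporaryDirectory() as root:
    print(len(website), write_as_files(website, root))
```

The dictionary holds both pages; the filesystem refuses the second one.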

Plus, different filesystems allow different filenames. Linux filesystems can store a file named foo:bar, but 2017’s Windows filesystems can’t, and macOS treats the colon specially (Finder displays it as a slash).
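A generator that writes files has to police names like this itself. A minimal sketch of one such check (portable_filename_problems is hypothetical); it is deliberately incomplete, since Windows also reserves device names such as CON and NUL, which this check ignores.

```python
# Characters Windows forbids in file names (NTFS also uses ":" for
# alternate data streams). "/" is handled by splitting the URL path.
WINDOWS_FORBIDDEN = set('<>:"\\|?*')


def portable_filename_problems(url_path: str) -> list[str]:
    """List URL path segments that can't become file names on Windows."""
    problems = []
    for segment in url_path.strip("/").split("/"):
        if any(ch in WINDOWS_FORBIDDEN for ch in segment):
            problems.append(segment)
    return problems


print(portable_filename_problems("/elections/2016/foo:bar"))
```

"foo:bar" is a perfectly good URL path segment; it only becomes a problem once it has to be a filename.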

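Finally, reading and writing files is slow: writing every page to disk, only to read it all back for the upload, costs time that passing responses along in memory doesn’t.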

In sum: files don’t describe a static website. So static website generators shouldn’t write files.

The Solution: In-Memory Website

A website is a bunch of HTTP responses. Each response has a path, some headers and a body.
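Serving such a website needs no files at all. This is a minimal development-server sketch using Python’s standard http.server; the WEBSITE dictionary is hypothetical, and unlike the NodeJS server mentioned below, it neither reloads the browser nor imitates S3.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A hypothetical in-memory website: path -> (headers, body).
WEBSITE = {
    "/": ({"Content-Type": "text/html; charset=utf-8"}, b"<h1>Home</h1>"),
    "/data.csv": ({"Content-Type": "text/csv; charset=utf-8"}, b"a,b\n1,2\n"),
}


class InMemoryHandler(BaseHTTPRequestHandler):
    """Serve each response with the headers the website describes.

    (http.server also adds its own Server and Date headers.)
    """

    def do_GET(self):
        entry = WEBSITE.get(self.path.split("?", 1)[0])  # ignore any query string
        if entry is None:
            self.send_error(404)
            return
        headers, body = entry
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the development server on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), InMemoryHandler).serve_forever()
```

After calling serve(), a request for http://127.0.0.1:8000/data.csv comes back with its charset intact, because nobody had to guess it.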

I’ve coded this idea as in-memory-website: some NodeJS tools and a language-agnostic specification.

The idea is:

  • Any static website generator can output a static website in this form instead of as files. At HuffPostData I coded a couple of site generators: hpd-asset-pipeline and hpd-page-generator.
  • You can store a static website in a file and load it later.
  • You can upload a static website to a hosting service like S3.
  • You can modify a static website — for instance, add ETags or gzip-compress.
  • You can develop a static website using a development server that re-runs your framework every time a file changes. (My NodeJS development server even makes the browser refresh, with LiveReload.) The development server can mimic S3 almost exactly.
  • You can stream a static website, if it’s too large to fit in memory. And if you must, you can always stream it into filesystem files.
  • You can mix and match programming languages: pipe a StaticWebsite from your Ruby site generator to your Node S3 uploader.
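To make “store it in a file, load it later, pipe it between languages” concrete, here is a hypothetical serialization: one JSON object per response, with the body base64-encoded. It is an illustration only, not the in-memory-website specification. One line per response means a reader can process a large website line by line instead of loading it all at once.

```python
import base64
import json


def dump_website(website: dict) -> str:
    """Serialize {path: (headers, body)} as JSON lines, one response per line."""
    lines = []
    for path, (headers, body) in website.items():
        record = {
            "path": path,
            "headers": headers,
            "body": base64.b64encode(body).decode("ascii"),
        }
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines) + "\n"


def load_website(text: str) -> dict:
    """Inverse of dump_website: rebuild {path: (headers, body)}."""
    website = {}
    for line in text.splitlines():
        if not line:
            continue
        record = json.loads(line)
        website[record["path"]] = (record["headers"], base64.b64decode(record["body"]))
    return website


site = {
    "/": ({"Content-Type": "text/html; charset=utf-8"}, b"<h1>Home</h1>"),
    "/logo.png": ({"Content-Type": "image/png"}, bytes(range(256))),
}
restored = load_website(dump_website(site))
```

JSON and base64 are available in practically every language, so a Ruby generator could write this and a Node uploader could read it.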

I look forward to the day static website generators live in a happy community. Our first step: ditch the filesystem.