Don’t write an SPA*

Sam Skjonsberg
Published in Semantic Scholar
Jul 12, 2017 · 8 min read

*Unless it makes sense to.

Ask almost any frontend developer out there about the state of JS tooling and frameworks, and they’ll probably tell you that it’s impossible to keep up with what’s new. Angular? React? Vue? Aurelia? Selecting the right one for a new project is a daunting task, making it easy to choose one without thinking critically about the long-term implications.

The vast majority of projects choose React or Angular, and accordingly adopt a single page application (SPA) architecture. This approach has its benefits, but they don’t come without a cost — SPAs introduce several layers of complexity that aren’t always easy to recognize up front. An SPA might be the right choice for your project, but sometimes boring old technologies are the right choice too — it just depends on the goals of your project.

When we created Semantic Scholar in 2015, we built a client rendered SPA using React that we later made isomorphic. While this approach definitely paid off in some ways, it also came with a few challenges that we didn’t anticipate. Below are some of the benefits and drawbacks we’ve experienced over the past two years.

What went well.

Our site feels fast.

The decision to build an SPA really boiled down to one key requirement — speed. Performance is an extremely important part of a good user-experience, as demonstrated by experiments conducted by both Google and Amazon:

Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

The SPA architecture enables our application to render additional pages / content without downloading additional JavaScript and/or CSS — the only bytes we have to send down are those that make up the data we’re rendering. While the net benefit of this from a raw bandwidth perspective is small (as the initial payload users download is larger), the benefit lies in a reduction of “perceived latency.” Client-side rendering enables the display to be immediately updated with every interaction, making the application feel snappier, even if the API is chugging hard to get you the results you’re looking for.
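To make that pattern concrete, here’s a minimal sketch of a client-side transition. The names `fetchJson`, `render`, and `onPending` are illustrative stand-ins for a real data layer and renderer, not actual Semantic Scholar code:

```javascript
// A route transition in an SPA: only data crosses the wire, and the UI
// responds immediately, before the API call completes.
async function transitionTo(path, { fetchJson, render, onPending }) {
  onPending(path); // immediate visual feedback: this is the "perceived" speed
  const data = await fetchJson(`/api${path}`); // only data bytes, no new JS/CSS
  render(data); // re-render in place, no full page load
  return data;
}
```

The key detail is the first line: the pending indicator fires before the network round trip, which is exactly why the site feels fast even when the API is slow.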

The architecture seems to have paid dividends in this regard. We’ve received a lot of feedback from end users that the site is incredibly responsive and fast — and our metrics back this up. The paper detail page renders within 1.16 seconds for 90% of our users, including those on mobile devices / slow networks. What’s really interesting about this is that our initial load time isn’t so great — for 90% of users it takes around 5 seconds to load and render the page! Yikes! This supports our intuition that end users care more about “perceived” latency than actual latency. If the application feels responsive via immediate visual cues, the actual time spent retrieving search results becomes less noticeable. The SPA architecture made building a responsive, snappy interface more straightforward than it otherwise might have been (that said, you can definitely accomplish the same thing with other techniques).

It’s easy to write tests.

The popularity of the JavaScript ecosystem and SPA architecture has a great side effect: there are a ton of great libraries for testing UI components (we use enzyme, jsdom and mocha specifically). Granted, these toolsets aren’t exclusive to SPAs — but React was developed with testing in mind, which in turn makes the code you write with it easier to verify. Riding the wave of tools and libraries driven by the popularity of an approach has very real advantages — granted, the wave can (and in all likelihood will) break at some point. Surf’s up.
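The core of what makes these components testable can be sketched without any tooling at all: a component is (ideally) a pure function from props to markup, so a unit test needs neither a browser nor a DOM. `PaperTitle` below is an illustrative component, not real Semantic Scholar code; enzyme and jsdom give you richer versions of this same idea for components that do touch the DOM:

```javascript
// A component as a pure function from props to markup: trivially testable.
function PaperTitle({ title, year }) {
  return `<h1 class="paper-title">${title} (${year})</h1>`;
}
```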

It’s easy to build interactive features.

As the web and the applications we build with it continue to advance, so do the expectations of our users. Tools like React and Angular simplify the code behind developing highly interactive experiences because the specifics of interacting with the DOM and its differences across browsers are handled for you. Plus, the resulting abstraction caters to well encapsulated, testable components (it’s so important it’s worth repeating).

It’s fun to work with.

The SPA is a double-edged sword — it’s the shiny, unwieldy thing we all want to pick up and use, even though we might hurt ourselves in the process. We find working with React and SPA-based architectures really interesting, more so than tried and true approaches like server-rendered templates with a dash of jQuery on top. We’re engineers after all — we like solving tough problems! That said, we’re also our own worst enemy in this regard, as keeping things simple lends itself towards stable software that’s easy to change. This in turn empowers more iteration (and a better end product). The key is finding the right balance by applying technologies we’re excited about in a fashion that produces robust systems that can be changed rapidly.

Lessons learned.

Analytics aren’t free.

With a traditional, server rendered application the analytics story is really simple:

  1. Log in to Google Analytics.
  2. Copy and paste the snippet into your base template.
  3. Deploy your website.

Bam! A few days later your Google Analytics dashboard is full of useful information about the most popular pages on your site and how long it takes people to load the page in some remote location on a satellite internet connection.
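The snippet in step 2 is only a few lines of boilerplate. Roughly, it’s the standard async variant (the `UA-XXXXX-Y` tracking ID below is a placeholder):

```html
<script>
  window.ga = window.ga || function () { (ga.q = ga.q || []).push(arguments); };
  ga('create', 'UA-XXXXX-Y', 'auto'); // your tracking ID goes here
  ga('send', 'pageview');
</script>
<script async src="https://www.google-analytics.com/analytics.js"></script>
```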

With an SPA the notion of a traditional “pageview” goes out the window — and without custom instrumentation the data provided by Google Analytics becomes pretty useless. Several companies like New Relic and Mixpanel are trying to fix this — but we still found the metrics provided by their SPA offerings pretty limited and/or inaccurate.

If you’re going to write an SPA, be prepared to do more work instrumenting your analytics platform. You’ll have to add hooks to your codebase for tracking pageviews (given the transitions occur on the client) and instrumentation for tracking performance related metrics (as neither GA nor any of the solutions mentioned above effectively tracks the time it takes to transition). This might sound simple, but it’s important to recognize that it’s not free. You’ll spend time writing and testing these hooks, and you’ll definitely run into unanticipated bugs / issues (no software is perfect, after all). We made a big mistake in that we didn’t test any of this initial instrumentation — and suffered the consequences when our metrics stopped working due to unrelated changes / regressions. That said, we’re happy to report that we now have nearly complete coverage of the analytics we track — losing sight of what our end users are experiencing is too steep a price to pay.
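The pageview hook itself can be small; the hard part is wiring it to every client-side transition and keeping it tested. A sketch, where `send` stands in for the analytics client (e.g. a wrapper around GA’s `ga('send', 'pageview', ...)`):

```javascript
// Report a virtual pageview on every SPA route change, de-duplicating
// repeat notifications for the same path (e.g. query-only changes).
function createPageviewTracker(send) {
  let lastPath = null;
  return function onRouteChange(path) {
    if (path === lastPath) return;
    lastPath = path;
    send('pageview', path);
  };
}
```

With a typical client-side router you’d call the returned function from its navigation listener, e.g. `router.listen(location => track(location.pathname))`.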

Stateful applications are complex.

Stateful systems are inevitably more complex — the developer has to reason about concurrency, mutability, locking, etc. — and with these concerns come hard-to-debug bugs. Embracing the SPA architecture requires diving head-first into this world, as the JavaScript runtime is one where mutable, global application state reigns supreme. Sure, things like Immutable and redux help make this more manageable — but under the hood you’re really still just modifying global state. Server side rendered web applications lack this complexity — each request is processed as a single transaction with clear I/O (a request in, a big ole’ blob of HTML out). There are far fewer failure scenarios, and the runtime is entirely in our control.
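The redux mitigation works by confining mutation: each state transition becomes a pure function you can test like any other. A minimal sketch (the names are illustrative, not from our codebase):

```javascript
// A redux-style reducer: given the current state and an action, return a
// *new* state object rather than mutating the old one.
function searchReducer(state = { query: '', results: [] }, action) {
  switch (action.type) {
    case 'SET_QUERY':
      return { ...state, query: action.query };
    case 'RECEIVE_RESULTS':
      return { ...state, results: action.results };
    default:
      return state;
  }
}
```

The store holding the result is still, as noted above, global mutable state; the reducer just makes each mutation explicit and reproducible.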

Client-side rendered applications don’t index well.

Search engines like Google and Bing claim to index JavaScript rendered content without issue, but the specifics aren’t well documented and a test we executed demonstrated that the capabilities of these crawlers are really limited. We experimented by launching our site first as a client-rendered SPA only — requesting any page on the site simply resulted in a bare-bones HTML file that bootstrapped the experience:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Semantic Scholar</title>
  </head>
  <body>
    <script src="main.js"></script>
  </body>
</html>

We then implemented a sitemap with links to each paper in our corpus, making it easy for Google and other engines to crawl our site. As it turns out, Google indexed each and every URL, but the search results were nothing more than bare links: no titles, no snippets, no content.

Clearly Google was indexing the URLs, but none of the content was being ingested. Not only does this look bad, but it causes each page to be indexed as an empty document — not very relevant to most queries.

The experiment revealed that client-side rendering wasn’t going to cut it, so we moved to an “isomorphic” application that rendered on the client and server. Luckily React made this possible — but not without jumping through a few hoops (which were on fire, and moving).

“Isomorphism” is hard.

While React allowed us to forget about the complexity of each browser’s DOM quirks (Internet Explorer’s in particular), SEO required that we implement an application that could render on the server or the client. For us, this was uncharted territory. There weren’t a lot of supporting libraries or tools, and the code samples we found were relatively incomplete. Something as simple as reading a cookie now forced us to think about the abstraction in more detail:

function getCookieValue(cookieName) {
  let cookieStore;
  if (typeof document !== 'undefined') {
    // Client: read straight from the DOM.
    cookieStore = document.cookie;
  } else {
    // Server: fall back to the incoming request (from the enclosing scope).
    if (!requestContext) throw new Error('No cookies for you.');
    cookieStore = requestContext.getHeader('Cookie');
  }
  const match = (cookieStore || '')
    .split('; ')
    .find((pair) => pair.startsWith(cookieName + '='));
  return match ? match.slice(cookieName.length + 1) : undefined;
}

The complexity is ultimately manageable — it just requires more effort during the software design phase to make sure the abstractions are well thought out.
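One design that helps here: choose the environment once, at startup, and hide it behind a single interface, rather than scattering `typeof document` checks throughout the codebase. A sketch (the interface shape is illustrative, not our actual abstraction):

```javascript
// Pick a cookie source per environment, once; callers never need to know
// whether they're running on the client or the server.
function createCookieSource(env, requestContext) {
  if (env === 'browser') {
    return () => document.cookie;
  }
  return () => (requestContext && requestContext.getHeader('Cookie')) || '';
}
```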

Where we’ve landed, and where we’re headed.

So you’re probably thinking at this point that we’d like to nuke n’ pave — but in actuality we’re quite happy with where we are today:

  1. Our site is fast — users rave about its responsiveness.
  2. The content indexes remarkably well. We continue to connect with a broader audience via search engines.
  3. We have a robust suite of tests that help us change our system quickly and with confidence.
  4. Our team enjoys working with the codebase and continues to look for ways to simplify the architecture.

That said, it didn’t come without a cost — and the codebase is far from perfect. Things we’re thinking about:

  • Carving out clear boundaries that represent the core “application.” For instance, one might argue that the search experience merits an SPA, while the content pages could be server-rendered.
  • How to manage the continued growth of our JavaScript bundle. It’s only 312 KB (compressed) currently, but as we build new features it’ll grow and at some point significantly impact the load time for new visitors. This requires navigating another perilous passage — we either “hot-load” dependencies or split our application into several separate applications with individual JavaScript bundles.
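The “hot-load” option might look something like route-level loaders that pull a chunk on demand and cache it. With webpack this would be a dynamic `import()`; in the sketch below the loaders are stubbed as plain async functions:

```javascript
// Load each route's bundle lazily and cache the promise, so first-time
// visitors only download the code for the page they actually hit, and
// concurrent requests for the same route share one fetch.
function createRouteLoader(loaders) {
  const cache = new Map();
  return function load(route) {
    if (!cache.has(route)) {
      cache.set(route, loaders[route]()); // e.g. import('./SearchPage')
    }
    return cache.get(route);
  };
}
```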

At the end of the day, we just want to make sure the architecture we choose doesn’t get in the way of our mission: to reduce information overload for researchers by providing them with innovative, comprehensive, and fast ways to keep up with the latest advances in their fields.

So are you telling me I shouldn’t write an SPA?

Not necessarily. SPAs are a good choice if:

  • The interface you’re building is highly interactive and needs to be fast / responsive — and you want an architecture and abstractions that support this out of the gate.
  • You’re not worried about each and every bit of content being indexed by search engines correctly.

On the other hand, an SPA might not be worth the complexity if:

  • Your application is simple (there isn’t a lot of interactive content).
  • SEO is of pivotal importance.

But at the very least, take a step back and consider what you need before selecting an SPA — you’ll thank yourself down the road.

~ @codeviking
