yes.no — the architecture of a social network startup in 2016

yes.no is a crowd-interviewing Q&A social network. It’s pretty cool.

The following is an overview of its architecture, including an in-depth examination of pivotal decisions, why we chose each building block and how it all ties together.

Background

Below is a schematic overview of the architecture along with accompanying paragraph numbers for reference. The following can/should be read either top-down or as a reference (i.e., not all in one sitting).

Intended Audience

As a spoiler, the ‘controversial’/most interesting decisions (and corresponding articles) are:

  1. Using server-side rendering + AJAX (No SPA, no JS framework) — article 2
  2. Using Sinatra instead of Rails (with Ruby) — article 6.
  3. Using Heroku instead of AWS — article 10.

So jump directly to those if you’d like; or read the whole damn thing.

OK, enough chit-chat, let’s get started.

Major components of yes.no, a crowd-interviewing social network

1. Cloudflare

2. Web FE: Vanilla + JQ JS (2)

Server- vs client-side rendering debates have been done ad nauseam (some advocate rendering on both client and server, e.g., using a React view generator on the server). In yes.no’s case, there is no need for powerful app-like behaviour in the browser and no justification for what would essentially require development and maintenance of an entire app (and a performance hit on every fresh page-load). That, coupled with ‘JS Fatigue’ and wariness of the ever-changing JS framework landscape, as well as my own comfort and speed with vanilla JS, made this call.

Some considerations behind this are:

  1. Code simplicity — server-side rendering is simple and robust.
  2. Time-to-first-content — pages should load immediately, with no ‘spinner’.
  3. SEO — while techniques exist to enable SEO of client-generated content (see React-on-the-server, above) this problem is clearly much better solved by rendering on the server.
  4. Coupling view and logic: despite the separation-of-concerns cliché/best practice, server-side rendering enables quick adjustments (e.g. new dependent modifications to a view, which in a client-side app might necessitate API changes and remote development). In terms of personnel, coupling hurts when you want to grow (you want codebases that can be maintained separately) but can help when you want to squeeze more performance out of full-stack devs (you often want the same person to be able to maintain both view and server at the same time/file).

The main ‘pain points’ of using server-side rendering are:

  1. 2nd-page performance (the 2nd page a user views, when content can be brought in AJAX rather than a full page load)
  2. Code complexity

The primary considerations for this are SEO, time-to-first-content, and code simplicity.

What we do use, massively, is views-over-the-wire, a technique that is becoming commonplace. That is, the browser performs an AJAX call to bring new content, but the AJAX call does not return data (to be fit in a client-side template) but rather the full rendered HTML to be displayed (thus, the ‘view’ is transferred over the wire). This allows holding the template (code) only on the server instead of on both client and server. The slight performance hit taken by sending the whole view rather than just JSON data is offset by the speed of development this enables. On yes.no, most ‘2nd pages’ are retrieved by AJAX, which then replaces the main part of the view in the page in the browser; this allows us to maintain the template code only on the server (and serve up any 1st page with no ‘spinner’), while allowing an AJAX-loading UX for any 2nd page. This is generally surprisingly simple code, something along the lines of:

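// Fetch the server-rendered view and swap it into the main container.
// .done() is used because jqXHR.success() was removed in jQuery 3.0.
$.get('/new_page')
  .done(function (res) { $('#main-content').html(res.html); });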

Along with some history-navigation modification to preserve URLs and ‘back’ clicks. This gives us the most important part of a SPA (performance) without the biggest pain points (1st page load, separate maintenance and development).
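
The server half of views-over-the-wire can be sketched in a few lines of Ruby. This is a minimal illustration using only the standard library's ERB, not yes.no's actual code: the template strings, `render_question` and `respond` are all hypothetical names. The point it shows is that one template serves both cases, wrapped in the layout for a fresh page load and returned bare for an AJAX call.

```ruby
require 'erb'

# Hypothetical templates; in a real app these would live in view files.
QUESTION_VIEW = ERB.new('<h1><%= ERB::Util.html_escape(title) %></h1>')
LAYOUT = ERB.new('<html><body><div id="main-content"><%= content %></div></body></html>')

# Renders the one and only template for this view.
def render_question(title)
  QUESTION_VIEW.result_with_hash(title: title)
end

# xhr: true  -> AJAX call for a '2nd page': return only the rendered fragment.
# xhr: false -> fresh page load: wrap the same fragment in the full layout.
def respond(title, xhr:)
  fragment = render_question(title)
  xhr ? fragment : LAYOUT.result_with_hash(content: fragment)
end

puts respond('Why vanilla JS?', xhr: true)
puts respond('Why vanilla JS?', xhr: false)
```

In a Sinatra route the `xhr` flag could plausibly come from `request.xhr?`; either way the template lives in exactly one place.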

On a lighter note, it is interesting to note the defensive stance that is taken when presenting a non-SPA architecture. In any case, server-side rendering has been battle-tested for years and has been working well for us. Despite that, I expect this to be the main point of contention.

“What, no Angular? React? Backbone? Redux? Seriously, jQuery? Vanilla?” So yeah, vanilla and JQ. They work, all senior JS devs know them, they’ll be around in 5 years. They worked for us, with minimal hassle and impressive dev speed.

Soup of the day is: Vanilla

3. Bootstrap + SCSS

Custom styles are in SCSS, with a background job in the dev env compiling it on the fly (using the Ruby gem filewatcher):

$ filewatcher '**/*.scss' 'scss $FILENAME > $FILENAME.css; echo "created"-$FILENAME; date'

This allows very quick and robust CSS development, even for CSS-challenged noobs like myself.

Bootstrap. It’s good enough.

4. iOS

The iOS app is written in React Native (RN): the result is a native UX, while the underlying code is JS (and React to boot). We have found RN to be the ‘bleeding edge’ we were expecting, both for better and for worse, and overall it gave us the desired results. It is worth stressing the main selling point of React Native from a company/personnel perspective: being written in JS, we did not have to find an iOS dev to do it; a React Native dev is generally a senior JS dev, which often means they can also do web client work (and sometimes fullstack). This is a huge win in terms of manpower, effectively breaking down the silo of expertise usually associated with iOS developers.

The internal architecture of our React Native app is outside the scope of this article.

5. Android

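The Android app, unlike the iOS one, is written natively in Java rather than in React Native, for a few reasons:

  1. React Native was not available yet for Android when we began using it
  2. Java is a mature language, for which it is easier to find senior devs than for iOS languages
  3. Don’t put all of your bleeding-edge eggs in one basket.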

I will note a major architectural difference between Android and iOS: the iOS app carries a robust caching mechanism, reusing data between requests (e.g. data for the same user or post will not be requested twice during the same run), while the Android app generally regenerates each view by requesting fresh data from the server. ‘To cache or not to cache’ is one of the quintessential architectural dilemmas. In this case the mixture of both reflects a belief in variance across your clients as well as the personal architectural decisions of the people leading each project (which, in itself, reflects my belief in empowering local-level decision making, e.g. the Android team lead/lead dev should generally be making Android architectural decisions, even while his iOS counterpart is making the opposite decision).

Other than that, the internal architecture of our Android app is also outside the scope of this article.

6. Sinatra

Sinatra is perfect (for me, at least). It’s just lightweight enough that it gives you the standard stuff you need for a web app server, but gets out of the way. You end up writing yourself some of the stuff you’d get for free in Rails, but it really doesn’t take long (because Ruby is amazing) and then you know exactly what every single line of code in your app does, and exactly how every single thing works.

This line of thought is a bit dangerous if you are uncomfortable around the web stack; I do feel however that what is basically a ‘glorified CRUD’ can be well-enough understood to be wired together ourselves in a knowledgeable fashion. The benefits are a) I now know exactly what everything does; b) a smaller memory/time footprint.

  • Sinatra also comes with tux, its equivalent to Rails’ console. Tux (in dev and prod) makes debugging a breeze.
  • Concurrency, briefly, is handled by multiple dynos (handled by Heroku, see below), each running multiple Sinatra workers (Puma), each running multiple threads. Mentioning concurrency so briefly seems crazy, but Heroku (see below) really just makes it that easy, even using a standard I/O-blocking language like Ruby.
Put this in your pipe, and smoke it
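
To make the three layers of concurrency concrete, here is a back-of-the-envelope sketch. The numbers are hypothetical, chosen for illustration only, and are not yes.no's real settings. It computes an upper bound on requests in flight at once; under standard Ruby, threads inside one worker overlap mainly while waiting on I/O, so this is a ceiling rather than a throughput figure.

```ruby
# Hypothetical numbers, for illustration only.
DYNOS              = 2 # Heroku dynos behind the load balancer
WORKERS_PER_DYNO   = 2 # Puma worker processes per dyno
THREADS_PER_WORKER = 5 # threads per Puma worker

# Upper bound on requests that can be in flight at the same moment.
def request_slots(dynos, workers, threads)
  dynos * workers * threads
end

puts request_slots(DYNOS, WORKERS_PER_DYNO, THREADS_PER_WORKER) # prints 20
```

Adding a dyno raises the ceiling without touching any code, which is the appeal of scaling at the process level.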

6.01 The ‘monolith’

Alternate architecture suggestions for yes.no or similar sites are built around microservices, separate processes encapsulating different parts of the business logic. While at a certain scale the benefit to separate maintenance/scaling of separate services is justifiable, the overhead is immediate and the cost (in time) is relentless.

Similarly to other ‘silver-bullet’ tricks (NodeJS’s non-blocking IO comes to mind), in many cases (including this one) the simplest structure is often the best: simplest to understand, debug, maintain, and scale, all of which come from the simplicity of being able to reason about it. A simple CRUD mechanism for users, questions, answers and comments can be mapped onto a single process, scaled (load-balanced) at the process level (see ‘Heroku’ below). In short: microservices are a classic YAGNI.

6.1 API

  • The API is maintained as part of the main web server (see 6.01, ‘the monolith’), giving it access to the entire business logic of the app.
  • The API is served up from m.yes.no, enabling scaling mobile traffic separately from web traffic.
  • The API is only for mobile clients, not for the web client. While server-side views obviously would not ingest API endpoints anyway, the web client’s many AJAX calls also go to endpoints outside The [Mobile] API. The reason for this is backwards compatibility, which generally must be maintained for mobile clients, but does not need to be maintained for web clients. This difference means web code can be drastically altered at short intervals, with aggressive continuous improvement, whereas mobile-facing code must be treated with severe caution so as never to break existing functionality. Thus, separating the endpoints enables us to be aggressive on the web, while safe for mobile.
  • Despite the above, the endpoints obviously use much shared business logic. Creating a user is still creating a user, and so on.

7. Background Workers

The architectural reasoning is simple (and familiar to anyone who has ever worked with queues) — any action that might take long could slow down the response time significantly, hurting the awaiting user (and any other users waiting for the server to free up). So, we drop an event into a queue ({action: ‘send_email’, to: ‘sella@yes.no’, … }), and a separate process will grab the event and process it later.

The code for such a process could be remarkably simple:

require './app'
puts "Emails worker reporting for duty"
Rabbit.subscribe('emails') { |payload|
  AsyncWrapper.do_direct('send_email', payload)
}

Really, that’s it. Two more notes:

  1. You may notice a layer of abstraction (the eponymous AsyncWrapper) which allows us to set a local flag (say, in dev env) to skip the asynchronous rerouting and execute the action directly. This allows maintaining a simple(r) dev env, namely a single web process running, without the need to maintain a queue and worker running as well.
  2. The above emails worker can be refactored into a more generic, multiple-topic worker (still running on a single queue) by encoding the event to be performed as part of the payload itself:

require './app'
puts "Multiple-Topic worker reporting for duty"
Rabbit.subscribe(topic) { |payload|
  AsyncWrapper.do_direct(payload['action'], payload)
}

The loss of granularity of workers and topics (you now have only a single worker, queue, and piece of code for multiple topics) is both a pro and a con — simpler to maintain, but impossible to scale or measure separately. Depending on the use case, this is sometimes a gain and sometimes a loss.
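
For readers wondering what a wrapper like AsyncWrapper might look like inside, here is a minimal sketch under stated assumptions. Only the names `AsyncWrapper` and `do_direct` come from the snippets above; `do_async`, `work_one`, the `HANDLERS` table, the `ASYNC_DIRECT` flag and the in-memory `QUEUE` (standing in for RabbitMQ) are hypothetical.

```ruby
# Hypothetical sketch; the real wrapper publishes to a message broker.
module AsyncWrapper
  # Maps an action name to the code that performs it.
  HANDLERS = {
    'send_email' => ->(payload) { "email sent to #{payload['to']}" }
  }.freeze

  QUEUE = Queue.new # in-memory stand-in for the message queue

  # Performs the action right now, in the current process.
  def self.do_direct(action, payload)
    HANDLERS.fetch(action).call(payload)
  end

  # Enqueues the action, or runs it inline when the local flag is set
  # (e.g. in a dev env with no queue or worker running).
  def self.do_async(action, payload, direct: ENV['ASYNC_DIRECT'] == '1')
    return do_direct(action, payload) if direct

    QUEUE << payload.merge('action' => action)
    :queued
  end

  # What a multiple-topic worker does for a single event.
  def self.work_one
    payload = QUEUE.pop
    do_direct(payload['action'], payload)
  end
end

puts AsyncWrapper.do_async('send_email', { 'to' => 'sella@yes.no' }, direct: false)
puts AsyncWrapper.work_one
```

The web process only ever calls the async entry point, so switching between inline and queued execution is a matter of one flag.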

It is worth pointing out the classical Twitter use-case (which is often used in job interviews) of ‘What Happens When Beyoncé Tweets’ (or ‘Answers a Question’, in the case of yes.no). In a nutshell, when a power user performs an action (answers a question), we want to notify all of their (potentially numerous) followers that this happened. Obviously we cannot do this within the request-response cycle while ‘Beyoncé’ is waiting (we can’t ask Beyoncé to wait while we send 10,000 emails), so we drop an event to the queue, release the response, and process the event later. Processing the event itself (‘send_notification_to_all_followers’), even in the background, is still heavy: we might need to iterate over thousands of DB records (if not more), possibly more than can be drawn into memory in a single query. So we have to run a paginated query over all followers. For each follower, we want to send a notification/email, and this action itself is already implemented in async fashion (as explained above). So now we have an event triggering another event, which might trigger another event… this is robust from a performance perspective, but brittle from a debugging POV. Once things go wrong (and you know they will), you’re guaranteed to have a good time chasing this. But what can you do? Scaling a social network is non-trivial.
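
The paginated fan-out described above can be sketched roughly as follows. Everything here is a stand-in: a plain array plays the followers collection, another array plays the queue, and the method names and the tiny page size are hypothetical.

```ruby
PAGE_SIZE = 3 # tiny on purpose; a real page would be much larger

# Yields followers one page at a time, the way a skip/limit query would,
# so the full list is never processed as a single batch.
def each_follower_page(followers, page_size)
  offset = 0
  loop do
    page = followers[offset, page_size] || []
    break if page.empty?

    yield page
    offset += page_size
  end
end

# Handles one 'send_notification_to_all_followers' event by dropping
# one small 'send_email' event per follower back onto the queue.
def notify_all_followers(followers, queue, page_size: PAGE_SIZE)
  each_follower_page(followers, page_size) do |page|
    page.each { |email| queue << { 'action' => 'send_email', 'to' => email } }
  end
  queue.size
end

followers = (1..7).map { |i| "fan#{i}@example.com" }
queue = []
puts notify_all_followers(followers, queue) # prints 7
```

Each follower ends up as its own small event, which is exactly the event-triggering-event chain that performs well and is painful to debug.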

8. Cron Jobs

puts "loading cron scheduler…"
require './app'
require 'rufus-scheduler' # harmless if ./app already loads it

scheduler = Rufus::Scheduler.new
EVERY_DAY_AT_7_AM = '0 7 * * *'
scheduler.cron EVERY_DAY_AT_7_AM do
  send_daily_digest
end
scheduler.join # block here so the process stays alive

The BL itself is implemented within the app, the cron job is just the wrapper to facilitate timed execution.

Cron jobs, event listeners, and the web server are all managed by a Procfile on Heroku (see below, #10).

9. Ruby

The main downside of Ruby is that it is not lightning-fast. In a social web-app, however, you are so rarely CPU-bound that this is just inconsequential compared to the productivity boost you get. Optimize for developer happiness and productivity.

Optimise for Developer Happiness

10. Heroku

Heroku is very expensive compared to AWS, but very cheap compared to humans. Once you start paying humans, it should be obvious that they are what is costing you money, and that any service you can throw money at will give you far more efficiency than humans (this includes yourself, assuming the reader is a human). If you have a devOps team of 1 person, you would be paying him a few thousand dollars a month just to set up and maintain your stack; once you need a 3-person devOps team, your personnel expenditures become a huge burden: salaries, insurance, office space, management time, days off, discussions, mistakes, recruitment, retaining, it never ends. Humans are sensitive; humans are difficult. They’re only human. Paying $25 a month per process, seemingly an insane amount, suddenly becomes mere peanuts. How much devOps time does $25 buy? In Tel-Aviv, this is less than an hour’s work, and that’s assuming you managed to find and hire a decent devOps person in the first place.

Heroku gives you load-balancing OOTB; you will never need to set up a load-balancer and deal with the world of problems that it entails. This gives you horizontal (CPU) scalability immediately.

You can use a DB-as-a-service (see MongoDB, below) and never have to set up (or maintain) a prod DB. MongoDB also supports sharding OOTB (see below) which means horizontal scalability at the DB level from day one (although you’ll have to bomb Mongo pretty hard with user-generated data before sharding is even necessary).

Memory management, logs, and so on, you get the idea. Platform and infra are the most common of issues; throw a bit of money at them (insignificant sums, compared to humans) and use somebody else’s solution for those. Buy more computers and fewer people.

I’d rather do it with Heroku; she’s what a platform’s supposed to be

11. MongoDB

To me, Mongo just hits the sweet spot. Schemalessness is non-negotiable. Sharding supported OOTB and JSON as (basically) the native format are such an incredible win; so are breakneck read performance and a sane native library, which means I can skip using classes, models, or an ORM/ODM altogether. So much code I can just never write / maintain / reason about / debug. Adding a new field to some (or all) users is seamless: you just start using that field as if it was always there. Same goes for whole collections. No downtime, no migrations. The same fluidity experienced developers of dynamic languages are used to in the code (“Poof, every user now also has a ‘height’ variable”) is available in the DB as well (“Poof, every user now also has a ‘height’ variable”).

An actual complete route for fetching a user’s details could be:

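# USERS = MONGO.collection('users')
# (to_json below needs require 'json')
get '/users/:username' do
  # find(...).first returns the matching document as a hash, or nil
  user = USERS.find(username: params[:username]).first
  halt 404 unless user
  { name: user['name'], email: user['email'] }.to_json
end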

Generally, I relegate control of business logic to the DB itself, without maintaining it in classes.

Brief aside on why classes and objects are overused in a web app: code objects are meant to track the life-cycle of a business logic object, but in a web request-response cycle the ‘object’ only exists for a millisecond, throughout the request. It is silly to reason about or ‘track’ its ‘state’; once the request is done the object ceases to exist. Thus, maintaining code for instantiating, serializing and tracking ‘objects’ is pointless: objects should exist only as represented data in the DB and in the client; requests and responses should only deal with transferring data. Any ‘side effects’ done to an object — say ‘update user email’ can and should be done on the DB, not on a ‘user object’. In an on-going process (say, a computer game) objects make sense — a ‘user’ object may exist for hours at a time, gain hit points, characteristics, etc. In a web app, users do not remain ‘in memory’, they are persisted to disk. Thus the flow between client and DB should deal with transferring data, not objects. In practice, this means hashes/JSON through and through.

This maps perfectly on to Mongo — the object in Mongo is exactly the object we refer to intuitively, and can be retrieved in full as such directly from the DB. The app, in turn, deals only with hashes, keys and values.
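
As a small illustration of ‘hashes through and through’, here is a sketch with no `User` class and no migration. The documents, the `height` field and the `public_profile` helper are hypothetical; the hashes stand in for what a Mongo query would return.

```ruby
require 'json'

# Two documents from the same collection: the older one simply lacks
# the newer 'height' field, with no migration ever having been run.
old_user = { 'username' => 'sella', 'name' => 'Sella' }
new_user = { 'username' => 'dana', 'name' => 'Dana', 'height' => 170 }

# Shapes a stored document into a response: hash in, hash out.
# compact drops keys whose value is nil, so a missing field just vanishes.
def public_profile(user)
  { name: user['name'], height: user['height'] }.compact
end

puts public_profile(old_user).to_json
puts public_profile(new_user).to_json
```

The only ‘schema’ is the set of keys the code chooses to read, which is what makes adding a field a non-event.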

So we don’t use Mongoose or any other ORM or ODM. No ‘User’ class. No discussions about inheritance. Programmers of old might shudder at the lack of structure and declare it to be unmaintainable; practice has shown them wrong. The discipline and order it takes to maintain this is negligible compared to the benefits, and relying on the DB as the single source of truth (for data as well as structure) is simple and robust enough (especially when the data is already in JSON format, the standard across DB, server and client).

11.1 Other DBs (or lack thereof)

12. RabbitMQ

13. Emails

Postmark (and other email add-ons) are also available as a Heroku add-on. (Have I mentioned Heroku is sweet?)

14. Parse

15. Add-Ons (other)

16. Backoffice

17. Google Docs as a DB

At yes.no, we use Google Docs as a DB for cases like the UI texts or other system values. Admins can edit the Google Doc and whenever they want they can trigger a script to pull the results and update the system. This eliminates a complicated setup and allows non-techs to manipulate system values right into Mongo, quite easily.

18. FB & Google Apps

19. Cloudinary

20. Analytics

At the end of the day, we only trust our own data, and we only trust ourselves to view and query it correctly.

Summary

As an addendum, here is a list of various software architecture maxims I believe in (*cough-rants-cough*) that influenced yes.no’s development and architecture.

  1. Tradeoffs: everything is a tradeoff. You are eternally debating tradeoffs between time spent now, time spent later, money, product perfection, performance, and everything else. Every single thing you do means less of everything else. This means you must prioritize aggressively, and not do anything that isn’t actually really important. Seriously: YAGNI. We worked aggressively to keep everything as technically simple as possible; this has resulted in a remarkably simple development environment and flow, which results in dramatically quick (read: “agile”) development.
  2. Optimize for developer happiness, speed and productivity. And for awesome developers (and keep them happy). Everything else will come naturally.
  3. Some things are just not necessary/cost-effective in web dev. No ORMs. No Classes. No Tests. (Come at me!) Tests can be done inefficiently, in which case they are pointless, or efficiently, in which case they cost more time (read: money) than they save. Unit tests are nice if writing units is your job. Full-stack tests, which are all anyone cares about, are still too hard to get done efficiently. (Like I said, yeah, come at me, ye haters of the untesting!)
  4. ‘Globals’ are fine. THEY ARE FINE. Don’t use global variables with mutating states, but definitely use global constants. In any application, most things are ‘global’, in the sense that at any given moment they should be able to be accessed from anywhere, and it is obvious what they should mean (e.g. ‘Mongo’, ‘Redis’, ‘UsersCollection’, etc.). Don’t be an idiot and try to mutate these at runtime; also don’t hire idiot programmers that would mutate globals at runtime. Boom, there you go, now globals are fine.
  5. Move even faster, break even more things. Nobody cares about you and your stupid app; it doesn’t need to be perfect. Build it fast and make people care, before you run out of time/money/patience.

Me

(I’m the one on the right, in the green shirt.)

Full-Stack, Ruby, and JavaScript Consultant @ US, UK, IL. If you enjoy my work or are looking for a consultant, reach out at http://sellarafaeli.com
