yes.no — the architecture of a social network startup in 2016

Feb 23, 2016

yes.no is a crowd-interviewing Q&A social network. It’s pretty cool.

The following is an overview of its architecture, including an in-depth examination of pivotal decisions, why we chose each building block and how it all ties together.

Background

yes.no is a social network where you can ask your friends and heroes questions, read and comment on their answers, and answer questions yourself. It has a web client, an Android client, and an iOS client. You can sign up using FB, Google, or your email. yes.no is available in up to 30 languages (varies by feature and platform). I co-founded yes.no and built most of the web and server stack during 2015.

Below is a schematic overview of the architecture along with accompanying paragraph numbers for reference. The following can/should be read either top-down or as a reference (i.e., not all in one sitting).

Intended Audience

This article is mostly an overview of the scope of what an entire social network architecture entails; it is meant for those learning and interested in the entire scope rather than just the hot topics.

As a spoiler, the ‘controversial’/most interesting decisions (and corresponding sections) are:

Using server-side rendering + AJAX (no SPA, no JS framework): section 2.
Using Sinatra instead of Rails (with Ruby): section 6.
Using Heroku instead of AWS: section 10.

So jump directly to those if you’d like; or read the whole damn thing.

OK, enough chit-chat, let’s get started.

Major components of yes.no, a crowd-interviewing social network

1. Cloudflare

Cloudflare gives us DNS, a CDN for static files, DDoS protection, HTTPS, and request geolocation, all out of the box. HTTPS is there for security, for appearance, and for iOS 9 (whose App Transport Security expects HTTPS connections by default). Geolocation is essential for serving up localized content (when a user browses from France, show French content); Cloudflare passes it along as a header it adds to the request, which saves you from having to work out the lookup yourself (without slowing down the request).
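To make the geolocation point concrete, here is a minimal sketch of picking a content language from the country header. Cloudflare's header is CF-IPCountry, which a Rack app sees as HTTP_CF_IPCOUNTRY; the country-to-language table, the default, and the locale_for helper are assumptions invented for this example, not yes.no's actual code.

```ruby
# Hypothetical mapping from ISO country codes to content languages.
COUNTRY_TO_LOCALE = { 'FR' => 'fr', 'DE' => 'de', 'IL' => 'he' }.freeze
DEFAULT_LOCALE = 'en'

# env is the Rack environment hash; Rack exposes the CF-IPCountry
# request header under the key HTTP_CF_IPCOUNTRY.
def locale_for(env)
  country = env['HTTP_CF_IPCOUNTRY'].to_s.upcase
  COUNTRY_TO_LOCALE.fetch(country, DEFAULT_LOCALE)
end

puts locale_for({ 'HTTP_CF_IPCOUNTRY' => 'FR' }) # fr
puts locale_for({})                              # en (header missing)
```

Falling back to a default matters: the header can be missing (e.g. in dev, where requests do not pass through Cloudflare) or carry a country you have no content for.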

2. Web FE: Vanilla JS + jQuery

yes.no’s views are generally rendered server-side (Ruby, ERB). We do not use any SPA framework, and all the client-side code is VanillaJS (on top of JQ).

Server- vs client-side rendering debates have been done ad nauseam (some advocate rendering on both client and server, e.g., using a React view generator on the server). In yes.no’s case, there is no need for powerful app-like behaviour in the browser and no justification for what would essentially require development and maintenance of an entire app (and a performance hit on every fresh page-load). That, coupled with ‘JS Fatigue’ and wariness of the ever-changing JS framework landscape, as well as my own comfort and speed with vanilla JS, made this call.

Some considerations behind this are:

Code simplicity — server-side rendering is simple and robust.
Time-to-first-content — pages should load immediately, with no ‘spinner’.
SEO — while techniques exist to enable SEO of client-generated content (see React-on-the-server, above) this problem is clearly much better solved by rendering on the server.
Coupling view and logic: despite the clichéd separation-of-concerns best practice, server-side rendering enables quick adjustments (e.g. a new view modification that depends on server data, which in a client-side app might necessitate API changes and coordinated development in a separate codebase). In terms of personnel, coupling hurts when you want to grow (you want codebases that can be maintained separately) but can help when you want to squeeze more performance out of full-stack devs (you often want the same person to be able to maintain both view and server at the same time, even in the same file).

The main ‘pain points’ of using server-side rendering are:

2nd-page performance (the 2nd page a user views, when content can be brought in via AJAX rather than a full page load)
Code complexity

The primary considerations for this are SEO, time-to-first-content, and code simplicity.

What we do use, massively, is views-over-the-wire, a technique that is becoming commonplace. That is, the browser performs an AJAX call to bring new content, but the AJAX call does not return data (to be fit in a client-side template) but rather the full rendered HTML to be displayed (thus, the ‘view’ is transferred over the wire). This allows holding the template (code) only on the server instead of on both client and server. The slight performance hit taken by sending the whole view rather than just JSON data is offset by the speed of development this enables. On yes.no, most ‘2nd pages’ are retrieved by AJAX which then replaces the main part of the view in the page in the browser; this allows us to maintain the template code only on the server (and serve up any 1st page with no ‘spinner’), while allowing an AJAX-loading UX for any 2nd page. This is generally surprisingly simple code, something along the lines of:

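// Fetch the server-rendered view and swap it into the page.
// (.done replaces the deprecated jqXHR .success callback.)
$.get('/new_page')
 .done(function(res) { $("#main-content").html(res.html); });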

Along with some history-navigation modification to preserve URLs and ‘back’ clicks. This gives us the most important part of a SPA (performance) without the biggest pain points (1st page load, separate maintenance and development).
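On the server side, the same idea could be sketched roughly as follows, using only Ruby's bundled ERB. The templates, the render_page helper and the xhr flag are illustrative assumptions, not our actual code (in Sinatra the flag could come from request.xhr?), and a real template would also escape user-supplied values.

```ruby
require 'erb'

# Hypothetical templates; in a real app these would live in view files.
LAYOUT  = ERB.new('<html><body><div id="main-content"><%= content %></div></body></html>')
PROFILE = ERB.new('<h1><%= name %></h1>')

# One template serves both cases: the bare fragment for an AJAX request,
# or the same fragment wrapped in the full layout for a first page load.
def render_page(template, locals, xhr:)
  fragment = template.result_with_hash(locals)
  xhr ? fragment : LAYOUT.result_with_hash({ content: fragment })
end

puts render_page(PROFILE, { name: 'Sella' }, xhr: true)
puts render_page(PROFILE, { name: 'Sella' }, xhr: false)
```

The point of the design is that the profile template exists exactly once; whether the browser asked for a whole page or just the new content only changes the wrapping.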

On a lighter note, it is interesting to note the defensive stance that is taken when presenting a non-SPA architecture. In any case, server-side rendering has been battle-tested for years and has been working well for us. Despite that, I expect this to be the main point of contention.

“What, no Angular? React? Backbone? Redux? Seriously, jQuery? Vanilla?” So yeah, vanilla and JQ. They work, all senior JS devs know them, they’ll be around in 5 years. They worked for us, with minimal hassle and impressive dev speed.

3. Bootstrap + SCSS

Our CSS rests mostly on Twitter Bootstrap for basic forms, buttons, and especially — grid responsiveness. Bootstrap is not a special snowflake, but it gives you the bread and butter you need to set up not-butt-ugly design until you apply your own styles. I have found responsiveness to be quite easy with Bootstrap (and quite a pain rolling it yourself).

Custom styles are in SCSS, with a background job in the dev env compiling it on the fly (using the Ruby gem filewatcher):

$ filewatcher '**/*.scss' 'scss $FILENAME > $FILENAME.css; echo "created"-$FILENAME; date'

This allows very quick and robust CSS development, even for CSS-challenged noobs like myself.

4. iOS

yes.no’s iOS app is written in React Native, a new-ish technology (early 2015) by Facebook which allows you to write native iOS apps with JS. (As opposed to most JS frameworks for mobile apps, which do not enable a native look & feel.)

The result is a native UX, while the underlying code is JS (and React to boot). We have found RN to be the ‘bleeding edge’ we were expecting, both for better and for worse, and overall it gave us the desired results. It is worth stressing the main selling point of React Native from a company/personnel perspective: being written in JS, it did not require us to find an iOS dev; a React Native dev is generally a senior JS dev, which often means they can also do web client work (and sometimes fullstack). This is a huge win in terms of manpower, effectively breaking down the silo of expertise usually associated with iOS developers.

The internal architecture of our React Native app is outside the scope of this article.

5. Android

Our Android app is written in Java. As mentioned above, we were insistent on a native look & feel. We did not choose React Native for Android as well because:

It was not available yet for Android when we began using it
Java is a mature language, and senior Java devs are easier to find than senior devs for the iOS languages
Don’t put all of your bleeding-edge eggs in one basket.

I will note a major architectural difference between Android and iOS: the iOS app carries a robust caching mechanism, reusing data between requests (e.g. data for the same user or post will not be requested twice during the same run), while the Android app generally regenerates each view by requesting fresh data from the server. ‘To cache or not to cache’ is one of the quintessential architectural dilemmas. In this case the mixture of both reflects a belief in variance across your clients as well as the personal architectural decisions of the people leading each project (which in itself reflects my belief in empowering local-level decision making, e.g. the Android team lead/lead dev should generally be making Android architectural decisions, even while their iOS counterpart is making the opposite decision).

Other than that, the internal architecture of our Android app is also outside the scope of this article.

6. Sinatra

Our main web framework is Ruby’s Sinatra. Sinatra is awesome. I come from a Rails background, and (prepare yourself for the standard anti-Rails rant) always felt Rails is too bloated, heavy, complicated, magical. I’d rather build things and connect the dots myself than sift through Rails documentation and try to understand the implementation details of declarative has_many bindings.

Sinatra is perfect (for me, at least). It’s just lightweight enough that it gives you the standard stuff you need for a web app server, but gets out of the way. You end up writing yourself some of the stuff you’d get for free in Rails, but it really doesn’t take long (because Ruby is amazing) and then you know exactly what every single line of code in your app does, and exactly how every single thing works.

This line of thought is a bit dangerous if you are uncomfortable around the web stack; I do feel however that what is basically a ‘glorified CRUD’ can be well-enough understood to be wired together ourselves in a knowledgeable fashion. The benefits are a) I now know exactly what everything does; b) a smaller memory/time footprint.

Sinatra also has tux (a separate gem), its equivalent to Rails’ console. Tux (in dev and prod) makes debugging a breeze.

Concurrency, briefly, is handled by multiple dynos (handled by Heroku, see below), each running multiple Sinatra workers (Puma), each running multiple threads. Mentioning concurrency so briefly seems crazy, but Heroku (see below) really just makes it that easy, even using a standard I/O-blocking language like Ruby.

6.01 The ‘monolith’

A conspicuously absent part of this architecture is the ‘services’ part. The past few years have seen a ‘microservices all-the-rage’ fad come, and hopefully go. yes.no’s Sinatra is a ‘monolith’ block of code: no external services, one old-fashioned process that holds all the logic to do everything.

Alternate architecture suggestions for yes.no or similar sites are built around microservices, separate processes encapsulating different parts of the business logic. While at a certain scale the benefit to separate maintenance/scaling of separate services is justifiable, the overhead is immediate and the cost (in time) is relentless.

Similarly to other ‘silver-bullet’ tricks (NodeJS’s non-blocking IO comes to mind), in many cases (including this one) the simplest structure is often the best: simplest to understand, debug, maintain, and scale, all of which come from the simplicity of being able to reason about it. A simple CRUD mechanism for users, questions, answers and comments can be mapped onto a single process, scaled (load-balanced) at the process level (see ‘Heroku’ below). In short: microservices are a classic YAGNI.

6.1 API

yes.no has an API (https://m.yes.no/mobile/ping) meant for serving the mobile clients. Some points about it:

The API is maintained as part of the main web server (see 6.01, ‘the monolith’), giving it access to the entire business logic of the app.
The API is served up from m.yes.no, enabling scaling mobile traffic separately from web traffic.
The API is only for mobile clients, not for the web client. While server-side views obviously would not consume API endpoints anyway, the web client’s many AJAX calls also go to endpoints outside The [Mobile] API. The reason for this is backwards compatibility, which generally must be maintained for mobile clients (old app versions stay installed on users’ devices), but does not need to be maintained for web clients. This difference means web code can be drastically altered at short intervals, with aggressive continuous improvement, whereas mobile-facing code must be treated with severe caution so as never to break existing functionality. Thus, separating the endpoints enables us to be aggressive on the web, while staying safe for mobile.
Despite the above, the endpoints obviously use much shared business logic. Creating a user is still creating a user, and so on.
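The split between the two endpoint families and the shared business logic could look something like the sketch below. Everything in it (the in-memory store, the function names, the response shapes) is a hypothetical stand-in chosen for illustration, not the real yes.no code.

```ruby
USERS_BY_NAME = {} # in-memory stand-in for the users collection

# Shared business logic: both endpoint families call this.
def create_user(username, email)
  raise ArgumentError, 'username taken' if USERS_BY_NAME.key?(username)
  USERS_BY_NAME[username] = { 'username' => username, 'email' => email }
end

# Mobile API handler: the response shape is a contract with apps already
# installed on devices, so fields may be added but never renamed or removed.
def mobile_create_user(params)
  user = create_user(params['username'], params['email'])
  { 'status' => 'ok', 'user' => { 'username' => user['username'] } }
end

# Web AJAX handler: free to change whenever the web client changes.
# (A real view would escape the username before embedding it in HTML.)
def web_create_user(params)
  user = create_user(params['username'], params['email'])
  "<p>Welcome, #{user['username']}!</p>"
end

puts mobile_create_user({ 'username' => 'ann', 'email' => 'ann@example.com' }).inspect
puts web_create_user({ 'username' => 'bob', 'email' => 'bob@example.com' })
```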

7. Background Workers

Background workers are the ‘silver bullet’ of scaling: anything that might take a long time is relegated to a background worker, which is basically a simple Ruby process waiting for events in a queue.

The architectural reasoning is simple (and familiar to anyone who has ever worked with queues): any action that might take long could slow down the response time significantly, hurting the awaiting user (and any other users waiting for the server to free up). So, we drop an event into a queue ({action: 'send_email', to: 'sella@yes.no', … }), and a separate process will grab the event and process it later.

The code for such a process could be remarkably simple:

require './app'
puts "Emails worker reporting for duty"
# Handle each event from the 'emails' queue as it arrives.
Rabbit.subscribe('emails') { |payload|
  AsyncWrapper.do_direct('send_email', payload)
}

Really, that’s it. Two more notes:

You may notice a layer of abstraction (the eponymous AsyncWrapper) which allows us to set a local flag (say, in dev env) to skip the asynchronous rerouting and execute the action directly. This allows maintaining a simple(r) dev env, namely a single web process running, without the need to maintain a queue and worker running as well.
The above emails worker can be refactored into a more generic, multiple-topic worker (still running on a single queue) by encoding the event to be performed as part of the payload itself:

require './app'
puts "Multiple-Topic worker reporting for duty"
# topic names the single shared queue; the action to run travels in the payload.
Rabbit.subscribe(topic) { |payload|
  AsyncWrapper.do_direct(payload['action'], payload)
}

The loss of granularity of workers and topics (you now have only a single worker, queue, and piece of code for multiple topics) is both a pro and a con — simpler to maintain, but impossible to scale or measure separately. Depending on the use case, this is sometimes a gain and sometimes a loss.
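For readers curious what such a wrapper might look like, here is a self-contained sketch with an in-memory queue standing in for RabbitMQ. Only do_direct appears in the snippets above; do_async, run_inline, the handlers table and drain_queue are names assumed for this example, and the real wrapper may well differ.

```ruby
require 'json'

QUEUE = Queue.new # in-memory stand-in for the message queue
SENT = []         # records what the handlers did, for demonstration

# Hypothetical handlers, keyed by action name.
HANDLERS = {
  'send_email' => ->(payload) { SENT << "email to #{payload['to']}" }
}.freeze

module AsyncWrapper
  # When true (e.g. in dev), skip the queue and run the action inline.
  @run_inline = false
  class << self
    attr_accessor :run_inline

    # Called by the web process: enqueue, or execute directly if inline.
    def do_async(action, payload)
      return do_direct(action, payload) if run_inline
      QUEUE << JSON.generate(payload.merge({ 'action' => action }))
    end

    # Called by the worker (or inline): actually perform the action.
    def do_direct(action, payload)
      HANDLERS.fetch(action).call(payload)
    end
  end
end

# Generic multi-topic worker: the action name travels in the payload.
def drain_queue
  until QUEUE.empty?
    payload = JSON.parse(QUEUE.pop)
    AsyncWrapper.do_direct(payload['action'], payload)
  end
end

AsyncWrapper.do_async('send_email', { 'to' => 'sella@yes.no' })
puts SENT.size # 0, the event is only queued so far
drain_queue
puts SENT.size # 1, the worker has processed it
```

Serializing the payload to JSON even in this toy version keeps the sketch honest: whatever crosses a real queue has to survive a round trip as plain data.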

It is worth pointing out the classical Twitter use-case (which is often used on job interviews) of ‘What Happens When Beyoncé Tweets’ (or ‘Answers a Question’, in the case of yes.no). In a nutshell, when a power user performs an action (answers a question), we want to notify all of their (potentially numerous) followers that this happened. Obviously we cannot do this within the request-response cycle while ‘Beyoncé’ is waiting (we can’t ask Beyoncé to wait while we send 10,000 emails), so we drop an event to the queue, release the response, and process the event later. Processing the event itself (‘send_notification_to_all_followers’), even in the background, is still heavy: we might need to iterate over thousands of DB records (if not more), possibly more than can be drawn into memory in a single query. So we have to run a paginated query over all followers. For each follower, we want to send a notification/email, and this action itself is already implemented in async fashion (as explained above). So now we have an event triggering another event, which might trigger another event… this is robust from a performance perspective, but brittle from a debugging POV. Once things go wrong (and you know they will), you’re guaranteed to have a good time chasing this. But what can you do? Scaling a social network is non-trivial.
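The fan-out described here can be sketched in a few lines. An array stands in for the followers collection and another for the queue; the page size, the followers_page helper and the 'send_notification' action name are assumptions made for the example.

```ruby
PAGE_SIZE = 1000
EVENTS = [] # stand-in for the queue

# Stand-in for a paginated DB query: returns one page of follower ids.
def followers_page(all_followers, page)
  all_followers[page * PAGE_SIZE, PAGE_SIZE] || []
end

# Handler for the heavy event: walk the followers page by page and enqueue
# one small event per follower, never holding more than a page in memory.
def notify_all_followers(all_followers, answer_id)
  page = 0
  loop do
    batch = followers_page(all_followers, page)
    break if batch.empty?
    batch.each do |follower_id|
      EVENTS << { 'action' => 'send_notification', 'to' => follower_id, 'answer' => answer_id }
    end
    page += 1
  end
end

notify_all_followers((1..2500).to_a, 'answer-42')
puts EVENTS.size # 2500, one event per follower, produced in 3 pages
```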

8. Cron Jobs

Some tasks need to be run periodically (rather than triggered by an external event). This is trivially implemented by a cron job, again with minimal code. (We use Rufus, https://github.com/jmettraux/rufus-scheduler).

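puts "loading cron scheduler..."
require './app' # assumed to also load the rufus-scheduler gem
scheduler = Rufus::Scheduler.new
EVERY_DAY_AT_7_AM = '0 7 * * *' # cron syntax: minute hour day-of-month month weekday
scheduler.cron EVERY_DAY_AT_7_AM do
  send_daily_digest
end
scheduler.join # block forever so the scheduler keeps running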

The business logic (BL) itself is implemented within the app; the cron job is just the wrapper to facilitate timed execution.

Cron jobs, event listeners, and the web server are all managed by a Procfile on Heroku (see below, #10).

9. Ruby

All backend code is written in Ruby. Ruby is… amazing. It is insanely productive and fun to write in, and delivers on its maxim of maximizing developer happiness. While JS is a valid contender, it seems Ruby is the leader (or one of them) of the open web, and the community is huge and supportive. It also plays very nice with Heroku. (And, importantly, I am/was personally very experienced with it.)

The main downside of Ruby is that it is not lightning-fast. In a social web-app, however, you are so rarely CPU-bound that this is just inconsequential compared to the productivity boost you get. Optimize for developer happiness and productivity.

10. Heroku

Heroku is a PaaS, or Platform as a Service. Rather than just getting a bare-bones machine from AWS, you get a platform that comes batteries-included and basically gives you everything you need to run your app except your app’s code. Which is the way it should be.

Heroku is very expensive compared to AWS, but very cheap compared to humans. Once you start paying humans, it should be obvious that they are what is costing you money, and that any service you can throw money at will give you far more efficiency than humans (this includes yourself, assuming the reader is a human). If you have a DevOps team of 1 person, you would be paying them a few thousand dollars a month just to set up and maintain your stack; once you need a 3-person DevOps team, your personnel expenditures become a huge burden: salaries, insurance, office space, management time, days off, discussions, mistakes, recruitment, retaining, it never ends. Humans are sensitive; humans are difficult. They’re only human. Paying $25 a month per process, seemingly an insane amount, suddenly becomes mere peanuts. How much DevOps time does $25 buy? In Tel-Aviv, this is less than an hour’s work, and that’s assuming you managed to find and hire a decent DevOps person in the first place.

Heroku gives you load-balancing OOTB; you will never need to set up a load-balancer and deal with the world of problems that it entails. This gives you horizontal (CPU) scalability immediately.

You can use a DB-as-a-service (see MongoDB, below) and never have to set up (or maintain) a prod DB. MongoDB also supports sharding OOTB (see below) which means horizontal scalability at the DB level from day one (although you’ll have to bomb Mongo pretty hard with user-generated data before sharding is even necessary).

Memory management, logs, and so on: you get the idea. Platform and infra are the most common of issues; throw a bit of money at them (insignificant sums, compared to humans) and use somebody else’s solution for those. Buy more computers and fewer people.

I’d rather do it with Heroku; she’s what a platform’s supposed to be

11. MongoDB

We use MongoDB as the main (and as of time of writing, only) persistence layer. This is because MongoDB (henceforth, ‘Mongo’) is amazing. This point is particularly important because of the inexplicable (to me) hate that Mongo seems to get on the interwebs.

To me, Mongo just hits the sweet spot. Schemalessness is non-negotiable. Sharding supported OOTB and JSON as (basically) the native format are incredible wins, as are breakneck read performance and a sane native library, which means I can skip using classes, models, or an ORM/ODM altogether. That is so much code I never have to write / maintain / reason about / debug. Adding a new field to some (or all) users is seamless: you just start using that field as if it was always there. Same goes for whole collections. No downtime, no migrations. The same fluidity experienced developers of dynamic languages are used to in the code (“Poof, every user now also has a ‘height’ variable”) is available in the DB as well (“Poof, every user now also has a ‘height’ variable”).

An actual complete route for fetching a user’s details could be:

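# USERS = MONGO.collection('users')
get '/users/:username' do
  # Fetch the raw document (a plain hash); no model class is involved.
  user = USERS.find(username: params[:username]).first
  # Sinatra needs a String body, so serialize the hash (requires 'json').
  {name: user['name'], email: user['email']}.to_json
end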

Generally, I relegate control of business logic to the DB itself, without maintaining it in classes.

Brief aside on why classes and objects are overused in a web app: code objects are meant to track the life-cycle of a business logic object, but in a web request-response cycle the ‘object’ only exists for a millisecond, throughout the request. It is silly to reason about or ‘track’ its ‘state’; once the request is done the object ceases to exist. Thus, maintaining code for instantiating, serializing and tracking ‘objects’ is pointless: objects should exist only as represented data in the DB and in the client; requests and responses should only deal with transferring data. Any ‘side effects’ done to an object — say ‘update user email’ can and should be done on the DB, not on a ‘user object’. In an on-going process (say, a computer game) objects make sense — a ‘user’ object may exist for hours at a time, gain hit points, characteristics, etc. In a web app, users do not remain ‘in memory’, they are persisted to disk. Thus the flow between client and DB should deal with transferring data, not objects. In practice, this means hashes/JSON through and through.

This maps perfectly on to Mongo — the object in Mongo is exactly the object we refer to intuitively, and can be retrieved in full as such directly from the DB. The app, in turn, deals only with hashes, keys and values.
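A toy illustration of ‘hashes through and through’, with an array of hashes standing in for a Mongo collection. The helper names and the in-memory store are assumptions for the sketch; with a real driver the write would be a single update call against the collection.

```ruby
# In-memory stand-in for a Mongo collection: an array of plain hashes.
USER_DOCS = [
  { 'username' => 'sella', 'email' => 'old@example.com' }
]

# "Update user email" as a direct write to the store; no User object is
# instantiated, mutated, or serialized along the way.
def update_email(docs, username, new_email)
  doc = docs.find { |d| d['username'] == username }
  return false unless doc
  doc['email'] = new_email
  true
end

# Reading is just picking keys out of the stored hash.
def public_profile(docs, username)
  doc = docs.find { |d| d['username'] == username }
  doc && { 'username' => doc['username'], 'email' => doc['email'] }
end

update_email(USER_DOCS, 'sella', 'new@example.com')
puts public_profile(USER_DOCS, 'sella').inspect
```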

So we don’t use Mongoose or any other ORM or ODM. No ‘User’ class. No discussions about inheritance. Programmers of old might shudder at the lack of structure and declare it to be unmaintainable; practice has shown them wrong. The discipline and order it takes to maintain this is negligible compared to the benefits, and relying on the DB as the single source of truth (for data as well as structure) is simple and robust enough (especially when the data is already in JSON format, the standard in DB, server and client alike).

11.1 Other DBs (or lack thereof)

As a side note, it is worth mentioning why we don’t (yet) have other databases (e.g. Redis, ElasticSearch). At a certain scale it is inevitable to introduce function-specific databases. However, you should postpone that point as far as possible. Systems grow complex surprisingly fast, and it is much harder to remove functionality (or components, like DBs) than it is to add them. Once you add Redis, you will be maintaining it forever — in prod, in dev, fine-tuning it, teaching new devs to use it, reasoning about it, maintaining it in sync with other DBs, and so on. Mongo is fast enough that for user-generated amounts of data and with proper indexing, you should be able to go a long way before it won’t be efficient enough. Remember, your own time is the most precious resource you have.

12. RabbitMQ

As mentioned in #7 — Background Workers. Specifically, we use RabbitMQ — which is simple, robust and reliable, and we use it ‘as-a-service’ via CloudAMQP, a Heroku add-on. Dead-simple to set up and use, robust enough for massive growth.

13. Emails

We send emails using Postmark (“Emails as Service”). There are many email providers, and emails entail more work than is immediately apparent. Other than sending them asynchronously, one must manage the graphic design, copy work, localization, testing on various email clients, adding ‘unsubscribe’ options, implementing it all with inline CSS styles, and so on.

Postmark is also available as a Heroku add-on, as are other email providers. (Have I mentioned Heroku is sweet?)

14. Parse

We use Parse just for sending push notifications to our mobile apps. Sadly, Parse has announced they’re shutting down, so we’ll be switching to another provider, but the architecture will remain similar — sending a push notification to a user with an app is essentially a POST to either a proxy service like Parse or directly to GCM or APN (Google and Apple’s services for Push Notifications).

15. Add-Ons (other)

Heroku allows for a wide range of other add-ons to monitor things like performance, logs, activity graphs, and everything under the sun. We use some of them. Hey, nobody said this document would be comprehensive.

16. Backoffice

Every social network needs an admin panel: a ‘backoffice’, a way for non-techs to view and interact with the data without being granted complete access to the DB. This is necessary almost from day one; the first major architectural decision here is choosing between a 3rd-party service, a stand-alone app, or a layer on top of the existing app. We chose the third option, i.e., specific pages and abilities that are exposed only to admin users. We limit these pages to the office IP and track their usage to mitigate security concerns. (As a side benefit, once we had the ability to show specific features to specific users, it was easily extended to other ‘classes’ of users such as beta testers, thereby enabling feature flags in general.) Specifically, refraining from launching a separate admin app removed the need to develop and manage a completely stand-alone app which would essentially need to copy much of the same business logic.
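A minimal sketch of how one check can cover both the admin pages and feature flags for other user ‘classes’. The role names, feature names, IP address and the can_see? helper are all hypothetical, chosen only to illustrate the idea.

```ruby
OFFICE_IPS = ['203.0.113.10'].freeze # hypothetical office address

# Which user roles may see which features (all names are illustrative).
FEATURE_ROLES = {
  'backoffice'  => ['admin'],
  'new_profile' => ['admin', 'beta_tester']
}.freeze

# user is a plain hash with a 'roles' array; ip is the request address.
def can_see?(user, feature, ip)
  allowed = FEATURE_ROLES.fetch(feature, [])
  return false if (user.fetch('roles', []) & allowed).empty?
  # Admin pages are additionally restricted to the office network.
  feature != 'backoffice' || OFFICE_IPS.include?(ip)
end

admin = { 'roles' => ['admin'] }
puts can_see?(admin, 'backoffice', '203.0.113.10') # true
puts can_see?(admin, 'backoffice', '198.51.100.7') # false, not the office IP
```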

17. Google Docs as a DB

Google Docs has proven to be an excellent relational DB, for certain use-cases (mostly when the DB is read-only from the app). Its interface allows immediate robust CRUD access to anyone (including non-techs) with zero development. A simple script can be used to download the contents (from a JSON-supplying endpoint) and dump them into the actual DB (Mongo, in our case).

At yes.no, we use Google Docs as a DB for cases like the UI texts or other system values. Admins can edit the Google Doc and whenever they want they can trigger a script to pull the results and update the system. This eliminates a complicated setup and allows non-techs to manipulate system values right into Mongo, quite easily.
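The sync script can be very small. The sketch below assumes the spreadsheet endpoint returns a JSON array of rows with key, lang and text columns (an invented shape for illustration), and uses a plain hash where the real script would upsert into Mongo.

```ruby
require 'json'

# Stand-in for the Mongo collection of UI texts, keyed by "key.lang".
UI_TEXTS = {}

# rows_json is the JSON the spreadsheet endpoint is assumed to return:
# an array of {"key": ..., "lang": ..., "text": ...} rows.
def sync_ui_texts(rows_json, store)
  JSON.parse(rows_json).each do |row|
    next if row['key'].to_s.empty? # skip blank spreadsheet rows
    store["#{row['key']}.#{row['lang']}"] = row['text'] # upsert
  end
  store.size
end

sample = '[{"key":"welcome","lang":"en","text":"Hello"},
           {"key":"welcome","lang":"fr","text":"Bonjour"},
           {"key":"","lang":"en","text":"ignored"}]'
sync_ui_texts(sample, UI_TEXTS)
puts UI_TEXTS['welcome.fr'] # Bonjour
```

Because every row is an upsert, re-running the script after an admin edits the sheet simply overwrites the changed values.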

18. FB & Google Apps

Not much interesting to say about this, but we have FB and Google Apps. Social login is really important, to the point where it should be considered.

We opted from the get-go to launch with FB, Google, and Email sign-up; in retrospect I think this was a mistake, in the sense that it’s too much too early. This is especially true because we launched with iOS, Android and web clients at once: each sign-up method carries its own complexities (from malformed emails to default Google+ profile pics to FB users with no email), and dealing with the matrix of combinations was simply a lot of work that might have been better applied elsewhere.

19. Cloudinary

Cloudinary is an image management service for storing, manipulating and delivering images. They handle almost everything regarding storing images, offloading that responsibility (and traffic) from your server. It’s the natural progression of X-as-a-service and should be the go-to strategy with everything that’s not your core business.

20. Analytics

Despite the above admonition regarding outsourcing whatever you can, I have had the opposite experience with analytics. We use both Google Analytics (mostly for web/server) and Mixpanel (for both web/server and mobile events). It has been my experience that in the case of analytics, rolling your own is actually a far better experience both in the short and long run. Basic tracking of events as well as customizable filters and views on the company’s data is almost trivial to create and maintain, and gives you exactly what you want without slogging through the mountain of complexity/limits of 3rd-party analytic suites (like GA/MXP). It seems that time and again, anything I want to check is usually faster for me to check myself using pre-existing (or ad-hoc) views of logs I maintain myself on my users/data than trying to navigate Google Analytics into it.
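To give a sense of how little code ‘rolling your own’ can mean, here is a bare-bones sketch: one JSON line per event, and an ad-hoc view that counts events with an optional filter. The function names and event fields are assumptions for the example; our real logs and views are not shown here.

```ruby
require 'json'
require 'time'

# Append one JSON line per event; a log file or a DB collection would
# work the same way as this in-memory array.
def track(log, event, props, at = Time.now.utc)
  log << JSON.generate({ 'event' => event, 'at' => at.iso8601 }.merge(props))
end

# An ad-hoc "view": count events by name, optionally filtered by one property.
def count_events(log, key = nil, value = nil)
  rows = log.map { |line| JSON.parse(line) }
  rows = rows.select { |r| r[key] == value } if key
  rows.group_by { |r| r['event'] }.transform_values(&:size)
end

event_log = []
track(event_log, 'signup', { 'platform' => 'ios' })
track(event_log, 'signup', { 'platform' => 'web' })
track(event_log, 'answer', { 'platform' => 'web' })
puts count_events(event_log).inspect
puts count_events(event_log, 'platform', 'web').inspect
```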

At the end of the day, I only trust my own data, and I only trust ourselves to view and query it correctly.

Summary

If you’ve made it this far, kudos. If you just scrolled all the way down — shame on you! This is why we can’t have nice things.

As an addendum, here is a list of various software architecture maxims I believe in (*cough-rants-cough*) that influenced yes.no’s development and architecture.

Tradeoffs: everything is a tradeoff. You are eternally debating tradeoffs between time spent now, time spent later, money, product perfection, performance, and everything else. Every single thing you do means less of everything else. This means you must prioritize aggressively, and not do anything that isn’t actually really important. Seriously: YAGNI. We worked aggressively to keep everything as technically simple as possible; this has resulted in a remarkably simple development environment and flow, which results in dramatically quick (read: “agile”) development.
Optimize for developer happiness, speed and productivity. And for awesome developers (and keep them happy). Everything else will come naturally.
Some things are just not necessary/cost-effective in web dev. No ORMs. No Classes. No Tests. (Come at me!) Tests can be done inefficiently, in which case they are pointless, or efficiently, in which case they cost more time (read: money) than they save. Unit tests are nice if writing units is your job. Full-stack tests, which is all anyone cares about, are still too hard to get done efficiently. (Like I said, yeah, come at me, ye haters of the untesting!)
‘Globals’ are fine. THEY ARE FINE. Don’t use global variables with mutating states, but definitely use global constants. In any application, most things are ‘global’, in the sense that at any given moment they should be able to be accessed from anywhere, and it is obvious what they should mean. (e.g. ‘Mongo’, ‘Redis’, ‘UsersCollection’, etc.) Don’t be an idiot and try to mutate these in runtime; also don’t hire idiot programmers that would mutate globals at runtime. Boom, there you go, now globals are fine.
Move even faster, break even more things. Nobody cares about you and your stupid app; it doesn’t need to be perfect. Build it fast and make people care, before you run out of time/money/patience.

Me

My name is Sella Rafaeli. I co-founded and built yes.no and I consult on all things web. You can read more about me at sellarafaeli.com.