Building a Social Network Without Wanting to Harm Others

Ben Stephenson
9 min read · Oct 29, 2015

Each node and vector represents an imminent threat to your very sanity

At Lens, we’ve spent the past 18 months building a professional social network based around existing professional communities. Along the way, our engineering team has slowly come to the realisation that while network platforms can, at times, be a lot of fun to build, the challenges they present are quite different to those of typical application development.

Any kind of web-based programming can feel like pulling teeth at times; the complex interactions required in a social network, however, can often feel like finding the six lowest-rated dentists on Yelp and lending them your mouth for a week to practice.

We often found that the biggest problem was a lack of reference material. Building social networks just isn’t something people do that much (god knows why). In that vein, we thought it might be worth sharing a number of the big issues that we’ve faced and how we’ve started to solve them. Bear in mind that these aren’t necessarily unique to network platforms, but where they’re more common, they’re compounded by the nature of the product.

For anyone interested in checking out Lens, we’re in invite-only beta at the moment. There’s a limited number of invitations available here, or email me at info+invites@lens.io and we’ll get one to you.

Access Control

So you’ve decided that Facebook doesn’t do enough for Anglers and you’re going to build your own social network for fishermen. “Plaicebook” maybe? Something that’s going to arise very early on (beyond a lawsuit) is the need to deal with very complex access controls or permission sets. In a basic application, say a collaborative todo list, you have a fairly basic set of objects and relationships:

todoItem belongsTo User
User belongsTo Team

You can define some access controls like so:

todoItem {
read: TeamMember,
write: TeamMember,
delete: Owner,
update: Owner,
}

This is fairly simple and can probably be dealt with in a controller (if (!owner) throw new Error('Cannot Delete')). It also requires very few database calls (depending on how your data is structured):

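function canRead(todoItem, user) {
  // The owner can always read their own item
  if (todoItem.owner === user) return true;
  // Otherwise the reader must be on the owner's team
  return user.getTeam() === todoItem.owner.getTeam();
}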

Because of the small volume of objects and thus the small volume of relationships, you can probably get away with manipulating your database architecture to suit permission lookups:

function canRead(todoItem, user) {
  // The team is stored on the item itself, so no extra lookup is needed
  return user.team === todoItem.team;
}

There we go! Clean, efficient, easy to follow and also very easy to understand for someone else. Unfortunately, within a social network you don’t get off that lightly.
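
As a rough sketch of how the todo-list rules above might sit in code, you could map each action to the role it requires. The `rules` table, `roleOf` and `can` are names made up for illustration, and the data shapes (an `owner` id and a denormalised `team` on each item) are assumptions:

```javascript
// Hypothetical rule table mirroring the todoItem access controls above.
const rules = {
  read: 'TeamMember',
  write: 'TeamMember',
  delete: 'Owner',
  update: 'Owner',
};

// Work out the strongest role a user holds on an item.
// Assumes items and users both carry a denormalised `team` field.
function roleOf(todoItem, user) {
  if (todoItem.owner === user.id) return 'Owner';
  if (todoItem.team === user.team) return 'TeamMember';
  return 'Stranger';
}

// Unknown actions are denied; an Owner can do anything a TeamMember can.
function can(action, todoItem, user) {
  const required = rules[action];
  if (!required) return false;
  const role = roleOf(todoItem, user);
  if (role === 'Owner') return true;
  return role === required;
}

const item = { id: 1, owner: 'ben', team: 'lens' };
console.log(can('read', item, { id: 'joe', team: 'lens' }));   // true
console.log(can('delete', item, { id: 'joe', team: 'lens' })); // false
```

With two roles and one relationship, the whole permission model fits on a screen, which is exactly what stops being true below.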

Even in a very simple social network (which Lens isn’t), you’re going to start needing new models like a fat child needs Splenda. What’s more, the relationships between models begin to get quite silly quite quickly.

Why does this happen a lot with social networks specifically? It’s because there are so many different components to a social network (posts, walls, friends, events, groups, pages, messages, notifications etc.) and they each have multiple different contexts (posted by a friend, posted by someone else, posted by a friend of a friend, posted by a page I follow, posted by a page a friend follows etc.).

To give you a brief insight into the ensuing chaos, let’s take for example the act of fetching a single post from Lens. We need to check that you have permission to read this post and then return the relevant data. Here’s what we want to return:

post {
id,
title,
body,
owner {
// User record
}
}

To do this we have to do the following:

1. Check you have permission to get the post.
2. Mask the post so you can see only the information permitted.
3. Fetch the user record that created the post.
4. Mask the user record to see only the information permitted.

Now that doesn’t seem too bad. However, it starts to spiral out of control when you realise that you’re talking about multiple access contexts with a lot of relationships involved. Here’s a basic breakdown:

1. Check you have permission to get the post:
- Is the community the post was published to public? OK.
- Are you the post owner? OK.
- Are you a member of that community? OK.
- Do you have an invitation to join that community? OK.
- Otherwise. NOT OK.
2. Mask the post so you can see only the information permitted.
3. Fetch the user record that created the post.
4. Mask the user record to see only the information permitted.
- Are you this user? Send back full.
- Are you a contact of this user? Send back some fields but not email.
- Are you none of the above? Send back basic fields.

That’s a lot of steps to get right and to remember to check. What’s worse, you’re touching a lot of different models (Community, User, CommunityInvite, Contacts) just to find out what your relationship to the post is. So if you’re not careful, you’re going to start making far too many database requests. And bear in mind this is what happens for fetching one single post, already knowing its ID. It gets significantly scarier when you start to try and aggregate things into a dashboard.
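
To make the breakdown above a bit more concrete, here’s a minimal sketch of steps 1 and 4. It is not our production code: the `db` object, the field names (`isPublic`, `bio` and so on) and the key formats are all invented for illustration, and step 2 (masking the post itself) is left out.

```javascript
// In-memory stand-ins for the Community, CommunityInvite and Contacts models.
// All names and shapes here are hypothetical.
const db = {
  communities: { 10: { id: 10, isPublic: false, members: new Set([1, 2]) } },
  invites: new Set(['10:4']),        // "communityId:userId"
  contacts: new Set(['1:2', '2:1']), // "userId:contactId"
};

// Step 1: run the permission checks in the order listed above.
function canReadPost(post, viewerId) {
  const community = db.communities[post.communityId];
  if (community.isPublic) return true;
  if (post.ownerId === viewerId) return true;
  if (community.members.has(viewerId)) return true;
  if (db.invites.has(`${community.id}:${viewerId}`)) return true;
  return false;
}

// Step 4: mask the user record according to the viewer's relationship.
function maskUser(user, viewerId) {
  if (user.id === viewerId) return { ...user };           // full record
  const { id, name, bio } = user;
  if (db.contacts.has(`${viewerId}:${user.id}`)) return { id, name, bio }; // no email
  return { id, name };                                    // basic fields
}

const author = { id: 1, name: 'Ben', bio: 'Javascripter', email: 'ben@example.com' };
const post = { id: 7, communityId: 10, ownerId: 1 };
console.log(canReadPost(post, 4)); // true, via the invitation
console.log(canReadPost(post, 9)); // false
console.log(maskUser(author, 2));  // { id: 1, name: 'Ben', bio: 'Javascripter' }
```

Each `db` lookup here would be a query against a different table in a real system, which is where the request count starts to creep up.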

Slaying the Permission Beast

Unfortunately, as far as we’re aware, there’s no great solution here. You’re going to have to check permissions vigorously and often. You can certainly minimize the overhead though, by taking into account a few things:

  1. Write Complex Queries Manually
    ORMs are great for the simple stuff, but unless you designed the ORM yourself (which you shouldn’t), complex lookups tend to turn something that could be one query into seven.
  2. One Size Does Not Fit All
    Don’t try to have some GateKeeper method that you solely rely on for dealing with permissions. You need something like that in certain circumstances, but if you design your controllers and the client-server interaction well, you can often get away with returning the basic permission sets and then having the client request privileged information only when needed.
  3. Get As Much Done At Once As Possible
    Rather than having two endpoints for, say, “Get groups that I’m a member of” and “Get posts from group” and calling them on a dashboard page, it’s better to wrap that up into one single endpoint. This will allow you to fetch the relevant permissions only once. We’ll talk about this more below.
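
One way to picture the third point is to load the viewer’s permission context once and reuse it for every check on the page. This is only a sketch under assumed data: `memberships`, `loadViewerContext` and `visiblePosts` are hypothetical names, and the `lookups` counter stands in for real database round trips.

```javascript
// Hypothetical data: which communities each user belongs to, and some posts.
const memberships = { 1: [10, 11], 2: [11] };
const posts = [
  { id: 1, communityId: 10, title: 'Best JS function ever' },
  { id: 2, communityId: 11, title: 'Plaice of the week' },
  { id: 3, communityId: 12, title: 'Members only' },
];

let lookups = 0; // counts simulated database round trips

// One round trip: fetch everything needed to answer permission questions.
function loadViewerContext(viewerId) {
  lookups += 1;
  return { viewerId, communityIds: new Set(memberships[viewerId] || []) };
}

// Every later check reuses the context instead of going back to the database.
function visiblePosts(allPosts, ctx) {
  return allPosts.filter((post) => ctx.communityIds.has(post.communityId));
}

const ctx = loadViewerContext(1);
const visible = visiblePosts(posts, ctx);
console.log(visible.map((p) => p.id), lookups); // [ 1, 2 ] 1
```

However many posts end up on the page, the membership data is only fetched once.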

API Design

This problem is by no means confined solely to social networks - it’s something that gets compounded by sheer volume of different objects. Designing a strictly RESTful API is going to cause problems very quickly. Imagine the dashboard page of your favourite social or professional network. It probably contains the following (for a start):

- Posts
-- Users that posted those posts
--- Whether or not you're connected to that user
-- Post Comments
--- User that commented
---- Whether or not you're connected to that user

Now if we’re being very RESTful, we could split this down into a bunch of calls that may look something a bit like this:

GET /posts/dashboard

ForEach Post {
  GET /posts/:postId/user
  GET /contacts/:postOwnerId
  GET /posts/:postId/comment

  ForEach Comment {
    GET /posts/:postId/comment/:commentId/user
    GET /contacts/:commentOwnerId
  }
}

Now this is pure insanity, but it demonstrates two things that can start to run away from you very quickly if you’re being overly pedantic and API-focused:

  • You’re making far, far too many requests and it’s going to slow everything down
  • You’re going to get insanely annoyed with RESTful API routes like /obj/:objId/child/:childId/grandchild/:grandChildId etc.
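
To put a number on the first point, you can count the requests in the breakdown above: one for the dashboard list, three per post, and two per comment. The figures of 20 posts and 5 comments per post are purely an assumption for the sake of the arithmetic.

```javascript
// Requests needed by the strictly RESTful breakdown above:
// 1 for the dashboard list, then per post 3 requests (user, contact, comments)
// plus 2 per comment (user, contact).
function restRequestCount(postCount, commentsPerPost) {
  return 1 + postCount * (3 + 2 * commentsPerPost);
}

console.log(restRequestCount(20, 5)); // 261
```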

Stopping the Madness

  1. Consider GraphQL
    If you can, think about using GraphQL. It allows you to avoid the headaches of RESTful APIs and use a more sensible syntax. It’s not for everyone though, and it does come with its own drawbacks, so do consider carefully. See this great summary by playlist dev Jacob Gillespie.
  2. Return Sensible Data
    Is there really any chance that you’re going to need a post without the user record? Are you really going to need the user record without knowing if you are a connection or not? Think about whether the time saved by skipping a database join is enough to justify the overhead of an extra HTTP request later.
  3. Bundle Common Requests Together
    If you have a dashboard page, why not have a /dashboard endpoint that returns much of the data required to generate the page? Admittedly it’s not exactly RESTful and comes with a much bigger payload but if you’re requesting a lot of different types of data from different endpoints, this will really help.
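
A bundled endpoint along these lines might build its payload something like this. It’s a sketch only: the in-memory tables, `withUser` and `buildDashboard` are hypothetical, and a real handler would replace the array lookups with a couple of joined queries.

```javascript
// Hypothetical in-memory tables standing in for real queries.
const users = { 1: { id: 1, name: 'Ben' }, 2: { id: 2, name: 'Joe Smith' } };
const contacts = new Set(['1:2']); // "viewerId:userId"
const posts = [{ id: 7, ownerId: 2, title: 'Best JS function ever' }];
const comments = [{ id: 3, postId: 7, ownerId: 1, body: 'Not bad' }];

// Attach the user record and connection flag while we already hold the data.
function withUser(record, viewerId) {
  const { ownerId, ...rest } = record;
  return {
    ...rest,
    user: { ...users[ownerId], isContact: contacts.has(`${viewerId}:${ownerId}`) },
  };
}

// What a handler for GET /dashboard might return as JSON.
function buildDashboard(viewerId) {
  return {
    posts: posts.map((post) => ({
      ...withUser(post, viewerId),
      comments: comments
        .filter((c) => c.postId === post.id)
        .map((c) => withUser(c, viewerId)),
    })),
  };
}

console.log(JSON.stringify(buildDashboard(1), null, 2));
```

The client gets posts, comments, users and connection flags in one response, at the cost of a bigger payload.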

Keeping Everything In Sync

Social networks are inherently difficult to navigate. There’s a lot of people doing a lot of different things a lot of the time. Because of this, networks typically have a few handy features to help you cut through everything, like notifications, requests and an activity feed (on Lens we actually have another which keeps track of your reputation points).

These are effectively objects whose only purpose is to reference other objects, which they do with great volume. Imagine a notification that a user receives when someone comments on a post they make:

notification {
post,
commenter,
comment,
}

This notification references three separate objects (which is about average for a notification). Now, at first glance, what you could do is have a database table structured like this:

`notifications`
id | owner | notification
---+-------+-------------------------------------------------------
1 | ben | "Joe Smith commented on your post, 'Best JS function ever'"

That’s pretty much a non-starter. Forgetting the fact that this is useless to the client (which will need to link to Joe Smith and your fantastic post about your JavaScript method), pretty much everything you’ve listed there is dynamic. Joe Smith could change his name; he could delete his comment; or you could decide that your function was pretty shitty after all and change the title.

None of these things individually may seem mission critical, but you can see how with just a few changes, the notification system stops helping people and starts confusing them.

Next you might try something like this:

`notifications`
id | owner | notification_type | notification_id
-------------------------------------------------
1 | 1 | "post_comment" | 1

`postCommentNotifications`
id | comment | post | commenter |
---------------------------------
1 | 1 | 2 | 3 |

// comment, post and commenter (user) tables referencing the above

This gives you the advantage of resolving everything out at the time it’s requested and thus ensuring that it’s up to date.

The downside to this is that the number of database requests (forgetting access permissions for a second) that you’re going to need to fetch someone’s notifications becomes ungodly. In the above example, you’d have to do the following:

1. Get User Notifications
2. Get type-specific notification table (`postCommentNotifications`)
3. -> Get referenced object #1 (comment)
4. -> Get referenced object #2 (post)
5. -> Get referenced object #3 (commenter)

Even without access control confirmation, that’s a whole heap of database queries; and that’s just for one notification!
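
Here’s a small sketch that counts those lookups for the example above. The `tables` object and the `query` helper are stand-ins invented for illustration; each call to `query` represents one database request.

```javascript
// Hypothetical tables keyed by id; `query` counts simulated database hits.
const tables = {
  notifications: { 1: { id: 1, owner: 1, type: 'post_comment', notificationId: 1 } },
  postCommentNotifications: { 1: { id: 1, comment: 1, post: 2, commenter: 3 } },
  comments: { 1: { id: 1, body: 'Nice one' } },
  posts: { 2: { id: 2, title: 'Best JS function ever' } },
  users: { 3: { id: 3, name: 'Joe Smith' } },
};

let queries = 0;
function query(table, id) {
  queries += 1;
  return tables[table][id];
}

// Resolve one notification at request time, following every reference.
function resolveNotification(id) {
  const notification = query('notifications', id);
  const detail = query('postCommentNotifications', notification.notificationId);
  return {
    type: notification.type,
    comment: query('comments', detail.comment),
    post: query('posts', detail.post),
    commenter: query('users', detail.commenter),
  };
}

const resolved = resolveNotification(1);
console.log(resolved.commenter.name, queries); // Joe Smith 5
```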

Eventual Consistency and Beyond

What we really need is a way to minimize the inconsistency between the object the notification references and the information the notification relays at request time, while taking into account that our database has feelings too.

We do this with a kind of hybrid system that allows us to store the notification in its purest form (containing referencing IDs rather than hard-coded data), while at the same time resolving out the data we’ll want to give the client and storing it in a “cache” table.

It looks a bit like this:

`notifications`
id | owner | notification_type | notification_id
-------------------------------------------------
1 | 1 | "post_comment" | 1

`postCommentNotifications`
id | comment | post | commenter |
---------------------------------
1 | 1 | 2 | 3 |

// comment, post and commenter (user) tables referencing the above

`cacheNotifications`
id | owner | data
-------------------------------------------------
1 | 1 | { post: { title: 'Best JS function ever', ... }, comment: ..., ... }

This allows us to only have to hit the DB once (getting the cacheNotifications belonging to the owner) whilst minimizing (although not completely avoiding) data inconsistency by periodically updating the cacheNotifications table when something may have changed.
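
A stripped-down sketch of the idea follows. All the names are hypothetical, and to keep it short this version refreshes the cache whenever a post is renamed rather than on a periodic schedule, so treat it as one possible variation and not a description of what we run.

```javascript
// Hypothetical stores: the source-of-truth rows and the cache table.
const posts = { 2: { id: 2, title: 'Best JS function ever' } };
const users = { 3: { id: 3, name: 'Joe Smith' } };
const postCommentNotifications = { 1: { id: 1, owner: 1, post: 2, commenter: 3 } };
const cacheNotifications = {}; // id -> { owner, data }

// Resolve the references and store the result for cheap reads.
function refreshCache(id) {
  const ref = postCommentNotifications[id];
  cacheNotifications[id] = {
    owner: ref.owner,
    data: { post: { ...posts[ref.post] }, commenter: { ...users[ref.commenter] } },
  };
}

// Reading is a single lookup against the cache table.
function getNotifications(ownerId) {
  return Object.values(cacheNotifications).filter((n) => n.owner === ownerId);
}

// When a post changes, refresh every cached notification that references it.
function renamePost(postId, title) {
  posts[postId].title = title;
  Object.values(postCommentNotifications)
    .filter((ref) => ref.post === postId)
    .forEach((ref) => refreshCache(ref.id));
}

refreshCache(1);
renamePost(2, 'Okay JS function');
console.log(getNotifications(1)[0].data.post.title); // Okay JS function
```

The cached copy is a snapshot, so anything that changes without triggering a refresh stays stale until the next update, which is the inconsistency you’re trading for the single read.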

All in all, the biggest thing to take away from this, before you go ahead and build Plaicebook, is that you’re going to have to seriously stop and plan ahead. This is true for any kind of programming and, as a professional developer, you should be doing it anyway. Due to the volume of interacting objects in a social network, not planning ahead is going to get expensive quickly, from database performance to end-user experience, to being able to sleep at night without flooding your bed with tears.

Thanks for taking the time to read. If, for any reason, you think we’re a gaggle of morons and missing something pretty fundamental in our analysis, we’d love to hear from you at info@lens.io (we’re always hiring as well).

Ben Stephenson

Co-founder/Javascripter at Lens. Northern tech migrant living in Walthamstow. All views are my own. Mine. You can't have them.