We Go in Batches (Using Golang Concurrency for Facebook Events)

This post is engineering-focused and details a Facebook API adventure as seen through the Golang code of junior backend engineer Jessica Weinberg. It was originally published on Timehop’s Blog.

One of the most important and time-sensitive parts of our backend infrastructure is our fleet of importers. We’ve discussed them before, but to summarize: they are background jobs responsible for retrieving all your content from whatever services you have linked to your Timehop (Facebook, Instagram, Twitter, etc.).

Facebook Events: Creating an Importer

While we were already importing Timehoppers’ Facebook photos, statuses and links, we were not doing so for events. Since Facebook is deprecating FQL in the coming months (which is what the other importers are written in), having to create the events importer presented an opportunity to start using the new version of the Facebook Graph API.

To grab a list of events, you hit the `/events` endpoint, which will give you a response with two fields: data and paging. Within data, you’ll find an array of event objects that contain fields such as `location`, `name`, `start_time`, among others. This search can be further refined by providing time ranges, limiting the number of returned events or even specifying which fields you wish to receive for every event entry. This is where things started to get tricky…
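To make the shape of that response concrete, here is a minimal sketch of decoding it in Go. The struct names, the `next` paging link, and the sample payload are assumptions for illustration, not the actual Timehop types, and a real event object carries more fields than the few shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event holds a few of the fields mentioned above (illustrative subset).
type Event struct {
	ID        string `json:"id"`
	Name      string `json:"name"`
	Location  string `json:"location"`
	StartTime string `json:"start_time"`
}

// Paging is assumed to carry a link to the next page of results.
type Paging struct {
	Next string `json:"next"`
}

// EventsResponse mirrors the two top-level fields: data and paging.
type EventsResponse struct {
	Data   []Event `json:"data"`
	Paging Paging  `json:"paging"`
}

// parseEvents decodes a raw /events response body.
func parseEvents(body []byte) (EventsResponse, error) {
	var resp EventsResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	// Made-up payload in the data/paging shape described above.
	sample := []byte(`{"data":[{"id":"1","name":"Launch party","location":"NYC","start_time":"2014-06-01T19:00:00-0400"}],"paging":{"next":"https://example.invalid/next"}}`)

	resp, err := parseEvents(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(len(resp.Data), resp.Data[0].Name)
}
```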

The Problem: Too Slow/Erratic Behavior

We wanted to pull more information, such as the list of people attending the event, which isn’t included in a regular response. However, when I tried to grab all of a user’s past events with this extra information, I kept running into problems. One was erratic behavior: sometimes it would work, sometimes it would error out, and sometimes it wouldn’t give me back all the data. After much investigation, only two solutions worked: a) only asking for the IDs of the events or b) narrowing the time range while requesting all the events’ fields.

This led to our next problem: speed. Going with solution B meant that we would have to break a single request into multiple requests (each of them for a given time range), with pagination still potentially occurring within each one. This was too slow for our needs, as we’re currently getting a new signup every 2 seconds.

Problem Solved: Batch It

Looking for a speedy solution, I learned that the Facebook Graph API can make batch requests. With this functionality, we could combine up to 50 requests into a single batch. With the ability to batch, we could use solution A: fetch all of the IDs, split them up into batches of 50, and then send the batches off in parallel. With Go, we were able to take advantage of what the language is amazing at: concurrency.
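For a feel of what one batch looks like, here is a rough sketch that builds the JSON array of sub-requests for a list of event IDs. The `method` and `relative_url` keys follow the Graph API batch format as I understand it; the `batchItem` type and `buildBatch` helper are hypothetical names, not the code we shipped.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// batchItem is one sub-request inside a batch (hypothetical type name).
type batchItem struct {
	Method      string `json:"method"`
	RelativeURL string `json:"relative_url"`
}

// buildBatch turns object IDs into the JSON array for a single batch request.
// Any requested fields are appended to each sub-request as a fields parameter.
func buildBatch(ids []string, fields ...string) (string, error) {
	items := make([]batchItem, 0, len(ids))
	for _, id := range ids {
		u := id
		if len(fields) > 0 {
			u += "?fields=" + strings.Join(fields, ",")
		}
		items = append(items, batchItem{Method: "GET", RelativeURL: u})
	}
	raw, err := json.Marshal(items)
	return string(raw), err
}

func main() {
	payload, err := buildBatch([]string{"1", "2"}, "name", "start_time")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(payload)
}
```

The resulting string would be sent as the batch parameter of a single HTTP POST, so 50 lookups cost one round trip instead of 50.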

To give some back story to my journey, I had JUST started learning Go and am also fresh to developing in general, so the concept of concurrency was still pretty new for me. This wasn’t going to deter me, though; I was ready to Go.

The first step was understanding how exactly Go handles concurrency. What I learned was that there are two main things you need for concurrency — a channel (or channels) and Goroutines. A channel is a language construct that is fundamental to Go that allows Goroutines to communicate with one another. Goroutines are like lightweight threads that can execute units of work, i.e. functions, at the same time.
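Here is about the smallest example of the two working together that I could come up with. It is a toy, not importer code: `square` and `sumSquares` are made-up names, and squaring a number stands in for any unit of work.

```go
package main

import "fmt"

// square does one unit of work and sends the result on the channel.
func square(n int, out chan<- int) {
	out <- n * n
}

// sumSquares starts one goroutine per input, then receives exactly one
// result per goroutine and adds them up as they arrive.
func sumSquares(nums []int) int {
	out := make(chan int)
	for _, n := range nums {
		go square(n, out)
	}

	total := 0
	for range nums {
		total += <-out
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3})) // 1 + 4 + 9 = 14
}
```

The results can arrive in any order, which is fine here because addition doesn't care about order.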

Since we will soon need to migrate our other importers to use V2 of the Facebook Graph API, it made sense to create a function, GetContentInBatches, for batch requests and make it accessible for any Facebook Graph API request.

GetContentInBatches function

This function takes in a few different arguments:

  • The Graph endpoint (e.g. `/me/events/attending`).
  • Since and until (both Unix timestamps, used to specify the range of time you want to pull data for).
  • Variadic arguments for the fields that you want back (e.g. for events, some fields might be the description, the owner of the event, and/or the people marked as attending that event).

The function then returns an array of `BatchResponses` or an error.
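As an illustration of how those arguments could come together, here is a sketch that turns an endpoint, a time range, and a variadic list of fields into a request path. `buildEndpoint` is a hypothetical helper, and the `since`, `until`, and `fields` query parameter names are my assumption about how the arguments map onto the Graph API.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// buildEndpoint combines the Graph endpoint, a Unix-timestamp range, and
// optional fields into one request path (hypothetical helper).
func buildEndpoint(endpoint string, since, until int64, fields ...string) string {
	q := url.Values{}
	q.Set("since", strconv.FormatInt(since, 10))
	q.Set("until", strconv.FormatInt(until, 10))
	if len(fields) > 0 {
		q.Set("fields", strings.Join(fields, ","))
	}
	// Encode sorts keys alphabetically and escapes the comma as %2C.
	return endpoint + "?" + q.Encode()
}

func main() {
	fmt.Println(buildEndpoint("/me/events/attending", 1388534400, 1391212800, "description", "owner"))
}
```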

First, we would need to make one request out to the Facebook Graph API to obtain all of the object IDs for the events the user has attended. The URL for the request is passed into the function `GetFacebookGraphRequest` which performs an HTTP GET request to the Facebook Graph API and then unmarshals the response into a struct named `FacebookGraphResponse` or returns an error.

The `GetContentInBatches` function then either returns an error or appends the obtained object IDs to an array, which is then split into arrays of 50 object IDs. Again, the maximum number of requests you can have in a batch request to the Facebook Graph API is 50.

Once we have appended all the URLs for the batch requests into an array, we issue these requests (HTTP POST) at the same time and then collect the responses from each goroutine using channels.

By taking advantage of concurrency as well as using Facebook’s batch requests, we were able to get the most data per event object as well as get all of a Timehopper’s events much more rapidly.

_______

Still here? That means you like code. We do too. Check out garage.timehop.com for some of our open source projects. Also, we’re hiring. ☺
