Building our own Cerebro

Approaching recruitment like a Geek.


A fairly well accepted approach to growing a team is to wait until your organisation has sufficient need before embarking on a potentially lengthy candidate search. I used to call this “lazy recruiting” and it’s a methodology I applied heavily in my previous companies. While it ensures you’re not spending money you might not have, it also means you’re often on a knife edge trying to balance workload with resource. You’re always on the back foot, rushing to make hires in order to fulfil demand.

If your approach to hiring is to push out adverts when you have need and wait for candidates to respond, your pool of candidates will always be limited, both by the extent of your reach and by the number of people receptive to your proposition.

I believe in serendipity but such an approach feels a little too “Que Sera, Sera”. Much better, it seems, to become proactive in your recruitment. To start identifying the people you want and reaching out to them ahead of time. To do so as early as manageable, averting panic when you face reluctance or refusals.

As we tumble head first into scaling our operation, we want to be a talent-centric organisation, hiring the right people early and organically shaping the company around them, not hastily trying to force square pegs into round holes.

How does one orchestrate such a seismic shift in methodology? Well, we’re doing what we do best: building something. We like to think of it as Cerebro for finding developers.

The benefit of both the design and development industry is that we’re very well represented online. Github, Stack Exchange, Behance and numerous other platforms give us ample opportunity to show off our wares, and for our peers to rate and review us. They give us, as professionals, an online fingerprint. While the argument’s already been made that Github is by no means an idealistic meritocracy, it’s a starting point for our experiment. Many developers have open Github profiles, with repositories rated by the community. The Github API even exposes email addresses and allows for location searches, albeit string based rather than geospatial. An oil field of recruitment data, waiting to be mined.
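As a rough illustration of the string-based location search mentioned above, the sketch below builds a request URL for Github's user search endpoint using its `location:` qualifier. The function name and the paging defaults are our own assumptions for the example, and the code only constructs the URL; it makes no network request.

```python
from urllib.parse import urlencode

# Github's REST endpoint for searching users.
SEARCH_USERS_URL = "https://api.github.com/search/users"


def build_location_search_url(location, page=1, per_page=100):
    """Build a user-search URL for one free-text location string.

    The helper name and defaults are hypothetical; only the endpoint and
    the `location:` search qualifier come from Github's API.
    """
    # Multi-word locations need quoting so they are matched as one phrase.
    term = f'"{location}"' if " " in location else location
    query = {"q": f"location:{term}", "page": page, "per_page": per_page}
    return f"{SEARCH_USERS_URL}?{urlencode(query)}"


if __name__ == "__main__":
    for town in ("Stoke-on-Trent", "Manchester"):
        print(build_location_search_url(town))
```

Because the match is on text rather than coordinates, each town along the route needs its own query, which is one reason the results need normalising afterwards.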

Taking an afternoon out for a little R&D, we started to think about how best to make use of this offering. We considered a feasible commuting range, half an hour by train either way. In geographical terms, this equates to any train stop between Stoke-on-Trent and Manchester.

Our next step was to prototype a simple application which queued and processed requests to Github’s API, from location searches to developer profiles to repository lists. Github’s location field allows for free text entry, which unfortunately breeds divergence in how people describe their location. We devised a process to normalise addresses, running the location strings through a geocoding service and then reverse geocoding the results to obtain a complete address. For repositories we collated the dominant language used and assigned a normalised score by multiplying Github’s two social values (watchers and stars). We indexed everything with Solr, allowing us to cross-sort people by score, distance from our office and prominence of a certain language.
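The scoring and sorting described above can be sketched in a few lines. This is a minimal illustration rather than our actual code: the raw score multiplies watchers by stars as described, but dividing by the highest raw score is only one assumed way to normalise, and a plain in-memory sort stands in for the Solr index. The class, function and field names are hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


@dataclass
class Repo:
    language: str
    watchers: int
    stars: int


def raw_score(repo):
    """Multiply the two social values, as described in the text."""
    return repo.watchers * repo.stars


def normalised_scores(repos):
    """Scale raw scores into the range 0..1 (assumed normalisation)."""
    top = max((raw_score(r) for r in repos), default=0)
    if top == 0:
        return [0.0 for _ in repos]
    return [raw_score(r) / top for r in repos]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def rank(developers):
    """Sort by score (highest first), then by distance (nearest first).

    Each developer is a dict with hypothetical keys "score" and
    "distance_km".
    """
    return sorted(developers, key=lambda d: (-d["score"], d["distance_km"]))
```

The distance function takes the coordinates returned by the geocoding step, so a developer's free-text location only has to be resolved once before it can be sorted on.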

Within an afternoon, we had identified 964 developers, created a valuable recruitment pipeline and gleaned some very interesting insight into the development scene blossoming along the North West Mainline.

The first thing that struck us was the diversity of languages in use. Starting with Javascript on the left, the results span an impressive 52 languages.

964 Developers grouped by 52 languages

You might think of the single developer working with ColdFusion (far right) as Will Smith’s pivotal character in I Am Legend, the sole survivor who himself became a legend.

The 5 top languages in use

A breakdown of those top 5 languages shows just how prevalent Javascript is, with Ruby coming in second and PHP slightly behind. We expected PHP to come in first considering the north west’s strong agency culture, which generally results in a glut of PHP developers. It’s interesting to see Java make it into the top 5, likely proliferated by the Android ecosystem.

When we drilled down into those Javascript repositories, we weren’t surprised to find that only a minuscule number of them are Node-centric. The weighting is definitely towards jQuery plugins.

Node.js looks like a moon orbiting the wider Javascript community

Considering future uses for the tool, there’s a lot of value in flipping the way we use it:

  • Instead of searching for developers for a language or framework we’ve already adopted, we can use it to identify what we should be using.
  • We can interrogate the most popular languages locally, and cross-reference that with average peer-rating across repositories, allowing us to choose a language with the highest calibre of developers.
  • We can now consider locations for subsequent offices, using the tool to whittle down location choices based on the abundance of developers.
  • We can identify trends at a local level, helping us to plan a technical strategy ahead of the industry curve.
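To make the second of those ideas concrete, here is a small sketch of cross-referencing language popularity with average peer rating. It assumes each repository has already been reduced to a (language, score) pair; the function name and the input shape are hypothetical.

```python
from collections import defaultdict


def language_summary(repos):
    """Return {language: (repo_count, mean_score)} for (language, score) pairs."""
    totals = defaultdict(lambda: [0, 0.0])
    for language, score in repos:
        totals[language][0] += 1      # popularity: how many repositories
        totals[language][1] += score  # running total of peer ratings
    return {lang: (count, total / count)
            for lang, (count, total) in totals.items()}


if __name__ == "__main__":
    sample = [("Ruby", 0.5), ("Ruby", 1.0), ("PHP", 0.25)]
    for lang, (count, mean) in language_summary(sample).items():
        print(lang, count, mean)
```

Reading the two numbers side by side matters: a language with many repositories but a low mean rating tells a different story from a small, highly rated one.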

In the short term though, we’re seeing immediate rewards for our time investment. We’ve been able to open communications with a range of exceptional developers and have a solid pipeline in place for growth. It’s a pipeline which facilitates diversity far beyond our historic means and channels, which centred primarily around our followers. Our scraper’s idempotent, running continuously to pull in new profiles, immediately identifying new developers as they come onto the scene and giving us early access to emerging or unknown talent. The same methodology could be applied to other services, such as Behance, to identify designers.
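The idempotent behaviour can be illustrated with a simple keyed upsert. This is a sketch under assumptions, not our scraper: a plain dict stands in for the real index, and profiles are keyed on the `login` field that Github returns for each user. The function name is hypothetical.

```python
def upsert_profiles(store, fetched):
    """Merge fetched profiles into the store and return the new logins.

    Running this twice with the same input leaves the store unchanged
    the second time, which is what makes repeated scrapes safe.
    """
    new_logins = []
    for profile in fetched:
        login = profile["login"]
        if login not in store:
            new_logins.append(login)  # first sighting of this developer
        store[login] = profile        # insert, or refresh the existing record
    return new_logins
```

The returned list is the useful part for recruitment: it contains only the developers who were not in the index before the latest run.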

By applying the sort of product development thinking we’re known for to our recruitment, we’ve created a tool which will ultimately make our future growth far smoother, right across the business, for a fraction of the cost of a single recruitment campaign.