DATA SCIENCE IS NOT OUT OF YOUR REACH

Do you have a startup? Have you heard about data science? Do you think your budding business can handle the expenses of doing data science? In my experience the answer to the first two queries — pun intended — is usually yes and the third one elicits a resounding NO! You are wrong. You can do data science and you don’t need a new hire or an investment in infrastructure to help you get started. You can start with your existing resources.

As admitted by Noah Lorang in his recent piece, and professed by every real data scientist I’ve met, data science is simply applying math to your business. Math you already know! People all over the world suffer from a disconnect between their hard-won math skills and the reality all around them. A data scientist is simply a specialist who makes the connection; who sees the world around them as a set of entities interacting in an entirely quantifiable system; a system they can analyze, understand and maybe even emulate.

I work at Anetwork, and as a digital marketing agency we deal with a lot of small interactions between such entities. As an organization we exist at the intersection of many interconnected systems. A user likes a post we put up on Instagram; a viewer is exposed to one of our ads on a third-party website; a publisher of our ads generates more or less revenue from day to day depending on their traffic. Back in 2015 I found myself asking a series of questions: How does one publisher generating more revenue affect the others? What does that say about our publishers as a whole? And what insight lay hidden behind the patterns of mass migration of users to a certain set of our publishers? These all seem like grand statistical analyses, but the key to unlocking these insights was a simple formula that I learnt way, way back in school.

Do you know how to calculate a percent change? Of course you do. Now put yourself in Nima’s shoes. Nima is officially the Publisher Developer at Anetwork. That title means he is in charge of expanding our reach to new websites and of maintaining the steady flow of ad impressions from our existing publishers. He knew how to calculate a percent change, and that was the metric he used to quantify the growth of our top publishers. But with our publishers numbering in the thousands and growing steadily, he felt overwhelmed by the sheer volume of data he had to ingest every day just to know what was going on in our publisher community.

We came up with what we now call the Importance Index. It is really simple. We needed the percent change (i.e. growth) of a metric, as calculated for a particular customer, to factor in the percent change of that metric for all other customers. So if a publisher’s importance index in, say, impressions is high, it is not just because they had a lot more traffic that day, but also maybe because they held steady while others experienced decreased traffic, or maybe because they continued to show growth despite their low ranking in the bigger scheme of things.

Enter data science!

  • Pc is Percent Change. So, for example, a customer used our services 40% more today.
  • Pc total is the overall change in usage of all our customers for that particular service. So, for instance, a service was used 30% less today.
  • The sigma is simply the sum of all raw Percent Change values for each customer.
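
To make the three quantities above concrete, here is a minimal Python sketch that computes them from two days of numbers. The function names, the publisher names and the figures are all made up for illustration, and reading "Pc total" as the percent change of the summed totals is my assumption. The sketch deliberately stops at the ingredients: how they are combined into the final index is not spelled out in this text, so it is not guessed at here.

```python
def percent_change(previous, current):
    """Percent change from previous to current (100 -> 140 gives 40.0)."""
    if previous == 0:
        raise ValueError("percent change is undefined when the previous value is 0")
    return (current - previous) * 100.0 / previous


def index_ingredients(yesterday, today):
    """Compute Pc per customer, Pc total and the sum of all Pc values.

    Both arguments are {customer_name: metric_value} dicts with the same keys.
    """
    # Pc: each customer's own percent change.
    pc_by_customer = {
        name: percent_change(yesterday[name], today[name]) for name in yesterday
    }
    # Pc total: assumed here to be the percent change of the overall totals.
    pc_total = percent_change(sum(yesterday.values()), sum(today.values()))
    # The sigma: the plain sum of every customer's raw percent change.
    pc_sum = sum(pc_by_customer.values())
    return pc_by_customer, pc_total, pc_sum


# Hypothetical daily impressions for three invented publishers.
yesterday = {"pub_a": 1000, "pub_b": 500, "pub_c": 200}
today = {"pub_a": 900, "pub_b": 500, "pub_c": 300}

pc_by_customer, pc_total, pc_sum = index_ingredients(yesterday, today)
print(pc_by_customer)  # per-publisher percent changes
print(pc_total)        # change of the overall total
print(pc_sum)          # sum of the raw percent changes
```

In this toy data the overall total does not move at all, yet the smallest publisher grows by half. That is exactly the kind of movement a plain revenue ranking would hide.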

So the importance of a publisher to Anetwork is now a quantifiable metric (redundant, I KNOW!) that, with Nima’s help, we now actually operate on. It is not just how much traffic they generate, but how that has changed in comparison with all our other publishers. If we want to know how we can better allocate our resources, it is really simple now. Just sort the publishers by their importance and pick the top contenders.
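
The sort-and-pick step really is that small. As a rough sketch, assuming the importance scores have already been computed and stored in a dictionary (the publisher names, the scores and the helper `top_publishers` are invented for this example), it could look like this in Python:

```python
def top_publishers(importance, n=3):
    """Return the names of the n publishers with the highest importance score."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _score in ranked[:n]]


# Hypothetical importance scores, one per publisher.
importance = {
    "news_site": 0.12,
    "football_blog": 0.87,
    "tech_portal": 0.05,
    "sports_daily": 0.64,
}

top_two = top_publishers(importance, n=2)
print(top_two)
```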

But where is the magic?! A couple of months ago, just as we were exploring the potential uses of our importance index, Hadi Norouzi, a well-known Iranian footballer, suddenly passed away. The very next day Nima was confronted with an uncanny phenomenon. The first page of his importance-index-sorted list of publishers was suddenly invaded by football news publishers and all manner of websites engaging in the mass hysteria that arises after a celebrity’s star suddenly fades. Were these websites suddenly generating more revenue than our prized top-tier publishers? NO. They were just really important that day, and as a data-driven organization, we didn’t miss the fact that our audience was flocking en masse to read about their favorite football player. As Shayan Shalileh once put it, it’s not hard work; it’s hard core.