Building an Event Detector with Mobile Data

Cell phones’ digital footprints provide glimpses of massive social events such as rock concerts and political rallies.

Companies such as Foursquare currently have a very detailed idea of their users’ interests thanks to their “check-in” data. Telcos, on the other hand, might not have such detailed information, but they do have the upper hand when it comes to size: everyone owns a cell phone.

Rather than “check-in” transactions at different shops, Telcos have an equivalent in the form of Call Detail Records (CDRs). Every time someone makes a phone call, Telcos keep a record specifying information such as the origin phone number, target phone number, time of day, call duration and origin antenna location. For any data scientist, sociologist or marketer, data such as this is absolute bliss!

One afternoon at GranData Labs, we started brainstorming about whether it would be possible to automatically detect events such as rock concerts, soccer matches and the like. It turns out that, regardless of how unpredictable we humans would like to be, we behave in quite predictable ways, especially when we consider aggregate groups of people. That is, the daily rhythm of activity is quite similar from one antenna to the next.

Using this fact, it is possible to filter out the city’s daily routine and examine the events that deviate from the norm. Such unexpected events are when the city truly comes alive: those moments when people take a break from their regular behavior and do something different. However, how do we define what constitutes an “unusual” event? And how might we know whether a detected event even makes sense?

Figure 1

For starters, to analyze each antenna’s activity we need to preprocess the original CDR data from something like {origin_phone, target_phone, time, call_duration, antenna_id} into something like {antenna_id, timeslot, number_of_calls}. Figure 1 represents the hourly number of calls for different antennas over a period of three months starting in January 2012. Notice the city’s rhythm: rising in the morning, a peak of activity around noon, another during the afternoon, and then cooling down at night. Furthermore, there is a clear difference between weekdays and weekends. (In fact, the bigger the difference, the more work-centric the place tends to be. But that is a whole different subject!)
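The preprocessing step described above can be sketched roughly as follows. This is only an illustration using the field names from the text; the sample rows, the timestamp format and the helper name `hourly_call_counts` are assumptions, not the actual pipeline.

```python
from collections import Counter
from datetime import datetime

# Hypothetical CDR rows:
# (origin_phone, target_phone, time, call_duration, antenna_id)
cdr_rows = [
    ("555-0001", "555-0002", "2012-01-02 09:15:00", 120, "A1"),
    ("555-0003", "555-0001", "2012-01-02 09:40:00", 35, "A1"),
    ("555-0002", "555-0004", "2012-01-02 10:05:00", 300, "A1"),
    ("555-0004", "555-0003", "2012-01-02 09:50:00", 60, "B7"),
]

def hourly_call_counts(rows):
    """Count calls per (antenna_id, hourly timeslot)."""
    counts = Counter()
    for _origin, _target, time_str, _duration, antenna_id in rows:
        # Assumed timestamp format; real CDR exports will differ.
        ts = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        # Truncate to the start of the hour to get the timeslot.
        timeslot = ts.replace(minute=0, second=0, microsecond=0)
        counts[(antenna_id, timeslot)] += 1
    return counts

counts = hourly_call_counts(cdr_rows)
for (antenna_id, timeslot), number_of_calls in sorted(counts.items()):
    print(antenna_id, timeslot.isoformat(), number_of_calls)
```

Note that hours with no calls simply do not appear in the result; depending on the analysis, one might want to fill them in with zeros.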

Figure 2

Once we have every antenna’s activity, we need to define some sort of index that separates routine from unexpected activity. This can be done by subtracting each antenna’s average weekly activity from its observed activity. Figure 2 represents our “Event Index” value for the same antennas. Notice how clearly the activity peaks stand out, making it easy to separate unusual events from the daily routine.
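One possible reading of this index, as a minimal sketch: for each antenna, average the call counts over every occurrence of the same weekday and hour, then subtract that weekly profile from each observation. The function name `event_index` and the sample numbers are hypothetical, and the exact definition used for Figure 2 may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def event_index(counts):
    """Subtract each antenna's average weekly profile from its hourly counts.

    counts maps (antenna_id, timeslot) -> number_of_calls, where timeslot is
    an hourly datetime. The weekly profile is the mean count for each
    (antenna_id, weekday, hour) combination.
    """
    totals = defaultdict(float)
    seen = defaultdict(int)
    for (antenna_id, ts), n in counts.items():
        key = (antenna_id, ts.weekday(), ts.hour)
        totals[key] += n
        seen[key] += 1
    index = {}
    for (antenna_id, ts), n in counts.items():
        key = (antenna_id, ts.weekday(), ts.hour)
        index[(antenna_id, ts)] = n - totals[key] / seen[key]
    return index

# Hypothetical data: one antenna, same weekday and hour over four
# consecutive weeks, with an unusually busy third week.
first_slot = datetime(2012, 3, 3, 21)
counts = {
    ("A1", first_slot + timedelta(weeks=k)): n
    for k, n in enumerate([100, 110, 400, 90])
}
index = event_index(counts)
for (antenna_id, ts), score in sorted(index.items()):
    print(antenna_id, ts.isoformat(), score)
```

Because the busy week is included in its own average, a large event slightly lowers its own score; a median or a profile that leaves out the slot being scored would be more robust alternatives.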

Having defined our “Event Index”, it is now possible to pinpoint events according to a given threshold. After selecting the events with the highest combined score, we tried to verify whether they actually made sense.
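A simple way to picture the thresholding step is to keep only the timeslots whose index exceeds a cutoff and rank them by score. The sketch below assumes a plain absolute threshold and made-up index values; how the scores were actually combined is not specified here, so that part is left out.

```python
def detect_events(index, threshold):
    """Return (antenna_id, timeslot, score) tuples for every slot whose
    Event Index exceeds the threshold, highest score first."""
    hits = [
        (antenna_id, timeslot, score)
        for (antenna_id, timeslot), score in index.items()
        if score > threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

# Hypothetical Event Index values keyed by (antenna_id, timeslot label).
index = {
    ("A1", "2012-03-17 21:00"): 225.0,
    ("A1", "2012-03-10 21:00"): -65.0,
    ("B7", "2012-03-24 16:00"): 310.0,
    ("B7", "2012-03-24 17:00"): 40.0,
}

# The threshold value is arbitrary and would need tuning on real data.
events = detect_events(index, threshold=100.0)
for antenna_id, timeslot, score in events:
    print(antenna_id, timeslot, score)
```

An absolute cutoff favors busy antennas; scaling the index by each antenna's typical variation would be one way to compare quiet and busy locations on equal terms.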

Truth be told, the road from idea to reality tends to be quite rough on data scientists. Usually, what seems like a great idea after a quick test ends up constrained by some unexpected data inadequacy, making the work more about preprocessing the data than about properly analyzing it. Luckily, after searching the web for the time and place of the events, we got rather interesting results. Figure 3 and Figure 4 each show one antenna’s hourly activity and event index.

Figure 3

In Figure 3, one can see successive peaks occurring in March 2012. It turns out that these correspond to a series of Roger Waters concerts that took place in River Plate’s soccer stadium, just a few meters away from the antenna’s location.

Figure 4

In Figure 4, it is possible to distinguish a clear peak on March 24th. This antenna is located near Plaza de Mayo, the most emblematic square in Buenos Aires, and the date corresponds to the Day of Remembrance for Truth and Justice; hence, we are looking at a political rally.

Examples such as these point to the richness of Telcos’ data and how it might be used in exciting new ways. Automatically inferring an event’s semantics is a whole different subject, but there are alternatives that might be useful for characterizing users. One, for example, is to link two events if the same user attended both. In this way, one could construct an “Event Network” and find clusters of both events and users. Checking into an antenna at a given time conveys detailed information about a user’s tastes and preferences. Eventually this might be turned into a useful new product that generates detailed user profiles. These profiles are useful not only to Telcos, but also to banks, retailers and many other industries.
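As a rough illustration of the “Event Network” idea, the sketch below links two events whenever at least one user shows up at both, and weights each link by the number of shared users. The attendance pairs, the identifiers and the function name `build_event_network` are hypothetical; in practice “attending” would have to be inferred, for instance from a call routed through the event’s antenna during the event’s timeslot.

```python
from collections import defaultdict
from itertools import combinations

def build_event_network(attendance):
    """Link two events when at least one user attended both.

    attendance is an iterable of (user_id, event_id) pairs. The result maps
    each event pair (a sorted tuple) to the number of users they share.
    """
    events_by_user = defaultdict(set)
    for user_id, event_id in attendance:
        events_by_user[user_id].add(event_id)

    edges = defaultdict(int)
    for events in events_by_user.values():
        # Every pair of events attended by the same user gets one more
        # shared attendee.
        for a, b in combinations(sorted(events), 2):
            edges[(a, b)] += 1
    return dict(edges)

# Hypothetical attendance data.
attendance = [
    ("u1", "concert_1"), ("u1", "concert_2"),
    ("u2", "concert_1"), ("u2", "concert_2"), ("u2", "rally"),
    ("u3", "rally"),
]
network = build_event_network(attendance)
for (event_a, event_b), shared_users in sorted(network.items()):
    print(event_a, event_b, shared_users)
```

The resulting weighted edges could then be fed to any community-detection method to look for clusters of related events.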

Anyone with a trained business instinct won’t take long to realize that this approach to processing and analyzing data is potentially a game changer. It is the equivalent of the difference between Blockbuster and Netflix: Telcos can move from a static business model to a dynamic, information-driven one, or risk being overtaken by companies that do make this transition. Although they might feel comfortable today, they are exposing themselves to innovative competitors that may be just around the corner. The problem is that, as Blockbuster realized too late in the game, Telcos might not have enough time to change once they find themselves competing against companies that were born in an information market. The sooner they start tapping the richness of their data, the fitter they will be when that time comes. As Lewis Carroll wrote, “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!”