Know Thy Customer: Insights from a VimpelCom engineer’s efforts to predict the rain in Milan using TensorFlow and TFLearn

VEON Careers
6 min read · Oct 20, 2016


We place our customers at the center of our universe. We are customer obsessed.

That’s a core belief we have here at VimpelCom. As a data engineer, every project I undertake is built around knowing and serving the customer better. In fact, it’s what motivates our entire team each day.

It’s also one of the main reasons we hold hackathons. Yes, hackathons are fun for us data scientists. But they accomplish much more.

At the last hackathon VimpelCom held, I wanted to come up with a way to predict behavior among our more than 200 million customers. And I remembered when a data science colleague asked me if I thought it was possible to use call detail record (CDR) data to predict if it was raining in Milan. This idea intrigued me and I was up for the challenge.

For the hackathon, we used public data for the city of Milan from November through December 2013. The plan was to feed this data to a neural network and train a model that could predict behavior. Sounds awesome, right?

Why we did this hackathon

When we do hackathons at VimpelCom, we want to learn how to be better at our jobs. This way, we can serve our customers in a more efficient and personalized way.

Currently, our data science teams are working hard to apply logic and order to the vast amounts of data we have. We are vigorously researching and applying various segmentation models, graph and clustering methods, and predictive modeling. Hackathons help us gain insights into how to more effectively analyze data and make it presentable.

There are several data visualization tools on the market, and organizing hackathons gives us a way to thoroughly compare and test them.

For this hackathon, the assignment was to conceptualize our own machine learning idea or product, and then figure out how it could solve a business problem for Beeline Russia. For my team, the idea was to create a model that could predict behavior.

To gain visual insights, we focused on assessing the functionalities of Tableau and TensorFlow (an open source software library for machine intelligence). Needless to say, I was excited.

Our VimpelCom team consisted of data scientists from Beeline in Russia and the headquarters in Amsterdam. Here’s how it went.

Understanding the potential

The telecommunications industry is influenced by weather. What happens during rain could tell us much about our customers, because weather has a huge impact on what people do.

Folks tend to go food shopping before inclement weather, for instance. Being able to predict the weather could help us better understand when to make offers to customers.

When it’s raining, internet usage goes up (our data here shows that). But what if some customers aren’t using the internet more when it’s raining in Milan? We could find ways to target those outliers and get them to go online.

Clearly, I saw much promise for this study. Unfortunately, I couldn't look at customer-level cases because the public data was aggregated every 10 minutes for each area in Milan. We still had CDR data, though. And if we could use that to accurately predict what's happening with the weather, we could use a similar model to accurately predict what's happening with our customers.

Choosing the data

As mentioned before, I focused on public CDR data in the Milan area. I also gathered public precipitation data.

The CDR data was already divided into a 100x100 grid, while the precipitation data was divided into four large squares. I aggregated the CDR data into those larger squares.

For every cell in the grid, SMS, internet and call data was aggregated every 10 minutes. Precipitation data was collected at the same 10-minute intervals for the four large squares. Basically, what I was looking at was how people in Milan use their devices when it's raining.

As a side note, wouldn’t it be funny if you were reading this while it’s raining? I’m actually writing this while it’s raining.

Anyway, this data was exactly what I needed to see whether we could tell if it was currently raining in a certain area of Milan. Obviously, the data had to be carefully aggregated, and I had to build a model capable of making use of this time-sensitive information.

Building the model

Now we get to the fun stuff: building the neural network architecture that can make predicting behavior possible.

The first step was data preparation in Python. Since the data was split over several files, a lot of merging was needed. After that, I matched the CDR data with the precipitation data. There was a lot of information here, so I aggregated it to get a clearer combined view of the two datasets.
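To make that step more concrete, here is a rough sketch of what the preparation could look like in pandas. The file and column names are hypothetical placeholders, and the mapping of grid cells to the four large squares is an assumption for illustration, not the actual schema of the public dataset.

```python
import glob
import pandas as pd

# The CDR data came split across several files (e.g. one per day),
# so the first step is to read and concatenate them.
cdr_files = glob.glob("cdr/sms-call-internet-mi-*.txt")   # hypothetical naming
cdr = pd.concat(
    (pd.read_csv(path, sep="\t") for path in cdr_files),
    ignore_index=True,
)

# Map each of the 10,000 grid cells to one of the four large squares
# used by the precipitation data (here: simple quadrants, an assumption).
def to_square(cell_id: int) -> int:
    row, col = divmod(cell_id - 1, 100)       # cells assumed 1..10000, row-major
    return 2 * int(row >= 50) + int(col >= 50)

cdr["square"] = cdr["cell_id"].apply(to_square)

# Aggregate activity per large square and 10-minute interval, then join
# with the precipitation data so every interval gets a rain/no-rain label.
cdr_agg = (
    cdr.groupby(["square", "time_interval"])[["sms", "calls", "internet"]]
       .sum()
       .reset_index()
)
rain = pd.read_csv("precipitation-mi.csv")    # hypothetical filename
merged = cdr_agg.merge(rain, on=["square", "time_interval"], how="inner")
merged["is_raining"] = (merged["intensity"] > 0).astype(int)
```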

The next step was actually looking at the data. For this I used Tableau. For over four days, I focused on one particular grid area, trying to find patterns. Unfortunately, I could not find any strong correlation, but that was perhaps because Tableau was only showing internet usage from the CDR data (and not SMS or call data).

Following this, I used TensorFlow to create a Long Short-Term Memory (LSTM) network, a recurrent neural network that can work with sequences of data. For example, it can learn that rain ten minutes ago probably means it's raining now.
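In practice that means the network consumes short sequences of consecutive 10-minute intervals rather than single data points. Here is a minimal sketch of that windowing step, with a one-hour window chosen purely for illustration:

```python
import numpy as np

def make_windows(features: np.ndarray, labels: np.ndarray, window: int = 6):
    """Pair each hour of history (6 ten-minute steps) with the rain label
    of the last step in the window."""
    xs, ys = [], []
    for end in range(window, len(features) + 1):
        xs.append(features[end - window:end])   # shape: (window, n_features)
        ys.append(labels[end - 1])              # 1 = rain, 0 = no rain
    return np.array(xs), np.array(ys)

# features: per-square SMS, call and internet counts per 10-minute slot
# labels:   rain / no-rain flag for the same slots
```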

The only problem was that, while I found the tutorials straightforward, it was hard to use TensorFlow for this project. I found the documentation for LSTM networks to be limited. There was simply no way to apply it to this case.

But it was not time to give up.

Solving the problem

I had already gotten to this point. And the whole process had been exhilarating. It wasn’t time to stop. At VimpelCom, we make our customers happy by doing just that: testing solutions to problems until one works.

I realized that TFLearn, a deep learning library for TensorFlow, could help make predicting if it’s raining possible. Here’s why:

  • TensorFlow is flexible, but complex. TFLearn made my life a lot easier by providing a higher-level API capable of easily training and testing neural network models.
  • TensorFlow has a steep learning curve. Conversely, TFLearn features plenty of “good enough” default settings so you don’t have to worry about every single detail.
  • TensorFlow’s tutorials are straightforward, but parameter choices aren’t explained. TFLearn’s code is very, very clear.

So, to successfully complete the model, I used TFLearn (a great new friend of mine!).
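To give an idea of what that looks like, here is a minimal TFLearn sketch of the kind of LSTM classifier described above. The layer size, window length and training settings are my illustrative assumptions, not the exact configuration from the hackathon.

```python
import tflearn

TIMESTEPS = 6    # one hour of 10-minute intervals (assumption)
N_FEATURES = 3   # SMS, call and internet counts per interval

net = tflearn.input_data(shape=[None, TIMESTEPS, N_FEATURES])
net = tflearn.lstm(net, 64)                                   # single LSTM layer
net = tflearn.fully_connected(net, 2, activation="softmax")   # rain / no rain
net = tflearn.regression(net, optimizer="adam",
                         loss="categorical_crossentropy")

model = tflearn.DNN(net)
# X: sequences of shape (n_samples, TIMESTEPS, N_FEATURES)
# Y: one-hot rain / no-rain labels
# model.fit(X, Y, n_epoch=10, validation_set=0.1, show_metric=True)
```

A few high-level layers replace the cell and graph plumbing that plain TensorFlow required at the time.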

Looking at the results (and looking ahead)

Unfortunately, the results from the model weren’t the best.

The model was trained on November data and tested on data from the first week of December. The accuracy on the test set was 0.98875.

While the model looked accurate, it always predicted no precipitation. It was right simply because there was very little rain in the public dataset for Milan during that period.
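A quick back-of-the-envelope check puts that number in context; the share of rainy intervals below is inferred from the reported accuracy, not taken from the original dataset.

```python
# With N test intervals of which R are rainy, a constant "no rain"
# prediction scores (N - R) / N. An accuracy of 0.98875 therefore
# implies that only about 1.1% of the test intervals were rainy.
accuracy = 0.98875
rainy_share = 1 - accuracy   # roughly 0.011, i.e. about 1.1% of intervals
print(rainy_share)
```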

I’ve concluded this could have happened for a number of reasons:

  • The network architecture may need to be changed.
  • The LSTM network was unnecessarily complicated.
  • Information was lost during data aggregation and processing.
  • Time periods weren't selected properly. Public data from the Big Data Challenge shows that rain levels weren't high in Milan from November to December 2013.

I believe this case can definitely be helped with more investigation. Perhaps finer-grained grid squares could improve the model, or the way the data is formatted for the network needs rethinking. Also, using public data from rainier time periods in Milan could certainly help in testing accuracy.

In the end, creating a model that can predict behavior is an incredible endeavor — one that can improve our capabilities with data and help us better reach our customers at VimpelCom. That’s why I’ll continue to put more effort into this study. If we can successfully build such architecture, we can expand the limits of what we can do with data. And we can achieve more for our customers.

