Know the Unknown: Discover More from the Data at Hand with Machine Learning

Madura Pradeep
Published in ShoutOUT Blog
4 min read · Sep 12, 2017

A machine-learning approach to deriving new attributes from the existing attributes of people

Let’s start this way. Take a look at the following names and see what you can guess about them.

Disclaimer: If you are seeing these names for the first time in your life, don’t worry, because I have included names that are bound to my context. Just think about any name that comes to your mind and try the exercise.

  • Adriana
  • Ashen
  • Gunapala
  • Aishwarya
  • Jibbran

⏳ ⏳ ⌛ ⌛ ⏰ ⏰ 😃 😃

Okay. Time’s up. Here are my guesses.

  • Adriana — Spanish, Female
  • Ashen — Sri Lankan, Male, Age 16–30, Christian
  • Gunapala — Sri Lankan, Male, Age above 45, Buddhist
  • Aishwarya — Indian, Female, Hindu
  • Jibbran — Arabian, Male, Muslim

Just by looking at the names, we now know quite a lot about these people. Moreover, if you are familiar with the context, you will be able to guess even better, just like how I (being a Sri Lankan familiar with those names ☺️) guessed the age groups of “Ashen” and “Gunapala”. So, it’s clear that we can easily guess and find patterns using our general knowledge. However, it is not an easy task at all for computers. To teach these dumb machines, you need Machine Learning techniques.

I’m sure you know how important details like these are for marketers. We, at ShoutOUT, continuously research how we can make our tools more intelligent and user-friendly in order to serve our customers better. This is one such research project currently being carried out at ShoutOUT Labs.

Okay, let’s dive in. This part will be a little boring for the non-techies, but don’t you worry, you can still read on and get the idea ☺️.

Initially in our research, we looked at finding a person’s country when only the mobile number is available. Now that sounds like a really, really easy task, as you can extract the country code from the mobile number within a second.

But, no! It’s not as easy as it sounds.

People provide mobile numbers in different formats, for example, 94778123456, 0778123456, 778123456, +940778123456 and 94-0778123456. All five are the same Sri Lankan mobile number, just in different formats. At first, we wanted to write simple rule sets with data cleanup for this classification. Yes, we can, but we would end up with around 40 to 50 rules for Sri Lanka alone. That means that to cover all the formats in all the countries, we would probably need more than 10,000 or even 100,000 rules. Who will volunteer to write such a large number of rules? (Not me, because 🎶 I am lazy and I know it 🎶😉 ).
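To give a feel for what even one country’s rules look like, here is a minimal sketch that handles just the five formats above. The function name `normalize_lk` is made up for illustration, and it assumes a Sri Lankan mobile number is 9 digits starting with 7 once the country code and leading 0 are removed. It is not the rule set we actually wrote.

```python
import re


def normalize_lk(raw):
    """Return the number as 94XXXXXXXXX, or None if no rule matches.

    Hypothetical helper: assumes a Sri Lankan mobile number has
    9 national digits starting with 7.
    """
    digits = re.sub(r"\D", "", raw)   # drop '+', dashes, spaces
    if digits.startswith("94"):
        digits = digits[2:]           # strip the country code
    if digits.startswith("0"):
        digits = digits[1:]           # strip the leading 0
    if len(digits) == 9 and digits.startswith("7"):
        return "94" + digits
    return None


samples = ["94778123456", "0778123456", "778123456",
           "+940778123456", "94-0778123456"]
print({normalize_lk(s) for s in samples})
```

Every sample collapses to the same value, but notice how many assumptions are already baked in for a single country. Multiply that by every country and every odd format people come up with, and the rule count explodes.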

So, we tried out several classification techniques and ended up with a feature-based classifier. The features we considered were the first few digits and the length of the mobile number. We trained our classifier on Sri Lankan and Nigerian mobile numbers, as we have data sets for those countries, and it achieved more than 90% accuracy. If we use a Recurrent Neural Network (RNN), maybe we can achieve even higher accuracy (BTW, we need to find that out. ☺️ ). Furthermore, if we combine the results of the mobile number classification with the results of the name-based classification, we may be able to achieve an accuracy close to 100%.
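Here is a rough sketch of the idea, using only the two features mentioned above. The tiny naive Bayes classifier is picked purely for illustration (this post does not say which classifier we settled on), the class name `TinyNaiveBayes` is hypothetical, and the training numbers are made-up toy samples, not our real data set.

```python
import math
from collections import Counter, defaultdict


def features(number):
    """The two features from the post: first few digits and length."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return {"prefix": digits[:3], "length": len(digits)}


class TinyNaiveBayes:
    """Illustrative categorical naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, name) -> value counts
        self.values = defaultdict(set)              # name -> values seen in training

    def fit(self, samples):
        for number, label in samples:
            self.label_counts[label] += 1
            for name, value in features(number).items():
                self.feature_counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, number):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in features(number).items():
                seen = self.feature_counts[(label, name)][value]
                vocab = len(self.values[name]) + 1  # +1 slot for unseen values
                score += math.log((seen + 1) / (count + vocab))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (made up for this example).
training = [
    ("94778123456", "LK"), ("0778123456", "LK"), ("778123456", "LK"),
    ("+940778123456", "LK"), ("0712345678", "LK"), ("94712345678", "LK"),
    ("2348031234567", "NG"), ("08031234567", "NG"), ("+2348051234567", "NG"),
    ("8031234567", "NG"), ("07031234567", "NG"), ("2347031234567", "NG"),
]
model = TinyNaiveBayes().fit(training)
print(model.predict("94771234567"), model.predict("+2348091234567"))
```

The nice part is that no one has to write a rule per format. The classifier picks up which prefixes and lengths go with which country from the examples it is given.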

Name-based classification is something we are working on right now, and as a hint, I’ll say that we will probably start with a Recurrent Neural Network based classifier (who knows? 😄).

The next major aspect we will be focusing on is the possibility of finding out the name or age from the email address of the customer (Wait, what? 😮 Yes, you heard me correctly). As we all know, many people create their email addresses by combining their name and birth year. So, by extracting those details from a particular email address, we can get the customer’s name and also calculate their age.
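As a minimal sketch of that extraction (a simple heuristic I made up for this post, not our actual implementation), assume the part before the `@` contains name pieces separated by dots, underscores or digits, plus an optional four-digit birth year. The function name `guess_from_email` and the sample addresses are hypothetical.

```python
import re


def guess_from_email(email, current_year):
    """Guess name, birth year and age from the local part of an email.

    Heuristic only: treats the first 19xx/20xx run as a birth year
    and every run of letters as a piece of the name.
    """
    local = email.split("@", 1)[0]

    match = re.search(r"(19|20)\d{2}", local)
    year = int(match.group()) if match else None
    if year is not None and year > current_year:
        year = None  # a year in the future cannot be a birth year

    parts = [p.capitalize() for p in re.split(r"[^A-Za-z]+", local) if p]
    return {
        "name": " ".join(parts) or None,
        "birth_year": year,
        "age": current_year - year if year is not None else None,
    }


print(guess_from_email("ashen.perera1992@example.com", 2017))
print(guess_from_email("cooldude99@example.com", 2017))
```

The first address yields a name, a birth year and an age, while the second only gives a rough name. Plenty of addresses will not follow the pattern at all, so guesses like these are best treated as hints rather than facts.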

You might be wondering now how, as a customer of ShoutOUT, this research will benefit your business.

Data enrichment of your contacts! You can create more segments and target them better even if you have only a few details about your customers initially.

You will be able to create a campaign specifically for men, even if you have only the name and the mobile number/email address of your customers when you import the contacts, because we will find out the gender from the customer name.

Sounds awesome, right? (Yep, it does! I know! 😄)
