The Mysterious World of Machine Learning

BehaviourExchange (BEX)
8 min read · Jan 29, 2018

The BehaviourExchange project has been developed to solve two major online business challenges: B2C companies don't have enough traffic and, even more importantly, they don't know who their web visitors are. The solution we are talking about is the identification and profiling of online visitors, not as a group, but as individuals. This enables companies to actively engage with each individual consumer and show personalized content to each of them. Machine learning played a very important role in our development of profiling; however, for most of us it is still a very mysterious world. In the following article we will try to demystify it. Enjoy!

At the beginning of the automation era, computers did basic calculations with numbers for us. Soon it was discovered that such calculations can be nested and executed in a specific order to perform more complex tasks. With that, the first programs and algorithms were written.

Nowadays computers are more and more powerful, and complex tasks can be performed faster than ever.

The desire for computers to replace humans in solving everyday problems keeps growing. The challenge we face today is how to represent complex problems as tasks that computers can execute. In a way, computers work similarly to the human brain, especially in the field of problem solving.

People's ability to solve problems depends on the structure of the human brain, which is a huge collection of interconnected neurons. Neurons activate during thinking and sensing (when sensors such as the eyes, ears, nose and skin pass impulses to the brain). The structure of the human brain is not completely understood, but some of its properties are known. Even though neurologists still cannot completely explain how the human brain functions, computer science tries to imitate the most complex organ in the human body. Computer algorithms mimic the human brain particularly in the field of learning.

How is computer learning similar to human learning? Computer learning means forming connections between artificial neurons according to the labeled data passed to them. Learning happens in a very similar way in the human brain: as a mother or a teacher explains things to a child, connections between the neurons in the child's brain are formed. Computer science can go beyond simple learning by implementing self-learning algorithms, also known as machine learning. And the ability of computers to execute such algorithms is called artificial intelligence.

Artificial neural networks

And now back to the analogy between artificial neural networks and the human brain. The human brain contains neural cells, or neurons. The analogue of a neuron in computer science is a dedicated part of computer memory. Accepting a signal from a sensor (e.g. the eye or the ear) is analogous to passing a piece of input data into the neural network, i.e. writing some data into the aforementioned allocated part of memory. At the lowest level, data for computers is represented as a sequence of bits, i.e. zeros and ones. When a sequence of bits arrives at a neuron, that neuron either activates or not. If it activates, it can pass a signal to the neighbouring neurons. For an artificial neural network, "passing a signal to neighbouring neurons" means that another sequence of bits is calculated from the data currently stored in the neuron and sent as input to the connected neurons (stored in the memory allocated to those neighbouring neurons).
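
To make the analogy concrete, here is a minimal sketch of a single artificial neuron in Python. The weights, bias and step activation below are illustrative choices for this article, not the ones used in any particular system:

```python
# A minimal artificial neuron: it weighs its inputs, adds a bias, and
# "activates" (fires) only if the weighted sum crosses a threshold.
def neuron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    activated = weighted_sum > 0                  # step activation: fire or stay silent
    signal = weighted_sum if activated else 0.0   # the data passed to neighbouring neurons
    return activated, signal

# Example: a neuron with two inputs.
fired, out = neuron(inputs=[1.0, 0.0], weights=[0.75, -0.25], bias=-0.5)
print(fired, out)  # True 0.25 -- the signal sent onward
```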

The process of teaching an artificial neural network means setting properties for each neuron in the network. According to these properties, the neuron can decide (1) whether it activates on the given input data and, if it does, (2) how it calculates the data sent to the neighbouring neurons. There are many predefined families of options for the activation decisions and functions. Teaching or training an artificial neural network means passing it labeled data, or a learning set: input data in a form the network can characterise, along with the answers, i.e. the actual characteristics of that data. The neural network then uses the labeled data to pick appropriate activation decisions and activation functions from the pre-selected family. Once the neural network is trained, it can accept new (unlabelled) data, pass it through itself and return the characteristics it has learned to recognise.
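
As a sketch of what "teaching" looks like in practice, the snippet below trains a tiny neural network with scikit-learn (one library among many; the XOR data is a classic toy example, not our production setup):

```python
# Teaching a small neural network with labeled data (a toy sketch).
from sklearn.neural_network import MLPClassifier

# The learning set: input data together with the answers (labels).
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 1, 1, 0]  # XOR: a task a single neuron famously cannot learn

# One hidden layer of 8 neurons; fitting adjusts the connections.
net = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=1000, random_state=1)
net.fit(X_train, y_train)

# Once trained, the network accepts new (unlabelled) data.
print(net.predict([[1, 0]]))  # should print [1] once training has converged
```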

Example of text mining

If a person is given a collection of the newspaper articles or websites that some stranger reads, he can draw conclusions about that stranger. For example, if there are many articles about sports, the stranger is obviously interested in sports. If there are articles about concerts, theatre, museums and galleries, the stranger is interested in culture and is probably well educated. If the stranger reads about children's education, he or she probably has a child, and so on.

For the computer, there are several problems. First of all, a computer cannot know out of the box what a text is about. Second, it does not know how to map topics onto a person's interests. There is no way to implement such algorithms by hand: there are about 200,000 words in an English dictionary and each word can have several meanings depending on the context. For a "traditional" computer algorithm that would mean billions of "if this word is in the text then … else …" branches, which is unmanageable. There are simply too many ways a topic can be expressed in text for a programmer to write a topic-recognition algorithm from scratch.

That is where self-learning algorithms come into play. The above-mentioned imitation of the human brain, called an artificial neural network, can be used along with a collection of labeled texts. The neural network learns topic recognition from the labeled texts (connections between artificial neurons are formed), and later on it can accept a new text, pass it through the trained connections and determine its topic. The difference from traditional algorithms, which are finalised before any data flows through them, is that self-learning algorithms use labeled data to fit their parameters before being applied to new data. This is described below in more detail.

Topic recognition

We have the idea of an algorithm that learns by itself. But a computer still needs a well-defined input and a specified form of output. In our example, the input is some text. For the computer, the text is just a sequence of characters (which are in turn just sequences of bits), but we know that the content is hidden in words (or even in the context in which the words are used), not just in characters. Therefore, the raw text is prepared as a list of words (or a list of tuples of consecutive words, to also capture the context). And the output for the computer should be one of the predefined topics.
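
One common way to build such a representation is a "bag of words" of counts over single words and pairs of consecutive words. The sketch below uses scikit-learn's CountVectorizer on two invented example sentences, purely for illustration:

```python
# Turning raw text into word counts (bag of words), including pairs of
# consecutive words (bigrams) to capture some of the context.
from sklearn.feature_extraction.text import CountVectorizer

texts = ["the race was a great race", "the concert hall was full"]

vectorizer = CountVectorizer(ngram_range=(1, 2))  # single words and word pairs
X = vectorizer.fit_transform(texts)

print(vectorizer.get_feature_names_out()[:5])  # e.g. ['concert', 'concert hall', ...]
print(X.shape)  # (2 documents, number of distinct words and word pairs)
```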

For the purpose of efficient self-learning, we need enough example texts for each topic to be recognised. Imagine an intelligent ET visiting the Earth, and you would like to show him what people here are doing. You take ET to a few (or even hundreds of) bicycle races and explain to him that those races are sport. But based on that single example of a sport, ET will be unable to identify skiing or basketball as sports later on. It is the same for a learning algorithm: it needs a large and diverse set of texts in order for learning to be efficient. The more the better.

Furthermore, to increase the algorithm's performance (speed and accuracy), it is desirable to clean the text. There are lots of meaningless words in each text: articles like 'a' and 'the', and even auxiliary verbs, do not contribute to the topic. Also, a single word has several forms (the present, past, and future simple and continuous forms of verbs, for example) that increase the complexity of texts, even though the topic can be recognised from the basic form. Those tasks are part of text preprocessing.
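
As a sketch of such preprocessing, here is one way to drop meaningless "stop words" and reduce words to a basic form using NLTK, one common toolkit for this (not necessarily the one any given project uses):

```python
# Text preprocessing: remove stop words and stem each word to a basic form.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # one-time download of the word list

stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    words = text.lower().split()
    return [stemmer.stem(w) for w in words if w not in stop_words]

print(preprocess("The cyclists were racing through the mountains"))
# ['cyclist', 'race', 'mountain']
```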

Once we collect enough labeled texts, preprocess them and choose a self-learning algorithm, we are ready to start the teaching process. After some time, the algorithm finishes, saying 'Hey, I have finished my training!'. And just as children are tested in school, self-learning algorithms need to be tested, too. Usually, a part of the labeled dataset (texts) is excluded from the teaching process and used for test purposes: each text is passed to the algorithm and the output it produces is compared to the actual label that came along with the text. Using this held-out part of the labeled set, we can report how well our algorithm has learned, e.g. as the proportion of correctly classified texts.
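
The whole idea fits in a few lines. The sketch below holds out half of a (far too small, purely illustrative) labeled set, teaches a simple scikit-learn text classifier on the rest, and reports the proportion of correctly classified test texts; in reality, as said above, you would need many more texts:

```python
# Hold out part of the labeled texts, train on the rest, and test.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["great match and a fast race", "the team won the cup",
         "new exhibition at the gallery", "a concert at the opera house"]
topics = ["sports", "sports", "culture", "culture"]

# Exclude a part of the labeled set from teaching; keep it for testing.
X_train, X_test, y_train, y_test = train_test_split(
    texts, topics, test_size=0.5, random_state=0, stratify=topics)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(X_train, y_train)  # the teaching process

# Proportion of correctly classified held-out texts.
print(accuracy_score(y_test, model.predict(X_test)))
```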

After the hard work of training, interpretation follows. Many algorithms work as a black box, like the human brain: they are designed to learn, but programmers and data scientists cannot completely interpret the inner structure of the trained algorithm, just as the individual connections between neurons in the human brain are impossible to interpret. If the algorithm scores badly, usually either the learning set is bad (too small, not cleaned well enough), some algorithm parameters should be tuned, or the algorithm should be replaced with something else. Deciding what to do is a kind of data science magic; there is no general recipe, but there are diagnostic tools that can help us decide what to try next.

Human profiling


Here at BehaviourExchange, we are trying to characterise people based on what they read. In that case, the input is a collection of texts and the outputs are demographic or psychographic characteristics. The only thing to keep in mind is that the output should depend on the input; otherwise the learning is just a random formation of connections that cannot perform well on new people being profiled. For example, a father's shoe size is probably not hidden in the texts an individual is reading. But the interests of that individual probably are. And maybe even income, place of residence or education level can be predicted, up to some confidence level, from the texts that the individual is reading.
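
As a purely hypothetical sketch of the idea (not our actual model, and with invented data and an invented has_child label), the same machinery as above can predict a single trait from a person's reading, with predict_proba giving the confidence level mentioned above:

```python
# Hypothetical sketch: predict one demographic trait from texts a person reads.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One training example per person: all their texts joined together,
# labeled with a known trait (invented data, purely for illustration).
readers = ["school tips homework kids playground",
           "quarterly earnings stock market portfolio"]
has_child = [1, 0]

profiler = make_pipeline(CountVectorizer(), LogisticRegression())
profiler.fit(readers, has_child)

# Probabilities rather than a bare yes/no: a confidence level.
print(profiler.predict_proba(["kindergarten lunch ideas for kids"]))
```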

To keep track of the amazing work BehaviourExchange is doing, follow the project on Telegram.

https://t.me/behaviourexchange


BehaviourExchange (BEX)

Utilizing token economy and Blockchain technology to identify each online visitor and customize website content in real-time.