A New Model to Predict What Your Customer May Do or Buy Next

Marcos Carvalho
Published in Hapibot Studio
Jan 10, 2017 · 6 min read


Predicting what a customer is most likely to buy next is a powerful advantage companies dream of. It would allow them to forecast stock replenishment more easily and to build the right marketing mix, from promotions to experiences targeted at each individual customer, with the ultimate aim of positively influencing their purchasing behaviour (acting as a facilitator).

However, predictive analytics isn’t some far-off artificial intelligence wish. It’s a real-world capability that companies can leverage today, with a broad range of data mining approaches to choose from.

Current Approaches

The classical direct-marketing RFM (recency, frequency, monetary) approach ranks customers by how recently they purchased, how frequently they buy and how much money they spend. The subsequent iterative process of selecting these purchasing variables and testing their impact on the outcome requires collecting a considerable amount of attribute information. It also demands significant computational power and incurs the high time complexity common to these algorithms.
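The RFM ranking described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions. The transaction log is invented. Practitioners often use quintile scores, but here each dimension is scored by simple rank position and the three positions are summed.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2016, 12, 20), 120.0),
    ("c1", date(2016, 11, 2), 80.0),
    ("c2", date(2016, 6, 15), 300.0),
    ("c3", date(2016, 12, 28), 15.0),
    ("c3", date(2016, 12, 1), 25.0),
    ("c3", date(2016, 10, 9), 10.0),
]

def rfm_table(transactions, today):
    """Raw recency (days since last purchase), frequency (purchase count)
    and monetary value (total spend) per customer."""
    stats = {}
    for customer, day, amount in transactions:
        s = stats.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        s["last"] = max(s["last"], day)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        c: {"recency": (today - s["last"]).days,
            "frequency": s["frequency"],
            "monetary": s["monetary"]}
        for c, s in stats.items()
    }

def rank_customers(rfm):
    """Score each dimension by rank position (1 = best) and sort by the summed score."""
    customers = list(rfm)
    orderings = [
        sorted(customers, key=lambda c: rfm[c]["recency"]),     # most recent first
        sorted(customers, key=lambda c: -rfm[c]["frequency"]),  # most purchases first
        sorted(customers, key=lambda c: -rfm[c]["monetary"]),   # highest spend first
    ]
    score = {c: 0 for c in customers}
    for ordering in orderings:
        for position, c in enumerate(ordering, start=1):
            score[c] += position
    return sorted(customers, key=lambda c: score[c]), score

rfm = rfm_table(transactions, today=date(2017, 1, 10))
ranking, score = rank_customers(rfm)
print(ranking)  # ['c3', 'c1', 'c2']
```

Every change to the variables or their weighting means re-running and re-validating a table like this, which is where the iterative cost comes from.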

The exponential growth of data sets has also demanded more rigorous sampling strategies, as traditional systems haven’t kept up with the computational needs of Big Data predictive analytics.

Most of the cross-selling software used by companies relies on association rules. These can number in the thousands, which makes it extremely difficult for the marketer or data analyst to predict the next item each customer will buy. To implement cross-selling strategies, we want an algorithm that can discover the next item while reducing the work required from the company and keeping time complexity low.
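A standard counting argument from the association-rule literature (not from the original article) shows how fast rule counts grow. With d distinct products, each product can go into the antecedent X, the consequent Y, or neither. That gives 3^d assignments. Removing those where X or Y is empty, by inclusion-exclusion, leaves the number of possible rules X → Y:

$$
R = 3^{d} - 2 \cdot 2^{d} + 1 = 3^{d} - 2^{d+1} + 1
$$

Six products already give R = 729 − 128 + 1 = 602. Twenty products give R = 3,484,687,250, and that is before any support or confidence filtering.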

New Approach (The Ramex Algorithm)

The Ramex Algorithm is a new approach based on sequential pattern mining, that is, the discovery of sequential patterns in very large databases.

Developed by our data scientist, Luís Cavique, this algorithm allows companies to include their entire product range when analyzing their customers’ purchase behaviour. The Ramex Algorithm is also designed for scalability and for simple visual representations of purchasing data sequences.

How the Ramex Model Works

As a sequential technique, Ramex uses time-based analysis to extract useful information. Given a set of data sequences, the purpose of the Ramex algorithm is to identify the most frequent paths among events ‘a’ to ‘h’. Table 1 below shows data sequences of events ‘a’ to ‘h’ with their respective frequencies.

Table 1 — data sequences
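To make the input concrete, here is a minimal Python sketch of how data sequences with frequencies might be stored and accumulated into a weighted network of events. The sequences are invented, because Table 1's values are not reproduced here. We also assume that each edge x → y is weighted by the total frequency of the sequences in which x is immediately followed by y.

```python
from collections import Counter

# Hypothetical toy data, NOT the contents of Table 1: each entry is
# (sequence of events, number of customers who followed that sequence).
data_sequences = [
    (("a", "b", "c"), 5),
    (("a", "b", "d"), 3),
    (("b", "c", "a"), 2),
    (("e", "f"), 4),
]

def accumulate(sequences):
    """Phase-1 sketch (our assumption): add every consecutive transition x -> y
    to the edge weights, weighted by how often the whole sequence occurred."""
    weights = Counter()
    for events, frequency in sequences:
        for x, y in zip(events, events[1:]):
            weights[(x, y)] += frequency
    return weights

edge_weights = accumulate(data_sequences)
for (x, y), w in sorted(edge_weights.items()):
    print(f"{x} -> {y}: {w}")
```

Even this tiny example already contains a cycle (a → b → c → a), which is the kind of network the next step has to simplify.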

The Ramex algorithm takes the raw data sequences as input and outputs the most weighted polytree. Think of it as the most weighted path structure in the graph that can visit every other node. It works in two phases. In phase 1, the data sequences are accumulated into a weighted network of nodes. In phase 2, the algorithm searches that network for the most weighted polytree of events.
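Phase 2 can be sketched as a search for the heaviest polytree in a weighted network. A polytree is a directed graph whose underlying undirected graph has no cycles. One straightforward way to find the heaviest one is therefore a Kruskal-style maximum spanning tree over the undirected skeleton, keeping each chosen edge's original direction. This is our own stand-in for illustration; Cavique's paper describes the actual Ramex procedure, which may differ. The edge weights below are hypothetical.

```python
# Hypothetical phase-1 output: (from_event, to_event) -> weight.
edge_weights = {
    ("a", "b"): 8, ("b", "c"): 7, ("c", "a"): 2,
    ("b", "d"): 3, ("d", "c"): 1, ("c", "e"): 4,
}

def heaviest_polytree(edge_weights):
    """Kruskal-style sketch: scan directed edges from heaviest to lightest and keep
    an edge only if it does not close a cycle in the underlying undirected graph.
    Kept edges retain their direction, so the result is a polytree
    (or a forest of polytrees if the network is disconnected)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    kept = []
    for (x, y), w in sorted(edge_weights.items(), key=lambda item: -item[1]):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:  # joining two components cannot create a cycle
            parent[root_x] = root_y
            kept.append((x, y, w))
    return kept

tree = heaviest_polytree(edge_weights)
print(tree)
print("total weight:", sum(w for _, _, w in tree))
```

Sorting dominates the cost, so this sketch runs in O(E log E) time for E edges. That is polynomial, in contrast to enumerating patterns.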

Applied to the data sequences above, out-of-the-box methods such as AprioriAll or FP-Growth would eventually produce the cyclic network shown in Figure 1. For large graphs, this kind of visualization quickly becomes very difficult to read, so we must reduce the number of edges.
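A quick count shows why reducing edges matters. For n events, a directed network without self-loops can have up to n(n − 1) edges. A polytree that spans all n events of a connected network has exactly n − 1 edges, because its undirected skeleton is a tree. For the eight events ‘a’ to ‘h’:

$$
n(n-1) = 8 \cdot 7 = 56 \qquad \text{vs.} \qquad n - 1 = 7
$$

With 1,000 events the gap becomes 999,000 possible edges versus 999.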

Figure 1 — original network

AprioriAll and FP-Growth require exponential time to run. On small instances these algorithms perform well, but on large instances they may spend months, years or even centuries completing the task.

Figure 2 shows the result of applying the Ramex algorithm: the polytree solution acts as an X-ray of the network, revealing its skeleton.

Figure 2 — polytree solution

Ramex scales better than methods such as AprioriAll and FP-Growth because it uses a polynomial-time algorithm and avoids enumerating all sub-sequences.
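The cost of enumerating sub-sequences is easy to quantify. A sequence of n events has 2^n − 1 non-empty sub-sequences (order preserved, gaps allowed), so a method that must enumerate them in the worst case grows exponentially. The standard-library sketch below enumerates them for a short sequence. It then contrasts that count with the n − 1 consecutive transitions that a single pass over the sequence visits.

```python
from itertools import combinations

def all_subsequences(events):
    """Yield every non-empty subsequence, preserving order and allowing gaps."""
    for k in range(1, len(events) + 1):
        yield from combinations(events, k)

sequence = ("a", "b", "c", "d")
subsequences = list(all_subsequences(sequence))
transitions = list(zip(sequence, sequence[1:]))  # consecutive pairs only

print(len(subsequences))  # 15, i.e. 2**4 - 1
print(len(transitions))   # 3, i.e. 4 - 1

# Growth of the sub-sequence count for longer sequences.
for n in (10, 20, 40):
    print(n, 2**n - 1)
```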

If you want to explore the Ramex algorithm further and get down to the nitty-gritty, check out Luís Cavique’s scientific article here.

Advantages of the Ramex Algorithm

The Ramex model returns solutions where all the items can be visualized in a polytree structure. The properties of the Ramex algorithm are as follows:

i) No parameters needed: Most sequence-detection algorithms use parameters such as minimum support and minimum confidence, e.g. “run AprioriAll (min_support = 5%, min_confidence = 60%)”, to prune the search space. This inevitably creates a trade-off between efficiency and the quality of the results. With Ramex, the user simply executes “run Ramex()” without setting any parameters.
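Here is a hypothetical Python sketch showing what parameters like min_support and min_confidence do. It mines only two-item sequential rules (x bought before y) from invented customer sequences. It is not AprioriAll, which mines longer sequential patterns, but it shows how the output depends on the thresholds the user picks.

```python
from itertools import permutations

# Hypothetical customer purchase sequences (one per customer).
sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "c", "d"],
    ["a", "b", "d"],
    ["c", "d"],
]

def contains_before(seq, x, y):
    """True if y occurs somewhere after the first occurrence of x."""
    return x in seq and y in seq[seq.index(x) + 1:]

def sequential_rules(sequences, min_support, min_confidence):
    """Keep rules x => y whose support and confidence pass both thresholds."""
    n = len(sequences)
    items = sorted({item for seq in sequences for item in seq})
    rules = []
    for x, y in permutations(items, 2):
        support_x = sum(x in seq for seq in sequences) / n
        support_xy = sum(contains_before(seq, x, y) for seq in sequences) / n
        confidence = support_xy / support_x
        if support_xy >= min_support and confidence >= min_confidence:
            rules.append((x, y, support_xy, confidence))
    return rules

strict = sequential_rules(sequences, min_support=0.05, min_confidence=0.6)
loose = sequential_rules(sequences, min_support=0.05, min_confidence=0.3)
print(len(strict), len(loose))
```

On this toy data, lowering min_confidence from 60% to 30% grows the rule set from 4 to 6 rules. Choosing those thresholds is the tuning step that Ramex aims to remove.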

ii) Scalable and incremental: Unlike other algorithms, our approach does not carry out an exhaustive search. The procedure returns a polytree in polynomial time, which makes it highly scalable. Since the data is transformed into weights in the graph, new events can be added incrementally.
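Because the network is just a table of edge weights, an incremental update amounts to adding the new batch's transitions to the existing counts. In the hypothetical sketch below, we assume for illustration that weights count consecutive transitions. Folding in a new batch then gives the same weights as rebuilding from all the data at once.

```python
from collections import Counter

def add_sequences(weights, sequences):
    """Fold new (events, frequency) pairs into existing edge weights in place."""
    for events, frequency in sequences:
        for x, y in zip(events, events[1:]):
            weights[(x, y)] += frequency
    return weights

# Hypothetical batches of data sequences arriving on different days.
monday = [(("a", "b", "c"), 5), (("a", "b", "d"), 3)]
tuesday = [(("b", "c", "a"), 2), (("a", "b"), 1)]

incremental = add_sequences(Counter(), monday)
incremental = add_sequences(incremental, tuesday)   # only the new batch is processed

batch = add_sequences(Counter(), monday + tuesday)  # rebuilding from scratch
print(incremental == batch)     # True
print(incremental[("a", "b")])  # 9
```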

iii) Good visualization of all elements: The most popular software packages tend to generate a large number of rules, and the global view of the data is lost. In our approach, all items are taken into account, and the heaviest polytree gives an overall X-ray view of the event sequences.
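The article does not say how its figures were drawn. As one possible approach (our assumption), a polytree can be exported to Graphviz DOT text and rendered with the `dot` tool. Here is a minimal sketch with hypothetical edges, where edge thickness is scaled by weight:

```python
def to_dot(edges, name="polytree"):
    """Render (source, target, weight) edges as Graphviz DOT text."""
    lines = [f"digraph {name} {{", "  rankdir=LR;"]
    for x, y, w in edges:
        # Thicker lines for heavier (more frequent) transitions.
        lines.append(f'  "{x}" -> "{y}" [label="{w}", penwidth={1 + w / 4:.1f}];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical polytree edges, e.g. the output of a phase-2 search.
polytree = [("a", "b", 8), ("b", "c", 7), ("c", "e", 4), ("b", "d", 3)]
dot = to_dot(polytree)
print(dot)
```

Saving the output to a file and running `dot -Tpng` on it produces an image in which every item appears exactly once.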

Conclusion

The Ramex algorithm has already been applied to discover data sequences in different domains, such as purchasing sequences, web mining, financial markets and social networks.

Purchasing sequences: the most frequent purchasing paths.

Figure 3 — purchasing sequences

Web mining: click-stream analysis produces a user-flow site map.

Figure 4 — click-stream analysis

Financial markets: refineries sit at the beginning of the oil industry chain, while the retail of oil derivatives sits at the end. The following figure clearly shows the influence of refinery oil prices on other prices.

Figure 5 — financial markets

Leveraging this predictive power means companies can truly understand their customers at an individual level or anticipate future market influences. It enables them to create personalized promotions or experiences for customers, or to easily forecast the stock replenishment required in stores.

Companies can unlock their transactional, product and customer data to identify and track subtle changes in customer or market behaviour patterns. This lets them predict the best actions to maximize customer engagement or generate business savings. They can then act on the algorithm’s results and automatically send the most relevant triggers via the channel most likely to convert customers or generate the greatest savings.

Want to learn more about this algorithm, or about how to successfully implement advanced predictive analysis solutions in your product or company? Contact us today.



Marcos Carvalho
Hapibot Studio

Starter. Eclectic. Director of Product Design at TeamViewer.