Predicting Users’ Intention When Searching on Online Marketplace Platforms

Figarri Keisha
Published in Bukalapak Data
Feb 25, 2020

Let’s meet Bill, who likes shopping on an online marketplace like Bukalapak. Sometimes he already knows what products to buy, so he uses Bukalapak’s search engine to compare the quality of goods, prices, and sellers. At other times, Bill is just “window shopping”: he searches for products without knowing exactly what he wants to buy. Bill thus has different intentions when using the search engine on a marketplace platform, so the search experience should adapt to his needs and serve him better.

As Data Scientists, we can quantify certain patterns in how users interact with an online marketplace’s search engine. Let’s dive into how we do so at Bukalapak.

Part 1. The Kick Off!

/ˈkikäf/ the start of an event or activity.

What user intention is, and why it matters.

Mapping users’ intention is a common issue in product search. Moe [1] proposed a framework for differentiating online shoppers by mining their in-store navigational clickstreams. The goal of Moe’s study is to categorize shopping sessions, not necessarily to map out the page-to-page decisions of shoppers. Using several features, Moe managed to cluster the shopping patterns into four big segments, namely directed buying, search/deliberation, hedonic browsing, and knowledge building.

Further research [2] shows that user interaction patterns are closely related to satisfaction, but that the detailed relation varies across different intents.

User satisfaction is a key concept in measuring search success. Since information search is an important stage of the buyer decision process, offering result lists that satisfy users’ needs can increase user loyalty and further promote sales.

While many shopping sites use purchases as a measure of user satisfaction, satisfaction is not always associated with purchase. Say Bill has a friend named Harry, and both want to buy an iPhone (Table 1). First, they have different intentions when using the search engine: Bill wants to purchase the iPhone, while Harry wants to investigate different price and model options. Bill was satisfied after one click, as he found the desired product at rank #2 of the results, yet he ended up not purchasing it. Harry, on the other hand, kept exploring results yet remained unsatisfied given his goal. In short, in the context of product search, user satisfaction depends on many complex factors, including users’ intentions, shopping habits, and so on. Given those two scenarios, direct monetization-related metrics (e.g. purchases) can misjudge the performance of a product search engine when they are used to measure user satisfaction.

Table 1. Example real-world search sessions from two users who issued the query “iPhone” to a product search engine [2].

How do things look on our platform?

Figure 1. Users’ intention on search, research by Bukalapak’s UX Researchers.

Our UX researchers have observed that users might either explore or locate their products of interest. We hypothesize that predicting users’ intention and adjusting the search strategy accordingly could lift users’ experience with our internal search engine. There is a whole spectrum between exploring and locating, but for simplicity we first focus on the two extremes: locating and exploring. The exploring segment is still discovering what product they want to buy, while the locating segment is more focused on how to get the best product, based on sellers or prices (Figure 1).

Part 2. Tackle

/ˈtak(ə)l/ make determined efforts to deal with (a problem or difficult task).

We propose an unsupervised learning model to predict whether a search keyword (query) belongs to the locating or the exploring segment, based on historical user behavior. Search strategies can then be adjusted to the predicted intention.

Why base the approach on queries? We believe queries are highly correlated with search behavior and are the main gateway through which users reach a search engine.

Figure 2. The schema to tackle the idea

Let’s start cutting it into pieces

To begin with, we build a clustering model for each query, with clustering features derived from users’ behavior within that query. This is a short code version of what we really do, but don’t worry, we also have the longer version :)

Figure 3. Short coded version of the Clustering Process

The first step is to take all queries within a week, keeping only those searched by at least 200 unique users in that predetermined range of time.

Why? First, we assume that trends in search behavior change sufficiently within a week. Second, since we decided to use users’ behavior as our features, we need queries that were used by a sufficient number of users so that a common pattern can emerge.
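The query-selection step above can be sketched with pandas. This is a minimal illustration, assuming a log table with `query` and `user_id` columns (the column names are ours, not Bukalapak’s actual schema):

```python
import pandas as pd

def select_queries(search_logs: pd.DataFrame, min_users: int = 200) -> list:
    """Keep queries searched by at least `min_users` unique users in the window."""
    users_per_query = search_logs.groupby("query")["user_id"].nunique()
    return users_per_query[users_per_query >= min_users].index.tolist()
```

In production this would run over one week of search logs; here `min_users=200` mirrors the threshold described above.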

Figure 4. The Schema of Clustering Model

For each query,

  1. Get Data!

We look for features that describe the behavior of the users who used this particular query. The features are as follows:

  • Number of search events: how many times a user searched using the keyword
  • Number of product clicks: how many clicks a user made after searching with the keyword
  • Made an add-to-cart: did the user add a product to their cart? (binary)
  • Made a paid transaction: did the user end up buying a product? (binary)
  • Searched last week: did the user also search with the keyword within the last week? (binary)

So the data set looks like this,

For keyword: iPhone 5
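As an illustration, a per-user feature table for one query could be assembled like this (the column names and values are made up for the example, mirroring the five features listed above):

```python
import pandas as pd

# One row per user who searched "iPhone 5" in the window.
features = pd.DataFrame({
    "user_id":            [101, 102, 103],
    "n_search_events":    [3, 1, 7],    # searches with this keyword
    "n_click_products":   [5, 0, 12],   # clicks after searching
    "add_to_cart":        [1, 0, 0],    # binary
    "paid_trx":           [1, 0, 0],    # binary
    "searched_last_week": [0, 0, 1],    # binary
})
```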

2. Deal With Imbalanced Data

In general, we noticed that the conversion from search to paid transaction is relatively small, which skews the distribution of paid trx. To tackle this imbalance problem, we used combined sampling from the imblearn (imbalanced-learn) library to balance the number of paid trx.

3. Outlier Detection

We used DBSCAN for our outlier detection method.
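In scikit-learn’s DBSCAN, points that belong to no dense region are labelled `-1` (noise), which makes it usable as an outlier filter. A sketch under assumed parameters (`eps` and `min_samples` would need tuning on the real features):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 5)),     # dense bulk of users
               rng.normal(15, 0.1, size=(3, 5))])   # a few extreme users

X_scaled = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(X_scaled)

# Keep only the points DBSCAN did not mark as noise.
X_clean = X[labels != -1]
```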

4. Clustering Process

Initially, we compared a few clustering methods and selected K-Means as the best one based on silhouette score. The parameter K is tuned using the elbow method.
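Tuning K can be sketched by sweeping a range of K and recording both the inertia (for the elbow plot) and the silhouette score; synthetic blobs stand in for the balanced, outlier-free features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data with three well-separated groups of users.
X, _ = make_blobs(n_samples=300,
                  centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=7)

scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=7).fit(X)
    scores[k] = {"inertia": km.inertia_,                    # elbow method
                 "silhouette": silhouette_score(X, km.labels_)}

best_k = max(scores, key=lambda k: scores[k]["silhouette"])
```

In practice the elbow is read off the inertia curve by eye; here the silhouette maximum serves as the tie-breaker.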

5. Save the results

For one query, the result looks like this, with a varying number of clusters. Having discussed and validated the results with our UX Researchers, we decided to name each segment based on its average feature values.

For query: iPhone 5

Figure 5. Result of segments for keyword iPhone 5
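The naming step could look like the following sketch. The labelling rule is an assumption on our part (the article only says segments are named from their average values): clusters whose members tend to pay lean towards “locating”, the rest towards “exploring”.

```python
import pandas as pd

# Toy clustered output for one query (values are illustrative).
clustered = pd.DataFrame({
    "cluster":     [0, 0, 1, 1],
    "paid_trx":    [1, 1, 0, 0],
    "add_to_cart": [1, 0, 0, 0],
})

# Average feature values per cluster, then a simple threshold rule.
means = clustered.groupby("cluster")[["paid_trx", "add_to_cart"]].mean()
segment = means["paid_trx"].apply(
    lambda m: "locating" if m >= 0.5 else "exploring")
```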

Having repeated the analysis on each and every query, the result is as follows

Figure 6. Reference table for locating and exploring

Part 3. et voilà !

French for “there you go!”

The result

How do we interpret the result? Back to Bill and Harry, who searched for iPhone 5: based on historical user behavior, we are 82% confident (score) that the query iPhone 5 tends to be more exploring (84.24%) than locating (15.70%). Given this, we can now treat the different segments of users differently and provide them with a better experience.

What can we do with the list? As a simple use case, we can use it to serve different search results to each segment; as another example, we could add query suggestions for queries that lean towards exploring.

Figure 7. Query continuation for exploring queries
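At serving time, the reference table boils down to a lookup plus a branch. A hypothetical sketch (the strategy names and the fallback for unseen queries are our assumptions, with the iPhone 5 numbers taken from the example above):

```python
# query -> (locating share, exploring share, confidence score)
reference = {
    "iphone 5": (0.1570, 0.8424, 0.82),
}

def search_strategy(query: str) -> str:
    """Pick a search treatment from the query's dominant segment."""
    locating, exploring, _score = reference.get(query, (0.5, 0.5, 0.0))
    if exploring > locating:
        return "show_query_suggestions"   # exploring users get discovery aids
    return "rank_by_price_and_seller"     # locating users get comparison aids
```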

There are a lot of insights and ideas we can derive from the list. We have covered a number of queries, with an average score of 0.87.

Part 4. Conclusion

/kənˈklo͞oZHən/ the end or finish of an event or process.

Mapping users’ intention is a common issue in a product search. On the Bukalapak platform, first we focused on two very-end-groups, namely, Locating (focused on how to get the best product, based on sellers or prices) and Exploring (still discovering what product they want to buy). We have created an unsupervised learning model to predict whether a search keyword (query) belongs to locating or exploring segment based on historical users behavior. The model has covered a number of queries in which its average score is 0.87.

Reference

  1. Wendy W. Moe. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. Journal of Consumer Psychology 13, 1 (2003), 29–39.
  2. Su, N., He, J., Liu, Y., Zhang, M. and Ma, S., 2018, February. User intent, behaviour, and perceived satisfaction in product search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 547–555).
