Data-driven location selection to support your business decisions
An introduction to the tools and approaches of location analytics, and how we at Bukalapak use this toolkit to find the best areas to acquire new customers.
What you will learn
In this article, you can expect to learn about the following:
- Utilization of location data for location analytics
- The tools and libraries that support the analysis and how to use them
- An analytics approach to tackle and answer the problem
I’ll walk you through the thinking process behind the solution step by step, so you can apply the same reasoning to other kinds of problems as well.
Prerequisites
I assume you are already familiar with the “typical” Python libraries for data processing, such as pandas, numpy, and matplotlib. We will use them at times, but I won’t introduce them here.
I also assume you are comfortable with coordinate systems, HTML, and basic Python coding.
You can also check this article to see how a simple library and some data processing can optimize your geolocation search. There you can also learn some spatial terminology and how to use the geohash and folium libraries.
Problem Statement
The growth of a new business, launched just 3 months ago, has stagnated over the past month: the daily number of transactions and gross revenue have stayed flat. Acquiring new customers has therefore become critical for the further expansion of the business.
The problem is that these new users seem to have low retention and are only interested in the new user rewards. There is also potential fraud, since some of them create new accounts again and again to abuse the new user rewards. At the same time, we want to keep the new user rewards to attract new customers.
In summary, there are 2 problems the business is facing:
- User quality: Many new users abuse the new user rewards
- Low retention: Many of the new users don’t come back
A portion of our customers make frequent repeat purchases and have a high average basket value. These are loyal customers, and of course, we want to get more users like them.
For the sake of example, let’s create our own artificial dataset from Google BigQuery’s public datasets (bigquery-public-data). Here is the query:
We then save the result as dataset.csv.
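The query itself is not reproduced here, so the sketch below assumes a hypothetical schema for dataset.csv (user_id, latitude, longitude, is_abuser, num_transactions); adjust the names to whatever your query actually returns. It loads the file with pandas, checks that the expected columns exist, and drops rows that cannot be placed on a map.

```python
import io

import pandas as pd

# Hypothetical schema: these column names are assumptions, not the original query's output.
REQUIRED_COLUMNS = ["user_id", "latitude", "longitude", "is_abuser", "num_transactions"]


def load_dataset(path_or_buffer) -> pd.DataFrame:
    """Read the user-level dataset and check that the expected columns exist."""
    df = pd.read_csv(path_or_buffer)
    missing = set(REQUIRED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"dataset is missing columns: {sorted(missing)}")
    # Coerce the abuser flag (e.g. 0/1) to bool so later filters behave predictably.
    df["is_abuser"] = df["is_abuser"].astype(bool)
    # Rows without coordinates cannot be mapped or clustered, so drop them.
    return df.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)


# Tiny in-memory sample with made-up values, for illustration only.
# With the real file you would call load_dataset("dataset.csv").
sample = io.StringIO(
    "user_id,latitude,longitude,is_abuser,num_transactions\n"
    "1,-6.2001,106.8166,0,12\n"
    "2,-6.2100,106.8200,1,1\n"
    "3,,106.8300,0,4\n"
)
users = load_dataset(sample)
print(users)
```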
Potential solution
There will be many options to tackle the aforementioned problem, but in this case, we will try to use a location analytics approach.
We can try these 2 relatively simple approaches to improve user quality while also indirectly improving long-term retention:
- Try to avoid areas with many abusers. Avoiding here means reducing our marketing efforts in those areas, and even disabling new user rewards for users who sign up from them
- Focus our efforts on areas near our good or best users. The logic is that people who live in the same area or neighborhood tend to have similar profiles, characteristics, tech savviness, etc.
Preparation
Abusers are already identified, but we first need to define best users. Who are our best users?
This definition may vary depending on the business type, but in general, they can be:
- Users who make repeat purchases (high chance of being retained)
- Users who bring in the most money (most profitable users)
- Users who are not abusers, promo hunters, or one-time buyers
- A combination of the above
For this specific use case, let’s assume our best users are users who:
- Are not abusers
- Have a relatively high number of transactions
Data processing
Let’s start coding. We will now create 2 sub-datasets: abusers’ data and best users’ data.
Abusers data
Non-abusers data
You can see that we’ve segmented our users, with the best users as those with a high number of transactions.
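A minimal pandas sketch of this segmentation, assuming the hypothetical columns is_abuser and num_transactions. The cut-off for a “relatively high” number of transactions is our own assumption (a quantile of the non-abusers’ transaction counts), not necessarily the rule used in the original code.

```python
import pandas as pd


def segment_users(df: pd.DataFrame, quantile: float = 0.75):
    """Split users into abusers and best users.

    Assumption: best users are non-abusers whose num_transactions is at or
    above the given quantile of the non-abusers' transaction counts.
    """
    is_abuser = df["is_abuser"].astype(bool)
    abusers = df[is_abuser]
    non_abusers = df[~is_abuser]
    # Threshold for a "relatively high" number of transactions.
    threshold = non_abusers["num_transactions"].quantile(quantile)
    best_users = non_abusers[non_abusers["num_transactions"] >= threshold]
    return abusers, best_users, threshold


# Toy data with made-up values, for illustration only.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "latitude": [-6.20, -6.21, -6.22, -6.23, -6.24, -6.25],
    "longitude": [106.80, 106.81, 106.82, 106.83, 106.84, 106.85],
    "is_abuser": [False, True, False, False, True, False],
    "num_transactions": [1, 1, 5, 9, 2, 20],
})
abusers, best_users, threshold = segment_users(users, quantile=0.5)
print(threshold)                       # 7.0
print(best_users["user_id"].tolist())  # [4, 6]
```

A quantile-based threshold adapts to your data’s distribution; a fixed number (e.g. a minimum transaction count agreed with the business) works just as well if you have one.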
Map data visualization
Now let’s plot the processed data on a map. Viewing it from a bird’s-eye view gives us better insight.
Base Map
Using a base map function, we can easily extend the map visualization to show the data points of all users, abusers, and best users.
All users
Abusers
Best Users
Heatmap comparison: abusers vs. best users
We can also use a heatmap to quickly compare abusers with best users.
We can see different concentrations of abusers and best users, especially in the middle of the map. The possible actions, then, are to avoid those middle areas and to focus on areas with a high density of best users (please note that this is only an artificially created dataset; results on your actual dataset may vary).
This approach can be useful for online marketing, because you can start excluding areas that have a high density of abusers and a low density of best users.
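If you want to turn the visual comparison into an explicit exclusion list, one option (a swapped-in technique, not part of the original analysis) is to snap points to a square latitude/longitude grid and compute the share of abusers in each cell. The 0.01° cell size and the 0.7 threshold below are assumptions to tune on your own data.

```python
import numpy as np
import pandas as pd


def flag_cells(abusers, best_users, cell_deg=0.01, min_abuser_share=0.7):
    """Count abusers and best users per grid cell and flag abuser-dominated cells.

    Cells are cell_deg x cell_deg degrees. They are not equal-area, which is
    usually acceptable within a single city.
    """
    def to_cells(df):
        # Integer cell index obtained by flooring each coordinate to the grid.
        return pd.DataFrame({
            "cell_lat": np.floor(df["latitude"] / cell_deg).astype(int),
            "cell_lon": np.floor(df["longitude"] / cell_deg).astype(int),
        })

    a = to_cells(abusers).value_counts().rename("abusers")
    b = to_cells(best_users).value_counts().rename("best_users")
    # Outer-join the two counts; a cell missing from one group has 0 of it.
    cells = pd.concat([a, b], axis=1).fillna(0).astype(int)
    cells["abuser_share"] = cells["abusers"] / (cells["abusers"] + cells["best_users"])
    cells["exclude"] = cells["abuser_share"] >= min_abuser_share
    return cells.sort_index()


# Toy coordinates, for illustration only.
abusers = pd.DataFrame({"latitude": [-6.205] * 3, "longitude": [106.805] * 3})
best_users = pd.DataFrame({"latitude": [-6.205, -6.215, -6.215], "longitude": [106.805] * 3})
cells = flag_cells(abusers, best_users)
print(cells)
```

The flagged cells can then be converted back to bounding boxes (multiply the indices by cell_deg) for targeting exclusions in an ads platform.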
But sometimes we need more specific locations than a heatmap can offer, for example, to launch a new offline campaign. The next approach helps us solve this problem.
Clustering
We already mapped our best users, and now we want to get more users like them!
A simple approach is to find clusters of these best users and locate them on the map. We can assume that more users like them live nearby (and could become our best users too!), so pinpointing these clusters helps us reach those potential users.
Density-based clustering
There are several algorithms we can use; one of them is DBSCAN, which stands for Density-Based Spatial Clustering of Applications with Noise.
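DBSCAN has 2 key parameters:
- Epsilon: the maximum distance between two points for them to be considered neighbors. Any point whose distance to another point is less than or equal to epsilon is a neighbor of that point.
- Minimum Samples/Points: the minimum number of points (including the point itself) that must lie within epsilon of a point for it to count as a core point. Clusters are built from core points and their neighbors; points that are neither core points nor neighbors of one are treated as noise.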
I won’t go into much detail on the algorithm here. Basically, you can use any algorithm that suits your needs and your available data.
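A minimal DBSCAN sketch with scikit-learn, assuming the hypothetical latitude/longitude columns. Using the haversine metric lets us express epsilon in kilometres: scikit-learn’s haversine distance takes [latitude, longitude] in radians and returns an angle, so eps_km is divided by the Earth’s mean radius. The 0.5 km and min_samples values are illustrative, not the original settings.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def cluster_users(df: pd.DataFrame, eps_km: float = 0.5, min_samples: int = 5) -> pd.DataFrame:
    """Return a copy of df with a DBSCAN cluster id per row (-1 = noise)."""
    # The haversine metric expects [lat, lon] pairs in radians.
    coords = np.radians(df[["latitude", "longitude"]].to_numpy())
    model = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # km -> angular distance in radians
        min_samples=min_samples,
        metric="haversine",
        algorithm="ball_tree",  # a tree that supports the haversine metric
    )
    return df.assign(cluster=model.fit_predict(coords))


# Toy data: two tight groups of 5 points (< 100 m apart) and one isolated point.
offsets = [(0.0, 0.0), (-0.0005, 0.0), (0.0, 0.0005), (-0.0005, 0.0005), (-0.0002, 0.0002)]
rows = [(-6.2 + dlat, 106.8 + dlon) for dlat, dlon in offsets]
rows += [(-6.3 + dlat, 106.9 + dlon) for dlat, dlon in offsets]
rows.append((-6.0, 107.2))
best_users = pd.DataFrame(rows, columns=["latitude", "longitude"])

clustered = cluster_users(best_users, eps_km=0.5, min_samples=3)
print(clustered["cluster"].tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```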
Note: users labeled as cluster -1 do not belong to any cluster (DBSCAN treats them as noise).
Clustering result
Next, we plot the clustering result to see whether it gives us useful insight.
We can see that in the area of cluster 1 there is a relatively high density of best users (some points may actually represent multiple users because we are using station location data). We can use the coordinates of the cluster center for our online marketing to reach other potential best users in this area.
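To get one coordinate per cluster, a simple option is to average the latitude and longitude of each cluster’s members, skipping noise. This is a sketch assuming the hypothetical columns user_id, latitude, longitude, and cluster; a plain mean is a reasonable approximation for city-scale clusters, though not for clusters spanning very large distances.

```python
import pandas as pd


def cluster_centers(df: pd.DataFrame) -> pd.DataFrame:
    """Mean coordinate and user count per cluster, largest cluster first."""
    clustered = df[df["cluster"] != -1]  # noise points have no center
    return (
        clustered.groupby("cluster")
        .agg(
            latitude=("latitude", "mean"),
            longitude=("longitude", "mean"),
            n_users=("user_id", "count"),
        )
        .sort_values("n_users", ascending=False)
    )


# Toy clustered data, for illustration only.
clustered = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "latitude": [-6.20, -6.22, -6.30, -6.31, -6.32, -6.00],
    "longitude": [106.80, 106.82, 106.90, 106.91, 106.92, 107.20],
    "cluster": [0, 0, 1, 1, 1, -1],
})
centers = cluster_centers(clustered)
print(centers)
```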
For offline marketing campaigns, we can use Google Maps to find popular locations near the cluster (shopping centers, cafes, etc.) where our offline marketing team can run the campaign.
Closing
Congratulations on reaching this point. So far, you have learned about the concept, case study, and approach to solving location analytics problems.
This is only the start of your learning journey, but as long as you’ve equipped yourself with the right tools and concepts, you can always bring solutions to any business problem you face.
Happy learning! 🚀🚀🚀