Uninstall tracking via BigQuery and Airflow
At InCred, our mobile app is an integral part of our lending process. It helps customers complete their loan application, gives them information about their loan, payment history, allows them to make payments and stay in touch with InCred. So it is critical that our customers keep the app installed. However, sometimes customers may remove the app inadvertently or otherwise. In that situation, they may miss payment reminders and other vital communication related to their loan potentially causing them financial harm. So we wanted to reach out to customers in case of an uninstall quickly. In this article, we will talk about how we are using BigQuery to track user uninstalls in conjunction with Airflow to fire remedial action notifications to ourusers.
Hunt for an API
We would like to build a REST API on top of BigQuery which can tell the app status for a given user on the given month range. At InCred, language is not the barrier where a microservice can be written in Java/GoLang/NodeJs. For this case, we went with NodeJs and used BigQuery NodeJs client along the way. The API would have the following request body:
<list of user identifiers>
For every user, this API should return the status of our app with values ranging from Uninstalled/Installed/Never Installed.
Our android app fires many events to Firebase Analytics. These events help in analysing user behaviour. To gain deeper insights, we have BigQuery integrated into Firebase analytics. To track uninstalls, we started looking out whether we can explicitly fire any event when the app is getting uninstalled by bundling it with user information. Unfortunately, this was not available and then we came across firebase auto-logged events, where one of the events
app_remove was specifically meant for that purpose. To define the uniqueness of a user, we send
user_idwhen the users authenticate via phone. Right, so just querying with
app_removeevent name will not solve the problem. That wasn’t mean to be!!
What if the user had reinstalled the app again?
If we just went with listening to the
app_remove event, the user having reinstalled the app, will be shown as Uninstalled. Clearly not the desired result. We also wanted to check if we can find anything unique about the user before logging into our app and then we saw
user_pseudo_id . Firebase automatically generates this unique id within BigQuery for each user. So, we created a map of
user_pseudo_id and then analysed the uninstalls. This failed too as it started giving false-positive results.
When the problem becomes complex, it is always good to break it down into sub-problems. Then, solve the sub-problems and connect all together. For a user on the given month range, we broke this problem into 3 parts:
- We would get the maximum
app_removeevent. This would also club the fact if the user has installed/uninstalled the app multiple times. Since BigQuery shards the table on the basis of date, we used wildcard(*) to cover all the tables for a particular month.
select max(event_timestamp) as app_remove_timestamp, user_id as user_id from " + <table_name.yyyymm*> + " where event_name = 'app_remove' and user_id is not null
- We would get the maximum engagement timestamp of a user. This would get the last timestamp when the user engaged with our app.
select max(event_timestamp) as max_engage_timestamp, user_id as user_id from " + <table_name.yyyymm*> + " where user_id is not null
- Both the results would also be run for the same day since the BigQuery intraday table name is a little different. We would combine the last results to give us the bucket for Installed/Uninstalled/Never Installed users. Let’s see how…
a) If the
app_remove_timestampis greater than
max_engage_timestamp, then the user has Uninstalled the app.
b) If the
app_remove_timestampis null or lesser than
max_engage_timestamp, then the user has the app Installed.
c) If both the timestamps are null, then the user has Never Installed the app in that range.
Note: All this works for a given month range. We know for sure that when the user has taken a loan from us and then give an estimated month range on this API to give the desired result.
Why not include date too to make this API complete?
With month ranges, we were not able to track the uninstalls on weekly basis or in the given time frame, say, we wanted to track from 5th Jan 2020 to 3rd Mar 2020 where we ran a campaign or disbursed loans via some offer. To cater that, we changed the request body a little to include optional
endDay and then tweaked the earlier solution. With the help of
TABLE_SUFFIX , we were able to scan the table for the given date range.
For the range like 5th Jan 2020 to 3rd Mar 2020, we used
TABLE_SUFFIX to get the data set from 5th Jan to 30th Jan, then ran a Wildcard(*) for Feb month, and lastly ran a range query with
TABLE_SUFFIXto query from 1st Mar to 3rd Mar.
With this, we successfully developed an API over BigQuery which can track user uninstalls on the given date range.
Periodic notifications via Airflow
Once we developed an API, we wanted to utilise it the most by pushing automatic reminders to our users to have the app installed. At InCred, we use Apache Airflow to program, schedule and monitor our workflows. Airflow provides a very neat dashboard to monitor your jobs, logging, retry policy and much more via DAG runs.
Let’s break the problem. We would get the users with active loans(SQL query to our database), run those users on our Uninstall tracking API, and then send notifications via SMS to all those who have uninstalled our android app.
Since the users with active loans would be huge, running every logic on a single DAG would take a lot of time for completion. Also, if there are a huge number of users then our SQL query underneath BigQuery would significantly increase thus hitting the Maximum unresolved standard SQL query length of 1 MB. So, we broke this problem even further and created multiple child DAGs.
- We set our master DAG to get the users having active loans.
- We configured the batch size on Airflow to create a child DAG for every batch.
- These child DAGs would run in parallel to get install status and send notifications.
- Child DAGs would have their own retry policy along with master DAG.
Once we brainstormed on the solution, with Airflow, creating this setup was a piece of cake. We did just that and executed our workflow to work like a charm. We then went ahead and scheduled it bi-weekly for periodic reminders to our users.
At InCred, by relying heavily on technology, we are on the way of revolutionising the loan-lending business. We are using latest tech stack and cutting edge technologies across all domains (Android/DevOps/Platform/Web). If this excites you, then we would be happy to talk to you. We are Hiring!