Logging And Alerting in Android Tokopedia

Hendry Setiadi
Tokopedia Engineering
5 min read · Nov 27, 2020

co-authored with Leonard Gunawan

In Android development, we are often asked the following questions:

  • When users get crashes, what are the crashes? How many and how do the crashes occur?
  • We also want to know the non-fatal incidents that happen.
    Is there a page that is not correctly opened?
    Are there anomalies in the api response?
  • We also want to get the details of the issue. What OS version, brand, user id?
  • Can we have an interactive dashboard? We want to be able to filter, sort, and navigate through the logs. For example, we can search the logs by app version, OS version, error type, severity, and many more.
  • Are real-time logging and alerting possible? If we know the incident in real-time, then we can handle the incident faster.

Here are two solutions that we already use: Firebase Crashlytics and Custom Logging. We will cover the latter in depth.

A. Firebase Crashlytics

Firebase Crashlytics (previously Fabric) is one of the solutions for crash logging.

It is simple, very easy to implement, and already has the capabilities we want. It automatically reports crashes to the Firebase dashboard. We can also retrace the app events leading up to a crash and spot its root cause. Firebase Crashlytics sends detailed device information, e.g. OS version, model, threads, etc. It also integrates with Google Dashboard.

Firebase Crashlytics also supports logging non-fatal issues. We can simply call:

FirebaseCrashlytics.getInstance().recordException(e)

We can also set custom keys, which makes the reports more dynamic and the root cause easier to trace.

FirebaseCrashlytics.getInstance().setCustomKey("int_key", 50);
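
The two calls above are typically used together: the keys are set first, then the exception is recorded, so that the keys accompany the report. A minimal sketch, in which the helper name reportNonFatal, its parameters, and the key names are illustrative assumptions rather than part of our codebase:

```kotlin
import com.google.firebase.crashlytics.FirebaseCrashlytics

// Hypothetical helper: the function name and key names are illustrative only.
fun reportNonFatal(e: Throwable, screen: String, retryCount: Int) {
    val crashlytics = FirebaseCrashlytics.getInstance()
    // Custom keys describe the app state at the time of the incident.
    crashlytics.setCustomKey("screen", screen)
    crashlytics.setCustomKey("retry_count", retryCount)
    // Record the exception as a non-fatal issue.
    crashlytics.recordException(e)
}
```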

B. Create Custom Logging System

Creating a custom logging system is also a choice. We can create a system that replicates Firebase Crashlytics behavior.

Benefits of a custom logging system compared to Firebase Crashlytics:

  • We can customize the logging as we like. For example, we can add more device attributes, such as device fingerprint, battery information, network type, and many more.
  • Some cloud log services provide better log parsing and navigation, and therefore better insight into the logs.

We use this custom logging system as an addition to Firebase Crashlytics, not a replacement.

The implementation of our custom logging system is very straightforward.

How to Build a Custom Logging System:

We created the system by using the existing popular libraries: Timber and Room.

Let’s dive into the components.

Timber

A logger with a small, extensible API that provides utility on top of Android’s normal Log class.

The implementation is quite straightforward and follows the documentation. We plant the tree in the Application onCreate.
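
A minimal sketch of planting a custom tree might look like the following. The class names LoggingTree and MainApplication, the choice to keep only warnings and errors, and the body of saveLog() are assumptions made for illustration:

```kotlin
import android.app.Application
import android.util.Log
import timber.log.Timber

// Hypothetical tree: every Timber call in the app ends up in log() below.
class LoggingTree : Timber.Tree() {

    override fun log(priority: Int, tag: String?, message: String, t: Throwable?) {
        // Assumed policy: persist only warnings and errors.
        if (priority < Log.WARN) return
        saveLog(priority, tag, message, t)
    }

    private fun saveLog(priority: Int, tag: String?, message: String, t: Throwable?) {
        // Stub in this sketch: enrich the message and store it locally.
    }
}

// Hypothetical Application subclass; it must be registered in the manifest.
class MainApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        Timber.plant(LoggingTree())
    }
}
```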

Then, every time we log with Timber in any module of our application, the function saveLog() runs. Using Timber therefore makes it easier to send logs and keeps the sending logic in one place.

Later, in our function saveLog(), we can process the log. We can add custom attributes, for example, timestamp, device id, user id, and many more.

Example output after processing the message:

message='custom message to send'; device_id='abc123'; user_id='user123'; timestamp='1598968229'
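
One way saveLog() could assemble such a line is sketched below. The function name formatLog and its parameters are hypothetical. The field names follow the example output, and the timestamp is assumed to be in epoch seconds:

```kotlin
// Hypothetical helper: builds a "key='value'; ..." line from the message
// and the custom attributes. Field names mirror the example output.
fun formatLog(
    message: String,
    deviceId: String,
    userId: String,
    timestampSeconds: Long = System.currentTimeMillis() / 1000
): String {
    // linkedMapOf keeps insertion order, so fields appear in a stable order.
    val fields = linkedMapOf(
        "message" to message,
        "device_id" to deviceId,
        "user_id" to userId,
        "timestamp" to timestampSeconds.toString()
    )
    return fields.entries.joinToString("; ") { (key, value) -> "$key='$value'" }
}
```

Calling formatLog("custom message to send", "abc123", "user123", 1598968229) returns message='custom message to send'; device_id='abc123'; user_id='user123'; timestamp='1598968229'. A real implementation would probably also escape quotes inside the values so the server-side parser is not confused.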

Room (or any local database)

Room is an abstraction layer over SQLite, the local database on the device, which we use to store the log messages.

The function of Room here is to store the data before the data is sent to the server. Why store the data? It is because users do not always have a good internet connection to send the log. If we don’t store the data in a local database and if the user doesn’t have an internet connection, the data will be lost. By storing the data, we can later send the data to the server when the device is online. Hence, it will minimize the risk of missing the logs.

The other reason to store the logs is that we can later send the database rows in batches (several rows in one request).

The function saveLog() simply stores the data in the local database.
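
A minimal sketch of the storage layer, assuming a single table. The names LogEntity, LogDao, LogDatabase, the table name, and the columns are hypothetical, and saveLog() takes the DAO as a parameter only to keep the sketch self-contained:

```kotlin
import androidx.room.Dao
import androidx.room.Database
import androidx.room.Entity
import androidx.room.Insert
import androidx.room.PrimaryKey
import androidx.room.Query
import androidx.room.RoomDatabase

// Hypothetical table: one row per formatted log line.
@Entity(tableName = "logs")
data class LogEntity(
    @PrimaryKey(autoGenerate = true) val id: Long = 0,
    val message: String,
    val createdAt: Long
)

@Dao
interface LogDao {
    @Insert
    fun insert(log: LogEntity)

    // Oldest rows first, capped so that one request carries a bounded batch.
    @Query("SELECT * FROM logs ORDER BY id ASC LIMIT :limit")
    fun getBatch(limit: Int): List<LogEntity>

    // Called after a batch has been uploaded successfully.
    @Query("DELETE FROM logs WHERE id IN (:ids)")
    fun deleteByIds(ids: List<Long>)
}

@Database(entities = [LogEntity::class], version = 1)
abstract class LogDatabase : RoomDatabase() {
    abstract fun logDao(): LogDao
}

// Room rejects database access on the main thread by default,
// so this should be called from a background thread.
fun saveLog(dao: LogDao, formattedMessage: String) {
    dao.insert(LogEntity(message = formattedMessage, createdAt = System.currentTimeMillis()))
}
```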

WorkManager to periodically send the logs

The next step is to send the log. In this case, we use WorkManager to send the data.
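
A minimal sketch of a periodic upload job, under a few assumptions: the worker name SendLogWorker, the unique work name "send_logs", and the 15-minute interval are illustrative, and sendLogToServer() is stubbed here so the block stands alone. Fifteen minutes is the shortest interval WorkManager accepts for periodic work.

```kotlin
import android.content.Context
import androidx.work.Constraints
import androidx.work.ExistingPeriodicWorkPolicy
import androidx.work.NetworkType
import androidx.work.PeriodicWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.Worker
import androidx.work.WorkerParameters
import java.util.concurrent.TimeUnit

// Stub for this sketch: returns true when all pending logs were uploaded.
fun sendLogToServer(): Boolean = true

// Hypothetical worker; doWork() already runs on a background thread.
class SendLogWorker(context: Context, params: WorkerParameters) : Worker(context, params) {
    override fun doWork(): Result =
        if (sendLogToServer()) Result.success() else Result.retry()
}

fun scheduleLogUpload(context: Context) {
    // Run only when the device has a network connection.
    val constraints = Constraints.Builder()
        .setRequiredNetworkType(NetworkType.CONNECTED)
        .build()
    val request = PeriodicWorkRequestBuilder<SendLogWorker>(15, TimeUnit.MINUTES)
        .setConstraints(constraints)
        .build()
    // KEEP avoids re-creating the schedule on every app start.
    WorkManager.getInstance(context).enqueueUniquePeriodicWork(
        "send_logs", ExistingPeriodicWorkPolicy.KEEP, request
    )
}
```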

The sendLogToServer() function reads the data from the Room database and sends it in batches to the remote server. After the logs are sent successfully, we delete those rows from the database.
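
The read, send, and delete cycle can be sketched independently of Room and of the HTTP client. The interfaces LogStore and LogUploader, the StoredLog type, and the default batch size of 50 are assumptions made for this sketch. In the app they would be backed by the Room DAO and by whatever network layer is in use:

```kotlin
// Hypothetical abstractions over the local database and the remote endpoint.
data class StoredLog(val id: Long, val message: String)

interface LogStore {
    fun getBatch(limit: Int): List<StoredLog>
    fun deleteByIds(ids: List<Long>)
}

interface LogUploader {
    // Returns true when the server accepted the whole batch.
    fun upload(messages: List<String>): Boolean
}

// Sends stored logs batch by batch. A batch is deleted only after it was
// uploaded successfully, so a failed upload leaves its rows for the next run.
// Returns false as soon as one upload fails, true when nothing is left.
fun sendLogToServer(store: LogStore, uploader: LogUploader, batchSize: Int = 50): Boolean {
    while (true) {
        val batch = store.getBatch(batchSize)
        if (batch.isEmpty()) return true
        if (!uploader.upload(batch.map { it.message })) return false
        store.deleteByIds(batch.map { it.id })
    }
}
```

Deleting only after a confirmed upload means a log may occasionally be sent twice (for example if the app is killed between the upload and the delete), but it should not be lost.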

That wraps up how we created a logging system that mimics Firebase Crashlytics.

Later, if we want to send a log each time a crash happens, we can add the code for that in the Application onCreate.
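
One common way to do this is a default uncaught-exception handler. The sketch below is an assumption about how such a hook could look, not necessarily the exact code we run. The helper name installCrashLogger is hypothetical, and its log parameter stands in for a Timber call:

```kotlin
// Hypothetical helper: `log` stands in for a call such as Timber.e(throwable).
fun installCrashLogger(log: (Throwable) -> Unit) {
    // Keep the handler that was installed before (the platform default,
    // or Crashlytics if it was initialized earlier).
    val previous = Thread.getDefaultUncaughtExceptionHandler()
    Thread.setDefaultUncaughtExceptionHandler { thread, throwable ->
        try {
            log(throwable)
        } finally {
            // Delegate so the crash is still handled as usual.
            previous?.uncaughtException(thread, throwable)
        }
    }
}
```

In the Application onCreate this could be called as installCrashLogger { Timber.e(it) }. Because the process ends right after a crash, the log would be stored locally first and uploaded by a later run of the periodic job.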

To log a general or custom message, we can call Timber anywhere, and the log will automatically be sent to the server, like below:

Timber.w("Invalid deeplink: http://abc.com")

Scalyr: where the logs are stored

There are many options when it comes to choosing where to store the logs. Currently, we use Scalyr.

Scalyr dashboard

Scalyr can parse each message into fields, and the parser can be customized to our needs.

The dashboard also lets us interact with and dive deep into the logs: we can slice the data by clicking the values that appear in a log. This is very convenient, especially when we want to analyze the root cause and what can be improved.

Alerting

Last but not least, we need an alerting system to notify the team if something unwanted happens in production. We can notify via email or any communication tool, e.g. Slack.

Scalyr alert management

Scalyr already provides an alert system that we can customize. For example, we can send an alert to Slack when a certain event reaches a threshold.

Example for alerting system in Slack

With this alert, we learn about an issue sooner, so we can fix it sooner too.

Conclusion

Now you know how to create a custom logging system like the one in the Tokopedia Android application. It is simple, yet very useful. A custom logging system can be a nice addition to Firebase Crashlytics.

We know that a flawless release is very hard to achieve. With a careful approach to monitoring and alerting, we can detect and fix production issues quickly. Also, by using this logging system, we can analyse user behavior and hence keep improving the application.

Hope it is useful.

Happy Android development.
