Leverage Daily Emails with Key Metrics at Lime

Yin Zhang
Lime Engineering
Aug 4, 2021

At Lime, we rely on internal data reports to guide many of our key business decisions, and we’re constantly looking for ways to make these reports as accessible as possible to our employees. To that end, we’ve historically leaned heavily on our custom-built web portal, called the “data portal,” which consolidates data from many different sources, such as Google Sheets, Tableau dashboards, and our own custom-built, web-based dashboards.

However, our employees experience significant friction as users of the data portal. To get the data they need, employees must log into the data portal, find the dashboard they need, set their desired filters, and view and analyze the data returned. Sometimes, users of our data portal need to repeat this process on a daily basis, especially when monitoring our high-level business metrics.

Instead of requiring our employees to visit the data portal every single time they need to access the same set of metrics, we wanted to figure out a way to push that data directly to them. We ultimately decided to do this via automated email reports that are sent directly into our employees’ inboxes.

High Level Design

To build these email reports, we needed to answer three questions:

  • Where do we get accurate data to compose the report?
  • How do we send the reports to users periodically?
  • How do we customize the content of reports for each user?

The data engineering team at Lime had already built a data ecosystem around the Lime Data Warehouse (LDW), which serves as our single source of truth for data analysis, and the data portal web application, which serves as the single entry point for data consumption. We were able to leverage both the LDW and the data portal to build out the email reporting system.

So, to answer the questions above:

  • We can query the LDW reporting layer in Snowflake to get the aggregated data for reporting purposes.
  • The reports can be generated with prebuilt HTML/CSS/JavaScript templates and sent to users via SendGrid by running a scheduled job on the data portal server.
  • Users can customize reports and subscribe or unsubscribe on the data portal client, with preference data stored in the MySQL database.

The following diagram shows what the overall design looks like:

Diagram for high level design

In the end, we designed two email reports: the Daily Overview Report and the Metrics Summary Report. The two reports serve different purposes:

  1. The Daily Overview Report is the “One Report For All,” meaning it is the same report for every subscriber. Users subscribe by joining a specific mailing list. This report provides a generalized overview of the company’s key business metrics every day.
  2. The Metrics Summary Report is a customizable report for any user. Users specify what data they want through the data portal client, and a report with specialized content is sent to each subscriber based on their preferences. This report is especially useful for operators who want to monitor data for multiple regions side by side.

Implementation details

The design and implementation behind the scenes are mostly the same for both reports. The data portal server first connects to Snowflake (via “snowflake-sdk”) and queries the data from the LDW to create the reports. It then uses SendGrid to send the emails. This process runs as a daily cron job (scheduled with “node-schedule”).
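To make this concrete, here is a rough sketch of how these pieces fit together on the data portal server. The table name, SQL, schedule, and email addresses below are placeholders rather than our actual configuration, and the HTML rendering is reduced to a stub:

```
// Sketch of the daily reporting job: query Snowflake, render an HTML report,
// and send it with SendGrid. Connection details, SQL, schedule, and addresses
// are illustrative placeholders, not Lime's actual configuration.
import * as snowflake from 'snowflake-sdk';
import * as schedule from 'node-schedule';
import sgMail from '@sendgrid/mail';

sgMail.setApiKey(process.env.SENDGRID_API_KEY ?? '');

// Run a single query against the LDW reporting layer and return the rows.
function queryReportingLayer(sqlText: string): Promise<any[]> {
  const connection = snowflake.createConnection({
    account: process.env.SNOWFLAKE_ACCOUNT ?? '',
    username: process.env.SNOWFLAKE_USER ?? '',
    password: process.env.SNOWFLAKE_PASSWORD ?? '',
  });
  return new Promise((resolve, reject) => {
    connection.connect((connErr) => {
      if (connErr) return reject(connErr);
      connection.execute({
        sqlText,
        complete: (err, _stmt, rows) => (err ? reject(err) : resolve(rows ?? [])),
      });
    });
  });
}

// Stand-in for the prebuilt HTML/CSS template mentioned above.
function renderReportHtml(rows: any[]): string {
  const body = rows.map((row) => `<tr><td>${JSON.stringify(row)}</td></tr>`).join('');
  return `<table>${body}</table>`;
}

// Daily cron job at 07:00 server time (placeholder schedule).
schedule.scheduleJob('0 7 * * *', async () => {
  const rows = await queryReportingLayer(
    'SELECT * FROM reporting.daily_key_metrics' // hypothetical reporting table
  );
  await sgMail.send({
    to: 'daily-report@example.com',  // mailing list placeholder
    from: 'data-portal@example.com', // verified sender placeholder
    subject: 'Daily Overview Report',
    html: renderReportHtml(rows),
  });
});
```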

Daily Overview Report

The Daily Overview Report serves as a performance summary for all of our markets and products.

The report is broken down into sections (subtables) that display key metrics for each region and product. A product represents a mode of transportation; today we offer three modes: scooters, e-bikes, and e-mopeds. Having a dedicated section for each mode allows our internal stakeholders to see how our metrics are doing at both the market level and the product level.

In addition to product-specific sections, the report contains an “All Products” section to show how each market is doing across all modes, as well as a “summary” section that shows how the company overall is doing for each mode. With all of this in mind, we fetch the relevant metrics (and calculate their week-over-week change) for the appropriate product and markets and use that data to populate our report. Below is an example of what the daily email report looks like:

Example for daily overview report

We also maintain a preset list of metrics to display. These metrics are chosen by business and data analysts, who determine which ones provide the best insight into how the company is performing. The metrics are then calculated and stored in the Snowflake table mentioned earlier. When we build the report, we rely on a hard-coded list of metric config objects that determine things like which Snowflake column to reference and what format to use when displaying the value.

Example for daily overview report configs
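Since the screenshot isn’t reproduced here, the sketch below shows roughly what such a config list could look like; the metric names, columns, and formats are made up for illustration:

```
// Hypothetical sketch of the hard-coded metric config list.
// Each entry maps a display label to a Snowflake column and a display format.
type MetricFormat = 'integer' | 'decimal' | 'currency' | 'percent';

interface MetricConfig {
  label: string;        // header shown in the email table
  column: string;       // column to read from the Snowflake reporting table
  format: MetricFormat; // how the value is rendered in the report
}

const DAILY_OVERVIEW_METRICS: MetricConfig[] = [
  { label: 'Trips',             column: 'trip_count',        format: 'integer' },
  { label: 'Trips per Vehicle', column: 'trips_per_vehicle', format: 'decimal' },
  { label: 'Gross Bookings',    column: 'gross_bookings',    format: 'currency' },
];
```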

Metrics Summary Report

While the Daily Overview Report sends the same email to all subscribers, the Metrics Summary Report is meant to be more personalized. It provides configuration options that allow users to select the regions and products for which they want to receive daily reports.

As with the Daily Overview Report, we first query Snowflake for summary data across all regions and products. With the query results, we construct a hash-map-like object in the following shape:

Type definitions for data used for reporting

Here, the key is the region representing each market, and the value is its corresponding reporting data. Within the value object, the current and prev objects are used to calculate the week-over-week difference for each metric.
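As a minimal sketch of that shape (field names are assumptions; the actual type definitions in the screenshot may differ), along with the week-over-week calculation it enables:

```
// Hypothetical shape of the reporting data keyed by region.
// `current` holds the latest values per product, `prev` holds the values
// from one week earlier, so the week-over-week change can be derived per metric.
type MetricValues = Record<string, number>; // metric name -> value

interface RegionReportData {
  current: Record<string, MetricValues>; // product -> latest metric values
  prev: Record<string, MetricValues>;    // product -> metric values one week earlier
}

type ReportDataByRegion = Record<string, RegionReportData>; // region -> reporting data

// Week-over-week change for a single metric, as a fraction (e.g. 0.12 = +12%).
function weekOverWeek(current: number, prev: number): number | null {
  return prev === 0 ? null : (current - prev) / prev;
}
```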

Unlike the Daily Overview Report, which we broke down into smaller subtables, the Metrics Summary Report is sent as one consolidated table. A single Metrics Summary Report contains fewer regions, since users typically select only the few regions relevant to them. Furthermore, consolidating region-based and product-based data into one table makes it easier to do a side-by-side comparison, as shown below:

Example for metrics summary report

Another key difference between the two email reports is that the product and region values of the Metrics Summary Report are dynamic, since they are based on each user’s preferences. This is where the hashmap object comes into play. Instead of making a query for each user’s preferences, we run a single query to fetch metrics for all regions and products and preload them into the hashmap, reducing database I/O. When we then create an email report for a user, we use the hashmap to quickly fetch the metrics for the selected regions and products and populate the table content.
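Sketched out, and reusing the types from the previous snippet, the single-query-plus-lookup approach could look like this; the row shape and helper names are assumptions, not our actual code:

```
// Build the region-keyed map once from a single Snowflake query result,
// then reuse it for every subscriber instead of querying per user.
interface MetricsRow {
  region: string;
  product: string;
  period: 'current' | 'prev'; // latest day vs. the same day one week earlier
  metrics: MetricValues;      // metric name -> value, parsed from the row
}

function buildReportData(rows: MetricsRow[]): ReportDataByRegion {
  const data: ReportDataByRegion = {};
  for (const row of rows) {
    const entry = (data[row.region] ??= { current: {}, prev: {} });
    entry[row.period][row.product] = row.metrics;
  }
  return data;
}

// Per-user lookup: no extra database I/O, just in-memory access
// for the regions and products stored in the user's subscription.
function metricsForUser(
  data: ReportDataByRegion,
  regions: string[],
  products: string[]
): ReportDataByRegion {
  const selection: ReportDataByRegion = {};
  for (const region of regions) {
    const regionData = data[region];
    if (!regionData) continue;
    selection[region] = {
      current: Object.fromEntries(products.map((p) => [p, regionData.current[p] ?? {}])),
      prev: Object.fromEntries(products.map((p) => [p, regionData.prev[p] ?? {}])),
    };
  }
  return selection;
}
```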

To store a user’s email report preferences, we created an `email_report_subscription` table in the data portal’s MySQL database:

schema for email_report_subscription table

The report_id field identifies which report the user is subscribing to; this column allows us to easily add more customizable email reports in the future. The regions and products columns keep track of a user’s preferences (for which regions and products does the user want to receive daily metrics?). The active column is a boolean indicating whether or not the user is subscribed to the report.
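As an illustrative sketch only (the actual schema is in the screenshot above; the id and user reference fields here are assumptions), the corresponding record might be typed on the server along these lines:

```
// Hypothetical TypeScript view of an email_report_subscription row.
// The id and userId fields are assumptions; the others mirror the columns described above.
interface EmailReportSubscription {
  id: number;         // primary key (assumed)
  userId: number;     // reference to the subscribing data portal user (assumed)
  reportId: string;   // which report the user is subscribed to
  regions: string[];  // regions the user wants daily metrics for
  products: string[]; // products the user wants daily metrics for
  active: boolean;    // whether the subscription is currently enabled
}
```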

Based on these configurations, if a user selects N regions and M products, the table in the email report will have N * M * X cells, where X is the number of metrics.

Although we consolidate all of the data into one table, we still break it down into subsections for better readability. To implement this, we maintain a list of section objects that store the name of each section as well as the columns/metrics it contains, similar to the configs we used in the Daily Overview Report but with an extra section layer:

Example for metric summary report configs (sections)
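Again as a sketch, reusing the hypothetical MetricConfig type from the Daily Overview configs above, the extra section layer could look like this; the section and metric names are made up:

```
// Hypothetical section configs: each section groups a subset of metric configs,
// adding one extra layer on top of the MetricConfig sketch above.
interface SectionConfig {
  name: string;            // section header shown in the consolidated table
  metrics: MetricConfig[]; // columns/metrics this section contains
}

const SUMMARY_SECTIONS: SectionConfig[] = [
  {
    name: 'Rides',
    metrics: [
      { label: 'Trips', column: 'trip_count', format: 'integer' },
      { label: 'Trips per Vehicle', column: 'trips_per_vehicle', format: 'decimal' },
    ],
  },
  {
    name: 'Financials',
    metrics: [
      { label: 'Gross Bookings', column: 'gross_bookings', format: 'currency' },
    ],
  },
];
```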

Challenges and Solutions

Google Groups were difficult to maintain

As mentioned before, the Daily Overview Report is sent to certain Google Groups. Over time, however, these email groups were not maintained regularly, which made it difficult to see who was or was not on the list. People would later find out about the report and ask to be added to the group, creating more manual work. To solve this, we came up with a self-serve solution and created a designated group for data reporting. With this new email group, we include a link within the email report that users can use to add or remove themselves from the group.

Data was not ready when the report was sent out

As mentioned earlier, the content of our email reports comes from a Snowflake table in the LDW that is updated periodically by Airflow DAGs. We ran into a race condition where an email report was sent out before the DAG had finished updating the table, resulting in incomplete data in our reports. As a temporary fix, we configured the cron job to send the email at a later hour, giving the table more buffer time to update. Moving forward, we’ll need to optimize the DAG so the run finishes faster.

Future improvement

Set up monitoring and alerting

The Daily Overview Report is something many people expect to receive, so it would be problematic if something unexpected happened and the report was not sent. Adding monitoring and alerting would notify us of such issues and enable the team to take more proactive action.

Add resend button under admin console

There are cases where the report is inaccurate for external reasons and needs to be resent that day. The current process is very manual: we have to resend the email report through our own dev environment. A future improvement is to surface a resend button in the admin console of our data portal client so that any admin user can resend the report from there.

Lastly

Thanks for reading, and stay tuned for future posts about Lime’s investments in the tooling that supports our daily work.

Also, a big thanks to Garvit Patel, Can Wang, Jue Zou, and the Data-Eng and Tooling teams at Lime for initiating the project and helping it along.

Thanks to Steve Jin, Xiuming Chen, Charlie Fang, Ian Hook, and many others for reviewing this article.

Co-Authors: Yin Zhang, Jiaying Liu
