Enabling data accessibility at 7-Eleven (with Data Studio and BigQuery)

Trung Luong
7LAB
Published in
9 min readDec 12, 2018

It’s crucial to a data-driven organization that right information need to be accessible by right people to make sure everyone from bottom up aligns on ultimate business goals. The KPI breakdown and tools to keep track those KPIs need to be conveniently accessed by whoever in charge. In our case at 7-Eleven, we need to provide comprehensive reports of store performance to staffs of hundreds or thousands stores.

At 7-Eleven Vietnam we own the advantage of having a dedicated software engineering team who builds the retail management system from ground up, it’s straightforward to sit down and develop a report module for the store portal using Ruby on Rail with some touch of ReactJs for visualization.

What happen next would be like: after a two-weeks-sprint we have the new report module ready for UAT (user acceptant test), then go production one week after that if (luckily) no requirement changes. Three weeks time is not so bad, but I’m about to explain the alternative approach which takes only three hours of mine.

Phiên bản tiếng Việt của bài viết

Leverage Google Cloud Platform

When it came to business intelligent solution, we had approached Tableau, Holistics and DOMO. All of them are great services, very powerful and convenience for business users, however by default they charge by number of users or objects (dataset/connections/visualization) which doesn’t make sense to us. As the number of stores grow we’ll have huge number of users and reports need to be produced.

Meanwhile, Google BigQuery offers a sustainable pricing model which charges by data processed, as long as you have proper partitioning and caching, the cost is fairly reasonable. Not that I’m comparing above services with BigQuery, they’re fundamentally different. Just my point of view on the pricing models. In fact, we’re currently using DOMO for director level and above. All of above services will happily offer custom pricing bundle for enterprise, it’s very depend on current setup (at scale or startup, how much do you want to invest in in-house tech/data effort…) of your business to pick a right solution.

Setting up data pipeline for BigQuery would be another topic, let assume that we already have everything in place: all data we need is able to be queried from BigQuery datasets. Now, what we’re aim for is to create one single sales report from sales transaction dataset, and make it available for every store manager with proper authorization: by accessing that report each of them can view sales of their stores only, not other’s stores.

Adopting Google Suit since the beginning is our huge advantage, we can leverage existed authentication of Google accounts. Google Cloud Platform is evolving very fast recently. My favorite is AppsScript, it is very powerful yet easy to start with. And it’s also an important part in our business solutions, I will write about it in the separated article.

Enabling row-level permission in BigQuery

Row-level permission (or Personalized permission, or User filter, or Row-level security ) is to control access to rows in a database table based on the characteristics of the user executing a query (for example, group membership or execution context). In our case: in the table which records whole chain transactions, store manager should be able to view transactions of their store only.

The introduction of row-level permission in BigQuery has been mentioned in this 3 years old Google’s blog post, but since then there’s no official instruction on how it’s actually implemented. Surprisingly, even a Google Solutions Engineer from Singapore who assist us on setting up Google Cloud Platform few months ago suggested that we should create difference views for each store. Only recently, this article explains it properly which also describes exactly what I have done for our project. In this article, I explain it from my point of view with a real business use case.

The key of row-level permission is the function SESSION_USER() in BigQuery, it returns the email address of the user that is running the query. Let park that for a second, I’ll come back to this later, we need to have the permission tables in place first.
Updated — Feb 2020: Since Google Data Studio has just release the function Filter by email address on Feb 14th. It allow us to use @DS_USER_EMAIL as an alternative for SESSION_USER() in custom query. Please find further instruction at the end of this article.

1. Setup Permission Tables

Below is a basic database schema for controlling staff-store assignment. We’ll filter the transactions by credential of user who view the report (whose email return from SESSION_USER() ) by the store they’re assigned.

Staff assignment setup

Apart from the staff assignment table, you may want some special people that can overview numbers of whole chain such as CEO or Director. To eliminate the pain of modifying assignment of these people whenever there’s new store open, I recommend having an additional table for that purpose. We should end up with two views as below. sales_report_permission the list of emails and corresponding store IDs, sales_report_viewer the list of emails of people have full view access. We’ll use them in the next step.

Tips: For easier verifying result, you can just simply create the two tables above on Google Sheet and have it saved as a table in BigQuery. You can also quickly modify the permission setup and see it reflects in BigQuery in real time.
Note: The Google Sheet also need to be shared (view-only) to the people on the list.

2. Write a BigQuery Views

Assuming that I have already created a view for sales report with all proper calculations called sales_report_view. However, it isn’t filtered by viewers. Therefore, I’ll create a wrapper view like below (Standard SQL) and save it as sales_report_by_viewer_view.

Now we have a view that result will be filtered by current login user. So basically we’re done, right? Not really! You will need to share the view with anyone that may need query data from it, and it is the real problem.

Unfortunately, BigQuery doesn’t allow us to share the view, but only the dataset. Which means you will expose all the table/schema/query in that dataset to all the store managers, if you chose to share the dataset that you’re working on. That’s why we need to create a different dataset for sharing purpose and it only consist the table/view you intent to share. Let name it dataset_shared, in which we create another view called sales_report_shared as simple as below.

We will end up have a setup like below. This way, whoever has access to the shared dataset will only able to explore the information we intent to share. Everything (real query, table schema, authorization setup) is wrapped in a simple query which doesn’t mean anything to most people

Next step we’ll learn how to share the dataset_shared.

3. Setup BigQuery Permission

For the view (sales_report_shared) to be able to query data from other view from another dataset, we need to setup Authorized View. In short, you need to go to sharing option of dataset (the dataset consists the view sales_report_by_viewer_view), select Authorized View then select the view sales_report_shared from dataset_shared as demonstrate below.

Sharing the dataset to an authorized view

After saving the change, the sales_report_shared view can return proper data. However, store managers still haven’t had access yet. The Identity and Access Management (IAM) setup allows us to do it.

Firstly, let create a custom role (in my demonstration I name it DataStudio Viewer) with the two permissions below. they are the only permissions required for executing a query. The other permissions of the default role BigQuery User or BigQuery Data Viewer are overkill for my specific purpose.

bigquery.jobs.create
bigquery.tables.getData

Create new role with just this permission

Secondly, go to member page, where you can add store manager emails to the role DataStudio Viewer. For least effort maintain the list of people who can access the reports, I suggest to add the domain of your company (or the email group of store managers). That way, although other people can access to the reports, they can’t see any data because we already control the permission at data level.

Add your domain or email group to the DataStudio Viewer

That’s it. At this time you can try login with email of a store manager (or tweaking the above permission tables with your alternative email) into BigQuery and execute the sales_report_shared to verify the filter effect.

4. Visualize on DataStudio

DataStudio is a tool for visualizing data which is at it early stage yet it still powerful enough for enterprise use. More importantly, it’s within the Google Cloud Platform ecosystem. If you are not familiar with DataStudio, I suggest you to do some research before read on, but trust me, visualizing data on Data Studio is as easy as creating a deck using Google Slide.

Update Mar 2019: Google has just introduced online course for Data Studio beginner which you can enroll for free https://analytics.google.com/analytics/academy/course/10

A sample dashboard from Google Data Studio team

Now what you have to do is to connect BigQuery as a new datasource. Selecting the view sales_report_shared as datasource. In the datasource schema review step, make sure the flag USING VIEWER’S CREDENTIALS is on. By turning it on, the query when being executed will use the viewer’s credential instead of the owner who created the report.

Make sure flag USING VIEWER’S CREDENTIALS is on

After you finish create proper visualization on Data Studio, final step is to share the report to store managers using the share option of Data Studio which is similar to share a Google Docs. You can confidently share it with the whole organization or with the email group of store managers, I just keep repeating myself: permission already be controlled at data level.

Sharing the Data Studio file

Final thoughts

My final optional step is to embed the DataStudio report into the store portal which is a website for store manager to manage their store, to make it convenient for them to access the reports.

The cool thing is when you need to modify something such as adding metric or tweak formula, you just need to do it once, the changes will reflect everywhere.

Although the store managers (if they curious enough) will able to access our BigQuery and run the query (on the view which they are shared) directly, but I don’t see it is a big problem as we have already control the permission at data level.

To me, the most satisfaction with this setup is that I don’t have to manage the users and the authorization whenever there’s new employee, new store which happen in daily basis in our business. The HR department who manage the employee directory through G suite will do the job.

All the steps above, as I said, only take three hours of only me which can replace the three weeks effort of backend, frontend devs and QA. It’s also a scalable solution, not just a quick hack or work around. As the number of store increase which lead to large number of records for the consolidated view, you may need to do partition and cache on the tables, they’re standard practices.

Google Cloud Platform is evolving fast, every week we can see some changes in their products. It open the whole new approach for business solution design, go through the cumbersome software development process (even with agile) is not the only option if you have a right setup.

Updated — Feb 2020: Since Google Data Studio has just release the function Filter by email address on Feb 14th. It allow us to use @DS_USER_EMAIL as an alternative for SESSION_USER() in custom query.

In the new setup, at step #2 — create the BigQuery view above, we don’t need the where condition to filter records by SESSION_USER(). Instead, at step #4 — connecting the data source to Data Studio, we chose CUSTOM QUERY and write something similar to below screenshot to achieve the same result (row-level permission).

Note: The Enable user email parameter must be checked

Like what you’re reading? Follow us on Linkedin and Medium. We are developing the digital layer for 7-Eleven Viet Nam including the entire retail IT system as well as customer facing components like 7REWARDS and 7NOW.vn

--

--

Trung Luong
7LAB
Editor for

Doing data, tech and strategic partnership at 7-Eleven Viet Nam