How We Established Column-Level Security in Analytical Data Storage With Hundreds of Users (Part 1)

Andrey Goloborodko
Wrike TechClub
Published in
4 min readJun 26, 2023

The Why and When of Column-Level Security

TL;DR:

These are the four main advantages of column-level security:

  • Enhanced data protection
  • Data democratization due to wider default access
  • A simpler data model, as you don’t need authorized views
  • Improved data comprehensibility due to richer metadata

Introduction

Column-level security (CLS) is a crucial aspect of managing data access, particularly in organizations where multiple teams need access to different subsets of information. Implementing CLS allows for better control over who can access sensitive data, and it ensures the right people can work with the data they require. In this article, I’ll walk you through the process we followed to establish CLS in our analytical data storage, which caters to hundreds of users across various roles and responsibilities.

Wrike’s internal analytics platform had a total of 150 active users, with more than 50 creating their own data sources in the data warehouse (DWH). There were around 15 different departments and teams with potentially varying access rights, as well as approximately 1,500 public tables. During the first iteration, we covered 300 of them with CLS.

Previously, we granted access only at the dataset level. Sometimes some data from a dataset was needed by users who didn’t have access to some other data in the same dataset. Thus, it was impossible to grant full access to the entire dataset, so we created authorized views instead. At some point, these views became so numerous that we decided to stop tolerating this and implemented CLS.

Prelude: Our DWH composition

In this article, I will intentionally not focus on our specific implementation of CLS. There will be no code snippets, and I will only mention specific technologies for the sake of completeness. First, almost all big data platforms have their own CLS variants, so I will try to be platform-agnostic in my narrative. Second, the main challenges when implementing CLS in an existing platform are organizational, not technical. I will emphasize organizational difficulties and won’t focus on a particular implementation while trying to be platform-agnostic.

Nonetheless, without describing our tech stack, the story will lack essential details. Moreover, other providers may have their own peculiarities that will not be adequately considered in this article.

So our technology stack is based on Google Cloud tools. We use BigQuery as our main storage and implemented CLS using Google Data Catalog and Terraform. We import data into our BigQuery from the Wrike backend and from almost all systems and applications used across different departments: product, marketing, sales, finance, tech support, engineering, customer success, and more.

Why CLS?

Architecture Without Column-Level Security: Using Authorized Views for Access Provisioning

The obvious answer is because you want to restrict access to specific fields for some users and strengthen data protection. However, it’s usually the other way around: You want to grant broader access to data, but since confidential and non-confidential data are all in the same datasets, it’s difficult to do. Thus, CLS helps democratize access to data. When you know that confidential data is protected, it’s easier to open the dataset for a wider audience so they may use non-confidential data from that dataset.

For users without the necessary permissions, special views or even table copies must be provided. These copies provide users with basically the same data but with confidential data filtered out. This increases the number of objects in your DWH, making it harder for users to navigate, and you’ll have to spend more effort on maintenance. CLS allows you to remove redundant tables and views, which makes your data structure simpler.

Architecture With Column-Level Security

Also, there’s a less obvious but significant advantage: An already deployed CLS represents high-quality metadata. Each policy tag carries additional information about what is stored in the fields of your table. For example, it’s much easier for the security team to understand which data users are requesting access to.

When should we implement CLS?

The key criterion is whether you need to maintain additional views and tables for specific user categories that can’t access the main dataset, as it has some confidential data. Complications will arise when there are more than two such categories of users.

Another indicator is when granting access to data requires more time and involves more people than you would like. If the security team involves data engineers in access requests, it’s a clear signal to introduce CLS.

Conclusion

CLS provides improved data protection, easier data access, better data understanding, and a simpler data model. Assess the requirements of your organization and the types of users who need access to data. By understanding these factors, you can determine the right time to introduce CLS.

Stay tuned for Part 2, where we’ll explore the implementation process; discuss the prerequisites, components, and roles; and share our experience at Wrike in implementing CLS.

This article was written by a Wriker in Wrike. See how it is to work with us and what career development opportunities we offer here.

Also, hear from our founder, Andrew Filev, about Wrike’s culture and values, the ways we work and appreciate Wrikers, and more here

--

--