How We Established Column-Level Security in Analytical Data Storage With Hundreds of Users (Part 2)

Andrey Goloborodko
Published in Wrike TechClub
5 min read · Jul 17, 2023

The How-To of Column-Level Security

Introduction to Part 2

In Part 1, I shared why and when organizations should consider implementing column-level security (CLS) for their analytical data warehouses (DWH). Now, in Part 2, I’ll dive into the implementation process, covering the essential prerequisites, primary components, and necessary roles.

Later in the article, I’ll share our experience at Wrike in implementing CLS.

Before you begin: Prerequisites

  1. Field-level audit logs: These logs — or similar tools — allow you to see which columns users query. This is essential, as it helps prevent accidentally cutting off access for a significant portion of the company.
  2. Declarative table schema: This schema should reflect the current state of your data storage and be stored separately. Although you can technically maintain your schema within migrations or even ETL processes, having a declarative schema simplifies management. An additional benefit is that it provides a text-based interface for your security team to work with.
  3. Metadata tagging at the source level: Ideally, each data asset should have a team responsible for maintaining and understanding the data, as well as for categorizing each attribute. This metadata tagging can then be utilized within your analytical storage and integrated into your policy tag taxonomy.
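To make the second prerequisite concrete, here is a minimal sketch of what a declarative schema with policy tags might look like, modeled on BigQuery's JSON table schema format. The table, column names, project, and taxonomy path are hypothetical examples, not a real schema.

```python
# A declarative schema for a hypothetical "leads" table, expressed as the
# Python equivalent of a BigQuery JSON schema file. The confidential column
# carries a policy tag, referenced by its full resource name.
LEADS_SCHEMA = [
    {"name": "lead_id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {
        "name": "email",
        "type": "STRING",
        "mode": "NULLABLE",
        "policyTags": {
            "names": [
                "projects/acme-dwh/locations/eu/taxonomies/123/policyTags/456"
            ]
        },
    },
]


def tagged_columns(schema):
    """Return the names of columns that carry at least one policy tag."""
    return [
        field["name"]
        for field in schema
        if field.get("policyTags", {}).get("names")
    ]
```

Keeping the schema in a text file like this also gives the security team a reviewable interface: a pull request that adds a column without a policy tag is easy to spot.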

Components of CLS

Policy tag taxonomy

A policy tag taxonomy is a tree-like structure that represents your data hierarchy. The upper levels of the tree should contain primary data categories from your corporate data protection policies, such as Confidential, Internal, and Public.

The lower levels of the taxonomy should consist of more specific categories directly describing the data, such as Revenue, Internal Identifier, or Email. These categories should be as specific as necessary for data owners or stewards to apply the correct tags to the data. Assigning a policy tag to a column should be effortless for these people: the right tag should be the most obvious one. For example, when a data owner sees a column containing emails, they just assign a “UserEmail” policy tag to it — without deciding whether it’s Confidential, Private, Sensitive, or whichever categories your security team operates on.
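The taxonomy described above can be sketched as a parent-pointer tree: leaf tags like “LeadEmail” roll up to mid-level categories, which roll up to the corporate classification. The category and tag names below are illustrative assumptions, not a real taxonomy.

```python
# A policy tag taxonomy as a parent-pointer tree: each tag maps to its
# parent category, and top-level categories map to None.
PARENT = {
    "Confidential": None,
    "Internal": None,
    "Public": None,
    "LeadContactInformation": "Confidential",
    "LeadEmail": "LeadContactInformation",
    "LeadPhone": "LeadContactInformation",
    "Revenue": "Confidential",
    "InternalIdentifier": "Internal",
}


def top_level_category(tag):
    """Walk up the tree to find the corporate data category for a tag."""
    while PARENT[tag] is not None:
        tag = PARENT[tag]
    return tag
```

This is why data owners only need to know the leaf tags: the mapping to Confidential, Internal, or Public is encoded once in the tree, not re-decided per column.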

Data tagging process

Data tables should be tagged with policy tags during table creation by the team responsible for the table. Policy tag requirements should be incorporated into the development process for the functionality supported by the table.

Granting access to policy-tagged fields

There are two crucial rules when granting access:

  1. Grant access to roles, not to individual users. Roles can encompass multiple users or be tied to a single user, and they should be associated with specific business functions. Examples of roles include “Financial Analyst,” “Support Tier 1,” “Emergency Bugfix Research,” “Engineering,” and so on.
  2. Grant access to higher-level categories in the taxonomy tree. For example, policy tags such as LeadEmail, LeadPhone, and LeadName could be included in the LeadContactInformation category. Then, you can grant your sales team access to lead contact information rather than specific policy tags.
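The two rules above can be sketched together: grants are made to roles on higher-level categories, and access to a column's leaf tag is resolved by walking up the taxonomy. The roles, tags, and grants below are hypothetical examples.

```python
# Parent-pointer taxonomy (illustrative subset).
PARENT = {
    "Confidential": None,
    "LeadContactInformation": "Confidential",
    "LeadEmail": "LeadContactInformation",
    "Revenue": "Confidential",
}

# Rule 1: grants go to roles, not users.
# Rule 2: grants target higher-level categories, not leaf tags.
GRANTS = {
    "Sales": {"LeadContactInformation"},
    "Financial Analyst": {"Revenue"},
}


def ancestors(tag):
    """Return the tag and every category above it in the taxonomy."""
    chain = [tag]
    while PARENT[tag] is not None:
        tag = PARENT[tag]
        chain.append(tag)
    return chain


def role_can_read(role, tag):
    """A role may read a column if any ancestor of its tag is granted."""
    return any(t in GRANTS.get(role, set()) for t in ancestors(tag))
```

Granting “Sales” the LeadContactInformation category implicitly covers LeadEmail, LeadPhone, and any lead contact tags added later — no per-tag regrants needed.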

Roles supporting CLS

  1. Taxonomy Tree Owner: This person maintains the taxonomy, advises other roles on which policy tags correspond to specific data categories, and determines how policy tags should be grouped.
  2. Data Owners or Data Stewards: These individuals are responsible for tagging columns with policy tags.
  3. Access Manager: This person reviews and grants permissions for access to specific data categories.
  4. Project Manager: This person oversees the CLS implementation at all stages, from creating the taxonomy tree to reconfiguring all access request forms.

How we implemented CLS

Act 1: Architectural proposal, preparation, and idea verification

CLS implementation is closely tied to the data access process, so we needed approval from our security team before starting the project. During this review, we also detailed the basic elements of our taxonomy tree.

It’s essential to discuss your CLS architecture with colleagues who will be affected by the changes, as well as experts you trust, even if your organization does not have a formal architecture review process.

Act 2: Test data tagging

Not all data requires CLS, and it’s always possible to start small. The first challenge was tagging all data fields. Fortunately, we already had a tagging system in place in our backend for GDPR compliance, which we happily repurposed. However, some data still had to be manually tagged. We passed this job to distributed teams with domain knowledge of the data.

As we tagged the data, we created the first version of the taxonomy. At this stage, only the most granular policy tags were required, which were assigned directly to table fields. It’s important to note that while we tagged data with policy tags, CLS was not yet enforced.

Act 3: Granting user access

The second challenge was limiting access without frustrating users. Audit logs came to our rescue here. We didn’t need to decide which users would have access to specific data categories — instead, we granted access to the categories users were already utilizing.
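The audit-log-driven approach can be sketched as follows: rather than deciding grants up front, collect the policy tags of the columns each role's users already query and grant the corresponding categories. The log entries, roles, and tag-to-category mapping are hypothetical.

```python
from collections import defaultdict

# (role, policy tag of a queried column), as extracted from field-level
# audit logs over some observation window (illustrative data).
AUDIT_LOG = [
    ("Sales", "LeadEmail"),
    ("Sales", "LeadPhone"),
    ("Financial Analyst", "Revenue"),
]

# Leaf tag -> higher-level category to grant (illustrative mapping).
CATEGORY_OF = {
    "LeadEmail": "LeadContactInformation",
    "LeadPhone": "LeadContactInformation",
    "Revenue": "Revenue",
}


def derive_grants(log, category_of):
    """Collect, per role, the categories its users already rely on."""
    grants = defaultdict(set)
    for role, tag in log:
        grants[role].add(category_of[tag])
    return dict(grants)
```

Running this over a long enough window before enforcement is what makes the cutover safe: anyone actively using a category already has it when CLS switches on.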

Once audit log analysis showed that users were only querying fields they were allowed to access, we enforced CLS. From that point on, we granted almost all users access to production data, as confidential data was protected by corresponding policy tags.

The final act: Adjusting processes to accommodate CLS and removing unnecessary objects

The third challenge was adjusting access provisioning processes to account for policy tags.

Firstly, we needed to explain to the Help Desk team that formerly confidential datasets could now be considered more open, as confidential data within them was protected by separate CLS policies.

Secondly, we had to consider that users now needed to request access not only to datasets but also to specific data categories. Since users were usually unsure about the data they needed, we kept the old dataset access request forms but added an optional field where users could list particular table fields or data categories.

Lastly, we got to the most enjoyable part of the project: removing unnecessary objects. Extra views and tables that were only needed to grant select users access to partial datasets were deleted, as we could now give access to the entire dataset without compromising security.

Conclusion

In this two-part article, I’ve covered Wrike’s journey towards CLS in our analytical DWH.

In Part 1, I discussed the benefits of CLS, such as:

  • Enhanced data protection
  • Data democratization
  • Improved data comprehensibility
  • A simpler data model

I also talked about when organizations should consider implementing CLS based on their specific needs and user categories.

In Part 2, I delved deeper into the implementation process, covering the following:

  • Prerequisites
  • Main components
  • Supporting roles required for successful CLS implementation
  • Our practical experience of implementing CLS at Wrike

You are welcome to discuss and share your own experiences and insights regarding CLS implementation in the comments section! 👋

This article was written by a Wriker at Wrike. See what it’s like to work with us and what career development opportunities we offer here.

Also, hear from our founder, Andrew Filev, about Wrike’s culture and values, the ways we work and appreciate Wrikers, and more here.
