Data Privacy & Governance: A seed VC’s perspective

Harshal Gupta
Published in araliventures
6 min read · Oct 9, 2023

Part 2 of our series on data privacy, governance, data in the context of generative AI, and more.

We started the series with a primer on our perspective surrounding Data Privacy & Governance challenges within enterprises. In this post, we delve deeper into the privacy aspects, initiatives that enterprises are taking, and opportunities in the space.

Towfiqu Barbhuiya via Unsplash

According to Latanya Sweeney, Professor of the Practice of Government and Technology at the Harvard Kennedy School, “87% of people in the United States are estimated to be unique based on date of birth, gender, and ZIP code.” Information often assumed to be anonymous — like birthdate, gender, and ZIP code — can be linked to specific individuals in public, non-de-identified datasets, like voter lists.

Until recently, we saw enterprises invest heavily in data security initiatives within the organisation. Over the last 2–3 years, however, we have seen CIOs ramp up investments in data privacy initiatives too.

Whilst data privacy and data security may both fall under the ambit of data protection, they differ considerably in their end goals and the problems they tackle. Data security focuses on safeguarding an enterprise’s data from external threats, whereas data privacy focuses on safeguarding personally identifiable information (PII), defining suitable access and storage guidelines, and ensuring appropriate use of data containing PII inside or outside an enterprise.

In a McKinsey survey, 87% of respondents said they would not do business with a company if they had concerns about its security practices, and 71% said they would stop doing business with a company if it gave away sensitive data without permission.

Customers, too, are seeking out businesses that treat privacy protection as fundamental. All of this is igniting conversations at the highest levels of business, and companies are moving towards better data privacy and protection initiatives.

Data Privacy opportunities

Several opportunities have emerged to help enterprises implement enhanced data protection safeguards. These span operations, policy creation and implementation, infrastructure, access, masking, controls, mapping, sharing, and more.

Data Mapping

This refers to inventorying the data an enterprise stores across multiple systems. The output is often known as the Data Map. Modern-day privacy regulations such as the EU’s GDPR and California’s CCPA effectively require companies to maintain an up-to-date Data Map. With troves of data being stored in enterprises, building a data map can be a daunting task, and keeping it updated manually is often impossible. Automated data mapping tools with a wide array of integrations and connectors can help enterprises continually audit their data.
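As a rough illustration of what an automated data-mapping pass produces, the Python sketch below walks a set of table schemas and flags columns whose names suggest personal data. The schemas, the keyword list, and the `build_data_map` helper are all hypothetical; real tools connect to live systems and classify actual values, not just column names.

```python
from dataclasses import dataclass

# Hypothetical keyword hints; a real classifier would also sample values.
PII_HINTS = {
    "email": "contact",
    "phone": "contact",
    "name": "identity",
    "dob": "identity",
    "birth": "identity",
    "zip": "location",
    "address": "location",
    "card": "financial",
}


@dataclass(frozen=True)
class DataMapEntry:
    source: str    # system the data lives in
    table: str
    column: str
    category: str  # kind of personal data the column name hints at


def build_data_map(schemas):
    """Return one entry per column whose name hints at personal data.

    `schemas` maps a source name to {table name: [column names]}.
    """
    entries = []
    for source, tables in schemas.items():
        for table, columns in tables.items():
            for column in columns:
                lowered = column.lower()
                for hint, category in PII_HINTS.items():
                    if hint in lowered:
                        entries.append(DataMapEntry(source, table, column, category))
                        break  # one category per column is enough for this sketch
    return entries


# Made-up schemas, for illustration only.
example_schemas = {
    "crm_db": {"customers": ["id", "full_name", "email", "zip_code"]},
    "billing_db": {"payments": ["id", "card_number", "amount"]},
}

data_map = build_data_map(example_schemas)
for entry in data_map:
    print(entry.source, entry.table, entry.column, entry.category)
```

Re-running a pass like this on a schedule, against every connected system, is what keeps the map current without manual effort.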

Data Masking / Data Obfuscation

This is the alteration of existing sensitive data, such as names, card information, phone numbers, and addresses, into fake but convincing replicas, so that the data can still be used effectively in analytics and ML models while its privacy is preserved.

Data masking helps enterprises render data unusable in the event of a leak, while still allowing internal and external teams to leverage it in their processes.

There are multiple ways in which data can be masked:

  1. Static Data Masking: Creates a safe copy of a database that can be shared internally or externally. It alters the data irreversibly, so the original data cannot be recovered from the masked copy.

[Figure: Example of Static Data Masking. Source: Microsoft]

  2. Dynamic Data Masking: Usually applied to the original database on the fly, typically when a query is run. Data controllers can also apply different masking rules depending on the user accessing the data or the use case.

  3. Data Scrambling: Randomly re-ordering the characters in the data and replacing the original content with the result. This is typically less secure and cannot be applied to all types of data.

  4. Nulling: Returning a null value when the data is viewed or accessed by an unverified user.
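To make the four approaches above concrete, here is a minimal Python sketch of each. The function names, the `"privileged"` role, and the salted-hash token used for static masking are illustrative assumptions, not how any particular product works; a production tool would typically generate realistic-looking replacement values and manage keys and policies centrally.

```python
import hashlib
import random


def static_mask(value, salt):
    """Static masking: replace the value with a salted-hash token.

    The original cannot be read back from the token, although hashing
    low-entropy values can still be brute-forced, so treat this as a sketch.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "tok_" + digest[:10]


def dynamic_mask(value, viewer_role):
    """Dynamic masking: decide at read time what the viewer may see."""
    if viewer_role == "privileged":  # hypothetical role name
        return value
    if len(value) <= 4:
        return "*" * len(value)
    # Everyone else sees only the last four characters.
    return "*" * (len(value) - 4) + value[-4:]


def scramble(value, rng):
    """Data scrambling: shuffle the characters of the value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def null_mask(value, viewer_verified):
    """Nulling: unverified viewers get no value at all."""
    return value if viewer_verified else None


card = "4111111111111111"
print(static_mask(card, salt="demo-salt"))
print(dynamic_mask(card, viewer_role="analyst"))
print(scramble(card, random.Random()))
print(null_mask(card, viewer_verified=False))
```

Note that only the dynamic and nulling variants need to know who is asking; static masking and scrambling transform the stored copy once, for every reader.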

There are many more ways in which data can be masked. Before enterprises can proceed with a data masking implementation, they need to define the information to be protected and map users and applications to the data that they can access.

Companies like Skyflow, Immuta and Protecto.ai are building products and capabilities in this space.

In the rapidly changing world of LLMs and Generative AI, it is even more important for businesses to be aware of how and where their data is being used. We are already seeing reports of employees inputting company data into ChatGPT, with over a quarter of that data considered to be sensitive information. Companies need to ensure that sensitive data is not used as input for training LLMs, or as text/file prompts in AI systems like ChatGPT.
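One common-sense safeguard is to redact obvious identifiers before a prompt ever leaves the company. The Python sketch below shows the idea with a few regular expressions; the patterns, labels, and the `redact_prompt` helper are assumptions for illustration, and they are deliberately simple, so they will miss many real-world formats that a dedicated detection tool would catch.

```python
import re

# Assumed, deliberately simple patterns. Order matters: card numbers are
# replaced before the looser phone pattern gets a chance to match them.
REDACTION_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("PHONE", re.compile(r"\+?\d[\d -]{8,13}\d")),
]


def redact_prompt(prompt):
    """Replace anything matching a known pattern with a bracketed label."""
    redacted = prompt
    for label, pattern in REDACTION_PATTERNS:
        redacted = pattern.sub(f"[{label}]", redacted)
    return redacted


raw = "Email jane.doe@example.com or call +91 98765 43210 about card 4111 1111 1111 1111."
print(redact_prompt(raw))
```

A filter like this would sit between employees and the AI system, for example in a proxy or browser extension, so that the redacted text is what actually gets sent.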

Data Access

Implementing data access controls is often the first step that enterprises take towards ensuring data privacy and data security. It’s imperative that the policies establishing these controls cover not only users but also the many tools that access the data. The models below typically define how data access controls are implemented:

  1. Role-based Access Control: Access based on the employee’s role, level of seniority, department, or business unit.
  2. Attribute-based Access Control or Policy-based Access Control: Broadening the scope beyond the employee’s role, this model grants access based on attributes such as subject (i.e., the user), resource (e.g., application, API, database), action (e.g., read, write, edit, copy), and environment (e.g., time, location, device, network).
  3. Mandatory Access Control: Access control defined using a strict set of rules and often based on organisational hierarchy.
  4. Discretionary Access Control: One of the most lenient access control systems, in which control is defined by the person owning the data.

Attribute-based access control typically offers highly stringent controls, but it can also add implementation complexity.
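The difference between the first two models, and the extra complexity that attributes bring, is easier to see side by side. The Python sketch below is a toy comparison; the roles, resources, and the office-hours policy are all made up for illustration.

```python
from dataclasses import dataclass

# Role-based: a static mapping from role to permitted (resource, action) pairs.
ROLE_PERMISSIONS = {
    "analyst": {("sales_db", "read")},
    "admin": {("sales_db", "read"), ("sales_db", "write")},
}


def rbac_allows(role, resource, action):
    """Allow only what the role's permission set explicitly lists."""
    return (resource, action) in ROLE_PERMISSIONS.get(role, set())


@dataclass(frozen=True)
class AccessRequest:
    department: str             # subject attribute
    resource: str               # resource attribute
    action: str                 # action attribute
    hour: int                   # environment attribute: local hour, 0-23
    on_corporate_network: bool  # environment attribute


def abac_allows(request):
    """Hypothetical policy: finance staff may read finance_db during
    office hours, and only from the corporate network."""
    return (
        request.department == "finance"
        and request.resource == "finance_db"
        and request.action == "read"
        and 9 <= request.hour < 18
        and request.on_corporate_network
    )


print(rbac_allows("analyst", "sales_db", "write"))
print(abac_allows(AccessRequest("finance", "finance_db", "read", 22, True)))
```

The role check needs one lookup, while the attribute check has to gather and evaluate subject, resource, action, and environment data on every request, which is where much of the implementation effort goes.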

Companies like Cyral, Open Raven, Privacera, and others are building richer data access control capabilities for businesses.

Compliance and regulation management

Compliance and regulation management tools help enterprises track the data privacy regulations that apply to them in each region and where they stand in terms of compliance. They also help enterprises author and distribute data policies and maintain checklists. Regulations and standards can be industry-specific (PCI-DSS, HIPAA) or geo-specific (EU’s GDPR or California’s CCPA).

Closer to home, India recently enacted the Digital Personal Data Protection (DPDP) Act, 2023. The Act applies to the processing of digital personal data within the territory of India, whether collected online or collected offline and later digitised. It also applies to the processing of digital personal data outside India if it involves providing goods or services to data principals within the territory of India.

The Act carries stringent penalties for non-compliance: data fiduciaries can be fined up to INR 250 Cr (~$30 mn) for failing to take reasonable security safeguards to prevent a personal data breach.

The regulatory changes in data privacy across the globe are serving as strong catalysts for businesses to ensure that they set stronger security safeguards around the data they collect and store.

If you’re building something in the data privacy or governance space, we’d love to hear from you.

Arali Ventures is a pre-seed, seed-stage VC from India, investing in entrepreneurs building enterprise-tech solutions for the world. We help shape their journeys through product-market-fit and beyond and scale the offerings to greater heights.

Keep circling back to read our perspectives on enterprise-tech, our portfolio, and seed-stage investing in India.
