Data Privacy — Balancing Data Conformance and Performance for Internal Reporting.

Krupesh Desai
Data View House
13 min read · Sep 11, 2024

In the previous blog, I referred to data governance as the art and science of balancing data conformance obligations and a business’s data performance needs. In this follow-up blog, I explore this terrain further with an example, assessing the effort required to meet reporting and analytical needs while fulfilling the mandatory compliance obligations that protect Personally Identifiable Information (PII).

The first part attempts to analyse and rationalise the need for PII across multiple categories and sub-categories of data utilities. The analysis also identifies multiple data governance deliverables to support reviewing existing reports and sustain PII-compliant reporting and analytics in the future. For simplicity, a limited number of PII attributes are used in this example: First Name, Last Name, Gender, Address, PIN code, Date of Birth, and Mobile Number.

Finding the Balance Strategically

In an enterprise with a reporting system that has been running for years and various operational business systems as data sources, the number of reports quickly accumulates into the thousands. Ferreting out which reports are critical and still in use is cumbersome and depends on the level of auditing data captured by the reporting tool. In such a situation, report usage analysis can be sped up if the reporting system provides audit logs with the last access date and the users for every report.
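As a minimal sketch of that usage analysis, assuming the reporting tool can export a last access date per report, the reports can be split into "still in use" and "stale" candidates. The report names, the `audit_log` structure and the one-year idle threshold are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical audit-log extract: report name -> last access date (None = never accessed).
audit_log = {
    "Monthly Discharges": date(2024, 8, 30),
    "Legacy Ward Census": date(2019, 3, 12),
    "Invoice Due Reminders": date(2024, 9, 2),
    "Old Vendor List": None,
}

def split_by_usage(log, as_of, max_idle_days=365):
    """Return (in_use, stale) report names based on the last access date."""
    cutoff = as_of - timedelta(days=max_idle_days)
    in_use, stale = [], []
    for name, last_access in sorted(log.items()):
        if last_access is not None and last_access >= cutoff:
            in_use.append(name)
        else:
            stale.append(name)
    return in_use, stale

in_use, stale = split_by_usage(audit_log, as_of=date(2024, 9, 11))
```

Stale reports are only candidates for retirement; a report that runs once a year for a regulator would still need a human decision.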

If auditing data is not available, guessing which reports are critical is risky. I once faced a situation where my client had not bought the auditing feature for their reporting system, which had been running for over ten years. Activating the audit would only capture audit data from that point onward. The task in front of us was to identify the critical reports among some 2,000 reports for re-development with the new patient management system as a data source, which was replacing a 22-year-old legacy patient management system with multiple instances.

With thousands of reports in the re-factoring scope, and depending on the time on hand to meet PII regulation, businesses can either lock reporting services immediately or (if time permits) review and assess existing reports and other data utilities for PII exposure risk. Suppose the situation demands a knee-jerk reaction to lock all reports. In that case, it comes with a trade-off of delayed operational and management decisions due to increased report delivery time. The users must request each report, which is then evaluated for PII exposure risk and re-factoring. These delays could go on for a long time in an organisation with a reactive culture and leadership.

Effective data governance is only possible through collaboration between an organisation’s business, IT, and legal teams. Businesses must liaise with a legal team or consultant to assess the current data usage within the scope of fair and transparent usage of PII by people consuming data (manager, analyst, executive) or working with data (DBA, ETL developer, data engineer, report developer). PII compliance also includes establishing means and tools that allow data subjects to exercise their right to be forgotten (delete all data about an individual) and ensuring that businesses have the consent of the data subjects for their PII to be used for internal data analysis purposes.

Age-old Tradeoffs in Data Governance— Tug-of-War

A data governance program that is implemented strategically, fine-tuning and enforcing data access policies based on data classifications and coupled with a quantitative appetite for accepting certain risks, can help stakeholders make informed, evidence-based decisions. Such decisions balance data compliance and performance in existing data utilities and set the foundation for compliant data utilities in the future. Note that privacy protection is not only about preventing unauthorised access to personal information but also about facilitating timely access to personal information for the right person for the right reasons. Let’s try to find the balance below.

Data Utilities — Categories and Sub-Categories

Before reviewing existing reports, let’s take a step back and look at the various types of data utilities to assess the need for PII for each type. To do so, I grouped various data utilities into the following categories and sub-categories to analyse the rationality of PII existence in each.

Figure: Data Utilities (Categories and Sub-Categories)

The logic behind the PII exposure assessment for each category of data utilities is the notion of data minimisation that raises a valid question — why would you need an individual’s personal information to make a business decision? PII on reports must be exceptions or edge cases with adequate access management controls. Regular occurrence of PII exposure in reports invites unauthorised PII exposure risk.

Note: The categorisation considers the complete spectrum of data utilities. However, in terms of balancing data compliance and performance, this blog only covers reporting and analytical utilities.

Category 1 — Operational

Executing core business processes such as fulfilling a sales order, booking a surgical appointment, and processing a purchase order to update inventory are operational activities fundamental for a business to exist and include critical business entities such as customers, patients, vendors, warehouses and stores. PII should be accessible in a timely manner for operational use cases to execute a business transaction and support the customer during or after the transaction. Operational data utilities can be further split into three sub-categories.

1.1 Record Level CRUD (Create — Read — Update — Delete)

Imagine calling your bank and receiving no assistance because the representative cannot access your personal information. Even better, imagine visiting a hospital for a multi-day stay due to a follow-up surgery, and the medical staff do not have your medical history and allergy details because they could not migrate data from the old system to the recently implemented health record system.

Operational systems are supposed to collect and update individual information to store the business transactions, which are the primary purposes for which data is collected. However, the data controller entity must prevent unauthorised access to such core business systems by implementing access policies such as role-based or attribute-based access models. An entity can go a step beyond best practices and have its business systems adhere to the ISO/IEC 27001 or ISO/IEC 38505 standards to ensure auditable adherence to PII compliance.
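A role-based access model can be sketched in a few lines. This is an illustration only: the role names, the permission strings and the `read_customer` helper are hypothetical, and a real business system would enforce this inside its own access management layer.

```python
# Hypothetical role-to-permission mapping for an operational system.
ROLE_PERMISSIONS = {
    "call_centre_agent": {"customer:read"},
    "front_desk": {"customer:read", "customer:update"},
    "report_developer": set(),  # no record-level PII access
}

def is_allowed(role, permission):
    """Role-based check: unknown roles get no permissions."""
    return permission in ROLE_PERMISSIONS.get(role, set())

def read_customer(role, customer_id, store):
    """Return a customer record only if the role may read customer data."""
    if not is_allowed(role, "customer:read"):
        raise PermissionError(f"role {role!r} may not read customer records")
    return store[customer_id]

# Hypothetical record store standing in for the operational database.
customers = {"C1": {"first_name": "Asha", "mobile_number": "0000000000"}}
record = read_customer("call_centre_agent", "C1", customers)
```

An attribute-based model would replace the static mapping with rules over user, resource and context attributes, but the deny-by-default shape stays the same.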

1.2 Capacity Planning

Operational managers and analysts are the primary consumers of reports under this sub-category of operational reports. Such reports consume recent operational data and plot trends and forecasts to assist in tactical, operational decision-making, such as human-resource planning and estimating purchase orders for the next month. Capacity planning reports assist managers and executives in keeping the show running. If your operational managers are consuming reports with PII, that usage must be rationalised and the reports re-factored. Some operational reports can be expected to carry PII for valid reasons, such as executing weekly or monthly tasks like sending appointment letters and making invoice-due reminder calls.

If feasible, push capacity planning reports with PII as a functionality in the operational business systems. It would incur a cost (the tradeoff), but it would place such reports under the Record-Level CRUD category, where access is controlled and audited by system-level access management. It is reasonable to expect contemporary SaaS and on-premise solutions for critical business systems to include operational data stores that support operational reporting within the system.

However, if the operational system does not support the functionality of operational reports due to technical or budget limitations, it’s important to refer to the data governance deliverables section below. This will help prevent unnecessary PII exposure on operational reports, which could lead to serious data breaches and legal consequences.

1.3 Throughput Monitoring

The third type of operational reports and dashboards I have seen and built focuses on monitoring operational KPIs, which are mostly the throughput of some business activities or events. Some examples are orders shipped and delivered today and over the week, patients discharged and admitted in the last 24 hours, week or month, and health and safety incidents this week. On such reports, transactional business data is aggregated and monitored over a short period: daily, weekly or monthly, and quarterly at most.

With mostly aggregated data displayed as KPIs, it is difficult to rationalise the presence of the PII of an individual customer, employee, or vendor on throughput reports. At best, data may be aggregated by age, gender, pin code, city, state, or country. If access to individual records is required, the operational manager should use the business system (CRM or ERP) to find individual details instead of demanding them on a report. (I smell PII Access Policy here.)
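To illustrate, here is a small sketch of a throughput count that groups by pin code and gender and never carries the patient name into the output. The `discharges` records and the `throughput_by` helper are made up for this example.

```python
from collections import Counter

# Hypothetical transactional records as they might arrive from the source system.
discharges = [
    {"patient_name": "Asha Rao", "pin_code": "110001", "gender": "F"},
    {"patient_name": "Vikram Shah", "pin_code": "110001", "gender": "M"},
    {"patient_name": "Meera Iyer", "pin_code": "110001", "gender": "F"},
    {"patient_name": "John Paul", "pin_code": "400050", "gender": "M"},
]

def throughput_by(records, *dimensions):
    """Count records per combination of the requested dimensions only."""
    return Counter(tuple(r[d] for d in dimensions) for r in records)

# Only the grouping dimensions and the counts reach the report.
kpi = throughput_by(discharges, "pin_code", "gender")
```

Because the aggregation selects its dimensions explicitly, a name or mobile number can only appear on the report if someone deliberately asks for it as a dimension.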

Category 2 — Management

Management reports are those consumed by executives and senior management to make strategic business decisions or monitor the status quo of the business against desired KPIs or strategic goals. While some reports are periodic, such as financial reporting, quarterly sales reports, and KPI reports for board meetings, senior leadership also requires ad hoc reports to address contemporary challenges requiring evidence-based decision-making.

Ideally, semantic data models powering management reports should include only aggregated transactional data with granularity up to a daily level. Including or excluding PII in the data aggregation granularity is a matter of balancing the tradeoff between data access and utility. One potential solution could be to limit PII attributes in data aggregation granularity for management reports to pin code, city, state, month of birth, and gender, denoting them with lower sensitivity compared to first name, last name, and address, which can easily identify a unique individual.

However, looking at the rationales for management reports with PII, we can find exceptional cases where executives and senior management need to access PII, such as listing the top 10 customers of the last quarter or year in order to respond with a ‘Thank you’ message and a strategic license renewal deal for multiple financial years. Therefore, excluding PII from management reports altogether is not a feasible tradeoff for PII compliance. In such cases, data governance deliverables (described below) such as a role-based access model, a PII access policy and a data catalogue with data and report classifications, coupled with robust master data management, can assist in minimising the PII exposure risk.

With PII exposure inevitable in management reports, data governance plays a crucial role in ensuring that PII is accessible to the right person for the right reason under the umbrella of fair and transparent data usage, which is a key business need expressed under the privacy policy. Not having robust access controls on management reports and underlying data environments is a compliance risk that could expose PII to unauthorised persons.

Category 3 — Analytics

The last category of data utilities is analytics, which includes data utilised for statistical analysis and machine learning. Once again, rationalising the need for PII could justify the use of age group, gender, and geography for such analytical needs. Individual names and precise addresses are not required for statistical analysis or modelling. Specific PII attributes, although de-identified, are inevitable for statistical needs. Therefore, data governance obligations similar to those for management reports apply to analytical use cases, i.e. data used to build or train a statistical model should not provide a means to identify an individual uniquely. On top of this, user consent may come into the picture depending on the depth of PII regulations you are mandated to comply with.
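As a rough sketch of turning a date of birth into an age group before it reaches an analytical dataset, the helper below computes the age in completed years and maps it to a band. The `age_band` name and the ten-year band width are assumptions for illustration.

```python
from datetime import date

def age_band(date_of_birth, as_of, width=10):
    """Map a date of birth to an age band such as '30-39'."""
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    had_birthday = (as_of.month, as_of.day) >= (date_of_birth.month, date_of_birth.day)
    age = as_of.year - date_of_birth.year - (0 if had_birthday else 1)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

band = age_band(date(1990, 5, 20), as_of=date(2024, 9, 11))
```

The model then sees only the band, so two customers born years apart in the same decade become indistinguishable on this attribute.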

Data Governance Deliverables

I recommend the following key areas as the starting point when crafting a data governance strategy. We will look at how each applies to PII protection in reporting and analytics.

1. Metadata Management — Via Data Catalogue and Business Glossary

A data catalogue prepared by extracting all metadata from the reporting system would provide a means to tag and categorise existing reports and assist in re-factoring prioritisation and deletion choices based on their categories. Along with the data catalogue, metadata management with a business glossary that allows additional metadata such as business definitions, context and classification can further enrich PII management by linking data classifications to actual tables, reports and individual columns (of report or table) containing PII attributes.
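A minimal sketch of such tagging, assuming the catalogue can list the column names behind each report: every report is tagged with the PII attributes it exposes. The attribute names follow the example list used in this blog; the `catalogue` contents and column naming are hypothetical.

```python
# PII attributes from this blog's example, written as hypothetical column names.
PII_ATTRIBUTES = {
    "first_name", "last_name", "gender", "address",
    "pin_code", "date_of_birth", "mobile_number",
}

# Hypothetical catalogue extract: report name -> columns it displays.
catalogue = {
    "Daily Orders Shipped": ["order_date", "warehouse", "orders_shipped"],
    "Appointment Letters": ["first_name", "last_name", "address", "appointment_date"],
    "Sales by Region": ["pin_code", "gender", "total_sales"],
}

def tag_reports(reports):
    """Return report name -> sorted list of PII columns found on that report."""
    return {name: sorted(set(cols) & PII_ATTRIBUTES) for name, cols in reports.items()}

pii_tags = tag_reports(catalogue)
```

Matching on column names is only a first pass. Columns with opaque names or free-text fields would still need a manual review or content profiling.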

Therefore, under the metadata management component of data governance, setting up an efficient data catalogue solution, defining and enforcing data classification and access policies and training the reporting and analytics teams are the critical deliverables to meet PII compliance in reporting and analytics. Note that the reporting and analytics team members would eventually review and re-factor existing reports and adhere to data classification policies in new report and model development.

2. Risk Management Framework

Finding the right balance between data conformance and performance depends on the risk appetite, i.e. how much risk the data controller can accept, including the residual risk that remains even after mitigation efforts. Quantifying data privacy risks in the reporting system will assist in identifying and mitigating high-impact risks, which could become showstoppers and incur penalties. It will also identify risks within the digestible risk appetite of the data controller to improve data performance.

For example, limiting PII data points for reporting and analytical needs to encrypted unique IDs, month of birth, pin code, and gender does not entirely mitigate PII exposure risk. This method for PII protection in master data, suggested below, is a classic data de-identification technique where sensitive data is protected by anonymising or dropping the part of the data that could link to a particular individual. De-identification techniques can be compromised by linkage attacks that leverage joint information from external data sources or by homogeneity attacks that exploit the scarcity of data. However, the potential impact of the risk, in terms of penalties and harm caused to data subjects, is highest when anonymised data is public and out of the data controller’s reach, so that linkage or homogeneity attacks cannot be prevented. With internal systems, robust data security and access control mechanisms with rich audit data and disabled CSV/Excel exports can prevent and detect such unauthorised linkage tasks. Such robust controls support the data controller’s appetite to accept the minimised risk of PII protection while utilising the data for reporting and analytics.
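One way to see where de-identified data is still scarce is to count how many rows share each combination of the remaining attributes. The sketch below borrows the idea of k-anonymity, which this blog does not name, and flags combinations held by fewer than `k` rows. The `rows` data and the threshold are hypothetical.

```python
from collections import Counter

# Hypothetical de-identified rows limited to the attributes discussed above.
rows = [
    {"month_of_birth": "1990-05", "pin_code": "110001", "gender": "F"},
    {"month_of_birth": "1990-05", "pin_code": "110001", "gender": "F"},
    {"month_of_birth": "1962-11", "pin_code": "400050", "gender": "M"},
]

def small_groups(records, quasi_identifiers, k=2):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

at_risk = small_groups(rows, ["month_of_birth", "pin_code", "gender"], k=2)
```

A combination that appears only once is the easiest target for a linkage attack, since any external dataset with the same three attributes points to a single person.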

There exist a few frameworks for privacy risks, such as:

However, it is worth investigating whether any existing, already in-use risk management framework can be leveraged. A large and regulated enterprise usually has a formal enterprise risk management framework based on the OWASP Risk Rating Methodology, whereas the IT security team could already be using Microsoft DREAD for cyber security risk assessment.

Data governance deliverables should include the selection and implementation of a suitable privacy risk management framework or integration with existing enterprise risk management frameworks. A quantitative risk management framework could assist the data controller in finding the right risk tolerance level to meet the data performance objective while applying the best possible controls for PII protection with fair and transparent data usage.
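To make the idea of a quantitative risk appetite concrete, here is a deliberately simple sketch that scores each risk as likelihood times impact and compares it with an appetite threshold. The 1 to 5 scales, the example risks and the threshold are all assumptions; an established framework would define its own factors and scoring.

```python
from dataclasses import dataclass

@dataclass
class PrivacyRisk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    @property
    def score(self):
        return self.likelihood * self.impact

def triage(risks, appetite):
    """Split risks into those within the appetite and those needing mitigation."""
    accept = [r.name for r in risks if r.score <= appetite]
    mitigate = [r.name for r in risks if r.score > appetite]
    return accept, mitigate

risks = [
    PrivacyRisk("Names on throughput dashboard", likelihood=4, impact=4),
    PrivacyRisk("Pin code in management aggregates", likelihood=2, impact=2),
    PrivacyRisk("CSV export of master view", likelihood=3, impact=5),
]
accept, mitigate = triage(risks, appetite=6)
```

Raising or lowering `appetite` is the lever that trades data performance against conformance, which is why the number should be agreed with business, IT and legal rather than set by the reporting team alone.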

3. Master Data Management

Suppose customer data and consent originate and are managed in multiple systems, as is typical in a large enterprise. In that case, data governance deliverables should include master data management policies to integrate, reconcile, and create a master record for each customer with their current consent for data usage permissions. Master data management practices for customer data will ensure one source of PII across all reports and analytical models. With proper data classification for the master table and individual table columns, report developers and data scientists can confidently rely on enriched and trusted metadata and PII protection guidelines (listed in the Data Access Policy).
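A simplified sketch of that reconciliation step is shown below: for each customer, the most recently updated record wins, so the master carries the current consent. It assumes the customer IDs have already been matched across systems, which in practice is the hard part of master data management. The field names and source systems are hypothetical.

```python
from datetime import date

# Hypothetical consent records for the same customers held in different systems.
records = [
    {"customer_id": "C1", "source": "CRM", "consent_analytics": True, "updated_on": date(2023, 1, 5)},
    {"customer_id": "C1", "source": "Billing", "consent_analytics": False, "updated_on": date(2024, 6, 1)},
    {"customer_id": "C2", "source": "CRM", "consent_analytics": True, "updated_on": date(2024, 2, 14)},
]

def build_master(source_records):
    """Keep the latest record per customer so the master reflects current consent."""
    master = {}
    for rec in source_records:
        current = master.get(rec["customer_id"])
        if current is None or rec["updated_on"] > current["updated_on"]:
            master[rec["customer_id"]] = rec
    return master

master = build_master(records)
# Customers whose current consent permits use in internal analytics.
consented = sorted(cid for cid, rec in master.items() if rec["consent_analytics"])
```

Here customer C1 withdrew consent in the billing system after granting it in the CRM, so only C2 remains eligible for analytical datasets.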

Customer Master Data With Data Classification

From the limited PII attributes selected in this example, I would classify, encrypt, hide and modify the data of a few attributes as shown above and create a separate, confidential Customer Master View table accessible to users in the roles of report developer, data engineer and data scientist. The parent table and the derived table have limited attributes, and both are tagged with the Regulatory Sensitivity of PII. Apart from PII, financial data, personal health data and educational records are also subject to regulatory compliance. The parent master table is classified as ‘Restricted-Confidential’ while the derived table is ‘Confidential’. The two classifications have different access guidelines, as outlined in the data classification policy and enforced through the data access policy.

Confidential — Internal data available for reporting and statistical analysis, but it cannot be shared outside without customer consent and a non-disclosure agreement with the third party. Highly regulated organisations can have independent data-sharing policies requiring the application of privacy enhancement technologies before any dataset with PII can be shared outside.

Restricted-Confidential — Confidential data is further restricted and is only available on a ‘need-to-know’ basis for a limited number of authorised users. The retrieval of original PII information from the encrypted Unique Identifier should be an auditable operation available only for a few selected user roles. This feature can be facilitated by a master data management solution or a custom-built application designed to manage and secure restricted-confidential data.
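The two classification levels can be sketched together: a derived view row that keeps only low-sensitivity attributes plus an opaque token, and a re-identification function that is limited to selected roles and logs every attempt. This is an illustration under stated assumptions, not a description of any particular master data management product. The token here is a keyed hash (HMAC-SHA256) rather than reversible encryption, so the lookup back to the original record goes through a hypothetical `token_vault`. The key, role name and field names are also made up, and "month of birth" is assumed to mean year and month.

```python
import hashlib
import hmac
from datetime import datetime, timezone

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; keep real keys in a secrets manager
AUTHORISED_ROLES = {"privacy_officer"}         # hypothetical role name
token_vault = {}  # token -> original record; stands in for the Restricted-Confidential parent table
audit_trail = []  # every re-identification attempt, allowed or not

def to_master_view(customer):
    """Build a Confidential view row and remember the original under its token."""
    token = hmac.new(SECRET_KEY, customer["customer_id"].encode(), hashlib.sha256).hexdigest()
    token_vault[token] = customer
    return {
        "customer_token": token,
        "month_of_birth": customer["date_of_birth"][:7],  # ISO "YYYY-MM-DD" -> "YYYY-MM"
        "pin_code": customer["pin_code"],
        "gender": customer["gender"],
    }

def reidentify(token, user, role, reason):
    """Return the original record for authorised roles only, logging every attempt."""
    allowed = role in AUTHORISED_ROLES
    audit_trail.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user, "role": role, "reason": reason, "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not re-identify customers")
    return token_vault[token]

customer = {
    "customer_id": "C1", "first_name": "Asha", "last_name": "Rao",
    "date_of_birth": "1990-05-20", "pin_code": "110001", "gender": "F",
}
view_row = to_master_view(customer)
```

Report developers and data scientists would work only with rows shaped like `view_row`, while calls to `reidentify` leave a record in `audit_trail` whether or not they succeed.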

Conclusion

Medium estimates it’s already an eleven-minute read. This is a topic that, although simplified and narrowed, carries significant weight and requires attention to detail, with careful planning and execution. In the real world, the scope of balancing data governance to protect PII with minimum impact on data utilities would span the complete journey from datum to intelligence, covering Origin, Storage, Process, and Access.

Although an organisation can justify its data utilities under fair and transparent data usage, it still requires data security, access management and data retention controls as data governance or IT governance deliverables that prevent unauthorised access to PII. In this blog, we have identified key data governance activities and deliverables, along with the quantitative risk appetite of the organisation, as critical factors in balancing reporting and analytical needs with PII regulatory adherence. We also looked at how master data management practices can provide an abstraction layer of de-identified attributes for data analysis.

Lastly, do not underestimate the power of training and education. Basics 101 on good data hygiene, types of privacy harms, and best practices to protect PII must be taught to, and regularly reinforced with, those who consume or work with data for the data controller. Training and education are essential, with the key message being: “Always treat others’ data and privacy with the same respect and care that you would want for your own.”

Thank You.

Thank you if you have read this story completely. I hope my ideas and writing assist someone looking for the right balance between PII conformance and performance. Please share your thoughts, opinions, or advice in the comment section, which could assist me in researching this topic further.

Krupesh Desai
Data View House

A Certified Data Management Professional - CDMP Associate, solving data-intensive problems, creating value, sharing the Data View House™ school of thoughts.