Beyond Encryption: Innovating Protection for Personally Identifiable Information (PII) in the Digital Age

Published in Engineering @ Upstox · 12 min read · Mar 22, 2024

In an era where digital footprints grow larger by the day, safeguarding personal data has never been more crucial. The emphasis on encrypting Personally Identifiable Information (PII) stems from an urgent need to protect sensitive data from breaches and unauthorized access. Here’s a deep dive into why encryption is paramount and how we’ve fortified our defenses at the Profile Service using innovative technologies:

The Shield Against Unauthorized Eyes

Encryption serves as the first line of defense, transforming readable text into a secure code that can only be deciphered with a unique key. This cryptographic process ensures that, even in the hands of unauthorized entities, the data remains a jumble of indecipherable characters. By implementing PII encryption, we significantly reduce the risk of unauthorized access, curbing potential data breaches and safeguarding against identity theft.

A Bulwark in the Face of Breaches

Despite the best-laid defenses, the specter of data breaches looms large. Encrypted data, however, remains a tough nut to crack without the decryption key. This layer of security ensures that, should our defenses be breached, the sanctity of personal information remains intact, protecting individuals’ privacy and security.

Cultivating Trust through Proactive Protection

In the aftermath of a data breach, an organization’s reputation can be tarnished, eroding stakeholder trust. Proactive encryption of PII not only underscores our commitment to data protection but also fortifies our reputation as diligent custodians of personal information. This commitment to security fosters trust and cements our standing as a responsible entity in the digital domain.

Safeguarding Against Insider Risks

The menace of insider threats, whether intentional or inadvertent, cannot be overlooked. Encrypting PII curtails the potential for sensitive data exposure: insiders who can reach the stored data but lack decryption rights see only ciphertext, so they can neither misuse nor inadvertently leak the information. This selective accessibility means that only personnel with the requisite clearance can decrypt and view the data, minimizing the risk of internal breaches.

Implementing Encryption with Cutting-Edge Technology

At the Profile Service, we’ve embraced AWS KMS for PII Data Encryption, achieving robust security without compromising API latency. Our approach integrates Redis and Elasticsearch, streamlining the encryption/decryption process while maintaining performance. Here’s an outline of our method:

**Key Components in Our Encryption Strategy:**

**Envelope Key (EK):** The key that encrypts each DEK. It is managed by AWS KMS and never leaves the protective confines of KMS.

**Data Encryption Key (DEK):** Unique to each user and securely obtained from KMS. The plain DEK is held solely in memory, and only its encrypted form is stored in the database, ensuring that the DEK is shielded at all times.

By allocating a specific DEK for each user, fetched directly from KMS, we ensure personalized security. This meticulous approach to encryption, powered by advanced technologies, exemplifies our unwavering commitment to protecting personal information in the digital age. As we navigate the complexities of data security, our initiatives serve as a testament to the importance of encryption in preserving privacy, maintaining trust, and fortifying our digital ecosystem against ever-evolving threats.

Phase 1: Initial Approach to Encryption Before Tackling Latency

In the foundational stage of our journey to secure Personally Identifiable Information (PII) through encryption, we adopted a straightforward yet effective methodology. Our initial strategy revolved around ensuring that each new or updated user profile was accompanied by a fresh Data Encryption Key (DEK) sourced directly from AWS Key Management Service (KMS). This process was fundamental to our encryption architecture but introduced certain challenges, particularly concerning latency. Let’s delve into the specifics of this design and its implications on our API performance:

The Core Mechanism

The essence of our initial design hinged on calling AWS KMS on every user creation, update, or even retrieval operation, either to obtain a new DEK or to decrypt a stored one. This approach guaranteed that every piece of PII stored in our database was encrypted with a unique key, significantly enhancing security. However, the continuous reliance on AWS KMS for DEKs during profile API operations led to notable challenges:

Create/Update API Calls: Whenever a new user was added or an existing user’s profile was updated, our system would reach out to AWS KMS to fetch a new DEK. This DEK was then used to encrypt the user’s PII before storing it in our profile database.
Get API Calls: To retrieve a user’s information, our system would first fetch the encrypted DEK from the profile database, then contact AWS KMS to decrypt this DEK. Once decrypted, the DEK was used to decrypt the user’s PII for presentation or further processing.

Illustrating the Design

The workflow for API calls under this design was as follows:

1. Create/Update Operations:

Initiate a create or update request.
System contacts AWS KMS to generate or retrieve a DEK.
Encrypt the user’s PII with the DEK.
Store the encrypted PII and the DEK (also encrypted with an Envelope Key) in the profile database.

2. Get Operations:

Initiate a request to retrieve user information.
Fetch the user’s encrypted DEK from the profile database.
Contact AWS KMS to decrypt the DEK.
Use the decrypted DEK to decrypt the user’s PII for retrieval.
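
The two flows above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Profile Service's actual code: `FakeKms` is a hypothetical in-process stand-in for AWS KMS that mimics the shape of a data-key request (a plain DEK plus an encrypted copy), the database is a plain dictionary, and `keystream_xor` is a toy cipher used only so the example runs on the standard library. A real service would use an authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import os


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode). Illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        stream += block.digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class FakeKms:
    """Hypothetical stand-in for AWS KMS that counts round trips."""

    def __init__(self):
        self._envelope_key = os.urandom(32)  # never handed to callers
        self.calls = 0

    def generate_data_key(self):
        """Return (plain DEK, DEK encrypted under the Envelope Key)."""
        self.calls += 1
        dek, nonce = os.urandom(32), os.urandom(16)
        return dek, nonce + keystream_xor(self._envelope_key, nonce, dek)

    def decrypt(self, blob: bytes) -> bytes:
        self.calls += 1
        return keystream_xor(self._envelope_key, blob[:16], blob[16:])


def encrypt_pii(dek: bytes, plaintext: str) -> bytes:
    nonce = os.urandom(16)
    return nonce + keystream_xor(dek, nonce, plaintext.encode("utf-8"))


def decrypt_pii(dek: bytes, blob: bytes) -> str:
    return keystream_xor(dek, blob[:16], blob[16:]).decode("utf-8")


def create_or_update(kms, db, user_id, pii):
    dek, encrypted_dek = kms.generate_data_key()  # KMS round trip on every write
    db[user_id] = {"pii": encrypt_pii(dek, pii), "dek": encrypted_dek}


def get(kms, db, user_id):
    row = db[user_id]
    dek = kms.decrypt(row["dek"])  # KMS round trip on every read
    return decrypt_pii(dek, row["pii"])


kms, db = FakeKms(), {}
create_or_update(kms, db, "u1", "alice@example.com")
email = get(kms, db, "u1")
print(email, kms.calls)
```

One write followed by one read already costs two KMS round trips, which is the pattern the next section identifies as the main source of latency.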

Facing Latency Head-On

This straightforward encryption model, while robust in security, introduced significant latency into our profile management APIs. The constant need to interact with AWS KMS for DEK operations — especially for frequent create, update, and get requests — contributed to delays that impacted user experience.

In the next phase of our encryption strategy, we aimed to address these latency issues head-on, seeking solutions that maintained our stringent security standards while optimizing performance. This journey towards a more efficient encryption model underscores our commitment to innovation and continuous improvement in safeguarding user data.

Revising Our Approach: Analyzing Latency and Rethinking Encryption Strategy

Upon implementing our initial encryption strategy for protecting Personally Identifiable Information (PII), comprehensive load tests revealed a significant decrease in performance. Throughput plummeted from approximately 1,500 requests per second (rps) to a mere 33 rps, prompting an immediate reassessment of our approach. Detailed latency analysis provided insight into the bottlenecks, illuminating the path forward for optimizing our encryption process without compromising security.

Latency Breakdown and Insights

The latency analysis pinpointed the major contributors to the slowdown, especially the time-intensive operations involving AWS Key Management Service (KMS) and the encryption logic. The detailed data is shown in the table below; the major bottlenecks we observed were:

Generate DEK from KMS/Decrypt: The operation of generating or decrypting a DEK from KMS was identified as the most time-consuming step, consuming up to 241ms per operation in creation scenarios.
Encrypt PII Data: Encrypting PII data, while necessary for security, added significant time overhead, ranging from 115ms to 176ms per creation operation.
Hash PII Data: Hashing PII data for added security contributed an additional 166ms to 173ms per operation.
Decrypt PII Data for Response: In scenarios where decrypted PII data was needed for responses, this step introduced an extra 22ms to 224ms, further impacting latency.
Overall Time by Encryption Logic: The cumulative time taken by encryption-related logic was substantial, with total encryption times reaching up to 590ms for creation operations.
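
As a quick sanity check on these figures (an illustrative calculation, not part of the original load-test tooling), the upper-bound timings for DEK generation, encryption, and hashing on a create operation add up to the 590ms total quoted above. The dictionary keys are names chosen for this sketch.

```python
# Upper-bound timings (ms) for a create operation, taken from the list above.
create_worst_case_ms = {
    "generate_dek_from_kms": 241,
    "encrypt_pii": 176,
    "hash_pii": 173,
}

total_ms = sum(create_worst_case_ms.values())
kms_share = create_worst_case_ms["generate_dek_from_kms"] / total_ms
throughput_drop = 1500 / 33  # rps before encryption vs. after

print(total_ms)                 # 590
print(round(kms_share * 100))   # 41 (percent of the total spent on the KMS call)
print(round(throughput_drop))   # 45 (times lower throughput)
```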

The insights from our latency analysis made it evident that the encryption logic, particularly the dependency on KMS for DEK generation and decryption, was the primary factor in the performance degradation. To address this without compromising the security of PII, we proposed a new design that focused on optimizing KMS interactions and the encryption process:

Caching the DEK: By caching the plain DEK, we aimed to reduce the frequent calls to KMS, thereby decreasing the time spent in DEK generation and decryption for every operation.
Caching Encrypted Data: Similarly, caching encrypted data where feasible could minimize the need for repeated encryption and decryption operations, further reducing latency.

This strategic shift toward caching not only promised to alleviate the performance bottlenecks but also maintained the integrity and security of sensitive user information. By refining our approach to encryption and data protection, we sought to achieve a balance between stringent security measures and optimal application performance, ensuring a seamless user experience without compromising on data safety.

Below are the approximate numbers after the caching layers were added:

Design 2: Optimizing Latency Through Strategic Caching and Workflow Refinement

In response to the latency challenges encountered in our initial approach to encrypting Personally Identifiable Information (PII), we embarked on a comprehensive redesign. This new strategy incorporates sophisticated caching mechanisms and streamlined processes, dramatically enhancing performance without sacrificing security. Let’s explore the key components of this evolved design and its impact on system efficiency:

1. Proactive DEK Management with Bulk Refreshing

We introduced a dedicated Redis cache for Data Encryption Keys (DEKs), maintaining a reservoir of approximately 100,000 fresh DEKs. This cache ensures immediate availability of DEKs for encrypting new or updated user data, significantly reducing dependency on AWS KMS for each operation. A scheduled job replenishes the cache bi-hourly, maintaining the DEK stockpile and virtually eliminating the risk of depletion.
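
A minimal sketch of such a reservoir follows, assuming a simple queue semantics. The `DekPool` class and its method names are hypothetical, an in-memory `deque` stands in for Redis, and `fake_generate_dek` is a placeholder for the KMS call. Whether the real pool stores plain DEKs, encrypted DEKs, or both is not stated here, so the sketch simply keeps whatever the generator returns.

```python
import os
from collections import deque


class DekPool:
    """In-memory stand-in for the Redis-backed DEK reservoir (hypothetical API)."""

    def __init__(self, generate_dek, target_size):
        self._generate_dek = generate_dek  # callable that performs the KMS round trip
        self._target_size = target_size
        self._pool = deque()
        self.kms_fallbacks = 0

    def refill(self):
        """Scheduled job: top the pool back up to target_size. Returns DEKs added."""
        added = 0
        while len(self._pool) < self._target_size:
            self._pool.append(self._generate_dek())
            added += 1
        return added

    def take(self):
        """Hand out one unused DEK; fall back to KMS if the pool is empty."""
        if self._pool:
            return self._pool.popleft()
        self.kms_fallbacks += 1
        return self._generate_dek()

    def __len__(self):
        return len(self._pool)


def fake_generate_dek():
    # Placeholder for a KMS data-key request: (plain DEK, encrypted DEK).
    return os.urandom(32), os.urandom(48)


pool = DekPool(fake_generate_dek, target_size=3)  # the article keeps roughly 100,000
pool.refill()
deks = [pool.take() for _ in range(4)]  # the fourth request finds the pool empty
print(len(pool), pool.kms_fallbacks)
```

Each DEK is handed out once and removed from the pool, so no two users share a key, and a depleted pool degrades to the slower KMS path rather than failing.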

2. Streamlining the New User Experience

For each new user registration, DEKs are now directly retrieved from the DEK cache, facilitating swift encryption of user data. Encrypted DEKs are stored in the database, while plain PII data and keys are cached locally for quick access during subsequent updates or retrieval requests. This setup not only accelerates the user creation process but also ensures robust handling of cache misses by defaulting to AWS KMS, thereby guaranteeing service reliability albeit with a potential latency trade-off in rare cases.

Moreover, we’ve leveraged Elasticsearch to index hashed PII data, enabling efficient search functionality without compromising data privacy.

3. Elevating the Update Process for Existing Users

In our journey to optimize app performance without compromising on data security, we paid particular attention to refining the process for updating existing user profiles. Recognizing the critical need to streamline these updates, we devised an innovative approach that significantly reduces latency and enhances user experience. Here’s a deeper dive into how we reimagined the existing user update flow:

Leveraging the DEK Cache for Efficiency

Central to our improved strategy is the utilization of a dedicated DEK (Data Encryption Key) cache. This cache acts as a reservoir of readily available DEKs, enabling swift encryption and decryption processes without the constant need to interact with AWS Key Management Service (KMS). When an existing user’s profile requires an update, our system first seeks the necessary DEK within this local cache. This direct access to DEKs — and, by extension, to the plain PII (Personally Identifiable Information) data associated with it — significantly slashes the time traditionally spent in cryptographic operations, thereby streamlining the update process.

Mastery Over Cache Misses

Despite our cache’s robustness, we remain prepared for the occasional cache miss. If a required DEK or specific user data is not found in our local cache, our system falls back to AWS KMS, which decrypts the user’s encrypted DEK so that the update can proceed, at the cost of some added latency in these rare cases. Through meticulous planning and strategic cache management, we keep such misses infrequent enough that they do not detract from our application’s performance or user experience.
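
The cache-then-KMS lookup described above follows the common read-through pattern. The sketch below is an assumption-laden illustration: `PlainDekCache`, `fake_kms_decrypt`, and the dictionary-backed database are hypothetical names, the "decryption" is a placeholder byte reversal rather than real cryptography, and eviction or expiry of cached keys is left out.

```python
class PlainDekCache:
    """Read-through cache for plain DEKs, keyed by user ID (hypothetical API)."""

    def __init__(self, kms_decrypt, load_encrypted_dek):
        self._kms_decrypt = kms_decrypt                # KMS round trip
        self._load_encrypted_dek = load_encrypted_dek  # database read
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, user_id):
        dek = self._cache.get(user_id)
        if dek is not None:
            self.hits += 1
            return dek
        # Cache miss: fetch the encrypted DEK and ask KMS to decrypt it.
        self.misses += 1
        dek = self._kms_decrypt(self._load_encrypted_dek(user_id))
        self._cache[user_id] = dek
        return dek


kms_calls = []


def fake_kms_decrypt(encrypted_dek):
    kms_calls.append(encrypted_dek)
    return encrypted_dek[::-1]  # placeholder transform, not real cryptography


db = {"u1": b"encrypted-dek-for-u1"}
cache = PlainDekCache(fake_kms_decrypt, db.__getitem__)
first = cache.get("u1")   # miss: goes to the database and KMS
second = cache.get("u1")  # hit: served from the cache
print(len(kms_calls), cache.hits, cache.misses)
```

Only the first lookup for a user pays the KMS cost; later updates and reads for the same user are served from memory.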

Integration with Elasticsearch for Enhanced Data Searchability

To complement the direct benefits of our DEK cache, we’ve also optimized our use of Elasticsearch. By updating the hashed PII data within Elasticsearch concurrently with profile updates, we maintain the integrity and searchability of sensitive user information. This ensures that any modifications to user profiles are accurately reflected in our search indices, enabling quick and efficient retrieval of updated information without compromising privacy or security.

4. Streamlining Searches on Non-PII Fields

In our comprehensive approach to enhancing the application’s efficiency and security, particular attention has been given to optimizing searches based on non-Personally Identifiable Information (non-PII) fields. This facet of our system design is pivotal, as it frequently underpins user queries and operational workflows within our platform. Here’s an expanded look into the mechanisms that facilitate swift and secure searches on non-PII data:

Leveraging Local Cache for Immediate Access

The cornerstone of our strategy for non-PII based searches is the utilization of a meticulously managed local cache. This cache stores not only the Data Encryption Keys (DEKs) but also the corresponding plain PII data for each user. When a search query is initiated based on non-PII criteria, our system’s first line of action is to retrieve the relevant user’s DEK and unencrypted PII data directly from this cache. This immediate access mechanism is designed to bypass the latency and computational overhead associated with encryption and decryption processes, thus significantly speeding up the response time for search queries.

Intelligent Handling of Cache Misses

Despite the high efficiency of our local cache, we are well-prepared for scenarios where a cache miss occurs. In such cases, our system seamlessly transitions to a fallback procedure, wherein it retrieves the encrypted DEK from the database. Following this, the encrypted DEK is decrypted through a secure request to AWS Key Management Service (KMS), as delineated in our updated user flow. This ensures that even in the absence of cache data, our application is capable of fulfilling search requests without compromising on security or significantly impacting performance.

5. Facilitating Searches on PII Data through Hashed Indexing

In the intricate landscape of safeguarding user privacy while enabling efficient data retrieval, searching based on Personally Identifiable Information (PII) presents a unique challenge. Our approach to this problem exemplifies our innovative and security-centric ethos, employing hashed indexing to ensure both privacy and performance. Here’s a detailed exploration of how we’ve mastered searches on PII data:

Hashing PII Data for Secure Indexing

The foundation of our strategy lies in the transformation of sensitive PII data into a non-reversible hashed format. This process is integral to our encryption workflow, wherein each piece of PII data is encrypted and simultaneously hashed before storage. The hashed versions of this data are then meticulously indexed and stored within Elasticsearch, a powerful, full-text search engine that facilitates rapid data retrieval without direct exposure of sensitive information.

The Dual-Flow Implementation

Creation Flow:

Upon the creation of a new user profile or the updating of existing PII data, our system automatically initiates the encryption process for the PII data.
Concurrently, the system generates a hash of this data, employing robust cryptographic algorithms to ensure the hash is secure and unique.
This hashed data is then indexed within Elasticsearch, creating a searchable reference that corresponds to the encrypted PII stored in our database.

Search Flow:

When a search query based on PII data is initiated, the system first computes the hash of the query’s PII criteria, mirroring the hashing process used during data creation or updating.
With the hashed query in hand, our system then conducts a search within the Elasticsearch indices to locate matching hashed entries.
Upon finding a match, the system retrieves the associated profile ID or other relevant non-PII identifiers from the index, effectively locating the user data corresponding to the search criteria without ever decrypting or directly handling the actual PII data.
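
Both flows can be sketched as follows. The assumptions are deliberate and worth stating: the article does not name the hash algorithm, so this example uses a keyed HMAC-SHA256 with a hypothetical `HASH_KEY`; values are normalized (trimmed and lowercased) before hashing, which is also an assumption; and `HashedIndex` is a dictionary-backed stand-in for an Elasticsearch index, not a client for it.

```python
import hashlib
import hmac

HASH_KEY = b"demo-only-secret"  # hypothetical; a real key would live in a secret store


def normalize(value):
    return value.strip().lower()


def pii_hash(field, value):
    """Deterministic keyed hash of one PII field."""
    message = f"{field}:{normalize(value)}".encode("utf-8")
    return hmac.new(HASH_KEY, message, hashlib.sha256).hexdigest()


class HashedIndex:
    """Dict-backed stand-in for an Elasticsearch index of hashed PII."""

    def __init__(self):
        self._profiles_by_hash = {}
        self._hashes_by_profile = {}

    def index_profile(self, profile_id, pii):
        """Creation flow: (re)index the hashes of a profile's PII fields."""
        # Drop hashes left over from a previous version of this profile.
        for old_hash in self._hashes_by_profile.pop(profile_id, set()):
            self._profiles_by_hash[old_hash].discard(profile_id)
        hashes = {pii_hash(field, value) for field, value in pii.items()}
        for h in hashes:
            self._profiles_by_hash.setdefault(h, set()).add(profile_id)
        self._hashes_by_profile[profile_id] = hashes

    def search(self, field, value):
        """Search flow: hash the query the same way and look up profile IDs."""
        return set(self._profiles_by_hash.get(pii_hash(field, value), set()))


index = HashedIndex()
index.index_profile("p-1", {"email": "Alice@Example.com", "pan": "ABCDE1234F"})
index.index_profile("p-2", {"email": "bob@example.com"})

found = index.search("email", " alice@example.com ")
index.index_profile("p-1", {"email": "alice@new.example"})  # profile update
stale = index.search("email", "alice@example.com")
print(found, stale)
```

Because the index holds only hashes, it supports exact-match lookups on PII; prefix or fuzzy search over the original values is not possible with this scheme.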

Ensuring Privacy and Efficiency

This sophisticated method of leveraging hashed indices for PII-based searches strikes a delicate balance between operational efficiency and uncompromising data privacy. By relying on hashed data for indexing and search operations, we effectively anonymize sensitive information, ensuring that it remains protected throughout the search process. Furthermore, this approach allows for rapid data retrieval, as searches are performed on indexed hashes rather than undergoing computationally intensive decryption processes.

Achievements and Impact

Through the implementation of this refined design, we successfully mitigated the latency issues previously observed, maintaining optimal requests per second and ensuring a seamless user experience. This strategic overhaul not only addressed the performance bottlenecks but also underscored our commitment to data security and operational excellence. By leveraging advanced caching techniques and optimizing our encryption workflows, we’ve set a new standard for secure, efficient, and user-centric application design, reinforcing our dedication to safeguarding sensitive information in the digital age.