Balancing privacy concerns for analytics design

I’ve recently been working with a client on a pilot project to evaluate some new NoSQL database frameworks (graph databases in particular). Our goal is to see whether a different storage model, data representation and presentation can improve the usability and ease of integration of master data indexes and entity data repositories.

It’s relatively easy to install an evaluation version of a data management software package and tinker with its bells and whistles. But when you’re evaluating how a product or tool will fit into an existing operational environment, you may need to assess how the product works using the same data that would be used in a production environment.

The data at this client’s organization contains a significant amount of personally identifiable information (PII). Their environment is closed — only people with the appropriate access rights are allowed to see the data. There are other complications as well. In this situation, for example, the client does not just want to see how a particular type of data management product works — they want to see how the product works with their data in their environment.

Several potential conflicts emerge. The first is obvious: you can’t test how a product will work within a closed environment if you can’t install, explore and test that product as it will be used in that environment. You can configure a test environment whose characteristics mimic those of the production system, and that might provide a reasonable platform within which to test the new tool. But it does not address the data protection issue.

Testing the product inevitably means testing it with real data.