The impact of personal data stores on enterprise data and AI strategy

Explore the new data integration patterns of information democracy

Jeremy Caine
Technology Futures
Oct 3, 2023


Personal data stores, and a single Web identity with authority over them, are the key tenets of democratising our information. As the transformation of the Web happens around us, so too does the landscape of enterprise data. The technology possibilities for enterprise data and artificial intelligence (AI) are evolving dramatically and rapidly. The rise of Generative AI, combined with sovereignty and privacy laws and new and exciting ways to manage data, demands a review of enterprise data strategy.

The ways an organisation manages and leverages the data it collects and generates vary depending on its business objectives, operating model, and industry. All organisations interact with the information ecosystem and digital economy of the Web, and as the Web evolves into a democratised, decentralised information system with personal data stores and new open standards, data strategies must evolve too.

In a nutshell

  • Future-proof your data strategy with readiness for personal data store integration
  • Respect personal data stores as sovereign islands of an individual’s data
  • Design data and AI systems using new patterns of personal data integration

It is useful to revisit data strategy thinking across proven frameworks to set a new direction for enterprise architecture and technology implementation plans. Not having a data strategy and strong data management functions is no longer acceptable, especially now that AI is becoming an industry and board-level concern.

Strategy and Enterprise Architecture

A data strategy is usually positioned as either defensive or offensive. Data defence minimises risk, increases control and limits exposure to attack. Data offence emphasises flexibility so that the business can increase revenue and customer satisfaction and improve insight and decision making. Organisations of course need to balance all these factors to operate, but research and analysis suggest one will be dominant.

MIT CISR developed a simple view of an organisation’s operating model which influences the logical and physical strategy for data management within that organisation. The business operating model of units, divisions and countries influences the enterprise IT operating model. The degree to which business processes are standardised and integrated sets a direction for how process, data and technology are implemented.

MIT CISR Operating Model

An organisation’s business model drives its operating model. That combined with the data strategy influences its data and information architecture. This architectural thinking drives the shape and style of information technology systems.

When it comes to system implementation there may be several styles. As an example, let’s consider a bank that operates in multiple countries. Banks typically have a Unification operating model. Their approach to the physical implementation of core banking might be an instance of the same core banking package (same application code, same core data schemas) in each country. Each instance is configured for the banking rules of its country and stores only the customer and transaction data specific to that country, i.e. design once, implement many. Another approach might simply be to have different banking packages from different vendors (or custom built) as instances in the different countries, with each country maintaining its own configuration and being accountable for implementing global functions, e.g. feeds into global liquidity reporting.
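To make “design once, implement many” concrete, here is a minimal TypeScript sketch of the idea; the type and field names are hypothetical, not from any real core banking package:

```typescript
// Hypothetical sketch: one shared core banking design, configured per country.
// Type and field names are illustrative, not from any real banking package.

interface CoreBankingConfig {
  countryCode: string;          // ISO 3166-1 alpha-2
  currency: string;             // ISO 4217
  dataResidencyRegion: string;  // where customer data must physically live
  regulatoryReports: string[];  // local feeds into global functions
}

// Same application code and data schemas; only the configuration varies.
const deployments: CoreBankingConfig[] = [
  { countryCode: "GB", currency: "GBP", dataResidencyRegion: "uk-south",
    regulatoryReports: ["global-liquidity", "local-conduct"] },
  { countryCode: "DE", currency: "EUR", dataResidencyRegion: "eu-central",
    regulatoryReports: ["global-liquidity", "local-banking-supervision"] },
];
```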

The State of Enterprise Data

Data in the enterprise is a combination of data captured in enterprise systems and data generated and processed into other forms. And today, that data assembly is deeply integrated into the Web. A person shares information with an enterprise, under consent, to become a customer and consume a service or purchase a product from that business. Information about a person also makes its way into public data sets, for example people-movement patterns in transport data sets. An enterprise might make use of those transport data sets for its own process requirements.

In our banking example, a customer may have bank accounts in multiple countries, but the stores of customer information will be separated, and regulation will likely prohibit the sharing of information, e.g. credit score or transaction history. In the past some multi-country banks might have implemented a single instance of the core system (or a very few globally replicated instances, not untypical for global HR and ERP systems). Now, with the rise of data privacy and sovereignty regulations, we are finding that physical instances of data (and the operators of their systems) need to be located in the country.

The “right to be forgotten” is a relatively new legal concept that grants people limited rights to have their data removed from the digital footprints maintained and managed by organisations. Around 40 of the world’s 195 countries, including the EU member states, have this concept in law (Source: Google Bard; Anthropic Claude).

Key data questions to ask in the design and implementation of systems are:

  1. How do we absolutely know the physical location of all data about a person that might need to be deleted?
  2. Do we still need to delete data if we have sanitised it?
  3. How can we ensure we have met the duty of care in relation to data that we are legally required to retain?
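For the first question, here is a minimal sketch of one plausible approach, assuming a hypothetical enterprise data catalog that indexes every copy and derivation of a person’s data; the API endpoint and record shape are assumptions, not a real product’s interface:

```typescript
// Hypothetical sketch: answering "where is all the data about this person?"
// via an enterprise data catalog. The catalog API is an assumption.

interface DataLocationRecord {
  system: string;         // e.g. "core-banking-gb", "ml-feature-store"
  physicalRegion: string; // where this copy physically lives
  derivedFrom?: string;   // lineage: the upstream system this copy came from
}

async function locatePersonalData(subjectId: string): Promise<DataLocationRecord[]> {
  // Without a catalog that tracks every copy and derivation of a subject's
  // data, question 1 cannot be answered with certainty.
  const response = await fetch(
    `https://catalog.example.com/subjects/${encodeURIComponent(subjectId)}/locations`
  );
  if (!response.ok) throw new Error(`catalog lookup failed: ${response.status}`);
  return response.json();
}
```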

Artificial intelligence systems are of course founded on data, and AI is now a board-level topic. As the integration of machine learning into business processes increases, so does the complexity of managing the data behind the models. Data is often, but not always, sanitised before being used to train machine learning models. There is then the debate as to whether true and complete sanitisation can be achieved. Further, the enforceability of the right to be forgotten on sanitised data is a grey area.
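As an illustration of one common sanitisation step, here is a minimal sketch of pseudonymising a direct identifier with a salted hash before data enters a training set; it also shows why the debate exists, since anyone holding the salt can re-identify the record:

```typescript
// Minimal sketch of pseudonymisation, one common sanitisation step.
// The function name and salt handling are illustrative assumptions.
import { createHash } from "node:crypto";

function pseudonymise(customerId: string, secretSalt: string): string {
  // A salted hash is stable (usable for joins in training data) and not
  // directly reversible; but anyone holding the salt can re-identify the
  // record, which is why "true and complete" sanitisation is debatable.
  return createHash("sha256").update(secretSalt + customerId).digest("hex");
}
```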

As data traceability and AI-first business models become more complicated, one tactic for enterprise data strategy is to get ahead of these challenges with personal data stores. As the demand for information democracy increases, so too will the adoption of personal data stores. Architecting information systems for personal data stores will require new approaches to data integration. Equally, personal data store providers will need to offer capabilities that enable responsible information sharing yet support the realities of latency and artificial intelligence systems.

The Impact of Personal Data Stores

There is a future in which all our personal data is normalised across the Web. We will likely have multiple personal data stores that hold different pieces of our personal data universe, and these stores might be physically hosted in different locations around the world. Importantly, though, we will have secure control of that data universe across those personal stores.

As IT system builders we will need practical scenarios for how to consume this category of data. Key factors are the latency of reaching and using the data within a transaction, and the type of processing needed on the data, e.g. combining it into a larger data set for use in a machine learning model.

Personal Data across the Web

This personal data store future enables us to exercise control over our data universe. We would like to think of the personal data store as a sovereign island. Yet the physical location of the stored data, and the platform operator of the data store, are subject to the data sovereignty laws of the country in which the store is hosted. An organisation that needs to utilise your information (in the store) to provide products and services to you is further bound by local regulations on its operations.

This is where the complexity of data privacy laws arises.

At first glance, the introduction of a new asset — the personal data store — is another moving part in an already complicated world. However, the concept and open standards-based implementation of personal data stores can become the new advantage.

W3C Solid is the movement creating the protocols, specifications, and direction of the standards to implement personal data stores (a.k.a. pods) and the use of standard Web identities.

The ubiquity of personal data stores will likely evolve on two fronts.

Enterprises and organisations holding your personal data could move that data into a W3C Solid-compliant pod and host it on your behalf, giving you access to and control over that data. Maybe initially this store only allows applications internal to the organisation to access the data: a restricted implementation of a pod.

In parallel, independent pod providers, whether commercial or a public service from government, will emerge. Early adopters will take advantage of pod-oriented management tools (say, for identity documents) and establish their own personal data stores. The chicken-and-egg tipping point will be when the pod providers start to offer integration components to digital properties in the same way that Stripe or PayPal offer payments integrations to e-commerce sites like Expedia or eBay, or creator platforms like Squarespace or WordPress.

The New Patterns of Personal Data Integration

New patterns of integration with personal data stores will emerge. The premise of these patterns is that the applications organisations operate to provide us services and products will have short- or long-running leases on the data in a personal data store. After the lease expires, the application can no longer hold the data and must delete it (or potentially return it to the store, augmented).
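Before walking through each pattern, here is a minimal TypeScript sketch of the lease premise they all share; the field names are hypothetical (this is not part of the Solid specification), and the three obligation values foreshadow the three patterns below:

```typescript
// Hypothetical sketch of the shared lease premise. Field names are
// illustrative; this is not defined by any Solid specification.

interface DataLease {
  podUrl: string;        // the personal data store granting access
  resourcePath: string;  // the piece of data being leased
  grantedAt: Date;
  expiresAt: Date;       // short- or long-running
  obligation: "forget" | "self-destruct" | "return-on-demand";
}

function leaseIsActive(lease: DataLease, now: Date = new Date()): boolean {
  // Once the lease expires the application must no longer hold the data.
  return now < lease.expiresAt;
}
```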

Consume and Forget

Data is consumed within the transaction of the calling application. The most common implementation of this today is a REST or gRPC API call (often described with an OpenAPI specification). The store grants access to the data and the application reads the data for the purposes of the transaction. The most important consideration in using data from personal data stores this way is latency.

Consume and Forget Pattern

The Web is filled with well-architected, performant digital ecosystems that operate at huge scale on HTTP interactions. They could easily make the switch to integrate Consume and Forget transactions with personal data stores. This pattern could evolve to enable a direct access connection to pod data that auto-invalidates after a prescribed time.
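A minimal sketch of Consume and Forget, assuming a hypothetical pod URL layout and bearer-token access (a real Solid client would authenticate via Solid-OIDC):

```typescript
// Hypothetical sketch of Consume and Forget. The pod URL layout and bearer
// token are assumptions; a real Solid client would use Solid-OIDC.

async function priceQuoteTransaction(podUrl: string, accessToken: string): Promise<number> {
  // The personal data lives only inside the scope of this transaction.
  const response = await fetch(`${podUrl}/profile/date-of-birth`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) throw new Error(`pod read failed: ${response.status}`);
  const dateOfBirth = new Date(await response.text());

  // Use the data, never persist it; it is gone when the function returns.
  const age = new Date().getFullYear() - dateOfBirth.getFullYear();
  return age < 25 ? 120 : 90; // illustrative pricing rule only
}
```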

Copy and Self-destruct

This pattern is analogous to an online video rental. A requesting application takes a copy of the data for a prescribed time, after which it is “returned” (meaning deleted from the view of the application).

Copy and Self-destruct Pattern

This pattern is useful when the data needs to be held by the calling application for a longer period of time. It can apply wherever there is a long-running process, in particular a regulated process like a financial credit assessment or a supply chain order. The organisation requesting the data implements a mutually agreed “self-destruct” of the data copy, in line with the data privacy and retention law in effect.
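A minimal sketch of Copy and Self-destruct under the same assumptions; the in-memory map and timer stand in for what would be durable storage and a scheduled deletion job in production:

```typescript
// Hypothetical sketch of Copy and Self-destruct. An in-memory map and timer
// stand in for durable storage and a scheduled deletion job.

const localCopies = new Map<string, string>();

async function copyWithSelfDestruct(
  podUrl: string,
  resourcePath: string,
  ttlMs: number // the mutually agreed retention period
): Promise<string> {
  const response = await fetch(`${podUrl}${resourcePath}`);
  if (!response.ok) throw new Error(`pod read failed: ${response.status}`);
  const data = await response.text();

  const key = `${podUrl}${resourcePath}`;
  localCopies.set(key, data);

  // Self-destruct: delete the copy when the agreed lease time expires,
  // i.e. the data is "returned" from the application's point of view.
  setTimeout(() => localCopies.delete(key), ttlMs);

  return data;
}
```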

Borrow and Promise

When we start to look at the realities of data retention policies around the world, we must consider this pattern. It will be predominant where organisations are required to hold onto pieces of data about us as evidence that their regulator might be interested in. The calling application is given access to copy a piece of data. It must then promise to give that data back when the personal store demands it.

Borrow and Promise Pattern

The promise is a contract between the application consumer and the personal store. Now comes the dilemma: the personal data store cannot unilaterally demand its data back, e.g. have it deleted at the application consumer’s end. We are in the realm of the rules and laws of the “right to be forgotten” versus data retention regulation. This pattern is going to require the store and its consumer to establish an executable technical contract between the system endpoints, ultimately driven by the rule set (probably set by the organisation and the regulation it is under).
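A minimal sketch of what the consumer’s side of that executable contract might look like; the record shape and the retention-rule override are assumptions about how such a rule set could be expressed, as no standard defines this today:

```typescript
// Hypothetical sketch of the consumer's side of Borrow and Promise. The
// record shape and retention-rule override are assumptions; no standard
// defines this contract today.

const localData = new Map<string, string>(); // the borrowed copies

interface BorrowPromise {
  resourcePath: string; // the borrowed piece of data
  consumer: string;     // who holds the copy
  retentionRule?: {     // set by regulation, if any
    regulation: string; // e.g. a record-keeping requirement
    retainUntil: Date;
  };
}

// Invoked when the personal data store demands its data back.
function handleRecall(promise: BorrowPromise, now: Date = new Date()): "deleted" | "retained" {
  if (promise.retentionRule && now < promise.retentionRule.retainUntil) {
    // A retention obligation overrides the recall: the consumer keeps the
    // data and must report the reason back to the store.
    return "retained";
  }
  localData.delete(promise.resourcePath); // fulfil the promise
  return "deleted";
}
```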

Conclusion

Personal data stores represent an enabling technology to increase information democracy.

As protocols like W3C Solid emerge as the next layer of the Web, enterprise data strategy needs to evolve. Artificial intelligence, and the trust and provenance of the data used to create models, is a hot topic for the technology industry and regulators alike. Where and how an organisation gets the data used in machine learning model development is hugely important.

With this advance in enterprise AI, and the responsibility that comes with it, there will be a shift to modernise and transform implementation approaches to data. This presents an opportunity for enterprise data strategy to assess the impact personal data stores will have on enterprise architecture and to get ahead of this evolution. It will require the creation and development of further open standards, leading practices of data collaboration, and new data integration patterns.

(Views in this article are my own.)
