Navigating the data ecosystem technology landscape

Hannes Ricklefs
BBC Product & Technology
Aug 12, 2019
Credit: Jasmine Cox

By Hannes Ricklefs and Max Leonard

Want to message your Facebook friends on Twitter? Move your purchased music from iTunes to Amazon? Get Netflix recommendations based on your iPlayer history? Well, currently you can’t.

Many organisations are built on data, but the vast majority of the leading players in this market are structured as vertically integrated walled gardens, with few (if any) meaningful interfaces to any outside services. There are a great number of reasons for this, but regardless of whether they are intentional or technological happenstance (or a mixture of both), there is a rapidly growing movement of GDPR-supercharged technologists who are putting forward decentralised and open alternatives to the data-moated household names of today. For the BBC in particular, these new ways of approaching data are well aligned with our public service ethos and commitment to treating data in the most ethical way possible.

Refining how the BBC uses data, both personal and public, is critical if we are to create a truly personalised BBC in the near term and essential if we want to remain relevant in the coming decades. Our Chief Technology and Product Officer Matthew Postgate recently spoke about the BBC’s role within data-led services, in which he outlined some of the work we have been doing in this respect to ensure the BBC and other public service organisations are not absent from new and emerging data economies.

Alongside focused technical research projects like the BBC Box, we have been mapping the emerging players, technologies and data ecosystems to further inform the BBC’s potential role in this landscape. Our view is that such an ecosystem is made up of the following core capabilities: identity; data management (storage, access and processing); data semantics; and the developer experience, all of which are currently handled wholesale in traditional vertical services. A first step for us is therefore to ascertain which of these core capabilities can realistically be deployed in a federated, decentralised future, and which existing implementations can practically facilitate this.

Identity is a crucial component of the data ecosystem: it proves that users are who they say they are, providing a true digital identity. We also expect standard account features to be part of any offering, such as authentication and sharing options via unique access tokens that let users gain insights from their data or share it with others. We found that identity, in the sense of proving who a user is, was not provided by any of the solutions we investigated. Standard account features were present, ranging from platform-specific implementations, to decentralised identifier approaches via WebID, to blockchain-based distributed ledger approaches. As we strongly believe it is important to prove that a user is who they say they are, at this point we would look to integrate solutions that specialise in this domain.
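To make the idea of token-based sharing concrete, the sketch below shows one way a platform might mint a unique, scoped, time-limited access token for an external service and later validate it. It is a minimal illustration built only on the Python standard library; the names, fields and scope strings are our own assumptions, not the API of any platform we investigated.

    import secrets
    from dataclasses import dataclass
    from datetime import datetime, timedelta, timezone

    @dataclass
    class AccessToken:
        token: str                # unguessable secret presented by the service
        service: str              # the external service the token was issued to
        scopes: tuple[str, ...]   # what the holder may do, e.g. ("read:listening-history",)
        expires_at: datetime

    def issue_token(service: str, scopes: tuple[str, ...], ttl_hours: int = 24) -> AccessToken:
        """Mint a unique, time-limited token tied to one service and one set of scopes."""
        return AccessToken(
            token=secrets.token_urlsafe(32),
            service=service,
            scopes=scopes,
            expires_at=datetime.now(timezone.utc) + timedelta(hours=ttl_hours),
        )

    def is_valid(tok: AccessToken, service: str, scope: str) -> bool:
        """Accept the token only for the issued service, a granted scope and before expiry."""
        return (
            tok.service == service
            and scope in tok.scopes
            and datetime.now(timezone.utc) < tok.expires_at
        )

In a real deployment the token would of course be persisted, revocable and bound to the user’s consent records; the point here is only the shape of the capability.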

Data management can be further broken down into three areas:

  1. Data usage and access involves integrating data sources with an associated permission and authorisation model. Users should have complete governance over their data and over its usage by data services; strong data security controls and progressive disclosure of data are key here. Given that our investigation centred on personal data stores (PDS) and time-series sensor/IoT device data platforms that capture personal, public and open data, providing access to data and controls around sharing it was a fundamental capability of all offerings. All of them gave users significant granularity and transparency over what data is stored, where it comes from and how external services use it. (A simplified sketch of this access-and-audit pattern follows this list.)
  2. Data storage must provide strong protection guarantees for users’ data, encrypted in transit and at rest, and give users complete control over and transparency of data lifecycle management. Again, this is a fundamental requirement: storage is either a core offering of the platform or delegated to external services that store data in strongly encrypted formats.
  3. Data processing covers mechanisms that allow users to bring “algorithms” to their data, combined with a strong contract-based exchange of data. Users are in control of, and understand, what insights algorithms and services derive from their data. This might include generating reports, creating and executing machine learning models, and other capabilities that reinforce the user’s control over how their personal data is used to generate insights. Through contract- and authorisation-based approaches, users have complete audit trails of any processing performed, which provides transparency over how services use their data and makes it possible to continuously detect suspicious or unauthorised access. Our investigations found that data processing is either handled through SDKs that heavily prescribe the processing workflow, or not provided at all, leaving developers to create their own solutions.
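As a concrete, deliberately simplified sketch of the permissioning and audit-trail pattern described in points 1 and 3, the Python below shows a user-held store that only releases a category of data to a service with an explicit grant, and records every access attempt. All names and structures are illustrative assumptions rather than the interface of any product we looked at.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Grant:
        service: str    # e.g. "recommendations-service"
        category: str   # e.g. "viewing-history"
        purpose: str    # the contracted use, e.g. "generate weekly highlights"

    @dataclass
    class PersonalDataStore:
        grants: list[Grant] = field(default_factory=list)
        audit_log: list[dict] = field(default_factory=list)
        data: dict[str, list] = field(default_factory=dict)

        def read(self, service: str, category: str) -> list:
            """Release data only if a matching grant exists; log every attempt either way."""
            allowed = any(g.service == service and g.category == category for g in self.grants)
            self.audit_log.append({
                "when": datetime.now(timezone.utc).isoformat(),
                "service": service,
                "category": category,
                "allowed": allowed,
            })
            if not allowed:
                raise PermissionError(f"{service} has no grant for {category}")
            return self.data.get(category, [])

    # The audit log gives the user a complete trail of who asked for what, and whether it was allowed.
    pds = PersonalDataStore(grants=[Grant("recommendations-service", "viewing-history",
                                          "generate weekly highlights")])
    pds.data["viewing-history"] = ["episode-1", "episode-2"]
    pds.read("recommendations-service", "viewing-history")   # permitted and logged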

Data model and semantics refers to the mechanisms that describe (through schemas and ontologies) and maintain the data domains inside the ecosystem, which is essential for extensibility and interoperability. Our investigations found this being approached across a wide spectrum:

  1. no provision at all, leaving developers to decide for themselves the best way to proceed
  2. using open standards such as schema.org and modelling data around linked data and RDF (illustrated in the sketch after this list)
  3. completely proprietary schema definitions within the system.
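To illustrate what the second approach can look like, here is a small sketch of a viewing event described with schema.org terms as JSON-LD, built and serialised in Python. The particular types and properties are our own illustrative choice; a real platform would define its own profile over such vocabularies.

    import json

    # A viewing event expressed with open schema.org vocabulary, so that any
    # linked-data-aware service can interpret it without a proprietary schema.
    watch_event = {
        "@context": "https://schema.org",
        "@type": "WatchAction",
        "agent": {"@type": "Person", "name": "Example Viewer"},
        "object": {
            "@type": "TVEpisode",
            "name": "Example Episode",
            "partOfSeries": {"@type": "TVSeries", "name": "Example Series"},
        },
        "startTime": "2019-08-01T20:00:00Z",
    }

    print(json.dumps(watch_event, indent=2))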

Finally, the developer experience is key. It requires a set of software development tools that enable engineers to build features and experiences, and to implement the unique value propositions their services require. This was the strongest and most consistent area across all our findings.

In summary, our investigations have shown that no single solution provides all of the capabilities we identified and require. Crucially, the majority of the end-user solutions we explored are still commercially orientated, making money either from subscribers or through associated services.

So with the number of start-ups, software projects and standards that meet these capabilities snowballing, where might the BBC fit into this increasingly crowded new world?

We believe that the BBC has a role to play in all of these capabilities, and that doing so would enhance our existing public service offering: to inform, educate and entertain. A healthy ecosystem requires multiple tenants and solution providers, all adhering to core values such as transparency, interoperability and extensibility. Only then will users be able to freely and independently move or share their data between providers, enabling purposeful collaboration and fair competition toward delivering value to audiences, society and industry.

The BBC was incorporated at the dawn of the radio era to counteract the unbridled free-for-all that often comes with any disruptive technology, and its remit to shape standards and practices for the good of the UK and its population stands today as it did in 1927. With a scale, reach and purpose unique to the BBC, it is strongly congruent with our public service duty to help drive policy, standards and access rights, ensuring that the riches on offer in these new ecosystems are not co-opted solely for the pursuit of profit and remain accessible for the benefit of all.
