“Give me insights, but don’t look at my data” — Introducing KnowledgeX: A Trust-aware Data Science Platform
Over the past years, value creation for many businesses has become more and more data-driven. Energy companies reduce their CO2 footprint by analyzing their operations, logistics companies shorten their supply chain by optimizing their processes, and healthcare service providers improve their patients’ lives through data-driven prognostics.
Creating knowledge from data often requires highly specialized data scientists with broad knowledge and expertise in data analysis and, lately, machine learning.
Nevertheless, the process of transferring highly valuable data to a data scientist for knowledge generation is privacy- and trust-intensive. Data owners want to be sure that the data scientist has the necessary experience and uses the data only for the agreed-upon task.
KnowledgeX is a decentralized platform that
- lets companies find specialized data scientists for their specific use cases in a marketplace and
- enables traceable, transparent, and auditable knowledge generation.
The platform uses blockchain, decentralized cloud computing, and trusted execution environments to create trust in an environment where data is a valuable resource but expertise is needed to gain insights. Data owners can outsource data science tasks to independent contractors without risking the loss of data. Independent data scientists can bid on proposed tasks without getting prior access to confidential data.
Currently, data markets are hampered by confidentiality requirements due to competitive (e.g., cost data) or regulatory (e.g., personal data) considerations. Data scientists have to be employed in-house or are contractually restricted by non-disclosure stipulations, which tend to be ambiguous and costly to enforce.
KnowledgeX aims to solve this problem via a process where data privacy and contract fulfillment are technologically guaranteed, so the need for non-disclosure agreements does not arise.
The Problem: Give me insights, but don’t look at my data
KnowledgeX tackles a well-known problem in data science. Over the last couple of years, many companies concluded that data is the new gold and started collecting data.
But collecting data is the easy part. Data might be more aptly described as the raw ore that contains nuggets of gold. What is hard is making sense out of collected data. For this, specialist data scientists are needed.
Most data owners do not have the necessary knowledge in machine learning and data analysis themselves. Even if they do, their capacity may be limited. At the same time, they are reluctant to hand their valuable data to people they do not know. Large companies have started to build their own data science departments staffed with specialists. But smaller companies, such as a water infrastructure operator with four employees, do not have the means to hire expensive full-time specialists. Freelancers would be an option. The problem with data science freelancers today, however, is that data owners first have to find suitable specialists and then need to be able to trust them.
The Solution: Shared problem solving based on confidential data
KnowledgeX consists of two parts: the data science marketplace and a trustworthy data science computation environment.
The KnowledgeX marketplace connects data scientists and data owners. Data owners can create data science gigs and specify which specialized skills they need for the gig. For example, a water infrastructure operator needs a specialist in time series analysis and anomaly detection to predict which of the water pipes are most worn out and likely to burst. KnowledgeX selects the three best-suited candidates for the gig and lets the data owner decide which one to choose.
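The matching step described above can be illustrated with a minimal sketch. It assumes that candidates are ranked by the share of the gig's required skills they cover; the actual KnowledgeX selection criteria are not described here, and the class, function names, and candidates below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DataScientist:
    name: str
    skills: frozenset[str]


def match_score(required: frozenset[str], offered: frozenset[str]) -> float:
    """Return the fraction of required skills that the candidate offers."""
    if not required:
        return 0.0
    return len(required & offered) / len(required)


def shortlist(required: frozenset[str],
              candidates: list[DataScientist],
              k: int = 3) -> list[DataScientist]:
    """Return the k candidates with the highest score (ties broken by name)."""
    ranked = sorted(
        candidates,
        key=lambda c: (-match_score(required, c.skills), c.name),
    )
    return ranked[:k]


# Hypothetical gig and candidates, loosely based on the water pipe example.
GIG_SKILLS = frozenset({"time series analysis", "anomaly detection"})
CANDIDATES = [
    DataScientist("Ada", frozenset({"time series analysis", "anomaly detection", "forecasting"})),
    DataScientist("Ben", frozenset({"computer vision"})),
    DataScientist("Chen", frozenset({"anomaly detection"})),
    DataScientist("Dana", frozenset({"time series analysis", "anomaly detection"})),
    DataScientist("Eli", frozenset({"natural language processing"})),
]

for candidate in shortlist(GIG_SKILLS, CANDIDATES):
    print(candidate.name, match_score(GIG_SKILLS, candidate.skills))
```

A real marketplace would likely weigh further signals such as track record or availability; the skill overlap is used here only because it is the one criterion the example mentions.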
The KnowledgeX confidential computation environment lets data owners decide where their data is processed. They can also define, through processing agreements, what a data scientist may do with the data. Processing agreements, authorizations, and proofs of execution are stored on a blockchain so that their integrity can be guaranteed. KnowledgeX lets data owners audit afterward what happened with their data. This builds a level of trust that current platforms do not offer.
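To show why an append-only, hash-linked record makes later tampering detectable, here is a simplified sketch. It is a stand-in for a blockchain, not the on-chain format KnowledgeX uses; the `Ledger` class, the record fields, and the gig name are assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def record_hash(record: dict, previous_hash: str) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("utf-8") + payload).hexdigest()


class Ledger:
    """Append-only list of (record, hash) pairs, each linked to its predecessor."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, record: dict) -> str:
        previous = self.entries[-1][1] if self.entries else GENESIS
        entry_hash = record_hash(record, previous)
        self.entries.append((record, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        previous = GENESIS
        for record, stored_hash in self.entries:
            if record_hash(record, previous) != stored_hash:
                return False
            previous = stored_hash
        return True


# Hypothetical records for one gig.
ledger = Ledger()
ledger.append({"type": "processing_agreement", "gig": "pipe-wear-prediction",
               "allowed_operations": ["train_model"]})
ledger.append({"type": "authorization", "gig": "pipe-wear-prediction",
               "operation": "train_model"})
ledger.append({"type": "proof_of_execution", "gig": "pipe-wear-prediction",
               "operation": "train_model"})

intact = ledger.verify()

# Quietly widening the agreement after the fact is detected on the next audit.
ledger.entries[0][0]["allowed_operations"].append("export_raw_data")
still_valid_after_tampering = ledger.verify()

print(intact, still_valid_after_tampering)
```

On a real blockchain the hashes are additionally replicated and agreed upon by many parties, so a single participant cannot simply recompute the chain after changing a record.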
The Technology: Three options to run KnowledgeX
Data owners have three different options for data processing.
Option 1: On-Premise
If data owners want their data never to leave their own premises, KnowledgeX enables them to connect their own computing resources to the KnowledgeX decentralized cloud. The data scientist develops data analysis scripts on their local machine using only obfuscated sample data. The real data sets are processed exclusively in the data owner’s infrastructure.
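How the sample data is obfuscated is not specified here. As one possible illustration, the sketch below draws each column independently from the real values and adds noise, so that the sample keeps the schema and rough value ranges without reproducing any real row. The function name, column names, and noise level are assumptions, and this simple scheme offers no formal privacy guarantee.

```python
import random


def obfuscated_sample(rows: list, n: int, seed: int = 0) -> list:
    """Build n synthetic rows with the same numeric columns as `rows`.

    Each column is sampled independently, which breaks the link between
    values that belonged to the same real row, and Gaussian noise is added.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    sample = [{} for _ in range(n)]
    for column in columns:
        values = [row[column] for row in rows]
        spread = (max(values) - min(values)) or 1.0
        for synthetic_row in sample:
            base = rng.choice(values)
            synthetic_row[column] = base + rng.gauss(0, 0.05 * spread)
    return sample


# Hypothetical sensor readings that stay on the data owner's infrastructure.
REAL_ROWS = [
    {"pressure_bar": 4.1, "flow_l_per_s": 12.0},
    {"pressure_bar": 3.9, "flow_l_per_s": 11.4},
    {"pressure_bar": 5.2, "flow_l_per_s": 15.8},
    {"pressure_bar": 4.4, "flow_l_per_s": 13.1},
]

SAMPLE = obfuscated_sample(REAL_ROWS, n=5)
for row in SAMPLE:
    print(row)
```

The data scientist would develop and debug scripts against output like `SAMPLE`, while the script itself later runs against the real rows on the data owner's side.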
Option 2: Traditional Cloud
If the data is not sensitive, the second option is to execute data science tasks in a normal cloud environment using iExec workers.
Option 3: Trusted Cloud
For the most sensitive data (e.g., patient health care records or banking transaction data), the data owner can require the data scientist to execute all data science tasks in a trusted execution environment. For this purpose, KnowledgeX uses Intel SGX enclaves. With SGX attestations, it is possible to verify that all tasks have been authorized and to audit the code. Any violation of processing agreements can be detected in an audit.
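The idea behind checking that only authorized code runs can be sketched as follows. In a real SGX deployment, the enclave measurement is computed by the hardware and reported in a signed attestation; in this simplified stand-in, a SHA-256 hash of the script text plays the role of the measurement. The function names and the example script are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac


def measure(code: str) -> str:
    """Stand-in for an enclave measurement: a SHA-256 hash of the code."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def is_authorized(reported_measurement: str,
                  authorized_measurements: set[str]) -> bool:
    """Check the reported measurement against the data owner's allow-list."""
    return any(
        hmac.compare_digest(reported_measurement, allowed)
        for allowed in authorized_measurements
    )


# The data owner reviews this script and authorizes exactly this version.
APPROVED_SCRIPT = "model = train_anomaly_detector(readings)\n"
AUTHORIZED = {measure(APPROVED_SCRIPT)}

# A later, unreviewed change produces a different measurement.
MODIFIED_SCRIPT = APPROVED_SCRIPT + "upload(readings)\n"

print(is_authorized(measure(APPROVED_SCRIPT), AUTHORIZED))
print(is_authorized(measure(MODIFIED_SCRIPT), AUTHORIZED))
```

Because even a one-line change alters the measurement, the data owner can tie an authorization to one specific, audited version of the code.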
The Potential: Enable a global market for contracted data science
In 2020, more than 250,000 data science positions could not be filled worldwide [1]. Data scientists are in high demand and need to be highly skilled: at least ⅔ of them hold a master’s or Ph.D. degree [2]. KnowledgeX can bring both sides together in a trusted way like no other solution currently on the market.
Want to stay up to date with KnowledgeX?
Sign up for the newsletter for technical updates and future opportunities for funding and partnering here: https://www.knowledgex.eu/stay-up-to-date
About the author: Marcel Müller is a German deep tech entrepreneur and researcher with a passion for bringing innovations from research to the market. He is the CEO of JadenX, a company that develops deep tech innovations together with partners. Marcel is also the founder of KnowledgeX, a paradigm-changing data science marketplace that uses blockchain and trusted execution environments for trusted collaboration. Furthermore, Marcel is a researcher at SNET at TU Berlin.
[1] “The Data Science Shortage 2020”, Quanthub, 2020, https://quanthub.com/data-scientist-shortage-2020/
[2] “The State of Machine Learning and Data Science in 2020”, Kaggle, 2020, https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf