“Give me insights, but don’t look at my data” — Introducing KnowledgeX: A Trust-aware Data Science Platform

Over the past years, value creation for many businesses has become more and more data-driven. Energy companies reduce their CO2 footprint by analyzing their operations, logistics companies shorten their supply chain by optimizing their processes, and healthcare service providers improve their patients’ lives through data-driven prognostics.

Creating knowledge from data often requires highly specialized data scientists with broad knowledge and expertise in data analysis and, lately, machine learning.

Nevertheless, transferring highly valuable data to a data scientist for knowledge generation is a privacy- and trust-intensive process. Data owners want to be sure that the data scientist has the necessary experience and uses the data only for the agreed-upon task.

KnowledgeX is a decentralized platform that

  • lets companies find specialized data scientists for their specific use cases in a marketplace and
  • enables traceable, transparent, and auditable knowledge generation.

The platform uses blockchain, decentralized cloud computing, and trusted execution environments to create trust in a setting where data is a valuable resource but expertise is needed to gain insights. Data owners can outsource data science tasks to independent contractors without risking the loss of data. Independent data scientists can bid on proposed tasks without getting prior access to confidential data.

Currently, data markets are hampered by confidentiality requirements due to competitive (e.g., cost data) or regulatory (e.g., personal data) considerations. Data scientists have to be employed in-house or are contractually restricted by non-disclosure stipulations, which tend to be ambiguous and costly to enforce.

KnowledgeX aims to solve this problem via a process where data privacy and contract fulfillment are technologically guaranteed, so the need for non-disclosure agreements does not arise.

The Problem: Give me insights, but don’t look at my data

KnowledgeX tackles a well-known problem in data science. Over the last couple of years, many companies concluded that data is the new gold and started collecting data.

But collecting data is the easy part. Data might be more aptly described as the raw ore that contains nuggets of gold. What is hard is making sense out of collected data. For this, specialist data scientists are needed.

Most data owners do not have the necessary knowledge of machine learning and data analysis themselves. Even if they do, they may lack the capacity. At the same time, they are reluctant to hand their valuable data to people they do not know. Large companies have started to build their own data science departments with specialists. But smaller companies, such as a water infrastructure operator with four employees, do not have the means to hire expensive full-time specialists. Freelancers would be an option. The problem with data science freelancers today, however, is that data owners first have to find suitable specialists at all and then need to be able to trust them.

The Solution: Shared problem solving based on confidential data

KnowledgeX consists of two parts: the data science marketplace and a trustworthy data science computation environment.

The KnowledgeX marketplace connects data scientists and data owners. Data owners can create data science gigs and specify which specialized skills they need for the gig. For example, a water infrastructure operator needs a specialist in time series analysis and anomaly detection to predict which of the water pipes are most worn out and likely to burst. KnowledgeX selects the three best-suited candidates for the gig and lets the data owner decide which one to choose.
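
The article does not describe how KnowledgeX ranks candidates, so the following is only a minimal sketch of one way such a shortlist could be built. It assumes that gigs and data scientists are described by sets of skill tags and scores each candidate by Jaccard overlap. The names Candidate, skill_overlap, and shortlist are hypothetical and not part of any KnowledgeX API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical profile: a name plus a set of skill tags.
    name: str
    skills: frozenset


def skill_overlap(required, offered):
    """Jaccard similarity between the required and the offered skill sets."""
    union = required | offered
    return len(required & offered) / len(union) if union else 0.0


def shortlist(required, candidates, k=3):
    """Return the k candidates with the highest overlap score."""
    ranked = sorted(
        candidates,
        key=lambda c: skill_overlap(required, c.skills),
        reverse=True,
    )
    return ranked[:k]


# Assumed example data for the water pipe gig described above.
gig_skills = frozenset({"time series analysis", "anomaly detection"})
pool = [
    Candidate("A", frozenset({"time series analysis", "anomaly detection"})),
    Candidate("B", frozenset({"computer vision"})),
    Candidate("C", frozenset({"anomaly detection", "nlp"})),
    Candidate("D", frozenset({"time series analysis", "anomaly detection", "forecasting"})),
]
top_three = shortlist(gig_skills, pool)
```

A real marketplace would likely also weigh track record, availability, and price. The point here is only that the data owner receives a short, ranked list and makes the final choice.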

The KnowledgeX computation environment lets data owners decide where their data is processed. They can also decide what a data scientist is allowed to do with the data by defining processing agreements. Processing agreements, authorizations, and proofs of execution are stored on a blockchain so that their integrity can be guaranteed. KnowledgeX lets data owners audit afterward what happened with their data. This builds a level of trust that does not exist on current platforms.
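
To illustrate why storing records on a blockchain makes them auditable, here is a simplified sketch of a hash-linked, append-only log. It is an assumption-laden stand-in, not the KnowledgeX implementation: the Ledger class, the record fields, and the use of SHA-256 over JSON are all hypothetical choices made for the example.

```python
import hashlib
import json


def record_hash(payload, prev_hash):
    """Hash a record together with the hash of its predecessor."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only list of records linked by hashes (hypothetical)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, payload):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"payload": payload, "prev": prev, "hash": record_hash(payload, prev)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; any edited record breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            expected = record_hash(entry["payload"], prev)
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


ledger = Ledger()
ledger.append({"type": "agreement", "allowed": ["anomaly detection"]})
ledger.append({"type": "authorization", "scientist": "A"})
ledger.append({"type": "proof_of_execution", "task": "fit_model"})
intact = ledger.verify()

# Simulate someone quietly widening the agreement after the fact.
ledger.entries[0]["payload"]["allowed"].append("export raw data")
tampered_detected = not ledger.verify()
```

Because each record's hash covers its predecessor's hash, changing an old agreement invalidates the chain from that point on. An actual blockchain adds distributed consensus on top, which this single-process sketch leaves out.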

The Technology: Three options to run KnowledgeX

Data owners have three different options for data processing.

Option 1: On-Premise

If data owners want their data never to leave their own premises, KnowledgeX enables them to hook up their own computing resources to the KnowledgeX decentralized cloud. The data scientist can create data analysis scripts on their local machine using only obfuscated sample data. The real data sets are processed exclusively in the data owner’s infrastructure.
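
How KnowledgeX obfuscates sample data is not specified in this article. As one possible illustration, the sketch below draws a few rows and shuffles every column independently, so the data scientist sees realistic column names, types, and value ranges without seeing any real record in full. The function name obfuscated_sample and the pipe data are assumptions made for the example, and this simple approach is not a formal privacy guarantee.

```python
import random


def obfuscated_sample(rows, n, seed=0):
    """Draw n rows and shuffle each column independently.

    Shuffling per column keeps column names, types, and value ranges,
    but breaks the link between values that belonged to the same record.
    Assumes rows is a non-empty list of dicts with identical keys and n <= len(rows).
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, n)
    columns = list(sample[0].keys())
    shuffled = {}
    for col in columns:
        values = [row[col] for row in sample]
        rng.shuffle(values)
        shuffled[col] = values
    return [{col: shuffled[col][i] for col in columns} for i in range(n)]


# Assumed stand-in for the data owner's real table.
real_rows = [
    {"pipe_id": i, "age_years": 10 + i, "pressure_bar": 4.0 + 0.1 * i}
    for i in range(100)
]
sample = obfuscated_sample(real_rows, n=5)
```

The data scientist would develop and debug a script against such a sample locally, and the finished script would then run against the real rows inside the data owner's infrastructure.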

Option 2: Traditional Cloud

If the data is not sensitive, the second option is to execute data science tasks in a regular cloud environment using iExec workers.

Option 3: Trusted Cloud

For the most sensitive data (e.g., patient health care records or banking transaction data), the data owner can require the data scientist to execute all data science tasks in a trusted execution environment. KnowledgeX uses Intel SGX enclaves. With SGX attestations, it is possible to verify that all tasks have been authorized and to audit the code that was run. Any violation of a processing agreement can therefore be detected in an audit.
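
The idea of checking that only authorized code ran can be sketched as follows. This is deliberately not the Intel SGX API: a plain SHA-256 hash of the script text stands in for an enclave measurement, and the report format and function names are hypothetical. Real remote attestation additionally verifies a hardware-backed signature over the report, which is omitted here.

```python
import hashlib


def measure(code):
    """Stand-in for an enclave measurement: SHA-256 of the script text."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def is_authorized(report, authorized_measurements):
    """Accept an execution report only if its code measurement was approved."""
    return report["measurement"] in authorized_measurements


# The data owner approves one specific script (assumed example).
approved_script = "model.fit(pressure_readings)"
authorized = {measure(approved_script)}

# Two hypothetical execution reports: one for the approved script, one not.
ok_report = {"task": "fit_model", "measurement": measure(approved_script)}
bad_report = {"task": "fit_model", "measurement": measure("upload(raw_data)")}

approved = is_authorized(ok_report, authorized)
rejected = not is_authorized(bad_report, authorized)
```

In this simplified picture, an auditor compares the measurement in each proof of execution with the set the data owner authorized, so a script that differs even by one character would be flagged.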

The Potential: Enable a global market for contracted data science

In 2020, more than 250,000 data science positions could not be filled worldwide [1]. Data scientists are in high demand and need to be highly skilled, with at least two-thirds of them holding a master’s or Ph.D. degree [2]. KnowledgeX can help bring data owners and data scientists together in a trusted way like no other solution on the market right now.

Want to stay up to date with KnowledgeX?

Sign up for the newsletter for technical updates and future opportunities for funding and partnering here: https://www.knowledgex.eu/stay-up-to-date

About the author: Marcel Müller is a German deep tech entrepreneur and researcher with a passion for bringing innovations from research to the market. He is the CEO of JadenX, a company that develops deep tech innovations together with partners. Marcel is also the founder of KnowledgeX, a paradigm-changing data science marketplace that uses blockchain and trusted execution environments for trusted collaboration. Furthermore, Marcel is a researcher at SNET at TU Berlin.

[1] “The Data Science Shortage 2020”, Quanthub, 2020, https://quanthub.com/data-scientist-shortage-2020/

[2] “The State of Machine Learning and Data Science in 2020”, Kaggle, https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf
