KnowledgeX Decentralized Data Science Marketplace Implements the iExec Protocol

Blair Maclennan
Published in iExec
Oct 1, 2021

This series showcases the unique use and business cases that can be realized with the iExec protocol. We’ll be inviting guest writers to introduce their projects and explain how they implement iExec. In this article series, you’ll learn about projects using iExec for off-chain compute, confidential computing, decentralized oracles, and governance tools.

Today, we’re thrilled to present KnowledgeX, a decentralized knowledge management platform connecting data owners & data science specialists. This project was initially awarded an EU grant as part of the ONTOCHAIN program, of which iExec is a partner.

KnowledgeX adopts some of iExec’s key technologies:

  • KnowledgeX adopts iExec RLC tokens at its core to pay for computing resources.
  • iExec provides choice within the cloud environment: no vendor lock-in, with direct access to multiple types of cloud computing services, from the traditional cloud to specialized TEE (Trusted Execution Environment) cloud, seamlessly from one single interface.
  • The KnowledgeX platform leverages iExec Confidential Computing to create trust in an environment where valuable and sensitive data can be processed in a confidential manner. It gives data owners full control over their data, including the decision of where it is processed.

Let’s get into the article written by the KnowledgeX team, and learn a little more about their project:

Over the past few years, value creation for many businesses has become more and more data-driven. Energy companies reduce their CO2 footprint by analyzing their operations, logistics companies shorten their supply chains by optimizing their processes, and healthcare service providers improve their patients’ lives through data-driven prognostics.

Creating knowledge from data often requires highly specialized data scientists with broad knowledge and expertise in data analysis and, lately, machine learning.

However, transferring highly valuable data to a data scientist for knowledge generation is privacy- and trust-intensive. Data owners want to be sure that the data scientist has the necessary experience and uses the data only for the agreed-upon task.

KnowledgeX is a decentralized platform that

  • lets companies find specialized data scientists for their specific use cases in a marketplace and
  • enables traceable and transparent knowledge generation with trusted execution environments and blockchain technologies

The platform leverages iExec to create trust in an environment where data is a valuable resource but expertise is needed to gain insights. Data owners can outsource data science tasks to independent contractors without risking the loss of data. Independent data scientists can bid on proposed tasks without getting prior access to confidential data.

Currently, data markets are hampered by confidentiality requirements due to competitive (e.g., cost data) or regulatory (e.g., personal data) considerations. Data scientists have to be employed in-house or are contractually restricted by non-disclosure stipulations, which tend to be ambiguous and costly to enforce.

KnowledgeX aims to solve this problem via a process where data privacy and contract fulfillment are technologically guaranteed, so the need for non-disclosure agreements does not arise. In addition, the KnowledgeX project aims to track reputation in a decentralized fashion to avoid a “lemons market” scenario where only low-quality data scientists and data owners interact.

The Problem: “Give me insights but don’t look at my data”

KnowledgeX tackles a well-known problem in data science. Over the last couple of years, many companies concluded that data is the new gold and started collecting data.

But collecting data is the easy part. Data might be more aptly described as the raw ore that contains nuggets of gold. What is hard is making sense of the collected data, and for this, specialist data scientists are needed. Most data owners do not have the necessary knowledge of machine learning and data analysis themselves. At the same time, they are reluctant to give their valuable data to people they do not know. Large companies have started to build their own data science departments staffed with specialists. But smaller companies, such as a water infrastructure operator with four employees, do not have the means to hire expensive full-time specialists. Freelancers would be an option, but the problem with data science freelancers today is twofold: data owners need to be able to trust them, and they need to find suitable data science specialists in the first place.

The Solution: Shared problem solving based on confidential data

KnowledgeX consists of two parts: the data science marketplace and a trustworthy data science computation environment.

The KnowledgeX marketplace connects data scientists and data owners. Data owners can create data science gigs and specify which specialized skills they need for the gig. For example, a water infrastructure operator needs a specialist in time series analysis and anomaly detection to predict which of the water pipes are most worn out and likely to burst. KnowledgeX selects the three best-suited candidates for the gig and lets the data owner decide which one to choose.
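The article does not say how KnowledgeX ranks candidates for a gig. As a purely hypothetical illustration, the shortlisting step could score each data scientist by the fraction of the gig's required skills they cover, break ties with a reputation score (the project aims to track reputation, but the mechanism is not described here), and keep the top three. The `Candidate` type, the `skill_coverage` and `shortlist` functions, and the reputation values are all assumptions for this sketch, not KnowledgeX's actual method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A data scientist on the marketplace (hypothetical model)."""
    name: str
    skills: set[str]
    reputation: float  # hypothetical reputation score in [0, 1]


def skill_coverage(required: set[str], candidate: Candidate) -> float:
    """Fraction of the gig's required skills that the candidate has."""
    if not required:
        return 0.0
    return len(required & candidate.skills) / len(required)


def shortlist(required: set[str], candidates: list[Candidate], k: int = 3) -> list[Candidate]:
    """Rank by skill coverage, break ties by reputation, keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda c: (skill_coverage(required, c), c.reputation),
        reverse=True,
    )
    return ranked[:k]


# Example: the water infrastructure gig from the text (illustrative data only).
gig_skills = {"time series analysis", "anomaly detection"}
pool = [
    Candidate("alice", {"time series analysis", "anomaly detection"}, 0.9),
    Candidate("bob", {"time series analysis"}, 0.95),
    Candidate("carol", {"time series analysis", "anomaly detection", "nlp"}, 0.7),
    Candidate("dave", {"nlp"}, 0.99),
]
top_three = shortlist(gig_skills, pool)
print([c.name for c in top_three])  # ['alice', 'carol', 'bob']
```

Ranking on skill coverage first means a highly reputed candidate without the required skills (dave) cannot crowd out a specialist; the data owner still makes the final choice among the three.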

The KnowledgeX confidential computing environment leverages the iExec platform to give data owners full control over their data. Data owners can decide where their data is processed, as well as what a data scientist is allowed to do with it in general. KnowledgeX and iExec also let data owners audit afterward what happened to their data. This builds a level of trust that does not exist on current platforms.

The Technology: Three options to run KnowledgeX on iExec

iExec is at the heart of KnowledgeX. Data owners have three different options for data processing.

Option 1: On-Premise

If a data owner wants their data never to leave their own premises, KnowledgeX enables them to set up a custom iExec worker pool on site. The data scientist develops data analysis scripts on their local machine using only obfuscated sample data, while the real data sets are processed exclusively on the data owner’s infrastructure.

Option 2: Traditional Cloud

If the data is not sensitive, KnowledgeX offers a second option: executing data science tasks in a standard cloud environment using iExec workers.

Option 3: TEE Cloud

For the most sensitive data (e.g., patient health care records or banking transaction data), the data owner can require the data scientist to execute all data science tasks in a trusted execution environment. With iExec’s SGX attestations, it is possible to verify that all tasks have been authorized and to audit the code. Any violation of the processing agreements can be detected and audited.
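The choice among the three options can be summarized as a small decision rule. This is a minimal sketch that assumes just two inputs drawn from the option descriptions, whether the data must stay on the owner’s premises and whether it is sensitive; the `ProcessingOption` enum and `choose_option` function are hypothetical and are not part of KnowledgeX or iExec.

```python
from enum import Enum


class ProcessingOption(Enum):
    """The three processing options described in the article (hypothetical labels)."""
    ON_PREMISE = "on-premise iExec worker pool"
    TRADITIONAL_CLOUD = "traditional cloud with iExec workers"
    TEE_CLOUD = "TEE cloud (SGX)"


def choose_option(must_stay_on_premise: bool, sensitive: bool) -> ProcessingOption:
    """Hypothetical mapping from a data owner's requirements to an option."""
    if must_stay_on_premise:
        # Option 1: data never leaves the owner's infrastructure.
        return ProcessingOption.ON_PREMISE
    if sensitive:
        # Option 3: the most sensitive data (e.g. health or banking records) runs in a TEE.
        return ProcessingOption.TEE_CLOUD
    # Option 2: non-sensitive data can use a standard cloud environment.
    return ProcessingOption.TRADITIONAL_CLOUD


print(choose_option(must_stay_on_premise=False, sensitive=True).value)  # TEE cloud (SGX)
```

The on-premise check comes first because it is a hard constraint: if the data must not leave the owner’s premises, neither cloud option applies, regardless of sensitivity.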

The Business: Task-specific data science knowledge in a trusted environment

KnowledgeX adopts iExec RLC tokens at its core to pay for computing resources. For users, KnowledgeX is a decentralized platform that supports two different ways of interacting with tokens and the blockchain.

Full Control

In the full control scenario, the data owners and data scientists manage their own crypto wallets and tokens. KnowledgeX provides services for matchmaking and conflict resolution to the two sides without being a trusted intermediary. For these services, KnowledgeX charges service fees paid in fiat or KnowledgeX tokens.

Fully Managed

Some users are not crypto-savvy and do not want to manage their own tokens, so KnowledgeX offers to act on their behalf. This opens up KnowledgeX to potential customers outside the traditional crypto ecosystem: traditional companies with a lot of data that want to use KnowledgeX but do not want to deal with tokens, wallets, and transactions. These users can opt for the managed version.

The Potential: Enable a global market for contracted data science

In 2020, more than 250,000 data science positions could not be filled worldwide [1]. Data scientists are in high demand and must be highly skilled: at least ⅔ of them hold a master’s or Ph.D. degree [2]. KnowledgeX can help bring these two sides together in a trusted way, like no other solution currently on the market. iExec sits at the heart of KnowledgeX. We are currently building a commercial product that we aim to roll out soon.

Want to stay up to date with KnowledgeX?

Sign up for the newsletter for technical updates and future opportunities for funding and partnering here: https://www.knowledgex.eu/stay-up-to-date

Want to get in contact? Check out our website
➡️ https://www.knowledgex.eu

Questions? Message Marcel on LinkedIn [https://www.linkedin.com/in/the-real-marcel-mueller/] or Twitter [https://twitter.com/onlyMarcelM]

[1] “The Data Science Shortage 2020”, QuantHub, 2020, https://quanthub.com/data-scientist-shortage-2020/

[2] “The State of Machine Learning and Data Science in 2020”, Kaggle, 2020, https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf

💡 Want to learn more about iExec? Check out iExec Academy!

iExec Academy aggregates all content related to the project. You’ll find articles, tech documentation, videos, interactive demos, and much more! Whether you are a beginner or an expert, a developer or crypto-enthusiast, you’ll find what you are looking for on iExec Academy!

📚➡️ https://academy.iex.ec
