How I used Ethereum Blockchain, IPFS and Machine Learning all in one Web App

Lorenzo Zaccagnini
Jul 7, 2018


>> Try the live web version <<

Requirements:

  1. Install MetaMask
  2. Connect to the Rinkeby Ethereum test network

Intro

The zombie apocalypse is here. A bioterrorism group ran a successful ICO disguised as tradable virtual puppies and used the money to create a biological weapon: “The Zombie Virus”. We need a way to find people who are immune to the zombie virus in order to develop a vaccine. Good news: we’ve isolated the virus and created two “safe” versions of it, ZMB1 and ZMB2. By injecting them into some volunteers, we can see how well they resist the virus. Bad news: it’s not that easy. We built a checklist that our scientists use to inspect the symptoms of the two versions of the virus on the same patient, and we need to compare the evaluation of each volunteer’s symptoms with the data previously gathered by observing people infected “accidentally” by the live versions of the two viruses. People are becoming scarce so we can’t use… we can’t make those incidents happen again.

Why the Blockchain

The apocalypse is here and centralized servers are dead, so we rely on decentralized solutions: more people means more nodes and more chances. Blockchain is our perfect match, decentralized and safe enough to protect us against a data manipulation attack from the biological terrorists. The Ethereum blockchain is the only public blockchain still alive after the outbreak, and Vitalik Buterin is leading the few survivors. This application’s smart contract is hosted on the Rinkeby Ethereum test network.

Why IPFS

Storing large files on the Ethereum blockchain is too expensive and slow (maybe with Plasma we’ll have gazillions of txs per millisecond). Fiat currency is gone and now everybody uses only cryptos. We need a solution and IPFS is our best choice. IPFS is a decentralized way to store files and host static websites. Each file, and each of the blocks within it, is given a unique fingerprint called a cryptographic hash. If a file is modified, its hash changes completely, and the new version is uploaded to IPFS under the new hash. It works a bit like Git: older versions stay reachable under their old hashes as long as some node keeps them, so we get historic versioning, we don’t lose any precious zombie analysis data, and the bad guys can’t trick us with fake or corrupted data.
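
A minimal sketch of the fingerprint idea, assuming Node.js and its built-in crypto module. It uses a plain SHA-256 digest as a stand-in: real IPFS identifiers are built differently (they encode hashes computed over the file's blocks), so the `fingerprint` helper and the sample strings are purely illustrative.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hex SHA-256 digest of some content.
// This is a stand-in for an IPFS hash, not how IPFS computes one.
function fingerprint(content) {
  return crypto.createHash('sha256').update(content).digest('hex');
}

const original = fingerprint('patient 42: no symptoms');
const tampered = fingerprint('patient 42: no symptoms!');

console.log(original);
console.log(tampered);
console.log(original === tampered); // false: one extra character, different fingerprint
```

Anyone holding the original fingerprint can re-hash the file they received and spot a modified copy immediately.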

Wrapping IPFS and Ethereum Blockchain together

I deployed a smart contract on the Ethereum blockchain that stores IPFS hashes, so the huge analysis files live on IPFS and the smart contract stores only small strings. When we upload a new dataset, the file itself goes to IPFS and its IPFS hash is sent to the smart contract.

This is what the smart contract returns when it fetches a dataset by its id:

0: string: ipfsHash QmQpHy4vkH4ifyGkt1YxVHnwES2MiCRegD3PRmHuXHiXn7

1: uint256: insertedAt 1530905611

  1. A string that we can use to retrieve the file from IPFS. The link will be “https://ipfs.io/ipfs/QmQpHy4vkH4ifyGkt1YxVHnwES2MiCRegD3PRmHuXHiXn7”, basically the IPFS gateway address plus the hash of the file.
  2. A simple integer, the timestamp: the exact (not really) time when the dataset was inserted.
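
To make the two return values concrete, here is a small sketch that turns them into a gateway link and a readable date. `describeDataset` is a hypothetical helper name, and the sketch assumes the values arrive as a two-element array in the order shown above, with the timestamp expressed in seconds.

```javascript
// Public gateway prefix mentioned in the article.
const IPFS_GATEWAY = 'https://ipfs.io/ipfs/';

// Hypothetical helper: takes [ipfsHash, insertedAt] as returned by the contract.
function describeDataset(result) {
  const ipfsHash = result[0];
  // Assumes a timestamp in seconds; JavaScript dates use milliseconds.
  const insertedAt = new Date(Number(result[1]) * 1000);
  return { url: IPFS_GATEWAY + ipfsHash, insertedAt };
}

const dataset = describeDataset([
  'QmQpHy4vkH4ifyGkt1YxVHnwES2MiCRegD3PRmHuXHiXn7',
  '1530905611',
]);

console.log(dataset.url);
console.log(dataset.insertedAt.toISOString()); // 2018-07-06T19:33:31.000Z
```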

User Interface

To create the user interface I used a Truffle box, a boilerplate containing React, Truffle and Web3.js. Web3 is a library that allows us to interact with the Ethereum blockchain, in this case through the MetaMask Chrome extension. Truffle gives us an amazing development environment to work with Web3, including useful features such as smart contract testing and migrations in a local environment.
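
As a rough sketch of how the React side might read a dataset: the function below assumes a web3.js 1.x-style contract object (one exposing `methods.<name>(...).call()`) and a contract method called `getDataset`, which is a guess at the name rather than something taken from the deployed contract. A stub stands in for the real contract so the snippet runs without MetaMask.

```javascript
// Reads one dataset entry through a web3.js 1.x-style contract object.
// `getDataset` is an assumed method name, not taken from the real contract.
async function fetchDataset(contract, id) {
  const result = await contract.methods.getDataset(id).call();
  return { ipfsHash: result[0], insertedAt: Number(result[1]) };
}

// Stub with the same call shape, so the sketch runs without a provider.
const stubContract = {
  methods: {
    getDataset: (id) => ({
      call: async () => ['QmQpHy4vkH4ifyGkt1YxVHnwES2MiCRegD3PRmHuXHiXn7', '1530905611'],
    }),
  },
};

const pending = fetchDataset(stubContract, 0);
pending.then((dataset) => console.log(dataset.ipfsHash, dataset.insertedAt));
```

In the browser, the stub would be swapped for a contract instance created from the contract's ABI and address, using the provider that MetaMask injects.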

Machine Learning

Machine learning is a subset of artificial intelligence that uses statistical techniques to give machines the ability to “learn” and “predict”. In this case we have a classification problem, because we want to predict whether a patient is immune to the zombie virus or not. I use a supervised learning algorithm called logistic regression, which is useful when we need to classify results in a binary way, like “is immune” or “isn’t immune”. I want to thank Andrew Ng for his course about machine learning on Coursera.
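
For reference, these are the standard logistic regression formulas. The notation is an assumption, since the article doesn't spell it out: x is the feature vector (the checklist results), theta the parameters, m the number of training examples, y the 0/1 label and alpha the learning rate.

```latex
% Hypothesis: probability that a patient is immune
h_\theta(x) = \sigma(\theta^{T} x), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Cost function (cross-entropy)
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta(x^{(i)})\bigr) \right]

% Gradient descent update, applied to every \theta_j simultaneously
\theta_j \leftarrow \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_j^{(i)}
```

With the usual 0.5 threshold (again an assumption about this app), a patient is classed as “is immune” when the hypothesis returns 0.5 or more.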

We have datasets based on people exposed to a live version of the virus. We observed them using our checklist test and then built the ground truth dataset for our prediction model. Now we can’t risk exposing people to a live version of the virus anymore: there are few people left and we need all of them. So we’ll use the live virus dataset to predict zombie virus immunity in “volunteers” whom we’ve injected with the two limited versions of the zombie virus.

Client side Machine Learning

I used math.js to port the logistic regression algorithm from MATLAB to JavaScript.

I created a class in React that:

  1. Fetches the data from IPFS
  2. Creates matrices from the fetched file
  3. Calculates theta
  4. Applies the cost function
  5. Applies gradient descent
  6. Calculates the sigmoid function
  7. Outputs a prediction for the newly inserted data (immune or not immune, based on the results of both checklist tests)
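
The training and prediction steps above can be sketched in a few lines. This is not the app's code: it uses plain arrays instead of math.js to stay dependency-free, it skips the IPFS fetch, and the tiny dataset, the learning rate and the iteration count are made up for illustration.

```javascript
// Sigmoid squashes any real number into (0, 1).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Hypothesis: sigmoid of the dot product theta . x (x[0] is the bias term, 1).
const predictProbability = (theta, x) =>
  sigmoid(x.reduce((sum, xj, j) => sum + theta[j] * xj, 0));

// Cross-entropy cost over the whole training set.
function cost(theta, X, y) {
  const total = X.reduce((sum, x, i) => {
    const h = predictProbability(theta, x);
    return sum - (y[i] * Math.log(h) + (1 - y[i]) * Math.log(1 - h));
  }, 0);
  return total / X.length;
}

// Batch gradient descent with a fixed learning rate.
function train(X, y, alpha, iterations) {
  let theta = new Array(X[0].length).fill(0);
  for (let it = 0; it < iterations; it++) {
    const gradient = theta.map((_, j) =>
      X.reduce((sum, x, i) => sum + (predictProbability(theta, x) - y[i]) * x[j], 0) / X.length
    );
    theta = theta.map((t, j) => t - alpha * gradient[j]);
  }
  return theta;
}

// Made-up data: [bias, ZMB1 score, ZMB2 score] -> 1 = immune, 0 = not immune.
const X = [[1, 1, 2], [1, 2, 1], [1, 2, 3], [1, 7, 8], [1, 8, 7], [1, 9, 9]];
const y = [1, 1, 1, 0, 0, 0];

const theta = train(X, y, 0.05, 5000);
const isImmune = (x) => predictProbability(theta, x) >= 0.5;

console.log(cost(theta, X, y));
console.log(isImmune([1, 1, 1]), isImmune([1, 9, 8]));
```

In this toy dataset low checklist scores mean mild symptoms, so a new patient with low scores on both tests should come out as immune and one with high scores should not.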

Add a Dataset

From the web application it is possible to add a new dataset, but only if your account address is enabled in the smart contract. The smart contract allows only selected addresses to register new IPFS hash links. If you want to try this feature, contact me.
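
The access rule can be modelled in a few lines. The real contract lives on Ethereum; this is only an in-memory JavaScript sketch of the same behaviour. The class name, the method names, the addresses and the idea of a single owner who enables other addresses are all assumptions.

```javascript
// In-memory model of a registry that only lets enabled addresses add IPFS hashes.
class DatasetRegistry {
  constructor(owner) {
    this.owner = owner;
    this.enabled = new Set([owner]);
    this.datasets = [];
  }

  // Assumption: only the owner can enable another address.
  enable(sender, address) {
    if (sender !== this.owner) throw new Error('only the owner can enable addresses');
    this.enabled.add(address);
  }

  // Stores the hash with a timestamp in seconds and returns the new dataset id.
  addDataset(sender, ipfsHash) {
    if (!this.enabled.has(sender)) throw new Error('address not enabled');
    this.datasets.push({ ipfsHash, insertedAt: Math.floor(Date.now() / 1000) });
    return this.datasets.length - 1;
  }

  // Returns [ipfsHash, insertedAt] for a stored dataset.
  getDataset(id) {
    const { ipfsHash, insertedAt } = this.datasets[id];
    return [ipfsHash, insertedAt];
  }
}

const registry = new DatasetRegistry('0xOwner');
registry.enable('0xOwner', '0xScientist');
const id = registry.addDataset('0xScientist', 'QmQpHy4vkH4ifyGkt1YxVHnwES2MiCRegD3PRmHuXHiXn7');
console.log(registry.getDataset(id)[0]);

try {
  registry.addDataset('0xZombie', 'QmFakeHash');
} catch (err) {
  console.log(err.message); // address not enabled
}
```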
