Published in The Capital

Big Data Projects on Blockchain: The Endless Solutions Blockchain Brings To Data Science

How can blockchain help us understand data? How can it help address the problems of big data?


This is 2019, and concepts like artificial intelligence, machine learning, virtual reality, big data, and blockchain are no longer seen as rocket science. Internet access is now just as common as electricity was at the advent of the internet. In similar fashion, the internet is enabling far more sophisticated technologies that are transforming virtually all spheres of society and redefining the way businesses and organizations operate.


This is 2019, and data is well and truly king.

Businesses, governments, and other organizations are relying more and more on data: not just raw data, but deep insights derived from huge quantities of it. Major players in all sectors of the economy are leveraging data to improve their services and profitability. Needless to say, data has become a priority for the biggest companies, such as Apple, Microsoft, Google, Amazon, and Facebook, which control enormous volumes of it.

Big data technologies allow these organizations to harness insight from such large amounts of data, which would otherwise have been impossible. But we didn’t get here all of a sudden; data science is the precursor of present-day big data technology.

An Introduction To Data Science

In its simplest form, data science involves the analysis of data with the aim of extracting knowledge and actionable insights from it. Data scientists use scientific methods, processes, algorithms, and systems to extract knowledge and insights from data, which comes in either structured or unstructured form.

Furthermore, data science covers such areas as statistics, data analysis, and machine learning, as well as other advanced methods used to understand processes.

In terms of use cases, data science applications abound around us. One that readily comes to mind is search, as in the engines behind Google and Bing. Digital advertising, such as programmatic ad protocols, also requires deep insights, as do the recommender services that pop up on digital platforms.

Outside ICT, data science is equally helpful in areas like healthcare, where data on patients and medical supplies can be analyzed and used to provide better treatment and healthcare services in general.

As a matter of fact, the applications of data science cut across every industry, be it energy management, hospitality, entertainment, or any other.

Big Data and A World of Possibilities

Big data is best seen as a more advanced branch of data science that deals with data sets too large to be handled by regular data science methods. Gartner describes big data in terms of three Vs: data that contains greater variety, arriving in increasing volumes and with ever-higher velocity. These extremely large and complex data sets can be analyzed computationally to reveal patterns, trends, and associations.

More than at any other time in history, data is larger, more complex, and arriving faster and from more new sources; this implies a whole spectrum of possibilities for big data technology.

An Emerging Tech That Can Improve Every Industry


Blockchain is the technology that powers cryptocurrencies like Bitcoin and Ethereum. While cryptocurrency is strongly associated with blockchain, it is only one of many use cases of the technology.

For the avoidance of doubt, we can define blockchain as a distributed ledger that records transactions in such a way that they cannot be altered. It can record not only cryptocurrency transactions but anything of value, and therein lies the secret to its applicability.

Blockchain is associated with trust. Because blockchain data is immutable, it removes the need for a third party, such as a regulator, to oversee transactions. So, when people say blockchain can disrupt just about any industry, they are referring to the ‘trust’ that its decentralized nature affords. Its impact is already being felt in the finance industry, where it enables commission-free, decentralized financial platforms and real-time cross-border payments and settlements.

In the same way, blockchain is also relevant in data science.

Why Blockchain for Data?

It’s been established that data is the way of the present and the future, but what matters is not just data but the quality of that data. The insight that can be obtained from data is only as good as the quality of the data itself. Organizations recognize this and are therefore investing in technologies to improve the quality of the data they collect. Big data, and data in general, is now characterized by a fourth V: its veracity.

In this regard, blockchain can help.

As a matter of fact, blockchain solutions, more than anything else, bring a high level of reliability, trustworthiness, and veracity to data. When applied to big data, blockchain makes for seamless verification of transferable data. It can help address problems such as human error, data duplication, and false information, which are often encountered when dealing with huge volumes of data.

Looking at how a public blockchain works, it’s easy to see how blockchain can improve the security and integrity of data. As a decentralized system, blockchain requires that multiple users authorize a transaction (or input) for it to be validated. Inconsistencies in data being input at this stage can be flagged, and since it takes a majority of the nodes to approve an input, a single bad actor cannot compromise the quality of the data. Furthermore, blockchain systems allow data to be tracked easily because all the records are linked. Organizations in supply chain, logistics, and other sectors are beginning to leverage this benefit of blockchain.
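To make the two ideas above concrete (majority approval of inputs, and records that are linked to each other), here is a minimal sketch in Python. It is a toy model and not the consensus mechanism of any real blockchain: the validator functions, the sensor records, and every function name are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def majority_approves(validators, data):
    """Accept an input only if more than half of the validators approve it."""
    votes = sum(1 for validate in validators if validate(data))
    return votes > len(validators) / 2


def append_block(chain, data, validators):
    """Append data as a new block if a majority approves; return True on success."""
    if not majority_approves(validators, data):
        return False
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})
    return True


def chain_is_intact(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical validators: honest nodes reject negative readings,
# while one careless (or malicious) node approves anything.
def honest(record):
    return record["reading"] >= 0


def careless(record):
    return True


validators = [honest, honest, careless]

chain = []
accepted = append_block(chain, {"sensor": "s1", "reading": 21.5}, validators)
rejected = append_block(chain, {"sensor": "s1", "reading": -999}, validators)
append_block(chain, {"sensor": "s2", "reading": 19.0}, validators)

intact_before = chain_is_intact(chain)
chain[0]["data"]["reading"] = 99.9  # tamper with an earlier record
intact_after = chain_is_intact(chain)
```

In this sketch the bad reading is rejected because only one of the three validators approves it, and editing an already stored record is detected because its recomputed hash no longer matches the stored one.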

Using Blockchain For Data Science

At this point, it becomes clear that blockchain helps data science in areas including data integrity, real-time data analysis, and data sharing.

As we will discuss below, these benefits are being applied in three broad categories. The first is decentralized cloud storage, where projects like Storj, Filecoin, Datum, and Sia are shaking things up. The second is data security, where projects like Provenance use blockchain to secure data across several industries. The third goes right to the heart of data analytics: projects like Omnilytics and Rublix are using blockchain to improve market insights and trading predictions.

On the other hand, there are also applications that are analyzing data stored on blockchains to come up with meaningful insights. These include Chainalysis, Elliptic, Numisight, and Skry.

Some Blockchain Projects In Data Science

Here are a few blockchain projects that are helping us better understand and utilize data:

A. Decentralized Cloud Storage Projects

1. Storj

Storj (pronounced “storage”) is a decentralized, end-to-end encrypted cloud storage service that uses excess hard drive capacity and bandwidth around the world. Basically, the Storj protocol allows for peer-to-peer negotiation and verification of storage contracts between storage providers (known as “farmers”) and storage users (known as “renters”).

The files are first encrypted on the client side and then split into pieces called “shards” before they are stored on the farmers’ machines. By default, files are stored with threefold redundancy as a backup. This way, only the client has complete access to the data; this added security gives Storj an edge over centralized cloud services. Renters periodically audit the farmers to ensure the safety of their files, and they pay for storage using the STORJ cryptocurrency.
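The shard, replicate, and audit flow described above can be sketched in a few lines of Python. This is an illustrative model only and not Storj's actual protocol: farmers are plain dictionaries, the salted-hash challenge is an assumed stand-in for the real audit scheme, and encryption is left out (random bytes stand in for an already encrypted file). All names are hypothetical.

```python
import hashlib
import os


def split_into_shards(data, shard_size):
    """Cut the (already encrypted) file into fixed-size pieces."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def distribute(shards, farmers, copies=3):
    """Store each shard on `copies` distinct farmers, chosen round-robin."""
    if copies > len(farmers):
        raise ValueError("need at least as many farmers as copies")
    placement = {}
    for shard_id, shard in enumerate(shards):
        holders = [(shard_id + k) % len(farmers) for k in range(copies)]
        for farmer_id in holders:
            farmers[farmer_id][shard_id] = shard
        placement[shard_id] = holders
    return placement


def make_challenge(shard):
    """Renter side: precompute a salted hash of the shard before uploading it."""
    salt = os.urandom(16)
    return salt, hashlib.sha256(salt + shard).hexdigest()


def farmer_response(farmer, shard_id, salt):
    """Farmer side: prove possession by hashing the salt with the stored shard."""
    return hashlib.sha256(salt + farmer.get(shard_id, b"")).hexdigest()


def audit(farmers, placement, challenges):
    """Return {(shard_id, farmer_id): passed} for every stored copy."""
    results = {}
    for shard_id, holders in placement.items():
        salt, expected = challenges[shard_id]
        for farmer_id in holders:
            answer = farmer_response(farmers[farmer_id], shard_id, salt)
            results[(shard_id, farmer_id)] = answer == expected
    return results


encrypted_file = os.urandom(1000)        # stands in for client-side encrypted data
shards = split_into_shards(encrypted_file, 256)
challenges = {i: make_challenge(s) for i, s in enumerate(shards)}
farmers = [dict() for _ in range(5)]     # five hypothetical farmers
placement = distribute(shards, farmers)

first_audit = audit(farmers, placement, challenges)

bad_farmer = placement[0][0]
farmers[bad_farmer][0] = b"corrupted"    # one farmer loses or alters shard 0
second_audit = audit(farmers, placement, challenges)
```

After the corruption, only the copy of shard 0 held by that one farmer fails the audit; the other two copies still pass, which is the point of storing each shard redundantly.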

When it launches later this year, the service will cost $0.015 per gigabyte of storage space and $0.05 per gigabyte downloaded monthly. While this does not undercut flat fees such as Dropbox’s $10 for 1 terabyte, it does offer a cost advantage for users who don’t use up the space allotted by Dropbox and similar providers. Unlike these traditional cloud storage providers, Storj allows users to pay only for the space they use, without set-up fees or minimum usage requirements. The service is currently in a public alpha testing phase.

B. Blockchain-enabled Data Analysis Projects
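The comparison is easy to check with the prices quoted above. The sketch below assumes that the storage price is charged per gigabyte per month, that 1 terabyte is 1,000 gigabytes, and that the flat plan has no download fee; these are assumptions made for the arithmetic, not published terms, and the function names are hypothetical.

```python
STORAGE_PRICE_PER_GB = 0.015   # USD per GB stored per month (quoted above)
DOWNLOAD_PRICE_PER_GB = 0.05   # USD per GB downloaded (quoted above)
FLAT_PLAN_PRICE = 10.0         # USD per month for the 1 TB flat plan (quoted above)


def monthly_cost(stored_gb, downloaded_gb=0.0):
    """Pay-per-use cost for one month of storage plus downloads."""
    return stored_gb * STORAGE_PRICE_PER_GB + downloaded_gb * DOWNLOAD_PRICE_PER_GB


def break_even_storage_gb(downloaded_gb=0.0):
    """Stored GB at which pay-per-use costs the same as the flat plan."""
    remaining = FLAT_PLAN_PRICE - downloaded_gb * DOWNLOAD_PRICE_PER_GB
    return max(remaining, 0.0) / STORAGE_PRICE_PER_GB


full_terabyte = monthly_cost(1000)           # about 15 USD, more than the flat 10 USD
light_user = monthly_cost(200, 50)           # about 5.50 USD
break_even = break_even_storage_gb()         # roughly 667 GB with no downloads
```

Under these assumptions, a user who fills a whole terabyte pays more than the flat fee, while anyone storing less than roughly 667 GB (and downloading little) pays less.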

2. Omnilytics

A data analysis platform, Omnilytics provides market intelligence through tools that deliver actionable insights for sales, marketing, and merchandising.

It combines blockchain with big data analytics and other technologies like artificial intelligence and machine learning data processing to aggregate data across various industries. Currently, Omnilytics provides data including competitor benchmarking, trend performance, and pricing analysis for its clients, mainly in the retail sector.

The team explains that it uses blockchain to power its smart contracts, distributed data fingerprinting, data exchange and other protocols and APIs. On the blockchain, data partners can track the performance of their data and incentivize the key actors through micropayments.

3. Rublix

Through its flagship decentralized application “HedgeTrade”, Rublix is uniting cryptocurrency traders on a trading platform that verifies the authenticity and credibility of traders and their predictions. The protocol basically leverages the transparency and immutability of blockchain, combined with investment data analytics, to provide more accurate trading predictions. Traders and investors are ranked according to the accuracy of their predictions, and their market trends can easily be accessed. Blockchain-verified traders earn rewards for high-quality content.
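Ranking traders by prediction accuracy can be illustrated with a short sketch. This is not Rublix's actual scoring method, which the article does not describe; it simply assumes each recorded prediction is a hypothetical (trader, predicted direction, actual direction) tuple and ranks traders by their hit rate.

```python
from collections import defaultdict


def rank_traders(predictions):
    """Rank traders by the share of predictions that matched the outcome.

    `predictions` is an iterable of (trader, predicted, actual) tuples.
    Ties are broken alphabetically by trader name.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for trader, predicted, actual in predictions:
        totals[trader] += 1
        if predicted == actual:
            hits[trader] += 1
    scores = {trader: hits[trader] / totals[trader] for trader in totals}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical prediction records, as they might be read back from a ledger.
records = [
    ("alice", "up", "up"),
    ("alice", "down", "down"),
    ("alice", "up", "down"),
    ("alice", "up", "up"),
    ("bob", "up", "down"),
    ("bob", "down", "down"),
    ("carol", "up", "up"),
    ("carol", "down", "down"),
]

ranking = rank_traders(records)
```

Because the records would live on an immutable ledger, a trader could not quietly delete a wrong call after the fact, which is what makes an accuracy ranking of this kind credible.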

C. Blockchain-enabled Data Security Projects

4. Provenance

Provenance is a platform that gathers and shares key product information and journeys in a way that’s secure, trustworthy and accessible. It has its main application in the supply chain of products.

Its blockchain architecture accommodates six categories of participants: producers, manufacturers, registrars (accreditors), standards organizations, certifiers and auditors, and customers.

Through its protocol, consumers have access to verified information about products, such as their origin, the points along their supply chain, their content and quality, and even their impact on the environment. Beyond tracking products via blockchain, Provenance allows producers, retailers, and consumers to get valuable insight from the data as it builds up.

Make no mistake, there are several other projects and organizations using blockchain in varying degrees to improve data applications, some alongside other technologies like AI and machine learning.


As more organizations apply blockchain in data storage, data management, data analysis, and other areas, we will see fewer high-profile data crises like those experienced by Equifax and Facebook.

But blockchain does more than protect data. As we have seen, it improves the quality of insight obtained from the data by verifying data at the input and preventing malicious actors.

The solutions blockchain brings to the data science space are broadening as the technology continues to evolve; there’s no telling how much of an impact blockchain will have in the next couple of years.

While blockchain and other DLTs aren’t without their own challenges, projects implementing decentralized solutions are gradually shifting access to relevant data insights from big-budget organizations to smaller businesses and individuals. They are also ensuring that compensation for data goes to the data providers.
