Can Blockchain Really Make Big Data Better?
By Pankaj Verma on ALTCOIN MAGAZINE
What Is A Blockchain?
A blockchain, originally “block chain”, is a growing list of records, called blocks, which are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data.
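Here is a minimal Python sketch of that structure. It assumes a simplified block stored as a dictionary with `prev_hash`, `timestamp`, and `transactions` fields, hashed as JSON with SHA-256. The helper names `make_block` and `block_hash` are hypothetical. Real Bitcoin blocks use a binary header that is hashed twice with SHA-256, so treat this only as an illustration of how each block links to the one before it.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block's fields."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(prev_hash: str, transactions: list) -> dict:
    """Hypothetical, simplified block: previous block's hash, a timestamp, and data."""
    return {
        "prev_hash": prev_hash,
        "timestamp": time.time(),
        "transactions": transactions,
    }


# The first ("genesis") block has no predecessor, so it points at an all-zero hash.
genesis = make_block("0" * 64, ["genesis"])
second = make_block(block_hash(genesis), ["Alice pays Bob 1 BTC"])

print(second["prev_hash"] == block_hash(genesis))  # True
```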
The blockchain has gained so much admiration because:
- A single entity does not own the data stored inside the blockchain
- The data stored inside it is secured with cryptography
- The blockchain is immutable, so no one can tamper with the data inside it
- The blockchain is transparent, so anyone can track the data if they want to
What Is Big Data?
According to Wikipedia, “Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.”
Big data can be described by the following characteristics:
Volume: The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.
Variety: The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio, video; plus it completes missing pieces through data fusion.
Velocity: In this context, the speed at which data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real time. Compared to small data, big data is produced more continually. Two kinds of velocity are relevant to big data: the frequency of generation and the frequency of handling, recording, and publishing.
Veracity: This is the extended definition of big data, which refers to data quality and data value. The quality of captured data can vary greatly, affecting the accuracy of analysis.
The Benefits Of Big Data
Now that you know the Vs of big data, let’s look into why we should go through the trouble of analyzing it in the first place, starting with the benefits of big data analytics.
- Saves time
- Cost-efficient
- Helps in product development
- Helps in understanding market conditions
Biggest Challenges Of Big Data
As you can imagine, big data implementation has multiple challenges.
- Dealing with data growth: Big data, as the name suggests, deals with a huge volume of data. As such, it becomes extremely difficult to store all of it securely.
- Securing big data: Security is also a big concern for organizations with big data stores. After all, some big data stores can be attractive targets for hackers or advanced persistent threats.
Big Data And Blockchain: Quantity And Quality
Big data and blockchain can have a very fruitful relationship because the blockchain can help cover some of the flaws of big data. There are four reasons why this partnership can be fruitful:
Security: Blockchain’s biggest asset is the security it imparts to the data stored inside it. Remember, the data inside the blockchain is tamper-resistant.
Transparency: The transparent architecture of the blockchain can help you trace data back to its point of origin.
Decentralization: The data stored inside a blockchain is not owned by one single entity, so compromising any one entity does not put all of the data at risk.
Flexibility: The blockchain can store many different kinds of data.
#1 Decentralization
Before Bitcoin and BitTorrent came along, we were more used to centralized services. The idea is very simple: a centralized entity stores all the data.
Banks are another example of a centralized system. They store all your money, and the only way you can pay someone is by going through the bank.
Centralized systems have treated us well for many years; however, they have several vulnerabilities.
- Firstly, because they are centralized, all the data is stored in one spot. This makes them easy targets for potential hackers.
- What if the centralized entity shuts down for whatever reason? Then nobody will be able to access the information it holds.
- Worst case scenario, what if this entity becomes corrupt and malicious? If that happens, all the data it holds will be compromised.
In a decentralized system, the information is not stored by one single entity. Instead, everyone in the network holds a copy of the information.
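A toy Python sketch of why replication helps with both the hacking and the shutdown problem. It assumes every node keeps a full copy of the ledger and that a simple majority vote decides which copy is correct. That vote is a stand-in for illustration, not a real consensus protocol such as Bitcoin’s proof of work, and the function name `majority_view` is hypothetical.

```python
from collections import Counter


def majority_view(copies: list[list[str]]) -> list[str]:
    """Return the version of the ledger held by the largest number of nodes."""
    counts = Counter(tuple(ledger) for ledger in copies)
    winner, _ = counts.most_common(1)[0]
    return list(winner)


honest = ["tx1", "tx2", "tx3"]
nodes = [list(honest) for _ in range(5)]  # five nodes, each with a full copy

nodes[2][1] = "tx2-forged"  # one node is compromised and alters a record
print(majority_view(nodes) == honest)  # True: the other four copies outvote it

nodes.pop(0)  # another node shuts down
print(majority_view(nodes) == honest)  # True: the data is still available
```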
#2 Transparency
One of the most interesting and misunderstood concepts in blockchain technology is “transparency.” Some people say that blockchain gives you privacy while some say that it is transparent. Why do you think that happens?
Well… a person’s identity is hidden behind cryptography and represented only by their public address. So, if you were to look up a person’s transaction history, you would not see “Bob sent 1 BTC”; instead, you would see:
“1MF1bhsFLkBzzz9vpFYEmvwT2TbyCt7NZJ sent 1 BTC”.
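So, while the person’s real identity stays hidden, you can still see all the transactions made by their public address. Bitcoin uses two kinds of keys: a private key, which the owner keeps secret and uses to sign transactions, and a public address, which is derived from the corresponding public key and can be shared with anyone.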
Speaking purely from the point of view of cryptocurrency, if you know the public address of a big company, you can simply pop it into a block explorer and look at all the transactions it has engaged in. This forces such companies to be honest, something they have never had to deal with before.
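A simplified Python sketch of what such a lookup does. It assumes a toy public ledger where each transaction records one sender, one receiver, and an amount in satoshis. Real Bitcoin transactions are built from inputs and outputs rather than sender/receiver pairs, and the addresses, the `Tx` class, and the `history` function here are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    """Hypothetical toy transaction; real Bitcoin uses inputs and outputs."""
    sender: str
    receiver: str
    amount_sat: int  # amount in satoshis (1 BTC = 100,000,000 satoshis)


# A toy public ledger; the addresses are placeholders, not real ones.
LEDGER = [
    Tx("addrA", "addrB", 100_000_000),
    Tx("addrC", "addrA", 50_000_000),
    Tx("addrB", "addrC", 20_000_000),
]


def history(address: str, ledger: list[Tx]) -> list[Tx]:
    """Every transaction in which the address sent or received coins."""
    return [tx for tx in ledger if address in (tx.sender, tx.receiver)]


for tx in history("addrA", LEDGER):
    print(tx)
```

Because the ledger is public, anyone can run this kind of filter; only the link between an address and a real-world identity stays hidden.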
#3 Immutability
Immutability, in the context of the blockchain, means that once something has been entered into the blockchain, it cannot be tampered with.
The blockchain gets this property from cryptographic hash functions.
In simple terms, hashing means taking an input string of any length and giving out an output of a fixed length. In the context of cryptocurrencies like bitcoin, the transactions are taken as an input and run through a hashing algorithm (Bitcoin uses SHA-256) which gives an output of a fixed length.
Let’s see how the hashing process works by feeding a few inputs into SHA-256 (Secure Hash Algorithm 256).
In the case of SHA-256, no matter how big or small your input is, the output always has a fixed length of 256 bits. This becomes critical when you are dealing with a huge amount of data and transactions: instead of remembering the input data, which could be huge, you can just remember its hash and keep track of that.
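A quick check of the fixed-length property using Python’s standard `hashlib` module. The three inputs are arbitrary examples chosen for this sketch: whether the input is two characters or a million, the hex digest is always 64 characters long, and since each hex character encodes 4 bits, that is 256 bits.

```python
import hashlib

# Arbitrary example inputs of very different sizes.
inputs = ["Hi", "Welcome to the blockchain", "A" * 1_000_000]

digest_lengths = []
for text in inputs:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    digest_lengths.append(len(digest))
    # Each hex character encodes 4 bits, so 64 characters = 256 bits.
    print(f"input length {len(text):>9} -> digest {digest[:16]}... ({len(digest) * 4} bits)")
```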
There is just one property that we want you to focus on today. It is called the “Avalanche Effect.”
What does that mean?
Even if you make a small change to your input, the change reflected in the hash will be huge.
For example, changing only the case of the first letter of an input produces a completely different output hash.
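One way to measure the avalanche effect is to hash two inputs that differ only in the case of their first letter and count how many of the 256 output bits flip. The inputs “Blockchain” and “blockchain” are an example chosen for this sketch, and the helper names are hypothetical. For a good hash function, roughly half of the bits are expected to differ, although the exact count depends on the inputs.

```python
import hashlib


def sha256_as_int(text: str) -> int:
    """Interpret the 32-byte SHA-256 digest of the text as one integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def differing_bits(a: str, b: str) -> int:
    """Count the bit positions where the SHA-256 digests of a and b differ."""
    return bin(sha256_as_int(a) ^ sha256_as_int(b)).count("1")


flipped = differing_bits("Blockchain", "blockchain")
print(f"{flipped} of 256 bits differ")
```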
The blockchain is a linked list in which each block contains data and a hash pointer to the previous block, hence creating the chain. What is a hash pointer? A hash pointer is similar to a pointer, but instead of just containing the address of the previous block, it also contains the hash of the data inside the previous block. If someone alters a block, its hash no longer matches the hash pointer stored in the next block, so the tampering is detectable.
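A minimal Python sketch of how hash pointers make tampering detectable. It assumes each block is a small dictionary whose `prev_hash` field stores the SHA-256 hash of the previous block; the names `block_hash`, `build_chain`, and `is_valid` are hypothetical. The check only covers the links: an edit to the newest block is caught once another block points to it, and an attacker who recomputed every later block would pass it. In Bitcoin, proof of work is what makes such a rewrite prohibitively expensive.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode("utf-8")).hexdigest()


def build_chain(records: list[str]) -> list[dict]:
    """Link each block to its predecessor through a hash pointer."""
    chain = []
    prev = "0" * 64  # the first block points at an all-zero hash
    for data in records:
        block = {"prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain: list[dict]) -> bool:
    """Each stored hash pointer must match the recomputed hash of the previous block."""
    for prev_block, block in zip(chain, chain[1:]):
        if block["prev_hash"] != block_hash(prev_block):
            return False
    return True


chain = build_chain(["tx batch 1", "tx batch 2", "tx batch 3"])
print(is_valid(chain))  # True
chain[0]["data"] = "forged"  # tamper with an old block
print(is_valid(chain))  # False
```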
Examples Of Big Data And Blockchain Projects
1) Storj
Storj is an open-source, decentralized file storage solution. It uses cryptography, sharding, and hash tables to store files on a decentralized peer-to-peer network. Storj has a distributed set of storage nodes that use the spare hard drive space of its community members, who are called “farmers”.
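Storj’s actual protocol is not described here, so the following is only a generic Python sketch of the ideas the paragraph names: a file is split into shards, each shard is keyed by its SHA-256 hash, and the shards are spread across farmer nodes that each act as a small hash table (a plain `dict`). The function names and the tiny shard size are hypothetical, and the sketch omits the encryption and redundancy a real network needs.

```python
import hashlib


def shard(data: bytes, size: int) -> list[bytes]:
    """Split a file's bytes into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def node_for(key: str, nodes: list[dict]) -> dict:
    """Pick a node deterministically from the shard's hash."""
    return nodes[int(key, 16) % len(nodes)]


def store(data: bytes, nodes: list[dict], size: int = 4) -> list[str]:
    """Store each shard under its SHA-256 key; return the keys in file order."""
    keys = []
    for piece in shard(data, size):
        key = hashlib.sha256(piece).hexdigest()
        node_for(key, nodes)[key] = piece
        keys.append(key)
    return keys


def retrieve(keys: list[str], nodes: list[dict]) -> bytes:
    """Fetch the shards back and verify each one against its key."""
    pieces = []
    for key in keys:
        piece = node_for(key, nodes)[key]
        if hashlib.sha256(piece).hexdigest() != key:
            raise ValueError(f"shard {key[:8]} was altered by its node")
        pieces.append(piece)
    return b"".join(pieces)


farmers = [{} for _ in range(3)]  # three hypothetical farmer nodes
keys = store(b"hello decentralized storage", farmers)
print(retrieve(keys, farmers))  # b'hello decentralized storage'
```

Keying shards by their own hash means the file owner can detect a farmer returning altered data without trusting that farmer.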
2) Omnilytics
Omnilytics is a startup that aims to combine the blockchain with big data analytics. It uses artificial intelligence and machine learning as part of this process, with applications in marketing, financial due diligence, auditing, trend forecasting, and many other areas across industries.
3) Datum
Datum is a decentralized storage network driven by the Data Access Token (DAT). It puts the focus on the individual, who can monetize their own data in an open and honest marketplace, instead of being exploited by the current data giants like Facebook.
Conclusion
Big data and blockchain technology can join forces to truly revolutionize the way we process and analyze data. In this day and age, data is money. In order to come out on top of this race for acquiring more high-quality data, we will probably see more and more companies trying to delve into this powerful partnership.