Understanding One Way Hash Functions #HowToBUIDL (5/n)
Hashing: A Building Block for Every Cryptographic System
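Ethereum makes significant use of the Keccak/SHA-3 algorithm.
Specifically, Ethereum and Solidity use Keccak-256, the original Keccak submission. It differs slightly in padding from the finalized NIST SHA3-256 standard, so the two produce different digests for the same input.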
This seems to be a trade secret among Dapp Developers for some reason.
Although hashing is a key building block of all cryptographic systems, we do not have to reinvent the wheel in order to write systems and smart contracts that run on top of cryptographic platforms.
You are not required to understand how SHA-3 works. I repeat:
You are not required to understand how SHA-3 works. Once more:
You are not required to understand how SHA-3 works. Understood? Good. Now…
You SHOULD understand why hashing is used.
SHA-3 is a subset of the cryptographic primitive family Keccak. SHA-3 uses the Sponge construction in which data is absorbed into the sponge, then the result is squeezed out.
A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by the hash function are called hash values, hash codes, digests, or simply hashes.
A one-way hash function exhibits the following properties:
A) Irreversible: it is computationally infeasible to determine the message from its digest.
B) Collision resistant: it is impractical to find two different messages that produce the same digest.
C) High avalanche effect: any small change in the input causes a significant change in the digest.
D) Deterministic: the same input must always map to the same output.
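If you want to see properties C and D for yourself, here is a minimal sketch in plain Node.js. One loud assumption: it uses SHA3-256 from Node's built-in crypto module as a stand-in, which is not the Keccak-256 variant used by web3 and Solidity, so the digests will not match anything produced by web3.sha3. The helper names digestHex and differingBits are made up for this illustration.

```javascript
const crypto = require('crypto');

// Stand-in hash: NIST SHA3-256 from Node's standard library.
// Assumption: this is NOT Ethereum's Keccak-256, so digests differ from web3,
// but the properties being demonstrated are the same.
function digestHex(message) {
  return crypto.createHash('sha3-256').update(message, 'utf8').digest('hex');
}

// Count how many output bits differ between two hex digests of equal length.
function differingBits(hexA, hexB) {
  const a = Buffer.from(hexA, 'hex');
  const b = Buffer.from(hexB, 'hex');
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i]; // bits set where the two bytes disagree
    while (x) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

const first = digestHex('hello world');
const again = digestHex('hello world');   // same input, hashed a second time
const tweaked = digestHex('hello World'); // one character changed

console.log('deterministic:', first === again);
console.log('changed input gives same digest:', first === tweaked);
console.log(differingBits(first, tweaked), 'of 256 bits differ');
```

For a well-behaved hash you would expect roughly half of the 256 bits to flip after a one-character change, which is what "high avalanche effect" means in practice.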
Analogy: Baking a Cake
You have butter, sugar, eggs, flour, salt, baking powder, milk, vanilla, and frosting. You mix all those ingredients together and heat the batter in the oven. Out comes a cake. But can you take a cake and go back and pull out each of the individual ingredients? Imagine how difficult it would be to re-extract an egg from the cake. For all intents and purposes, impossible.
Because the output is deterministic, we can do interesting things like derive a specific wallet address from a given public key by hashing it, meaning the same public key maps to the same wallet address every time.
Deterministic Digital Fingerprints
We can also create digital fingerprints of documents, and compare the hash of a document provided by an author with an independently calculated hash of the fully readable document. As long as the hashes match, we know that the document has not been tampered with by an intermediary that delivered the message. Even a tiny change has a large avalanche effect on the fingerprint, so slipping in an undetected change is nearly impossible, while spotting a bad actor in the system is very easy.
We’ve discussed hashing as a mechanism for mining and Proof of Work in the past, but now we’re more interested in how we might want to make use of hashing in our smart contracts.
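The comparison described above can be sketched in a few lines of Node.js. This is only an illustration under stated assumptions: SHA3-256 from Node's crypto module stands in for Keccak-256 (the digests will differ from web3's), and the function names fingerprint and isUntampered, along with the sample document text, are hypothetical.

```javascript
const crypto = require('crypto');

// Assumption: SHA3-256 as a stand-in for Ethereum's Keccak-256.
function fingerprint(documentText) {
  const hex = crypto.createHash('sha3-256').update(documentText, 'utf8').digest('hex');
  return '0x' + hex;
}

// True when the received text hashes to the fingerprint the author published.
function isUntampered(receivedText, publishedFingerprint) {
  return fingerprint(receivedText) === publishedFingerprint;
}

// The author hashes the original and publishes only the fingerprint.
const original = 'Pay Alice 10 ETH on delivery.';
const published = fingerprint(original);

// The recipient independently hashes whatever actually arrived.
const delivered = 'Pay Alice 10 ETH on delivery.';
const altered = 'Pay Alice 90 ETH on delivery.';

console.log('delivered copy matches:', isUntampered(delivered, published));
console.log('altered copy matches:', isUntampered(altered, published));
```

Note that the check is only as trustworthy as the channel the fingerprint itself arrived through, which is part of why recording it on a public blockchain is attractive.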
Let’s try this out. Fire up the truffle develop console:
truffle develop
The web3.js library is well documented. Let’s try out sha3 with various sample inputs. (This is the web3.js 0.x API; in web3.js 1.x the equivalent call is web3.utils.sha3.)
web3.sha3("Ladies and gents, this is the moment you've waited for")
'0x698b948227d9bd31126f12d50b31991dacaea46f28293ee551d057094af30080'
web3.sha3("Ladies and Gents, this is the moment you've waited for")
'0x8075374896d5dbb7027d9b5b710b3097f48513d287c18c49e7307c9089a7367f'
web3.sha3("Been searching in the dark, your sweat soaking through the floor")
'0x081cbc70e6cc9c8a295d90f9a3b62c7e3122376111ade597a346ec9cefb1bcc2'
web3.sha3("And buried in your bones there's an ache that you can't ignore")
'0x5e997b2a5c35920d3766cca0cce578911bcb252758c2bfd49888eaca331e7051'
web3.sha3("Taking your breath, stealing your mind")
'0x6c68baed17efc0033efafc53618aae61c666c0440cbc3f87e8b586ffe8dda623'
web3.sha3("And all that was real is left behind")
'0x76fb4b778db420f198c73a8976b7e8ce5cee4e3ec742b8d073da1225440bd077'
Some online tools are available for you to try out, if you don’t want to code…
https://emn178.github.io/online-tools/keccak_256.html
Some quick things to note: 1) Even changing the g to G had a massive change in the output. 2) No matter how long the input message to the sha3 function was, the output was always 64 hex characters (32 bytes). Solidity conveniently provides us with a data type for storing this: bytes32.
Take, for example, the entire Adventures of Sherlock. If you hash the entire book, the resulting keccak256 / SHA3 hash is
'0xe2af31201f97e4d24df87332465243f8e56f046bc268a790540ce2a258bed40c'
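The fixed output size is easy to check locally. The sketch below hashes a one-character input and a million-character input and prints the digest length for each. As an assumption for the demo, it uses SHA3-256 from Node's crypto module rather than Keccak-256, so the digests themselves will differ from web3's, but both produce 32 bytes. The digestHex helper is a hypothetical name.

```javascript
const crypto = require('crypto');

// Assumption: SHA3-256 as a stand-in for Keccak-256; both emit 32-byte digests.
function digestHex(message) {
  return crypto.createHash('sha3-256').update(message, 'utf8').digest('hex');
}

const shortInput = 'g';
const longInput = 'G'.repeat(1000000); // roughly book-sized input

for (const input of [shortInput, longInput]) {
  const hex = digestHex(input);
  const byteLength = Buffer.from(hex, 'hex').length;
  console.log(input.length, 'chars in ->', hex.length, 'hex chars /', byteLength, 'bytes out');
}
```

Because the size never changes, a single bytes32 slot is always enough to hold the fingerprint, no matter how large the original data was.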
This has interesting consequences. Since we know that the hash function is deterministic, storing hashed data in a smart contract’s storage means that the digital fingerprint recorded on the public blockchain can be compared with an original document and independently verified at a later date. Additionally, we know there is a high avalanche effect, so even one bit changed in the input data results in a drastic change in the output data. Perhaps this will be useful for determining the provenance of data, or for the record of ownership of a work of art, used as a guide for authenticity/quality.
A very simple example smart contract would record the msg.sender for the first person that recorded the unique fingerprint passed to the contract. If an entry already exists, the storeFingerprint function would throw an exception. An alternative function would take an arbitrary-length data string and calculate the keccak256 of the data on-chain. It would be much cheaper in gas for the client of the contract to pre-compute the sha3 hash, but we are providing both functions for convenience.
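pragma solidity ^0.4.23;

contract Provenance {
    // Maps each fingerprint to the first address that recorded it.
    mapping (bytes32 => address) public fingerprints;

    function storeFingerprint(bytes32 fingerprint) public {
        // Reject a fingerprint that has already been claimed.
        require(fingerprints[fingerprint] == address(0));
        fingerprints[fingerprint] = msg.sender;
    }

    function storeDataFingerprint(string data) public {
        // Hash arbitrary-length data on-chain, then record the digest.
        storeFingerprint(keccak256(abi.encodePacked(data)));
    }
}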
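To get a feel for the first-come, first-recorded rule before deploying anything, here is a rough in-memory model of the same logic in plain JavaScript. It is not the contract and does not touch a blockchain. The class name ProvenanceModel, the explicit sender argument (standing in for msg.sender), and the placeholder values are all assumptions made for illustration.

```javascript
// In-memory model of the contract: fingerprint -> first sender that stored it.
class ProvenanceModel {
  constructor() {
    this.fingerprints = new Map();
  }

  // Mirrors storeFingerprint: throw if the fingerprint is already claimed.
  storeFingerprint(fingerprint, sender) {
    if (this.fingerprints.has(fingerprint)) {
      throw new Error('fingerprint already recorded');
    }
    this.fingerprints.set(fingerprint, sender);
  }

  // Mirrors the public getter Solidity generates for the mapping.
  ownerOf(fingerprint) {
    return this.fingerprints.get(fingerprint);
  }
}

// Placeholder values, not real addresses or real digests.
const model = new ProvenanceModel();
const sampleFingerprint = '0x' + 'ab'.repeat(32);

model.storeFingerprint(sampleFingerprint, 'alice');

let secondClaimRejected = false;
try {
  model.storeFingerprint(sampleFingerprint, 'bob'); // same fingerprint, later claim
} catch (err) {
  secondClaimRejected = true;
}

console.log('second claim rejected:', secondClaimRejected);
console.log('recorded owner:', model.ownerOf(sampleFingerprint));
```

The first caller keeps the record and every later attempt fails, which is the behavior the require check enforces on-chain.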
Dan Emmons is owner of Emmonspired LLC, a Certified Bitcoin Professional, Certified Ethereum Developer, Full Stack Developer and Advisor on Cryptocurrency projects. He is also the creator of a Youtube Channel and iTunes Podcast called #ByteSizeBlockchain.
If you’re really interested in learning more about the internal details of SHA-3, Wikipedia has several articles on the matter.