Hashing Functions In Solidity Using Keccak256

Vincent T.
Published in 0xCODE
5 min read, Feb 15, 2022


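The keccak256 algorithm (the original Keccak design on which the SHA-3 standard is based) hashes an input to a fixed-length output. It is not identical to the standardized SHA3-256, which uses different padding and therefore produces different digests. The input can be a string or number of any length, but the result is always 32 bytes, which Solidity represents as the bytes32 data type. Written in hexadecimal, the digest is 64 characters (the digits 0-9 and the letters a-f). It is a one-way cryptographic hash function: there is no practical way to compute the input from the output.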

The digest, or result of a hash, is not meant to be decrypted with a key as in encryption algorithms (e.g. public key encryption). The practical way to confirm what a digest represents is to hash a candidate input and check that the result matches. The alternative is brute force, trying inputs until one produces the digest, which is infeasible when the space of possible inputs is large. That is a form of security against attacks.

Basic Hashing Principle

Take a string as input, for example “Hello World”, and pass it into keccak256. The result is:

Hello World -> keccak256 -> 592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba

The string “Hello World” is not the same as “hello world”. If we hash “hello world”, we will get a totally different result.

hello world -> keccak256 -> 47173285a8d7341e5e972fc677286384f802f8ef42a5ec5f03bbfa254cb01fad

The slightest change in the input string produces a completely different hash digest. The length of the input, on the other hand, has no effect on the length of the output: inputs of any size always produce a digest of the same size.

Take for example two strings:

“The quick brown fox jumped over the lazy dog”

“Hello”

Now let’s hash each string using keccak256 and see the result (Note: The quotation marks are not included as part of the input).

The quick brown fox jumped over the lazy dog -> keccak256 -> a82db2ff0b312da9d856a75ba260e9955f0fe467307cd7793521624d11921365

Hello -> keccak256 -> 06b3dfaec148fb1bb2b066f10ec285e7c9bf402ab32aa78a5d38e34566810cd2

Now let us compare the results from each string.

a82db2ff0b312da9d856a75ba260e9955f0fe467307cd7793521624d11921365
06b3dfaec148fb1bb2b066f10ec285e7c9bf402ab32aa78a5d38e34566810cd2

As we can see, they are both of a fixed length (64 characters) despite one string being longer than the other.

Now let’s see hash functions applied using keccak256 in Solidity smart contracts.

Encoding Input Data

In Solidity (the programming language used in Ethereum), keccak256 takes a single bytes argument, so the input data must first be encoded into bytes. The encoding functions come from the ABI (Application Binary Interface), the same scheme used to encode contract calls to the EVM (Ethereum Virtual Machine) at the bytecode level, which is what lets developers interact with a smart contract through its exposed functions.

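Some will say that you don’t need to encode the data and can hash the raw data (original data) directly. For a single string or bytes value that works: keccak256(bytes(_string)) gives the same digest as hashing the packed encoding. Encoding does not add any secrecy. Its job is to define exactly which bytes are hashed, and that becomes important as soon as several values are combined into one hash.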

The function abi.encodePacked encodes the input data compactly. It follows these rules:

  • Types shorter than 32 bytes are concatenated directly, without padding or sign extension
  • Dynamic types are encoded in-place and without the length
  • Array elements are padded, but still encoded in-place

abi.encodePacked( <data input> )

This means that dynamic types are encoded in-place without length while static types will not be padded if they are shorter than 32 bytes.

For example:

abi.encodePacked("AAAA") -> 0x41414141
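The three packing rules can be checked with a small sketch. The contract and function names below (PackedEncodingSketch, packedString, packedMixed, packedArrayLength) are made up for illustration, and the pragma assumes a 0.8.x compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: contract and function names are hypothetical.
contract PackedEncodingSketch {
    // Dynamic type: the raw bytes of the string, with no length prefix.
    // "AAAA" -> 0x41414141 (4 bytes)
    function packedString() public pure returns (bytes memory) {
        return abi.encodePacked("AAAA");
    }

    // Static types shorter than 32 bytes keep their own width.
    // uint8(1), "AAAA", uint16(2) -> 0x01 41414141 0002 (7 bytes)
    function packedMixed() public pure returns (bytes memory) {
        return abi.encodePacked(uint8(1), "AAAA", uint16(2));
    }

    // Array elements are padded to 32 bytes each, so an array of
    // two uint8 values packs to 64 bytes.
    function packedArrayLength() public pure returns (uint256) {
        uint8[] memory values = new uint8[](2);
        values[0] = 1;
        values[1] = 2;
        return abi.encodePacked(values).length;
    }
}
```

Calling these functions in an environment such as Remix should show the byte strings noted in the comments.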

Now we can apply the hash function by passing the encoded data to the built-in keccak256 function:

keccak256(abi.encodePacked( <data input> ))

Here is an example:

function hash(string memory _string) public pure returns (bytes32) {
    return keccak256(abi.encodePacked(_string));
}

Let’s assign a value to _string as “Hash This String”. The result we get is:

0x5f82559096154cc5c8b38479da101b29c56995ade10aa89e4940b68ef1002567

Notice the 0x prefix before the hash digest, which marks the value as a hexadecimal string. With the prefix, the entire string is 66 characters long.
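As a rough sketch, the hash function can be wrapped in a contract and compared against the “Hello World” digest quoted earlier. Apart from hash, the names here (HashSketch, hashRaw, matchesHelloWorld, HELLO_WORLD_DIGEST) are hypothetical, and a 0.8.x compiler is assumed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: names other than hash() are hypothetical.
contract HashSketch {
    // Digest of "Hello World" as quoted in the article (32 bytes, 64 hex digits).
    bytes32 constant HELLO_WORLD_DIGEST =
        0x592fa743889fc7f92ac2a37bb1f5ba1daf2a5c84741ca0e0061d243a2e6707ba;

    // Same function as in the article.
    function hash(string memory _string) public pure returns (bytes32) {
        return keccak256(abi.encodePacked(_string));
    }

    // For a single string, hashing its raw bytes gives the same digest,
    // because the packed encoding of one string is just its bytes.
    function hashRaw(string memory _string) public pure returns (bytes32) {
        return keccak256(bytes(_string));
    }

    // True only when _string hashes to the quoted digest; case matters,
    // so "hello world" does not match.
    function matchesHelloWorld(string memory _string) public pure returns (bool) {
        return hash(_string) == HELLO_WORLD_DIGEST;
    }
}
```

The return type bytes32 is what guarantees the fixed 32-byte size; the 0x prefix and the 64 hexadecimal characters are only how tools display that value.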

Explicit encoding gives the programmer more control over how the data is encoded. In older versions of Solidity, keccak256 accepted several arguments and the compiler packed them implicitly, which could be problematic. The reason is that packing can cause what are called collisions.

Preventing Collisions

There is also a more verbose encoding function called abi.encode. The abi.encodePacked function is meant to be simpler and more compact, while abi.encode can be useful for preventing collisions in hash functions.

A collision can occur when two different inputs produce the same output. That may sound impossible, but it can happen in the least expected manner. Take for example the following inputs:

(AAA, BBB) -> AAABBB         
(AA, ABBB) -> AAABBB

They are supposed to be different from each other, but when concatenated as a single string, they actually will produce the same output.

With abi.encode, encoding a string results in:

abi.encode("AAAA") ->
0x0000000000000000000000000000000000000000000000000000000000000020
0x0000000000000000000000000000000000000000000000000000000000000004
0x4141414100000000000000000000000000000000000000000000000000000000

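That is 96 bytes, or 3 words, in length: the first word is the offset to the string data (0x20 = 32), the second is the length of the string (4), and the third is the data itself, padded with zeroes on the right. Notice the padded zeroes, which abi.encodePacked does not add.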
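The size difference between the two encodings can be confirmed with a short sketch. The contract and function names are hypothetical, and a 0.8.x compiler is assumed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: contract and function names are hypothetical.
contract EncodeLayoutSketch {
    // Standard encoding of one string: offset word, length word,
    // and one word of right-padded data, 96 bytes in total.
    function encodedLength() public pure returns (uint256) {
        return abi.encode("AAAA").length;
    }

    // Packed encoding of the same string: just the 4 data bytes.
    function packedLength() public pure returns (uint256) {
        return abi.encodePacked("AAAA").length;
    }

    // Because the length is stored, the standard encoding can be
    // decoded back into the original string.
    function roundTrip() public pure returns (string memory) {
        bytes memory encoded = abi.encode("AAAA");
        return abi.decode(encoded, (string));
    }
}
```

There is no equivalent of abi.decode for packed data, since the packed form drops the length information needed to split it back into values.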

Now let us apply the keccak256 function and see why collisions can occur. First use abi.encodePacked. We will use two string parameters (not state variables), _string1 and _string2.

function collisionExample(string memory _string1, string memory _string2)
    public pure returns (bytes32)
{
    return keccak256(abi.encodePacked(_string1, _string2));
}

Let’s use a simple example (Example 1):

_string1 = AAA

_string2 = BBB

The result when we concatenate the strings and apply the hash function is:

0xf6568e65381c5fc6a447ddf0dcb848248282b798b2121d9944a87599b7921358

Now let us change the values to the following (Example 2):

_string1 = AA

_string2 = ABBB

Here is the result:

0xf6568e65381c5fc6a447ddf0dcb848248282b798b2121d9944a87599b7921358

We get exactly the same result as in the previous example. The reason is that when the strings are concatenated, they produce exactly the same string:

AAABBB

This holds regardless of where the split between the two strings falls. You can set _string1 to A and _string2 to AABBB and still get the same result. This is a collision, and in production systems it can be a problem.
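The collision can be reproduced on-chain with a sketch like the following. The function collisionExample is taken from the article; the contract name and splitsCollide are hypothetical, and a 0.8.x compiler is assumed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: PackedCollisionSketch and splitsCollide are hypothetical names.
contract PackedCollisionSketch {
    // Same function as in the article.
    function collisionExample(string memory _string1, string memory _string2)
        public pure returns (bytes32)
    {
        return keccak256(abi.encodePacked(_string1, _string2));
    }

    // Every pair below packs to the same six bytes ("AAABBB"),
    // so all three digests are equal and this returns true.
    function splitsCollide() public pure returns (bool) {
        bytes32 a = collisionExample("AAA", "BBB");
        bytes32 b = collisionExample("AA", "ABBB");
        bytes32 c = collisionExample("A", "AABBB");
        return a == b && b == c;
    }
}
```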

Now let us use the same function, but this time with abi.encode.

function collisionExample2(string memory _string1, string memory _string2)
    public pure returns (bytes32)
{
    return keccak256(abi.encode(_string1, _string2));
}

With the first example (Example 1) the result is:

0xd6da8b03238087e6936e7b951b58424ff2783accb08af423b2b9a71cb02bd18b

This is very different from the result with abi.encodePacked. Now the moment of truth (Example 2): will it result in the same hash?

0x54bc5894818c61a0ab427d51905bc936ae11a8111a8b373e53c8a34720957abb

We get a totally different output, which prevents the collision. If the inputs could be combined in ways that collide, as with two or more dynamic types such as strings, it is recommended to use abi.encode instead of abi.encodePacked. There are other techniques developers can use (e.g. adding a random value to the concatenated string), but this is one of the most common.
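Here is a minimal sketch of two ways to keep the boundary between the strings in the hashed data. The first is the abi.encode approach described above. The second, hashing each string before combining them, is an alternative that does not come from this article and is shown only for comparison. All names are hypothetical, and a 0.8.x compiler is assumed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: contract and function names are hypothetical.
contract CollisionResistantSketch {
    // abi.encode stores each string's offset and length, so the
    // boundary between the two arguments is part of the hashed bytes.
    function hashEncoded(string memory a, string memory b)
        public pure returns (bytes32)
    {
        return keccak256(abi.encode(a, b));
    }

    // Alternative (an assumption, not from the article): hash each string
    // first. Two bytes32 values always pack to exactly 64 bytes, so the
    // packed form is no longer ambiguous.
    function hashOfHashes(string memory a, string memory b)
        public pure returns (bytes32)
    {
        return keccak256(abi.encodePacked(keccak256(bytes(a)), keccak256(bytes(b))));
    }

    // Checks that the two splits of "AAABBB" no longer share a digest.
    function splitsDiffer() public pure returns (bool) {
        return hashEncoded("AAA", "BBB") != hashEncoded("AA", "ABBB")
            && hashOfHashes("AAA", "BBB") != hashOfHashes("AA", "ABBB");
    }
}
```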

Synopsis

Hashing is an important aspect of cryptographic security for digital wallets and transactions on the blockchain. It creates a deterministic, effectively unique value from an input, which is applied in Commit-Reveal schemes and in compact cryptographic signatures.
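To make the Commit-Reveal idea concrete, here is a minimal sketch under simplifying assumptions: one commitment per address, a uint256 value, and a secret bytes32 salt chosen by the user. The contract, its storage layout and its function names are all hypothetical and are not taken from any particular protocol.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Illustrative only: a stripped-down commit-reveal flow with hypothetical names.
contract CommitRevealSketch {
    // Digest submitted by each address during the commit phase.
    mapping(address => bytes32) public commitments;

    // Commit phase: the user computes keccak256(abi.encode(value, salt))
    // off-chain and submits only the digest, keeping value and salt private.
    function commit(bytes32 commitment) external {
        commitments[msg.sender] = commitment;
    }

    // Reveal phase: the contract recomputes the digest from the revealed
    // inputs and checks it against the stored commitment.
    function reveal(uint256 value, bytes32 salt) external view returns (bool) {
        return commitments[msg.sender] == keccak256(abi.encode(value, salt));
    }
}
```

The salt matters because a small value space (for example a vote of 0 or 1) could otherwise be brute-forced by hashing every candidate value.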

Using keccak256 is just one example of many hash functions. Ethereum uses keccak256 throughout the protocol, for example to derive account addresses and function selectors. It was also part of Ethash, the proof-of-work algorithm Ethereum used to produce and secure blocks before the network moved to proof of stake.
