Day 42: Hamming Code

Anytime data is transferred or stored, there are errors occurring which leads to data corruption. Computer memory, disk, ethernet, wi-fi, … it happens everywhere. Hamming code was an impressive discovery of how to deal with the problem efficiently.

To identify errors, we may double each bit. Instead of 1010 we can store 11001100. If a pair of consecutive bits doesn’t match, data is corrupted. That is called parity.

But doubling is not enough. We can identify there’s an error but we can’t recover. Hence we have to triple the data. Having 111000111000 we can identify corrupted triplets and let the triplet vote for majority to reconstruct the original.

However, for each bit of data this approach requires additional 2 bits. If we expect an error to occur not more than once out of 255 bits, that’s just wasting.

Hamming’s idea is the following. For 255 bits we need 8 bits as address space. We can store 247 bits of data and only use 8 bits for parity checks.

Each parity bit covers positions that have certain bit set to 1 in its address. For example parity bit P1 checks only addresses with mask xxx1, P2 checks only addresses xx1x, P4 checks only addresses x1xx, etc.

If an error occurs, only parities targeting the corrupted bit are set to 1 and form an address to exact location.

https://github.com/coells/100days

https://notebooks.azure.com/coells/libraries/100days

algorithm

def encode(parity_bits, data):
n = len(data) + parity_bits
assert 2 ** parity_bits == n + 1
    # copy data to code
code = np.zeros(n, dtype=int)
code[np.arange(n) & np.arange(n) + 1 > 0] = data
    # parity mask
mask = np.zeros(n, dtype=int)
mask[::2] = 1
    # compute parity
i = 0
while i < n:
code[i] = code[i:][mask == 1].sum() & 1
i += i + 1
mask = np.repeat(mask, 2)[:n - i]
    # result
return code

def decode(code):
n = len(code)
    # parity mask
mask = np.zeros(n, dtype=int)
mask[::2] = 1
    # compute parity
error, i = -1, 0
while i < n:
error += (i + 1) * (code[i:][mask == 1].sum() & 1)
i += i + 1
mask = np.repeat(mask, 2)[:n - i]
    # fix error
if error >= 0:
code[error] ^= 1
    # get data from code
data = code[np.arange(n) & np.arange(n) + 1 > 0]
    # result
return error, data

encoding

> parity_bits = 3
> data = np.random.randint(0, 2, 4)
> # generate code
> code = encode(parity_bits, data)
> print('hamming code', data, '->', code)
hamming code [1 0 0 0] -> [1 1 1 0 0 0 0]
> # make error
> code[3] ^= 1
> print('with error', code)
with error [1 1 1 1 0 0 0]
> # reconstruct
> error, recon = decode(code)
> print('error @', error, '->', recon)
error @ 3 -> [1 0 0 0]
One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.