# Day 42: Hamming Code

Anytime data is transferred or stored, there are errors occurring which leads to data corruption. Computer memory, disk, ethernet, wi-fi, … it happens everywhere. Hamming code was an impressive discovery of how to deal with the problem efficiently.

To identify errors, we may double each bit. Instead of `1010` we can store `11001100`. If a pair of consecutive bits doesn’t match, data is corrupted. That is called parity.

But doubling is not enough. We can identify there’s an error but we can’t recover. Hence we have to triple the data. Having `111000111000` we can identify corrupted triplets and let the triplet vote for majority to reconstruct the original.

However, for each bit of data this approach requires additional 2 bits. If we expect an error to occur not more than once out of 255 bits, that’s just wasting.

Hamming’s idea is the following. For 255 bits we need 8 bits as address space. We can store 247 bits of data and only use 8 bits for parity checks.

Each parity bit covers positions that have certain bit set to `1` in its address. For example parity bit P1 checks only addresses with mask `xxx1`, P2 checks only addresses `xx1x`, P4 checks only addresses `x1xx`, etc.

If an error occurs, only parities targeting the corrupted bit are set to 1 and form an address to exact location.

https://github.com/coells/100days

https://notebooks.azure.com/coells/libraries/100days

#### algorithm

`def encode(parity_bits, data):    n = len(data) + parity_bits    assert 2 ** parity_bits == n + 1`
`    # copy data to code    code = np.zeros(n, dtype=int)    code[np.arange(n) & np.arange(n) + 1 > 0] = data`
`    # parity mask    mask = np.zeros(n, dtype=int)    mask[::2] = 1`
`    # compute parity    i = 0    while i < n:        code[i] = code[i:][mask == 1].sum() & 1        i += i + 1        mask = np.repeat(mask, 2)[:n - i]`
`    # result    return code`
`def decode(code):    n = len(code)`
`    # parity mask    mask = np.zeros(n, dtype=int)    mask[::2] = 1`
`    # compute parity    error, i = -1, 0    while i < n:        error += (i + 1) * (code[i:][mask == 1].sum() & 1)        i += i + 1        mask = np.repeat(mask, 2)[:n - i]`
`    # fix error    if error >= 0:        code[error] ^= 1`
`    # get data from code    data = code[np.arange(n) & np.arange(n) + 1 > 0]`
`    # result    return error, data`

#### encoding

`> parity_bits = 3> data = np.random.randint(0, 2, 4)`
`> # generate code> code = encode(parity_bits, data)> print('hamming code', data, '->', code)`
`hamming code [1 0 0 0] -> [1 1 1 0 0 0 0]`
`> # make error> code[3] ^= 1> print('with error', code)`
`with error [1 1 1 1 0 0 0]`
`> # reconstruct> error, recon = decode(code)> print('error @', error, '->', recon)`
`error @ 3 -> [1 0 0 0]`