Day 14: Huffman codes

Published in

100 days of algorithms

1 min readApr 7, 2017

For many years the Huffman coding was state of the art in statistical data compression. Even though it should be noted that the main reason probably was that arithmetic coding was patented.

The idea is very similar to one of Samuel Morse, to create a sparse representation of the data.

Unlike Morse code, Huffman codes have unique prefixes which removes the need for separator and resulting stream has only one way of decoding. Disadvantage is that any error in a single bit can easily break the remaining part of the message.

algorithm

https://github.com/coells/100days

def find_min(freq):
    item = min(freq, key=lambda i: i[0])
    freq.remove(item)
    return itemdef print_codes(tree, prefix=''):
    if isinstance(tree, tuple):
        print_codes(tree[0], prefix + '0')
        print_codes(tree[1], prefix + '1')
    else:
        print(tree, prefix)def huffman_codes(text):
    freq = [(i, x) for x, i in Counter(text).items()]    while len(freq) > 1:
        li, lx = find_min(freq)
        ri, rx = find_min(freq)
        freq.append((li + ri, (lx, rx)))    print_codes(freq.pop()[1])

test

> huffman_codes('astala vista tasta')i 000
  001
t 01
l 1000
v 1001
s 101
a 11

Day 14: Huffman codes

algorithm

test

Written by Tomáš Bouda