Day 14: Huffman codes
Published in
1 min readApr 7, 2017
For many years the Huffman coding was state of the art in statistical data compression. Even though it should be noted that the main reason probably was that arithmetic coding was patented.
The idea is very similar to one of Samuel Morse, to create a sparse representation of the data.
Unlike Morse code, Huffman codes have unique prefixes which removes the need for separator and resulting stream has only one way of decoding. Disadvantage is that any error in a single bit can easily break the remaining part of the message.
algorithm
https://github.com/coells/100days
def find_min(freq):
item = min(freq, key=lambda i: i[0])
freq.remove(item)
return itemdef print_codes(tree, prefix=''):
if isinstance(tree, tuple):
print_codes(tree[0], prefix + '0')
print_codes(tree[1], prefix + '1')
else:
print(tree, prefix)def huffman_codes(text):
freq = [(i, x) for x, i in Counter(text).items()] while len(freq) > 1:
li, lx = find_min(freq)
ri, rx = find_min(freq)
freq.append((li + ri, (lx, rx))) print_codes(freq.pop()[1])
test
> huffman_codes('astala vista tasta')i 000
001
t 01
l 1000
v 1001
s 101
a 11