
A group of students made information theory history by applying set shaping theory and confirming its theoretical predictions

Aida Koch
Published in CodeX · Oct 20, 2022


Set shaping theory is a new theory whose predictions promise revolutionary results. The problem is that they are theoretical predictions, and since the concept of information is very difficult to formalize mathematically, the lack of a practical application has been the theory's greatest weakness.

For background on this theory, I suggest the following article, in which I explain the method in simple terms:

https://medium.com/codex/as-one-riemann-idea-it-is-revolutionizing-information-theory-set-shaping-theory-entropy-coding-f9e0549e9c29

The prediction of the set shaping theory

The prediction is quite simple to state: given a set X containing all the sequences that can be generated by rolling a die with A faces N times, there exists a set of equal size whose sequences can, on average, be encoded with a smaller number of bits.

At this point, you may be wondering how this is possible, since a random sequence cannot be compressed. To understand this result, we must first be clear about what "coding a sequence" means.

An encoded message is a message in which the symbols have been replaced by codewords.

In classical information theory, the encoded sequence is also taken to be the compressed sequence.

Set shaping theory, on the other hand, considers the compressed message to be the encoded message plus the list of codewords: the compressed message must contain all the information that depends on the sequence, and the receiver cannot decode without the codeword list. Consequently, the list of codewords represents the inefficiency of encoding a random sequence.
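To make the distinction concrete, here is a minimal Python sketch (my own illustration, not the authors' code): it builds a Huffman codeword table for a short message and shows that the full compressed message, in the sense used above, consists of the encoded bits plus the codeword list that must travel with them.

```python
import heapq
from collections import Counter

def huffman_code(message):
    """Build a Huffman codeword table for the symbols in `message`."""
    freq = Counter(message)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tiebreaker, {symbol: codeword})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

message = "ABRACADABRA"
table = huffman_code(message)
encoded = "".join(table[s] for s in message)  # symbols -> codewords
print("codeword list:", table)
print("encoded bits:", len(encoded))  # 23 bits for this message
```

In classical terms the 23 encoded bits are "the compressed sequence"; in set shaping theory terms the compressed message is those bits plus the codeword table, whose cost is the inefficiency discussed above.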

According to this theory, this inefficiency can be eliminated by transforming the randomly generated sequence into a new sequence of greater length that is easier to encode.

The inefficiency of entropy coding is explained in detail in the following article:

https://levelup.gitconnected.com/the-inefficiency-of-entropy-coding-set-shaping-theory-fcbf1ccb40bc

You can understand the skepticism of many researchers toward a prediction of this type without any experimental confirmation. All this has now changed: a group of students managed to apply the theory, developing the following statistical experiment to verify the prediction.

The article “Practical applications of Set Shaping Theory in Huffman coding” was published on arXiv:

https://arxiv.org/abs/2208.13020

They also shared the Matlab code they used:

https://www.mathworks.com/matlabcentral/fileexchange/115590-test-sst-huffman-coding

I also suggest the following article, which describes the implications of this result for Shannon’s first theorem:

https://www.academia.edu/88056303/Consequences_of_the_practical_application_of_set_shaping_theory_on_Shannon_s_first_theorem

The statistical experiment

Step 1) Generate a random sequence.

Step 2) Calculate the encoding limit (the zero-order empirical entropy) of the generated sequence.

Step 3) Transform the generated sequence into a longer sequence of length N + 1, that is, with one additional symbol.

Step 4) Apply Huffman coding, replacing the symbols with codewords.

Step 5) Compare the length of the encoded transformed sequence with the encoding limit.

Step 6) Decode the encoded sequence and apply the inverse transform to recover the initial sequence. If the recovered sequence differs from the initial one, report an error. This checks that the transform used is bijective.

Step 7) Repeat these steps a statistically significant number of times.

Step 8) Calculate the probability that the length of the transformed encoded sequence is less than the encoding limit calculated on the initial sequence.
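The steps above can be sketched in Python (a simplified illustration, not the students’ Matlab program). The SST transform f is the heart of the method and is defined in the paper; here it is left as an identity placeholder, so this sketch only demonstrates the baseline of Steps 1, 2, 4, 5, 7 and 8: an untransformed random sequence never beats its own encoding limit.

```python
import heapq
import math
import random
from collections import Counter

def entropy_limit(seq):
    """Encoding limit of seq: N times its zero-order empirical entropy, in bits."""
    n = len(seq)
    return sum(f * math.log2(n / f) for f in Counter(seq).values())

def huffman_bits(seq):
    """Total bits needed to Huffman-encode seq (codeword substitution only)."""
    freq = Counter(seq)
    if len(freq) == 1:
        return len(seq)  # degenerate case: one bit per symbol
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:  # each merge adds its weight to the total code length
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

def transform(seq):
    # Placeholder for the SST bijection f (length N -> N + 1, Step 3).
    # The real transform is defined in the arXiv paper; the identity is
    # used here, so this sketch only shows the untransformed baseline.
    return seq

trials, N, A = 1000, 100, 10
rng = random.Random(0)
wins = 0
for _ in range(trials):
    x = [rng.randrange(A) for _ in range(N)]  # Step 1
    limit = entropy_limit(x)                  # Step 2
    y = transform(x)                          # Step 3 (placeholder)
    bits = huffman_bits(y)                    # Step 4
    wins += bits < limit                      # Steps 5 and 8
Ps = wins / trials                            # Step 7 aggregated
print(f"Ps = {Ps:.3f}")  # 0.000 without the real transform
```

With the identity in place of f, the Huffman-coded length can never fall below N·H0 (a consequence of the Kraft inequality), so Ps = 0; the theory's claim is that substituting the actual SST transform pushes Ps above 50%.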

This presentation describes the data compression experiment in detail:

https://www.academia.edu/88055617/Description_of_the_program_used_to_validate_the_theoretical_results_of_the_Set_Shaping_Theory

The results presented in the article “Practical applications of Set Shaping Theory in Huffman coding” are reported in terms of the following quantities:

Ps = the probability that the transformed sequence f(x) can be encoded, using a uniquely decodable code (Huffman coding), with fewer bits than the encoding limit of the initial sequence x (its zero-order empirical entropy).

ns = number of symbols

N = length of the sequence

The results obtained confirm the theoretical predictions.

In this way, we obtain a result that was thought impossible: a random sequence can be transformed into a new, longer sequence that, with probability greater than 50%, can be encoded with a smaller number of bits.

This does not mean we can reduce the length of a random sequence, because, from a practical point of view, a compressed sequence is uniquely defined by its encoding (the substitution of symbols with codewords) together with the list of codewords. Whoever receives the message must also receive the codewords in order to recover the original message.


I am Aida Koch. I have a degree in information theory and am working on my doctorate. I am passionate about Set Shaping Theory.