Relative entropy

9/9/2023

In quantum information theory, quantum relative entropy is a measure of distinguishability between two quantum states. It is the quantum mechanical analog of relative entropy. For simplicity, it will be assumed that all objects in the article are finite-dimensional. We first discuss the classical case.

In this blog post, we will be using a bit of background from my previous blog post. If you are familiar with the basics of entropy coding, you should be fine. If not, you may want to quickly read through my previous blog post.

So coming to the topic at hand, let's continue our discussion on entropy coding. Let's say we have a stream of English alphabets coming in, and you want to store them in the best possible way by consuming the least amount of space. So you go ahead and build your nifty entropy coder to take care of all this.

The problem now is that you cannot know for sure if you have chosen the best possible representation. But what if you don't have access to all the data? How do you know what alphabet appears most frequently if you can't access the full data? Since you cannot wait forever, you just wait for the first 'n' alphabets and build your entropy coder, hoping that the rest of the data will adhere to this distribution. Do we end up suffering in terms of compression by doing this? How do we measure the loss in quality?

I'm not sure what you are talking about! Show me an example.

Nothing clears up a discussion like a concrete example. Okay, let's consider the first 30 alphabets in our data stream:

SSCAAACCCKKKKAKKKKKKKKCCCAKKKK

You can see that the letter 'K' appears a lot. So we go ahead and build an entropy coder using Huffman coding to get the following representation:

S = 000 (3 bits)
A = 001 (3 bits)
C = 01 (2 bits)
K = 1 (1 bit)

If you substitute these values, you will see that it optimally compresses the above sequence to 51 bits.

Now consider the first 60 alphabets in the same data stream:

SSCAAACCCKKKKAKKKKKKKKCCCAKKKKSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS

So if we go ahead with the above representation, it will take 51 + 3 x 30 = 141 bits to represent the data stream. If you hadn't used any entropy coding, you would have assigned 2 bits to each character. This means that the above sequence of length 60 would have taken 120 bits to represent it. So, not using any entropy coding turned out to be better than using entropy coding. That doesn't make sense, right? The reason this happened is that we didn't know what the data contained before we built our entropy coder. If you consider the full sequence of length 60 to build the optimal entropy coder, you will see that you need only 100 bits to represent the whole sequence.

We should be able to measure this loss in quality due to sub-optimal representation. We need a way to quantify this whole thing, right? This is where the concept of relative entropy comes into the picture. It's also called Kullback-Leibler divergence (KL-divergence). It is a distance function from a true probability distribution to a target probability distribution. Basically, we can think of it as the extra average number of bits per symbol needed to represent the system due to sub-optimal encoding.

To draw an appropriate analogy, let's consider our example. Here, 'symbol' refers to the alphabets appearing in the sequence. We used a sub-optimal code to represent the sequence of length 60, so the average number of bits per symbol would be 141/60 = 2.35 bits/symbol. For the optimal case, this would be 100/60 ≈ 1.67 bits/symbol.
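If you want to check these numbers yourself, here is a minimal Python sketch using a standard Huffman construction (not part of the original derivation, just a sanity check). It computes Huffman codeword lengths from symbol counts and reproduces the 51, 141, 120, and 100-bit totals above.

```python
import heapq
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: codeword length} for a Huffman code built from
    a {symbol: count} map. Assumes at least two distinct symbols."""
    # Each heap entry: (total weight, tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

first_30 = "SSCAAACCCKKKKAKKKKKKKKCCCAKKKK"
full_60 = first_30 + "S" * 30

stale = huffman_lengths(Counter(first_30))  # code built from the first 30
fresh = huffman_lengths(Counter(full_60))   # code built from all 60

print(sum(stale[s] for s in first_30))  # 51  (stale code on the first 30)
print(sum(stale[s] for s in full_60))   # 141 (stale code on all 60)
print(2 * len(full_60))                 # 120 (plain 2-bit fixed code)
print(sum(fresh[s] for s in full_60))   # 100 (optimal code on all 60)
```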
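To connect this back to the formula: if p is the true distribution (here, the symbol frequencies over all 60 alphabets) and q is the distribution we assumed from the first 30, the relative entropy is D(p||q) = Σ over x of p(x) log2(p(x)/q(x)). Here is a small sketch that computes it for our example:

```python
from collections import Counter
from math import log2

first_30 = "SSCAAACCCKKKKAKKKKKKKKCCCAKKKK"
full_60 = first_30 + "S" * 30

def empirical(s):
    """Empirical symbol probabilities of a string."""
    n = len(s)
    return {sym: c / n for sym, c in Counter(s).items()}

q = empirical(first_30)  # the distribution we assumed after 30 symbols
p = empirical(full_60)   # the true distribution over all 60 symbols

# D(p || q): extra bits/symbol paid for coding p with a code tuned to q
kl = sum(p[x] * log2(p[x] / q[x]) for x in p)
print(f"D(p||q) = {kl:.2f} bits/symbol")  # ~1.13
```

Note that this gives about 1.13 bits/symbol, while the Huffman figures above give 2.35 - 1.67 = 0.68 extra bits/symbol. The KL formula assumes ideal codeword lengths of -log2 q(x) bits, whereas Huffman codes are restricted to whole-bit lengths, so the two figures need not match exactly.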