Talk:Arithmetic coding/As a generalized change of radix: Difference between revisions

no edit summary
No edit summary
No edit summary
Line 48:
:::::::: The purpose of all this is to divide the interval from 0 to 1 (or in this case, the interval from 0 to (length of string) since they want to use all integers) fully into intervals for each of the characters, where the length of the interval for each character is proportional to its frequency. So we want an interval of length 1 for A, length 2 for B, and length 3 for D. The implementations here accomplish that by assigning them from 0 upwards to the characters in ascending character order. So A gets the interval 0-1, B gets the interval 1-3, and D gets the interval 3-6. So each character gets an interval represented by the start being its "cumulative frequency", and the length of the interval being its frequency. --[[User:Spoon!|Spoon!]] ([[User talk:Spoon!|talk]]) 01:15, 29 January 2016 (UTC)
:::::::::: I considered that interpretation, but for it to work you'd need a dictionary to figure out which letters were in use, and that's not part of the definition. Worse, though, this conflicts with the writeup and example at [[wp:Arithmetic_coding#Arithmetic_coding_as_a_generalized_change_of_radix|wikipedia]]. I mean, yes, the choice of the length of the string as the number base sort of implies something like this, but using cumulative frequency to identify the letters is a worse choice than fixed letter values because you do not know which letters have which frequencies. Also, if you read the wikipedia page, you will see that it says "A = 0, B = 1, C = 2, D = 3" and you can't get C = 2 for that example if you are pretending that these numbers represent cumulative frequencies. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 23:28, 30 January 2016 (UTC)
::::::::::: "Fixed letter values" doesn't make sense. A coding needs to divide the interval 0-1 into non-overlapping intervals for each character. Huffman coding is the optimal length coding where intervals are all restricted to the size 1/(some power of 2). Arithmetic coding is the optimal coding where the intervals can be any size, and mathematically, the optimal length coding is one where each character's interval is the same size as its probability of occurring, so if "DABDDB" were representative of the letter frequencies, then A should be given an interval of size 1/6, B given an interval of size 1/3, and D given an interval of size 1/3. (The "generalized change of radix" representation simply assumes a denominator of "base" (6), so the interval sizes are 1, 2, and 3, respectively.) How does T=19, H=7, E=4 (as you suggest) help us divide up 0-1 into intervals? It doesn't. And this is not just for letters; this is for all unit symbols, so for bytes there are 256 characters. How are you going to divide up 0-1 into intervals? If you are just going to give an interval of size 1/256 for each character, that's not a (remotely optimal) coding -- that's just the original plaintext binary. --[[User:Spoon!|Spoon!]] ([[User talk:Spoon!|talk]]) 18:20, 31 January 2016 (UTC)
::::::::: That said, using cumulative frequencies in place of letter values for your encoding means words like THE, ONE, TWO, DOG, BAD, etc. etc. all encode to the same number. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 18:45, 30 January 2016 (UTC)
:::::::::: Yup, they SHOULD all encode to the same number. That's the point, because they have the same set of frequencies. It's the dictionary that allows you to decode them. In Huffman Coding, all of those should also similarly encode to the same binary code, and it's the dictionary that allows you to decode. --[[User:Spoon!|Spoon!]] ([[User talk:Spoon!|talk]]) 21:35, 30 January 2016 (UTC)
::::::::::: Well, I guess it's true that huffman coding implies a shared dictionary between the encoder and the decoder. And, there is an explicit analogy drawn here with huffman coding. But if that's the case, we either need to change the wikipedia page or we need to spell out this issue in the task requirements or both. And if this is the "correct definition" but we "cannot change the wikipedia page" then we should also address the conflict in the task description. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 23:34, 30 January 2016 (UTC)
:::::::::::: I don't see anything wrong with the Wikipedia page. Can you point to any statement on the Wikipedia page you think is incorrect? --[[User:Spoon!|Spoon!]] ([[User talk:Spoon!|talk]]) 18:20, 31 January 2016 (UTC)
 
== Goof? (part 2) ==
Anonymous user