Talk:Huffman coding


Umm... this is all wrong. Read the Wikipedia article. According to your scheme Huffman codewords would all be of the form 111...10 or 111...1, but that is not at all the case. --76.167.241.45 03:59, 26 March 2009 (UTC)

Yep. It is not Huffman coding. (Hmm, I do like the Wikipedia description of the two-queue method, though...) --Paddy3118 06:56, 26 March 2009 (UTC)
I did it based on what I learned in class today. If you look at the "Basic technique" section on the WP article, it shows codes identical to the ones I used in the example, so I'm pretty sure it is Huffman coding. There must be a few ways to generate them that give different actual codes with the same idea. --Mwn3d 13:44, 26 March 2009 (UTC)
For the example given, the Huffman code indeed looks like this. But as a general algorithm, it's wrong. What you should do is "combine" the last two elements in the table and sort the result back into your list. So, starting with the example,
(A=?): 50%
(B=?): 25%
(C=?): 12.5%
(D=?): 12.5%
you first assign a bit to each of the last two items (C and D), then combine them, adding their frequencies, and sort the result into the right place:
(A=?): 50%
(B=?): 25%
(C=?0,D=?1): 25%
Then you do the same again:
(A=?): 50%
(B=?0, C=?10, D=?11): 50%
And finally you get
(A=0, B=10, C=110, D=111): 100%
Thus the result is indeed as given in the example. However, assume that you start with
(A=?): 25%
(B=?): 25%
(C=?): 25%
(D=?): 25%
Your algorithm would still give the same result, while it's obvious that the standard two-bit encoding is optimal for this case. And indeed, the first step of Huffman coding gives:
(C=?0, D=?1): 50%
(A=?): 25%
(B=?): 25%
Note how the (C,D) pair moves up, because its probability is larger than the 25% of each of A and B. Therefore the next step combines A and B:
(A=?0, B=?1): 50%
(C=?0, D=?1): 50%
(I've adopted the convention that items with the same probability are sorted lexicographically; of course other conventions, such as leaving the newly formed pair as low as possible, also work.) Now the final combination gives:
(A=00, B=01, C=10, D=11): 100%
which obviously is an optimal code for this case. --Ce 14:17, 26 March 2009 (UTC)
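To make that concrete, here is a minimal Java sketch of the combine-the-two-lowest procedure using a priority queue (the class and method names are my own, not anything from the task page; ties in the queue may resolve differently, producing a different but equally optimal code):

import java.util.*;

public class HuffSketch {
    static class Node implements Comparable<Node> {
        double freq;
        Character symbol;   // null for internal nodes
        Node left, right;
        Node(char s, double f) { symbol = s; freq = f; }
        Node(Node l, Node r)   { left = l; right = r; freq = l.freq + r.freq; }
        public int compareTo(Node o) { return Double.compare(freq, o.freq); }
    }

    // Repeatedly pull the two least frequent nodes and "sort" their
    // combination back in -- exactly the step described above.
    static Node buildTree(Map<Character, Double> freqs) {
        PriorityQueue<Node> q = new PriorityQueue<>();
        for (Map.Entry<Character, Double> e : freqs.entrySet())
            q.add(new Node(e.getKey(), e.getValue()));
        while (q.size() > 1)
            q.add(new Node(q.poll(), q.poll()));
        return q.poll();
    }

    // Read the codes off the tree: 0 for one branch, 1 for the other.
    static void collect(Node n, String prefix, Map<Character, String> out) {
        if (n.symbol != null) { out.put(n.symbol, prefix); return; }
        collect(n.left,  prefix + "0", out);
        collect(n.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Double> freqs = new TreeMap<>();
        freqs.put('A', 0.5);   freqs.put('B', 0.25);
        freqs.put('C', 0.125); freqs.put('D', 0.125);
        Map<Character, String> codes = new TreeMap<>();
        collect(buildTree(freqs), "", codes);
        System.out.println(codes); // e.g. {A=0, B=10, C=110, D=111}
    }
}

Fed the uniform 25% table instead, the same loop pairs the letters two by two, and every letter comes out with a two-bit code, matching the second walkthrough.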
Huffman coding in the case where all the symbols have the same frequency is not practical, though. If all the symbols have the same frequency, no thought needs to be put into it at all and you can just use ordinary binary encoding. --Mwn3d 15:59, 26 March 2009 (UTC)

By hand

I made tons of these when I coded my first static Huffman cruncher eons ago (68k assembly; maybe I can still find the code on old 3.5-inch floppies); by hand I did it like so:

A 50% ----------------\
B 25% ----------\      \
C 12.5%  \       \      \  100%
D 12.5%  / 25%   /  50% /

So, taking "up" as 1 and "down" as 0: A=1, B=01, C=001, D=000. From here an algorithm can be extracted (in short: repeatedly group the two least frequent leaves; so first we join 12.5% and 12.5%, then we have the new leaf at 25% and another one at 25%, so we join those... and so on). Dynamic Huffman encoding is a little harder, though not too much; I have never implemented it. I like this task; with a little bit of polish it's fine.

Now let's take the case where everything is at 25%:

A 25 \ 50 \
B 25 /     \
C 25 \     / 100
D 25 / 50 /

That is, two bits for each letter. --ShinTakezou 16:33, 26 March 2009 (UTC)
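As a quick sanity check on both hand-drawn trees (my own back-of-the-envelope sketch, not part of the original discussion), the expected number of bits per symbol works out to:

public class AvgBits {
    public static void main(String[] args) {
        // Skewed tree above: code lengths 1, 2, 3, 3 for A, B, C, D
        double skewed  = 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3;
        // Uniform tree above: every letter gets two bits
        double uniform = 4 * 0.25 * 2;
        System.out.println(skewed);   // 1.75 bits/symbol
        System.out.println(uniform);  // 2.0 bits/symbol
    }
}

1.75 bits matches the entropy of the skewed distribution exactly (all its probabilities are powers of two), and 2 bits matches the entropy of the uniform one, so neither hand result can be improved on.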

By the way, what exactly is wrong? The explanation? Or the Java code? --ShinTakezou 16:37, 26 March 2009 (UTC)