Talk:Huffman coding: Difference between revisions


Revision as of 08:36, 31 October 2021


Comparing the revision as of 00:23, 27 March 2009 by rosettacode>ShinTakezou (→‎Category: new section) with the latest revision as of 08:36, 31 October 2021 by rosettacode>Paddy3118 (→‎Python: Fails??). 16 intermediate revisions by 11 users are not shown.
==the Java example==
Isn't the Java example wrong? It's not even descending - I thought Huffman coding needs the input sorted in descending order before building the tree. Umm... this is all wrong. Read the Wikipedia article. According to your scheme, Huffman codewords would all be of the form 111...10 or 111...1, but that is not at all the case. --[[Special:Contributions/76.167.241.45|76.167.241.45]] 03:59, 26 March 2009 (UTC)
: Yep. It is not Huffman coding. (Hmm, I do like the Wikipedia description of the two-queue method though ...) --[[User:Paddy3118|Paddy3118]] 06:56, 26 March 2009 (UTC)

Line 96 ⟶ 99:

==Python==
I took a <strike>good</strike> better look at the WP article and came up with the following code, together with printouts of what it is doing:
<lang python>from heapq import heappush, heappop, heapify

Line 172 ⟶ 175:

u 1 11000
x 1 11001</pre>
--[[User:Paddy3118|Paddy3118]] 10:14, 27 March 2009 (UTC)
: At a glance it's OK; it's enough that the codes are not ambiguous (a longer code cannot have a shorter one as its prefix). My code apparently generates a third variant... hopefully right:

Line 193 ⟶ 198:

x (5) 10111
</pre>
::Yeah, it looks right. --[[Special:Contributions/71.106.173.110|71.106.173.110]] 09:47, 27 March 2009 (UTC)
:: Pretty certain testing the Python code is not correct --[[Special:Contributions/Art-the-physicist|Art-the-physicist]] 18:09, 16 May 2019
::: Please post your failing test case for review. --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 08:36, 31 October 2021 (UTC)

== Category ==
Is this a Text Processing task? We use "text" just because it's easier to show things using text, but this task works with any sequence of bytes... So I've removed the Text Processing category from the task.
--[[User:ShinTakezou|ShinTakezou]] 00:23, 27 March 2009 (UTC)

== The C code uses a GCC extension ==
In the C code, the <code>swap_</code> macro (defined for <code>_heap_sort</code>) uses statement expressions, which are a GCC extension. The program might therefore not compile with other compilers. --[[User:Ce|Ce]] 08:57, 27 March 2009 (UTC)
: Fixed. --[[User:ShinTakezou|ShinTakezou]] 10:22, 27 March 2009 (UTC)

== Complaint about C++ example ==
User 122.167.5.231 [http://rosettacode.org/mw/index.php?title=Huffman_coding&diff=129055&oldid=128618 added] a claim that the C++ code is incorrect. I have moved the note from the page to here:

''Important : This method does not generate the optimal Huffman tree for any given string; it suffers from a serious flaw because of the fact that elements in a c++ priority queue are ordered according to strict weak ordering. To see why, please check out [http://cs.nyu.edu/~melamed/courses/102/lectures/huffman.ppt this example]. It shows that the optimal huffman tree for the given line of text will have no code longer than 4 bits. This piece of code generates huffman codes which are 5 bits in size. Try running it with the same line of text as input and you can verify this.''

I dispute these statements. First of all, the linked PowerPoint presentation incorrectly encodes the text of its own example ("Eerie eyes seen near lake.") using the encoding it generated. It says that the result takes 73 bits; however, the correct encoded string is <tt>000010110000011001110001010110101111011010111001111101011111100011001111110100100101</tt>, which is 84 bits. Secondly, nowhere in the PowerPoint does it "show that the optimal huffman tree ... will have no code longer than 4 bits". It merely shows that that particular optimal Huffman coding (one of many possible ones which are equally optimal) has no code longer than 4 bits.
In fact, if you take the C++ code and run it on the same example string, you will get an encoding which, although it uses 5-bit codes for some characters, still encodes the string in 84 bits, and so is equally optimal. Finally, the PowerPoint does not mention any "serious flaw because of the fact that elements in a c++ priority queue are ordered according to strict weak ordering", and I can't make sense of this statement. --[[User:Spoon!|Spoon!]] 11:21, 26 December 2011 (UTC)
: Given that the example string has more than 16 different characters, the "4 bit" assertion is obviously wrong. I guess the anon failed to take into account the requirement that no code can be a prefix of another code. (And, PowerPoint? Please. Use a portable format.) --[[User:Ledrug|Ledrug]] 18:29, 26 December 2011 (UTC)
:: The string in their example has only 12 distinct characters. --[[User:Spoon!|Spoon!]] 03:58, 27 December 2011 (UTC)

== task's wording ==
"Using the characters and their frequency from the string "this is an example for huffman encoding", create a program to generate a Huffman encoding for each character as a table."<br>
I'd suggest:<br>
create a program to generate a Huffman encoding for each character in a string as a table and show this table for the string "this is an example for huffman encoding" --[[User:Walterpachl|Walterpachl]] ([[User talk:Walterpachl|talk]]) 07:54, 26 July 2014 (UTC)
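
For reference, here is a minimal Python sketch of what the task wording above asks for: build a code table from the character frequencies of a string and show it. It is not a copy of any example on the task page. The helper names <code>huffman_table</code>, <code>encoded_length</code> and <code>is_prefix_free</code> are made up for this illustration, and ties between equal weights are broken by insertion order, so the table it prints is only one of several equally optimal codings.

```python
from collections import Counter
from heapq import heapify, heappop, heappush


def huffman_table(text):
    """Return {symbol: bitstring} for one valid Huffman coding of text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate input: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w0, _, low = heappop(heap)   # lightest subtree gets a leading 0
        w1, _, high = heappop(heap)  # next lightest gets a leading 1
        merged = {sym: "0" + code for sym, code in low.items()}
        merged.update({sym: "1" + code for sym, code in high.items()})
        heappush(heap, (w0 + w1, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def encoded_length(text, table):
    """Total number of bits needed to encode text with table."""
    return sum(len(table[ch]) for ch in text)


def is_prefix_free(table):
    """True if no code is a prefix of another (checked on sorted neighbours)."""
    codes = sorted(table.values())
    return all(not b.startswith(a) for a, b in zip(codes, codes[1:]))


if __name__ == "__main__":
    text = "this is an example for huffman encoding"
    table = huffman_table(text)
    counts = Counter(text)
    for sym, code in sorted(table.items(), key=lambda kv: (len(kv[1]), kv[0])):
        print(repr(sym), counts[sym], code)
    print("prefix-free:", is_prefix_free(table))
    print("total bits:", encoded_length(text, table))
```

Because every Huffman coding of a given input has the same total length, <code>encoded_length</code> is a convenient way to compare two implementations whose individual codes differ. Run on "Eerie eyes seen near lake.", these helpers report 12 distinct symbols and a total of 84 bits, in line with the figure quoted in the C++ discussion above.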