Talk:LZW compression: Difference between revisions

m
→‎Details of LZW: -- belatedly sign section
m (→‎Details of LZW: -- belatedly sign section)
 
(6 intermediate revisions by 4 users not shown)
Line 1:
==Details of LZW==
 
The task currently does not actually specify the LZW algorithm to be implemented. Instead, it defers to whatever is currently present in the Wikipedia presentation of the algorithm. This has left some compatibility details unspecified. (As near as I can tell, none of the implementations here include support for these issues.) But the task page is also back-linked from wikipedia, and someone should take responsibility for mentioning these things:
 
(1) One consequence is that the implementations do not include the LZW bitpacking algorithm. Instead, they represent the numbers which would be fed to that algorithm. The bitpacking algorithm shifts indices into position to take advantage of all guaranteed zeros in the representation of the previous index. (GIF uses little-endian representation -- low order bits first, TIFF and PDF use big-endian representation -- high order bits first.)
 
(2) Another consequence is that the implementations tend to not include the LZW clear code (in the context of the compress file format, this is referred to as LZC -- basically all versions of compress skipped past code 0x100 and the documentation on that file format used this starting point for the dictionary as the distinction between LZW and LZC -- image file format specifications do the same but still call the format LZW). LZC may limit the size of the dictionary (12 bit indices in image file formats, typically 16 bit indices in the compress file format (which used a file name extension of ".Z"). When the dictionary size is reached, the encoder emits the clear code (0x100 for the usual ANSI LZW dictionary) and reverts the dictionary to its original state. (And, of course, the decoder must do the same at that point.) Though, if the dictionary size is known it's also possible to simply clear the dictionary when it gets full (and some versions of compress apparently did work this way). Note also that this was a non-issue when file sizes were small -- this was retrofitted into the algorithm when large file sizes became an issue.
 
(3) Another issue is that the LZW file format begins with a three byte sequence (0x1F 0x9d XX where (c notation) XX&0x1F was the dictionary size limit and XX&0x80 is true if clear codes are emitted.
 
(At the time I am writing this, this is an second draft describing these issues. This version is, reasonably accurate should be reviewed by someone with more of an emphasis on grammatical issues. Maybe me after enough time has passed to be baffled by how I wrote it. And then maybe reviewed again for technical accuracy. Note also that further down on this page is some example code that does bit packing,)
 
--[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 21:39, 10 October 2020 (UTC)
 
==Task and example==
The task talks about compression, but I've seen a lot of examples implement both compression and decompression; should we create a new task (LZW decompression) where to move the decompression related code, or change this one and add decompression to those examples that still does not implement it? --[[User:ShinTakezou|ShinTakezou]] 16:51, 31 January 2009 (UTC)
Line 4 ⟶ 18:
:: I see it hard (1), since most OO-lang implementation must be split by the people who wrote the code or know the language; (1) by hard I mean, I can't do it by myself, I could just add a new decompression task and write an impl. for Obj-C that would be the "complement" of the one here, then mark all other examples as ... hm... they do the task, but do also more... so they are task-compliant in some way... (what?) --[[User:ShinTakezou|ShinTakezou]] 16:43, 13 February 2009 (UTC)
::: Not a problem, currently. I can understand the currently-listed languages enough to do the separation myself. I'll get to it as soon as I can, which will likely be some time after midnight EST. --[[User:Short Circuit|Short Circuit]] 21:27, 13 February 2009 (UTC)
 
==JavaScript and Unicode==
The implementation of algorithm in JavaScript can treat ASCII characters only. Any attempt to work with Unicode causes data lost because of the dictionary doesn't contain characters with code greater than 255. But the line
<lang javascript>
result.push(dictionary[w]);
</lang>
expects the string stored in variable ''w'' is in the dictionary, which is wrong.--[[User:ArtyomShegeda|ArtyomShegeda]] ([[User talk:ArtyomShegeda|talk]]) 17:28, 28 October 2015 (UTC)
 
==C# Decompression function==
Line 63 ⟶ 84:
 
:'''WARNING: This code appears to have come from a GIF codec that has been modified to meet the requirements of this page, provided that the decoder works with the encoder to produce correct output. For writing GIF files the write_bits subroutine is wrong for Little Endian systems (it may be wrong for Big Endian as well.) The encoder also increases the number of bits in the variable length GIF-LZW after the N-2 code, whereas this must be done after N-1 to produce a working GIF file (just looking at the encoder, it's easy to see how this mistake could be made.)''' --[[User:Mick P.|Mick P.]] ([[User talk:Mick P.|talk]]) 03:17, 21 October 2015 (UTC)
*I was told by a person monitoring the GIF page on Wikipedia that the C code here looks like it may have came from a PDF reader, which they say is similar to GIF but different in subtle ways. In any case, I rewrote the C example for a GIF flavored codec, which does LZW packed to the GIF specification, combined, like the C code. Technically the packing is not LZW, so it may not even be appropriate. The following code could be used to make the C example more readable, although it switches to C++. Unfortunately I do not have a matching decoder to offer. It's also written to be ultra-compact, which may not be helpful, in addition to bebeing slightly unorthodox.
<lang cpp>
//GIF flavored LZW based on:
Line 111 ⟶ 132:
if(code!=cc)
nc = dictionary[code].next[c] = next_code++;
else if(c_bits>=12) c_bits = 9colorbits+1;
code = c;
}
Line 127 ⟶ 148:
</lang>
--[[User:Mick P.|Mick P.]] ([[User talk:Mick P.|talk]]) 11:08, 27 October 2015 (UTC)
 
: It took me a small while to find where the &nbsp; &nbsp; '''GOTO &nbsp; CC''' &nbsp; &nbsp; er, &nbsp; ...&nbsp; goes. &nbsp; Funny place to nearly hide a label. &nbsp; &nbsp; -- [[User:Gerard Schildberger|Gerard Schildberger]] ([[User talk:Gerard Schildberger|talk]]) 21:19, 9 October 2020 (UTC)
6,951

edits