Talk:Run-length encoding

From Rosetta Code

"The output can be anything." might be too open-ended. I would prefer it if all the solutions used the same output format, or something along those lines. --76.167.241.45 18:32, 24 April 2009 (UTC)

I would prefer this task to use a run-length encoding that works on at least all ASCII characters. The current encoding can't represent the digits [0-9]. A more flexible encoding is implemented here. --IanOsgood 19:27, 24 April 2009 (UTC)

  • A run code has the high bit set; the remaining 7 bits hold the run length minus 1.
  • Bytes in the stream that have the high bit set are always encoded as a run, possibly with a length of only one.
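A minimal Python sketch of the byte-oriented scheme in the list above. It assumes (the list does not say so explicitly) that each run code is immediately followed by the single byte to repeat, and that a non-repeating byte without the high bit set is copied literally. `rle_encode` and `rle_decode` are hypothetical names.

```python
def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        # Measure the run of b, capped at 128 (the most that 7 bits of "length - 1" can hold).
        n = 1
        while i + n < len(data) and data[i + n] == b and n < 128:
            n += 1
        if n == 1 and b < 0x80:
            # A lone byte without the high bit set is copied literally.
            out.append(b)
        else:
            # Real runs, and every byte with the high bit set, become run code + byte.
            out.append(0x80 | (n - 1))
            out.append(b)
        i += n
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b & 0x80:
            # Run code: repeat the following byte (low 7 bits + 1) times.
            out.extend(bytes([data[i + 1]]) * ((b & 0x7F) + 1))
            i += 2
        else:
            # Plain byte: copy it through unchanged.
            out.append(b)
            i += 1
    return bytes(out)
```

For example, `rle_encode(b"WWWWB12")` gives `b"\x83WB12"`: the digits pass through untouched, which is exactly what the digit-based task encoding cannot do.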
The RLE I knew works like this: the high bit of a control byte says whether the byte gives the number of "literal bytes" that follow, or whether the next byte must be repeated the number of times given by the control byte (with the high bit cleared). So, e.g., ABCD would be encoded as the bytes 0x84 "A" "B" "C" "D". Of course, this way the maximum repeat count (or longest literal sequence) is 128, since a count of 0 means 128. I've stuck to the task specification, but tried to create more usable examples... --ShinTakezou 23:08, 24 April 2009 (UTC)
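A minimal sketch of the control-byte scheme just described, reading the 84 in the ABCD example as hex 0x84: high bit set means that many literal bytes follow, high bit clear means the next byte is repeated, and a stored count of 0 stands for 128. The function names and the choice to turn only runs of 3 or more bytes into repeat packets are assumptions, not part of the comment.

```python
MAX_COUNT = 128  # 7-bit count field; a stored 0 stands for 128


def control_byte(n: int, literal: bool) -> int:
    # High bit set marks a literal packet; the low 7 bits hold n (128 wraps to 0).
    return (0x80 if literal else 0x00) | (n % MAX_COUNT)


def packet_encode(data: bytes) -> bytes:
    out = bytearray()
    literals = bytearray()

    def flush_literals() -> None:
        # Emit pending literal bytes as packets of at most 128 bytes each.
        for start in range(0, len(literals), MAX_COUNT):
            chunk = literals[start:start + MAX_COUNT]
            out.append(control_byte(len(chunk), literal=True))
            out.extend(chunk)
        literals.clear()

    i = 0
    while i < len(data):
        b = data[i]
        n = 1
        while i + n < len(data) and data[i + n] == b and n < MAX_COUNT:
            n += 1
        if n >= 3:
            # Assumed threshold: a run of 1-2 bytes is no cheaper as a repeat packet.
            flush_literals()
            out.append(control_byte(n, literal=False))
            out.append(b)
        else:
            literals.extend(data[i:i + n])
        i += n
    flush_literals()
    return bytes(out)


def packet_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        ctrl = data[i]
        n = (ctrl & 0x7F) or MAX_COUNT  # a stored count of 0 means 128
        if ctrl & 0x80:
            # Literal packet: copy the next n bytes unchanged.
            out.extend(data[i + 1:i + 1 + n])
            i += 1 + n
        else:
            # Repeat packet: the single following byte occurs n times.
            out.extend(bytes([data[i + 1]]) * n)
            i += 2
    return bytes(out)
```

With this sketch `packet_encode(b"ABCD")` reproduces the comment's example, `b"\x84ABCD"`, and `packet_encode(b"AAAAB")` gives `b"\x04A\x81B"`.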
Whoa! I didn't expect the page to take off this fast.
To be honest, I was just annoyed that the Wikipedia page had too much example code, but I thought it would be a waste to just delete it, so I copied it over here, where I thought it might do some good. It's true that the task as it stands isn't formulated very well. My goal was mostly to preserve the code from Wikipedia, whose versions differed in what they could encode (e.g. digits) and in what they output (a string vs. a nested array/list). But if the community is this active, it may be better to change the definition to require implementations to handle arbitrary byte sequences as input and/or output.
I did not specify exactly how the input should be encoded, because there are at least four variants I know of:
  1. Out-of-band (as the task is currently described, with digits always signifying a run length)
  2. Escape character followed by the run character and the run length (e.g. AB\C3 -> ABCCC)
    1. The escape character changes after every occurrence in the input (in the hope of finding an unused character)
  3. Two (or three, or x) identical characters are always followed by the count of further repetitions (e.g. AA3 -> AAAAA)
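To make variants 2 and 3 concrete, here is a hedged Python sketch of both that matches the examples in the list (AB\C3 -> ABCCC and AA3 -> AAAAA). It assumes a backslash as the fixed escape character and a single decimal digit for every count, so escape runs are split at 9 and pair runs at 11 (two characters plus up to 9 more). Variant 2.1's changing escape character is not shown, and all function names are hypothetical.

```python
ESC = "\\"  # assumed fixed escape character for variant 2


def escape_encode(s: str) -> str:
    """Variant 2: ESC, run character, single-digit run length (1-9)."""
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        n = 1
        while i + n < len(s) and s[i + n] == c and n < 9:
            n += 1
        if n >= 3 or c == ESC:
            # An escape sequence costs 3 characters, so use it for runs of 3+
            # (matching the list's example) or to encode the escape character itself.
            out.append(f"{ESC}{c}{n}")
        else:
            out.append(c * n)
        i += n
    return "".join(out)


def escape_decode(s: str) -> str:
    out = []
    i = 0
    while i < len(s):
        if s[i] == ESC:
            # ESC is always followed by the run character and one count digit.
            out.append(s[i + 1] * int(s[i + 2]))
            i += 3
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def pair_encode(s: str) -> str:
    """Variant 3: two equal characters are always followed by a digit 0-9
    giving how many more copies follow (so AA3 -> AAAAA)."""
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        n = 1
        while i + n < len(s) and s[i + n] == c and n < 11:
            n += 1
        if n >= 2:
            out.append(f"{c}{c}{n - 2}")
        else:
            out.append(c)
        i += n
    return "".join(out)


def pair_decode(s: str) -> str:
    out = []
    i = 0
    while i < len(s):
        c = s[i]
        if i + 1 < len(s) and s[i + 1] == c:
            # A pair is always followed by its count digit, which is skipped here.
            out.append(c * (2 + int(s[i + 2])))
            i += 3
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Unlike the out-of-band digit scheme, both variants survive digits in the input: in variant 3 the position of the count is fixed by the pair, so `pair_encode("1122")` gives `"110220"`.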
The run length itself can also be encoded differently for very long runs (multiple escape sequences vs. a base-128 encoded run length).
-- DataWraith 14:42, 25 April 2009 (UTC)
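For the base-128 option, one common layout (an assumption here; the comment does not specify one) is LEB128 style: 7 bits of the length per byte, least significant group first, with the high bit set on every byte except the last. A sketch of encoding and decoding just the length field, with hypothetical function names:

```python
def encode_length(n: int) -> bytes:
    """Encode a non-negative run length in base 128, least significant group first.
    The high bit of each byte is set when more length bytes follow."""
    if n < 0:
        raise ValueError("run length must be non-negative")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)  # final group: high bit clear
            return bytes(out)


def decode_length(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (run length, index of the first byte after the length field)."""
    n = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return n, pos
```

With this layout any run up to 127 needs one length byte, and a run of 300 needs two (`0xAC 0x02`), instead of being split into several escape sequences.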