Run-length encoding: Difference between revisions
add Ruby |
|||
Line 878: | Line 878: | ||
=={{header|Ruby}}== |
=={{header|Ruby}}== |
||
<lang ruby>def encode(s) |
<lang ruby>def encode(s) |
||
s.scan(/(.)(\1*)/).inject([]) do |encoding, matchgroups| |
|||
enc = [] |
|||
char, repeat = matchgroups |
|||
s.gsub(/(.)(\1*)/) {enc << [$1, 1+$2.length]} |
|||
encoding << [char, 1 + repeat.length] |
|||
enc |
|||
end |
|||
end |
end |
||
def decode(e) |
def decode(e) |
||
e.inject("") do |decoded, pair| |
|||
str = "" |
|||
char, length = pair |
|||
decoded << char * length |
|||
str |
|||
end |
|||
end |
end |
||
orig = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" |
orig = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" |
||
encoded = encode(orig) # => [["W", 12], ["B", 1], ["W", 12], ["B", 3], ["W", 24], ["B", 1], ["W", 14]] |
|||
decoded = decode( |
decoded = decode(encoded) |
||
puts "success!" if decoded == orig</lang> |
puts "success!" if decoded == orig</lang> |
||
Revision as of 15:40, 5 June 2009
This page uses content from Wikipedia. The original article was at Run-length_encoding. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
You are encouraged to solve this task according to the task description, using any language you may know.
Given a string containing uppercase characters (A-Z), compress repeated 'runs' of the same character by storing the length of that run, and provide a function to reverse the compression. The output can be anything, as long as you can recreate the input with it.
Example:
- Input:
WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
- Output:
12W1B12W3B24W1B14W
See also the Look-and-say sequence.
Ada
<lang Ada> with Ada.Text_IO; use Ada.Text_IO; with Ada.Strings.Fixed; use Ada.Strings.Fixed; procedure Test_Run_Length_Encoding is
function Encode (Data : String) return String is begin if Data'Length = 0 then return ""; else declare Code : constant Character := Data (Data'First); Index : Integer := Data'First + 1; begin while Index <= Data'Last and then Code = Data (Index) loop Index := Index + 1; end loop; declare Prefix : constant String := Integer'Image (Index - Data'First); begin return Prefix (2..Prefix'Last) & Code & Encode (Data (Index..Data'Last)); end; end; end if; end Encode; function Decode (Data : String) return String is begin if Data'Length = 0 then return ""; else declare Index : Integer := Data'First; Count : Natural := 0; begin while Index < Data'Last and then Data (Index) in '0'..'9' loop Count := Count * 10 + Character'Pos (Data (Index)) - Character'Pos ('0'); Index := Index + 1; end loop; if Index > Data'First then return Count * Data (Index) & Decode (Data (Index + 1..Data'Last)); else return Data; end if; end; end if; end Decode;
begin
Put_Line (Encode ("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW")); Put_Line (Decode ("12W1B12W3B24W1B14W"));
end Test_Run_Length_Encoding; </lang> Sample output:
12W1B12W3B24W1B14W WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
AWK
It works with "textual" input. Lines containing numbers are skipped, since they can't be represented in a not ambiguous way in this implementation (e.g. "11AA" would be encoded as "212A", which would be decoded as A repeated 212 times!)
Encoding
<lang awk>BEGIN {
FS=""
} /^[^0-9]+$/ {
cp = $1; j = 0 for(i=1; i <= NF; i++) { if ( $i == cp ) { j++; } else { printf("%d%c", j, cp) j = 1 } cp = $i } printf("%d%c", j, cp)
}</lang>
Decoding
<lang awk>BEGIN {
RS="[0-9]+[^0-9]" final = "";
} {
match(RT, /([0-9]+)([^0-9])/, r) for(i=0; i < int(r[1]); i++) { final = final r[2] }
} END {
print final
}</lang>
ALGOL 68
Note: The following uses iterators, eliminating the need of declaring arbitrarily large CHAR arrays for caching. <lang algol>STRING input := "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"; STRING output := "12W1B12W3B24W1B14W";
MODE ITERATOR = VOID; MODE YIELDCHAR = PROC(CHAR)ITERATOR; MODE ITERCHAR = PROC(YIELDCHAR)ITERATOR;
PROC char seq = (REF STRING s, YIELDCHAR yield)ITERATOR:
FOR i FROM LWB s TO UPB s DO yield(s[i]) OD;
- Note: The following 2 lines use currying. This not supported by ELLA ALGOL 68RS #
ITERCHAR input seq = char seq(input,),
output seq = char seq(output,);
PROC for encode = (ITERCHAR for seq, YIELDCHAR yield)ITERATOR: (
INT count := 0; CHAR prev;
- FOR c IN seq DO #
for seq((CHAR c)VOID: ( IF count = 0 THEN count := 1; prev := c ELIF c /= prev THEN STRING str count := whole(count,0); char seq(str count, yield); count := 1; yield(prev); prev := c ELSE count +:=1 FI ) )
- OD #;
IF count /= 0 THEN STRING str count := whole(count,0); char seq(str count,yield); yield(prev) FI
);
STRING zero2nine = "0123456789";
PROC for decode = (ITERCHAR for seq, YIELDCHAR yield)ITERATOR: (
INT repeat := 0;
- FOR c IN seq DO #
for seq((CHAR c)VOID: ( IF char in string(c, NIL, zero2nine) THEN repeat := repeat*10 + ABS c - ABS "0" ELSE FOR i TO repeat DO yield(c) OD; repeat := 0 FI ))
);
- iterate through input string #
print("Encode input: ");
- FOR c IN encode(input seq) DO #
for encode(input seq, (CHAR c)VOID: print(c) )
- OD #;
print(new line);
print("Decode output: ");
- FOR c IN decode(output seq) DO #
for decode(output seq, (CHAR c)VOID: print(c) )
- OD #;
print(new line)</lang>
Encode input: 12W1B12W3B24W1B14W Decode output: WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
AutoHotkey
<lang AutoHotkey> msgbox % key := rle_encode("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW") msgbox % rle_decode(key) rle_encode(message) {
StringLeft, previous, message, 1 StringRight, last, message, 1 message .= asc(chr(last)+1) count = 0 loop, parse, message { if (previous == A_LoopField) count +=1 else { output .= previous . count previous := A_LoopField count = 1 } } return output
}
rle_decode(message) {
pos = 1 while, item := RegExMatch(message, "\D", char, pos) { digpos := RegExMatch(message, "\d+", dig, item) loop, % dig { output .= char } pos := digpos } return output
} </lang>
C
These functions have no check for the size of the output buffers.
Encoding function
Since repeat counter must fit a single byte in this implementation, it can't be greater than 255, so a byte repeated more than 255 times generates in the compressed stream more than 2 bytes (4 bytes if the length of the repeated byte sequence is less than 511 and so on)
<lang c>int rle_encode(char *out, const char *in, int l) {
int dl, i; char cp, c;
for(cp=c= *in++, i = 0, dl=0; l>0 ; c = *in++, l-- ) { if ( c == cp ) { i++; if ( i > 255 ) {
*out++ = 255; *out++ = c; dl += 2; i = 1;
} } else { *out++ = i; *out++ = cp; dl += 2; i = 1; } cp = c; } *out++ = i; *out++ = cp; dl += 2; return dl;
}</lang>
Decoding function
<lang c>int rle_decode(char *out, const char *in, int l) {
int i, j, tb; char c;
for(tb=0 ; l>=0 ; l -= 2 ) { i = *in++; c = *in++; tb += i; while(i-- > 0) *out++ = c; } return tb;
}</lang>
Usage example
<lang c>#include <stdio.h>
- include <stdlib.h>
- include <string.h>
const char *o = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";
int main() {
char *d = malloc(2*strlen(o)); char *oc = malloc(strlen(o)); int rl = rle_encode(d, o, strlen(o)); /* fwrite(d, 1, rl, stdout); */
int ocl = rle_decode(oc, d, rl); fwrite(oc, 1, ocl, stdout);
free(d); free(oc); return 0;
}</lang>
In the following codes, encoding and decoding are implementend as "filters" which compress/decompress standard input to standard output writing ASCII strings; they will work as long as the input has no ASCII digits in it, and the compressed/original ratio of a "single group" will be less than or equal to 1 as long as the ASCII decimal representation's length of the repeat counter will be shorter than the length of the "group". It should be so except in the case the group is a single isolated character, e.g. B gives 1B (one byte against two compressed bytes)
Encoding filter
<lang c>#include <stdio.h>
int main() {
int i, c, cp;
for(cp=c=getchar(), i = 0; c != EOF; c = getchar() ) { if ( c == cp ) { i++; } else { printf("%d%c", i, cp); i = 1; } cp = c; } printf("%d%c", i, cp);
}</lang>
Decoding filter
<lang c>#include <stdio.h>
int main() {
int i, c, j;
while( scanf("%d%c", &i, &c) == 2 ) { for(j=0; j < i; j++) printf("%c", c); }
}</lang>
Final note: since the repeat counter value 0 has no meaning, it could be used as it would be 256, so extending by one the maximum number of repetitions representable with a single byte; or instead it could be used as a special marker to encode in a more efficient way (long) sequences of isolated characters, e.g. "ABCDE" would be encoded as "1A1B1C1D1E"; it could be instead encoded as "05ABCDE".
Common Lisp
<lang lisp>(defun group-similar (sequence &key (test 'eql))
(loop for x in (rest sequence) with temp = (subseq sequence 0 1) if (funcall test (first temp) x) do (push x temp) else collect temp and do (setf temp (list x))))
(defun run-length-encode (sequence)
(mapcar (lambda (group) (list (first group) (length group))) (group-similar (coerce sequence 'list))))
(defun run-length-decode (sequence)
(reduce (lambda (s1 s2) (concatenate 'simple-string s1 s2)) (mapcar (lambda (elem) (make-string (second elem) :initial-element (first elem))) sequence)))
(run-length-encode "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW") (run-length-decode '((#\W 12) (#\B 1) (#\W 12) (#\B 3) (#\W 24) (#\B 1)))</lang>
E
<lang e>def rle(string) {
var seen := null var count := 0 var result := [] def put() { if (seen != null) { result with= [count, seen] } } for ch in string { if (ch != seen) { put() seen := ch count := 0 } count += 1 } put() return result
}
def unrle(coded) {
var result := "" for [count, ch] in coded { result += E.toString(ch) * count } return result
}</lang>
<lang e>? rle("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW")
- value: [[12, 'W'], [1, 'B'], [12, 'W'], [3, 'B'], [24, 'W'], [1, 'B'], [14, 'W']]
? unrle(rle("WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"))
- value: "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"</lang>
Fan
<lang Fan>
- Generates a run-length encoding for a string
class RLE {
Run[] encode(Str s) { runs := Run[,]
s.size.times |i| { ch := s[i] if (runs.size==0 || runs.last.char != ch) runs.add(Run(ch)) runs.last.inc } return runs }
Str decode(Run[] runs) { buf := StrBuf() runs.each |run| { run.count.times { buf.add(run.char.toChar) } } return buf.toStr }
Void main() { echo(decode(encode(
"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
))) }
}
internal class Run {
Int char Int count := 0 new make(Int ch) { char = ch } Void inc() { ++count }
override Str toStr() { return "${count}${char.toChar}" }
} </lang>
Haskell
<lang haskell>import Data.List (group)
-- Datatypes type Encoded = [(Int, Char)] -- An encoded String with form [(times, char), ...] type Decoded = String
-- Takes a decoded string and returns an encoded list of tuples rlencode :: Decoded -> Encoded rlencode = map (\g -> (length g, head g)) . group
-- Takes an encoded list of tuples and returns the associated decoded String rldecode :: Encoded -> Decoded rldecode = concatMap decodeTuple
where decodeTuple (n,c) = replicate n c
main :: IO () main = do
-- Get input putStr "String to encode: " input <- getLine -- Output encoded and decoded versions of input let encoded = rlencode input decoded = rldecode encoded putStrLn $ "Encoded: " ++ show encoded ++ "\nDecoded: " ++ show decoded</lang>
Java
<lang java>import java.util.regex.Matcher; import java.util.regex.Pattern; public class RunLengthEncoding {
public String encode(String source) { StringBuffer dest = new StringBuffer(); for (int i = 0; i < source.length(); i++) { int runLength = 1; while (i+1 < source.length() && source.charAt(i) == source.charAt(i+1)) { runLength++; i++; } dest.append(runLength); dest.append(source.charAt(i)); } return dest.toString(); }
public String decode(String source) { StringBuffer dest = new StringBuffer(); Pattern pattern = Pattern.compile("[0-9]+|[a-zA-Z]"); Matcher matcher = pattern.matcher(source); while (matcher.find()) { int number = Integer.parseInt(matcher.group()); matcher.find(); while (number-- != 0) { dest.append(matcher.group()); } } return dest.toString(); }
public static void main(String[] args) { RunLengthEncoding RLE = new RunLengthEncoding(); String example = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"; System.out.println(RLE.encode(example)); System.out.println(RLE.decode("1W1B1W1B1W1B1W1B1W1B1W1B1W1B")); }
}</lang>
Objective-C
The class RCRunLengthEncoder represents internally data with which it was feeded as pair character - repetition counter: it does not implement a binary representation of itself; it is left to another class, so that different input/output encodings are possible starting from the same class.
<lang objc>#import <Foundation/Foundation.h>
@interface RCRunLengthEncoder : NSObject {
NSMutableArray *contents; NSMutableArray *counters;
} + (id)alloc; + (id)encoderWithData: (NSData *)data; - (id)init; - (id)initWithData: (NSData *)data; - (void)dealloc; - (void)addByte: (char)aByte; - (void)addByte: (char)aByte repeated: (int)repetitionCount; - (void)addData: (NSData *)data; - (NSData *)data; - (NSArray *)counters; - (NSArray *)contents; @end
@implementation RCRunLengthEncoder
+ (id)alloc
{
self = [[super alloc] init]; return self;
}
+ (id)encoderWithData: (NSData *)data {
id ne = [[RCRunLengthEncoder alloc] initWithData: data]; return [ne autorelease];
}
- (id)initWithData: (NSData *)data {
[self addData: data]; return self;
}
- (id)init {
contents = [[NSMutableArray alloc] init]; counters = [[NSMutableArray alloc] init]; return self;
}
- (void)dealloc {
[counters release]; [contents release]; [super dealloc];
}
- (void)addByte: (char)aByte {
[self addByte: aByte repeated: 1];
}
- (void)addByte: (char)aByte repeated: (int)repetitionCount {
if ( ([contents count] == 0) || ([[contents lastObject] charValue] != aByte) ) { [contents addObject: [NSNumber numberWithChar: aByte]]; [counters addObject: [NSNumber numberWithInt: repetitionCount]]; } else { NSNumber *a = [[counters lastObject] retain]; [counters removeLastObject]; [counters addObject: [NSNumber numberWithInt:
([a intValue] + repetitionCount) ]];
[a release]; }
}
- (void)addData: (NSData *)data {
[data retain]; char *d = (char *)[data bytes]; NSUInteger i; for(i=0; i < [data length]; i++) [self addByte: d[i]]; [data release];
}
- (NSArray *)contents {
return contents;
}
- (NSArray *)counters {
return counters;
}
- (NSData *)data {
NSMutableData *d = [[NSMutableData alloc] initWithCapacity: 256]; char buf[256]; int i;
for(i=0; i < [contents count]; i++) { char c = [[contents objectAtIndex: i] charValue]; int n = [[counters objectAtIndex: i] intValue]; memset(buf, c, 256); while ( n > 0 ) { [d appendBytes: buf length: MIN(256, n)]; n -= 256; } } return d;
} @end</lang>
The class codecRLE is derived from the previous, adding the methods that allow to binary encode the data internally held, and to create a internal representation from the encoded data. The specification here used are:
- byte N >= 128, then the next byte must be repeated N-128 times (if N-128 is 0, the the byte must be repeated 128 times)
- byte N < 128, then the next N bytes must be taken as they are (if N is 0, the next 128 bytes are literal)
<lang objc>@interface codecRLE : RCRunLengthEncoder - (NSData *)encode; - (codecRLE *)decode: (NSData *)enc; @end
@implementation codecRLE - (codecRLE *)decode: (NSData *)enc {
char *buf = (char *)[enc bytes]; int i;
for(i = 0; i < [enc length]; ) { if ( (buf[i] & 0x80) != 0) { [self addByte: buf[i+1] repeated: ( ((buf[i]&0x7f) == 0) ? 128 : (buf[i]&0x7f) ) ]; i += 2; } else { int rc = (buf[i] == 0) ? 128 : buf[i]; [self addData: [NSData dataWithBytes: &buf[i+1] length: rc]]; i += rc + 1; } } return self;
}
- (NSData *)encode {
int literalCount=0; int i; char buf[129];
NSMutableData *r = [[NSMutableData alloc] initWithCapacity: 256];
for(i=0; i < [counters count]; i++) { char c = [[contents objectAtIndex: i] charValue]; int howMany = [[counters objectAtIndex: i] intValue]; if ( literalCount == 128 ) { buf[0] = 0; [r appendBytes: buf length: 129]; literalCount = 0; } if ( howMany == 1 ) { buf[literalCount+1] = c; literalCount++; } else { if ( literalCount > 0 ) {
buf[0] = literalCount & 0x7f; [r appendBytes: buf length: (literalCount+1) ]; literalCount = 0;
} buf[1] = c; while( howMany > 127 ) {
buf[0] = 0x80; [r appendBytes: buf length: 2]; howMany -= 128;
} if (howMany > 0) {
buf[0] = howMany | 0x80; [r appendBytes: buf length: 2];
} } } return r;
} @end</lang>
Usage example:
<lang objc>const char *s = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW";
int main() {
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
codecRLE *enc = [[codecRLE alloc]
initWithData: [NSData dataWithBytes: s length: strlen(s)] ];
NSData *repr = [enc encode]; fwrite([repr bytes], 1, [repr length], stdout); [enc release];
enc = [[codecRLE alloc] decode: repr]; [repr release]; NSData *d = [enc data]; fwrite([d bytes], 1, [d length], stdout); [d release]; [enc release];
[pool drain]; return EXIT_SUCCESS;
}</lang>
Notes
- The code is not deeply tested yet
- Methods encode and data creates a NSData object that must be released by the user (it is not autoreleased nor released when the class is released)
Perl
<lang perl>sub encode
{my $str = shift; $str =~ s {(.)(\1*)} {(length($2) + 1) . $1 . ';'}gse; return $str;}
sub decode
{my $str = shift; $str =~ s {(\d+)(.);} {$2 x $1}gse; return $str;}</lang>
The following modified versions of the previous one, encode/decode a bytes sequence in a way compatible with the functions of the C version.
<lang perl>sub encode
{my $str = shift; $str =~ s {(.)(\1{0,254})} {pack("C",(length($2) + 1)) . $1 }gse; return $str;}
sub decode {
my @str = split //, shift; my $r = ""; foreach my $i (0 .. scalar(@str)/2-1) {
$r .= $str[2*$i + 1] x unpack("C", $str[2*$i]);
} return $r;
}</lang>
Python
<lang python>def encode(input_string):
count = 1 prev = lst = [] for character in input_string: if character != prev: if prev: entry = (prev,count) lst.append(entry) #print lst count = 1 prev = character else: count += 1 else: entry = (character,count) lst.append(entry) return lst
def decode(lst):
q = "" for character, count in lst: q += character * count return q
- Method call
encode("aaaaahhhhhhmmmmmmmuiiiiiiiaaaaaa") decode([('a', 5), ('h', 6), ('m', 7), ('u', 1), ('i', 7), ('a', 6)])</lang>
Functional
<lang python>from itertools import groupby def encode(input_string):
return [(len(list(g)), k) for k,g in groupby(input_string)]
def decode(lst):
return .join(c * n for n,c in lst)
encode("aaaaahhhhhhmmmmmmmuiiiiiiiaaaaaa") decode([(5, 'a'), (6, 'h'), (7, 'm'), (1, 'u'), (7, 'i'), (6, 'a')])</lang>
By regular expression
The simplified input range of only uppercase characters allows a simple regular expression to be applied repeatedly for encoding, and another for decoding:
<lang python>from re import finditer
def encode(text):
Doctest: >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW') '12W1B12W3B24W1B14W' return .join( str(len(run.group(0))) + run.group(1) for run in finditer(r'(.)\1*', text) )
def decode(text):
Doctest: >>> decode('12W1B12W3B24W1B14W') 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW' return .join( run.group(2) * int(run.group(1)) for run in finditer(r'(\d+)(\D)', text) )
textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" assert decode(encode(textin)) == textin</lang>
Ruby
<lang ruby>def encode(s)
s.scan(/(.)(\1*)/).inject([]) do |encoding, matchgroups| char, repeat = matchgroups encoding << [char, 1 + repeat.length] end
end
def decode(e)
e.inject("") do |decoded, pair| char, length = pair decoded << char * length end
end
orig = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" encoded = encode(orig) # => [["W", 12], ["B", 1], ["W", 12], ["B", 3], ["W", 24], ["B", 1], ["W", 14]] decoded = decode(encoded) puts "success!" if decoded == orig</lang>
Smalltalk
The class RunLengthEnc
holds a representation of a run length encoded sequence of objects.
<lang smalltalk>Object subclass: RunLengthEnc [
|counters contents| <category: 'Compression'> <comment: 'My instances are similar to a Bag, except
that items are ordered and counted iff they are adjacent. So that this instance keeps a representation of the added items suitable for performing a RunLengthEncoding, hence the name.'>
RunLengthEnc class >> new [ ^self basicNew initialize ]
size [ ^counters size ]
add: anObject [ ^(self add: anObject withCount: 1) ]
add: anObject withCount: anInt [ anObject isNil ifTrue: [ SystemExceptions.InvalidArgument signalOn: anObject reason: 'RunLengthEnc encodes existing objects, e.g. integers or characters, not nil' ]. (self size) > 0 ifTrue: [ (contents last) == anObject ifTrue: [ self incrementLastCounter: anInt. ] ifFalse: [
self appendObject: anObject withCount: anInt
] ] ifFalse: [ self appendObject: anObject withCount: anInt ]. ^anObject ]
initialize [ counters := OrderedCollection new. contents := OrderedCollection new. ]
appendObject: anObject withCount: anInt [ contents addLast: anObject. counters addLast: anInt ]
appendObject: anObject [ contents addLast: anObject. counters addLast: 1 ]
incrementLastCounter: howMuch [ | c | c := counters removeLast. counters addLast: (c+howMuch) ]
"the 'do:' can be used to let the user store the compressed 'stream' as s/he prefers, while 'add:withCount:' can be used to rebuild the informations from the custom storage" do: aBlock [ 1 to: (counters size) do: [ :i | | l | aBlock value: (contents at: i) value: (counters at: i) ] ]
asOrderedCollection [ |oc| oc := OrderedCollection new. self do: [ :o :c | 1 to: c do: [ :i| oc add: o ] ]. ^oc ]
printOn: aStream [ "output a representation of the object: counter [object] ... for every object" 1 to: (counters size) do: [ :i | (counters at: i) printOn: aStream.
aStream nextPut: (Character value: 32). (contents at: i) printOn: aStream. aStream nextPut: (Character nl).
] ]
asString [ |oc| "'build' a string from the run length encoded objects; works only if objects are Characters or Strings" oc := self asOrderedCollection. ^(oc asString) ]
].</lang>
The following extension to the OrderedCollection class allows to run length encode an ordered collection (theoretically of any objects' kind, but the RunLengthEnc class is supposed to work with characters mainly).
<lang smalltalk>OrderedCollection extend [
asRunLengthEnc [ |rc| rc := RunLengthEnc new. self do: [ :o | rc add: o ]. ^rc ]
].</lang>
The following extention to the String class allows to run length encode a string (it is basically a shortcut for aString asOrderedCollection asRunLengthEnc
).
<lang smalltalk>String extend [
asRunLengthEnc [ ^self asOrderedCollection asRunLengthEnc ]
].</lang>
Usage examples
<lang smalltalk>|cs s os|
s := 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'.
"let us run length encode the string" cs := s asRunLengthEnc. cs print. "this shows an ASCII representation of the run length encoded objects collection;
in this case, of the string"
"let us show that the class is able to return the string back; this really works
iff the objects of the collection are strings or characters"
cs asString displayNl.</lang>
The class does not mandate a way of storing itself into a file that can be loaded later. The following sample code shows how it could be done quickly, but not efficiently from the point of view of a compressor.
<lang smalltalk>|f| "let's write the object and its informations to a file..." f := FileStream open: 'rledump' mode: FileStream write. ObjectDumper dump: cs to: f. f close.
"... and let's read it back" |o| f := FileStream open: 'rledump' mode: FileStream read. o := ObjectDumper loadFrom: f. o print. "show that the object holds the same informations of cs" f close.</lang>
Tcl
The encoding is an even-length list with elements {count char ...} <lang tcl>proc encode {string} {
set encoding {} # use a regular expression to match runs of one character foreach {run -} [regexp -all -inline {(.)\1+|.} $string] { lappend encoding [string length $run] [string index $run 0] } return $encoding
}
proc decode {encoding} {
foreach {count char} $encoding { append decoded [string repeat $char $count] } return $decoded
}</lang>
<lang tcl>set str "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW" set enc [encode $str] ;# ==> {12 W 1 B 12 W 3 B 24 W 1 B 14 W} set dec [decode $enc] if {$str eq $dec} {
puts "success"
}</lang>