Arithmetic coding/As a generalized change of radix: Difference between revisions

J: rewrite
No edit summary
(J: rewrite)
Line 187:
 
=={{header|J}}==
 
Note that decoding assumes we know the length of the string (and also that strings are assumed to contain only upper case alphanumeric letters) - but that's not a part of this task.
 
Implementation:
 
<lang J>NB. generate a frequency dictionary from a reference string
<lang J>aek=:3 :0
aekDict=:verb define
b=. x:#y NB. numeric base
ud=. ~.y NB. uniquedictionary listlists ofunique characters
io=. u i.y /:d NB. character indicesin intocanonical uniquesorder
nf=. (#/.~%&x:#)y NB. and their NB.relative frequencies of uniques
(o{d);o{f
o=. /:u NB. indices to sort uniques alphabetically
)
c=. i{(+/\0,}:o{n)/:o NB. cumulative frequencies of characters
 
f=. i{n NB. frequencies of characters
NB. encode a string against a reference dict
L=. b #. c**/\1,}:f NB. lower bound
aekEnc=:verb define
p=. */f NB. product of frequencies of characters
NB. use string to generate a dict if none provided
e=. x:<.10^.p NB. number of decimal positions to drop
(aekDict y) aekEnc y
e,~<.(L+p)%10^e NB. convenient result with trailing zeros count
:
'u F'=.x NB. unpack dictionary
b=. x:#y NB. numeric base
f=. b*F NB. absolute frequencies
i=. u i.y NB. character indices
c=. +/\0,}:f NB. cumulative frequencies
L=. b #. (i{c)**/\1,}:i{f NB. lower bound
p=. */i{f NB. product of character frequencies
e=. x:<.10^.p NB. number of decimal positions to drop
e,~<.(L+p)%10^e
)
 
 
aekDec=:adverb define
:
'u F'=. x NB. unpack dictionary
f=. m*F NB. frequencies of characters
c=.+/\0,}:f NB. cumulative frequencies
C=.<:}.c,m NB. id lookup table
N=. (* 10&^)/y NB. remainder being decoded
r=. '' NB. result of decode
for_d. m^x:i.-m do. NB. positional values
id=. <.N%d NB. character id
i=.C I.id NB. character index
N=.<.(N -(i{c)*d)%i{f NB. corrected remainder
r=.r,u{~i NB. accumulated result
end.
)
 
NB. task demo utility:
aek=:verb define
dict=. aekDict y
echo 'Dictionary:'
echo ' ',.(0{::dict),.' ',.":,.1{::dict
echo 'Length:'
echo ' ',":#y
echo 'Encoded:'
echo ' ',":dict aekEnc y
echo 'Decoded:'
echo ' ',":dict (#y) aekDec aekEnc y
)</lang>
 
Line 209 ⟶ 247:
 
<lang J> aek 'DABDDB'
Dictionary:
251 2
A 1r6
B 1r3
D 1r2
Length:
6
Encoded:
251 2
Decoded:
DABDDB
 
aek 'DABDDBBDDBA'
Dictionary:
167351 6
A 2r11
aeka 'ABRACADABRA'
B 4r11
7954170 4
D 5r11
aek 'TOBEORNOTTOBEORTOBEORNOT'
Length:
1150764267498783364 15</lang>
11
Encoded:
167351 6
Decoded:
DABDDBBDDBA
 
aek 'ABRACADABRA'
Note that these results are not sufficient to recover the original character sequences. To recover the original character sequence we probably need to know which characters were used and the number of times each character appeared.
Dictionary:
A 5r11
B 2r11
C 1r11
D 1r11
R 2r11
Length:
11
Encoded:
7954170 4
Decoded:
ABRACADABRA
 
aek 'TOBEORNOTTOBEORTOBEORNOT'
Dictionary:
B 1r8
E 1r8
N 1r12
O 1r3
R 1r8
T 5r24
Length:
24
Encoded:
1150764267498783364 15
Decoded:
TOBEORNOTTOBEORTOBEORNOT</lang>
 
Note that for this task we use our plaintext to generate our dictionary for decoding.
That's something of a problem because this algorithm, and the frequencies used, are both explicitly tied to the length of the character sequence being compressed. It doesn't work with pre-shared parameters unless you limit yourself to a fixed sized collection of characters.
 
=={{header|Perl}}==
6,962

edits