Talk:Arithmetic coding/As a generalized change of radix: Difference between revisions

no edit summary
No edit summary
No edit summary
Line 23:
:::::: The Wikipedia section uses cumulative frequencies. In the example, "DABDDB", gets mapped to 3, 0, 1, 3, 3, 1, where 0, 1, 3 are the cumulative frequencies for A, B, D, respectively. You can follow how those are used throughout the calculations.
::::::As another example, consider a situation where each character has frequency 1. Then the frequency dictionary is just every key mapped to 1. What you need for encoding is the cumulative frequencies 0, 1, 2, ..., which tells you what value to assign to each character. --[[User:Spoon!|Spoon!]] ([[User talk:Spoon!|talk]]) 19:43, 28 January 2016 (UTC)
::::::: No.
::::::: Here's the [[wp:Arithmetic_coding#Arithmetic_coding_as_a_generalized_change_of_radix|wikipedia]] expression for L. It won't render properly right now, but you can go to the wikipedia page to read it:
:<math>\begin{align}
\mathrm{L} = {} &(6^5 \times 3) + {}\\
& 3 \times (6^4 \times 0) + {}\\
& (3 \times 1) \times (6^3 \times 1) + {}\\
& (3 \times 1 \times 2) \times (6^2 \times 3) + {}\\
& (3 \times 1 \times 2 \times 3) \times (6^1 \times 3) + {}\\
& (3 \times 1 \times 2 \times 3 \times 3) \times (6^0 \times 1) {}\\
= {} & 25002
\end{align}</math>
::::::: The frequencies used here are 1 (A), 2 (B) and 3 (D). The value 0, and 1 and 3 correspond to the letters A (0), B (1) and D (2) - and you can see that in the wikipedia page where it states "The string is first mapped into the digit string 301331" - so that 0 has nothing to do with the concept of cumulative frequencies.
::::::: If I were to criticize the wikipedia page, I'd ding the author for choosing DABDDB as an example (because the letter encodings coincide exactly with the cumulative frequency values - so that's confusing).
::::::: But if I were being critical, I'd also criticize whoever designed this "as a generalized change of radix" algorithm for choosing the length of the string as the base for the encoding scheme. If anyone were to try to actually use this encoding scheme, they'd encode THE in base 3, using the encoding values T=19, H=7, E=4.
::::::: But, I guess that answers my question about why people are using cumulative frequencies in the encoding implementations: (a) the algorithm itself is basically useless, so no one should care about using it, and (b) the description in wikipedia uses an example where cumulative frequencies are easily confused with the letter encoding values.
::::::: All in all, it's a mess. --[[User:Rdm|Rdm]] ([[User talk:Rdm|talk]]) 19:57, 28 January 2016 (UTC)
6,962

edits