Talk:Entropy: Difference between revisions

Revision as of 23:47, 28 May 2013

What about a less dull example?

What about using "Rosetta code" as an example instead of the dull "1223334444"?--Grondilu 17:53, 22 February 2013 (UTC)

Or even funnier: write a program who computes its own entropy :) --Grondilu 12:40, 25 February 2013 (UTC)
Better yet, a bonus task of computing the entropy of each solution on the page. :) --TimToady 19:23, 25 February 2013 (UTC)
I like computing the entropy of “Rosetta Code” (it's about 3.08496, assuming my code is right); a more self-referential one is fine too, except it involves features that might block some languages from participating. (The draft child task is a better place for that.) –Donal Fellows 09:31, 26 February 2013 (UTC)
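For reference, the per-character Shannon entropy Donal quotes for "Rosetta Code" can be checked with a short sketch (shown here in Python for illustration, not in any of the task's languages):

```python
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Per-character Shannon entropy of s, in bits: -sum p_c * log2 p_c."""
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

print(shannon_entropy("Rosetta Code"))  # ≈ 3.08496, matching the figure above
```

"Rosetta Code" has 12 characters, three of which (o, e, t) appear twice, giving (log2 12 + log2 6)/2 ≈ 3.08496.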

Alternate form

Not sure this is very useful, but I was wondering if one could not find a more concise way of writing this.

If we call <math>N</math> the length of the string and <math>n_c</math> the number of occurrences of the character <math>c</math>, we have:

<math>H = \log_2 N - \frac{1}{N}\sum_c n_c \log_2 n_c</math>
In Perl 6, this allows a slightly simpler formula, i.e. not using hyperoperators:

<lang Perl 6>sub entropy(@a) {
    log(@a) - @a R/ [+] map -> \n { n * log n }, @a.bag.values
}</lang>

For what it's worth.--Grondilu 18:14, 4 March 2013 (UTC)
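As a sanity check, the rearranged form used in the Perl 6 snippet agrees with the usual definition; a quick numeric comparison (in Python, for illustration) on the task's example string:

```python
from collections import Counter
from math import log2

def entropy_direct(s):
    # the usual definition: -sum p_c * log2 p_c over distinct characters
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

def entropy_rearranged(s):
    # log2 N - (1/N) * sum n_c * log2 n_c, the form above
    n = len(s)
    return log2(n) - sum(c * log2(c) for c in Counter(s).values()) / n

s = "1223334444"
assert abs(entropy_direct(s) - entropy_rearranged(s)) < 1e-12
```

The two agree because p_c = n_c/N, so -p_c log2 p_c = (n_c/N)(log2 N - log2 n_c).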

Description

I think the task is not described correctly. It calculates the average entropy per character, not the entropy of the message as a whole.

For example, I generated this 100-digit string with 0 through 7 from random.org. It has 300 bits of entropy (more conservatively, 299 to 300 bits of entropy, in case the generation is slightly flawed). The task gives it just under 3: <lang parigp>entropy("1652410507230105455225274644011734652475143261241037401534561367707447375435416503021223072520334062") %1 = 2.9758111700170733830744745234131842224</lang>

CRGreathouse (talk) 02:08, 17 April 2013 (UTC)

I don't know. It's not simple. I was a bit embarrassed with this task, as I struggled to understand what the entropy of a string is. Normally entropy is defined for a system whose internal state is only known probabilistically. A string is known exactly, so stricto sensu the entropy of any string is zero. As I understand it, the task defines the entropy of a string as the entropy of whatever system "generated" the string. The string is thus seen as a representative sample of the probabilities of each possible character, seen as a random variable.
In your example, you say the entropy is 300 bits, but I'm not sure it makes much sense to say that. 300 bits is the amount of information in this string, if you consider it as the result of picking one of the 8^100 = 2^300 equally likely 100-digit strings. Yet the entropy is still zero, for your string is exactly defined.
The entropy as currently defined in the task does not directly depend on the "size" of the string in memory, because it merely represents the probability table of a stochastic process.
--Grondilu (talk) 15:42, 17 April 2013 (UTC)
What I mean when I say 299-300 bits is that if we knew I was going to pick 100 digits from random.org as described we wouldn't be able to design a scheme that would average fewer than 299 bits to tell you which one I picked, but we could design one which would do it in 300. This task takes a slightly different model (if I was going to generate a 100-character string with a given number of each digit) but the two are similar. Either way the string has about 300 and each character has about 3.
CRGreathouse (talk) 00:55, 18 April 2013 (UTC)
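The distinction being discussed (about 3 bits per character versus about 300 bits for the whole 100-digit string) can be made concrete; a sketch in Python, reusing the random.org string from the example above:

```python
from collections import Counter
from math import log2

def shannon_entropy(s):
    # average (per-character) Shannon entropy, as the task computes it
    n = len(s)
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

s = ("16524105072301054552252746440117346524751432612410"
     "37401534561367707447375435416503021223072520334062")
per_char = shannon_entropy(s)   # just under 3 (log2 8 = 3 is the maximum)
total = per_char * len(s)       # just under 300 bits for the whole string
```

The per-character figure matches the PARI/GP output quoted above; multiplying by the length gives the whole-string figure CRGreathouse describes.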

REXX (log2)

"The LOG2 subroutine in only included here for functionality, not to document how to calculate LOG2 using REXX"

What's the difference and where is it / should it be documented?
Difference between what?
The LOG2 function should/could be documented in a Rosetta Code task that asks for a user-written version of LN/LOG2/LOG (as opposed to one that is a BIF). Documenting a subroutine that is a BIF (built-in function) for most languages would just make the LOG2 function much too bulky and detract from the task's requirement: solve the entropy problem. The REXX LOG2 function (which is a modified LN function) was originally written to be compact, as a one-liner function intended to be placed at the end of the program that needed it (it, and another hundred or so subroutines/functions --- that's no exaggeration). Strictly speaking, the E value was originally a function, but it was abbreviated here (about 80 decimal digits as a constant rather than a function call) to save even more space and complexity. Also omitted were the error checks that were in the original LN function, as well as SIGNAL ON NOVALUE; SIGNAL ON SYNTAX; SIGNAL ON HALT; and the mechanism for issuing/displaying the error message(s) and the offending REXX statements. A lot of crafting went into the writing of the original LN function (this was under CMS REXX and then ported to PC/REXX in 1989, as I recall). PC/REXX has some restrictions on the number of symbols, the aggregate size of the values of the variables, and the size of the REXX program. -- Gerard Schildberger (talk) 23:47, 28 May 2013 (UTC)
Take 'my' (now corrected - thanks) log as an example --Walterpachl
If this Rosetta Code task was to explain how LN (or LOG2) was calculated, that would be appropriate. -- Gerard Schildberger (talk) 23:47, 28 May 2013 (UTC)
Here's a version of the (REXX) LOG2 function, unrolled and expatiated: -- Gerard Schildberger (talk) 23:47, 28 May 2013 (UTC)

<lang rexx>/*──────────────────────────────────LOG2 subroutine───────────────────────────*/
log2: procedure
parse arg x 1 xx
ig= x>1.5
is= 1 - 2*(ig\==1)
numeric digits digits()+5        /* [↓]  precision of E must be > digits().*/
e=2.7182818284590452353602874713526624977572470936999595749669676277240766303535
ii=0

                 do  while  ig & xx>1.5  |  \ig & xx<.5
                 _=e
                        do j=-1
                        iz=xx* _**-is
                        if j>=0  then  if  ig & iz<1  |  \ig&iz>.5  then leave
                        _=_*_
                        izz=iz
                        end   /*j*/
                 xx=izz
                 ii=ii + is* 2**j
                 end   /*while*/
x=x * e**-ii -1
z=0
_=-1
p=z
                 do k=1
                 _=-_*x
                 z=z+_/k
                 if z=p  then leave
                 p=z
                 end   /*k*/

r=z+ii

if arg()==2  then return r
return r / log2(2,0)</lang>
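For readers who want the algorithm rather than the REXX idioms: the function range-reduces its argument by pulling out integer powers of e, sums the alternating series for ln(1+t), and finally divides by ln 2. A rough equivalent in Python (an illustrative sketch, not a faithful translation: the REXX version reduces faster via the squaring loop `_=_*_`, and normalizes through the recursive `log2(2,0)` call instead of a stored ln 2):

```python
import math

def log2_series(x):
    """log2 via range reduction by powers of e plus the ln(1+t) series."""
    e = math.e
    k = 0                       # integer exponent pulled out: x = m * e**k
    m = x
    while m > 1.5:              # reduce m into roughly [0.5, 1.5]
        m /= e
        k += 1
    while m < 0.5:
        m *= e
        k -= 1
    t = m - 1.0                 # ln(1+t) = t - t^2/2 + t^3/3 - ...
    term, total = -1.0, 0.0
    for n in range(1, 1000):
        term = -term * t
        new_total = total + term / n
        if new_total == total:  # series converged at this float precision
            break
        total = new_total
    return (total + k) / math.log(2.0)
```

Since |t| ≤ 0.5 after reduction, the series terms shrink geometrically and the loop terminates quickly, mirroring the `if z=p then leave` test in the REXX `do k=1` loop.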