Entropy
You are encouraged to solve this task according to the task description, using any language you may know.
Calculate the information entropy (Shannon entropy) of a given input string.
Entropy is the [[wp:Expected value|expected value]] of the measure of [[wp:Self-information|information]] content in a system. In general, the Shannon entropy of a variable <math>X</math> is defined as:

:<math>H(X) = \sum_{x\in\Omega} P(x) I(x)</math>

where the information content is <math>I(x) = -\log_b P(x)</math>. If the base of the logarithm is <math>b = 2</math>, the result is expressed in bits, a unit of information. Therefore, given a string <math>S</math> whose <math>k</math> distinct characters <math>s_i</math> occur with relative frequencies <math>P(s_i)</math>, the entropy of the string in bits is:

:<math>H(S) = -\sum_{i=1}^{k} P(s_i) \log_2 P(s_i)</math>
For this task, use "1223334444" as an example. The result should be around 1.84644 bits.
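For "1223334444" the relative character frequencies are 0.1, 0.2, 0.3 and 0.4, so <math>H = -(0.1\log_2 0.1 + 0.2\log_2 0.2 + 0.3\log_2 0.3 + 0.4\log_2 0.4) \approx 1.84644</math>. As a cross-check (this sketch is not one of the language entries below; the function name entropy is arbitrary), a minimal Python version of the computation:

<lang python>from collections import Counter
from math import log2

def entropy(s):
    """Shannon entropy of a string, in bits per character."""
    n = len(s)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return -sum(c / n * log2(c / n) for c in Counter(s).values())

print(entropy("1223334444"))  # ~1.84644</lang>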
Ada
Uses Ada 2012.

<lang Ada>with Ada.Text_IO, Ada.Float_Text_IO, Ada.Numerics.Elementary_Functions;

procedure Count_Entropy is
   package TIO renames Ada.Text_IO;
   Count: array(Character) of Natural := (others => 0);
   Sum:   Natural := 0;
   Line:  String := "1223334444";
begin
   for I in Line'Range loop   -- count the characters
      Count(Line(I)) := Count(Line(I)) + 1;
      Sum := Sum + 1;
   end loop;
   declare   -- compute the entropy and print it
      function P(C: Character) return Float is (Float(Count(C)) / Float(Sum));
      use Ada.Numerics.Elementary_Functions, Ada.Float_Text_IO;
      Result: Float := 0.0;
   begin
      for Ch in Character loop
         Result := Result -
            (if P(Ch) = 0.0 then 0.0 else P(Ch) * Log(P(Ch), Base => 2.0));
      end loop;
      Put(Result, Fore => 1, Aft => 5, Exp => 0);
   end;
end Count_Entropy;</lang>
Burlesque
<lang burlesque>blsq ) "1223334444"F:u[vv^^{1\/?/2\/LG}m[?*++
1.8464393446710157</lang>
Euler Math Toolbox
<lang EulerMathToolbox>>function entropy (s) ...
$  v=strtochar(s);
$  m=getmultiplicities(unique(v),v);
$  m=m/sum(m);
$  return sum(-m*logbase(m,2))
$endfunction
>entropy("1223334444")
 1.84643934467</lang>
D
<lang d>import std.stdio, std.algorithm, std.math;
double entropy(T)(T[] s) /*pure nothrow*/
if (__traits(compiles, sort(s))) {
    return s
        .sort()
        .group
        .map!(g => g[1] / cast(double)s.length)
        .map!(p => -p * log2(p))
        .reduce!q{a + b};
}

void main() {
    "1223334444"d.dup.entropy.writeln;
}</lang>
- Output:
1.84644
friendly interactive shell
Sort of hacky, but friendly interactive shell isn't really optimized for mathematical tasks (in fact, it doesn't even have associative arrays).
<lang fishshell>function entropy
    for arg in $argv
        set name count_$arg
        if not count $$name > /dev/null
            set $name 0
            set values $values $arg
        end
        set $name (math $$name + 1)
    end
    set entropy 0
    for value in $values
        set name count_$value
        set entropy (echo "
            scale = 50
            p = "$$name" / "(count $argv)"
            $entropy - p * l(p)
        " | bc -l)
    end
    echo "$entropy / l(2)" | bc -l
end
entropy (echo 1223334444 | fold -w1)</lang>
- Output:
1.84643934467101549345
Haskell
<lang haskell>import Data.List
main = print $ entropy "1223334444"
entropy s =
  sum . map lg' . fq' . map (fromIntegral . length) . group . sort $ s
  where
    lg' c = (c *) . logBase 2 $ 1.0 / c
    fq' c = map (\x -> x / (sum c)) c</lang>
J
Solution:
<lang j>   entropy=: +/@:-@(* 2&^.)@(#/.~ % #)</lang>
- Example:
<lang j>   entropy '1223334444'
1.84644</lang>
Mathematica
<lang Mathematica>shE[s_String] := -Plus @@ ((# Log[2., #]) & /@
     ((Length /@ Gather[#])/Length[#]) &[Characters[s]])</lang>
- Example:
<lang Mathematica>shE["1223334444"]
1.84644

shE["Rosetta Code"]
3.08496</lang>
Perl
<lang Perl>sub entropy {
    my %count; $count{$_}++ for @_;
    my @p = map $_/@_, values %count;
    my $entropy = 0;
    $entropy += -$_ * log $_ for @p;
    $entropy / log 2
}
print entropy split //, "1223334444";</lang>
Perl 6
<lang Perl 6>sub entropy(@a) {
[+] map -> \p { p * -log p }, @a.bag.values »/» +@a;
}
say log(2) R/ entropy '1223334444'.comb;</lang>
- Output:
1.84643934467102
R
<lang r>entropy = function(s) {
    freq = prop.table(table(substring(s, 1:nchar(s), 1:nchar(s))))
    -sum(freq * log(freq, base = 2))
}

print(entropy("1223334444"))  # 1.846439</lang>
Ruby
<lang ruby>def entropy(s)
  counts = Hash.new(0)
  s.each_char { |c| counts[c] += 1 }

  counts.values.reduce(0) do |entropy, count|
    freq = count / s.length.to_f
    entropy - freq * Math.log2(freq)
  end
end</lang>

One-liner, same performance (or better):

<lang ruby>def entropy2(s)
  s.each_char.group_by(&:to_s).values.map { |x| x.length / s.length.to_f }.reduce(0) { |e, x| e - x * Math.log2(x) }
end</lang>
Scala
<lang scala>import scala.math._
def entropy(v: String) = {
  v.groupBy(a => a)
   .values
   .map(i => i.length.toDouble / v.length)
   .map(p => -p * log10(p) / log10(2))
   .sum
}

// Confirm that "1223334444" has an entropy of about 1.84644
assert(math.round(entropy("1223334444") * 100000) * 0.00001 == 1.84644)</lang>
Tcl
<lang tcl>proc entropy {str} {
    set log2 [expr log(2)]
    foreach char [split $str ""] {dict incr counts $char}
    set entropy 0.0
    foreach count [dict values $counts] {
        set freq [expr {$count / double([string length $str])}]
        set entropy [expr {$entropy - $freq * log($freq)/$log2}]
    }
    return $entropy
}</lang>

Demonstration:

<lang tcl>puts [format "entropy = %.5f" [entropy "1223334444"]]
puts [format "entropy = %.5f" [entropy "Rosetta Code"]]</lang>
- Output:
entropy = 1.84644
entropy = 3.08496
XPL0
<lang XPL0>code real RlOut=48, Ln=54;      \intrinsic routines
string 0;                               \use zero-terminated strings

func StrLen(A);         \Return number of characters in an ASCIIZ string
char A;
int  I;
for I:= 0, -1>>1-1 do
        if A(I) = 0 then return I;

func real Entropy(Str); \Return Shannon entropy of string
char Str;
int  Len, I, Count(128);
real Sum, Prob;
[Len:= StrLen(Str);
for I:= 0 to 127 do Count(I):= 0;
for I:= 0 to Len-1 do           \count number of each character in string
        Count(Str(I)):= Count(Str(I)) + 1;
Sum:= 0.0;
for I:= 0 to 127 do
        if Count(I) # 0 then    \(avoid Ln(0.0) error)
                [Prob:= float(Count(I)) / float(Len);   \probability of char in string
                Sum:= Sum + Prob*Ln(Prob);
                ];
return -Sum/Ln(2.0);
];

RlOut(0, Entropy("1223334444"))</lang>
- Output:
1.84644