Statistics/Normal distribution: Difference between revisions

Line 1,331:
0.8: *******
0.9: **</pre>
 
=={{header|jq}}==
'''Adapted from [[#Wren|Wren]]'''
{{works with|jq}}
'''Works with gojq, the Go implementation of jq''' (*)
 
Since jq does not have a built-in PRNG, this entry uses an external source
for entropy. For the sake of illustration, we will use /dev/urandom
as follows:
cat /dev/urandom | tr -cd '0-9' | fold -w 10 |
jq -nRr -f normal-distribution.jq
 
To save space, the function that generates the sample does not retain the observations, and for
simplicity, computes the sum of squared observations on the
assumption that overflow will not be an an issue, which is
reasonable as jq arithmetic uses IEEE 754 64-bit numbers.
 
(*) gojq requires an enormous amount of memory to complete the task for N=100,000,
and takes about 20 times longer.
'''Preliminaries'''
<lang jq># Pretty print a number to facilitate alignment of the decimal point.
# Input: a number without an exponent
# Output: a string holding the reformatted number so that there are at least `left` characters
# to the left of the decimal point, and exactly `right` characters to its right.
# Spaces are used for padding on the left, and zeros for padding on the right.
# No left-truncation occurs, so `left` can be specified as 0 to prevent left-padding.
def pp(left; right):
def lpad: tostring | (left - length) as $l | (" " * $l)[:$l] + .;
def rpad:
if (right > length) then . + ((right - length) * "0")
else .[:right]
end;
tostring as $s
| $s
| index(".") as $ix
| ((if $ix then $s[0:$ix] else $s end) | lpad) + "." +
(if $ix then $s[$ix+1:] | .[:right] else "" end | rpad) ;
 
def sigma( stream ): reduce stream as $x (0; . + $x);
 
# Input: {n, sum, ss}
# Output: augmented object with {mean, variance}
def sample_mean_and_variance:
.mean = (.sum/.n)
| .variance = ((.ss / .n) - .mean*.mean);</lang>
'''The Task'''
<lang jq># Task parameters
def parameters: {
N: 100000,
NUM_BINS: 12,
HIST_CHAR: "■",
HIST_CHAR_ALT: "-",
HIST_CHAR_SIZE: null, # null means compute dynamically
binSize: 0.1,
mu: 0.5,
sigma: 0.25 }
| .bins = [range(0; .NUM_BINS) | 0] ;
 
# input: an array of two iid rvs on [0,1]
# output: [z0, z1] as per the Box-Muller method -- see
# https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
def normal(mu; sigma):
def pi: (1|atan) * 4;
. as [$u1, $u2]
| pi as $pi
| (sigma * ((-2 * ($u1|log))|sqrt)) as $mag
| [ $mag * ((2 * $pi * $u2)|cos) + mu,
$mag * ((2 * $pi * $u2)|sin) + mu ] ;
 
# Generate a random sample as specified by ., the task object (see `parameters`).
# Output: updated task object with sample statistics and .bins for creating a histogram.
# Each call to `input` should yield a string of random decimal digits
# such that the ensemble of ("0." + input | tonumber) can be considered to be iid rv on [0,1].
def generate:
# uniformly distributed random variable on [0,1]:
def udrv: "0." + input | tonumber;
# Maybe compute the bucket size:
(.HIST_CHAR_SIZE = (.HIST_CHAR_SIZE // (.N / (.NUM_BINS * 20) | ceil))) as $p
| reduce range(0; $p.N/2) as $i ($p;
([udrv, udrv] | normal($p.mu; $p.sigma)) as $rns
| reduce (0,1) as $j (.;
$rns[$j] as $rn
| .n += 1
| .sum += $rn
| .ss += ($rn*$rn)
| (if $rn < 0 then 0
elif $rn >= 1 then ($p.NUM_BINS - 1)
else ($rn/.binSize)|floor + 1
end ) as $bn
| .bins[$bn] += 1
# to retain the observations: .samples[$i*2 + $j] = $rn
)) ;
 
# Input: an object with
# {NUM_BINS, HIST_CHAR_SIZE, HIST_CHAR, HIST_CHAR_ALT, binSize, bins}
# Output: a stream of strings
def histogram:
def tidy: pp(2;1);
range(0; .NUM_BINS) as $i
| ((.bins[$i] / .HIST_CHAR_SIZE)|floor) as $bs
| (if $i == 0 or $i == .NUM_BINS -1
then .HIST_CHAR_ALT else .HIST_CHAR end) as $char
| (if $bs == 0 then "" else $char * $bs end) as $hist
| if $i == 0
then " -∞ ..< 0.0 \($hist)" # .bins[0]
elif ($i < .NUM_BINS - 1)
then "\(.binSize * ($i-1) | tidy) ..<\(.binSize * $i|tidy) \($hist)" # .bins[$i]]
else " 1.0 .. +∞ \($hist)" # .bins[.NUM_BINS - 1]
end;
 
def task:
parameters
| generate
| sample_mean_and_variance as $mv
| (if .HIST_CHAR_SIZE == 1 then "" else "s" end) as $plural
| "Summary statistics for \(.N) observations from N(\(.mu), \(.sigma)):",
" mean: \($mv.mean | pp(2;4))",
" variance: \($mv.variance | pp(2;4))",
" unadjusted stddev: \($mv.variance | sqrt | pp(2;4))",
" Range Number of observations (each \(.HIST_CHAR) represents \(.HIST_CHAR_SIZE) observation\($plural))",
histogram ;
 
task</lang>
{{out}}
<pre>
Summary statistics for 100000 observations from N(0.5, 0.25):
mean: 0.5001
variance: 0.0622
unadjusted stddev: 0.2495
Range Number of observations (each ■ represents 417 observations)
-∞ ..< 0.0 -----
0.0 ..< 0.1 ■■■■■■■■
0.1 ..< 0.2 ■■■■■■■■■■■■■■
0.2 ..< 0.3 ■■■■■■■■■■■■■■■■■■■■■■■
0.3 ..< 0.4 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.4 ..< 0.5 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.5 ..< 0.6 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.6 ..< 0.7 ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.7 ..< 0.8 ■■■■■■■■■■■■■■■■■■■■■■■
0.8 ..< 0.9 ■■■■■■■■■■■■■■
0.9 ..< 1.0 ■■■■■■■■
1.0 .. +∞ ------
</pre>
 
=={{header|Julia}}==
2,471

edits