Random Latin squares: Difference between revisions
Content added Content deleted
(→{{header|Factor}}: uniformity) |
(→{{header|jq}}: add algorithm for uniform distribution) |
||
Line 1,561:
'''Also works with gojq, the Go implementation of jq.'''
This entry presents two jq programs for generating Latin Squares of order n
(LS(n)) in accordance with the requirements.
The first program uses a "brute-force" algorithm (with simple optimizations) to
generate Latin Squares of order n as though drawing with
replacement from the population of all such Latin Squares.
The chi-squared statistics show good agreement
with the theoretical uniformity.
The second program uses a much faster algorithm for generating Latin Squares
in accordance with the requirements, but with bias away from
uniformity, as also illustrated by chi-squared statistics.
Both algorithms use /dev/random as a source of entropy. They also
both use the Knuth shuffle to generate the first row, and rely on
backtracking using the jq idiom:
`first( repeat( _) )`
The first algorithm incrementally adds rows, backtracking to the
point immediately after the selection of the first row. For n larger
than about 4, this algorithm is quite slow, though in a fairly predictable way.
The second algorithm incrementally adds cells, backtracking to the
last cell. It is much faster but the running time can be quite
variable, as suggested by this table:
<pre>
n Typical range of u+s times on 3GHz machine
10 0.11 to 0.14 s
15 0.15 to 0.21 s
20 0.36 to 0.94 s
30 0.5 to 29 seconds
40 80 seconds to 21 minutes
45 8 to 39 minutes
</pre>
An interesting variant of the second algorithm can be obtained by
a trivial modification of just one line (see the comment with "n.b."):
backtracking to the last full row is slightly faster while maintaining
randomness, at the cost of a greater departure from uniform
randomness, as confirmed by these two runs using the same `stats`
function as defined in the first program.
<pre>
# Using `ext` (i.e., backtrack to point after selection of first row)
Number of LS(4): 5760
Of 576 possibilities, only 575 were generated.
Chi-squared statistic (df=575): 2128.6
# u+s 5.5s
# Using `extend` (i.e. backtrack to point of most recent row extension - faster but more biased)
Number of LS(4): 5760
All 576 possibilities have been generated.
Chi-squared statistic (df=575): 3055.8
# u+s 4.7s
</pre>
<syntaxhighlight lang=sh>
#!/bin/bash
Line 1,584 ⟶ 1,623:
</syntaxhighlight>
=== Common Functions ===
<syntaxhighlight lang=sh>
'''Generic Utility Functions'''
# For inclusion using jq's `include` directive:
# Output: a PRN in range(0;$n) where $n is .
def prn:
Line 1,610 ⟶ 1,651:
def lpad($len): tostring | ($len - length) as $l | (" " * $l)[:$l] + .;
# If the input array is not rectangular, let nulls fall where they may
def column($j):
[.[] | .[$j]];
# Emit a stream of [value, frequency] pairs
def histogram(stream):
reduce stream as $s ({};
($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y][0] = $s
| .[$t][$y][1] += 1 )
| .[][] ;
def ss(s): reduce s as $x (0; . + ($x * $x));
def chiSquared($expected): ss( .[] - $expected ) / $expected;
</syntaxhighlight>
===uniformly-random-latin-squares.jq===
<syntaxhighlight lang=sh>
# Include the utilities e.g. by
# include "random-latin-squares.utilities" {search: "."};
# Determine orthogonality of two arrays, confining attention
# to the first $n elements in each:
def orthogonal($a; $b; $n):
first( (range(0; $n) | if $a[.] == $b[.] then 0 else empty end) // 1) | . == 1;
# Are the two arrays orthogonal up to the length of the shorter?
def orthogonal($a; $b):
([$a, $b | length] | min) as $min
| orthogonal($a; $b; $min);
# Is row $i orthogonal to all the previous rows?
def orthogonal($i):
. as $in
| .[$i] as $row
| all(range(0;$i); orthogonal($row; $in[.]));
# verify columnwise orthogonality
def columnwise:
length as $n
| transpose as $t
| all( range(1;$n); . as $i | $t | orthogonal($i)) ;
def addLast:
(.[0] | length) as $n
| [range(0; $n)] as $range
| [range(0; $n) as $i
| ($range - column($i))[0] ] as $last
| . + [$last] ;
# input: an array being a permutation of [range(0;$n)] for some $n
# output: a Latin Square selected at random from all the candidates
def extend:
(.[0] | length) as $n
| if length >= $n then .
elif length == $n - 1 then addLast
else ([range(0; $n)] | knuthShuffle) as $row
| (. + [$row] )
| if orthogonal(length - 1) and columnwise then extend else empty end
end ;
# Generate a Latin Square.
# The input should be an integer specifying its size.
def latinSquare:
. as $n
| if $n <= 0 then []
else
[ [range(0; $n)] | knuthShuffle]
| first(repeat(extend))
# | (if columnwise then . else debug end) # internal check
end ;
# If the input is a positive integer, $n, generate and print an $n x $n Latin Square.
# Otherwise, simply echo it.
def printLatinSquare:
if type == "number"
then latinSquare
| .[] | map(lpad(3)) | join(" ")
else .
end;
# $order should be in 1 .. 5 inclusive
# If $n is null, then use 10 * $counts[$order]
def stats($order; $n):
# table of counts:
[0,1,2,12,576,161280] as $counts
| $counts[$order] as $possibilities
| (if $n then $n else 10 * $possibilities end) as $n
| reduce range(0;$n) as $i ({};
($order|latinSquare|flatten|join("")) as $row
| .[$row] += 1)
| # ([histogram(.[])] | sort[] | join(" ")),
"Number of LS(\($order)): \($n)",
(if length == $possibilities
then "All \($possibilities) possibilities have been generated."
else "Of \($possibilities) possibilities, only \(length) were generated."
end),
"Chi-squared statistic (df=\($possibilities-1)): \( [.[]] | chiSquared( $n / $possibilities))";
stats(3;null), "",
stats(4;5760), ""
stats(4;5760)
</syntaxhighlight>
{{output}}
<pre>
Number of LS(3): 120
All 12 possibilities have been generated.
Chi-squared statistic (df=11): 18.8
Number of LS(4): 5760
All 576 possibilities have been generated.
Chi-squared statistic (df=575): 572.2
Number of LS(4): 5760
All 576 possibilities have been generated.
Chi-squared statistic (df=575): 517.2
</pre>
=== Random Latin Squares ===
This is the (much) faster program that meets the task
requirements while deviating from uniform randomness
as suggested by the Chi-squared statistics presented in the preamble.
<syntaxhighlight lang=sh>
# Include the utilities e.g. by
# include "random-latin-squares.utilities" {search: "."};
# Select an element at random from [range(0;$n)] - column($j)
Line 1,638 ⟶ 1,805:
# if we can complete the row, then there is no need for another backtrack point!
| if $cl == 1 and ($last|length) == $n - 1
then ($good + [ $last + $candidates]) |
else
if $cl == 1 then ($good + [ $last + $candidates]) | ext
|