Random Latin squares: Difference between revisions
Content added Content deleted
(→{{header|Factor}}: uniformity) |
(→{{header|jq}}: add algorithm for uniform distribution) |
||
Line 1,561: | Line 1,561: | ||
'''Also works with gojq, the Go implementation of jq.''' |
'''Also works with gojq, the Go implementation of jq.''' |
||
This entry presents two jq programs for generating Latin Squares of order n |
|||
The jq program presented in this section uses the Knuth shuffle for |
|||
(LS(n)) in accordance with the requirements. |
|||
the first row, and then adds cells one-by-one by making a calculated |
|||
but fallible selection based on the values in its row and column, |
|||
backtracking on failure. The method scales quite well, though the running time is quite variable. |
|||
The first program uses a "brute-force" algorithm (with simple optimizations) to |
|||
For example, to generate a Latin Square of order 10 (LS(10)) typically |
|||
generate Latin Squares of order n as though drawing with |
|||
takes from 0.11 to 0.14 seconds (u+s) on my 3GHz machine. |
|||
replacement from the population of all such Latin Squares. |
|||
The chi-squared statistics show good agreement |
|||
with the theoretical uniformity. |
|||
The second program uses a much faster algorithm for generating Latin Squares |
|||
To generate LS(15), the jq program typically takes 0.15 to 0.21s; to |
|||
in accordance with the requirements, but with bias away from |
|||
generate LS(20) takes from about 0.36 to 0.94 seconds; and LS(30) |
|||
uniformity, as also illustrated by chi-squared statistics. |
|||
about 0.5 to 29 seconds. The example of LS(40) shown below took 12 seconds (u+s) to generate. |
|||
Both algorithms use /dev/random as a source of entropy. They also |
|||
The algorithm will evidently generate every Latin Square of any order eventually, but it's not obvious |
|||
both use the Knuth shuffle to generate the first row, and rely on |
|||
how far from uniform the distribution might be, so it is perhaps reassuring to know that on a first attempt, |
|||
backtracking using the jq idiom: |
|||
a run in which 5,760 squares of order 4 were generated did in fact generate all 576 members of LS(4). |
|||
The Chi-squared statistic value was 3094 (SIGMA(o-10)^2)/10). |
|||
`first( repeat( _) )` |
|||
The first algorithm incrementally adds rows, backtracking to the |
|||
point immediately after the selection of the first row. For n larger |
|||
than about 4, this algorithm is quite slow, though in a fairly predictable way. |
|||
The second algorithm incrementally adds cells, backtracking to the |
|||
last cell. It is much faster but the running time can be quite |
|||
variable, as suggested by this table: |
|||
<pre> |
|||
n Typical range of u+s times on 3GHz machine |
|||
10 0.11 to 0.14 s |
|||
15 0.15 to 0.21 s |
|||
20 0.36 to 0.94 s |
|||
30 0.5 to 29 seconds |
|||
40 80 seconds to 21 minutes |
|||
45 8 to 39 minutes |
|||
</pre> |
|||
An interesting variant of the second algorithm can be obtained by |
|||
a trivial modification of just one line (see the comment with "n.b."): |
|||
backtracking to the last full row is slightly faster while maintaining |
|||
randomness, at the cost of a greater departure from uniform |
|||
randomness, as confirmed by these two runs using the same `stats` |
|||
function as defined in the first program. |
|||
<pre> |
|||
# Using `ext` (i.e., backtrack to point after selection of first row) |
|||
Number of LS(4): 5760 |
|||
Of 576 possibilities, only 575 were generated. |
|||
Chi-squared statistic (df=575): 2128.6 |
|||
# u+s 5.5s |
|||
# Using `extend` (i.e. backtrack to point of most recent row extension - faster but more biased) |
|||
Number of LS(4): 5760 |
|||
All 576 possibilities have been generated. |
|||
Chi-squared statistic (df=575): 3055.8 |
|||
# u+s 4.7s |
|||
</pre> |
|||
The program presented here uses /dev/random as its source of entropy. |
|||
<syntaxhighlight lang=sh> |
<syntaxhighlight lang=sh> |
||
#!/bin/bash |
#!/bin/bash |
||
Line 1,584: | Line 1,623: | ||
</syntaxhighlight> |
</syntaxhighlight> |
||
=== Common Functions === |
|||
'''random-latin-squares.jq''' |
|||
<syntaxhighlight lang=sh> |
<syntaxhighlight lang=sh> |
||
### Generic Utility Functions |
|||
'''Generic Utility Functions''' |
|||
# For inclusion using jq's `include` directive: |
|||
# Output: a PRN in range(0;$n) where $n is . |
# Output: a PRN in range(0;$n) where $n is . |
||
def prn: |
def prn: |
||
Line 1,610: | Line 1,651: | ||
def lpad($len): tostring | ($len - length) as $l | (" " * $l)[:$l] + .; |
def lpad($len): tostring | ($len - length) as $l | (" " * $l)[:$l] + .; |
||
# If the input array is not rectangular, let nulls fall where they may |
# If the input array is not rectangular, let nulls fall where they may |
||
def column($j): |
def column($j): |
||
[.[] | .[$j]]; |
[.[] | .[$j]]; |
||
# Emit a stream of [value, frequency] pairs |
|||
def histogram(stream): |
|||
reduce stream as $s ({}; |
|||
($s|type) as $t |
|||
| (if $t == "string" then $s else ($s|tojson) end) as $y |
|||
| .[$t][$y][0] = $s |
|||
| .[$t][$y][1] += 1 ) |
|||
| .[][] ; |
|||
def ss(s): reduce s as $x (0; . + ($x * $x)); |
|||
def chiSquared($expected): ss( .[] - $expected ) / $expected; |
|||
</syntaxhighlight> |
|||
===uniformly-random-latin-squares.jq=== |
|||
<syntaxhighlight lang=sh> |
|||
# Include the utilities e.g. by |
|||
# include "random-latin-squares.utilities" {search: "."}; |
|||
# Determine orthogonality of two arrays, confining attention |
|||
# to the first $n elements in each: |
|||
def orthogonal($a; $b; $n): |
|||
first( (range(0; $n) | if $a[.] == $b[.] then 0 else empty end) // 1) | . == 1; |
|||
# Are the two arrays orthogonal up to the length of the shorter? |
|||
def orthogonal($a; $b): |
|||
([$a, $b | length] | min) as $min |
|||
| orthogonal($a; $b; $min); |
|||
# Is row $i orthogonal to all the previous rows? |
|||
def orthogonal($i): |
|||
. as $in |
|||
| .[$i] as $row |
|||
| all(range(0;$i); orthogonal($row; $in[.])); |
|||
# verify columnwise orthogonality |
|||
def columnwise: |
|||
length as $n |
|||
| transpose as $t |
|||
| all( range(1;$n); . as $i | $t | orthogonal($i)) ; |
|||
def addLast: |
|||
(.[0] | length) as $n |
|||
| [range(0; $n)] as $range |
|||
| [range(0; $n) as $i |
|||
| ($range - column($i))[0] ] as $last |
|||
| . + [$last] ; |
|||
# input: an array being a permutation of [range(0;$n)] for some $n |
|||
# output: a Latin Square selected at random from all the candidates |
|||
def extend: |
|||
(.[0] | length) as $n |
|||
| if length >= $n then . |
|||
elif length == $n - 1 then addLast |
|||
else ([range(0; $n)] | knuthShuffle) as $row |
|||
| (. + [$row] ) |
|||
| if orthogonal(length - 1) and columnwise then extend else empty end |
|||
end ; |
|||
# Generate a Latin Square. |
|||
# The input should be an integer specifying its size. |
|||
def latinSquare: |
|||
. as $n |
|||
| if $n <= 0 then [] |
|||
else |
|||
[ [range(0; $n)] | knuthShuffle] |
|||
| first(repeat(extend)) |
|||
# | (if columnwise then . else debug end) # internal check |
|||
end ; |
|||
# If the input is a positive integer, $n, generate and print an $n x $n Latin Square. |
|||
# Otherwise, simply echo it. |
|||
def printLatinSquare: |
|||
if type == "number" |
|||
then latinSquare |
|||
| .[] | map(lpad(3)) | join(" ") |
|||
else . |
|||
end; |
|||
# $order should be in 1 .. 5 inclusive |
|||
# If $n is null, then use 10 * $counts[$order] |
|||
def stats($order; $n): |
|||
# table of counts: |
|||
[0,1,2,12,576,161280] as $counts |
|||
| $counts[$order] as $possibilities |
|||
| (if $n then $n else 10 * $possibilities end) as $n |
|||
| reduce range(0;$n) as $i ({}; |
|||
($order|latinSquare|flatten|join("")) as $row |
|||
| .[$row] += 1) |
|||
| # ([histogram(.[])] | sort[] | join(" ")), |
|||
"Number of LS(\($order)): \($n)", |
|||
(if length == $possibilities |
|||
then "All \($possibilities) possibilities have been generated." |
|||
else "Of \($possibilities) possibilities, only \(length) were generated." |
|||
end), |
|||
"Chi-squared statistic (df=\($possibilities-1)): \( [.[]] | chiSquared( $n / $possibilities))"; |
|||
stats(3;null), "", |
|||
stats(4;5760), "" |
|||
stats(4;5760) |
|||
</syntaxhighlight> |
|||
{{output}} |
|||
<pre> |
|||
Number of LS(3): 120 |
|||
All 12 possibilities have been generated. |
|||
Chi-squared statistic (df=11): 18.8 |
|||
Number of LS(4): 5760 |
|||
All 576 possibilities have been generated. |
|||
Chi-squared statistic (df=575): 572.2 |
|||
Number of LS(4): 5760 |
|||
All 576 possibilities have been generated. |
|||
Chi-squared statistic (df=575): 517.2 |
|||
</pre> |
|||
=== Random Latin Squares === |
|||
This is the (much) faster program that meets the task |
|||
requirements while deviating from uniform randomness |
|||
as suggested by the Chi-squared statistics presented in the preamble. |
|||
<syntaxhighlight lang=sh> |
|||
# Include the utilities e.g. by |
|||
# include "random-latin-squares.utilities" {search: "."}; |
|||
# Select an element at random from [range(0;$n)] - column($j) |
# Select an element at random from [range(0;$n)] - column($j) |
||
Line 1,638: | Line 1,805: | ||
# if we can complete the row, then there is no need for another backtrack point! |
# if we can complete the row, then there is no need for another backtrack point! |
||
| if $cl == 1 and ($last|length) == $n - 1 |
| if $cl == 1 and ($last|length) == $n - 1 |
||
then ($good + [ $last + $candidates]) | |
then ($good + [ $last + $candidates]) | ext # n.b. or use `extend` to speed things up at the cost of more bias |
||
else |
else |
||
if $cl == 1 then ($good + [ $last + $candidates]) | ext |
if $cl == 1 then ($good + [ $last + $candidates]) | ext |