Random Latin squares: Difference between revisions

Content added Content deleted
(→‎{{header|jq}}: add algorithm for uniform distribution)
Line 1,561:
'''Also works with gojq, the Go implementation of jq.'''
 
This entry presents two jq programs for generating Latin Squares of order n
The jq program presented in this section uses the Knuth shuffle for
(LS(n)) in accordance with the requirements.
the first row, and then adds cells one-by-one by making a calculated
but fallible selection based on the values in its row and column,
backtracking on failure. The method scales quite well, though the running time is quite variable.
 
The first program uses a "brute-force" algorithm (with simple optimizations) to
For example, to generate a Latin Square of order 10 (LS(10)) typically
generate Latin Squares of order n as though drawing with
takes from 0.11 to 0.14 seconds (u+s) on my 3GHz machine.
replacement from the population of all such Latin Squares.
The chi-squared statistics show good agreement
with the theoretical uniformity.
 
The second program uses a much faster algorithm for generating Latin Squares
To generate LS(15), the jq program typically takes 0.15 to 0.21s; to
in accordance with the requirements, but with bias away from
generate LS(20) takes from about 0.36 to 0.94 seconds; and LS(30)
uniformity, as also illustrated by chi-squared statistics.
about 0.5 to 29 seconds. The example of LS(40) shown below took 12 seconds (u+s) to generate.
 
Both algorithms use /dev/random as a source of entropy. They also
The algorithm will evidently generate every Latin Square of any order eventually, but it's not obvious
both use the Knuth shuffle to generate the first row, and rely on
how far from uniform the distribution might be, so it is perhaps reassuring to know that on a first attempt,
backtracking using the jq idiom:
a run in which 5,760 squares of order 4 were generated did in fact generate all 576 members of LS(4).
 
The Chi-squared statistic value was 3094 (SIGMA(o-10)^2)/10).
`first( repeat( _) )`
 
The first algorithm incrementally adds rows, backtracking to the
point immediately after the selection of the first row. For n larger
than about 4, this algorithm is quite slow, though in a fairly predictable way.
 
The second algorithm incrementally adds cells, backtracking to the
last cell. It is much faster but the running time can be quite
variable, as suggested by this table:
<pre>
n Typical range of u+s times on 3GHz machine
10 0.11 to 0.14 s
15 0.15 to 0.21 s
20 0.36 to 0.94 s
30 0.5 to 29 seconds
40 80 seconds to 21 minutes
45 8 to 39 minutes
</pre>
 
An interesting variant of the second algorithm can be obtained by
a trivial modification of just one line (see the comment with "n.b."):
backtracking to the last full row is slightly faster while maintaining
randomness, at the cost of a greater departure from uniform
randomness, as confirmed by these two runs using the same `stats`
function as defined in the first program.
 
<pre>
# Using `ext` (i.e., backtrack to point after selection of first row)
Number of LS(4): 5760
Of 576 possibilities, only 575 were generated.
Chi-squared statistic (df=575): 2128.6
# u+s 5.5s
 
# Using `extend` (i.e. backtrack to point of most recent row extension - faster but more biased)
Number of LS(4): 5760
All 576 possibilities have been generated.
Chi-squared statistic (df=575): 3055.8
# u+s 4.7s
</pre>
 
The program presented here uses /dev/random as its source of entropy.
<syntaxhighlight lang=sh>
#!/bin/bash
Line 1,584 ⟶ 1,623:
</syntaxhighlight>
 
=== Common Functions ===
'''random-latin-squares.jq'''
<syntaxhighlight lang=sh>
 
### Generic Utility Functions
'''Generic Utility Functions'''
# For inclusion using jq's `include` directive:
# Output: a PRN in range(0;$n) where $n is .
def prn:
Line 1,610 ⟶ 1,651:
 
def lpad($len): tostring | ($len - length) as $l | (" " * $l)[:$l] + .;
 
 
# If the input array is not rectangular, let nulls fall where they may
def column($j):
[.[] | .[$j]];
 
# Emit a stream of [value, frequency] pairs
def histogram(stream):
reduce stream as $s ({};
($s|type) as $t
| (if $t == "string" then $s else ($s|tojson) end) as $y
| .[$t][$y][0] = $s
| .[$t][$y][1] += 1 )
| .[][] ;
 
def ss(s): reduce s as $x (0; . + ($x * $x));
 
def chiSquared($expected): ss( .[] - $expected ) / $expected;
</syntaxhighlight>
 
===uniformly-random-latin-squares.jq===
<syntaxhighlight lang=sh>
# Include the utilities e.g. by
# include "random-latin-squares.utilities" {search: "."};
 
# Determine orthogonality of two arrays, confining attention
# to the first $n elements in each:
def orthogonal($a; $b; $n):
first( (range(0; $n) | if $a[.] == $b[.] then 0 else empty end) // 1) | . == 1;
 
# Are the two arrays orthogonal up to the length of the shorter?
def orthogonal($a; $b):
([$a, $b | length] | min) as $min
| orthogonal($a; $b; $min);
 
# Is row $i orthogonal to all the previous rows?
def orthogonal($i):
. as $in
| .[$i] as $row
| all(range(0;$i); orthogonal($row; $in[.]));
 
# verify columnwise orthogonality
def columnwise:
length as $n
| transpose as $t
| all( range(1;$n); . as $i | $t | orthogonal($i)) ;
 
def addLast:
(.[0] | length) as $n
| [range(0; $n)] as $range
| [range(0; $n) as $i
| ($range - column($i))[0] ] as $last
| . + [$last] ;
 
# input: an array being a permutation of [range(0;$n)] for some $n
# output: a Latin Square selected at random from all the candidates
def extend:
(.[0] | length) as $n
| if length >= $n then .
elif length == $n - 1 then addLast
else ([range(0; $n)] | knuthShuffle) as $row
| (. + [$row] )
| if orthogonal(length - 1) and columnwise then extend else empty end
end ;
 
# Generate a Latin Square.
# The input should be an integer specifying its size.
def latinSquare:
. as $n
| if $n <= 0 then []
else
[ [range(0; $n)] | knuthShuffle]
| first(repeat(extend))
# | (if columnwise then . else debug end) # internal check
end ;
 
# If the input is a positive integer, $n, generate and print an $n x $n Latin Square.
# Otherwise, simply echo it.
def printLatinSquare:
if type == "number"
then latinSquare
| .[] | map(lpad(3)) | join(" ")
else .
end;
 
# $order should be in 1 .. 5 inclusive
# If $n is null, then use 10 * $counts[$order]
def stats($order; $n):
# table of counts:
[0,1,2,12,576,161280] as $counts
| $counts[$order] as $possibilities
| (if $n then $n else 10 * $possibilities end) as $n
| reduce range(0;$n) as $i ({};
($order|latinSquare|flatten|join("")) as $row
| .[$row] += 1)
| # ([histogram(.[])] | sort[] | join(" ")),
"Number of LS(\($order)): \($n)",
(if length == $possibilities
then "All \($possibilities) possibilities have been generated."
else "Of \($possibilities) possibilities, only \(length) were generated."
end),
"Chi-squared statistic (df=\($possibilities-1)): \( [.[]] | chiSquared( $n / $possibilities))";
 
 
stats(3;null), "",
stats(4;5760), ""
stats(4;5760)
</syntaxhighlight>
 
{{output}}
<pre>
Number of LS(3): 120
All 12 possibilities have been generated.
Chi-squared statistic (df=11): 18.8
 
Number of LS(4): 5760
All 576 possibilities have been generated.
Chi-squared statistic (df=575): 572.2
 
Number of LS(4): 5760
All 576 possibilities have been generated.
Chi-squared statistic (df=575): 517.2
</pre>
 
=== Random Latin Squares ===
This is the (much) faster program that meets the task
requirements while deviating from uniform randomness
as suggested by the Chi-squared statistics presented in the preamble.
 
<syntaxhighlight lang=sh>
# Include the utilities e.g. by
# include "random-latin-squares.utilities" {search: "."};
 
# Select an element at random from [range(0;$n)] - column($j)
Line 1,638 ⟶ 1,805:
# if we can complete the row, then there is no need for another backtrack point!
| if $cl == 1 and ($last|length) == $n - 1
then ($good + [ $last + $candidates]) | extendext # n.b. or use `extend` to speed things up at the cost of more bias
else
if $cl == 1 then ($good + [ $last + $candidates]) | ext