Sorensen–Dice coefficient: Difference between revisions

0.6087 Rhonda numbers
0.6000 Lah numbers
</pre>

=={{header|jq}}==
{{Works with|jq}}

'''Works with gojq, the Go implementation of jq'''

'''Works with jaq, the Rust implementation of jq'''

'''Adapted from [[#Wren|Wren]]'''
<syntaxhighlight lang="jq">
### Generic preliminaries

def count(s): reduce s as $x (0; .+1);

def lpad($len): tostring | ($len - length) as $l | (" " * $l) + .;

# Emit the count of the common items in the two given sorted arrays
# viewed as multisets
def count_commonality_of_multisets($A; $B):
  # Returns a stream of the common elements
  def pop:
    .[0] as $i
    | .[1] as $j
    | if $i == ($A|length) or $j == ($B|length) then empty
      elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
      elif $A[$i] < $B[$j] then [$i+1, $j] | pop
      else [$i, $j+1] | pop
      end;
  count([0,0] | pop);

# Emit an array of the normalized bigrams of the input string
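def bigrams:
  # Emit a stream of the bigrams of the input string blindly
  def bg: . as $in | range(0; length-1) | $in[.:.+2];
  ascii_downcase
  # Split on runs of one or more spaces so that no bigram spans two words;
  # " +" is used rather than " *" because the latter can match the empty string.
  | reduce splits(" +") as $word ([];
      . + [$word | bg]);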


### The Sorensen-Dice coefficient

def sorensen($a; $b):
  ($a | bigrams | sort) as $A
  | ($b | bigrams | sort) as $B
  | 2 * count_commonality_of_multisets($A; $B) / (($A|length) + ($B|length));
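
# Worked example, computed by hand as an illustration (it assumes that
# words are split on spaces and that bigrams are compared as multisets):
#   "Chowder numbers" -> ch ho ow wd de er  nu um mb be er rs   (12 bigrams)
#   "Chowla numbers"  -> ch ho ow wl la     nu um mb be er rs   (11 bigrams)
#   shared bigrams    -> ch ho ow           nu um mb be er rs   ( 9 bigrams)
# "er" occurs twice on the left but only once on the right, so it counts once.
#   coefficient = 2 * 9 / (12 + 11) = 18/23, which is about 0.7826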


### Exercises

def exercises:
  "Primordial primes",
  "Sunkist-Giuliani formula",
  "Sieve of Euripides",
  "Chowder numbers"
;

[inputs] as $phrases
| exercises as $test
| [ range(0; $phrases|length) as $i
    | [sorensen($phrases[$i]; $test), $phrases[$i] ] ]
| sort_by(first)
| .[-5:]
| reverse
| "\($test) >",
  map( " \(first|tostring|.[:4]|lpad(4)) \(.[1])")[],
  ""
</syntaxhighlight>
{{output}}
Invocation: jq -nrR -f sorensen-dice-coefficient.jq rc_tasks_2022_09_24.txt
<pre>
Primordial primes >
0.68 Sequence of primorial primes
0.66 Factorial primes
0.57 Primorial numbers
0.54 Prime words
0.52 Almost prime

Sunkist-Giuliani formula >
0.56 Almkvist-Giullera formula for pi
0.37 Faulhaber's formula
0.34 Haversine formula
0.33 Check Machin-like formulas
0.30 Resistance calculator

Sieve of Euripides >
0.46 Sieve of Pritchard
0.46 Four sides of square
0.41 Sieve of Eratosthenes
0.4 Piprimes
0.38 Sierpinski curve

Chowder numbers >
0.78 Chowla numbers
0.64 Powerful numbers
0.60 Rhonda numbers
0.60 Fermat numbers
0.6 Lah numbers
</pre>
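
'''Spot check of a single pair'''

The following is a minimal, self-contained sketch for checking one pair of phrases without the task-list file. It is not part of the program above: the names <code>counts</code> and <code>dice</code> are hypothetical, and the multiset intersection is computed from per-bigram counts rather than by merging sorted arrays. It assumes that words are separated by spaces and that at least one of the two phrases yields a bigram (otherwise the division fails).
<syntaxhighlight lang="jq">
# Array of lowercase bigrams, taken word by word
def bigrams:
  ascii_downcase
  | [ splits(" +") | . as $w | range(0; ($w|length) - 1) | $w[.:.+2] ];

# Object mapping each item of the input array to its number of occurrences
def counts: reduce .[] as $x ({}; .[$x] += 1);

def dice($a; $b):
  ($a | bigrams) as $A
  | ($b | bigrams) as $B
  | ($A | counts) as $ca
  | ($B | counts) as $cb
  # Size of the multiset intersection: sum of the smaller count per bigram
  | ([ $ca | to_entries[] | [.value, ($cb[.key] // 0)] | min ] | add // 0) as $common
  | 2 * $common / (($A|length) + ($B|length));

dice("Chowder numbers"; "Chowla numbers")
</syntaxhighlight>
Invoked as <code>jq -n -f</code> followed by the name of the file holding the sketch, this should print a value of about 0.78 (that is, 18/23), in line with the "Chowla numbers" entry in the output above.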