Sorensen–Dice coefficient: Difference between revisions
(clarification about A ∩ B)
Line 302:
0.6087 Rhonda numbers
0.6000 Lah numbers
</pre>
=={{header|jq}}==
{{Works with|jq}}

'''Works with gojq, the Go implementation of jq'''

'''Works with jaq, the Rust implementation of jq'''

'''Adapted from [[#Wren|Wren]]'''
<syntaxhighlight lang="jq">
### Generic preliminaries

def count(s): reduce s as $x (0; .+1);

def lpad($len): tostring | ($len - length) as $l | (" " * $l) + .;

# Emit the count of the common items in the two given sorted arrays
# viewed as multisets
def count_commonality_of_multisets($A; $B):
  # Returns a stream of the common elements
  def pop:
    .[0] as $i
    | .[1] as $j
    | if $i == ($A|length) or $j == ($B|length) then empty
      elif $A[$i] == $B[$j] then $A[$i], ([$i+1, $j+1] | pop)
      elif $A[$i] < $B[$j] then [$i+1, $j] | pop
      else [$i, $j+1] | pop
      end;
  count([0,0] | pop);

# Emit an array of the normalized bigrams of the input string
def bigrams:
  # Emit a stream of the bigrams of the input string blindly
  def bg: . as $in | range(0;length-1 ) | $in[.:.+2];
  ascii_downcase
  # Split into words on runs of one or more spaces
  | reduce splits(" +") as $word ([];
      . + [$word | bg]);

### The Sorensen-Dice coefficient

def sorensen($a; $b):
  ($a | bigrams | sort) as $A
  | ($b | bigrams | sort) as $B
  | 2 * count_commonality_of_multisets($A; $B) / (($A|length) + ($B|length));

### Exercises

def exercises:
  "Primordial primes",
  "Sunkist-Giuliani formula",
  "Sieve of Euripides",
  "Chowder numbers"
;

[inputs] as $phrases
| exercises as $test
| [ range(0; $phrases|length) as $i
    | [sorensen($phrases[$i]; $test), $phrases[$i] ] ]
| sort_by(first)
| .[-5:]
| reverse
| "\($test) >",
  map( " \(first|tostring|.[:4]|lpad(4)) \(.[1])")[],
  ""
</syntaxhighlight>
{{output}}
Invocation: jq -nrR -f sorensen-dice-coefficient.jq rc_tasks_2022_09_24.txt
<pre>
Primordial primes >
 0.68 Sequence of primorial primes
 0.66 Factorial primes
 0.57 Primorial numbers
 0.54 Prime words
 0.52 Almost prime

Sunkist-Giuliani formula >
 0.56 Almkvist-Giullera formula for pi
 0.37 Faulhaber's formula
 0.34 Haversine formula
 0.33 Check Machin-like formulas
 0.30 Resistance calculator

Sieve of Euripides >
 0.46 Sieve of Pritchard
 0.46 Four sides of square
 0.41 Sieve of Eratosthenes
  0.4 Piprimes
 0.38 Sierpinski curve

Chowder numbers >
 0.78 Chowla numbers
 0.64 Powerful numbers
 0.60 Rhonda numbers
 0.60 Fermat numbers
  0.6 Lah numbers
</pre>
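As a hand check on the last group of results, the coefficient for "Chowder numbers" against "Chowla numbers" can be worked out directly. This assumes, as in the program above, that bigrams are taken word by word after lowercasing and that repeated bigrams are counted as a multiset. "chowder numbers" yields 12 bigrams (ch ho ow wd de er, nu um mb be er rs) and "chowla numbers" yields 11 (ch ho ow wl la, nu um mb be er rs). They share 9: ch, ho, ow and the six from "numbers"; the extra "er" from "chowder" has no partner in the second phrase.

<math>\frac{2 \times 9}{12 + 11} = \frac{18}{23} \approx 0.7826</math>

This agrees with the 0.78 reported for "Chowla numbers". The same arithmetic for "Lah numbers" (8 bigrams, 6 shared) gives 12/20 = 0.6.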