Sorensen–Dice coefficient: Difference between revisions

m (Remove draft tag. Draft for over a year, multiple implementations, little controversy)
Line 201:
0.608696 Fermat numbers
0.600000 Lah numbers </pre>
 
=={{header|Julia}}==
Julia has a multiset package, but it appears to implement intersection as plain set intersection, so a counter from the DataStructures package is used instead, which keeps the minimum count of each token when computing the intersection size. Note that the code indexes strings by byte, so a two-character bigram that contains a 2-byte character occupies 3 bytes. This changes the fifth choice in the last example, because "Erdős" contains the multibyte character 'ő'.
<syntaxhighlight lang="julia">using DataStructures: counter
 
""" convert a phrase into a count of bigram tokens of its words """
function tokenizetext(txt)
    tokens = counter(String)
    words = split(lowercase(txt), r"\s+")
    for w in words
        if length(w) < 3
            # words shorter than 3 characters are kept whole as a single token
            tokens[w] = tokens[w] + 1
        else
            for i in 1:length(w)-1
                # i and i + 1 are byte indices: skip positions inside a multibyte character
                if isvalid(w, i) && isvalid(w, i + 1)
                    tokens[w[i:i+1]] = tokens[w[i:i+1]] + 1
                end
            end
        end
    end
    return tokens
end
 
""" Sorensen-Dice similarity of multisets """
function sorenson_dice(text1, text2)
    bc1, bc2 = tokenizetext(text1), tokenizetext(text2)
    # twice the size of the multiset intersection over the combined token count
    return 2 * sum(values((bc1 ∩ bc2).map)) / (sum(values(bc1)) + sum(values(bc2)))
end
 
const alltasks = split(read("onedrive/documents/julia programs/tasks.txt", String), "\n")
 
# run tests
for test in ["Primordial primes", "Sunkist-Giuliani formula",
             "Sieve of Euripides", "Chowder numbers"]
    taskvalues = sort!([(sorenson_dice(test, t), t) for t in alltasks], rev = true)
    println("\n$test:")
    for (val, task) in taskvalues[begin:begin+4]
        println(lpad(Float16(val), 8), " ", task)
    end
end
</syntaxhighlight>{{out}}
<pre>
Primordial primes:
0.6855 Sequence of primorial primes
0.6665 Factorial primes
0.5713 Primorial numbers
0.5454 Prime words
0.522 Almost prime
 
Sunkist-Giuliani formula:
0.5654 Almkvist-Giullera formula for pi
0.3784 Faulhaber's formula
0.3428 Haversine formula
0.3333 Check Machin-like formulas
0.3076 Resistance calculator
 
Sieve of Euripides:
0.4614 Sieve of Pritchard
0.4614 Four sides of square
0.4138 Sieve of Eratosthenes
0.4 Piprimes
0.3845 Sierpinski curve
 
Chowder numbers:
0.7827 Chowla numbers
0.64 Powerful numbers
0.609 Rhonda numbers
0.609 Fermat numbers
0.609 Erdős–Woods numbers
</pre>
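The note about multibyte characters can be checked in isolation. The following is a minimal sketch, not the entry's method: it indexes by character instead of by byte and uses only Base Julia. The names <code>bigramcounts</code> and <code>dice</code> are hypothetical and were chosen for this illustration. Because it counts bigrams such as "ős" that byte indexing skips, scores for task names containing non-ASCII characters may differ from the listing above.
<syntaxhighlight lang="julia">
""" count character bigrams of each word, indexing by character rather than by byte """
function bigramcounts(txt)
    counts = Dict{String, Int}()
    for w in split(lowercase(txt))
        chars = collect(w)  # Vector{Char}: positions are characters, not bytes
        grams = length(chars) < 3 ? [String(w)] :
                [join(chars[i:i+1]) for i in 1:length(chars)-1]
        for g in grams
            counts[g] = get(counts, g, 0) + 1
        end
    end
    return counts
end

""" Sorensen-Dice coefficient of two bigram multisets stored as Dict counts """
function dice(a, b)
    # multiset intersection size: minimum count of each bigram present in both
    common = sum(min(n, get(b, g, 0)) for (g, n) in a; init = 0)
    return 2 * common / (sum(values(a)) + sum(values(b)))
end

println(dice(bigramcounts("Chowder numbers"), bigramcounts("Chowla numbers")))
println(sort!(collect(keys(bigramcounts("Erdős")))))
</syntaxhighlight>
For the pure-ASCII pair "Chowder numbers" and "Chowla numbers" the two multisets share 9 of their 12 + 11 bigrams, so the sketch gives 18/23 ≈ 0.7826, consistent with the 0.7827 shown above. For "Erdős" it yields the four bigrams "er", "rd", "dő" and "ős".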
 
=={{header|Nim}}==