Sorensen–Dice coefficient: Difference between revisions

(→‎Julia: file name)
m (→‎Julia: char handling)
Line 203:
 
=={{header|Julia}}==
 
Note that the code splits each word into characters rather than bytes before forming 2-character tokens, so a multibyte character such as the 'ő' in "Erdős" is treated as a single character.
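The difference between bytes and characters that the note refers to can be seen in a minimal sketch. The variable names <code>chars</code> and <code>bigrams</code> are illustrative only and are not part of the entry below.

<syntaxhighlight lang="julia">w = "Erdős"

# In UTF-8 the 'ő' takes two bytes, so byte count and character count differ.
println(ncodeunits(w))  # number of bytes
println(length(w))      # number of characters

# Collecting into a Vector{Char} first lets adjacent characters be paired
# by position without landing on an invalid byte index.
chars = collect(w)
bigrams = [String(chars[i:i+1]) for i in 1:length(chars)-1]
println(bigrams)
</syntaxhighlight>

Indexing the string directly with <code>w[i:i+1]</code> uses byte positions, which is why pairing works reliably only after <code>collect</code>.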
<syntaxhighlight lang="julia">using Multisets
 
Line 214:
push!(tokens, w)
else
a = collect(w)
for i in 1:length(a)-1
push!(tokens, String(a[i:i+1]))
end
end
Line 223 ⟶ 224:
 
""" Sorenson-Dice similarity of multisets """
function sorenson_dice(text1, text2)
bc1, bc2 = tokenizetext(text1), tokenizetext(text2)
return 2 * length(bc1 ∩ bc2) / (length(bc1) + length(bc2))
end
 
const alltasks = split(read("onedrive/documents/julia programs/tasks.txt", String), "\n")
 
# run tests
for test in ["Primordial primes", "Sunkist-Giuliani formula",
"Sieve of Euripides", "Chowder numbers"]
taskvalues = sort!([(sorenson_dice(test, t), t) for t in alltasks], rev = true)
println("\n$test:")
for (val, task) in taskvalues[begin:begin+4]
Line 244 ⟶ 245:
Primordial primes:
0.6855 Sequence of primorial primes
0.6665 Factorial primes
0.5713 Primorial numbers
0.5454 Prime words
0.522 Almost prime
 
Sunkist-Giuliani formula:
Line 268 ⟶ 269:
0.609 Rhonda numbers
0.609 Fermat numbers
0.6 Lah numbers
</pre>
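As a rough cross-check, the coefficient can also be sketched with plain dictionaries instead of the Multisets package. The helpers <code>bigram_counts</code> and <code>dice</code> are hypothetical names. The sketch assumes lowercasing, splitting on whitespace, and keeping one-character words as whole tokens, which may differ from the entry's own tokenizer.

<syntaxhighlight lang="julia"># Hypothetical helper: count character bigrams per word (assumed tokenization).
function bigram_counts(text)
    counts = Dict{String,Int}()
    for w in split(lowercase(text))
        a = collect(w)
        if length(a) < 2
            # a one-character word is kept as a single token
            counts[String(w)] = get(counts, String(w), 0) + 1
        else
            for i in 1:length(a)-1
                t = String(a[i:i+1])
                counts[t] = get(counts, t, 0) + 1
            end
        end
    end
    return counts
end

# Hypothetical helper: 2|A ∩ B| / (|A| + |B|) with multiset (counted) intersection.
function dice(text1, text2)
    c1, c2 = bigram_counts(text1), bigram_counts(text2)
    shared = sum((min(n, get(c2, t, 0)) for (t, n) in c1); init = 0)
    return 2 * shared / (sum(values(c1)) + sum(values(c2)))
end

println(round(dice("Chowder numbers", "Rhonda numbers"), digits = 3))
</syntaxhighlight>

Under these assumptions "Chowder numbers" has 12 bigrams, "Rhonda numbers" has 11, and they share 7, giving 14/23.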
 