Sorensen–Dice coefficient: Difference between revisions

0.600000 Lah numbers
</pre>
 
=={{header|Python}}==
Of the several Python string similarity libraries implementing Sorensen-Dice similarity, none gives the same results as the
Raku library used in the original example, so its behavior is imitated here using multisets, as in the C++ and Wren examples.
<syntaxhighlight lang="python">''' Rosetta Code task rosettacode.org/wiki/Sorensen–Dice_coefficient '''
 
from multiset import Multiset
 
def tokenizetext(txt):
    ''' convert a phrase into a count of bigram tokens of its words '''
    arr = []
    for wrd in txt.lower().split(' '):
        # a one-letter word is its own token; longer words yield overlapping bigrams
        arr += ([wrd] if len(wrd) == 1 else [wrd[i:i+2] for i in range(len(wrd)-1)])
    return Multiset(arr)


def sorenson_dice(s1, s2):
    ''' twice the size of the multiset intersection over the sum of both sizes '''
    bc1, bc2 = tokenizetext(s1), tokenizetext(s2)
    return 2 * len(bc1 & bc2) / (len(bc1) + len(bc2))


with open('tasklist_sorenson.txt', 'r') as fd:
    alltasks = fd.read().split('\n')

for testtext in ['Primordial primes', 'Sunkist-Giuliani formula',
                 'Sieve of Euripides', 'Chowder numbers']:
    taskvalues = sorted([(sorenson_dice(testtext, t), t)
                         for t in alltasks], reverse=True)
    print(f'\n{testtext}:')
    for (val, task) in taskvalues[:5]:
        print(f' {val:.6f} {task}')
</syntaxhighlight>{{out}}
<pre>
Primordial primes:
0.685714 Sequence of primorial primes
0.666667 Factorial primes
0.571429 Primorial numbers
0.545455 Prime words
0.521739 Almost prime
 
Sunkist-Giuliani formula:
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
 
Sieve of Euripides:
0.461538 Sieve of Pritchard
0.461538 Four sides of square
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
 
Chowder numbers:
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers
</pre>
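
As a cross-check on the multiset arithmetic, the same computation can be sketched with only the standard library, assuming <code>collections.Counter</code> is an acceptable stand-in for <code>Multiset</code>. The names <code>bigram_counts</code> and <code>dice</code> are hypothetical and do not appear in the example above. For 'Chowder numbers' against 'Chowla numbers' the bigram multisets hold 12 and 11 tokens and share 9, which gives 2 × 9 / 23, the 0.782609 shown in the output.

```python
from collections import Counter


def bigram_counts(txt):
    """Count per-word bigrams; a one-letter word counts as a single token."""
    counts = Counter()
    for wrd in txt.lower().split(' '):
        if len(wrd) == 1:
            counts[wrd] += 1
        else:
            # overlapping two-character windows within one word
            counts.update(wrd[i:i + 2] for i in range(len(wrd) - 1))
    return counts


def dice(s1, s2):
    """Sorensen-Dice coefficient over bigram multisets."""
    c1, c2 = bigram_counts(s1), bigram_counts(s2)
    shared = sum((c1 & c2).values())   # & keeps the minimum count of each bigram
    total = sum(c1.values()) + sum(c2.values())
    return 2 * shared / total


if __name__ == '__main__':
    print(f"{dice('Chowder numbers', 'Chowla numbers'):.6f}")
```

Here the bigram 'er' occurs twice in 'Chowder numbers' but only once in 'Chowla numbers', so the multiset intersection counts it once, while both occurrences still count toward the denominator.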
 
 
=={{header|Raku}}==