Sorensen–Dice coefficient: Difference between revisions
(→{{header|Phix}}: As per the comments for J, stop trying to use sets.)
m (Python example)
Line 307:
0.600000 Lah numbers
</pre>
=={{header|Python}}==
Of the several Python string similarity libraries that implement Sorensen–Dice similarity, none gives the same results as the original example's Raku library, so this entry imitates it using multisets, as in the C++ and Wren examples.
<syntaxhighlight lang="python">''' Rosetta Code task rosettacode.org/wiki/Sorensen–Dice_coefficient '''

from multiset import Multiset

def tokenizetext(txt):
    ''' convert a phrase into a count of bigram tokens of its words '''
    arr = []
    for wrd in txt.lower().split(' '):
        # a one-letter word is its own token; longer words yield overlapping bigrams
        arr += ([wrd] if len(wrd) == 1 else [wrd[i:i+2] for i in range(len(wrd)-1)])
    return Multiset(arr)

def sorenson_dice(s1, s2):
    ''' twice the multiset intersection size over the summed token counts '''
    bc1, bc2 = tokenizetext(s1), tokenizetext(s2)
    return 2 * len(bc1 & bc2) / (len(bc1) + len(bc2))

with open('tasklist_sorenson.txt', 'r') as fd:
    alltasks = fd.read().split('\n')

for testtext in ['Primordial primes', 'Sunkist-Giuliani formula',
                 'Sieve of Euripides', 'Chowder numbers']:
    taskvalues = sorted([(sorenson_dice(testtext, t), t)
                         for t in alltasks], reverse=True)
    print(f'\n{testtext}:')
    for (val, task) in taskvalues[:5]:
        print(f' {val:.6f} {task}')
</syntaxhighlight>{{out}}
<pre>

Primordial primes:
 0.685714 Sequence of primorial primes
 0.666667 Factorial primes
 0.571429 Primorial numbers
 0.545455 Prime words
 0.521739 Almost prime

Sunkist-Giuliani formula:
 0.565217 Almkvist-Giullera formula for pi
 0.378378 Faulhaber's formula
 0.342857 Haversine formula
 0.333333 Check Machin-like formulas
 0.307692 Resistance calculator

Sieve of Euripides:
 0.461538 Sieve of Pritchard
 0.461538 Four sides of square
 0.413793 Sieve of Eratosthenes
 0.400000 Piprimes
 0.384615 Sierpinski curve

Chowder numbers:
 0.782609 Chowla numbers
 0.640000 Powerful numbers
 0.608696 Rhonda numbers
 0.608696 Fermat numbers
 0.600000 Lah numbers
</pre>
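The first "Chowder numbers" row above can be checked by hand. The following is a minimal sketch, not part of the original entry: it assumes the standard library's <code>collections.Counter</code> as a stand-in for the third-party <code>multiset</code> package, and the helper names <code>bigrams</code> and <code>dice</code> are hypothetical.

```python
from collections import Counter

def bigrams(phrase):
    """Count the bigrams of each word; a one-letter word is its own token."""
    tokens = []
    for word in phrase.lower().split(' '):
        if len(word) == 1:
            tokens.append(word)
        else:
            tokens.extend(word[i:i+2] for i in range(len(word) - 1))
    return Counter(tokens)

def dice(s1, s2):
    """Sorensen-Dice coefficient over bigram multisets (illustrative helper)."""
    c1, c2 = bigrams(s1), bigrams(s2)
    total = sum(c1.values()) + sum(c2.values())
    if total == 0:
        return 0.0  # assumption: two token-free phrases score zero
    shared = sum((c1 & c2).values())  # Counter & Counter keeps the minimum count
    return 2 * shared / total

# 'chowder numbers' has 12 bigrams ('er' twice), 'chowla numbers' has 11,
# and 9 are shared, so the score is 2 * 9 / (12 + 11) = 18/23.
print(f"{dice('Chowder numbers', 'Chowla numbers'):.6f}")  # 0.782609
```

With plain sets the repeated "er" bigram would be counted once on each side, giving 18/22 instead of 18/23, which is why this sketch counts tokens with multiplicity.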
=={{header|Raku}}==