Sorensen–Dice coefficient
{{
The [[wp:Sørensen–Dice coefficient|Sørensen–Dice coefficient]], also known as the Sørensen–Dice index (or sdi, or sometimes by one of the individual names: sorensen or dice), is a statistic used to gauge the similarity of two population samples.
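The original use was in botany, but it has since been applied in other fields as well.
Like the [[Levenshtein distance|Levenshtein]] edit distance, it can serve as a text similarity measure, but Sørensen–Dice is more useful for 'fuzzy' matching of partial text.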
There are several different methods to tokenize objects for Sørensen–Dice comparisons. The most typical tokenizing scheme for text is to break the words up into bi-grams: groups of two consecutive letters.
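For illustration, a minimal Python sketch of bigram tokenizing. It assumes, as the C++, Java and Nim solutions below do, that the text is lower-cased and split into words first, and that a one-letter word is kept as a token of its own (the C++ solution stores it as the pair (letter, '\0') instead, which counts the same way). The helper name `bigrams` is hypothetical.
<syntaxhighlight lang="python">
def bigrams(text):
    """Return the bigram tokens of each lower-cased word, in order."""
    tokens = []
    # str.split() with no argument splits on runs of whitespace
    for word in text.lower().split():
        if len(word) == 1:
            # a one-letter word has no bigram, so keep the letter as a token
            tokens.append(word)
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


print(bigrams("Sieve of Euripides"))
</syntaxhighlight>
This prints the 13 tokens si, ie, ev, ve, of, eu, ur, ri, ip, pi, id, de, es.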
Sørensen–Dice measures the similarity of two groups by dividing twice the intersection token count by the total token count of both groups:
SDC = 2 × |A∩B| / (|A| + |B|)
where A, B and A∩B are to be understood as multisets: if an item x has multiplicity a in A and b in B, then it has multiplicity min(a, b) in A∩B.
The Sørensen–Dice coefficient is thus a ratio between 0.0 (no tokens in common) and 1.0 (identical multisets), giving the "percent similarity" between the two populations.
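As a worked check of the formula, here is a minimal Python sketch; `bigrams` and `sorensen_dice` are hypothetical names, and it assumes the word-level tokenizing used by most solutions below. For "Primordial primes" the bigram multiset has 14 tokens, for "Factorial primes" 13, and 9 tokens are shared once each count is capped at min(a, b), so SDC = 2 × 9 / 27 ≈ 0.666667, the value the solutions report.
<syntaxhighlight lang="python">
from collections import Counter


def bigrams(text):
    """Bigram tokens of each lower-cased word; one-letter words are kept whole."""
    tokens = []
    for word in text.lower().split():
        if len(word) == 1:
            tokens.append(word)
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


def sorensen_dice(text1, text2):
    """SDC = 2 * |A∩B| / (|A| + |B|), with A and B as bigram multisets."""
    a, b = Counter(bigrams(text1)), Counter(bigrams(text2))
    # Counter's & operator keeps min(count in a, count in b) for each key,
    # which is the multiset intersection described above
    common = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * common / total if total else 0.0


print(f"{sorensen_dice('Primordial primes', 'Factorial primes'):.6f}")  # 0.666667
</syntaxhighlight>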
How you get the task names is peripheral to the task: you can [[:Category:Programming_Tasks|web-scrape]] them, [[Sorensen–Dice coefficient/Tasks|download them to a file]], or obtain them any other way.
If there is a built-in or an easily obtained, freely available library implementation of Sørensen–Dice coefficient calculations, it is acceptable to use it, with a pointer to where it may be obtained.
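Every solution below reports the five best-scoring task names for each test phrase. A Python sketch of that ranking step, assuming the task names have already been read into a list (for example one per line from a file); `bigram_counts`, `sorensen_dice` and `best_matches` are hypothetical names. `heapq.nlargest` returns the same items as `sorted(..., reverse=True)[:n]`, so tied scores keep their input order. Tied entries such as Rhonda numbers and Fermat numbers do appear in different orders in the solutions below, depending on how each one breaks ties.
<syntaxhighlight lang="python">
import heapq
from collections import Counter


def bigram_counts(text):
    """Multiset of the bigrams of each lower-cased word (one-letter words kept whole)."""
    counts = Counter()
    for word in text.lower().split():
        if len(word) == 1:
            counts[word] += 1
        else:
            counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def sorensen_dice(a, b):
    """2 * |A∩B| / (|A| + |B|) for two bigram Counters."""
    total = sum(a.values()) + sum(b.values())
    return 2 * sum((a & b).values()) / total if total else 0.0


def best_matches(test, tasks, n=5):
    """Return the n (score, task) pairs with the highest scores, best first."""
    test_counts = bigram_counts(test)  # tokenize the test phrase only once
    scored = ((sorensen_dice(test_counts, bigram_counts(t)), t) for t in tasks)
    # ties keep the order in which the tasks were supplied
    return heapq.nlargest(n, scored, key=lambda pair: pair[0])


tasks = ["Chowla numbers", "Lah numbers", "Rhonda numbers", "Fermat numbers"]
for score, task in best_matches("Chowder numbers", tasks, n=3):
    print(f"{score:.6f} {task}")
</syntaxhighlight>
With these four task names, Chowla numbers ranks first (18/23 ≈ 0.782609), followed by the tied Rhonda numbers and Fermat numbers (14/23 ≈ 0.608696) in the order they were supplied.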
=={{header|C++}}==
{{trans|Wren}}
<syntaxhighlight lang="cpp">#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
using bigram = std::pair<char, char>;
// Lower-case each whitespace-separated word and collect its bigrams;
// a one-letter word is stored as the bigram (letter, '\0').
std::multiset<bigram> bigrams(const std::string& phrase) {
    std::multiset<bigram> result;
    std::istringstream is(phrase);
    std::string word;
    while (is >> word) {
        for (char& ch : word) {
            ch = std::tolower(static_cast<unsigned char>(ch));
        }
        size_t length = word.size();
        if (length == 1) {
            result.emplace(word[0], '\0');
        } else {
            for (size_t i = 0; i + 1 < length; ++i) {
                result.emplace(word[i], word[i + 1]);
            }
        }
    }
    return result;
}
// SDC = 2|A∩B| / (|A| + |B|). On sorted ranges, std::set_intersection keeps
// min(a, b) copies of each element, i.e. the multiset intersection.
double sorensen(const std::string& s1, const std::string& s2) {
    auto a = bigrams(s1);
    auto b = bigrams(s2);
    const size_t total = a.size() + b.size();
    if (total == 0) {
        return 0.0;  // avoid 0/0 when neither phrase has any bigrams
    }
    std::multiset<bigram> c;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(c, c.begin()));
    return (2.0 * c.size()) / total;
}
int main() {
    std::vector<std::string> tasks;
    std::ifstream is("tasks.txt");
    if (!is) {
        std::cerr << "Cannot open tasks file.\n";
        return EXIT_FAILURE;
    }
    std::string task;
    while (std::getline(is, task)) {
        tasks.push_back(task);
    }
    const size_t tc = tasks.size();
    const size_t top = std::min<size_t>(5, tc);  // guard against fewer than 5 tasks
    const std::string tests[] = {"Primordial primes",
                                 "Sunkist-Giuliani formula",
                                 "Sieve of Euripides", "Chowder numbers"};
    std::vector<std::pair<double, size_t>> sdi(tc);
    std::cout << std::fixed;
    for (const std::string& test : tests) {
        for (size_t i = 0; i != tc; ++i) {
            sdi[i] = std::make_pair(sorensen(tasks[i], test), i);
        }
        std::partial_sort(sdi.begin(), sdi.begin() + top, sdi.end(),
                          [](const std::pair<double, size_t>& a,
                             const std::pair<double, size_t>& b) {
                              return a.first > b.first;
                          });
        std::cout << test << " >\n";
        for (size_t i = 0; i < top; ++i) {
            std::cout << " " << sdi[i].first << ' ' << tasks[sdi[i].second]
                      << '\n';
        }
        std::cout << '\n';
    }
    return EXIT_SUCCESS;
}</syntaxhighlight>
{{out}}
<pre>
Primordial primes >
0.685714 Sequence of primorial primes
0.666667 Factorial primes
0.571429 Primorial numbers
0.545455 Prime words
0.521739 Almost prime
Sunkist-Giuliani formula >
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
Sieve of Euripides >
0.461538 Four sides of square
0.461538 Sieve of Pritchard
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
Chowder numbers >
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers
</pre>
=={{header|J}}==
fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight>
The trick here is the concept of "intersection" which we must use. We can't use set intersection -- the current draft task description
But if A and B are sets, each containing the same tokens, the result here using cardinality of sets would be 2 rather than 1.
Instead, we
With this implementation, here are the results for the task examples:
0.608696 Fermat numbers
0.600000 Lah numbers </pre>
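The J notes above turn on which intersection is meant. A small Python sketch (function names hypothetical, non-empty inputs assumed) contrasts the multiset intersection from the task description with a plain set intersection, in which repeated bigrams collapse to one. For "Primordial primes" against "Factorial primes", the multiset version gives 2 × 9 / 27 ≈ 0.666667, as in the outputs on this page. The set version instead finds 8 shared bigrams among 11 and 12 distinct ones, giving 2 × 8 / 23 ≈ 0.695652.
<syntaxhighlight lang="python">
from collections import Counter


def bigrams(text):
    """Bigram tokens of each lower-cased word (one-letter words kept whole)."""
    tokens = []
    for word in text.lower().split():
        if len(word) == 1:
            tokens.append(word)
        else:
            tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


def dice_multiset(t1, t2):
    """Task definition: A and B are multisets (assumes non-empty inputs)."""
    a, b = Counter(bigrams(t1)), Counter(bigrams(t2))
    # & keeps min(count in A, count in B) for each bigram
    return 2 * sum((a & b).values()) / (sum(a.values()) + sum(b.values()))


def dice_set(t1, t2):
    """Variant for comparison: repeated bigrams are counted only once."""
    a, b = set(bigrams(t1)), set(bigrams(t2))
    return 2 * len(a & b) / (len(a) + len(b))


print(f"{dice_multiset('Primordial primes', 'Factorial primes'):.6f}")
print(f"{dice_set('Primordial primes', 'Factorial primes'):.6f}")
</syntaxhighlight>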
=={{header|Java}}==
<syntaxhighlight lang="java">
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public final class SorensenDiceCoefficient {
    public static void main(String[] args) throws IOException {
        List<String> tasks = Files.readAllLines(Path.of("Rosetta Code Tasks.dat"), StandardCharsets.UTF_8);
        List<String> tests = List.of(
            "Primordial primes", "Sunkist-Giuliani formula", "Sieve of Euripides", "Chowder numbers" );
        record TaskValue(String task, double value) {}
        for ( String test : tests ) {
            List<TaskValue> taskValues = new ArrayList<TaskValue>();
            Map<String, Integer> bigramsTest = createBigrams(test);
            for ( String task : tasks ) {
                double value = sorensenDice(bigramsTest, createBigrams(task));
                taskValues.add( new TaskValue(task, value) );
            }
            // Descending by coefficient; the sort is stable, so ties keep file order
            Collections.sort(taskValues, (one, two) -> Double.compare(two.value, one.value));
            System.out.println(test + ":");
            for ( int i = 0; i < Math.min(5, taskValues.size()); i++ ) {
                System.out.println(String.format("%s%.4f%s%s",
                    " ", taskValues.get(i).value, " ", taskValues.get(i).task));
            }
            System.out.println();
        }
    }
    // Multiset intersection: each common bigram counts min(countOne, countTwo) times
    private static double sorensenDice(Map<String, Integer> bigramsOne, Map<String, Integer> bigramsTwo) {
        int intersectionSize = 0;
        for ( Map.Entry<String, Integer> entry : bigramsOne.entrySet() ) {
            if ( bigramsTwo.containsKey(entry.getKey()) ) {
                intersectionSize += Math.min(entry.getValue(), bigramsTwo.get(entry.getKey()));
            }
        }
        final int total = size(bigramsOne) + size(bigramsTwo);
        return ( total == 0 ) ? 0.0 : 2.0 * intersectionSize / total;
    }
    // Count the bigrams of each lower-cased word; a one-letter word counts as itself
    private static Map<String, Integer> createBigrams(String text) {
        Map<String, Integer> result = new HashMap<String, Integer>();
        for ( String word : text.toLowerCase().split(" ") ) {
            if ( word.length() == 1 ) {
                result.merge(word, 1, Integer::sum);
            } else {
                for ( int i = 0; i < word.length() - 1; i++ ) {
                    result.merge(word.substring(i, i + 2), 1, Integer::sum);
                }
            }
        }
        return result;
    }
    // Cardinality of the multiset: the sum of all counts
    private static int size(Map<String, Integer> map) {
        return map.values().stream().mapToInt(Integer::intValue).sum();
    }
}
</syntaxhighlight>
{{ out }}
<pre>
Primordial primes:
0.6857 Sequence of primorial primes
0.6667 Factorial primes
0.5714 Primorial numbers
0.5455 Prime words
0.5217 Almost prime
Sunkist-Giuliani formula:
0.5652 Almkvist-Giullera formula for pi
0.3784 Faulhaber's formula
0.3429 Haversine formula
0.3333 Check Machin-like formulas
0.3077 Resistance calculator
Sieve of Euripides:
0.4615 Four sides of square
0.4615 Sieve of Pritchard
0.4138 Sieve of Eratosthenes
0.4000 Piprimes
0.3846 Sierpinski curve
Chowder numbers:
0.7826 Chowla numbers
0.6400 Powerful numbers
0.6087 Fermat numbers
0.6087 Rhonda numbers
0.6000 Lah numbers
</pre>
=={{header|jq}}==
{{Works with|jq}}
'''Works with gojq, the Go implementation of jq'''
'''Works with jaq, the Rust implementation of jq'''
'''Adapted from [[#Wren|Wren]]'''
<syntaxhighlight lang="jq">
### Generic preliminaries
def count(s): reduce s as $x (0; .+1);
def lpad($len): tostring | ($len - length) as $l | (" " * $l) + .;
# Emit the count of the common items in the two given sorted arrays
# viewed as multisets
def count_commonality_of_multisets($A; $B):
# Returns a stream of the common elements
def pop:
.[0] as $i
| .[1] as $j
| if $i == ($A|length) or $j == ($B|length) then empty
elif $A[$i] == $B[$j] then 1, ([$i+1, $j+1] | pop)
elif $A[$i] < $B[$j] then [$i+1, $j] | pop
else [$i, $j+1] | pop
end;
count([0,0] | pop);
# Emit an array of the normalized bigrams of the input string
def bigrams:
# Emit a stream of the bigrams of the input string blindly
def bg: . as $in | range(0;length-1 ) | $in[.:.+2];
ascii_downcase | [splits(" *") | bg];
### The Sorensen-Dice coefficient
def sorensen($a; $b):
($a | bigrams | sort) as $A
| ($b | bigrams | sort) as $B
| 2 * count_commonality_of_multisets($A; $B) / (($A|length) + ($B|length));
### Exercises
def exercises:
"Primordial primes",
"Sunkist-Giuliani formula",
"Sieve of Euripides",
"Chowder numbers"
;
[inputs] as $phrases
| exercises as $test
| [ range(0; $phrases|length) as $i
| [sorensen($phrases[$i]; $test), $phrases[$i] ] ]
| sort_by(first)
| .[-5:]
| reverse
| "\($test) >",
map( " \(first|tostring|.[:4]|lpad(4)) \(.[1])")[],
""
</syntaxhighlight>
{{output}}
Invocation: jq -nrR -f sorensen-dice-coefficient.jq rc_tasks_2022_09_24.txt
<pre>
Primordial primes >
0.68 Sequence of primorial primes
0.66 Factorial primes
0.57 Primorial numbers
0.54 Prime words
0.52 Almost prime
Sunkist-Giuliani formula >
0.56 Almkvist-Giullera formula for pi
0.37 Faulhaber's formula
0.34 Haversine formula
0.33 Check Machin-like formulas
0.30 Resistance calculator
Sieve of Euripides >
0.46 Sieve of Pritchard
0.46 Four sides of square
0.41 Sieve of Eratosthenes
0.4 Piprimes
0.38 Sierpinski curve
Chowder numbers >
0.78 Chowla numbers
0.64 Powerful numbers
0.60 Rhonda numbers
0.60 Fermat numbers
0.6 Lah numbers
</pre>
=={{header|Julia}}==
<syntaxhighlight lang="julia">using Multisets
""" convert a phrase into a count of bigram tokens of its words """
function tokenizetext(txt)
    tokens = Multiset{String}()
    words = split(lowercase(txt), r"\s+")
    for w in words
        a = collect(w)
        if length(a) < 3
            push!(tokens, w)
        else
            for i in 1:length(a)-1
                push!(tokens, String(a[i:i+1]))
            end
        end
    end
    return tokens
end
""" Sørensen–Dice similarity of multisets """
function sorenson_dice(text1, text2)
    bc1, bc2 = tokenizetext(text1), tokenizetext(text2)
    return 2 * length(bc1 ∩ bc2) / (length(bc1) + length(bc2))
end
const alltasks = split(read("onedrive/documents/julia programs/tasks.txt", String), "\n")
# run tests
for test in ["Primordial primes", "Sunkist-Giuliani formula",
             "Sieve of Euripides", "Chowder numbers"]
    taskvalues = sort!([(sorenson_dice(test, t), t) for t in alltasks], rev = true)
    println("\n$test:")
    for (val, task) in taskvalues[begin:begin+4]
        println(lpad(Float16(val), 8), " ", task)
    end
end
</syntaxhighlight>{{out}}
<pre>
Primordial primes:
0.6855 Sequence of primorial primes
0.6665 Factorial primes
0.5713 Primorial numbers
0.5454 Prime words
0.522 Almost prime
Sunkist-Giuliani formula:
0.5654 Almkvist-Giullera formula for pi
0.3784 Faulhaber's formula
0.3428 Haversine formula
0.3333 Check Machin-like formulas
0.3076 Resistance calculator
Sieve of Euripides:
0.4614 Sieve of Pritchard
0.4614 Four sides of square
0.4138 Sieve of Eratosthenes
0.4 Piprimes
0.3845 Sierpinski curve
Chowder numbers:
0.7827 Chowla numbers
0.64 Powerful numbers
0.609 Rhonda numbers
0.609 Fermat numbers
0.6 Lah numbers
</pre>
=={{header|Nim}}==
<syntaxhighlight lang=Nim>import std/[algorithm, strutils, sugar, tables]
func bigrams(text: string): CountTable[string] =
  ## Extract the bigrams from a text.
  for word in text.toLowerAscii.split(' '):
    if word.len == 1:
      result.inc(word)
    else:
      for i in 0..(word.len - 2):
        result.inc(word[i..(i+1)])
func intersectionCount(a, b: CountTable[string]): int =
  ## Compute the cardinal of the intersection of two
  ## count tables.
  for key, count in a:
    if key in b:
      inc result, min(count, b[key])
func card(a: CountTable[string]): int =
  ## Return the cardinal of a count table (i.e. the sum of counts).
  for count in a.values:
    inc result, count
func sorensenDice(text1, text2: string): float =
  ## Compute the Sorensen-Dice coefficient of "text1" and "text2".
  let ct1 = text1.bigrams
  let ct2 = text2.bigrams
  result = 2 * intersectionCount(ct1, ct2) / (ct1.card + ct2.card)
# Build the list of tasks.
let tasks = collect:
  for line in lines("Sorensen-Dice.txt"):
    line
const Tests = ["Primordial primes", "Sunkist-Giuliani formula",
               "Sieve of Euripides", "Chowder numbers"]
for test in Tests:
  echo test
  var scores: seq[(float, string)]
  for task in tasks:
    scores.add (sorensenDice(test, task), task)
  scores.sort(Descending)
  for i in 0..4:
    echo " ", scores[i][0].formatFloat(ffDecimal, 6), ' ', scores[i][1]
  echo()
</syntaxhighlight>
{{out}}
<pre>Primordial primes
0.685714 Sequence of primorial primes
0.666667 Factorial primes
0.571429 Primorial numbers
0.545455 Prime words
0.521739 Almost prime
Sunkist-Giuliani formula
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
Sieve of Euripides
0.461538 Sieve of Pritchard
0.461538 Four sides of square
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
Chowder numbers
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers
</pre>
=={{header|Perl}}==
<syntaxhighlight lang="perl" line>use v5.036;
use Path::Tiny;
use List::Util <uniq head>;
sub bi_gram {
my $line = lc shift;
uniq map { substr $line,$_,2 } 0..length($line)-2;
}
sub score {
my($phrase, $word) = @_;
my %count;
my @match = bi_gram $phrase;
$count{$_}++ for @match, @$word;
2 * (grep { $count{$_} > 1 } keys %count) / (@match + @$word);
}
sub sorensen {
my($dict,$word,$cutoff) = @_; $cutoff //= 0.00;
my(%matches,$s);
($s = score($word, $$dict{$_})) > $cutoff and $matches{$_} = $s for keys %$dict;
%matches;
}
my %dict = map { $_ => [ bi_gram($_) ] } path('ref/Sorensen-Dice-Tasks.txt')->slurp =~ /.{10,}/gm;
for my $word ( ('Primordial primes', 'Sunkist-Giuliani formula', 'Sieve of Euripides', 'Chowder numbers') ) {
my(%scored,@ranked);
%scored = sorensen(\%dict,$word);
push @ranked, sprintf "%.3f $_", $scored{$_} for sort { $scored{$b} <=> $scored{$a} || $a cmp $b } keys %scored;
say "\n$word:\n" . join("\n", head 5, @ranked);
}</syntaxhighlight>
{{out}}
<pre>Primordial primes:
0.741 Factorial primes
0.629 Sequence of primorial primes
0.583 Almost prime
0.581 Next special primes
0.571 Pandigital prime
Sunkist-Giuliani formula:
0.542 Almkvist-Giullera formula for pi
0.368 Haversine formula
0.359 Faulhaber's formula
0.348 Check Machin-like formulas
0.303 FASTA format
Sieve of Euripides:
0.541 Sieve of Eratosthenes
0.529 Sieve of Pritchard
0.457 Four sides of square
0.457 The sieve of Sundaram
0.387 Sum of a series
Chowder numbers:
0.769 Chowla numbers
0.615 Rhonda numbers
0.609 Bell numbers
0.609 Lah numbers
0.593 Kaprekar numbers</pre>
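The Perl figures differ from the other solutions because its <code>bi_gram</code> takes bigrams of the whole lower-cased line, so pairs that span a space (such as "l " and " p") count as tokens, and <code>uniq</code> then drops repeats, so A and B are plain sets rather than multisets. A Python sketch of that variant (function names hypothetical, non-empty inputs assumed) reproduces the Perl score for "Factorial primes": 13 distinct bigrams against 14, with 10 in common, giving 2 × 10 / 27 ≈ 0.741.
<syntaxhighlight lang="python">
def phrase_bigram_set(text):
    """Distinct bigrams of the whole lower-cased phrase, spaces included."""
    line = text.lower()
    return {line[i:i + 2] for i in range(len(line) - 1)}


def dice_phrase_set(t1, t2):
    """2 * |A∩B| / (|A| + |B|) with A and B as sets (assumes non-empty inputs)."""
    a, b = phrase_bigram_set(t1), phrase_bigram_set(t2)
    return 2 * len(a & b) / (len(a) + len(b))


print(f"{dice_phrase_set('Primordial primes', 'Factorial primes'):.3f}")  # 0.741
</syntaxhighlight>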
=={{header|Phix}}==
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">bigram</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">words</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">lower</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)),</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">word</span> <span style="color: #008080;">in</span> <span style="color: #000000;">words</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">word</span><span style="color: #0000FF;">)-</span><span style="color: #000000;">1</span> <span style="color: #008080;">do</span>
<span style="color: #
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">res</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">intrasect</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">s1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">s2</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">l1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">l2</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s2</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">i1</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">i2</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
<span style="color: #008080;">while</span> <span style="color: #000000;">i1</span><span style="color: #0000FF;"><=</span><span style="color: #000000;">l1</span> <span style="color: #008080;">and</span> <span style="color: #000000;">i2</span><span style="color: #0000FF;"><=</span><span style="color: #000000;">l2</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">c</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">compare</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i1</span><span style="color: #0000FF;">],</span><span style="color: #000000;">s2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i2</span><span style="color: #0000FF;">])</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">+=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">c</span><span style="color: #0000FF;">=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">i1</span> <span style="color: #0000FF;">+=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">c</span><span style="color: #0000FF;"><=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">i2</span> <span style="color: #0000FF;">+=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">c</span><span style="color: #0000FF;">>=</span><span style="color: #000000;">0</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">res</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">scores</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{},</span>
<span style="color: #000000;">s1</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">bigram</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">phrase</span> <span style="color: #008080;">in</span> <span style="color: #000000;">dictionary</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">s2</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">bigram</span><span style="color: #0000FF;">(</span><span style="color: #000000;">phrase</span><span style="color: #0000FF;">)</span>
<span style="color: #
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%s >\n"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%f %s\n"</span><span style="color: #0000FF;">,{</span><span style="color: #000000;">scores</span><span style="color: #0000FF;">[</span><span style="color: #000000;">t</span><span style="color: #0000FF;">],</span><span style="color: #000000;">dictionary</span><span style="color: #0000FF;">[</span><span style="color: #000000;">t</span><span style="color: #0000FF;">]})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\n"</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
Almkvist-Giullera formula for pi
Almost prime
Check Machin-like formulas
Chowla numbers
<!--</syntaxhighlight>-->
{{out}}
Extending the task list to the full 1577 entries changes nothing.
<pre>
Primordial primes >
0.
0.
0.571429 Primorial numbers
0.545455 Prime words
0.521739 Almost prime
Sunkist-Giuliani formula >
0.
0.
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.
Sieve of Euripides >
0.461538 Sieve of Pritchard
0.400000 Piprimes
0.384615 Sierpinski curve
Chowder numbers >
0.
0.
0.
0.
0.
</pre>
=={{header|Python}}==
Of the several Python string-similarity libraries that implement Sørensen–Dice similarity, none gives the same results as the Raku library used by the original example, so this solution imitates it with multisets, as the C++ and Wren examples do.
<syntaxhighlight lang="python">''' Rosetta Code task rosettacode.org/wiki/Sorensen–Dice_coefficient '''
from multiset import Multiset
def tokenizetext(txt):
    ''' convert a phrase into a count of bigram tokens of its words '''
    arr = []
    for wrd in txt.lower().split(' '):
        arr += ([wrd] if len(wrd) == 1 else [wrd[i:i+2]
                                             for i in range(len(wrd)-1)])
    return Multiset(arr)
def sorenson_dice(text1, text2):
    ''' Sørensen–Dice similarity of Multisets '''
    bc1, bc2 = tokenizetext(text1), tokenizetext(text2)
    return 2 * len(bc1 & bc2) / (len(bc1) + len(bc2))
with open('tasklist_sorenson.txt', 'r') as fd:
    alltasks = fd.read().split('\n')
for testtext in ['Primordial primes', 'Sunkist-Giuliani formula',
                 'Sieve of Euripides', 'Chowder numbers']:
    taskvalues = sorted([(sorenson_dice(testtext, t), t)
                         for t in alltasks], reverse=True)
    print(f'\n{testtext}:')
    for (val, task) in taskvalues[:5]:
        print(f' {val:.6f} {task}')
</syntaxhighlight>{{out}}
<pre>
Primordial primes:
0.685714 Sequence of primorial primes
0.666667 Factorial primes
0.571429 Primorial numbers
0.545455 Prime words
0.521739 Almost prime
Sunkist-Giuliani formula:
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
Sieve of Euripides:
0.461538 Sieve of Pritchard
0.461538 Four sides of square
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
Chowder numbers:
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers
</pre>
The results on this basis are the same as the Raku example.
<syntaxhighlight lang="
import "./str" for Str
import "./set" for Bag