Sorensen–Dice coefficient: Difference between revisions

J second draft
Tentative implementation:
<syntaxhighlight lang=J>TASKS=: fread '~/tasks.txt' NB. from Sorensen–Dice_coefficient/Tasks
sdtok=: [: (#~ ' '*/ .~:~])2]\ 7 u: tolower@rplc&(LF,' ')
sdinter=: {{
  all=. ~.x,y
  X=. <:#/.~all,x
  Y=. <:#/.~all,y
  +/X<.Y
}}
sdunion=: #@,
SDC=: (2 * sdinter % sdunion)&sdtok S:0
nearest=: {{ m{.\:~ x (] ;"0~ SDC) cutLF y }}
fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight>
 
The trick here is the concept of "intersection" which we must use. We can't use set intersection: the current draft task description suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. Because we divide to produce this number, we must be using the cardinality of the intersection rather than the intersection itself.
But if A and B are sets, each containing the same tokens, the result using the cardinality of sets would be 2 rather than 1.
 
Instead, we treat A and B as sequences of tokens (so repeated copies of a token are distinct). For the cardinality of the intersection, we count the number of times each token appears in A and in B and sum the minimum of the two counts. (So a token which appears only in A contributes 0, for example, whereas a token which appears 3 times in A and 2 times in B contributes 2 to the sum.)
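 
As a small illustration of this counting rule, here is a sketch which uses single characters as stand-in tokens (an assumption made for brevity; the implementation above works on the character pairs produced by <code>sdtok</code>). The definitions of <code>sdinter</code> and <code>sdunion</code> are repeated so the example is self-contained:
<syntaxhighlight lang=J>sdinter=: {{
  all=. ~.x,y       NB. distinct tokens appearing in either argument
  X=. <:#/.~all,x   NB. count of each distinct token in x
  Y=. <:#/.~all,y   NB. count of each distinct token in y
  +/X<.Y            NB. sum of the smaller count for each token
}}
sdunion=: #@,       NB. total number of tokens in both arguments

'aaab' sdinter 'aabc'   NB. 3 (a contributes 2, b contributes 1, c contributes 0)
'aaab' sdunion 'aabc'   NB. 8
2 * 3 % 8               NB. 0.75</syntaxhighlight>
Here <code>a</code> appears 3 times on the left and 2 times on the right, so it contributes 2; <code>b</code> appears once on each side and contributes 1; <code>c</code> appears only on the right and contributes 0.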
 
With this implementation, here are the task examples:
<pre>   fmt 'Primordial primes' 5 nearest TASKS
0.647059 Sequence of primorial primes
0.615385 Factorial primes
0.592593 Primorial numbers
0.571429 Prime words
0.545455 Almost prime
   fmt 'Sunkist-Giuliani formula' 5 nearest TASKS
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
   fmt 'Sieve of Euripides' 5 nearest TASKS
0.461538 Sieve of Pritchard
0.461538 Four sides of square
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
   fmt 'Chowder numbers' 5 nearest TASKS
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers</pre>
 
=={{header|Phix}}==