Sorensen–Dice coefficient: Difference between revisions
(Added Wren) |
(J second draft) |
||
Line 47: | Line 47: | ||
Tentative implementation: |
Tentative implementation: |
||
<syntaxhighlight lang=J>TASKS=: fread '~/tasks.txt' NB. from Sorensen–Dice_coefficient/Tasks |
<syntaxhighlight lang=J>TASKS=: fread '~/tasks.txt' NB. from Sorensen–Dice_coefficient/Tasks |
||
sdtok=: [: (#~ ' '*/ .~:~])2]\ 7 u: tolower@rplc&(LF,' ') |
|||
sdinter=: {{ |
|||
all=. ~.x,y |
|||
inter=: [-.-. |
|||
X=. <:#/.~all,x |
|||
SDI=: ((inter +&#~ inter~) % #@union)&stok S:0 |
|||
Y=. <:#/.~all,y |
|||
⚫ | |||
+/X<.Y |
|||
}} |
|||
sdunion=: #@, |
|||
SDC=: (2 * sdinter % sdunion)&sdtok S:0 |
|||
⚫ | |||
fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight> |
|||
The trick here is the concept of "intersection" which we must use. We can't use plain set intersection: the current draft task description suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. Because we're using division to produce this number, we must be using the cardinality of the intersection rather than the intersection itself. |
|||
This is slightly different from the current draft task description, which suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. If A and B are sets, each containing the same tokens, the result here would be 2 rather than 1. But we can make sense of this by assuming that the original algorithm was working with sequences rather than sets. The sequence difference is not commutative, so if <code>∩</code> represents sequence difference, and <code>⊎</code> represents sequence addition, it would make sense to define <code>SDI= ((A ∩ B) + (B ∩ A)) / (A ⊎ B)</code>, which is what we have done here (note that this change also includes an implicit shift from tokens to token counts somewhere in that calculation, as division only makes sense with numbers). |
|||
But if A and B are sets, each containing the same tokens, then computing this with set cardinalities would give 2 rather than 1. |
|||
Instead, we treat A and B as sequences of tokens (so repeated copies of a token are distinct). For the cardinality of the intersection, we count the number of times each token appears in A and in B and sum the minimum of the two counts. (So a token which appears only in A contributes 0, for example, whereas a token which appears 3 times in A and 2 times in B contributes 2 to the sum.) |
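As a cross-check of the counting rule described above, here is a minimal Python sketch (not part of the J entry). The helper names <code>bigrams</code>, <code>multiset_intersection_size</code> and <code>sdc</code> are hypothetical, chosen for illustration. It assumes that tokens are lowercase two-character pairs taken within whitespace-separated words, which is how <code>sdtok</code> appears to behave, and that the denominator is the combined token count of both sequences, as in <code>sdunion</code>. Returning 0 when neither input yields tokens is also an assumption.

```python
from collections import Counter


def bigrams(text):
    """Lowercase the text and collect the 2-character tokens of each word.

    Assumption: pairs never span whitespace, mirroring what sdtok seems to do.
    """
    tokens = []
    for word in text.lower().split():
        tokens.extend(word[i:i + 2] for i in range(len(word) - 1))
    return tokens


def multiset_intersection_size(a, b):
    """Sum, over all tokens, the smaller of the token's counts in a and in b."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter returns 0 for missing keys, so tokens only in a contribute 0.
    return sum(min(n, count_b[token]) for token, n in count_a.items())


def sdc(s, t):
    """Sorensen-Dice coefficient on bigram sequences (multiset semantics)."""
    a, b = bigrams(s), bigrams(t)
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # assumption: no tokens on either side scores 0
    return 2 * multiset_intersection_size(a, b) / total


# A token appearing 3 times in one input and 2 times in the other contributes 2.
print(multiset_intersection_size(list("aaab"), list("aabc")))  # 3
print(round(sdc("Primordial prime", "Sequence of primorial primes"), 6))  # 0.647059
```

With these assumptions, two inputs with identical tokens score exactly 1, because the intersection size equals each sequence's length and the denominator is twice that length.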
|||
With this implementation, here are the task examples: 
With this implementation, here are the task examples: |
||
<pre> 'Primordial |
<pre> fmt 'Primordial prime' 5 nearest TASKS |
||
0.647059 Sequence of primorial primes |
|||
┌────────┬────────────────────────────┐ |
|||
0.615385 Factorial primes |
|||
⚫ | |||
├────────┼────────────────────────────┤ |
|||
⚫ | |||
│0.714286│Sequence of primorial primes│ |
|||
0.545455 Almost prime |
|||
├────────┼────────────────────────────┤ |
|||
⚫ | |||
⚫ | |||
⚫ | |||
├────────┼────────────────────────────┤ |
|||
0.378378 Faulhaber's formula |
|||
⚫ | |||
├────────┼────────────────────────────┤ |
|||
⚫ | |||
⚫ | |||
0.307692 Resistance calculator |
|||
└────────┴────────────────────────────┘ |
|||
' |
fmt 'Sieve of Euripides' 5 nearest TASKS |
||
⚫ | |||
┌────────┬───────────────────────────────────┐ |
|||
⚫ | |||
⚫ | |||
⚫ | |||
├────────┼───────────────────────────────────┤ |
|||
0.400000 Piprimes |
|||
0.384615 Sierpinski curve |
|||
├────────┼───────────────────────────────────┤ |
|||
⚫ | |||
⚫ | |||
0.782609 Chowla numbers |
|||
├────────┼───────────────────────────────────┤ |
|||
0.640000 Powerful numbers |
|||
⚫ | |||
0.608696 Rhonda numbers |
|||
├────────┼───────────────────────────────────┤ |
|||
0.608696 Fermat numbers |
|||
│0.340426│Shoelace formula for polygonal area│ |
|||
0.600000 Lah numbers </pre> |
|||
└────────┴───────────────────────────────────┘ |
|||
⚫ | |||
┌────────┬────────────────────────┐ |
|||
⚫ | |||
├────────┼────────────────────────┤ |
|||
⚫ | |||
├────────┼────────────────────────┤ |
|||
⚫ | |||
├────────┼────────────────────────┤ |
|||
│0.4 │Piprimes │ |
|||
├────────┼────────────────────────┤ |
|||
│0.392857│Law of cosines - triples│ |
|||
└────────┴────────────────────────┘ |
|||
⚫ | |||
┌────────┬──────────────┐ |
|||
│0.826087│Chowla numbers│ |
|||
├────────┼──────────────┤ |
|||
│0.666667│Bell numbers │ |
|||
├────────┼──────────────┤ |
|||
│0.652174│Rhonda numbers│ |
|||
├────────┼──────────────┤ |
|||
│0.652174│Humble numbers│ |
|||
├────────┼──────────────┤ |
|||
│0.65 │Lah numbers │ |
|||
└────────┴──────────────┘</pre> |
|||
=={{header|Phix}}== |
=={{header|Phix}}== |