Sorensen–Dice coefficient: Difference between revisions

Line 47:

Tentative implementation:

<syntaxhighlight lang=J>TASKS=: fread '~/tasks.txt' NB. from Sorensen–Dice_coefficient/Tasks

~~stok~~=: [: (#~ ' '*/ .~:~])2]\ 7 u: tolower@rplc&(LF,' ')
~~union~~=: ,
inter=: [-.-.
SDI=: ((inter +&#~ inter~) % #@union)&stok S:0

nearest=: {{ m{.\:~ x (] ;"0~ ~~SDI~~) cutLF y }}~~</syntaxhighlight>~~

sdtok=: [: (#~ ' '*/ .~:~])2]\ 7 u: tolower@rplc&(LF,' ')
sdinter=: {{
  all=. ~.x,y
  X=. <:#/.~all,x
  Y=. <:#/.~all,y
  +/X<.Y
}}
sdunion=: #@,
SDC=: (2 * sdinter % sdunion)&sdtok S:0

nearest=: {{ m{.\:~ x (] ;"0~ SDC) cutLF y }}
fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight>

The trick here is the concept of "intersection" which we must use. We can't use set intersection: the current draft task description suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. Because we're using division to produce this number, we must be using the cardinality of the intersection rather than the intersection itself.

This is slightly different from the current draft task description, which suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. If A and B are sets, each containing the same tokens, the result here would be 2 rather than 1. But we can make sense of this by assuming that the original algorithm was working with sequences rather than sets. The sequence difference is not commutative, so if <code>∩</code> represents sequence difference, and <code>⊎</code> represents sequence addition, it would make sense to define <code>SDI= ((A ∩ B) + (B ∩ A)) / (A ⊎ B)</code>, which is what we have done here (note that this change also includes an implicit shift from tokens to token counts somewhere in that calculation, as division only makes sense with numbers).

But if A and B are sets, each containing the same tokens, the result here using cardinality of sets would be 2 rather than 1.

Instead, we treat A and B as sequences of tokens (so repeated copies of a token are distinct), and for the cardinality of the intersection we count the number of times that each token appears in A and in B and sum the minimum of the two counts. (So a token which appears only in A contributes 0, for example, whereas a token which appears 3 times in A and 2 times in B contributes 2 to the sum.)
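For readers who do not read J, the counting rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the J implementation itself: the names <code>bigrams</code> and <code>dice</code> are hypothetical, and the tokenizer assumes (as the J tokenizer appears to) that tokens are lowercased two-character windows containing no space, with line feeds treated as spaces.

```python
from collections import Counter

def bigrams(text):
    """Hypothetical tokenizer: lowercased 2-character windows with no space in them."""
    s = text.lower().replace("\n", " ")
    return [s[i:i + 2] for i in range(len(s) - 1) if " " not in s[i:i + 2]]

def dice(a, b):
    """Dice coefficient where the intersection size is the sum of per-token minimum counts."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    # Counter & Counter keeps min(count in a, count in b) for every token,
    # so a token seen 3 times in a and 2 times in b contributes 2.
    shared = sum((ca & cb).values())
    # The denominator counts every token of both sequences, repeats included.
    total = sum(ca.values()) + sum(cb.values())
    return 2 * shared / total if total else 0.0

print(f"{dice('Primordial prime', 'Factorial primes'):.6f}")  # 0.615385
```

Under these assumptions the two strings share 8 bigram occurrences out of 13 + 13, giving 16/26, and identical strings score exactly 1.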

With this implementation, here are the task examples:

<pre>   'Primordial ~~primes~~' 5 nearest TASKS
┌────────┬────────────────────────────┐
│0.~~740741│Factorial~~ primes │
├────────┼────────────────────────────┤
│0.714286│Sequence of primorial primes│
├────────┼────────────────────────────┤
│0.~~681818│Prime~~ words │
├────────┼────────────────────────────┤
│0.~~652174│Almost~~ ~~prime~~ │
├────────┼────────────────────────────┤
│0.~~642857│Primorial~~ numbers │
└────────┴────────────────────────────┘
   '~~Sunkist-Giuliani~~ ~~formula~~' 5 nearest TASKS
┌────────┬───────────────────────────────────┐
│0.~~608696│Almkvist~~-Giullera formula for pi │
├────────┼───────────────────────────────────┤
│0.~~378378│Faulhaber's~~ ~~formula~~ │
├────────┼───────────────────────────────────┤
│0.~~371429│Haversine~~ formula │
├────────┼───────────────────────────────────┤
│0.~~357143│Check~~ Machin-like formulas │
├────────┼───────────────────────────────────┤
│0.340426│Shoelace formula for polygonal area│
└────────┴───────────────────────────────────┘
   ~~'Sieve~~ of ~~Euripides~~' 5 nearest TASKS
┌────────┬────────────────────────┐
│0.~~461538│Sieve~~ of Pritchard │
├────────┼────────────────────────┤
│0.~~461538│Four~~ sides of square │
├────────┼────────────────────────┤
│0.~~413793│Sieve~~ of Eratosthenes │
├────────┼────────────────────────┤
│0.4 │Piprimes │
├────────┼────────────────────────┤
│0.392857│Law of cosines - triples│
└────────┴────────────────────────┘
   '~~Chowder~~ ~~numbers~~' 5 nearest TASKS
┌────────┬──────────────┐
│0.826087│Chowla numbers│
├────────┼──────────────┤
│0.666667│Bell numbers │
├────────┼──────────────┤
│0.652174│Rhonda numbers│
├────────┼──────────────┤
│0.652174│Humble numbers│
├────────┼──────────────┤
│0.65 │Lah numbers │
└────────┴──────────────┘</pre>

<pre>   fmt 'Primordial prime' 5 nearest TASKS
0.647059 Sequence of primorial primes
0.615385 Factorial primes
0.592593 Primorial numbers
0.571429 Prime words
0.545455 Almost prime
   fmt 'Sunkist-Giuliani formula' 5 nearest TASKS
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
   fmt 'Sieve of Euripides' 5 nearest TASKS
0.461538 Sieve of Pritchard
0.461538 Four sides of square
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
   fmt 'Chowder numbers' 5 nearest TASKS
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers</pre>

=={{header|Phix}}==