Revision as of 09:54, 25 September 2022 by PureFox (Added Wren); revision as of 14:00, 25 September 2022 by Rdm (J second draft)
Tentative implementation:
<syntaxhighlight lang=J>TASKS=: fread '~/tasks.txt' NB. from Sorensen–Dice_coefficient/Tasks
NB. tokens: character bigrams of the lowercased text (newlines read as spaces),
NB. discarding any bigram that contains a space
sdtok=: [: (#~ ' '*/ .~:~])2]\ 7 u: tolower@rplc&(LF,' ')
NB. size of the intersection: for each distinct token, the smaller of its counts in x and y, summed
sdinter=: {{
  all=. ~.x,y     NB. every distinct token
  X=. <:#/.~all,x NB. how often each token of all occurs in x
  Y=. <:#/.~all,y NB. how often each token of all occurs in y
  +/X<.Y
}}
sdunion=: #@,    NB. total number of tokens in x and y
SDC=: (2 * sdinter % sdunion)&sdtok S:0
NB. the m best-scoring lines of y for x, as score;line pairs, highest score first
nearest=: {{ m{.\:~ x (] ;"0~ SDC) cutLF y }}
fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight>
The trick here is the concept of "intersection" which we must use. We can't use plain set intersection. The current draft task description suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. Because we're using division to produce this number, we must be using the cardinality of the intersection rather than the intersection itself. But if A and B were sets containing the same tokens, and <code>⊎</code> were read as ordinary set union, the result using set cardinalities would be 2 rather than 1.
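As a worked illustration (assuming, as <code>sdtok</code> does, that the tokens are the lowercase character bigrams containing no space), compare 'Chowder numbers' with itself. Its 12 bigrams (ch ho ow wd de er nu um mb be er rs) include <code>er</code> twice, so only 11 of them are distinct. With set cardinalities and <code>⊎</code> read as set union:

<math>\frac{2\,|A \cap A|}{|A \cup A|} = \frac{2 \times 11}{11} = 2</math>

Keeping the set intersection but dividing by the total token count, 12 + 12, does not recover 1 either: 2 × 11 / 24 ≈ 0.917.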
Instead, we treat A and B as sequences of tokens (so repeated copies of a token are distinct). For the cardinality of the intersection, we count how many times each token appears in A and in B, and sum the smaller of the two counts. (So a token which appears only in A contributes 0, while a token which appears 3 times in A and 2 times in B contributes 2.) The cardinality of <code>A ⊎ B</code> is the total number of tokens in both sequences (<code>sdunion</code>), so two identical strings score exactly 1. With this implementation, here are the task examples:
<pre>
   fmt 'Primordial prime' 5 nearest TASKS
0.647059 Sequence of primorial primes
0.615385 Factorial primes
0.592593 Primorial numbers
0.571429 Prime words
0.545455 Almost prime
   fmt 'Sunkist-Giuliani formula' 5 nearest TASKS
0.565217 Almkvist-Giullera formula for pi
0.378378 Faulhaber's formula
0.342857 Haversine formula
0.333333 Check Machin-like formulas
0.307692 Resistance calculator
   fmt 'Sieve of Euripides' 5 nearest TASKS
0.461538 Sieve of Pritchard
0.461538 Four sides of square
0.413793 Sieve of Eratosthenes
0.400000 Piprimes
0.384615 Sierpinski curve
   fmt 'Chowder numbers' 5 nearest TASKS
0.782609 Chowla numbers
0.640000 Powerful numbers
0.608696 Rhonda numbers
0.608696 Fermat numbers
0.600000 Lah numbers
</pre>

=={{header|Phix}}==

Sorensen–Dice coefficient: Difference between revisions
