Sorensen–Dice coefficient: Difference between revisions
m (add a docstring) |
m (→{{header|J}}: grammar) |
||
Line 171: | Line 171: | ||
fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight>
||
The trick here is the concept of "intersection" that we must use. We can't use set intersection directly: the current draft task description suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1. Because we're using division to produce this number, we must be using the cardinality of the intersection rather than the intersection itself.
||
But if A and B are sets, each containing the same tokens, the result using the cardinality of sets (with the denominator likewise taken as the cardinality of a set, the union) would be 2 rather than 1.
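To make the arithmetic concrete, here is a minimal sketch in Python (not J, and not part of the J solution). It assumes the denominator is read as the cardinality of an ordinary set union; the name <code>set_based_ratio</code> is hypothetical and exists only for this illustration.

```python
def set_based_ratio(a, b):
    """Compute 2 * |A n B| / |A u B| using plain set cardinalities.

    Assumption: the denominator is an ordinary set union, so duplicate
    tokens collapse. Raises ZeroDivisionError if both inputs are empty.
    """
    set_a, set_b = set(a), set(b)
    return 2 * len(set_a & set_b) / len(set_a | set_b)


# Two collections holding exactly the same tokens.
tokens = ["ab", "bc", "cd"]
print(set_based_ratio(tokens, tokens))  # 2.0, outside the expected 0..1 range
```

With identical inputs the intersection and the union have the same size, so the ratio is 2 regardless of how many tokens there are.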
||
Instead, we treat A and B as sequences of tokens (so repeated copies of a token are distinct). For the cardinality of the intersection, we count the number of times that each token appears in A and in B, and sum the minimum of the two counts. (So, for example, a token which only appears in A contributes 0, whereas a token which appears 3 times in A and 2 times in B contributes 2 to the sum.)
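The counting rule above can be sketched in Python (again, only as an illustration, not a translation of the J code). The function names are hypothetical, and the denominator <code>len(a) + len(b)</code> is an assumption: it reads <code>A ⊎ B</code> as a multiset sum, which the text above does not spell out.

```python
from collections import Counter


def multiset_intersection_size(a, b):
    """Sum, over the tokens of a, the smaller of its counts in a and in b."""
    count_a, count_b = Counter(a), Counter(b)
    # Counter returns 0 for a missing key, so tokens absent from b add nothing.
    return sum(min(n, count_b[tok]) for tok, n in count_a.items())


def sdi(a, b):
    """Coefficient under the assumed multiset-sum denominator len(a) + len(b)."""
    return 2 * multiset_intersection_size(a, b) / (len(a) + len(b))


a = ["x", "x", "x", "y"]   # "x" appears 3 times, "y" once
b = ["x", "x", "z"]        # "x" appears 2 times, "y" never
print(multiset_intersection_size(a, b))  # 2: min(3, 2) for "x" plus min(1, 0) for "y"
print(sdi(["ab", "bc"], ["ab", "bc"]))   # 1.0 for identical sequences
```

Under this reading, identical sequences score 1 rather than 2, which matches the range the task description expects.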
||
With this implementation, here are the task examples: