Sorensen–Dice coefficient: Difference between revisions

Line 171:

fmt=: ((8j6": 0{::]),' ',1{::])"1</syntaxhighlight>

The trick here is deciding which concept of "intersection" to use. Plain set intersection cannot be it: the current draft task description suggests that <code>SDI = 2 × (A ∩ B) / (A ⊎ B)</code> produces a number between 0 and 1, and because that number comes from a division, the formula must be using the cardinality of the intersection rather than the intersection itself.

But if A and B are sets containing exactly the same tokens, counting set members gives the same cardinality for the intersection as for the combined set, so the result would be 2 rather than 1.
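To make the problem concrete, here is a minimal Python sketch (not the J implementation above). The <code>bigrams</code> tokenizer and the reading of the denominator as a plain set union are assumptions made purely for illustration.

```python
def bigrams(text):
    """Split text into overlapping two-character tokens (an assumed tokenizer)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def naive_set_sdi(a, b):
    """Apply the draft formula with ordinary set intersection and set union.

    Assumption: both operators in the formula are read as plain set operations.
    """
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / len(sa | sb)


tokens = bigrams("night")              # ['ni', 'ig', 'gh', 'ht']
print(naive_set_sdi(tokens, tokens))   # identical inputs give 2.0, not 1.0
```

Under that reading, two identical token lists score 2, which falls outside the expected range of 0 to 1.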

Instead, we treat A and B as sequences of tokens (so repeated copies of a token are distinct). For the cardinality of the intersection, we count the number of times each token appears in A and in B, and sum the minimum of the two counts. (So a token which appears only in A contributes 0, for example, while a token which appears 3 times in A and 2 times in B contributes 2 to the sum.)
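The counting rule just described can be sketched in Python as follows. This is an illustration only, not a translation of the J code: the name <code>multiset_sdi</code> is hypothetical, the denominator is assumed to be the total number of tokens in both sequences (reading ⊎ as a multiset sum), and returning 0 for two empty inputs is an arbitrary choice.

```python
from collections import Counter


def multiset_sdi(a, b):
    """Sorensen-Dice index over token sequences, counting repeats.

    Assumptions: the denominator is len(a) + len(b), and two empty
    inputs score 0.0.
    """
    count_a, count_b = Counter(a), Counter(b)
    # Counter & Counter keeps, for each token, the minimum of the two counts.
    shared = sum((count_a & count_b).values())
    total = len(a) + len(b)
    return 2 * shared / total if total else 0.0


a = ["x", "x", "x", "y"]   # "x" appears 3 times
b = ["x", "x", "z"]        # "x" appears 2 times
print(multiset_sdi(a, b))  # "x" contributes min(3, 2) = 2, so this is 2 * 2 / 7
print(multiset_sdi(a, a))  # identical sequences give 1.0
```

With this counting, identical sequences score exactly 1 and sequences with no tokens in common score 0.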

With this implementation, here are the task examples: