Sorensen–Dice coefficient: Difference between revisions

(tidy up task description)
(clarification about A ∩ B)
Line 24: Line 24:
Sørensen–Dice measures the similarity of two groups by dividing twice the intersection token count by the total token count of both groups:


SDC = 2 × |A ∩ B| / (|A| + |B|)
SDC = 2 × |A∩B| / (|A| + |B|)

where A, B and A∩B are treated as multisets: if an item x has multiplicity a in A and b in B, then it has multiplicity min(a,b) in A∩B.
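The multiset reading of the formula can be illustrated with a minimal Python sketch. The function name `sorensen_dice`, the choice of plain lists of tokens as input, and the convention of returning 0.0 when both groups are empty are assumptions made for illustration; they are not part of the task description. The sketch relies on `collections.Counter`, whose `&` operator keeps the minimum multiplicity of each item.

```python
from collections import Counter


def sorensen_dice(tokens_a, tokens_b):
    """Sørensen–Dice coefficient of two token collections treated as multisets."""
    a = Counter(tokens_a)
    b = Counter(tokens_b)
    # |A| + |B|: total token count of both groups, counting repeats.
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        # Assumption: two empty groups are scored 0.0 to avoid dividing by zero.
        return 0.0
    # |A ∩ B|: Counter's & keeps min(a, b) for each item's multiplicity.
    overlap = sum((a & b).values())
    return 2 * overlap / total


# x has multiplicity 2 in A and 1 in B, so 1 in A∩B;
# y has multiplicity 1 in A and 2 in B, so 1 in A∩B; z is absent from A.
# |A∩B| = 2, |A| + |B| = 3 + 4 = 7, so SDC = 4/7.
print(sorensen_dice(["x", "x", "y"], ["x", "y", "y", "z"]))
```

Using plain sets here instead would count x and y once each in both groups and give 2 × 2 / (2 + 3) = 0.8, which is why the multiset interpretation matters when tokens repeat.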


The Sørensen–Dice coefficient is thus a ratio between 0.0 and 1.0 giving the "percent similarity" between the two populations.