Jaccard index
Revision as of 00:33, 9 November 2021
This page uses content from Wikipedia. The original article was at Jaccard index. The list of authors can be seen in the page history. As with Rosetta Code, the text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for gauging the similarity and diversity of sample sets. It was developed by Paul Jaccard, who originally gave it the French name coefficient de communauté, and was independently formulated again by T. Tanimoto. Thus, the terms Tanimoto index and Tanimoto coefficient are also used in some fields. However, they are identical in generally taking the ratio of Intersection over Union. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:
:<math>J(A, B) = \frac{|A \cap B|}{|A \cup B|}</math>
If A and B are both empty, J(A, B) is defined to be 1.
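A small worked example of the formula, using two sets chosen here purely for illustration (they are not among the sets defined for the task): the intersection has 2 elements and the union has 4.

:<math>J(\{1, 2, 3\}, \{2, 3, 4\}) = \frac{|\{2, 3\}|}{|\{1, 2, 3, 4\}|} = \frac{2}{4} = \frac{1}{2}</math>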
Define sets as follows, using any linear data structure:
A = {}
B = {1, 2, 3, 4, 5}
C = {1, 3, 5, 7, 9}
D = {2, 4, 6, 8, 10}
E = {2, 3, 5, 7}
F = {8}
Write a program that computes the Jaccard index for every pairing of these sets, including self-pairings.
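For comparison, here is a minimal Python sketch of the task (not one of the page's contributed solutions). It assumes Python 3.7+ (so dict insertion order is preserved) and uses `fractions.Fraction` so that results print as exact ratios, as in the Factor output; the names `jaccard` and `SETS` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product


def jaccard(a: set, b: set) -> Fraction:
    """Jaccard index |A & B| / |A | B|, with J(empty, empty) taken as 1."""
    if not a and not b:
        return Fraction(1)
    return Fraction(len(a & b), len(a | b))


# The six sets from the task description, keyed by name.
SETS = {
    "A": set(),
    "B": {1, 2, 3, 4, 5},
    "C": {1, 3, 5, 7, 9},
    "D": {2, 4, 6, 8, 10},
    "E": {2, 3, 5, 7},
    "F": {8},
}

# product(..., repeat=2) yields every ordered pair of sets,
# so self-pairings such as (A, A) are included.
for (n1, s1), (n2, s2) in product(SETS.items(), repeat=2):
    print(f"J({n1}, {n2}) = {jaccard(s1, s2)}")
```

Exact fractions avoid floating-point rounding; a decimal value like the Wren output can be obtained with `float(jaccard(x, y))`.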
Factor
<lang factor>USING: formatting kernel math prettyprint sequences sets ;
: jaccard ( seq1 seq2 -- x )
    ! J of two empty sets is defined as 1
    2dup [ empty? ] both? [ 2drop 1 ] [ [ intersect ] [ union ] 2bi [ length ] bi@ / ] if ;
{ { } { 1 2 3 4 5 } { 1 3 5 7 9 } { 2 4 6 8 10 } { 2 3 5 7 } { 8 } } dup [ 2dup jaccard "%[%d, %] %[%d, %] -> %u\n" printf ] cartesian-each</lang>
Output:
<pre>
{ } { } -> 1
{ } { 1, 2, 3, 4, 5 } -> 0
{ } { 1, 3, 5, 7, 9 } -> 0
{ } { 2, 4, 6, 8, 10 } -> 0
{ } { 2, 3, 5, 7 } -> 0
{ } { 8 } -> 0
{ 1, 2, 3, 4, 5 } { } -> 0
{ 1, 2, 3, 4, 5 } { 1, 2, 3, 4, 5 } -> 1
{ 1, 2, 3, 4, 5 } { 1, 3, 5, 7, 9 } -> 3/7
{ 1, 2, 3, 4, 5 } { 2, 4, 6, 8, 10 } -> 1/4
{ 1, 2, 3, 4, 5 } { 2, 3, 5, 7 } -> 1/2
{ 1, 2, 3, 4, 5 } { 8 } -> 0
{ 1, 3, 5, 7, 9 } { } -> 0
{ 1, 3, 5, 7, 9 } { 1, 2, 3, 4, 5 } -> 3/7
{ 1, 3, 5, 7, 9 } { 1, 3, 5, 7, 9 } -> 1
{ 1, 3, 5, 7, 9 } { 2, 4, 6, 8, 10 } -> 0
{ 1, 3, 5, 7, 9 } { 2, 3, 5, 7 } -> 1/2
{ 1, 3, 5, 7, 9 } { 8 } -> 0
{ 2, 4, 6, 8, 10 } { } -> 0
{ 2, 4, 6, 8, 10 } { 1, 2, 3, 4, 5 } -> 1/4
{ 2, 4, 6, 8, 10 } { 1, 3, 5, 7, 9 } -> 0
{ 2, 4, 6, 8, 10 } { 2, 4, 6, 8, 10 } -> 1
{ 2, 4, 6, 8, 10 } { 2, 3, 5, 7 } -> 1/8
{ 2, 4, 6, 8, 10 } { 8 } -> 1/5
{ 2, 3, 5, 7 } { } -> 0
{ 2, 3, 5, 7 } { 1, 2, 3, 4, 5 } -> 1/2
{ 2, 3, 5, 7 } { 1, 3, 5, 7, 9 } -> 1/2
{ 2, 3, 5, 7 } { 2, 4, 6, 8, 10 } -> 1/8
{ 2, 3, 5, 7 } { 2, 3, 5, 7 } -> 1
{ 2, 3, 5, 7 } { 8 } -> 0
{ 8 } { } -> 0
{ 8 } { 1, 2, 3, 4, 5 } -> 0
{ 8 } { 1, 3, 5, 7, 9 } -> 0
{ 8 } { 2, 4, 6, 8, 10 } -> 1/5
{ 8 } { 2, 3, 5, 7 } -> 0
{ 8 } { 8 } -> 1
</pre>
Wren
Note that the Set object in the Wren-set module is implemented as a Map, so the iteration order (and hence the order in which elements are printed) is undefined.
<lang ecmascript>import "./set" for Set
import "./trait" for Indexed
import "./fmt" for Fmt

// Jaccard index of two sets; two empty sets are defined to have index 1.
var jaccardIndex = Fn.new { |a, b|
    if (a.count == 0 && b.count == 0) return 1
    return a.intersect(b).count / a.union(b).count
}

var a = Set.new([])
var b = Set.new([1, 2, 3, 4, 5])
var c = Set.new([1, 3, 5, 7, 9])
var d = Set.new([2, 4, 6, 8, 10])
var e = Set.new([2, 3, 5, 7])
var f = Set.new([8])
var sets = [a, b, c, d, e, f]

// Print each set, labelled A to F.
for (se in Indexed.new(sets)) {
    var i = se.index
    var s = se.value
    s = s.toList.sort() // force original sorted order
    Fmt.print("$s = $n", String.fromByte(65 + i), s)
}

// Each distinct pair of sets, with matching two-letter names.
var pairs = [
    [a, b], [a, c], [a, d], [a, e], [a, f], [b, c], [b, d], [b, e],
    [b, f], [c, d], [c, e], [c, f], [d, e], [d, f], [e, f]
]
var names = [
    "AB", "AC", "AD", "AE", "AF", "BC", "BD", "BE",
    "BF", "CD", "CE", "CF", "DE", "DF", "EF"
]
System.print()
for (se in Indexed.new(pairs)) {
    var n = names[se.index]
    var ss = se.value
    Fmt.print("J($s, $s) = $h", n[0], n[1], jaccardIndex.call(ss[0], ss[1]))
}</lang>
Output:
<pre>
A = []
B = [1, 2, 3, 4, 5]
C = [1, 3, 5, 7, 9]
D = [2, 4, 6, 8, 10]
E = [2, 3, 5, 7]
F = [8]

J(A, B) = 0
J(A, C) = 0
J(A, D) = 0
J(A, E) = 0
J(A, F) = 0
J(B, C) = 0.428571
J(B, D) = 0.25
J(B, E) = 0.5
J(B, F) = 0
J(C, D) = 0
J(C, E) = 0.5
J(C, F) = 0
J(D, E) = 0.125
J(D, F) = 0.2
J(E, F) = 0
</pre>