Jaro-Winkler distance: Difference between revisions
Thundergnat (talk | contribs) m (Thundergnat moved page Jaro-Winkler Distance to Jaro-Winkler distance: Follow normal task title capitalization policy)
Alextretyak (talk | contribs) (Added 11l)
:* Comparing string similarity algorithms. [https://medium.com/@appaloosastore/string-similarity-algorithms-compared-3f7b4d12f0ff Comparison of algorithms on Medium]
<br><br>
=={{header|11l}}==
{{trans|Python}}
<lang 11l>V WORDS = File(‘linuxwords.txt’).read_lines()
V MISSPELLINGS = [‘accomodate’,
                  ‘definately’,
                  ‘goverment’]

F jaro_winkler_distance(=st1, =st2)
   I st1.len < st2.len
      (st1, st2) = (st2, st1)
   V len1 = st1.len
   V len2 = st2.len
   I len2 == 0
      R 0.0
   V delta = max(0, len2 I/ 2 - 1)
   V flag = (0 .< len2).map(_ -> 0B)
   [Char] ch1_match
   L(ch1) st1
      V idx1 = L.index
      L(ch2) st2
         V idx2 = L.index
         I idx2 <= idx1 + delta & idx2 >= idx1 - delta & ch1 == ch2 & !(flag[idx2])
            flag[idx2] = 1B
            ch1_match.append(ch1)
            L.break
   V matches = ch1_match.len
   I matches == 0
      R 1.0
   V transpositions = 0
   V idx1 = 0
   L(ch2) st2
      V idx2 = L.index
      I flag[idx2]
         transpositions += (ch2 != ch1_match[idx1])
         idx1++
   V jaro = (Float(matches) / len1 + Float(matches) / len2 + (matches - transpositions / 2) / matches) / 3.0
   V commonprefix = 0
   L(i) 0 .< min(4, len2)
      commonprefix += (st1[i] == st2[i])
   R 1.0 - (jaro + commonprefix * 0.1 * (1 - jaro))

F within_distance(maxdistance, stri, maxtoreturn)
   V arr = :WORDS.filter(w -> jaro_winkler_distance(@stri, w) <= @maxdistance)
   arr.sort(key' x -> jaro_winkler_distance(@stri, x))
   R I arr.len <= maxtoreturn {arr} E arr[0 .< maxtoreturn]

L(STR) MISSPELLINGS
   print("\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to \" "STR" \" are:\n Word | Distance")
   L(w) within_distance(0.15, STR, 5)
      print(‘#14 | #.4’.format(w, jaro_winkler_distance(STR, w)))</lang>
{{out}}
<pre>
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " accomodate " are:
Word | Distance
accommodate | 0.0182
accommodated | 0.0333
accommodates | 0.0333
accommodating | 0.0815
accommodation | 0.0815

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " definately " are:
Word | Distance
definitely | 0.0400
defiantly | 0.0422
define | 0.0800
definite | 0.0850
definable | 0.0872

Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " goverment " are:
Word | Distance
government | 0.0533
govern | 0.0667
governments | 0.0697
movement | 0.0810
governmental | 0.0833
</pre>
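As a cross-check on the figures above, the distance calculation can be sketched in Python without the dictionary file. This is a minimal sketch that mirrors the 11l listing step by step, not necessarily the Python entry that listing was translated from. The function name is reused from the listing, and the empty-string and no-match return values are simply carried over from it.

```python
def jaro_winkler_distance(st1: str, st2: str) -> float:
    """Jaro-Winkler distance following the steps of the 11l listing."""
    # Treat st1 as the longer string, as the listing does.
    if len(st1) < len(st2):
        st1, st2 = st2, st1
    len1, len2 = len(st1), len(st2)
    if len2 == 0:
        return 0.0  # carried over from the listing

    # Characters match only if they are equal and at most delta positions apart.
    delta = max(0, len2 // 2 - 1)
    flag = [False] * len2          # which characters of st2 are already matched
    ch1_match = []                 # matched characters, in st1 order
    for idx1, ch1 in enumerate(st1):
        for idx2, ch2 in enumerate(st2):
            if abs(idx1 - idx2) <= delta and ch1 == ch2 and not flag[idx2]:
                flag[idx2] = True
                ch1_match.append(ch1)
                break

    matches = len(ch1_match)
    if matches == 0:
        return 1.0

    # Count matched characters that appear in a different order in st2.
    transpositions = 0
    idx1 = 0
    for idx2, ch2 in enumerate(st2):
        if flag[idx2]:
            transpositions += ch2 != ch1_match[idx1]
            idx1 += 1

    jaro = (matches / len1 + matches / len2
            + (matches - transpositions / 2) / matches) / 3.0

    # As in the listing: count equal positions among the first four characters.
    commonprefix = sum(st1[i] == st2[i] for i in range(min(4, len2)))
    return 1.0 - (jaro + commonprefix * 0.1 * (1 - jaro))


print(f"{jaro_winkler_distance('accomodate', 'accommodate'):.4f}")  # 0.0182
print(f"{jaro_winkler_distance('goverment', 'govern'):.4f}")        # 0.0667
```

For "accomodate" against "accommodate", ten characters match with no transpositions, so the Jaro similarity is (10/11 + 10/10 + 10/10) / 3 ≈ 0.9697. With all four leading positions equal, the distance is 1 - (0.9697 + 0.4 × 0.0303) ≈ 0.0182, which agrees with the first row of the output.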
=={{header|Elm}}==