Jaro-Winkler distance: Difference between revisions

m
typo
m (Minor edit to Rust code)
m (typo)
 
(41 intermediate revisions by 17 users not shown)
Line 1:
{{draft task|Jaro-Winkler Distance}}
 
The Jaro-Winkler distance is a metric for measuring the edit distance between words.
It is similar to the more basic LevensteinLevenshtein distance but the Jaro distance also accounts
for transpositions between letters in the words. With the Winkler modification to the Jaro
metric, the Jaro-Winkler distance also adds an increase in similarity for words which
Line 20:
Where:
 
* <math>m</math> &nbsp; is the number of ''matching characters (the same character inwithin themax(|s1|, same|s2|)/2 position- 1 of one another)'';
* <math>t</math> &nbsp; is half the number of ''transpositions (a shared character placed in different positions)''.
<br />
Line 60:
Using a dictionary of your choice and the following list of 9 commonly misspelled words:
 
"accomodate​accomodate", "definately​definately", "goverment​", "occured", "publically", "recieve​", "seperate", "untill​untill", "wich​"
 
* Calculate the Jaro-Winkler distance between the misspelled word and words in the dictionary.
Line 73:
<br><br>
 
=={{header|11l}}==
{{trans|Python}}
 
<syntaxhighlight lang="11l">V WORDS = File(‘linuxwords.txt’).read_lines()
V MISSPELLINGS = [‘accomodate’,
‘definately’,
‘goverment’]
 
F jaro_winkler_distance(=st1, =st2)
I st1.len < st2.len
(st1, st2) = (st2, st1)
V len1 = st1.len
V len2 = st2.len
I len2 == 0
R 0.0
V delta = max(0, len2 I/ 2 - 1)
V flag = (0 .< len2).map(_ -> 0B)
[Char] ch1_match
L(ch1) st1
V idx1 = L.index
L(ch2) st2
V idx2 = L.index
I idx2 <= idx1 + delta & idx2 >= idx1 - delta & ch1 == ch2 & !(flag[idx2])
flag[idx2] = 1B
ch1_match.append(ch1)
L.break
V matches = ch1_match.len
I matches == 0
R 1.0
V transpositions = 0
V idx1 = 0
L(ch2) st2
V idx2 = L.index
I flag[idx2]
transpositions += (ch2 != ch1_match[idx1])
idx1++
V jaro = (Float(matches) / len1 + Float(matches) / len2 + (matches - transpositions / 2) / matches) / 3.0
V commonprefix = 0
L(i) 0 .< min(4, len2)
commonprefix += (st1[i] == st2[i])
R 1.0 - (jaro + commonprefix * 0.1 * (1 - jaro))
 
F within_distance(maxdistance, stri, maxtoreturn)
V arr = :WORDS.filter(w -> jaro_winkler_distance(@stri, w) <= @maxdistance)
arr.sort(key' x -> jaro_winkler_distance(@stri, x))
R I arr.len <= maxtoreturn {arr} E arr[0 .< maxtoreturn]
 
L(STR) MISSPELLINGS
print("\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to \" "STR" \" are:\n Word | Distance")
L(w) within_distance(0.15, STR, 5)
print(‘#14 | #.4’.format(w, jaro_winkler_distance(STR, w)))</syntaxhighlight>
 
{{out}}
<pre>
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " accomodate " are:
Word | Distance
accommodate | 0.0182
accommodated | 0.0333
accommodates | 0.0333
accommodating | 0.0815
accommodation | 0.0815
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " definately " are:
Word | Distance
definitely | 0.0400
defiantly | 0.0422
define | 0.0800
definite | 0.0850
definable | 0.0872
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " goverment " are:
Word | Distance
government | 0.0533
govern | 0.0667
governments | 0.0697
movement | 0.0810
governmental | 0.0833
</pre>
 
=={{header|Elm}}==
Author: zh5
<syntaxhighlight lang="elm">module JaroWinkler exposing (similarity)
 
 
commonPrefixLength : List a -> List a -> Int -> Int
commonPrefixLength xs ys counter =
case ( xs, ys ) of
( x :: xs_, y :: ys_ ) ->
if x == y then
commonPrefixLength xs_ ys_ (counter + 1)
 
else
counter
 
_ ->
counter
 
similarity : String -> String -> Float
similarity s1 s2 =
let
chars1 =
String.toList s1
 
chars2 =
String.toList s2
 
jaroScore =
jaro chars1 chars2
 
l =
toFloat <| min (commonPrefixLength chars1 chars2 0) 4
 
p =
0.1
in
jaroScore + (l * p * (1.0 - jaroScore))
 
 
containtsInNextN : Int -> a -> List a -> Bool
containtsInNextN i a items =
case ( i, items ) of
( 0, _ ) ->
False
 
( _, [] ) ->
False
 
( _, item :: rest ) ->
if item == a then
True
 
else
containtsInNextN (i - 1) a rest
 
 
exists : Int -> Int -> List a -> a -> Bool
exists startAt endAt items i =
if endAt < startAt then
False
 
else if startAt == 0 then
case items of
first :: rest ->
if i == first then
True
 
else
exists 0 (endAt - 1) rest i
 
[] ->
False
 
else
exists 0 (endAt - startAt) (List.drop startAt items) i
 
 
existsInWindow : a -> List a -> Int -> Int -> Bool
existsInWindow item items offset radius =
let
startAt =
max 0 (offset - radius)
 
endAt =
min (offset + radius) (List.length items - 1)
in
exists startAt endAt items item
 
 
transpositions : List a -> List a -> Int -> Int
transpositions xs ys counter =
case ( xs, ys ) of
( [], _ ) ->
counter
 
( _, [] ) ->
counter
 
( x :: xs_, y :: ys_ ) ->
if x /= y then
transpositions xs_ ys_ (counter + 1)
 
else
transpositions xs_ ys_ counter
 
 
commonItems : List a -> List a -> Int -> List a
commonItems items1 items2 radius =
items1
|> List.indexedMap
(\index item ->
if existsInWindow item items2 index radius then
[ item ]
 
else
[]
)
|> List.concat
 
 
jaro : List Char -> List Char -> Float
jaro chars1 chars2 =
let
minLenth =
min (List.length chars1) (List.length chars2)
 
matchRadius =
minLenth // 2 + (minLenth |> modBy 2)
 
c1 =
commonItems chars1 chars2 matchRadius
 
c2 =
commonItems chars2 chars1 matchRadius
 
c1length =
toFloat (List.length c1)
 
c2length =
toFloat (List.length c2)
 
mismatches =
transpositions c1 c2 0
 
transpositionScore =
(toFloat mismatches + abs (c1length - c2length)) / 2.0
 
s1length =
toFloat (List.length chars1)
 
s2length =
toFloat (List.length chars2)
 
tLength =
max c1length c2length
 
result =
(c1length / s1length + c2length / s2length + (tLength - transpositionScore) / tLength) / 3.0
in
if isNaN result then
0.0
 
else
result
</syntaxhighlight>
 
=={{header|ALGOL 68}}==
{{works with|ALGOL 68G|Any - tested with release 2.8.3.win32}}
{{Trans|Wren}} - the actual distance routines are translated from the Wren sample, the file reading and asociative arrays etc. are based on similar Algol 68 task solutions.
<br>
Uses unixdict.txt - possibly a different version to those used by some other solutions, as this finds a slightly different list of matches for "seperate" (assuming I got the translation correct!).
<br>
Prints the 6 closest matches regarddless of their distance (i.e. we don't restrict it to matches closer that 0.15).
<syntaxhighlight lang="algol68">PROC jaro sim = ( STRING sp1, sp2 )REAL:
IF STRING s1 = sp1[ AT 0 ];
STRING s2 = sp2[ AT 0 ];
INT le1 = ( UPB s1 - LWB s1 ) + 1;
INT le2 = ( UPB s2 - LWB s2 ) + 1;
le1 < 1 AND le2 < 1
THEN # both strings are empty # 1
ELIF le1 < 1 OR le2 < 1
THEN # one of the strings is empty # 0
ELSE # both strings are non-empty #
INT dist := IF le2 > le1 THEN le2 ELSE le1 FI;
dist OVERAB 2 -:= 1;
[ 0 : le1 ]BOOL matches1; FOR i FROM LWB matches1 TO UPB matches1 DO matches1[ i ] := FALSE OD;
[ 0 : le2 ]BOOL matches2; FOR i FROM LWB matches2 TO UPB matches2 DO matches2[ i ] := FALSE OD;
INT matches := 0;
INT transpos := 0;
FOR i FROM LWB s1 TO UPB s1 DO
INT start := i - dist;
IF start < 0 THEN start := 0 FI;
INT end := i + dist + 1;
IF end > le2 THEN end := le2 FI;
FOR k FROM start TO end - 1
WHILE IF matches2[ k ]
THEN TRUE
ELIF s1[ i ] /= s2[ k ]
THEN TRUE
ELSE
matches2[ k ] := matches1[ i ] := TRUE;
matches +:= 1;
FALSE
FI
DO SKIP OD
OD;
IF matches = 0
THEN 0
ELSE
INT k := 0;
FOR i FROM LWB s1 TO UPB s1 DO
IF matches1[ i ] THEN
WHILE NOT matches2[ k ] DO k +:= 1 OD;
IF s1[ i ] /= s2[ k ] THEN transpos +:= 1 FI;
k +:= 1
FI
OD;
transpos OVERAB 2;
( ( matches / le1 )
+ ( matches / le2 )
+ ( ( matches - transpos ) / matches )
) / 3
FI
FI # jaro sim # ;
PROC jaro winkler dist = ( STRING sp, tp )REAL:
BEGIN
STRING s = sp[ AT 0 ];
STRING t = tp[ AT 0 ];
INT ls = ( UPB s - LWB s ) + 1;
INT lt = ( UPB t - LWB t ) + 1;
INT l max := IF ls < lt THEN ls ELSE lt FI;
IF l max > 4 THEN l max := 4 FI;
INT l := 0;
FOR i FROM 0 TO l max - 1 DO IF s[ i ] = t[ i ] THEN l +:= 1 FI OD;
REAL js = jaro sim( s, t );
REAL p = 0.1;
REAL ws = js + ( l * p * ( 1 - js ) );
1 - ws
END # jaro winkler dist # ;
# include the Associative Array code #
PR read "aArray.a68" PR
# test cases #
[]STRING misspelt = ( "accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich" );
IF FILE input file;
STRING file name = "unixdict.txt";
open( input file, file name, stand in channel ) /= 0
THEN
# failed to open the file #
print( ( "Unable to open """ + file name + """", newline ) )
ELSE
# file opened OK #
BOOL at eof := FALSE;
# set the EOF handler for the file #
on logical file end( input file, ( REF FILE f )BOOL:
BEGIN
# note that we reached EOF on the #
# latest read #
at eof := TRUE;
# return TRUE so processing can continue #
TRUE
END
);
REF AARRAY words := INIT LOC AARRAY;
STRING word;
WHILE NOT at eof
DO
STRING word;
get( input file, ( word, newline ) );
words // word := ""
OD;
# close the file #
close( input file );
# look for near matches to the misspelt words #
INT max closest = 6; # max number of closest matches to show #
FOR m pos FROM LWB misspelt TO UPB misspelt DO
[ max closest ]STRING closest word;
[ max closest ]REAL closest jwd;
FOR i TO max closest DO closest word[ i ] := ""; closest jwd[ i ] := 999 999 999 OD;
REF AAELEMENT e := FIRST words;
WHILE e ISNT nil element DO
STRING word = key OF e;
REAL jwd = jaro winkler dist( misspelt[ m pos ], word );
BOOL found better match := FALSE;
FOR i TO max closest WHILE NOT found better match DO
IF jwd <= closest jwd[ i ] THEN
# found a new closer match #
found better match := TRUE;
# shuffle the others down 1 and insert the new match #
FOR j FROM max closest BY - 1 TO i + 1 DO
closest word[ j ] := closest word[ j - 1 ];
closest jwd[ j ] := closest jwd[ j - 1 ]
OD;
closest word[ i ] := word;
closest jwd[ i ] := jwd
FI
OD;
e := NEXT words
OD;
print( ( "Misspelt word: ", misspelt[ m pos ], ":", newline ) );
FOR i TO max closest DO
print( ( fixed( closest jwd[ i ], -8, 4 ), " ", closest word[ i ], newline ) )
OD;
print( ( newline ) )
OD
FI</syntaxhighlight>
{{out}}
<pre>
Misspelt word: accomodate:
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accost
 
Misspelt word: definately:
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate
 
Misspelt word: goverment:
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty
 
Misspelt word: occured:
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure
 
Misspelt word: publically:
0.0800 public
0.1325 pullback
0.1327 publication
0.1400 pull
0.1556 pulley
0.1571 publish
 
Misspelt word: recieve:
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 recessive
0.0852 receptive
0.0905 recipe
 
Misspelt word: seperate:
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1048 repartee
0.1167 sewerage
0.1167 selenate
 
Misspelt word: untill:
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital
0.1511 unilateral
 
Misspelt word: wich:
0.0533 witch
0.0533 winch
0.0600 which
0.0857 wichita
0.1111 twitch
0.1111 switch
</pre>
 
=={{header|C++}}==
{{trans|Swift}}
<syntaxhighlight lang="cpp">#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>
 
auto load_dictionary(const std::string& path) {
std::ifstream in(path);
if (!in)
throw std::runtime_error("Cannot open file " + path);
std::string line;
std::vector<std::string> words;
while (getline(in, line))
words.push_back(line);
return words;
}
 
double jaro_winkler_distance(std::string str1, std::string str2) {
size_t len1 = str1.size();
size_t len2 = str2.size();
if (len1 < len2) {
std::swap(str1, str2);
std::swap(len1, len2);
}
if (len2 == 0)
return len1 == 0 ? 0.0 : 1.0;
size_t delta = std::max(size_t(1), len1/2) - 1;
std::vector<bool> flag(len2, false);
std::vector<char> ch1_match;
ch1_match.reserve(len1);
for (size_t idx1 = 0; idx1 < len1; ++idx1) {
char ch1 = str1[idx1];
for (size_t idx2 = 0; idx2 < len2; ++idx2) {
char ch2 = str2[idx2];
if (idx2 <= idx1 + delta && idx2 + delta >= idx1
&& ch1 == ch2 && !flag[idx2]) {
flag[idx2] = true;
ch1_match.push_back(ch1);
break;
}
}
}
size_t matches = ch1_match.size();
if (matches == 0)
return 1.0;
size_t transpositions = 0;
for (size_t idx1 = 0, idx2 = 0; idx2 < len2; ++idx2) {
if (flag[idx2]) {
if (str2[idx2] != ch1_match[idx1])
++transpositions;
++idx1;
}
}
double m = matches;
double jaro = (m/len1 + m/len2 + (m - transpositions/2.0)/m)/3.0;
size_t common_prefix = 0;
len2 = std::min(size_t(4), len2);
for (size_t i = 0; i < len2; ++i) {
if (str1[i] == str2[i])
++common_prefix;
}
return 1.0 - (jaro + common_prefix * 0.1 * (1.0 - jaro));
}
 
auto within_distance(const std::vector<std::string>& words,
double max_distance, const std::string& str,
size_t max_to_return) {
using pair = std::pair<std::string, double>;
std::vector<pair> result;
for (const auto& word : words) {
double jaro = jaro_winkler_distance(word, str);
if (jaro <= max_distance)
result.emplace_back(word, jaro);
}
std::stable_sort(result.begin(), result.end(),
[](const pair& p1, const pair& p2) { return p1.second < p2.second; });
if (result.size() > max_to_return)
result.resize(max_to_return);
return result;
}
 
int main() {
try {
auto words(load_dictionary("linuxwords.txt"));
std::cout << std::fixed << std::setprecision(4);
for (auto str : {"accomodate", "definately", "goverment",
"occured", "publically", "recieve",
"seperate", "untill", "wich"}) {
std::cout << "Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '"
<< str << "' are:\n Word | Distance\n";
for (const auto& pair : within_distance(words, 0.15, str, 5)) {
std::cout << std::setw(14) << pair.first << " | "
<< std::setw(6) << pair.second << '\n';
}
std::cout << '\n';
}
} catch (const std::exception& ex) {
std::cerr << ex.what() << '\n';
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}</syntaxhighlight>
 
{{out}}
<pre>
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
Word | Distance
accommodate | 0.0182
accommodated | 0.0333
accommodates | 0.0333
accommodating | 0.0815
accommodation | 0.0815
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
Word | Distance
definitely | 0.0400
defiantly | 0.0422
define | 0.0800
definite | 0.0850
definable | 0.0872
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
Word | Distance
government | 0.0533
govern | 0.0667
governments | 0.0697
movement | 0.0810
governmental | 0.0833
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
Word | Distance
occurred | 0.0250
occur | 0.0571
occupied | 0.0786
occurs | 0.0905
accursed | 0.0917
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
Word | Distance
publicly | 0.0400
public | 0.0800
publicity | 0.1044
publication | 0.1327
biblically | 0.1400
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
Word | Distance
receive | 0.0333
received | 0.0625
receiver | 0.0625
receives | 0.0625
relieve | 0.0667
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
Word | Distance
desperate | 0.0708
separate | 0.0917
temperate | 0.1042
separated | 0.1144
separates | 0.1144
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
Word | Distance
until | 0.0333
untie | 0.1067
untimely | 0.1083
till | 0.1111
Antilles | 0.1264
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
Word | Distance
witch | 0.0533
which | 0.0600
switch | 0.1111
twitch | 0.1111
witches | 0.1143
 
</pre>
 
=={{header|F_Sharp|F#}}==
This task uses [http://www.rosettacode.org/wiki/Jaro_distance#F.23 Jaro Distance (F#)]
<syntaxhighlight lang="fsharp">
// Calculate Jaro-Winkler Similarity of 2 Strings. Nigel Galloway: August 7th., 2020
let Jw P n g=let L=float(let i=Seq.map2(fun n g->n=g) n g in (if Seq.length i>4 then i|>Seq.take 4 else i)|>Seq.takeWhile id|>Seq.length)
let J=J n g in J+P*L*(1.0-J)
 
let dict=System.IO.File.ReadAllLines("linuxwords.txt")
let fN g=let N=Jw 0.1 g in dict|>Array.map(fun n->(n,1.0-(N n)))|>Array.sortBy snd
["accomodate";"definately";"goverment";"occured";"publically";"recieve";"seperate";"untill";"wich"]|>
List.iter(fun n->printfn "%s" n;fN n|>Array.take 5|>Array.iter(fun n->printf "%A" n);printfn "\n")
</syntaxhighlight>
{{out}}
<pre>
accomodate
("accommodate", 0.01818181818)("accommodated", 0.03333333333)("accommodates", 0.03333333333)("accommodation", 0.08153846154)("accommodating", 0.08153846154)
 
definately
("definitely", 0.04)("defiantly", 0.04222222222)("define", 0.08)("definite", 0.085)("definable", 0.08722222222)
 
goverment
("government", 0.05333333333)("govern", 0.06666666667)("governments", 0.0696969697)("governmental", 0.08333333333)("governs", 0.09523809524)
 
occured
("occurred", 0.025)("occur", 0.05714285714)("occupied", 0.07857142857)("occurs", 0.09047619048)("cured", 0.09523809524)
 
publically
("publicly", 0.04)("public", 0.08)("publicity", 0.1044444444)("publication", 0.1327272727)("politically", 0.1418181818)
 
recieve
("receive", 0.03333333333)("received", 0.0625)("receives", 0.0625)("receiver", 0.0625)("relieve", 0.07619047619)
 
seperate
("desperate", 0.0787037037)("separate", 0.09166666667)("separated", 0.1143518519)("separates", 0.1143518519)("temperate", 0.1157407407)
 
untill
("until", 0.03333333333)("untie", 0.1066666667)("untimely", 0.1083333333)("till", 0.1111111111)("Huntsville", 0.1333333333)
 
wich
("witch", 0.05333333333)("which", 0.06)("switch", 0.1111111111)("twitch", 0.1111111111)("witches", 0.1142857143)
</pre>
 
=={{header|Go}}==
This uses unixdict and borrows code from the [[Jaro_distance#Go]] task. Otherwise it is a translation of the Wren entry.
<syntaxhighlight lang="go">package main
 
import (
"bytes"
"fmt"
"io/ioutil"
"log"
"sort"
)
 
func jaroSim(str1, str2 string) float64 {
if len(str1) == 0 && len(str2) == 0 {
return 1
}
if len(str1) == 0 || len(str2) == 0 {
return 0
}
match_distance := len(str1)
if len(str2) > match_distance {
match_distance = len(str2)
}
match_distance = match_distance/2 - 1
str1_matches := make([]bool, len(str1))
str2_matches := make([]bool, len(str2))
matches := 0.
transpositions := 0.
for i := range str1 {
start := i - match_distance
if start < 0 {
start = 0
}
end := i + match_distance + 1
if end > len(str2) {
end = len(str2)
}
for k := start; k < end; k++ {
if str2_matches[k] {
continue
}
if str1[i] != str2[k] {
continue
}
str1_matches[i] = true
str2_matches[k] = true
matches++
break
}
}
if matches == 0 {
return 0
}
k := 0
for i := range str1 {
if !str1_matches[i] {
continue
}
for !str2_matches[k] {
k++
}
if str1[i] != str2[k] {
transpositions++
}
k++
}
transpositions /= 2
return (matches/float64(len(str1)) +
matches/float64(len(str2)) +
(matches-transpositions)/matches) / 3
}
 
func jaroWinklerDist(s, t string) float64 {
ls := len(s)
lt := len(t)
lmax := lt
if ls < lt {
lmax = ls
}
if lmax > 4 {
lmax = 4
}
l := 0
for i := 0; i < lmax; i++ {
if s[i] == t[i] {
l++
}
}
js := jaroSim(s, t)
p := 0.1
ws := js + float64(l)*p*(1-js)
return 1 - ws
}
 
type wd struct {
word string
dist float64
}
 
func main() {
misspelt := []string{
"accomodate", "definately", "goverment", "occured", "publically",
"recieve", "seperate", "untill", "wich",
}
b, err := ioutil.ReadFile("unixdict.txt")
if err != nil {
log.Fatal("Error reading file")
}
words := bytes.Fields(b)
for _, ms := range misspelt {
var closest []wd
for _, w := range words {
word := string(w)
if word == "" {
continue
}
jwd := jaroWinklerDist(ms, word)
if jwd < 0.15 {
closest = append(closest, wd{word, jwd})
}
}
fmt.Println("Misspelt word:", ms, ":")
sort.Slice(closest, func(i, j int) bool { return closest[i].dist < closest[j].dist })
for i, c := range closest {
fmt.Printf("%0.4f %s\n", c.dist, c.word)
if i == 5 {
break
}
}
fmt.Println()
}
}</syntaxhighlight>
 
{{out}}
<pre>
Misspelt word: accomodate :
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord
 
Misspelt word: definately :
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate
 
Misspelt word: goverment :
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty
 
Misspelt word: occured :
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure
 
Misspelt word: publically :
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback
 
Misspelt word: recieve :
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife
 
Misspelt word: seperate :
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sewerage
0.1167 sept
 
Misspelt word: untill :
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital
 
Misspelt word: wich :
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch
</pre>
 
=={{header|J}}==
 
Implementation:
 
<syntaxhighlight lang="j">jaro=: {{
Eq=. (x=/y)*(<.<:-:x>.&#y)>:|x -/&i.&# y
xM=. (+./"1 Eq)#x
yM=. (+./"2 Eq)#y
M=. xM <.&# yM
T=. -: +/ xM ~:&(M&{.) yM
3%~ (M%#x) + (M%#y) + (M-T)%M
}}
 
jarowinkler=: {{
p=. 0.1
l=. +/*/\x =&((4<.x<.&#y)&{.) y
simj=. x jaro y
-.simj + l*p*-.simj
}}</syntaxhighlight>
 
Task example:
 
<syntaxhighlight lang="j">task=: {{
words=. <;._2 fread '/usr/share/dict/words'
for_word. ;:'accomodate definately goverment occured publically recieve seperate untill wich' do.
b=.d<:close=. 2{/:~d=. word jarowinkler every words
echo (;word),':'
echo ' ',.(":,.b#d),.' ',.>b#words
echo''
end.
}}
 
task''
accomodate:
0.0681818 accommodate
0.0945455 accorporate
0.0703704 commodate
 
definately:
0.0422222 defiantly
0.0622222 definably
0.0622222 definedly
 
goverment:
0.0833333 govern
0.0644444 government
0.0944444 governmental
 
occured:
0.105556 occlude
0.0571429 occur
0.0952381 occursive
 
publically:
0.08 public
0.0747222 publicity
0.0525 publicly
 
recieve:
0.0592593 reachieve
0.0333333 receive
0.0392857 recidive
 
seperate:
0.0145833 separate
0.0405093 separates
0.0458333 septate
 
untill:
0.0333333 until
0 untill
0.0333333 untrill
 
wich:
0.04 wicht
0.0533333 winch
0.0533333 witch
</syntaxhighlight>
=={{header|Java}}==
{{trans|C++}}
<syntaxhighlight lang="java">import java.io.*;
import java.util.*;
 
public class JaroWinkler {
public static void main(String[] args) {
try {
List<String> words = loadDictionary("linuxwords.txt");
String[] strings = {
"accomodate", "definately", "goverment", "occured",
"publically", "recieve", "seperate", "untill", "wich"
};
for (String string : strings) {
System.out.printf("Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '%s' are:\n"
+ " Word | Distance\n", string);
for (StringDistance s : withinDistance(words, 0.15, string, 5)) {
System.out.printf("%14s | %.4f\n", s.word, s.distance);
}
System.out.println();
}
} catch (Exception e) {
e.printStackTrace();
}
}
 
private static class StringDistance implements Comparable<StringDistance> {
private StringDistance(String word, double distance) {
this.word = word;
this.distance = distance;
}
public int compareTo(StringDistance s) {
return Double.compare(distance, s.distance);
}
private String word;
private double distance;
}
 
private static List<StringDistance> withinDistance(List<String> words,
double maxDistance, String string, int max) {
List<StringDistance> result = new ArrayList<>();
for (String word : words) {
double distance = jaroWinklerDistance(word, string);
if (distance <= maxDistance)
result.add(new StringDistance(word, distance));
}
Collections.sort(result);
if (result.size() > max)
result = result.subList(0, max);
return result;
}
 
private static double jaroWinklerDistance(String string1, String string2) {
int len1 = string1.length();
int len2 = string2.length();
if (len1 < len2) {
String s = string1;
string1 = string2;
string2 = s;
int tmp = len1;
len1 = len2;
len2 = tmp;
}
if (len2 == 0)
return len1 == 0 ? 0.0 : 1.0;
int delta = Math.max(1, len1 / 2) - 1;
boolean[] flag = new boolean[len2];
Arrays.fill(flag, false);
char[] ch1Match = new char[len1];
int matches = 0;
for (int i = 0; i < len1; ++i) {
char ch1 = string1.charAt(i);
for (int j = 0; j < len2; ++j) {
char ch2 = string2.charAt(j);
if (j <= i + delta && j + delta >= i && ch1 == ch2 && !flag[j]) {
flag[j] = true;
ch1Match[matches++] = ch1;
break;
}
}
}
if (matches == 0)
return 1.0;
int transpositions = 0;
for (int i = 0, j = 0; j < len2; ++j) {
if (flag[j]) {
if (string2.charAt(j) != ch1Match[i])
++transpositions;
++i;
}
}
double m = matches;
double jaro = (m / len1 + m / len2 + (m - transpositions / 2.0) / m) / 3.0;
int commonPrefix = 0;
len2 = Math.min(4, len2);
for (int i = 0; i < len2; ++i) {
if (string1.charAt(i) == string2.charAt(i))
++commonPrefix;
}
return 1.0 - (jaro + commonPrefix * 0.1 * (1.0 - jaro));
}
 
private static List<String> loadDictionary(String path) throws IOException {
try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
List<String> words = new ArrayList<>();
String word;
while ((word = reader.readLine()) != null)
words.add(word);
return words;
}
}
}</syntaxhighlight>
 
{{out}}
<pre>
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
Word | Distance
accommodate | 0.0182
accommodated | 0.0333
accommodates | 0.0333
accommodating | 0.0815
accommodation | 0.0815
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
Word | Distance
definitely | 0.0400
defiantly | 0.0422
define | 0.0800
definite | 0.0850
definable | 0.0872
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
Word | Distance
government | 0.0533
govern | 0.0667
governments | 0.0697
movement | 0.0810
governmental | 0.0833
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
Word | Distance
occurred | 0.0250
occur | 0.0571
occupied | 0.0786
occurs | 0.0905
accursed | 0.0917
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
Word | Distance
publicly | 0.0400
public | 0.0800
publicity | 0.1044
publication | 0.1327
biblically | 0.1400
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
Word | Distance
receive | 0.0333
received | 0.0625
receiver | 0.0625
receives | 0.0625
relieve | 0.0667
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
Word | Distance
desperate | 0.0708
separate | 0.0917
temperate | 0.1042
separated | 0.1144
separates | 0.1144
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
Word | Distance
until | 0.0333
untie | 0.1067
untimely | 0.1083
till | 0.1111
Antilles | 0.1264
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
Word | Distance
witch | 0.0533
which | 0.0600
switch | 0.1111
twitch | 0.1111
witches | 0.1143
 
</pre>
 
=={{header|jq}}==
{{works with|jq}}
'''Works with gojq, the Go implementation of jq'''
 
This entry, which uses unixdict.txt, borrows the implementation in jq of the Jaro similarity measure as defined at
[[Jaro_similarity#jq]]; since it is quite long, it is not repeated here.
<syntaxhighlight lang="jq"># See [[Jaro_similarity#jq]] for the implementation of jaro/2
 
def length_of_common_prefix($s1; $s2):
if ($s1|length) > ($s2|length) then length_of_common_prefix($s2; $s1)
else ($s1|explode) as $x1
| ($s2|explode) as $x2
| first( range(0;$x1|length) | select( $x1[.] != $x2[.] )) // ($x1|length)
end;
 
# Output: the Jaro-WInkler distance using 0.1 as the common-prefix multiplier
def jaro_winkler($s1; $s2):
if $s1 == $s2 then 0
else jaro($s1; $s2) as $j
| length_of_common_prefix($s1[:4]; $s2[:4]) as $l
| 1 - ($j + 0.1 * $l * (1 - $j))
end ;
 
# Input: an array of words
# Output: [[match, distance] ...]
def candidates($word; $threshold):
map(jaro_winkler($word; . ) as $x | select($x <= $threshold) | [., $x] );
 
def lpad($len): tostring | ($len - length) as $l | (" " * $l)[:$l] + .;
 
def task:
[inputs] # the dictionary
| ("accomodate", "definately", "goverment​", "occured", "publically", "recieve​", "seperate", "untill", "wich​") as $word
| candidates($word; 0.15) | sort_by(.[-1]) | .[:5]
| "Matches for \($word|lpad(10)): Distance",
(.[] | "\(.[0] | lpad(21)) : \(.[-1] * 1000 | round / 1000)") ;
 
task</syntaxhighlight>
{{out}}
Invocation: jq -rRn -f program.jq unixdict.txt
<pre>
Matches for accomodate: Distance
accommodate : 0.018
accordant : 0.104
accolade : 0.114
acclimate : 0.122
accompanist : 0.133
Matches for definately: Distance
define : 0.08
definite : 0.085
defiant : 0.089
definitive : 0.12
deflate : 0.127
Matches for goverment​: Distance
govern : 0.08
governor : 0.13
governess : 0.133
governance : 0.149
Matches for occured: Distance
occurred : 0.025
occur : 0.057
occurrent : 0.095
occlude : 0.106
concurred : 0.122
Matches for publically: Distance
public : 0.08
publication : 0.133
Matches for recieve​: Distance
receive : 0.063
reeve : 0.1
relieve : 0.105
recife : 0.108
recipe : 0.108
Matches for seperate: Distance
desperate : 0.079
separate : 0.092
temperate : 0.116
sept : 0.117
septate : 0.131
Matches for untill: Distance
until : 0.033
till : 0.111
huntsville : 0.133
unital : 0.142
Matches for wich​: Distance
winch : 0.107
witch : 0.107
which : 0.12
wichita : 0.126
</pre>
 
=={{header|Julia}}==
<syntaxhighlight lang="julia"># download("http://users.cs.duke.edu/~ola/ap/linuxwords", "linuxwords.txt")
const words = read("linuxwords.txt", String) |> split .|> strip
 
function jarowinklerdistance(s1, s2)
if length(s1) < length(s2)
s1, s2 = s2, s1
end
len1, len2 = length(s1), length(s2)
len2 == 0 && return 0.0
delta = max(0, len2 ÷ 2 - 1)
flag = zeros(Bool, len2) # flags for possible transpositions, begin as false
ch1_match = eltype(s1)[]
for (i, ch1) in enumerate(s1)
for (j, ch2) in enumerate(s2)
if (j <= i + delta) && (j >= i - delta) && (ch1 == ch2) && !flag[j]
flag[j] = true
push!(ch1_match, ch1)
break
end
end
end
matches = length(ch1_match)
matches == 0 && return 1.0
transpositions, i = 0, 0
for (j, ch2) in enumerate(s2)
if flag[j]
i += 1
transpositions += (ch2 != ch1_match[i])
end
end
jaro = (matches / len1 + matches / len2 + (matches - transpositions/2) / matches) / 3.0
commonprefix = count(i -> s1[i] == s2[i], 1:min(len2, 4))
return 1 - (jaro + commonprefix * 0.1 * (1 - jaro))
end
 
function closewords(s, maxdistance, maxtoreturn)
jw = 0.0
arr = [(w, jw) for w in words if (jw = jarowinklerdistance(s, w)) <= maxdistance]
sort!(arr, lt=(x, y) -> x[2] < y[2])
return length(arr) <= maxtoreturn ? arr : arr[1:maxtoreturn]
end
 
for s in ["accomodate", "definately", "goverment", "occured", "publically",
"recieve", "seperate", "untill", "wich"]
println("\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to '$s' are:")
println(" Word | Distance")
for (w, jw) in closewords(s, 0.15, 5)
println(rpad(w, 14), "| ", Float16(jw))
end
end
</syntaxhighlight>{{out}}
<pre>
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
Word | Distance
accommodate | 0.01819
accommodated | 0.03333
accommodates | 0.03333
accommodating | 0.08154
accommodation | 0.08154
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
Word | Distance
definitely | 0.04
defiantly | 0.04224
define | 0.08
definite | 0.085
definable | 0.0872
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
Word | Distance
government | 0.05334
govern | 0.06665
governments | 0.0697
movement | 0.081
governmental | 0.0833
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
Word | Distance
occurred | 0.025
occur | 0.05713
occupied | 0.07855
occurs | 0.09045
accursed | 0.0917
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
Word | Distance
publicly | 0.04
public | 0.08
publicity | 0.10443
publication | 0.1327
biblically | 0.14
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
Word | Distance
receive | 0.03333
received | 0.0625
receiver | 0.0625
receives | 0.0625
relieve | 0.06665
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
Word | Distance
desperate | 0.07086
separate | 0.0917
temperate | 0.1042
separated | 0.1144
separates | 0.1144
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
Word | Distance
until | 0.03333
untie | 0.1067
untimely | 0.10834
Antilles | 0.1263
untidy | 0.1333
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
Word | Distance
witch | 0.05334
which | 0.06
witches | 0.11426
rich | 0.11664
wick | 0.11664
</pre>
 
=={{header|Mathematica}}/{{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">ClearAll[JWD]
JWD[a_][b_]:=Experimental`JaroWinklerDistance[a,b]
dict=DictionaryLookup[];
TakeSmallestBy[dict->{"Element","Value"},JWD["accomodate"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["definately"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["goverment"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["occured"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["publically"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["recieve"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["seperate"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["untill"],5]//Grid
TakeSmallestBy[dict->{"Element","Value"},JWD["wich"],5]//Grid</syntaxhighlight>
{{out}}
<pre>accommodate 0.0181818
accommodated 0.0333333
accommodates 0.0333333
accommodation 0.0815385
accommodating 0.0815385
 
definitely 0.04
defiantly 0.0422222
definably 0.0622222
definitively 0.07
define 0.08
 
government 0.0422222
governments 0.0585859
govern 0.0666667
governmental 0.0722222
governs 0.0952381
 
occurred 0.025
occur 0.0571429
occupied 0.0785714
occurs 0.0904762
cured 0.0952381
 
publicly 0.04
public 0.08
publican 0.085
publicans 0.104444
publicity 0.10444
 
receive 0.0333333
receives 0.0625
received 0.0625
receiver 0.0625
reeve 0.0761905
 
desperate 0.0787037
separate 0.0916667
separateness 0.106944
sprat 0.1125
separated 0.114352
 
until 0.0333333
untiled 0.0904762
untiles 0.0904762
unlit 0.0977778
untypically 0.106061
 
winch 0.0533333
witch 0.0533333
which 0.06
switch 0.111111
twitch 0.111111</pre>
 
=={{header|Nim}}==
{{trans|Go}}
<syntaxhighlight lang="nim">import lenientops
 
func jaroSim(s1, s2: string): float =
 
if s1.len == 0 and s2.len == 0: return 1
if s1.len == 0 or s2.len == 0: return 0
 
let matchDistance = max(s1.len, s2.len) div 2 - 1
var s1Matches = newSeq[bool](s1.len)
var s2Matches = newSeq[bool](s2.len)
var matches = 0
for i in 0..s1.high:
for j in max(0, i - matchDistance)..min(i + matchDistance, s2.high):
if not s2Matches[j] and s1[i] == s2[j]:
s1Matches[i] = true
s2Matches[j] = true
inc matches
break
if matches == 0: return 0
 
var transpositions = 0.0
var k = 0
for i in ..s1.high:
if not s1Matches[i]: continue
while not s2Matches[k]: inc k
if s1[i] != s2[k]: transpositions += 0.5
inc k
 
result = (matches / s1.len + matches / s2.len + (matches - transpositions) / matches) / 3
 
 
func jaroWinklerDist(s, t: string): float =
let ls = s.len
let lt = t.len
var lmax = if ls < lt: ls else: lt
if lmax > 4: lmax = 4
var l = 0
for i in 0..<lmax:
if s[i] == t[i]: inc l
let js = jaroSim(s, t)
let p = 0.1
let ws = js + float(l) * p * (1 - js)
result = 1 - ws
 
 
when isMainModule:
 
import algorithm, sequtils, strformat
 
type Wd = tuple[word: string; dist: float]
 
func `<`(w1, w2: Wd): bool =
if w1.dist < w2.dist: true
elif w1.dist == w2.dist: w1.word < w2.word
else: false
 
const Misspelt = ["accomodate", "definately", "goverment", "occured",
"publically", "recieve", "seperate", "untill", "wich"]
 
let words = toSeq("unixdict.txt".lines)
for ms in Misspelt:
var closest: seq[Wd]
for word in words:
if word.len == 0: continue
let jwd = jaroWinklerDist(ms, word)
if jwd < 0.15:
closest.add (word, jwd)
echo "Misspelt word: ", ms, ":"
closest.sort()
for i, c in closest:
echo &"{c.dist:0.4f} {c.word}"
if i == 5: break
echo()</syntaxhighlight>
 
{{out}}
<pre>Misspelt word: accomodate:
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord
 
Misspelt word: definately:
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate
 
Misspelt word: goverment:
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty
 
Misspelt word: occured:
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure
 
Misspelt word: publically:
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback
 
Misspelt word: recieve:
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife
 
Misspelt word: seperate:
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sept
0.1167 sewerage
 
Misspelt word: untill:
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital
 
Misspelt word: wich:
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch</pre>
 
=={{header|Perl}}==
<syntaxhighlight lang="perl">use strict;
use warnings;
use List::Util qw(min max head);
 
sub jaro_winkler {
my($s, $t) = @_;
my(@s_matches, @t_matches, $matches);
 
return 0 if $s eq $t;
 
my $s_len = length $s; my @s = split //, $s;
my $t_len = length $t; my @t = split //, $t;
 
my $match_distance = int (max($s_len,$t_len)/2) - 1;
 
for my $i (0 .. $#s) {
my $start = max(0, $i - $match_distance);
my $end = min($i + $match_distance, $t_len - 1);
for my $j ($start .. $end) {
next if $t_matches[$j] or $s[$i] ne $t[$j];
($s_matches[$i], $t_matches[$j]) = (1, 1);
$matches++ and last;
}
}
return 1 unless $matches;
 
my($k, $transpositions) = (0, 0);
 
for my $i (0 .. $#s) {
next unless $s_matches[$i];
$k++ until $t_matches[$k];
$transpositions++ if $s[$i] ne $t[$k];
$k++;
}
 
my $prefix = 0;
$s[$_] eq $t[$_] and ++$prefix for 0 .. -1 + min 5, $s_len, $t_len;
 
my $jaro = ($matches / $s_len + $matches / $t_len +
(($matches - $transpositions / 2) / $matches)) / 3;
 
1 - ($jaro + $prefix * .1 * ( 1 - $jaro) )
}
 
my @words = split /\n/, `cat ./unixdict.txt`;
 
for my $word (<accomodate definately goverment occured publically recieve seperate untill wich>) {
my %J;
$J{$_} = jaro_winkler($word, $_) for @words;
print "\nClosest 5 dictionary words with a Jaro-Winkler distance < .15 from '$word':\n";
printf "%15s : %0.4f\n", $_, $J{$_}
for head 5, sort { $J{$a} <=> $J{$b} or $a cmp $b } grep { $J{$_} < 0.15 } keys %J;
}</syntaxhighlight>
{{out}}
<pre style="height:40ex">Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'accomodate':
accommodate : 0.0152
accordant : 0.1044
accompanist : 0.1106
accolade : 0.1136
accompaniment : 0.1183
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'definately':
define : 0.0667
definite : 0.0708
defiant : 0.0886
definitive : 0.1000
designate : 0.1219
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'goverment':
govern : 0.0556
governor : 0.0972
governess : 0.0979
governance : 0.1108
coverlet : 0.1167
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'occured':
occurred : 0.0208
occur : 0.0476
occurrent : 0.0794
occlude : 0.1056
occurring : 0.1217
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'publically':
public : 0.0667
publication : 0.1106
publish : 0.1310
pull : 0.1400
pullback : 0.1492
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'recieve':
receive : 0.0333
relieve : 0.0571
reeve : 0.0667
receptive : 0.0852
recessive : 0.0852
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'seperate':
desperate : 0.0708
separate : 0.0786
sewerage : 0.1000
repartee : 0.1083
repeater : 0.1083
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'untill':
until : 0.0278
till : 0.1111
tilt : 0.1111
huntsville : 0.1333
instill : 0.1357
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from 'wich':
winch : 0.0533
witch : 0.0533
which : 0.0600
wichita : 0.0857
switch : 0.1111</pre>
 
=={{header|Phix}}==
Uses jaro() from [[Jaro_distance#Phix]] (reproduced below for your convenience) and the standard unix_dict()
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">function</span> <span style="color: #000000;">jaro</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">str1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">str2</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">str1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">trim</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">upper</span><span style="color: #0000FF;">(</span><span style="color: #000000;">str1</span><span style="color: #0000FF;">))</span>
<span style="color: #000000;">str2</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">trim</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">upper</span><span style="color: #0000FF;">(</span><span style="color: #000000;">str2</span><span style="color: #0000FF;">))</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">len1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">str1</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">len2</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">str2</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">match_distance</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">floor</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">max</span><span style="color: #0000FF;">(</span><span style="color: #000000;">len1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">len2</span><span style="color: #0000FF;">)/</span><span style="color: #000000;">2</span><span style="color: #0000FF;">)-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">match_count</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">half_transposed</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">0</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">len1</span><span style="color: #0000FF;">==</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span> <span style="color: #008080;">return</span> <span style="color: #000000;">len2</span><span style="color: #0000FF;">==</span><span style="color: #000000;">0</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000080;font-style:italic;">-- count the number of matches</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">m1</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #004600;">false</span><span style="color: #0000FF;">,</span><span style="color: #000000;">len1</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">m2</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #004600;">false</span><span style="color: #0000FF;">,</span><span style="color: #000000;">len2</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">len1</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">=</span><span style="color: #7060A8;">max</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">i</span><span style="color: #0000FF;">-</span><span style="color: #000000;">match_distance</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">to</span> <span style="color: #7060A8;">min</span><span style="color: #0000FF;">(</span><span style="color: #000000;">len2</span><span style="color: #0000FF;">,</span><span style="color: #000000;">i</span><span style="color: #0000FF;">+</span><span style="color: #000000;">match_distance</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #008080;">not</span> <span style="color: #000000;">m2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">then</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">str1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]=</span><span style="color: #000000;">str2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">m1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #004600;">true</span>
<span style="color: #000000;">m2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #004600;">true</span>
<span style="color: #000000;">match_count</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">exit</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">match_count</span><span style="color: #0000FF;">==</span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span> <span style="color: #008080;">return</span> <span style="color: #000000;">0</span> <span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #000080;font-style:italic;">-- count the number of half-transpositions</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">k</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">len1</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">m1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">then</span>
<span style="color: #008080;">while</span> <span style="color: #008080;">not</span> <span style="color: #000000;">m2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">do</span> <span style="color: #000000;">k</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span> <span style="color: #008080;">end</span> <span style="color: #008080;">while</span>
<span style="color: #000000;">half_transposed</span> <span style="color: #0000FF;">+=</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">str1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]!=</span><span style="color: #000000;">str2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">])</span>
<span style="color: #000000;">k</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">transpositions</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">floor</span><span style="color: #0000FF;">(</span><span style="color: #000000;">half_transposed</span><span style="color: #0000FF;">/</span><span style="color: #000000;">2</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">not_transposed</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">match_count</span> <span style="color: #0000FF;">-</span> <span style="color: #000000;">transpositions</span>
<span style="color: #000080;font-style:italic;">--
-- return the average of:
-- percentage/fraction of the first string matched,
-- percentage/fraction of the second string matched, and
-- percentage/fraction of matches that were not transposed.
--</span>
<span style="color: #008080;">return</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">match_count</span><span style="color: #0000FF;">/</span><span style="color: #000000;">len1</span> <span style="color: #0000FF;">+</span>
<span style="color: #000000;">match_count</span><span style="color: #0000FF;">/</span><span style="color: #000000;">len2</span> <span style="color: #0000FF;">+</span>
<span style="color: #000000;">not_transposed</span><span style="color: #0000FF;">/</span><span style="color: #000000;">match_count</span><span style="color: #0000FF;">)/</span><span style="color: #000000;">3</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">jaroWinklerDist</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">lm</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">min</span><span style="color: #0000FF;">({</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">),</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">t</span><span style="color: #0000FF;">),</span><span style="color: #000000;">4</span><span style="color: #0000FF;">}),</span>
<span style="color: #000000;">l</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sum</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">sq_eq</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..</span><span style="color: #000000;">lm</span><span style="color: #0000FF;">],</span><span style="color: #000000;">t</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..</span><span style="color: #000000;">lm</span><span style="color: #0000FF;">]))</span>
<span style="color: #008080;">return</span> <span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">-</span><span style="color: #000000;">jaro</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t</span><span style="color: #0000FF;">))*(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">-</span><span style="color: #000000;">l</span><span style="color: #0000FF;">*</span><span style="color: #000000;">0.1</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">mispelt</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"accomodate"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"definately"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"goverment"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"occured"</span><span style="color: #0000FF;">,</span>
<span style="color: #008000;">"publically"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"recieve"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"seperate"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"untill"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"wich"</span><span style="color: #0000FF;">},</span>
<span style="color: #000000;">words</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">unix_dict</span><span style="color: #0000FF;">()</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">jwds</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">repeat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">words</span><span style="color: #0000FF;">))</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">m</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">mispelt</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">ms</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">mispelt</span><span style="color: #0000FF;">[</span><span style="color: #000000;">m</span><span style="color: #0000FF;">]</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"\nMisspelt word: %s :\n"</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">ms</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">w</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">words</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">jwds</span><span style="color: #0000FF;">[</span><span style="color: #000000;">w</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">jaroWinklerDist</span><span style="color: #0000FF;">(</span><span style="color: #000000;">ms</span><span style="color: #0000FF;">,</span><span style="color: #000000;">words</span><span style="color: #0000FF;">[</span><span style="color: #000000;">w</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">tags</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">custom_sort</span><span style="color: #0000FF;">(</span><span style="color: #000000;">jwds</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">tagset</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">words</span><span style="color: #0000FF;">)))</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #000000;">6</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">tj</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">tags</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span>
<span style="color: #000080;font-style:italic;">-- if jwds[tj]&gt;0.15 then exit end if</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"%0.4f %s\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">jwds</span><span style="color: #0000FF;">[</span><span style="color: #000000;">tj</span><span style="color: #0000FF;">],</span> <span style="color: #000000;">words</span><span style="color: #0000FF;">[</span><span style="color: #000000;">tj</span><span style="color: #0000FF;">]})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<!--</syntaxhighlight>-->
Output identical to <del>Go/Wren</del> Algol68
 
=={{header|Python}}==
<langsyntaxhighlight lang="python">"""
Test Jaro-Winkler distance metric.
linuxwords.txt is from http://users.cs.duke.edu/~ola/ap/linuxwords
Line 84 ⟶ 1,871:
"accomodate​",
"definately​",
"goverment​goverment",
"occured",
"publically",
"recieve​recieve",
"seperate",
"untill​untill",
"wich​wich",
]
 
Line 141 ⟶ 1,928:
for w in within_distance(0.15, STR, 5):
print('{:>14} | {:6.4f}'.format(w, jaro_winkler_distance(STR, w)))
</langsyntaxhighlight>{{out}}
<pre>
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " accomodate​accomodate " are:
Word | Distance
accommodate | 0.03640182
accommodated | 0.05150333
accommodates | 0.05150333
accommodating | 0.09790815
accommodation | 0.09790815
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " definately​definately " are:
Word | Distance
definitely | 0.05640400
defiantly | 0.05860422
define | 0.09090800
definite | 0.09770850
defiantdefinable | 0.10130872
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " goverment​goverment " are:
Word | Distance
government | 0.07330533
govern | 0.08000667
governments | 0.08970697
movement | 0.09920810
governmental | 0.10330833
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " occured " are:
Line 183 ⟶ 1,970:
biblically | 0.1400
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " recieve​recieve " are:
Word | Distance
receive | 0.06250333
received | 0.09170625
receiver | 0.09170625
receives | 0.09170625
relieve | 0.09170667
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " seperate " are:
Line 199 ⟶ 1,986:
separates | 0.1144
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " untill​untill " are:
Word | Distance
until | 0.05710333
untie | 0.12571067
untimely | 0.13211083
Antilles | 0.1264
untidy | 0.1333
 
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " wich​wich " are:
Word | Distance
witch | 0.10670533
which | 0.12000600
witches | 0.1143
rich | 0.1167
wick | 0.1167
</pre>
 
=={{header|Raku}}==
{{works with|Rakudo|2020.07}}
A minor modification of the [[Jaro_distance#Raku|Jaro distance]] task entry.
 
using the unixdict.txt file from www.puzzlers.org
 
<syntaxhighlight lang="raku" line>sub jaro-winkler ($s, $t) {
 
return 0 if $s eq $t;
 
my $s_len = + my @s = $s.comb;
my $t_len = + my @t = $t.comb;
 
my $match_distance = ($s_len max $t_len) div 2 - 1;
 
my @s_matches;
my @t_matches;
my $matches = 0;
 
for ^@s -> $i {
 
my $start = 0 max $i - $match_distance;
my $end = $i + $match_distance min ($t_len - 1);
 
for $start .. $end -> $j {
@t_matches[$j] and next;
@s[$i] eq @t[$j] or next;
@s_matches[$i] = 1;
@t_matches[$j] = 1;
$matches++;
last;
}
}
 
return 1 if $matches == 0;
 
my $k = 0;
my $transpositions = 0;
 
for ^@s -> $i {
@s_matches[$i] or next;
until @t_matches[$k] { ++$k }
@s[$i] eq @t[$k] or ++$transpositions;
++$k;
}
 
my $prefix = 0;
 
++$prefix if @s[$_] eq @t[$_] for ^(min 4, $s_len, $t_len);
 
my $jaro = ($matches / $s_len + $matches / $t_len +
(($matches - $transpositions / 2) / $matches)) / 3;
 
1 - ($jaro + $prefix * .1 * ( 1 - $jaro) )
}
 
 
my @words = './unixdict.txt'.IO.slurp.words;
 
for <accomodate definately goverment occured publically recieve seperate untill wich>
-> $word {
 
my %result = @words.race.map: { $_ => jaro-winkler($word, $_) };
 
say "\nClosest 5 dictionary words with a Jaro-Winkler distance < .15 from $word:";
 
printf "%15s : %0.4f\n", .key, .value for %result.grep({ .value < .15 }).sort({+.value, ~.key}).head(5);
}</syntaxhighlight>
{{out}}
<pre>Closest 5 dictionary words with a Jaro-Winkler distance < .15 from accomodate:
accommodate : 0.0182
accordant : 0.1044
accolade : 0.1136
acclimate : 0.1219
accompanist : 0.1327
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from definately:
define : 0.0800
definite : 0.0850
defiant : 0.0886
definitive : 0.1200
designate : 0.1219
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from goverment:
govern : 0.0667
governor : 0.1167
governess : 0.1175
governance : 0.1330
coverlet : 0.1361
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from occured:
occurred : 0.0250
occur : 0.0571
occurrent : 0.0952
occlude : 0.1056
concurred : 0.1217
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from publically:
public : 0.0800
publication : 0.1327
pull : 0.1400
pullback : 0.1492
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from recieve:
receive : 0.0333
relieve : 0.0667
reeve : 0.0762
receptive : 0.0852
recessive : 0.0852
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from seperate:
desperate : 0.0708
separate : 0.0917
temperate : 0.1042
selenate : 0.1167
sept : 0.1167
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from untill:
until : 0.0333
till : 0.1111
huntsville : 0.1333
instill : 0.1357
unital : 0.1422
 
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from wich:
winch : 0.0533
witch : 0.0533
which : 0.0600
wichita : 0.0857
switch : 0.1111</pre>
 
=={{header|Rust}}==
{{trans|Python}}
<langsyntaxhighlight lang="rust">use std::fs::File;
use std::io::{self, BufRead};
 
Line 228 ⟶ 2,151:
let mut st1 = string1;
let mut st2 = string2;
iflet mut len1 = st1.lenchars() < st2.lencount() {;
let mut len2 = st2.chars().count();
if len1 < len2 {
std::mem::swap(&mut st1, &mut st2);
std::mem::swap(&mut len1, &mut len2);
}
let len1 = st1.len();
let len2 = st2.len();
if len2 == 0 {
return if len1 == 0 { 0.0 } else { 1.0 };
}
let delta = std::cmp::max(01, len2len1 / 2) - 1);
let mut flag = vec![false; len2];
let mut ch1_match = vec![];
Line 291 ⟶ 2,215:
Ordering::Greater
} else {
Ordering::Equal
}
};
Line 322 ⟶ 2,246:
Err(error) => eprintln!("{}", error),
}
}</langsyntaxhighlight>
 
{{out}}
The output is slightly different from that of the Python example because I've removed
a trailing zero-width space character from some of the test strings.
<pre>
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
Line 389 ⟶ 2,311:
untie | 0.1067
untimely | 0.1083
till | 0.1111
Antilles | 0.1264
untidy | 0.1333
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
Line 396 ⟶ 2,318:
witch | 0.0533
which | 0.0600
switch | 0.1111
twitch | 0.1111
witches | 0.1143
rich | 0.1167
wick | 0.1167
 
</pre>
 
=={{header|Swift}}==
{{trans|Rust}}
<syntaxhighlight lang="swift">import Foundation
 
func loadDictionary(_ path: String) throws -> [String] {
let contents = try String(contentsOfFile: path, encoding: String.Encoding.ascii)
return contents.components(separatedBy: "\n")
}
 
func jaroWinklerDistance(string1: String, string2: String) -> Double {
var st1 = Array(string1)
var st2 = Array(string2)
var len1 = st1.count
var len2 = st2.count
if len1 < len2 {
swap(&st1, &st2)
swap(&len1, &len2)
}
if len2 == 0 {
return len1 == 0 ? 0.0 : 1.0
}
let delta = max(1, len1 / 2) - 1
var flag = Array(repeating: false, count: len2)
var ch1Match: [Character] = []
ch1Match.reserveCapacity(len1)
for idx1 in 0..<len1 {
let ch1 = st1[idx1]
for idx2 in 0..<len2 {
let ch2 = st2[idx2]
if idx2 <= idx1 + delta && idx2 + delta >= idx1 && ch1 == ch2 && !flag[idx2] {
flag[idx2] = true
ch1Match.append(ch1)
break
}
}
}
let matches = ch1Match.count
if matches == 0 {
return 1.0
}
var transpositions = 0
var idx1 = 0
for idx2 in 0..<len2 {
if flag[idx2] {
if st2[idx2] != ch1Match[idx1] {
transpositions += 1
}
idx1 += 1
}
}
let m = Double(matches)
let jaro =
(m / Double(len1) + m / Double(len2) + (m - Double(transpositions) / 2.0) / m) / 3.0
var commonPrefix = 0
for i in 0..<min(4, len2) {
if st1[i] == st2[i] {
commonPrefix += 1
}
}
return 1.0 - (jaro + Double(commonPrefix) * 0.1 * (1.0 - jaro))
}
 
func withinDistance(words: [String], maxDistance: Double, string: String,
maxToReturn: Int) -> [(String, Double)] {
var arr = Array(words.map{($0, jaroWinklerDistance(string1: string, string2: $0))}
.filter{$0.1 <= maxDistance})
arr.sort(by: { x, y in return x.1 < y.1 })
return Array(arr[0..<min(maxToReturn, arr.count)])
}
 
func pad(string: String, width: Int) -> String {
if string.count >= width {
return string
}
return String(repeating: " ", count: width - string.count) + string
}
 
do {
let dict = try loadDictionary("linuxwords.txt")
for word in ["accomodate", "definately", "goverment", "occured",
"publically", "recieve", "seperate", "untill", "wich"] {
print("Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '\(word)' are:")
print(" Word | Distance")
for (w, dist) in withinDistance(words: dict, maxDistance: 0.15,
string: word, maxToReturn: 5) {
print("\(pad(string: w, width: 14)) | \(String(format: "%6.4f", dist))")
}
print()
}
} catch {
print(error.localizedDescription)
}</syntaxhighlight>
 
{{out}}
<pre>
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are:
Word | Distance
accommodate | 0.0182
accommodated | 0.0333
accommodates | 0.0333
accommodating | 0.0815
accommodation | 0.0815
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are:
Word | Distance
definitely | 0.0400
defiantly | 0.0422
define | 0.0800
definite | 0.0850
definable | 0.0872
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are:
Word | Distance
government | 0.0533
govern | 0.0667
governments | 0.0697
movement | 0.0810
governmental | 0.0833
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are:
Word | Distance
occurred | 0.0250
occur | 0.0571
occupied | 0.0786
occurs | 0.0905
accursed | 0.0917
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are:
Word | Distance
publicly | 0.0400
public | 0.0800
publicity | 0.1044
publication | 0.1327
biblically | 0.1400
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are:
Word | Distance
receive | 0.0333
received | 0.0625
receiver | 0.0625
receives | 0.0625
relieve | 0.0667
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are:
Word | Distance
desperate | 0.0708
separate | 0.0917
temperate | 0.1042
separated | 0.1144
separates | 0.1144
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are:
Word | Distance
until | 0.0333
untie | 0.1067
untimely | 0.1083
till | 0.1111
Antilles | 0.1264
 
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are:
Word | Distance
witch | 0.0533
which | 0.0600
switch | 0.1111
twitch | 0.1111
witches | 0.1143
 
</pre>
 
=={{header|Typescript}}==
{{trans|Java}}
<syntaxhighlight lang="typescript">
var fs = require('fs')
 
// Jaro Winkler Distance Formula
function jaroDistance(string1: string, string2: string): number{
// Compute Jaro-Winkler distance between two string
// Swap strings if string1 is shorter than string 2
if (string1.length < string2.length){
const tempString: string = string1;
string1 = string2;
string2 = tempString
}
let len1: number = string1.length
let len2: number = string2.length
if (!len2){
return 0.0
}
const delta: number = Math.max(1, len1 / 2.0) - 1.0;
// Flags for transpositions
let flag: boolean[] = Array(len2).fill(false)
let ch1Match: string[] = Array(len1).fill('')
// Count number of matching characters
let matches = 0
// Check if characters on both string matches
for (let i: number = 0; i < len1; i++){
const ch1: string = string1[i]
for (let j = 0; j < len2; j++){
const ch2: string = string2[j]
if (j <= i + delta && j + delta >= 1 && ch1 == ch2 && !flag[j]){
flag[j] = true
ch1Match[matches++] = ch1;
break;
}
}
}
if (!matches){
return 1.0
}
// Count number of transpositions (shared characters placed in different positions)
let transpositions: number = 0.0
for (let i: number = 0, j: number = 0; j < len2; j++){
if (flag[j]){
if (string2[j] != ch1Match[i]){
transpositions++
}
i++
}
}
const m: number = matches
// Jaro Similarity Formula simj = ( (m / length of s1) + (m / length of s2) + (m - t) / m ) / 3
const jaro: number = (m / len1 + m / len2 + (m - transpositions / 2.0) / m) / 3.0
// Length of common prefix between string up to 4 characters
let commonPrefix: number = 0.0
len2 = Math.min(4, len2)
for (let i: number = 0; i < len2; i++){
if (string1[i] == string2[i]){
commonPrefix++
}
}
// Jaro Winkler Distance Formula simw = simj + lp(1 - simj)
return 1.0 - (jaro + commonPrefix * 0.1 * (1.0 - jaro))
}
 
// Compute Jaro Winkler Distance for every word on the dictionary against the misspelled word
function withinDistance(words: string[] ,maxDistance: number, string: string, maxToReturn: number): (string | number)[][]{
let result: (string | number)[][] = new Array()
words.forEach(word =>{
const distance = jaroDistance(word, string)
// check if computed jaro winkler distance is within the set distance parameter
if (distance <= maxDistance){
const tuple = [distance, word]
result.push(tuple)
}
})
result.sort()
// Limit of matches set to maxtoReturn
return result.length <= maxToReturn ? result : result.slice(0, maxToReturn)
}
 
function loadDictionary(fileName: string): string[]{
let words: string[] = new Array()
try{
//attacomsian.com/blog/reading-a-file-line-by-line-in-nodejs
const data = fs.readFileSync(fileName, 'utf-8')
const lines: string[] = data.split(/\r?\n/)
lines.forEach(line => {
words.push(line)
})
return words
}
catch(error){
console.log("Error reading dictionary")
}
}
 
function main(): void{
try {
const misspellings = [
"accomodate​",
"definately​",
"goverment",
"occured",
"publically",
"recieve",
"seperate",
"untill",
"wich"
]
//unixdict.txt from users.cs.duke.edu/~ola/ap/linuxwords
let words: string[] = loadDictionary("unixdict.txt")
 
misspellings.forEach(spelling =>{
console.log("Misspelling:", spelling)
const closeWord = withinDistance(words, 0.15, spelling, 5)
closeWord.forEach(word =>{
console.log((word[0] as number).toFixed(4) + " " + word[1])
})
console.log("")
})
}
catch(error) {
console.log("Error on main")
}
}
main();
</syntaxhighlight>
{{out}}
<pre>
Misspelling: accomodate​
0.0364 accommodate
0.0515 accommodated
0.0515 accommodates
0.0979 accommodating
0.0979 accommodation
 
Misspelling: definately​
0.0564 definitely
0.0586 defiantly
0.0909 define
0.0977 definite
0.1013 defiant
 
Misspelling: goverment
0.0533 government
0.0667 govern
0.0697 governments
0.0833 governmental
0.0952 governs
 
Misspelling: occured
0.0250 occurred
0.0571 occur
0.0786 occupied
0.0905 occurs
0.0917 accursed
 
Misspelling: publically
0.0400 publicly
0.0800 public
0.1044 publicity
0.1327 publication
0.1400 biblically
 
Misspelling: recieve
0.0333 receive
0.0625 received
0.0625 receiver
0.0625 receives
0.0667 relieve
 
Misspelling: seperate
0.0708 desperate
0.1042 temperate
0.1083 separate
0.1167 repeated
0.1167 sept
 
Misspelling: untill
0.0333 until
0.1067 untie
0.1083 untimely
0.1111 till
0.1264 Antilles
 
Misspelling: wich
0.0533 witch
0.0600 which
0.1111 switch
0.1111 twitch
0.1143 witches
</pre>
 
=={{header|V (Vlang)}}==
{{trans|Go}}
<syntaxhighlight lang="v (vlang)">import os
 
fn jaro_sim(str1 string, str2 string) f64 {
if str1.len == 0 && str2.len == 0 {
return 1
}
if str1.len == 0 || str2.len == 0 {
return 0
}
mut match_distance := str1.len
if str2.len > match_distance {
match_distance = str2.len
}
match_distance = match_distance/2 - 1
mut str1_matches := []bool{len: str1.len}
mut str2_matches := []bool{len: str2.len}
mut matches := 0.0
mut transpositions := 0.0
for i in 0..str1.len {
mut start := i - match_distance
if start < 0 {
start = 0
}
mut end := i + match_distance + 1
if end > str2.len {
end = str2.len
}
for k in start..end {
if str2_matches[k] {
continue
}
if str1[i] != str2[k] {
continue
}
str1_matches[i] = true
str2_matches[k] = true
matches++
break
}
}
if matches == 0 {
return 0
}
mut k := 0
for i in 0.. str1.len {
if !str1_matches[i] {
continue
}
for !str2_matches[k] {
k++
}
if str1[i] != str2[k] {
transpositions++
}
k++
}
transpositions /= 2
return (matches/f64(str1.len) +
matches/f64(str2.len) +
(matches-transpositions)/matches) / 3
}
fn jaro_winkler_dist(s string, t string) f64 {
ls := s.len
lt := t.len
mut lmax := lt
if ls < lt {
lmax = ls
}
if lmax > 4 {
lmax = 4
}
mut l := 0
for i in 0 .. lmax {
if s[i] == t[i] {
l++
}
}
js := jaro_sim(s, t)
p := 0.1
ws := js + f64(l)*p*(1-js)
return 1 - ws
}
struct Wd {
word string
dist f64
}
fn main() {
misspelt := [
"accomodate", "definately", "goverment", "occured", "publically",
"recieve", "seperate", "untill", "wich",
]
words := os.read_lines('unixdict.txt')?
for ms in misspelt {
mut closest := []Wd{}
for word in words {
if word == "" {
continue
}
jwd := jaro_winkler_dist(ms, word)
if jwd < 0.15 {
closest << Wd{word, jwd}
}
}
println("Misspelt word: $ms:")
closest.sort(a.dist<b.dist)
for i, c in closest {
println("${c.dist:.4f} ${c.word}")
if i == 5 {
break
}
}
println('')
}
}</syntaxhighlight>
 
{{out}}
<pre>
Misspelt word: accomodate :
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord
 
Misspelt word: definately :
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate
 
Misspelt word: goverment :
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty
 
Misspelt word: occured :
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure
 
Misspelt word: publically :
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback
 
Misspelt word: recieve :
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife
 
Misspelt word: seperate :
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sewerage
0.1167 sept
 
Misspelt word: untill :
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital
 
Misspelt word: wich :
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch
</pre>
 
=={{header|Wren}}==
{{libheader|Wren-fmt}}
{{libheader|Wren-sort}}
This uses unixdict and borrows code from the [[Jaro_distance#Wren]] task.
<syntaxhighlight lang="wren">import "io" for File
import "./fmt" for Fmt
import "./sort" for Sort
 
var jaroSim = Fn.new { |s1, s2|
var le1 = s1.count
var le2 = s2.count
if (le1 == 0 && le2 == 0) return 1
if (le1 == 0 || le2 == 0) return 0
var dist = (le2 > le1) ? le2 : le1
dist = (dist/2).floor - 1
var matches1 = List.filled(le1, false)
var matches2 = List.filled(le2, false)
var matches = 0
var transpos = 0
for (i in 0...s1.count) {
var start = i - dist
if (start < 0) start = 0
var end = i + dist + 1
if (end > le2) end = le2
var k = start
while (k < end) {
if (!(matches2[k] || s1[i] != s2[k])) {
matches1[i] = true
matches2[k] = true
matches = matches + 1
break
}
k = k + 1
}
}
if (matches == 0) return 0
var k = 0
for (i in 0...s1.count) {
if (matches1[i]) {
while(!matches2[k]) k = k + 1
if (s1[i] != s2[k]) transpos = transpos + 1
k = k + 1
}
}
transpos = transpos / 2
return (matches/le1 + matches/le2 + (matches - transpos)/matches) / 3
}
 
var jaroWinklerDist = Fn.new { |s, t|
var ls = s.count
var lt = t.count
var lmax = (ls < lt) ? ls : lt
if (lmax > 4) lmax = 4
var l = 0
for (i in 0...lmax) {
if (s[i] == t[i]) l = l + 1
}
var js = jaroSim.call(s, t)
var p = 0.1
var ws = js + l*p*(1 - js)
return 1 - ws
}
 
var misspelt = ["accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich"]
var words = File.read("unixdict.txt").split("\n").map { |w| w.trim() }.where { |w| w != "" }
for (ms in misspelt) {
var closest = []
for (word in words) {
var jwd = jaroWinklerDist.call(ms, word)
if (jwd < 0.15) closest.add([word, jwd])
}
System.print("Misspelt word: %(ms):")
var cmp = Fn.new { |n1, n2| (n1[1]-n2[1]).sign }
Sort.insertion(closest, cmp)
for (c in closest.take(6)) Fmt.print("$0.4f $s", c[1], c[0])
System.print()
}</syntaxhighlight>
 
{{out}}
<pre>
Misspelt word: accomodate:
0.0182 accommodate
0.1044 accordant
0.1136 accolade
0.1219 acclimate
0.1327 accompanist
0.1333 accord
 
Misspelt word: definately:
0.0800 define
0.0850 definite
0.0886 defiant
0.1200 definitive
0.1219 designate
0.1267 deflate
 
Misspelt word: goverment:
0.0667 govern
0.1167 governor
0.1175 governess
0.1330 governance
0.1361 coverlet
0.1367 sovereignty
 
Misspelt word: occured:
0.0250 occurred
0.0571 occur
0.0952 occurrent
0.1056 occlude
0.1217 concurred
0.1429 cure
 
Misspelt word: publically:
0.0800 public
0.1327 publication
0.1400 pull
0.1492 pullback
 
Misspelt word: recieve:
0.0333 receive
0.0667 relieve
0.0762 reeve
0.0852 receptive
0.0852 recessive
0.0905 recife
 
Misspelt word: seperate:
0.0708 desperate
0.0917 separate
0.1042 temperate
0.1167 selenate
0.1167 sept
0.1167 sewerage
 
Misspelt word: untill:
0.0333 until
0.1111 till
0.1333 huntsville
0.1357 instill
0.1422 unital
 
Misspelt word: wich:
0.0533 winch
0.0533 witch
0.0600 which
0.0857 wichita
0.1111 switch
0.1111 twitch
</pre>
6,951

edits