Jaro-Winkler distance
The Jaro-Winkler distance is a metric for measuring the edit distance between words. It is similar to the more basic Levenstein distance but the Jaro distance also accounts for transpositions between letters in the words. With the Winkler modification to the Jaro metric, the Jaro-Winkler distance also adds an increase in similarity for words which start with the same letters (prefix).
The Jaro-Winkler distance is a modification of the Jaro similarity metric, which measures the similarity between two strings. The Jaro similarity is 1.0 when strings are identical and 0 when strings have no letters in common. Distance measures such as the Jaro distance or Jaro-Winkler distance, on the other hand, are 0 when strings are identical and 1 when they have no letters in common.
The Jaro similarity between two strings s1 and s2, simj, is defined as
- simj = 0 if m is 0.
- simj = ( (m / length of s1) + (m / length of s2) + (m - t) / m ) / 3 otherwise.
Where:
- is the number of matching characters (the same character within max(|s1|, |s2|)/2 - 1 of one another);
- is half the number of transpositions (a shared character placed in different positions).
The Winkler modification to Jaro is to check for identical prefixes of the strings.
If we define the number of initial (prefix) characters in common as:
l = the length of a common prefix between strings, up to 4 characters
and, additionally, select a multiplier (Winkler suggested 0.1) for the relative importance of the prefix for the word similarity:
p = 0.1
The Jaro-Winkler similarity can then be defined as
simw = simj + lp(1 - simj)
Where:
- simj is the Jaro similarity.
- l is the number of matching characters at the beginning of the strings, up to 4.
- p is a factor to modify the amount to which the prefix similarity affects the metric.
Winkler suggested this be 0.1.
The Jaro-Winkler distance between strings, which is 0.0 for identical strings, is then defined as
dw = 1 - simw
String metrics such as Jaro-Winkler distance are useful in applications such as spelling checkers, because letter transpositions are common typing errors and humans tend to misspell the middle portions of words more often than their beginnings. This may help a spelling checker program to generate better alternatives for misspelled word replacement.
- The task
Using a dictionary of your choice and the following list of 9 commonly misspelled words:
"accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich"
- Calculate the Jaro-Winkler distance between the misspelled word and words in the dictionary.
- Use this distance to list close alternatives (at least two per word) to the misspelled words.
- Show the calculated distances between the misspelled words and their potential replacements.
- See also
-
- Wikipedia page: Jaro–Winkler distance.
- Comparing string similarity algorithms. Comparison of algorithms on Medium
Go
This uses unixdict and borrows code from the Jaro_distance#Go task. Otherwise it is a translation of the Wren entry. <lang go>package main
import (
"bytes" "fmt" "io/ioutil" "log" "sort"
)
func jaroSim(str1, str2 string) float64 {
if len(str1) == 0 && len(str2) == 0 { return 1 } if len(str1) == 0 || len(str2) == 0 { return 0 } match_distance := len(str1) if len(str2) > match_distance { match_distance = len(str2) } match_distance = match_distance/2 - 1 str1_matches := make([]bool, len(str1)) str2_matches := make([]bool, len(str2)) matches := 0. transpositions := 0. for i := range str1 { start := i - match_distance if start < 0 { start = 0 } end := i + match_distance + 1 if end > len(str2) { end = len(str2) } for k := start; k < end; k++ { if str2_matches[k] { continue } if str1[i] != str2[k] { continue } str1_matches[i] = true str2_matches[k] = true matches++ break } } if matches == 0 { return 0 } k := 0 for i := range str1 { if !str1_matches[i] { continue } for !str2_matches[k] { k++ } if str1[i] != str2[k] { transpositions++ } k++ } transpositions /= 2 return (matches/float64(len(str1)) + matches/float64(len(str2)) + (matches-transpositions)/matches) / 3
}
func jaroWinklerDist(s, t string) float64 {
ls := len(s) lt := len(t) lmax := lt if ls < lt { lmax = ls } if lmax > 4 { lmax = 4 } l := 0 for i := 0; i < lmax; i++ { if s[i] == t[i] { l++ } } js := jaroSim(s, t) p := 0.1 ws := js + float64(l)*p*(1-js) return 1 - ws
}
type wd struct {
word string dist float64
}
func main() {
misspelt := []string{ "accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich", } b, err := ioutil.ReadFile("unixdict.txt") if err != nil { log.Fatal("Error reading file") } words := bytes.Fields(b) for _, ms := range misspelt { var closest []wd for _, w := range words { word := string(w) if word == "" { continue } jwd := jaroWinklerDist(ms, word) if jwd < 0.15 { closest = append(closest, wd{word, jwd}) } } fmt.Println("Misspelt word:", ms, ":") sort.Slice(closest, func(i, j int) bool { return closest[i].dist < closest[j].dist }) for i, c := range closest { fmt.Printf("%0.4f %s\n", c.dist, c.word) if i == 5 { break } } fmt.Println() }
}</lang>
- Output:
Misspelt word: accomodate : 0.0182 accommodate 0.1044 accordant 0.1136 accolade 0.1219 acclimate 0.1327 accompanist 0.1333 accord Misspelt word: definately : 0.0800 define 0.0850 definite 0.0886 defiant 0.1200 definitive 0.1219 designate 0.1267 deflate Misspelt word: goverment : 0.0667 govern 0.1167 governor 0.1175 governess 0.1330 governance 0.1361 coverlet 0.1367 sovereignty Misspelt word: occured : 0.0250 occurred 0.0571 occur 0.0952 occurrent 0.1056 occlude 0.1217 concurred 0.1429 cure Misspelt word: publically : 0.0800 public 0.1327 publication 0.1400 pull 0.1492 pullback Misspelt word: recieve : 0.0333 receive 0.0667 relieve 0.0762 reeve 0.0852 receptive 0.0852 recessive 0.0905 recife Misspelt word: seperate : 0.0708 desperate 0.0917 separate 0.1042 temperate 0.1167 selenate 0.1167 sewerage 0.1167 sept Misspelt word: untill : 0.0333 until 0.1111 till 0.1333 huntsville 0.1357 instill 0.1422 unital Misspelt word: wich : 0.0533 winch 0.0533 witch 0.0600 which 0.0857 wichita 0.1111 switch 0.1111 twitch
Julia
<lang julia># download("http://users.cs.duke.edu/~ola/ap/linuxwords", "linuxwords.txt") const words = read("linuxwords.txt", String) |> split .|> strip
function jarowinklerdistance(s1, s2)
if length(s1) < length(s2) s1, s2 = s2, s1 end len1, len2 = length(s1), length(s2) len2 == 0 && return 0.0 delta = max(0, len2 ÷ 2 - 1) flag = zeros(Bool, len2) # flags for possible transpositions, begin as false ch1_match = eltype(s1)[] for (i, ch1) in enumerate(s1) for (j, ch2) in enumerate(s2) if (j <= i + delta) && (j >= i - delta) && (ch1 == ch2) && !flag[j] flag[j] = true push!(ch1_match, ch1) break end end end matches = length(ch1_match) matches == 0 && return 1.0 transpositions, i = 0, 0 for (j, ch2) in enumerate(s2) if flag[j] i += 1 transpositions += (ch2 != ch1_match[i]) end end jaro = (matches / len1 + matches / len2 + (matches - transpositions/2) / matches) / 3.0 commonprefix = count(i -> s1[i] == s2[i], 1:min(len2, 4)) return 1 - (jaro + commonprefix * 0.1 * (1 - jaro))
end
function closewords(s, maxdistance, maxtoreturn)
jw = 0.0 arr = [(w, jw) for w in words if (jw = jarowinklerdistance(s, w)) <= maxdistance] sort!(arr, lt=(x, y) -> x[2] < y[2]) return length(arr) <= maxtoreturn ? arr : arr[1:maxtoreturn]
end
for s in ["accomodate", "definately", "goverment", "occured", "publically",
"recieve", "seperate", "untill", "wich"] println("\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to '$s' are:") println(" Word | Distance") for (w, jw) in closewords(s, 0.15, 5) println(rpad(w, 14), "| ", Float16(jw)) end
end
</lang>
- Output:
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are: Word | Distance accommodate | 0.01819 accommodated | 0.03333 accommodates | 0.03333 accommodating | 0.08154 accommodation | 0.08154 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'definately' are: Word | Distance definitely | 0.04 defiantly | 0.04224 define | 0.08 definite | 0.085 definable | 0.0872 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'goverment' are: Word | Distance government | 0.05334 govern | 0.06665 governments | 0.0697 movement | 0.081 governmental | 0.0833 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'occured' are: Word | Distance occurred | 0.025 occur | 0.05713 occupied | 0.07855 occurs | 0.09045 accursed | 0.0917 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'publically' are: Word | Distance publicly | 0.04 public | 0.08 publicity | 0.10443 publication | 0.1327 biblically | 0.14 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'recieve' are: Word | Distance receive | 0.03333 received | 0.0625 receiver | 0.0625 receives | 0.0625 relieve | 0.06665 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'seperate' are: Word | Distance desperate | 0.07086 separate | 0.0917 temperate | 0.1042 separated | 0.1144 separates | 0.1144 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'untill' are: Word | Distance until | 0.03333 untie | 0.1067 untimely | 0.10834 Antilles | 0.1263 untidy | 0.1333 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to 'wich' are: Word | Distance witch | 0.05334 which | 0.06 witches | 0.11426 rich | 0.11664 wick | 0.11664
Python
<lang python>""" Test Jaro-Winkler distance metric. linuxwords.txt is from http://users.cs.duke.edu/~ola/ap/linuxwords """
WORDS = [s.strip() for s in open("linuxwords.txt").read().split()] MISSPELLINGS = [
"accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich",
]
def jaro_winkler_distance(st1, st2):
""" Compute Jaro-Winkler distance between two strings. """ if len(st1) < len(st2): st1, st2 = st2, st1 len1, len2 = len(st1), len(st2) if len2 == 0: return 0.0 delta = max(0, len2 // 2 - 1) flag = [False for _ in range(len2)] # flags for possible transpositions ch1_match = [] for idx1, ch1 in enumerate(st1): for idx2, ch2 in enumerate(st2): if idx2 <= idx1 + delta and idx2 >= idx1 - delta and ch1 == ch2 and not flag[idx2]: flag[idx2] = True ch1_match.append(ch1) break
matches = len(ch1_match) if matches == 0: return 1.0 transpositions, idx1 = 0, 0 for idx2, ch2 in enumerate(st2): if flag[idx2]: transpositions += (ch2 != ch1_match[idx1]) idx1 += 1
jaro = (matches / len1 + matches / len2 + (matches - transpositions/2) / matches) / 3.0 commonprefix = 0 for i in range(min(4, len2)): commonprefix += (st1[i] == st2[i])
return 1.0 - (jaro + commonprefix * 0.1 * (1 - jaro))
def within_distance(maxdistance, stri, maxtoreturn):
""" Find words in WORDS of closeness to stri within maxdistance, return up to maxreturn of them. """ arr = [w for w in WORDS if jaro_winkler_distance(stri, w) <= maxdistance] arr.sort(key=lambda x: jaro_winkler_distance(stri, x)) return arr if len(arr) <= maxtoreturn else arr[:maxtoreturn]
for STR in MISSPELLINGS:
print('\nClose dictionary words ( distance < 0.15 using Jaro-Winkler distance) to "', STR, '" are:\n Word | Distance') for w in within_distance(0.15, STR, 5): print('{:>14} | {:6.4f}'.format(w, jaro_winkler_distance(STR, w)))
</lang>
- Output:
Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " accomodate " are: Word | Distance accommodate | 0.0182 accommodated | 0.0333 accommodates | 0.0333 accommodating | 0.0815 accommodation | 0.0815 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " definately " are: Word | Distance definitely | 0.0400 defiantly | 0.0422 define | 0.0800 definite | 0.0850 definable | 0.0872 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " goverment " are: Word | Distance government | 0.0533 govern | 0.0667 governments | 0.0697 movement | 0.0810 governmental | 0.0833 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " occured " are: Word | Distance occurred | 0.0250 occur | 0.0571 occupied | 0.0786 occurs | 0.0905 accursed | 0.0917 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " publically " are: Word | Distance publicly | 0.0400 public | 0.0800 publicity | 0.1044 publication | 0.1327 biblically | 0.1400 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " recieve " are: Word | Distance receive | 0.0333 received | 0.0625 receiver | 0.0625 receives | 0.0625 relieve | 0.0667 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " seperate " are: Word | Distance desperate | 0.0708 separate | 0.0917 temperate | 0.1042 separated | 0.1144 separates | 0.1144 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " untill " are: Word | Distance until | 0.0333 untie | 0.1067 untimely | 0.1083 Antilles | 0.1264 untidy | 0.1333 Close dictionary words ( distance < 0.15 using Jaro-Winkler distance) to " wich " are: Word | Distance witch | 0.0533 which | 0.0600 witches | 0.1143 rich | 0.1167 wick | 0.1167
Raku
A minor modification of the Jaro distance task entry.
using the unixdict.txt file from www.puzzlers.org
<lang perl6>sub jaro-winkler ($s, $t) {
return 0 if $s eq $t;
my $s_len = + my @s = $s.comb; my $t_len = + my @t = $t.comb;
my $match_distance = ($s_len max $t_len) div 2 - 1;
my @s_matches; my @t_matches; my $matches = 0;
for ^@s -> $i {
my $start = 0 max $i - $match_distance; my $end = $i + $match_distance min ($t_len - 1);
for $start .. $end -> $j { @t_matches[$j] and next; @s[$i] eq @t[$j] or next; @s_matches[$i] = 1; @t_matches[$j] = 1; $matches++; last; } }
return 1 if $matches == 0;
my $k = 0; my $transpositions = 0;
for ^@s -> $i { @s_matches[$i] or next; until @t_matches[$k] { ++$k } @s[$i] eq @t[$k] or ++$transpositions; ++$k; }
my $prefix = 0;
++$prefix if @s[$_] eq @t[$_] for ^(min 4, $s_len, $t_len);
my $jaro = ($matches / $s_len + $matches / $t_len + (($matches - $transpositions / 2) / $matches)) / 3;
1 - ($jaro + $prefix * .1 * ( 1 - $jaro) )
}
my @words = './unixdict.txt'.IO.slurp.words;
for <accomodate definately goverment occured publically recieve seperate untill wich>
-> $word {
my %result = @words.race.map: { $_ => jaro-winkler($word, $_) };
say "\nClosest 5 dictionary words with a Jaro-Winkler distance < .15 from $word:";
printf "%15s : %0.4f\n", .key, .value for %result.grep({ .value < .15 }).sort({+.value, ~.key}).head(5);
}</lang>
- Output:
Closest 5 dictionary words with a Jaro-Winkler distance < .15 from accomodate: accommodate : 0.0182 accordant : 0.1044 accolade : 0.1136 acclimate : 0.1219 accompanist : 0.1327 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from definately: define : 0.0800 definite : 0.0850 defiant : 0.0886 definitive : 0.1200 designate : 0.1219 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from goverment: govern : 0.0667 governor : 0.1167 governess : 0.1175 governance : 0.1330 coverlet : 0.1361 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from occured: occurred : 0.0250 occur : 0.0571 occurrent : 0.0952 occlude : 0.1056 concurred : 0.1217 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from publically: public : 0.0800 publication : 0.1327 pull : 0.1400 pullback : 0.1492 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from recieve: receive : 0.0333 relieve : 0.0667 reeve : 0.0762 receptive : 0.0852 recessive : 0.0852 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from seperate: desperate : 0.0708 separate : 0.0917 temperate : 0.1042 selenate : 0.1167 sept : 0.1167 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from untill: until : 0.0333 till : 0.1111 huntsville : 0.1333 instill : 0.1357 unital : 0.1422 Closest 5 dictionary words with a Jaro-Winkler distance < .15 from wich: winch : 0.0533 witch : 0.0533 which : 0.0600 wichita : 0.0857 switch : 0.1111
Rust
<lang rust>use std::fs::File; use std::io::{self, BufRead};
fn load_dictionary(filename: &str) -> std::io::Result<Vec<String>> {
let file = File::open(filename)?; let mut dict = Vec::new(); for line in io::BufReader::new(file).lines() { dict.push(line?); } Ok(dict)
}
fn jaro_winkler_distance(string1: &str, string2: &str) -> f64 {
let mut st1 = string1; let mut st2 = string2; if st1.len() < st2.len() { std::mem::swap(&mut st1, &mut st2); } let len1 = st1.len(); let len2 = st2.len(); if len2 == 0 { return 0.0; } let delta = std::cmp::max(1, len2 / 2) - 1; let mut flag = vec![false; len2]; let mut ch1_match = vec![]; for (idx1, ch1) in st1.chars().enumerate() { for (idx2, ch2) in st2.chars().enumerate() { if idx2 <= idx1 + delta && idx2 + delta >= idx1 && ch1 == ch2 && !flag[idx2] { flag[idx2] = true; ch1_match.push(ch1); break; } } } let matches = ch1_match.len(); if matches == 0 { return 1.0; } let mut transpositions = 0; let mut idx1 = 0; for (idx2, ch2) in st2.chars().enumerate() { if flag[idx2] { transpositions += (ch2 != ch1_match[idx1]) as i32; idx1 += 1; } } let m = matches as f64; let jaro = (m / (len1 as f64) + m / (len2 as f64) + (m - (transpositions as f64) / 2.0) / m) / 3.0; let mut commonprefix = 0; for (c1, c2) in st1.chars().zip(st2.chars()).take(std::cmp::min(4, len2)) { commonprefix += (c1 == c2) as i32; } 1.0 - (jaro + commonprefix as f64 * 0.1 * (1.0 - jaro))
}
fn within_distance<'a>(
dict: &'a Vec<String>, max_distance: f64, stri: &str, max_to_return: usize,
) -> Vec<(&'a String, f64)> {
let mut arr: Vec<(&String, f64)> = dict .iter() .map(|w| (w, jaro_winkler_distance(stri, w))) .filter(|x| x.1 <= max_distance) .collect(); // The trait std::cmp::Ord is not implemented for f64, otherwise // we could just do this: // arr.sort_by_key(|x| x.1); let compare_distance = |d1, d2| { use std::cmp::Ordering; if d1 < d2 { Ordering::Less } else if d1 > d2 { Ordering::Greater } else { Ordering::Equal } }; arr.sort_by(|x, y| compare_distance(x.1, y.1)); arr[0..std::cmp::min(max_to_return, arr.len())].to_vec()
}
fn main() {
match load_dictionary("linuxwords.txt") { Ok(dict) => { for word in &[ "accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich", ] { println!("Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to '{}' are:", word); println!(" Word | Distance"); for (w, dist) in within_distance(&dict, 0.15, word, 5) { println!("{:>14} | {:6.4}", w, dist) } println!(); } } Err(error) => eprintln!("{}", error), }
}</lang>
- Output:
Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'accomodate' are: Word | Distance accommodate | 0.0182 accommodated | 0.0333 accommodates | 0.0333 accommodating | 0.0815 accommodation | 0.0815 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'definately' are: Word | Distance definitely | 0.0400 defiantly | 0.0422 define | 0.0800 definite | 0.0850 definable | 0.0872 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'goverment' are: Word | Distance government | 0.0533 govern | 0.0667 governments | 0.0697 movement | 0.0810 governmental | 0.0833 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'occured' are: Word | Distance occurred | 0.0250 occur | 0.0571 occupied | 0.0786 occurs | 0.0905 accursed | 0.0917 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'publically' are: Word | Distance publicly | 0.0400 public | 0.0800 publicity | 0.1044 publication | 0.1327 biblically | 0.1400 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'recieve' are: Word | Distance receive | 0.0333 received | 0.0625 receiver | 0.0625 receives | 0.0625 relieve | 0.0667 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'seperate' are: Word | Distance desperate | 0.0708 separate | 0.0917 temperate | 0.1042 separated | 0.1144 separates | 0.1144 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'untill' are: Word | Distance until | 0.0333 untie | 0.1067 untimely | 0.1083 Antilles | 0.1264 untidy | 0.1333 Close dictionary words (distance < 0.15 using Jaro-Winkler distance) to 'wich' are: Word | Distance witch | 0.0533 which | 0.0600 witches | 0.1143 rich | 0.1167 wick | 0.1167
Wren
This uses unixdict and borrows code from the Jaro_distance#Wren task. <lang ecmascript>import "io" for File import "/fmt" for Fmt import "/sort" for Sort
var jaroSim = Fn.new { |s1, s2|
var le1 = s1.count var le2 = s2.count if (le1 == 0 && le2 == 0) return 1 if (le1 == 0 || le2 == 0) return 0 var dist = (le2 > le1) ? le2 : le1 dist = (dist/2).floor - 1 var matches1 = List.filled(le1, false) var matches2 = List.filled(le2, false) var matches = 0 var transpos = 0 for (i in 0...s1.count) { var start = i - dist if (start < 0) start = 0 var end = i + dist + 1 if (end > le2) end = le2 var k = start while (k < end) { if (!(matches2[k] || s1[i] != s2[k])) { matches1[i] = true matches2[k] = true matches = matches + 1 break } k = k + 1 } } if (matches == 0) return 0 var k = 0 for (i in 0...s1.count) { if (matches1[i]) { while(!matches2[k]) k = k + 1 if (s1[i] != s2[k]) transpos = transpos + 1 k = k + 1 } } transpos = transpos / 2 return (matches/le1 + matches/le2 + (matches - transpos)/matches) / 3
}
var jaroWinklerDist = Fn.new { |s, t|
var ls = s.count var lt = t.count var lmax = (ls < lt) ? ls : lt if (lmax > 4) lmax = 4 var l = 0 for (i in 0...lmax) { if (s[i] == t[i]) l = l + 1 } var js = jaroSim.call(s, t) var p = 0.1 var ws = js + l*p*(1 - js) return 1 - ws
}
var misspelt = ["accomodate", "definately", "goverment", "occured", "publically", "recieve", "seperate", "untill", "wich"] var words = File.read("unixdict.txt").split("\n").map { |w| w.trim() }.where { |w| w != "" } for (ms in misspelt) {
var closest = [] for (word in words) { var jwd = jaroWinklerDist.call(ms, word) if (jwd < 0.15) closest.add([word, jwd]) } System.print("Misspelt word: %(ms):") var cmp = Fn.new { |n1, n2| (n1[1]-n2[1]).sign } Sort.insertion(closest, cmp) for (c in closest.take(6)) Fmt.print("$0.4f $s", c[1], c[0]) System.print()
}</lang>
- Output:
Misspelt word: accomodate: 0.0182 accommodate 0.1044 accordant 0.1136 accolade 0.1219 acclimate 0.1327 accompanist 0.1333 accord Misspelt word: definately: 0.0800 define 0.0850 definite 0.0886 defiant 0.1200 definitive 0.1219 designate 0.1267 deflate Misspelt word: goverment: 0.0667 govern 0.1167 governor 0.1175 governess 0.1330 governance 0.1361 coverlet 0.1367 sovereignty Misspelt word: occured: 0.0250 occurred 0.0571 occur 0.0952 occurrent 0.1056 occlude 0.1217 concurred 0.1429 cure Misspelt word: publically: 0.0800 public 0.1327 publication 0.1400 pull 0.1492 pullback Misspelt word: recieve: 0.0333 receive 0.0667 relieve 0.0762 reeve 0.0852 receptive 0.0852 recessive 0.0905 recife Misspelt word: seperate: 0.0708 desperate 0.0917 separate 0.1042 temperate 0.1167 selenate 0.1167 sept 0.1167 sewerage Misspelt word: untill: 0.0333 until 0.1111 till 0.1333 huntsville 0.1357 instill 0.1422 unital Misspelt word: wich: 0.0533 winch 0.0533 witch 0.0600 which 0.0857 wichita 0.1111 switch 0.1111 twitch