Jaro similarity: Difference between revisions

Correct distance to similarity; tried to clarify definition of transpositions as well.
m (Markjreed moved page Jaro distance to Jaro similarity: Described task calculates the similarity (1=identical) rather than the distance (0=identical))
Line 1:
{{task}}
 
The Jaro distance is a measure of edit distance between two strings; its inverse, called the ''Jaro similarity'', is a measure of two strings' similarity: the higher the value, the more similar the strings are. The score is normalized such that   '''0'''   equates to no similarities and   '''1'''   is an exact match.
The Jaro distance is a measure of similarity between two strings.
 
The higher the Jaro distance for two strings is, the more similar the strings are.
 
The score is normalized such that   '''0'''   equates to no similarity and   '''1'''   is an exact match.
 
 
;Definition
 
The Jaro similarity &nbsp; <math>d_j</math> &nbsp; of two given strings &nbsp; <math>s_1</math> &nbsp; and &nbsp; <math>s_2</math> &nbsp; is
 
: <math>d_j = \left\{
Line 24 ⟶ 20:
 
 
Two characters from &nbsp; <math>s_1</math> &nbsp; and &nbsp; <math>s_2</math> &nbsp; respectively, are considered ''matching'' only if they are the same and not farther apart than &nbsp; <math>\left\lfloor\frac{\max(|s_1|,|s_2|)}{2}\right\rfloor-1</math> characters.
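
The matching window can be sketched in Python as follows. This is only an illustration: the function name <code>match_window</code> is hypothetical, and clamping the result at zero is an assumption for very short strings, where the formula would otherwise give -1.

```python
def match_window(s1: str, s2: str) -> int:
    """Largest index offset at which equal characters still count as matching.

    Implements floor(max(|s1|, |s2|) / 2) - 1. The clamp to 0 is an
    assumption, not part of the definition; it keeps the window usable
    for strings of length 1.
    """
    return max(0, max(len(s1), len(s2)) // 2 - 1)


# For two six-character strings the window is 6 // 2 - 1 = 2, so a
# character at index i in s1 may match an equal character at indices
# i-2 .. i+2 in s2.
print(match_window("MARTHA", "MARHTA"))  # prints 2
```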
 
Each character of &nbsp; <math>s_1</math> &nbsp; is compared with all its matching
characters in &nbsp; <math>s_2</math>.
 
Each character of &nbsp; <math>s_1</math> &nbsp; is compared with all its matching characters in &nbsp; <math>s_2</math>. Each difference in position is half a ''transposition''; that is, the number of transpositions is half the number of characters which are common to the two strings but occupy different positions in each one.
The number of matching (but different sequence order) characters
divided by 2 defines the number of ''transpositions''.
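
A minimal Python sketch of the whole computation may help make the matching and transposition rules concrete. Because this diff elides the formula, the sketch assumes the commonly cited definition: the score is 0 when there are no matches, and otherwise (m/|s1| + m/|s2| + (m - t)/m) / 3, where m is the number of matching characters and t the number of transpositions. The function name <code>jaro</code>, the greedy left-to-right pairing of matches, the zero clamp on the window, and the treatment of two empty strings as identical are all assumptions of this sketch.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity under the assumptions stated above (1.0 = identical)."""
    if s1 == s2:
        return 1.0  # assumption: identical strings, including two empty ones
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Matching window: floor(max(|s1|, |s2|) / 2) - 1, clamped at 0.
    window = max(0, max(len1, len2) // 2 - 1)

    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        # Pair each character of s1 with the first unused equal character
        # of s2 inside the window (greedy choice, an assumption).
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0

    # Matched characters in the order they occur in each string.
    seq1 = [c for c, used in zip(s1, matched1) if used]
    seq2 = [c for c, used in zip(s2, matched2) if used]

    # Each positional difference counts as half a transposition.
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2

    return (m / len1 + m / len2 + (m - t) / m) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))  # prints 0.9444
```

For ("MARTHA", "MARHTA") all six characters match and the T/H pair is out of order, giving m = 6 and t = 1. Some implementations take the integer part of t instead of the exact half; this sketch follows the "half a transposition" wording literally.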
 
 
Line 50 ⟶ 42:
;Task
 
Implement the Jaro similarity algorithm and show the similarity scores for each of the following pairs:
 
* ("MARTHA", "MARHTA")