Most frequent k chars distance: Difference between revisions

Content added Content deleted
(→‎{{header|Perl6}}: Add Perl6 example)
m (Reduce some of the bogus markup in the task description so the contents actually contains the entries)
Line 39: Line 39:
The string distance
The string distance


==Definition==
;Definition
Method has two steps.
Method has two steps.
* [[wp:Hash function|Hash]] input strings str1 and str2 separately using MostFreqKHashing and output hstr1 and hstr2 respectively
* [[wp:Hash function|Hash]] input strings str1 and str2 separately using MostFreqKHashing and output hstr1 and hstr2 respectively
* Calculate string distance (or string similarity coefficient) of two hash outputs, hstr1 and hstr2 and output an integer value
* Calculate string distance (or string similarity coefficient) of two hash outputs, hstr1 and hstr2 and output an integer value


===Most Frequent K Hashing===
;Most Frequent K Hashing
The first step of algorithm is calculating the hashing based on the most frequent k characters. The hashing algorithm has below steps:
The first step of algorithm is calculating the hashing based on the most frequent k characters. The hashing algorithm has below steps:
'''string function''' ''MostFreqKHashing'' ('''string''' inputString, '''int''' K)
'''string function''' ''MostFreqKHashing'' ('''string''' inputString, '''int''' K)
Line 64: Line 64:
Similar to the most of [[wp:hashing function|hashing functions]], ''Most Frequent K Hashing'' is also a [[wp:one way function]].
Similar to the most of [[wp:hashing function|hashing functions]], ''Most Frequent K Hashing'' is also a [[wp:one way function]].


===Most Frequent K Distance===
;Most Frequent K Distance
Distance calculation between two strings is based on the hash outputs of two strings.
Distance calculation between two strings is based on the hash outputs of two strings.
'''int function''' ''MostFreqKSimilarity'' ('''string''' inputStr1, '''string''' inputStr2)
'''int function''' ''MostFreqKSimilarity'' ('''string''' inputStr1, '''string''' inputStr2)
Line 77: Line 77:
In some implementations, the distance metric is required instead of similarity coefficient. In order to convert the output of above similarity coefficient to distance metric, the output can be subtracted from any constant value (like the maximum possible output value). For the case, it is also possible to implement a [[wp:wrapper function]] over above two functions.
In some implementations, the distance metric is required instead of similarity coefficient. In order to convert the output of above similarity coefficient to distance metric, the output can be subtracted from any constant value (like the maximum possible output value). For the case, it is also possible to implement a [[wp:wrapper function]] over above two functions.


===String Distance Wrapper Function===
;;String Distance Wrapper Function
In order to calculate the distance between two strings, below function can be implemented
In order to calculate the distance between two strings, below function can be implemented
'''int function''' MostFreqKSDF ('''string''' inputStr1, '''string''' inputStr2, '''int''' K, '''int''' maxDistance)
'''int function''' MostFreqKSDF ('''string''' inputStr1, '''string''' inputStr2, '''int''' K, '''int''' maxDistance)
Line 84: Line 84:
Any call to above string distance function will supply two input strings and a maximum distance value. The function will calculate the similarity and subtract that value from the maximum possible distance. It can be considered as a simple [[wp:additive inverse]] of similarity.
Any call to above string distance function will supply two input strings and a maximum distance value. The function will calculate the similarity and subtract that value from the maximum possible distance. It can be considered as a simple [[wp:additive inverse]] of similarity.


==Examples==
;Examples
Let's consider maximum 2 frequent hashing over two strings ‘research’ and ‘seeking’.
Let's consider maximum 2 frequent hashing over two strings ‘research’ and ‘seeking’.
<lang javascript>MostFreqKHashing('research',2) = 'r2e2'</lang>
<lang javascript>MostFreqKHashing('research',2) = 'r2e2'</lang>
Line 142: Line 142:
MostFreqKSDF(str1,str2,2,100) = 83
MostFreqKSDF(str1,str2,2,100) = 83
</pre>
</pre>

=Implementations=


=={{header|C++}}==
=={{header|C++}}==
Line 918: Line 918:
It computes the identical value on the identical inputs.
It computes the identical value on the identical inputs.


=References=
;References
{{reflist}}
{{reflist}}