Most frequent k chars distance: Difference between revisions
Content added Content deleted
Thundergnat (talk | contribs) (→{{header|Perl6}}: Add Perl6 example) |
Thundergnat (talk | contribs) m (Reduce some of the bogus markup in the task description so the contents actually contains the entries) |
||
Line 39: | Line 39: | ||
The string distance |
The string distance |
||
;Definition |
|||
Method has two steps. |
Method has two steps. |
||
* [[wp:Hash function|Hash]] input strings str1 and str2 separately using MostFreqKHashing and output hstr1 and hstr2 respectively |
* [[wp:Hash function|Hash]] input strings str1 and str2 separately using MostFreqKHashing and output hstr1 and hstr2 respectively |
||
* Calculate string distance (or string similarity coefficient) of two hash outputs, hstr1 and hstr2 and output an integer value |
* Calculate string distance (or string similarity coefficient) of two hash outputs, hstr1 and hstr2 and output an integer value |
||
;Most Frequent K Hashing |
|||
The first step of algorithm is calculating the hashing based on the most frequent k characters. The hashing algorithm has below steps: |
The first step of algorithm is calculating the hashing based on the most frequent k characters. The hashing algorithm has below steps: |
||
'''string function''' ''MostFreqKHashing'' ('''string''' inputString, '''int''' K) |
'''string function''' ''MostFreqKHashing'' ('''string''' inputString, '''int''' K) |
||
Line 64: | Line 64: | ||
Similar to the most of [[wp:hashing function|hashing functions]], ''Most Frequent K Hashing'' is also a [[wp:one way function]]. |
Similar to the most of [[wp:hashing function|hashing functions]], ''Most Frequent K Hashing'' is also a [[wp:one way function]]. |
||
;Most Frequent K Distance |
|||
Distance calculation between two strings is based on the hash outputs of two strings. |
Distance calculation between two strings is based on the hash outputs of two strings. |
||
'''int function''' ''MostFreqKSimilarity'' ('''string''' inputStr1, '''string''' inputStr2) |
'''int function''' ''MostFreqKSimilarity'' ('''string''' inputStr1, '''string''' inputStr2) |
||
Line 77: | Line 77: | ||
In some implementations, the distance metric is required instead of similarity coefficient. In order to convert the output of above similarity coefficient to distance metric, the output can be subtracted from any constant value (like the maximum possible output value). For the case, it is also possible to implement a [[wp:wrapper function]] over above two functions. |
In some implementations, the distance metric is required instead of similarity coefficient. In order to convert the output of above similarity coefficient to distance metric, the output can be subtracted from any constant value (like the maximum possible output value). For the case, it is also possible to implement a [[wp:wrapper function]] over above two functions. |
||
;;String Distance Wrapper Function |
|||
In order to calculate the distance between two strings, below function can be implemented |
In order to calculate the distance between two strings, below function can be implemented |
||
'''int function''' MostFreqKSDF ('''string''' inputStr1, '''string''' inputStr2, '''int''' K, '''int''' maxDistance) |
'''int function''' MostFreqKSDF ('''string''' inputStr1, '''string''' inputStr2, '''int''' K, '''int''' maxDistance) |
||
Line 84: | Line 84: | ||
Any call to above string distance function will supply two input strings and a maximum distance value. The function will calculate the similarity and subtract that value from the maximum possible distance. It can be considered as a simple [[wp:additive inverse]] of similarity. |
Any call to above string distance function will supply two input strings and a maximum distance value. The function will calculate the similarity and subtract that value from the maximum possible distance. It can be considered as a simple [[wp:additive inverse]] of similarity. |
||
;Examples |
|||
Let's consider maximum 2 frequent hashing over two strings ‘research’ and ‘seeking’. |
Let's consider maximum 2 frequent hashing over two strings ‘research’ and ‘seeking’. |
||
<lang javascript>MostFreqKHashing('research',2) = 'r2e2'</lang> |
<lang javascript>MostFreqKHashing('research',2) = 'r2e2'</lang> |
||
Line 142: | Line 142: | ||
MostFreqKSDF(str1,str2,2,100) = 83 |
MostFreqKSDF(str1,str2,2,100) = 83 |
||
</pre> |
</pre> |
||
=Implementations= |
|||
=={{header|C++}}== |
=={{header|C++}}== |
||
Line 918: | Line 918: | ||
It computes the identical value on the identical inputs. |
It computes the identical value on the identical inputs. |
||
;References |
|||
{{reflist}} |
{{reflist}} |