Anonymous user
Most frequent k chars distance: Difference between revisions
Try to tidy up
(Try to tidy up) |
|||
Line 1:
{{clarify task}}
{{draft task}}{{Wikipedia}}
In [[wp:information theory|information theory]], the '''MostFreqKDistance''' is a [[wp:String metric|
| last1 = SEKER | first1 = Sadi E. | author1-link = Sadi Evren SEKER
| last2 = Altun | first2 = Oguz
Line 44 ⟶ 45:
===Most Frequent K Hashing===
The first step of algorithm is calculating the hashing based on the most frequent k characters. The hashing algorithm has below steps:
'''def string''' outputString▼
▲String function MostFreqKHashing (String inputString, int K)
▲ def string outputString
'''count''' occurrence of each character▼
▲ for each district characters
▲ count occurrence of each character
'''char''' c = '''next''' most
▲ for i := 0 to K
'''int''' count = number of occurrence of the character▼
▲ char c = next most freq ith character (if two chars have same frequency than get the first occurrence in inputString)
'''append''' to ''outputString'', c and count▼
▲ int count = number of occurrence of the character
'''end for'''▼
▲ append to outputString, c and count
'''return''' outputString▼
▲ end for
▲ return outputString
</lang>▼
Aim of 'Most Frequent K Hashing' function is calculating the most count of each character and returning the K most frequent character with the character and count. Rules of hash can be listed as below:
Line 66 ⟶ 65:
===Most Frequent K Distance===
Distance calculation between two strings is based on the hash outputs of two strings.
'''int function''' ''MostFreqKSimilarity'' (
'''def int''' similarity▼
▲int function MostFreqKSimilarity (String inputStr1, String inputStr2)
'''for each''' c = '''next''' character '''from''' inputStr1▼
▲ def int similarity
'''lookup''' c '''in''' inputStr2
▲ for each c = next character from inputStr1
similarity '''+=''' frequency of c in inputStr1 + frequency of c in inputStr2▼
'''return''' similarity▼
▲ similarity += frequency of c in inputStr1 + frequency of c in inputStr2
▲ return similarity
Above function, simply gets two input strings, previously outputted from the MostFreqKHashing function. From the most frequent k hashing function, the characters and their frequencies are returned. So, the similarity function calculates the similarity based on characters and their frequencies by checking if the same character appears on both strings and if their frequencies are equal.
Line 82 ⟶ 78:
===String Distance Wrapper Function===
In order to calculate the distance between two strings, below function can be implemented
'''int function''' MostFreqKSDF (
'''return''' maxDistance - MostFreqKSimilarity(MostFreqKHashing(inputStr1,K), MostFreqKHashing(inputStr2,K))▼
▲int function MostFreqKSDF (String inputStr1, String inputStr2, int K, int maxDistance)
▲ return maxDistance - MostFreqKSimilarity(MostFreqKHashing(inputStr1,K), MostFreqKHashing(inputStr2,K))
Any call to above string distance function will supply two input strings and a maximum distance value. The function will calculate the similarity and subtract that value from the maximum possible distance. It can be considered as a simple [[wp:additive inverse]] of similarity.
Line 91 ⟶ 85:
==Examples==
Let's consider maximum 2 frequent hashing over two strings ‘research’ and ‘seeking’.
<lang javascript>MostFreqKHashing('research',2) = 'r2e2'</lang>
because we have 2 'r' and 2 'e' characters with the highest frequency and we return in the order they appear in the string.
<lang javascript>MostFreqKHashing('seeking',2) = 'e2s1'</lang>
Again we have character 'e' with highest frequency and rest of the characters have same frequency of 1, so we return the first character of equal frequencies, which is 's'.
Finally we make the comparison:
<lang javascript>MostFreqKSimilarity('r2e2','e2s1') = 2</lang>
We simply compared the outputs and only character occurring in both input is character 'e' and the occurrence in both input is 2.
Instead running the sample step by step as above, we can simply run by using the string distance wrapper function as below:
<lang javascript>MostFreqKSDF('research', 'seeking',2) = 2</lang>
Below table holds some sample runs between example inputs for K=2:
Line 140 ⟶ 134:
Method is also suitable for bioinformatics to compare the genetic strings like in [[wp:fasta]] format
<pre>
Str1= LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
Str2 = EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
MostFreqKHashing(str1,2) = L9T8
MostFreqKHashing(str2,2) = F9L8
MostFreqKSDF(str1,str2,2,100) = 83
=Implementations=
=={{header|Java}}==
Translation of the pseudo-code of the Wikipedia article [[wp:Most frequent k characters]] to [[wp:java]] implementation of three functions given in the definition section are given below with [[wp:JavaDoc]] comments:
Line 266 ⟶ 257:
}</lang>
{{reflist}}
|