Most frequent k chars distance: Difference between revisions
m
→{{header|Wren}}: Minor tidy
Alextretyak (talk | contribs) (Added 11l) |
m (→{{header|Wren}}: Minor tidy) |
||
(8 intermediate revisions by 5 users not shown) | |||
Line 1:
{{
The string distance function (SDF) that is the subject of this entry
was proposed in a paper published in 2014 as a method for quickly
estimating how [[wp:Similarity measure|similar]] two
[[wp:Order theory|ordered sets]] or
[[wp:String (computer science)|strings]] are.
This SDF has also been proposed as being suitable in bioinformatics, e.g. for comparing sequences in the [[wp:fasta]] format.
Unfortunately, the original paper
by Sadi Evren Seker et al.
(arXiv:1401.6596 [cs.DS])
<ref name="mfkc">{{citation
| last1 = SEKER | first1 = Sadi E. | author1-link = Sadi Evren SEKER
| last2 = Altun | first2 = Oguz
Line 12 ⟶ 22:
| publisher = [[wp:International Association of Computer Science and Information Technology Press (IACSIT Press)]]
| title = [[wp:International Journal of Machine Learning and Computing (IJMLC)]]
| url =
| year = 2014}}.</ref>
has a number of internal inconsistencies and for this and other
reasons is not entirely clear in several key respects, but the paper
does give some worked examples and several of these (notably the two
given under the "Worked Examples" heading below) agree with the
interpretation used by the authors of the Python entry herein, so that
interpretation is used in the present task description.
;The Task
(a) Show a time-efficient implementation of the SDF as if by a
function mostFreqKSDF(s1, s2, k, n), where s1 and s2 are arbitrary
strings, and k and n are non-negative integers as explained below;
(b) Show the values produced by the SDF for the following values of s1 and s2,
with k = 2 and n = 10:
{|class="wikitable"
|-
!String Inputs
!"Top Two"
!SDF Output (
|-
|'night'
'nacht'
|n:1 i:1
n:1 a:1
|
|-
|'my'
'a'
|m:1 y:1
a:1
|10
|-
|‘research’
‘research’
|r:2 e:2
r:2 e:2
|
|-
|-
|‘significant’
‘capabilities’
|i:3 n:2
i:3 a:2
|
|}
(c) Show the value produced by the SDF for k = 2, n = 100,
and the two strings:
"LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV"
"EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG"
; Reference Algorithm
The original paper employs an unusual scheme for encoding the
frequencies of single letters as a string of letters and numbers. This
scheme has several disadvantages, and so in the following we take a
more abstract view and simply assume, solely for the purposes
of this description, that two functions are available:
1) MostFreqK(s, k) returns (in decreasing order of their frequency of occurrence)
the k most frequently occurring letters in s, with
any ties broken by the position of the first occurrence in s;
2) For any letter c in MostFreqK(s, k), Freq(s,c) returns the frequency
of c in s.
; MFKsimilarity
Next define MFKsimilarlity(s1, s2, k) as if by the following pseudo-code:
<pre>
int function MFKsimilarity(string s1, string s2, int k):
let similarity = 0
let mfk1 = MostFreqK(s1, k)
let mfk2 = MostFreqK(s2, k)
for each c in mfk1
if (c occurs in mfk2)
then similarity += Freq(s1, c) + Freq(s2, c)
end
return similarity
</pre>
; String Distance Wrapper Function
Now define MostFreqKSDF as if by:
<pre>
int function MostFreqKSDF(string s1, string s2, int k, int n)
return n - MFKsimilarity(s1, s2, k)
</pre>
; Worked Examples:
The original paper gives several worked examples, including
these two:
(i) MostFreqKSDF("night", "nacht", 2, 10)
For "night", the "top two" letters with their frequencies are: n:1 and i:1.
For "nacht", the "top two" letters with their frequencies are: n:1 and a:1.
Since "n" is the only shared letter amongst these candidates, the
SDF is 10 - 1 - 1 = 8, as per the original paper.
(2)
MostFreqKSDF(
"LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV",
"EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG",
2, 100)
For the first string, the "top two" letters with their frequencies are: L:9 and T:8;
and for the second, the "top two" are F:9 and L:8.
Since "L" is the only shared letter amongst these top-k characters, and since the sum of the corresponding frequencies is 9+8 = 17, the SDF is 83.
<br/>
=={{header|11l}}==
{{trans|Python}}
<
DefaultDict[Char, Int] occuDict
L(c) inputString
Line 175:
print(dict2, end' ":\n")
print(dict2.map((c, cnt) -> c‘’String(cnt)).join(‘’))
print(most_freq_ksdf(str1, str2, K, maxDistance))</
{{out}}
Line 187:
=={{header|C++}}==
<
#include <vector>
#include <map>
Line 242:
return 0 ;
}
</syntaxhighlight>
{{out}}
<pre>MostFreqKHashing( s1 , 2 ) = L9T8
Line 250:
=={{header|Go}}==
{{trans|Kotlin}}
<
import (
Line 368:
s2 = reverseStr(s1)
mostFreqKSDF(s1, s2, 2, 100)
}</
{{out}}
Line 422:
=={{header|Haskell}}==
<
where
import Data.List ( nub , sortBy )
Line 461:
mostFreqKSDF :: String -> String -> Int ->Int
mostFreqKSDF s t n = mostFreqKSimilarity ( mostFreqKHashing s n ) (mostFreqKHashing t n )
</syntaxhighlight>
{{out}}
<pre>mostFrequentKHashing "LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV" 2
Line 470:
=={{header|J}}==
'''Solution''':<
mfksDF =: {:@:[ - (mfks@:(mfkh&.>)~ {.)~
Line 487:
NB. No need to fix mfksDF
mfkh =: mfkh f.
mfks =: mfks f.</
'''Examples''':<
fkh =. ;@:,@:(":&.>) NB. format k hash
Line 516:
'pass'
)
pass</
'''Notes''': As of press time, there are significant discrepancies between the task description, its pseudocode, the test cases provided, and the two other existing implementations. See the [[Talk:Most frequent k chars distance#Prank_Page.3F|talk page]] for the assumptions made in this implementation to reconcile these discrepancies (in particular, in the scoring function).
Line 523:
Translation of the pseudo-code of the Wikipedia article [[wp:Most frequent k characters]] to [[wp:java]] implementation of three functions given in the definition section are given below with [[wp:JavaDoc]] comments:
<
import java.util.Comparator;
import java.util.HashMap;
Line 649:
}
}</
Line 655:
Defines the functions for counting and frequency outside of the hashing function, as it is easier to calculate the comparison from a structured result rather than parsing a hash.
<
const kCounts = str => {
const counts = {};
Line 703:
const mostFreqKSDF = (str1, str2, k, maxDistance) => {
return maxDistance - mostFreqKSimilarity(str1, str2, k);
};</
'''Testing with Jest'''
<
expect(mostFreqKHashing(str1, 2)).toBe("L9T8");
});
Line 716:
test("SDF of strings 1 and 2 with k=2 and max=100 is 83", () => {
expect(mostFreqKSDF(str1, str2, 2, 100)).toBe(83);
});</
'''Testing in Console'''
<
const str2 = "EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG";
const K = 2;
Line 725:
console.log(mostFreqKHashing(str1, K));
console.log(mostFreqKHashing(str2, K));
console.log(mostFreqKSDF(str1, str2, K, maxDistance));</
{{out}}
Line 731:
F9L8
83</pre>
=={{header|jq}}==
{{works with|jq}}
'''Works with gojq, the Go implementation of jq'''
'''Preliminaries'''
<syntaxhighlight lang="jq"># bag of words
def bow(stream):
reduce stream as $word ({}; .[($word|tostring)] += 1);
# Like sort_by(f) but for items that compare equal, retain the original order
def sort_by_decreasing(f):
# Add $n-$i
def enumerate: . as $in | length as $n | reduce range(0;$n) as $i ([]; . + [ [$n-$i, $in[$i] ] ]);
enumerate
| sort_by((.[1]|f), .[0])
| reverse
| map(.[1]);</syntaxhighlight>
'''The Tasks'''
<syntaxhighlight lang="jq"># Output: { characters: array_of_characters_in_decreasing_order_of_frequency, frequency: object}
def MostFreqKHashing($K):
def chars: explode | range(0;length) as $i | [.[$i]] | implode;
. as $in
| bow(tostring | chars) as $bow
| $bow
| to_entries
| sort_by_decreasing(.value) # if two chars have same frequency than get the first occurrence in $in
| (reduce .[0:$K][] as $kv ({}; .[$kv.key] = $kv.value) ) as $frequency
| {characters: map(.key)[:$K], $frequency};
def MostFreqKSimilarity($in1; $in2; $K):
[$in1, $in2] | map( MostFreqKHashing($K)) as [$s1, $s2]
| reduce $s1.characters[] as $c (0;
$s2.frequency[$c] as $f
| if $f then . + $s1.frequency[$c] + $f
else . end) ;
def MostFreqKSDF($inputStr1; $inputStr2; $K; $maxDistance):
$maxDistance - MostFreqKSimilarity($inputStr1; $inputStr2; $K);
def MostFreqKSDF($K; $maxDistance):
. as [$inputStr1, $inputStr2]
| $maxDistance - MostFreqKSimilarity($inputStr1; $inputStr2; $K);
def task2:
["night", "nacht"],
["my", "a"],
["research", "research"],
["research", "seeking"],
["significant", "capabilities"]
| MostFreqKSDF(2; 10) as $sdk
| [., $sdk] ;
def task100:
["LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV",
"EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG"]
| MostFreqKSDF(2; 100);
task2, task100</syntaxhighlight>
{{out}}
<pre>
[["night","nacht"],8]
[["my","a"],10]
[["research","research"],2]
[["research","seeking"],6]
[["significant","capabilities"],4]
83
</pre>
=={{header|Kotlin}}==
Line 740 ⟶ 808:
I have no idea what NULL0 is supposed to mean so I've ignored that.
<
fun mostFreqKHashing(input: String, k: Int): String =
Line 793 ⟶ 861:
s2 = s1.reversed()
mostFreqKSDF(s1, s2, 2, 100)
}</
{{out}}
Line 845 ⟶ 913:
SDF(input1, input2, 2, 100) = 88
</pre>
=={{header|Nim}}==
It is impossible to get consistent results. We chose to implement Wikipedia algorithm (note that the page is no longer available, except in Internet Archive). With this algorithm, we get the same results as those of Wikipedia except for the last example where we get 91 instead of 83.
In order to avoid the limitation to 10 occurrences, we chose to use an ordered table (equivalent to Python OrderedDict). We could have used Kotlin solution or used a sequence of tuples (char, count) but the ordered table gives better performance.
<syntaxhighlight lang="nim">import algorithm, sugar, tables
type CharCounts = OrderedTable[char, int]
func `$`(counts: CharCounts): string =
for (c, count) in counts.pairs:
result.add $c & $count
func mostFreqKHashing(str: string; k: Positive): CharCounts =
var counts: CharCounts # To get the counts in apparition order.
for c in str: inc counts.mgetOrPut(c, 0)
counts.sort((x, y) => cmp(x[1], y[1]), Descending) # Note that sort is stable.
var count = 0
for c, val in counts.pairs:
inc count
result[c] = val
if count == k: break
func mostFreqKSimilarity(input1, input2: CharCounts): int =
for c, count in input1.pairs:
if c in input2:
result += count
func mostFreqKSDF(str1, str2: string; k, maxDistance: Positive): int =
maxDistance - mostFreqKSimilarity(mostFreqKHashing(str1, k), mostFreqKHashing(str2, k))
const
Pairs = [("night", "nacht"),
("my", "a"),
("research", "research"),
("aaaaabbbb", "ababababa"),
("significant", "capabilities")]
for (str1, str2) in Pairs:
echo "str1: ", str1
echo "str2: ", str2
echo "mostFreqKHashing(str1, 2) = ", mostFreqKHashing(str1, 2)
echo "mostFreqKHashing(str2, 2) = ", mostFreqKHashing(str2, 2)
echo "mostFreqKSDF(str1, str2, 2, 10) = ", mostFreqKSDF(str1, str2, 2, 10)
echo()
const
S1 = "LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV"
S2 = "EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG"
echo "str1: ", S1
echo "str2: ", S2
echo "mostFreqKHashing(str1, 2) = ", mostFreqKHashing(S1, 2)
echo "mostFreqKHashing(str2, 2) = ", mostFreqKHashing(S2, 2)
echo "mostFreqKSDF(str1, str2, 2, 100) = ", mostFreqKSDF(S1, S2, 2, 100)</syntaxhighlight>
{{out}}
<pre>str1: night
str2: nacht
mostFreqKHashing(str1, 2) = n1i1
mostFreqKHashing(str2, 2) = n1a1
mostFreqKSDF(str1, str2, 2, 10) = 9
str1: my
str2: a
mostFreqKHashing(str1, 2) = m1y1
mostFreqKHashing(str2, 2) = a1
mostFreqKSDF(str1, str2, 2, 10) = 10
str1: research
str2: research
mostFreqKHashing(str1, 2) = r2e2
mostFreqKHashing(str2, 2) = r2e2
mostFreqKSDF(str1, str2, 2, 10) = 6
str1: aaaaabbbb
str2: ababababa
mostFreqKHashing(str1, 2) = a5b4
mostFreqKHashing(str2, 2) = a5b4
mostFreqKSDF(str1, str2, 2, 10) = 1
str1: significant
str2: capabilities
mostFreqKHashing(str1, 2) = i3n2
mostFreqKHashing(str2, 2) = i3a2
mostFreqKSDF(str1, str2, 2, 10) = 7
str1: LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
str2: EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
mostFreqKHashing(str1, 2) = L9T8
mostFreqKHashing(str2, 2) = F9L8
mostFreqKSDF(str1, str2, 2, 100) = 91</pre>
=={{header|Perl}}==
<
use strict ;
use warnings ;
Line 897 ⟶ 1,060:
print "MostFreqKHashing ( " . '$firststring , 2)' . " is " . mostFreqHashing( $firststring , 2 ) . "\n" ;
print "MostFreqKHashing ( " . '$secondstring , 2)' . " is " . mostFreqHashing( $secondstring , 2 ) . "\n" ;
</syntaxhighlight>
{{out}}
<pre>MostFreqKHashing ( $firststring , 2) is L9T8
Line 904 ⟶ 1,067:
=={{header|Phix}}==
{{trans|Go}}
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #008080;">with</span> <span style="color: #008080;">javascript_semantics</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">mostFreqKHashing</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">input</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">cfs</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">""</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">cfsnx</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">input</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">r</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">input</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">],</span>
<span style="color: #000000;">ix</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">find</span><span style="color: #0000FF;">(</span><span style="color: #000000;">r</span><span style="color: #0000FF;">,</span><span style="color: #000000;">cfs</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">ix</span><span style="color: #0000FF;">></span><span style="color: #000000;">0</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">cfsnx</span><span style="color: #0000FF;">[</span><span style="color: #000000;">ix</span><span style="color: #0000FF;">][</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">-=</span> <span style="color: #000000;">1</span>
<span style="color: #008080;">else</span>
<span style="color: #000000;">cfs</span> <span style="color: #0000FF;">&=</span> <span style="color: #000000;">r</span>
<span style="color: #000000;">cfsnx</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">append</span><span style="color: #0000FF;">(</span><span style="color: #000000;">cfsnx</span><span style="color: #0000FF;">,{-</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">cfs</span><span style="color: #0000FF;">)})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #000000;">cfsnx</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">sort</span><span style="color: #0000FF;">(</span><span style="color: #000000;">cfsnx</span><span style="color: #0000FF;">)</span> <span style="color: #000080;font-style:italic;">-- (aside: the idx forces stable sort)</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">acc</span> <span style="color: #0000FF;">:=</span> <span style="color: #0000FF;">{}</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">min</span><span style="color: #0000FF;">(</span><span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">cfs</span><span style="color: #0000FF;">),</span><span style="color: #000000;">k</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">integer</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">n</span><span style="color: #0000FF;">,</span><span style="color: #000000;">ix</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">cfsnx</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #000000;">acc</span> <span style="color: #0000FF;">&=</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">cfs</span><span style="color: #0000FF;">[</span><span style="color: #000000;">ix</span><span style="color: #0000FF;">],</span> <span style="color: #0000FF;">-</span><span style="color: #000000;">n</span><span style="color: #0000FF;">}</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">acc</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">mostFreqKSimilarity</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">input1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">input2</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">similarity</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">0</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">input1</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">by</span> <span style="color: #000000;">2</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">j</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">input2</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">by</span> <span style="color: #000000;">2</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">input1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #0000FF;">==</span> <span style="color: #000000;">input2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">]</span> <span style="color: #008080;">then</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">freq1</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">input1</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span><span style="color: #0000FF;">],</span>
<span style="color: #000000;">freq2</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">input2</span><span style="color: #0000FF;">[</span><span style="color: #000000;">j</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span><span style="color: #0000FF;">]</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">freq1</span><span style="color: #0000FF;">=</span><span style="color: #000000;">freq2</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">similarity</span> <span style="color: #0000FF;">+=</span> <span style="color: #000000;">freq1</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">similarity</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">function</span> <span style="color: #000000;">flat</span><span style="color: #0000FF;">(</span><span style="color: #004080;">sequence</span> <span style="color: #000000;">s</span><span style="color: #0000FF;">)</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">""</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">by</span> <span style="color: #000000;">2</span> <span style="color: #008080;">do</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">&=</span> <span style="color: #7060A8;">sprintf</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"%c%d"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">s</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">..</span><span style="color: #000000;">i</span><span style="color: #0000FF;">+</span><span style="color: #000000;">1</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #008080;">return</span> <span style="color: #000000;">res</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">procedure</span> <span style="color: #000000;">mostFreqKSDF</span><span style="color: #0000FF;">(</span><span style="color: #004080;">string</span> <span style="color: #000000;">input1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">input2</span><span style="color: #0000FF;">,</span> <span style="color: #004080;">integer</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">maxDistance</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"input1 : %s\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">input1</span><span style="color: #0000FF;">})</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"input2 : %s\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">input2</span><span style="color: #0000FF;">})</span>
<span style="color: #004080;">sequence</span> <span style="color: #000000;">s1</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">mostFreqKHashing</span><span style="color: #0000FF;">(</span><span style="color: #000000;">input1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">),</span>
<span style="color: #000000;">s2</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">mostFreqKHashing</span><span style="color: #0000FF;">(</span><span style="color: #000000;">input2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">k</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"mfkh(input1, %d) = %s\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">k</span><span style="color: #0000FF;">,</span><span style="color: #000000;">flat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">)})</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"mfkh(input2, %d) = %s\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">k</span><span style="color: #0000FF;">,</span><span style="color: #000000;">flat</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s2</span><span style="color: #0000FF;">)})</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">result</span> <span style="color: #0000FF;">:=</span> <span style="color: #000000;">maxDistance</span> <span style="color: #0000FF;">-</span> <span style="color: #000000;">mostFreqKSimilarity</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">s2</span><span style="color: #0000FF;">)</span>
<span style="color: #7060A8;">printf</span><span style="color: #0000FF;">(</span><span style="color: #000000;">1</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"SDF(input1, input2, %d, %d) = %d\n\n"</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">k</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">maxDistance</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">result</span><span style="color: #0000FF;">})</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">procedure</span>
<span style="color: #008080;">constant</span> <span style="color: #000000;">tests</span> <span style="color: #0000FF;">=</span> <span style="color: #0000FF;">{{</span><span style="color: #008000;">"research"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"seeking"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"night"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"nacht"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"my"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"a"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"research"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"research"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"aaaaabbbb"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"ababababa"</span><span style="color: #0000FF;">},</span>
<span style="color: #0000FF;">{</span><span style="color: #008000;">"significant"</span><span style="color: #0000FF;">,</span> <span style="color: #008000;">"capabilities"</span><span style="color: #0000FF;">}}</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">tests</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #004080;">string</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">t1</span><span style="color: #0000FF;">,</span><span style="color: #000000;">t2</span><span style="color: #0000FF;">}</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">tests</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span>
<span style="color: #000000;">mostFreqKSDF</span><span style="color: #0000FF;">(</span><span style="color: #000000;">t1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">t2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">10</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">s1</span> <span style="color: #0000FF;">:=</span> <span style="color: #008000;">"LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV"</span><span style="color: #0000FF;">,</span>
<span style="color: #000000;">s2</span> <span style="color: #0000FF;">:=</span> <span style="color: #008000;">"EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG"</span>
<span style="color: #000000;">mostFreqKSDF</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">s2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">100</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">s1</span> <span style="color: #0000FF;">=</span> <span style="color: #008000;">"abracadabra12121212121abracadabra12121212121"</span>
<span style="color: #000000;">s2</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">reverse</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">mostFreqKSDF</span><span style="color: #0000FF;">(</span><span style="color: #000000;">s1</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">s2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">2</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">100</span><span style="color: #0000FF;">)</span>
<!--</syntaxhighlight>-->
Output matches Go and Kotlin.
Line 983 ⟶ 1,149:
{{works with|Python|2.7+}}
'''unoptimized and limited'''
<
def MostFreqKHashing(inputString, K):
occuDict = collections.defaultdict(int)
Line 1,006 ⟶ 1,172:
def MostFreqKSDF(inputStr1, inputStr2, K, maxDistance):
return maxDistance - MostFreqKSimilarity(MostFreqKHashing(inputStr1,K), MostFreqKHashing(inputStr2,K))</
'''optimized'''
A version that replaces the intermediate string with OrderedDict to reduce the time complexity of lookup operation:
<
def MostFreqKHashing(inputString, K):
occuDict = collections.defaultdict(int)
Line 1,031 ⟶ 1,197:
def MostFreqKSDF(inputStr1, inputStr2, K, maxDistance):
return maxDistance - MostFreqKSimilarity(MostFreqKHashing(inputStr1,K), MostFreqKHashing(inputStr2,K))</
Test:
<
str2 = "EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG"
K = 2
Line 1,043 ⟶ 1,209:
print("%s:"%dict2)
print(''.join(c + str(cnt) for c, cnt in dict2.items()))
print(MostFreqKSDF(str1, str2, K, maxDistance))</
{{out}}
<pre>
Line 1,055 ⟶ 1,221:
=={{header|Racket}}==
<
(define (MostFreqKHashing inputString K)
Line 1,082 ⟶ 1,248:
;; (Should add more tests, but it looks like there's a bunch of mistakes
;; in the given tests...)
</syntaxhighlight>
=={{header|Raku}}==
Line 1,093 ⟶ 1,259:
Maybe I am too hasty though. Lets give it a try. Implement a MFKC routine and run an assortment of words through it to get a feel for how it hashes different words.
<syntaxhighlight lang="raku"
# which can be joined to make a string or manipulated further.
Line 1,131 ⟶ 1,297:
>;
say @test-words.map( { join '', mfkh($_)».kv } ).Bag;</
{{out}}
<pre>Bag(i2n2(151))</pre>
Line 1,137 ⟶ 1,303:
=={{header|Ring}}==
<
# Project : Most frequent k chars distance
Line 1,181 ⟶ 1,347:
next
return aList
</syntaxhighlight>
Output:
<pre>
Line 1,191 ⟶ 1,357:
=={{header|Sidef}}==
<
var seen = Hash()
Line 1,271 ⟶ 1,437:
MostFreqKSDF(s, t, k, limit),
)
}</
{{out}}
<pre>
Line 1,290 ⟶ 1,456:
=={{header|Tcl}}==
{{works with|Tcl|8.6}}
<
proc MostFreqKHashing {inputString k} {
Line 1,315 ⟶ 1,481:
set h2 [MostFreqKHashing $inputStr2 $k]
expr {$limit - [MostFreqKSimilarity $h1 $h2]}
}</
Demonstrating:
<
set str2 "EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG"
puts [MostFreqKHashing $str1 2]
puts [MostFreqKHashing $str2 2]
puts [MostFreqKSDF $str1 $str2 2 100]</
{{out}}
<pre>
Line 1,330 ⟶ 1,496:
;A more efficient metric calculator
This version is appreciably more efficient because it does not compute the intermediate string representation “hash”, instead working directly on the intermediate dictionaries and lists:
<
set c1 [set c2 {}]
foreach ch [split $inputStr1 ""] {dict incr c1 $ch}
Line 1,342 ⟶ 1,508:
}
return [expr {$limit - $s}]
}</
It computes the identical value on the identical inputs.
=={{header|Typescript}}==
{{trans|Javascript}} with added type annotations
<
const kCounts = (str: string): Record<string, number> => {
const counts: Record<string, number> = {};
Line 1,395 ⟶ 1,561:
const mostFreqKSDF = ( str1: string, str2: string, k: number, maxDistance: number ): number => {
return maxDistance - mostFreqKSimilarity(str1, str2, k);
};</
=={{header|Wren}}==
Line 1,401 ⟶ 1,567:
{{libheader|Wren-seq}}
{{libheader|Wren-sort}}
{{libheader|Wren-
<
import "./sort" for Sort
import "./
var mostFreqKHashing = Fn.new { |input, k|
Line 1,463 ⟶ 1,629:
s1 = "abracadabra12121212121abracadabra12121212121"
s2 = s1[-1..0]
mostFreqKSDF.call(s1, s2, 2, 100)</
{{out}}
|