Text completion: Difference between revisions

→‎{{header|Raku}}: Add an alternate using Sorenson-Dice bigrams
m (→‎{{header|Phix}}: use apply()/calc once)
(→‎{{header|Raku}}: Add an alternate using Sorenson-Dice bigrams)
Line 516:
=={{header|Raku}}==
(formerly Perl 6)
===Hamming distance===
{{Trans|Java}}
<lang perl6>sub MAIN ( Str $user_word = 'complition', Str $filename = 'words.txt' ) {
Line 542 ⟶ 543:
80.00% complexion
</pre>
 
===Sorenson-Dice===
Sorenson-Dice tends to return relatively low percentages even for small differences, especially for short words, so we need to "lower the bar" (here, to a cutoff of 0.55) to get any results at all. Variations of the algorithm differ on whether they regularize case. This one does, by case folding, though it matters little for the tested words.
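
As a rough sketch of what the code in this section computes, assuming <math>X</math> and <math>Y</math> (symbols introduced here only for illustration) are the bigram multisets of the two words being compared, the coefficient is:

<math>s = \frac{2\,|X \cap Y|}{|X| + |Y|}</math>

For example, "complition" and "completion" each contain 9 bigrams and share 7 of them (co, om, mp, pl, ti, io, on), so:

<math>s = \frac{2 \cdot 7}{9 + 9} = \frac{14}{18} \approx 0.778</math>

A single substituted letter removes two bigrams from the intersection, which is why the score falls quickly for short words.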
 
Using unixdict.txt from www.puzzlers.org
 
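<lang perl6>sub sorenson ($phrase, %hash) {
    my $match = bigram $phrase.fc;    # fold case, then take the bigrams of the phrase
    %hash.race.map: {
        my $this = .value;            # precomputed bigram Bag for this dictionary word
        # coefficient: 2 × shared bigrams / total bigrams in both Bags
        [(2 * +($match ∩ $this) / (+$match + $this)).round(.001), .key]
    }
}

# Bag (multiset) of the overlapping two-character sequences in each word
sub bigram (\these) {
    Bag.new( flat these.words.map: { .comb.rotor(2 => -1)».join } )
}

# Load the dictionary
my %hash = './unixdict.txt'.IO.slurp.fc.words.race.map: { $_ => .&bigram };

# Testing
for 'complition', 'inconsqual', 'Sørenson' -> $w {
    say "\n$w:";
    # keep scores of at least 0.55, best first, at most ten per word
    .say for sorenson($w, %hash).grep(*.[0] >= .55).sort(-*[0]).head(10);
}</lang>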
 
{{out}}
<pre>complition:
[0.778 completion]
[0.737 composition]
[0.737 competition]
[0.706 coalition]
[0.7 incompletion]
[0.667 decomposition]
[0.667 complexion]
[0.667 complicity]
[0.632 compilation]
[0.632 computation]
 
inconsqual:
[0.609 inconsequential]
[0.588 continual]
[0.571 squall]
[0.556 conceptual]
[0.556 inconstant]
 
Sørenson:
[0.714 sorenson]
[0.667 benson]
[0.615 swenson]
[0.571 sorensen]
[0.571 evensong]</pre>
 
=={{header|REXX}}==