Text completion: Difference between revisions
(→{{header|REXX}}: added the computer programming language REXX.) |
Revision as of 10:01, 29 July 2020
- Task
Write a program that takes a word entered by the user and prints possible matching words that are valid in an English dictionary. State any dictionaries, files, binaries, or dependencies your program uses. Show the similarity between the input word and each suggested word as a percentage. Any algorithm may be used to accomplish this task.
- Resources
Github Repo
Raw Text, Save as .txt file
Hamming Distance
Jaro-Winkler Distance
SoundEx Algorithm
SoundEx Algorithm Wiki
Dice's Coefficient
Dice Coefficient Wiki
- Possible Output
Input word:
complition
compaction : 80.00% similar.
completion : 90.00% similar.
completions : 81.82% similar.
complexion : 80.00% similar.
- Extension
- How can you make the accuracy of your program higher?
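As one possible direction for the extension, the Dice's Coefficient measure listed under Resources can be sketched as follows. This is a minimal sketch and not part of any entry on this page: the class name `DiceSketch` and its methods are hypothetical, and comparing multisets of adjacent character pairs (bigrams) is an assumption about how the coefficient would be applied to single words. Whether it actually improves accuracy depends on the dictionary and on the kinds of typing errors expected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class DiceSketch {
    // Counts every adjacent character pair (bigram) in a word.
    static Map<String, Integer> bigrams(String word) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 1 < word.length(); i++) {
            counts.merge(word.substring(i, i + 2), 1, Integer::sum);
        }
        return counts;
    }

    // Dice's coefficient over bigram multisets, scaled to a percentage:
    // 2 * (shared bigrams) / (bigrams in a + bigrams in b) * 100.
    static double similarity(String a, String b) {
        int totalA = Math.max(a.length() - 1, 0);
        int totalB = Math.max(b.length() - 1, 0);
        if (totalA + totalB == 0) {
            // Assumption: words too short to contain a bigram only match exactly.
            return a.equals(b) ? 100.0 : 0.0;
        }
        Map<String, Integer> countsA = bigrams(a);
        Map<String, Integer> countsB = bigrams(b);
        int shared = 0;
        for (Map.Entry<String, Integer> e : countsA.entrySet()) {
            shared += Math.min(e.getValue(), countsB.getOrDefault(e.getKey(), 0));
        }
        return 100.0 * 2 * shared / (totalA + totalB);
    }

    public static void main(String[] args) {
        System.out.printf("%.2f%% similar.%n", similarity("complition", "completion"));
    }
}
```

For "complition" and "completion", 7 of the 9 bigrams in each word are shared, so the score is 2 × 7 / 18, roughly 77.78%. Unlike a position-by-position comparison, the score does not collapse when a single inserted or deleted letter shifts the rest of the word.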
Java
Github Repo
Uses dependencies given.
<lang Java>
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Uses the word list from https://github.com/dwyl/english-words
public class textCompletionConcept {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Scanner input = new Scanner(System.in);
        System.out.println("Input word: ");
        String errorRode = input.next();

        // words.txt is expected in the directory the classes are loaded from.
        File file = new File(new File(textCompletionConcept.class.getProtectionDomain()
                .getCodeSource().getLocation().toURI()).getPath() + File.separator + "words.txt");

        List<String> listed = new ArrayList<>();
        try (Scanner reader = new Scanner(file)) {
            while (reader.hasNextLine()) {
                String compareToThis = reader.nextLine();
                char[] s1 = errorRode.toCharArray();
                char[] s2 = compareToThis.toCharArray();

                // Count the positions at which both words hold the same character.
                int correct = 0;
                int minlen = Math.min(s1.length, s2.length);
                for (int index = 0; index < minlen; index++) {
                    if (s1[index] == s2[index]) {
                        correct++;
                    }
                }

                // Similarity relative to the longer word, as a percentage.
                double percent = 100.0 * correct / Math.max(s1.length, s2.length);
                boolean perfect = false;
                if (percent >= 80 && compareToThis.charAt(0) == errorRode.charAt(0)) {
                    perfect = percent == 100.0;   // exact match
                    listed.add(compareToThis + " : " + String.format("%.2f", percent) + "% similar.");
                }
                // Words that contain the input and are less than twice its length score a flat 80%.
                if (compareToThis.contains(errorRode) && !perfect
                        && errorRode.length() * 2 > compareToThis.length()) {
                    listed.add(compareToThis + " : 80.00% similar.");
                }
            }
        }

        // If an exact match exists, report only that word.
        for (String x : listed) {
            if (x.contains("100.00% similar.")) {
                System.out.println(x);
                listed.clear();
                break;
            }
        }
        for (String x : listed) {
            System.out.println(x);
        }
    }
}
</lang>
- Output
Input word:
complition
compaction : 80.00% similar.
completion : 90.00% similar.
completions : 81.82% similar.
complexion : 80.00% similar.

Process finished with exit code 0
Julia
This entry uses the Levenshtein distance (see https://en.wikipedia.org/wiki/Levenshtein_distance): the minimum number of single-character edits needed to turn one word into another.
<lang julia>using StringDistances

const fname = download("https://www.mit.edu/~ecprice/wordlist.10000", "wordlist10000.txt")
const words = read(fname, String) |> split .|> strip .|> string
const wrd = "complition"

# All dictionary words at exactly Levenshtein distance n from str.
levdistof(n, str) = filter(w -> Levenshtein()(str, w) == n, words)

for n in 1:4
    println("Words at Levenshtein distance of $n from \"$wrd\": ", levdistof(n, wrd), "\n")
end
</lang>
- Output:
Words at Levenshtein distance of 1 from "complition": ["completion"]
Words at Levenshtein distance of 2 from "complition": ["coalition", "competition", "compilation", "composition"]
Words at Levenshtein distance of 3 from "complition": ["companion", "competitions", "completing", "complications", "computation", "condition"]
Words at Levenshtein distance of 4 from "complition": ["collection", "combination", "commission", "comparison", "compensation", "competing", "competitive", "complaint", "complete", "completed", "completely", "complexity", "compliance", "compliant", "compression", "computing", "conclusion", "conditions", "connection", "convention", "conviction", "cooperation", "corporation", "correction", "correlation", "corruption", "nomination", "opinion", "opposition", "option", "pollution", "population", "position", "simulation", "solution"]
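The Julia entry reports raw distances, while the task asks for a percentage. For readers who want to see the measure itself, the Levenshtein distance can be sketched in Java with the usual dynamic-programming recurrence over insertions, deletions and substitutions. This is an illustration under stated assumptions, not the implementation used by StringDistances: the class `LevenshteinSketch` and its methods are hypothetical, and dividing by the length of the longer word to obtain a percentage is one assumed normalisation among several possible ones.

```java
// Hypothetical helper class, for illustration only.
final class LevenshteinSketch {
    // Edit distance by dynamic programming, keeping only two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;                       // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                       // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalisation: share of the longer word left untouched by the edits.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 100.0 : 100.0 * (longest - distance(a, b)) / longest;
    }

    public static void main(String[] args) {
        System.out.println(distance("complition", "completion"));
        System.out.println(similarity("complition", "completion"));
    }
}
```

With this normalisation, "completion" (distance 1 from the 10-letter input) scores 90%, which happens to agree with the percentage the Java and Raku entries report for the same pair.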
Raku
(formerly Perl 6)
<lang perl6>sub MAIN ( Str $user_word = 'complition', Str $filename = 'words.txt' ) {
    my @s1 = $user_word.comb;
    my @listed = gather for $filename.IO.lines -> $line {
        my @s2 = $line.comb;

        my $correct = 100 * sum( @s1 Zeq @s2 ) / max(+@s1, +@s2);

        my $score = ( $correct >= 100            and @s1[0] eq @s2[0] ) ?? 100
                 !! ( $correct >=  80            and @s1[0] eq @s2[0] ) ?? $correct
                 !! ( $line.contains($user_word) and @s1 * 2 > @s2    ) ?? 80
                 !!                                                        0;
        take [$score, $line] if $score;
    }

    @listed = @listed[$_] with @listed.first: :k, { .[0] == 100 };

    say "{.[0].fmt('%.2f')}% {.[1]}" for @listed;
}</lang>
- Output:
80.00% compaction
90.00% completion
81.82% completions
80.00% complexion
REXX
<lang rexx>/*REXX pgm finds dictionary words that are similar to a specified (misspelled) word.   */
parse arg what iFID .                            /*obtain optional arguments from the CL*/
if what==''|what==","  then what= 'complition'   /*Not specified?  Then use the default.*/
if iFID==''|iFID==","  then iFID= 'UNIXDICT.TXT' /*Not specified?  Then use the default.*/
@abc= 'abcdefghijklmnopqrstuvwxyz'               /*(Latin) lowercase letters to be used.*/
L= length(@abc)                                  /*the number of those Latin letters.   */
wrds= 0                                          /*# words that are in the dictionary.  */
dups= 0                                          /*# words that are duplicates.         */
ills= 0                                          /*# words that contain non─letters.    */
say ' Reading the file: ' iFID                   /*align the text.                      */
@.= .                                            /*non─duplicated dictionary words.     */
$=                                               /*the list of similar words found.     */
     do recs=0  while lines(iFID)\==0            /*process all words in the dictionary. */
     x= space( linein(iFID), 0)                  /*elide any blanks in the dictionary.  */
     if @.x\==.           then do; dups= dups+1; iterate; end  /*is this a duplicate?   */
     if \datatype(x,'M')  then do; ills= ills+1; iterate; end  /*has word non─letters?  */
     @.x=                                        /*signify that  X  is a dictionary word*/
     wrds= wrds + 1                              /*bump the number of "good" dict. words*/
     end   /*recs*/
a=
say ' number of records (words) in the dictionary: ' right( commas(recs), 9)
say ' number of ill─formed words in the dictionary: ' right( commas(ills), 9)
say ' number of duplicate words in the dictionary: ' right( commas(dups), 9)
say ' number of acceptable words in the dictionary: ' right( commas(wrds), 9)
say ' the "word" to be used for text completion: ' what
say
call del what;   a= a result;    call del what,1;   a= a result
call ins what;   a= a result;    call ins what,1;   a= a result
call sub what;   a= a result;    call sub what,1;   a= a result
call prune
#= words($)
say commas(#) ' similar words found:'
     do j=1  for #;  _= word($, j);  say right( count(_,what), 24) _
     end   /*j*/
exit #                                           /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
commas: parse arg _;  do ?=length(_)-3  to 1  by -3; _= insert(',', _, ?); end;  return _
prune:  do k=1  for words(a); _= word(a,k); if wordpos(_,$)==0  then $= $ _; end;  return
recur:  $= $ del(z);  $= $ ins(z);  $= $ sub(z);  return
/*──────────────────────────────────────────────────────────────────────────────────────*/
count:  procedure; parse arg x,y;  cnt= 0;  w= length(x)
          do j=1  for w;  p= pos( substr(x, j, 1), y);  if p==0  then iterate
          y= overlay(., y, p);  cnt= cnt + 1
          end   /*j*/
        return ' ' left("("format(cnt/w*100,,2)/1'%)', 9)       /*express as a percent.*/
/*──────────────────────────────────────────────────────────────────────────────────────*/
del:    procedure expose @. @abc L;  parse arg y,r;  $=
          do j=1  for length(y);  z= space(left(y,j-1) || substr(y,j+1), 0)
          if @.z\==.  then $= $ z;  if r==1  then call recur
          end   /*j*/;  return space($)
/*──────────────────────────────────────────────────────────────────────────────────────*/
ins:    procedure expose @. @abc L;  parse arg y,r;  $=
          do j=1  for length(y)
            do k=1  for L;  z= space(left(y,j-1) || substr(@abc,k,1) || substr(y,j), 0)
            if @.z\==.  then $= $ z;  if r==1  then call recur
            end   /*k*/
          end   /*j*/;  return space($)
/*──────────────────────────────────────────────────────────────────────────────────────*/
sub:    procedure expose @. @abc L;  parse arg y,r;  $=
          do j=1  for length(y)
            do k=1  for L;  z= space(left(y,j-1) || substr(@abc,k,1) || substr(y,j+1), 0)
            if @.z\==.  then $= $ z;  if r==1  then call recur
            end   /*k*/
          end   /*j*/;  return space($)</lang>
- output when using the default inputs:
Reading the file: UNIXDICT.TXT
number of records (words) in the dictionary: 25,104
number of ill─formed words in the dictionary: 126
number of duplicate words in the dictionary: 0
number of acceptable words in the dictionary: 24,978
the "word" to be used for text completion: complition

6 similar words found:
(88.89%) coalition
(90%) completion
(81.82%) competition
(90.91%) compilation
(81.82%) composition
(80%) complexion
- output when using the inputs of: , GitHub.dict
The GitHub.dict input file is the same dictionary that the Java entry used.
Reading the file: GitHub.dict
number of records (words) in the dictionary: 466,551
number of ill─formed words in the dictionary: 50,254
number of duplicate words in the dictionary: 0
number of acceptable words in the dictionary: 416,297
the "word" to be used for text completion: complition

11 similar words found:
(88.89%) coalition
(90%) completion
(81.82%) commolition
(81.82%) comparition
(81.82%) competition
(90.91%) compilation
(81.82%) composition
(81.82%) complection
(83.33%) complication
(80%) compaction
(80%) complexion