Words containing "the" substring: Difference between revisions
Revision as of 13:48, 9 December 2020
- Task
Using the dictionary unixdict.txt, search for words containing the substring "the",
then display the words found (on this page).
Each word shown should have a length > 11.
- Metrics
- Counting
- Word frequency
- Letter frequency
- Jewels and stones
- I before E except after C
- Bioinformatics/base count
- Count occurrences of a substring
- Count how many vowels and consonants occur in a string
- Remove/replace
- XXXX redacted
- Conjugate a Latin verb
- Remove vowels from a string
- String interpolation (included)
- Strip block comments
- Strip comments from a string
- Strip a set of characters from a string
- Strip whitespace from a string -- top and tail
- Strip control codes and extended characters from a string
- Anagrams/Derangements/shuffling
- Word wheel
- ABC problem
- Sattolo cycle
- Knuth shuffle
- Ordered words
- Superpermutation minimisation
- Textonyms (using a phone text pad)
- Anagrams
- Anagrams/Deranged anagrams
- Permutations/Derangements
- Find/Search/Determine
- ABC words
- Odd words
- Word ladder
- Semordnilap
- Word search
- Wordiff (game)
- String matching
- Tea cup rim text
- Alternade words
- Changeable words
- State name puzzle
- String comparison
- Unique characters
- Unique characters in each string
- Extract file extension
- Levenshtein distance
- Palindrome detection
- Common list elements
- Longest common suffix
- Longest common prefix
- Compare a list of strings
- Longest common substring
- Find common directory path
- Words from neighbour ones
- Change e letters to i in words
- Non-continuous subsequences
- Longest common subsequence
- Longest palindromic substrings
- Longest increasing subsequence
- Words containing "the" substring
- Sum of the digits of n is substring of n
- Determine if a string is numeric
- Determine if a string is collapsible
- Determine if a string is squeezable
- Determine if a string has all unique characters
- Determine if a string has all the same characters
- Longest substrings without repeating characters
- Find words which contains all the vowels
- Find words which contains most consonants
- Find words which contains more than 3 vowels
- Find words which first and last three letters are equals
- Find words which odd letters are consonants and even letters are vowels or vice_versa
- Formatting
- Substring
- Rep-string
- Word wrap
- String case
- Align columns
- Literals/String
- Repeat a string
- Brace expansion
- Brace expansion using ranges
- Reverse a string
- Phrase reversals
- Comma quibbling
- Special characters
- String concatenation
- Substring/Top and tail
- Commatizing numbers
- Reverse words in a string
- Suffixation of decimal numbers
- Long literals, with continuations
- Numerical and alphabetical suffixes
- Abbreviations, easy
- Abbreviations, simple
- Abbreviations, automatic
- Song lyrics/poems/Mad Libs/phrases
- Mad Libs
- Magic 8-ball
- 99 Bottles of Beer
- The Name Game (a song)
- The Old lady swallowed a fly
- The Twelve Days of Christmas
- Tokenize
- Text between
- Tokenize a string
- Word break problem
- Tokenize a string with escaping
- Split a character string based on change of character
- Sequences
ALGOL 68
<lang algol68># find 12 character (or more) words that have "the" in them                  #
IF  FILE input file;
    STRING file name = "unixdict.txt";
    open( input file, file name, stand in channel ) /= 0
THEN
    # failed to open the file                                                 #
    print( ( "Unable to open """ + file name + """", newline ) )
ELSE
    # file opened OK                                                          #
    BOOL at eof := FALSE;
    # set the EOF handler for the file                                        #
    on logical file end( input file, ( REF FILE f )BOOL:
                                     BEGIN
                                         # note that we reached EOF on the    #
                                         # latest read                        #
                                         at eof := TRUE;
                                         # return TRUE so processing can continue #
                                         TRUE
                                     END
                       );
    INT the count := 0;
    WHILE STRING word;
          get( input file, ( word, newline ) );
          NOT at eof
    DO
        IF INT w len = ( UPB word + 1 ) - LWB word;
           w len > 11
        THEN
            BOOL found the := FALSE;
            FOR w pos FROM LWB word TO UPB word - 2 WHILE NOT found the DO
                IF word[ w pos : w pos + 2 ] = "the" THEN
                    found the := TRUE;
                    the count +:= 1;
                    print( ( word, " " ) );
                    IF the count MOD 6 = 0
                    THEN print( ( newline ) )
                    ELSE FROM w len + 1 TO 18 DO print( ( " " ) ) OD
                    FI
                FI
            OD
        FI
    OD;
    print( ( newline, "found ", whole( the count, 0 ), " ""the"" words", newline ) );
    close( input file )
FI</lang>
- Output:
authenticate       chemotherapy       chrysanthemum      clothesbrush       clotheshorse       eratosthenes
featherbedding     featherbrain       featherweight      gaithersburg       hydrothermal       lighthearted
mathematician      neurasthenic       nevertheless       northeastern       northernmost       otherworldly
parasympathetic    physiotherapist    physiotherapy      psychotherapeutic  psychotherapist    psychotherapy
radiotherapy       southeastern       southernmost       theoretician       weatherbeaten      weatherproof
weatherstrip       weatherstripping
found 32 "the" words
AppleScript
AppleScripters can tackle this task in a variety of ways. The example handlers below are listed in order of increasing speed but all complete the task in under 0.2 seconds on my current machine. They all take a file specifier, search string, and minimum length as parameters and return identical results for the same input.
Using just the core language ('words'):
<lang applescript>on wordsContaining(textfile, searchText, minLength)
    script o
        property wordList : missing value
        property output : {}
    end script
    
    -- Extract the text's 'words' and return any that meet both the search text and minimum length requirements.
    set o's wordList to words of (read (textfile as alias) as «class utf8»)
    repeat with thisWord in o's wordList
        if ((thisWord contains searchText) and (thisWord's length ≥ minLength)) then
            set end of o's output to thisWord's contents
        end if
    end repeat
    
    return o's output
end wordsContaining</lang>
Using just the core language ('text items'):
<lang applescript>on wordsContaining(textFile, searchText, minLength)
    script o
        property textItems : missing value
        property output : {}
    end script
    
    -- Extract the text's search-text-delimited sections.
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to searchText
    set o's textItems to text items of (read (textFile as alias) as «class utf8»)
    set AppleScript's text item delimiters to astid
    
    -- Reconstitute any words containing the search text from the stubs at the section ends and
    -- the search text itself, returning any results which meet the minimum length requirement.
    set thisSection to beginning of o's textItems
    set sectionHasWords to ((count thisSection's words) > 0)
    considering white space
        repeat with i from 2 to (count o's textItems)
            set foundWord to searchText
            if (sectionHasWords) then
                set thisStub to thisSection's last word
                if (thisSection ends with thisStub) then set foundWord to thisStub & foundWord
            end if
            set thisSection to item i of o's textItems
            set sectionHasWords to ((count thisSection's words) > 0)
            if (sectionHasWords) then
                set thisStub to thisSection's first word
                if (thisSection begins with thisStub) then set foundWord to foundWord & thisStub
            end if
            if (foundWord's length ≥ minLength) then set end of o's output to foundWord
        end repeat
    end considering
    
    return o's output
end wordsContaining</lang>
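For readers who do not use AppleScript, the split-and-reconstitute idea behind the 'text items' handler can be sketched in Python. This is only an illustration under stated assumptions: the function name and the use of regular expressions to pick off the stubs are choices made for this sketch, and its notion of a "word" (a run of `\w` characters) is not identical to AppleScript's. The sketch also does not handle a word that contains the search text more than once.

```python
import re


def words_containing(text, needle, min_length):
    """Split text on needle, then rebuild each word from the neighbouring stubs."""
    sections = text.split(needle)
    found = []
    # Each adjacent pair of sections surrounds exactly one occurrence of needle.
    for before, after in zip(sections, sections[1:]):
        # Word characters at the very end of the preceding section (may be empty).
        head = re.search(r"\w*\Z", before).group()
        # Word characters at the very start of the following section (may be empty).
        tail = re.match(r"\w*", after).group()
        word = head + needle + tail
        if len(word) >= min_length:
            found.append(word)
    return found


sample = "authenticate\nthe\nother\nnevertheless\nweatherproof\n"
print(words_containing(sample, "the", 12))
```

The appeal of the approach is that only the text around each match is examined, rather than every word in the file.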
Using a shell script:
<lang applescript>on wordsContaining(textFile, searchText, minLength)
    -- Set up and execute a shell script which uses grep to find words containing the search text
    -- (matching AppleScript's current case-sensitivity setting) and awk to pass those which
    -- satisfy the minimum length requirement.
    if ("A" = "a") then
        set part1 to "grep -io "
    else
        set part1 to "grep -o "
    end if
    set shellCode to part1 & quoted form of ("\\b\\w*" & searchText & "\\w*\\b") & ¬
        (" <" & quoted form of textFile's POSIX path) & ¬
        (" | awk " & quoted form of ("// && length($0) >= " & minLength))
    
    return paragraphs of (do shell script shellCode)
end wordsContaining</lang>
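The grep-then-awk pipeline in the handler above (match whole words around the search text, then keep those that are long enough) can be mirrored in Python with the standard `re` module. A minimal sketch, assuming a hypothetical `ignore_case` parameter that plays the role of the `-i` switch; unlike the handler, it escapes the search text so that regex metacharacters are matched literally.

```python
import re


def words_containing(text, needle, min_length, ignore_case=False):
    """Find whole words containing needle, then keep those of at least min_length."""
    flags = re.IGNORECASE if ignore_case else 0
    # Same shape as the grep pattern: \b\w*<needle>\w*\b
    pattern = re.compile(r"\b\w*" + re.escape(needle) + r"\w*\b", flags)
    # The list comprehension does the job of the awk length filter.
    return [word for word in pattern.findall(text) if len(word) >= min_length]


sample = "THEORETICIAN\nthe\nnevertheless\nweatherproof\n"
print(words_containing(sample, "the", 12))
print(words_containing(sample, "the", 12, ignore_case=True))
```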
Using Foundation methods (AppleScriptObjC):
<lang applescript>use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

on wordsContaining(textFile, searchText, minLength)
    set theText to current application's class "NSMutableString"'s ¬
        stringWithContentsOfFile:(textFile's POSIX path) usedEncoding:(missing value) |error|:(missing value)
    -- Replace every run of non AppleScript 'word' characters with a linefeed.
    tell theText to replaceOccurrencesOfString:("(?:[\\W--[.'’]]|(?<!\\w)[.'’]|[.'’](?!\\w))++") withString:(linefeed) ¬
        options:(current application's NSRegularExpressionSearch) range:({0, its |length|()})
    -- Split the text at the linefeeds.
    set theWords to theText's componentsSeparatedByString:(linefeed)
    
    -- Filter the resulting array for strings which meet the search text and minimum length requirements,
    -- matching AppleScript's current case-sensitivity setting. NSString lengths are measured in 16-bit
    -- code units so use regex to check the lengths in characters.
    if ("A" = "a") then
        set filterTemplate to "((self CONTAINS[c] %@) && (self MATCHES %@))"
    else
        set filterTemplate to "((self CONTAINS %@) && (self MATCHES %@))"
    end if
    set filter to current application's class "NSPredicate"'s ¬
        predicateWithFormat_(filterTemplate, searchText, ".{" & minLength & ",}+")
    return (theWords's filteredArrayUsingPredicate:(filter)) as list
end wordsContaining</lang>
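The comment in the handler above about lengths being measured in 16-bit code units can be made concrete with a small Python sketch. The helper name `utf16_units` is hypothetical; it counts UTF-16 code units, while Python's own `len` counts code points, which is closer to (though still not the same as) user-perceived characters.

```python
def utf16_units(s):
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


plain = "the"
astral = "the\U0001F600"  # "the" followed by one character outside the Basic Multilingual Plane

print(len(plain), utf16_units(plain))
print(len(astral), utf16_units(astral))
```

For plain ASCII words such as those in unixdict.txt the two measures agree; they diverge only when a character needs a surrogate pair.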
Test code for the task with any of the above:
<lang applescript>local textFile, output
set textFile to ((path to desktop as text) & "unixdict.txt") as «class furl»
-- considering case -- Uncomment this and the corresponding 'end' line for case-sensitive searches.
set output to wordsContaining(textFile, "the", 12)
-- end considering
return {count output, output}</lang>
- Output:
<lang applescript>{32, {"authenticate", "chemotherapy", "chrysanthemum", "clothesbrush", "clotheshorse", "eratosthenes", "featherbedding", "featherbrain", "featherweight", "gaithersburg", "hydrothermal", "lighthearted", "mathematician", "neurasthenic", "nevertheless", "northeastern", "northernmost", "otherworldly", "parasympathetic", "physiotherapist", "physiotherapy", "psychotherapeutic", "psychotherapist", "psychotherapy", "radiotherapy", "southeastern", "southernmost", "theoretician", "weatherbeaten", "weatherproof", "weatherstrip", "weatherstripping"}}</lang>
AWK
The following is an awk one-liner entered at a POSIX shell.
<lang awk>/Code$ awk '/the/ && length($1) > 11' unixdict.txt
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
/Code$ </lang>
FreeBASIC
Reuses some code from Odd words#FreeBASIC
<lang freebasic>#define NULL 0

type node
    word as string*32   'enough space to store any word in the dictionary
    nxt as node ptr
end type

function addword( tail as node ptr, word as string ) as node ptr
    'allocates memory for a new node, links the previous tail to it,
    'and returns the address of the new node
    dim as node ptr newnode = allocate(sizeof(node))
    tail->nxt = newnode
    newnode->nxt = NULL
    newnode->word = word
    return newnode
end function

function length( word as string ) as uinteger
    'necessary replacement for the built-in len function, which in this
    'case would always return 32
    for i as uinteger = 1 to 32
        if asc(mid(word,i,1)) = 0 then return i-1
    next i
    return 999
end function

dim as string word
dim as node ptr tail = allocate( sizeof(node) )
dim as node ptr head = tail, curr = head, currj
tail->nxt = NULL
tail->word = "XXXXHEADER"

open "unixdict.txt" for input as #1
while true
    line input #1, word
    if word = "" then exit while
    if length(word)>11 then tail = addword( tail, word )
wend
close #1

dim as string tempword

curr = head->nxt    'skip the dummy header node
while curr <> NULL  'visit every stored word, including the last one
    'the last possible start of a 3-letter substring is length-2
    for i as uinteger = 1 to length(curr->word)-2
        if mid(curr->word,i,3) = "the" then
            print curr->word
            exit for    'print each word once, even if "the" occurs twice
        end if
    next i
    curr = curr->nxt
wend</lang>
- Output:
authenticate chemotherapy chrysanthemum clothesbrush clotheshorse eratosthenes featherbedding featherbrain featherweight gaithersburg hydrothermal lighthearted mathematician neurasthenic nevertheless northeastern northernmost otherworldly parasympathetic physiotherapist physiotherapy psychotherapeutic psychotherapist psychotherapy radiotherapy southeastern southernmost theoretician weatherbeaten weatherproof weatherstrip weatherstripping
Go
<lang go>package main

import (
    "bytes"
    "fmt"
    "io/ioutil"
    "log"
    "strings"
    "unicode/utf8"
)

func main() {
    wordList := "unixdict.txt"
    b, err := ioutil.ReadFile(wordList)
    if err != nil {
        log.Fatal("Error reading file")
    }
    bwords := bytes.Fields(b)
    var words []string
    for _, bword := range bwords {
        s := string(bword)
        if utf8.RuneCountInString(s) > 11 {
            words = append(words, s)
        }
    }
    count := 0
    fmt.Println("Words containing 'the' having a length > 11 in", wordList, "\b:")
    for _, word := range words {
        if strings.Contains(word, "the") {
            count++
            fmt.Printf("%2d: %s\n", count, word)
        }
    }
}</lang>
- Output:
Words containing 'the' having a length > 11 in unixdict.txt:
 1: authenticate
 2: chemotherapy
 3: chrysanthemum
 4: clothesbrush
 5: clotheshorse
 6: eratosthenes
 7: featherbedding
 8: featherbrain
 9: featherweight
10: gaithersburg
11: hydrothermal
12: lighthearted
13: mathematician
14: neurasthenic
15: nevertheless
16: northeastern
17: northernmost
18: otherworldly
19: parasympathetic
20: physiotherapist
21: physiotherapy
22: psychotherapeutic
23: psychotherapist
24: psychotherapy
25: radiotherapy
26: southeastern
27: southernmost
28: theoretician
29: weatherbeaten
30: weatherproof
31: weatherstrip
32: weatherstripping
Julia
<lang julia>function wordscontaining(needle, overlength, dictfile)
    for haystack in split(read(dictfile, String))
        length(haystack) > overlength && occursin(needle, haystack) && println(haystack)
    end
end

wordscontaining("the", 11, "unixdict.txt")
</lang>
- Output:
authenticate chemotherapy chrysanthemum clothesbrush clotheshorse eratosthenes featherbedding featherbrain featherweight gaithersburg hydrothermal lighthearted mathematician neurasthenic nevertheless northeastern northernmost otherworldly parasympathetic physiotherapist physiotherapy psychotherapeutic psychotherapist psychotherapy radiotherapy southeastern southernmost theoretician weatherbeaten weatherproof weatherstrip weatherstripping
Perl
Perl one-liner entered from a POSIX shell:
<lang perl>/Code$ perl -n -e '/(\w*the\w*)/ && length($1)>11 && print' unixdict.txt
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
/Code$ </lang>
Phix
<lang Phix>function the(string word) return length(word)>11 and match("the",word) end function
sequence words = filter(get_text("demo/unixdict.txt",GT_LF_STRIPPED),the)
printf(1,"found %d 'the' words:\n%s\n",{length(words),join(shorten(words,"",3),", ")})</lang>
- Output:
found 32 'the' words:
authenticate, chemotherapy, chrysanthemum, ..., weatherproof, weatherstrip, weatherstripping
Python
Entered from a POSIX shell:
<lang python>/Code$ python -c 'import sys
> for line in sys.stdin:
>     if "the" in line and len(line.strip()) > 11:
>         print(line.rstrip())
> ' < unixdict.txt
authenticate
chemotherapy
chrysanthemum
clothesbrush
clotheshorse
eratosthenes
featherbedding
featherbrain
featherweight
gaithersburg
hydrothermal
lighthearted
mathematician
neurasthenic
nevertheless
northeastern
northernmost
otherworldly
parasympathetic
physiotherapist
physiotherapy
psychotherapeutic
psychotherapist
psychotherapy
radiotherapy
southeastern
southernmost
theoretician
weatherbeaten
weatherproof
weatherstrip
weatherstripping
/Code$ </lang>
Raku
A trivial modification of the ABC words task.
<lang perl6>put 'unixdict.txt'.IO.words».fc.grep({ (.chars > 11) && (.contains: 'the') })\
.&{"{+$_} words:\n " ~ .batch(8)».fmt('%-17s').join: "\n "};</lang>
- Output:
32 words:
 authenticate      chemotherapy      chrysanthemum     clothesbrush      clotheshorse      eratosthenes      featherbedding    featherbrain
 featherweight     gaithersburg      hydrothermal      lighthearted      mathematician     neurasthenic      nevertheless      northeastern
 northernmost      otherworldly      parasympathetic   physiotherapist   physiotherapy     psychotherapeutic psychotherapist   psychotherapy
 radiotherapy      southeastern      southernmost      theoretician      weatherbeaten     weatherproof      weatherstrip      weatherstripping
REXX
This REXX version doesn't care what order the words in the dictionary are in, nor does it care what
case (lower/upper/mixed) the words are in; the search for the substring "the" is caseless.
It also allows the substring to be specified on the command line (CL), as well as the minimum length and the dictionary file identifier.
Programming note: if the minimum length is negative, the program finds the words but does not display them, and
only displays the count of found words.
<lang rexx>/*REXX program finds words that contain the substring "the" (within an identified dict.)*/
parse arg $ minL iFID .                          /*obtain optional arguments from the CL*/
if    $=='' |    $==","  then    $= 'the'        /*Not specified?  Then use the default.*/
if minL=='' | minL==","  then minL= 12           /* "      "         "   "   "     "    */
if iFID=='' | iFID==","  then iFID='unixdict.txt' /* "     "         "   "   "     "    */
tell= minL>0;     minL= abs(minL)                /*use absolute value of minimum length.*/
@.=                                              /*default value of any dictionary word.*/
            do #=1  while lines(iFID)\==0        /*read each word in the file  (word=X).*/
            @.#= strip( linein( iFID) )          /*pick off a word from the input line. */
            end   /*#*/
$u= $;   upper $u                                /*obtain an uppercase version of  $.   */
say copies('─', 25)     #     "words in the dictionary file: "       iFID
finds= 0                                         /*count of the substring found in dict.*/
        do j=1  for #-1;    z= @.j;   upper z    /*process all the words that were found*/
        if length(z)<minL  then iterate          /*Is word too short?   Yes, then skip. */
        if pos($u, z)==0   then iterate          /*Found the substring?  No,   "    "   */
        finds= finds + 1                         /*bump count of substring words found. */
        if tell  then say right(left(@.j, 20), 25) /*Show it?  Indent original word.    */
        end   /*j*/
                                                 /*stick a fork in it,  we're all done. */
say copies('─', 25)    finds    " words (with a min. length of"   ,
                       minL') that contains the substring: '     $</lang>
- output when using the default inputs:
───────────────────────── 25105 words in the dictionary file: unixdict.txt
     authenticate
     chemotherapy
     chrysanthemum
     clothesbrush
     clotheshorse
     eratosthenes
     featherbedding
     featherbrain
     featherweight
     gaithersburg
     hydrothermal
     lighthearted
     mathematician
     neurasthenic
     nevertheless
     northeastern
     northernmost
     otherworldly
     parasympathetic
     physiotherapist
     physiotherapy
     psychotherapeutic
     psychotherapist
     psychotherapy
     radiotherapy
     southeastern
     southernmost
     theoretician
     weatherbeaten
     weatherproof
     weatherstrip
     weatherstripping
───────────────────────── 32 words (with a min. length of 12) that contain the substring: the
- output when using the input of: , -3
───────────────────────── 25105 words in the dictionary file: unixdict.txt
───────────────────────── 287 words (with a min. length of 3) that contains the substring: the
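The two conventions described for the REXX entry (a caseless search, and a negative minimum length meaning "count only") can be sketched in Python as follows. The function name, parameter names, and the choice to return a tuple instead of printing are assumptions made for this illustration; it is not a translation of the REXX program.

```python
def find_words(words, needle="the", min_len=12):
    """Caseless substring search; a non-positive min_len suppresses the word list.

    Returns (count, shown), where shown is the list of words that would be
    displayed (empty when only the count was requested).
    """
    show = min_len > 0          # negative (or zero) minimum: report the count only
    limit = abs(min_len)        # the sign is a flag, the magnitude is the real limit
    target = needle.upper()     # compare in a single case so the search is caseless
    found = [w for w in words if len(w) >= limit and target in w.upper()]
    return len(found), (found if show else [])


sample = ["Theoretician", "nevertheless", "other", "the"]
print(find_words(sample))             # default arguments: list the long words
print(find_words(sample, "the", -3))  # negative minimum: count only
```

Encoding "do not display" in the sign of the minimum length keeps the command line to three positional arguments, at the cost of being less self-explanatory than a separate flag.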
Ring
<lang ring>
cStr = read("unixdict.txt")
wordList = str2list(cStr)
num = 0
the = "the"

see "working..." + nl

ln = len(wordList)
for n = ln to 1 step -1
    if len(wordList[n]) < 12
       del(wordList,n)
    ok
next

# single quotes delimit the literal so the inner double quotes are legal
see 'Words containing "the" substring:' + nl

for n = 1 to len(wordList)
    ind = substr(wordList[n],the)
    if ind > 0
       num = num +1
       see "" + num + ". " + wordList[n] + nl
    ok
next

see "done..." + nl
</lang>
- Output:
working...
Founded "the" words are:
1. authenticate
2. chemotherapy
3. chrysanthemum
4. clothesbrush
5. clotheshorse
6. eratosthenes
7. featherbedding
8. featherbrain
9. featherweight
10. gaithersburg
11. hydrothermal
12. lighthearted
13. mathematician
14. neurasthenic
15. nevertheless
16. northeastern
17. northernmost
18. otherworldly
19. parasympathetic
20. physiotherapist
21. physiotherapy
22. psychotherapeutic
23. psychotherapist
24. psychotherapy
25. radiotherapy
26. southeastern
27. southernmost
28. theoretician
29. weatherbeaten
30. weatherproof
31. weatherstrip
32. weatherstripping
done...
Smalltalk
<lang smalltalk>d := 'unixdict.txt' asFilename contents asSet.
page := 'https://www.rosettacode.org/wiki/Words_containing_%22the%22_substring' asURL retrieveContents.
page asCollectionOfWords
    select:[:word | (word size > 11) and:[word includesString:'the' caseSensitive:trueOrFalseWhoKnows]]
    thenDo:#transcribeCR</lang>
Wren
<lang ecmascript>import "io" for File
import "/fmt" for Fmt

var wordList = "unixdict.txt" // local copy
var words = File.read(wordList).trimEnd().split("\n").where { |w| w.count > 11 }.toList
var count = 0
System.print("Words containing 'the' having a length > 11 in %(wordList):")
for (word in words) {
    if (word.contains("the")) {
        count = count + 1
        Fmt.print("$2d: $s", count, word)
    }
}</lang>
- Output:
Words containing 'the' having a length > 11 in unixdict.txt:
 1: authenticate
 2: chemotherapy
 3: chrysanthemum
 4: clothesbrush
 5: clotheshorse
 6: eratosthenes
 7: featherbedding
 8: featherbrain
 9: featherweight
10: gaithersburg
11: hydrothermal
12: lighthearted
13: mathematician
14: neurasthenic
15: nevertheless
16: northeastern
17: northernmost
18: otherworldly
19: parasympathetic
20: physiotherapist
21: physiotherapy
22: psychotherapeutic
23: psychotherapist
24: psychotherapy
25: radiotherapy
26: southeastern
27: southernmost
28: theoretician
29: weatherbeaten
30: weatherproof
31: weatherstrip
32: weatherstripping