Words containing "the" substring

found 32 "the" words
</pre>
 
=={{header|AppleScript}}==
AppleScripters can tackle this task in a variety of ways. The example handlers below are listed in order of increasing speed, but all of them complete the task in under 0.2 seconds. Each takes a file specifier, a search string, and a minimum length as parameters, and all return identical results for the same input.
 
Using just the core language — 'words':
<lang applescript>on wordsContaining(textFile, searchText, minLength)
	script o
		property wordList : missing value
		property output : {}
	end script
	-- Extract the text's 'words' and return any that meet both the search text and minimum length requirements.
	set o's wordList to words of (read (textFile as alias) as «class utf8»)
	repeat with thisWord in o's wordList
		if ((thisWord contains searchText) and (thisWord's length ≥ minLength)) then
			set end of o's output to thisWord's contents
		end if
	end repeat
	return o's output
end wordsContaining</lang>
 
Using just the core language — 'text items':
<lang applescript>on wordsContaining(textFile, searchText, minLength)
	script o
		property textItems : missing value
		property output : {}
	end script
	-- Extract the text's search-text-delimited sections.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchText
	set o's textItems to text items of (read (textFile as alias) as «class utf8»)
	set AppleScript's text item delimiters to astid
	-- Reconstitute any words containing the search text from the stubs at the section ends and
	-- the search text itself, returning any results which meet the minimum length requirement.
	set thisSection to beginning of o's textItems
	set sectionHasWords to ((count thisSection's words) > 0)
	considering white space
		repeat with i from 2 to (count o's textItems)
			set foundWord to searchText
			if (sectionHasWords) then
				set thisStub to thisSection's last word
				if (thisSection ends with thisStub) then set foundWord to thisStub & foundWord
			end if
			set thisSection to item i of o's textItems
			set sectionHasWords to ((count thisSection's words) > 0)
			if (sectionHasWords) then
				set thisStub to thisSection's first word
				if (thisSection begins with thisStub) then set foundWord to foundWord & thisStub
			end if
			if (foundWord's length ≥ minLength) then set end of o's output to foundWord
		end repeat
	end considering
	return o's output
end wordsContaining</lang>
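The 'text items' handler is the least obvious of the four. As an illustration (the sample string is made up for the purpose), splitting a short piece of text at "the" shows the sections the handler works with:

<lang applescript>set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "the"
set theSections to text items of "Nevertheless, the weatherproof"
set AppleScript's text item delimiters to astid
theSections
--> {"Never", "less, ", " wea", "rproof"}</lang>

The handler joins "the" to the last word of the preceding section when that section ends with it, and to the first word of the following section when that section begins with it. Here that yields "Nevertheless", "the" and "weatherproof" before the minimum-length filter is applied. This is what the `considering white space` block guards against: if white space were ignored, " wea" would count as beginning with "wea" and the standalone "the" would come out as "thewea".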
 
Using a shell script:
<lang applescript>on wordsContaining(textFile, searchText, minLength)
	-- Set up and execute a shell script which uses grep to find words containing the search text
	-- (matching the current AppleScript case-sensitivity setting) and awk to pass those which
	-- satisfy the minimum length requirement.
	if ("A" = "a") then
		set part1 to "grep -io "
	else
		set part1 to "grep -o "
	end if
	set shellCode to part1 & quoted form of ("\\b\\w*" & searchText & "\\w*\\b") & ¬
		(" <" & quoted form of textFile's POSIX path) & ¬
		(" | awk " & quoted form of ("// && length($0) >= " & minLength))
	return paragraphs of (do shell script shellCode)
end wordsContaining</lang>
 
Using Foundation methods (AppleScriptObjC):
<lang applescript>use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
 
on wordsContaining(textFile, searchText, minLength)
set theText to current application's class "NSMutableString"'s ¬
stringWithContentsOfFile:(textFile's POSIX path) usedEncoding:(missing value) |error|:(missing value)
-- Replace every run of non AppleScript 'word' characters with a linefeed.
tell theText to replaceOccurrencesOfString:("(?:[\\W--[.'’]]|(?<!\\w)[.'’]|[.'’](?!\\w))++") withString:(linefeed) ¬
options:(current application's NSRegularExpressionSearch) range:({0, its |length|()})
-- Split the text at the linefeeds.
set theWords to theText's componentsSeparatedByString:(linefeed)
-- Filter the resulting array for strings which meet the search text and minimum length requirements,
-- matching the current AppleScript case-sensitivity setting. NSString lengths are measured in 16-bit
-- code units so use regex to check the lengths in characters.
if ("A" = "a") then
set filterTemplate to "((self CONTAINS[c] %@) && (self MATCHES %@))"
else
set filterTemplate to "((self CONTAINS %@) && (self MATCHES %@))"
end if
set filter to current application's class "NSPredicate"'s ¬
predicateWithFormat_(filterTemplate, searchText, ".{" & minLength & ",}+")
return (theWords's filteredArrayUsingPredicate:(filter)) as list
end wordsContaining</lang>
 
Test code for use with any of the above handlers:
<lang applescript>local textFile, output
set textFile to ((path to desktop as text) & "unixdict.txt") as «class furl»
-- considering case -- Uncomment this and the corresponding 'end' line for case-sensitive searches.
set output to wordsContaining(textFile, "the", 12)
-- end considering
return {count output, output}</lang>
 
{{output}}
<lang applescript>{32, {"authenticate", "chemotherapy", "chrysanthemum", "clothesbrush", "clotheshorse", "eratosthenes", "featherbedding", "featherbrain", "featherweight", "gaithersburg", "hydrothermal", "lighthearted", "mathematician", "neurasthenic", "nevertheless", "northeastern", "northernmost", "otherworldly", "parasympathetic", "physiotherapist", "physiotherapy", "psychotherapeutic", "psychotherapist", "psychotherapy", "radiotherapy", "southeastern", "southernmost", "theoretician", "weatherbeaten", "weatherproof", "weatherstrip", "weatherstripping"}}</lang>
 
=={{header|AWK}}==