I before E except after C
You are encouraged to solve this task according to the task description, using any language you may know.
The phrase "I before E, except after C" is a widely known mnemonic which is supposed to help when spelling English words.
- Task Description
Using the word list from http://www.puzzlers.org/pub/wordlists/unixdict.txt, check if the two sub-clauses of the phrase are plausible individually:
- "I before E when not preceded by C"
- "E before I when preceded by C"
If both sub-phrases are plausible then the original phrase can be said to be plausible.
Something is plausible if the number of words having the feature is more than two times the number of words having the opposite feature (where feature is 'ie' or 'ei' preceded or not by 'c' as appropriate).
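The plausibility test described above can be sketched in a few lines of Python. This is a minimal illustration, not one of the submitted solutions: the helper names `count_features`, `plausible`, and `check`, as well as the six-word sample list, are assumptions made for the example, and each word is counted at most once per pattern.

```python
import re

# One regex per feature; (^|[^c]) also accepts a match at the start of a word.
PATTERNS = {
    "ie": re.compile(r"(^|[^c])ie"),
    "ei": re.compile(r"(^|[^c])ei"),
    "cie": re.compile(r"cie"),
    "cei": re.compile(r"cei"),
}

def count_features(words):
    """Count, per pattern, how many words contain it (hypothetical helper)."""
    counts = dict.fromkeys(PATTERNS, 0)
    for word in words:
        word = word.lower()
        for name, pattern in PATTERNS.items():
            if pattern.search(word):
                counts[name] += 1
    return counts

def plausible(feature, opposite):
    """The task's criterion: strictly more than twice the opposite count."""
    return feature > 2 * opposite

def check(words):
    c = count_features(words)
    rule1 = plausible(c["ie"], c["ei"])    # I before E when not preceded by C
    rule2 = plausible(c["cei"], c["cie"])  # E before I when preceded by C
    return rule1, rule2, rule1 and rule2

# Tiny illustrative sample, not the real word list.
SAMPLE = ["believe", "fierce", "friend", "weird", "receive", "science"]
print(count_features(SAMPLE))
print(check(SAMPLE))
```

To run it against the real data, the sample list would be replaced by the lines of a local copy of unixdict.txt.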
- Stretch goal
As a stretch goal use the entries from the table of Word Frequencies in Written and Spoken English: based on the British National Corpus, (selecting those rows with three space or tab separated words only), to see if the phrase is plausible when word frequencies are taken into account.
Show your output here as well as your program.
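For the stretch goal, one possible approach is sketched below in Python. It keeps only rows that split into exactly three whitespace-separated fields with a numeric third field, and adds the frequency instead of 1 to each matching counter. The function names and the sample rows are assumptions for illustration; the frequencies in the sample are made up and are not taken from the corpus.

```python
import re

PATTERNS = {
    "ie": re.compile(r"(^|[^c])ie"),
    "ei": re.compile(r"(^|[^c])ei"),
    "cie": re.compile(r"cie"),
    "cei": re.compile(r"cei"),
}

def parse_rows(lines):
    """Yield (word, frequency) for rows with exactly three fields."""
    for line in lines:
        fields = line.split()          # splits on spaces and tabs alike
        if len(fields) != 3 or not fields[2].isdigit():
            continue                   # header row, multi-word entries, etc.
        yield fields[0].lower(), int(fields[2])

def weighted_counts(rows):
    """Sum the frequencies of the words containing each pattern."""
    totals = dict.fromkeys(PATTERNS, 0)
    for word, freq in rows:
        for name, pattern in PATTERNS.items():
            if pattern.search(word):
                totals[name] += freq
    return totals

# Made-up rows in the "Word PoS Freq" layout, for illustration only.
SAMPLE_LINES = [
    "\tWord\tPoS\tFreq",
    "\tfriend\tNoC\t10",
    "\ttheir\tDet\t30",
    "\treceive\tVerb\t5",
    "\tother than\tPrep\t43",   # four fields after splitting, so it is skipped
    "\tscience\tNoC\t7",
]
print(weighted_counts(parse_rows(SAMPLE_LINES)))
```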
- cf.
- Schools to rethink 'i before e' - BBC news, 20 June 2009
- I Before E Except After C - QI Series 8 Ep 14, (humorous)
- Companion website for the book: "Word Frequencies in Written and Spoken English: based on the British National Corpus".
AutoHotkey
<lang AutoHotkey>WordList := URL_ToVar("http://www.puzzlers.org/pub/wordlists/unixdict.txt")
WordList := RegExReplace(WordList, "i)cie", "", cieN)
WordList := RegExReplace(WordList, "i)cei", "", ceiN)
RegExReplace(WordList, "i)ie", "", ieN)
RegExReplace(WordList, "i)ei", "", eiN)

cei := ceiN / cieN > 2 ? "plausible" : "implausible"
ei  := ieN / eiN > 2 ? "plausible" : "implausible"
ova := cei = "plausible" && ei = "plausible" ? "plausible" : "implausible"

MsgBox, % """I before E when not preceded by C"" is " ei ".`n"
    . ieN " cases for and " eiN " cases against is a ratio of " ieN / eiN ".`n`n"
    . """E before I when preceded by C"" is " cei ".`n"
    . ceiN " cases for and " cieN " cases against is a ratio of " ceiN / cieN ".`n`n"
    . "Overall the rule is " ova "."

URL_ToVar(URL) {
    WebRequest := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    WebRequest.Open("GET", URL)
    WebRequest.Send()
    return, WebRequest.ResponseText
}</lang>
Output:
"I before E when not preceded by C" is plausible.
466 cases for and 217 cases against is a ratio of 2.147465.

"E before I when preceded by C" is implausible.
13 cases for and 24 cases against is a ratio of 0.541667.

Overall the rule is implausible.
AWK
<lang awk>#!/usr/bin/awk -f
/.ei/ {nei+=cnt($3)} /cei/ {cei+=cnt($3)}
/.ie/ {nie+=cnt($3)} /cie/ {cie+=cnt($3)}
function cnt(c) { if (c<1) return 1; return c; }
END { printf("cie: %i\nnie: %i\ncei: %i\nnei: %i\n",cie,nie-cie,cei,nei-cei); v = ""; if (nie < 3 * cie) { v=" not"; } print "I before E when not preceded by C: is"v" plausible"; v = ""; if (nei > 3 * cei) { v=" not"; } print "E before I when preceded by C: is"v" plausible"; }</lang>
Usage:
$ awk -f ./i_before_e_except_after_c.awk unixdict.txt
cie: 24
nie: 464
cei: 13
nei: 194
I before E when not preceded by C: is plausible
E before I when preceded by C: is not plausible
$ awk -f i_before_e_except_after_c.awk 1_2_all_freq.txt
cie: 994
nie: 8148
cei: 327
nei: 4826
I before E when not preceded by C: is plausible
E before I when preceded by C: is not plausible
C
Inspired by the J solution, but implemented as a single pass through the data, we have flex build the finite state machine in C. This may in turn motivate me to provide a second J solution as a single pass FSM. Please find the program output hidden at the top of the source as part of the build and example run. <lang c> %{
  /*
    compilation and example on a GNU linux system:

    $ flex --case-insensitive --noyywrap --outfile=cia.c source.l
    $ make LOADLIBES=-lfl cia
    $ ./cia < unixdict.txt
    I before E when not preceded by C: plausible
    E before I when preceded by C: implausible
    Overall, the rule is: implausible
  */
  int cie, cei, ie, ei;
%}
%%
cie	++cie, ++ie; /* longer patterns are matched preferentially, consuming input */
cei	++cei, ++ei;
ie	++ie;
ei	++ei;
.|\n	;
%%
int main() {
cie = cei = ie = ei = 0; yylex(); printf("%s: %s\n","I before E when not preceded by C", (2*ei < ie ? "plausible" : "implausible")); printf("%s: %s\n","E before I when preceded by C", (2*cie < cei ? "plausible" : "implausible")); printf("%s: %s\n","Overall, the rule is", (2*(cie+ei) < (cei+ie) ? "plausible" : "implausible")); return 0;
} </lang>
C++
1. The task fails to consider whether certain words should be excluded: proper names, words with a basis in a foreign language (e.g. geisha), etc.
2. Some words have multiple occurrences of the patterns.
3. Another way to determine overall plausibility would be to add all pro/con arguments together; this would result in a different outcome with the input file in question.
4. If the file changes, the outcome will possibly be different. sha1 of file 2013-12-30: 058f8872306ef36f679d44f1b556334a13a85b57 unixdict.txt
5. Build with g++ -Wall -std=c++0x thisfile.cpp -lboost_regex (Test used 4.4, so only a limited number of C++11 features were used.)
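The second note, about words with multiple occurrences, is easy to see in a small Python sketch. It contrasts counting each word once with counting every occurrence of a pattern. The function names and the three-word sample are assumptions for illustration; the lookbehind `(?<!c)` is used here as one way to express "not preceded by c", including at the start of a word.

```python
import re

# "ie" not preceded by "c"; the negative lookbehind also matches at word start.
NOT_AFTER_C_IE = re.compile(r"(?<!c)ie")

def per_word(words, pattern):
    """Each word contributes at most 1, however often the pattern occurs."""
    return sum(1 for w in words if pattern.search(w))

def per_occurrence(words, pattern):
    """Every non-overlapping occurrence contributes 1."""
    return sum(len(pattern.findall(w)) for w in words)

WORDS = ["siegfried", "friend", "science"]
print(per_word(WORDS, NOT_AFTER_C_IE))        # siegfried and friend
print(per_occurrence(WORDS, NOT_AFTER_C_IE))  # siegfried counts twice
```

Which of the two counts is appropriate is a design choice the task leaves open, which is one reason the solutions on this page report slightly different totals.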
<lang cpp>#include <iostream>
#include <fstream>
#include <string>
#include <tuple>
#include <vector>
#include <stdexcept>
#include <boost/regex.hpp>
struct Claim {
Claim(const std::string& name) : name_(name), pro_(0), against_(0), propats_(), againstpats_() { } void add_pro(const std::string& pat) { propats_.push_back(std::make_tuple(boost::regex(pat), pat[0] == '^')); } void add_against(const std::string& pat) { againstpats_.push_back(std::make_tuple(boost::regex(pat), pat[0] == '^')); } bool plausible() const { return pro_ > against_*2; } void check(const char * buf, uint32_t len) { for (auto i = propats_.begin(), ii = propats_.end(); i != ii; ++i) { uint32_t pos = 0; boost::cmatch m; if (std::get<1>(*i) && pos > 0) continue; while (pos < len && boost::regex_search(buf+pos, buf+len, m, std::get<0>(*i))) { ++pro_; if (pos > 0) std::cerr << name_ << " [pro] multiple matches in: " << buf << "\n"; pos += m.position() + m.length(); } } for (auto i = againstpats_.begin(), ii = againstpats_.end(); i != ii; ++i) { uint32_t pos = 0; boost::cmatch m; if (std::get<1>(*i) && pos > 0) continue; while (pos < len && boost::regex_search(buf+pos, buf+len, m, std::get<0>(*i))) { ++against_; if (pos > 0) std::cerr << name_ << " [against] multiple matches in: " << buf << "\n"; pos += m.position() + m.length(); } } } friend std::ostream& operator<<(std::ostream& os, const Claim& c);
private:
  std::string name_;
  uint32_t pro_;
  uint32_t against_;
  // tuple<regex,begin only>
  std::vector<std::tuple<boost::regex,bool>> propats_;
  std::vector<std::tuple<boost::regex,bool>> againstpats_;
};
std::ostream& operator<<(std::ostream& os, const Claim& c) {
os << c.name_ << ": matches: " << c.pro_ << " vs. counter matches: " << c.against_ << ". "; os << "Plausibility: " << (c.plausible() ? "yes" : "no") << "."; return os;
}
int main(int argc, char ** argv) {
try { if (argc < 2) throw std::runtime_error("No input file."); std::ifstream is(argv[1]); if (! is) throw std::runtime_error("Input file not valid.");
Claim ieclaim("[^c]ie"); ieclaim.add_pro("[^c]ie"); ieclaim.add_pro("^ie"); ieclaim.add_against("[^c]ei"); ieclaim.add_against("^ei");
Claim ceiclaim("cei"); ceiclaim.add_pro("cei"); ceiclaim.add_against("cie");
{ const uint32_t MAXLEN = 32; char buf[MAXLEN]; uint32_t longest = 0; while (is) { is.getline(buf, sizeof(buf)); if (is.gcount() <= 0) break; else if (is.gcount() > longest) longest = is.gcount(); ieclaim.check(buf, is.gcount()); ceiclaim.check(buf, is.gcount()); } if (longest >= MAXLEN) throw std::runtime_error("Buffer too small."); }
std::cout << ieclaim << "\n"; std::cout << ceiclaim << "\n"; std::cout << "Overall plausibility: " << (ieclaim.plausible() && ceiclaim.plausible() ? "yes" : "no") << "\n";
} catch (const std::exception& ex) { std::cerr << "*** Error: " << ex.what() << "\n"; return -1; } return 0;
} </lang>
Output:
[^c]ie [pro] multiple matches in: siegfried
[^c]ie [against] multiple matches in: weinstein
[^c]ie: matches: 466 vs. counter matches: 217. Plausibility: yes.
cei: matches: 13 vs. counter matches: 24. Plausibility: no.
Overall plausibility: no
Coco
First we need to set the variable dict to the text of the dictionary as a string. How to do this depends on your JavaScript platform. Using Node.js, for example, you could download a copy of the dictionary to /tmp/unixdict.txt and then say dict = fs.readFileSync '/tmp/unixdict.txt', {encoding: 'UTF-8'}.
Now we can do the task:
<lang coco>ie-npc = ei-npc = ie-pc = ei-pc = 0
for word of dict.toLowerCase!.match /\S+/g
    ++ie-npc if /(^|[^c])ie/.test word
    ++ei-npc if /(^|[^c])ei/.test word
    ++ie-pc if word.indexOf('cie') > -1
    ++ei-pc if word.indexOf('cei') > -1

p1 = ie-npc > 2 * ei-npc
p2 = ei-pc > 2 * ie-pc

console.log '(1) is%s plausible.', if p1 then '' else ' not'
console.log '(2) is%s plausible.', if p2 then '' else ' not'
console.log 'The whole phrase is%s plausible.', if p1 and p2 then '' else ' not'</lang>
Common Lisp
<lang lisp> (defun test-rule (rule-name examples counter-examples)
(let ((plausible (if (> examples (* 2 counter-examples)) 'plausible 'not-plausible))) (list rule-name plausible examples counter-examples)))
(defun plausibility (result-string file parser)
(let ((cei 0) (cie 0) (ie 0) (ei 0)) (macrolet ((search-count (&rest terms) (when terms `(progn (when (search ,(string-downcase (symbol-name (car terms))) word) (incf ,(car terms) freq)) (search-count ,@(cdr terms)))))) (with-open-file (stream file :external-format :latin-1) (loop :for raw-line = (read-line stream nil 'eof) :until (eq raw-line 'eof) :for line = (string-trim '(#\Tab #\Space) raw-line) :for (word freq) = (funcall parser line) :do (search-count cei cie ie ei)) (print-result result-string cei cie ie ei)))))
(defun print-result (result-string cei cie ie ei)
(let ((results (list (test-rule "I before E when not preceded by C" (- ie cie) (- ei cei)) (test-rule "E before I when preceded by C" cei cie)))) (format t "~a:~%~{~{~2TThe rule \"~a\" is ~S. There were ~a examples and ~a counter-examples.~}~^~%~}~%~%~2TOverall the rule is ~S~%~%" result-string results (or (find 'not-plausible (mapcar #'cadr results)) 'plausible))))
(defun parse-dict (line) (list line 1))
(defun parse-freq (line)
(list (subseq line 0 (position #\Tab line)) (parse-integer (subseq line (position #\Tab line :from-end t)) :junk-allowed t)))
(plausibility "Dictionary" #p"unixdict.txt" #'parse-dict) (plausibility "Word frequencies (stretch goal)" #p"1_2_all_freq.txt" #'parse-freq) </lang>
Output:
Dictionary:
  The rule "I before E when not preceded by C" is PLAUSIBLE. There were 465 examples and 213 counter-examples.
  The rule "E before I when preceded by C" is NOT-PLAUSIBLE. There were 13 examples and 24 counter-examples.

  Overall the rule is NOT-PLAUSIBLE

Word frequencies (stretch goal):
  The rule "I before E when not preceded by C" is NOT-PLAUSIBLE. There were 8163 examples and 4826 counter-examples.
  The rule "E before I when preceded by C" is NOT-PLAUSIBLE. There were 327 examples and 994 counter-examples.

  Overall the rule is NOT-PLAUSIBLE
Fortran
Please find the linux build instructions along with example run in the comments at the beginning of the f90 source. Thank you.
<lang FORTRAN>
!-*- mode: compilation; default-directory: "/tmp/" -*-
!Compilation started at Sat May 18 22:19:19
!
!a=./F && make $a && $a < unixdict.txt
!f95 -Wall -ffree-form F.F -o F
! ie ei cie cei
! 490 230 24 13
! [^c]ie plausible
! cei implausible
! ([^c]ie)|(cei) implausible
!
!Compilation finished at Sat May 18 22:19:19

! test the plausibility of i before e except...
program cia
  implicit none
  character (len=256) :: s
  integer :: ie, ei, cie, cei
  integer :: ios
  data ie, ei, cie, cei/4*0/
  do while (.true.)
    read(5,*,iostat = ios)s
    if (0 .ne. ios) then
      exit
    endif
    call lower_case(s)
    cie = cie + occurrences(s, 'cie')
    cei = cei + occurrences(s, 'cei')
    ie = ie + occurrences(s, 'ie')
    ei = ei + occurrences(s, 'ei')
  enddo
  write(6,'(1x,4(a4,1x))') 'ie','ei','cie','cei'
  write(6,'(1x,4(i4,1x))') ie,ei,cie,cei ! 488 230 24 13
  write(6,'(1x,2(a,1x))') ' [^c]ie',plausibility(ie,ei)
  write(6,'(1x,2(a,1x))') ' cei',plausibility(cei,cie)
  write(6,'(1x,2(a,1x))') '([^c]ie)|(cei)',plausibility(ie+cei,ei+cie)

contains

  subroutine lower_case(s)
    character(len=*), intent(inout) :: s
    integer :: i
    do i=1, len_trim(s)
      s(i:i) = achar(ior(iachar(s(i:i)),32))
    enddo
  end subroutine lower_case

  integer function occurrences(a,b)
    character(len=*), intent(in) :: a, b
    integer :: i, j, n
    n = 0
    i = 0
    j = index(a, b)
    do while (0 .lt. j)
      n = n+1
      i = i+len(b)+j-1
      j = index(a(i:), b)
    end do
    occurrences = n
  end function occurrences

  character*(32) function plausibility(da, nyet)
    integer, intent(in) :: da, nyet
    !write(0,*)da,nyet
    if (nyet*2 .lt. da) then
      plausibility = 'plausible'
    else
      plausibility = 'implausible'
    endif
  end function plausibility

end program cia </lang>
FreeBASIC
<lang freebasic>
Function getfile(file As String) As String
    Dim As Integer F = Freefile
    Dim As String text,intext
    Open file For Input As #F
    Line Input #F,text
    While Not Eof(F)
        Line Input #F,intext
        text=text+Chr(10)+intext
    Wend
    close #F
    Return text
End Function

Function TALLY(instring As String,PartString As String) As Integer
    Dim count As Integer
    var lens2=Len(PartString)
    Dim As String s=instring
    Dim As Integer position=Instr(s,PartString)
    If position=0 Then Return 0
    While position>0
        count=count+1
        position=Instr(position+Lens2,s,PartString)
    Wend
    Function=count
End Function

Dim As String myfile="unixdict.txt"

Dim As String wordlist= getfile(myfile)
wordlist=lcase(wordlist)

print
print "The number of words in unixdict.txt ",TALLY(wordlist,chr(10))+1
print
dim as integer cei=TALLY(wordlist,"cei")
print "Instances of cei",cei
dim as integer cie=TALLY(wordlist,"cie")
print "Instances of cie",cie
print
dim as integer ei=TALLY(wordlist,"ei")
print "Instances of *ei, where * is not c",ei-cei
dim as integer ie=TALLY(wordlist,"ie")
print "Instances of *ie, where * is not c",ie-cie
print
print "Conclusion:"
print "ie is plausible when not preceeded by c, the ratio is ";(ie-cie)/(ei-cei)
print "ei is not plausible when preceeded by c, the ratio is ";cei/cie
print "So, the idea is not plausible."

Sleep
</lang> Output:
The number of words in unixdict.txt       25104

Instances of cei             13
Instances of cie             24

Instances of *ei, where * is not c         217
Instances of *ie, where * is not c         466

Conclusion:
ie is plausible when not preceeded by c, the ratio is  2.147465437788018
ei is not plausible when preceeded by c, the ratio is  0.5416666666666666
So, the idea is not plausible.
Go
<lang go>package main
import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
	"strings"
)

func main() {
	f, err := os.Open("unixdict.txt")
	if err != nil {
		log.Fatalln(err)
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	rie := regexp.MustCompile("^ie|[^c]ie")
	rei := regexp.MustCompile("^ei|[^c]ei")
	var cie, ie int
	var cei, ei int
	for s.Scan() {
		line := s.Text()
		if strings.Contains(line, "cie") {
			cie++
		}
		if strings.Contains(line, "cei") {
			cei++
		}
		if rie.MatchString(line) {
			ie++
		}
		if rei.MatchString(line) {
			ei++
		}
	}
	err = s.Err()
	if err != nil {
		log.Fatalln(err)
	}

	if check(ie, ei, "I before E when not preceded by C") &&
		check(cei, cie, "E before I when preceded by C") {
		fmt.Println("Both plausable.")
		fmt.Println(`"I before E, except after C" is plausable.`)
	} else {
		fmt.Println("One or both implausable.")
		fmt.Println(`"I before E, except after C" is implausable.`)
	}
}

// check checks if a statement is plausible. Something is plausible if a is more
// than two times b.
func check(a, b int, s string) bool {
	switch {
	case a > b*2:
		fmt.Printf("%q is plausible (%d vs %d).\n", s, a, b)
		return true
	case a >= b:
		fmt.Printf("%q is implausible (%d vs %d).\n", s, a, b)
	default:
		fmt.Printf("%q is implausible and contra-indicated (%d vs %d).\n", s, a, b)
	}
	return false
}</lang>
- Output:
"I before E when not preceded by C" is plausible (465 vs 213).
"E before I when preceded by C" is implausible and contra-indicated (13 vs 24).
One or both implausable.
"I before E, except after C" is implausable.
Haskell
Using Regular Expressions, you can quickly count all occurrences of words that follow this rule and words that don't. In this solution, TDFA -- a fast, POSIX ERE engine -- was used. However, substituting any other regex engine for TDFA should only require changing the import statement. See this page for a list of the most common regex engines available in Haskell.
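One detail worth keeping in mind with a regex-based count: a character class such as `[^c]` has to consume a character, so a pattern like `[^c]ei` cannot match a word that begins with "ei". The Python sketch below illustrates the difference on a made-up three-word list; the names `STRICT`, `WITH_START`, and `count` are assumptions for the example, and the same behaviour should hold for POSIX ERE engines.

```python
import re

STRICT = re.compile(r"[^c]ei")          # needs some non-c character before "ei"
WITH_START = re.compile(r"(^|[^c])ei")  # also accepts "ei" at the start of a word

def count(words, pattern):
    """Number of words in which the pattern occurs at least once."""
    return sum(1 for w in words if pattern.search(w))

WORDS = ["either", "weird", "ceiling"]
print(count(WORDS, STRICT))      # only "weird"
print(count(WORDS, WITH_START))  # "either" and "weird"
```

This may account for part of the variation in counter-example totals between solutions on this page, although that has not been checked against the word list here.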
This solution does not attempt the stretch goal.
<lang Haskell>import Network.HTTP
import Text.Regex.TDFA
import Text.Printf

getWordList :: IO String
getWordList = do
    response <- simpleHTTP.getRequest$ url
    getResponseBody response
        where url = "http://www.puzzlers.org/pub/wordlists/unixdict.txt"

main = do
    words <- getWordList
    putStrLn "Checking Rule 1: \"I before E when not preceded by C\"..."
    let numTrueRule1 = matchCount (makeRegex "[^c]ie" :: Regex) words
        numFalseRule1 = matchCount (makeRegex "[^c]ei" :: Regex) words
        rule1Plausible = numTrueRule1 > (2*numFalseRule1)
    printf "Rule 1 is correct for %d\n incorrect for %d\n" numTrueRule1 numFalseRule1
    printf "*** Rule 1 is %splausible.\n" (if rule1Plausible then "" else "im")

    putStrLn "Checking Rule 2: \"E before I when preceded by C\"..."
    let numTrueRule2 = matchCount (makeRegex "cei" :: Regex) words
        numFalseRule2 = matchCount (makeRegex "cie" :: Regex) words
        rule2Plausible = numTrueRule2 > (2*numFalseRule2)
    printf "Rule 2 is correct for %d\n incorrect for %d\n" numTrueRule2 numFalseRule2
    printf "*** Rule 2 is %splausible.\n" (if rule2Plausible then "" else "im")</lang>
The output:
Checking Rule 1: "I before E when not preceded by C"...
Rule 1 is correct for 465
 incorrect for 195
*** Rule 1 is plausible.
Checking Rule 2: "E before I when preceded by C"...
Rule 2 is correct for 13
 incorrect for 24
*** Rule 2 is implausible.
Icon and Unicon
This solution only works in Unicon, but wouldn't be hard to adapt to Icon. Assumes that words that start with "ei" violate "i before e except after c" and that occurrences of "ei" and "ie" that occur multiple times in the same input line should all be tested.
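The idea of trying longer phrases first and tallying every occurrence on a line can be approximated in Python with a single alternation, since `re` tries alternatives left to right at each position. This is only a sketch of the scanning technique, not a translation of the Unicon program below; the names `SCANNER` and `tally` and the sample words are assumptions for illustration.

```python
import re
from collections import Counter

# Longer phrases are listed first so "cie" wins over "ie" at the same spot.
SCANNER = re.compile(r"cei|cie|ei|ie")

def tally(lines):
    """Count every non-overlapping occurrence of the four phrases."""
    totals = Counter()
    for line in lines:
        totals.update(SCANNER.findall(line.lower()))
    return totals

print(tally(["science", "siegfried", "receive", "weird"]))
```

Because matched text is consumed, an "ie" that is part of a "cie" is counted only under "cie", so the "ie" and "ei" totals here already exclude the occurrences after "c".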
<lang Unicon>import Utils # To get the FindFirst class
procedure main(a)
    showCounts := "--showcounts" == !a
    totals := table(0)
    phrases := ["cei","cie","ei","ie"]  # Longer phrases first
    ff := FindFirst(phrases)

    every map(!&input) ?
        while totals[2(tab(ff.locate()), ff.moveMatch(), move(-1))] +:= 1

    eiP := totals["cei"] > 2* totals["cie"]
    ieP := (totals["ie"]+totals["cei"]) > 2* totals["ei"]
    write("phrase is ",((\ieP & \eiP),"plausible")|"not plausible")
    write("ie is ",(\ieP,"plausible")|"not plausible")
    write("ei is ",(\eiP,"plausible")|"not plausible")

    if \showCounts then every write(phrase := !phrases,": ",totals[phrase])
end</lang>
Output of running with --showcounts flag:
-> ei --showcounts <unixdict.txt
phrase is not plausible
ie is plausible
ei is not plausible
cei: 13
cie: 24
ei: 217
ie: 466
->
stretch goal
<lang Unicon>import Utils # To get the FindFirst class
procedure main(a)
    WS := " \t"
    showCounts := "--showcounts" == !a
    phrases := ["cei","cie","ei","ie"]
    ff := FindFirst(phrases)
    totals := table(0)

    every map(!&input) ? {
        w := (tab(many(WS)),tab(upto(WS)))    # word
        (tab(many(WS)),tab(upto(WS)))         # Skip part of speech
        n := integer((tab(many(WS)),tab(upto(WS)|0))) | next  # frequency?
        \w ? while totals[2(tab(ff.locate()), ff.moveMatch(), move(-1))] +:= n
        }

    eiP := totals["cei"] > 2* totals["cie"]
    ieP := (totals["ie"]+totals["cei"]) > 2* totals["ei"]
    write("phrase is ",((\ieP & \eiP),"plausible")|"not plausible")
    write("ie is ",(\ieP,"plausible")|"not plausible")
    write("ei is ",(\eiP,"plausible")|"not plausible")

    if \showCounts then every write(phrase := !phrases,": ",totals[phrase])
end</lang>
with output:
->ei2 --showcounts <1_2*txt
phrase is not plausible
ie is not plausible
ei is not plausible
cei: 327
cie: 994
ei: 4826
ie: 8207
->
J
After downloading unixdict to /tmp:
<lang J> dict=:tolower fread '/tmp/unixdict.txt'</lang>
Investigating the rules:
<lang J>   +/'cie' E. dict
24
   +/'cei' E. dict
13
   +/'ie' E. dict
490
   +/'ei' E. dict
230</lang>
So, based on unixdict.txt, the "I before E" rule seems plausible (490 > 230 by more than a factor of 2), but the exception does not make much sense (we see almost twice as many i before e after a c as we see e before i after a c).
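The raw totals above count every "ie" and "ei", including those that follow a "c", because each "cie" also contains an "ie". A short Python sketch of the arithmetic, using only the four numbers reported by the session above (the variable names are illustrative), separates the two cases and applies the task's "more than two times" criterion:

```python
# Raw substring counts reported by the J session above.
ie_total, ei_total, cie, cei = 490, 230, 24, 13

# Remove the occurrences that follow a "c".
ie_not_after_c = ie_total - cie   # 490 - 24
ei_not_after_c = ei_total - cei   # 230 - 13

rule1 = ie_not_after_c > 2 * ei_not_after_c  # I before E when not preceded by C
rule2 = cei > 2 * cie                        # E before I when preceded by C
print(ie_not_after_c, ei_not_after_c, rule1, rule2)
```

The adjusted counts of 466 and 217 agree with the figures several other solutions on this page report.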
Note that if we looked at frequency of use for words, instead of considering all words to have equal weights, we might come up with a different answer.
stretch goal
After downloading 1_2_all_freq to /tmp, we can read it into J, and break out the first column (as words) and the third column as numbers:
<lang J>allfreq=: |:}.<;._1;._2]1!:1<'/tmp/1_2_all_freq.txt'

words=: >0 { allfreq
freqs=: 0 {.@".&>2 { allfreq</lang>
With these definitions, we can define a prevalence verb which will tell us how often a particular substring appears in use:
<lang J>prevalence=:verb define
(y +./@E."1 words) +/ .* freqs
)</lang>
Investigating our original proposed rules:
<lang J>   'ie' %&prevalence 'ei'
1.76868</lang>
A generic "i before e" rule is not looking quite as good now - words that have i before e are used less than twice as much as words which use e before i.
<lang J>   'cei' %&prevalence 'cie'
0.328974</lang>
An "except after c" variant is looking awful now - words that use the cie sequence are three times as likely as words that use the cei sequence. So, of course, if we modified our original rule with this exception it would weaken the original rule:
<lang J>   ('ie' -&prevalence 'cie') % ('ei' -&prevalence 'cei')
1.68255</lang>
Note that we might also want to consider non-adjacent matches (the regular expression 'i.*e' instead of 'ie' or perhaps 'c.*ie' or 'c.*i.*e' instead of 'cie') - this would be straightforward to check, but this would bulk up the page.
Java
Download and save wordlist to unixdict.txt.
<lang java> import java.io.BufferedReader; import java.io.FileReader;
public class IbeforeE { public static void main(String[] args) { IbeforeE now=new IbeforeE(); String wordlist="unixdict.txt"; if(now.isPlausibleRule(wordlist)) System.out.println("Rule is plausible."); else System.out.println("Rule is not plausible."); } boolean isPlausibleRule(String filename) { int truecount=0,falsecount=0; try { BufferedReader br=new BufferedReader(new FileReader(filename)); String word; while((word=br.readLine())!=null) { if(isPlausibleWord(word)) truecount++; else if(isOppPlausibleWord(word)) falsecount++; } br.close(); } catch(Exception e) { System.out.println("Something went horribly wrong: "+e.getMessage()); }
System.out.println("Plausible count: "+truecount); System.out.println("Implausible count: "+falsecount); if(truecount>2*falsecount) return true; return false; } boolean isPlausibleWord(String word) { if(!word.contains("c")&&word.contains("ie")) return true; else if(word.contains("cei")) return true; return false; } boolean isOppPlausibleWord(String word) { if(!word.contains("c")&&word.contains("ei")) return true; else if(word.contains("cie")) return true; return false; } } </lang>
Output:
Plausible count: 384 Implausible count: 204 Rule is not plausible.
Lasso
<lang lasso> local(cie,cei,ie,ei) = (:0,0,0,0)
local(match_ie) = regExp(`[^c]ie`) local(match_ei) = regExp(`[^c]ei`)
with word in include_url(`http://www.puzzlers.org/pub/wordlists/unixdict.txt`)->asString->split("\n") where #word >> `ie` or #word >> `ei` do {
#word >> `cie` ? #cie++ #word >> `cei` ? #cei++
#match_ie->reset(-input=#word, -ignoreCase)&find ? #ie++ #match_ei->reset(-input=#word, -ignoreCase)&find ? #ei++
}
local(ie_plausible) = (#ie > (2 * #ei))
local(cei_plausible) = (#cei > (2 * #cie))
stdoutnl(
`The rule "I before E when not preceded by C" is ` + (#ie_plausible ? '' | 'NOT-') + `PLAUSIBLE. There were ` + #ie + ` examples and ` + #ei + ` counter-examples.`
) stdoutnl(
`The rule "E before I when preceded by C" is ` + (#cei_plausible ? `` | `NOT-`) + `PLAUSIBLE. There were ` + #cei + ` examples and ` + #cie + ` counter-examples.`
) stdoutnl(`Overall the rule is ` + (#ie_plausible and #cei_plausible ? `` | `NOT-`) + `PLAUSIBLE`) </lang>
- Output:
The rule "I before E when not preceded by C" is PLAUSIBLE. There were 464 examples and 194 counter-examples.
The rule "E before I when preceded by C" is NOT-PLAUSIBLE. There were 13 examples and 24 counter-examples.
Overall the rule is NOT-PLAUSIBLE
Mathematica
<lang mathematica>wordlist =
Import["http://www.puzzlers.org/pub/wordlists/unixdict.txt", "Words"];
Print["The number of words in unixdict.txt = " <>
ToString[Length[wordlist]]]
StringMatchQ[#, ___ ~~ "c" ~~ "i" ~~ "e" ~~ ___] & /@ wordlist ; cie = Count[%, True]; StringMatchQ[#, ___ ~~ "c" ~~ "e" ~~ "i" ~~ ___] & /@ wordlist ; cei = Count[%, True]; StringMatchQ[#, ___ ~~ "i" ~~ "e" ~~ ___] & /@ wordlist ; ie = Count[%, True] - cie; StringMatchQ[#, ___ ~~ "e" ~~ "i" ~~ ___] & /@ wordlist ; ei = Count[%, True] - cei; test1 = ie > 2 ei; Print["The rule \"I before E when not preceded by C\" is " <>
If[test1, "PLAUSIBLE", "NOT PLAUSIBLE"]]
Print["There were " <> ToString[ie] <> " examples and " <>
ToString[ei] <> " counter examples, for a ratio of " <> ToString[N[ie/ei]]]
test2 = cei > 2 cie; Print["The rule \"E before I when preceded by C\" is " <>
If[test2, "PLAUSIBLE", "NOT PLAUSIBLE"]]
Print["There were " <> ToString[cei] <> " examples and " <>
ToString[cie] <> " counter examples, for a ratio of " <> ToString[N[cei/cie]]]
Print["Overall the rule is " <>
If[test1 && test2, "PLAUSIBLE", "NOT PLAUSIBLE" ]]</lang>
- Output:
<lang mathematica>The number of words in unixdict.txt = 25104
The rule "I before E when not preceded by C" is PLAUSIBLE
There were 465 examples and 213 counter examples, for a ratio of 2.1831
The rule "E before I when preceded by C" is NOT PLAUSIBLE
There were 13 examples and 24 counter examples, for a ratio of 0.541667
Overall the rule is NOT PLAUSIBLE
</lang>
MATLAB / Octave
<lang MATLAB>function i_before_e_except_after_c(f)
	fid = fopen(f,'r');
	nei = 0;
	cei = 0;
	nie = 0;
	cie = 0;
	while ~feof(fid)
		c = strsplit(strtrim(fgetl(fid)),char([9,32]));
		if length(c) > 2,
			n = str2num(c{3});
		else
			n = 1;
		end;
		if strfind(c{1},'ei')>1, nei=nei+n; end;
		if strfind(c{1},'cei'), cei=cei+n; end;
		if strfind(c{1},'ie')>1, nie=nie+n; end;
		if strfind(c{1},'cie'), cie=cie+n; end;
	end;
	fclose(fid);

	printf('cie: %i\nnie: %i\ncei: %i\nnei: %i\n',cie,nie-cie,cei,nei-cei);
	v = '';
	if (nie < 3 * cie)
		v=' not';
	end
	printf('I before E when not preceded by C: is%s plausible\n',v);
	v = '';
	if (nei > 3 * cei)
		v=' not';
	end
	printf('E before I when preceded by C: is%s plausible\n',v);
</lang>
octave:23> i_before_e_except_after_c 1_2_all_freq.txt
cie: 994
nie: 8133
cei: 327
nei: 4274
I before E when not preceded by C: is plausible
E before I when preceded by C: is not plausible
octave:24> i_before_e_except_after_c unixdict.txt
cie: 24
nie: 464
cei: 13
nei: 191
I before E when not preceded by C: is plausible
E before I when preceded by C: is not plausible
Objeck
<lang objeck> use HTTP; use Collection;
class HttpTest {
function : Main(args : String[]) ~ Nil { IsPlausibleRule("http://www.puzzlers.org/pub/wordlists/unixdict.txt"); }
function : PlausibilityCheck(comment : String, x : Int, y : Int) ~ Bool { ratio := x->As(Float) / y->As(Float); " Checking plausibility of: {$comment}"->PrintLine(); if(x > 2 * y) { " PLAUSIBLE. As we have counts of {$x} vs {$y} words, a ratio of {$ratio} times"->PrintLine(); } else if(x > y) { " IMPLAUSIBLE. As although we have counts of {$x} vs {$y} words, a ratio of {$ratio} times does not make it plausible"->PrintLine(); } else { " IMPLAUSIBLE, probably contra-indicated. As we have counts of {$x} vs {$y} words, a ratio of {$ratio} times"->PrintLine(); };
return x > 2 * y; }
function : IsPlausibleRule(url : String) ~ Nil { truecount := 0; falsecount := 0;
client := HttpClient->New(); data := client->Get(url)->Get(0)->As(String); data := data->ToLower(); words := data->Split("\n");
cie := Count("cie", words); cei := Count("cei", words); not_c_ie := Count("ie", words) - cie; not_c_ei := Count("ei", words) - cei;
"Checking plausibility of \"I before E except after C\":"->PrintLine(); if(PlausibilityCheck("I before E when not preceded by C", not_c_ie, not_c_ei) & PlausibilityCheck("E before I when preceded by C", cei, cie)) { "OVERALL IT IS PLAUSIBLE!"->PrintLine(); } else { "OVERALL IT IS IMPLAUSIBLE!"->PrintLine(); "(To be plausible, one word count must exceed another by 2 times)"->PrintLine(); }; }
function : Count(check: String, words : String[]) ~ Int { count := 0;
each(i : words) { if(words[i]->Find(check) > -1) { count += 1; }; };
return count; }
} </lang>
Output:
Checking plausibility of "I before E except after C": Checking plausibility of: I before E when not preceded by C PLAUSIBLE. As we have counts of 465 vs 213 words, a ratio of 2.183 times Checking plausibility of: E before I when preceded by C IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24 words, a ratio of 0.542 times OVERALL IT IS IMPLAUSIBLE! (To be plausible, one word count must exceed another by 2 times)
Perl
<lang perl>#!/usr/bin/perl
use warnings;
use strict;
sub result {
    my ($support, $against) = @_;
    my $ratio  = sprintf '%.2f', $support / $against;
    # Compare the raw counts so that rounding of the ratio cannot affect the verdict.
    my $result = $support > 2 * $against;
    print "$support / $against = $ratio. ", 'NOT ' x !$result, "PLAUSIBLE\n";
    return $result;
}
my @keys = qw(ei cei ie cie); my %count;
while (<>) {
for my $k (@keys) { $count{$k}++ if -1 != index $_, $k; }
}
my ($support, $against, $result);
print 'I before E when not preceded by C: '; $support = $count{ie} - $count{cie}; $against = $count{ei} - $count{cei}; $result += result($support, $against);
print 'E before I when preceded by C: '; $support = $count{cei}; $against = $count{cie}; $result += result($support, $against);
print 'Overall: ', 'NOT ' x ($result < 2), "PLAUSIBLE.\n";</lang>
Output:
I before E when not preceded by C: 465 / 213 = 2.18. PLAUSIBLE
E before I when preceded by C: 13 / 24 = 0.54. NOT PLAUSIBLE
Overall: NOT PLAUSIBLE.
Perl: Stretch Goal
Just replace the while loop with the following one: <lang perl>while (<>) {
my @columns = split; next if 3 < @columns; my ($word, $freq) = @columns[0, 2]; for my $k (@keys) { $count{$k} += $freq if -1 != index $word, $k; }
}</lang> Output:
I before E when not preceded by C: 8148 / 4826 = 1.69. NOT PLAUSIBLE
E before I when preceded by C: 327 / 994 = 0.33. NOT PLAUSIBLE
Overall: NOT PLAUSIBLE.
Perl 6
This solution uses grammars and actions to parse the given file, the Bag for tallying up occurrences of each possible thing we're looking for ("ie", "ei", "cie", and "cei"), and junctions to determine the plausibility of a phrase from the subphrases. Note that a version of rakudo newer than the January 2014 compiler or Star releases is needed, as this code relies on a recent bugfix to the make function. <lang perl6>grammar CollectWords {
token TOP { [^^ <word> $$ \n?]+ }
token word { [ <with_c> | <no_c> | \N ]+ }
token with_c { c <ie_part> }
token no_c { <ie_part> }
    token ie_part {
        ie | ei | eie # a couple words in the list have "eie"
    }
}
class CollectWords::Actions {
method TOP($/) { make $<word>».ast.Bag; }
method word($/) { if $<with_c> + $<no_c> { make ($<with_c>».ast, $<no_c>».ast); } else { make (); } }
method with_c($/) { make "c" X~ $<ie_part>.ast; }
method no_c($/) { make "!c" X~ $<ie_part>.ast; }
method ie_part($/) { if ~$/ eq 'eie' { make ('ei', 'ie'); } else { make ~$/; } }
}
sub plausible($good, $bad, $msg) {
if $good > 2*$bad { say "$msg: PLAUSIBLE ($good ✔ vs. $bad ✘)"; return True; } else { say "$msg: NOT PLAUSIBLE ($good ✔ vs. $bad ✘)"; return False; }
}
my $results = CollectWords.parsefile("unixdict.txt", :actions(CollectWords::Actions)).ast;
my $phrasetest = [&] plausible($results<!cie>, $results<!cei>, "I before E when not preceded by C"),
plausible($results<cei>, $results<cie>, "E before I when preceded by C");
say "I before E except after C: ", $phrasetest ?? "PLAUSIBLE" !! "NOT PLAUSIBLE";</lang>
- Output:
I before E when not preceded by C: PLAUSIBLE (466 ✔ vs. 217 ✘)
E before I when preceded by C: NOT PLAUSIBLE (13 ✔ vs. 24 ✘)
I before E except after C: NOT PLAUSIBLE
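The tallying idea behind the Perl 6 solution can be sketched in Python, with `collections.Counter` standing in for the Bag. This is a rough sketch rather than a translation of the grammar: the names `tally` and `plausible` and the `!c` key prefix are chosen here for illustration only, and a lookahead regex is assumed so that the overlapping pair in "eie" is counted once as "ei" and once as "ie".

```python
import re
from collections import Counter

# Lookahead, so overlapping pairs ("eie" -> "ei" and "ie") are both found.
PAIR = re.compile(r"(?=(ie|ei))")

def tally(words):
    """Count every 'ie'/'ei' occurrence, keyed by whether a 'c' precedes it."""
    counts = Counter()
    for word in words:
        word = word.lower()
        for m in PAIR.finditer(word):
            start = m.start()
            after_c = start > 0 and word[start - 1] == "c"
            counts[("c" if after_c else "!c") + m.group(1)] += 1
    return counts

def plausible(good, bad):
    """The task's rule: plausible only if 'good' is more than twice 'bad'."""
    return good > 2 * bad

# Illustrative sample words, not taken from the dictionary run above.
counts = tally(["believe", "receive", "science", "their"])
print(counts)
```

Counting occurrences instead of words is an assumption carried over from the grammar-based approach; a word-level count (as in some other solutions on this page) would give slightly different totals.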
Perl 6: Stretch Goal
Note that within the original text file, a tab character was erroneously replaced with a space. Thus, the following changes to the text file are needed before this solution will run:
--- orig_1_2_all_freq.txt 2014-02-01 14:36:53.124121018 -0800
+++ 1_2_all_freq.txt 2014-02-01 14:37:10.525552980 -0800
@@ -2488,7 +2488,7 @@
 other than Prep 43
 visited Verb 43
 cross NoC 43
- lie Verb 43
+ lie Verb 43
 grown Verb 43
 crowd NoC 43
 recognised Verb 43
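An alternative to patching the file is to split each row on any run of whitespace, so a stray space between fields is treated like a tab. A minimal Python sketch, assuming rows of the form word, part of speech, frequency; `parse_row` is a hypothetical helper and not part of any solution on this page:

```python
def parse_row(line):
    """Return (word, frequency) for a three-field row, otherwise None."""
    fields = line.split()              # splits on tabs and spaces alike
    if len(fields) != 3 or not fields[2].isdigit():
        return None                    # header, multi-word phrase, or malformed row
    word, _pos, freq = fields
    return word.lower(), int(freq)

# The same row parses identically with a tab or a space between fields.
print(parse_row("\tlie\tVerb\t43"))
print(parse_row("\tlie Verb\t43"))
```

Rows with more than three fields (multi-word phrases such as "other than") are skipped, which matches the task's instruction to select only rows with three space- or tab-separated words.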
This solution requires just a few modifications to the grammar and actions from the non-stretch goal. <lang perl6>grammar CollectWords {
token TOP { ^^ \t Word \t PoS \t Freq $$ \n [^^ <word> $$ \n?]+ }
token word {
    \t+
    [ <with_c> | <no_c> | \T ]+ \t+
    \T+ \t+ # PoS doesn't matter to us, so ignore it
    $<freq>=[<.digit>+] \h*
}
token with_c { c <ie_part> }
token no_c { <ie_part> }
token ie_part { ie | ei }
}
class CollectWords::Actions {
method TOP($/) { make $<word>».ast».flat.Bag; }
method word($/) { if $<with_c> + $<no_c> { make ($<with_c>».ast xx $<freq>, $<no_c>».ast xx $<freq>); } else { make (); } }
method with_c($/) { make "c" ~ $<ie_part>; }
method no_c($/) { make "!c" ~ $<ie_part>; }
}
sub plausible($good, $bad, $msg) {
if $good > 2*$bad { say "$msg: PLAUSIBLE ($good ✔ vs. $bad ✘)"; return True; } else { say "$msg: NOT PLAUSIBLE ($good ✔ vs. $bad ✘)"; return False; }
}
# can't use .parsefile like before due to the non-Unicode £ in this file.
my $file = slurp("1_2_all_freq.txt", :enc<iso-8859-1>); my $results = CollectWords.parse($file, :actions(CollectWords::Actions)).ast;
my $phrasetest = [&] plausible($results<!cie>, $results<!cei>, "I before E when not preceded by C"),
plausible($results<cei>, $results<cie>, "E before I when preceded by C");
say "I before E except after C: ", $phrasetest ?? "PLAUSIBLE" !! "NOT PLAUSIBLE";</lang>
- Output:
I before E when not preceded by C: NOT PLAUSIBLE (8222 ✔ vs. 4826 ✘)
E before I when preceded by C: NOT PLAUSIBLE (327 ✔ vs. 994 ✘)
I before E except after C: NOT PLAUSIBLE
PowerShell
<lang Powershell>$Web = New-Object -TypeName Net.Webclient
$Words = $web.DownloadString('http://www.puzzlers.org/pub/wordlists/unixdict.txt')
$IE = $EI = $CIE = $CEI = @()
$Clause1 = $Clause2 = $MainClause = $false
foreach ($Word in $Words.split()) {
    switch ($Word) {
        {($_ -like '*ie*') -and ($_ -notlike '*cie*')} {$IE += $Word}
        {($_ -like '*ei*') -and ($_ -notlike '*cei*')} {$EI += $Word}
        {$_ -like '*cei*'} {$CEI += $Word}
        {$_ -like '*cie*'} {$CIE += $Word}
    }
}
if ($IE.count -gt $EI.count * 2) {$Clause1 = $true}
"The plausibility of 'I before E when not preceded by C' is $Clause1"
if ($CEI.count -gt $CIE.count * 2) {$Clause2 = $true}
"The plausibility of 'E before I when preceded by C' is $Clause2"
if ($Clause1 -and $Clause2) {$MainClause = $True}
"The plausibility of the phrase 'I before E except after C' is $MainClause"
</lang>
- Output:
The plausibility of 'I before E when not preceded by C' is True
The plausibility of 'E before I when preceded by C' is False
The plausibility of the phrase 'I before E except after C' is False
Python
<lang python>import urllib.request
import re

PLAUSIBILITY_RATIO = 2

def plausibility_check(comment, x, y):
    print('\n Checking plausibility of: %s' % comment)
    if x > PLAUSIBILITY_RATIO * y:
        print(' PLAUSIBLE. As we have counts of %i vs %i, a ratio of %4.1f times'
              % (x, y, x / y))
    else:
        if x > y:
            print(' IMPLAUSIBLE. As although we have counts of %i vs %i, a ratio of %4.1f times does not make it plausible'
                  % (x, y, x / y))
        else:
            print(' IMPLAUSIBLE, probably contra-indicated. As we have counts of %i vs %i, a ratio of %4.1f times'
                  % (x, y, x / y))
    return x > PLAUSIBILITY_RATIO * y

def simple_stats(url='http://www.puzzlers.org/pub/wordlists/unixdict.txt'):
    words = urllib.request.urlopen(url).read().decode().lower().split()
    cie = len({word for word in words if 'cie' in word})
    cei = len({word for word in words if 'cei' in word})
    not_c_ie = len({word for word in words if re.search(r'(^ie|[^c]ie)', word)})
    not_c_ei = len({word for word in words if re.search(r'(^ei|[^c]ei)', word)})
    return cei, cie, not_c_ie, not_c_ei

def print_result(cei, cie, not_c_ie, not_c_ei):
    # '&' rather than 'and' so that both checks always run and print
    if ( plausibility_check('I before E when not preceded by C', not_c_ie, not_c_ei)
         & plausibility_check('E before I when preceded by C', cei, cie) ):
        print('\nOVERALL IT IS PLAUSIBLE!')
    else:
        print('\nOVERALL IT IS IMPLAUSIBLE!')
    print('(To be plausible, one count must exceed another by %i times)' % PLAUSIBILITY_RATIO)

print('Checking plausibility of "I before E except after C":')
print_result(*simple_stats())</lang>
- Output:
Checking plausibility of "I before E except after C":
 Checking plausibility of: I before E when not preceded by C
 PLAUSIBLE. As we have counts of 465 vs 213, a ratio of 2.2 times
 Checking plausibility of: E before I when preceded by C
 IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24, a ratio of 0.5 times
OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one count must exceed another by 2 times)
Python: Stretch Goal
Add the following to the bottom of the previous program: <lang python>def stretch_stats(url='http://ucrel.lancs.ac.uk/bncfreq/lists/1_2_all_freq.txt'):
    freq = [line.strip().lower().split()
            for line in urllib.request.urlopen(url)
            if len(line.strip().split()) == 3]
    wordfreq = [(word.decode(), int(frq))
                for word, pos, frq in freq[1:]
                if (b'ie' in word) or (b'ei' in word)]
    cie = sum(frq for word, frq in wordfreq if 'cie' in word)
    cei = sum(frq for word, frq in wordfreq if 'cei' in word)
    not_c_ie = sum(frq for word, frq in wordfreq if re.search(r'(^ie|[^c]ie)', word))
    not_c_ei = sum(frq for word, frq in wordfreq if re.search(r'(^ei|[^c]ei)', word))
    return cei, cie, not_c_ie, not_c_ei

print('\n\nChecking plausibility of "I before E except after C"')
print('And taking account of word frequencies in British English:')
print_result(*stretch_stats())</lang>
To produce this extra output:
Checking plausibility of "I before E except after C"
And taking account of word frequencies in British English:
 Checking plausibility of: I before E when not preceded by C
 IMPLAUSIBLE. As although we have counts of 8192 vs 4826, a ratio of 1.7 times does not make it plausible
 Checking plausibility of: E before I when preceded by C
 IMPLAUSIBLE, probably contra-indicated. As we have counts of 327 vs 994, a ratio of 0.3 times
OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one count must exceed another by 2 times)
R
<lang rsplus>words = tolower(readLines("http://www.puzzlers.org/pub/wordlists/unixdict.txt"))
ie.npc = sum(grepl("(?<!c)ie", words, perl = T))
ei.npc = sum(grepl("(?<!c)ei", words, perl = T))
ie.pc = sum(grepl("cie", words, fixed = T))
ei.pc = sum(grepl("cei", words, fixed = T))
p1 = ie.npc > 2 * ei.npc
p2 = ei.pc > 2 * ie.pc
message("(1) is ", (if (p1) "" else "not "), "plausible.")
message("(2) is ", (if (p2) "" else "not "), "plausible.")
message("The whole phrase is ", (if (p1 && p2) "" else "not "), "plausible.")</lang>
Output:
(1) is plausible.
(2) is not plausible.
The whole phrase is not plausible.
Racket
<lang racket>#lang racket
(define (get-tallies filename line-parser . patterns)
(for/fold ([totals (make-list (length patterns) 0)]) ([line (file->lines filename)]) (match-let ([(list word n) (line-parser line)]) (for/list ([p patterns] [t totals]) (if (regexp-match? p word) (+ n t) t)))))
(define (plausible test) (string-append (if test "" "IM") "PLAUSIBLE"))
(define (subrule description examples counters)
(let ([result (> examples (* 2 counters))]) (printf " The sub-rule \"~a\" is ~a. There were ~a examples and ~a counter-examples.\n" description (plausible result) examples counters) result))
(define (plausibility description filename parser)
(printf "~a:\n" description) (match-let ([(list cei cie ie ei) (get-tallies filename parser "cei" "cie" "ie" "ei")]) (let ([rule1 (subrule "I before E when not preceded by C" (- ie cie) (- ei cei))] [rule2 (subrule "E before I when preceded by C" cei cie)]) (printf "\n Overall, the rule \"I before E, except after C\" is ~a.\n" (plausible (and rule1 rule2))))))
(define (parse-frequency-data line)
(let ([words (string-split line)]) (list (string-join (drop-right words 2)) (string->number (last words)))))
(plausibility "Dictionary" "unixdict.txt" (λ (line) (list line 1))) (newline) (plausibility "Word frequencies (stretch goal)" "1_2_all_freq.txt" parse-frequency-data)</lang>
- Output:
Dictionary:
 The sub-rule "I before E when not preceded by C" is PLAUSIBLE. There were 465 examples and 213 counter-examples.
 The sub-rule "E before I when preceded by C" is IMPLAUSIBLE. There were 13 examples and 24 counter-examples.
 Overall, the rule "I before E, except after C" is IMPLAUSIBLE.
Word frequencies (stretch goal):
 The sub-rule "I before E when not preceded by C" is IMPLAUSIBLE. There were 8163 examples and 4826 counter-examples.
 The sub-rule "E before I when preceded by C" is IMPLAUSIBLE. There were 327 examples and 994 counter-examples.
 Overall, the rule "I before E, except after C" is IMPLAUSIBLE.
REXX
The following assumptions were made about the (default) dictionary:
- there could be leading and/or trailing blanks or tabs
- the dictionary words are in mixed case.
- there could be blank lines
- there may be more than one occurrence of a target string within a word [einsteinium]
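The assumptions above can be illustrated with a short Python sketch. It is not a translation of the REXX program; `count_pairs` and its dictionary layout are hypothetical names used here only to show blank stripping, case folding, blank-line skipping, and counting every occurrence within a word.

```python
def count_pairs(lines):
    """Count 'IE'/'EI' occurrences, split by whether a 'C' precedes them."""
    counts = {"IE": {"c": 0, "z": 0}, "EI": {"c": 0, "z": 0}}
    for line in lines:
        word = "".join(line.split()).upper()   # drop blanks/tabs, fold case
        if not word:                           # ignore blank lines
            continue
        for pair in counts:
            pos = word.find(pair)
            while pos != -1:                   # every occurrence, not just the first
                key = "c" if pos > 0 and word[pos - 1] == "C" else "z"
                counts[pair][key] += 1
                pos = word.find(pair, pos + 1)
    return counts

# "einsteinium" contributes two "EI" occurrences, neither preceded by "C".
print(count_pairs(["  Einsteinium\t", "", "science"]))
```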
unweighted version
<lang rexx>/*REXX pgm shows plausibility of I before E when not preceded by C, and*/ /*────────────────────────────── E before I when preceded by C. */
#.=0 /*zero out various word counters.*/
parse arg iFID .; if iFID=='' then iFID='UNIXDICT.TXT' /*use default?*/
  do r=0 while lines(ifid)\==0; _=linein(iFID) /*get a single line.*/
  u=translate(space(_,0)) /*elide superfluous blanks & tabs*/
  if u=='' then iterate /*if a blank line, then ignore it*/
  #.words=#.words+1 /*keep a running count of #words.*/
  if pos('EI',u)\==0 & pos('IE',u)\==0 then #.both=#.both+1 /*has both.*/
  call find 'ie'
  call find 'ei'
  end /*r*/
L=length(#.words) /*use this to align the output #s*/
say 'lines in the ' ifid ' dictionary: ' r
say 'words in the ' ifid ' dictionary: ' #.words
say
say 'words with "IE" and "EI" (in same word): ' right(#.both,L)
say 'words with "IE" and preceded by "C": ' right(#.ie.c ,L)
say 'words with "IE" and not preceded by "C": ' right(#.ie.z ,L)
say 'words with "EI" and preceded by "C": ' right(#.ei.c ,L)
say 'words with "EI" and not preceded by "C": ' right(#.ei.z ,L)
say; mantra='The spelling mantra '
p1=#.ie.z/max(1,#.ei.z); phrase='"I before E when not preceded by C"'
say mantra phrase ' is ' word("im", 1+(p1>2))'plausible.'
p2=#.ei.c/max(1,#.ie.c); phrase='"E before I when preceded by C"' /*CEI for, CIE against.*/
say mantra phrase ' is ' word("im", 1+(p2>2))'plausible.'
po=p1>2 & p2>2; say 'Overall, it is' word("im",1+po)'plausible.'
exit /*stick a fork in it, we're done.*/
/*──────────────────────────────────FIND subroutine─────────────────────*/
find: arg x; s=1; do forever; _=pos(x,u,s); if _==0 then leave
  if substr(u,_-1+(_==1)*999,1)=='C' then #.x.c=#.x.c+1
                                     else #.x.z=#.x.z+1
  s=_+1 /*handle case of multiple finds. */
  end /*forever*/
return</lang> output when using the default dictionary
lines in the UNIXDICT.TXT dictionary: 25104
words in the UNIXDICT.TXT dictionary: 25104

words with "IE" and "EI" (in same word): 4
words with "IE" and preceded by "C": 24
words with "IE" and not preceded by "C": 465
words with "EI" and preceded by "C": 13
words with "EI" and not preceded by "C": 213

The spelling mantra "I before E when not preceded by C" is plausible.
The spelling mantra "E before I when preceded by C" is implausible.
Overall, it is implausible.
weighted version
When using the default word frequency count file, several peculiarities became apparent:
- some "words" were, in fact, phrases
- some words were in the form of x / y indicating x OR y
- some words were in the form of x/y (with no blanks), indicating either x OR y, or a single word
- some words had a ~ prefix
- some words had a * suffix
- some words had a ~ suffix
- some words had a ~ and * suffix
- one word had a ~ prefix and a ~ suffix
- some lines had an imbedded [xxx] comment
- some words had a ' (quote) prefix to indicate a:
- possessive
- plural
- contraction
- word (as is)
None of the cases where an asterisk [*] or tilde [~] was used is handled programmatically within the REXX program; it is assumed that these prefixes and suffixes indicate multiple words that begin or end with (any) string (or, in some cases, both).
A cursory look at the file suggests that the tildes and asterisks don't affect the rules for the mantra phrases.
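The clean-up implied by the list above can be sketched in Python. This is only an approximation of what the REXX program below does (take the last token as the frequency, fold case, map tildes to asterisks, and split an x/y entry into two alternatives); `entry_words` is a hypothetical helper, and the sample rows in the usage lines are invented for illustration, not taken from the frequency file.

```python
def entry_words(line):
    """Return ([words], freq) for one frequency-list line, or None."""
    tokens = line.split()
    if not tokens or not tokens[-1].isdigit():
        return None                        # blank line or no numeric count
    freq = int(tokens[-1])
    text = line.upper().replace("~", "*")  # fold case, tildes become asterisks
    left, _, right = text.partition("/")   # "x/y" or "x / y" means x OR y
    words = []
    for part in (left, right):
        first = part.split()[:1]           # first token of each alternative
        if first and first[0] not in words:
            words.append(first[0])
    return words, freq

# Invented rows, for illustration only.
print(entry_words("x / y NoC 7"))
print(entry_words("~ly Adv 3"))
```

As in the REXX version, the asterisk and tilde markers are kept in the word rather than expanded, on the assumption that they do not change which letter pairs a word contains.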
<lang rexx>/*REXX pgm shows plausibility of I before E when not preceded by C, and*/
/*────────────────────────────── E before I when preceded by C using a*/
/*────────────────────────────── weighted frequency for each word. */
#.=0 /*zero out various word counters.*/
parse arg iFID wFID .
if iFID=='' | iFID==',' then iFID='UNIXDICT.TXT' /*use the default? */
if wFID=='' | wFID==',' then wFID='WORDFREQ.TXT' /*use the default? */
tabs=xrange('0'x, "f"x)
f.=1 /*default word freq. multiplier. */
  do recs=0 while lines(wFID)\==0; _=linein(wFID) /*get a record. */
  u=translate(_,,tabs); upper u /*trans various tabs & low hex chars*/
  u=translate(u,'*', "~") /*translate tildes to an asterisk*/
  if u=='' then iterate /*if a blank line, then ignore it*/
  freq=word(u,words(u)) /*get the last token on the line.*/
  if \datatype(freq,'W') then iterate /*Not numeric? Then ignore it. */
  parse var u w.1 '/' w.2 . /*handle case of: ααα/ßßß ... */
    do j=1 for 2; w.j=word(w.j,1) /*strip leading/trailing blanks */
    _=w.j; if _=='' then iterate /*if not present, then ignore it.*/
    if j==2 then if w.2==w.1 then iterate /*2nd word=1st word? skip.*/
    #.freqs = #.freqs + 1 /*bump word count in FREQ list.*/
    f._ = f._ + freq /*add to a word's frequency count*/
    end /*j*/
  end /*recs*/
if recs\==0 then say 'lines in the ' wFID ' list: ' recs
if #.freqs\==0 then say 'words in the ' wFID ' list: ' #.freqs
if #.freqs==0 then weighted=
else weighted=' (weighted)'
say
  do r=0 while lines(iFID)\==0; _=linein(iFID) /*get a single line.*/
  u=space(_,0); upper u /*elide superfluous blanks & tabs*/
  if u=='' then iterate /*if a blank line, then ignore it*/
  #.words=#.words+1 /*keep a running count of #words.*/
  one=f.u
  if pos('EI',u)\==0 & pos('IE',u)\==0 then #.both=#.both+one /*has both*/
  call find 'ie'
  call find 'ei'
  end /*r*/
L=length(#.words) /*use this to align the output #s*/
say 'lines in the ' iFID ' dictionary: ' r
say 'words in the ' iFID ' dictionary: ' #.words
say
say 'words with "IE" and "EI" (in same word): ' right(#.both,L) weighted
say 'words with "IE" and preceded by "C": ' right(#.ie.c ,L) weighted
say 'words with "IE" and not preceded by "C": ' right(#.ie.z ,L) weighted
say 'words with "EI" and preceded by "C": ' right(#.ei.c ,L) weighted
say 'words with "EI" and not preceded by "C": ' right(#.ei.z ,L) weighted
say; mantra='The spelling mantra '
p1=#.ie.z/max(1,#.ei.z); phrase='"I before E when not preceded by C"'
say mantra phrase ' is ' word("im", 1+(p1>2))'plausible.'
p2=#.ei.c/max(1,#.ie.c); phrase='"E before I when preceded by C"' /*CEI for, CIE against.*/
say mantra phrase ' is ' word("im", 1+(p2>2))'plausible.'
po=p1>2 & p2>2; say 'Overall, it is' word("im",1+po)'plausible.'
exit /*stick a fork in it, we're done.*/
/*──────────────────────────────────FIND subroutine─────────────────────*/
find: arg x; s=1; do forever; _=pos(x,u,s); if _==0 then leave
  if substr(u,_-1+(_==1)*999,1)=='C' then #.x.c=#.x.c+one
                                     else #.x.z=#.x.z+one
  s=_+1 /*handle case of multiple finds. */
  end /*forever*/
return</lang> output when using the default dictionary and default word frequency list
lines in the WORDFREQ.TXT list: 7727
words in the WORDFREQ.TXT list: 7728

lines in the UNIXDICT.TXT dictionary: 25104
words in the UNIXDICT.TXT dictionary: 25104

words with "IE" and "EI" (in same word): 4 (weighted)
words with "IE" and preceded by "C": 719 (weighted)
words with "IE" and not preceded by "C": 3818 (weighted)
words with "EI" and preceded by "C": 100 (weighted)
words with "EI" and not preceded by "C": 4875 (weighted)

The spelling mantra "I before E when not preceded by C" is implausible.
The spelling mantra "E before I when preceded by C" is implausible.
Overall, it is implausible.
Ruby
<lang ruby>require 'open-uri'
plausibility_ratio = 2
counter = Hash.new(0)
path = 'http://www.puzzlers.org/pub/wordlists/unixdict.txt'
rules = [['I before E when not preceded by C:', 'ie', 'ei'],
         ['E before I when preceded by C:', 'cei', 'cie']]
open(path){|f| f.each{|line| line.scan(/ie|ei|cie|cei/){|match| counter[match] += 1 }}}
overall_plausible = rules.all? do |(str, x, y)|
  num_x, num_y, ratio = counter[x], counter[y], counter[x] / counter[y].to_f
  plausibility = ratio > plausibility_ratio
  puts str
  puts "#{x}: #{num_x}; #{y}: #{num_y}; Ratio: #{ratio.round(2)}: #{ plausibility ? 'Plausible' : 'Implausible'}"
  plausibility
end
puts "Overall: #{overall_plausible ? 'Plausible' : 'Implausible'}." </lang> Output:
I before E when not preceded by C:
ie: 464; ei: 217; Ratio: 2.14: Plausible
E before I when preceded by C:
cei: 13; cie: 24; Ratio: 0.54: Implausible
Overall: Implausible.
Scala
<lang Scala>object I_before_E_except_after_C extends App {
  val testIE1 = "(^|[^c])ie".r // i before e when not preceded by c
  val testIE2 = "cie".r // i before e when preceded by c
  var countsIE = (0,0)

  val testCEI1 = "cei".r // e before i when preceded by c
  val testCEI2 = "(^|[^c])ei".r // e before i when not preceded by c
  var countsCEI = (0,0)

  scala.io.Source.fromURL("http://www.puzzlers.org/pub/wordlists/unixdict.txt").getLines.map(_.toLowerCase).foreach{word =>
    if (testIE1.findFirstIn(word).isDefined) countsIE = (countsIE._1 + 1, countsIE._2)
    if (testIE2.findFirstIn(word).isDefined) countsIE = (countsIE._1, countsIE._2 + 1)
    if (testCEI1.findFirstIn(word).isDefined) countsCEI = (countsCEI._1 + 1, countsCEI._2)
    if (testCEI2.findFirstIn(word).isDefined) countsCEI = (countsCEI._1, countsCEI._2 + 1)
  }

  def plausible(counts: (Int,Int)) = counts._1 > (2 * counts._2)
  def plausibility(plausible: Boolean) = if (plausible) "plausible" else "implausible"
  def plausibility(counts: (Int, Int)): String = plausibility(plausible(counts))
  println("I before E when not preceded by C: "+plausibility(countsIE))
  println("E before I when preceded by C: "+plausibility(countsCEI))
  println("Overall: "+plausibility(plausible(countsIE) && plausible(countsCEI)))
}</lang>
- Output:
I before E when not preceded by C: plausible
E before I when preceded by C: implausible
Overall: implausible
Seed7
<lang seed7>$ include "seed7_05.s7i";
include "gethttp.s7i"; include "float.s7i";
const integer: PLAUSIBILITY_RATIO is 2;
const func boolean: plausibilityCheck (in string: comment, in integer: x, in integer: y) is func
result var boolean: plausible is FALSE; begin writeln(" Checking plausibility of: " <& comment); if x > PLAUSIBILITY_RATIO * y then writeln(" PLAUSIBLE. As we have counts of " <& x <& " vs " <& y <& " words, a ratio of " <& flt(x) / flt(y) digits 1 lpad 4 <& " times"); elsif x > y then writeln(" IMPLAUSIBLE. As although we have counts of " <& x <& " vs " <& y <& " words, a ratio of " <& flt(x) / flt(y) digits 1 lpad 4 <& " times does not make it plausible"); else writeln(" IMPLAUSIBLE, probably contra-indicated. As we have counts of " <& x <& " vs " <& y <& " words, a ratio of " <& flt(x) / flt(y) digits 1 lpad 4 <& " times"); end if; plausible := x > PLAUSIBILITY_RATIO * y; end func;
const func integer: count (in string: stri, in array string: words) is func
result var integer: count is 0; local var integer: index is 0; begin for key index range words do if pos(words[index], stri) <> 0 then incr(count); end if; end for; end func;
const proc: main is func
local var array string: words is 0 times ""; var integer: cie is 0; var integer: cei is 0; var integer: not_c_ie is 0; var integer: not_c_ei is 0; begin words := split(lower(getHttp("www.puzzlers.org/pub/wordlists/unixdict.txt")), "\n"); cie := count("cie", words); cei := count("cei", words); not_c_ie := count("ie", words) - cie; not_c_ei := count("ei", words) - cei; writeln("Checking plausibility of \"I before E except after C\":"); if plausibilityCheck("I before E when not preceded by C", not_c_ie, not_c_ei) and plausibilityCheck("E before I when preceded by C", cei, cie) then writeln("OVERALL IT IS PLAUSIBLE!"); else writeln("OVERALL IT IS IMPLAUSIBLE!"); writeln("(To be plausible, one word count must exceed another by " <& PLAUSIBILITY_RATIO <& " times)"); end if; end func;</lang>
Output:
Checking plausibility of "I before E except after C":
 Checking plausibility of: I before E when not preceded by C
 PLAUSIBLE. As we have counts of 465 vs 213 words, a ratio of 2.2 times
 Checking plausibility of: E before I when preceded by C
 IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24 words, a ratio of 0.5 times
OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one word count must exceed another by 2 times)
Tcl
<lang tcl>package require http
variable PLAUSIBILITY_RATIO 2.0 proc plausible {description x y} {
variable PLAUSIBILITY_RATIO puts " Checking plausibility of: $description" if {$x > $PLAUSIBILITY_RATIO * $y} {
set conclusion "PLAUSIBLE" set fmt "As we have counts of %i vs %i words, a ratio of %.1f times" set result true
} elseif {$x > $y} {
set conclusion "IMPLAUSIBLE" set fmt "As although we have counts of %i vs %i words," append fmt " a ratio of %.1f times does not make it plausible" set result false
} else {
set conclusion "IMPLAUSIBLE, probably contra-indicated" set fmt "As we have counts of %i vs %i words, a ratio of %.1f times" set result false
} puts [format " %s.\n $fmt" $conclusion $x $y [expr {double($x)/$y}]] return $result
}
set t [http::geturl http://www.puzzlers.org/pub/wordlists/unixdict.txt] set words [split [http::data $t] "\n"] http::cleanup $t foreach {name pattern} {ie (?:^|[^c])ie ei (?:^|[^c])ei cie cie cei cei} {
set count($name) [llength [lsearch -nocase -all -regexp $words $pattern]]
}
puts "Checking plausibility of \"I before E except after C\":" if {
[plausible "I before E when not preceded by C" $count(ie) $count(ei)] && [plausible "E before I when preceded by C" $count(cei) $count(cie)]
} then {
puts "\nOVERALL IT IS PLAUSIBLE!"
} else {
puts "\nOVERALL IT IS IMPLAUSIBLE!"
} puts "\n(To be plausible, one word count must exceed another by\ $PLAUSIBILITY_RATIO times)"</lang>
- Output:
Checking plausibility of "I before E except after C":
 Checking plausibility of: I before E when not preceded by C
 PLAUSIBLE. As we have counts of 465 vs 213 words, a ratio of 2.2 times
 Checking plausibility of: E before I when preceded by C
 IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24 words, a ratio of 0.5 times
OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one word count must exceed another by 2.0 times)
UNIX Shell
<lang bash>#!/bin/sh
matched() {
grep -Poe "$1" unixdict.txt | wc -l
}
check() {
    local num_for="$(matched "$3")"
    local num_against="$(matched "$2")"
    if [ "$num_for" -le "$(expr 2 \* "$num_against")" ]; then
        echo "Clause $1 not plausible ($num_for examples; $num_against counterexamples)"
        return 1
    else
        echo "Clause $1 is plausible ($num_for examples; $num_against counterexamples)"
        return 0
    fi
}
check 1 '(?<!c)ei' '(?<!c)ie'
PLAUSIBLE_1=$?
check 2 'cie' 'cei'
PLAUSIBLE_2=$?
if [ $PLAUSIBLE_1 -eq 0 -a $PLAUSIBLE_2 -eq 0 ]; then
echo "Overall, the rule is plausible"
else
echo "Overall, the rule is not plausible"
fi </lang>
- Output:
Clause 1 is plausible (466 examples; 217 counterexamples)
Clause 2 not plausible (13 examples; 24 counterexamples)
Overall, the rule is not plausible
zkl
<lang zkl>fcn wcnt(wordList,altrs,aAdjust,bltrs,bAdjust,text){
   a:=wordList.reduce('wrap(cnt,word){ cnt+word.holds(altrs) },0) - aAdjust;
   b:=wordList.reduce('wrap(cnt,word){ cnt+word.holds(bltrs) },0) - bAdjust;
   ratio:=a.toFloat()/b;
   // plausible only if the ratio is MORE than two, so a ratio of exactly 2 fails
   "%s is %splausible".fmt(text,ratio<=2 and "im" or "").println();
   " %d cases for and %d cases against is a ratio of %.3f.".fmt(a,b,ratio).println();
   return(a,b,ratio);
}
wordList:=File("unixdict.txt").read();</lang>
<lang zkl>a,b,r1:=wcnt(wordList,"cei",0,"cie",0,"E before I when preceded by C");
_,_,r2:=wcnt(wordList,"ie",b,"ei",a, "I before E when not preceded by C");
"Overall the rule is %splausible".fmt((r1<=2 or r2<=2) and "im" or "").println();</lang>
- Output:
E before I when preceded by C is implausible
 13 cases for and 24 cases against is a ratio of 0.542.
I before E when not preceded by C is plausible
 465 cases for and 213 cases against is a ratio of 2.183.
Overall the rule is implausible
Stretch <lang zkl>fcn wc2(wordList,altrs,aAdjust,bltrs,bAdjust,text){
   a,b:=wordList.reduce('wrap(cnts,line){
      // don't care if line is "Word PoS Freq" or "as yet Adv 14"
      word,_,n:=line.split();
      if(word.holds(altrs)) cnts[0]=cnts[0]+n;
      if(word.holds(bltrs)) cnts[1]=cnts[1]+n;
      cnts
   },L(0,0));
   a-=aAdjust; b-=bAdjust;
   ratio:=a.toFloat()/b;
   // plausible only if the ratio is MORE than two, so a ratio of exactly 2 fails
   "%s is %splausible".fmt(text,ratio<=2 and "im" or "").println();
   " %d cases for and %d cases against is a ratio of %.3f.".fmt(a,b,ratio).println();
   return(a,b,ratio);
} wordList:=File("1_2_all_freq.txt").read();</lang>
- Output:
E before I when preceded by C is implausible
 327 cases for and 994 cases against is a ratio of 0.329.
I before E when not preceded by C is implausible
 8148 cases for and 4826 cases against is a ratio of 1.688.
Overall the rule is implausible