I before E except after C: Difference between revisions

From Rosetta Code

Revision as of 13:11, 5 June 2013

Task
I before E except after C
You are encouraged to solve this task according to the task description, using any language you may know.

The phrase "I before E, except after C" is a widely known mnemonic which is supposed to help when spelling English words.

Task Description

Using the word list from http://www.puzzlers.org/pub/wordlists/unixdict.txt, check if the two sub-clauses of the phrase are plausible individually:

  1. "I before E when not preceded by C"
  2. "E before I when preceded by C"

If both sub-phrases are plausible then the original phrase can be said to be plausible.
Something is plausible if the number of words having the feature is more than two times the number of words having the opposite feature (where feature is 'ie' or 'ei' preceded or not by 'c' as appropriate).
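
The plausibility test described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_plausible` and its `ratio` parameter are assumptions made here, not part of the task text. The counts are the ones several solutions on this page report for unixdict.txt.

```python
def is_plausible(examples, counter_examples, ratio=2):
    """True when examples outnumber counter-examples by more than `ratio` times."""
    return examples > ratio * counter_examples

# Counts reported for unixdict.txt by several solutions on this page.
print(is_plausible(465, 213))  # "I before E when not preceded by C"
print(is_plausible(13, 24))    # "E before I when preceded by C"
```

The comparison is strict, so a count that is exactly twice the opposite count does not qualify as plausible.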

Stretch goal

As a stretch goal use the entries from the table of Word Frequencies in Written and Spoken English: based on the British National Corpus, (selecting those rows with three space or tab separated words only), to see if the phrase is plausible when word frequencies are taken into account.
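
One possible way to select the usable rows of the frequency table is sketched below in Python. The helper name `parse_frequency_rows` and the sample rows are made up for illustration; they are not copied from the corpus file, and the check that the last field is numeric is an assumption about how to skip the header row.

```python
def parse_frequency_rows(lines):
    """Yield (word, frequency) for rows that have exactly three
    whitespace-separated fields and a numeric last field."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            yield fields[0].lower(), int(fields[2])

# Hypothetical sample rows in the general shape of the table.
sample = [
    "\tWord\tPoS\tFreq",       # header: three fields, but the last is not a number
    "\tthe\tDet\t61847",
    "\ta bit\tAdv\t119",       # splits into four fields, so it is skipped
    "\treceive\tVerb\t75",
]
print(list(parse_frequency_rows(sample)))
```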

Show your output here as well as your program.

cf.

C

Inspired by the J solution, but implemented as a single pass through the data, we have flex build the finite state machine in C. This may in turn motivate me to provide a second J solution as a single pass FSM. Please find the program output hidden at the top of the source as part of the build and example run.
<lang c>%{

 /*
   compilation and example on a GNU linux system:

   $ flex --case-insensitive --noyywrap --outfile=cia.c source.l
   $ make LOADLIBES=-lfl cia 
   $ ./cia < unixdict.txt 
   I before E when not preceded by C: plausible
   E before I when preceded by C: implausible
   Overall, the rule is: implausible 
 */
 int cie, cei, ie, ei;

%}

%%

cie	++cie, ++ie; /* longer patterns are matched preferentially, consuming input */
cei	++cei, ++ei;
ie	++ie;
ei	++ei;
.|\n	;

%%

int main() {

 cie = cei = ie = ei = 0;
 yylex();
 printf("%s: %s\n","I before E when not preceded by C", (2*ei < ie ? "plausible" : "implausible"));
 printf("%s: %s\n","E before I when preceded by C", (2*cie < cei ? "plausible" : "implausible"));
 printf("%s: %s\n","Overall, the rule is", (2*(cie+ei) < (cei+ie) ? "plausible" : "implausible"));
 return 0;

} </lang>
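
For readers without flex, the same single-pass idea can be approximated in Python. This is a rough sketch, not a translation of the generated scanner: the name `tally` is an assumption, and because Python's `re` tries alternatives from left to right rather than preferring the longest match, the longer patterns are listed first.

```python
import re
from collections import Counter

# Longer alternatives first: Python's re takes the first alternative that
# matches, whereas flex picks the longest match on its own.
PATTERN = re.compile(r"cie|cei|ie|ei", re.IGNORECASE)

def tally(text):
    """Scan the text once, counting cie, cei, ie and ei occurrences."""
    counts = Counter(cie=0, cei=0, ie=0, ei=0)
    for match in PATTERN.finditer(text):
        token = match.group().lower()
        counts[token] += 1
        if token in ("cie", "cei"):
            counts[token[1:]] += 1   # a "cie" also counts as an "ie", as in the flex rules
    return counts

print(tally("receive\nscience\nbelieve\ntheir\n"))
```

As in the flex rules, the `ie` and `ei` totals include the occurrences that follow a `c`.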

Common Lisp

<lang lisp>(defun plausibility (rule-name examples counter-examples)

 (let ((plausible (if (> examples (* 2 counter-examples))
                      'plausible 'not-plausible)))
   (format t "The rule \"~a\" is ~S. There were ~a examples and ~a counter-examples.~%"
           rule-name plausible examples counter-examples)
   plausible))

(with-open-file (stream #p"unixdict.txt")

 (let ((cei 0) (cie 0) (ie 0) (ei 0))
   (macrolet ((search-counter (&rest terms)
                (when terms
                  `(if (search ,(string-downcase (symbol-name (car terms))) line)
                       (incf ,(car terms))
                       (search-counter ,@(cdr terms))))))
     (do ((line (read-line stream nil)
                (read-line stream nil)))
         ((null line))
       (search-counter cei cie ie ei)))
   (flet ((plausible-p (&rest results)
            (or (car (member 'not-plausible results))
                'plausible)))
     (format t "~%~%Overall the rule is ~S~%"
             (plausible-p (plausibility "I before E when not preceded by C" ie ei)
                          (plausibility "E before I when preceded by C" cei cie))))))</lang>

Output:

The rule "I before E when not preceded by C" is PLAUSIBLE. There were 465 examples and 209 counter-examples.
The rule "E before I when preceded by C" is NOT-PLAUSIBLE. There were 13 examples and 24 counter-examples.


Overall the rule is NOT-PLAUSIBLE

Fortran

Please find the linux build instructions along with example run in the comments at the beginning of the f90 source. Thank you.
<lang FORTRAN>
!-*- mode: compilation; default-directory: "/tmp/" -*-
!Compilation started at Sat May 18 22:19:19
!
!a=./F && make $a && $a < unixdict.txt
!f95 -Wall -ffree-form F.F -o F
!   ie   ei  cie  cei
!  490  230   24   13
!         [^c]ie plausible
!            cei implausible
! ([^c]ie)|(cei) implausible
!
!Compilation finished at Sat May 18 22:19:19

! test the plausibility of i before e except...
program cia

 implicit none
 character (len=256) :: s
 integer :: ie, ei, cie, cei
 integer :: ios
 data ie, ei, cie, cei/4*0/
 do while (.true.)
   read(5,*,iostat = ios)s
   if (0 .ne. ios) then
     exit
   endif
   call lower_case(s)
   cie = cie + occurrences(s, 'cie')
   cei = cei + occurrences(s, 'cei')
   ie = ie + occurrences(s, 'ie')
   ei = ei + occurrences(s, 'ei')
 enddo
 write(6,'(1x,4(a4,1x))') 'ie','ei','cie','cei'
 write(6,'(1x,4(i4,1x))') ie,ei,cie,cei ! 488 230 24 13
 write(6,'(1x,2(a,1x))') '        [^c]ie',plausibility(ie,ei)
 write(6,'(1x,2(a,1x))') '           cei',plausibility(cei,cie)
 write(6,'(1x,2(a,1x))') '([^c]ie)|(cei)',plausibility(ie+cei,ei+cie)

contains

 subroutine lower_case(s)
   character(len=*), intent(inout) :: s
   integer :: i
   do i=1, len_trim(s)
     s(i:i) = achar(ior(iachar(s(i:i)),32))
   enddo
 end subroutine lower_case
 integer function occurrences(a,b)
   character(len=*), intent(in) :: a, b
   integer :: i, j, n
   n = 0
   i = 0
   j = index(a, b)
   do while (0 .lt. j)
     n = n+1
     i = i+len(b)+j-1
     j = index(a(i:), b)
   end do
   occurrences = n
 end function occurrences
 character*(32) function plausibility(da, nyet)
   integer, intent(in) :: da, nyet
   !write(0,*)da,nyet
   if (nyet*2 .lt. da) then
     plausibility = 'plausible'
   else
     plausibility = 'implausible'
   endif
 end function plausibility

end program cia </lang>

J

After downloading unixdict to /tmp:

<lang J> dict=:tolower fread '/tmp/unixdict.txt'</lang>

Investigating the rules:

<lang J>   +/'cie' E. dict
24
   +/'cei' E. dict
13
   +/'ie' E. dict
490
   +/'ei' E. dict
230</lang>

So, based on unixdict.txt, the "I before E" rule seems plausible (490 > 230 by more than a factor of 2), but the exception does not make much sense (we see almost twice as many i before e after a c as we see e before i after a c).

Note that if we looked at frequency of use for words, instead of considering all words to have equal weights, we might come up with a different answer.

stretch goal

After downloading 1_2_all_freq to /tmp, we can read it into J, and break out the first column (as words) and the third column as numbers:

<lang J>allfreq=: |:}.<;._1;._2]1!:1<'/tmp/1_2_all_freq.txt'

words=: >0 { allfreq
freqs=: 0 {.@".&>2 { allfreq</lang>

With these definitions, we can define a prevalence verb which will tell us how often a particular substring appears in use:

<lang J>prevalence=:verb define

 (y +./@E."1 words) +/ .* freqs

)</lang>
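
For readers who do not read J, the idea behind the prevalence verb (sum the frequencies of every word that contains the substring) might look like this in Python. The function name mirrors the J verb, but the signature and the four sample rows are assumptions for illustration; the real columns come from 1_2_all_freq.txt.

```python
def prevalence(substring, words, freqs):
    """Sum the frequencies of the words that contain `substring`."""
    return sum(f for w, f in zip(words, freqs) if substring in w)

# A few sample rows only, not the whole table.
words = ["received", "society", "their", "believe"]
freqs = [130, 238, 2608, 212]

print(prevalence("ie", words, freqs))   # society + believe
print(prevalence("ei", words, freqs))   # received + their
```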

Investigating our original proposed rules:

<lang J>   'ie' %&prevalence 'ei'
1.76868</lang>

A generic "i before e" rule is not looking quite as good now - words that have i before e are used less than twice as much as words which use e before i.

<lang J>   'cei' %&prevalence 'cie'
0.328974</lang>

An "except after c" variant is looking awful now - words that use the cie sequence are three times as likely as words that use the cei sequence. So, of course, if we modified our original rule with this exception it would weaken the original rule:

<lang J>   ('ie' -&prevalence 'cie') % ('ei' -&prevalence 'cei')
1.68255</lang>

Note that we might also want to consider non-adjacent matches (the regular expression 'i.*e' instead of 'ie' or perhaps 'c.*ie' or 'c.*i.*e' instead of 'cie') - this would be straightforward to check, but this would bulk up the page.
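
To give a feel for how much the non-adjacent patterns would widen the net, here is a small Python sketch. The helper `count_matching` and the five-word list are hypothetical and chosen only to show the difference between the two patterns.

```python
import re

def count_matching(pattern, words):
    """Count the words in which the regular expression matches somewhere."""
    regex = re.compile(pattern)
    return sum(1 for w in words if regex.search(w))

words = ["science", "receive", "ice", "line", "tie"]

print(count_matching(r"ie", words))     # adjacent letters only
print(count_matching(r"i.*e", words))   # any characters allowed in between
```

On this tiny list the adjacent pattern matches two words and the non-adjacent pattern matches all five, which suggests the looser patterns would need a different plausibility threshold to be meaningful.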

Java

Download and save wordlist to unixdict.txt.

<lang java>import java.io.BufferedReader;
import java.io.FileReader;

public class IbeforeE
{
    public static void main(String[] args)
    {
        IbeforeE now=new IbeforeE();
        String wordlist="unixdict.txt";
        if(now.isPlausibleRule(wordlist))
            System.out.println("Rule is plausible.");
        else
            System.out.println("Rule is not plausible.");
    }
    boolean isPlausibleRule(String filename)
    {
        int truecount=0,falsecount=0;
        try
        {
            BufferedReader br=new BufferedReader(new FileReader(filename));
            String word;
            while((word=br.readLine())!=null)
            {
                if(isPlausibleWord(word))
                    truecount++;
                else if(isOppPlausibleWord(word))
                    falsecount++;
            }
            br.close();
        }
        catch(Exception e)
        {
            System.out.println("Something went horribly wrong: "+e.getMessage());
        }

        System.out.println("Plausible count: "+truecount);
        System.out.println("Implausible count: "+falsecount);
        if(truecount>2*falsecount)
            return true;
        return false;
    }
    boolean isPlausibleWord(String word)
    {
        if(!word.contains("c")&&word.contains("ie"))
            return true;
        else if(word.contains("cei"))
            return true;
        return false;
    }
    boolean isOppPlausibleWord(String word)
    {
        if(!word.contains("c")&&word.contains("ei"))
            return true;
        else if(word.contains("cie"))
            return true;
        return false;
    }
}
</lang>

Output:

Plausible count: 384
Implausible count: 204
Rule is not plausible.

Objeck

Translation of: Seed7

<lang objeck>use HTTP;
use Collection;

class HttpTest {

 function : Main(args : String[]) ~ Nil {
   IsPlausibleRule("http://www.puzzlers.org/pub/wordlists/unixdict.txt");
 }
 function : PlausibilityCheck(comment : String, x : Int, y : Int) ~ Bool {
   ratio := x->As(Float) / y->As(Float);
   "  Checking plausibility of: {$comment}"->PrintLine();
   if(x > 2 * y) {
     "    PLAUSIBLE. As we have counts of {$x} vs {$y} words, a ratio of {$ratio} times"->PrintLine();
   }
   else if(x > y) {
     "    IMPLAUSIBLE. As although we have counts of {$x} vs {$y} words, a ratio of {$ratio} times does not make it plausible"->PrintLine();
   }
   else {
     "    IMPLAUSIBLE, probably contra-indicated. As we have counts of {$x} vs {$y} words, a ratio of {$ratio} times"->PrintLine();
   };
   return x > 2 * y;
 }
 function : IsPlausibleRule(url : String) ~ Nil {
   truecount := 0;
   falsecount := 0;
   client := HttpClient->New();
   data := client->Get(url)->Get(0)->As(String);
   data := data->ToLower();
   words := data->Split("\n");
   cie := Count("cie", words);
   cei := Count("cei", words);
   not_c_ie := Count("ie", words) - cie;
   not_c_ei := Count("ei", words) - cei;
   "Checking plausibility of \"I before E except after C\":"->PrintLine();
   if(PlausibilityCheck("I before E when not preceded by C", not_c_ie, not_c_ei) &
       PlausibilityCheck("E before I when preceded by C", cei, cie)) {
     "OVERALL IT IS PLAUSIBLE!"->PrintLine();
   }
   else {
     "OVERALL IT IS IMPLAUSIBLE!"->PrintLine();
     "(To be plausible, one word count must exceed another by 2 times)"->PrintLine();
   };
 }
 function : Count(check: String, words : String[]) ~ Int {
   count := 0;
   each(i : words) {
     if(words[i]->Find(check) > -1) {
       count += 1;
     };
   };
   return count;
 }

} </lang>

Output:

Checking plausibility of "I before E except after C":
  Checking plausibility of: I before E when not preceded by C
    PLAUSIBLE. As we have counts of 465 vs 213 words, a ratio of 2.183 times
  Checking plausibility of: E before I when preceded by C
    IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24 words, a ratio of 0.542 times
OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one word count must exceed another by 2 times)

Python

<lang python>import urllib.request
import re

PLAUSIBILITY_RATIO = 2

def plausibility_check(comment, x, y):

   print('\n  Checking plausibility of: %s' % comment)
   if x > PLAUSIBILITY_RATIO * y:
       print('    PLAUSIBLE. As we have counts of %i vs %i, a ratio of %4.1f times'
             % (x, y, x / y))
   else:
       if x > y:
           print('    IMPLAUSIBLE. As although we have counts of %i vs %i, a ratio of %4.1f times does not make it plausible'
                 % (x, y, x / y))
       else:
           print('    IMPLAUSIBLE, probably contra-indicated. As we have counts of %i vs %i, a ratio of %4.1f times'
                 % (x, y, x / y))
   return x > PLAUSIBILITY_RATIO * y

def simple_stats(url='http://www.puzzlers.org/pub/wordlists/unixdict.txt'):

   words = urllib.request.urlopen(url).read().decode().lower().split()
   cie = len({word for word in words if 'cie' in word})
   cei = len({word for word in words if 'cei' in word})
   not_c_ie = len({word for word in words if re.search(r'(^ie|[^c]ie)', word)})
   not_c_ei = len({word for word in words if re.search(r'(^ei|[^c]ei)', word)})
   return cei, cie, not_c_ie, not_c_ei

def print_result(cei, cie, not_c_ie, not_c_ei):

   if ( plausibility_check('I before E when not preceded by C', not_c_ie, not_c_ei)
        & plausibility_check('E before I when preceded by C', cei, cie) ):
       print('\nOVERALL IT IS PLAUSIBLE!')
   else:
       print('\nOVERALL IT IS IMPLAUSIBLE!')
   print('(To be plausible, one count must exceed another by %i times)' % PLAUSIBILITY_RATIO)

print('Checking plausibility of "I before E except after C":')
print_result(*simple_stats())</lang>

Output:
Checking plausibility of "I before E except after C":

  Checking plausibility of: I before E when not preceded by C
    PLAUSIBLE. As we have counts of 465 vs 213, a ratio of  2.2 times

  Checking plausibility of: E before I when preceded by C
    IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24, a ratio of  0.5 times

OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one count must exceed another by 2 times)

Python: Stretch Goal

Add the following to the bottom of the previous program:
<lang python>def stretch_stats(url='http://ucrel.lancs.ac.uk/bncfreq/lists/1_2_all_freq.txt'):

   freq = [line.strip().lower().split()
           for line in urllib.request.urlopen(url)
           if len(line.strip().split()) == 3]
   wordfreq = [(word.decode(), int(frq))
               for word, pos, frq in freq[1:]
               if (b'ie' in word) or (b'ei' in word)]
   cie = sum(frq for word, frq in wordfreq if 'cie' in word)
   cei = sum(frq for word, frq in wordfreq if 'cei' in word)
   not_c_ie = sum(frq for word, frq in wordfreq if re.search(r'(^ie|[^c]ie)', word))
   not_c_ei = sum(frq for word, frq in wordfreq if re.search(r'(^ei|[^c]ei)', word))
   return cei, cie, not_c_ie, not_c_ei

print('\n\nChecking plausibility of "I before E except after C"')
print('And taking account of word frequencies in British English:')
print_result(*stretch_stats())</lang>

To produce this extra output:

Checking plausibility of "I before E except after C"
And taking account of word frequencies in British English:

  Checking plausibility of: I before E when not preceded by C
    IMPLAUSIBLE. As although we have counts of 8192 vs 4826, a ratio of  1.7 times does not make it plausible

  Checking plausibility of: E before I when preceded by C
    IMPLAUSIBLE, probably contra-indicated. As we have counts of 327 vs 994, a ratio of  0.3 times

OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one count must exceed another by 2 times)

Racket

<lang racket>#lang racket

(define (get-tallies filename)

 (for/fold ([cei 0] [cie 0] [ie 0] [ei 0])
   ([line (file->lines filename)])
   (let* ([words (string-split line)]
          [word (first words)]
          [n (or (string->number (last words)) 1)]) 
     (define-syntax-rule (tally x ...)
       (values (if (regexp-match? (symbol->string 'x) word) (+ n x) x) ...))
   (tally cei cie ie ei))))

(define (plausible test) (string-append (if test "" "IM") "PLAUSIBLE"))

(define (subrule description examples counters)

 (let ([result (> examples (* 2 counters))])
   (printf "  The sub-rule \"~a\" is ~a.  There were ~a examples and ~a counter-examples.\n" 
           description (plausible result) examples counters)
   result))

(define (plausibility description filename)

 (printf "~a:\n" description)
 (let-values ([(cei cie ie ei) (get-tallies filename)])
   (let ([rule1 (subrule "I before E when not preceded by C" (- ie cie) (- ei cei))]
         [rule2 (subrule "E before I when preceded by C" cei cie)])
     (printf "\n  Overall, the rule \"I before E, except after C\" is ~a.\n"
             (plausible (and rule1 rule2))))))

(plausibility "Dictionary" "unixdict.txt")
(newline)
(plausibility "Word frequencies (stretch goal)" "1_2_all_freq.txt")</lang>

Output:
Dictionary:
  The sub-rule "I before E when not preceded by C" is PLAUSIBLE.  There were 465 examples and 213 counter-examples.
  The sub-rule "E before I when preceded by C" is IMPLAUSIBLE.  There were 13 examples and 24 counter-examples.

  Overall, the rule "I before E, except after C" is IMPLAUSIBLE.

Word frequencies (stretch goal):
  The sub-rule "I before E when not preceded by C" is IMPLAUSIBLE.  There were 8148 examples and 4826 counter-examples.
  The sub-rule "E before I when preceded by C" is IMPLAUSIBLE.  There were 327 examples and 994 counter-examples.

  Overall, the rule "I before E, except after C" is IMPLAUSIBLE.

REXX

The following assumptions were made about the (default) dictionary:

  • there could be leading and/or trailing blanks or tabs
  • the dictionary words are in mixed case.
  • there could be blank lines
  • there may be more than one occurrence of a target string within a word [einsteinium]
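
The last assumption (several occurrences of a target string inside one word) is the one most easily overlooked. The following Python sketch shows one way to count every occurrence and to check the letter in front of each; the function name `classify_occurrences` is an assumption and the code is not a translation of the REXX programs below.

```python
def classify_occurrences(word, target):
    """Return (after_c, not_after_c) counts for every occurrence
    of `target` in `word`, compared case-insensitively."""
    word, target = word.lower(), target.lower()
    after_c = not_after_c = 0
    start = word.find(target)
    while start != -1:
        if start > 0 and word[start - 1] == "c":
            after_c += 1
        else:
            not_after_c += 1
        start = word.find(target, start + 1)   # keep looking after this hit
    return after_c, not_after_c

print(classify_occurrences("einsteinium", "ei"))
print(classify_occurrences("deficiencies", "ie"))
```

An occurrence at the very start of a word has no preceding letter and is therefore counted as not preceded by C.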

unweighted version

<lang rexx>/*REXX pgm shows plausibility of I before E when not preceded by C, and*/
/*────────────────────────────── E before I when preceded by C. */
#.=0                                   /*zero out various word counters.*/
parse arg iFID .;  if iFID==''  then iFID='UNIXDICT.TXT'  /*use default?*/

 do r=0  while lines(ifid)\==0;    _=linein(iFID)  /*get a single line.*/
 u=translate(space(_,0))              /*elide superfluous blanks & tabs*/
 if u==''           then iterate    /*if a blank line, then ignore it*/
 #.words=#.words+1                    /*keep a running count of #words.*/
 if pos('EI',u)\==0 & pos('IE',u)\==0 then #.both=#.both+1  /*has both.*/
 call find 'ie'
 call find 'ei'
 end   /*r*/

L=length(#.words)                      /*use this to align the output #s*/
say 'lines in the ' ifid ' dictionary: ' r
say 'words in the ' ifid ' dictionary: ' #.words
say
say 'words with "IE" and "EI" (in same word): ' right(#.both,L)
say 'words with "IE" and preceded by "C": ' right(#.ie.c ,L)
say 'words with "IE" and not preceded by "C": ' right(#.ie.z ,L)
say 'words with "EI" and preceded by "C": ' right(#.ei.c ,L)
say 'words with "EI" and not preceded by "C": ' right(#.ei.z ,L)
say;   mantra='The spelling mantra '
p1=#.ie.z/max(1,#.ei.z);   phrase='"I before E when not preceded by C"'
say mantra phrase ' is ' word("im", 1+(p1>2))'plausible.'
p2=#.ie.c/max(1,#.ei.c);   phrase='"E before I when preceded by C"'
say mantra phrase ' is ' word("im", 1+(p2>2))'plausible.'
po=p1>2 & p2>2;   say 'Overall, it is' word("im",1+po)'plausible.'
exit                                   /*stick a fork in it, we're done.*/
/*──────────────────────────────────FIND subroutine─────────────────────*/
find: arg x;  s=1;  do forever;  _=pos(x,u,s);  if _==0  then leave

                   if substr(u,_-1+(_==1)*999,1)=='C'  then #.x.c=#.x.c+1
                                                       else #.x.z=#.x.z+1
                   s=_+1              /*handle case of multiple finds. */
                   end   /*forever*/

return</lang>
output when using the default dictionary

lines in the   UNIXDICT.TXT  dictionary:  25104
words in the   UNIXDICT.TXT  dictionary:  25104

words with "IE" and "EI" (in same word):      4
words with "IE" and     preceded by "C":     24
words with "IE" and not preceded by "C":    465
words with "EI" and     preceded by "C":     13
words with "EI" and not preceded by "C":    213

The spelling mantra   "I before E when not preceded by C"  is  plausible.
The spelling mantra   "E before I when     preceded by C"  is  implausible.
Overall, it is implausible.

weighted version

Using the default word frequency count file, several discrepancies (or not) became apparent:

  • some "words" were in fact, phrases
  • some words were in the form of     x / y     indicating x OR y
  • some words were in the form of     x/y     (with no blanks)   indicating x OR y,   or a single word
  • some words had a ~ prefix
  • some words had a * suffix
  • some words had a ~ suffix
  • some words had a ~ and * suffix
  • one word had a ~ prefix and a ~ suffix
  • some lines had an imbedded [xxx] comment
  • some words had a   '   (quote)   prefix to indicate a:
    • possessive
    • plural
    • contraction
    • word   (as is)
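
To make the irregular entries listed above more concrete, here is a Python sketch of one way such an entry could be broken into plain words. It is deliberately different from the REXX program that follows: the helper `split_alternatives` is hypothetical, and it strips the `~` and `*` markers instead of leaving them in place.

```python
def split_alternatives(entry):
    """Split an entry such as 'x / y' or 'x/y' into its distinct
    alternatives, dropping '~' and '*' markers along the way."""
    cleaned = entry.replace("~", "").replace("*", "")
    seen = []
    for part in cleaned.split("/"):
        part = part.strip().lower()
        if part and part not in seen:    # skip empty and repeated alternatives
            seen.append(part)
    return seen

print(split_alternatives("x / y"))
print(split_alternatives("Colour/color*"))
print(split_alternatives("~a/a"))
```

Whether stripping the markers is appropriate depends on what they mean in the frequency list, which this sketch does not try to settle.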

All of the cases where an asterisk [*] or tilde [~] were used were not programmatically handled within the REXX program;   it is assumed that prefixes and suffixes were being used to indicate multiple words that either begin or end with (any) string   (or in some case, both).
A cursory look at the file seems to indicate that the use of the tilde and/or asterisk doesn't affect the rules for the mantra phrases.
<lang rexx>/*REXX pgm shows plausibility of I before E when not preceded by C, and*/
/*────────────────────────────── E before I when preceded by C using a*/
/*────────────────────────────── weighted frequency for each word. */
#.=0                                   /*zero out various word counters.*/
parse arg iFID wFID .
if iFID=='' | iFID==','  then iFID='UNIXDICT.TXT'  /*use the default? */
if wFID=='' | wFID==','  then wFID='WORDFREQ.TXT'  /*use the default? */
tabs=xrange('0'x, "f"x)
f.=1                                   /*default word freq. multiplier. */

 do recs=0  while lines(wFID)\==0;  _=linein(wFID)  /*get a record.    */
 u=translate(_,,tabs);   upper u      /*trans various tabs & low hexex.*/
 u=translate(u,'*', "~")              /*translate tildes to an asterisk*/
 if u==''              then iterate /*if a blank line, then ignore it*/
 freq=word(u,words(u))                /*get the last token on the line.*/
 if \datatype(freq,'W')  then iterate /*Not numeric?   Then ignore it. */
 parse var u w.1 '/' w.2 .            /*handle case of:   ααα/ßßß  ... */
    do j=1  for 2;  w.j=word(w.j,1)   /*strip leading/trailing blanks  */
    _=w.j;   if _==''  then iterate   /*if not present, then ignore it.*/
    if j==2  then if w.2==w.1  then iterate  /*2nd word=1st word? skip.*/
    #.freqs = #.freqs + 1             /*bump word count in  FREQ  list.*/
    f._ = f._ + freq                  /*add to a word's frequency count*/
    end   /*ws*/
 end   /*recs*/

if recs\==0     then say 'lines in the ' wFID ' list: ' recs
if #.freqs\==0  then say 'words in the ' wFID ' list: ' #.freqs
if #.freqs==0   then weighted=
                else weighted=' (weighted)'

say

 do r=0  while lines(iFID)\==0;    _=linein(iFID)  /*get a single line.*/
 u=space(_,0);  upper u               /*elide superfluous blanks & tabs*/
 if u==''           then iterate    /*if a blank line, then ignore it*/
 #.words=#.words+1                    /*keep a running count of #words.*/
 one=f.u
 if pos('EI',u)\==0 & pos('IE',u)\==0 then #.both=#.both+one /*has both*/
 call find 'ie'
 call find 'ei'
 end   /*r*/

L=length(#.words)                      /*use this to align the output #s*/
say 'lines in the ' iFID ' dictionary: ' r
say 'words in the ' iFID ' dictionary: ' #.words
say
say 'words with "IE" and "EI" (in same word): ' right(#.both,L) weighted
say 'words with "IE" and preceded by "C": ' right(#.ie.c ,L) weighted
say 'words with "IE" and not preceded by "C": ' right(#.ie.z ,L) weighted
say 'words with "EI" and preceded by "C": ' right(#.ei.c ,L) weighted
say 'words with "EI" and not preceded by "C": ' right(#.ei.z ,L) weighted
say;   mantra='The spelling mantra '
p1=#.ie.z/max(1,#.ei.z);   phrase='"I before E when not preceded by C"'
say mantra phrase ' is ' word("im", 1+(p1>2))'plausible.'
p2=#.ie.c/max(1,#.ei.c);   phrase='"E before I when preceded by C"'
say mantra phrase ' is ' word("im", 1+(p2>2))'plausible.'
po=p1>2 & p2>2;   say 'Overall, it is' word("im",1+po)'plausible.'
exit                                   /*stick a fork in it, we're done.*/
/*──────────────────────────────────FIND subroutine─────────────────────*/
find: arg x;  s=1;  do forever;  _=pos(x,u,s);  if _==0  then leave

                 if substr(u,_-1+(_==1)*999,1)=='C'  then #.x.c=#.x.c+one
                                                     else #.x.z=#.x.z+one
                 s=_+1                /*handle case of multiple finds. */
                 end   /*forever*/

return</lang>
output when using the default dictionary and default word frequency list

lines in the   WORDFREQ.TXT        list:  7727
words in the   WORDFREQ.TXT        list:  7728

lines in the   UNIXDICT.TXT  dictionary:  25104
words in the   UNIXDICT.TXT  dictionary:  25104

words with "IE" and "EI" (in same word):      4  (weighted)
words with "IE" and     preceded by "C":    719  (weighted)
words with "IE" and not preceded by "C":   3818  (weighted)
words with "EI" and     preceded by "C":    100  (weighted)
words with "EI" and not preceded by "C":   4875  (weighted)

The spelling mantra   "I before E when not preceded by C"  is  implausible.
The spelling mantra   "E before I when     preceded by C"  is  plausible.
Overall, it is implausible.

Seed7

<lang seed7>$ include "seed7_05.s7i";

 include "gethttp.s7i";
 include "float.s7i";

const integer: PLAUSIBILITY_RATIO is 2;

const func boolean: plausibilityCheck (in string: comment, in integer: x, in integer: y) is func

 result
   var boolean: plausible is FALSE;
 begin
   writeln("  Checking plausibility of: " <& comment);
   if x > PLAUSIBILITY_RATIO * y then
     writeln("    PLAUSIBLE. As we have counts of " <& x <& " vs " <& y <&
             " words, a ratio of " <& flt(x) / flt(y) digits 1 lpad 4 <& " times");
   elsif x > y then
     writeln("    IMPLAUSIBLE. As although we have counts of " <& x <& " vs " <& y <&
             " words, a ratio of " <& flt(x) / flt(y) digits 1 lpad 4 <& " times does not make it plausible");
   else
     writeln("    IMPLAUSIBLE, probably contra-indicated. As we have counts of " <& x <& " vs " <& y <&
             " words, a ratio of " <& flt(x) / flt(y) digits 1 lpad 4 <& " times");
   end if;
   plausible := x > PLAUSIBILITY_RATIO * y;
 end func;

const func integer: count (in string: stri, in array string: words) is func

 result
   var integer: count is 0;
 local
   var integer: index is 0;
 begin
   for key index range words do
     if pos(words[index], stri) <> 0 then
       incr(count);
     end if;
   end for;
 end func;

const proc: main is func

 local
   var array string: words is 0 times "";
   var integer: cie is 0;
   var integer: cei is 0;
   var integer: not_c_ie is 0;
   var integer: not_c_ei is 0;
 begin
   words := split(lower(getHttp("www.puzzlers.org/pub/wordlists/unixdict.txt")), "\n");
   cie := count("cie", words);
   cei := count("cei", words);
   not_c_ie := count("ie", words) - cie;
   not_c_ei := count("ei", words) - cei;
   writeln("Checking plausibility of \"I before E except after C\":");
   if plausibilityCheck("I before E when not preceded by C", not_c_ie, not_c_ei) and
       plausibilityCheck("E before I when preceded by C", cei, cie) then
     writeln("OVERALL IT IS PLAUSIBLE!");
   else
     writeln("OVERALL IT IS IMPLAUSIBLE!");
     writeln("(To be plausible, one word count must exceed another by " <& PLAUSIBILITY_RATIO <& " times)");
   end if;
 end func;</lang>

Output:

Checking plausibility of "I before E except after C":
  Checking plausibility of: I before E when not preceded by C
    PLAUSIBLE. As we have counts of 465 vs 213 words, a ratio of  2.2 times
  Checking plausibility of: E before I when preceded by C
    IMPLAUSIBLE, probably contra-indicated. As we have counts of 13 vs 24 words, a ratio of  0.5 times
OVERALL IT IS IMPLAUSIBLE!
(To be plausible, one word count must exceed another by 2 times)

Tcl

Translation of: Python

<lang tcl>package require http

variable PLAUSIBILITY_RATIO 2.0
proc plausible {description x y} {
    variable PLAUSIBILITY_RATIO
    puts "  Checking plausibility of: $description"
    if {$x > $PLAUSIBILITY_RATIO * $y} {
        set conclusion "PLAUSIBLE"
        set fmt "As we have counts of %i vs %i words, a ratio of %.1f times"
        set result true
    } elseif {$x > $y} {
        set conclusion "IMPLAUSIBLE"
        set fmt "As although we have counts of %i vs %i words,"
        append fmt " a ratio of %.1f times does not make it plausible"
        set result false
    } else {
        set conclusion "IMPLAUSIBLE, probably contra-indicated"
        set fmt "As we have counts of %i vs %i words, a ratio of %.1f times"
        set result false
    }
    puts [format "    %s.\n    $fmt" $conclusion $x $y [expr {double($x)/$y}]]
    return $result
}

set t [http::geturl http://www.puzzlers.org/pub/wordlists/unixdict.txt]
set words [split [http::data $t] "\n"]
http::cleanup $t
foreach {name pattern} {ie (?:^|[^c])ie ei (?:^|[^c])ei cie cie cei cei} {
    set count($name) [llength [lsearch -nocase -all -regexp $words $pattern]]
}

puts "Checking plausibility of \"I before E except after C\":"
if {
    [plausible "I before E when not preceded by C" $count(ie) $count(ei)] &&
    [plausible "E before I when preceded by C" $count(cei) $count(cie)]
} then {
    puts "\nOVERALL IT IS PLAUSIBLE!"
} else {
    puts "\nOVERALL IT IS IMPLAUSIBLE!"
}
puts "\n(To be plausible, one word count must exceed another by\
    $PLAUSIBILITY_RATIO times)"</lang>

Output:
Checking plausibility of "I before E except after C":
  Checking plausibility of: I before E when not preceded by C
    PLAUSIBLE.
    As we have counts of 465 vs 213 words, a ratio of 2.2 times
  Checking plausibility of: E before I when preceded by C
    IMPLAUSIBLE, probably contra-indicated.
    As we have counts of 13 vs 24 words, a ratio of 0.5 times

OVERALL IT IS IMPLAUSIBLE!

(To be plausible, one word count must exceed another by 2.0 times)

UNIX Shell

<lang bash>#!/bin/sh

matched() {
    egrep "$1" unixdict.txt | wc -l
}

check() {
    if [ $(expr $(matched $3) \> $(expr 2 \* $(matched $2))) = '0' ]; then
        echo clause $1 not plausible
        exit 1
    fi
}

check 1 \[^c\]ei \[^c\]ie && check 2 cie cei && echo plausible</lang>

Output:
clause 2 not plausible