Birthday problem

From Rosetta Code
Revision as of 00:27, 5 November 2013 by rosettacode>Gerard Schildberger (→‎{{header|REXX}}: interjected a missing SEP line. -- ~~~~)

This page uses content from Wikipedia. The current wikipedia article is at Birthday Problem. The original RosettaCode article was extracted from the wikipedia article № 296054030 of 21:44, 12 June 2009 . The list of authors can be seen in the page history. As with Rosetta Code, the pre 5 June 2009 text of Wikipedia is available under the GNU FDL. (See links for details on variance)

Birthday problem is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In probability theory, the birthday problem, or birthday paradox This is not a paradox in the sense of leading to a logical contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: most people estimate that the chance is much lower than 50%. pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. In a group of at least 23 randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, and it reaches 100% when the number of people reaches 366 (by the pigeon hole principle, ignoring leap years). The mathematics behind this problem leads to a well-known cryptographic attack called the birthday attack.

Task

Using simulation, estimate the number of independent people required in a groups before we can expect a better than even chance that at least 2 independent people in a group share a common birthday. Furthermore: Simulate and thus estimate when we can expect a better than even chance that at least 3, 4 & 5 independent people of the group share a common birthday. For simplicity assume that all of the people are alive...

Suggestions for improvement
  • Estimating the error in the estimate to help ensure the estimate is accurate to 4 decimal places.
  • Converging to the th solution using a root finding method, as opposed to using an extensive search.
  • Kudos (κῦδος) for finding the solution by proof (in a programming language) rather than by construction and simulation.
See also

ALGOL 68

Works with: ALGOL 68 version Revision 1
Works with: ALGOL 68G version Any - tested with release algol68g-2.6.

File: Birthday_problem.a68<lang algol68>#!/usr/bin/a68g --script #

  1. -*- coding: utf-8 -*- #

REAL desired probability := 0.5; # 50% #

REAL upb year = 365 + 1/4 # - 3/400 but alive, ignore those born prior to 1901 #, INT upb sample size = 100 000,

   upb common = 5 ;

FORMAT name int fmt = $g": "g(-0)"; "$,

      name real fmt = $g": "g(-0,4)"; "$,
      name percent fmt = $g": "g(-0,2)"%; "$;

printf((

 name real fmt,
 "upb year",upb year,
 name int fmt,
 "upb common",upb common,
 "upb sample size",upb sample size,
 $l$

));

INT required common := 1; # initial value # FOR group size FROM required common WHILE required common <= upb common DO

 INT sample with no required common := 0;
 TO upb sample size DO
 # generate sample #
   [group size]INT sample;
   FOR i TO UPB sample DO sample[i] := ENTIER(random * upb year) + 1 OD;
   FOR birthday i TO UPB sample DO
     INT birthday = sample[birthday i];
     INT number in common := 1;
   # special case = 1 #
     IF number in common >= required common THEN
       found required common
     FI;
     FOR birthday j FROM birthday i + 1 TO UPB sample DO
       IF birthday = sample[birthday j] THEN
         number in common +:= 1;
         IF number in common >= required common THEN
           found required common
         FI
       FI
     OD
   OD  # days in year #;
   sample with no required common +:= 1;
   found required common: SKIP
 OD # sample size #;
 REAL portion of years with required common birthdays = 
   (upb sample size - sample with no required common) / upb sample size;
 print(".");
 IF portion of years with required common birthdays > desired probability THEN
   printf((
     $l$,
     name int fmt,
     "required common",required common,
     "group size",group size,
     # "sample with no required common",sample with no required common, #
     name percent fmt,
     "%age of years with required common birthdays",portion of years with required common birthdays*100,
     $l$
   ));
   required common +:= 1
 FI

OD # group size #</lang>Output:

upb year: 365.2500; upb common: 5; upb sample size: 100000; 
.
required common: 1; group size: 1; %age of years with required common birthdays: 100.00%; 
......................
required common: 2; group size: 23; %age of years with required common birthdays: 50.71%; 
.................................................................
required common: 3; group size: 88; %age of years with required common birthdays: 50.90%; 
...................................................................................................
required common: 4; group size: 187; %age of years with required common birthdays: 50.25%; 
...............................................................................................................................
required common: 5; group size: 314; %age of years with required common birthdays: 50.66%; 

PL/I

<lang PL/I>*process source attributes xref;

bd: Proc Options(main);
/*--------------------------------------------------------------------
* 04.11.2013 Walter Pachl
* Take samp samples of groups with gs persons and check
*how many of the groups have at least match persons with same birthday
*-------------------------------------------------------------------*/
Dcl (float,random) Builtin;
Dcl samp Bin Fixed(31) Init(1000000);
Dcl arr(0:366) Bin Fixed(31);
Dcl r Bin fixed(31);
Dcl i Bin fixed(31);
Dcl ok Bin fixed(31);
Dcl g  Bin fixed(31);
Dcl gs Bin fixed(31);
Dcl match Bin fixed(31);
Dcl cnt(0:1) Bin Fixed(31);
Dcl lo(6) Bin Fixed(31) Init(0,21,85,185,311,458);
Dcl hi(6) Bin Fixed(31) Init(0,25,89,189,315,462);
Dcl rf Bin Float(63);
Dcl hits  Bin Float(63);
Dcl arrow Char(3);
Do match=2 To 6;
  Put Edit(' ')(Skip,a);
  Put Edit(samp,' samples. Percentage of groups with at least',
           match,' matches')(Skip,f(8),a,f(2),a);
  Put Edit('Group size')(Skip,a);
  Do gs=lo(match) To hi(match);
    cnt=0;
    Do i=1 To samp;
      ok=0;
      arr=0;
      Do g=1 To gs;
        rf=random();
        r=rf*365+1;
        arr(r)+=1;
        If arr(r)=match Then Do;
          /* Put Edit(r)(Skip,f(4));*/
          ok=1;
          End;
        End;
      cnt(ok)+=1;
      End;
    hits=float(cnt(1))/samp;
    If hits>=.5 Then arrow=' <-';
                Else arrow=;
    Put Edit(gs,cnt(0),cnt(1),100*hits,'%',arrow)
            (Skip,f(10),2(f(7)),f(8,3),a,a);
    End;
  End;
End;</lang> 

Output:

 1000000 samples. Percentage of groups with at least 2 matches
Group size
        21 556903 443097  44.310%
        22 524741 475259  47.526%
        23 492034 507966  50.797% <-
        24 462172 537828  53.783% <-
        25 431507 568493  56.849% <-

 1000000 samples. Percentage of groups with at least 3 matches
Group size
        85 523287 476713  47.671%
        86 512219 487781  48.778%
        87 499874 500126  50.013% <-
        88 488197 511803  51.180% <-
        89 478044 521956  52.196% <-

 1000000 samples. Percentage of groups with at least 4 matches
Group size
       185 511352 488648  48.865%
       186 503888 496112  49.611%
       187 497844 502156  50.216% <-
       188 490490 509510  50.951% <-
       189 482893 517107  51.711% <-

 1000000 samples. Percentage of groups with at least 5 matches
Group size
       311 508743 491257  49.126%
       312 503524 496476  49.648%
       313 498244 501756  50.176% <-
       314 494032 505968  50.597% <-
       315 489821 510179  51.018% <-

 1000000 samples. Percentage of groups with at least 6 matches
Group size
       458 505225 494775  49.478%
       459 501871 498129  49.813%
       460 497719 502281  50.228% <-
       461 493948 506052  50.605% <-
       462 489416 510584  51.058% <- 

REXX

<lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/ parse arg grps samp seed . /*get optional arguments from CL */ if grps== | grps==',' then grps=5 /*Not specified? Use the default*/ if samp== | samp==',' then samp=100000 /*" " " " " */ if seed\==',' & seed \== then call random ,,seed /*repeatability? */ diy =365 /*or: diy=365.25*/ /*the number of Days In a Year. */ diyM=diy*100 /*this expands the RANDOM range. */

                                      /* [↓] get a rough estimate for %*/
    do   g=2  to grps;     s=0        /*perform through 2──►group size.*/
      do  samp;            @.=0       /*perform some number of trials. */
            do j=1                    /*do until g dup birthdays found.*/
            day=random(1,diyM) % 100  /*expand random number generation*/
            @.day=@.day+1             /*record the of a particular Bday*/
            if @.day==g  then leave   /*when  G  hits have occurred ···*/
            end   /*j*/
      s=s+j                           /*add number of people to sum.   */
      end         /*samp*/
    start.g=s/samp%1-g                /*define where the try-outs start*/
    end           /*g*/

say ' sample size is ' samp /*show sample size of this run. */ say ___=' ' /*padding for easier eyeballing. */ say ___ 'required' ___ 'group' ___ '% with required' say ___ ' common ' ___ ' size' ___ 'common birthdays' say ___ '────────' ___ '─────' ___ '────────────────'

                                      /* [↓] where the try-outs happen.*/
   do   g=2  to grps                  /*perform through 2──►group size.*/
     do try=start.g;    s=0           /*perform try-outs until avg>50%.*/
       do  samp;        @.=0          /*perform some number of trials. */
            do j=1  for try           /*do until K dup birthdays found.*/
            day=random(1,diyM) % 100  /*expand random number generation*/
            @.day=@.day+1             /*record the of a particular Bday*/
            if @.day\==g then iterate /*when  G  hits have occurred ···*/
            s=s+1                     /*another common birthday found. */
            leave                     /* ... and stop looking for more.*/
            end   /*j*/
       end       /*samp*/
     if s/samp>.5  then leave         /*if the average is > 50%, stop. */
     end         /*try*/
   say ___ center(g,8) ___  right(try,5)  ___ center((s/samp*100)'%',16)
   end           /*g*/
                                      /*stick a fork in it, we're done.*/</lang>

output

             sample size is  100000

    required    group    %  with required
     common      size    common birthdays
    ────────    ─────    ────────────────
       2           23       50.75100%
       3           88       51.14700%
       4          187       50.30400%
       5          313       50.14100%