Birthday problem

From Rosetta Code
Revision as of 22:08, 4 November 2013 by Walterpachl (talk | contribs) (add PL/I)

This page uses content from Wikipedia. The current wikipedia article is at Birthday Problem. The original RosettaCode article was extracted from the wikipedia article № 296054030 of 21:44, 12 June 2009 . The list of authors can be seen in the page history. As with Rosetta Code, the pre 5 June 2009 text of Wikipedia is available under the GNU FDL. (See links for details on variance)

Birthday problem is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In probability theory, the birthday problem, or birthday paradox This is not a paradox in the sense of leading to a logical contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: most people estimate that the chance is much lower than 50%. pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. In a group of at least 23 randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, and it reaches 100% when the number of people reaches 366 (by the pigeon hole principle, ignoring leap years). The mathematics behind this problem leads to a well-known cryptographic attack called the birthday attack.

Task

Using simulation, estimate the number of independent people required in a groups before we can expect a better than even chance that at least 2 independent people in a group share a common birthday. Furthermore: Simulate and thus estimate when we can expect a better than even chance that at least 3, 4 & 5 independent people of the group share a common birthday. For simplicity assume that all of the people are alive...

Suggestions for improvement
  • Estimating the error in the estimate to help ensure the estimate is accurate to 4 decimal places.
  • Converging to the th solution using a root finding method, as opposed to using an extensive search.
  • Kudos (κῦδος) for finding the solution by proof (in a programming language) rather than by construction and simulation.
See also

ALGOL 68

Works with: ALGOL 68 version Revision 1
Works with: ALGOL 68G version Any - tested with release algol68g-2.6.

File: Birthday_problem.a68<lang algol68>#!/usr/bin/a68g --script #

  1. -*- coding: utf-8 -*- #

REAL desired probability := 0.5; # 50% #

REAL upb year = 365 + 1/4 # - 3/400 but alive, ignore those born prior to 1901 #, INT upb sample size = 100 000,

   upb common = 5 ;

FORMAT name int fmt = $g": "g(-0)"; "$,

      name real fmt = $g": "g(-0,4)"; "$,
      name percent fmt = $g": "g(-0,2)"%; "$;

printf((

 name real fmt,
 "upb year",upb year,
 name int fmt,
 "upb common",upb common,
 "upb sample size",upb sample size,
 $l$

));

INT required common := 1; # initial value # FOR group size FROM required common WHILE required common <= upb common DO

 INT sample with no required common := 0;
 TO upb sample size DO
 # generate sample #
   [group size]INT sample;
   FOR i TO UPB sample DO sample[i] := ENTIER(random * upb year) + 1 OD;
   FOR birthday i TO UPB sample DO
     INT birthday = sample[birthday i];
     INT number in common := 1;
   # special case = 1 #
     IF number in common >= required common THEN
       found required common
     FI;
     FOR birthday j FROM birthday i + 1 TO UPB sample DO
       IF birthday = sample[birthday j] THEN
         number in common +:= 1;
         IF number in common >= required common THEN
           found required common
         FI
       FI
     OD
   OD  # days in year #;
   sample with no required common +:= 1;
   found required common: SKIP
 OD # sample size #;
 REAL portion of years with required common birthdays = 
   (upb sample size - sample with no required common) / upb sample size;
 print(".");
 IF portion of years with required common birthdays > desired probability THEN
   printf((
     $l$,
     name int fmt,
     "required common",required common,
     "group size",group size,
     # "sample with no required common",sample with no required common, #
     name percent fmt,
     "%age of years with required common birthdays",portion of years with required common birthdays*100,
     $l$
   ));
   required common +:= 1
 FI

OD # group size #</lang>Output:

upb year: 365.2500; upb common: 5; upb sample size: 100000; 
.
required common: 1; group size: 1; %age of years with required common birthdays: 100.00%; 
......................
required common: 2; group size: 23; %age of years with required common birthdays: 50.71%; 
.................................................................
required common: 3; group size: 88; %age of years with required common birthdays: 50.90%; 
...................................................................................................
required common: 4; group size: 187; %age of years with required common birthdays: 50.25%; 
...............................................................................................................................
required common: 5; group size: 314; %age of years with required common birthdays: 50.66%; 

PL/I

<lang PL/I>*process source attributes xref;

bd: Proc Options(main);
/*--------------------------------------------------------------------
* 04.11.2013 Walter Pachl
* Take samp samples of groups with gs persons and check
* how many of the groupt have at least match person with same birthday
*-------------------------------------------------------------------*/
Dcl (float,random) Builtin;
Dcl samp Bin Fixed(31) Init(1000000);
Dcl arr(0:366) Bin Fixed(31);
Dcl r Bin fixed(31);
Dcl i Bin fixed(31);
Dcl ok Bin fixed(31);
Dcl g  Bin fixed(31);
Dcl gs Bin fixed(31);
Dcl match Bin fixed(31);
Dcl cnt(0:1) Bin Fixed(31);
Dcl lo(6) Bin Fixed(31) Init(0,21,85,185,311,458);
Dcl hi(6) Bin Fixed(31) Init(0,25,89,189,315,462);
Dcl rf Bin Float(63);
Dcl hits  Bin Float(63);
Dcl arrow Char(3);
Do match=2 To 6;
  Put Edit(' ')(Skip,a);
  Put Edit(samp,' samples. Percentage of groups with at least',
           match,' matches')(Skip,f(8),a,f(2),a);
  Put Edit('Group size')(Skip,a);
  Do gs=lo(match) To hi(match);
    cnt=0;
    Do i=1 To samp;
      ok=0;
      arr=0;
      Do g=1 To gs;
        rf=random();
        r=rf*365+1;
        arr(r)+=1;
        If arr(r)=match Then Do;
          /* Put Edit(r)(Skip,f(4));*/
          ok=1;
          End;
        End;
      cnt(ok)+=1;
      End;
    hits=float(cnt(1))/samp;
    If hits>=.5 Then arrow=' <-';
                Else arrow=;
    Put Edit(gs,cnt(0),cnt(1),100*hits,'%',arrow)
            (Skip,f(10),2(f(7)),f(8,3),a,a);
    End;
  End;
End;</lang> 

Output:

 1000000 samples. Percentage of groups with at least 2 matches
Group size
        21 556903 443097  44.310%
        22 524741 475259  47.526%
        23 492034 507966  50.797% <-
        24 462172 537828  53.783% <-
        25 431507 568493  56.849% <-

 1000000 samples. Percentage of groups with at least 3 matches
Group size
        85 523287 476713  47.671%
        86 512219 487781  48.778%
        87 499874 500126  50.013% <-
        88 488197 511803  51.180% <-
        89 478044 521956  52.196% <-

 1000000 samples. Percentage of groups with at least 4 matches
Group size
       185 511352 488648  48.865%
       186 503888 496112  49.611%
       187 497844 502156  50.216% <-
       188 490490 509510  50.951% <-
       189 482893 517107  51.711% <-

 1000000 samples. Percentage of groups with at least 5 matches
Group size
       311 508743 491257  49.126%
       312 503524 496476  49.648%
       313 498244 501756  50.176% <-
       314 494032 505968  50.597% <-
       315 489821 510179  51.018% <-

 1000000 samples. Percentage of groups with at least 6 matches
Group size
       458 505225 494775  49.478%
       459 501871 498129  49.813%
       460 497719 502281  50.228% <-
       461 493948 506052  50.605% <-
       462 489416 510584  51.058% <- 

REXX

The method used: using simulation, this program finds (and records) when a pseudo-random number is used (in a large sample size trial) and finds a number of people (group size of independent people) which has been reached that share a common birthday.   This whole number is then averaged with the multiple finds and then interpolation is used in the result shown.

The particular REXX random BIF that is being used (below) appears to be not generating "true-enough" random numbers as it looks like a group of 24 people are needed to cause a common birthday; but indeed, the number should be 23. <lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/ parse arg grps samp seed . /*get optional arguments from CL */ if grps== | grps==',' then grps=5 /*Not specified? Use the default*/ if samp== | samp==',' then samp=100000 /*" " " " " */ if seed\==',' & seed \== then call random ,,seed /*repeatability? */ !.=0 /*used for tracking pop averages.*/ diy=365 /*or: diy=365.25*/ /*the number of Days In a Year. */

                                      /* [↑]  Feb 29th's B-day ≡ Mar 1.*/
  do   k=2  to grps;   s=0            /*perform through 2──►group size.*/
    do  samp                          /*perform some number of trials. */
    @.=0                              /*initialize birthday occurrences*/
          do j=1                      /*do until K dup birthdays found.*/
          day=random(1,diy*100) % 100 /*expand random number generation*/
          @.day=@.day+1               /*record the of a particular Bday*/
          if @.day==k  then leave     /*when  K  hits have occurred ...*/
          end   /*j*/
    s=s+j                             /*add number of people to sum.   */
    end         /*samp*/
  !.k=s                               /*save the total number of people*/
  end           /*k*/

pad=' ' /*padding for easier eyeballing. */ say ' sample size is ' samp /*show sample size of this run. */ say say pad 'required' pad 'group' pad '% with required' say pad ' common ' pad ' size' pad 'common birthdays' say pad '────────' pad '─────' pad '────────────────'

  do g=2  to  grps                    /*show all the groups simulated. */
  d=!.g/samp                          /*average common birthday size.  */
  say pad center(g,8) pad  right(d%1,5)  pad center((50+(d-d%1)/d)'%',16)
  end   /*g*/
                                      /*stick a fork in it, we're done.*/</lang>

output

             sample size is  100000

   required    group    %  with required
    common      size    common birthdays
   ────────    ─────    ────────────────
      2           24       50.0274786
      3           88       50.0083309
      4          187       50.0016825
      5          311       50.0020498