Birthday problem

From Rosetta Code
Revision as of 20:18, 4 November 2013 by rosettacode>Gerard Schildberger (→‎{{header|REXX}}: changed/add REXX section header comments. -- ~~~~)

This page uses content from Wikipedia. The current wikipedia article is at Birthday Problem. The original RosettaCode article was extracted from the wikipedia article № 296054030 of 21:44, 12 June 2009 . The list of authors can be seen in the page history. As with Rosetta Code, the pre 5 June 2009 text of Wikipedia is available under the GNU FDL. (See links for details on variance)

Birthday problem is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In probability theory, the birthday problem, or birthday paradox This is not a paradox in the sense of leading to a logical contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: most people estimate that the chance is much lower than 50%. pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. In a group of at least 23 randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, and it reaches 100% when the number of people reaches 366 (by the pigeon hole principle, ignoring leap years). The mathematics behind this problem leads to a well-known cryptographic attack called the birthday attack.

Task

Using simulation, estimate the number of independent people required in a groups before we can expect a better than even chance that at least 2 independent people in a group share a common birthday. Furthermore: Simulate and thus estimate when we can expect a better than even chance that at least 3, 4 & 5 independent people of the group share a common birthday. For simplicity assume that all of the people are alive...

Suggestions for improvement
  • Estimating the error in the estimate to help ensure the estimate is accurate to 4 decimal places.
  • Converging to the th solution using a root finding method, as opposed to using an extensive search.
  • Kudos (κῦδος) for finding the solution by proof (in a programming language) rather than by construction and simulation.
See also

ALGOL 68

Works with: ALGOL 68 version Revision 1
Works with: ALGOL 68G version Any - tested with release algol68g-2.6.

File: Birthday_problem.a68<lang algol68>#!/usr/bin/a68g --script #

  1. -*- coding: utf-8 -*- #

REAL desired probability := 0.5; # 50% #

REAL upb year = 365 + 1/4 # - 3/400 but alive, ignore those born prior to 1901 #, INT upb sample size = 100 000,

   upb common = 5 ;

FORMAT name int fmt = $g": "g(-0)"; "$,

      name real fmt = $g": "g(-0,4)"; "$,
      name percent fmt = $g": "g(-0,2)"%; "$;

printf((

 name real fmt,
 "upb year",upb year,
 name int fmt,
 "upb common",upb common,
 "upb sample size",upb sample size,
 $l$

));

INT required common := 1; # initial value # FOR group size FROM required common WHILE required common <= upb common DO

 INT sample with no required common := 0;
 TO upb sample size DO
 # generate sample #
   [group size]INT sample;
   FOR i TO UPB sample DO sample[i] := ENTIER(random * upb year) + 1 OD;
   FOR birthday i TO UPB sample DO
     INT birthday = sample[birthday i];
     INT number in common := 1;
   # special case = 1 #
     IF number in common >= required common THEN
       found required common
     FI;
     FOR birthday j FROM birthday i + 1 TO UPB sample DO
       IF birthday = sample[birthday j] THEN
         number in common +:= 1;
         IF number in common >= required common THEN
           found required common
         FI
       FI
     OD
   OD  # days in year #;
   sample with no required common +:= 1;
   found required common: SKIP
 OD # sample size #;
 REAL portion of years with required common birthdays = 
   (upb sample size - sample with no required common) / upb sample size;
 print(".");
 IF portion of years with required common birthdays > desired probability THEN
   printf((
     $l$,
     name int fmt,
     "required common",required common,
     "group size",group size,
     # "sample with no required common",sample with no required common, #
     name percent fmt,
     "%age of years with required common birthdays",portion of years with required common birthdays*100,
     $l$
   ));
   required common +:= 1
 FI

OD # group size #</lang>Output:

upb year: 365.2500; upb common: 5; upb sample size: 100000; 
.
required common: 1; group size: 1; %age of years with required common birthdays: 100.00%; 
......................
required common: 2; group size: 23; %age of years with required common birthdays: 50.71%; 
.................................................................
required common: 3; group size: 88; %age of years with required common birthdays: 50.90%; 
...................................................................................................
required common: 4; group size: 187; %age of years with required common birthdays: 50.25%; 
...............................................................................................................................
required common: 5; group size: 314; %age of years with required common birthdays: 50.66%; 

REXX

The method used: using simulation, this program finds (and records) when a pseudo-random number is used (in a large sample size trial) and finds a number of people (group size of independent people) which has been reached that share a common birthday.   This whole number is then averaged with the multiple finds and then interpolation is used in the result shown.

The particular REXX random BIF that is being used (below) appears to be not generating "true-enough" random numbers as it looks like a group of 24 people are needed to cause a common birthday; but indeed, the number should be 23. <lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/ parse arg grps samp seed . /*get optional arguments from CL */ if grps== | grps==',' then grps=5 /*Not specified? Use the default*/ if samp== | samp==',' then samp=100000 /*" " " " " */ if seed\==',' & seed \== then call random ,,seed /*repeatability? */ !.=0 /*used for tracking pop averages.*/ diy=365 /*or: diy=365.25*/ /*the number of Days In a Year. */

                                      /* [↑]  Feb 29th's B-day ≡ Mar 1.*/
  do   k=2  to grps;   s=0            /*perform through 2──►group size.*/
    do  samp                          /*perform some number of trials. */
    @.=0                              /*initialize birthday occurrences*/
          do j=1                      /*do until K dup birthdays found.*/
          day=random(1,diy*100) % 100 /*expand random number generation*/
          @.day=@.day+1               /*record the of a particular Bday*/
          if @.day==k  then leave     /*when  K  hits have occurred ...*/
          end   /*j*/
    s=s+j                             /*add number of people to sum.   */
    end         /*samp*/
  !.k=s                               /*save the total number of people*/
  end           /*k*/

pad=' ' /*padding for easier eyeballing. */ say ' sample size is ' samp /*show sample size of this run. */ say say pad 'required' pad 'group' pad '% with required' say pad ' common ' pad ' size' pad 'common birthdays' say pad '────────' pad '─────' pad '────────────────'

  do g=2  to  grps                    /*show all the groups simulated. */
  d=!.g/samp                          /*average common birthday size.  */
  say pad center(g,8) pad  right(d%1,5)  pad center((50+(d-d%1)/d)'%',16)
  end   /*g*/
                                      /*stick a fork in it, we're done.*/</lang>

output

             sample size is  100000

   required    group    %  with required
    common      size    common birthdays
   ────────    ─────    ────────────────
      2           24       50.0274786
      3           88       50.0083309
      4          187       50.0016825
      5          311       50.0020498