Birthday problem: Difference between revisions

From Rosetta Code
Content added Content deleted
m (→‎{{header|REXX}}: optimized for speed because of large loop)
Line 100: Line 100:
The method used: using simulation, this program finds (and records) when a pseudo-random number is used (in a large sample size trial) and finds a number of people (group size of independent people) which has been reached that share a common birthday with a probability that exceeds 50%.   This whole number is then averaged with the multiple finds and then interpolation is used in the result shown.
The method used: using simulation, this program finds (and records) when a pseudo-random number is used (in a large sample size trial) and finds a number of people (group size of independent people) which has been reached that share a common birthday with a probability that exceeds 50%.   This whole number is then averaged with the multiple finds and then interpolation is used in the result shown.
<lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/
<lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/
parse arg gs samp seed . /*get optional arguments from CL */
parse arg grps samp seed . /*get optional arguments from CL */
if gs =='' | gs ==',' then gs=5 /*Not specified? Use the default*/
if grps=='' | grps==',' then grps=5 /*Not specified? Use the default*/
if samp=='' | samp==',' then samp=100000 /*" " " " " */
if samp=='' | samp==',' then samp=100000 /*" " " " " */
if seed\==',' & seed \=='' then call random ,,seed /*repeatability? */
if seed\==',' & seed \=='' then call random ,,seed /*repeatability? */
yl=365+1/4 /*size (number of days) in year. */
!.=0 /*used for tracking pop averages.*/
!.=0 /*used for keeping track of avgs.*/
diy=365 /*or: diy=365.25*/ /*the number of Days In a Year. */
/* [↑] Feb 29th's B-day ≡ Mar 1.*/

do t=1 for samp /*perform SAMP number of trials*/
do k=2 to grps; s=0 /*perform through 2──►group size.*/
do k=1 for gs /*perform through 1──►group size.*/
do samp /*perform some number of trials. */
overH=0 /*require that we need over 50%. */
@.=0 /*initialize birthday occurrences*/
@.=0 /*initialize birthday occurrences*/
do j=1 /*perform until > 50% have common*/
do j=1 /*do until K dup birthdays found.*/
day=random(1,yl*100) % 100 /*expand random number generation*/
day=random(1,diy*100) % 100 /*expand random number generation*/
@.day=@.day+1 /*keep track of a common birthday*/
@.day=@.day+1 /*record the of a particular Bday*/
if @.day==k then do /*when overH=0, have reached 50%.*/
if @.day==k then leave /*when K hits have occurred ...*/
if overH then leave /*we reached > 50%.*/
!.k=!.k+j /*keep track of average pop count*/
overH=1 /*now, keep looking until >50%.*/
end
end /*j*/
end /*j*/
end /*k*/
s=s+j /*add number of people to sum. */
end /*samp*/
end /*samp*/
!.k=s /*save the total number of people*/
end /*k*/


pad=' ' /*padding for easier eyeballing. */
pad=' ' /*padding for easier eyeballing. */
Line 130: Line 127:
say pad '────────' pad '─────' pad '────────────────'
say pad '────────' pad '─────' pad '────────────────'


do g=1 for gs /*show all groups simulated. */
do g=2 to grps /*show all the groups simulated. */
d=!.g/samp /*average common birthday size. */
bias=1/(1+(overH==overH)-(g==1))*100/1 /*bias: used when groupSize=1*/
say pad center(g,8) pad right(d%1,5) pad center((50+(d-d%1)/d)'%',16)
d=!.g/samp /*the average group size. */
say pad center(g,8) pad right(d%1,5) pad center((bias+(d-d%1)/d)'%',16)
end /*g*/
end /*g*/
/*stick a fork in it, we're done.*/</lang>
/*stick a fork in it, we're done.*/</lang>
Line 143: Line 139:
common size common birthdays
common size common birthdays
──────── ───── ────────────────
──────── ───── ────────────────
1 1 100
2 24 50.0274786
2 24 50.0274786
3 88 50.0083309
3 88 50.0083309

Revision as of 20:07, 4 November 2013

This page uses content from Wikipedia. The current wikipedia article is at Birthday Problem. The original RosettaCode article was extracted from the wikipedia article № 296054030 of 21:44, 12 June 2009 . The list of authors can be seen in the page history. As with Rosetta Code, the pre 5 June 2009 text of Wikipedia is available under the GNU FDL. (See links for details on variance)

Birthday problem is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.

In probability theory, the birthday problem, or birthday paradox This is not a paradox in the sense of leading to a logical contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: most people estimate that the chance is much lower than 50%. pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. In a group of at least 23 randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, and it reaches 100% when the number of people reaches 366 (by the pigeon hole principle, ignoring leap years). The mathematics behind this problem leads to a well-known cryptographic attack called the birthday attack.

Task

Using simulation, estimate the number of independent people required in a groups before we can expect a better than even chance that at least 2 independent people in a group share a common birthday. Furthermore: Simulate and thus estimate when we can expect a better than even chance that at least 3, 4 & 5 independent people of the group share a common birthday. For simplicity assume that all of the people are alive...

Suggestions for improvement
  • Estimating the error in the estimate to help ensure the estimate is accurate to 4 decimal places.
  • Converging to the th solution using a root finding method, as opposed to using an extensive search.
  • Kudos (κῦδος) for finding the solution by proof (in a programming language) rather than by construction and simulation.
See also

ALGOL 68

Works with: ALGOL 68 version Revision 1
Works with: ALGOL 68G version Any - tested with release algol68g-2.6.

File: Birthday_problem.a68<lang algol68>#!/usr/bin/a68g --script #

  1. -*- coding: utf-8 -*- #

REAL desired probability := 0.5; # 50% #

REAL upb year = 365 + 1/4 # - 3/400 but alive, ignore those born prior to 1901 #, INT upb sample size = 100 000,

   upb common = 5 ;

FORMAT name int fmt = $g": "g(-0)"; "$,

      name real fmt = $g": "g(-0,4)"; "$,
      name percent fmt = $g": "g(-0,2)"%; "$;

printf((

 name real fmt,
 "upb year",upb year,
 name int fmt,
 "upb common",upb common,
 "upb sample size",upb sample size,
 $l$

));

INT required common := 1; # initial value # FOR group size FROM required common WHILE required common <= upb common DO

 INT sample with no required common := 0;
 TO upb sample size DO
 # generate sample #
   [group size]INT sample;
   FOR i TO UPB sample DO sample[i] := ENTIER(random * upb year) + 1 OD;
   FOR birthday i TO UPB sample DO
     INT birthday = sample[birthday i];
     INT number in common := 1;
   # special case = 1 #
     IF number in common >= required common THEN
       found required common
     FI;
     FOR birthday j FROM birthday i + 1 TO UPB sample DO
       IF birthday = sample[birthday j] THEN
         number in common +:= 1;
         IF number in common >= required common THEN
           found required common
         FI
       FI
     OD
   OD  # days in year #;
   sample with no required common +:= 1;
   found required common: SKIP
 OD # sample size #;
 REAL portion of years with required common birthdays = 
   (upb sample size - sample with no required common) / upb sample size;
 print(".");
 IF portion of years with required common birthdays > desired probability THEN
   printf((
     $l$,
     name int fmt,
     "required common",required common,
     "group size",group size,
     # "sample with no required common",sample with no required common, #
     name percent fmt,
     "%age of years with required common birthdays",portion of years with required common birthdays*100,
     $l$
   ));
   required common +:= 1
 FI

OD # group size #</lang>Output:

upb year: 365.2500; upb common: 5; upb sample size: 100000; 
.
required common: 1; group size: 1; %age of years with required common birthdays: 100.00%; 
......................
required common: 2; group size: 23; %age of years with required common birthdays: 50.71%; 
.................................................................
required common: 3; group size: 88; %age of years with required common birthdays: 50.90%; 
...................................................................................................
required common: 4; group size: 187; %age of years with required common birthdays: 50.25%; 
...............................................................................................................................
required common: 5; group size: 314; %age of years with required common birthdays: 50.66%; 

REXX

The method used: using simulation, this program finds (and records) when a pseudo-random number is used (in a large sample size trial) and finds a number of people (group size of independent people) which has been reached that share a common birthday with a probability that exceeds 50%.   This whole number is then averaged with the multiple finds and then interpolation is used in the result shown. <lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/ parse arg grps samp seed . /*get optional arguments from CL */ if grps== | grps==',' then grps=5 /*Not specified? Use the default*/ if samp== | samp==',' then samp=100000 /*" " " " " */ if seed\==',' & seed \== then call random ,,seed /*repeatability? */ !.=0 /*used for tracking pop averages.*/ diy=365 /*or: diy=365.25*/ /*the number of Days In a Year. */

                                      /* [↑]  Feb 29th's B-day ≡ Mar 1.*/
  do   k=2  to grps;   s=0            /*perform through 2──►group size.*/
    do  samp                          /*perform some number of trials. */
    @.=0                              /*initialize birthday occurrences*/
          do j=1                      /*do until K dup birthdays found.*/
          day=random(1,diy*100) % 100 /*expand random number generation*/
          @.day=@.day+1               /*record the of a particular Bday*/
          if @.day==k  then leave     /*when  K  hits have occurred ...*/
          end   /*j*/
    s=s+j                             /*add number of people to sum.   */
    end         /*samp*/
  !.k=s                               /*save the total number of people*/
  end           /*k*/

pad=' ' /*padding for easier eyeballing. */ say ' sample size is ' samp /*show sample size of this run. */ say say pad 'required' pad 'group' pad '% with required' say pad ' common ' pad ' size' pad 'common birthdays' say pad '────────' pad '─────' pad '────────────────'

  do g=2  to  grps                    /*show all the groups simulated. */
  d=!.g/samp                          /*average common birthday size.  */
  say pad center(g,8) pad  right(d%1,5)  pad center((50+(d-d%1)/d)'%',16)
  end   /*g*/
                                      /*stick a fork in it, we're done.*/</lang>

output

             sample size is  100000

   required    group    %  with required
    common      size    common birthdays
   ────────    ─────    ────────────────
      2           24       50.0274786
      3           88       50.0083309
      4          187       50.0016825
      5          311       50.0020498