Birthday problem
This page uses content from Wikipedia. The current wikipedia article is at Birthday Problem. The original RosettaCode article was extracted from the wikipedia article № 296054030 of 21:44, 12 June 2009 . The list of authors can be seen in the page history. As with Rosetta Code, the pre 5 June 2009 text of Wikipedia is available under the GNU FDL. (See links for details on variance) |
In probability theory, the birthday problem, or birthday paradox This is not a paradox in the sense of leading to a logical contradiction, but is called a paradox because the mathematical truth contradicts naïve intuition: most people estimate that the chance is much lower than 50%. pertains to the probability that in a set of randomly chosen people some pair of them will have the same birthday. In a group of at least 23 randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. For 57 or more people, the probability is more than 99%, and it reaches 100% when the number of people reaches 366 (by the pigeon hole principle, ignoring leap years). The mathematics behind this problem leads to a well-known cryptographic attack called the birthday attack.
- Task
Using simulation, estimate the number of independent people required in a groups before we can expect a better than even chance that at least 2 independent people in a group share a common birthday. Furthermore: Simulate and thus estimate when we can expect a better than even chance that at least 3, 4 & 5 independent people of the group share a common birthday. For simplicity assume that all of the people are alive...
- Suggestions for improvement
- Estimating the error in the estimate to help ensure the estimate is accurate to 4 decimal places.
- Converging to the th solution using a root finding method, as opposed to using an extensive search.
- Kudos (κῦδος) for finding the solution by proof (in a programming language) rather than by construction and simulation.
- See also
ALGOL 68
File: Birthday_problem.a68<lang algol68>#!/usr/bin/a68g --script #
- -*- coding: utf-8 -*- #
REAL desired probability := 0.5; # 50% #
REAL upb year = 365 + 1/4 # - 3/400 but alive, ignore those born prior to 1901 #, INT upb sample size = 100 000,
upb common = 5 ;
FORMAT name int fmt = $g": "g(-0)"; "$,
name real fmt = $g": "g(-0,4)"; "$, name percent fmt = $g": "g(-0,2)"%; "$;
printf((
name real fmt, "upb year",upb year, name int fmt, "upb common",upb common, "upb sample size",upb sample size, $l$
));
INT required common := 1; # initial value # FOR group size FROM required common WHILE required common <= upb common DO
INT sample with no required common := 0; TO upb sample size DO # generate sample # [group size]INT sample; FOR i TO UPB sample DO sample[i] := ENTIER(random * upb year) + 1 OD; FOR birthday i TO UPB sample DO INT birthday = sample[birthday i]; INT number in common := 1; # special case = 1 # IF number in common >= required common THEN found required common FI; FOR birthday j FROM birthday i + 1 TO UPB sample DO IF birthday = sample[birthday j] THEN number in common +:= 1; IF number in common >= required common THEN found required common FI FI OD OD # days in year #; sample with no required common +:= 1; found required common: SKIP OD # sample size #; REAL portion of years with required common birthdays = (upb sample size - sample with no required common) / upb sample size; print("."); IF portion of years with required common birthdays > desired probability THEN printf(( $l$, name int fmt, "required common",required common, "group size",group size, # "sample with no required common",sample with no required common, # name percent fmt, "%age of years with required common birthdays",portion of years with required common birthdays*100, $l$ )); required common +:= 1 FI
OD # group size #</lang>Output:
upb year: 365.2500; upb common: 5; upb sample size: 100000; . required common: 1; group size: 1; %age of years with required common birthdays: 100.00%; ...................... required common: 2; group size: 23; %age of years with required common birthdays: 50.71%; ................................................................. required common: 3; group size: 88; %age of years with required common birthdays: 50.90%; ................................................................................................... required common: 4; group size: 187; %age of years with required common birthdays: 50.25%; ............................................................................................................................... required common: 5; group size: 314; %age of years with required common birthdays: 50.66%;
Lasso
<lang Lasso>if(sys_listunboundmethods !>> 'randomgen') => { define randomgen(len::integer,max::integer)::array => { #len <= 0 ? return local(out = array) loop(#len) => { #out->insert(math_random(#max,1)) } return #out } } if(sys_listunboundmethods !>> 'hasdupe') => { define hasdupe(a::array,threshold::integer) => { with i in #a do => { #a->find(#i)->size > #threshold-1 ? return true }
return false } } local(threshold = 2) local(qty = 22, probability = 0.00, samplesize = 10000) while(#probability < 50.00) => {^ local(dupeqty = 0) loop(#samplesize) => { local(x = randomgen(#qty,365)) hasdupe(#x,#threshold) ? #dupeqty++ } #probability = (#dupeqty / decimal(#samplesize)) * 100
'Threshold: '+#threshold+', qty: '+#qty+' - probability: '+#probability+'\r' #qty += 1 ^}</lang>
- Output:
Threshold: 2, qty: 22 - probability: 47.810000 Threshold: 2, qty: 23 - probability: 51.070000 Threshold: 3, qty: 86 - probability: 48.400000 Threshold: 3, qty: 87 - probability: 49.200000 Threshold: 3, qty: 88 - probability: 52.900000 Threshold: 4, qty: 184 - probability: 48.000000 Threshold: 4, qty: 185 - probability: 49.800000 Threshold: 4, qty: 186 - probability: 49.600000 Threshold: 4, qty: 187 - probability: 48.900000 Threshold: 4, qty: 188 - probability: 50.700000 Threshold: 5, qty: 308 - probability: 48.130000 Threshold: 5, qty: 309 - probability: 48.430000 Threshold: 5, qty: 310 - probability: 48.640000 Threshold: 5, qty: 311 - probability: 49.370000 Threshold: 5, qty: 312 - probability: 49.180000 Threshold: 5, qty: 313 - probability: 49.540000 Threshold: 5, qty: 314 - probability: 50.000000
PL/I
<lang PL/I>*process source attributes xref;
bd: Proc Options(main); /*-------------------------------------------------------------------- * 04.11.2013 Walter Pachl * Take samp samples of groups with gs persons and check *how many of the groups have at least match persons with same birthday *-------------------------------------------------------------------*/ Dcl (float,random) Builtin; Dcl samp Bin Fixed(31) Init(1000000); Dcl arr(0:366) Bin Fixed(31); Dcl r Bin fixed(31); Dcl i Bin fixed(31); Dcl ok Bin fixed(31); Dcl g Bin fixed(31); Dcl gs Bin fixed(31); Dcl match Bin fixed(31); Dcl cnt(0:1) Bin Fixed(31); Dcl lo(6) Bin Fixed(31) Init(0,21,85,185,311,458); Dcl hi(6) Bin Fixed(31) Init(0,25,89,189,315,462); Dcl rf Bin Float(63); Dcl hits Bin Float(63); Dcl arrow Char(3); Do match=2 To 6; Put Edit(' ')(Skip,a); Put Edit(samp,' samples. Percentage of groups with at least', match,' matches')(Skip,f(8),a,f(2),a); Put Edit('Group size')(Skip,a); Do gs=lo(match) To hi(match); cnt=0; Do i=1 To samp; ok=0; arr=0; Do g=1 To gs; rf=random(); r=rf*365+1; arr(r)+=1; If arr(r)=match Then Do; /* Put Edit(r)(Skip,f(4));*/ ok=1; End; End; cnt(ok)+=1; End; hits=float(cnt(1))/samp; If hits>=.5 Then arrow=' <-'; Else arrow=; Put Edit(gs,cnt(0),cnt(1),100*hits,'%',arrow) (Skip,f(10),2(f(7)),f(8,3),a,a); End; End; End;</lang>
Output:
1000000 samples. Percentage of groups with at least 2 matches Group size 3000000 500000 samples 21 556903 443097 44.310% 44.343% 44.347% 22 524741 475259 47.526% 47.549% 47.521% 23 492034 507966 50.797% <- 50.735% <- 50.722% <- 24 462172 537828 53.783% <- 53.815% <- 53.838% <- 25 431507 568493 56.849% <- 56.849% <- 56.842% <- 1000000 samples. Percentage of groups with at least 3 matches Group size 85 523287 476713 47.671% 47.638% 47.631% 86 512219 487781 48.778% 48.776% 48.821% 87 499874 500126 50.013% <- 49.902% 49.903% 88 488197 511803 51.180% <- 51.127% <- 51.096% <- 89 478044 521956 52.196% <- 52.263% <- 52.290% <- 1000000 samples. Percentage of groups with at least 4 matches Group size 185 511352 488648 48.865% 48.868% 48.921% 186 503888 496112 49.611% 49.601% 49.568% 187 497844 502156 50.216% <- 50.258% <- 50.297% <- 188 490490 509510 50.951% <- 50.916% <- 50.946% <- 189 482893 517107 51.711% <- 51.645% <- 51.655% <- 1000000 samples. Percentage of groups with at least 5 matches Group size 311 508743 491257 49.126% 49.158% 49.164% 312 503524 496476 49.648% 49.631% 49.596% 313 498244 501756 50.176% <- 50.139% <- 50.095% <- 314 494032 505968 50.597% <- 50.636% <- 50.586% <- 315 489821 510179 51.018% <- 51.107% <- 51.114% <- 1000000 samples. Percentage of groups with at least 6 matches Group size 458 505225 494775 49.478% 49.498% 49.512% 459 501871 498129 49.813% 49.893% 49.885% 460 497719 502281 50.228% <- 50.278% <- 50.248% <- 461 493948 506052 50.605% <- 50.622% <- 50.626% <- 462 489416 510584 51.058% <- 51.029% <- 51.055% <-
extended to verify REXX results:
1000000 samples. Percentage of groups with at least 7 matches Group size 621 503758 496242 49.624% 622 500320 499680 49.968% 623 497047 502953 50.295% <- 624 493679 506321 50.632% <- 625 491240 508760 50.876% <- 1000000 samples. Percentage of groups with at least 8 matches Group size 796 504764 495236 49.524% 797 502537 497463 49.746% 798 499488 500512 50.051% <- 799 496658 503342 50.334% <- 800 494773 505227 50.523% <- 1000000 samples. Percentage of groups with at least 9 matches Group size 983 502613 497387 49.739% 984 501665 498335 49.834% 985 498606 501394 50.139% <- 986 497453 502547 50.255% <- 987 493816 506184 50.618% <- 1000000 samples. Percentage of groups with at least10 matches Group size 1179 502910 497090 49.709% 1180 500906 499094 49.909% 1181 499079 500921 50.092% <- 1182 496957 503043 50.304% <- 1183 494414 505586 50.559% <-
Python
Note: the first (unused), version of function equal_birthdays() uses a different but equally valid interpretation of the phrase "common birthday". <lang python> from random import randint
def equal_birthdays(sharers=2, groupsize=23, rep=100000):
'Note: 4 sharing common birthday may have 2 dates shared between two people each' g = range(groupsize) sh = sharers - 1 eq = sum((groupsize - len(set(randint(1,365) for i in g)) >= sh) for j in range(rep)) return (eq * 100.) / rep
def equal_birthdays(sharers=2, groupsize=23, rep=100000):
'Note: 4 sharing common birthday must all share same common day' g = range(groupsize) sh = sharers - 1 eq = 0 for j in range(rep): group = [randint(1,365) for i in g] if (groupsize - len(set(group)) >= sh and any( group.count(member) >= sharers for member in set(group))): eq += 1 return (eq * 100.) / rep
group_est = [2] for sharers in (2, 3, 4, 5):
groupsize = group_est[-1]+1 while equal_birthdays(sharers, groupsize, 100) < 50.: # Coarse groupsize += 1 for groupsize in range(int(groupsize - (groupsize - group_est[-1])/4.), groupsize + 999): # Finer eq = equal_birthdays(sharers, groupsize, 250) if eq > 50.: break for groupsize in range(groupsize - 1, groupsize +999): # Finest eq = equal_birthdays(sharers, groupsize, 50000) if eq > 50.: break group_est.append(groupsize) print("%i independent people in a group of %s share a common birthday. (%5.1f)" % (sharers, groupsize, eq))</lang>
- Output:
2 independent people in a group of 23 share a common birthday. ( 50.9) 3 independent people in a group of 87 share a common birthday. ( 50.0) 4 independent people in a group of 188 share a common birthday. ( 50.9) 5 independent people in a group of 314 share a common birthday. ( 50.6)
REXX
The method used is to find the average number of people to share a common birthday, and then use the floor of that
value (less the group size) as a starting point to find a new group size with an expected size that exceeds 50%
common birthdays of the required size.
version 1
<lang rexx>/*REXX programs solves "birthday problem" via random number simulations.*/ parse arg grps samp seed . /*get optional arguments from CL.*/ if grps== | grps==',' then grps=5 /*Not specified? Use the default*/ if samp== | samp==',' then samp=100000 /*" " " " " */ if seed\==',' & seed \== then call random ,,seed /*repeatability? */ diy =365 /*or: diy=365.25*/ /*the number of Days In a Year. */ diyM=diy*100 /*this expands the RANDOM range. */
/* [↓] get a rough estimate for %*/ do g=2 to grps; s=0 /*perform through 2──►group size.*/ do samp; @.=0 /*perform some number of trials. */ do j=1 /*do until G dup birthdays found.*/ day=random(1,diyM) % 100 /*expand random number generation*/ @.day=@.day+1 /*record the # common birthdays.*/ if @.day==g then leave /*'nuff G hits have occurred ··· */ end /*j*/ s=s+j /*add # of birthday hits to sum. */ end /*samp*/
start.g=s/samp%1-g /*define where the try-outs start*/ end /*g*/
___=' ' /*padding for easier eyeballing. */ say ___ ___ ___ 'sample size is ' samp /*show sample size of this run. */ say say ___ 'required' ___ 'group' ___ '% with required' say ___ ' common ' ___ ' size' ___ 'common birthdays' say ___ '────────' ___ '─────' ___ '────────────────'
/* [↓] where the try-outs happen.*/ do g=2 to grps /*perform through 2──►group size.*/ do try=start.g; s=0 /*perform try-outs until avg>50%.*/ do samp; @.=0 /*perform some number of trials. */ do j=1 for try /*do until K dup birthdays found.*/ day=random(1,diyM) % 100 /*expand random number generation*/ @.day=@.day+1 /*record the # common birthdays.*/ if @.day\==g then iterate /*not 'nuff G hits occurred? ··· */ s=s+1 /*another common birthday found. */ leave /* ··· and stop looking for more.*/ end /*j*/ end /*samp*/
if s/samp>.5 then leave /*if the average is > 50%, stop. */ end /*try*/
say ___ center(g,8) ___ right(try,5) ___ center((s/samp*100)'%',16) end /*g*/ /*stick a fork in it, we're done.*/</lang>
output when using the input of: 10
sample size is 100000 required group % with required common size common birthdays ──────── ───── ──────────────── 2 23 50.70900% 3 88 51.23200% 4 187 50.15100% 5 314 50.77800% 6 460 50.00600% 7 623 50.64800% 8 798 50.00700% 9 985 50.13400% 10 1181 50.22200%
version 2
<lang rexx> /*--------------------------------------------------------------------
* 04.11.2013 Walter Pachl translated from PL/I * Take samp samples of groups with gs persons and check *how many of the groups have at least match persons with same birthday *-------------------------------------------------------------------*/ samp=100000 lo='0 21 85 185 311 458' hi='0 25 89 189 315 462' Do match=2 To 6 Say ' ' Say samp' samples . Percentage of groups with at least', match ' matches' Say 'Group size' Do gs=word(lo,match) To word(hi,match) cnt.=0 Do i=1 To samp ok=0 arr.=0 Do g=1 To gs r=random(1,365) arr.r=arr.r+1 If arr.r=match Then ok=1 End cnt.ok=cnt.ok+1 End hits=cnt.1/samp If hits>=.5 Then arrow=' <-' Else arrow= Say format(gs,10) cnt.0 cnt.1 100*hits||'%'||arrow End End</lang>
Output:
100000 samples . Percentage of groups with at least 2 matches Group size 21 55737 44263 44.26300% 22 52158 47842 47.84200% 23 49141 50859 50.85900% <- 24 46227 53773 53.77300% <- 25 43091 56909 56.90900% <- 100000 samples . Percentage of groups with at least 3 matches Group size 85 52193 47807 47.80700% 86 51489 48511 48.51100% 87 50146 49854 49.85400% 88 48790 51210 51.2100% <- 89 47771 52229 52.22900% <- 100000 samples . Percentage of groups with at least 4 matches Group size 185 50930 49070 49.0700% 186 50506 49494 49.49400% 187 49739 50261 50.26100% <- 188 49024 50976 50.97600% <- 189 48283 51717 51.71700% <- 100000 samples . Percentage of groups with at least 5 matches Group size 311 50909 49091 49.09100% 312 50441 49559 49.55900% 313 49912 50088 50.08800% <- 314 49425 50575 50.57500% <- 315 48930 51070 51.0700% <- 100000 samples . Percentage of groups with at least 6 matches Group size 458 50580 49420 49.4200% 459 49848 50152 50.15200% <- 460 49975 50025 50.02500% <- 461 49316 50684 50.68400% <- 462 49121 50879 50.87900% <-