One of n lines in a file: Difference between revisions

Revision as of 23:22, 7 September 2011

A method of choosing a line at random from a file is to:

Keep the first line of the file as the possible choice.
Read the second line of the file, if there is one, and make it the possible choice if a uniform random value between zero and one is less than 1/2.
Read the third line of the file, if there is one, and make it the possible choice if a uniform random value between zero and one is less than 1/3.

...

Read the Nth line of the file, if there is one, and make it the possible choice if a uniform random value between zero and one is less than 1/N.

When there are no more lines to read, the current possible choice is the chosen line.
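The steps above can be sketched against real input rather than a line count. The following is a minimal illustration, not part of the task: the name `choose_line` and its `rand` parameter are assumptions made for this example. It reads any iterable of lines once and never needs to know N in advance. Line k ends up as the final choice with probability (1/k) × (k/(k+1)) × ... × ((N-1)/N) = 1/N, which is why every line is equally likely.

```python
import io
import random


def choose_line(lines, rand=random.random):
    """Pick one line uniformly from an iterable of lines in a single pass.

    Hypothetical helper for illustration. Returns a (line_number, text)
    tuple with 1-based line numbers, or None if the input is empty.
    """
    choice = None
    for i, line in enumerate(lines, start=1):
        # Line i replaces the current choice with probability 1/i.
        # For i == 1 the test always passes, since rand() is below 1.
        if rand() < 1.0 / i:
            choice = (i, line.rstrip("\n"))
    return choice


if __name__ == "__main__":
    # io.StringIO stands in for an open text file here.
    sample = io.StringIO("alpha\nbeta\ngamma\n")
    print(choose_line(sample))
```

Passing an open file object instead of the `io.StringIO` stand-in should work the same way, because both yield one line per iteration.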

Create a function/method/routine called one_of_n that, given n, the number of actual lines in a file, follows the algorithm above to return an integer: the line number of the line chosen from the file.
The number returned can vary, randomly, in each run.
Use one_of_n in a simulation to find what would be the chosen line of a 10-line file simulated 1,000,000 times.
Print and show how many times each of the 10 lines is chosen as a rough measure of how well the algorithm works.

Note: You may choose a smaller number of repetitions if necessary, but mention this up-front.

Translation of: Python

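<lang perl6>sub one_of_n($n) {
    my $choice;
    # Line $_ (1-based) replaces the current choice with probability 1/$_.
    $choice = $_ if rand * $_ < 1 for 1 .. $n;
    $choice - 1;    # return a zero-based line number
}

sub one_of_n_test($n = 10, $trials = 1000000) {
    my @bins;
    # Count how often each line number is chosen.
    @bins[one_of_n($n)]++ for ^$trials;
    @bins;
}

say one_of_n_test();</lang>
Output: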

100288 100047 99660 99773 100256 99633 100161 100483 99789 99910

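<lang python>from random import random as rnd

def one_of_n(n):
    # Zero-based line numbers: line 0 is the initial choice
    choice = 0
    for i in range(1, n):
        # Line i is the (i + 1)th line, so it replaces the choice
        # with probability 1 / (i + 1)
        if rnd() < 1. / (i + 1.):
            choice = i
    return choice

def one_of_n_test(n=10, trials=1000000):
    # bins[k] counts how many times line k was chosen
    bins = [0] * n
    if n:
        for i in range(trials):
            bins[one_of_n(n)] += 1
    return bins

print(one_of_n_test())</lang>
Output: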

[99833, 100303, 99902, 100132, 99608, 100117, 99531, 100017, 99795, 100762]
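To make the "rough measure" more concrete, the Python counts above can be compared with what a fair choice would predict. The sketch below is an illustration only; the function name `spread` is an assumption made for this example. Under a uniform choice each count is binomial, so it should sit near trials/n with a standard deviation of sqrt(trials × p × (1 − p)), where p = 1/n.

```python
import math

# Counts copied from the Python output shown above.
counts = [99833, 100303, 99902, 100132, 99608,
          100117, 99531, 100017, 99795, 100762]


def spread(counts):
    """Return (expected, sigma, worst_z, chi2) assuming a uniform choice.

    Hypothetical helper for illustration only.
    """
    trials = sum(counts)
    n = len(counts)
    expected = trials / n
    # Binomial standard deviation of a single bin, with p = 1/n.
    sigma = math.sqrt(expected * (1.0 - 1.0 / n))
    # Largest deviation from the expected count, in units of sigma.
    worst_z = max(abs(c - expected) for c in counts) / sigma
    # Pearson chi-square statistic with n - 1 degrees of freedom.
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return expected, sigma, worst_z, chi2


expected, sigma, worst_z, chi2 = spread(counts)
print(f"expected per line: {expected:.0f}, sigma: {sigma:.0f}")
print(f"largest deviation: {worst_z:.2f} sigma, chi-square: {chi2:.2f}")
```

For these counts the expected value is 100,000 per line with a standard deviation of about 300. The largest deviation is roughly 2.5 standard deviations, and the chi-square statistic is about 11.6 on 9 degrees of freedom, below the usual 5% critical value of about 16.92, so this run gives no evidence against a uniform choice.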