One of n lines in a file
![Task](http://static.miraheze.org/rosettacodewiki/thumb/b/ba/Rcode-button-task-crushed.png/64px-Rcode-button-task-crushed.png)
You are encouraged to solve this task according to the task description, using any language you may know.
A method of choosing a line randomly from a file:
- Without reading the file more than once
- When substantial parts of the file cannot be held in memory
- Without knowing how many lines are in the file
Is to:
- keep the first line of the file as a possible choice, then
- Read the second line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/2.
- Read the third line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/3.
- ...
- Read the Nth line of the file if possible and make it the possible choice if a uniform random value between zero and one is less than 1/N
- Return the computed possible choice when no further lines exist in the file.
- Task
- Create a function/method/routine called
one_of_n
that givenn
, the number of actual lines in a file, follows the algorithm above to return an integer - the line number of the line chosen from the file.
The number returned can vary, randomly, in each run. - Use
one_of_n
in a simulation to find what woud be the chosen line of a 10 line file simulated 1,000,000 times. - Print and show how many times each of the 10 lines is chosen as a rough measure of how well the algorithm works.
Note: You may choose a smaller number of repetitions if necessary, but mention this up-front.
Ada
<lang Ada>with Ada.Text_IO, Ada.Numerics.Float_Random;
procedure One_Of_N is
Num_Of_Lines: constant Positive := 10;
package Rnd renames Ada.Numerics.Float_Random; Gen: Rnd.Generator; -- used globally
function Choose_One_Of_N(Last_Line_Number: Positive) return Natural is Current_Choice: Natural := 0; begin for Line_Number in 1 .. Last_Line_Number loop if (Rnd.Random(Gen) * Float(Line_Number) <= 1.0) then Current_Choice := Line_Number; end if; end loop; return Current_Choice; end Choose_One_Of_N;
Results: array(1 .. Num_Of_Lines) of Natural := (others => 0); Index: Integer range 1 .. Num_Of_Lines;
begin
Rnd.Reset(Gen); for I in 1 .. 1_000_000 loop -- compute results Index := Choose_One_Of_N(Num_Of_Lines); Results(Index) := Results(Index) + 1; end loop;
for R in Results'Range loop -- output results Ada.Text_IO.Put(Integer'Image(Results(R))); end loop;
end One_Of_N;</lang>
Example output:
100104 100075 99761 99851 100457 100315 100101 99557 99678 100101
AutoHotkey
This simulation is for 100,000 repetitions. <lang AutoHotkey>one_of_n(n){
; One based line numbers choice = 1 Loop % n-1 { Random, rnd, 1, % A_Index+1 If rnd = 1 choice := A_Index+1 } return choice
} one_of_n_test(n=10, trials=100000){
bins := [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] Loop % trials bins[one_of_n(n)] += 1 return bins
}
b := one_of_n_test() Loop 10
out .= A_Index ": " b[A_Index] "`n"
MsgBox % out</lang> Output:
--------------------------- One of n.ahk --------------------------- 1: 10046 2: 9958 3: 9953 4: 9973 5: 9915 6: 10212 7: 9941 8: 9993 9: 10063 10: 9946 --------------------------- OK ---------------------------
C
<lang c>#include <stdio.h>
- include <stdlib.h>
inline int irand(int n) { int r, randmax = RAND_MAX/n * n; while ((r = rand()) >= randmax); return r / (randmax / n); }
inline int one_of_n(int n) { int i, r = 0; for (i = 1; i < n; i++) if (!irand(i + 1)) r = i; return r; }
int main(void) { int i, r[10] = {0};
for (i = 0; i < 1000000; i++, r[one_of_n(10)]++); for (i = 0; i < 10; i++) printf("%d%c", r[i], i == 9 ? '\n':' ');
return 0; }</lang>output
100561 99814 99816 99721 99244 99772 100790 100072 99997 100213
Euphoria
<lang Euphoria>-- One of n lines in a file include std/rand.e include std/math.e
function one_of_n(integer n) integer line_num = 1 for i = 2 to n do if rnd() < 1 / i then line_num = i end if end for return line_num end function
procedure main() integer num_reps = 1000000, num_lines_in_file = 10 sequence lines = repeat(0,num_lines_in_file) for i = 1 to num_reps do lines[one_of_n(num_lines_in_file)] += 1 end for for i = 1 to num_lines_in_file do printf(1,"Number of times line %d was selected: %g\n", {i,lines[i]}) end for printf(1,"Total number selected: %d\n", sum(lines) ) end procedure
main() </lang> Sample Output:
Number of times line 1 was selected: 100154 Number of times line 2 was selected: 99778 Number of times line 3 was selected: 99906 Number of times line 4 was selected: 99727 Number of times line 5 was selected: 100025 Number of times line 6 was selected: 100465 Number of times line 7 was selected: 99738 Number of times line 8 was selected: 100135 Number of times line 9 was selected: 99871 Number of times line 10 was selected: 100201 Total number selected: 1000000
Factor
random-line uses an input stream. "/etc/passwd" ascii [ random-line . ] with-file-reader
would print a random line from /etc/passwd.
<lang factor>! rosettacode/random-line/random-line.factor USING: io kernel locals math random ; IN: rosettacode.random-line
- random-line ( -- line )
readln :> choice! 1 :> count! [ readln dup ] [ count 1 + dup count! random zero? [ choice! ] [ drop ] if ] while drop choice ;</lang>
one-of-n wants to use the same algorithm. Factor has duck typing, so one-of-n creates a mock object that quacks like an input stream. This mock object only responds to stream-readln, not the other methods of stream protocol. This works because random-line only needs stream-readln. The mock response is a line number instead of a real line.
<lang factor>! rosettacode/one-of-n/one-of-n.factor USING: accessors io kernel math rosettacode.random-line ; IN: rosettacode.one-of-n
<PRIVATE TUPLE: mock-stream count last ;
- <mock-stream> ( n -- stream )
mock-stream new 0 >>count swap >>last ;
M: mock-stream stream-readln ! stream -- line
dup [ count>> ] [ last>> ] bi < [ [ 1 + ] change-count count>> ] [ drop f ] if ;
PRIVATE>
- one-of-n ( n -- line )
<mock-stream> [ random-line ] with-input-stream* ;
USING: assocs formatting locals sequences sorting ; <PRIVATE
- f>0 ( object/f -- object/0 )
dup [ drop 0 ] unless ;
- test-one-of-n ( -- )
H{ } clone :> chosen 1000000 [ 10 one-of-n chosen [ f>0 1 + ] change-at ] times chosen keys natural-sort [ dup chosen at "%d chosen %d times\n" printf ] each ;
PRIVATE> MAIN: test-one-of-n</lang>
$ ./factor -run=rosettacode.one-of-n Loading resource:work/rosettacode/one-of-n/one-of-n.factor Loading resource:work/rosettacode/random-line/random-line.factor Loading resource:basis/formatting/formatting.factor Loading resource:basis/formatting/formatting-docs.factor 1 chosen 100497 times 2 chosen 100157 times 3 chosen 100207 times 4 chosen 99448 times 5 chosen 100533 times 6 chosen 99774 times 7 chosen 99535 times 8 chosen 99826 times 9 chosen 100058 times 10 chosen 99965 times
Go
<lang go>package main
import (
"bufio" "fmt" "io" "math/rand" "time"
)
// choseLineRandomly implements the method described in the task. // input is a an io.Reader, which could be an os.File, for example. // Or, to implement a simulation, it could be anything else that implements // io.Reader. The method as described suggests saving and returning // lines, but the rest of the task requires line numbers. This function // thus returns both. func choseLineRandomly(r io.Reader) (s string, ln int, err error) {
br := bufio.NewReader(r) s, err = br.ReadString('\n') if err != nil { return } ln = 1 lnLast := 1. var sLast string for { // note bufio.ReadString used here. This effectively defines a // line of the file as zero or more bytes followed by a newline. sLast, err = br.ReadString('\n') if err == io.EOF { return s, ln, nil // normal return } if err != nil { break } lnLast++ if rand.Float64() < 1/lnLast { s = sLast ln = int(lnLast) } } return // error return
}
// oneOfN function required for task item 1. Specified to take a number // n, the number of lines in a file, but the method (above) specified to // to be used does not need n, but rather the file itself. This function // thus takes both, ignoring n and passing the file to choseLineRandomly. func oneOfN(n int, file io.Reader) int {
_, ln, err := choseLineRandomly(file) if err != nil { panic(err) } return ln
}
// simulated file reader for task item 2 type simReader int
func (r *simReader) Read(b []byte) (int, error) {
if *r <= 0 { return 0, io.EOF } b[0] = '\n' *r-- return 1, nil
}
func main() {
// task item 2 simulation consists of accumulating frequency statistic // on 1,000,000 calls of oneOfN on simulated file. n := 10 freq := make([]int, n) rand.Seed(time.Now().UnixNano()) for times := 0; times < 1e6; times++ { sr := simReader(n) freq[oneOfN(n, &sr)-1]++ }
// task item 3. show frequencies. fmt.Println(freq)
}</lang> Output:
[99945 99770 99594 100532 99941 100223 99716 100217 99855 100207]
Icon and Unicon
<lang Icon>procedure main() # one of n
one_of_n_test(10,1000000)
end
procedure one_of_n(n)
every i := 1 to n do choice := (?0 < 1. / i, i) return \choice | fail
end
procedure one_of_n_test(n,trials)
bins := table(0) every i := 1 to trials do bins[one_of_n(n)] +:= 1 every writes(bins[i := 1 to n]," ") return bins
end</lang>
Sample output:
99470 99806 99757 99921 100213 100001 99778 100385 100081 100588
J
This implementation also implements line buffering, since the built-in line handling does not work quite how I want it to work. That said, if a line is too large (many gigabytes, for example), the system will crawl to a halt when the line is read.
<lang j>randLineBig=:3 :0
file=. boxopen y r=. n=. 1 size=. fsize file blocksize=. 1e7 buffer=. for_block. |: blocksize -~/\@(] <. [ * 0 1 +/i.@>.@%~) size do. buffer=. buffer, fread file,<block linends=. LF = buffer lines=. linends <;.2 buffer buffer=. buffer }.~ {: 1+I.linends pick=. (0 ?@$~ #lines) < % n+i.#lines if. 1 e. pick do. r=. ({:I.pick) {:: lines end. n=. n+#lines end. r
)</lang>
Usage: randLineBig 'filename'
Testing:
<lang j> (,LF,.~":,.i.10) fwrite <'seq.txt' 20
(#;~.)/.~ /:~ <@randLineBig"0]1e6#<'seq.txt'
┌──────┬───┐ │99916 │0 │ ├──────┼───┤ │99944 │1 │ ├──────┼───┤ │100250│2 │ ├──────┼───┤ │100621│3 │ ├──────┼───┤ │99594 │4 │ ├──────┼───┤ │100106│5 │ ├──────┼───┤ │99957 │6 │ ├──────┼───┤ │99975 │7 │ ├──────┼───┤ │100054│8 │ ├──────┼───┤ │99583 │9 │ └──────┴───┘</lang>
Java
<lang Java>import java.util.Arrays; import java.util.Random;
public class OneOfNLines {
static Random rand;
public static int oneOfN(int n) { int choice = 0;
for(int i = 1; i < n; i++) { if(rand.nextInt(i+1) == 0) choice = i; }
return choice; }
public static void main(String[] args) { int n = 10; int trials = 1000000; int[] bins = new int[n]; rand = new Random();
for(int i = 0; i < trials; i++) bins[oneOfN(n)]++;
System.out.println(Arrays.toString(bins));
}
}
</lang>
Sample output:
[99832, 99958, 100281, 99601, 99568, 99689, 100118, 99753, 100659, 100541]
OCaml
<lang ocaml>let one_of_n n =
let rec aux i r = if i >= n then r else if Random.int (i + 1) = 0 then aux (succ i) i else aux (succ i) r in aux 1 0
let test ~n ~trials =
let ar = Array.make n 0 in for i = 1 to trials do let d = one_of_n n in ar.(d) <- succ ar.(d) done; Array.iter (Printf.printf " %d") ar; print_newline ()
let () =
Random.self_init (); test ~n:10 ~trials:1_000_000</lang>
Executing:
$ ocamlopt -o one.opt one.ml $ ./one.opt 100620 99719 99928 99864 99760 100151 99553 100529 99800 100076
Perl 6
<lang perl6>sub one_of_n($n) {
my $choice; $choice = $_ if .rand < 1 for 1 .. $n; $choice - 1;
}
sub one_of_n_test($n = 10, $trials = 1000000) {
my @bins; @bins[one_of_n($n)]++ for ^$trials; @bins;
}
say one_of_n_test();</lang> Output:
100288 100047 99660 99773 100256 99633 100161 100483 99789 99910
PicoLisp
<lang PicoLisp>(de one-of-n (N)
(let R 1 (for I N (when (= 1 (rand 1 I)) (setq R I) ) ) R ) )
(let L (need 10 0)
(do 1000000 (inc (nth L (one-of-n 10))) ) L )</lang>
Output:
-> (99893 100145 99532 100400 100263 100229 99732 100116 99709 99981)
PureBasic
<lang purebasic>Procedure.f randomFloat()
ProcedureReturn Random(1000000) / 1000000
EndProcedure
Procedure one_of_n(n)
Protected linesRead, lineChosen While linesRead < n linesRead + 1 If randomFloat() <= (1.0 / (linesRead)) lineChosen = linesRead EndIf Wend ProcedureReturn lineChosen
EndProcedure
If OpenConsole()
#testFileLineCount = 10 #simulationCount = 1000000 Define i Dim a(#testFileLineCount) ;index 0 is not used For i = 1 To #simulationCount x = one_of_n(#testFileLineCount) a(x) + 1 Next For i = 1 To #testFileLineCount Print(Str(a(i)) + " ") Next PrintN("") Print(#CRLF$ + #CRLF$ + "Press ENTER to exit"): Input() CloseConsole()
EndIf</lang> Sample output:
99959 100011 100682 100060 99834 99632 100083 99817 99824 100098
Python
<lang python>from random import randrange
def one_of_n(n):
# Zero based line numbers choice = 0 for i in range(1, n): if randrange(i+1) == 0: choice = i return choice
def one_of_n_test(n=10, trials=1000000):
bins = [0] * n if n: for i in range(trials): bins[one_of_n(n)] += 1 return bins
print(one_of_n_test())</lang>
- Sample output
[99833, 100303, 99902, 100132, 99608, 100117, 99531, 100017, 99795, 100762]
Ruby
<lang ruby># Returns a random line from _io_, or nil if _io_ has no lines.
- # Get a random line from /etc/passwd
- line = open("/etc/passwd") {|f| random_line(f) }
def random_line(io)
choice = io.gets; count = 1 while line = io.gets rand(count += 1).zero? and choice = line end choice
end
def one_of_n(n)
# Create a mock IO that provides line numbers instead of lines. # Assumes that #random_line calls #gets. (mock_io = Object.new).instance_eval do @count = 0 @last = n def self.gets (@count < @last) ? (@count += 1) : nil end end random_line(mock_io)
end
chosen = Hash.new(0) 1_000_000.times { chosen[one_of_n(10)] += 1 } chosen.keys.sort.each do |key|
puts "#{key} chosen #{chosen[key]} times"
end</lang>
$ ruby one-of-n.rb 1 chosen 100470 times 2 chosen 100172 times 3 chosen 100473 times 4 chosen 99725 times 5 chosen 100600 times 6 chosen 99126 times 7 chosen 100297 times 8 chosen 99606 times 9 chosen 100039 times 10 chosen 99492 times
Seed7
<lang seed7>$ include "seed7_05.s7i";
const func integer: one_of_n (in integer: n) is func
result var integer: r is 1; local var integer: i is 0; begin for i range 2 to n do if rand(1, i) = 1 then r := i; end if; end for; end func;
const proc: main is func
local var array integer: r is 10 times 0; var integer: i is 0; begin for i range 1 to 1000000 do incr(r[one_of_n(10)]); end for; for i range 1 to 10 do write(r[i] <& " "); end for; writeln; end func;</lang>
Output:
100372 99661 100264 99644 100180 99748 99718 100205 99714 100494
Tcl
<lang tcl>package require Tcl 8.5 proc 1ofN {n} {
for {set line 1} {$line <= $n} {incr line} {
if {rand() < 1.0/[incr fraction]} { set result $line }
} return $result
}
for {set i 0} {$i < 1000000} {incr i} {
incr count([1ofN 10])
} parray count; # Alphabetic order, but convenient</lang> Sample output:
count(1) = 99862 count(10) = 100517 count(2) = 100545 count(3) = 100339 count(4) = 99636 count(5) = 99920 count(6) = 99263 count(7) = 100283 count(8) = 99871 count(9) = 99764