Knuth's algorithm S

Jump to: navigation, search
Knuth's algorithm S
You are encouraged to solve this task according to the task description, using any language you may know.

This is a method of randomly sampling n items from a set of M items, with equal probability; where M >= n and M, the number of items is unknown until the end. This means that the equal probability sampling should be maintained for all successive items > n as they become available (although the content of successive samples can change).

The algorithm
1. Select the first n items as the sample as they become available;
2. For the i-th item where i > n, have a random chance of n/i of keeping it. If failing this chance, the sample remains the same. If not, have it randomly (1/n) replace one of the previously selected n items of the sample.
3. Repeat #2 for any subsequent items.
The Task
1. Create a function `s_of_n_creator` that given n the maximum sample size, returns a function `s_of_n` that takes one parameter, `item`.
2. Function `s_of_n` when called with successive items returns an equi-weighted random sample of up to n of its items so far, each time it is called, calculated using Knuths Algorithm S.
3. Test your functions by printing and showing the frequency of occurrences of the selected digits from 100,000 repetitions of:
1. Use the s_of_n_creator with n == 3 to generate an s_of_n.
2. call s_of_n with each of the digits 0 to 9 in order, keeping the returned three digits of its random sampling from its last call with argument item=9.

Note: A class taking n and generating a callable instance/function might also be used.

Reference
• The Art of Computer Programming, Vol 2, 3.4.2 p.142
Cf.

Ada

Instead of defining a function S_of_N_Creator, we define a generic packgage with that name. The generic parameters are N (=Sample_Size) and the type of the items to be sampled:

`generic   Sample_Size: Positive;   type Item_Type is private;package S_Of_N_Creator is    subtype Index_Type is Positive range 1 .. Sample_Size;   type Item_Array is array (Index_Type) of Item_Type;    procedure Update(New_Item: Item_Type);   function Result return Item_Array; end S_Of_N_Creator;`

Here is the implementation of that package:

`with Ada.Numerics.Float_Random, Ada.Numerics.Discrete_Random; package body S_Of_N_Creator is    package F_Rnd renames Ada.Numerics.Float_Random;   F_Gen: F_Rnd.Generator;    package D_Rnd is new Ada.Numerics.Discrete_Random(Index_Type);   D_Gen: D_Rnd.Generator;    Item_Count: Natural := 0; -- this is a global counter   Sample: Item_Array; -- also used globally    procedure Update(New_Item: Item_Type) is   begin      Item_Count := Item_Count + 1;      if Item_Count <= Sample_Size then         -- select the first Sample_Size items as the sample         Sample(Item_Count) := New_Item;      else         -- for I-th item, I > Sample_Size: Sample_Size/I chance of keeping it         if (Float(Sample_Size)/Float(Item_Count)) > F_Rnd.Random(F_Gen)  then            -- randomly (1/Sample_Size) replace one of the items of the sample            Sample(D_Rnd.Random(D_Gen)) := New_Item;         end if;      end if;   end Update;    function Result return Item_Array is   begin      Item_Count := 0; -- ready to start another run      return Sample;   end Result; begin   D_Rnd.Reset(D_Gen); -- at package instantiation, initialize rnd-generators   F_Rnd.Reset(F_Gen);end S_Of_N_Creator;`

The main program:

`with S_Of_N_Creator, Ada.Text_IO; procedure Test_S_Of_N is    Repetitions: constant Positive := 100_000;   type D_10 is range 0 .. 9;    -- the instantiation of the generic package S_Of_N_Creator generates    -- a package with the desired functionality   package S_Of_3 is new S_Of_N_Creator(Sample_Size => 3, Item_Type => D_10);    Sample: S_Of_3.Item_Array;   Result: array(D_10) of Natural := (others => 0); begin   for J in 1 .. Repetitions loop      -- get Sample      for Dig in D_10 loop         S_Of_3.Update(Dig);      end loop;      Sample := S_Of_3.Result;       -- update current Result      for Item in Sample'Range loop         Result(Sample(Item)) := Result(Sample(Item)) + 1;      end loop;   end loop;    -- finally: output Result   for Dig in Result'Range loop      Ada.Text_IO.Put(D_10'Image(Dig) & ":"                        & Natural'Image(Result(Dig)) & ";   ");   end loop;end Test_S_Of_N;`

A sample output:

` 0: 30008;    1: 30056;    2: 30080;    3: 29633;    4: 29910;    5: 30293;    6: 30105;    7: 29924;    8: 29871;    9: 30120; `

BBC BASIC

At each of the 100000 repetitions not only is a new function created but also new copies of its PRIVATE variables index% and samples%(). Creating such a large number of variables at run-time impacts adversely on execution speed and isn't to be recommended, other than to meet the artificial requirements of the task.

`      HIMEM = PAGE + 20000000       PRINT "Single run samples for n = 3:"      SofN% = FNs_of_n_creator(3)      FOR I% = 0 TO 9        !^a%() = FN(SofN%)(I%)        PRINT " For item " ; I% " sample(s) = " FNshowarray(a%(), I%+1)      NEXT       DIM cnt%(9)      PRINT '"Digit counts after 100000 runs:"      FOR rep% = 1 TO 100000        IF (rep% MOD 1000) = 0 PRINT ; rep% ; CHR\$(13) ;        F% = FNs_of_n_creator(3)        FOR I% = 0 TO 9          !^a%() = FN(F%)(I%)        NEXT        cnt%(a%(1)) += 1 : cnt%(a%(2)) += 1 : cnt%(a%(3)) += 1      NEXT      FOR digit% = 0 TO 9        PRINT " " ; digit% " : " ; cnt%(digit%)      NEXT      END       REM Dynamically creates this function:      REM DEF FNfunction(item%) : PRIVATE samples%(), index%      REM DIM samples%(n%) : = FNs_of_n(item%, samples%(), index%)      DEF FNs_of_n_creator(n%)      LOCAL p%, f\$      f\$ = "(item%) : " + CHR\$&0E + " samples%(), index% : " + \      \    CHR\$&DE + " samples%(" + STR\$(n%) + ") : = " + \      \    CHR\$&A4 + "s_of_n(item%, samples%(), index%)"      DIM p% LEN(f\$) + 4 : \$(p%+4) = f\$ : !p% = p%+4      = p%       DEF FNs_of_n(D%, s%(), RETURN I%)      LOCAL N%      N% = DIM(s%(),1)      I% += 1      IF I% <= N% THEN        s%(I%) = D%      ELSE        IF RND(I%) <= N% s%(RND(N%)) = D%      ENDIF      = !^s%()       DEF FNshowarray(a%(), n%)      LOCAL i%, a\$      a\$ = "["      IF n% > DIM(a%(),1) n% = DIM(a%(),1)      FOR i% = 1 TO n%        a\$ += STR\$(a%(i%)) + ", "      NEXT      = LEFT\$(LEFT\$(a\$)) + "]"`

Output:

```Single run samples for n = 3:
For item 0 sample(s) = [0]
For item 1 sample(s) = [0, 1]
For item 2 sample(s) = [0, 1, 2]
For item 3 sample(s) = [0, 1, 2]
For item 4 sample(s) = [0, 1, 4]
For item 5 sample(s) = [0, 1, 4]
For item 6 sample(s) = [0, 1, 6]
For item 7 sample(s) = [0, 1, 6]
For item 8 sample(s) = [8, 1, 6]
For item 9 sample(s) = [8, 1, 9]

Digit counts after 100000 runs:
0 : 30068
1 : 30017
2 : 30378
3 : 29640
4 : 30153
5 : 29994
6 : 29941
7 : 29781
8 : 29918
9 : 30110
```

C

Instead of returning a closure we set the environment in a structure:

`#include <stdlib.h>#include <stdio.h>#include <string.h>#include <time.h> struct s_env {    unsigned int n, i;    size_t size;    void *sample;}; void s_of_n_init(struct s_env *s_env, size_t size, unsigned int n){    s_env->i = 0;    s_env->n = n;    s_env->size = size;    s_env->sample = malloc(n * size);} void sample_set_i(struct s_env *s_env, unsigned int i, void *item){    memcpy(s_env->sample + i * s_env->size, item, s_env->size);} void *s_of_n(struct s_env *s_env, void *item){    s_env->i++;    if (s_env->i <= s_env->n)        sample_set_i(s_env, s_env->i - 1, item);    else if ((rand() % s_env->i) < s_env->n)        sample_set_i(s_env, rand() % s_env->n, item);    return s_env->sample;} int *test(unsigned int n, int *items_set, unsigned int num_items){    int i;    struct s_env s_env;    s_of_n_init(&s_env, sizeof(items_set[0]), n);    for (i = 0; i < num_items; i++) {        s_of_n(&s_env, (void *) &items_set[i]);    }    return (int *)s_env.sample;} int main(){    unsigned int i, j;    unsigned int n = 3;    unsigned int num_items = 10;    unsigned int *frequencies;    int *items_set;    srand(time(NULL));    items_set = malloc(num_items * sizeof(int));    frequencies = malloc(num_items * sizeof(int));    for (i = 0; i < num_items; i++) {        items_set[i] = i;        frequencies[i] = 0;    }    for (i = 0; i < 100000; i++) {        int *res = test(n, items_set, num_items);        for (j = 0; j < n; j++) {            frequencies[res[j]]++;        }	free(res);    }    for (i = 0; i < num_items; i++) {        printf(" %d", frequencies[i]);    }    puts("");    return 0;}`

C++

Works with: C++11
`#include <iostream>#include <functional>#include <vector>#include <cstdlib>#include <ctime> template <typename T>std::function<std::vector<T>(T)> s_of_n_creator(int n) {  std::vector<T> sample;  int i = 0;  return [=](T item) mutable {    i++;    if (i <= n) {      sample.push_back(item);    } else if (std::rand() % i < n) {      sample[std::rand() % n] = item;    }    return sample;  };} int main() {  std::srand(std::time(NULL));  int bin[10] = {0};  for (int trial = 0; trial < 100000; trial++) {    auto s_of_n = s_of_n_creator<int>(3);    std::vector<int> sample;    for (int i = 0; i < 10; i++)      sample = s_of_n(i);    for (int s : sample)      bin[s]++;  }  for (int x : bin)    std::cout << x << std::endl;  return 0;}`
Output:
```30052
29740
30197
30223
29857
29688
30095
29803
30098
30247
```

Class-based version:

`#include <iostream>#include <vector>#include <cstdlib>#include <ctime> template <typename T>class SOfN {  std::vector<T> sample;  int i;  const int n; public:  SOfN(int _n) : i(0), n(_n) { }  std::vector<T> operator()(T item) {    i++;    if (i <= n) {      sample.push_back(item);    } else if (std::rand() % i < n) {      sample[std::rand() % n] = item;    }    return sample;  }}; int main() {  std::srand(std::time(NULL));  int bin[10] = {0};  for (int trial = 0; trial < 100000; trial++) {    SOfN<int> s_of_n(3);    std::vector<int> sample;    for (int i = 0; i < 10; i++)      sample = s_of_n(i);    for (std::vector<int>::const_iterator i = sample.begin(); i != sample.end(); i++)      bin[*i]++;  }  for (int i = 0; i < 10; i++)    std::cout << bin[i] << std::endl;  return 0;}`

Clojure

The Clojure approach to problems like this is to define a function which takes an accumulator state and an input item and produces the updated state. Here the accumulator state is the current sample and the number of items processed. This function is then used in a reduce call with an initial state and a list of items.

`(defn s-of-n-fn-creator [n]  (fn [[sample iprev] item]    (let [i (inc iprev)]      (if (<= i n)        [(conj sample item) i]        (let [r (rand-int i)]          (if (< r n)            [(assoc sample r item) i]            [sample i])))))) (def s-of-3-fn (s-of-n-fn-creator 3)) (->> #(reduce s-of-3-fn [[] 0] (range 10))    (repeatedly 100000)    (map first)    flatten    frequencies    sort    println) `

Sample output:

`([0 29924] [1 30053] [2 30018] [3 29765] [4 29974] [5 30225] [6 30082] [7 29996] [8 30128] [9 29835])`

If we really need a stateful (thread safe!) function for some reason, we can get it like this:

`(defn s-of-n-creator [n]  (let [state (atom [[] 0])        s-of-n-fn (s-of-n-fn-creator n)]    (fn [item]      (first (swap! state s-of-n-fn item)))))`

CoffeeScript

` s_of_n_creator = (n) ->  arr = []  cnt = 0  (elem) ->    cnt += 1    if cnt <= n      arr.push elem    else      pos = Math.floor(Math.random() * cnt)      if pos < n        arr[pos] = elem    arr.sort() sample_size = 3    range = [0..9]num_trials = 100000 counts = {} for digit in range  counts[digit] = 0 for i in [1..num_trials]  s_of_n = s_of_n_creator(sample_size)  for digit in range    sample = s_of_n(digit)  for digit in sample    counts[digit] += 1 for digit in range  console.log digit, counts[digit] `

output

` > coffee knuth_sample.coffee 0 298991 298412 299303 300584 299325 299486 300477 301148 299769 30255 `

Common Lisp

`(defun s-n-creator (n)  (let ((sample (make-array n :initial-element nil))        (i 0))    (lambda (item)      (if (<= (incf i) n)          (setf (aref sample (1- i)) item)        (when (< (random i) n)          (setf (aref sample (random n)) item)))      sample))) (defun algorithm-s ()  (let ((*random-state* (make-random-state t))        (frequency (make-array '(10) :initial-element 0)))    (loop repeat 100000          for s-of-n = (s-n-creator 3)          do (flet ((s-of-n (item)                      (funcall s-of-n item)))               (map nil                    (lambda (i)                      (incf (aref frequency i)))                    (loop for i from 0 below 9                          do (s-of-n i)                          finally (return (s-of-n 9))))))    frequency)) (princ (algorithm-s)) `
output
`#(30026 30023 29754 30017 30267 29997 29932 29990 29965 30029)`

D

`import std.stdio, std.random; auto sofN_creator(in int n) {    size_t i;    int[] sample;     return (in int item) {        i++;        if (i <= n)            sample ~= item;        else if (uniform01 < (double(n) / i))            sample[uniform(0, n)] = item;        return sample;    };} void main() {    enum nRuns = 100_000;    size_t[10] bin;     foreach (immutable trial; 0 .. nRuns) {        immutable sofn = sofN_creator(3);        int[] sample;        foreach (immutable item; 0 .. bin.length)            sample = sofn(item);        foreach (immutable s; sample)            bin[s]++;    }    writefln("Item counts for %d runs:\n%s", nRuns, bin);}`
Output:
```Item counts for 100000 runs:
[30191, 29886, 29988, 30149, 30251, 29997, 29748, 29909, 30041, 29840]```

Faster Version

`import std.stdio, std.random, std.algorithm; struct SOfN(size_t n) {    size_t i;    int[n] sample = void;     int[] next(in size_t item, ref Xorshift rng) {        i++;        if (i <= n)            sample[i - 1] = item;        else if (rng.uniform01 < (double(n) / i))            sample[uniform(0, n, rng)] = item;        return sample[0 .. min(i, \$)];    }} void main() {    enum nRuns = 100_000;    size_t[10] bin;    auto rng = Xorshift(0);     foreach (immutable trial; 0 .. nRuns) {        SOfN!3 sofn;        foreach (immutable item; 0 .. bin.length - 1)            sofn.next(item, rng);        foreach (immutable s; sofn.next(bin.length - 1, rng))            bin[s]++;    }    writefln("Item counts for %d runs:\n%s", nRuns, bin);}`

Go

`package main import (    "fmt"    "math/rand"    "time") func sOfNCreator(n int) func(byte) []byte {    s := make([]byte, 0, n)    m := n    return func(item byte) []byte {        if len(s) < n {            s = append(s, item)        } else {            m++            if rand.Intn(m) < n {                s[rand.Intn(n)] = item            }        }        return s    }} func main() {    rand.Seed(time.Now().UnixNano())    var freq [10]int    for r := 0; r < 1e5; r++ {        sOfN := sOfNCreator(3)        for d := byte('0'); d < '9'; d++ {            sOfN(d)        }        for _, d := range sOfN('9') {            freq[d-'0']++        }    }    fmt.Println(freq)}`

Output:

```[30075 29955 30024 30095 30031 30018 29973 29642 30156 30031]
```

Icon and Unicon

The following solution makes use of the makeProc procedure defined in the UniLib library and so is Unicon specific. However, the solution can be modified to work in Icon as well.

Technically, s_of_n_creator returns a co-expression, not a function. In Unicon, the calling syntax for this co-expression is indistinguishable from that of a function.

`import Utils procedure main(A)    freq := table(0)    every 1 to (\A[2] | 100000)\1 do {        s_of_n := s_of_n_creator(\A[1] | 3)        every sample := s_of_n(0 to 9)        every freq[!sample] +:= 1        }    every write(i := 0 to 9,": ",right(freq[i],6))end procedure s_of_n_creator(n)    items := []    itemCnt := 0.0    return makeProc {               repeat {                   item := (items@&source)[1]                   itemCnt +:= 1                   if *items < n then put(items, item)                   else if ?0 < (n/itemCnt) then ?items := item                   }               }end`

and a sample run:

```->kas
0:  29941
1:  29963
2:  29941
3:  30005
4:  30087
5:  29895
6:  30075
7:  30059
8:  29962
9:  30072
->```

J

Note that this approach introduces heavy inefficiencies, to achieve information hiding.

`coclass'inefficient'  create=:3 :0    N=: y    ITEMS=: ''    K=:0  )   s_of_n=:3 :0    K=: K+1    if. N>#ITEMS do.      ITEMS=: ITEMS,y    else.      if. (N%K)>?0 do.        ITEMS=: (((i.#ITEMS)-.?N){ITEMS),y      else.        ITEMS      end.    end.  )  s_of_n_creator_base_=: 1 :0  ctx=: conew&'inefficient' m  s_of_n__ctx)`

Required example:

`run=:3 :0  nl=. conl 1  s3_of_n=. 3 s_of_n_creator  r=. {: s3_of_n"0 i.10  coerase (conl 1)-.nl  r)    (~.,._1 + #/.~) (i.10),,D=:run"0 i.1e50 300991 299732 297953 299954 299965 302896 299037 299938 302159 29742`

Java

A class-based solution:

`import java.util.*; class SOfN<T> {    private static final Random rand = new Random();     private List<T> sample;    private int i = 0;    private int n;    public SOfN(int _n) {	n = _n;	sample = new ArrayList<T>(n);    }    public List<T> process(T item) {	i++;	if (i <= n) {            sample.add(item);	} else if (rand.nextInt(i) < n) {	    sample.set(rand.nextInt(n), item);	}	return sample;    }} public class AlgorithmS {    public static void main(String[] args) {	int[] bin = new int[10];	for (int trial = 0; trial < 100000; trial++) {	    SOfN<Integer> s_of_n = new SOfN<Integer>(3);	    List<Integer> sample = null;	    for (int i = 0; i < 10; i++)		sample = s_of_n.process(i);	    for (int s : sample)		bin[s]++;	}	System.out.println(Arrays.toString(bin));    }}`

Output:

`[30115, 30141, 30050, 29887, 29765, 30132, 29767, 30114, 30079, 29950]`

Alternative solution without using an explicitly named type; instead using an anonymous class implementing a generic "function" interface:

`import java.util.*; interface Function<S, T> {    public T call(S x);} public class AlgorithmS {    private static final Random rand = new Random();    public static <T> Function<T, List<T>> s_of_n_creator(final int n) {	return new Function<T, List<T>>() {	    private List<T> sample = new ArrayList<T>(n);	    private int i = 0;	    public List<T> call(T item) {		i++;		if (i <= n) {		    sample.add(item);		} else if (rand.nextInt(i) < n) {		    sample.set(rand.nextInt(n), item);		}		return sample;	    }	};    }     public static void main(String[] args) {	int[] bin = new int[10];	for (int trial = 0; trial < 100000; trial++) {	    Function<Integer, List<Integer>> s_of_n = s_of_n_creator(3);	    List<Integer> sample = null;	    for (int i = 0; i < 10; i++)		sample = s_of_n.call(i);	    for (int s : sample)		bin[s]++;	}	System.out.println(Arrays.toString(bin));    }}`

Julia

` function makesofn(n::Int)    buf = Any[]    i = 0    function sofn(item)        i += 1        if i <= n            push!(buf, item)        else            j = rand(1:i)            if j <= n                buf[j] = item            end        end        return buf    end    return sofnend  nhist = zeros(Int, 10) for i in 1:10^5    kas = makesofn(3)    for j in 0:8        kas(j)    end    for k in kas(9)        nhist[k+1] += 1    endend println("Simulating sof3(0:9) 100000 times:")for (i, c) in enumerate(nhist)    println(@sprintf "   %2d => %5d" i-1 c)end `
Output:
```Simulating sof3(0:9) 100000 times:
0 => 29994
1 => 30026
2 => 30173
3 => 29590
4 => 29967
5 => 30104
6 => 30185
7 => 29761
8 => 30147
9 => 30053
```

Objective-C

Works with: Mac OS X version 10.6+

Uses blocks

`#import <Foundation/Foundation.h> typedef NSArray *(^SOfN)(id); SOfN s_of_n_creator(int n) {  NSMutableArray *sample = [[NSMutableArray alloc] initWithCapacity:n];  __block int i = 0;  return [^(id item) {    i++;    if (i <= n) {      [sample addObject:item];    } else if (rand() % i < n) {      sample[rand() % n] = item;    }    return sample;  } copy];} int main(int argc, const char *argv[]) {  @autoreleasepool {     NSCountedSet *bin = [[NSCountedSet alloc] init];    for (int trial = 0; trial < 100000; trial++) {      SOfN s_of_n = s_of_n_creator(3);      NSArray *sample;      for (int i = 0; i < 10; i++)        sample = s_of_n(@(i));      [bin addObjectsFromArray:sample];    }    NSLog(@"%@", bin);   }  return 0;}`

Log:

```<NSCountedSet: 0x100114120> (0 [29934], 9 [30211], 5 [29926], 1 [30067], 6 [30001], 2 [29972], 7 [30126], 3 [29944], 8 [29910], 4 [29909])
```

OCaml

`let s_of_n_creator n =  let i = ref 0  and sample = ref [| |] in  fun item ->    incr i;    if !i <= n then sample := Array.append [| item |] !sample    else if Random.int !i < n then !sample.(Random.int n) <- item;    !sample let test n items_set =  let s_of_n = s_of_n_creator n in  Array.fold_left (fun _ v -> s_of_n v) [| |] items_set let () =  Random.self_init();  let n = 3 in  let num_items = 10 in  let items_set = Array.init num_items (fun i -> i) in  let results = Array.create num_items 0 in  for i = 1 to 100_000 do    let res = test n items_set in    Array.iter (fun j -> results.(j) <- succ results.(j)) res  done;  Array.iter (Printf.printf " %d") results;  print_newline()`

Output:

` 30051 29899 30249 30058 30012 29836 29998 29882 30148 29867`

PARI/GP

 This example is in need of improvement: Does not return a function.
`KnuthS(v,n)={  my(u=vector(n,i,i));  for(i=n+1,#v,    if(random(i)<n,u[random(n)+1]=i)  );  vecextract(v,u)};test()={  my(v=vector(10),t);  for(i=1,1e5,    t=KnuthS([0,1,2,3,4,5,6,7,8,9],3);    v[t[1]+1]++;v[t[2]+1]++;v[t[3]+1]++  );  v};`

Output:

`%1 = [30067, 30053, 29888, 30161, 30204, 29990, 30175, 29980, 29622, 29860]`

Perl

`use strict; sub s_of_n_creator {    my \$n = shift;    my @sample;    my \$i = 0;    sub {        my \$item = shift;        \$i++;        if (\$i <= \$n) {            # Keep first n items            push @sample, \$item;        } elsif (rand() < \$n / \$i) {            # Keep item            @sample[rand \$n] = \$item;        }        @sample    }} my @items = (0..9);my @bin; foreach my \$trial (1 .. 100000) {    my \$s_of_n = s_of_n_creator(3);    my @sample;    foreach my \$item (@items) {        @sample = \$s_of_n->(\$item);    }    foreach my \$s (@sample) {        \$bin[\$s]++;    }}print "@bin\n"; `
Sample output
`30003 29923 30192 30164 29994 29976 29935 29860 30040 29913`

Perl 6

`sub s_of_n_creator(\$n) {    my @sample;    my \$i = 0;    -> \$item {        if ++\$i <= \$n {            push @sample, \$item;        }        elsif \$i.rand < \$n {            @sample[\$n.rand] = \$item;        }        @sample;    }} my @items = 0..9;my @bin; for ^100000 {    my &s_of_n = s_of_n_creator(3);    my @sample;    for @items -> \$item {        @sample = s_of_n(\$item);    }    for @sample -> \$s {        @bin[\$s]++;    }}say @bin;`

Output:

`29975 30028 30246 30056 30004 29983 29836 29967 29924 29981`

PHP

Works with: PHP version 5.3+
`<?phpfunction s_of_n_creator(\$n) {    \$sample = array();    \$i = 0;    return function(\$item) use (&\$sample, &\$i, \$n) {        \$i++;        if (\$i <= \$n) {            // Keep first n items            \$sample[] = \$item;        } else if (rand(0, \$i-1) < \$n) {            // Keep item            \$sample[rand(0, \$n-1)] = \$item;        }        return \$sample;    };} \$items = range(0, 9); for (\$trial = 0; \$trial < 100000; \$trial++) {    \$s_of_n = s_of_n_creator(3);    foreach (\$items as \$item)        \$sample = \$s_of_n(\$item);    foreach (\$sample as \$s)        \$bin[\$s]++;}print_r(\$bin);?>`
Sample output
```Array
(
[3] => 30158
[8] => 29859
[9] => 29984
[6] => 29937
[7] => 30361
[4] => 29994
[5] => 29849
[0] => 29724
[1] => 29997
[2] => 30137
)```

PicoLisp

`(de s_of_n_creator (@N)   (curry (@N (I . 0) (Res)) (Item)      (cond         ((>= @N (inc 'I)) (push 'Res Item))         ((>= @N (rand 1 I)) (set (nth Res (rand 1 @N)) Item)) )      Res ) ) (let Freq (need 10 0)   (do 100000      (let S_of_n (s_of_n_creator 3)         (for I (mapc S_of_n (0 1 2 3 4 5 6 7 8 9))            (inc (nth Freq (inc I))) ) ) )   Freq )`

Output:

`-> (30003 29941 29918 30255 29848 29875 30056 29839 30174 30091)`

Python

Works with: Python version 3.x
`from random import randrange def s_of_n_creator(n):    sample, i = [], 0    def s_of_n(item):        nonlocal i         i += 1        if i <= n:            # Keep first n items            sample.append(item)        elif randrange(i) < n:            # Keep item            sample[randrange(n)] = item        return sample    return s_of_n if __name__ == '__main__':    bin = [0]* 10    items = range(10)    print("Single run samples for n = 3:")    s_of_n = s_of_n_creator(3)    for item in items:        sample = s_of_n(item)        print("  Item: %i -> sample: %s" % (item, sample))    #    for trial in range(100000):        s_of_n = s_of_n_creator(3)        for item in items:            sample = s_of_n(item)        for s in sample:            bin[s] += 1    print("\nTest item frequencies for 100000 runs:\n ",          '\n  '.join("%i:%i" % x for x in enumerate(bin)))`
Sample output
```Single run samples for n = 3:
Item: 0 -> sample: [0]
Item: 1 -> sample: [0, 1]
Item: 2 -> sample: [0, 1, 2]
Item: 3 -> sample: [0, 1, 3]
Item: 4 -> sample: [0, 1, 3]
Item: 5 -> sample: [0, 1, 3]
Item: 6 -> sample: [0, 1, 3]
Item: 7 -> sample: [0, 3, 7]
Item: 8 -> sample: [0, 3, 7]
Item: 9 -> sample: [0, 3, 7]

Test item frequencies for 100000 runs:
0:29983
1:30240
2:29779
3:29921
4:30224
5:29967
6:30036
7:30050
8:29758
9:30042```

Python Class based version

Only a slight change creates the following class-based implementation:

`class S_of_n_creator():    def __init__(self, n):        self.n = n        self.i = 0        self.sample = []     def __call__(self, item):        self.i += 1        n, i, sample = self.n, self.i, self.sample        if i <= n:            # Keep first n items            sample.append(item)        elif randrange(i) < n:            # Keep item            sample[randrange(n)] = item        return sample`

The above can be instantiated as follows after which `s_of_n` can be called in the same way as it is in the first example where it is a function instead of an instance.

`s_of_n = S_of_n_creator(3)`

Racket

`#lang racket/base (define (s-of-n-creator n)  (let ([count 0] ; 'i' in the description         [vec (make-vector n)]) ; store the elts we've seen so far    (lambda (item)      (if (< count n)          ; we're not full, so, kind of boring          (begin            (vector-set! vec count item)            (set! count (+ count 1)))          ; we've already seen n elts; fun starts          (begin            (set! count (+ count 1))            (when (< (random count) n)              (vector-set! vec (random n) item))))      vec))) (define counts (make-vector 10)) (for ([iter (in-range 0 100000)]) ; trials  (let ([s-of-n (s-of-n-creator 3)]) ; set up the chooser    (for ([d (in-vector ; iterate over the chosen digits              (for/last ([digit (in-range 0 10)]) ; loop through the digits                        (s-of-n digit)))]) ; feed them in      (vector-set! counts d (add1 (vector-ref counts d)))))) ; update counts (for ([d (in-range 0 10)])  (printf "~a ~a~n" d (vector-ref counts d)))`

Output:

```0 29906
1 29863
2 29953
3 30111
4 29867
5 30157
6 29985
7 30325
8 30030
9 29803
```

REXX

`/*REXX program using Knuth's algorithm S  (a random sampling  N of M  items). */parse arg trials size .                /*obtain optional arguments from the CL*/if trials==''  then trials=100000      /*Not specified?  Then use the default.*/if   size==''  then size=3             /* "      "         "   "   "     "    */#.=0                                   /*initialize an array of freq counters.*/      do trials                        /*OK, now let's light this candle.     */      call SofN_creator  size          /*create initial list of   N   items.  */          do gener=0  for 10            /*and then call  SofN  for each digit. */         call SofN   gener             /*call  SofN  with a single decimal dig*/         end       /*gener*/              do count=1  for size      /*let's examine what  SofN  generated. */             _=!.count                 /*get a digit from the  Nth  item, and */             #._=#._+1                 /*          ···  count it, of course.  */             end   /*count*/      end          /*trials*/ say "Using Knuth's algorithm  S  for"   commas(trials),                                  'trials, and with size='commas(size)":";   say      do dig=0  to 9                   /* [↓]  display the frequency of a dig.*/      say left('',15)   "frequency of the"    dig    'digit is:'   commas(#.dig)      end   /*dig*/exit                                   /*stick a fork in it,  we're all done. *//*────────────────────────────────────────────────────────────────────────────*/SofN_creator: parse arg item 1 items   /*generate  ITEM  number of items.     */                 do k=1  for item      /*traipse through the firs  N  items.  */                 !.k=random(0, 9)      /*set the  Kth  item with random digit.*/                 end   /*k*/return                                 /*the piddly stuff is done for now.    *//*────────────────────────────────────────────────────────────────────────────*/SofN:  parse arg item;  items=items+1  /*get  "item",  bump the items counter.*/c=random(1, items)                     /*should we replace a previous item?   */if c>size  then return                 /*probability isn't good, so skip it.  */_=random(1, size)                      /*now, figure out which previous ···   */!._=item                               /*     ··· item to replace with  ITEM. */return/*────────────────────────────────────────────────────────────────────────────*/commas: procedure;  parse arg _;   n=_'.9';    #=123456789;    b=verify(n,#,"M")        e=verify(n,#'0',,verify(n,#"0.",'M'))-4           do j=e  to b  by -3;   _=insert(',',_,j);    end  /*j*/;     return _`

output   when using the default input of:   100000   2

```Using Knuth's algorihm  S  for 100,000 trials, and with size=3:

frequency of the 0 digit is: 29,837
frequency of the 1 digit is: 29,871
frequency of the 2 digit is: 30,071
frequency of the 3 digit is: 29,965
frequency of the 4 digit is: 30,082
frequency of the 5 digit is: 30,106
frequency of the 6 digit is: 30,109
frequency of the 7 digit is: 29,843
frequency of the 8 digit is: 30,192
frequency of the 9 digit is: 29,924
```

Ruby

Using a closure

`def s_of_n_creator(n)  sample = []  i = 0  Proc.new do |item|    i += 1    if i <= n      sample << item    elsif rand(i) < n      sample[rand(n)] = item    end    sample  endend frequency = Array.new(10,0)100_000.times do  s_of_n = s_of_n_creator(3)  sample = nil  (0..9).each {|digit| sample = s_of_n[digit]}  sample.each {|digit| frequency[digit] += 1}end (0..9).each {|digit| puts "#{digit}\t#{frequency[digit]}"}`

Example

```0       29850
1       30015
2       29970
3       29789
4       29841
5       30075
6       30281
7       30374
8       29953
9       29852```

Swift

`import Darwin func s_of_n_creator<T>(n: Int) -> T -> [T]  {  var sample = [T]()  var i = 0  return {(item: T) in    i++    if (i <= n) {      sample.append(item)    } else if (Int(arc4random_uniform(UInt32(i))) < n) {      sample[Int(arc4random_uniform(UInt32(n)))] = item    }    return sample  }} var bin = [Int](count:10, repeatedValue:0)for trial in 0..<100000 {  let s_of_n: Int -> [Int] = s_of_n_creator(3)  var sample: [Int] = []  for i in 0..<10 {    sample = s_of_n(i)  }  for s in sample {    bin[s]++  }}println(bin)`
Output:
```[30038, 29913, 30047, 30069, 30159, 30079, 29773, 29962, 30000, 29960]
```

Tcl

`package require Tcl 8.6 oo::class create SofN {    variable items size count    constructor {n} {	set size \$n    }    method item {item} {	if {[incr count] <= \$size} {	    lappend items \$item	} elseif {rand()*\$count < \$size} {	    lset items [expr {int(\$size * rand())}] \$item	}	return \$items    }} # Test codefor {set i 0} {\$i < 100000} {incr i} {    set sOf3 [SofN new 3]    foreach digit {0 1 2 3 4 5 6 7 8 9} {	set digs [\$sOf3 item \$digit]    }    \$sOf3 destroy    foreach digit \$digs {	incr freq(\$digit)    }}parray freq`
Sample output:
```freq(0) = 29812
freq(1) = 30099
freq(2) = 29927
freq(3) = 30106
freq(4) = 30048
freq(5) = 29993
freq(6) = 29912
freq(7) = 30219
freq(8) = 30060
freq(9) = 29824
```

zkl

`fcn s_of_n_creator(n){   fcn(item,ri,N,samples){      i:=ri.inc(); // 1,2,3,4,...      if(i<=N) samples.append(item);      else if ((0).random(i) < N) samples[(0).random(N)] = item;      samples   }.fp1(Ref(1),n,L())}`

One run:

`s3:=s_of_n_creator(3);[0..9].pump(List,s3,"copy").println();`
Output:
```L(L(0),L(0,1),L(0,1,2),L(0,1,2),L(0,4,2),L(5,4,2),L(5,6,2),L(5,6,2),L(5,6,2),L(9,6,2))
```

100,000 runs:

`dist:=L(0,0,0,0,0,0,0,0,0,0);do(0d100_000){   (0).pump(10,Void,s_of_n_creator(3)).apply2('wrap(n){dist[n]=dist[n]+1}) }N:=dist.sum();dist.apply('wrap(n){"%.2f%%".fmt(n.toFloat()/N*100)}).println();`
Output:
```L("10.00%","9.98%","10.00%","9.99%","10.00%","9.98%","10.01%","10.04%","9.98%","10.02%")
```