# Sieve of Eratosthenes

Sieve of Eratosthenes
You are encouraged to solve this task according to the task description, using any language you may know.
This task has been clarified. Its programming examples are in need of review to ensure that they still fit the requirements of the task.

The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.

Implement the   Sieve of Eratosthenes   algorithm, with the only allowed optimization that the outer loop can stop at the square root of the limit, and the inner loop may start at the square of the prime just found.

That means especially that you shouldn't optimize by using pre-computed wheels, i.e. don't assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or similar wheels based on low primes.

If there's an easy way to add such a wheel based optimization, implement it as an alternative version.

Note
• It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.

## 11l

Translation of: Python
```F primes_upto(limit)
V is_prime = [0B]*2 [+] [1B]*(limit - 1)
L(n) 0 .< Int(limit ^ 0.5 + 1.5)
I is_prime[n]
L(i) (n*n..limit).step(n)
is_prime[i] = 0B
R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)

print(primes_upto(100))```
Output:
```[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
```

## 360 Assembly

For maximum compatibility, this program uses only the basic instruction set.

```*        Sieve of Eratosthenes
ERATOST  CSECT
USING  ERATOST,R12
SAVEAREA B      STM-SAVEAREA(R15)
DC     17F'0'
DC     CL8'ERATOST'
STM      STM    R14,R12,12(R13) save calling context
ST     R13,4(R15)
ST     R15,8(R13)
*        ----   CODE
LA     R4,1            I=1
LA     R6,1            increment
L      R7,N            limit
LOOPI    BXH    R4,R6,ENDLOOPI  do I=2 to N
LR     R1,R4           R1=I
BCTR   R1,0
LA     R14,CRIBLE(R1)
CLI    0(R14),X'01'
BNE    ENDIF           if not CRIBLE(I)
LR     R5,R4           J=I
LR     R8,R4
LR     R9,R7
LOOPJ    BXH    R5,R8,ENDLOOPJ  do J=I*2 to N by I
LR     R1,R5           R1=J
BCTR   R1,0
LA     R14,CRIBLE(R1)
MVI    0(R14),X'00'    CRIBLE(J)='0'B
B      LOOPJ
ENDLOOPJ EQU    *
ENDIF    EQU    *
B      LOOPI
ENDLOOPI EQU    *
LA     R4,1            I=1
LA     R6,1
L      R7,N
LOOP     BXH    R4,R6,ENDLOOP   do I=1 to N
LR     R1,R4           R1=I
BCTR   R1,0
LA     R14,CRIBLE(R1)
CLI    0(R14),X'01'
BNE    NOTPRIME        if not CRIBLE(I)
CVD    R4,P            P=I
UNPK   Z,P             Z=P
MVC    C,Z             C=Z
OI     C+L'C-1,X'F0'   zap sign
MVC    WTOBUF(8),C+8
WTO    MF=(E,WTOMSG)
NOTPRIME EQU    *
B      LOOP
ENDLOOP  EQU    *
RETURN   EQU    *
LM     R14,R12,12(R13) restore context
XR     R15,R15         set return code to 0
*        ----   DATA
I        DS     F
J        DS     F
DS     0F
P        DS     PL8             packed
Z        DS     ZL16            zoned
C        DS     CL16            character
WTOMSG   DS     0F
DC     H'80'           length of WTO buffer
DC     H'0'            must be binary zeroes
WTOBUF   DC     80C' '
LTORG
N        DC     F'100000'
CRIBLE   DC     100000X'01'
YREGS
END    ERATOST```
Output:
```00000002
00000003
00000005
00000007
00000011
00000013
00000017
00000019
00000023
00000029
00000031
00000037
00000041
00000043
00000047
00000053
00000059
00000061
00000067
...
00099767
00099787
00099793
00099809
00099817
00099823
00099829
00099833
00099839
00099859
00099871
00099877
00099881
00099901
00099907
00099923
00099929
00099961
00099971
00099989
00099991
```

## 6502 Assembly

If this subroutine is called with the value of n in the accumulator, it will store an array of the primes less than n beginning at address 1000 hex and return the number of primes it has found in the accumulator.

```ERATOS: STA  \$D0      ; value of n
LDA  #\$00
LDX  #\$00
SETUP:  STA  \$1000,X  ; populate array
INX
CPX  \$D0
BPL  SET
JMP  SETUP
SET:    LDX  #\$02
SIEVE:  LDA  \$1000,X  ; find non-zero
INX
CPX  \$D0
BPL  SIEVED
CMP  #\$00
BEQ  SIEVE
STA  \$D1      ; current prime
MARK:   CLC
TAY
LDA  #\$00
STA  \$1000,Y
TYA
CMP  \$D0
BPL  SIEVE
JMP  MARK
SIEVED: LDX  #\$01
LDY  #\$00
COPY:   INX
CPX  \$D0
BPL  COPIED
LDA  \$1000,X
CMP  #\$00
BEQ  COPY
STA  \$2000,Y
INY
JMP  COPY
COPIED: TYA           ; how many found
RTS```

## 68000 Assembly

Algorithm somewhat optimized: array omits 1, 2, all higher odd numbers. Optimized for storage: uses bit array for prime/composite flags.

Works with: [EASy68K v5.13.00]

Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.

```*-----------------------------------------------------------
* Title      : BitSieve
* Written by : G. A. Tippery
* Date       : 2014-Feb-24, 2013-Dec-22
* Description: Prime number sieve
*-----------------------------------------------------------
ORG    \$1000

**	---- Generic macros ----	**
PUSH	MACRO
MOVE.L	\1,-(SP)
ENDM

POP	MACRO
MOVE.L	(SP)+,\1
ENDM

DROP	MACRO
ENDM

PUTS	MACRO
** Print a null-terminated string w/o CRLF **
** Returns with D0, A1 modified
MOVEQ	#14,D0	; task number 14 (display null string)
LEA	\1,A1	; address of string
TRAP	#15	; display it
ENDM

GETN	MACRO
MOVEQ	#4,D0	; Read a number from the keyboard into D1.L.
TRAP	#15
ENDM

**	---- Application-specific macros ----	**

val	MACRO		; Used by bit sieve. Converts bit address to the number it represents.
ADD.L	\1,\1	; double it because odd numbers are omitted
ADDQ	#3,\1	; add offset because initial primes (1, 2) are omitted
ENDM

* ** ================================================================================ **
* ** Integer square root routine, bisection method **
* ** IN: D0, should be 0<D0<\$10000 (65536) -- higher values MAY work, no guarantee
* ** OUT: D1
*
SquareRoot:
*
MOVEM.L	D2-D4,-(SP)	; save registers needed for local variables
*	DO == n
*	D1 == a
*	D2 == b
*	D3 == guess
*	D4 == temp
*
*		a = 1;
*		b = n;
MOVEQ	#1,D1
MOVE.L	D0,D2
*		do {
REPEAT
*		guess = (a+b)/2;
MOVE.L	D1,D3
LSR.L	#1,D3
*		if (guess*guess > n) {	// inverse function of sqrt is square
MOVE.L	D3,D4
MULU	D4,D4		; guess^2
CMP.L	D0,D4
BLS	.else
*		b = guess;
MOVE.L	D3,D2
BRA	.endif
*		} else {
.else:
*		a = guess;
MOVE.L	D3,D1
*		} //if
.endif:
*		} while ((b-a) > 1);	; Same as until (b-a)<=1 or until (a-b)>=1
MOVE.L	D2,D4
SUB.L	D1,D4	; b-a
UNTIL.L	  D4 <LE> #1 DO.S
*		return (a)	; Result is in D1
*		} //LongSqrt()
MOVEM.L	(SP)+,D2-D4	; restore saved registers
RTS
*
* ** ================================================================================ **

** ======================================================================= **
*
**  Prime-number Sieve of Eratosthenes routine using a big bit field for flags  **
*  Enter with D0 = size of sieve (bit array)
*  Prints found primes 10 per line
*  Returns # prime found in D6
*
*   Register usage:
*
*	D0 == n
*	D1 == prime
*	D2 == sqroot
*	D3 == PIndex
*	D4 == CIndex
*	D5 == MaxIndex
*	D6 == PCount
*
*	A0 == PMtx
*
*   On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.
*

**	------------------------	**

GetBit:		** sub-part of Sieve subroutine **
** Entry: bit # is on TOS
** Exit: A6 holds the byte number, D7 holds the bit number within the byte
** Note: Input param is still on TOS after return. Could have passed via a register, but
**  wanted to practice with stack. :)
*
MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
ASR.L	#3,D7	; /8
MOVEA	D7,A6	; byte #
MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
AND.L	#\$7,D7	; bit #
RTS

**	------------------------	**

Sieve:
MOVE	D0,D5
SUBQ	#1,D5
JSR	SquareRoot	; sqrt D0 => D1
MOVE.L	D1,D2
LEA	PArray,A0
CLR.L	D3
*
PrimeLoop:
MOVE.L	D3,D1
val	D1
MOVE.L	D3,D4
*
CxLoop:		; Goes through array marking multiples of d1 as composite numbers
CMP.L	D5,D4
BHI	ExitCx
PUSH	D4	; set D7 as bit # and A6 as byte pointer for D4'th bit of array
JSR GetBit
DROP
BSET	D7,0(A0,A6.L)	; set bit to mark as composite number
ADD.L	D1,D4	; next number to mark
BRA	CxLoop
ExitCx:
CLR.L	D1	; Clear new-prime-found flag
ADDQ	#1,D3	; Start just past last prime found
PxLoop:		; Searches for next unmarked (not composite) number
CMP.L	D2,D3	; no point searching past where first unmarked multiple would be past end of array
BHI	ExitPx	; if past end of array
TST.L	D1
BNE	ExitPx	; if flag set, new prime found
PUSH D3		; check D3'th bit flag
JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
DROP		; drop TOS
BTST	D7,0(A0,A6.L)	; read bit flag
BNE	IsSet	; If already tagged as composite
MOVEQ	#-1,D1	; Set flag that we've found a new prime
IsSet:
BRA	PxLoop
ExitPx:
SUBQ	#1,D3	; back up PIndex
TST.L	D1	; Did we find a new prime #?
BNE	PrimeLoop	; If another prime # found, go process it
*
; fall through to print routine

**	------------------------	**

* Print primes found
*
*	D4 == Column count
*
*	Print header and assumed primes (#1, #2)
MOVEQ	#2,D6	; Start counter at 2 because #1 and #2 are assumed primes
MOVEQ	#2,D4
*
MOVEQ	#0,D3
PrintLoop:
CMP.L	D5,D3
BHS	ExitPL
PUSH	D3
JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
DROP		; drop TOS
BTST	D7,0(A0,A6.L)
BNE		NotPrime
*		printf(" %6d", val(PIndex)
MOVE.L	D3,D1
val	D1
AND.L	#\$0000FFFF,D1
MOVEQ	#6,D2
MOVEQ	#20,D0	; display signed RJ
TRAP	#15
*	*** Display formatting ***
*		if((PCount % 10) == 0) printf("\n");
CMP	#10,D4
BLO	NoLF
PUTS	CRLF
MOVEQ	#0,D4
NoLF:
NotPrime:
BRA	PrintLoop
ExitPL:
RTS

** ======================================================================= **

N	EQU	5000	; *** Size of boolean (bit) array ***
SizeInBytes	EQU	(N+7)/8
*
START:                  	; first instruction of program
MOVE.L	#N,D0	; # to test
JSR	Sieve
*		printf("\n %d prime numbers found.\n", D6); ***
PUTS	Summary1,A1
MOVE	#3,D0	; Display signed number in D1.L in decimal in smallest field.
MOVE.W	D6,D1
TRAP	#15
PUTS	Summary2,A1

SIMHALT             	; halt simulator

** ======================================================================= **

* Variables and constants here

ORG	\$2000
CR	EQU	13
LF	EQU	10
CRLF	DC.B	CR,LF,\$00

PArray:	DCB.B	SizeInBytes,0

DC.B	'     1     2',\$00

Summary1:	DC.B	CR,LF,' ',\$00
Summary2:	DC.B	' prime numbers found.',CR,LF,\$00

END    START        	; last line of source
```

## 8086 Assembly

```MAXPRM:	equ	5000		; Change this value for more primes
cpu	8086
bits	16
org	100h
section	.text
erato:	mov	cx,MAXPRM	; Initialize array (set all items to prime)
mov	bp,cx		; Keep a copy in BP
mov	di,sieve
mov	al,1
rep	stosb
;;;	Sieve
mov	bx,sieve	; Set base register to array
inc	cx		; CX=1 (CH=0, CL=1); CX was 0 before
mov	si,cx		; Start at number 2 (1+1)
.next:	inc	si		; Next number
cmp	cl,[bx+si]	; Is this number marked as prime?
jne	.next		; If not, try next number
mov	ax,si		; Otherwise, calculate square,
mul	si
mov	di,ax		; and put it in DI
cmp	di,bp		; Check array bounds
ja	output		; We're done when SI*SI>MAXPRM
.mark:	mov	[bx+di],ch	; Mark byte as composite
cmp 	di,bp		; While maximum not reached
jbe	.mark
jmp	.next
;;;	Output
output:	mov	si,2		; Start at 2
.test:	dec	byte [bx+si]	; Prime?
jnz	.next		; If not, try next number
mov	ax,si		; Otherwise, print number
call	prax
.next:	inc	si
cmp	si,MAXPRM
jbe	.test
ret
;;;	Write number in AX to standard output (using MS-DOS)
prax:	push	bx		; Save BX
mov	bx,numbuf
mov	bp,10		; Divisor
.loop:	xor	dx,dx		; Divide AX by 10, modulus in DX
div	bp
dec	bx
mov	[bx],dl		; Store ASCII digit
test	ax,ax		; More digits?
jnz	.loop
mov	dx,bx		; Print number
mov	ah,9		; 9 = MS-DOS syscall to print string
int	21h
pop	bx		; Restore BX
ret
section	.data
db	'*****'		; Room for number
numbuf:	db	13,10,'\$'
section	.bss
sieve:	resb	MAXPRM
```
Output:
```2
3
5
7
11
...
4969
4973
4987
4993
4999
```

## 8th

```with: n

\ create a new buffer which will function as a bit vector
: bit-vec SED: n -- b
dup 3 shr swap 7 band if 1+ then b:new b:clear ;

\ given a buffer, sieving prime, and limit, cross off multiples
\ of the sieving prime.
: +composites SED: b n n -- b
>r dup sqr rot \ want: -- n index b
repeat
over 1- true b:bit!
>r over + r>
over r@ > until!
rdrop nip nip ;

\ SoE algorithm proper
: make-sieve SED: n -- b
dup>r bit-vec 2
repeat
tuck 1- b:bit@ not
if
over r@ +composites
then swap 1+
dup sqr r@ < while!
rdrop drop ;

\ traverse the final buffer, creating an array of primes
: sieve>a SED: b n -- a
>r a:new swap
( 1- b:bit@ not if >r I a:push r> then ) 2 r> loop drop ;

;with

: sieve SED: n -- a
dup make-sieve swap sieve>a ;```
Output:
```ok> 100 sieve .
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok> 1_000_000 sieve a:len . \ count primes up to 1,000,000
78498
ok> -1 a:@ . \ largest prime < 1,000,000
999983
```

## AArch64 Assembly

Works with: as version Raspberry Pi 3B version Buster 64 bits
```/* ARM assembly AARCH64 Raspberry PI 3B */
/*  program cribleEras64.s   */

/*******************************************/
/* Constantes file                         */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"

.equ MAXI,      100

/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss
sZoneConv:                  .skip 24
TablePrime:                 .skip   8 * MAXI
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main
main:                               // entry of program
mov x0,#2                       // prime 2
bl displayPrime
mov x1,#2
mov x2,#1
1:                                  // loop for multiple of 2
str x2,[x4,x1,lsl #3]           // mark  multiple of 2
cmp x1,#MAXI                    // end ?
ble 1b                          // no loop
mov x1,#3                       // begin indice
mov x3,#1
2:
ldr x2,[x4,x1,lsl #3]           // load table élément
cmp x2,#1                       // is prime ?
beq 4f
mov x0,x1                       // yes -> display
bl displayPrime
mov x2,x1
3:                                  // and loop to mark multiples of this prime
str x3,[x4,x2,lsl #3]
cmp x2,#MAXI                    // end ?
ble 3b                          // no -> loop
4:
add x1,x1,2                     // other prime in table
cmp x1,MAXI                     // end table ?
ble 2b                          // no -> loop

100:                                // standard end of the program
mov x0,0                        // return code
mov x8,EXIT                     // request to exit program
svc 0                           // perform the system call

/******************************************************************/
/*      Display prime table elements                                */
/******************************************************************/
/* x0 contains the prime */
displayPrime:
stp x1,lr,[sp,-16]!             // save  registers
bl conversion10                 // call décimal conversion
ldr x1,qAdrsZoneConv            // insert conversion in message
bl strInsertAtCharInc
bl affichageMess                // display message
100:
ldp x1,lr,[sp],16               // restaur  2 registers

/********************************************************/
/*        File Include fonctions                        */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"```
```Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
```

## ABAP

```PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.

AT SELECTION-SCREEN ON p_limit.
IF p_limit LE 1.
MESSAGE 'Limit must be higher then 1.' TYPE 'E'.
ENDIF.

START-OF-SELECTION.
FIELD-SYMBOLS: <fs_prime> TYPE flag.
DATA: gt_prime TYPE TABLE OF flag,
gv_prime TYPE flag,
gv_i     TYPE i,
gv_j     TYPE i.

DO p_limit TIMES.
IF sy-index > 1.
gv_prime = abap_true.
ELSE.
gv_prime = abap_false.
ENDIF.

APPEND gv_prime TO gt_prime.
ENDDO.

gv_i = 2.
WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).
IF ( gt_prime[ gv_i ] EQ abap_true ).
gv_j =  gv_i ** 2.
WHILE ( gv_j <= p_limit ).
gt_prime[ gv_j ] = abap_false.
gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).
ENDWHILE.
ENDIF.
gv_i = gv_i + 1.
ENDWHILE.

LOOP AT gt_prime INTO gv_prime.
IF gv_prime = abap_true.
WRITE: / sy-tabix.
ENDIF.
ENDLOOP.
```

## ACL2

```(defun nats-to-from (n i)
(declare (xargs :measure (nfix (- n i))))
(if (zp (- n i))
nil
(cons i (nats-to-from n (+ i 1)))))

(defun remove-multiples-up-to-r (factor limit xs i)
(declare (xargs :measure (nfix (- limit i))))
(if (or (> i limit)
(zp (- limit i))
(zp factor))
xs
(remove-multiples-up-to-r
factor
limit
(remove i xs)
(+ i factor))))

(defun remove-multiples-up-to (factor limit xs)
(remove-multiples-up-to-r factor limit xs (* factor 2)))

(defun sieve-r (factor limit)
(declare (xargs :measure (nfix (- limit factor))))
(if (zp (- limit factor))
(nats-to-from limit 2)
(remove-multiples-up-to factor (+ limit 1)
(sieve-r (1+ factor) limit))))

(defun sieve (limit)
(sieve-r 2 limit))
```

## Action!

```DEFINE MAX="1000"

PROC Main()
BYTE ARRAY t(MAX+1)
INT i,j,k,first

FOR i=0 TO MAX
DO
t(i)=1
OD

t(0)=0
t(1)=0

i=2 first=1
WHILE i<=MAX
DO
IF t(i)=1 THEN
IF first=0 THEN
Print(", ")
FI
PrintI(i)
FOR j=2*i TO MAX STEP i
DO
t(j)=0
OD
first=0
FI
i==+1
OD
RETURN```
Output:
```2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743,
751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883,
887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
```

## ActionScript

Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)

```function eratosthenes(limit:int):Array
{
var primes:Array = new Array();
if (limit >= 2) {
var sqrtlmt:int = int(Math.sqrt(limit) - 2);
for (var i:int = 2; i <= limit; i++) // and
nums.push(i); // only initialize the Array once...
for (var j:int = 0; j <= sqrtlmt; j++) {
var p:int = nums[j]
if (p)
for (var t:int = p * p - 2; t < nums.length; t += p)
nums[t] = 0;
}
for (var m:int = 0; m < nums.length; m++) {
var r:int = nums[m];
if (r)
primes.push(r);
}
}
return primes;
}
var e:Array = eratosthenes(1000);
trace(e);
```

Output:

Output:
```2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997
```

```with Ada.Text_IO, Ada.Command_Line;

procedure Eratos is

Prime: array(1 .. Last) of Boolean := (1 => False, others => True);
Base: Positive := 2;
Cnt: Positive;
begin
while Base * Base <= Last loop
if Prime(Base) then
Cnt := Base + Base;
while Cnt <= Last loop
Prime(Cnt) := False;
Cnt := Cnt + Base;
end loop;
end if;
Base := Base + 1;
end loop;
Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");
for Number in Prime'Range loop
if Prime(Number) then
end if;
end loop;
end Eratos;
```
Output:
```> ./eratos 31
Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31```

## Agda

```-- imports
open import Data.Nat as ℕ     using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec   using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin   using (Fin; suc; zero)
open import Function          using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe        using (Maybe; just; nothing)

-- Without square cutoff optimization
module Simple where
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))
where
sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
sieve [] = []
sieve (nothing ∷ xs) =         sieve xs
sieve (just x  ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)
where
B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n

remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
remove _ ys zero    = nothing ∷ ys x
remove y ys (suc z) = y       ∷ ys z

-- With square cutoff optimization
module SquareOpt where
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
where
sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
sieve _ zero = List.mapMaybe id ∘ Vec.toList
sieve _ (suc _) [] = []
sieve i (suc l) (nothing ∷ xs) =     sieve (suc i) (l ∸ i ∸ i) xs
sieve i (suc l) (just x  ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)
where
B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n

remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
remove _ ys zero    = nothing ∷ ys i
remove y ys (suc j) = y       ∷ ys j
```

## Agena

Tested with Agena 2.9.5 Win32

```# Sieve of Eratosthenes

# generate and return a sequence containing the primes up to sieveSize
sieve := proc( sieveSize :: number ) :: sequence is
local sieve, result;

result := seq(); # sequence of primes - initially empty
create register sieve( sieveSize ); # "vector" to be sieved

sieve[ 1 ] := false;
for sPos from 2 to sieveSize do sieve[ sPos ] := true od;

# sieve the primes
for sPos from 2 to entier( sqrt( sieveSize ) ) do
if sieve[ sPos ] then
for p from sPos * sPos to sieveSize by sPos do
sieve[ p ] := false
od
fi
od;

# construct the sequence of primes
for sPos from 1 to sieveSize do
if sieve[ sPos ] then insert sPos into result fi
od

return result
end; # sieve

# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();```
Output:
``` 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

## ALGOL 60

Based on the 1962 Revised Repport:

```comment Sieve of Eratosthenes;
begin
integer array t[0:1000];
integer i,j,k;
for i:=0 step 1 until 1000 do t[i]:=1;
t:=0; t:=0; i:=0;
for i:=i while i<1000 do
begin
for i:=i while i<1000 and t[i]=0 do i:=i+1;
if i<1000 then
begin
j:=2;
k:=j*i;
for k:=k while k<1000 do
begin
t[k]:=0;
j:=j+1;
k:=j*i
end;
i:=i+1
end
end;
for i:=0 step 1 until 999 do
if t[i]≠0 then print(i,ꞌ is primeꞌ)
end
```

An 1964 Implementation:

Works with: ALGOL 60 for OS/360

```'BEGIN'
'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
'INTEGER' I,J,K;
'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;
SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);
'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'
'BEGIN'
CANDIDATES(/I/) := 1;
'END';
CANDIDATES(/0/) := 0;
CANDIDATES(/1/) := 0;
I := 0;
'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'
'BEGIN'
'FOR' I := I 'WHILE' I 'LESS' 1000
'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'
I := I+1;
'IF' I 'LESS' 1000 'THEN'
'BEGIN'
J := 2;
K := J*I;
'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'
'BEGIN'
CANDIDATES(/K/) := 0;
J := J + 1;
K := J*I;
'END';
I := I+1;
'END'
'END';
'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'
'IF' CANDIDATES(/I/) 'NOTEQUAL' 0  'THEN'
'BEGIN'
OUTINTEGER(1,I);
OUTSTRING(1,'(' IS PRIME')');
'COMMENT' NEW LINE;
SYSACT(1,14,1)
'END'
'END'
'END'```

## ALGOL 68

```BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
[n]BOOL sieve;
FOR i TO UPB sieve DO sieve[i] := prime OD;
INT m = ENTIER sqrt(n);
sieve := non prime;
FOR i FROM 2 TO m DO
IF sieve[i] = prime THEN
FOR j FROM i*i BY i TO n DO
sieve[j] := non prime
OD
FI
OD;
sieve
);

print((eratosthenes(80),new line))```
Output:
```FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF
```

## ALGOL W

### Standard, non-optimised sieve

```begin

% implements the sieve of Eratosthenes                                   %
%     s(i) is set to true if i is prime, false otherwise                 %
%     algol W doesn't have a upb operator, so we pass the size of the    %
%     array in n                                                         %
procedure sieve( logical array s ( * ); integer value n ) ;
begin

for i := 1 until n do s( i ) := true;

% sieve out the non-primes                                           %
s( 1 ) := false;
for i := 2 until truncate( sqrt( n ) )
do begin
if s( i )
then begin
for p := i * i step i until n do s( p ) := false
end if_s_i
end for_i ;

end sieve ;

% test the sieve procedure                                               %

integer sieveMax;

sieveMax := 100;
begin

logical array s ( 1 :: sieveMax );

i_w := 2; % set output field width                                   %
s_w := 1; % and output separator width                               %

% find and display the primes                                        %
sieve( s, sieveMax );
for i := 1 until sieveMax do if s( i ) then writeon( i );

end

end.```
Output:
``` 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

### Odd numbers only version

Alternative version that only stores odd numbers greater than 1 in the sieve.

```begin
% implements the sieve of Eratosthenes                                   %
% only odd numbers appear in the sieve, which starts at 3                %
% s( i ) is set to true if ( i * 2 ) + 1 is prime                        %
procedure sieve2( logical array s ( * ); integer value n ) ;
begin
for i := 1 until n do s( i ) := true;
% sieve out the non-primes                                           %
% the subscripts of s are  1  2  3  4  5  6  7  8  9 10 11 12 13...  %
%      which correspond to 3  5  7  9 11 13 15 17 19 21 23 25 27...  %
for i := 1 until truncate( sqrt( n ) ) do begin
if s( i ) then begin
integer ip;
ip := ( i * 2 ) + 1;
for p := i + ip step ip until n do s( p ) := false
end if_s_i
end for_i ;
end sieve2 ;
% test the sieve2 procedure                                              %
integer primeMax, arrayMax;
primeMax := 100;
arrayMax := ( primeMax div 2 ) - 1;
begin
logical array s ( 1 :: arrayMax);
i_w := 2; % set output field width                                   %
s_w := 1; % and output separator width                               %
% find and display the primes                                        %
sieve2( s, arrayMax );
write( 2 );
for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
end
end.```
Output:

Same as the standard version.

## ALGOL-M

```BEGIN

COMMENT
FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING
CLASSIC SIEVE OF ERATOSTHENES;

% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
INTEGER R1, R2;
R1 := N;
R2 := 1;
WHILE R1 > R2 DO
BEGIN
R1 := (R1+R2) / 2;
R2 := N / R1;
END;
ISQRT := R1;
END;

INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];

LIMIT := 1000;
FALSE := 0;
TRUE := 1;

WRITE("FINDING PRIMES FROM 2 TO",LIMIT);

% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
FLAGS[I] := TRUE;

% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
BEGIN
IF FLAGS[I] = TRUE THEN
FOR J := (I * I) STEP I UNTIL LIMIT DO
FLAGS[J] := FALSE;
END;

% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");
FOR I := 2 STEP 1 UNTIL LIMIT DO
BEGIN
IF FLAGS[I] = TRUE THEN
BEGIN
WRITEON(I);
COUNT := COUNT + 1;
COL := COL + 1;
IF COL > 10 THEN
BEGIN
WRITE("");
COL := 1;
END;
END;
END;

WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");

END```
Output:
```FINDING PRIMES FROM 2 TO  1000
INTIALIZING ... SIEVING ... PRINTING
2     3     5     7    11    13    17    19    23    29
31    37    41    43    47    53    59    61    67    71
. . .
877   881   883   887   907   911   919   929   937   941
947   953   967   971   977   983   991   997

168 PRIMES WERE FOUND.
```

## APL

All these versions requires ⎕io←0 (index origin 0).

It would have been better to require a result of the boolean mask rather than the actual list of primes. The list of primes obtains readily from the mask by application of a simple function (here {⍵/⍳⍴⍵}). Other related computations (such as the number of primes < n) obtain readily from the mask, easier than producing the list of primes.

### Non-Optimized Version

```sieve2←{
b←⍵⍴1
b[⍳2⌊⍵]←0
2≥⍵:b
p←{⍵/⍳⍴⍵}∇⌈⍵*0.5
m←1+⌊(⍵-1+p×p)÷p
b ⊣ p {b[⍺×⍺+⍳⍵]←0}¨ m
}

primes2←{⍵/⍳⍴⍵}∘sieve2
```

The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).

### Optimized Version

```sieve←{
b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
49≥⍵:b
p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5
m←1+⌊(⍵-1+p×p)÷2×p
b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m
}

primes←{⍵/⍳⍴⍵}∘sieve
```

The optimizations are as follows:

• Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
• Subsequently, only odd multiples of primes > 5 are marked.
• Multiples of a prime to be marked start at its square.

### Examples

```   primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

primes¨ ⍳14
┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐
││││2│2 3│2 3│2 3 5│2 3 5│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7 11│2 3 5 7 11│
└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘

sieve 13
0 0 1 1 0 1 0 1 0 0 0 1 0

+/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534
```

The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).

## AppleScript

```on sieveOfEratosthenes(limit)
script o
property numberList : {missing value}
end script

repeat with n from 2 to limit
set end of o's numberList to n
end repeat
repeat with n from 2 to (limit ^ 0.5 div 1)
if (item n of o's numberList is n) then
repeat with multiple from (n * n) to limit by n
set item multiple of o's numberList to missing value
end repeat
end if
end repeat

return o's numberList's numbers
end sieveOfEratosthenes

sieveOfEratosthenes(1000)
```
Output:
```{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}
```

## ARM Assembly

Works with: as version Raspberry Pi
```/* ARM assembly Raspberry PI  */
/*  program cribleEras.s   */

/* REMARK 1 : this program use routines in a include file
see task Include a file language arm assembly
for the routine affichageMess conversion10
see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes                       */
/************************************/
.include "../constantes.inc"

.equ MAXI,      101

/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss
sZoneConv:                  .skip 24
TablePrime:                 .skip   4 * MAXI
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main
main:                               @ entry of program
mov r0,#2                       @ prime 2
bl displayPrime
mov r1,#2
mov r2,#1
1:                                  @ loop for multiple of 2
str r2,[r4,r1,lsl #2]           @ mark  multiple of 2
cmp r1,#MAXI                    @ end ?
ble 1b                          @ no loop
mov r1,#3                       @ begin indice
mov r3,#1
2:
ldr r2,[r4,r1,lsl #2]           @ load table élément
cmp r2,#1                       @ is prime ?
beq 4f
mov r0,r1                       @ yes -> display
bl displayPrime
mov r2,r1
3:                                  @ and loop to mark multiples of this prime
str r3,[r4,r2,lsl #2]
cmp r2,#MAXI              @ end ?
ble 3b                          @ no -> loop
4:
add r1,#2                       @ other prime in table
cmp r1,#MAXI              @ end table ?
ble 2b                          @ no -> loop

100:                                @ standard end of the program
mov r0, #0                      @ return code
mov r7, #EXIT                   @ request to exit program
svc #0                          @ perform the system call

/******************************************************************/
/*      Display prime table elements                                */
/******************************************************************/
/* r0 contains the prime */
displayPrime:
push {r1,lr}                    @ save registers
bl conversion10                 @ call décimal conversion
ldr r1,iAdrsZoneConv            @ insert conversion in message
bl strInsertAtCharInc
bl affichageMess                @ display message
100:
pop {r1,lr}
bx lr
/***************************************************/
/*      ROUTINES INCLUDE                           */
/***************************************************/
.include "../affichage.inc"```
```Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
Prime  : 101
```

## Arturo

```sieve: function [upto][
composites: array.of: inc upto false
loop 2..to :integer sqrt upto 'x [
if not? composites\[x][
loop range.step: x x^2 upto 'c [
composites\[c]: true
]
]
]
result: new []
loop.with:'i composites 'c [
unless c -> 'result ++ i
]
return result -- [0,1]
]

print sieve 100
```
Output:
`2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97`

## AutoHotkey

Search autohotkey.com: of Eratosthenes
Source: AutoHotkey forum by Laszlo

```MsgBox % "12345678901234567890`n" Sieve(20)

Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime
Static zero := 48, one := 49 ; Asc("0"), Asc("1")
VarSetCapacity(S,n,one)
NumPut(zero,S,0,"char")
i := 2
Loop % sqrt(n)-1 {
If (NumGet(S,i-1,"char") = one)
Loop % n//i
If (A_Index > 1)
NumPut(zero,S,A_Index*i-1,"char")
i += 1+(i>2)
}
Return S
}
```

### Alternative Version

```Sieve_of_Eratosthenes(n){
arr := []
loop % n-1
if A_Index>1
arr[A_Index] := true

for i, v in arr	{
if (i>Sqrt(n))
break
else if arr[i]
while ((j := i*2 + (A_Index-1)*i) < n)
arr.delete(j)
}
return Arr
}
```
Examples:
```n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return
```
Output:
```.	2	3	.	5	.	7	.	.	.
11	.	13	.	.	.	17	.	19	.
.	.	23	.	.	.	.	.	29	.
31	.	.	.	.	.	37	.	.	.
41	.	43	.	.	.	47	.	.	.
.	.	53	.	.	.	.	.	59	.
61	.	.	.	.	.	67	.	.	.
71	.	73	.	.	.	.	.	79	.
.	.	83	.	.	.	.	.	89	.
.	.	.	.	.	.	97	.	.	.```

## AutoIt

```#include <Array.au3>
\$M = InputBox("Integer", "Enter biggest Integer")
Global \$a[\$M], \$r[\$M], \$c = 1
For \$i = 2 To \$M -1
If Not \$a[\$i] Then
\$r[\$c] = \$i
\$c += 1
For \$k = \$i To \$M -1 Step \$i
\$a[\$k] = True
Next
EndIf
Next
\$r = \$c - 1
ReDim \$r[\$c]
_ArrayDisplay(\$r)
```

## AWK

An initial array holds all numbers 2..max (which is entered on stdin); then all products of integers are deleted from it; the remaining are displayed in the unsorted appearance of a hash table. Here, the script is entered directly on the commandline, and input entered on stdin:

```\$ awk '{for(i=2;i<=\$1;i++) a[i]=1;
>       for(i=2;i<=sqrt(\$1);i++) for(j=2;j<=\$1;j++) delete a[i*j];
>       for(i in a) printf i" "}'
100
71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3
```

The following variant does not unset non-primes, but sets them to 0, to preserve order in output:

```\$ awk '{for(i=2;i<=\$1;i++) a[i]=1;
>       for(i=2;i<=sqrt(\$1);i++) for(j=2;j<=\$1;j++) a[i*j]=0;
>       for(i=2;i<=\$1;i++) if(a[i])printf i" "}'
100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
```

Now with the script from a file, input from commandline as well as stdin, and input is checked for valid numbers:

```# usage:  gawk  -v n=101  -f sieve.awk

function sieve(n) { # print n,":"
for(i=2; i<=n;      i++) a[i]=1;
for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;
for(i=2; i<=n;      i++) if(a[i]) printf i" "
print ""
}

BEGIN	{ print "Sieve of Eratosthenes:"
if(n>1) sieve(n)
}

{ n=\$1+0 }
n<2	{ exit }
{ sieve(n) }

END	{ print "Bye!" }
```

Here is an alternate version that uses an associative array to record composites with a prime dividing it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints out noncomposites greater than 1.

```BEGIN {  ULIMIT=100

for ( n=1 ; (n++) < ULIMIT ; )
if (n in S) {
p = S[n]
delete S[n]
for ( m = n ; (m += p) in S ; )  { }
S[m] = p
}
else  print ( S[(n+n)] = n )
}
```

## Bash

See solutions at UNIX Shell.

## BASIC

Works with: FreeBASIC
Works with: RapidQ
```DIM n AS Integer, k AS Integer, limit AS Integer

INPUT "Enter number to search to: "; limit
DIM flags(limit) AS Integer

FOR n = 2 TO SQR(limit)
IF flags(n) = 0 THEN
FOR k = n*n TO limit STEP n
flags(k) = 1
NEXT k
END IF
NEXT n

' Display the primes
FOR n = 2 TO limit
IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n
```

### Applesoft BASIC

```10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20  DIM FLAGS(LIMIT)
30  FOR N = 2 TO SQR (LIMIT)
40  IF FLAGS(N) < > 0 GOTO 80
50  FOR K = N * N TO LIMIT STEP N
60  FLAGS(K) = 1
70  NEXT K
80  NEXT N
90  REM  DISPLAY THE PRIMES
100  FOR N = 2 TO LIMIT
110  IF FLAGS(N) = 0 THEN PRINT N;", ";
120  NEXT N
```

### Atari BASIC

Translation of: Commodore BASIC

Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.

```100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160  IF N(P)=0 THEN 200
170  FOR I=P*P TO LI STEP P
180    N(I)=0
190  NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230   IF N(I)=0 THEN 260
240   PRINT I,:C=C+1
250   IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT
```
Output:
```  Ready
RUN
LIMIT?100
2         3         5
7         11        13
17        19        23
29        31        37
41        43        47
53        59        61
67        71        73
79        83        89
97```

### Commodore BASIC

Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.

```100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 :   N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT
```
Output:
```READY.
RUN
LIMIT? 100
2         3         5         7
11        13        17        19
23        29        31        37
41        43        47        53
59        61        67        71
73        79        83        89
97

```

### IS-BASIC

```100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140   LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170   IF T(I)<>1 THEN
180     FOR K=I*I TO LIMIT STEP I
190       LET T(K)=1
200     NEXT
210   END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240   IF T(I)=0 THEN PRINT I;
250 NEXT```

### Locomotive Basic

```10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
40 FOR n=2 TO SQR(limit)
50 IF f(n)=1 THEN 90
60 FOR k=n*n TO limit STEP n
70 f(k)=1
80 NEXT k
90 NEXT n
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT
```

### MSX Basic

```5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n```

### Sinclair ZX81 BASIC

If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on `FAST` and `SLOW`: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering `FAST` mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to `SLOW` mode when we have something to print out.

``` 10 INPUT L
20 FAST
30 DIM N(L)
40 FOR I=2 TO SQR L
50 IF N(I) THEN GOTO 90
60 FOR J=I+I TO L STEP I
70 LET N(J)=1
80 NEXT J
90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I
```

### ZX Spectrum Basic

```10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR l
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n
```

### QBasic

Works with: QBasic version 1.1
Works with: QuickBasic version 4.5
```limit = 120

DIM flags(limit)
FOR n = 2 TO limit
flags(n) = 1
NEXT n

PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 TO SQR(limit)
IF flags(n) = 1 THEN
FOR i = n * n TO limit STEP n
flags(i) = 0
NEXT i
END IF
NEXT n

FOR n = 1 TO limit
IF flags(n) THEN PRINT USING "####"; n;
NEXT n
```
Output:
```Prime numbers less than or equal to 120 are:
2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113```

### BASIC256

```arraybase 1
limit = 120

dim  flags(limit)
for n = 2 to limit
flags[n] = True
next n

print "Prime numbers less than or equal to "; limit; " are: "
for n = 2 to sqr(limit)
if flags[n] then
for i = n * n to limit step n
flags[i] = False
next i
end if
next n

for n = 1 to limit
if flags[n] then print rjust(n,4);
next n```
Output:
```Prime numbers less than or equal to 120 are:
2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113```

### True BASIC

Translation of: QBasic
```LET limit = 120
DIM flags(0)
MAT redim flags(limit)
FOR n = 2 to limit
LET flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 to sqr(limit)
IF flags(n) = 1 then
FOR i = n*n to limit step n
LET flags(i) = 0
NEXT i
END IF
NEXT n
FOR n = 1 to limit
IF flags(n)<>0 then PRINT  using "####": n;
NEXT n
END
```
Output:
`Same as QBasic entry.`

### QL SuperBASIC

#### using 'easy way' to 'add' 2n wheels

Translation of: ZX Spectrum Basic

Sets h\$ to 1 for higher multiples of 2 via `FILL\$`, later on sets `STEP` to 2n; replaces Floating Pt array p(z) with string variable h\$(z) to sieve out all primes < z=441 (`l`=21) in under 1K, so that h\$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.

```10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h\$="10" : h\$=h\$ & FILL\$("01",z)
40      FOR n=3 TO l
50 IF h\$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h\$(k)=1
80      END FOR n
90 REM Display the primes
100 FOR n=2 TO z: IF h\$(n)=0: PRINT n;", ";
```

#### 2i wheel emulation of Sinclair ZX81 BASIC

Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the `STEP` of 2 in line 40 mitigates line 50's inefficiency when going to 90.
``` 10 INPUT Z
15 LET L=SQR(Z)
30 LET H\$="10"
32 FOR J=3 TO Z STEP 2
34 LET H\$=H\$ & "01"
36 NEXT J
40 FOR I=3 TO L STEP 2
50 IF H\$(I)="1" THEN GOTO 90
60 FOR J=I*I TO Z STEP I+I
70 LET H\$(J)="1"
80 NEXT J
90 NEXT I
110 FOR I=2 TO Z
120 IF H\$(I)="0" THEN PRINT I!
130 NEXT I
```

#### 2i wheel emulation of Sinclair ZX80 BASIC

. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, `L` (=21 at most on 1K ZX80s for all primes till 439).

Backward-compatible on ZX80s after substituting ** for ^ in line 120.
``` 10 INPUT L
15 LET Z=(L+1)*(L- 1)/2
30 DIM H(Z)
40 FOR I=3 TO L STEP 2
50 IF H((I-1)/ 2) THEN GOTO 90
60 FOR J=I*I TO L*L STEP I+I
70 LET H((J-1)/ 2)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
```

#### Sieve of Sundaram

Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now just a slight transformation away from the Sieve of Sundaram, as transformed as follows: `O` is the highest value for an Index of succesive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in-between, such that duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent to line 50 in them.

Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.
``` 10 INPUT O
15 LET Z=2*O*O+O*2
30 DIM H(Z)
40 FOR I=1 TO O
45 LET A=2*I*I+I*2
50 REM IF H(A) THEN GOTO 90
60 FOR J=A TO Z STEP 1+I*2
65 REM IF H(J) THEN GOTO 80
70 LET H(J)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
```

#### Eulerian optimisation

While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of `FOR` statements (& as their 1K of static RAM is equivalent to L1 cache, even in `FAST` mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)

```
10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h\$=FILL\$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h\$((2*q+1)*p+q)="1"
42 GOTO 87
61 IF h\$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h\$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h\$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h\$(z%)=" ": PRINT N%(z%)!
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2
```

## Batch File

```:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEEXTENSIONS
rem echo on
set /p n=limit:
rem set n=100
for /L %%i in (1,1,%n%) do set crible.%%i=1
for /L %%i in (2,1,%n%) do (
if !crible.%%i! EQU 1 (
set /A w = %%i * 2
for /L %%j in (!w!,%%i,%n%) do (
set crible.%%j=0
)
)
)
for /L %%i in (2,1,%n%) do (
if !crible.%%i! EQU 1 echo %%i
)
pause
```
Output:
```limit: 100
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97```

## BBC BASIC

```      limit% = 100000
DIM sieve% limit%

prime% = 2
WHILE prime%^2 < limit%
FOR I% = prime%*2 TO limit% STEP prime%
sieve%?I% = 1
NEXT
REPEAT prime% += 1 : UNTIL sieve%?prime%=0
ENDWHILE

REM Display the primes:
FOR I% = 1 TO limit%
IF sieve%?I% = 0 PRINT I%;
NEXT
```

## BCPL

```get "libhdr"

manifest \$( LIMIT = 1000 \$)

let sieve(prime,max) be
\$(  let i = 2
0!prime := false
1!prime := false
for i = 2 to max do i!prime := true
while i*i <= max do
\$(  if i!prime do
\$(  let j = i*i
while j <= max do
\$(  j!prime := false
j := j + i
\$)
\$)
i := i + 1
\$)
\$)

let start() be
\$(  let prime = vec LIMIT
let col = 0
sieve(prime, LIMIT)
for i = 2 to LIMIT do
if i!prime do
\$(  writef("%I4",i)
col := col + 1
if col rem 20 = 0 then wrch('*N')
\$)
wrch('*N')
\$)```
Output:
```   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997```

### Odds-only bit packed array version (64 bit)

This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.

```GET "libhdr"

LET lowbit(n) =
0 -> -1,
VALOF {
// The table is byte packed to conserve space; therefore we must
// unpack the structure.
//
LET deBruijn64 = TABLE
#x0001300239311C03, #x3D3A322A261D1104,
#x3E373B2435332B16, #x2D27211E18120C05,
#x3F2F381B3C292510, #x362334152C20170B,
#x2E1A280F22141F0A, #x190E13090D080706

LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
}

LET primes_upto(limit) =
limit < 3 -> 0,
VALOF {
LET bit_sz = (limit + 1) / 2 - 1
LET bit, p = ?, ?
LET q, r = bit_sz >> 6, bit_sz & #x3F
LET sz = q - (r > 0)
LET sieve = getvec(sz)

// Initialize the array
FOR i = 0 TO q - 1 DO
sieve!i := -1
IF r > 0 THEN sieve!q := ~(-1 << r)
sieve!sz := -1 // Sentinel value to mark the end -
// (after sieving, we'll never have 64 consecutive odd primes.)

// run the sieve
bit := 0
{
WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
bit +:= 1
p := 2*bit + 3
q := p*p
IF q > limit THEN RESULTIS sieve
r := (q - 3) >> 1
UNTIL r >= bit_sz DO {
sieve[r >> 6] &:= ~(1 << (r & #x3F))
r +:= p
}
bit +:= 1
} REPEAT
}

MANIFEST { // fields in an iterable
sieve_start; sieve_bits; sieve_ptr
}

LET prime_iter(sieve) = VALOF {
LET iter = getvec(2)
iter!sieve_start := 0
iter!sieve_bits := sieve!0
iter!sieve_ptr := sieve
RESULTIS iter
}

LET nextprime(iter) =
!iter!sieve_ptr = -1 -> 0,  // guard entry if at the end already
VALOF {
LET p, x = ?, ?

// iter!sieve_start is also a flag to yield 2.
IF iter!sieve_start = 0 {
iter!sieve_start := 3
RESULTIS 2
}
x := iter!sieve_bits
{
TEST x ~= 0
THEN {
p := (lowbit(x) << 1) + iter!sieve_start
x &:= x - 1
iter!sieve_bits := x
RESULTIS p
}
ELSE {
iter!sieve_start +:= 128
iter!sieve_ptr +:= 1
x := !iter!sieve_ptr
IF x = -1 RESULTIS 0
}
} REPEAT
}

LET show(sieve) BE {
LET iter = prime_iter(sieve)
LET c, p = 0, ?
{
p := nextprime(iter)
IF p = 0 THEN {
wrch('*n')
freevec(iter)
RETURN
}
IF c MOD 10 = 0 THEN wrch('*n')
c +:= 1
writef("%8d", p)
} REPEAT
}

LET start() = VALOF {
LET n = ?
LET argv = VEC 20
LET sz = ?
LET primes = ?

sz := rdargs("upto/a/n/p", argv, 20)
IF sz = 0 RESULTIS 1
n := !argv!0
primes := primes_upto(n)
IF primes = 0 RESULTIS 1 // no array allocated because limit too small
show(primes)
freevec(primes)
RESULTIS 0
}```
Output:
```\$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000>
2       3       5       7      11      13      17      19      23      29
31      37      41      43      47      53      59      61      67      71
73      79      83      89      97     101     103     107     109     113
127     131     137     139     149     151     157     163     167     173
179     181     191     193     197     199     211     223     227     229
233     239     241     251     257     263     269     271     277     281
283     293     307     311     313     317     331     337     347     349
353     359     367     373     379     383     389     397     401     409
419     421     431     433     439     443     449     457     461     463
467     479     487     491     499     503     509     521     523     541
547     557     563     569     571     577     587     593     599     601
607     613     617     619     631     641     643     647     653     659
661     673     677     683     691     701     709     719     727     733
739     743     751     757     761     769     773     787     797     809
811     821     823     827     829     839     853     857     859     863
877     881     883     887     907     911     919     929     937     941
947     953     967     971     977     983     991     997
0.005>
```

## Befunge

```2>:3g" "-!v\  g30          <
|!`"O":+1_:.:03p>03g+:"O"`|
@               ^  p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789
```

## BQN

A more efficient sieve (primes below one billion in under a minute) is provided as `PrimesTo` in bqn-libs primes.bqn.

```Primes ← {
𝕩≤2 ? ↕0 ;             # No primes below 2
p ← 𝕊⌈√n←𝕩             # Initial primes by recursion
b ← 2≤↕n               # Initial sieve: no 0 or 1
E ← {↕∘⌈⌾((𝕩×𝕩+⊢)⁼)n}  # Multiples of 𝕩 under n, starting at 𝕩×𝕩
/ b E⊸{0¨⌾(𝕨⊸⊏)𝕩}´ p   # Cross them out
}```
Output:
```   Primes 100
⟨ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ⟩
≠∘Primes¨ 10⋆↕7  # Number of primes below 1e0, 1e1, ... 1e6
⟨ 0 4 25 168 1229 9592 78498 ⟩```

## Bracmat

This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.

```( ( eratosthenes
=   n j i
.   !arg:?n
& 1:?i
&   whl
' ( (1+!i:?i)^2:~>!n:?j
& ( !!i
|   whl
' ( !j:~>!n
& nonprime:?!j
& !j+!i:?j
)
)
)
& 1:?i
&   whl
' ( 1+!i:~>!n:?i
& (!!i|put\$(!i " "))
)
)
& eratosthenes\$100
)```
Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

## C

Plain sieve, without any optimizations:
```#include <stdlib.h>
#include <math.h>

char*
eratosthenes(int n, int *c)
{
char* sieve;
int i, j, m;

if(n < 2)
return NULL;

*c = n-1;     /* primes count */
m = (int) sqrt((double) n);

/* calloc initializes to zero */
sieve = calloc(n+1,sizeof(char));
sieve = 1;
sieve = 1;
for(i = 2; i <= m; i++)
if(!sieve[i])
for (j = i*i; j <= n; j += i)
if(!sieve[j]){
sieve[j] = 1;
--(*c);
}
return sieve;
}
```
Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.

Another example:

We first fill ones into an array and assume all numbers are prime. Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples! To understand this better, look at the output of the following example.

To print this back, we look for ones in the array and only print those spots.
```#include <stdio.h>
#include <malloc.h>
void sieve(int *, int);

int main(int argc, char *argv)
{
int *array, n=10;
array =(int *)malloc((n + 1) * sizeof(int));
sieve(array,n);
return 0;
}

void sieve(int *a, int n)
{
int i=0, j=0;

for(i=2; i<=n; i++) {
a[i] = 1;
}

for(i=2; i<=n; i++) {
printf("\ni:%d", i);
if(a[i] == 1) {
for(j=i; (i*j)<=n; j++) {
printf ("\nj:%d", j);
printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);
a[(i*j)] = 0;
printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);
}
}
}

printf("\nPrimes numbers from 1 to %d are : ", n);
for(i=2; i<=n; i++) {
if(a[i] == 1)
printf("%d, ", i);
}
printf("\n\n");
}
```
Output:
```i:2
j:2
Before a[2*2]: 1
After a[2*2]: 0
j:3
Before a[2*3]: 1
After a[2*3]: 0
j:4
Before a[2*4]: 1
After a[2*4]: 0
j:5
Before a[2*5]: 1
After a[2*5]: 0
i:3
j:3
Before a[3*3]: 1
After a[3*3]: 0
i:4
i:5
i:6
i:7
i:8
i:9
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7,
```

## C#

Works with: C# version 2.0+
```using System;
using System.Collections;
using System.Collections.Generic;

namespace SieveOfEratosthenes
{
class Program
{
static void Main(string[] args)
{
int maxprime = int.Parse(args);
var primelist = GetAllPrimesLessThan(maxprime);
foreach (int prime in primelist)
{
Console.WriteLine(prime);
}
Console.WriteLine("Count = " + primelist.Count);
}

private static List<int> GetAllPrimesLessThan(int maxPrime)
{
var primes = new List<int>();
var maxSquareRoot = (int)Math.Sqrt(maxPrime);
var eliminated = new BitArray(maxPrime + 1);

for (int i = 2; i <= maxPrime; ++i)
{
if (!eliminated[i])
{
if (i <= maxSquareRoot)
{
for (int j = i * i; j <= maxPrime; j += i)
{
eliminated[j] = true;
}
}
}
}
return primes;
}
}
}
```

### Richard Bird Sieve

Translation of: F#

To show that C# code can be written in somewhat functional paradigms, the following in an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:

```using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesBird : IEnumerable<PrimeT> {
private struct CIS<T> {
public T v; public Func<CIS<T>> cont;
public CIS(T v, Func<CIS<T>> cont) {
this.v = v; this.cont = cont;
}
}
private CIS<PrimeT> pmlts(PrimeT p) {
Func<PrimeT, CIS<PrimeT>> fn = null;
fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));
return fn(p * p);
}
private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }
private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
var x = xs.v; var y = ys.v;
if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
}
private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }
private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
var nn = n; var ncs = cs;
for (; ; ++nn) {
if (nn >= ncs.v) ncs = ncs.cont();
else return new CIS<PrimeT>(nn, () => minusat(++nn, ncs));
}
}
private CIS<PrimeT> prms() {
return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }
public IEnumerator<PrimeT> GetEnumerator() {
for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
```

### Tree Folding Sieve

Translation of: F#

The above code can easily be converted to "odds-only" and a infinite tree-like folding scheme with the following minor changes:

```using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesTreeFold : IEnumerable<PrimeT> {
private struct CIS<T> {
public T v; public Func<CIS<T>> cont;
public CIS(T v, Func<CIS<T>> cont) {
this.v = v; this.cont = cont;
}
}
private CIS<PrimeT> pmlts(PrimeT p) {
var adv = p + p;
Func<PrimeT, CIS<PrimeT>> fn = null;
fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));
return fn(p * p);
}
private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));
}
private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
var x = xs.v; var y = ys.v;
if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
}
private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {
var nxtcss = css.cont();
return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }
private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));
}
private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
var nn = n; var ncs = cs;
for (; ; nn += 2) {
if (nn >= ncs.v) ncs = ncs.cont();
else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));
}
}
private CIS<PrimeT> oddprms() {
return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));
}
public IEnumerator<PrimeT> GetEnumerator() {
yield return 2;
for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
```

The above code runs over ten times faster than the original Richard Bird algorithm.

### Priority Queue Sieve

Translation of: F#

First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:

```namespace PriorityQ {
using KeyT = System.UInt32;
using System;
using System.Collections.Generic;
using System.Linq;
class Tuple<K, V> { // for DotNet 3.5 without Tuple's
public K Item1; public V Item2;
public Tuple(K k, V v) { Item1 = k; Item2 = v; }
public override string ToString() {
return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";
}
}
class MinHeapPQ<V> {
private struct HeapEntry {
public KeyT k; public V v;
public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }
}
private List<HeapEntry> pq;
private MinHeapPQ() { this.pq = new List<HeapEntry>(); }
private bool mt { get { return pq.Count == 0; } }
private Tuple<KeyT, V> pkmn {
get {
if (pq.Count == 0) return null;
else {
var mn = pq;
return new Tuple<KeyT, V>(mn.k, mn.v);
}
}
}
private void psh(KeyT k, V v) { // add extra very high item if none
if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));
var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...
for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {
var t = pq[ni - 1];
if (t.k > k) pq[i - 1] = t; else break;
}
pq[i - 1] = new HeapEntry(k, v);
}
private void siftdown(KeyT k, V v, int ndx) {
var cnt = pq.Count - 1; var i = ndx;
for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {
var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;
var nk = k;
if (k > lk) { i = ni; nk = lk; }
if (nk > rk) { ni += 1; i = ni; }
if (i != oi) pq[oi] = pq[i]; else break;
}
pq[i] = new HeapEntry(k, v);
}
private void rplcmin(KeyT k, V v) {
if (pq.Count > 1) siftdown(k, v, 0); }
public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }
public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }
public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {
pq.psh(k, v); return pq; }
public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
pq.rplcmin(k, v); return pq; }
}
```

### Restricted Base Primes Queue

The following code implements an improved version of the odds-only O'Neil algorithm, which provides the improvements of only adding base prime composite number streams to the queue when the sieved number reaches the square of the base prime (saving a huge amount of memory and considerable execution time, including not needing as wide a range of a type for the internal prime numbers) as well as minimizing stream processing using fusion:

```using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesPQ : IEnumerable<PrimeT> {
private IEnumerator<PrimeT> nmrtr() {
MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;
PrimeT bp = 3; PrimeT q = 9;
IEnumerator<PrimeT> bps = null;
yield return 2; yield return 3;
for (var n = (PrimeT)5; ; n += 2) {
if (n >= q) { // always equal or less...
if (q <= 9) {
bps = nmrtr();
bps.MoveNext(); bps.MoveNext(); } // move to 3...
bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
var adv = bp + bp; bp = nbp;
}
else {
var pk = MinHeapPQ<PrimeT>.peekMin(pq);
var ck = (pk == null) ? q : pk.Item1;
if (n >= ck) {
do { var adv = pk.Item2;
pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;
} while (n >= ck);
}
else yield return n;
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
```

The above code is at least about 2.5 times faster than the Tree Folding version.

### Dictionary (Hash table) Sieve

The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

```using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesDict : IEnumerable<PrimeT> {
private IEnumerator<PrimeT> nmrtr() {
Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();
PrimeT bp = 3; PrimeT q = 9;
IEnumerator<PrimeT> bps = null;
yield return 2; yield return 3;
for (var n = (PrimeT)5; ; n += 2) {
if (n >= q) { // always equal or less...
if (q <= 9) {
bps = nmrtr();
bps.MoveNext(); bps.MoveNext();
} // move to 3...
bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
var adv = bp + bp; bp = nbp;
}
else {
if (dct.ContainsKey(n)) {
}
else yield return n;
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
```

The above code runs in about three quarters of the time as the above Priority Queue based version for a range of a million primes which will fall even further behind for increasing ranges due to the Dictionary providing O(1) access times as compared to the O(log n) access times for the Priority Queue; the only slight advantage of the PQ based version is at very small ranges where the constant factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".

### Best performance: CPU-Cache-Optimized Segmented Sieve

All of the above unbounded versions are really just an intellectual exercise as with very little extra lines of code above the fastest Dictionary based version, one can have an bit-packed page-segmented array based version as follows:

```using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesPgd : IEnumerable<PrimeT> {
private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes
private const int BFBTS = PGSZ * 8; // in bits
private const int BFRNG = BFBTS * 2;
public IEnumerator<PrimeT> nmrtr() {
IEnumerator<PrimeT> bps = null;
List<uint> bpa = new List<uint>();
uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words
yield return 2;
for (var lowi = (PrimeT)0; ; lowi += BFBTS) {
for (var bi = 0; ; ++bi) {
if (bi < 1) {
if (bi < 0) { bi = 0; yield return 2; }
PrimeT nxt = 3 + lowi + lowi + BFRNG;
if (lowi <= 0) { // cull very first page
for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)
for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
cbuf[j >> 5] |= 1u << j;
}
else { // cull for the rest of the pages
Array.Clear(cbuf, 0, cbuf.Length);
if (bpa.Count == 0) { // inite secondar base primes stream
bps = nmrtr(); bps.MoveNext(); bps.MoveNext();
} // add 3 to base primes array
// make sure bpa contains enough base primes...
for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {
p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);
}
for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {
var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;
// adjust start index based on page lower limit...
if (s >= lowi) s -= lowi;
else {
var r = (lowi - s) % p;
s = (r != 0) ? p - r : 0;
}
for (var j = (uint)s; j < BFBTS; j += p)
cbuf[j >> 5] |= 1u << ((int)j);
}
}
}
while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;
if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);
else break; // outer loop for next page segment...
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
```

The above code is about 25 times faster than the Dictionary version at computing the first about 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed as compared to a naive "one huge memory array" algorithm to using an array size that is the size of the CPU L1 or L2 caches and using bit-packing to fit more number representations into this limited capacity; in this way RAM memory access times are reduced by a factor of from about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is compensated by a large factor in total execution time.

The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.

In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another about four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).

Note that all of these gains in speed are not due to C# other than it compiles to reasonably efficient machine code, but rather to proper use of the Sieve of Eratosthenes algorithm.

All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):

```    static void Main(string[] args) {
Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
}
```

To produce the following output for all tested versions (although some are considerably faster than others):

Output:
`15485863`

## C++

### Standard Library

This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

```#include <iostream>
#include <algorithm>
#include <vector>

// requires Iterator satisfies RandomAccessIterator
template <typename Iterator>
size_t prime_sieve(Iterator start, Iterator end)
{
if (start == end) return 0;
// clear the container with 0
std::fill(start, end, 0);
// mark composites with 1
for (Iterator prime_it = start + 1; prime_it != end; ++prime_it)
{
if (*prime_it == 1) continue;
// determine the prime number represented by this iterator location
size_t stride = (prime_it - start) + 1;
// mark all multiples of this prime number up to max
Iterator mark_it = prime_it;
while ((end - mark_it) > stride)
{
*mark_it = 1;
}
}
// copy marked primes into container
Iterator out_it = start;
for (Iterator it = start + 1; it != end; ++it)
{
if (*it == 0)
{
*out_it = (it - start) + 1;
++out_it;
}
}
return out_it - start;
}

int main(int argc, const char* argv[])
{
std::vector<int> primes(100);
size_t count = prime_sieve(primes.begin(), primes.end());
// display the primes
for (size_t i = 0; i < count; ++i)
std::cout << primes[i] << " ";
std::cout << std::endl;
return 1;
}
```

### Boost

```// yield all prime numbers less than limit.
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
std::vector<bool> is_prime(limit, true);

const int sqrt_limit = static_cast<int>(std::sqrt(limit));
for (int n = 2; n <= sqrt_limit; ++n)
if (is_prime[n]) {
yield(n);

for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)
//NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
is_prime[k] = false;
}

for (int n = sqrt_limit + 1; n < limit; ++n)
if (is_prime[n])
yield(n);
}
```

Full program:

Works with: Boost
```/**
\$ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
*/
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>

int main(int argc, char *argv[])
{
using namespace std;
using namespace boost::lambda;

int limit = 10000;
if (argc == 2) {
stringstream ss(argv[--argc]);
ss >> limit;

if (limit < 1 or ss.fail()) {
cerr << "USAGE:\n  sieve LIMIT\n\nwhere LIMIT in the range [1, "
<< numeric_limits<int>::max() << ")" << endl;
return 2;
}
}

// print primes less then 100
primesupto(100, cout << _1 << " ");
cout << endl;

// find number of primes less then limit and their sum
int count = 0;
uintmax_t sum = 0;
primesupto(limit, (var(sum) += _1, var(count) += 1));

cout << "limit sum pi(n)\n"
<< limit << " " << sum << " " << count << endl;
}
```

## Chapel

 This example is incorrect. Please fix the code and remove this message.Details: Doesn't compile since at least Chapel version 1.20 to 1.24.1.

This solution uses nested iterators to create new wheels at run time:

```// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {

yield prime;

var last = prime;
label candidates for candidate in sieve(prime+1) do {
for composite in last..candidate by prime do {

// candidate is a multiple of this prime
if composite == candidate {
// remember size of last composite
last = composite;
// and try the next candidate
continue candidates;
}
}

// candidate cannot need to be removed by this sieve
// yield to parent sieve for checking
yield candidate;
}
}
```
The topmost sieve needs to be started with 2 (the smallest prime):
```config const N = 30;
for p in sieve(2) {
if p > N {
writeln();
break;
}
write(" ", p);
}
```

### Alternate Conventional Bit-Packed Implementation

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:

compile with the `--fast` option

```use Time;
use BitOps;

type Prime = uint(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
write("The first 25 primes are:  ");
for p in primes(100) do write(p, " "); writeln();

var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is:  ", count, ".");

var timer: Timer;
timer.start();

count = 0;
for p in primes(limit) do count += 1;

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
const szlmt = n / 8;
var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

for bp in 2 .. n {
if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
const s0 = bp * bp;
if s0 > n then break;
for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
}
}

for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;

}
```
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

### Alternate Odds-Only Bit-Packed Implementation

```use Time;
use BitOps;

type Prime = int(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
write("The first 25 primes are:  ");
for p in primes(100) do write(p, " "); writeln();

var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is:  ", count, ".");

var timer: Timer;
timer.start();

count = 0;
for p in primes(limit) do count += 1;

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
const ndxlmt = (n - 3) / 2;
const szlmt = ndxlmt / 8;
var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

for i in 0 .. ndxlmt { // never gets to the end!
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
const bp = i + i + 3;
const s0 = (bp * bp - 3) / 2;
if s0 > ndxlmt then break;
for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
}
}

yield 2;
for i in 0 .. ndxlmt do
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;

}
```
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).

### Hash Table Based Odds-Only Version

Translation of: Python
Works with: Chapel version 1.25.1
```use Time;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: owned Primes?;
var keys: domain(Prime); var dict: [keys] Prime;
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.keys += key; this.dict[key] = adv;
}
else if this.keys.contains(this.n) { // found a composite; advance...
var nkey = this.n + adv;
while this.keys.contains(nkey) do nkey += adv;
this.keys += nkey; this.dict[nkey] = adv;
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are:  ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();

var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
```
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 5195.41 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, this is much slower than the array based versions but much faster than previous Chapel version code as the hashing has been greatly improved.

As an alternate to the use of a built-in library, the following code implements a specialized BasePrimesTable that works similarly to the way the Python associative arrays work as to hashing algorithm used (no hashing, as the hash values for integers are just themselves) and something similar to the Python method of handling hash table collisions is used:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

```use Time;

config const limit = 100000000;

type Prime = uint(32);

record BasePrimesTable { // specialized for the use here...
record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
var cpcty: int = 8; var sz: int = 0;
var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
proc grow() {
const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);
}
proc find(k: Prime): int { // internal get location of value or -1
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == k then return skey;
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
}
proc contains(k: Prime): bool { return find(k) >= 0; }
proc add(k, v: Prime) { // if exists then replaces else new entry
const fndi = find(k);
if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
else {
sz += 1; if 2 * sz > cpcty then grow();
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == 0 {
bpa[skey] = new BasePrimeEntry(k, v); return; }
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
}
}
proc remove(k: Prime) { // if doesn't exist does nothing
const fndi = find(k);
if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
}
proc this(k: Prime): Prime { // returns value or 0 if not found
const fndi = find(k);
if fndi < 0 then return 0; else return bpa[fndi].val;
}
}

class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: shared Primes?; var dict = new BasePrimesTable();
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
else if this.dict.contains(this.n) { // found a composite; advance...
var nkey = this.n + adv;
while this.dict.contains(nkey) do nkey += adv;
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are:  ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();

var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
```
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 2351.79 milliseconds.```

This last code is quite usable up to a hundred million (as here) or even a billion in a little over ten times the time, but is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only for the hash table for the base primes of about eight Kilobytes for sieving to a billion compared to over 60 Megabytes for the monolithic odds-only simple version).

Chapel version 1.25.1 provides yet another option as to the form of the code although the algorithm is the same in that one can now override the hashing function for Chapel records so that they can be used as the Key Type for Hash Map's as follows:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

```use Time;

use Map;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
record PrimeR { var prime: Prime; proc hash() { return prime; } }
var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
var bps: owned Primes?;
var dict = new map(PrimeR, Prime);
proc next(): Prime { // odd primes!
if this.n.prime < 5 { this.n.prime = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n.prime >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
else if this.dict.contains(this.n) { // found a composite; advance...
var nkey = new PrimeR(this.n.prime + adv);
while this.dict.contains(nkey) do nkey.prime += adv;
}
else { const p = this.n.prime;
this.n.prime += 2; return p; }
this.n.prime += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are:  ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();

var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
```

This works in about exactly the same time as the last previous code, but doesn't require special custom adaptations of the associative array so that the standard library Map can be used.

### Functional Tree Folding Odds-Only Version

Chapel isn't really a very functional language even though it has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced, iterations and has first class functions (FCF's) and lambdas (anonymous functions), these last can't be closures (capture variable bindings from external scope(s)), nor can the work around of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

Translation of: Nim
Works with: 1.22 version - compile with the --fast compiler command line flag for full optimization
```use Time;

type Prime = uint(32);

config const limit = 1000000: Prime;

// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}

class PrimeMultiples: PrimeCIS {
override proc next(): shared PrimeCIS {
return new shared PrimeMultiples(
}

class PrimeCISCIS { // base stream of prime streams; never used directly...
proc init() { this.head = new shared PrimeCIS(); }
proc next(): shared PrimeCISCIS {
return new shared PrimeCISCIS(); }
}

class AllMultiples: PrimeCISCIS {
var bps: shared PrimeCIS;
proc init(bsprms: shared PrimeCIS) {
const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
this.bps = bsprms;
}
override proc next(): shared PrimeCISCIS {
return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}

class Union: PrimeCIS {
var feeda, feedb: shared PrimeCIS;
proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
this.head = if ahd < bhd then ahd else bhd;
this.feeda = fda; this.feedb = fdb;
}
override proc next(): shared PrimeCIS {
if ahd < bhd then
return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
if ahd > bhd then
return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
return new shared Union(this.feeda.next(),
this.feedb.next()): shared PrimeCIS;
}
}

class Pairs: PrimeCISCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
const fs = fd.head; const sss = fd.next(); const ss = sss.head;
this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
}
override proc next(): shared PrimeCISCIS {
return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}

class Composites: PrimeCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
}
override proc next(): shared PrimeCIS {
const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
const ncs = new shared Composites(prs): shared PrimeCIS;
return new shared Union(fs, ncs): shared PrimeCIS;
}
}

class OddPrimesFrom: PrimeCIS {
var cmpsts: shared PrimeCIS;
override proc next(): shared PrimeCIS {
var n = head + 2; var cs = this.cmpsts;
while true {
return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
n += 2; cs = cs.next();
}
return this.cmpsts; // never used; keeps compiler happy!
}
}

class OddPrimes: PrimeCIS {
proc init() { this.head = 3; }
override proc next(): shared PrimeCIS {
const bps = new shared OddPrimes(): shared PrimeCIS;
const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
const cmpsts = new shared Composites(mlts): shared PrimeCIS;
return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
}
}

iter primes(): Prime {
yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
while true { yield cur.head; cur = cur.next(); }
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
```
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.```

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits) although doing so is slower than the Hash Table version above and also takes more memory as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has a worst asymptotic performance with an extra `log(n)` factor where `n` is the sieving range; this can be shown by running the above program with `--limit=10000000` run time command line option to sieve to ten million which takes about 4.5 seconds to count the primes up to ten million (a factor of ten higher range, but much higher than the expected increased factor of about 10 per cent extra as per the Hash Table version with about 20 per cent more operations times the factor of ten for this version). Other than for the extra operations, this version is generally slower due to the time to do the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.

### A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version

Works with: 1.24.1 version - compile with the --fast compiler command line flag for full optimization
```use Time; use BitOps; use CPtr;

type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);

config const LIMIT = 1000000000: Prime;

config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
"L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
"L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");

const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;

// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
if L1 == 32 then 23000 else 43390)
- WHLPRMS.size + 1; // extra one for marker!

// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;

var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
const hi = WHLPRMS.size - 1;
const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
// contains wheel pattern primes skipping the small wheel prime (2)!...
// never advances past the first base prime arr as it ends with a huge!...
for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
else 0x7FFFFFFF): BasePrime; // last huge!
var whlbpas = new shared BasePrimeArrs(whlhd);
var whlstrts = new StrtsArr({rng});
wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
// eliminate wheel primes from the WHLPTRN buffer!...
wp.cmpsts = 0xFF: uint(8);
}

// the following two must be classes for compability with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
proc init() {}
proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
proc deinit() { dom = { 0 .. -1 }; }

proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
const sz = cmpsts.size; const mvsz = min(sz, 16384);
var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
for i in 0 .. sz - 1 by 16384 {
c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);
mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
}
}

proc count(btlmt: int) { // count by 64 bits using CPU popcount...
const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
var i = 0; var cnt = (lstwrd * 64 + 64): int;
while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;
}

// most of the time is spent doing culling operations as follows!...
proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
ref strts: StrtsArr) {
const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
const strtssz = strts.strtsarr.size;
// C pointer for speed magic!...
const cmpstsp = c_ptrTo(cmpsts);
const strtsp = c_ptrTo(strts.strtsarr);

// first fill the strts array with pre-calculated start addresses...
var i = 0; for bp in bpas {
// calculate page start address for the given base prime...
const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
if s0 > ndxlmt then {
if i < strtssz then strtsp[i] = -1: int(32); break; }
var s = 0: int;
if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
else { const r = (lwi: Prime - s0) % bbp;
if r == 0 then s = 0: int; else s = (bbp - r): int; };
if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
if i < strtssz { strtsp[i] = -1; i = strtssz; }
// cull the full buffer for this given base prime as usual...
// only works up to limit of int(32)**2!!!!!!!!
while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
}

// cull the smaller sub buffers according to the strts array...
for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
i = 0; for bp in bpas { // bp never bigger than uint(32)!
// cull the sub buffer for this given base prime...
var s = strtsp[i]: int; if s < 0 then break;
var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
const slmt = s + bpi * 8 - 1;
while s <= slmt {
const bmi = s & 7; const msk = bitmsk[bmi];
var c = s >> 3; const clmt = sbtlmt >> 3;
while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
}
strtsp[i] = nxt: int(32); i += 1;
}
else { while s <= sbtlmt { // standard cull loop...
cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
strtsp[i] = s: int(32); i += 1; }
}
}
}
}

// a generic record that contains a page result generating function;
// allows manual iteration through the use of the next() method;
class PagedResults {
const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
var lwi: PrimeNdx; var bsbtsz: int;
var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
var qi: int = 0;
var wrkq\$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
var rsltsq\$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;

proc init(cvclsr, li: PrimeNdx, bsz: int) {
cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }

proc deinit() { // kill the thread pool when out of scope...
if bpas == nil then return; // no thread pool!
for i in wrkq\$.domain {
wrkq\$[i].writeEF(-1); while true { const r = rsltsq\$[i].readFE();
if r == nil then break; }
}
}

proc next(): cnvrtrclsr(lwi, sbs(0)).type {
proc dowrk(ri: int) { // used internally!...
while true {
if li < 0 { rsltsq\$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
sbs[ri].fill(li);
sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
rsltsq\$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
}
}
if this.bpas == nil { // init on first use; avoids data race!
this.bpas = new BasePrimeArrs();
if this.bsbtsz < CPUL1CACHE {
this.sbs = new SieveBuffer(bsbtsz);
this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
}
else {
this.sbs = new SieveBuffer(CPUL2CACHE);
this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
}
// start threadpool and give it inital work...
for i in rsltsq\$.domain {
begin with (const in i) dowrk(i);
this.wrkq\$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
}
}
this.wrkq\$[qi].writeEF(this.lwi);
this.lwi += this.sbs[qi].cmpsts.size * 8;
this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
return rslt;
}

iter these() { while lwi >= 0 do yield next(); }
}

// the sieve buffer to base prime array converter closure...
record SB2BPArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
const bsprm = (lwi + lwi + 3): BasePrime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
i += 1; }
return arr;
}
}

// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var lock\$: sync bool = true;
var feed: shared PagedResults(SB2BPArr) =
new shared PagedResults(new SB2BPArr(), 65536, 65536);

proc init() { // make our own first array to break data race!
var sb = new SieveBuffer(256); sb.fill(0);
const sb2 = new SB2BPArr();
head = sb2(0, sb): shared BasePrimeArr;
this.complete(); // fake base primes!
sb = new SieveBuffer(65536); sb.fill(0);
// use (completed) self as source of base primes!
var strts = new StrtsArr({ 0 .. 256 });
sb.cull(0, 65536, this, strts);
// replace head with new larger version culled using fake base primes!...
head = sb2(0, sb): shared BasePrimeArr;
}

// for initializing for use by the fillWHLPTRN proc...
proc init(hd: shared BasePrimeArr) {
head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
}

// for initializing lazily extended list as required...
proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }

proc next(): shared BasePrimeArrs {
if this.tail == nil { // in case other thread slipped through!
if this.lock\$.readFE() && this.tail == nil { // empty sync -> block others!
const nhd = this.feed.next(): shared BasePrimeArr;
this.tail = new shared BasePrimeArrs(nhd , this.feed);
}
this.lock\$.writeEF(false); // fill the sync so other threads can do nothing!
}
return this.tail: shared BasePrimeArrs; // necessary cast!
}

iter these(): BasePrime {
for bp in head.bparr do yield bp; var cur = next();
while true {
for bp in cur.head.bparr do yield bp; cur = cur.next(); }
}
}

record SB2PrmArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
const bsprm = (lwi + lwi + 3): Prime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
i += 1; }
return arr;
}
}

iter primes(): Prime {
for p in WHLPRMS do yield p: Prime;
for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
for p in pa!.prmarr do yield p;
}

// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }

// a class that emulates a closure and a return value...
record SB2Cnt {
const nxtlmt: PrimeNdx;
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
return new shared CntNxt(sb.count(btlmt), lstndx);
}
}

// couut primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
for p in WHLPRMS { if p > lmt then break; count += 1; }
if lmt < FRSTSVPRM then return count;
for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
}
return count;
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");

write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");

var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();

write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
```
Output:
```The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.```

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does implement some functional concepts as in a memoized lazy list of base prime arrays, but as this is used at the page level, the slowish performance doesn't impact the overall execution time much and the code is much more elegant in using this concept such that we compute new pages of base primes as they are required for increasing range.

Some of the most tricky bits due to having thread pools is stopping and de-initializing when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized to some extent in using "loop unpeeling" for smaller base primes to simplify the loops down to simple masking by a constant with eight separate loops for the repeating pattern over bytes and culling by sub buffer CPU L1 cache sizes over the outer sieve buffer size of the CPU L2 cache size in order to make the task work-sized chunks larger for less task context switching overheads and for reduced time lost to culling start address calculations per base prime (which needs to use some integer division that is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits or 1e12 for the one Megabit CPU L2 cache size many mid-range Intel CPU's have currently when used for multi-threading (half of the actual size for Hyper-Threaded - HT - threads as they share both the L1 and the L2 caches over the pairs of Hyper-Threaded (HT) threads per core).

Although this code can be used for much higher sieving ranges, it is not recommended due to not yet being tuned for better efficiency above 1e12; there are no checks limiting the user to this range, but, as well as decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows but these will be for huge sieving ranges taking days -> weeks -> months -> years to execute on common desktop CPU's.

A further optimization used is to create a pre-culled `WHLPTRN` `SieveBuffer` where the odd primes (since we cull odds-only) of 3, 5, 7, 11, 13, and 17 have already been culled and using that to pre-fill the page segment buffers so that no culling by these base prime values is required, this reduces the number of operations by about 45% compared to if it wasn't done but the ratio of better performance is only about 34.5% better as this changes the ratio of (fast) smaller base primes to larger (slower) ones.

All of the improvements to this point allow the shown performance as per the displayed output for the above program; using a command line argument of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion - 1e11 - on this computer, which has cache sizes of that amount and no Hyper-Threading - HT), it can count the primes to 1e11 in about 17.5 seconds using the above mentioned CPU. It will be over two times faster than this using a more modern desktop CPU such as the Intel Core i7-9700K which has twice as many effective cores, a higher CPU clock rate, is about 10% to 15% faster due the a more modern CPU architecture which is three generations newer. Of course using a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core CPU's will only give these speed gains for large sieving ranges such as 1e11 and above since otherwise there aren't enough work chunks to go around for all the threads available!

Incredibly, even run single threaded (argument of `--NUMTHRDS=1`) this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`) with both using the same size of CPU L1 cache buffer of 16 Kilobytes; This implementation does not yet have the level of wheel optimization of the Sieve of Atkin as it has only the limited wheel optimization of Odds-Only plus the use of the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges, and approach the speed of Kim Walisch's "primesieve". However, not having primitive element pointers and pointer operations, there are some optimizations used that Kim Walisch's "primesieve" uses of extreme loop unrolling that mean that it can never quite reach the speed of "primeseive" by about 20% to 30%.

The above code is a fairly formidable benchmark, which I have also written in Fortran as in likely the major computer language that is comparable. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expression this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-on's or just being the ability to call through to OpenMP/MPI.

That said, it also as the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but so much about forms of expression have been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; Most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes now no longer are breaking code changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same; for this particular task they are about the same if one were to implement the same algorithmic optimizations other than that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel as Fortran has some limited form of pointers, although not the full set of pointer operators that C/C++ like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compile and due to the availablity of (limited) pointer operations.

The primary additional optimization available to Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer, with the other major improvement to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.

## Clojure

```(defn primes< [n]
(remove (set (mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))
```

The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with `(println (time (count (primes< 1000000))))`, it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.

It may be written using the into #{} function to run slightly faster due to the set function being concerned with only distinct elements whereas the into #{} only does the conjunction, and even at that doesn't do that much as it does the conjunction to an empty sequence, the code as follows:

```(defn primes< [n]
(remove (into #{}
(mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))
```

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses the into #{} transducer but has been slightly wheel-factorized to sieve odds-only:

```(defn primes< [n]
(if (< n 2) ()
(cons 2 (remove (into #{}
(mapcat #(range (* % %) n %)
(range 3 (Math/sqrt n) 2)))
(range 3 n 2)))))
```

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.

The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:

```(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
cmpsts (boolean-array (inc n)),
cullp (fn [p]
(loop [i (* p p)]
(if (<= i n)
(do (aset cmpsts i true)
(recur (+ i p))))))]
(do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
(range 2 (inc root)))))
(filter #(not (aget cmpsts %)) (range 2 (inc n))))))
```

Alternative implementation using Clojure's side-effect oriented list comprehension.

``` (defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
[lim]
(let [refs (boolean-array (+ lim 1) true)
root (int (Math/sqrt lim))]
(do (doseq [i (range 2 lim)
:while (<= i root)
:when (aget refs i)]
(doseq [j (range (* i i) lim i)]
(aset refs j false)))
(filter #(aget refs %) (range 2 lim)))))
```

Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.

```(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
[lim]
(let [max-i (int (/ (- lim 1) 2))
refs (boolean-array max-i true)
root (/ (dec (int (Math/sqrt lim))) 2)]
(do (doseq [i (range 1 (inc root))
:when (aget refs i)]
(doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
(aset refs j false)))
(cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))
```

This implemantation is about twice as fast as the previous one and uses only half the memory. From the index of the array, it calculates the value it represents as (2*i + 1), the step between two indices that represent the multiples of primes to mark as composite is also (2*i + 1). The index of the square of the prime to start composite marking is 2*i*(i+1).

Alternative very slow entirely functional implementation using lazy sequences

```(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(letfn [(nxtprm [cs] ; current candidates
(let [p (first cs)]
(if (> p (Math/sqrt n)) cs
(cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
set (remove cs) rest)))))))]
(nxtprm (range 2 (inc n)))))
```

The reason that the above code is so slow is that it has has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector where only the culled values are referenced, and due to the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.

Version based on immutable Vector's

Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:

```(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[max-prime]
(let [sieve (fn [s n]
(if (<= (* n n) max-prime)
(recur (if (s n)
(reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))
s)
(inc n))
s))]
(->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))
(assoc 0 false)
(assoc 1 false)
(sieve 2))
(map-indexed #(vector %2 %1)) (filter first) (map second))))
```

The above code is still quite slow due to the cost of the immutable copy-on-modify operations.

Odds only bit packed mutable array based version

The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:

```(set! *unchecked-math* true)

(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
rootndx (long (/ (- root 3) 2)),
ndx (long (/ (- n 3) 2)),
cmpsts (long-array (inc (/ ndx 64))),
isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))
(bit-shift-left 1 (bit-and % 63)))),
cullp (fn [i]
(let [p (long (+ i i 3))]
(loop [i (bit-shift-right (- (* p p) 3) 1)]
(if (<= i ndx)
(do (let [w (bit-shift-right i 6)]
(aset cmpsts w (bit-or (aget cmpsts w)
(bit-shift-left 1 (bit-and i 63)))))
(recur (+ i p))))))),
cull (fn [] (loop [i 0] (if (<= i rootndx)
(do (if (isprm i) (cullp i)) (recur (inc i))))))]
(letfn [(nxtprm [i] (if (<= i ndx)
(cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]
(if (or (> i ndx) (isprm i)) i
(recur (inc i)))))))))]
(if (< n 2) nil
(cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))
```

The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization other than the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.

If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead as compared to the actual composite number culling time, would get even higher if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times, most important in the culling operations, and yet higher again if multi-processing were used to share to page segment processing across CPU cores.

The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed come computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):

```(defn primes-tox
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
rootndx (long (/ (- root 3) 2)),
ndx (max (long (/ (- n 3) 2)) 0),
lmt (quot ndx 64),
cmpsts (long-array (inc lmt)),
cullp (fn [i]
(let [p (long (+ i i 3))]
(loop [i (bit-shift-right (- (* p p) 3) 1)]
(if (<= i ndx)
(do (let [w (bit-shift-right i 6)]
(aset cmpsts w (bit-or (aget cmpsts w)
(bit-shift-left 1 (bit-and i 63)))))
(recur (+ i p))))))),
cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)
(bit-shift-left -2 (bit-and ndx 63))))
(loop [i 0]
(when (<= i rootndx)
(when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))
(bit-shift-left 1 (bit-and i 63))))
(cullp i))
(recur (inc i))))))
numprms (fn []
(let [w (dec (alength cmpsts))] ;; fast results count bit counter
(loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]
(if (> i w) cnt
(recur (inc i)
(- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]
(if (< n 2) nil
(cons 2 (if (< n 3) nil
(do (cull)
(deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???
clojure.lang.ISeq
(first [_] (if (nil? cmpsa) nil (+ i i 3)))
(next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil
(OPSeq.
(loop [j (inc i)]
(let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))
(bit-shift-left 1 (bit-and j 63))))]
(if p? j (recur (inc j)))))
cmpsa ncnt tcnt))))
(more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)
(.next this))))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))
(equiv [this o] (if (or (not= (type this) (type o))
(not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))
(not= i (.i ^OPSeq o))) false true))
clojure.lang.Counted
(count [_] (- tcnt cnt))
clojure.lang.Seqable
(clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))
clojure.lang.IReduce
(reduce [_ f v] (let [c (- tcnt cnt)]
(if (<= c 0) nil
(loop [ci i, n c, rslt v]
(if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))
(bit-shift-left 1 (bit-and ci 63))))
(let [rrslt (f rslt (+ ci ci 3)),
rdcd (reduced? rrslt),
nrslt (if rdcd @rrslt rrslt)]
(if (or (<= n 1) rdcd) nrslt
(recur (inc ci) (dec n) nrslt)))
(recur (inc ci) n rslt))))))
(reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)
(.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))
clojure.lang.Sequential
Object
(toString [this] (if (= cnt tcnt) "()"
(.toString (seq (map identity this))))))
(->OPSeq 0 cmpsts 0 (numprms))))))))
```

'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.

Due to the better efficiency of the custom CIS type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.

Under Clojure 1.7.0, one can use '(time (reduce (fn [] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total with about half the time used for sieving the array and half for computing the sum.

To show how sensitive Clojure is to forms of expression of functions, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long even though it is using the same internal routine for most of the calculation due to the function not having the type coercion's.

Before one considers that this code is suitable for larger ranges, it is still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (produces about a four times speed up), maximal wheel factorization (to make it another about four times faster), and the use of multi-processing (for a further gain of about 4 times for a multi-core desktop CPU such as an Intel i7), will make the sieving/counting code about 50 times faster than this, although there will only be a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.

### Unbounded Versions

For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.

The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:

A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)

```(defn primes-Bird
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(cons c (lazy-seq (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
(union [xs ys] (let [xv (first xs), yv (first ys)]
(if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
(if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
(cons xv (lazy-seq (union (next xs) (next ys)))))))),
(mrgmltpls [mltplss] (cons (first (first mltplss))
(lazy-seq (union (next (first mltplss))
(mrgmltpls (next mltplss)))))),
(minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (first cmpsts))
(cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) (next cmpsts)))))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))
```

The above code is quite slow due to both that the data structure is a linear merging of prime multiples and due to the slowness of the Clojure sequence operations.

A Clojure version of the tree folding sieve using Lazy Sequences

The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:

```(defn primes-treeFolding
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(cons c (lazy-seq (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
(union [xs ys] (let [xv (first xs), yv (first ys)]
(if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
(if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
(cons xv (lazy-seq (union (next xs) (next ys)))))))),
(pairs [mltplss] (let [tl (next mltplss)]
(cons (union (first mltplss) (first tl))
(lazy-seq (pairs (next tl)))))),
(mrgmltpls [mltplss] (cons (first (first mltplss))
(lazy-seq (union (next (first mltplss))
(mrgmltpls (pairs (next mltplss))))))),
(minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (first cmpsts))
(cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) (next cmpsts)))))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))
```

The above code is still slower than it should be due to the slowness of Clojure's sequence operations.

A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence

The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:

```(deftype CIS [v cont]
clojure.lang.ISeq
(first [_] v)
(next [_] (if (nil? cont) nil (cont)))
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (or (not= (type cis1) (type cis2))
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(not (nil? (.cont ^CIS cis2))))
(and (nil? (.cont ^CIS cis2))
(not (nil? (.cont cis1))))) false
(if (nil? (.cont cis1)) true
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(recur ((.cont cis)) (inc cnt)))))
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
clojure.lang.Sequential
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))

(defn primes-treeFoldingx
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(->CIS c (fn [] (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
(union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
(if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
(if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
(->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
(pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
(->CIS (union (.v mltplss) (.v tl))
(fn [] (pairs ((.cont tl))))))),
(mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
(fn [] (union ((.cont ^CIS (.v mltplss)))
(mrgmltpls (pairs ((.cont mltplss)))))))),
(minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (.v cmpsts))
(->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) ((.cont cmpsts))))))]
(do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(->CIS 2 (fn [] oddprms)))))
```

'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.

The above code is useful for ranges up to about fifteen million primes, which is about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit packed version which can reasonably be used for ranges about 100 times as large.

Incremental Hash Map based unbounded "odds-only" version

The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:

```(defn primes-hashmap
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"
[]
(letfn [(nxtoddprm [c q bsprms cmpsts]
(if (>= c q) ;; only ever equal
(let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
(recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))
(if (contains? cmpsts c)
(recur (+ c 2) q bsprms
(let [adv (cmpsts c), ncmps (dissoc cmpsts c)]
(assoc ncmps
(loop [try (+ c adv)] ;; ensure map entry is unique
(if (contains? ncmps try)
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))
```

The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation even though it has computational complexity of (n log log n) rather than the worse (n log n log log n) for the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.

There is almost no benefit of converting the above code to use a CIS as most of the time is expended in the hash map functions.

Incremental Priority Queue based unbounded "odds-only" version

In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap algorithm. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions of (emtpy-pq) initialization, (getMin-pq pq) to examinte the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replaace the minimum entry with a key/value pair as given (it is more efficient that if functions were provided to delete and then re-insert entries in the queue; there is therefore no "delete" or other queue functions supplied as the algorithm does not requrie them:

```(deftype PQEntry [k, v]
Object
(toString [_] (str "<" k "," v ">")))
(deftype PQNode [ntry, lft, rght]
Object
(toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))

(defn empty-pq [] nil)

(defn getMin-pq [^PQNode pq]
(if (nil? pq)
nil
(.ntry pq)))

(defn insert-pq [^PQNode opq ok v]
(loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
(if (nil? pq)
(cont (->PQNode kv nil nil))
(let [k (.k kv),
^PQEntry kvn (.ntry pq), kn (.k kvn),
l (.lft pq), r (.rght pq)]
(if (<= k kn)
(recur kvn r #(cont (->PQNode kv % l)))
(recur kv r #(cont (->PQNode kvn % l))))))))

(defn replaceMinAs-pq [^PQNode opq k v]
(let [^PQEntry kv (->PQEntry k v)]
(if (nil? opq) ;; if was empty or just an entry, just use current entry
(->PQNode kv nil nil)
(loop [pq opq, cont identity]
(let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
(cond ;; if left us empty, right must be too
(nil? l)
(cont (->PQNode kv nil nil)),
(nil? r) ;; we only have a left...
(let [^PQEntry kvl (.ntry l), kl (.k kvl)]
(if (<= k kl)
(cont (->PQNode kv l nil))
(recur l #(cont (->PQNode kvl % nil))))),
:else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
(if (and (<= k kl) (<= k kr))
(cont (->PQNode kv l r))
(if (<= kl kr)
(recur l #(cont (->PQNode kvl % r)))
(recur r #(cont (->PQNode kvr l %))))))))))))
```

Note that the above code is written partially using continuation passing style so as to leave the "recur" calls in tail call position as required for efficient looping in Clojure; for practical sieving ranges, the algorithm could likely use just raw function recursion as recursion depth is unlikely to be used beyond a depth of about ten or so, but raw recursion is said to be less code efficient.

The actual incremental sieve using the Priority Queue is as follows, which code uses the same optimizations of postponing the addition of prime composite streams to the queue until the square root of the currently sieved number is reached and using a secondary base primes stream to generate the primes composite stream markers in the queue as was used for the Hash Map version:

```(defn primes-pq
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"
[]
(letfn [(nxtoddprm [c q bsprms cmpsts]
(if (>= c q) ;; only ever equal
(let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
(recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))
(let [mn (getMin-pq cmpsts)]
(if (and mn (>= c (.k mn))) ;; never greater than
(recur (+ c 2) q bsprms
(loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value
nmn (getMin-pq ncmps)]
(if (and nmn (>= c (.k nmn)))
(recur (.v nmn) ncmps)
ncmps))))
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))
```

The above code is faster than the Hash Map version up to about a sieving range of fifteen million or so, but gets progressively slower for larger ranges due to having (n log n log log n) computational complexity rather than the (n log log n) for the Hash Map version, which has a higher constant factor overhead that is overtaken by the extra "log n" factor.

It is slower that the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).

Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.

All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the the "odds-only" versions.

It is difficult if not impossible to apply efficient multi-processing to the above versions of the unbounded sieves as the next values of the primes sequence are dependent on previous changes of state for the Bird and Tree Folding versions; however, with the addition of a "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to a given page start state functions, it is possible to do for these letter two algorithms; however, even though it is possible and there is some benefit for these latter two implementations, the benefit is less than using mutable arrays due to that the results must be enumerated into a data structure of some sort in order to be passed out of the page function whereas they can be directly enumerated from the array for the mutable array versions.

Bit packed page segmented array unbounded "odds-only" version

To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:

```(set! *unchecked-math* true)

(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGBTS (bit-shift-left PGSZ 3))
(def PGWRDS (bit-shift-right PGBTS 5))
(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes
(def BPBTS (bit-shift-left BPWRDS 5))
(defn- count-pg
"count primes in the culled page buffer, with test for limit"
[lmt ^ints pg]
(let [pgsz (alength pg),
pgbts (bit-shift-left pgsz 5),
cntem (fn [lmtw]
(let [lmtw (long lmtw)]
(loop [i (long 0), c (long 0)]
(if (>= i lmtw) (- (bit-shift-left lmtw 5) c)
(recur (inc i)
(+ c (java.lang.Integer/bitCount (aget pg i))))))))]
(if (< lmt pgbts)
(let [lmtw (bit-shift-right lmt 5),
lmtb (bit-and lmt 31)
msk (bit-shift-left -2 lmtb)]
(+ (cntem lmtw)
(- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)
msk)))))
(- pgbts
(areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))))
;;      (cntem pgsz))))
(defn- primes-pages
"unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."
[]
(letfn [(make-pg [lowi pgsz bpgs]
(let [lowi (long lowi),
pgbts (long (bit-shift-left pgsz 5)),
pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),
^ints pg (int-array pgsz),
cull (fn [bpgs']
(loop [i (long 0), bpgs' bpgs']
(let [^ints fbpg (first bpgs'),
bpgsz (long (alength fbpg))]
(if (>= i bpgsz)
(recur 0 (next bpgs'))
(let [p (long (aget fbpg i)),
sqr (long (* p p))]
(if (< sqr pgrng) (do
(loop [j (long (let [s (long (bit-shift-right (- sqr 3) 1))]
(if (>= s lowi) (- s lowi)
(let [m (long (rem (- lowi s) p))]
(if (zero? m)
0
(- p m))))))]
(if (< j pgbts) ;; fast inner culling loop where most time is spent
(do
(let [w (bit-shift-right j 5)]
(aset pg w (int (bit-or (aget pg w)
(bit-shift-left 1 (bit-and j 31))))))
(recur (+ j p)))))
(recur (inc i) bpgs'))))))))]
(do (if (nil? bpgs)
(letfn [(mkbpps [i]
(if (zero? (bit-and (aget pg (bit-shift-right i 5))
(bit-shift-left 1 (bit-and i 31))))
(cons (int-array 1 (+ i i 3)) (lazy-seq (mkbpps (inc i))))
(recur (inc i))))]
(cull (mkbpps 0)))
(cull bpgs))
pg))),
(page-seq [lowi pgsz bps]
(letfn [(next-seq [lwi]
(cons (make-pg lwi pgsz bps)
(lazy-seq (next-seq (+ lwi (bit-shift-left pgsz 5))))))]
(next-seq lowi)))
(pgs->bppgs [ppgs]
(letfn [(nxt-pg [lowi pgs]
(let [^ints pg (first pgs),
cnt (count-pg BPBTS pg),
npg (int-array cnt)]
(do (loop [i 0, j 0]
(if (< i BPBTS)
(if (zero? (bit-and (aget pg (bit-shift-right i 5))
(bit-shift-left 1 (bit-and i 31))))
(do (aset npg j (+ (bit-shift-left (+ lowi i) 1) 3))
(recur (inc i) (inc j)))
(recur (inc i) j))))
(cons npg (lazy-seq (nxt-pg (+ lowi BPBTS) (next pgs)))))))]
(nxt-pg 0 ppgs))),
(make-base-prms-pgs []
(pgs->bppgs (cons (make-pg 0 BPWRDS nil)
(lazy-seq (page-seq BPBTS BPWRDS (make-base-prms-pgs))))))]
(page-seq 0 PGWRDS (make-base-prms-pgs))))
(defn primes-paged
"unbounded Sieve of Eratosthenes producing a lazy sequence of primes"
[]
(do (deftype CIS [v cont]
clojure.lang.ISeq
(first [_] v)
(next [_] (if (nil? cont) nil (cont)))
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (or (not= (type cis1) (type cis2))
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(not (nil? (.cont ^CIS cis2))))
(and (nil? (.cont ^CIS cis2))
(not (nil? (.cont cis1))))) false
(if (nil? (.cont cis1)) true
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(recur ((.cont cis)) (inc cnt)))))
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
clojure.lang.Sequential
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))
(letfn [(next-prm [lowi i pgseq]
(let [lowi (long lowi),
i (long i),
^ints pg (first pgseq),
pgsz (long (alength pg)),
pgbts (long (bit-shift-left pgsz 5)),
ni (long (loop [j (long i)]
(if (or (>= j pgbts)
(zero? (bit-and (aget pg (bit-shift-right j 5))
(bit-shift-left 1 (bit-and j 31)))))
j
(recur (inc j)))))]
(if (>= ni pgbts)
(recur (+ lowi pgbts) 0 (next pgseq))
(->CIS (+ (bit-shift-left (+ lowi ni) 1) 3)
(fn [] (next-prm lowi (inc ni) pgseq))))))]
(->CIS 2 (fn [] (next-prm 0 0 (primes-pages)))))))
(defn primes-paged-count-to
"counts primes generated by page segments by Sieve of Eratosthenes to the top limit"
[top]
(cond (< top 2) 0
(< top 3) 1
:else (letfn [(nxt-pg [lowi pgseq cnt]
(let [topi (bit-shift-right (- top 3) 1)
nxti (+ lowi PGBTS),
pg (first pgseq)]
(if (> nxti topi)
(+ cnt (count-pg (- topi lowi) pg))
(recur nxti
(next pgseq)
(+ cnt (count-pg PGBTS pg))))))]
(nxt-pg 0 (primes-pages) 1))))
```

The above code runs just as fast as other virtual machine languages when run on a 64-bit JVM; however, when run on a 32-bit JVM it runs almost five times slower. This is likely due to Clojure only using 64-bit integers for integer operations and these operations getting JIT compiled to use library functions to simulate those operations using combined 32-bit operations under a 32-bit JVM whereas direct CPU operations can be used on a 64-bit JVM

Clojure does one thing very slowly, just as here: it enumerates extremely slowly as compared to using a more imperative iteration interface; it helps to use a roll-your-own ISeq interface as here, where enumeration of the primes reduces the time from about four times as long as the composite culling operations for those primes to only about one and a half times as long, although one must also write their own sequence handling functions (can't use "take-while" or "count", for instance) in order to enjoy that benefit. That is why the "primes-paged-count-to" function is provided so it takes a negligible percentage of the time to count the primes over a range as compared to the time for the composite culling operations.

The practical range of the above sieve is about 16 million due to the fixed size of the page buffers; in order to extend the range, a larger page buffer could be used up to the size of the CPU L2 or L3 caches. If a 2^20 buffer were used (one Megabyte, as many modern dexktop CPU's easily have in their L3 cache), then the range would be increased up to about 10^14 at a cost of about a factor of two or three in slower memory accesses per composite culling operation loop. The base primes culling page size is already adequate for this range. One could make the culling page size automatically expand with growing range by about the square root of the current prime range with not too many changes to the code.

As for many implementations of unbounded sieves, the base primes less than the square root of the current range are generated by a secondary generated stream of primes; in this case it is done recursively, so another secondary stream generates the base primes for the base primes and so on down to where the innermost generator has only one page in the stream; this only takes one or two recursions for this type of range.

The base primes culling page size is reduced from the page size for the main primes so that there is less overhead for smaller primes ranges; otherwise excess base primes are generated for fairly small sieve ranges.

## CLU

```% Sieve of Eratosthenes
eratosthenes = proc (n: int) returns (array[bool])
prime: array[bool] := array[bool]\$fill(1, n, true)
prime := false

for p: int in int\$from_to(2, n/2) do
if prime[p] then
for c: int in int\$from_to_by(p*p, n, p) do
prime[c] := false
end
end
end
return(prime)
end eratosthenes

% Print primes up to 1000 using the sieve
start_up = proc ()
po: stream := stream\$primary_output()
prime: array[bool] := eratosthenes(1000)
col: int := 0

for i: int in array[bool]\$indexes(prime) do
if prime[i] then
col := col + 1
stream\$putright(po, int\$unparse(i), 5)
if col = 10 then
col := 0
stream\$putc(po, '\n')
end
end
end
end start_up```
Output:
```    2    3    5    7   11   13   17   19   23   29
31   37   41   43   47   53   59   61   67   71
73   79   83   89   97  101  103  107  109  113
127  131  137  139  149  151  157  163  167  173
179  181  191  193  197  199  211  223  227  229
233  239  241  251  257  263  269  271  277  281
283  293  307  311  313  317  331  337  347  349
353  359  367  373  379  383  389  397  401  409
419  421  431  433  439  443  449  457  461  463
467  479  487  491  499  503  509  521  523  541
547  557  563  569  571  577  587  593  599  601
607  613  617  619  631  641  643  647  653  659
661  673  677  683  691  701  709  719  727  733
739  743  751  757  761  769  773  787  797  809
811  821  823  827  829  839  853  857  859  863
877  881  883  887  907  911  919  929  937  941
947  953  967  971  977  983  991  997```

## CMake

```function(eratosthenes var limit)
# Check for integer overflow. With CMake using 32-bit signed integer,
# this check fails when limit > 46340.
if(NOT limit EQUAL 0)         # Avoid division by zero.
math(EXPR i "(\${limit} * \${limit}) / \${limit}")
if(NOT limit EQUAL \${i})
message(FATAL_ERROR "limit is too large, would cause integer overflow")
endif()
endif()

# Use local variables prime_2, prime_3, ..., prime_\${limit} as array.
# Initialize array to y => yes it is prime.
foreach(i RANGE 2 \${limit})
set(prime_\${i} y)
endforeach(i)

# Gather a list of prime numbers.
set(list)
foreach(i RANGE 2 \${limit})
if(prime_\${i})
# Append this prime to list.
list(APPEND list \${i})

# For each multiple of i, set n => no it is not prime.
# Optimization: start at i squared.
math(EXPR square "\${i} * \${i}")
if(NOT square GREATER \${limit})   # Avoid fatal error.
foreach(m RANGE \${square} \${limit} \${i})
set(prime_\${m} n)
endforeach(m)
endif()
endif(prime_\${i})
endforeach(i)
set(\${var} \${list} PARENT_SCOPE)
endfunction(eratosthenes)
```
```# Print all prime numbers through 100.
eratosthenes(primes 100)
message(STATUS "\${primes}")
```

## COBOL

```*> Please ignore the asterisks in the first column of the next comments,
*> which are kludges to get syntax highlighting to work.
IDENTIFICATION ```