
Sieve of Eratosthenes

You are encouraged to solve this task according to the task description, using any language you may know.
This task has been clarified. Its programming examples are in need of review to ensure that they still fit the requirements of the task.

The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.

Implement the Sieve of Eratosthenes algorithm, with the only allowed optimization that the outer loop can stop at the square root of the limit, and the inner loop may start at the square of the prime just found.

That means especially that you shouldn't optimize by using pre-computed wheels, i.e. don't assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or similar wheels based on low primes.

If there's an easy way to add such a wheel based optimization, implement it as an alternative version.

Note
• It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.
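
As a rough illustration of the task description above, here is a minimal Python sketch that uses only the two permitted optimizations: the outer loop stops at the square root of the limit, and the inner loop starts at the square of the prime just found. The function name `primes_upto` and the list-of-flags representation are choices made for this sketch, not requirements of the task.

```python
from math import isqrt


def primes_upto(limit):
    """Return all primes <= limit using the plain Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_prime[i] tells whether i is still a prime candidate (indices 0..limit).
    is_prime = [False, False] + [True] * (limit - 1)
    for n in range(2, isqrt(limit) + 1):        # outer loop stops at sqrt(limit)
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):  # inner loop starts at n*n
                is_prime[multiple] = False
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_upto(100))
```

No wheel is used here: every number from 2 upward gets a flag, including the even ones.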

11l

Translation of: Python
F primes_upto(limit)
   V is_prime = [0B]*2 [+] [1B]*(limit - 1)
   L(n) 0 .< Int(limit ^ 0.5 + 1.5)
      I is_prime[n]
         L(i) (n*n..limit).step(n)
            is_prime[i] = 0B
   R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)

print(primes_upto(100))
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

360 Assembly

For maximum compatibility, this program uses only the basic instruction set.

*        Sieve of Eratosthenes
ERATOST CSECT
USING ERATOST,R12
SAVEAREA B STM-SAVEAREA(R15)
DC 17F'0'
DC CL8'ERATOST'
STM STM R14,R12,12(R13) save calling context
ST R13,4(R15)
ST R15,8(R13)
LR R12,R15 set addressability
* ---- CODE
LA R4,1 I=1
LA R6,1 increment
L R7,N limit
LOOPI BXH R4,R6,ENDLOOPI do I=2 to N
LR R1,R4 R1=I
BCTR R1,0
LA R14,CRIBLE(R1)
CLI 0(R14),X'01'
BNE ENDIF if not CRIBLE(I)
LR R5,R4 J=I
LR R8,R4
LR R9,R7
LOOPJ BXH R5,R8,ENDLOOPJ do J=I*2 to N by I
LR R1,R5 R1=J
BCTR R1,0
LA R14,CRIBLE(R1)
MVI 0(R14),X'00' CRIBLE(J)='0'B
B LOOPJ
ENDLOOPJ EQU *
ENDIF EQU *
B LOOPI
ENDLOOPI EQU *
LA R4,1 I=1
LA R6,1
L R7,N
LOOP BXH R4,R6,ENDLOOP do I=1 to N
LR R1,R4 R1=I
BCTR R1,0
LA R14,CRIBLE(R1)
CLI 0(R14),X'01'
BNE NOTPRIME if not CRIBLE(I)
CVD R4,P P=I
UNPK Z,P Z=P
MVC C,Z C=Z
OI C+L'C-1,X'F0' zap sign
MVC WTOBUF(8),C+8
WTO MF=(E,WTOMSG)
NOTPRIME EQU *
B LOOP
ENDLOOP EQU *
RETURN EQU *
LM R14,R12,12(R13) restore context
XR R15,R15 set return code to 0
BR R14 return to caller
* ---- DATA
I DS F
J DS F
DS 0F
P DS PL8 packed
Z DS ZL16 zoned
C DS CL16 character
WTOMSG DS 0F
DC H'80' length of WTO buffer
DC H'0' must be binary zeroes
WTOBUF DC 80C' '
LTORG
N DC F'100000'
CRIBLE DC 100000X'01'
YREGS
END ERATOST
Output:
00000002
00000003
00000005
00000007
00000011
00000013
00000017
00000019
00000023
00000029
00000031
00000037
00000041
00000043
00000047
00000053
00000059
00000061
00000067
...
00099767
00099787
00099793
00099809
00099817
00099823
00099829
00099833
00099839
00099859
00099871
00099877
00099881
00099901
00099907
00099923
00099929
00099961
00099971
00099989
00099991

6502 Assembly

If this subroutine is called with the value of n in the accumulator, it uses the memory starting at address 1000 hex as the working sieve, stores an array of the primes less than n beginning at address 2000 hex, and returns the number of primes it has found in the accumulator.

ERATOS: STA  $D0      ; value of n
LDA #$00
LDX #$00
SETUP: STA $1000,X  ; populate array
ADC #$01  ; next value to store
INX
CPX $D0
BPL SET
JMP SETUP
SET: LDX #$02
SIEVE: LDA $1000,X  ; find non-zero
INX
CPX $D0
BPL SIEVED
CMP #$00
BEQ SIEVE
STA $D1  ; current prime
MARK: CLC
ADC $D1  ; next multiple of the current prime
TAY
LDA #$00
STA $1000,Y
TYA
CMP $D0
BPL SIEVE
JMP MARK
SIEVED: LDX #$01
LDY #$00
COPY: INX
CPX $D0
BPL COPIED
LDA $1000,X
CMP #$00
BEQ COPY
STA $2000,Y
INY
JMP COPY
COPIED: TYA  ; how many found
RTS

68000 Assembly

Algorithm somewhat optimized: array omits 1, 2, and all higher even numbers. Optimized for storage: uses bit array for prime/composite flags.

Works with: EASy68K v5.13.00

Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.

*-----------------------------------------------------------
* Title  : BitSieve
* Written by : G. A. Tippery
* Date  : 2014-Feb-24, 2013-Dec-22
* Description: Prime number sieve
*-----------------------------------------------------------
ORG $1000

** ---- Generic macros ---- **
PUSH MACRO
MOVE.L \1,-(SP)
ENDM

POP MACRO
MOVE.L (SP)+,\1
ENDM

DROP MACRO
ADDQ #4,SP ; discard the longword on top of the stack
ENDM

PUTS MACRO
** Print a null-terminated string w/o CRLF **
** Returns with D0, A1 modified
MOVEQ #14,D0 ; task number 14 (display null string)
LEA \1,A1 ; address of string
TRAP #15 ; display it
ENDM

GETN MACRO
MOVEQ #4,D0 ; Read a number from the keyboard into D1.L.
TRAP #15
ENDM

** ---- Application-specific macros ---- **

val MACRO ; Used by bit sieve. Converts bit address to the number it represents.
ADD.L \1,\1 ; double it because odd numbers are omitted
ADDQ #3,\1 ; add offset because initial primes (1, 2) are omitted
ENDM

* ** ================================================================================ **
* ** Integer square root routine, bisection method **
* ** IN: D0, should be 0<D0<$10000 (65536) -- higher values MAY work, no guarantee
* ** OUT: D1
*
SquareRoot:
*
MOVEM.L D2-D4,-(SP) ; save registers needed for local variables
* D0 == n
* D1 == a
* D2 == b
* D3 == guess
* D4 == temp
*
* a = 1;
* b = n;
MOVEQ #1,D1
MOVE.L D0,D2
* do {
REPEAT
* guess = (a+b)/2;
MOVE.L D1,D3
ADD.L D2,D3 ; a+b
LSR.L #1,D3
* if (guess*guess > n) { // inverse function of sqrt is square
MOVE.L D3,D4
MULU D4,D4 ; guess^2
CMP.L D0,D4
BLS .else
* b = guess;
MOVE.L D3,D2
BRA .endif
* } else {
.else:
* a = guess;
MOVE.L D3,D1
* } //if
.endif:
* } while ((b-a) > 1); ; Same as until (b-a)<=1 or until (a-b)>=1
MOVE.L D2,D4
SUB.L D1,D4 ; b-a
UNTIL.L D4 <LE> #1 DO.S
* return (a) ; Result is in D1
* } //LongSqrt()
MOVEM.L (SP)+,D2-D4 ; restore saved registers
RTS
*
* ** ================================================================================ **

** ======================================================================= **
*
** Prime-number Sieve of Eratosthenes routine using a big bit field for flags **
* Enter with D0 = size of sieve (bit array)
* Prints found primes 10 per line
* Returns # prime found in D6
*
* Register usage:
*
* D0 == n
* D1 == prime
* D2 == sqroot
* D3 == PIndex
* D4 == CIndex
* D5 == MaxIndex
* D6 == PCount
*
* A0 == PMtx
*
* On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.
*

** ------------------------ **

GetBit: ** sub-part of Sieve subroutine **
** Entry: bit # is on TOS
** Exit: A6 holds the byte number, D7 holds the bit number within the byte
** Note: Input param is still on TOS after return. Could have passed via a register, but
** wanted to practice with stack. :)
*
MOVE.L (4,SP),D7 ; get value from (pre-call) TOS
ASR.L #3,D7 ; /8
MOVEA D7,A6 ; byte #
MOVE.L (4,SP),D7 ; get value from (pre-call) TOS
AND.L #$7,D7 ; bit #
RTS

** ------------------------ **

Sieve:
MOVE D0,D5
SUBQ #1,D5
JSR SquareRoot ; sqrt D0 => D1
MOVE.L D1,D2
LEA PArray,A0
CLR.L D3
*
PrimeLoop:
MOVE.L D3,D1
val D1
MOVE.L D3,D4
ADD.L D1,D4 ; start at the first multiple after the prime itself
*
CxLoop: ; Goes through array marking multiples of d1 as composite numbers
CMP.L D5,D4
BHI ExitCx
PUSH D4 ; set D7 as bit # and A6 as byte pointer for D4'th bit of array
JSR GetBit
DROP
BSET D7,0(A0,A6.L) ; set bit to mark as composite number
ADD.L D1,D4 ; next number to mark
BRA CxLoop
ExitCx:
CLR.L D1 ; Clear new-prime-found flag
ADDQ #1,D3 ; Start just past last prime found
PxLoop: ; Searches for next unmarked (not composite) number
CMP.L D2,D3 ; no point searching past where first unmarked multiple would be past end of array
BHI ExitPx ; if past end of array
TST.L D1
BNE ExitPx ; if flag set, new prime found
PUSH D3 ; check D3'th bit flag
JSR GetBit ; sets D7 as bit # and A6 as byte pointer
DROP ; drop TOS
BTST D7,0(A0,A6.L) ; read bit flag
BNE IsSet ; If already tagged as composite
MOVEQ #-1,D1 ; Set flag that we've found a new prime
IsSet:
ADDQ #1,D3 ; advance to next candidate (ExitPx backs up by one)
BRA PxLoop
ExitPx:
SUBQ #1,D3 ; back up PIndex
TST.L D1 ; Did we find a new prime #?
BNE PrimeLoop ; If another prime # found, go process it
*
; fall through to print routine

** ------------------------ **

* Print primes found
*
* D4 == Column count
*
* Print header and assumed primes (#1, #2)
MOVEQ #2,D6 ; Start counter at 2 because #1 and #2 are assumed primes
MOVEQ #2,D4
*
MOVEQ #0,D3
PrintLoop:
CMP.L D5,D3
BHS ExitPL
PUSH D3
JSR GetBit ; sets D7 as bit # and A6 as byte pointer
DROP ; drop TOS
BTST D7,0(A0,A6.L)
BNE NotPrime
* printf(" %6d", val(PIndex)
MOVE.L D3,D1
val D1
AND.L #$0000FFFF,D1
MOVEQ #6,D2
MOVEQ #20,D0 ; display signed RJ
TRAP #15
ADDQ #1,D4 ; one more entry on this line
ADDQ #1,D6 ; one more prime counted
* *** Display formatting ***
* if((PCount % 10) == 0) printf("\n");
CMP #10,D4
BLO NoLF
PUTS CRLF
MOVEQ #0,D4
NoLF:
NotPrime:
ADDQ #1,D3 ; next bit of the array
BRA PrintLoop
ExitPL:
RTS

** ======================================================================= **

N EQU 5000 ; *** Size of boolean (bit) array ***
SizeInBytes EQU (N+7)/8
*
START: ; first instruction of program
MOVE.L #N,D0 ; # to test
JSR Sieve
* printf("\n %d prime numbers found.\n", D6); ***
PUTS Summary1,A1
MOVE #3,D0 ; Display signed number in D1.L in decimal in smallest field.
MOVE.W D6,D1
TRAP #15
PUTS Summary2,A1

SIMHALT ; halt simulator

** ======================================================================= **

* Variables and constants here

ORG $2000
CR EQU 13
LF EQU 10
CRLF DC.B CR,LF,$00

PArray: DCB.B SizeInBytes,0

DC.B ' 1 2',$00

Summary1: DC.B CR,LF,' ',$00
Summary2: DC.B ' prime numbers found.',CR,LF,$00

END START ; last line of source
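
The storage scheme described for the 68000 listing (one bit per odd candidate, with bit k standing for the number 2k + 3, as the val macro computes) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `bit_sieve`, `get_bit` and `set_bit` are hypothetical, and the sketch starts marking at the square of each prime and does not treat 1 as a prime, so it is not a line-by-line translation of the assembly.

```python
def bit_sieve(n_bits):
    """Sieve the odd numbers 3, 5, 7, ... using one bit per candidate.

    Bit k stands for the number 2*k + 3; a set bit means "composite".
    Returns the primes found, with 2 prepended.
    """
    flags = bytearray((n_bits + 7) // 8)        # packed bit array, all bits clear

    def get_bit(k):
        return (flags[k >> 3] >> (k & 7)) & 1   # byte index k // 8, bit index k % 8

    def set_bit(k):
        flags[k >> 3] |= 1 << (k & 7)

    k = 0
    while True:
        p = 2 * k + 3
        first = (p * p - 3) // 2                # bit index of p * p
        if first >= n_bits:
            break
        if not get_bit(k):
            # Stepping the bit index by p steps the number by 2 * p,
            # so only odd multiples of p are visited.
            for c in range(first, n_bits, p):
                set_bit(c)
        k += 1
    return [2] + [2 * k + 3 for k in range(n_bits) if not get_bit(k)]


print(bit_sieve(49))   # 49 bits cover the odd numbers 3..99
```

Packing eight flags per byte trades a shift and a mask on every access for an eightfold reduction in memory, on top of the halving obtained by leaving out the even numbers.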

8086 Assembly

MAXPRM:	equ	5000		; Change this value for more primes
cpu 8086
bits 16
org 100h
section .text
erato: mov cx,MAXPRM ; Initialize array (set all items to prime)
mov bp,cx ; Keep a copy in BP
mov di,sieve
mov al,1
rep stosb
;;; Sieve
mov bx,sieve ; Set base register to array
inc cx ; CX=1 (CH=0, CL=1); CX was 0 before
mov si,cx ; Start at number 2 (1+1)
.next: inc si ; Next number
cmp cl,[bx+si] ; Is this number marked as prime?
jne .next ; If not, try next number
mov ax,si ; Otherwise, calculate square,
mul si
mov di,ax ; and put it in DI
cmp di,bp ; Check array bounds
ja output ; We're done when SI*SI>MAXPRM
.mark: mov [bx+di],ch ; Mark byte as composite
add di,si ; Next multiple
cmp di,bp ; While maximum not reached
jbe .mark
jmp .next
;;; Output
output: mov si,2 ; Start at 2
.test: dec byte [bx+si] ; Prime?
jnz .next ; If not, try next number
mov ax,si ; Otherwise, print number
call prax
.next: inc si
cmp si,MAXPRM
jbe .test
ret
;;; Write number in AX to standard output (using MS-DOS)
prax: push bx ; Save BX
mov bx,numbuf
mov bp,10 ; Divisor
.loop: xor dx,dx ; Divide AX by 10, modulus in DX
div bp
add dl,'0' ; Convert remainder to ASCII
dec bx ; Store digits back to front
mov [bx],dl ; Store ASCII digit
test ax,ax ; More digits?
jnz .loop
mov dx,bx ; Print number
mov ah,9 ; 9 = MS-DOS syscall to print string
int 21h
pop bx ; Restore BX
ret
section .data
db '*****' ; Room for number
numbuf: db 13,10,'$'
section .bss
sieve: resb MAXPRM
Output:
2
3
5
7
11
...
4969
4973
4987
4993
4999

AArch64 Assembly

Works with: as version Raspberry Pi 3B version Buster 64 bits

/* ARM assembly AARCH64 Raspberry PI 3B */
/* program cribleEras64.s */

/*******************************************/
/* Constantes file */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"

.equ MAXI, 100

/*********************************/
/* Initialized data */
/*********************************/
.data
sMessResult: .asciz "Prime  : @ \n"
szCarriageReturn: .asciz "\n"

/*********************************/
/* UnInitialized data */
/*********************************/
.bss
sZoneConv: .skip 24
TablePrime: .skip 8 * MAXI
/*********************************/
/* code section */
/*********************************/
.text
.global main
main: // entry of program
ldr x4,qAdrTablePrime // address of prime table
mov x0,#2 // prime 2
bl displayPrime
mov x1,#2
mov x2,#1
1: // loop for multiple of 2
str x2,[x4,x1,lsl #3] // mark multiple of 2
add x1,x1,#2 // next multiple of 2
cmp x1,#MAXI // end ?
ble 1b // no loop
mov x1,#3 // start index
mov x3,#1
2:
ldr x2,[x4,x1,lsl #3] // load table element
cmp x2,#1 // is prime ?
beq 4f
mov x0,x1 // yes -> display
bl displayPrime
mov x2,x1
3: // and loop to mark multiples of this prime
str x3,[x4,x2,lsl #3]
add x2,x2,x1 // add the prime
cmp x2,#MAXI // end ?
ble 3b // no -> loop
4:
add x1,x1,2 // other prime in table
cmp x1,MAXI // end table ?
ble 2b // no -> loop

100: // standard end of the program
mov x0,0 // return code
mov x8,EXIT // request to exit program
svc 0 // perform the system call
qAdrTablePrime: .quad TablePrime

/******************************************************************/
/* Display prime table elements */
/******************************************************************/
/* x0 contains the prime */
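displayPrime:
stp x1,lr,[sp,-16]! // save registers
ldr x1,qAdrsZoneConv // conversion buffer address
bl conversion10 // call decimal conversion
ldr x0,qAdrsMessResult // message address
ldr x1,qAdrsZoneConv // insert conversion in message
bl strInsertAtCharInc
bl affichageMess // display message
100:
ldp x1,lr,[sp],16 // restore 2 registers
ret // return to address in lr (x30)
qAdrsZoneConv: .quad sZoneConv
qAdrsMessResult: .quad sMessResult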

/********************************************************/
/* File Include fonctions */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"

Output:
Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97

ABAP

PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.

AT SELECTION-SCREEN ON p_limit.
IF p_limit LE 1.
MESSAGE 'Limit must be higher than 1.' TYPE 'E'.
ENDIF.

START-OF-SELECTION.
FIELD-SYMBOLS: <fs_prime> TYPE flag.
DATA: gt_prime TYPE TABLE OF flag,
gv_prime TYPE flag,
gv_i TYPE i,
gv_j TYPE i.

DO p_limit TIMES.
IF sy-index > 1.
gv_prime = abap_true.
ELSE.
gv_prime = abap_false.
ENDIF.

APPEND gv_prime TO gt_prime.
ENDDO.

gv_i = 2.
WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).
IF ( gt_prime[ gv_i ] EQ abap_true ).
gv_j = gv_i ** 2.
WHILE ( gv_j <= p_limit ).
gt_prime[ gv_j ] = abap_false.
gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).
ENDWHILE.
ENDIF.
gv_i = gv_i + 1.
ENDWHILE.

LOOP AT gt_prime INTO gv_prime.
IF gv_prime = abap_true.
WRITE: / sy-tabix.
ENDIF.
ENDLOOP.

ACL2

(defun nats-to-from (n i)
(declare (xargs :measure (nfix (- n i))))
(if (zp (- n i))
nil
(cons i (nats-to-from n (+ i 1)))))

(defun remove-multiples-up-to-r (factor limit xs i)
(declare (xargs :measure (nfix (- limit i))))
(if (or (> i limit)
(zp (- limit i))
(zp factor))
xs
(remove-multiples-up-to-r
factor
limit
(remove i xs)
(+ i factor))))

(defun remove-multiples-up-to (factor limit xs)
(remove-multiples-up-to-r factor limit xs (* factor 2)))

(defun sieve-r (factor limit)
(declare (xargs :measure (nfix (- limit factor))))
(if (zp (- limit factor))
(nats-to-from limit 2)
(remove-multiples-up-to factor (+ limit 1)
(sieve-r (1+ factor) limit))))

(defun sieve (limit)
(sieve-r 2 limit))

Action!

DEFINE MAX="1000"

PROC Main()
BYTE ARRAY t(MAX+1)
INT i,j,k,first

FOR i=0 TO MAX
DO
t(i)=1
OD

t(0)=0
t(1)=0

i=2 first=1
WHILE i<=MAX
DO
IF t(i)=1 THEN
IF first=0 THEN
Print(", ")
FI
PrintI(i)
FOR j=2*i TO MAX STEP i
DO
t(j)=0
OD
first=0
FI
i==+1
OD
RETURN
Output:
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743,
751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883,
887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997

ActionScript

Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)

function eratosthenes(limit:int):Array
{
var primes:Array = new Array();
if (limit >= 2) {
var sqrtlmt:int = int(Math.sqrt(limit) - 2);
var nums:Array = new Array(); // start with an empty Array...
for (var i:int = 2; i <= limit; i++) // and
nums.push(i); // only initialize the Array once...
for (var j:int = 0; j <= sqrtlmt; j++) {
var p:int = nums[j];
if (p)
for (var t:int = p * p - 2; t < nums.length; t += p)
nums[t] = 0;
}
for (var m:int = 0; m < nums.length; m++) {
var r:int = nums[m];
if (r)
primes.push(r);
}
}
return primes;
}
var e:Array = eratosthenes(1000);
trace(e);

Output:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997

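Ada

with Ada.Text_IO, Ada.Command_Line;

procedure Eratos is

Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));
Prime: array(1 .. Last) of Boolean := (1 => False, others => True);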
Base: Positive := 2;
Cnt: Positive;
begin
loop
exit when Base * Base > Last;
if Prime(Base) then
Cnt := Base + Base;
loop
exit when Cnt > Last;
Prime(Cnt) := False;
Cnt := Cnt + Base;
end loop;
end if;
Base := Base + 1;
end loop;
Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");
for Number in Prime'Range loop
if Prime(Number) then
Ada.Text_IO.Put(Positive'Image(Number));
end if;
end loop;
end Eratos;
Output:
> ./eratos 31
Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31

Agda

-- imports
open import Data.Nat as ℕ using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin using (Fin; suc; zero)
open import Function using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe using (Maybe; just; nothing)

-- Without square cutoff optimization
module Simple where
  primes : ∀ n → List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))
    where
    sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
    sieve [] = []
    sieve (nothing ∷ xs) = sieve xs
    sieve (just x ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)
      where
      B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n

      remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
      remove _ ys zero = nothing ∷ ys x
      remove y ys (suc z) = y ∷ ys z

-- With square cutoff optimization
module SquareOpt where
  primes : ∀ n → List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
    where
    sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
    sieve _ zero = List.mapMaybe id ∘ Vec.toList
    sieve _ (suc _) [] = []
    sieve i (suc l) (nothing ∷ xs) = sieve (suc i) (l ∸ i ∸ i) xs
    sieve i (suc l) (just x ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)
      where
      B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n

      remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
      remove _ ys zero = nothing ∷ ys i
      remove y ys (suc j) = y ∷ ys j

Agena

Tested with Agena 2.9.5 Win32

# Sieve of Eratosthenes

# generate and return a sequence containing the primes up to sieveSize
sieve := proc( sieveSize :: number ) :: sequence is
local sieve, result;

result := seq(); # sequence of primes - initially empty
create register sieve( sieveSize ); # "vector" to be sieved

sieve[ 1 ] := false;
for sPos from 2 to sieveSize do sieve[ sPos ] := true od;

# sieve the primes
for sPos from 2 to entier( sqrt( sieveSize ) ) do
if sieve[ sPos ] then
for p from sPos * sPos to sieveSize by sPos do
sieve[ p ] := false
od
fi
od;

# construct the sequence of primes
for sPos from 1 to sieveSize do
if sieve[ sPos ] then insert sPos into result fi
od

return result
end; # sieve

# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ALGOL 60

Based on the 1962 Revised Report:

comment Sieve of Eratosthenes;
begin
integer array t[0:1000];
integer i,j,k;
for i:=0 step 1 until 1000 do t[i]:=1;
t[0]:=0; t[1]:=0; i:=0;
for i:=i while i<1000 do
begin
for i:=i while i<1000 and t[i]=0 do i:=i+1;
if i<1000 then
begin
j:=2;
k:=j*i;
for k:=k while k<1000 do
begin
t[k]:=0;
j:=j+1;
k:=j*i
end;
i:=i+1
end
end;
for i:=0 step 1 until 999 do
if t[i]≠0 then print(i,ꞌ is primeꞌ)
end

A 1964 implementation:

Works with: ALGOL 60 for OS/360

'BEGIN'
'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
'INTEGER' I,J,K;
'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;
SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);
'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'
'BEGIN'
CANDIDATES(/I/) := 1;
'END';
CANDIDATES(/0/) := 0;
CANDIDATES(/1/) := 0;
I := 0;
'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'
'BEGIN'
'FOR' I := I 'WHILE' I 'LESS' 1000
'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'
I := I+1;
'IF' I 'LESS' 1000 'THEN'
'BEGIN'
J := 2;
K := J*I;
'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'
'BEGIN'
CANDIDATES(/K/) := 0;
J := J + 1;
K := J*I;
'END';
I := I+1;
'END'
'END';
'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'
'IF' CANDIDATES(/I/) 'NOTEQUAL' 0 'THEN'
'BEGIN'
OUTINTEGER(1,I);
OUTSTRING(1,'(' IS PRIME')');
'COMMENT' NEW LINE;
SYSACT(1,14,1)
'END'
'END'
'END'

ALGOL 68

BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
[n]BOOL sieve;
FOR i TO UPB sieve DO sieve[i] := prime OD;
INT m = ENTIER sqrt(n);
sieve[1] := non prime;
FOR i FROM 2 TO m DO
IF sieve[i] = prime THEN
FOR j FROM i*i BY i TO n DO
sieve[j] := non prime
OD
FI
OD;
sieve
);

print((eratosthenes(80),new line))
Output:
FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF

ALGOL W

Standard, non-optimised sieve

begin

% implements the sieve of Eratosthenes  %
% s(i) is set to true if i is prime, false otherwise  %
% algol W doesn't have a upb operator, so we pass the size of the  %
% array in n  %
procedure sieve( logical array s ( * ); integer value n ) ;
begin

for i := 1 until n do s( i ) := true;

% sieve out the non-primes  %
s( 1 ) := false;
for i := 2 until truncate( sqrt( n ) )
do begin
if s( i )
then begin
for p := i * i step i until n do s( p ) := false
end if_s_i
end for_i ;

end sieve ;

% test the sieve procedure  %

integer sieveMax;

sieveMax := 100;
begin

logical array s ( 1 :: sieveMax );

i_w := 2; % set output field width  %
s_w := 1; % and output separator width  %

% find and display the primes  %
sieve( s, sieveMax );
for i := 1 until sieveMax do if s( i ) then writeon( i );

end

end.
Output:
2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Odd numbers only version

Alternative version that only stores odd numbers greater than 1 in the sieve.

begin
% implements the sieve of Eratosthenes  %
% only odd numbers appear in the sieve, which starts at 3  %
% s( i ) is set to true if ( i * 2 ) + 1 is prime  %
procedure sieve2( logical array s ( * ); integer value n ) ;
begin
for i := 1 until n do s( i ) := true;
% sieve out the non-primes  %
% the subscripts of s are 1 2 3 4 5 6 7 8 9 10 11 12 13...  %
% which correspond to 3 5 7 9 11 13 15 17 19 21 23 25 27...  %
for i := 1 until truncate( sqrt( n ) ) do begin
if s( i ) then begin
integer ip;
ip := ( i * 2 ) + 1;
for p := i + ip step ip until n do s( p ) := false
end if_s_i
end for_i ;
end sieve2 ;
% test the sieve2 procedure  %
integer primeMax, arrayMax;
primeMax := 100;
arrayMax := ( primeMax div 2 ) - 1;
begin
logical array s ( 1 :: arrayMax);
i_w := 2; % set output field width  %
s_w := 1; % and output separator width  %
% find and display the primes  %
sieve2( s, arrayMax );
write( 2 );
for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
end
end.
Output:

Same as the standard version.
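
The index mapping used by the odd-numbers-only version (subscript i stands for the number 2i + 1) can be sketched in Python as below. This is a hedged illustration rather than a translation of the ALGOL W procedure: the name `primes_upto_odds_only` is hypothetical, slot 0 of the flag list is left unused so the subscripts line up with the ALGOL W comments, and marking starts at the square of each prime.

```python
def primes_upto_odds_only(limit):
    """Return all primes <= limit, storing flags for odd numbers only."""
    if limit < 2:
        return []
    size = (limit - 1) // 2          # subscripts 1..size stand for 3, 5, ..., 2*size + 1
    flags = [True] * (size + 1)      # flags[i] <-> number 2*i + 1; flags[0] is unused
    i = 1
    while (2 * i + 1) ** 2 <= limit:
        if flags[i]:
            p = 2 * i + 1
            # p*p sits at subscript (p*p - 1) // 2; a subscript step of p
            # corresponds to a step of 2*p in the numbers themselves.
            for j in range((p * p - 1) // 2, size + 1, p):
                flags[j] = False
        i += 1
    return [2] + [2 * i + 1 for i in range(1, size + 1) if flags[i]]


print(primes_upto_odds_only(100))
```

Because 2 is handled separately and no even number has a flag, this counts as the wheel-based alternative the task allows, not as the basic version.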

ALGOL-M

BEGIN

COMMENT
FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING
CLASSIC SIEVE OF ERATOSTHENES;

% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
INTEGER R1, R2;
R1 := N;
R2 := 1;
WHILE R1 > R2 DO
BEGIN
R1 := (R1+R2) / 2;
R2 := N / R1;
END;
ISQRT := R1;
END;

INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];

LIMIT := 1000;
FALSE := 0;
TRUE := 1;

WRITE("FINDING PRIMES FROM 2 TO",LIMIT);

% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
FLAGS[I] := TRUE;

% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
BEGIN
IF FLAGS[I] = TRUE THEN
FOR J := (I * I) STEP I UNTIL LIMIT DO
FLAGS[J] := FALSE;
END;

% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");
FOR I := 2 STEP 1 UNTIL LIMIT DO
BEGIN
IF FLAGS[I] = TRUE THEN
BEGIN
WRITEON(I);
COUNT := COUNT + 1;
COL := COL + 1;
IF COL > 10 THEN
BEGIN
WRITE("");
COL := 1;
END;
END;
END;

WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");

END

Output:
FINDING PRIMES FROM 2 TO  1000
INITIALIZING ... SIEVING ... PRINTING
2     3     5     7    11    13    17    19    23    29
31    37    41    43    47    53    59    61    67    71
. . .
877   881   883   887   907   911   919   929   937   941
947   953   967   971   977   983   991   997

168 PRIMES WERE FOUND.

APL

All of these versions require ⎕io←0 (index origin 0).

It would have been better to require a result of the boolean mask rather than the actual list of primes. The list of primes obtains readily from the mask by application of a simple function (here {⍵/⍳⍴⍵}). Other related computations (such as the number of primes < n) obtain readily from the mask, easier than producing the list of primes.

Non-Optimized Version

sieve2←{
b←⍵⍴1
b[⍳2⌊⍵]←0
2≥⍵:b
p←{⍵/⍳⍴⍵}∇⌈⍵*0.5
m←1+⌊(⍵-1+p×p)÷p
b ⊣ p {b[⍺×⍺+⍳⍵]←0}¨ m
}

primes2←{⍵/⍳⍴⍵}∘sieve2

The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).

Optimized Version

sieve←{
b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
49≥⍵:b
p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5
m←1+⌊(⍵-1+p×p)÷2×p
b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m
}

primes←{⍵/⍳⍴⍵}∘sieve

The optimizations are as follows:

• Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
• Subsequently, only odd multiples of primes > 5 are marked.
• Multiples of a prime to be marked start at its square.
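
The three optimizations listed above can be sketched in Python roughly as follows. The sketch makes some assumptions of its own: the name `sieve_mask` is hypothetical, the mask has one entry per integer 0..n-1 as with index origin 0, and the sieving primes are read from the mask being built instead of being obtained by recursion as in the APL version.

```python
def sieve_mask(n):
    """Return a list b of n booleans where b[i] is True exactly when i is prime."""
    # 1. Pre-mark multiples of 2, 3 and 5 with a pattern of period 2*3*5 = 30.
    pattern = [i % 2 != 0 and i % 3 != 0 and i % 5 != 0 for i in range(30)]
    b = [pattern[i % 30] for i in range(n)]
    # The pattern wrongly rejects 2, 3, 5 and accepts 1, so patch the first six entries.
    head = [False, False, True, True, False, True]
    k = min(6, n)
    b[:k] = head[:k]
    # 2. Only primes greater than 5 remain to be sieved, and only their odd multiples.
    # 3. Marking starts at the square of each prime.
    p = 7
    while p * p < n:
        if b[p]:
            for m in range(p * p, n, 2 * p):
                b[m] = False
        p += 2
    return b


print([int(flag) for flag in sieve_mask(13)])
```

For n = 13 the printed mask should agree with the `sieve 13` example in the Examples section.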

Examples

primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

primes¨ ⍳14
┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐
││││2│2 3│2 3│2 3 5│2 3 5│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7 11│2 3 5 7 11│
└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘

sieve 13
0 0 1 1 0 1 0 1 0 0 0 1 0

+/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534

The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).
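
The idea of deriving counts directly from the boolean mask can be sketched in Python as well. The function name `prime_mask` is an assumption of this sketch, it uses the plain sieve, not the optimized one, and it stops at 10^5 to keep the run short.

```python
from math import isqrt


def prime_mask(n):
    """Return a list of n booleans; entry i is True exactly when i is prime."""
    mask = [True] * n
    for i in range(min(2, n)):                  # 0 and 1 are not prime
        mask[i] = False
    for p in range(2, isqrt(max(n - 1, 0)) + 1):
        if mask[p]:
            for m in range(p * p, n, p):
                mask[m] = False
    return mask


# Number of primes below 10**0, 10**1, ..., 10**5, taken by summing each mask.
counts = [sum(prime_mask(10 ** k)) for k in range(6)]
print(counts)   # [0, 4, 25, 168, 1229, 9592]
```

The list of primes follows from the same mask with `[i for i, flag in enumerate(mask) if flag]`, which plays the role of {⍵/⍳⍴⍵} in the APL code.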

AppleScript

on sieveOfEratosthenes(limit)
script o
property numberList : {missing value}
end script

repeat with n from 2 to limit
set end of o's numberList to n
end repeat
repeat with n from 2 to (limit ^ 0.5 div 1)
if (item n of o's numberList is n) then
repeat with multiple from (n * n) to limit by n
set item multiple of o's numberList to missing value
end repeat
end if
end repeat

return o's numberList's numbers
end sieveOfEratosthenes

sieveOfEratosthenes(1000)
Output:
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}

ARM Assembly

Works with: as version Raspberry Pi

/* ARM assembly Raspberry PI */
/* program cribleEras.s */

/* REMARK 1 : this program use routines in a include file
see task Include a file language arm assembly
for the routine affichageMess conversion10
see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes */
/************************************/
.include "../constantes.inc"

.equ MAXI, 101

/*********************************/
/* Initialized data */
/*********************************/
.data
sMessResult: .asciz "Prime  : @ \n"
szCarriageReturn: .asciz "\n"

/*********************************/
/* UnInitialized data */
/*********************************/
.bss
sZoneConv: .skip 24
TablePrime: .skip 4 * MAXI
/*********************************/
/* code section */
/*********************************/
.text
.global main
main: @ entry of program
ldr r4,iAdrTablePrime @ address of prime table
mov r0,#2 @ prime 2
bl displayPrime
mov r1,#2
mov r2,#1
1: @ loop for multiple of 2
str r2,[r4,r1,lsl #2] @ mark multiple of 2
add r1,#2 @ next multiple of 2
cmp r1,#MAXI @ end ?
ble 1b @ no loop
mov r1,#3 @ start index
mov r3,#1
2:
ldr r2,[r4,r1,lsl #2] @ load table element
cmp r2,#1 @ is prime ?
beq 4f
mov r0,r1 @ yes -> display
bl displayPrime
mov r2,r1
3: @ and loop to mark multiples of this prime
str r3,[r4,r2,lsl #2]
add r2,r1 @ add the prime
cmp r2,#MAXI @ end ?
ble 3b @ no -> loop
4:
add r1,#2 @ other prime in table
cmp r1,#MAXI @ end table ?
ble 2b @ no -> loop

100: @ standard end of the program
mov r0, #0 @ return code
mov r7, #EXIT @ request to exit program
svc #0 @ perform the system call
iAdrTablePrime: .int TablePrime

/******************************************************************/
/* Display prime table elements */
/******************************************************************/
/* r0 contains the prime */
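displayPrime:
push {r1,lr} @ save registers
ldr r1,iAdrsZoneConv @ conversion buffer address
bl conversion10 @ call decimal conversion
ldr r0,iAdrsMessResult @ message address
ldr r1,iAdrsZoneConv @ insert conversion in message
bl strInsertAtCharInc
bl affichageMess @ display message
100:
pop {r1,lr}
bx lr
iAdrsZoneConv: .int sZoneConv
iAdrsMessResult: .int sMessResult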
/***************************************************/
/* ROUTINES INCLUDE */
/***************************************************/
.include "../affichage.inc"

Output:
Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
Prime  : 101

AutoHotkey

Source: AutoHotkey forum by Laszlo

MsgBox % "12345678901234567890`n" Sieve(20)

Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime
Static zero := 48, one := 49 ; Asc("0"), Asc("1")
VarSetCapacity(S,n,one)
NumPut(zero,S,0,"char")
i := 2
Loop % sqrt(n)-1 {
If (NumGet(S,i-1,"char") = one)
Loop % n//i
If (A_Index > 1)
NumPut(zero,S,A_Index*i-1,"char")
i += 1+(i>2)
}
Return S
}

Alternative Version

Sieve_of_Eratosthenes(n){
arr := []
loop % n-1
if A_Index>1
arr[A_Index] := true

for i, v in arr {
if (i>Sqrt(n))
break
else if arr[i]
while ((j := i*2 + (A_Index-1)*i) < n)
arr.delete(j)
}
return Arr
}
Examples:
n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return
Output:
.	2	3	.	5	.	7	.	.	.
11	.	13	.	.	.	17	.	19	.
.	.	23	.	.	.	.	.	29	.
31	.	.	.	.	.	37	.	.	.
41	.	43	.	.	.	47	.	.	.
.	.	53	.	.	.	.	.	59	.
61	.	.	.	.	.	67	.	.	.
71	.	73	.	.	.	.	.	79	.
.	.	83	.	.	.	.	.	89	.
.	.	.	.	.	.	97	.	.	.

AutoIt

#include <Array.au3>
\$M = InputBox("Integer", "Enter biggest Integer")
Global \$a[\$M], \$r[\$M], \$c = 1
For \$i = 2 To \$M -1
If Not \$a[\$i] Then
\$r[\$c] = \$i
\$c += 1
For \$k = \$i To \$M -1 Step \$i
\$a[\$k] = True
Next
EndIf
Next
\$r[0] = \$c - 1
ReDim \$r[\$c]
_ArrayDisplay(\$r)

AWK

An initial array holds all numbers 2..max (max is read from stdin); all products of integers are then deleted from it, and the remaining entries are printed in the arbitrary order of a hash table. Here the script is entered directly on the command line, and the input is entered on stdin:

\$ awk '{for(i=2;i<=\$1;i++) a[i]=1;
>       for(i=2;i<=sqrt(\$1);i++) for(j=2;j<=\$1;j++) delete a[i*j];
>       for(i in a) printf i" "}'
100
71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3

The following variant does not unset non-primes, but sets them to 0, to preserve order in output:

\$ awk '{for(i=2;i<=\$1;i++) a[i]=1;
>       for(i=2;i<=sqrt(\$1);i++) for(j=2;j<=\$1;j++) a[i*j]=0;
>       for(i=2;i<=\$1;i++) if(a[i])printf i" "}'
100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Now with the script read from a file, input taken from the command line as well as stdin, and input checked for valid numbers:

# usage: gawk -v n=101 -f sieve.awk

function sieve(n) { # print n,":"
for(i=2; i<=n; i++) a[i]=1;
for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;
for(i=2; i<=n; i++) if(a[i]) printf i" "
print ""
}

BEGIN { print "Sieve of Eratosthenes:"
if(n>1) sieve(n)
}

{ n=\$1+0 }
n<2 { exit }
{ sieve(n) }

END { print "Bye!" }

Here is an alternate version that uses an associative array to record each composite together with a prime that divides it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints the noncomposites greater than 1.

BEGIN { ULIMIT=100

for ( n=1 ; (n++) < ULIMIT ; )
if (n in S) {
p = S[n]
delete S[n]
for ( m = n ; (m += p) in S ; ) { }
S[m] = p
}
else print ( S[(n+n)] = n )
}
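
For comparison, the same lazy idea can be sketched in Python. This is only an illustration of the approach described above, not a checked translation of the AWK script: the function name incremental_primes and the dictionary name pending are assumptions made here. Each known prime owns exactly one dictionary entry, stored under the next multiple of that prime that has not been claimed yet.

```python
def incremental_primes(ulimit):
    """Yield the primes up to and including ulimit, crossing out lazily."""
    pending = {}  # next composite -> one prime that divides it
    for n in range(2, ulimit + 1):
        p = pending.pop(n, None)
        if p is None:
            # n was never scheduled as a multiple, so it is prime
            yield n
            pending[n + n] = n
        else:
            # n is composite: move p on to its next multiple that is still free
            m = n + p
            while m in pending:
                m += p
            pending[m] = p


print(list(incremental_primes(100)))
```

The dictionary never holds more entries than there are primes found so far, which is the memory assumption stated for the AWK version.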

Bash

See solutions at UNIX Shell.

BASIC

Works with: FreeBASIC
Works with: RapidQ
DIM n AS Integer, k AS Integer, limit AS Integer

INPUT "Enter number to search to: "; limit
DIM flags(limit) AS Integer

FOR n = 2 TO SQR(limit)
IF flags(n) = 0 THEN
FOR k = n*n TO limit STEP n
flags(k) = 1
NEXT k
END IF
NEXT n

' Display the primes
FOR n = 2 TO limit
IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n

Applesoft BASIC

10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20 DIM FLAGS(LIMIT)
30 FOR N = 2 TO SQR (LIMIT)
40 IF FLAGS(N) < > 0 GOTO 80
50 FOR K = N * N TO LIMIT STEP N
60 FLAGS(K) = 1
70 NEXT K
80 NEXT N
90 REM DISPLAY THE PRIMES
100 FOR N = 2 TO LIMIT
110 IF FLAGS(N) = 0 THEN PRINT N;", ";
120 NEXT N

Atari BASIC

Translation of: Commodore BASIC

Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.

100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160 IF N(P)=0 THEN 200
170 FOR I=P*P TO LI STEP P
180 N(I)=0
190 NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230 IF N(I)=0 THEN 260
240 PRINT I,:C=C+1
250 IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT
Output:
RUN
LIMIT?100
2         3         5
7         11        13
17        19        23
29        31        37
41        43        47
53        59        61
67        71        73
79        83        89
97

Commodore BASIC

Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.

100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 : N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT

Output:
RUN
LIMIT? 100
2         3         5         7
11        13        17        19
23        29        31        37
41        43        47        53
59        61        67        71
73        79        83        89
97

IS-BASIC

100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140 LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170 IF T(I)<>1 THEN
180 FOR K=I*I TO LIMIT STEP I
190 LET T(K)=1
200 NEXT
210 END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240 IF T(I)=0 THEN PRINT I;
250 NEXT

Locomotive Basic

10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
40 FOR n=2 TO SQR(limit)
50 IF f(n)=1 THEN 90
60 FOR k=n*n TO limit STEP n
70 f(k)=1
80 NEXT k
90 NEXT n
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT

MSX Basic

5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

Sinclair ZX81 BASIC

If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on FAST and SLOW: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering FAST mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to SLOW mode when we have something to print out.

10 INPUT L
20 FAST
30 DIM N(L)
40 FOR I=2 TO SQR L
50 IF N(I) THEN GOTO 90
60 FOR J=I+I TO L STEP I
70 LET N(J)=1
80 NEXT J
90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I

ZX Spectrum Basic

10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR l
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

QL SuperBASIC

using 'easy way' to 'add' 2n wheels

Translation of: ZX Spectrum Basic

Sets h\$ to 1 for higher multiples of 2 via FILL\$, later on sets STEP to 2n; replaces Floating Pt array p(z) with string variable h\$(z) to sieve out all primes < z=441 (l=21) in under 1K, so that h\$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.

10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h\$="10" : h\$=h\$ & FILL\$("01",z)
40 FOR n=3 TO l
50 IF h\$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h\$(k)=1
80 END FOR n
90 REM Display the primes
100 FOR n=2 TO z: IF h\$(n)=0: PRINT n;", ";
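
The two ideas used above (pre-marking the higher multiples of 2, then stepping through the multiples of an odd n in strides of 2n) can be sketched in Python as follows. The function name primes_odds_only and the use of a bytearray are assumptions for illustration only; unlike the listing, the outer loop here also steps by 2.

```python
def primes_odds_only(z):
    """Primes <= z; evens are pre-marked, odd primes cross out in steps of 2*n."""
    if z < 2:
        return []
    composite = bytearray(z + 1)
    for k in range(4, z + 1, 2):          # higher multiples of 2, marked up front
        composite[k] = 1
    n = 3
    while n * n <= z:
        if not composite[n]:
            # n*n is odd and 2*n is even, so only odd multiples are visited
            for k in range(n * n, z + 1, 2 * n):
                composite[k] = 1
        n += 2
    return [k for k in range(2, z + 1) if not composite[k]]


print(primes_odds_only(100))
```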

2i wheel emulation of Sinclair ZX81 BASIC

Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the STEP of 2 in line 40 mitigates line 50's inefficiency when going to 90.

10 INPUT Z
15 LET L=SQR(Z)
30 LET H\$="10"
32 FOR J=3 TO Z STEP 2
34 LET H\$=H\$ & "01"
36 NEXT J
40 FOR I=3 TO L STEP 2
50 IF H\$(I)="1" THEN GOTO 90
60 FOR J=I*I TO Z STEP I+I
70 LET H\$(J)="1"
80 NEXT J
90 NEXT I
110 FOR I=2 TO Z
120 IF H\$(I)="0" THEN PRINT I!
130 NEXT I

2i wheel emulation of Sinclair ZX80 BASIC

. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, L (=21 at most on 1K ZX80s for all primes till 439).

Backward-compatible on ZX80s after substituting ** for ^ in line 120.

10 INPUT L
15 LET Z=(L+1)*(L- 1)/2
30 DIM H(Z)
40 FOR I=3 TO L STEP 2
50 IF H((I-1)/ 2) THEN GOTO 90
60 FOR J=I*I TO L*L STEP I+I
70 LET H((J-1)/ 2)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
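
The 2:1 compression can be illustrated in Python: index k of the flag array stands for the odd number 2k+1, so no storage is spent on even numbers. This is a sketch under the same assumption as the listing (an odd upper limit L on the factors to be squared); the name primes_half_array is hypothetical.

```python
def primes_half_array(l):
    """Primes below l*l for odd l, storing one flag per odd number."""
    z = (l + 1) * (l - 1) // 2            # highest index; index k stands for 2*k + 1
    composite = bytearray(z + 1)
    for i in range(3, l + 1, 2):
        if not composite[(i - 1) // 2]:
            for j in range(i * i, l * l + 1, 2 * i):
                composite[(j - 1) // 2] = 1
    # index 0 would stand for 1; as in the BASIC listing, 2 is reported in its place
    return [2] + [2 * k + 1 for k in range(1, z + 1) if not composite[k]]


print(primes_half_array(21))
```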

Sieve of Sundaram

Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now just a slight transformation away from the Sieve of Sundaram, transformed as follows: O is the highest value for an index of successive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in between, such that duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent to line 50 in them.

Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.

10 INPUT O
15 LET Z=2*O*O+O*2
30 DIM H(Z)
40 FOR I=1 TO O
45 LET A=2*I*I+I*2
50 REM IF H(A) THEN GOTO 90
60 FOR J=A TO Z STEP 1+I*2
65 REM IF H(J) THEN GOTO 80
70 LET H(J)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
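
A Python sketch of the same transformed Sieve of Sundaram may make the index arithmetic easier to follow. It mirrors the listing's formulas (the diagonal index 2i²+2i and the stride 2i+1), but the function name sundaram and the variable names are assumptions introduced here.

```python
def sundaram(o):
    """Primes below (2*o + 1)**2 via the Sundaram-style index sieve."""
    z = 2 * o * o + 2 * o                 # index k stands for the odd number 2*k + 1
    marked = bytearray(z + 1)
    for i in range(1, o + 1):
        start = 2 * i * i + 2 * i         # diagonal element: index of (2*i + 1)**2
        for j in range(start, z + 1, 2 * i + 1):
            marked[j] = 1                 # no "already marked" test, as in the listing
    # index 0 would stand for 1; as in the BASIC listing, 2 is reported in its place
    return [2] + [2 * k + 1 for k in range(1, z + 1) if not marked[k]]


print(sundaram(10))
```

With o = 10 this covers all primes below 441, matching the limit quoted for the 1K machines.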

Eulerian optimisation

While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of FOR statements (& as their 1K of static RAM is equivalent to L1 cache, even in FAST mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)

10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h\$=FILL\$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h\$((2*q+1)*p+q)="1"
42 GOTO 87
61 IF h\$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h\$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h\$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h\$(z%)=" ": PRINT N%(z%)!
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2

Batch File

:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEEXTENSIONS
rem echo on
set /p n=limit:
rem set n=100
for /L %%i in (1,1,%n%) do set crible.%%i=1
for /L %%i in (2,1,%n%) do (
if !crible.%%i! EQU 1 (
set /A w = %%i * 2
for /L %%j in (!w!,%%i,%n%) do (
set crible.%%j=0
)
)
)
for /L %%i in (2,1,%n%) do (
if !crible.%%i! EQU 1 echo %%i
)
pause
Output:
limit: 100
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97

BBC BASIC

limit% = 100000
DIM sieve% limit%

prime% = 2
WHILE prime%^2 <= limit%
FOR I% = prime%*2 TO limit% STEP prime%
sieve%?I% = 1
NEXT
REPEAT prime% += 1 : UNTIL sieve%?prime%=0
ENDWHILE

REM Display the primes:
FOR I% = 2 TO limit%
IF sieve%?I% = 0 PRINT I%;
NEXT

BCPL

get "libhdr"

manifest \$( LIMIT = 1000 \$)

let sieve(prime,max) be
\$( let i = 2
0!prime := false
1!prime := false
for i = 2 to max do i!prime := true
while i*i <= max do
\$( if i!prime do
\$( let j = i*i
while j <= max do
\$( j!prime := false
j := j + i
\$)
\$)
i := i + 1
\$)
\$)

let start() be
\$( let prime = vec LIMIT
let col = 0
sieve(prime, LIMIT)
for i = 2 to LIMIT do
if i!prime do
\$( writef("%I4",i)
col := col + 1
if col rem 20 = 0 then wrch('*N')
\$)
wrch('*N')
\$)
Output:
2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997

Odds-only bit packed array version (64 bit)

This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.

GET "libhdr"

LET lowbit(n) =
n = 0 -> -1,
VALOF {
// The table is byte packed to conserve space; therefore we must
// unpack the structure.
//
LET deBruijn64 = TABLE
#x0001300239311C03, #x3D3A322A261D1104,
#x3E373B2435332B16, #x2D27211E18120C05,
#x3F2F381B3C292510, #x362334152C20170B,
#x2E1A280F22141F0A, #x190E13090D080706

LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
}

LET primes_upto(limit) =
limit < 3 -> 0,
VALOF {
LET bit_sz = (limit + 1) / 2 - 1
LET bit, p = ?, ?
LET q, r = bit_sz >> 6, bit_sz & #x3F
LET sz = q - (r > 0)
LET sieve = getvec(sz)

// Initialize the array
FOR i = 0 TO q - 1 DO
sieve!i := -1
IF r > 0 THEN sieve!q := ~(-1 << r)
sieve!sz := -1 // Sentinel value to mark the end -
// (after sieving, we'll never have 64 consecutive odd primes.)

// run the sieve
bit := 0
{
WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
bit +:= 1
p := 2*bit + 3
q := p*p
IF q > limit THEN RESULTIS sieve
r := (q - 3) >> 1
UNTIL r >= bit_sz DO {
sieve[r >> 6] &:= ~(1 << (r & #x3F))
r +:= p
}
bit +:= 1
} REPEAT
}

MANIFEST { // fields in an iterable
sieve_start; sieve_bits; sieve_ptr
}

LET prime_iter(sieve) = VALOF {
LET iter = getvec(2)
iter!sieve_start := 0
iter!sieve_bits := sieve!0
iter!sieve_ptr := sieve
RESULTIS iter
}

LET nextprime(iter) =
!iter!sieve_ptr = -1 -> 0, // guard entry if at the end already
VALOF {
LET p, x = ?, ?

// iter!sieve_start is also a flag to yield 2.
IF iter!sieve_start = 0 {
iter!sieve_start := 3
RESULTIS 2
}
x := iter!sieve_bits
{
TEST x ~= 0
THEN {
p := (lowbit(x) << 1) + iter!sieve_start
x &:= x - 1
iter!sieve_bits := x
RESULTIS p
}
ELSE {
iter!sieve_start +:= 128
iter!sieve_ptr +:= 1
x := !iter!sieve_ptr
IF x = -1 RESULTIS 0
}
} REPEAT
}

LET show(sieve) BE {
LET iter = prime_iter(sieve)
LET c, p = 0, ?
{
p := nextprime(iter)
IF p = 0 THEN {
wrch('*n')
freevec(iter)
RETURN
}
IF c MOD 10 = 0 THEN wrch('*n')
c +:= 1
writef("%8d", p)
} REPEAT
}

LET start() = VALOF {
LET n = ?
LET argv = VEC 20
LET sz = ?
LET primes = ?

sz := rdargs("upto/a/n/p", argv, 20)
IF sz = 0 RESULTIS 1
n := !argv!0
primes := primes_upto(n)
IF primes = 0 RESULTIS 1 // no array allocated because limit too small
show(primes)
freevec(primes)
RESULTIS 0
}

Output:
\$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000>
2       3       5       7      11      13      17      19      23      29
31      37      41      43      47      53      59      61      67      71
73      79      83      89      97     101     103     107     109     113
127     131     137     139     149     151     157     163     167     173
179     181     191     193     197     199     211     223     227     229
233     239     241     251     257     263     269     271     277     281
283     293     307     311     313     317     331     337     347     349
353     359     367     373     379     383     389     397     401     409
419     421     431     433     439     443     449     457     461     463
467     479     487     491     499     503     509     521     523     541
547     557     563     569     571     577     587     593     599     601
607     613     617     619     631     641     643     647     653     659
661     673     677     683     691     701     709     719     727     733
739     743     751     757     761     769     773     787     797     809
811     821     823     827     829     839     853     857     859     863
877     881     883     887     907     911     919     929     937     941
947     953     967     971     977     983     991     997
0.005>
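
The enumeration technique used by the iterator (take the lowest set bit of a word, report it, clear it, and move to the next word when the current one is empty) can be sketched in Python. This is an illustration only: the names packed_odd_sieve and odd_primes_from_words are hypothetical, Python integers stand in for 64-bit words, bit_length replaces the de Bruijn table lookup, and no sentinel word is needed.

```python
def packed_odd_sieve(limit, bits_per_word=64):
    """Sieve the odd numbers 3..limit; bit b of the result stands for 2*b + 3."""
    nbits = max((limit + 1) // 2 - 1, 0)
    flags = [1] * nbits
    b = 0
    while (2 * b + 3) ** 2 <= limit:
        if flags[b]:
            p = 2 * b + 3
            for r in range((p * p - 3) // 2, nbits, p):
                flags[r] = 0
        b += 1
    words = []
    for start in range(0, nbits, bits_per_word):
        word = 0
        for offset, f in enumerate(flags[start:start + bits_per_word]):
            word |= f << offset
        words.append(word)
    return words


def odd_primes_from_words(words, bits_per_word=64):
    """Yield 2, then the odd number behind every set bit, lowest bit first."""
    yield 2
    base = 3                                   # number represented by bit 0 of the word
    for word in words:
        while word:
            low = word & -word                 # isolate the lowest set bit
            yield base + 2 * (low.bit_length() - 1)
            word &= word - 1                   # clear that bit
        base += 2 * bits_per_word


print(list(odd_primes_from_words(packed_odd_sieve(100))))
```

The sketch assumes limit is at least 2, since 2 is always yielded first.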

Befunge

2>:3g" "-!v\  g30          <
|!`"O":+1_:.:03p>03g+:"O"`|
@               ^  p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789

BQN

A more efficient sieve (primes below one billion in under a minute) is provided as PrimesTo in bqn-libs primes.bqn.

Primes ← {
𝕩≤2 ? ↕0 ; # No primes below 2
p ← 𝕊⌈√n←𝕩 # Initial primes by recursion
b ← 2≤↕n # Initial sieve: no 0 or 1
E ← {↕∘⌈⌾((𝕩×𝕩+⊢)⁼)n} # Multiples of 𝕩 under n, starting at 𝕩×𝕩
/ b E⊸{0¨⌾(𝕨⊸⊏)𝕩}´ p # Cross them out
}
Output:
Primes 100
⟨ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ⟩
≠∘Primes¨ 10⋆↕7 # Number of primes below 1e0, 1e1, ... 1e6
⟨ 0 4 25 168 1229 9592 78498 ⟩

Bracmat

This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.

( ( eratosthenes
= n j i
.  !arg:?n
& 1:?i
& whl
' ( (1+!i:?i)^2:~>!n:?j
& ( !!i
| whl
' ( !j:~>!n
& nonprime:?!j
& !j+!i:?j
)
)
)
& 1:?i
& whl
' ( 1+!i:~>!n:?i
& (!!i|put\$(!i " "))
)
)
& eratosthenes\$100
)
Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

C

Plain sieve, without any optimizations:
#include <stdlib.h>
#include <math.h>

char*
eratosthenes(int n, int *c)
{
char* sieve;
int i, j, m;

if(n < 2)
return NULL;

*c = n-1; /* primes count */
m = (int) sqrt((double) n);

/* calloc initializes to zero */
sieve = calloc(n+1,sizeof(char));
sieve[0] = 1;
sieve[1] = 1;
for(i = 2; i <= m; i++)
if(!sieve[i])
for (j = i*i; j <= n; j += i)
if(!sieve[j]){
sieve[j] = 1;
--(*c);
}
return sieve;
}
Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.

Another example:

We first fill ones into an array and assume all numbers are prime. Then, in a loop, we fill zeroes into those places where i * j is less than or equal to n (the upper limit of the search), since every such position is a multiple of i. To understand this better, look at the output of the following example.

To print this back, we look for ones in the array and only print those spots.
#include <stdio.h>
#include <stdlib.h>
void sieve(int *, int);

int main(int argc, char *argv[])
{
int *array, n=10;
array =(int *)malloc((n + 1) * sizeof(int));
sieve(array,n);
return 0;
}

void sieve(int *a, int n)
{
int i=0, j=0;

for(i=2; i<=n; i++) {
a[i] = 1;
}

for(i=2; i<=n; i++) {
printf("\ni:%d", i);
if(a[i] == 1) {
for(j=i; (i*j)<=n; j++) {
printf ("\nj:%d", j);
printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);
a[(i*j)] = 0;
printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);
}
}
}

printf("\nPrimes numbers from 1 to %d are : ", n);
for(i=2; i<=n; i++) {
if(a[i] == 1)
printf("%d, ", i);
}
printf("\n\n");
}
Output:
i:2
j:2
Before a[2*2]: 1
After a[2*2]: 0
j:3
Before a[2*3]: 1
After a[2*3]: 0
j:4
Before a[2*4]: 1
After a[2*4]: 0
j:5
Before a[2*5]: 1
After a[2*5]: 0
i:3
j:3
Before a[3*3]: 1
After a[3*3]: 0
i:4
i:5
i:6
i:7
i:8
i:9
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7,

C#

Works with: C# version 2.0+
using System;
using System.Collections;
using System.Collections.Generic;

namespace SieveOfEratosthenes
{
class Program
{
static void Main(string[] args)
{
int maxprime = int.Parse(args[0]);
var primelist = GetAllPrimesLessThan(maxprime);
foreach (int prime in primelist)
{
Console.WriteLine(prime);
}
Console.WriteLine("Count = " + primelist.Count);
}

private static List<int> GetAllPrimesLessThan(int maxPrime)
{
var primes = new List<int>();
var maxSquareRoot = (int)Math.Sqrt(maxPrime);
var eliminated = new BitArray(maxPrime + 1);

for (int i = 2; i <= maxPrime; ++i)
{
if (!eliminated[i])
{
primes.Add(i);
if (i <= maxSquareRoot)
{
for (int j = i * i; j <= maxPrime; j += i)
{
eliminated[j] = true;
}
}
}
}
return primes;
}
}
}

Unbounded

Richard Bird Sieve

Translation of: F#

To show that C# code can be written in a somewhat functional style, the following is an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesBird : IEnumerable<PrimeT> {
private struct CIS<T> {
public T v; public Func<CIS<T>> cont;
public CIS(T v, Func<CIS<T>> cont) {
this.v = v; this.cont = cont;
}
}
private CIS<PrimeT> pmlts(PrimeT p) {
Func<PrimeT, CIS<PrimeT>> fn = null;
fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));
return fn(p * p);
}
private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }
private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
var x = xs.v; var y = ys.v;
if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
}
private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }
private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
var nn = n; var ncs = cs;
for (; ; ++nn) {
if (nn >= ncs.v) ncs = ncs.cont();
else return new CIS<PrimeT>(nn, () => minusat(++nn, ncs));
}
}
private CIS<PrimeT> prms() {
return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }
public IEnumerator<PrimeT> GetEnumerator() {
for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}

Tree Folding Sieve

Translation of: F#

The above code can easily be converted to "odds-only" and an infinite tree-like folding scheme with the following minor changes:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesTreeFold : IEnumerable<PrimeT> {
private struct CIS<T> {
public T v; public Func<CIS<T>> cont;
public CIS(T v, Func<CIS<T>> cont) {
this.v = v; this.cont = cont;
}
}
private CIS<PrimeT> pmlts(PrimeT p) {
var adv = p + p;
Func<PrimeT, CIS<PrimeT>> fn = null;
fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));
return fn(p * p);
}
private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));
}
private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
var x = xs.v; var y = ys.v;
if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
}
private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {
var nxtcss = css.cont();
return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }
private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));
}
private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
var nn = n; var ncs = cs;
for (; ; nn += 2) {
if (nn >= ncs.v) ncs = ncs.cont();
else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));
}
}
private CIS<PrimeT> oddprms() {
return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));
}
public IEnumerator<PrimeT> GetEnumerator() {
yield return 2;
for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}

The above code runs over ten times faster than the original Richard Bird algorithm.

Priority Queue Sieve

Translation of: F#

First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:

namespace PriorityQ {
using KeyT = System.UInt32;
using System;
using System.Collections.Generic;
using System.Linq;
class Tuple<K, V> { // for DotNet 3.5 without Tuple's
public K Item1; public V Item2;
public Tuple(K k, V v) { Item1 = k; Item2 = v; }
public override string ToString() {
return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";
}
}
class MinHeapPQ<V> {
private struct HeapEntry {
public KeyT k; public V v;
public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }
}
private List<HeapEntry> pq;
private MinHeapPQ() { this.pq = new List<HeapEntry>(); }
private bool mt { get { return pq.Count == 0; } }
private Tuple<KeyT, V> pkmn {
get {
if (pq.Count == 0) return null;
else {
var mn = pq[0];
return new Tuple<KeyT, V>(mn.k, mn.v);
}
}
}
private void psh(KeyT k, V v) { // add extra very high item if none
if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));
var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...
for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {
var t = pq[ni - 1];
if (t.k > k) pq[i - 1] = t; else break;
}
pq[i - 1] = new HeapEntry(k, v);
}
private void siftdown(KeyT k, V v, int ndx) {
var cnt = pq.Count - 1; var i = ndx;
for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {
var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;
var nk = k;
if (k > lk) { i = ni; nk = lk; }
if (nk > rk) { ni += 1; i = ni; }
if (i != oi) pq[oi] = pq[i]; else break;
}
pq[i] = new HeapEntry(k, v);
}
private void rplcmin(KeyT k, V v) {
if (pq.Count > 1) siftdown(k, v, 0); }
public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }
public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }
public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {
pq.psh(k, v); return pq; }
public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
pq.rplcmin(k, v); return pq; }
}
}

The following code implements an improved version of the odds-only O'Neill algorithm. It adds a base prime's composite number stream to the queue only when the sieved number reaches the square of that base prime (saving a huge amount of memory and considerable execution time, including not needing as wide a range of a type for the internal prime numbers), and it minimizes stream processing using fusion:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesPQ : IEnumerable<PrimeT> {
private IEnumerator<PrimeT> nmrtr() {
MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;
PrimeT bp = 3; PrimeT q = 9;
IEnumerator<PrimeT> bps = null;
yield return 2; yield return 3;
for (var n = (PrimeT)5; ; n += 2) {
if (n >= q) { // always equal or less...
if (q <= 9) {
bps = nmrtr();
bps.MoveNext(); bps.MoveNext(); } // move to 3...
bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
var adv = bp + bp; bp = nbp;
pq = MinHeapPQ<PrimeT>.push(n + adv, adv, pq);
}
else {
var pk = MinHeapPQ<PrimeT>.peekMin(pq);
var ck = (pk == null) ? q : pk.Item1;
if (n >= ck) {
do { var adv = pk.Item2;
pq = MinHeapPQ<PrimeT>.replaceMin(ck + adv, adv, pq);
pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;
} while (n >= ck);
}
else yield return n;
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}

The above code is at least about 2.5 times faster than the Tree Folding version.
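
The postponement described above (a base prime's stream of multiples enters the queue only when the candidate reaches the square of that prime) can be sketched compactly in Python with the standard heapq module. This is an illustrative sketch, not a translation of the C# class: the name pq_primes is hypothetical, the secondary base primes stream is created eagerly rather than on demand, and no sentinel entry is used.

```python
import heapq
from itertools import count, islice


def pq_primes():
    """Unbounded odds-only sieve; composite streams wait in a min-heap."""
    yield 2
    yield 3
    heap = []                       # entries are (next composite, step)
    base = pq_primes()              # secondary stream supplying the base primes
    next(base); next(base)          # consume 2 and 3, so the next value is 5
    bp, q = 3, 9                    # current base prime and its square
    for n in count(5, 2):
        if n == q:
            # n is the square of bp: only now start its stream of odd multiples
            heapq.heappush(heap, (q + 2 * bp, 2 * bp))
            bp = next(base)
            q = bp * bp
        elif heap and heap[0][0] == n:
            # n is composite: advance every stream currently standing on n
            while heap[0][0] == n:
                c, step = heap[0]
                heapq.heapreplace(heap, (c + step, step))
        else:
            yield n


print(list(islice(pq_primes(), 25)))
```

Because streams are added only at the square of each base prime, the heap holds roughly one entry per prime below the square root of the current candidate.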

Dictionary (Hash table) Sieve

The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesDict : IEnumerable<PrimeT> {
private IEnumerator<PrimeT> nmrtr() {
Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();
PrimeT bp = 3; PrimeT q = 9;
IEnumerator<PrimeT> bps = null;
yield return 2; yield return 3;
for (var n = (PrimeT)5; ; n += 2) {
if (n >= q) { // always equal or less...
if (q <= 9) {
bps = nmrtr();
bps.MoveNext(); bps.MoveNext();
} // move to 3...
bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
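var adv = bp + bp; bp = nbp;
dct.Add(n + adv, adv);
}
else {
if (dct.ContainsKey(n)) {
PrimeT nadv; dct.TryGetValue(n, out nadv); dct.Remove(n); var nxt = n + nadv;
while (dct.ContainsKey(nxt)) nxt += nadv;
dct.Add(nxt, nadv);
}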
else yield return n;
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}

The above code runs in about three quarters of the time of the Priority Queue based version for a range of a million primes, and the Priority Queue version will fall even further behind for increasing ranges because the Dictionary provides O(1) access times as compared to the O(log n) access times of the Priority Queue. The only slight advantage of the PQ based version is at very small ranges, where the constant-factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".
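
The hash table approach is short enough to sketch in Python, where a dict gives the same O(1) lookups. The name dict_primes and the variable names are assumptions for illustration. As a deliberate difference from the C# listing, this sketch searches for a free key in both branches, so that each composite key holds at most one step value.

```python
from itertools import count, islice


def dict_primes():
    """Unbounded odds-only sieve; each base prime owns one dictionary entry."""
    yield 2
    yield 3
    steps = {}                      # composite -> step (twice its base prime)
    base = dict_primes()            # secondary stream supplying the base primes
    next(base); next(base)          # consume 2 and 3, so the next value is 5
    bp, q = 3, 9                    # current base prime and its square
    for n in count(5, 2):
        if n == q:
            step = 2 * bp           # start culling for bp at its square
            bp = next(base)
            q = bp * bp
        elif n in steps:
            step = steps.pop(n)     # n is composite; reuse its step
        else:
            yield n
            continue
        nxt = n + step
        while nxt in steps:         # skip keys already claimed by another prime
            nxt += step
        steps[nxt] = step


print(list(islice(dict_primes(), 25)))
```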

Page Segmented Array Sieve

All of the above unbounded versions are really just an intellectual exercise, as with very few extra lines of code beyond the fastest Dictionary based version one can have a bit-packed page-segmented array based version as follows:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesPgd : IEnumerable<PrimeT> {
private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes
private const int BFBTS = PGSZ * 8; // in bits
private const int BFRNG = BFBTS * 2;
public IEnumerator<PrimeT> nmrtr() {
IEnumerator<PrimeT> bps = null;
List<uint> bpa = new List<uint>();
uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words
yield return 2;
for (var lowi = (PrimeT)0; ; lowi += BFBTS) {
for (var bi = 0; ; ++bi) {
if (bi < 1) {
if (bi < 0) { bi = 0; yield return 2; }
PrimeT nxt = 3 + lowi + lowi + BFRNG;
if (lowi <= 0) { // cull very first page
for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)
for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
cbuf[j >> 5] |= 1u << j;
}
else { // cull for the rest of the pages
Array.Clear(cbuf, 0, cbuf.Length);
if (bpa.Count == 0) { // init secondary base primes stream
bps = nmrtr(); bps.MoveNext(); bps.MoveNext();
bpa.Add((uint)bps.Current); bps.MoveNext();
} // add 3 to base primes array
// make sure bpa contains enough base primes...
for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {
p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);
}
for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {
var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;
// adjust start index based on page lower limit...
if (s >= lowi) s -= lowi;
else {
var r = (lowi - s) % p;
s = (r != 0) ? p - r : 0;
}
for (var j = (uint)s; j < BFBTS; j += p)
cbuf[j >> 5] |= 1u << ((int)j);
}
}
}
while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;
if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);
else break; // outer loop for next page segment...
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}

The above code is about 25 times faster than the Dictionary version at computing the first about 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed as compared to a naive "one huge memory array" algorithm to using an array size that is the size of the CPU L1 or L2 caches and using bit-packing to fit more number representations into this limited capacity; in this way RAM memory access times are reduced by a factor of from about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is compensated by a large factor in total execution time.
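
The page segmentation idea on its own can be sketched in Python: sieve the base primes up to the square root of the limit once, then cull one fixed-size window at a time, starting each base prime at its first multiple inside the window. This is a simplified illustration under stated assumptions: the name segmented_primes and the default window size are hypothetical, the sieve is bounded rather than unbounded, and it uses one byte per number instead of odds-only bit-packing.

```python
from math import isqrt


def segmented_primes(limit, segment_size=1 << 15):
    """Return all primes <= limit, sieving one fixed-size window at a time."""
    if limit < 2:
        return []
    root = isqrt(limit)
    # Base primes up to sqrt(limit) come from a small ordinary sieve.
    small = bytearray([1]) * (root + 1)
    small[0:2] = b"\x00\x00"
    for i in range(2, isqrt(root) + 1):
        if small[i]:
            for j in range(i * i, root + 1, i):
                small[j] = 0
    base = [i for i in range(2, root + 1) if small[i]]
    primes = list(base)
    low = root + 1
    while low <= limit:
        high = min(low + segment_size - 1, limit)
        window = bytearray([1]) * (high - low + 1)
        for p in base:
            if p * p > high:
                break
            # first multiple of p inside [low, high], but never below p*p
            start = max(p * p, (low + p - 1) // p * p)
            for m in range(start, high + 1, p):
                window[m - low] = 0
        primes.extend(low + k for k, flag in enumerate(window) if flag)
        low = high + 1
    return primes


print(len(segmented_primes(1000000)))
```

Only the window and the list of base primes need to stay hot in cache while culling, which is the property the paragraph above credits for the speed-up.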

The time to enumerate the result primes sequence can be reduced somewhat (by about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.

In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another fourfold gain in speed) and multi-processing (with another gain in speed proportional to the number of independent CPU cores actually used).

Note that none of these gains in speed are due to C# itself, other than that it compiles to reasonably efficient machine code; they come from proper use of the Sieve of Eratosthenes algorithm.

All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):

static void Main(string[] args) {
Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
}

This produces the following output for all tested versions (although some are considerably faster than others):

Output:
15485863

C++

Standard Library

This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

#include <iostream>
#include <algorithm>
#include <iterator>
#include <vector>

// requires Iterator satisfies RandomAccessIterator
template <typename Iterator>
size_t prime_sieve(Iterator start, Iterator end)
{
if (start == end) return 0;
// clear the container with 0
std::fill(start, end, 0);
// mark composites with 1
for (Iterator prime_it = start + 1; prime_it != end; ++prime_it)
{
if (*prime_it == 1) continue;
// determine the prime number represented by this iterator location
size_t stride = (prime_it - start) + 1;
// mark all multiples of this prime number up to max
Iterator mark_it = prime_it;
while ((end - mark_it) > stride)
{
std::advance(mark_it, stride);
*mark_it = 1;
}
}
// copy marked primes into container
Iterator out_it = start;
for (Iterator it = start + 1; it != end; ++it)
{
if (*it == 0)
{
*out_it = (it - start) + 1;
++out_it;
}
}
return out_it - start;
}

int main(int argc, const char* argv[])
{
std::vector<int> primes(100);
size_t count = prime_sieve(primes.begin(), primes.end());
// display the primes
for (size_t i = 0; i < count; ++i)
std::cout << primes[i] << " ";
std::cout << std::endl;
return 0;
}
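
The same "one container for both the marks and the results" pattern can be sketched in Python for comparison. This is a minimal illustration only; the name `prime_sieve_in_place` is hypothetical. As in the C++ version, slot `k` stands for the number `k + 1`, the buffer is first used for composite marks, and the primes are then written over the front of it.

```python
def prime_sieve_in_place(buf):
    """Sieve using buf as both flag array and output; return the prime count.

    Slot k stands for the number k + 1, mirroring the iterator arithmetic
    of the C++ version. No extra list is allocated.
    """
    n = len(buf)
    if n == 0:
        return 0
    for k in range(n):
        buf[k] = 0                      # clear: 0 means "not marked composite"
    for k in range(1, n):               # slot 0 stands for 1 and is skipped
        if buf[k] == 1:
            continue
        stride = k + 1                  # the prime represented by this slot
        for m in range(k + stride, n, stride):
            buf[m] = 1                  # mark every further multiple
    out = 0
    for k in range(1, n):               # writes always land on slots already read
        if buf[k] == 0:
            buf[out] = k + 1
            out += 1
    return out
```

After the call, the first `count` entries of the buffer hold the primes and the rest is leftover marking data.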

Boost

// yield all prime numbers less than limit.
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
std::vector<bool> is_prime(limit, true);

const int sqrt_limit = static_cast<int>(std::sqrt(limit));
for (int n = 2; n <= sqrt_limit; ++n)
if (is_prime[n]) {
yield(n);

for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)
//NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
is_prime[k] = false;
}

for (int n = sqrt_limit + 1; n < limit; ++n)
if (is_prime[n])
yield(n);
}

Full program:

Works with: Boost
/**
$ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
*/

#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>

int main(int argc, char *argv[])
{
using namespace std;
using namespace boost::lambda;

int limit = 10000;
if (argc == 2) {
stringstream ss(argv[--argc]);
ss >> limit;

if (limit < 1 or ss.fail()) {
cerr << "USAGE:\n sieve LIMIT\n\nwhere LIMIT in the range [1, "
<< numeric_limits<int>::max() << ")" << endl;
return 2;
}
}

// print primes less than 100
primesupto(100, cout << _1 << " ");
cout << endl;

// find number of primes less than limit and their sum
int count = 0;
uintmax_t sum = 0;
primesupto(limit, (var(sum) += _1, var(count) += 1));

cout << "limit sum pi(n)\n"
<< limit << " " << sum << " " << count << endl;
}

Chapel

This example is incorrect. Please fix the code and remove this message. Details: Doesn't compile since at least Chapel version 1.20 to 1.24.1.

This solution uses nested iterators to create new wheels at run time:

// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {

yield prime;

var last = prime;
label candidates for candidate in sieve(prime+1) do {
for composite in last..candidate by prime do {

// candidate is a multiple of this prime
if composite == candidate {
// remember size of last composite
last = composite;
// and try the next candidate
continue candidates;
}
}

// candidate does not need to be removed by this sieve
// yield to parent sieve for checking
yield candidate;
}
}
The topmost sieve needs to be started with 2 (the smallest prime):
config const N = 30;
for p in sieve(2) {
if p > N {
writeln();
break;
}
write(" ", p);
}

Alternate Conventional Bit-Packed Implementation

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:

compile with the `--fast` option

use Time;
use BitOps;

type Prime = uint(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
write("The first 25 primes are: ");
for p in primes(100) do write(p, " "); writeln();

var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is: ", count, ".");

var timer: Timer;
timer.start();

count = 0;
for p in primes(limit) do count += 1;

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
const szlmt = n / 8;
var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

for bp in 2 .. n {
if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
const s0 = bp * bp;
if s0 > n then break;
for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
}
}

for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;

}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).
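
For readers who do not use Chapel, the one-bit-per-number layout can be sketched in Python as follows. The function name `bitpacked_primes` is hypothetical; the sketch mirrors the structure of the Chapel iterator above (bit `k & 7` of byte `k >> 3` set means `k` is composite) and makes no claim about performance.

```python
def bitpacked_primes(n):
    """Yield primes <= n using one bit per number in a single bytearray."""
    cmpsts = bytearray(n // 8 + 1)          # bit k set means k is composite
    bp = 2
    while bp * bp <= n:                     # outer loop stops at sqrt(n)
        if not cmpsts[bp >> 3] & (1 << (bp & 7)):
            for c in range(bp * bp, n + 1, bp):   # inner loop starts at bp squared
                cmpsts[c >> 3] |= 1 << (c & 7)
        bp += 1
    for p in range(2, n + 1):
        if not cmpsts[p >> 3] & (1 << (p & 7)):
            yield p
```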

Alternate Odds-Only Bit-Packed Implementation

use Time;
use BitOps;

type Prime = int(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
write("The first 25 primes are: ");
for p in primes(100) do write(p, " "); writeln();

var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is: ", count, ".");

var timer: Timer;
timer.start();

count = 0;
for p in primes(limit) do count += 1;

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
const ndxlmt = (n - 3) / 2;
const szlmt = ndxlmt / 8;
var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

for i in 0 .. ndxlmt { // never gets to the end!
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
const bp = i + i + 3;
const s0 = (bp * bp - 3) / 2;
if s0 > ndxlmt then break;
for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
}
}

yield 2;
for i in 0 .. ndxlmt do
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;

}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).
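
The odds-only index mapping can be sketched in Python as follows (the function name `odds_only_primes` is hypothetical). Bit `i` represents the odd number `2 * i + 3`, so the square of a base prime `bp` sits at index `(bp * bp - 3) // 2`, and stepping the index by `bp` steps the represented number by `2 * bp`, which skips the even multiples.

```python
def odds_only_primes(n):
    """Yield primes <= n; bit i of the array represents the odd number 2*i + 3."""
    if n >= 2:
        yield 2
    if n < 3:
        return
    ndxlmt = (n - 3) // 2                   # index of the largest odd number <= n
    cmpsts = bytearray(ndxlmt // 8 + 1)     # half the bits of the all-numbers sieve
    for i in range(ndxlmt + 1):
        if not cmpsts[i >> 3] & (1 << (i & 7)):
            bp = i + i + 3
            s0 = (bp * bp - 3) // 2         # index of bp squared
            if s0 > ndxlmt:
                break
            for s in range(s0, ndxlmt + 1, bp):
                cmpsts[s >> 3] |= 1 << (s & 7)
    for i in range(ndxlmt + 1):
        if not cmpsts[i >> 3] & (1 << (i & 7)):
            yield i + i + 3
```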

Hash Table Based Odds-Only Version

Translation of: Python
Works with: Chapel version 1.25.1
use Time;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: owned Primes?;
var keys: domain(Prime); var dict: [keys] Prime;
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.keys += key; this.dict[key] = adv;
}
else if this.keys.contains(this.n) { // found a composite; advance...
const adv = this.dict[this.n]; this.keys.remove(this.n);
var nkey = this.n + adv;
while this.keys.contains(nkey) do nkey += adv;
this.keys += nkey; this.dict[nkey] = adv;
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();

var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 5195.41 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, this is much slower than the array based versions but much faster than previous Chapel version code as the hashing has been greatly improved.
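
Since this version is a translation from Python, the underlying incremental algorithm can be sketched there in a few lines. This is a minimal sketch, not the code the Chapel was translated from: the names `odd_primes`, `primes` and `upcoming` are hypothetical, and the sketch checks for key collisions when a new base prime is added as well as when a composite is advanced.

```python
from itertools import islice

def odd_primes():
    """Yield odd primes forever, keeping a dict of upcoming odd composites."""
    yield 3
    bps = odd_primes()              # secondary feed of odd base primes
    obp = next(bps)                 # current base prime, starts at 3
    q = obp * obp                   # its square, where culling by it begins
    upcoming = {}                   # next odd composite -> step (2 * base prime)
    n = 5
    while True:
        if n >= q:                  # n is the square of the current base prime
            adv = 2 * obp
            nkey = q + adv
            obp = next(bps)
            q = obp * obp
        elif n in upcoming:         # found a composite; advance its entry
            adv = upcoming.pop(n)
            nkey = n + adv
        else:                       # not culled by any base prime: n is prime
            yield n
            n += 2
            continue
        while nkey in upcoming:     # skip slots already claimed by another prime
            nkey += adv
        upcoming[nkey] = adv
        n += 2

def primes():
    """Yield 2 followed by the odd primes."""
    yield 2
    yield from odd_primes()
```

Because each base prime is only added once the candidate reaches its square, the dict holds one entry per prime up to the square root of the current candidate.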

As an alternative to the built-in associative array, the following code implements a specialized BasePrimesTable that works much like Python's associative arrays: no hashing function is applied (the hash value of an integer is just the integer itself), and hash table collisions are handled by something similar to the Python method:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;

config const limit = 100000000;

type Prime = uint(32);

record BasePrimesTable { // specialized for the use here...
record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
var cpcty: int = 8; var sz: int = 0;
var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
proc grow() {
const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);
}
proc find(k: Prime): int { // internal get location of value or -1
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == k then return skey;
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
return -1; // not found!
}
proc contains(k: Prime): bool { return find(k) >= 0; }
proc add(k, v: Prime) { // if exists then replaces else new entry
const fndi = find(k);
if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
else {
sz += 1; if 2 * sz > cpcty then grow();
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == 0 {
bpa[skey] = new BasePrimeEntry(k, v); return; }
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
}
}
proc remove(k: Prime) { // if doesn't exist does nothing
const fndi = find(k);
if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
}
proc this(k: Prime): Prime { // returns value or 0 if not found
const fndi = find(k);
if fndi < 0 then return 0; else return bpa[fndi].val;
}
}

class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: shared Primes?; var dict = new BasePrimesTable();
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.dict.add(key, adv);
}
else if this.dict.contains(this.n) { // found a composite; advance...
const adv = this.dict[this.n]; this.dict.remove(this.n);
var nkey = this.n + adv;
while this.dict.contains(nkey) do nkey += adv;
this.dict.add(nkey, adv);
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();

var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 5761455 primes up to 100000000 in 2351.79 milliseconds.

This last code is quite usable up to a hundred million (as here), or even up to a billion in a little over ten times the time. It is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only the hash table for the base primes, about eight Kilobytes when sieving to a billion, compared to over 60 Megabytes for the simple monolithic odds-only version).
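
The probe order used by the `find` and `add` methods of the BasePrimesTable above can be sketched in isolation. The helper name `probe_sequence` is hypothetical; it only lists which slots would be visited for a given key and does not implement the table itself. Keys that share a home slot diverge on the next probe because the high bits of the key are mixed in through `perturb`, and once `perturb` reaches zero the `5 * slot + 1` recurrence visits every slot of a power-of-two table.

```python
def probe_sequence(key, capacity, count):
    """Return the first `count` slots probed for `key`.

    `capacity` must be a power of two, as in the table above.
    """
    mask = capacity - 1
    slot, perturb = key & mask, key     # home slot is the key's low bits
    slots = []
    for _ in range(count):
        slots.append(slot)
        perturb >>= 5                   # feed in higher bits of the key
        slot = (5 * slot + 1 + perturb) & mask
    return slots
```

For example, the keys 1, 65 and 129 all start at slot 1 in a 64-slot table but continue at slots 6, 8 and 10 respectively.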

Chapel version 1.25.1 provides yet another option for the form of the code, although the algorithm is the same: the hashing function for Chapel records can now be overridden, so that records can be used as the key type for hash maps, as follows:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;

use Map;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
record PrimeR { var prime: Prime; proc hash() { return prime; } }
var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
var bps: owned Primes?;
var dict = new map(PrimeR, Prime);
proc next(): Prime { // odd primes!
if this.n.prime < 5 { this.n.prime = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n.prime >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.dict.add(key, adv);
}
else if this.dict.contains(this.n) { // found a composite; advance...
const adv = this.dict.getValue(this.n); this.dict.remove(this.n);
var nkey = new PrimeR(this.n.prime + adv);
while this.dict.contains(nkey) do nkey.prime += adv;
this.dict.add(nkey, adv);
}
else { const p = this.n.prime;
this.n.prime += 2; return p; }
this.n.prime += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();

var timer: Timer;
timer.start();

count = 0;
for p in new Primes() { if p > limit then break; count += 1; }

timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

This runs in almost exactly the same time as the previous code, but doesn't require a custom associative array implementation, since the standard library Map can be used.

Functional Tree Folding Odds-Only Version

Chapel isn't really a very functional language. It has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced iterations, and it has first class functions (FCF's) and lambdas (anonymous functions), but these last can't be closures (capture variable bindings from external scope(s)). Nor can the workaround of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

Translation of: Nim
Works with: 1.22 version - compile with the --fast compiler command line flag for full optimization
use Time;

type Prime = uint(32);

config const limit = 1000000: Prime;

// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
var head: Prime;
proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}

class PrimeMultiples: PrimeCIS {
var adv: Prime;
override proc next(): shared PrimeCIS {
return new shared PrimeMultiples(
this.head + this.adv, this.adv): shared PrimeCIS; }
}

class PrimeCISCIS { // base stream of prime streams; never used directly...
var head: shared PrimeCIS;
proc init() { this.head = new shared PrimeCIS(); }
proc next(): shared PrimeCISCIS {
return new shared PrimeCISCIS(); }
}

class AllMultiples: PrimeCISCIS {
var bps: shared PrimeCIS;
proc init(bsprms: shared PrimeCIS) {
const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;
this.bps = bsprms;
}
override proc next(): shared PrimeCISCIS {
return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}

class Union: PrimeCIS {
var feeda, feedb: shared PrimeCIS;
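proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
const ahd = fda.head; const bhd = fdb.head;
this.head = if ahd < bhd then ahd else bhd;
this.feeda = fda; this.feedb = fdb;
}
override proc next(): shared PrimeCIS {
const ahd = this.feeda.head; const bhd = this.feedb.head;
if ahd < bhd then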
return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
if ahd > bhd then
return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
return new shared Union(this.feeda.next(),
this.feedb.next()): shared PrimeCIS;
}
}

class Pairs: PrimeCISCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
const fs = fd.head; const sss = fd.next(); const ss = sss.head;
this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
}
override proc next(): shared PrimeCISCIS {
return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}

class Composites: PrimeCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
this.head = fd.head.head; this.feed = fd;
}
override proc next(): shared PrimeCIS {
const fs = this.feed.head.next();
const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
const ncs = new shared Composites(prs): shared PrimeCIS;
return new shared Union(fs, ncs): shared PrimeCIS;
}
}

class OddPrimesFrom: PrimeCIS {
var cmpsts: shared PrimeCIS;
override proc next(): shared PrimeCIS {
var n = head + 2; var cs = this.cmpsts;
while true {
if n < cs.head then
return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
n += 2; cs = cs.next();
}
return this.cmpsts; // never used; keeps compiler happy!
}
}

class OddPrimes: PrimeCIS {
proc init() { this.head = 3; }
override proc next(): shared PrimeCIS {
const bps = new shared OddPrimes(): shared PrimeCIS;
const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
const cmpsts = new shared Composites(mlts): shared PrimeCIS;
return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
}
}

iter primes(): Prime {
yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
while true { yield cur.head; cur = cur.next(); }
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits), although doing so is slower than the Hash Table version above and also takes more memory, as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has worse asymptotic performance, with an extra `log(n)` factor where `n` is the sieving range. This can be shown by running the above program with the `--limit=10000000` run time command line option, which takes about 4.5 seconds to count the primes up to ten million: the range is ten times higher, but the time increases by much more than the factor of ten plus about 10 per cent expected of the Hash Table version, since this version does about 20 per cent more operations on top of the factor of ten. Apart from the extra operations, this version is generally slower due to the time taken by the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.
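
The merge step that the `Union` class performs can be sketched with Python generators. This shows only the union of ascending streams, folded over a fixed three streams; it is not the lazy, unbounded tree folding of the Chapel code. The names `union` and `odd_multiples` are hypothetical, and both inputs are assumed to be infinite ascending iterators.

```python
from itertools import count, islice

def union(a, b):
    """Merge two infinite ascending iterators into one, dropping duplicates."""
    x, y = next(a), next(b)
    while True:
        if x < y:
            yield x
            x = next(a)
        elif x > y:
            yield y
            y = next(b)
        else:                       # same value in both feeds: emit it once
            yield x
            x, y = next(a), next(b)

def odd_multiples(p):
    """Odd multiples of p starting at p squared, as in PrimeMultiples."""
    return count(p * p, 2 * p)
```

Merging the streams for 3, 5 and 7 with `union(odd_multiples(3), union(odd_multiples(5), odd_multiples(7)))` yields the odd composites 9, 15, 21, 25, 27, ... in order; the odd numbers missing from that stream below 121 are primes.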

A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version

Works with: 1.24.1 version - compile with the --fast compiler command line flag for full optimization
use Time; use BitOps; use CPtr;

type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);

config const LIMIT = 1000000000: Prime;

config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
"L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
"L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");

const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;

// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
if L1 == 32 then 23000 else 43390)
- WHLPRMS.size + 1; // extra one for marker!

// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;

var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
const hi = WHLPRMS.size - 1;
const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
// contains wheel pattern primes skipping the small wheel prime (2)!...
// never advances past the first base prime arr as it ends with a huge!...
for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
else 0x7FFFFFFF): BasePrime; // last huge!
var whlbpas = new shared BasePrimeArrs(whlhd);
var whlstrts = new StrtsArr({rng});
wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
// eliminate wheel primes from the WHLPTRN buffer!...
wp.cmpsts[0] = 0xFF: uint(8);
}

// the following two must be classes for compatibility with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
proc init() {}
proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
proc deinit() { dom = { 0 .. -1 }; }

proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
const sz = cmpsts.size; const mvsz = min(sz, 16384);
var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
for i in 0 .. sz - 1 by 16384 {
c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);
mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
}
}

proc count(btlmt: int) { // count by 64 bits using CPU popcount...
const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
var i = 0; var cnt = (lstwrd * 64 + 64): int;
while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;
}

// most of the time is spent doing culling operations as follows!...
proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
ref strts: StrtsArr) {
const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
const strtssz = strts.strtsarr.size;
// C pointer for speed magic!...
const cmpstsp = c_ptrTo(cmpsts);
const strtsp = c_ptrTo(strts.strtsarr);

// first fill the strts array with pre-calculated start addresses...
var i = 0; for bp in bpas {
// calculate page start address for the given base prime...
const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
if s0 > ndxlmt then {
if i < strtssz then strtsp[i] = -1: int(32); break; }
var s = 0: int;
if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
else { const r = (lwi: Prime - s0) % bbp;
if r == 0 then s = 0: int; else s = (bbp - r): int; };
if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
if i < strtssz { strtsp[i] = -1; i = strtssz; }
// cull the full buffer for this given base prime as usual...
// only works up to limit of int(32)**2!!!!!!!!
while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
}

// cull the smaller sub buffers according to the strts array...
for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
i = 0; for bp in bpas { // bp never bigger than uint(32)!
// cull the sub buffer for this given base prime...
var s = strtsp[i]: int; if s < 0 then break;
var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
const slmt = s + bpi * 8 - 1;
while s <= slmt {
const bmi = s & 7; const msk = bitmsk[bmi];
var c = s >> 3; const clmt = sbtlmt >> 3;
while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
}
strtsp[i] = nxt: int(32); i += 1;
}
else { while s <= sbtlmt { // standard cull loop...
cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
strtsp[i] = s: int(32); i += 1; }
}
}
}
}

// a generic record that contains a page result generating function;
// allows manual iteration through the use of the next() method;
class PagedResults {
const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
var lwi: PrimeNdx; var bsbtsz: int;
var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
var qi: int = 0;
var wrkq$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
var rsltsq$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;

proc init(cvclsr, li: PrimeNdx, bsz: int) {
cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }

proc deinit() { // kill the thread pool when out of scope...
if bpas == nil then return; // no thread pool!
for i in wrkq$.domain {
wrkq$[i].writeEF(-1); while true { const r = rsltsq$[i].readFE();
if r == nil then break; }
}
}

proc next(): cnvrtrclsr(lwi, sbs(0)).type {
proc dowrk(ri: int) { // used internally!...
while true {
const li = wrkq$[ri].readFE(); // wait for the next page start index
if li < 0 { rsltsq$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
sbs[ri].fill(li);
sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
rsltsq$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
}
}
if this.bpas == nil { // init on first use; avoids data race!
this.bpas = new BasePrimeArrs();
if this.bsbtsz < CPUL1CACHE {
this.sbs = new SieveBuffer(bsbtsz);
this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
}
else {
this.sbs = new SieveBuffer(CPUL2CACHE);
this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
}
// start threadpool and give it initial work...
for i in rsltsq$.domain {
begin with (const in i) dowrk(i);
this.wrkq$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
}
}
const rslt = this.rsltsq$[qi].readFE(); // take the oldest pending result
this.wrkq$[qi].writeEF(this.lwi);
this.lwi += this.sbs[qi].cmpsts.size * 8;
this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
return rslt;
}

iter these() { while lwi >= 0 do yield next(); }
}

// the sieve buffer to base prime array converter closure...
record SB2BPArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
const bsprm = (lwi + lwi + 3): BasePrime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
i += 1; }
return arr;
}
}

// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
var head: shared BasePrimeArr;
var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var lock$: sync bool = true;
var feed: shared PagedResults(SB2BPArr) =
new shared PagedResults(new SB2BPArr(), 65536, 65536);

proc init() { // make our own first array to break data race!
var sb = new SieveBuffer(256); sb.fill(0);
const sb2 = new SB2BPArr();
head = sb2(0, sb): shared BasePrimeArr;
this.complete(); // fake base primes!
sb = new SieveBuffer(65536); sb.fill(0);
// use (completed) self as source of base primes!
var strts = new StrtsArr({ 0 .. 256 });
sb.cull(0, 65536, this, strts);
// replace head with new larger version culled using fake base primes!...
head = sb2(0, sb): shared BasePrimeArr;
}

// for initializing for use by the fillWHLPTRN proc...
proc init(hd: shared BasePrimeArr) {
head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
}

// for initializing lazily extended list as required...
proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }

proc next(): shared BasePrimeArrs {
if this.tail == nil { // in case other thread slipped through!
if this.lock$.readFE() && this.tail == nil { // empty sync -> block others!
const nhd = this.feed.next(): shared BasePrimeArr;
this.tail = new shared BasePrimeArrs(nhd , this.feed);
}
this.lock$.writeEF(false); // fill the sync so other threads can do nothing!
}
return this.tail: shared BasePrimeArrs; // necessary cast!
}

iter these(): BasePrime {
for bp in head.bparr do yield bp; var cur = next();
while true {
for bp in cur.head.bparr do yield bp; cur = cur.next(); }
}
}

record SB2PrmArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
const bsprm = (lwi + lwi + 3): Prime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
i += 1; }
return arr;
}
}

iter primes(): Prime {
for p in WHLPRMS do yield p: Prime;
for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
for p in pa!.prmarr do yield p;
}

// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }

// a class that emulates a closure and a return value...
record SB2Cnt {
const nxtlmt: PrimeNdx;
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
return new shared CntNxt(sb.count(btlmt), lstndx);
}
}

// count primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
for p in WHLPRMS { if p > lmt then break; count += 1; }
if lmt < FRSTSVPRM then return count;
for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
}
return count;
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");

write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");
writeln("using ", NUMTHRDS, " threads.");

var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();

write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does use some functional concepts, namely a memoized lazy list of base prime arrays; because this list is only used at the page level, its slowish performance has little impact on the overall execution time, and it makes the code much more elegant since new pages of base primes are computed only as they are required for an increasing range.

One of the trickiest parts of using thread pools is stopping and de-initializing them when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized in two ways. First, "loop unpeeling" is used for the smaller base primes, which reduces the culling to simple masking by a constant, with eight separate loops for the pattern that repeats over bytes. Second, culling is done in sub-buffers the size of the CPU L1 cache within an outer sieve buffer the size of the CPU L2 cache; this makes the per-task work chunks larger, which reduces task context switching overhead and reduces the time lost to culling start address calculations per base prime (these need an integer division, which is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits, or about 1e12 for the one Megabit CPU L2 cache size that many mid-range Intel CPU's currently have when used for multi-threading (half of the actual size for Hyper-Threaded (HT) threads, as each pair of HT threads per core shares both the L1 and the L2 caches).

Although this code can be used for much higher sieving ranges, that is not recommended as it has not yet been tuned for efficiency above 1e12. There are no checks limiting the user to this range; as well as decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows, but only for huge sieving ranges that would take days, weeks, months, or years to execute on common desktop CPU's.

A further optimization is to create a pre-culled `WHLPTRN` `SieveBuffer` in which the odd primes (since we cull odds-only) of 3, 5, 7, 11, 13, and 17 have already been culled, and to use that to pre-fill the page segment buffers so that no culling by these base prime values is required. This reduces the number of culling operations by about 45% compared to not doing it, but the performance gain is only about 34.5%, because it changes the ratio of (fast) smaller base primes to (slower) larger ones.

All of the improvements to this point allow the performance shown in the displayed output for the above program. Using command line arguments of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion, 1e11; this computer has caches of those sizes and no Hyper-Threading), it can count the primes to 1e11 in about 17.5 seconds on the above mentioned CPU. It will be over two times faster than this on a more modern desktop CPU such as the Intel Core i7-9700K, which has twice as many effective cores, a higher CPU clock rate, and is about 10% to 15% faster due to a more modern CPU architecture that is three generations newer. Of course, a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again, except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core count CPU's will only give these speed gains for large sieving ranges such as 1e11 and above, since otherwise there aren't enough work chunks to go around for all the threads available!

Incredibly, even run single threaded (argument of `--NUMTHRDS=1`), this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion, and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`), with both using the same CPU L1 cache buffer size of 16 Kilobytes. This implementation does not yet have the level of wheel optimization of the Sieve of Atkin, as it has only the limited wheel optimization of odds-only plus the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges and letting it approach the speed of Kim Walisch's "primesieve". However, as Chapel does not have primitive element pointers and pointer operations, some of the extreme loop unrolling optimizations that "primesieve" uses are not available, so this code can never quite reach the speed of "primesieve", falling short by about 20% to 30%.

The above code is a fairly formidable benchmark, which I have also written in Fortran, as that is likely the major computer language most comparable to Chapel. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expressing this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-ons or just being the ability to call through to OpenMP/MPI.

That said, it also has the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but when so much about the forms of expression has been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes are no longer breaking changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same; for this particular task they are about the same if one were to implement the same algorithmic optimizations, other than that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel, as Fortran has some limited form of pointers, although not the full set of pointer operators that C/C++ like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compiler and due to the availability of (limited) pointer operations.

The primary additional optimization available to the Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer; the other major improvement would be to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.

Clojure

(defn primes< [n]
(remove (set (mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))

The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with (println (time (count (primes< 1000000)))), it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.
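To make the point about extra work concrete, the following is a minimal sketch (in Python rather than Clojure, purely for illustration; the function name `culling_work` is hypothetical and not part of this entry) that counts the culling operations done when every base number up to the square root is used, as in the code above, versus only the base primes, as in a true Sieve of Eratosthenes:

```python
from math import isqrt

def culling_work(n):
    """Return (ops when culling by every base, ops when culling by primes only)
    for candidates below n, with each base b culling from b*b in steps of b."""
    all_ops = 0
    prime_ops = 0
    composite = [False] * n
    for b in range(2, isqrt(n - 1) + 1):
        ops = len(range(b * b, n, b))      # multiples of b that get visited
        all_ops += ops
        if not composite[b]:               # b is prime: a true sieve culls here
            prime_ops += ops
            for m in range(b * b, n, b):
                composite[m] = True
    return all_ops, prime_ops

print(culling_work(100))
```

For n = 100 this gives 142 operations when culling by all bases against 102 when culling by the base primes only.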

The code may run slightly faster if (into #{} ...) is used in place of the set function: the set function is concerned with only distinct elements, whereas into #{} only does the conjunction (and starts from an empty set), as follows:

(defn primes< [n]
(remove (into #{}
(mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses into #{} but has been slightly wheel-factorized to sieve odds-only:

(defn primes< [n]
(if (< n 2) ()
(cons 2 (remove (into #{}
(mapcat #(range (* % %) n %)
(range 3 (Math/sqrt n) 2)))
(range 3 n 2)))))

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.

The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:

(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
cmpsts (boolean-array (inc n)),
cullp (fn [p]
(loop [i (* p p)]
(if (<= i n)
(do (aset cmpsts i true)
(recur (+ i p))))))]
(do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
(range 2 (inc root)))))
(filter #(not (aget cmpsts %)) (range 2 (inc n))))))

Alternative implementation using Clojure's side-effect oriented list comprehension.

(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
[lim]
(let [refs (boolean-array (+ lim 1) true)
root (int (Math/sqrt lim))]
(do (doseq [i (range 2 lim)
:while (<= i root)
:when (aget refs i)]
(doseq [j (range (* i i) lim i)]
(aset refs j false)))
(filter #(aget refs %) (range 2 lim)))))

Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.

(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
[lim]
(let [max-i (quot lim 2) ; indices 1 .. max-i - 1 stand for the odd numbers 3 .. below lim
refs (boolean-array max-i true)
root (/ (dec (int (Math/sqrt lim))) 2)]
(do (doseq [i (range 1 (inc root))
:when (aget refs i)]
(doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
(aset refs j false)))
(cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))

This implementation is about twice as fast as the previous one and uses only half the memory. From the index i of the array, it calculates the value it represents as (2*i + 1); the step between two indices that represent the multiples of primes to mark as composite is also (2*i + 1). The index of the square of the prime, where composite marking starts, is 2*i*(i+1).
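The index arithmetic described above can be checked with a small sketch (written in Python for illustration only; the function name `primes_below` is hypothetical, and the array length of lim // 2 is an assumption chosen so that every odd number below lim has an index):

```python
def primes_below(lim):
    """Return the primes less than lim using an odds-only sieve.

    Index i of the flag array stands for the odd number 2*i + 1.
    """
    if lim <= 2:
        return []
    max_i = lim // 2                      # indices 1 .. max_i - 1 cover 3 .. < lim
    is_prime = [True] * max_i
    i = 1
    while 2 * i * (i + 1) < max_i:        # 2*i*(i+1) is the index of p*p, p = 2*i + 1
        if is_prime[i]:
            p = 2 * i + 1
            # an index step of p corresponds to a value step of 2*p
            for j in range(2 * i * (i + 1), max_i, p):
                is_prime[j] = False
        i += 1
    return [2] + [2 * i + 1 for i in range(1, max_i) if is_prime[i]]

print(primes_below(30))
```

For p = 2*i + 1, the square p*p = 4*i*i + 4*i + 1 is the odd number at index (p*p - 1) / 2 = 2*i*(i+1), which is where the start index comes from.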

Alternative very slow entirely functional implementation using lazy sequences

(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(letfn [(nxtprm [cs] ; current candidates
(let [p (first cs)]
(if (> p (Math/sqrt n)) cs
(cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
set (remove cs) rest)))))))]
(nxtprm (range 2 (inc n)))))

The above code is so slow for three reasons: it has a high constant factor overhead due to using a (hash) set to remove the composites from the remaining candidates; each removal of a prime's composites requires a scan across all remaining candidates (whereas with an array or vector only the culled values are referenced); and Clojure sequence operations are slow compared to iterator/sequence operations in other languages.

Version based on immutable Vector's

Here is a version based on an immutable boolean vector; it is non-lazy except for the lazy sequence operations used to output the result:

(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[max-prime]
(let [sieve (fn [s n]
(if (<= (* n n) max-prime)
(recur (if (s n)
(reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))
s)
(inc n))
s))]
(->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))
(assoc 0 false)
(assoc 1 false)
(sieve 2))
(map-indexed #(vector %2 %1)) (filter first) (map second))))

The above code is still quite slow due to the cost of the immutable copy-on-modify operations.

Odds only bit packed mutable array based version

The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:

(set! *unchecked-math* true)

(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
rootndx (long (/ (- root 3) 2)),
ndx (long (/ (- n 3) 2)),
cmpsts (long-array (inc (/ ndx 64))),
isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))
(bit-shift-left 1 (bit-and % 63)))),
cullp (fn [i]
(let [p (long (+ i i 3))]
(loop [i (bit-shift-right (- (* p p) 3) 1)]
(if (<= i ndx)
(do (let [w (bit-shift-right i 6)]
(aset cmpsts w (bit-or (aget cmpsts w)
(bit-shift-left 1 (bit-and i 63)))))
(recur (+ i p))))))),
cull (fn [] (loop [i 0] (if (<= i rootndx)
(do (if (isprm i) (cullp i)) (recur (inc i))))))]
(letfn [(nxtprm [i] (if (<= i ndx)
(cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]
(if (or (> i ndx) (isprm i)) i
(recur (inc i)))))))))]
(if (< n 2) nil
(cons 2 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))

The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization, except that the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.
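The bit-index arithmetic of the odds-only bit packed sieve can be sketched as follows (in Python for illustration only; `primes_to_bitpacked` is a hypothetical name, and a list of integers stands in for the long array). Bit i stands for the odd number 2*i + 3, it lives in word i >> 6 at bit position i & 63, and a set bit marks a composite:

```python
def primes_to_bitpacked(n):
    """Return the primes <= n; bit i of the packed array stands for 2*i + 3."""
    if n < 2:
        return []
    if n < 3:
        return [2]
    ndx = (n - 3) // 2                      # index of the largest odd candidate
    words = [0] * ((ndx >> 6) + 1)          # 64 candidates per word; 0 bit = prime

    def is_prime(i):
        return (words[i >> 6] & (1 << (i & 63))) == 0

    i = 0
    while True:
        p = 2 * i + 3
        start = (p * p - 3) >> 1            # index of p*p
        if start > ndx:
            break
        if is_prime(i):
            for j in range(start, ndx + 1, p):   # index step p == value step 2*p
                words[j >> 6] |= 1 << (j & 63)
        i += 1
    return [2] + [2 * i + 3 for i in range(ndx + 1) if is_prime(i)]

print(primes_to_bitpacked(30))
```

This sketch materializes the whole result list at once, so it says nothing about the cost of lazy enumeration discussed above; it only shows the culling and indexing.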

If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead as compared to the actual composite number culling time; it would get even higher if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times (most important in the culling operations), and yet higher again if multi-processing were used to share the page segment processing across CPU cores.

The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed some computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):

(defn primes-tox
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
rootndx (long (/ (- root 3) 2)),
ndx (max (long (/ (- n 3) 2)) 0),
lmt (quot ndx 64),
cmpsts (long-array (inc lmt)),
cullp (fn [i]
(let [p (long (+ i i 3))]
(loop [i (bit-shift-right (- (* p p) 3) 1)]
(if (<= i ndx)
(do (let [w (bit-shift-right i 6)]
(aset cmpsts w (bit-or (aget cmpsts w)
(bit-shift-left 1 (bit-and i 63)))))
(recur (+ i p))))))),
cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)
(bit-shift-left -2 (bit-and ndx 63))))
(loop [i 0]
(when (<= i rootndx)
(when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))
(bit-shift-left 1 (bit-and i 63))))
(cullp i))
(recur (inc i))))))
numprms (fn []
(let [w (dec (alength cmpsts))] ;; fast results count bit counter
(loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]
(if (> i w) cnt
(recur (inc i)
(- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]
(if (< n 2) nil
(cons 2 (if (< n 3) nil
(do (cull)
(deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???
clojure.lang.ISeq
(first [_] (if (nil? cmpsa) nil (+ i i 3)))
(next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil
(OPSeq.
(loop [j (inc i)]
(let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))
(bit-shift-left 1 (bit-and j 63))))]
(if p? j (recur (inc j)))))
cmpsa ncnt tcnt))))
(more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)
(.next this))))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))
(equiv [this o] (if (or (not= (type this) (type o))
(not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))
(not= i (.i ^OPSeq o))) false true))
clojure.lang.Counted
(count [_] (- tcnt cnt))
clojure.lang.Seqable
(clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))
clojure.lang.IReduce
(reduce [_ f v] (let [c (- tcnt cnt)]
(if (<= c 0) nil
(loop [ci i, n c, rslt v]
(if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))
(bit-shift-left 1 (bit-and ci 63))))
(let [rrslt (f rslt (+ ci ci 3)),
rdcd (reduced? rrslt),
nrslt (if rdcd @rrslt rrslt)]
(if (or (<= n 1) rdcd) nrslt
(recur (inc ci) (dec n) nrslt)))
(recur (inc ci) n rslt))))))
(reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)
(.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))
clojure.lang.Sequential
Object
(toString [this] (if (= cnt tcnt) "()"
(.toString (seq (map identity this))))))
(->OPSeq 0 cmpsts 0 (numprms))))))))

'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.

Due to the better efficiency of the custom OPSeq type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.

Under Clojure 1.7.0, one can use '(time (reduce (fn [sum v] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total with about half the time used for sieving the array and half for computing the sum.

To show how sensitive Clojure is to forms of expression of functions, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long, even though it is using the same internal routine for most of the calculation, due to the function not having the type coercions.

Before one considers this code suitable for larger ranges, note that it still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (which produces about a four times speed up), maximal wheel factorization (to make it about another four times faster), and the use of multi-processing (for a further gain of about four times on a multi-core desktop CPU such as an Intel i7); together these will make the sieving/counting code about 50 times faster than this, although there will only be a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.
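As a rough illustration of the first of those improvements, page segmentation, here is a minimal sketch (in Python for illustration only; it is not tuned, uses no wheel factorization or threads, and the name `count_primes_segmented` and the default page size are assumptions). Base primes up to the square root are found with a small plain sieve, then each fixed-size page is culled on its own, starting each base prime at its first multiple inside the page:

```python
from math import isqrt

def count_primes_segmented(limit, page_size=32768):
    """Count the primes <= limit, sieving one fixed-size page at a time."""
    if limit < 2:
        return 0
    root = isqrt(limit)
    # Plain (unsegmented) sieve for the base primes up to sqrt(limit).
    small = bytearray([1]) * (root + 1)
    small[0] = small[1] = 0
    for p in range(2, isqrt(root) + 1):
        if small[p]:
            for m in range(p * p, root + 1, p):
                small[m] = 0
    base_primes = [p for p in range(2, root + 1) if small[p]]

    count = 0
    for low in range(2, limit + 1, page_size):
        high = min(low + page_size - 1, limit)
        page = bytearray([1]) * (high - low + 1)   # page[k] stands for low + k
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside the page, but never below p*p;
            # this start calculation needs one integer division per base prime.
            start = max(p * p, (low + p - 1) // p * p)
            for m in range(start, high + 1, p):
                page[m - low] = 0
        count += sum(page)
    return count

print(count_primes_segmented(1000000))
```

Because each page is culled independently once the base primes are known, pages are also the natural unit of work to hand out to separate cores.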

Unbounded Versions

For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.

The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:

A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)

(defn primes-Bird
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(cons c (lazy-seq (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
(union [xs ys] (let [xv (first xs), yv (first ys)]
(if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
(if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
(cons xv (lazy-seq (union (next xs) (next ys)))))))),
(mrgmltpls [mltplss] (cons (first (first mltplss))
(lazy-seq (union (next (first mltplss))
(mrgmltpls (next mltplss)))))),
(minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (first cmpsts))
(cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) (next cmpsts)))))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))

The above code is quite slow, both because the data structure is a linear merging of prime multiples and because of the slowness of the Clojure sequence operations.

A Clojure version of the tree folding sieve using Lazy Sequences

The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:

(defn primes-treeFolding
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(cons c (lazy-seq (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
(union [xs ys] (let [xv (first xs), yv (first ys)]
(if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
(if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
(cons xv (lazy-seq (union (next xs) (next ys)))))))),
(pairs [mltplss] (let [tl (next mltplss)]
(cons (union (first mltplss) (first tl))
(lazy-seq (pairs (next tl)))))),
(mrgmltpls [mltplss] (cons (first (first mltplss))
(lazy-seq (union (next (first mltplss))
(mrgmltpls (pairs (next mltplss))))))),
(minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (first cmpsts))
(cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) (next cmpsts)))))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))

The above code is still slower than it should be due to the slowness of Clojure's sequence operations.

A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence

The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:

(deftype CIS [v cont]
clojure.lang.ISeq
(first [_] v)
(next [_] (if (nil? cont) nil (cont)))
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (or (not= (type cis1) (type cis2))
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(not (nil? (.cont ^CIS cis2))))
(and (nil? (.cont ^CIS cis2))
(not (nil? (.cont cis1))))) false
(if (nil? (.cont cis1)) true
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(recur ((.cont cis)) (inc cnt)))))
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
clojure.lang.Sequential
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))

(defn primes-treeFoldingx
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(->CIS c (fn [] (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
(union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
(if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
(if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
(->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
(pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
(->CIS (union (.v mltplss) (.v tl))
(fn [] (pairs ((.cont tl))))))),
(mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
(fn [] (union ((.cont ^CIS (.v mltplss)))
(mrgmltpls (pairs ((.cont mltplss)))))))),
(minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (.v cmpsts))
(->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) ((.cont cmpsts))))))]
(do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(->CIS 2 (fn [] oddprms)))))

'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.

The above code is useful for ranges up to about fifteen million, which covers about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit packed version, which can reasonably be used for ranges about 100 times as large.

Incremental Hash Map based unbounded "odds-only" version

The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:

(defn primes-hashmap
"Infinite sequence of primes using an incremental Sieve of Eratosthenes with a Hashmap"
[]
(letfn [(nxtoddprm [c q bsprms cmpsts]
(if (>= c q) ;; only ever equal
(let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
(recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))
(if (contains? cmpsts c)
(recur (+ c 2) q bsprms
(let [adv (cmpsts c), ncmps (dissoc cmpsts c)]
(assoc ncmps
(loop [try (+ c adv)] ;; ensure map entry is unique
(if (contains? ncmps try)
(recur (+ try adv))
try))
adv)))
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))

The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation even though it has computational complexity of (n log log n) rather than the worse (n log n log log n) for the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.

There is almost no benefit of converting the above code to use a CIS as most of the time is expended in the hash map functions.
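The structure of the incremental Hash Map sieve may be easier to follow in a compact sketch (in Python for illustration only; `primes_hashmap` here is a hypothetical generator, not a translation of the Clojure code line for line). A dict maps each pending composite to the step (twice its base prime), a secondary instance of the same generator supplies the base primes, and a base prime only enters the dict when the candidate reaches its square. As a cautious assumption, this sketch probes for a free key for newly added primes as well as for advanced ones:

```python
from itertools import islice

def primes_hashmap():
    """Unbounded odds-only incremental sieve using a dict of pending composites."""
    yield 2
    yield 3
    base = primes_hashmap()        # secondary base primes stream
    next(base)                     # skip 2, since only odd numbers are sieved
    p = next(base)                 # current base prime (3)
    q = p * p                      # its square: when p must start culling
    composites = {}                # next composite -> step (2 * base prime)
    c = 5
    while True:
        if c == q:                 # reached the square of the base prime
            step = 2 * p
            p = next(base)
            q = p * p
        elif c in composites:      # c is a known composite: advance its entry
            step = composites.pop(c)
        else:                      # c is prime
            yield c
            c += 2
            continue
        nxt = c + step
        while nxt in composites:   # ensure the map entry is unique
            nxt += step
        composites[nxt] = step
        c += 2

print(list(islice(primes_hashmap(), 10)))
```

The dict never holds more than one entry per base prime up to the square root of the current candidate, which is the storage saving described above.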

Incremental Priority Queue based unbounded "odds-only" version

In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap algorithm. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions of (empty-pq) for initialization, (getMin-pq pq) to examine the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replace the minimum entry with the given key/value pair (this is more efficient than separate functions to delete and then re-insert entries in the queue); no "delete" or other queue functions are supplied as the algorithm does not require them:

(deftype PQEntry [k, v]
Object
(toString [_] (str "<" k "," v ">")))
(deftype PQNode [ntry, lft, rght]
Object
(toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))

(defn empty-pq [] nil)

(defn getMin-pq [^PQNode pq]
(if (nil? pq)
nil
(.ntry pq)))

(defn insert-pq [^PQNode opq ok v]
(loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
(if (nil? pq)
(cont (->PQNode kv nil nil))
(let [k (.k kv),
^PQEntry kvn (.ntry pq), kn (.k kvn),
l (.lft pq), r (.rght pq)]
(if (<= k kn)
(recur kvn r #(cont (->PQNode kv % l)))
(recur kv r #(cont (->PQNode kvn % l))))))))

(defn replaceMinAs-pq [^PQNode opq k v]
(let [^PQEntry kv (->PQEntry k v)]
(if (nil? opq) ;; if was empty or just an entry, just use current entry
(->PQNode kv nil nil)
(loop [pq opq, cont identity]
(let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
(cond ;; if left is empty, right must be too
(nil? l)
(cont (->PQNode kv nil nil)),
(nil? r) ;; we only have a left...
(let [^PQEntry kvl (.ntry l), kl (.k kvl)]
(if (<= k kl)
(cont (->PQNode kv l nil))
(recur l #(cont (->PQNode kvl % nil))))),
:else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
(if (and (<= k kl) (<= k kr))
(cont (->PQNode kv l r))
(if (<= kl kr)
(recur l #(cont (->PQNode kvl % r)))
(recur r #(cont (->PQNode kvr l %))))))))))))

Note that the above code is written partially using continuation passing style so as to leave the "recur" calls in tail call position as required for efficient looping in Clojure; for practical sieving ranges, the algorithm could likely use just raw function recursion, as the recursion depth is unlikely to go beyond about ten or so, but raw recursion is said to be less code efficient.

The actual incremental sieve using the Priority Queue is as follows; it uses the same optimizations as the Hash Map version: postponing the addition of prime composite streams to the queue until the square root of the currently sieved number is reached, and using a secondary base primes stream to generate the prime composite stream markers in the queue:

(defn primes-pq
"Infinite sequence of primes using an incremental Sieve of Eratosthenes with a Priority Queue"
[]
(letfn [(nxtoddprm [c q bsprms cmpsts]
(if (>= c q) ;; only ever equal
(let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
(recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))
(let [mn (getMin-pq cmpsts)]
(if (and mn (>= c (.k mn))) ;; never greater than
(recur (+ c 2) q bsprms
(loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value
(let [ncmps (replaceMinAs-pq cmps (+ c adv) adv),
nmn (getMin-pq ncmps)]
(if (and nmn (>= c (.k nmn)))
(recur (.v nmn) ncmps)
ncmps))))
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))

The above code is faster than the Hash Map version up to a sieving range of about fifteen million, but gets progressively slower for larger ranges because it has O(n log n log log n) computational complexity rather than the O(n log log n) of the Hash Map version; the latter's higher constant factor overhead is eventually outweighed by the extra "log n" factor.

It is slower than the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).

Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.

All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the "odds-only" versions.

It is difficult if not impossible to apply efficient multi-processing to the Bird and Tree Folding versions of the unbounded sieves above, as the next values of the primes sequence depend on previous changes of state. For the latter two algorithms (Hash Map and Priority Queue), it becomes possible with the addition of functions that "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to a given page start state. Even so, the benefit is less than with mutable arrays, because the results must be enumerated into a data structure of some sort in order to be passed out of the page function, whereas they can be enumerated directly from the array in the mutable array versions.
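
To summarize the Priority Queue algorithm of this section in compact form, here is a minimal Python sketch of the same postponed incremental sieve. It is an illustration only and makes some assumptions: it uses the standard `heapq` module (a mutable array heap, unlike the persistent heap above), and the name `primes_pq` is simply borrowed from the Clojure version.

```python
import heapq
from itertools import islice

def primes_pq():
    """Unbounded odds-only incremental sieve driven by a priority queue."""
    yield 2
    yield 3
    base = primes_pq()     # secondary stream supplying the base primes
    next(base)             # skip 2; only odd candidates are examined
    p = next(base)         # current base prime (3)
    q = p * p              # its square; p only joins the queue once c reaches q
    heap = []              # entries are (next composite, step) with step = 2 * prime
    c = 5                  # current odd candidate
    while True:
        if c == q:
            # the candidate is the square of the base prime: start culling with it
            heapq.heappush(heap, (q + 2 * p, 2 * p))
            p = next(base)
            q = p * p
        elif heap and heap[0][0] == c:
            # composite: advance every queue entry that sits on this value
            while heap[0][0] == c:
                k, step = heap[0]
                heapq.heapreplace(heap, (k + step, step))
        else:
            yield c
        c += 2

print(list(islice(primes_pq(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

`heapq.heapreplace` plays the role of `replaceMinAs-pq`: it swaps out the minimum entry in one step instead of a separate delete and insert.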

Bit packed page segmented array unbounded "odds-only" version

To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:

(set! *unchecked-math* true)

(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGBTS (bit-shift-left PGSZ 3))
(def PGWRDS (bit-shift-right PGBTS 5))
(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes
(def BPBTS (bit-shift-left BPWRDS 5))
(defn- count-pg
"count primes in the culled page buffer, with test for limit"
[lmt ^ints pg]
(let [pgsz (alength pg),
pgbts (bit-shift-left pgsz 5),
cntem (fn [lmtw]
(let [lmtw (long lmtw)]
(loop [i (long 0), c (long 0)]
(if (>= i lmtw) (- (bit-shift-left lmtw 5) c)
(recur (inc i)
(+ c (java.lang.Integer/bitCount (aget pg i))))))))]
(if (< lmt pgbts)
(let [lmtw (bit-shift-right lmt 5),
lmtb (bit-and lmt 31)
msk (bit-shift-left -2 lmtb)]
(+ (cntem lmtw)
(- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)
msk)))))
(- pgbts
(areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))))
;; (cntem pgsz))))
(defn- primes-pages
"unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."
[]
(letfn [(make-pg [lowi pgsz bpgs]
(let [lowi (long lowi),
pgbts (long (bit-shift-left pgsz 5)),
pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),
^ints pg (int-array pgsz),
cull (fn [bpgs']
(loop [i (long 0), bpgs' bpgs']
(let [^ints fbpg (first bpgs'),
bpgsz (long (alength fbpg))]
(if (>= i bpgsz)
(recur 0 (next bpgs'))
(let [p (long (aget fbpg i)),
sqr (long (* p p))]
(if (< sqr pgrng) (do
(loop [j (long (let [s (long (bit-shift-right (- sqr 3) 1))]
(if (>= s lowi) (- s lowi)
(let [m (long (rem (- lowi s) p))]
(if (zero? m)
0
(- p m))))))]
(if (< j pgbts) ;; fast inner culling loop where most time is spent
(do
(let [w (bit-shift-right j 5)]
(aset pg w (int (bit-or (aget pg w)
(bit-shift-left 1 (bit-and j 31))))))
(recur (+ j p)))))
(recur (inc i) bpgs'))))))))]
(do (if (nil? bpgs)
(letfn [(mkbpps [i]
(if (zero? (bit-and (aget pg (bit-shift-right i 5))
(bit-shift-left 1 (bit-and i 31))))
(cons (int-array 1 (+ i i 3)) (lazy-seq (mkbpps (inc i))))
(recur (inc i))))]
(cull (mkbpps 0)))
(cull bpgs))
pg))),
(page-seq [lowi pgsz bps]
(letfn [(next-seq [lwi]
(cons (make-pg lwi pgsz bps)
(lazy-seq (next-seq (+ lwi (bit-shift-left pgsz 5))))))]
(next-seq lowi)))
(pgs->bppgs [ppgs]
(letfn [(nxt-pg [lowi pgs]
(let [^ints pg (first pgs),
cnt (count-pg BPBTS pg),
npg (int-array cnt)]
(do (loop [i 0, j 0]
(if (< i BPBTS)
(if (zero? (bit-and (aget pg (bit-shift-right i 5))
(bit-shift-left 1 (bit-and i 31))))
(do (aset npg j (+ (bit-shift-left (+ lowi i) 1) 3))
(recur (inc i) (inc j)))
(recur (inc i) j))))
(cons npg (lazy-seq (nxt-pg (+ lowi BPBTS) (next pgs)))))))]
(nxt-pg 0 ppgs))),
(make-base-prms-pgs []
(pgs->bppgs (cons (make-pg 0 BPWRDS nil)
(lazy-seq (page-seq BPBTS BPWRDS (make-base-prms-pgs))))))]
(page-seq 0 PGWRDS (make-base-prms-pgs))))
(defn primes-paged
"unbounded Sieve of Eratosthenes producing a lazy sequence of primes"
[]
(do (deftype CIS [v cont]
clojure.lang.ISeq
(first [_] v)
(next [_] (if (nil? cont) nil (cont)))
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (or (not= (type cis1) (type cis2))
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(not (nil? (.cont ^CIS cis2))))
(and (nil? (.cont ^CIS cis2))
(not (nil? (.cont cis1))))) false
(if (nil? (.cont cis1)) true
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(recur ((.cont cis)) (inc cnt)))))
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
clojure.lang.Sequential
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))
(letfn [(next-prm [lowi i pgseq]
(let [lowi (long lowi),
i (long i),
^ints pg (first pgseq),
pgsz (long (alength pg)),
pgbts (long (bit-shift-left pgsz 5)),
ni (long (loop [j (long i)]
(if (or (>= j pgbts)
(zero? (bit-and (aget pg (bit-shift-right j 5))
(bit-shift-left 1 (bit-and j 31)))))
j
(recur (inc j)))))]
(if (>= ni pgbts)
(recur (+ lowi pgbts) 0 (next pgseq))
(->CIS (+ (bit-shift-left (+ lowi ni) 1) 3)
(fn [] (next-prm lowi (inc ni) pgseq))))))]
(->CIS 2 (fn [] (next-prm 0 0 (primes-pages)))))))
(defn primes-paged-count-to
"counts primes generated by page segments by Sieve of Eratosthenes to the top limit"
[top]
(cond (< top 2) 0
(< top 3) 1
:else (letfn [(nxt-pg [lowi pgseq cnt]
(let [topi (bit-shift-right (- top 3) 1)
nxti (+ lowi PGBTS),
pg (first pgseq)]
(if (> nxti topi)
(+ cnt (count-pg (- topi lowi) pg))
(recur nxti
(next pgseq)
(+ cnt (count-pg PGBTS pg))))))]
(nxt-pg 0 (primes-pages) 1))))

The above code runs just as fast as other virtual machine languages when run on a 64-bit JVM; however, when run on a 32-bit JVM it runs almost five times slower. This is likely due to Clojure only using 64-bit integers for integer operations: under a 32-bit JVM these operations get JIT compiled to library functions that simulate them using combined 32-bit operations, whereas direct CPU operations can be used on a 64-bit JVM.

Clojure does one thing very slowly, as seen here: enumerating a sequence is extremely slow compared to using a more imperative iteration interface. It helps to use a roll-your-own ISeq implementation as here, which reduces the time to enumerate the primes from about four times as long as the composite culling operations for those primes to only about one and a half times as long, although one must then also write one's own sequence handling functions (one can't use "take-while" or "count", for instance) in order to enjoy that benefit. That is why the "primes-paged-count-to" function is provided: it counts the primes over a range in a negligible percentage of the time taken by the composite culling operations.

The practical range of the above sieve is about 16 million due to the fixed size of the page buffers; in order to extend the range, a larger page buffer could be used up to the size of the CPU L2 or L3 caches. If a 2^20 buffer were used (one Megabyte, as many modern desktop CPUs easily have in their L3 cache), then the range would be increased up to about 10^14 at a cost of about a factor of two or three in slower memory accesses per composite culling operation loop. The base primes culling page size is already adequate for this range. One could make the culling page size automatically expand with growing range by about the square root of the current prime range with not too many changes to the code.

As for many implementations of unbounded sieves, the base primes less than the square root of the current range are generated by a secondary generated stream of primes; in this case it is done recursively, so another secondary stream generates the base primes for the base primes and so on down to where the innermost generator has only one page in the stream; this only takes one or two recursions for this type of range.

The base primes culling page size is reduced from the page size for the main primes so that there is less overhead for smaller primes ranges; otherwise excess base primes are generated for fairly small sieve ranges.
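
The page-start arithmetic used by the culling code above (find the first multiple of each base prime that falls inside the current page) can be sketched in Python as follows. This is a simplified illustration under assumptions: the base primes come from one small up-front sieve rather than a recursive lazy stream, one byte is used per odd candidate instead of packed bits, and the names `odd_primes_upto` and `count_primes_segmented` are hypothetical.

```python
from math import isqrt

def odd_primes_upto(limit):
    """Odd primes <= limit from a plain (unsegmented) sieve; used as base primes."""
    is_comp = bytearray(limit + 1)
    for n in range(3, isqrt(limit) + 1, 2):
        if not is_comp[n]:
            for m in range(n * n, limit + 1, 2 * n):
                is_comp[m] = 1
    return [n for n in range(3, limit + 1, 2) if not is_comp[n]]

def count_primes_segmented(top, page_bits=1 << 14):
    """Count primes <= top, sieving odd candidates one page at a time.

    Index i stands for the odd number 2*i + 3, as in the Clojure code.
    """
    if top < 2:
        return 0
    if top < 3:
        return 1
    topi = (top - 3) // 2                 # index of the largest odd candidate
    base = odd_primes_upto(isqrt(top))
    count = 1                             # accounts for the prime 2
    lowi = 0
    while lowi <= topi:
        size = min(page_bits, topi - lowi + 1)
        page = bytearray(size)            # 0 = still a prime candidate
        for p in base:
            s = (p * p - 3) // 2          # index of p squared
            if s >= lowi + size:
                break                     # larger base primes cannot cull this page
            if s >= lowi:
                j = s - lowi              # culling starts inside this page
            else:
                m = (lowi - s) % p        # otherwise align to the next multiple
                j = 0 if m == 0 else p - m
            page[j:size:p] = b"\x01" * len(range(j, size, p))
        count += size - sum(page)
        lowi += size
    return count

print(count_primes_segmented(1_000_000))  # 78498
```

Only one page buffer is live at a time, so memory use depends on the page size and the number of base primes rather than on the sieved range.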

CLU

% Sieve of Eratosthenes
eratosthenes = proc (n: int) returns (array[bool])
prime: array[bool] := array[bool]\$fill(1, n, true)
prime[1] := false

for p: int in int\$from_to(2, n/2) do
if prime[p] then
for c: int in int\$from_to_by(p*p, n, p) do
prime[c] := false
end
end
end
return(prime)
end eratosthenes

% Print primes up to 1000 using the sieve
start_up = proc ()
po: stream := stream\$primary_output()
prime: array[bool] := eratosthenes(1000)
col: int := 0

for i: int in array[bool]\$indexes(prime) do
if prime[i] then
col := col + 1
stream\$putright(po, int\$unparse(i), 5)
if col = 10 then
col := 0
stream\$putc(po, '\n')
end
end
end
end start_up
Output:
2    3    5    7   11   13   17   19   23   29
31   37   41   43   47   53   59   61   67   71
73   79   83   89   97  101  103  107  109  113
127  131  137  139  149  151  157  163  167  173
179  181  191  193  197  199  211  223  227  229
233  239  241  251  257  263  269  271  277  281
283  293  307  311  313  317  331  337  347  349
353  359  367  373  379  383  389  397  401  409
419  421  431  433  439  443  449  457  461  463
467  479  487  491  499  503  509  521  523  541
547  557  563  569  571  577  587  593  599  601
607  613  617  619  631  641  643  647  653  659
661  673  677  683  691  701  709  719  727  733
739  743  751  757  761  769  773  787  797  809
811  821  823  827  829  839  853  857  859  863
877  881  883  887  907  911  919  929  937  941
947  953  967  971  977  983  991  997

CMake

function(eratosthenes var limit)
# Check for integer overflow. With CMake using 32-bit signed integer,
# this check fails when limit > 46340.
if(NOT limit EQUAL 0) # Avoid division by zero.
math(EXPR i "(\${limit} * \${limit}) / \${limit}")
if(NOT limit EQUAL \${i})
message(FATAL_ERROR "limit is too large, would cause integer overflow")
endif()
endif()

# Use local variables prime_2, prime_3, ..., prime_\${limit} as array.
# Initialize array to y => yes it is prime.
foreach(i RANGE 2 \${limit})
set(prime_\${i} y)
endforeach(i)

# Gather a list of prime numbers.
set(list)
foreach(i RANGE 2 \${limit})
if(prime_\${i})
# Append this prime to list.
list(APPEND list \${i})

# For each multiple of i, set n => no it is not prime.
# Optimization: start at i squared.
math(EXPR square "\${i} * \${i}")
if(NOT square GREATER \${limit}) # Avoid fatal error.
foreach(m RANGE \${square} \${limit} \${i})
set(prime_\${m} n)
endforeach(m)
endif()
endif(prime_\${i})
endforeach(i)
set(\${var} \${list} PARENT_SCOPE)
endfunction(eratosthenes)
# Print all prime numbers through 100.
eratosthenes(primes 100)
message(STATUS "\${primes}")

COBOL

*> Please ignore the asterisks in the first column of the next comments,
*> which are kludges to get syntax highlighting to work.
IDENTIFICATION DIVISION.
PROGRAM-ID. Sieve-Of-Eratosthenes.

DATA DIVISION.
WORKING-STORAGE SECTION.

01 Max-Number USAGE UNSIGNED-INT.
01 Max-Prime USAGE UNSIGNED-INT.

01 Num-Group.
03 Num-Table PIC X VALUE "P"
OCCURS 1 TO 10000000 TIMES DEPENDING ON Max-Number
INDEXED BY Num-Index.
88 Is-Prime VALUE "P" FALSE "N".

01 Current-Prime USAGE UNSIGNED-INT.

01 I USAGE UNSIGNED-INT.

PROCEDURE DIVISION.
DISPLAY "Enter the limit: " WITH NO ADVANCING
ACCEPT Max-Number
DIVIDE Max-Number BY 2 GIVING Max-Prime

* *> Set Is-Prime of all non-prime numbers to false.
SET Is-Prime (1) TO FALSE
PERFORM UNTIL Max-Prime < Current-Prime
* *> Set current-prime to next prime.
ADD 1 TO Current-Prime
PERFORM VARYING Num-Index FROM Current-Prime BY 1
UNTIL Is-Prime (Num-Index)
END-PERFORM
MOVE Num-Index TO Current-Prime

* *> Set Is-Prime of all multiples of current-prime to
* *> false, starting from current-prime squared.
COMPUTE Num-Index = Current-Prime ** 2
PERFORM UNTIL Max-Number < Num-Index
SET Is-Prime (Num-Index) TO FALSE
SET Num-Index UP BY Current-Prime
END-PERFORM
END-PERFORM

* *> Display the prime numbers.
PERFORM VARYING Num-Index FROM 1 BY 1
UNTIL Max-Number < Num-Index
IF Is-Prime (Num-Index)
DISPLAY Num-Index
END-IF
END-PERFORM

GOBACK
.

Comal

Translation of: BASIC
// Sieve of Eratosthenes
input "Limit? ": limit
dim sieve(1:limit)
sqrlimit:=sqr(limit)
sieve(1):=1
p:=2
while p<=sqrlimit do
while sieve(p) and p<sqrlimit do
p:=p+1
endwhile
if p>sqrlimit then goto done
for i:=p*p to limit step p do
sieve(i):=1
endfor i
p:=p+1
endwhile
done:
print 2,
for i:=3 to limit do
if sieve(i)=0 then
print ", ",i,
endif
endfor i
print
Output:
Limit? 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97

end

Common Lisp

(defun sieve-of-eratosthenes (maximum)
(loop
with sieve = (make-array (1+ maximum)
:element-type 'bit
:initial-element 0)
for candidate from 2 to maximum
when (zerop (bit sieve candidate))
collect candidate
and do (loop for composite from (expt candidate 2)
to maximum by candidate
do (setf (bit sieve composite) 1))))

Working with odds only (more than a twofold speedup), and marking composites only for primes up to the square root of the maximum:

(defun sieve-odds (maximum)
"Prime numbers sieve for odd numbers.
Returns a list with all the primes that are less than or equal to maximum."

(loop :with maxi = (ash (1- maximum) -1)
:with stop = (ash (isqrt maximum) -1)
:with sieve = (make-array (1+ maxi) :element-type 'bit :initial-element 0)
:for i :from 1 :to maxi
:for odd-number = (1+ (ash i 1))
:when (zerop (sbit sieve i))
:collect odd-number :into values
:when (<= i stop)
:do (loop :for j :from (* i (1+ i) 2) :to maxi :by odd-number
:do (setf (sbit sieve j) 1))
:finally (return (cons 2 values))))

The indexation scheme used here interprets each index i as standing for the value 2i+1. Bit 0 is unused, a small price to pay for the simpler index calculations compared with the 2i+3 indexation scheme. The multiples of a given odd prime p are enumerated in increments of 2p, which corresponds to the index increment of p on the sieve array. The starting point p*p = (2i+1)(2i+1) = 4i(i+1)+1 corresponds to the index 2i(i+1).

While formally a wheel, odds are uniformly spaced and do not require any special processing except for value translation. Wheels proper aren't uniformly spaced and are thus trickier.
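
The 2i+1 indexation described above can be checked with a short Python sketch. It mirrors the structure of `sieve-odds` but is only an illustration; the name `sieve_odds` and the use of a `bytearray` instead of a bit vector are assumptions made for brevity.

```python
from math import isqrt

def sieve_odds(maximum):
    """Primes <= maximum; index i of the sieve stands for the odd value 2*i + 1."""
    if maximum < 2:
        return []
    maxi = (maximum - 1) // 2          # largest index whose value is <= maximum
    stop = isqrt(maximum) // 2         # largest index whose square can still matter
    sieve = bytearray(maxi + 1)        # index 0 (the value 1) is never used
    primes = [2]
    for i in range(1, maxi + 1):
        if not sieve[i]:
            p = 2 * i + 1
            primes.append(p)
            if i <= stop:
                # p*p = 4i(i+1) + 1 lives at index 2i(i+1);
                # a value step of 2p is an index step of p
                for j in range(2 * i * (i + 1), maxi + 1, p):
                    sieve[j] = 1
    return primes

print(sieve_odds(50))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```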

Cowgol

include "cowgol.coh";

# To change the maximum prime, change the size of this array
# Everything else is automatically filled in at compile time
var sieve: uint8[5000];

# Make sure all elements of the sieve are set to zero
MemZero(&sieve as [uint8], @bytesof sieve);

# Generate the sieve
var prime: @indexof sieve := 2;
while prime < @sizeof sieve loop
if sieve[prime] == 0 then
var comp: @indexof sieve := prime * prime;
while comp < @sizeof sieve loop
sieve[comp] := 1;
comp := comp + prime;
end loop;
end if;
prime := prime + 1;
end loop;

# Print all primes
var cand: @indexof sieve := 2;
while cand < @sizeof sieve loop
if sieve[cand] == 0 then
print_i16(cand as uint16);
print_nl();
end if;
cand := cand + 1;
end loop;
Output:
2
3
5
7
11
...
4967
4969
4973
4987
4999

Crystal

Basic Version

This implementation uses a `BitArray` so it is automatically bit-packed to use just one bit per number representation:

# compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE
include Iterator(Prime)
@bits : BitArray; @bitndx : Int32 = 2

def initialize(range : Prime)
if range < 2
@bits = BitArray.new 0
else
@bits = BitArray.new((range + 1).to_i32)
end
ba = @bits; ndx = 2
while true
wi = ndx * ndx
break if wi >= ba.size
if ba[ndx]
ndx += 1; next
end
while wi < ba.size
ba[wi] = true; wi += ndx
end
ndx += 1
end
end

def next
while @bitndx < @bits.size
if @bits[@bitndx]
@bitndx += 1; next
end
rslt = @bitndx.to_u64; @bitndx += 1; return rslt
end
stop
end
end

print "Primes up to a hundred: "
SoE.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million: "
puts SoE.new(1_000_000).each.size
print "Number of primes to a billion: "
start_time = Time.monotonic
print SoE.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."
Output:
Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 10219.222539 milliseconds.

This is as run on an Intel SkyLake i5-6500 at 3.6 GHz (automatic boost for single threaded as here).

Odds-Only Version

The non-odds-only version above should never be used in practice: it uses twice the memory and over two and a half times the CPU operations of the following odds-only code, which is only slightly more complex:

# compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE_Odds
include Iterator(Prime)
@bits : BitArray; @bitndx : Int32 = -1

def initialize(range : Prime)
if range < 3
@bits = BitArray.new 0
else
@bits = BitArray.new(((range - 1) >> 1).to_i32)
end
ba = @bits; ndx = 0
while true
wi = (ndx + ndx) * (ndx + 3) + 3 # start cull index calculation
break if wi >= ba.size
if ba[ndx]
ndx += 1; next
end
bp = ndx + ndx + 3
while wi < ba.size
ba[wi] = true; wi += bp
end
ndx += 1
end
end

def next
while @bitndx < @bits.size
if @bitndx < 0
@bitndx += 1; return 2_u64
elsif @bits[@bitndx]
@bitndx += 1; next
end
rslt = (@bitndx + @bitndx + 3).to_u64; @bitndx += 1; return rslt
end
stop
end
end

print "Primes up to a hundred: "
SoE_Odds.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million: "
puts SoE_Odds.new(1_000_000).each.size
print "Number of primes to a billion: "
start_time = Time.monotonic
print SoE_Odds.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."
Output:
Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 4877.829642 milliseconds.

As can be seen, this is over two times faster than the non-odds-only version when run on the same CPU due to reduced pressure on the CPU data cache; however it is only reasonably performant for ranges of a few millions, and above that a page-segmented version of odds-only (or further wheel factorization) should be used plus other techniques for a further reduction of number of CPU clock cycles per culling/marking operation.
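
The "start cull index calculation" in the odds-only code above follows from the index mapping: index i stands for the odd number p = 2i + 3, so p squared sits at index (p*p - 3) / 2 = 2i(i + 3) + 3, and stepping the index by p steps the value by 2p. A small Python check of that arithmetic (the helper names are illustrative only):

```python
def start_cull_index(ndx):
    """Bit index of p*p for the base prime p = 2*ndx + 3."""
    return (ndx + ndx) * (ndx + 3) + 3

def culled_values(ndx, size):
    """Odd values struck out for index ndx in a bit array of the given size."""
    p = 2 * ndx + 3
    return [2 * wi + 3 for wi in range(start_cull_index(ndx), size, p)]

# the start index always maps back to the square of the base prime
for ndx in range(1000):
    p = 2 * ndx + 3
    assert 2 * start_cull_index(ndx) + 3 == p * p

print(culled_values(0, 20))  # [9, 15, 21, 27, 33, 39]
```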

Page-Segmented Odds-Only Version

For sieving of ranges larger than a few million efficiently, a page-segmented sieve should always be used to preserve CPU cache associativity by making the page size to be about that of the CPU L1 data cache. The following code implements a page-segmented version that is an extensible sieve (no upper limit needs be specified) using a secondary memoized feed of base prime value arrays which use a smaller page-segment size for efficiency. When the count of the number of primes is desired, the sieve is polymorphic in output and counts the unmarked composite bits by using fast `popcount` instructions taken 64-bits at a time. The code is as follows:

# compile with `--release --no-debug` for speed...

alias Prime = UInt64
alias PrimeNdx = Int64
alias PrimeArr = Array(Prime)
alias SieveBuffer = Pointer(UInt8)
alias BasePrime = UInt32
alias BasePrimeArr = Array(BasePrime)

CPUL1CACHE = 131072 # 16 Kilobytes in number of bits

BITMASK = Pointer(UInt8).malloc(8) { |i| 1_u8 << i }

# Count number of non-composite (zero) bits within index range...
# sieve buffer is always evenly divisible by 64-bit words...
private def count_page_to(ndx : Int32, sb : SieveBuffer)
lstwrdndx = ndx >> 6; mask = (~1_u64) << (ndx & 63)
cnt = lstwrdndx * 64 + 64; sbw = sb.as(Pointer(UInt64))
lstwrdndx.times { |i| cnt -= sbw[i].popcount }
cnt - (sbw[lstwrdndx] | mask).popcount
end

# Cull composite bits from sieve buffer using base prime arrays;
# starting at overall given prime index for given buffer bit size...
private def cull_page(pndx : PrimeNdx, bitsz : Int32,
bps : Iterator(BasePrimeArr), sb : SieveBuffer)
bps.each { |bpa|
bpa.each { |bpu32|
bp = bpu32.to_i64; bpndx = (bp - 3) >> 1
swi = (bpndx + bpndx) * (bpndx + 3) + 3 # calculate start prime index
return if swi >= pndx + bitsz.to_i64
bpi = bp.to_i32 # calculate buffer start culling index...
bi = (swi >= pndx) ? (swi - pndx).to_i32 : begin
r = (pndx - swi) % bp; r == 0 ? 0 : bpi - r.to_i32
end
# when base prime is small enough, cull using strided loops to
# simplify the inner loops at the cost of more loop overhead...
# almost all of the work is done by the following loop...
if bpi < (bitsz >> 4)
bilmt = bi + (bpi << 3); cplmt = sb + (bitsz >> 3)
bilmt = CPUL1CACHE if bilmt > CPUL1CACHE
while bi < bilmt
cp = sb + (bi >> 3); msk = BITMASK[bi & 7]
while cp < cplmt # use pointer to save loop overhead
cp.value |= msk; cp += bpi
end
bi += bpi
end
else
while bi < bitsz # bitsz
sb[bi >> 3] |= BITMASK[bi & 7]; bi += bpi
end
end } }
end

# Iterator over processed prime pages, polymorphic by the converter function...
private class PagedResults(T)
@bpas : BasePrimeArrays
@cmpsts : SieveBuffer

def initialize(@prmndx : PrimeNdx,
@cmpstsbitsz : Int32,
@cnvrtrfnc : (Int64, Int32, SieveBuffer) -> T)
@bpas = BasePrimeArrays.new
@cmpsts = SieveBuffer.malloc(((@cmpstsbitsz + 63) >> 3) & (-8))
end

private def dopage
(@prmndx..).step(@cmpstsbitsz.to_i64).map { |pn|
@cmpsts.clear(@cmpstsbitsz >> 3)
cull_page(pn, @cmpstsbitsz, @bpas.each, @cmpsts)
@cnvrtrfnc.call(pn, @cmpstsbitsz, @cmpsts) }
end

def each
dopage
end

def each(& : T -> _) : Nil
itr = dopage
while true
value = itr.next
break if value.is_a?(Iterator::Stop)
yield value
end
end
end

# Secondary memoized chain of BasePrime arrays (by small page size),
# which is actually a iterable lazy list (memoized) of BasePrimeArr;
# Crystal has closures, so it is easy to implement a LazyList class
# which memoizes the results of the thunk so it is only executed once...
private class BasePrimeArrays
@baseprmarr : BasePrimeArr # head of lazy list
@tail : BasePrimeArrays? = nil # tail starts as non-existing

def initialize # special case for first page of base primes
# converter of sieve buffer to base primes array...
sb2bparrprc = -> (pn : PrimeNdx, bl : Int32, sb : SieveBuffer) {
cnt = count_page_to(bl - 1, sb)
bparr = BasePrimeArr.new(cnt, 0); j = 0
bsprm = (pn + pn + 3).to_u32
bl.times.each { |i|
next if (sb[i >> 3] & BITMASK[i & 7]) != 0
bparr[j] = bsprm + (i + i).to_u32; j += 1 }
bparr }

cmpsts = SieveBuffer.malloc 128 # fake bparr for first iter...
frstbparr = sb2bparrprc.call(0_i64, 1024, cmpsts)
cull_page(0_i64, 1024, Iterator.of(frstbparr).each, cmpsts)
@baseprmarr = sb2bparrprc.call(0_i64, 1024, cmpsts)

# initialization of pages after the first is deferred to avoid data race...
initbpas = -> { PagedResults.new(1024_i64, 1024, sb2bparrprc).each }
# recursive LazyList generator function...
nxtbpa = uninitialized Proc(Iterator(BasePrimeArr), BasePrimeArrays)
nxtbpa = -> (bppgs : Iterator(BasePrimeArr)) {
nbparr = bppgs.next
abort "Unexpected base primes end!!!" if nbparr.is_a?(Iterator::Stop)
BasePrimeArrays.new(nbparr, ->{ nxtbpa.call(bppgs) }) }
@thunk = ->{ nxtbpa.call(initbpas.call) }
end
def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(BasePrimeArrays))
end
def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(Nil))
end
def initialize(@baseprmarr : BasePrimeArr, @thunk : Nil)
end

def tail # not thread safe without a lock/mutex...
if thnk = @thunk
@tail = thnk.call; @thunk = nil
end
@tail
end

private class BasePrimeArrIter # iterator over BasePrime arrays...
include Iterator(BasePrimeArr)
@dbparrs : Proc(BasePrimeArrays?)

def initialize(fromll : BasePrimeArrays)
@dbparrs = ->{ fromll.as(BasePrimeArrays?) }
end

def next
if bpas = @dbparrs.call
rslt = bpas.@baseprmarr; @dbparrs = -> { bpas.tail }; rslt
else
abort "Unexpected end of base primes array iteration!!!"
end
end
end

def each
BasePrimeArrIter.new(self)
end
end

# An "infinite" extensible iteration of primes,...
def primes
sb2prms = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
cnt = count_page_to(bitsz - 1, sb)
prmarr = PrimeArr.new(cnt, 0); j = 0
bsprm = (pn + pn + 3).to_u64
bitsz.times.each { |i|
next if (sb[i >> 3] & BITMASK[i & 7]) != 0
prmarr[j] = bsprm + (i + i).to_u64; j += 1 }
prmarr
}
(2_u64..2_u64).each
.chain PagedResults.new(0, CPUL1CACHE, sb2prms).each.flat_map { |prmspg| prmspg.each }
end

# Counts number of primes to given limit...
def primes_count_to(lmt : Prime)
if lmt < 3
lmt < 2 ? return 0 : return 1
end
lmtndx = ((lmt - 3) >> 1).to_i64
sb2cnt = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
pglmt = pn + bitsz.to_i64 - 1
if (pn + CPUL1CACHE.to_i64) > lmtndx
Tuple.new(count_page_to((lmtndx - pn).to_i32, sb).to_i64, pglmt)
else
Tuple.new(count_page_to(bitsz - 1, sb).to_i64, pglmt)
end
}
count = 1
PagedResults.new(0, CPUL1CACHE, sb2cnt).each { |(cnt, lmt)|
count += cnt; break if lmt >= lmtndx }
count
end

print "The primes up to 100 are: "
primes.each.take_while { |p| p <= 100_u64 }.each { |p| print " ", p }
print ".\r\nThe Number of primes up to a million is "
print primes.each.take_while { |p| p <= 1_000_000_u64 }.size
print ".\r\nThe number of primes up to a billion is "
start_time = Time.monotonic
# answr = primes.each.take_while { |p| p <= 1_000_000_000_u64 }.size # slow way
answr = primes_count_to(1_000_000_000) # fast way
elpsd = (Time.monotonic - start_time).total_milliseconds
print "#{answr} in #{elpsd} milliseconds.\r\n"
Output:
The primes up to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97.
The Number of primes up to a million is 78498.
The number of primes up to a billion is 50847534 in 658.466028 milliseconds.

When run on the same machine as the previous version, the code is about seven and a half times as fast as even the above Odds-Only version at about 2.4 CPU clock cycles per culling operation rather than over 17, partly due to better cache associativity (about half the gain) but also due to tuning the inner culling loop for small base prime values to operate by byte pointer strides with a constant mask value to simplify the code generated for these inner loops; as there is some overhead in the eight outer loops that set this up, this technique is only applicable for smaller base primes.
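
The byte-stride technique mentioned above can be illustrated in Python (the illustration shows the access pattern only, not the speed gain, which depends on native code). Since eight culls of stride p advance exactly p bytes, each of the first eight cull positions keeps a constant bit mask while walking the buffer in steps of p bytes. The function names are hypothetical.

```python
def cull_simple(buf, start, step, bitsz):
    """Reference version: mark every step-th bit beginning at start."""
    for bi in range(start, bitsz, step):
        buf[bi >> 3] |= 1 << (bi & 7)

def cull_strided(buf, start, step, bitsz):
    """Same result using up to eight passes, each with one constant mask.

    Assumes bitsz is a multiple of 8, as the page sizes above are.
    """
    nbytes = bitsz >> 3
    for bi in range(start, min(start + 8 * step, bitsz), step):
        mask = 1 << (bi & 7)                  # bit position is fixed for this pass
        for byte_ndx in range(bi >> 3, nbytes, step):
            buf[byte_ndx] |= mask             # advancing step bytes = 8 * step bits

a, b = bytearray(128), bytearray(128)
cull_simple(a, 3, 7, 1024)
cull_strided(b, 3, 7, 1024)
print(a == b)  # True
```

The eight outer passes carry some fixed setup cost, which is why the technique only pays off when the stride is small compared with the page size.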

Further gains are possible by using maximum wheel factorization rather than just odds-only factorization, which can reduce the number of operations by a factor of about four; in addition, the number of CPU clock cycles per culling operation can be reduced by a further 25 percent or so on average (for sieving to a billion) by using extreme loop unrolling techniques for both the dense and sparse culling cases. As well, multi-threading by pages can reduce the wall clock time by a factor of the number of effective cores (non Hyper-Threaded cores).
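
As a footnote to the prime counting used in the page-segmented code above, counting by population count with a mask on the last word can be sketched in Python. This is an illustration under assumptions: the buffer is modelled as a list of 64-bit integers, `bin(w).count("1")` stands in for a hardware `popcount`, and `count_zero_bits_to` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1

def count_zero_bits_to(ndx, words):
    """Count zero (non-composite) bits at bit indexes 0..ndx inclusive."""
    last = ndx >> 6                          # word that holds bit ndx
    mask = (~1 << (ndx & 63)) & MASK64       # ones at every position above ndx
    cnt = last * 64 + 64                     # bits in all words up to and including last
    for i in range(last):
        cnt -= bin(words[i]).count("1")      # subtract composites in the full words
    # force the bits beyond ndx to one so that they are not counted as primes
    return cnt - bin(words[last] | mask).count("1")

print(count_zero_bits_to(70, [0b1010, 0]))  # 69
```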

D

Simpler Version

Prints all prime numbers less than the limit.
import std.stdio, std.algorithm, std.range, std.functional;

uint[] sieve(in uint limit) nothrow @safe {
if (limit < 2)
return [];
auto composite = new bool[limit];

foreach (immutable uint n; 2 .. cast(uint)(limit ^^ 0.5) + 1)
if (!composite[n])
for (uint k = n * n; k < limit; k += n)
composite[k] = true;

//return iota(2, limit).filter!(not!composite).array;
return iota(2, limit).filter!(i => !composite[i]).array;
}

void main() {
50.sieve.writeln;
}
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Faster Version

This version uses an array of bits (instead of booleans, that are represented with one byte), and skips even numbers. The output is the same.

import std.stdio, std.math, std.array;

size_t[] sieve(in size_t m) pure nothrow @safe {
if (m < 3)
return null;
immutable size_t n = m - 1;
enum size_t bpc = size_t.sizeof * 8;
auto F = new size_t[((n + 2) / 2) / bpc + 1];
F[] = size_t.max;

size_t isSet(in size_t i) nothrow @safe @nogc {
immutable size_t offset = i / bpc;
immutable size_t mask = size_t(1) << (i % bpc);
return F[offset] & mask; // nonzero while i is still a prime candidate
}

void resetBit(in size_t i) nothrow @safe @nogc {
immutable size_t offset = i / bpc;
immutable size_t mask = size_t(1) << (i % bpc);
if ((F[offset] & mask) != 0)
F[offset] ^= mask; // clear the bit: i is composite
}

for (size_t i = 3; i <= sqrt(real(n)); i += 2)
if (isSet((i - 3) / 2))
for (size_t j = i * i; j <= n; j += 2 * i)
resetBit((j - 3) / 2);

Appender!(size_t[]) result;
result ~= 2;
for (size_t i = 3; i <= n; i += 2)
if (isSet((i - 3) / 2))
result ~= i;
return result.data;
}

void main() {
50.sieve.writeln;
}

Extensible Version

(This version is used in the task Extensible prime generator.)

/// Extensible Sieve of Eratosthenes.
struct Prime {
uint[] a = [2];

private void grow() pure nothrow @safe {
immutable p0 = a[\$ - 1] + 1;
auto b = new bool[p0];

foreach (immutable di; a) {
immutable uint i0 = p0 / di * di;
uint i = (i0 < p0) ? i0 + di - p0 : i0 - p0;
for (; i < b.length; i += di)
b[i] = true;
}

foreach (immutable uint i, immutable bi; b)
if (!b[i])
a ~= p0 + i;
}

uint opCall(in uint n) pure nothrow @safe {
while (n >= a.length)
grow;
return a[n];
}
}

version (sieve_of_eratosthenes3_main) {
void main() {
import std.stdio, std.range, std.algorithm;

Prime prime;
uint.max.iota.map!prime.until!q{a > 50}.writeln;
}
}

To see the output (that is the same), compile with -version=sieve_of_eratosthenes3_main.
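
The growth step of the extensible version can be sketched in Python as follows. This is an illustration of the same idea rather than a translation to rely on: each call to `grow` sieves the block from p0 to 2*p0 - 1, where p0 is one more than the largest prime found so far, using only the primes already known (which suffices because every composite in the block has a prime factor below p0). The class and method names simply mirror the D code.

```python
class Prime:
    """Extensible sieve: prime(n) returns the n-th prime, counting from 0."""

    def __init__(self):
        self.a = [2]                      # primes found so far

    def grow(self):
        p0 = self.a[-1] + 1               # first number of the new block
        b = bytearray(p0)                 # b[i] stands for p0 + i
        for di in self.a:
            i0 = p0 // di * di            # largest multiple of di that is <= p0
            first = i0 + di - p0 if i0 < p0 else i0 - p0
            for i in range(first, p0, di):
                b[i] = 1
        self.a.extend(p0 + i for i in range(p0) if not b[i])

    def __call__(self, n):
        while n >= len(self.a):
            self.grow()
        return self.a[n]

prime = Prime()
print([prime(n) for n in range(15)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```

Each block roughly doubles the covered range, so only a logarithmic number of growth steps is needed to reach any given prime.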

Dart

// helper function to pretty print an Iterable
String iterableToString(Iterable seq) {
String str = "[";
Iterator i = seq.iterator;
if (i.moveNext()) str += i.current.toString();
while(i.moveNext()) {
str += ", " + i.current.toString();
}
return str + "]";
}

main() {
int limit = 1000;
int strt = new DateTime.now().millisecondsSinceEpoch;
Set<int> sieve = new Set<int>();

for(int i = 2; i <= limit; i++) {
sieve.add(i);
}
for(int i = 2; i * i <= limit; i++) {
if(sieve.contains(i)) {
for(int j = i * i; j <= limit; j += i) {
sieve.remove(j);
}
}
}
var sortedValues = new List<int>.from(sieve);
int elpsd = new DateTime.now().millisecondsSinceEpoch - strt;
print("Found " + sieve.length.toString() + " primes up to " + limit.toString() +
" in " + elpsd.toString() + " milliseconds.");
print(iterableToString(sortedValues)); // expect sieve.length to be 168 up to 1000...
// Expect.equals(168, sieve.length);
}
Output:

Found 168 primes up to 1000 in 9 milliseconds. [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

Although it has the characteristics of a true Sieve of Eratosthenes, the above code isn't very efficient because of the remove/modify operations on the Set. As a result, the running time grows noticeably faster than linearly with increasing range, and it is quite slow for larger sieve ranges compared to compiled languages: it takes an average of about 22 thousand CPU clock cycles for each of the 664579 primes (about 4 seconds on a 3.6 Gigahertz CPU) just to sieve to ten million.

faster bit-packed array odds-only solution

import 'dart:typed_data';
import 'dart:math';

Iterable<int> soeOdds(int limit) {
if (limit < 3) return limit < 2 ? Iterable.empty() : [2];
int lmti = (limit - 3) >> 1;
int bfsz = (lmti >> 3) + 1;
int sqrtlmt = (sqrt(limit) - 3).floor() >> 1;
Uint32List cmpsts = Uint32List(bfsz);
for (int i = 0; i <= sqrtlmt; ++i)
if ((cmpsts[i >> 5] & (1 << (i & 31))) == 0) {
int p = i + i + 3;
for (int j = (p * p - 3) >> 1; j <= lmti; j += p)
cmpsts[j >> 5] |= 1 << (j & 31);
}
return [2]
.followedBy(
Iterable.generate(lmti + 1)
.where((i) => cmpsts[i >> 5] & (1 << (i & 31)) == 0)
.map((i) => i + i + 3) );
}

void main() {
final int range = 100000000;
String s = "( ";
soeOdds(100).forEach((p)=>s += "$p "); print(s + ")");
print("There are ${soeOdds(1000000).length} primes to 1000000.");
final start = DateTime.now().millisecondsSinceEpoch;
final answer = soeOdds(range).length;
final elapsed = DateTime.now().millisecondsSinceEpoch - start;
print("There were $answer primes found up to $range.");
print("This test bench took $elapsed milliseconds.");
}
Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 4604 milliseconds.

The above code is faster, at about 1.5 thousand CPU cycles per prime when run here on a 1.92 Gigahertz low-end Intel x5-Z8350 CPU, or about 2.5 seconds on a 3.6 Gigahertz CPU, using the Dart VM to sieve to 100 million.
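The index arithmetic behind an odds-only bit-packed sieve can be sketched in Python as below. This is a sketch under stated assumptions rather than a port of the Dart code: the function name `soe_odds` and the choice of a `bytearray` with eight flags per byte are made here for illustration. Index `i` stands for the odd number `2*i + 3`, so stepping by `p` in index space steps by `2*p` in value space.

```python
def soe_odds(limit):
    """Return all primes <= limit using an odds-only, bit-packed sieve."""
    if limit < 3:
        return [2] if limit == 2 else []
    top = (limit - 3) >> 1                  # highest index in use
    bits = bytearray((top >> 3) + 1)        # one bit per odd number, 0 = not crossed out
    i = 0
    while True:
        p = 2 * i + 3
        start = (p * p - 3) >> 1            # index of p * p
        if start > top:
            break                           # p * p is beyond the limit: done culling
        if not bits[i >> 3] & (1 << (i & 7)):
            for j in range(start, top + 1, p):
                bits[j >> 3] |= 1 << (j & 7)
        i += 1
    return [2] + [2 * i + 3 for i in range(top + 1)
                  if not bits[i >> 3] & (1 << (i & 7))]
```

Packing eight flags per byte and skipping the even numbers cuts memory use to one sixteenth of a byte-per-number array.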

Unbounded infinite iterators/generators of primes

Infinite generator using a (hash) Map (sieves odds-only)

The following code will have about O(n log (log n)) performance due to a hash table having O(1) average performance and is only somewhat slow due to the constant overhead of processing hashes:

Iterable<int> primesMap() {
Iterable<int> oddprms() sync* {
yield(3); yield(5); // need at least 2 for initialization
final Map<int, int> bpmap = {9: 6};
final Iterator<int> bps = oddprms().iterator;
bps.moveNext(); bps.moveNext(); // skip past 3 to 5
int bp = bps.current;
int n = bp;
int q = bp * bp;
while (true) {
n += 2;
while (n >= q || bpmap.containsKey(n)) {
if (n >= q) {
final int inc = bp << 1;
bpmap[bp * bp + inc] = inc;
bps.moveNext(); bp = bps.current; q = bp * bp;
} else {
final int inc = bpmap.remove(n);
int next = n + inc;
while (bpmap.containsKey(next)) {
next += inc;
}
bpmap[next] = inc;
}
n += 2;
}
yield(n);
}
}
return [2].followedBy(oddprms());
}

void main() {
final int range = 100000000;
String s = "( ";
primesMap().take(25).forEach((p)=>s += "$p "); print(s + ")");
print("There are ${primesMap().takeWhile((p)=>p<=1000000).length} primes to 1000000.");
final start = DateTime.now().millisecondsSinceEpoch;
final answer = primesMap().takeWhile((p)=>p<=range).length;
final elapsed = DateTime.now().millisecondsSinceEpoch - start;
print("There were $answer primes found up to $range.");
print("This test bench took $elapsed milliseconds.");
}
Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 16086 milliseconds.

This takes about 5300 CPU clock cycles per prime, or about 8.4 seconds if run on a 3.6 Gigahertz CPU, which is slower than the above fixed bit-packed array version but has the advantage that it runs indefinitely (at least on 64-bit machines; on 32-bit machines it can only be run up to the 32-bit number range, about a factor of 20 beyond the range sieved above).

Due to the constant execution overhead this is only reasonably useful for ranges up to tens of millions anyway.
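The map-based incremental approach described in this subsection can be sketched in Python with a dictionary. This is a minimal sketch of the general technique, not a line-by-line port of the Dart code; the names `primes_map`, `_odd_primes`, `first_primes` and `count_upto` are assumptions made for illustration. The dictionary maps the next pending odd composite of each base prime to its step (twice the prime), and a secondary stream of base primes supplies each new prime only when its square is reached.

```python
from itertools import islice, takewhile

def _odd_primes():
    yield 3
    yield 5                       # two seed values so the recursion below can start
    pending = {9: 6}              # next composite -> step (2 * base prime)
    base = _odd_primes()          # secondary stream of base primes
    next(base)                    # skip past 3
    bp = next(base)               # current base prime, 5
    q = bp * bp                   # its square, 25
    n = 5
    while True:
        n += 2
        if n >= q:                # n is the square of bp: start culling with bp
            step = 2 * bp
            nxt = q + step
            while nxt in pending:
                nxt += step
            pending[nxt] = step
            bp = next(base)
            q = bp * bp
        elif n in pending:        # n is a known composite: advance its chain
            step = pending.pop(n)
            nxt = n + step
            while nxt in pending:
                nxt += step
            pending[nxt] = step
        else:
            yield n               # n is prime

def primes_map():
    """Unbounded generator of primes (odds-only, dictionary based)."""
    yield 2
    yield from _odd_primes()

def first_primes(k):
    return list(islice(primes_map(), k))

def count_upto(limit):
    return sum(1 for _ in takewhile(lambda p: p <= limit, primes_map()))
```

Because each base prime holds only one dictionary entry at a time, the map stays small: it has one entry per prime up to the square root of the current position.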

Fast page segmented array infinite generator (sieves odds-only)

The following code also theoretically has O(n log (log n)) execution performance and the same limited use on 32-bit execution platforms, but won't realize the theoretical complexity for larger ranges because the sieve buffer grows beyond the CPU L1 cache size. However, as the CPU L2 cache that it then automatically grows to use isn't any slower than the basic culling loop speed, it won't slow down much above that limit up to ranges of about 2.56e14, which will take on the order of weeks:

Translation of: Kotlin
import 'dart:typed_data';
import 'dart:math';
import 'dart:collection';

// a lazy list
typedef _LazyList _Thunk();
class _LazyList<T> {
final T head;
_Thunk thunk;
_LazyList<T> _rest;
_LazyList(T this.head, _Thunk this.thunk);
_LazyList<T> get rest {
if (this.thunk != null) {
this._rest = this.thunk();
this.thunk = null;
}
return this._rest;
}
}

class _LazyListIterable<T> extends IterableBase<T> {
_LazyList<T> _first;
_LazyListIterable(_LazyList<T> this._first);
@override Iterator<T> get iterator {
Iterable<T> inner() sync* {
_LazyList<T> current = this._first;
while (true) {
yield(current.head);
current = current.rest;
}
}
return inner().iterator;
}
}

// zero bit population count Look Up Table for 16-bit range...
final Uint8List CLUT =
Uint8List.fromList(
Iterable.generate(65536)
.map((i) {
final int v0 = ~i & 0xFFFF;
final int v1 = v0 - ((v0 & 0xAAAA) >> 1);
final int v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2);
return (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) * 0x0101)) >> 8) & 31;
})
.toList());

int _countComposites(Uint8List cmpsts) {
Uint16List buf = Uint16List.view(cmpsts.buffer);
int lmt = buf.length;
int count = 0;
for (var i = 0; i < lmt; ++i) {
count += CLUT[buf[i]];
}
return count;
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
Uint32List _composites2BasePrimeArray(int low, Uint8List cmpsts) {
final int lmti = cmpsts.length << 3;
final int len = _countComposites(cmpsts);
final Uint32List rslt = Uint32List(len);
int j = 0;
for (int i = 0; i < lmti; ++i) {
if (cmpsts[i >> 3] & 1 << (i & 7) == 0) {
rslt[j++] = low + i + i;
}
}
return rslt;
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
void _sieveComposites(int low, Uint8List buffer, Iterable<Uint32List> bpas) {
final int lowi = (low - 3) >> 1;
final int len = buffer.length;
final int lmti = len << 3;
final int nxti = lowi + lmti;
for (var bpa in bpas) {
for (var bp in bpa) {
final int bpi = (bp - 3) >> 1;
int strti = ((bpi * (bpi + 3)) << 1) + 3;
if (strti >= nxti) return;
if (strti >= lowi) strti = strti - lowi;
else {
strti = (lowi - strti) % bp;
if (strti != 0) strti = bp - strti;
}
if (bp <= len >> 3 && strti <= lmti - bp << 6) {
final int slmti = min(lmti, strti + bp << 3);
for (var s = strti; s < slmti; s += bp) {
final int msk = 1 << (s & 7);
for (var c = s >> 3; c < len; c += bp) {
buffer[c] |= msk;
}
}
}
else {
for (var c = strti; c < lmti; c += bp) {
buffer[c >> 3] |= 1 << (c & 7);
}
}
}
}
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
Iterable<Uint32List> _makeBasePrimeArrays() {
var cmpsts = Uint8List(512);
_LazyList<Uint32List> _nextelem(int low, Iterable<Uint32List> bpas) {
// calculate size so that the bit span is at least as big as the
// maximum culling prime required, rounded up to minsizebits blocks...
final int rqdsz = 2 + sqrt((1 + low).toDouble()).toInt();
final sz = (((rqdsz >> 12) + 1) << 9); // size in bytes
if (sz > cmpsts.length) cmpsts = Uint8List(sz);
cmpsts.fillRange(0, cmpsts.length, 0);
_sieveComposites(low, cmpsts, bpas);
final arr = _composites2BasePrimeArray(low, cmpsts);
final nxt = low + (cmpsts.length << 4);
return _LazyList(arr, () => _nextelem(nxt, bpas));
}
// pre-seeding breaks recursive race,
// as only known base primes used for first page...
final preseedarr = Uint32List.fromList( [ // pre-seed to 100, can sieve to 10,000...
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ] );
return _LazyListIterable(
_LazyList(preseedarr,
() => _nextelem(101, _makeBasePrimeArrays()))
);
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
Iterable<List> _makeSievePages() sync* {
final bpas = _makeBasePrimeArrays(); // secondary source of base prime arrays
int low = 3;
Uint8List cmpsts = Uint8List(16384);
_sieveComposites(3, cmpsts, bpas);
while (true) {
yield([low, cmpsts]);
final rqdsz = 2 + sqrt((1 + low).toDouble()).toInt(); // problem with sqrt not exact past about 10^12!!!!!!!!!
final sz = ((rqdsz >> 17) + 1) << 14; // size in bytes
if (sz > cmpsts.length) cmpsts = Uint8List(sz);
cmpsts.fillRange(0, cmpsts.length, 0);
low += cmpsts.length << 4;
_sieveComposites(low, cmpsts, bpas);
}
}

int countPrimesTo(int range) {
if (range < 3) { if (range < 2) return 0; else return 1; }
var count = 1;
for (var sp in _makeSievePages()) {
int low = sp[0]; Uint8List cmpsts = sp[1];
if ((low + (cmpsts.length << 4)) > range) {
int lsti = (range - low) >> 1;
var lstw = (lsti >> 4); var lstb = lstw << 1;
var msk = (-2 << (lsti & 15)) & 0xFFFF;
var buf = Uint16List.view(cmpsts.buffer, 0, lstw);
for (var i = 0; i < lstw; ++i)
count += CLUT[buf[i]];
count += CLUT[(cmpsts[lstb + 1] << 8) | cmpsts[lstb] | msk];
break;
} else {
count += _countComposites(cmpsts);
}
}
return count;
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
Iterable<int> primesPaged() sync* {
yield(2);
for (var sp in _makeSievePages()) {
int low = sp[0]; Uint8List cmpsts = sp[1];
var szbts = cmpsts.length << 3;
for (var i = 0; i < szbts; ++i) {
if (cmpsts[i >> 3].toInt() & (1 << (i & 7)) != 0) continue;
yield(low + i + i);
}
}
}

void main() {
final int range = 1000000000;
String s = "( ";
primesPaged().take(25).forEach((p)=>s += "$p "); print(s + ")");
print("There are ${countPrimesTo(1000000)} primes to 1000000.");
final start = DateTime.now().millisecondsSinceEpoch;
final answer = countPrimesTo(range); // fast way
// final answer = primesPaged().takeWhile((p)=>p<=range).length; // slow way using enumeration
final elapsed = DateTime.now().millisecondsSinceEpoch - start;
print("There were $answer primes found up to $range.");
print("This test bench took $elapsed milliseconds.");
}
Output:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 50847534 primes found up to 1000000000.
This test bench took 9385 milliseconds.

This version counts the primes up to one billion in about five seconds at 3.6 Gigahertz (about 9.4 seconds on the low-end 1.92 Gigahertz CPU used here), or about 350 CPU clock cycles per prime, under the Dart Virtual Machine (VM).

Note that it takes about four times as long to do this using the provided primes generator/enumerator, as noted in the code. In all languages it takes longer to actually enumerate the primes than it does to cull the composite numbers, but Dart is somewhat slower than most at this.

The algorithm can be sped up by a factor of four by extreme wheel factorization and (likely) about a factor of the effective number of CPU cores by using multi-processing isolates, but there isn't much point if one is to use the prime generator for output. For most purposes, it is better to use custom functions that directly manipulate the culled bit-packed page segments as `countPrimesTo` does here.
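The idea of counting directly from sieved page segments, rather than enumerating individual primes, can be sketched in Python as follows. This is a simplified sketch and not the Dart implementation: it uses one byte per odd number instead of packed bits, a fixed page size, and base primes from an ordinary small sieve; the name `count_primes_paged` and the parameter `page_odds` are assumptions made for illustration.

```python
from math import isqrt

def count_primes_paged(limit, page_odds=1 << 16):
    """Count primes <= limit by sieving the odd numbers one page at a time."""
    if limit < 2:
        return 0
    # Base primes up to sqrt(limit), from an ordinary odds-only sieve.
    root = isqrt(limit)
    small = bytearray([1]) * (root + 1)
    base = []
    for p in range(3, root + 1, 2):
        if small[p]:
            base.append(p)
            for m in range(p * p, root + 1, 2 * p):
                small[m] = 0
    count = 1                                   # the prime 2
    low = 3                                     # first odd number of the current page
    while low <= limit:
        size = min(page_odds, (limit - low) // 2 + 1)
        page = bytearray(size)                  # 0 = not crossed out
        high = low + 2 * (size - 1)
        for p in base:
            if p * p > high:
                break
            m = low + (-low) % p                # first multiple of p >= low
            if m % 2 == 0:
                m += p                          # make it an odd multiple
            start = max(p * p, m)
            for i in range((start - low) >> 1, size, p):
                page[i] = 1
        count += page.count(0)                  # count survivors without listing them
        low = high + 2
    return count
```

Only one page is held in memory at a time, and the count is taken from the page itself instead of from a list of primes.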

Delphi

program erathostenes;

{$APPTYPE CONSOLE}

type
TSieve = class
private
fPrimes: TArray<boolean>;
procedure InitArray;
procedure Sieve;
function getNextPrime(aStart: integer): integer;
function getPrimeArray: TArray<integer>;
public
function getPrimes(aMax: integer): TArray<integer>;
end;

{ TSieve }

function TSieve.getNextPrime(aStart: integer): integer;
begin
result := aStart;
while not fPrimes[result] do
inc(result);
end;

function TSieve.getPrimeArray: TArray<integer>;
var
i, n: integer;
begin
n := 0;
setlength(result, length(fPrimes)); // init array with maximum elements
for i := 2 to high(fPrimes) do
begin
if fPrimes[i] then
begin
result[n] := i;
inc(n);
end;
end;
setlength(result, n); // reduce array to actual elements
end;

function TSieve.getPrimes(aMax: integer): TArray<integer>;
begin
setlength(fPrimes, aMax);
InitArray;
Sieve;
result := getPrimeArray;
end;

procedure TSieve.InitArray;
var
i: integer;
begin
for i := 2 to high(fPrimes) do
fPrimes[i] := true;
end;

procedure TSieve.Sieve;
var
i, n, max: integer;
begin
max := length(fPrimes);
i := 2;
while i < sqrt(max) do
begin
n := sqr(i);
while n < max do
begin
fPrimes[n] := false;
inc(n, i);
end;
i := getNextPrime(i + 1);
end;
end;

var
i: integer;
Sieve: TSieve;

begin
Sieve := TSieve.Create;
for i in Sieve.getPrimes(100) do
write(i, ' ');
Sieve.Free;
end.

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Draco

/* Sieve of Eratosthenes - fill a given boolean array */
proc nonrec sieve([*] bool prime) void:
word p, c, max;
max := dim(prime,1)-1;
prime[0] := false;
prime[1] := false;
for p from 2 upto max do prime[p] := true od;
for p from 2 upto max>>1 do
if prime[p] then
for c from p*2 by p upto max do
prime[c] := false
od
fi
od
corp

/* Print primes up to 1000 using the sieve */
proc nonrec main() void:
word MAX = 1000;
unsigned MAX i;
byte c;
[MAX+1] bool prime;
sieve(prime);

c := 0;
for i from 0 upto MAX do
if prime[i] then
write(i:4);
c := c + 1;
if c=10 then c:=0; writeln() fi
fi
od
corp
Output:
2   3   5   7  11  13  17  19  23  29
31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601
607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733
739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863
877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997

DWScript

function Primes(limit : Integer) : array of Integer;
var
n, k : Integer;
sieve := new Boolean[limit+1];
begin
for n := 2 to Round(Sqrt(limit)) do begin
if not sieve[n] then begin
for k := n*n to limit step n do
sieve[k] := True;
end;
end;

for k:=2 to limit do
if not sieve[k] then
Result.Add(k);
end;

var r := Primes(50);
var i : Integer;
for i:=0 to r.High do
PrintLn(r[i]);

Dylan

With outer to sqrt and inner to p^2 optimizations:

define method primes(n)
let limit = floor(n ^ 0.5) + 1;
let sieve = make(limited(<simple-vector>, of: <boolean>), size: n + 1, fill: #t);
let last-prime = 2;

while (last-prime < limit)
for (x from last-prime ^ 2 to n by last-prime)
sieve[x] := #f;
end for;
block (found-prime)
for (n from last-prime + 1 below limit)
if (sieve[n])
last-prime := n;
found-prime()
end;
end;
last-prime := limit;
end block;
end while;

for (x from 2 to n)
if (sieve[x]) format-out("Prime: %d\n", x); end;
end;
end;

E

E's standard library doesn't have a step-by-N numeric range, so we'll define one, implementing the standard iteration protocol.

def rangeFromBelowBy(start, limit, step) {
return def stepper {
to iterate(f) {
var i := start
while (i < limit) {
f(null, i)
i += step
}
}
}
}

The sieve itself:

def eratosthenes(limit :(int > 2), output) {
def composite := [].asSet().diverge()
for i ? (!composite.contains(i)) in 2..!limit {
output(i)
composite.addAll(rangeFromBelowBy(i ** 2, limit, i))
}
}

Example usage:

? eratosthenes(12, println)
# stdout: 2
#         3
#         5
#         7
#         11

EasyLang

len prims[] 100
max = sqrt len prims[]
tst = 2
while tst <= max
if prims[tst] = 0
i = tst * tst
while i < len prims[]
prims[i] = 1
i += tst
.
.
tst += 1
.
i = 2
while i < len prims[]
if prims[i] = 0
print i
.
i += 1
.

eC

This example is incorrect. Please fix the code and remove this message. Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: this is not a Sieve of Eratosthenes; it is just trial division.

public class FindPrime
{
Array<int> primeList { [ 2 ], minAllocSize = 64 };
int index;

index = 3;

bool HasPrimeFactor(int x)
{
int max = (int)floor(sqrt((double)x));

for(i : primeList)
{
if(i > max) break;
if(x % i == 0) return true;
}
return false;
}

public int GetPrime(int x)
{
if(x > primeList.count - 1)
{
for (; primeList.count != x; index += 2)
if(!HasPrimeFactor(index))
{
if(primeList.count >= primeList.minAllocSize) primeList.minAllocSize *= 2;
primeList.Add(index);
}
}
return primeList[x-1];
}
}

class PrimeApp : Application
{
FindPrime fp { };
void Main()
{
int num = argc > 1 ? atoi(argv[1]) : 1;
PrintLn(fp.GetPrime(num));
}
}

EchoLisp

Sieve

(require 'types) ;; bit-vector

;; converts sieve->list for integers in [nmin .. nmax[
(define (s-range sieve nmin nmax (base 0))
(for/list ([ i (in-range nmin nmax)]) #:when (bit-vector-ref sieve i) (+ i base)))

;; next prime in sieve > p, or #f
(define (s-next-prime sieve p ) ;;
(bit-vector-scan-1 sieve (1+ p)))

;; returns a bit-vector - sieve- all numbers in [0..n[
(define (eratosthenes n)
(define primes (make-bit-vector-1 n ))
(bit-vector-set! primes 0 #f)
(bit-vector-set! primes 1 #f)
(for ([p (1+ (sqrt n))])
#:when (bit-vector-ref primes p)
(for ([j (in-range (* p p) n p)])
(bit-vector-set! primes j #f)))
primes)

(define s-primes (eratosthenes 10_000_000))

(s-range s-primes 0 100)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
(s-range s-primes 1_000_000 1_000_100)
(1000003 1000033 1000037 1000039 1000081 1000099)
(s-next-prime s-primes 9_000_000)
9000011

Segmented sieve

Allows extending the basic sieve (n) up to n^2. The memory requirement is O(√n).

;; ref :  http://research.cs.wisc.edu/techreports/1990/TR909.pdf
;; delta multiple of sqrt(n)
;; segment is [left .. left+delta-1]

(define (segmented sieve left delta (p 2) (first 0))
(define segment (make-bit-vector-1 delta))
(define right (+ left (1- delta)))
(define pmax (sqrt right))
(while p
#:break (> p pmax)
(set! first (+ left (modulo (- p (modulo left p)) p )))

(for [(q (in-range first (1+ right) p))]
(bit-vector-set! segment (- q left) #f))
(set! p (bit-vector-scan-1 sieve (1+ p))))
segment)

(define (seg-range nmin delta)
(s-range (segmented s-primes nmin delta) 0 delta nmin))

(seg-range 10_000_000_000 1000) ;; 15 milli-sec

(10000000019 10000000033 10000000061 10000000069 10000000097 10000000103 10000000121
10000000141 10000000147 10000000207 10000000259 10000000277 10000000279 10000000319
10000000343 10000000391 10000000403 10000000469 10000000501 10000000537 10000000583
10000000589 10000000597 10000000601 10000000631 10000000643 10000000649 10000000667
10000000679 10000000711 10000000723 10000000741 10000000753 10000000793 10000000799
10000000807 10000000877 10000000883 10000000889 10000000949 10000000963 10000000991
10000000993 10000000999)

;; 8 msec using the native (prime?) function
(for/list ((p (in-range 1_000_000_000 1_000_001_000))) #:when (prime? p) p)

Wheel

A 2x3 wheel gives a 50% performance gain.

;; 2x3 wheel
(define (weratosthenes n)
(define primes (make-bit-vector n )) ;; everybody to #f (false)
(bit-vector-set! primes 2 #t)
(bit-vector-set! primes 3 #t)
(bit-vector-set! primes 5 #t)

(for ([i (in-range 6 n 6) ]) ;; set candidate primes
(bit-vector-set! primes (1+ i) #t)
(bit-vector-set! primes (+ i 5) #t)
)

(for ([p (in-range 5 (1+ (sqrt n)) 2 ) ])
#:when (bit-vector-ref primes p)
(for ([j (in-range (* p p) n p)])
(bit-vector-set! primes j #f)))
primes)
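The 2x3 wheel idea can be sketched in Python as below. This is a sketch only, not a port verified against the EchoLisp code: the function name `wheel_sieve` is an assumption made here. Only 2, 3 and the numbers of the form 6k - 1 and 6k + 1 are ever marked as candidates, so multiples of 2 and 3 never need to be crossed out.

```python
from math import isqrt

def wheel_sieve(n):
    """Return flags for 0 .. n-1; flags[k] is True when k is prime."""
    flags = [False] * n
    for small in (2, 3):
        if small < n:
            flags[small] = True
    for k in range(5, n, 6):            # candidates of the form 6m - 1
        flags[k] = True
    for k in range(7, n, 6):            # candidates of the form 6m + 1
        flags[k] = True
    for p in range(5, isqrt(n) + 1, 2):
        if flags[p]:
            for m in range(p * p, n, p):
                flags[m] = False        # cross out multiples starting at p * p
    return flags
```

For example, `[k for k, f in enumerate(wheel_sieve(30)) if f]` lists the primes below 30.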

EDSAC order code

This sieve program is based on one by Eiiti Wada, which on 2020-07-05 could be found at https://www.dcs.warwick.ac.uk/~edsac/

The main external change is that the program is not designed to be viewed in the monitor; it just writes as many primes as possible within the limitations imposed by Rosetta Code. Apart from the addition of comments, internal changes include the elimination of one set of masks, and a revised method of switching from one mask to another.

On the EdsacPC simulator (see link above) the printout starts off very slowly, and gradually gets faster.

[Sieve of Eratosthenes]
[EDSAC program. Initial Orders 2]

[Memory usage:
56..87 library subroutine P6, for printing
88..222 main program
224..293 mask table: 35 long masks; each has 34 1's and a single 0
294..1023 array of bits for integers 2, 3, 4, ...,
where bit is changed from 1 to 0 when integer is crossed out.
The address of the mask table must be even, and clear of the main program.
To change it, just change the value after "T47K" below.
The address of the bit array will then be changed automatically.]

[Subroutine M3, prints header, terminated by blank row of tape.
It's an "interlude", which runs and then gets overwritten.]
[email protected]@E8FEZPF
@&*SIEVE!OF!ERATOSTHENES!#2020
..PZ

[Subroutine P6, prints strictly positive integer.
32 locations; working locations 1, 4, 5.]
T 56 K
[email protected]@[email protected]@[email protected]@TFTF
[email protected]@[email protected]@J995FJF!F

(chosen because its code letter M is first letter of "Mask").
Address must be even and clear of main program.]
T 47 K

[Main program]
Must be even, because of long values at start.]
G K [set @ (theta) for relative addressing]

[Long constants]
T#Z PF TZ [clears sandwich digit between 0 and 1]
 PD PF [long value 1; also low word = short 1]
T2#Z PF T2Z [clears sandwich digit between 2 and 3]
 PF K4096F [long value 1000...000 binary;
also high word = teleprinter null]

[Short constants
The address in the following C order is the (exclusive) end of the bit table.
Must be even: max = 1024, min = M + 72 where M is address of mask table set up above.
Usually 1024, but may be reduced, e.g. to make the program run faster.]
 C1024 D [or e.g. C 326 D to make it much faster]
 U F ['U' = 'T' - 'C']
 K F ['K' = 'S' - 'C']
 H #M [H order for start of mask table]
 H 70#M [used to test for end of mask table]
 P 2 F [constant 4, or 2 in address field]
 P 70 F [constant 140, or 70 in address field]
 @ F [carriage return]
 & F [line feed]

[Short variables]
 P 1 F [p = number under test
Let p = 35*q + r, where 0 <= r < 35]
 P F [4*q]
 P 4 F [4*r]

[Initial values of orders; required only for optional code below.]
 C 70#M [initial value of a variable C order]
 T #M [initial value of a variable T order]
 T 70#M [initial value of a variable T order]


[Enter with acc = 0]

[Optional code to do some initializing at run time.
This code allows the program to run again without being loaded again.]
A 7 @ [initial values of variable orders]
T 65 @
A 16 @
T 66 @
A 17 @
T 44 @
A 18 @
T 52 @

[Initialize variables]
L D [shift left 1]
U 13 @ [p := 2]
L 1 F [shift left 2]
T 15 @ [4*r := 8]
T 14 @ [4*q := 0]
[End of optional code]

[Make table of 35 masks 111...110, 111...101, ..., 011...111
Treat the mask 011...111 separately to avoid accumulator overflow.
Assume acc = 0 here.]
S #@ [acc all 1's]
S 2 #@ [acc := 0111...111]
 T 68 #M [store at high end of mask table]
S #@ [acc := -1]
[Loop shifting the mask right and storing the result in the mask table.
Uses first entry of bit array as temporary store.]
 T F [clear acc]
L D [shift left]
 U 70 #M [update current mask]
 T #M [store it in table (order changed at run time)]
A 44 @ [load preceding T order]
A 9 @ [inc address by 2]
U 44 @ [store back]
S 35 @ [reached high entry yet?]
G 39 @ [loop back if not]

[Initialize bit array: no numbers crossed out, so all bits are 1]
 T F [clear acc]
S #@ [subtract long 1, make top 35 bits all 1's]
 T 70 #M [store as long value, both words all 1's (order changed at run time)]
A 52 @ [load preceding order]
U 52 @ [and store back]
S 5 @ [convert to C order with same address (*)]
S 4 @ [test for end of bit array]
G 50 @ [loop until stored all 1's in bit table]
[(*) Done so that end of bit table can be stored at one place only
in list of constants, i.e. 'C m D' only, not 'T m D' as well.]

[Start of main loop.]
[Testing whether number has been crossed out]
 T F [acc := 0]
A 66 @ [deriving S order from C order]
A 6 @
T 64 @
S #@ [acc := -1]
 S F [acc := 1's complement of bit-table entry (order changed at run time)]
 H #M [mult reg := start of mask array (order changed at run time)]
 C 70#M [acc := -1 iff p (current number) is crossed out (order changed at run time)]
[The next order is to avoid accumulator overflow if acc = max positive number]
E 70 @ [if acc >= 0, jump to process new prime]
A #@ [if acc < 0, add 1 to test for -1]
E 106 @ [if acc now >= 0 number is crossed out, jump to test next]
[Here if new prime found.
Send it to the teleprinter]
 O 11 @ [print CR]
O 12 @ [print LF]
T F [clear acc]
T F [store in C(0) for print routine]
A 75 @ [for subroutine return]
G 56 F [print prime]

[Cross out its multiples by setting corresponding bits to 0]
A 65 @ [load H order above]
T 102 @ [plant in crossing-out loop]
A 66 @ [load C order above]
T 103 @ [plant in crossing-out loop]

[Start of crossing-out loop. Here acc must = 0]
 A 102 @ [load H order below]
A 15 @ [inc address field by 2*r, where p = 35q + r]
U 102 @ [update H order]
S 8 @ [compare with 'H 70 #M']
G 93 @ [skip if not gone beyond end of mask table]
A 102 @ [load H order below]
T 102 @ [update H order]
A 103 @ [load C order below]
T 103 @ [update C order]
 T F [clear acc]
A 103 @ [load C order below]
A 14 @ [inc address field by 2*q, where p = 35q + r]
U 103 @ [update C order]
S 4 @ [test for end of bit array]
E 106 @ [if finished crossing out, loop to test next number]
A 4 @ [restore C order]
A 5 @ [make T order with same address]
T 104 @ [store below]

[Execute the crossing-out orders created above]
 X F [mult reg := mask (order created at run time)]
 X F [acc := logical and with bit-table entry (order created at run time)]
 X F [update entry (order created at run time)]
E 81 @ [loop back with acc = 0]

 T F [clear acc]
A 13 @ [load p = number under test]
T 13 @ [update]
A 15 @ [load 4*r, where p = 35q + r]
U 15 @ [store back (r inc'd by 1)]
S 10 @ [is 4*r now >= 140?]
G 119 @ [no, skip]
T 15 @ [yes, reduce 4*r by 140]
T 14 @ [store back (q inc'd by 1)]
 T F [clear acc]
A 65 @ [load 'H ... D' order, which refers to a mask]
U 65 @ [update order]
S 8 @ [over end of mask table?]
G 59 @ [no, skip wrapround code]
A 7 @ [yes, add constant to wrap round]
T 65 @ [update H order]
A 66 @
A 9 @ [inc address by 2]
U 66 @ [and store back]
S 4 @ [test for end, as defined by C order at start]
G 59 @ [loop back if not at end]

[Finished whole thing]
 O 3 @ [output null to flush teleprinter buffer]
Z F [stop]
E 19 Z [address to start execution]
P F [acc = 0 at start]

Output:
SIEVE OF ERATOSTHENES 2020
BASED ON CODE BY EIITI WADA 2001
2
3
5
7
11
13
17
[...]
12703
12713
12721
12739
12743
12757
12763
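The word-and-mask bookkeeping described in the program comments (35-bit words, a table of masks with a single 0 bit, and a number p written as 35*q + r) can be sketched in Python as follows. This is an illustrative sketch under assumptions, not a transcription of the EDSAC orders: here bit r of word q stands directly for the integer 35*q + r, which may differ from the offsets used in the real program, and the names `sieve_words`, `MASKS` and `WORD_BITS` are made up for this example.

```python
WORD_BITS = 35                              # bits in an EDSAC long word
ALL_ONES = (1 << WORD_BITS) - 1
# One mask per bit position: 34 ones and a single zero.
MASKS = [ALL_ONES ^ (1 << r) for r in range(WORD_BITS)]

def sieve_words(limit):
    """Return the primes <= limit using 35-bit words and single-zero masks."""
    nwords = limit // WORD_BITS + 1
    words = [ALL_ONES] * nwords             # all bits 1: nothing crossed out yet
    primes = []
    for p in range(2, limit + 1):
        q, r = divmod(p, WORD_BITS)         # p = 35*q + r
        if (words[q] & (1 << r)) == 0:
            continue                        # p has been crossed out
        primes.append(p)
        mq, mr = q, r                       # (word, bit) position of the current multiple
        while True:
            mq += q                         # add p in (word, bit) form ...
            mr += r
            if mr >= WORD_BITS:             # ... carrying into the next word
                mr -= WORD_BITS
                mq += 1
            if mq >= nwords:
                break
            words[mq] &= MASKS[mr]          # logical AND with the mask clears one bit
    return primes
```

Stepping the word index and bit index separately avoids any division inside the crossing-out loop.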

Eiffel

Works with: EiffelStudio version 6.6 beta (with provisional loop syntax)
class
APPLICATION

create
make

feature
make
-- Run application.
do
across primes_through (100) as ic loop print (ic.item.out + " ") end
end

primes_through (a_limit: INTEGER): LINKED_LIST [INTEGER]
-- Prime numbers through `a_limit'
require
valid_upper_limit: a_limit >= 2
local
l_tab: ARRAY [BOOLEAN]
do
create Result.make
create l_tab.make_filled (True, 2, a_limit)
across
l_tab as ic
loop
if ic.item then
Result.extend (ic.target_index)
across ((ic.target_index * ic.target_index) |..| l_tab.upper).new_cursor.with_step (ic.target_index) as id
loop
l_tab [id.item] := False
end
end
end
end
end

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Elixir

defmodule Prime do
def eratosthenes(limit \\ 1000) do
sieve = [false, false | Enum.to_list(2..limit)] |> List.to_tuple
check_list = [2 | Stream.iterate(3, &(&1+2)) |> Enum.take(round(:math.sqrt(limit)/2))]
Enum.reduce(check_list, sieve, fn i,tuple ->
if elem(tuple,i) do
clear_num = Stream.iterate(i*i, &(&1+i)) |> Enum.take_while(fn x -> x <= limit end)
clear(tuple, clear_num)
else
tuple
end
end)
end

defp clear(sieve, list) do
Enum.reduce(list, sieve, fn i, acc -> put_elem(acc, i, false) end)
end
end

limit = 199
sieve = Prime.eratosthenes(limit)
Enum.each(0..limit, fn n ->
if x=elem(sieve, n), do: :io.format("~3w", [x]), else: :io.format(" .")
if rem(n+1, 20)==0, do: IO.puts ""
end)
Output:
.  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19
.  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .
. 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59
. 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79
.  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .
.101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .
.  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139
.  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .
.  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179
.181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199

Shorter version (but slow):

defmodule Sieve do
def primes_to(limit), do: sieve(Enum.to_list(2..limit))

defp sieve([h|t]), do: [h|sieve(t -- for n <- 1..length(t), do: h*n)]
defp sieve([]), do: []
end

Alternate much faster odds-only version more suitable for immutable data structures using a (hash) Map

The above code has a very limited useful range because it is very slow: for example, sieving to a million, even after changing the algorithm to odds-only, requires over 800 thousand "copy-on-update" operations on the entire saved immutable tuple ("array") of 500 thousand bytes in size, making it very much a "toy" application. The following code overcomes that problem by using an (immutable/hashed) Map to store the record of the current state of the composite number chains resulting from each of the secondary streams of base primes, which are only 167 in number up to this range; it is a functional "incremental" Sieve of Eratosthenes implementation:

defmodule PrimesSoEMap do
@typep stt :: {integer, integer, integer, Enumerable.integer, %{integer => integer}}

defp advance {n, bp, q, bps?, map} do
bps = if bps? === nil do Stream.drop(oddprms(), 1) else bps? end
nn = n + 2
if nn >= q do
inc = bp + bp
nbps = bps |> Stream.drop(1)
[nbp] = nbps |> Enum.take(1)
advance {nn, nbp, nbp * nbp, nbps, map |> Map.put(nn + inc, inc)}
else if Map.has_key?(map, nn) do
{inc, rmap} = Map.pop(map, nn)
[next] =
Stream.iterate(nn + inc, &(&1 + inc))
|> Stream.drop_while(&(Map.has_key?(rmap, &1))) |> Enum.take(1)
advance {nn, bp, q, bps, Map.put(rmap, next, inc)}
else
{nn, bp, q, bps, map}
end end
end

@spec oddprms() :: Enumerable.integer
defp oddprms do # put first base prime cull seq in Map so never empty
# advance base odd primes to 5 when initialized
init = {7, 5, 25, nil, %{9 => 6}}
[3, 5] # to avoid race, preseed with the first 2 elements...
|> Stream.concat(
Stream.iterate(init, &(advance &1)) # each state's first element is the next odd prime
|> Stream.map(fn {p,_,_,_,_} -> p end))
end

@spec primes() :: Enumerable.integer
def primes do
Stream.concat([2], oddprms())
end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoEMap.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
fn () ->
ans =
PrimesSoEMap.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
ans end
:timer.tc(testfunc)
|> (fn {t,ans} ->
IO.puts "There are #{ans} primes up to #{range}."
IO.puts "This test bench took #{t} microseconds." end).()
Output:
The first 25 primes are:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes up to 1000000.
This test bench took 3811957 microseconds.

The run time of about 3.81 seconds to sieve to one million was measured on a 1.92 Gigahertz CPU, which means that it takes about 93 thousand CPU clock cycles per prime. That is still quite slow compared to mutable data structure implementations, but comparable to "functional" implementations in other languages; the slowness is due to the time needed to calculate the required hashes. One advantage is its O(n log (log n)) asymptotic computational complexity, meaning that it takes not much more than ten times as long to sieve a range ten times higher.

This algorithm could be easily changed to use a Priority Queue (preferably Min-Heap based for the least constant factor computational overhead) to save some of the computation time, but then it will have the same computational complexity as the following code and likely about the same execution time.
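
For readers who want to experiment with the map-based incremental approach outside Elixir, the idea can be sketched in Python roughly as follows. This is only an illustrative sketch, not a translation of the module above: the function name `primes` and the variable names are made up here, and a mutable `dict` stands in for the immutable Map. Each key is an upcoming odd composite and each value is the step (twice a base prime) of the chain that produces it; a new chain is started only when the candidate reaches the square of the next base prime, which comes from a recursive secondary stream.

```python
from itertools import islice, takewhile


def primes():
    """Unbounded odds-only incremental sieve (illustrative sketch)."""
    yield 2
    yield 3
    composites = {}           # upcoming odd composite -> step (2 * base prime)
    base_primes = primes()    # independent secondary stream of base primes
    next(base_primes)         # discard 2; only odd base primes are needed
    bp = next(base_primes)    # current base prime: 3
    q = bp * bp               # its square: 9
    n = 5                     # next odd candidate
    while True:
        step = composites.pop(n, None)
        if n == q:            # reached the square of the current base prime
            step = 2 * bp     # start a new cull chain at n
            bp = next(base_primes)
            q = bp * bp
        if step is None:
            yield n           # n is not on any cull chain, so it is prime
        else:
            nxt = n + step    # advance this chain to its next free slot
            while nxt in composites:
                nxt += step
            composites[nxt] = step
        n += 2


if __name__ == "__main__":
    print(list(islice(primes(), 25)))
    print(sum(1 for _ in takewhile(lambda p: p <= 1_000_000, primes())))
```

Because chains are added only at the square of each base prime, the dictionary holds one entry per base prime up to the square root of the current candidate, which is what keeps the state small.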

Alternate faster odds-only version more suitable for immutable data structures using lazy Streams of Co-Inductive Streams

In order to save the time spent computing the hashes, the following version uses a deferred execution Co-Inductive Stream type (constructed using tuples) in an infinite tree folding structure (by the `pairs` function):

defmodule PrimesSoETreeFolding do
@typep cis :: {integer, (() -> cis)}
@typep ciss :: {cis, (() -> ciss)}

@spec merge(cis, cis) :: cis
defp merge(xs, ys) do
{x, restxs} = xs; {y, restys} = ys
cond do
x < y -> {x, fn () -> merge(restxs.(), ys) end}
y < x -> {y, fn () -> merge(xs, restys.()) end}
true -> {x, fn () -> merge(restxs.(), restys.()) end}
end
end

@spec smlt(integer, integer) :: cis
defp smlt(c, inc) do
{c, fn () -> smlt(c + inc, inc) end}
end

@spec smult(integer) :: cis
defp smult(p) do
smlt(p * p, p + p)
end
@spec allmults(cis) :: ciss
defp allmults {p, restps} do
{smult(p), fn () -> allmults(restps.()) end}
end

@spec pairs(ciss) :: ciss
defp pairs {cs0, restcss0} do
{cs1, restcss1} = restcss0.()
{merge(cs0, cs1), fn () -> pairs(restcss1.()) end}
end

@spec cmpsts(ciss) :: cis
defp cmpsts {cs, restcss} do
{c, restcs} = cs
{c, fn () -> merge(restcs.(), cmpsts(pairs(restcss.()))) end}
end

@spec minusat(integer, cis) :: cis
defp minusat(n, cmps) do
{c, restcs} = cmps
if n < c do
{n, fn () -> minusat(n + 2, cmps) end}
else
minusat(n + 2, restcs.())
end
end

@spec oddprms() :: cis
defp oddprms() do
{3, fn () ->
{5, fn () -> minusat(7, cmpsts(allmults(oddprms()))) end}
end}
end

@spec primes() :: Enumerable.t
def primes do
[2] |> Stream.concat(
Stream.iterate(oddprms(), fn {_, restps} -> restps.() end)
|> Stream.map(fn {p, _} -> p end)
)
end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoETreeFolding.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
fn () ->
ans =
PrimesSoETreeFolding.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
ans end
:timer.tc(testfunc)
|> (fn {t,ans} ->
IO.puts "There are #{ans} primes up to #{range}."
IO.puts "This test bench took #{t} microseconds." end).()

Its output is identical to that of the previous version except that the time required is less than half; however, it has O(n (log n) (log (log n))) asymptotic computational complexity, meaning that it slows down with increasing range faster than the above version. That said, one would have to sieve to billions, taking hours, before the two would take about the same time.

Emacs Lisp

(defun sieve-set (limit)
(let ((xs (make-vector (1+ limit) 0)))
(loop for i from 2 to limit
when (zerop (aref xs i))
collect i
and do (loop for m from (* i i) to limit by i
do (aset xs m 1)))))

Straightforward implementation of sieve of Eratosthenes, 2 times faster:

(defun sieve (limit)
(let ((xs (vconcat [0 0] (number-sequence 2 limit))))
(loop for i from 2 to (sqrt limit)
unless (zerop (aref xs i)) ; 0 is non-nil in Emacs Lisp, so test for it explicitly
do (loop for m from (* i i) to limit by i
do (aset xs m 0)))
(remove 0 xs)))

Erlang

Erlang using Dicts

This example is incorrect. Please fix the code and remove this message. Details: See talk page.

-module( sieve_of_eratosthenes ).

-export( [primes_upto/1] ).

primes_upto( N ) ->
Ns = lists:seq( 2, N ),
Dict = dict:from_list( [{X, potential_prime} || X <- Ns] ),
{Upto_sqrt_ns, _T} = lists:split( erlang:round(math:sqrt(N)), Ns ),
{N, Prime_dict} = lists:foldl( fun find_prime/2, {N, Dict}, Upto_sqrt_ns ),
lists:sort( dict:fetch_keys(Prime_dict) ).

find_prime( N, {Max, Dict} ) -> find_prime( dict:find(N, Dict), N, {Max, Dict} ).

find_prime( error, _N, Acc ) -> Acc;
find_prime( {ok, _Value}, N, {Max, Dict} ) -> {Max, lists:foldl( fun dict:erase/2, Dict, lists:seq(N*N, Max, N) )}.

Output:
35> sieve_of_eratosthenes:primes_upto( 20 ).
[2,3,5,7,11,13,17,19]

Erlang Lists of Tuples, Sloww

A much slower, perverse method, using only lists of tuples. Especially evil is the P = lists:filtermap operation which yields a list for every iteration of the X * M row. Has the virtue of working for any -> N :)

-module( sieve ).
-export( [main/1,primes/2] ).

main(N) -> io:format("Primes: ~w~n", [ primes(2,N) ]).

primes(M,N) -> primes(M, N,lists:seq( M, N ),[]).

primes(M,N,_Acc,Tuples) when M > N/2-> out(Tuples);

primes(M,N,Acc,Tuples) when length(Tuples) < 1 ->
primes(M,N,Acc,[{X, X} || X <- Acc]);

primes(M,N,Acc,Tuples) ->
{SqrtN, _T} = lists:split( erlang:round(math:sqrt(N)), Acc ),
F = Tuples,
Ms = lists:filtermap(fun(X) -> if X > 0 -> {true, X * M}; true -> false end end, SqrtN),
P = lists:filtermap(fun(T) ->
case lists:keymember(T,1,F) of true ->
{true, lists:keyreplace(T,1,F,{T,0})};
_-> false end end, Ms),
AA = mergeT(P,lists:last(P),1 ),
primes(M+1,N,Acc,AA).

mergeT(L,M,Acc) when Acc == length(L) -> M;
mergeT(L,M,Acc) ->
A = lists:nth(Acc,L),
B = M,
Mer = lists:zipwith(fun(X, Y) -> if X < Y -> X; true -> Y end end, A, B),
mergeT(L,Mer,Acc+1).

out(Tuples) ->
Primes = lists:filter( fun({_,Y}) -> Y > 0 end, Tuples),
[ X || {X,_} <- Primes ].

Output:
109> sieve:main(20).
Primes: [2,3,5,7,11,13,17,19]
ok
110> timer:tc(sieve, main, ).
Primes: [2,3,5,7,11,13,17,19]
{129,ok}

Erlang with ordered sets

Since I had written a really odd and slow one, I thought I'd best do a better performer. Inspired by an example from https://github.com/jupp0r

-module(ossieve).
-export([main/1]).

sieve(Candidates,SearchList,Primes,_Maximum) when length(SearchList) == 0 ->
ordsets:union(Primes,Candidates);
sieve(Candidates,SearchList,Primes,Maximum) ->
H = lists:nth(1,string:substr(Candidates,1,1)),
Reduced1 = ordsets:del_element(H, Candidates),
{Reduced2, ReducedSearch} = remove_multiples_of(H, Reduced1, SearchList),
NewPrimes = ordsets:add_element(H, Primes),
sieve(Reduced2, ReducedSearch, NewPrimes, Maximum).

remove_multiples_of(Number,Candidates,SearchList) ->
NewSearchList = ordsets:filter( fun(X) -> X >= Number * Number end, SearchList),
RemoveList = ordsets:filter( fun(X) -> X rem Number == 0 end, NewSearchList),
{ordsets:subtract(Candidates, RemoveList), ordsets:subtract(NewSearchList, RemoveList)}.

main(N) ->
io:fwrite("Creating Candidates...~n"),
CandidateList = lists:seq(3,N,2),
Candidates = ordsets:from_list(CandidateList),
io:fwrite("Sieving...~n"),
ResultSet = ordsets:add_element(2, sieve(Candidates, Candidates, ordsets:new(), N)),
io:fwrite("Sieved... ~w~n",[ResultSet]).

Output:
36> ossieve:main(100).
Creating Candidates...
Sieving...
Sieved... [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok

Erlang Canonical

A pure list comprehension approach.

-module(sieveof).
-export([main/1,primes/1, primes/2]).

main(X) -> io:format("Primes: ~w~n", [ primes(X) ]).

primes(X) -> sieve(range(2, X)).
primes(X, Y) -> remove(primes(X), primes(Y)).

range(X, X) -> [X];
range(X, Y) -> [X | range(X + 1, Y)].

sieve([X]) -> [X];
sieve([H | T]) -> [H | sieve(remove([H * X || X <-[H | T]], T))].

remove(_, []) -> [];
remove([H | X], [H | Y]) -> remove(X, Y);
remove(X, [H | Y]) -> [H | remove(X, Y)].

Output:

> timer:tc(sieve, main, ).
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{7350,ok}
61> timer:tc(sieveof, main, ).
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{363,ok}

Clearly not only more elegant, but faster :) Thanks to http://stackoverflow.com/users/113644/g-b

Erlang ets + cpu distributed implementation

Much faster than the previous Erlang examples.

#!/usr/bin/env escript
%% -*- erlang -*-
%%! -smp enable -sname p10_4
% vim:syn=erlang

-mode(compile).

main([N0]) ->
N = list_to_integer(N0),
ets:new(comp, [public, named_table, {write_concurrency, true} ]),
ets:new(prim, [public, named_table, {write_concurrency, true}]),
composite_mc(N),
primes_mc(N),
io:format("Answer: ~p ~n", [lists:sort([X||{X,_}<-ets:tab2list(prim)])]).

primes_mc(N) ->
case erlang:system_info(schedulers) of
1 -> primes(N);
C -> launch_primes(lists:seq(1,C), C, N, N div C)
end.
launch_primes([1|T], C, N, R) -> P = self(), spawn(fun()-> primes(2,R), P ! {ok, prm} end), launch_primes(T, C, N, R);
launch_primes([H|[]], C, N, R)-> P = self(), spawn(fun()-> primes(R*(H-1)+1,N), P ! {ok, prm} end), wait_primes(C);
launch_primes([H|T], C, N, R) -> P = self(), spawn(fun()-> primes(R*(H-1)+1,R*H), P ! {ok, prm} end), launch_primes(T, C, N, R).

wait_primes(0) -> ok;
wait_primes(C) ->
receive
{ok, prm} -> wait_primes(C-1)
after 1000 -> wait_primes(C)
end.

primes(N) -> primes(2, N).
primes(I,N) when I =< N ->
case ets:lookup(comp, I) of
[] -> ets:insert(prim, {I,1})
;_ -> ok
end,
primes(I+1, N);
primes(I,N) when I > N -> ok.

composite_mc(N) -> composite_mc(N,2,round(math:sqrt(N)),erlang:system_info(schedulers)).
composite_mc(N,I,M,C) when I =< M, C > 0 ->
C1 = case ets:lookup(comp, I) of
[] -> comp_i_mc(I*I, I, N), C-1
;_ -> C
end,
composite_mc(N,I+1,M,C1);
composite_mc(_,I,M,_) when I > M -> ok;
composite_mc(N,I,M,0) ->
receive
{ok, cim} -> composite_mc(N,I,M,1)
after 1000 -> composite_mc(N,I,M,0)
end.

comp_i_mc(J, I, N) ->
Parent = self(),
spawn(fun() ->
comp_i(J, I, N),
Parent ! {ok, cim}
end).

comp_i(J, I, N) when J =< N -> ets:insert(comp, {J, 1}), comp_i(J+I, I, N);
comp_i(J, _, N) when J > N -> ok.

Output:
~/work/mblog/pr_euler/p10$ ./generator.erl 100
97]

Several other Erlang implementations: http://mijkenator.github.io/2015/11/29/project-euler-problem-10/

ERRE

PROGRAM SIEVE_ORG
! --------------------------------------------------
! Eratosthenes Sieve Prime Number Program in BASIC
! (from 3 to SIZE*2), from Byte, September 1981
!---------------------------------------------------
CONST SIZE%=8190

DIM FLAGS%[SIZE%]

BEGIN
PRINT("Only 1 iteration")
COUNT%=0
FOR I%=0 TO SIZE% DO
IF FLAGS%[I%]=TRUE THEN
!$NULL
ELSE
PRIME%=I%+I%+3
K%=I%+PRIME%
WHILE NOT (K%>SIZE%) DO
FLAGS%[K%]=TRUE
K%=K%+PRIME%
END WHILE
PRINT(PRIME%;)
COUNT%=COUNT%+1
END IF
END FOR
PRINT
PRINT(COUNT%;" PRIMES")
END PROGRAM

Output:

last lines of the output screen

15749  15761  15767  15773  15787  15791  15797  15803  15809  15817  15823
15859  15877  15881  15887  15889  15901  15907  15913  15919  15923  15937
15959  15971  15973  15991  16001  16007  16033  16057  16061  16063  16067
16069  16073  16087  16091  16097  16103  16111  16127  16139  16141  16183
16187  16189  16193  16217  16223  16229  16231  16249  16253  16267  16273
16301  16319  16333  16339  16349  16361  16363  16369  16381
1899  PRIMES

Euphoria

constant limit = 1000
sequence flags,primes
flags = repeat(1, limit)
for i = 2 to sqrt(limit) do
if flags[i] then
for k = i*i to limit by i do
flags[k] = 0
end for
end if
end for

primes = {}
for i = 2 to limit do
if flags[i] = 1 then
primes &= i
end if
end for
? primes

Output:

{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,
181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,
277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,
383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,
487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,
601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,
709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,
827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997}

F#

Short with mutable state

let primes max =
let mutable xs = [|2..max|]
let limit = max |> float |> sqrt |> int
for x in [|2..limit|] do
xs <- xs |> Array.except [|x*x..x..max|]
xs

Short Sweet Functional and Idiotmatic

Well lists may not be lazy, but if you call it a sequence then it's a lazy list!

(*
An interesting implementation of The Sieve of Eratosthenes.
Nigel Galloway April 7th., 2017.
*)

let SofE =
let rec fn n g = seq{ match n with
|1 -> yield false; yield! fn g g
|_ -> yield true; yield! fn (n - 1) g}
let rec fg ng = seq {
let g = (Seq.findIndex(id) ng) + 2 // decreasingly inefficient with range at O(n)!
yield g; yield! fn (g - 1) g |> Seq.map2 (&&) ng |> Seq.cache |> fg }
Seq.initInfinite (fun x -> true) |> fg

Output:
> SofE |> Seq.take 10 |> Seq.iter(printfn "%d");;
2
3
5
7
11
13
17
19
23
29

Although interesting intellectually, and although the algorithm is more Sieve of Eratosthenes (SoE) than not in that it uses a progression of composite number representations separated by base prime gaps to cull, it isn't really SoE in performance due to several used functions that aren't linear with range, such as the "findIndex" that scans from the beginning of all primes to find the next un-culled value as the next prime in the sequence and the general slowness and inefficiency of F# nested sequence generation.

It is so slow that it takes in the order of seconds just to find the primes to a thousand!

For practical use, one would be much better served by any of the other functional sieves below, which can sieve to a million in less time than it takes this one to sieve to ten thousand. Those other functional sieves don't take many more lines of code than this one.

Functional

Richard Bird Sieve

This is the idea behind Richard Bird's unbounded code presented in the Epilogue of M. O'Neill's article in Haskell. It is about twice as much code as the Haskell code because F# does not have a built-in lazy list so that the effect must be constructed using a Co-Inductive Stream (CIS) type since no memoization is required, along with the use of recursive functions in combination with sequences. The type inference needs some help with the new CIS type (including selecting the generic type for speed). Note the use of recursive functions to implement multiple non-sharing delayed generating base primes streams, which along with these being non-memoizing means that the entire primes stream is not held in memory as for the original Bird code:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesBird() =
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
let pmltpls p = let rec nxt c = CIS(c, fun() -> nxt (c + p)) in nxt (p * p)
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(c, fun() -> (ctlf()) ^^ (cmpsts (amstlf())))
let rec minusat n (CIS(c, ctlf) as cs) =
if n < c then CIS(n, fun() -> minusat (n + 1u) cs)
else minusat (n + 1u) (ctlf())
let rec baseprms() = CIS(2u, fun() -> baseprms() |> allmltps |> cmpsts |> minusat 3u)
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (baseprms())

The above code sieves all numbers of two and up including all even numbers as per the page specification; the following code makes the very minor changes for an odds-only sieve, with a speedup of over a factor of two:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesBirdOdds() =
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
let pmltpls p = let adv = p + p
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(c, fun() -> ctlf() ^^ cmpsts (amstlf()))
let rec minusat n (CIS(c, ctlf) as cs) =
if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
else minusat (n + 2u) (ctlf())
let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))

Tree Folding Sieve

The above code is still somewhat inefficient as it operates on a linear right extending structure that deepens linearly with increasing base primes (those up to the square root of the currently sieved number); the following code changes the structure into an infinite binary tree-like folding by combining each pair of prime composite streams before further processing as usual - this decreases the processing by approximately a factor of log n:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesTreeFold() =
let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
if x < y then CIS(x, fun() -> xtlf() ^^ ys)
elif y < x then CIS(y, fun() -> xs ^^ ytlf())
else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
let pmltpls p = let adv = p + p
let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
let rec pairs (CIS(cs0, cs0tlf)) =
let (CIS(cs1, cs1tlf)) = cs0tlf() in CIS(cs0 ^^ cs1, fun() -> pairs (cs1tlf()))
let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
CIS(c, fun() -> ctlf() ^^ (cmpsts << pairs << amstlf)())
let rec minusat n (CIS(c, ctlf) as cs) =
if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
else minusat (n + 2u) (ctlf())
let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))

The above code is over four times faster than the "BirdOdds" version (at least 10x faster than the first, "primesBird", producing the millionth prime) and is moderately useful for a range of the first million primes or so.

Priority Queue Sieve

In order to investigate Priority Queue Sieves as espoused by O'Neill in the referenced article, one must find an equivalent implementation of a Min Heap Priority Queue as used by her. There is such a purely functional implementation in RosettaCode translated from the Haskell code she used, from which the essential parts are duplicated here (note that the key value is given an integer type in order to avoid the inefficiency of F# in generic comparison):

[<RequireQualifiedAccess>]
module MinHeap =

type HeapEntry<'V> = struct val k:uint32 val v:'V new(k,v) = {k=k;v=v} end
[<CompilationRepresentation(CompilationRepresentationFlags.UseNullAsTrueValue)>]
[<NoEquality; NoComparison>]
type PQ<'V> =
| Mt
| Br of HeapEntry<'V> * PQ<'V> * PQ<'V>

let empty = Mt

let peekMin = function | Br(kv, _, _) -> Some(kv.k, kv.v)
| _ -> None

let rec push wk wv =
function | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)
| Br(vkv, ll, rr) ->
if wk <= vkv.k then
Br(HeapEntry(wk, wv), push vkv.k vkv.v rr, ll)
else Br(vkv, push wk wv rr, ll)

let private siftdown wk wv pql pqr =
let rec sift pl pr =
match pl with
| Mt -> Br(HeapEntry(wk, wv), Mt, Mt)
| Br(vkvl, pll, plr) ->
match pr with
| Mt -> if wk <= vkvl.k then Br(HeapEntry(wk, wv), pl, Mt)
else Br(vkvl, Br(HeapEntry(wk, wv), Mt, Mt), Mt)
| Br(vkvr, prl, prr) ->
if wk <= vkvl.k && wk <= vkvr.k then Br(HeapEntry(wk, wv), pl, pr)
elif vkvl.k <= vkvr.k then Br(vkvl, sift pll plr, pr)
else Br(vkvr, pl, sift prl prr)
sift pql pqr

let replaceMin wk wv = function | Mt -> Mt
| Br(_, ll, rr) -> siftdown wk wv ll rr

Except as noted for any individual code, all of the following codes need the following prefix code in order to implement the non-memoizing Co-Inductive Streams (CIS's) and to set the type of particular constants used in the codes to the same type as the "Prime" type:

type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint32
let frstprm = 2u
let frstoddprm = 3u
let inc1 = 1u
let inc = 2u

The F# equivalent to O'Neill's "odds-only" code is then implemented as follows. It needs the changed prefix included with it in order to change the primes type to a larger one to prevent overflow (the key type for the MinHeap also needs to be changed from uint32 to uint64); it is functionally the same as the O'Neill code other than minor changes to suit the use of CIS streams and the option output of the "peekMin" function:

type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint64
let frstprm = 2UL
let frstoddprm = 3UL
let inc = 2UL

let primesPQ() =
let pmult p (xs: CIS<Prime>) = // does map (* p) xs
let rec nxtm (cs: CIS<Prime>) =
CIS(p * cs.v, fun() -> nxtm (cs.cont())) in nxtm xs
let insertprime p xs table =
MinHeap.push (p * p) (pmult p xs) table
let rec sieve' (ns: CIS<Prime>) table =
let nextcomposite = match MinHeap.peekMin table with
| None -> ns.v // never happens
| Some (k, _) -> k
let rec adjust table = // advance every queued stream whose head is <= ns.v
let (n, advs) = match MinHeap.peekMin table with
| None -> (ns.v, ns.cont()) // never happens
| Some kv -> kv
if n <= ns.v then adjust (MinHeap.replaceMin advs.v (advs.cont()) table)
else table
if nextcomposite <= ns.v then sieve' (ns.cont()) (adjust table)
else let n = ns.v in CIS(n, fun() ->
let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns table))
let rec sieve (ns: CIS<Prime>) = let n = ns.v in CIS(n, fun() ->
let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns MinHeap.empty))
let odds = // is the odds CIS from 3 up
let rec nxto i = CIS(i, fun() -> nxto (i + inc)) in nxto frstoddprm
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
(CIS(frstprm, fun() -> (sieve odds)))

However, that algorithm suffers in speed and memory use due to over-eager adding of prime composite streams to the queue, such that the queue is much larger than it needs to be and a Prime type with a much larger numeric range must be used in order to avoid overflow on the square of the prime added to the queue. The following code corrects that by using a secondary stream (actually multiple streams) of base primes, constrained to primes no larger than the square root of the currently sieved number; this permits the use of much smaller Prime types as per the default prefix:

let primesPQx() =
let rec nxtprm n pq q (bps: CIS<Prime>) =
if n >= q then let bp = bps.v in let adv = bp + bp
let nbps = bps.cont() in let nbp = nbps.v
nxtprm (n + inc) (MinHeap.push (n + adv) adv pq) (nbp * nbp) nbps
else let ck, cv = match MinHeap.peekMin pq with
| None -> (q, inc) // only happens until first insertion
| Some kv -> kv
if n >= ck then let rec adjpq ck cv pq =
let npq = MinHeap.replaceMin (ck + cv) cv pq
match MinHeap.peekMin npq with
| None -> npq // never happens
| Some(nk, nv) -> if n >= nk then adjpq nk nv npq
else npq
nxtprm (n + inc) (adjpq ck cv pq) q bps
else CIS(n, fun() -> nxtprm (n + inc) pq q bps)
let rec oddprms() = CIS(frstoddprm, fun() ->
nxtprm (frstoddprm + inc) MinHeap.empty (frstoddprm * frstoddprm) (oddprms()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
(CIS(frstprm, fun() -> (oddprms())))

The above code is well over five times faster than the previous translated O'Neill version for the given variety of reasons.

Although slightly faster than the Tree Folding code, this latter code is also limited in practical usefulness to about the first one to ten million primes or so.
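
The postponed priority queue idea described above (add a base prime's cull chain to the queue only when the candidate reaches that prime's square) can also be sketched in Python using the standard `heapq` module. This is a hedged illustration only, not the F# algorithm line for line: the name `primes_pq` is invented here, and a mutable binary heap replaces the purely functional MinHeap. Heap entries are `(next composite, step)` pairs, with the step being twice the base prime so that only odd multiples are visited.

```python
import heapq
from itertools import islice


def primes_pq():
    """Unbounded odds-only sieve with a postponed priority queue (sketch)."""
    yield 2
    yield 3
    heap = []                    # entries are (next composite, step)
    base_primes = primes_pq()    # recursive secondary stream of base primes
    next(base_primes)            # discard 2
    bp = next(base_primes)       # current base prime: 3
    q = bp * bp                  # its square: 9
    n = 5                        # next odd candidate
    while True:
        if n == q:
            # n is the square of the current base prime: queue its chain
            step = 2 * bp
            heapq.heappush(heap, (n + step, step))
            bp = next(base_primes)
            q = bp * bp
        elif heap and heap[0][0] == n:
            # n is composite: advance every chain currently sitting at n
            while heap[0][0] == n:
                c, step = heap[0]
                heapq.heapreplace(heap, (c + step, step))
        else:
            yield n
        n += 2


if __name__ == "__main__":
    print(list(islice(primes_pq(), 10)))
```

Using `heapreplace` instead of a pop followed by a push mirrors the role of `replaceMin` in the F# code: the minimum entry is swapped for its successor with a single sift.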

All of the above codes can be tested in the F# REPL with the following to produce the millionth prime (the "nth" function is zero based):

> primesXXX() |> Seq.nth 999999;;

where primesXXX() is replaced by the given primes generating function to be tested, and which all produce the following output (after a considerable wait in some cases):

Output:
val it : Prime = 15485863u

Imperative

The following code is written in functional style other than it uses a mutable bit array to sieve the composites:

let primes limit =
let buf = System.Collections.BitArray(int limit + 1, true)
let cull p = { p * p .. p .. limit } |> Seq.iter (fun c -> buf.[int c] <- false)
{ 2u .. uint32 (sqrt (double limit)) } |> Seq.iter (fun c -> if buf.[int c] then cull c)
{ 2u .. limit } |> Seq.map (fun i -> if buf.[int i] then i else 0u) |> Seq.filter ((<>) 0u)

[<EntryPoint>]
let main argv =
if argv = null || argv.Length = 0 then failwith "no command line argument for limit!!!"
printfn "%A" (primes (System.UInt32.Parse argv.[0]) |> Seq.length)
0 // return an integer exit code

Substituting the following minor changes to the code for the "primes" function will only deal with the odd prime candidates for a speed up of over a factor of two as well as a reduction of the buffer size by a factor of two:

let primes limit =
let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
let buf = System.Collections.BitArray(int lmtb + 1, true)
let cull i = let p = i + i + 3u in let s = p * (i + 1u) + i in
{ s .. p .. lmtb } |> Seq.iter (fun c -> buf.[int c] <- false)
{ 0u .. lmtbsqrt } |> Seq.iter (fun i -> if buf.[int i] then cull i )
let oddprms = { 0u .. lmtb } |> Seq.map (fun i -> if buf.[int i] then i + i + 3u else 0u)
|> Seq.filter ((<>) 0u)
seq { yield 2u; yield! oddprms }
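
The index arithmetic in the odds-only version is compact enough to be easy to misread, so here is a small Python sketch of the same mapping, written under the assumption that buffer index i stands for the odd number p = 2i + 3. With that mapping the index of p * p is (p * p - 3) / 2 = 2i^2 + 6i + 3, which equals p * (i + 1) + i, the start index `s` used above. The function name `primes_odds_only` is hypothetical.

```python
from math import isqrt


def primes_odds_only(limit):
    """Bounded odds-only sieve; index i represents the odd number 2*i + 3."""
    if limit < 2:
        return []
    if limit < 3:
        return [2]
    top = (limit - 3) // 2                 # index of the largest odd candidate
    is_prime = bytearray([1]) * (top + 1)
    top_sqrt = (isqrt(limit) - 3) // 2     # last index whose prime needs culling
    for i in range(top_sqrt + 1):
        if is_prime[i]:
            p = i + i + 3
            s = p * (i + 1) + i            # index of p * p
            for c in range(s, top + 1, p): # stepping p indices skips 2 * p numbers
                is_prime[c] = 0
    return [2] + [i + i + 3 for i in range(top + 1) if is_prime[i]]


if __name__ == "__main__":
    print(primes_odds_only(100))
```

Note that Python's floor division makes `top_sqrt` negative for limits below 9, so the culling loop simply does not run; with unsigned arithmetic the same expression would need a guard for such small limits.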

The following code uses other functional forms for the inner culling loops of the "primes" function to reduce the use of inefficient sequences, reducing the execution time by another factor of almost three:

let primes limit =
let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
let buf = System.Collections.BitArray(int lmtb + 1, true)
let rec culltest i = if i <= lmtbsqrt then
let p = i + i + 3u in let s = p * (i + 1u) + i in
let rec cullp c = if c <= lmtb then buf.[int c] <- false; cullp (c + p)
(if buf.[int i] then cullp s); culltest (i + 1u) in culltest 0u
seq {yield 2u; for i = 0u to lmtb do if buf.[int i] then yield i + i + 3u }

Now much of the remaining execution time is just the time to enumerate the primes as can be seen by turning "primes" into a primes counting function by substituting the following for the last line in the above code doing the enumeration; this makes the code run about a further five times faster:

let rec count i acc =
if i > int lmtb then acc else if buf.[i] then count (i + 1) (acc + 1) else count (i + 1) acc
count 0 1

Since the final enumeration of primes is the main remaining bottleneck, it is worth using a "roll-your-own" enumeration implemented as an object expression so as to save many inefficiencies in the use of the built-in seq computational expression by substituting the following code for the last line of the previous codes, which will decrease the execution time by a factor of over three (instead of almost five for the counting-only version, making it almost as fast):

let nmrtr() =
let i = ref -2
let rec nxti() = i:=!i + 1;if !i <= int lmtb && not buf.[!i] then nxti() else !i <= int lmtb
let inline curr() = if !i < 0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
else let v = uint32 !i in v + v + 3u
{ new System.Collections.Generic.IEnumerator<_> with
member this.Current = curr()
interface System.Collections.IEnumerator with
member this.Current = box (curr())
member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"
interface System.IDisposable with
member this.Dispose() = () }
{ new System.Collections.Generic.IEnumerable<_> with
member this.GetEnumerator() = nmrtr()
interface System.Collections.IEnumerable with
member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }

The various optimization techniques shown here can be used "jointly and severally" on any of the basic versions for various trade-offs between code complexity and performance. Not shown here are other techniques of making the sieve faster, including extending wheel factorization to much larger wheels such as 2/3/5/7, pre-culling the arrays, page segmentation, and multi-processing.

Almost functional Unbounded

The following odds-only implementations are written in an almost functional style, avoiding the use of mutability except for the contents of the data structures used to hold the state of the sieve and any mutability necessary to implement a "roll-your-own" IEnumerable iterator interface for speed.

Unbounded Dictionary (Mutable Hash Table) Based Sieve

The following code uses the DotNet Dictionary class instead of the above functional Priority Queue to implement the sieve; as average (amortized) hash table access is O(1) rather than O(log n) as for the priority queue, this implementation is slightly faster than the priority queue version for the first million primes and will always be faster for any range above some low range value:

type Prime = uint32
let frstprm = 2u
let frstoddprm = 3u
let inc = 2u
let primesDict() =
let dct = System.Collections.Generic.Dictionary()
let rec nxtprm n q (bps: CIS<Prime>) =
if n >= q then let bp = bps.v in let adv = bp + bp
let nbps = bps.cont() in let nbp = nbps.v
dct.Add(n + adv, adv) // start the cull chain for this base prime
nxtprm (n + inc) (nbp * nbp) nbps
else if dct.ContainsKey(n) then
let adv = dct.[n]
dct.Remove(n) |> ignore
// let mutable nn = n + adv // ugly imperative code
// while dct.ContainsKey(nn) do nn <- nn + adv
// dct.Add(nn, adv)
let rec nxtmt k = // advance to next empty spot
if dct.ContainsKey(k) then nxtmt (k + adv)
else dct.Add(k, adv) in nxtmt (n + adv)
nxtprm (n + inc) q bps
else CIS(n, fun() -> nxtprm (n + inc) q bps)
let rec oddprms() = CIS(frstoddprm, fun() ->
nxtprm (frstoddprm + inc) (frstoddprm * frstoddprm) (oddprms()))
Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
(CIS(frstprm, fun() -> (oddprms())))

The above code uses functional forms of code (with the imperative style commented out to show how it could be done imperatively) and also uses a recursive non-sharing secondary source of base primes just as for the Priority Queue version. As for the functional codes, the Primes type can easily be changed to "uint64" for wider range of sieving.

In spite of having true O(n log log n) Sieve of Eratosthenes computational complexity where n is the range of numbers to be sieved, the above code is still not particularly fast due to the time required to compute the hash values and manipulations of the hash table.

Unbounded Page-Segmented Bit-Packed Odds-Only Mutable Array Sieve

Note that the following code is used for the F# entry Extensible_prime_generator#Unbounded_Mutable_Array_Generator of the Extensible prime generator page.

All of the above unbounded implementations including the above Dictionary based version are quite slow due to their large constant factor computational overheads, making them more of an intellectual exercise than something practical, especially when larger sieving ranges are required. The following code implements an unbounded page segmented version of the sieve in not that many more lines of code, yet runs about 25 times faster than the Dictionary version for larger ranges of sieving such as to one billion; it uses functional forms without mutability other than for the contents of the arrays and the `primes` enumeration generator function that must use mutability for speed:

type Prime = float // use uint64/int64 for regular 64-bit F#
type private PrimeNdx = float // they are slow in JavaScript polyfills

let inline private prime n = float n // match these convenience conversions
let inline private primendx n = float n // with the types above!

let private cPGSZBTS = (1 <<< 14) * 8 // sieve buffer size in bits = CPUL1CACHE

type private SieveBuffer = uint8[]

/// a Co-Inductive Stream (CIS) of an "infinite" non-memoized series...
type private CIS<'T> = CIS of 'T * (unit -> CIS<'T>) //' apostrophe formatting adjustment

/// lazy list (memoized) series of base prime page arrays...
type private BasePrime = uint32
type private BasePrimeArr = BasePrime[]
type private BasePrimeArrs = BasePrimeArrs of BasePrimeArr * Option<Lazy<BasePrimeArrs>>

/// Masking array is faster than bit twiddle bit shifts!
let private cBITMASK = [| 1uy; 2uy; 4uy; 8uy; 16uy; 32uy; 64uy; 128uy |]

let private cullSieveBuffer lwi (bpas: BasePrimeArrs) (sb: SieveBuffer) =
  let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
  let rec loopbp (BasePrimeArrs(bpa, bpatl) as ibpas) i =
    if i >= bpa.Length then
      match bpatl with
      | None -> ()
      | Some lv -> loopbp lv.Value 0 else
    let bp = prime bpa.[i] in let bpndx = primendx ((bp - prime 3) / prime 2)
    let s = (bpndx * primendx 2) * (bpndx + primendx 3) + primendx 3 in let bpint = int bp
    if s <= lmti then
      let s0 = // page cull start address calculation...
        if s >= lwi then int (s - lwi) else
        let r = (lwi - s) % (primendx bp)
        if r = primendx 0 then 0 else int (bp - prime r)
      let slmt = min btlmt (s0 - 1 + (bpint <<< 3))
      let rec loopc c = // loop "unpeeling" is used so
        if c <= slmt then // a constant mask can be used over the inner loop
          let msk = cBITMASK.[c &&& 7]
          let rec loopw w =
            if w < sb.Length then sb.[w] <- sb.[w] ||| msk; loopw (w + bpint)
          loopw (c >>> 3); loopc (c + bpint)
      loopc s0; loopbp ibpas (i + 1) in loopbp bpas 0

/// fast Counting Look Up Table (CLUT) for pop counting...
let private cCLUT =
  let arr = Array.zeroCreate 65536
  let rec popcnt n cnt = if n > 0 then popcnt (n &&& (n - 1)) (cnt + 1) else uint8 cnt
  let rec loop i = if i < 65536 then arr.[i] <- popcnt i 0; loop (i + 1)
  loop 0; arr

let countSieveBuffer ndxlmt (sb: SieveBuffer): int =
  let lstw = (ndxlmt >>> 3) &&& -2
  let msk = (-2 <<< (ndxlmt &&& 15)) &&& 0xFFFF
  let inline cntem i m =
    int cCLUT.[int (((uint32 sb.[i + 1]) <<< 8) + uint32 sb.[i]) ||| m]
  let rec loop i cnt =
    if i >= lstw then cnt - cntem lstw msk else loop (i + 2) (cnt - cntem i 0)
  loop 0 ((lstw <<< 3) + 16)

/// a CIS series of pages from the given start index with the given SieveBuffer size,
/// and provided with a polymorphic converter function to produce
/// any type of result from the culled page parameters...
let rec private makePrimePages strtwi btsz
                               (cnvrtrf: PrimeNdx -> SieveBuffer -> 'T): CIS<'T> =
  let bpas = makeBasePrimes() in let sb = Array.zeroCreate (btsz >>> 3)
  let rec nxtpg lwi =
    Array.fill sb 0 sb.Length 0uy; cullSieveBuffer lwi bpas sb
    CIS(cnvrtrf lwi sb, fun() -> nxtpg (lwi + primendx btsz))
  nxtpg strtwi

/// secondary feed of lazy list of memoized pages of base primes...
and private makeBasePrimes(): BasePrimeArrs =
  let sb2bpa lwi (sb: SieveBuffer) =
    let bsbp = uint32 (primendx 3 + lwi + lwi)
    let arr = Array.zeroCreate <| countSieveBuffer 255 sb
    let rec loop i j =
      if i < 256 then
        if sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy then loop (i + 1) j
        else arr.[j] <- bsbp + uint32 (i + i); loop (i + 1) (j + 1)
    loop 0 0; arr
  // finding the first page as not part of the loop and making succeeding
  // pages lazy breaks the recursive data race!
  let frstsb = Array.zeroCreate 32
  let fkbpas = BasePrimeArrs(sb2bpa (primendx 0) frstsb, None)
  cullSieveBuffer (primendx 0) fkbpas frstsb
  let rec nxtbpas (CIS(bpa, tlf)) = BasePrimeArrs(bpa, Some(lazy (nxtbpas (tlf()))))
  BasePrimeArrs(sb2bpa (primendx 0) frstsb,
                Some(lazy (nxtbpas <| makePrimePages (primendx 256) 256 sb2bpa)))

/// produces a generator of primes; uses mutability for better speed...
let primes(): unit -> Prime =
  let sb2prms lwi (sb: SieveBuffer) = lwi, sb in let mutable ndx = -1
  let (CIS((nlwi, nsb), npgtlf)) = // use page generator function above!
    makePrimePages (primendx 0) cPGSZBTS sb2prms
  let mutable lwi = nlwi in let mutable sb = nsb
  let mutable pgtlf = npgtlf
  let mutable baseprm = prime 3 + prime (lwi + lwi)
  fun() ->
    if ndx < 0 then ndx <- 0; prime 2 else
    let inline notprm i = sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy
    while ndx < cPGSZBTS && notprm ndx do ndx <- ndx + 1
    if ndx >= cPGSZBTS then // get next page if over
      let (CIS((nlwi, nsb), npgtlf)) = pgtlf() in ndx <- 0
      lwi <- nlwi; sb <- nsb; pgtlf <- npgtlf
      baseprm <- prime 3 + prime (lwi + lwi)
      while notprm ndx do ndx <- ndx + 1
    let ni = ndx in ndx <- ndx + 1 // ready for next call!
    baseprm + prime (ni + ni)

let countPrimesTo (limit: Prime): int = // much faster!
  if limit < prime 3 then (if limit < prime 2 then 0 else 1) else
  let topndx = (limit - prime 3) / prime 2 |> primendx
  let sb2cnt lwi (sb: SieveBuffer) =
    let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
    countSieveBuffer
      (if lmti < topndx then btlmt else int (topndx - lwi)) sb, lmti
  let rec loop (CIS((cnt, nxti), tlf)) count =
    if nxti < topndx then loop (tlf()) (count + cnt)
    else count + cnt
  loop <| makePrimePages (primendx 0) cPGSZBTS sb2cnt <| 1

/// sequences are convenient but slow...
let primesSeq() = primes() |> Seq.unfold (fun gen -> Some(gen(), gen))
printfn "The first 25 primes are:  %s"
  ( primesSeq() |> Seq.take 25
      |> Seq.fold (fun s p -> s + string p + " ") "" )
printfn "There are %d primes up to a million."
  ( primesSeq() |> Seq.takeWhile ((>=) (prime 1000000)) |> Seq.length )

let rec cntto gen lmt cnt = // faster than seq's but still slow
  if gen() > lmt then cnt else cntto gen lmt (cnt + 1)

let limit = prime 1_000_000_000
let start = System.DateTime.Now.Ticks
// let answr = cntto (primes()) limit 0 // slower way!
let answr = countPrimesTo limit // over twice as fast way!
let elpsd = (System.DateTime.Now.Ticks - start) / 10000L
printfn "Found %d primes to %A in %d milliseconds." answr limit elpsd
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Found 50847534 primes to 1000000000 in 2161 milliseconds.

As with all of the efficient unbounded sieves, the above code uses a secondary enumerator of the base primes less than the square root of the currently culled range, which in this case is a lazy (deferred memoized evaluation) binding by small pages of base primes; the deferral of subsequent pages is also what avoids a race condition.

The above code is intended to output the "uint64" type for very large ranges of primes, since there is little computational cost to doing this for this algorithm when used with 64-bit compilation. However, when the code is transpiled to JavaScript by Fable, a 64-bit integer polyfill is very slow, so the listing declares `Prime` as a float instead; a 64-bit float represents contiguous integers exactly up to 2^53 (the 52-bit stored mantissa plus an implicit leading bit), which is ample for this purpose. As written, the practical range for this sieve is about 16 billion; however, it can be extended to about 10^14 (a week or two of execution time) by setting the `cPGSZBTS` constant to the size of the CPU L2 cache rather than the L1 cache (L2 is up to about two Megabytes for modern high end desktop CPUs) at a slight loss of efficiency (a factor of up to two or so) per composite number culling operation due to the slower memory access time. When the Fable compilation option is used, execution speed is roughly the same as using F# with DotNet Core.
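The integer range available when floats stand in for integers can be checked directly. The short Python sketch below relies on Python floats being IEEE 754 doubles, the same format as a JavaScript Number; the variable names are illustrative only.

```python
LIMIT = 2 ** 53  # 52 stored mantissa bits plus the implicit leading bit

# Every integer up to 2**53 survives a round trip through a double...
exact_up_to_limit = float(LIMIT - 1) == LIMIT - 1 and float(LIMIT) == LIMIT
# ...but 2**53 + 1 is rounded back to 2**53, so contiguity ends there.
first_lost = float(LIMIT + 1) == float(LIMIT)

print(exact_up_to_limit, first_lost)
```

Since 2^53 is about 9 × 10^15, the float representation is not the limiting factor for the ranges discussed here.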

Even with the custom `primes` enumerator generator (the F#/Fable built-in sequence operators are terribly inefficient), the time to enumerate the resulting primes is longer than the time to actually cull the composite numbers from the sieving arrays. The actual culling is thus over 50 times faster than with the Dictionary version. The slowness of enumeration, no matter what further tweaks are done to improve it (each value enumerated will always take a function call and a scan loop, together something in the order of 100 CPU clock cycles per value), means that further gains in speed using extreme wheel factorization and multi-processing have little point unless the actual work on the resulting primes is done through auxiliary functions that do not use iteration. Such a function is provided here to count the primes by pages using a "pop count" look up table, which reduces the counting time to only a small fraction of a second.
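The "pop count" look up table idea can be sketched in a few lines of Python. This is an illustration of the technique only, not a translation of `countSieveBuffer`: it counts upward with a table of zero-bit counts (as the Forth entries on this page do) rather than subtracting one-bit counts from a total, and the names `ZERO_BITS_16` and `count_zero_bits` are invented for this example.

```python
# Table of zero-bit counts for every 16-bit value (65536 one-byte entries).
ZERO_BITS_16 = bytes(16 - bin(v).count("1") for v in range(1 << 16))


def count_zero_bits(buf, last_bit):
    """Count zero bits at bit indices 0..last_bit of a little-endian
    bit-packed buffer whose length is an even number of bytes."""
    last_word = (last_bit >> 3) & ~1          # byte offset of the word holding last_bit
    count = 0
    for i in range(0, last_word, 2):          # whole 16-bit words before the last one
        count += ZERO_BITS_16[buf[i] | (buf[i + 1] << 8)]
    mask = (0xFFFE << (last_bit & 15)) & 0xFFFF   # ones for every bit above last_bit
    word = buf[last_word] | (buf[last_word + 1] << 8)
    return count + ZERO_BITS_16[word | mask]  # masked-off bits count as non-zero


page = bytearray(8)          # 64 flags, all zero (all "prime")
page[0] |= 1 << 5            # mark bit 5 composite
print(count_zero_bits(page, 63))
```

Counting sixteen flags per table lookup is what makes counting so much cheaper than enumerating the primes one at a time.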

Factor

Factor already contains two implementations of the sieve of Eratosthenes in math.primes.erato and math.primes.erato.fast. It is suggested to use one of them for real use, as they use faster types, faster unsafe arithmetic, and/or wheels to speed up the sieve further. Shown here is a more straightforward implementation that adheres to the restrictions given by the task (namely, no wheels).

Factor is pleasantly multiparadigm. Usually, it's natural to write more functional or declarative code in Factor, but this is an instance where it is more natural to write imperative code. Lexical variables are useful here for expressing the necessary mutations in a clean way.

USING: bit-arrays io kernel locals math math.functions
math.ranges prettyprint sequences ;
IN: rosetta-code.sieve-of-erato

<PRIVATE

: init-sieve ( n -- seq )  ! Include 0 and 1 for easy indexing.
1 - <bit-array> dup set-bits ?{ f f } prepend ;

! Given the sieve and a prime starting index, create a range of
! values to mark composite. Start at the square of the prime.
: to-mark ( seq n -- range )
[ length 1 - ] [ dup dup * ] bi* -rot <range> ;

! Mark multiples of prime n as composite.
: mark-nths ( seq n -- )
dupd to-mark [ swap [ f ] 2dip set-nth ] with each ;

: next-prime ( index seq -- n ) [ t = ] find-from drop ;

PRIVATE>

:: sieve ( n -- seq )
n sqrt 2 n init-sieve :> ( limit i! s )
[ i limit < ]  ! sqrt optimization
[ s i mark-nths i 1 + s next-prime i! ] while t s indices ;

: sieve-demo ( -- )
"Primes up to 120 using sieve of Eratosthenes:" print
120 sieve . ;

MAIN: sieve-demo

FOCAL

1.2 A N
1.3 I (2047-N)5.1
1.4 D 2
1.5 Q

2.1 F X=2,FSQT(N); D 3
2.2 F W=2,N; I (SIEVE(W)-2)4.1

3.1 I (-SIEVE(X))3.3
3.2 F Y=X*X,X,N; S SIEVE(Y)=2
3.3 R

4.1 T %4.0,W,!

5.1 T "PLEASE ENTER A NUMBER LESS THAN 2048."!; G 1.1

Note that with the 4k paper tape version of FOCAL, the program will run out of memory for N>190 or so.

Forth

: prime? ( n -- ? ) here + c@ 0= ;
: composite! ( n -- ) here + 1 swap c! ;

: sieve ( n -- )
here over erase
2
begin
2dup dup * >
while
dup prime? if
2dup dup * do
i composite!
dup +loop
then
1+
repeat
drop
." Primes: " 2 do i prime? if i . then loop ;

100 sieve
Output:
Primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Alternate Odds-Only, Better Style

The above code is not really very good Forth style: the initialization, sieving, and output are all in one `sieve` routine, which makes it difficult to understand and refactor. Forth code is normally written as a series of very small routines, which makes it easier to follow what is happening on the data stack, since standard Forth words work on anonymous stack items rather than the named, re-entrant local variables that most other languages provide (and that those languages normally also store on a stack). The code also uses the `HERE` pointer, which points to the next available memory in user space after all compilation is done, as an unsized buffer pointer; as it does not reserve that space for the sieving buffer, other concatenated routines can change the buffer in unexpected ways. It is better to allocate the sieving buffer as required from the available space at the time the routines are run and to pass that address between concatenated functions until a finalization function frees the memory and clears the stack; this is equivalent to allocating from the "heap" in other languages. The below code demonstrates these ideas:

: prime? ( addr -- ? ) C@ 0= ; \ test composites array for prime

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
DO 2 PICK I + TRUE SWAP C! DUP +LOOP DROP ;

\ process for required prime limit; allocate and initialize returned buffer...
: initsieve ( u -- u a-addr)
3 - DUP 0< IF 0 ELSE
1 RSHIFT 1+ DUP ALLOCATE 0<> IF ABORT" Memory allocation error!!!"
ELSE 2DUP SWAP ERASE THEN
THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
  0 \ initialize test index i -- numv bufa i
  BEGIN \ test prime square index < limit
    DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (i+3) + 3
  WHILE \ -- numv bufa sqri i
    2 PICK OVER + prime? IF cullpi! \ -- numv bufa i
    ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
  REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ print primes to given limit...
: .primes ( u a-addr -- )
OVER 0< IF DROP 2 - 0< IF ( ." No primes!" ) ELSE ( ." Prime: 2" ) THEN
ELSE ." Primes: 2 " SWAP 0
DO DUP I + prime? IF I I + 3 + . THEN LOOP FREE DROP THEN ;

\ count number of primes found for the number of odd numbers within
\ given presumed sieved buffer starting at address...
: countprimes@ ( u a-addr -- u )
  SWAP DUP 0< IF 1+ 0< IF DROP 0 ELSE 1 THEN
  ELSE 1 SWAP \ -- bufa cnt numv
    0 DO OVER I + prime? IF 1+ THEN LOOP SWAP FREE DROP
  THEN ;

\ shows counted number of primes to the given limit...
: .countprimesto ( u -- )
  DUP initsieve sieve countprimes@
  CR ." Found " . ." primes Up to the " . ." limit." ;

\ testing the code...
100 initsieve sieve .primes
1000000 .countprimesto
Output:
Primes:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes Up to the 1000000 limit.

As well as solving the stated problems, which makes the code much easier to understand and refactor, an odds-only sieve takes half the space and less than half the time.
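The index arithmetic behind the odds-only buffer is easy to lose among the stack operations, so here is a minimal Python sketch of the same idea, assuming (as the Forth code does) that buffer index i stands for the odd number 2i + 3, so that the square of that number sits at index 2i(i + 3) + 3. The function name `odds_only_primes` is illustrative and not part of the Forth entry.

```python
def odds_only_primes(limit):
    """Sieve only the odd numbers 3, 5, 7, ...; index i represents 2*i + 3."""
    if limit < 3:
        return [2] if limit >= 2 else []
    top = (limit - 3) // 2                 # index of the largest odd <= limit
    composite = bytearray(top + 1)         # one byte per odd number, 0 = prime
    i = 0
    while True:
        sq = 2 * i * (i + 3) + 3           # index of (2*i + 3) squared
        if sq > top:                       # square beyond the buffer: done
            break
        if not composite[i]:
            p = 2 * i + 3                  # stepping the index by p skips even multiples
            for j in range(sq, top + 1, p):
                composite[j] = 1
        i += 1
    return [2] + [2 * i + 3 for i, c in enumerate(composite) if not c]


print(odds_only_primes(100))
```

Stepping the index by p corresponds to stepping the represented number by 2p, which is why the even multiples never need to be stored or visited.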

Bit-Packing the Sieve Buffer (Odds-Only)

Although the above version resolves many problems of the first version, it is wasteful of memory, as each composite number in the sieve buffer is a byte of eight bits representing a single boolean value. The memory required can be reduced eight-fold by bit packing the sieve buffer; this takes more "bit-twiddling" to read and write the bits, but reducing the memory used gives better cache associativity for larger ranges, so that there is a net gain in performance. This makes the code more complex, and the stack manipulations harder to write, debug, and maintain; fortunately, ANS Forth 1994 provides a local variable naming facility that makes this much easier. The following code implements bit-packing of the sieve buffer, using local named variables when required:

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
  OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
  DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
  DO I 3 RSHIFT 3 PICK + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
  DROP ;

\ initializes sieve storage and parameters
\ given sieve limit, returns bit limit and buffer address ..
: initsieve ( u -- u a-addr )
  3 - \ test limit...
  DUP 0< IF 0 ELSE \ return if number of bits is <= 0!
    1 RSHIFT 1+ \ finish conversion to number of bits
    DUP 1- CellShft RSHIFT 1+ \ round up to even number of cells
    CELLS DUP ALLOCATE 0= IF DUP ROT ERASE \ set cells to zero
    ELSE ABORT" Memory allocation error!!!"
    THEN
  THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
  0 \ initialize test index i -- numv bufa i
  BEGIN \ test prime square index < limit
    DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (i+3) + 3
  WHILE \ -- numv bufa sqri i
    DUP 3 PICK prime? IF cullpi! \ -- numv bufa i
    ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
  REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ prints already found primes from sieved array...
: .primes ( u a-addr -- )
SWAP CR ." Primes to " DUP DUP + 2 + 2 MAX . ." are: "
DUP 0< IF 1+ 0< IF ." none." ELSE 2 . THEN DROP \ case no primes or just 2
ELSE 2 . 0 DO I OVER prime? IF I I + 3 + . THEN LOOP FREE DROP
THEN ;

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u ) numbts 16 SWAP - ;
: initLUT ( -- ) cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
  0 1 CELLS 1 RSHIFT 0
  DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index-1 in array address;
\ params are number of bits used - bits, negative indicates <2/2 out: 0/1,
\ given address is of the allocated bit buffer - bufa;
\ values used: bmsk is bit mask to limit bit in last cell,
\ lci is cell index of last cell used, cnt is the return value...
\ NOTE. this is for little-endian; big-endian needs a byte swap
\ before the last mask and popcount operation!!!
: primecount@ ( u a-addr -- u )
  LOCALS| bufa numb |
  numb 0< IF numb 1+ 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
  ELSE
    numb 1- TO numb \ numb -= 1
    1 \ initial count
    numb CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
    0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
    SWAP bufa + @ \ -- cnt lstCELL
    -2 numb CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
    popcount@ + \ add popcount of last masked CELL -- cnt
    bufa FREE DROP \ free bufa -- cnt
  THEN ;

: .countprimesto ( u -- )
  dup initsieve sieve primecount@
CR ." There are " . ." primes Up to the " . ." limit." ;

100 initsieve sieve .primes
1000000000 .countprimesto

The output of the above code is the same as the previous version, but it takes about two thirds the time while using one eighth of the memory; it takes about 6.5 seconds on my Intel Skylake i5-6500 at 3.6 GHz (turbo) using swiftForth (32-bit) and about 3.5 seconds on VFX Forth (64-bit), both of which compile to machine code but with the latter much more optimized; gforth-fast is about twice as slow as swiftForth and five times slower than VFX Forth as it just compiles to threaded execution tokens (more like an interpreter).
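For readers who find the bit-twiddling hard to follow in stack notation, the same bit-packed, odds-only layout can be sketched in Python. This is an illustrative sketch under the same assumptions as the Forth code (bit i of the buffer represents the odd number 2i + 3, least significant bit first within each byte); the function names are invented for the example, and the sketch makes no attempt to reproduce the timings quoted above.

```python
def bit_packed_sieve(limit):
    """Return (buffer, top): bit i of buffer is set when 2*i + 3 is composite."""
    top = (limit - 3) // 2                    # index of the largest odd <= limit
    buf = bytearray((top >> 3) + 1)           # one bit per odd number
    i = 0
    while True:
        sq = 2 * i * (i + 3) + 3              # index of (2*i + 3) squared
        if sq > top:
            break
        if not buf[i >> 3] & (1 << (i & 7)):  # 2*i + 3 is still marked prime
            p = 2 * i + 3
            for j in range(sq, top + 1, p):
                buf[j >> 3] |= 1 << (j & 7)   # set composite bit j
        i += 1
    return buf, top


def count_primes(limit):
    """Count primes up to limit using the bit-packed buffer."""
    if limit < 3:
        return 1 if limit >= 2 else 0
    buf, top = bit_packed_sieve(limit)
    zeros = sum(1 for i in range(top + 1) if not buf[i >> 3] & (1 << (i & 7)))
    return zeros + 1                          # plus one for the prime 2


print(count_primes(1000000), len(bit_packed_sieve(1000000)[0]))
```

Sieving to one million this way needs 62500 bytes, against 499999 bytes with one byte per odd number.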

Page-Segmented Bit-Packed Odds-Only Version

While the above version does greatly reduce the amount of memory used for a given sieving range, and thereby also somewhat reduces execution time, any sieve intended for sieving to limits of a hundred million or more should use a page-segmented implementation. Page-segmentation means that the only storage required is a representation of the base primes up to the square root of the limit plus a sieve buffer that should also be at least proportional to the same square root; this again makes the execution faster as ranges go up due to better cache associativity, with most memory accesses falling within the CPU cache sizes. The following Forth code implements a basic version that does this:

\ CPU L1 and L2 cache sizes in bits; power of 2...
1 17 LSHIFT CONSTANT L1CacheBits
L1CacheBits 8 * CONSTANT L2CacheBits

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ initializes sieve buffer storage and parameters
\ given sieve buffer bit size (even number of CELLS), returns buffer address ..
: initSieveBuffer ( u -- a-addr )
CellShft RSHIFT \ even number of cells
CELLS ALLOCATE 0<> IF ABORT" Memory allocation error!!!" THEN ;

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
  OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, as well as bitsz,
\ sieve the multiples of said prime leaving prime index on the stack...
: cullpi! ( u u0 u u addr -- u0 )
  LOCALS| sba bitsz lwi | DUP DUP + 3 + ROT \ -- i prm sqri
  \ culling start index address calculation...
  lwi 2DUP > IF - ELSE SWAP - OVER MOD DUP 0<> IF OVER SWAP - THEN
  THEN bitsz SWAP \ -- i prm bitsz strti
  DO I 3 RSHIFT sba + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
  DROP ;

\ cull sieve buffer given base wheel index, bit size,
\ address base prime sieved buffer and
\ the address of the sieve buffer to be culled of composite bits...
: cullSieveBuffer ( u u a-addr a-addr -- )
  >R >R 2DUP + R> R> \ -- lwi bitsz rngi bpba sba
  LOCALS| sba bpba rngi bitsz lwi |
  bitsz 1- CellShft RSHIFT 1+ CELLS sba SWAP ERASE \ clear sieve buffer
  0 \ initialize base prime index i -- i
  BEGIN \ test prime square index < limit
    DUP DUP DUP + SWAP 3 + * 3 + TUCK rngi < \ sqri = 2*i * (i+3) + 3
  WHILE \ -- sqri i
    DUP bpba prime? IF lwi bitsz sba cullpi! ELSE SWAP DROP THEN \ -- i
  1+ REPEAT 2DROP ; \ --

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u ) numbts 16 SWAP - ;
: initLUT ( -- ) cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
  0 1 CELLS 1 RSHIFT 0
  DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index in array address...
: primecount@ ( u a-addr -- u )
  LOCALS| bufa lmti |
  0 \ initial count -- cnt
  lmti CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
  0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
  SWAP bufa + @ \ -- cnt lstCELL
  -2 lmti CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
  popcount@ + ; \ add popcount of last masked CELL -- cnt

\ prints found primes from series of culled sieve buffers...
: .primes ( u -- )
DUP CR ." Primes to " . ." are: "
DUP 3 - 0< IF DUP 2 - 0< IF ." none." ELSE 2 . THEN \ <2/2 -> 0/1
ELSE 2 .
3 - 1 RSHIFT 1+ \ -- rngi
DUP 1- L2CacheBits / L2CacheBits * 3 RSHIFT \ -- rng rngi pglmtbytes
L1CacheBits initSieveBuffer \ address of base prime sieve buffer
L2CacheBits initSieveBuffer \ address of main sieve buffer
LOCALS| sba bpsba pglmt | \ local variables -- rngi
0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
pglmt 0 ?DO
I L2CacheBits bpsba sba cullSieveBuffer
I L2CacheBits 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
L2CacheBits +LOOP \ rngi
L2CacheBits mod DUP 0> IF \ one more page!
pglmt DUP L2CacheBits bpsba sba cullSieveBuffer
SWAP 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
THEN bpsba FREE DROP sba FREE DROP
THEN ; \ --

\ prints count of found primes from series of culled sieve buffers...
: .countPrimesTo ( u -- )
DUP 3 - 0< IF 2 - 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
ELSE
DUP 3 - 1 RSHIFT 1+
DUP 1- L2CacheBits / L2CacheBits * \ -- rng rngi pglmtbytes
L1CacheBits initSieveBuffer \ address of base prime sieve buffer
L2CacheBits initSieveBuffer \ address of main sieve buffer
LOCALS| sba bpsba pglmt | \ local variables -- rng rngi
0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
    1 pglmt 0 ?DO
      I L2CacheBits bpsba sba cullSieveBuffer
      L2CacheBits 1- sba primecount@ +
    L2CacheBits +LOOP \ rng rngi cnt
    SWAP L2CacheBits mod DUP 0> IF \ one more page!
      pglmt OVER bpsba sba cullSieveBuffer
      1- sba primecount@ + \ partial count!
    THEN
bpsba FREE DROP sba FREE DROP \ -- range cnt
THEN CR ." There are " . ." primes Up to the " . ." limit." ;

100 .primes
1000000000 .countPrimesTo
Output:
Primes to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 50847534 primes Up to the 1000000000 limit.

For simplicity, the base primes array is left as a sieved bit packed array (which takes minimum space) at the cost of having to scan the bit array for base primes on every page-segment culling pass. The page-segment sieve buffer is set as a fixed multiple of this (intended to fit within the CPU L2 cache size) in order to reduce the base prime start index address calculation overhead by this factor at the cost of slightly increased memory access times, which are still only about the same as the fastest inner culling time or less anyway. When the cache sizes are set to 32 Kilobytes/256 Kilobytes for L1/L2, respectively (by changing the first definition to `1 18 LSHIFT CONSTANT L1CacheBits`), as for my Intel Skylake i5-6500 at 3.6 GHz (single-threaded turbo), it runs in about 1.25 seconds on 64-bit VFX Forth, 3.75 seconds on 32-bit swiftForth, and 12.4 seconds on 64-bit gforth-fast; the tuned in-lined machine language compiling of VFX Forth is obviously much faster than the threaded execution token interpreting of gforth, and swiftForth lacks the machine code inlining of VFX Forth.

VFX Forth is only about 25 % slower than the algorithm as written in the fastest of languages, just as they advertise.

As written, the algorithm works efficiently up to over ten billion (1e10) with 64-bit systems, but could easily be refactored to use floating point or double precision for inputs and outputs as I have done in a StackOverflow answer in JavaScript without costing much in execution time so 32-bit systems would have the much higher limit.

The implementation is efficient up to this range. With a change so that the base primes array can grow with increasing limit, it can sieve to much higher ranges, but with a loss of efficiency from base prime start address calculations that go unused once the culling spans exceed the fixed sieve buffer size. Again, this can be solved by also making the page-segmentation sieve buffer grow as the square root of the limit.
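The start address calculation that the discussion above keeps returning to is the heart of page segmentation, and it can be sketched compactly in Python. The sketch below is a simplification made for readability: it stores one byte per flag instead of bit packing, computes the base primes with a small separate sieve, and uses invented names (`small_odd_primes`, `segmented_count`, `page_bits`); only the start index logic mirrors the Forth `cullpi!` word.

```python
from math import isqrt


def small_odd_primes(n):
    """Odd primes up to n, found by a plain sieve (used only for base primes)."""
    flags = bytearray([1]) * (n + 1)
    for p in range(3, isqrt(n) + 1, 2):
        if flags[p]:
            flags[p * p::2 * p] = bytes(len(range(p * p, n + 1, 2 * p)))
    return [p for p in range(3, n + 1, 2) if flags[p]]


def segmented_count(limit, page_bits=1 << 14):
    """Count primes up to limit one page at a time; index i represents 2*i + 3."""
    if limit < 3:
        return 1 if limit >= 2 else 0
    top = (limit - 3) // 2                     # last index that must be sieved
    base = small_odd_primes(isqrt(limit))      # base primes up to sqrt(limit)
    page = bytearray(page_bits)                # reused for every page
    count = 1                                  # the prime 2
    for lwi in range(0, top + 1, page_bits):   # lwi = low index of this page
        size = min(page_bits, top + 1 - lwi)
        page[:] = bytes(page_bits)             # clear the page
        for p in base:
            s = (p * p - 3) // 2               # index of p squared
            if s >= lwi + size:
                break                          # larger primes start beyond this page
            if s >= lwi:
                start = s - lwi                # square falls inside this page
            else:
                r = (lwi - s) % p              # distance past the last culled multiple
                start = 0 if r == 0 else p - r
            for j in range(start, size, p):
                page[j] = 1
        count += size - sum(page[:size])
    return count


print(segmented_count(1000000))
```

Because only the base primes and one page are ever held in memory, `page_bits` can be tuned to a cache size independently of the limit.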

Further improvements by a factor of almost four in overall execution speed would be gained by implementing maximum wheel-factorization as per my other StackOverflow JavaScript answer, which also effectively increases sieve buffer sizes by a factor of 48 in sieving by modulo residual bit planes.
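The factor of 48 mentioned above is the number of residues modulo 2 × 3 × 5 × 7 = 210 that are coprime to 210, that is, the number of bit planes a 2/3/5/7 wheel needs. A short Python check, with names chosen only for this illustration:

```python
from math import gcd, prod

WHEEL_PRIMES = (2, 3, 5, 7)
circumference = prod(WHEEL_PRIMES)             # 210

# Only numbers in these residue classes can be prime (beyond the wheel primes).
residues = [r for r in range(1, circumference + 1) if gcd(r, circumference) == 1]

print(circumference, len(residues))
```

Only 48 of every 210 numbers need to be represented at all, compared with 105 of every 210 for an odds-only sieve.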

Finally, multi-processing could be applied to increase the execution speed by about the number of effective cores (not counting SMT/Hyper-Threading threads), which is four on my Skylake machine; however, neither the 1994 ANS Forth standard nor the 2012 standard has a standard Forth way of implementing this, so each of the implementations uses its own custom WORDS. Since the resulting code would not be cross-implementation, I am not going to do this.

I likely won't even add the Maximum Wheel-Factorized version as in the above linked JavaScript code, since this code is enough to demonstrate what I was going to show: that Forth can be an efficient language, albeit a little hard to code, read, and maintain due to the reliance on anonymous data stack operations; it is a language whose best use is likely in cross-compiling to embedded systems where it can easily be customized and extended as required, and because it doesn't actually require a base operating system, can use its core facilities, functions, and extensions in place of such an OS to result in a minimum memory footprint.

Fortran

Works with: Fortran version 90 and later
program sieve

implicit none
integer, parameter :: i_max = 100
integer :: i
logical, dimension (i_max) :: is_prime

is_prime = .true.
is_prime (1) = .false.
do i = 2, int (sqrt (real (i_max)))
if (is_prime (i)) is_prime (i * i : i_max : i) = .false.
end do
do i = 1, i_max
if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i
end do
write (*, *)

end program sieve

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Because it uses four-byte logicals (the default size) as elements of the sieve buffer, the above code uses 400 bytes of memory for this trivial task of sieving to 100; it also performs 49 + 31 + 16 + 8 = 104 culling operations (for the culling by the primes two, three, five, and seven).

Optimised using a pre-computed wheel based on 2:

program sieve_wheel_2

implicit none
integer, parameter :: i_max = 100
integer :: i
logical, dimension (i_max) :: is_prime

is_prime = .true.
is_prime (1) = .false.
is_prime (4 : i_max : 2) = .false.
do i = 3, int (sqrt (real (i_max))), 2
if (is_prime (i)) is_prime (i * i : i_max : 2 * i) = .false.
end do
do i = 1, i_max
if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i
end do
write (*, *)

end program sieve_wheel_2

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

This so-called "optimized" version still uses 400 bytes of memory and only reduces the number of operations from 104 to 49 + 16 + 8 + 4 = 77 (the initialization marking all of the even representations as composite, plus the culling by the primes three, five, and seven), since all it saves is the re-culling of the even representations; it isn't really much of an optimization at all!

Optimized using a proper implementation of a wheel 2:

The above implementations, especially the second odds-only code, are some of the most inefficient versions of the Sieve of Eratosthenes in any language here as to time and space efficiency, outdone only by some naive JavaScript implementations that use eight-byte Numbers as logical values; the second claims to be wheel factorized but still uses all the same memory as the first and still culls by the even numbers in the initialization of the sieve buffer. As well, using four bytes (the default logical size) to store a boolean value is terribly wasteful if these implementations were to be extended to non-toy ranges. The following code implements proper wheel factorization by two, reducing the space used by a factor of about eight to 49 bytes by using `byte` as the sieve buffer array element type and not requiring the evens initialization, thus reducing the number of culling operations to 16 + 8 + 4 = 28 (for the culling primes three, five, and seven):

program sieve_wheel_2

implicit none
integer, parameter :: i_max = 100
integer, parameter :: i_limit = (i_max - 3) / 2
integer :: i
byte, dimension (0:i_limit) :: composites

composites = 0
do i = 0, (int (sqrt (real (i_max))) - 3) / 2
if (composites(i) == 0) composites ((i + i) * (i + 3) + 3 : i_limit : i + i + 3) = 1
end do
write (*, '(i0, 1x)', advance = 'no') 2
do i = 0, i_limit
if (composites (i) == 0) write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
end do
write (*, *)

end program sieve_wheel_2

The output is the same as the earlier version.
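The culling operation counts quoted for these Fortran versions can be reproduced with a few lines of Python. This is only a counting sketch, not a translation of the Fortran; `count_culls` and its parameters are names invented here, with a stride of 1 modelling the plain sieve and a stride of 2 modelling culling of odd multiples by odd primes only.

```python
from math import isqrt


def count_culls(limit, first, stride):
    """Count the writes made while culling with candidates first, first + stride, ...

    With stride 1 every multiple from the square upward is written;
    with stride 2 only odd candidates cull, and only their odd multiples.
    """
    composite = [False] * (limit + 1)
    ops = 0
    for p in range(first, isqrt(limit) + 1, stride):
        if not composite[p]:
            for m in range(p * p, limit + 1, stride * p):
                composite[m] = True
                ops += 1
    return ops


plain = count_culls(100, 2, 1)          # culling by 2, 3, 5 and 7
odds = count_culls(100, 3, 2)           # culling by 3, 5 and 7, odd multiples only
evens_init = len(range(4, 101, 2))      # writes made by is_prime (4 : i_max : 2) = .false.

print(plain, odds, evens_init)
```

For a limit of 100 this gives 104 for the plain sieve and 28 for the proper odds-only buffer; the intermediate version performs the same 28 culls plus the 49 writes of its evens initialization.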

Optimized using bit packing to reduce the memory use by a further factor of eight:

The above implementation is still space inefficient in effectively only using one bit out of eight; the following version implements bit packing to reduce memory use by a factor of eight by using bits to represent composite numbers rather than bytes:

program sieve_wheel_2

implicit none
integer, parameter :: i_max = 10000000
integer, parameter :: i_range = (i_max - 3) / 2
integer :: i, j, k, cnt
byte, dimension (0:i_range / 8) :: composites

composites = 0 ! pre-initialized?
do i = 0, (int (sqrt (real (i_max))) - 3) / 2
if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
do j = (i + i) * (i + 3) + 3, i_range, i + i + 3
k = shiftr(j, 3)
composites(k) = ior(composites(k), shiftl(1, iand(j, 7)))
end do
end if
end do
! write (*, '(i0, 1x)', advance = 'no') 2
cnt = 1
do i = 0, i_range
if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
! write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
cnt = cnt + 1
end if
end do
! write (*, *)
print '(a, i0, a, i0, a, f0.0, a)', &
'There are ', cnt, ' primes up to ', i_max, '.'
end program sieve_wheel_2
Output:
There are 664579 primes up to 10000000.

When the lines to print the results are enabled, the output to a maximum value of 100 is still exactly the same as the other versions, and it has exactly the same number of culling operations as the immediately above optimized version for the same range; the only difference is that less memory is used. Although the culling operations are somewhat more complex, for larger ranges the time saved through more effective use of the CPU cache more than makes up for it, so the average culling time is actually reduced. This version can count the primes up to several million in a few tens of milliseconds (listing hundreds of thousands of primes takes much longer than counting them). For ranges above a few tens of millions, a page-segmented sieve is much more efficient due to further improved use of the CPU caches.
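The bit-packing scheme described above can be sketched in a few lines. This is an illustrative Python version, not a translation of the Fortran program; the name `count_primes_bitpacked` is an assumption. Index i stands for the odd number 2*i + 3, bit (i mod 8) of byte (i div 8) holds its composite flag, and a zero bit means "still a prime candidate".

```python
def count_primes_bitpacked(limit):
    """Count primes <= limit with an odds-only sieve using one bit per odd number."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2               # highest index; index i means 2*i + 3
    bits = bytearray(top // 8 + 1)       # all zero: every odd number is a candidate
    i = 0
    while (2 * i + 3) ** 2 <= limit:
        if (bits[i >> 3] & (1 << (i & 7))) == 0:
            step = 2 * i + 3
            # Start at the index of step*step and set one bit per odd multiple.
            for j in range((i + i) * (i + 3) + 3, top + 1, step):
                bits[j >> 3] |= 1 << (j & 7)
        i += 1
    count = 1                            # the prime 2 is not stored in the buffer
    for i in range(top + 1):
        if (bits[i >> 3] & (1 << (i & 7))) == 0:
            count += 1
    return count


if __name__ == "__main__":
    print(count_primes_bitpacked(100))
```

The culling loop visits exactly the same indices as the byte-per-flag version; only the addressing (shift and mask) differs.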

As well as adding page-segmentation, the following code adds multi-processing, which is one of the capabilities for which modern Fortran is known:

subroutine cullSieveBuffer(lwi, size, bpa, sba)

implicit none
integer, intent(in) :: lwi, size
byte, intent(in) :: bpa(0:size - 1)
byte, intent(out) :: sba(0:size - 1)
integer :: i_limit, i_bitlmt, i_bplmt, i, sqri, bp, si, olmt, msk, j
byte, dimension (0:7) :: bits
common /twiddling/ bits

i_bitlmt = size * 8 - 1
i_limit = lwi + i_bitlmt
i_bplmt = size / 4
sba = 0
i = 0
sqri = (i + i) * (i + 3) + 3
do while (sqri <= i_limit)
if (iand(int(bpa(shiftr(i, 3))), shiftl(1, iand(i, 7))) == 0) then
bp = i + i + 3
if (lwi <= sqri) then
si = sqri - lwi
else
si = mod((lwi - sqri), bp)
if (si /= 0) si = bp - si
end if
if (bp <= i_bplmt) then
olmt = min(i_bitlmt, si + bp * 8 - 1)
do while (si <= olmt)
msk = bits(iand(si, 7))
do j = shiftr(si, 3), size - 1, bp
sba(j) = ior(int(sba(j)), msk)
end do
si = si + bp
end do
else
do while (si <= i_bitlmt)
j = shiftr(si, 3)
sba(j) = ior(sba(j), bits(iand(si, 7)))
si = si + bp
end do
end if
end if
i = i + 1
sqri = (i + i) * (i + 3) + 3
end do

end subroutine cullSieveBuffer

integer function countSieveBuffer(lmti, almti, sba)

implicit none
integer, intent(in) :: lmti, almti
byte, intent(in) :: sba(0:almti)
integer :: bmsk, lsti, i, cnt
byte, dimension (0:65535) :: clut
common /counting/ clut

cnt = 0
bmsk = iand(shiftl(-2, iand(lmti, 15)), 65535)
lsti = iand(shiftr(lmti, 3), -2)
do i = 0, lsti - 1, 2
cnt = cnt + clut(shiftl(iand(int(sba(i)), 255), 8) + iand(int(sba(i + 1)), 255))
end do
countSieveBuffer = cnt + clut(ior(shiftl(iand(int(sba(lsti)), 255), 8) + iand(int(sba(lsti + 1)), 255), bmsk))

end function countSieveBuffer

program sieve_paged

use OMP_LIB
implicit none
integer, parameter :: i_max = 1000000000, i_range = (i_max - 3) / 2
integer, parameter :: i_l1cache_size = 16384, i_l1cache_bitsz = i_l1cache_size * 8
integer, parameter :: i_l2cache_size = i_l1cache_size * 8, i_l2cache_bitsz = i_l2cache_size * 8
integer :: cr, c0, c1, i, j, k, cnt
integer, save :: scnt
integer :: countSieveBuffer
integer :: numthrds
byte, dimension (0:i_l1cache_size - 1) :: bpa
byte, save, allocatable, dimension (:) :: sba
byte, dimension (0:7) :: bits = (/ 1, 2, 4, 8, 16, 32, 64, -128 /)
byte, dimension (0:65535) :: clut
common /twiddling/ bits
common /counting/ clut

type heaparr
byte, allocatable, dimension(:) :: thrdsba
end type heaparr
type(heaparr), allocatable, dimension (:) :: sbaa

numthrds = 1
allocate(sbaa(0:numthrds - 1))
do i = 0, numthrds - 1
allocate(sbaa(i)%thrdsba(0:i_l2cache_size - 1))
end do

CALL SYSTEM_CLOCK(count_rate=cr)
CALL SYSTEM_CLOCK(c0)
do k = 0, 65535 ! initialize counting Look Up Table
j = k
i = 16
do while (j > 0)
i = i - 1
j = iand(j, j - 1)
end do
clut(k) = i
end do
bpa = 0 ! pre-initialization not guaranteed!
call cullSieveBuffer(0, i_l1cache_size, bpa, bpa)

cnt = 1
!$OMP PARALLEL DO ORDERED
do i = i_l2cache_bitsz, i_range, i_l2cache_bitsz * 8
scnt = 0
sba = sbaa(mod(i, numthrds))%thrdsba
do j = i, min(i_range, i + 8 * i_l2cache_bitsz - 1), i_l2cache_bitsz
call cullSieveBuffer(j - i_l2cache_bitsz, i_l2cache_size, bpa, sba)
scnt = scnt + countSieveBuffer(i_l2cache_bitsz - 1, i_l2cache_size, sba)
end do
!$OMP ATOMIC
cnt = cnt + scnt
end do
!$OMP END PARALLEL DO

j = i_range / i_l2cache_bitsz * i_l2cache_bitsz
k = i_range - j
if (k /= i_l2cache_bitsz - 1) then
call cullSieveBuffer(j, i_l2cache_size, bpa, sbaa(0)%thrdsba)
cnt = cnt + countSieveBuffer(k, i_l2cache_size, sbaa(0)%thrdsba)
end if
! write (*, '(i0, 1x)', advance = 'no') 2
! do i = 0, i_range
! if (iand(sba(shiftr(i, 3)), bits(iand(i, 7))) == 0) write (*, '(i0, 1x)', advance='no') (i + i + 3)
! end do
! write (*, *)
CALL SYSTEM_CLOCK(c1)
print '(a, i0, a, i0, a, f0.0, a)', 'Found ', cnt, ' primes up to ', i_max, &
' in ', ((c1 - c0) / real(cr) * 1000), ' milliseconds.'

do i = 0, numthrds - 1
deallocate(sbaa(i)%thrdsba)
end do
deallocate(sbaa)

end program sieve_paged
Output:
Found 50847534 primes up to 1000000000 in 219. milliseconds.

The above output was as compiled with gfortran -O3 -fopenmp using version 11.1.1-1 on my Intel Skylake i5-6500 CPU at 3.2 GHz multithreaded with four cores. There are a few more optimizations that could be made in applying Maximum Wheel-Factorization as per my StackOverflow answer in JavaScript, which will make this almost four times faster yet again. If that optimization were done, sieving to a billion as here is really too trivial to measure and one should sieve at least up to ten billion to start to get a long enough time to be measured accurately. As explained in that answer, the Maximum Wheel-Factorized code will work efficiently up to about a trillion (1e12), when it needs yet another "bucket sieve" optimization to allow it to continue to scale efficiently for increasing range. The final optimization which can speed up the code by almost a factor of two is a very low level loop unrolling technique that I'm not sure will work with the compiler, but as it works in C/C++ and other similar languages including those that compile through LLVM, it ought to.
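The page-segmentation idea used by the program above can be shown in a much simpler single-threaded form. The following is a minimal Python sketch under stated assumptions: the name `count_primes_segmented` and the `page_bits` parameter are made up for illustration, one byte per flag is used instead of bit packing, and none of the multi-threading or lookup-table counting of the Fortran code is reproduced. Each page is culled by the base primes up to the square root of the limit, with the start offset in each page found by the same modulo arithmetic as in `cullSieveBuffer`.

```python
import math


def count_primes_segmented(limit, page_bits=1 << 14):
    """Count primes <= limit with an odds-only, page-segmented sieve."""
    if limit < 2:
        return 0
    if limit < 3:
        return 1
    top = (limit - 3) // 2                     # index i stands for 2*i + 3
    # Base primes: odd primes up to sqrt(limit), from a small plain sieve.
    root = math.isqrt(limit)
    small = bytearray([1]) * (root + 1)
    base_primes = []
    for p in range(3, root + 1, 2):
        if small[p]:
            base_primes.append(p)
            for m in range(p * p, root + 1, 2 * p):
                small[m] = 0
    count = 1                                  # the prime 2
    for low in range(0, top + 1, page_bits):   # low = first index of this page
        size = min(page_bits, top + 1 - low)
        page = bytearray(size)                 # 0 = prime candidate
        for p in base_primes:
            sq = (p * p - 3) // 2              # index of p*p
            if sq >= low + size:
                break                          # larger primes start beyond this page
            if sq >= low:
                start = sq - low
            else:
                r = (low - sq) % p
                start = 0 if r == 0 else p - r
            for j in range(start, size, p):    # index step p = value step 2*p
                page[j] = 1
        count += size - sum(page)
    return count


if __name__ == "__main__":
    print(count_primes_segmented(1_000_000))
```

Because each page is small enough to stay in cache, the culling loops touch only a limited region of memory at a time, which is the point of the technique.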

Free Pascal

Basic version

function Sieve returns a list of primes less than or equal to the given aLimit

program prime_sieve;
{$mode objfpc}{$coperators on}
uses
SysUtils, GVector;
type
TPrimeList = specialize TVector<DWord>;
function Sieve(aLimit: DWord): TPrimeList;
var
IsPrime: array of Boolean;
I, SqrtBound: DWord;
J: QWord;
begin
Result := TPrimeList.Create;
Inc(aLimit, Ord(aLimit < High(DWord))); //not a problem because High(DWord) is composite
SetLength(IsPrime, aLimit);
FillChar(Pointer(IsPrime)^, aLimit, Byte(True));
SqrtBound := Trunc(Sqrt(aLimit));
for I := 2 to Pred(aLimit) do //IsPrime holds indices 0..aLimit - 1
if IsPrime[I] then
begin
Result.PushBack(I);
if I <= SqrtBound then
begin
J := I * I;
while J < aLimit do //never write past the end of IsPrime
begin
IsPrime[J] := False;
J += I;
end;
end;
end;
end;

//usage

var
Limit: DWord = 0;
function ReadLimit: Boolean;
var
Lim: Int64;
begin
if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
if (Lim >= 0) and (Lim <= High(DWord)) then
begin
Limit := DWord(Lim);
exit(True);
end;
Result := False;
end;
procedure PrintUsage;
begin
WriteLn('Usage: prime_sieve Limit');
WriteLn(' where Limit in the range [0, ', High(DWord), ']');
Halt;
end;
procedure PrintPrimes(aList: TPrimeList);
var
I: DWord;
begin
for I := 0 to aList.Size - 2 do
Write(aList[I], ', ');
WriteLn(aList[aList.Size - 1]);
aList.Free;
end;
begin
if not ReadLimit then
PrintUsage;
try
PrintPrimes(Sieve(Limit));
except
on e: Exception do
WriteLn('An exception ', e.ClassName, ' occurred with message: ', e.Message);
end;
end.

Alternative segmented (odds only) version

function OddSegmentSieve returns a list of primes less than or equal to the given aLimit

program prime_sieve;
{$mode objfpc}{$coperators on}
uses
SysUtils, Math;
type
TPrimeList = array of DWord;
function OddSegmentSieve(aLimit: DWord): TPrimeList;
function EstimatePrimeCount(aLimit: DWord): DWord;
begin
case aLimit of
0..1: Result := 0;
2..200: Result := Trunc(1.6 * aLimit/Ln(aLimit)) + 1;
else
Result := Trunc(aLimit/(Ln(aLimit) - 2)) + 1;
end;
end;
function Sieve(aLimit: DWord; aNeed2: Boolean): TPrimeList;
var
IsPrime: array of Boolean;
I: DWord = 3;
J, SqrtBound: DWord;
Count: Integer = 0;
begin
if aLimit < 2 then
exit(nil);
SetLength(IsPrime, (aLimit - 1) div 2);
FillChar(Pointer(IsPrime)^, Length(IsPrime), Byte(True));
SetLength(Result, EstimatePrimeCount(aLimit));
SqrtBound := Trunc(Sqrt(aLimit));
if aNeed2 then
begin
Result[0] := 2;
Inc(Count);
end;
for I := 0 to High(IsPrime) do
if IsPrime[I] then
begin
Result[Count] := I * 2 + 3;
if Result[Count] <= SqrtBound then
begin
J := Result[Count] * Result[Count];
repeat
IsPrime[(J - 3) div 2] := False;
J += Result[Count] * 2;
until J > aLimit;
end;
Inc(Count);
end;
SetLength(Result, Count);
end;
const
PAGE_SIZE = $8000;
var
IsPrime: array[0..Pred(PAGE_SIZE)] of Boolean; //current page
SmallPrimes: TPrimeList = nil;
I: QWord;
J, PageHigh, Prime: DWord;
Count: Integer;
begin
if aLimit < PAGE_SIZE div 4 then
exit(Sieve(aLimit, True));
I := Trunc(Sqrt(aLimit));
SmallPrimes := Sieve(I + 1, False);
Count := Length(SmallPrimes) + 1;
I += Ord(not Odd(I));
SetLength(Result, EstimatePrimeCount(aLimit));
while I <= aLimit do
begin
PageHigh := Min(Pred(PAGE_SIZE * 2), aLimit - I);
FillChar(IsPrime, PageHigh div 2 + 1, Byte(True));
for Prime in SmallPrimes do
begin
J := DWord(I) mod Prime;
if J <> 0 then
J := Prime shl (1 - J and 1) - J;
while J <= PageHigh do
begin
IsPrime[J div 2] := False;
J += Prime * 2;
end;
end;
for J := 0 to PageHigh div 2 do
if IsPrime[J] then
begin
Result[Count] := J * 2 + I;
Inc(Count);
end;
I += PAGE_SIZE * 2;
end;
SetLength(Result, Count);
Result[0] := 2;
Move(SmallPrimes[0], Result[1], Length(SmallPrimes) * SizeOf(DWord));
end;

//usage

var
Limit: DWord = 0;
function ReadLimit: Boolean;
var
Lim: Int64;
begin
if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
if (Lim >= 0) and (Lim <= High(DWord)) then
begin
Limit := DWord(Lim);
exit(True);
end;
Result := False;
end;
procedure PrintUsage;
begin
WriteLn('Usage: prime_sieve Limit');
WriteLn(' where Limit in the range [0, ', High(DWord), ']');
Halt;
end;
procedure PrintPrimes(const aList: TPrimeList);
var
I: DWord;
begin
for I := 0 to Length(aList) - 2 do
Write(aList[I], ', ');
WriteLn(aList[High(aList)]);
end;
begin
if not ReadLimit then
PrintUsage;
PrintPrimes(OddSegmentSieve(Limit));
end.
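The least obvious step in the segmented version above is how each small prime is aligned to a page: the page base is an odd number, and only odd multiples of the prime need culling. The following Python sketch mirrors that offset computation; the function name `first_odd_multiple_offset` is hypothetical and chosen only for illustration.

```python
def first_odd_multiple_offset(base, prime):
    """Offset from the odd number `base` to the first odd multiple of the
    odd number `prime` that is greater than or equal to `base`."""
    r = base % prime
    if r == 0:
        return 0                  # base itself is an odd multiple of prime
    # Odd remainder:  prime - r is even, so base + prime - r is an odd multiple.
    # Even remainder: prime - r is odd, so one more `prime` must be added,
    #                 giving 2*prime - r.
    return (prime << (1 - (r & 1))) - r


if __name__ == "__main__":
    # Page starting at 1001, culling by 7: the first odd multiple is 1001 = 7 * 143.
    print(first_odd_multiple_offset(1001, 7))
    # Page starting at 1003, culling by 7: the next odd multiple of 7 is 1015.
    print(first_odd_multiple_offset(1003, 7))
```

The returned offset is always even, so halving it gives the position in an odds-only page buffer, and stepping by twice the prime visits only odd multiples.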

FreeBASIC

' FB 1.05.0

Sub sieve(n As Integer)
If n < 2 Then Return
Dim a(2 To n) As Integer
For i As Integer = 2 To n : a(i) = i : Next
Dim As Integer p = 2, q
' mark non-prime numbers by setting the corresponding array element to 0
Do
For j As Integer = p * p To n Step p
a(j) = 0
Next j
' look for next non-zero element in array after 'p'
q = 0
For j As Integer = p + 1 To Sqr(n)
If a(j) <> 0 Then
q = j
Exit For
End If
Next j
If q = 0 Then Exit Do
p = q
Loop

' print the non-zero numbers remaining i.e. the primes
For i As Integer = 2 To n
If a(i) <> 0 Then
Print Using "####"; a(i);
End If
Next
Print
End Sub

Print "The primes up to 1000 are :"
Print
sieve(1000)
Print
Print "Press any key to quit"
Sleep
Output:
The primes up to 1000 are :

2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
947 953 967 971 977 983 991 997

Frink

n = eval[input["Enter highest number: "]]
results = array[sieve[n]]
println[results]
println[length[results] + " prime numbers less than or equal to " + n]

sieve[n] :=
{
// Initialize array
array = array[0 to n]
array@1 = 0

for i = 2 to ceil[sqrt[n]]
if array@i != 0
for j = i^2 to n step i
array@j = 0

return select[array, { |x| x != 0 }]
}

Furor

Note: With benchmark function

tick sto startingtick
#g 100000 sto MAX
one count
2 @MAX külső: {||
@count {|
|} // end of @count
|} // end of @MAX
."Time : " tick @startingtick - print ." tick\n"
."Number of primes = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }

FutureBasic

Basic sieve of array of booleans

include "ConsoleWindow"

begin globals
dim dynamic gPrimes(1) as Boolean
end globals

local fn SieveOfEratosthenes( n as long )
dim as long i, j

for i = 2 to n
for j = i * i to n step i
gPrimes(j) = _true
next
if gPrimes(i) = 0 then print i;
next i
kill gPrimes
end fn

fn SieveOfEratosthenes( 100 )

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Fōrmulæ

Fōrmulæ programs are not textual; programs are visualized and edited by showing and manipulating structures rather than text. Moreover, there can be multiple visual representations of the same program. Even though textual representations are possible (e.g. XML, JSON), they are intended for storage and transfer rather than for visualization and editing.

Programs in Fōrmulæ are created and edited online on its website. However, they run on execution servers. By default remote servers are used, but they are limited in memory and processing power, since they are intended for demonstration and casual use. A local server can be downloaded and installed; it has no limitations, since it runs on your own computer. Because of that, example programs can be fully visualized and edited, but some of them will not run if they require moderate or heavy computation or memory resources and no local server is being used.

GAP

Eratosthenes := function(n)
local a, i, j;
a := ListWithIdenticalEntries(n, true);
if n < 2 then
return [];
else
for i in [2 .. n] do
if a[i] then
j := i*i;
if j > n then
return Filtered([2 .. n], i -> a[i]);
else
while j <= n do
a[j] := false;
j := j + i;
od;
fi;
fi;
od;
fi;
end;

Eratosthenes(100);

[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]

GLBasic

// Sieve of Eratosthenes (find primes)
// GLBasic implementation

GLOBAL n%, k%, limit%, flags%[]

limit = 100 // search primes up to this number

DIM flags[limit+1] // GLBasic arrays start at 0

FOR n = 2 TO SQR(limit)
IF flags[n] = 0
FOR k = n*n TO limit STEP n
flags[k] = 1
NEXT
ENDIF
NEXT

// Display the primes
FOR n = 2 TO limit
IF flags[n] = 0 THEN STDOUT n + ", "
NEXT

KEYWAIT

Go

Basic sieve of array of booleans

package main
import "fmt"

func main() {
const limit = 201 // means sieve numbers < 201

// sieve
c := make([]bool, limit) // c for composite. false means prime candidate
c[1] = true // 1 not considered prime
p := 2
for {
// first allowed optimization: outer loop only goes to sqrt(limit)
p2 := p * p
if p2 >= limit {
break
}
// second allowed optimization: inner loop starts at sqr(p)
for i := p2; i < limit; i += p {
c[i] = true // it's a composite

}
// scan to get next prime for outer loop
for {
p++
if !c[p] {
break
}
}
}

// sieve complete. now print a representation.
for n := 1; n < limit; n++ {
if c[n] {
fmt.Print(" .")
} else {
fmt.Printf("%3d", n)
}
if n%20 == 0 {
fmt.Println("")
}
}
}

Output:

.  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19  .
.  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .  .
41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59  .
61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79  .
.  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .  .
101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .  .
.  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139  .
.  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .  .
.  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179  .
181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199  .

Odds-only bit-packed array output-enumerating version

The above version's output is rather specialized; the following version uses a closure function to enumerate over the culled composite number array, which is bit packed. By using this scheme for output, no extra memory is required above that required for the culling array:

package main

import (
"fmt"
"math"
)

func primesOdds(top uint) func() uint {
topndx := int((top - 3) / 2)
topsqrtndx := (int(math.Sqrt(float64(top))) - 3) / 2
cmpsts := make([]uint, (topndx/32)+1)
for i