Sieve of Eratosthenes

This task has been clarified. Its programming examples are in need of review to ensure that they still fit the requirements of the task.

The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.

Task

Implement the Sieve of Eratosthenes algorithm, with the only allowed optimization that the outer loop can stop at the square root of the limit, and the inner loop may start at the square of the prime just found.

That means especially that you shouldn't optimize by using pre-computed wheels, i.e. don't assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or similar wheels based on low primes.

If there's an easy way to add such a wheel based optimization, implement it as an alternative version.

Note

It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.

Related tasks

11l

Translation of: Python

F primes_upto(limit)
   V is_prime = [0B]*2 [+] [1B]*(limit - 1)
   L(n) 0 .< Int(limit ^ 0.5 + 1.5)
      I is_prime[n]
         L(i) (n*n..limit).step(n)
            is_prime[i] = 0B
   R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)

print(primes_upto(100))

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

360 Assembly

For maximum compatibility, this program uses only the basic instruction set.

*        Sieve of Eratosthenes 
ERATOST  CSECT  
         USING  ERATOST,R12
SAVEAREA B      STM-SAVEAREA(R15)
         DC     17F'0'
         DC     CL8'ERATOST'
STM      STM    R14,R12,12(R13) save calling context
         ST     R13,4(R15)      
         ST     R15,8(R13)
         LR     R12,R15         set addessability
*        ----   CODE
         LA     R4,1            I=1  
         LA     R6,1            increment
         L      R7,N            limit
LOOPI    BXH    R4,R6,ENDLOOPI  do I=2 to N
         LR     R1,R4           R1=I
         BCTR   R1,0             
         LA     R14,CRIBLE(R1)
         CLI    0(R14),X'01'
         BNE    ENDIF           if not CRIBLE(I)
         LR     R5,R4           J=I
         LR     R8,R4
         LR     R9,R7
LOOPJ    BXH    R5,R8,ENDLOOPJ  do J=I*2 to N by I
         LR     R1,R5           R1=J
         BCTR   R1,0
         LA     R14,CRIBLE(R1)
         MVI    0(R14),X'00'    CRIBLE(J)='0'B
         B      LOOPJ
ENDLOOPJ EQU    *
ENDIF    EQU    *
         B      LOOPI
ENDLOOPI EQU    *
         LA     R4,1            I=1  
         LA     R6,1
         L      R7,N
LOOP     BXH    R4,R6,ENDLOOP   do I=1 to N
         LR     R1,R4           R1=I
         BCTR   R1,0
         LA     R14,CRIBLE(R1)
         CLI    0(R14),X'01'
         BNE    NOTPRIME        if not CRIBLE(I) 
         CVD    R4,P            P=I
         UNPK   Z,P             Z=P
         MVC    C,Z             C=Z
         OI     C+L'C-1,X'F0'   zap sign
         MVC    WTOBUF(8),C+8
         WTO    MF=(E,WTOMSG)		  
NOTPRIME EQU    *
         B      LOOP
ENDLOOP  EQU    *
RETURN   EQU    *
         LM     R14,R12,12(R13) restore context
         XR     R15,R15         set return code to 0
         BR     R14             return to caller
*        ----   DATA
I        DS     F
J        DS     F
         DS     0F
P        DS     PL8             packed
Z        DS     ZL16            zoned
C        DS     CL16            character 
WTOMSG   DS     0F
         DC     H'80'           length of WTO buffer
         DC     H'0'            must be binary zeroes
WTOBUF   DC     80C' '
         LTORG  
N        DC     F'100000'
CRIBLE   DC     100000X'01'
         YREGS  
         END    ERATOST

Output:

6502 Assembly

If this subroutine is called with the value of n in the accumulator, it will store an array of the primes less than n beginning at address 1000 hex and return the number of primes it has found in the accumulator.

ERATOS: STA  $D0      ; value of n
        LDA  #$00
        LDX  #$00
SETUP:  STA  $1000,X  ; populate array
        ADC  #$01
        INX
        CPX  $D0
        BPL  SET
        JMP  SETUP
SET:    LDX  #$02
SIEVE:  LDA  $1000,X  ; find non-zero
        INX
        CPX  $D0
        BPL  SIEVED
        CMP  #$00
        BEQ  SIEVE
        STA  $D1      ; current prime
MARK:   CLC
        ADC  $D1
        TAY
        LDA  #$00
        STA  $1000,Y
        TYA
        CMP  $D0
        BPL  SIEVE
        JMP  MARK
SIEVED: LDX  #$01
        LDY  #$00
COPY:   INX
        CPX  $D0
        BPL  COPIED
        LDA  $1000,X
        CMP  #$00
        BEQ  COPY
        STA  $2000,Y
        INY
        JMP  COPY
COPIED: TYA           ; how many found
        RTS

68000 Assembly

Algorithm somewhat optimized: array omits 1, 2, all higher odd numbers. Optimized for storage: uses bit array for prime/composite flags.

Works with: [EASy68K v5.13.00]

Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.

*-----------------------------------------------------------
* Title      : BitSieve
* Written by : G. A. Tippery
* Date       : 2014-Feb-24, 2013-Dec-22
* Description: Prime number sieve
*-----------------------------------------------------------
    	ORG    $1000

**	---- Generic macros ----	**
PUSH	MACRO
	MOVE.L	\1,-(SP)
	ENDM

POP	MACRO
	MOVE.L	(SP)+,\1
	ENDM

DROP	MACRO
	ADDQ	#4,SP
	ENDM
	
PUTS	MACRO
	** Print a null-terminated string w/o CRLF **
	** Usage: PUTS stringaddress
	** Returns with D0, A1 modified
	MOVEQ	#14,D0	; task number 14 (display null string)
	LEA	\1,A1	; address of string
	TRAP	#15	; display it
	ENDM
	
GETN	MACRO
	MOVEQ	#4,D0	; Read a number from the keyboard into D1.L. 
	TRAP	#15
	ENDM

**	---- Application-specific macros ----	**

val	MACRO		; Used by bit sieve. Converts bit address to the number it represents.
	ADD.L	\1,\1	; double it because odd numbers are omitted
	ADDQ	#3,\1	; add offset because initial primes (1, 2) are omitted
	ENDM

* ** ================================================================================ **
* ** Integer square root routine, bisection method **
* ** IN: D0, should be 0<D0<$10000 (65536) -- higher values MAY work, no guarantee
* ** OUT: D1
*
SquareRoot:
*
	MOVEM.L	D2-D4,-(SP)	; save registers needed for local variables
*	DO == n
*	D1 == a
*	D2 == b
*	D3 == guess
*	D4 == temp
*
*		a = 1;
*		b = n;
	MOVEQ	#1,D1
	MOVE.L	D0,D2
*		do {
	REPEAT
*		guess = (a+b)/2;
	MOVE.L	D1,D3
	ADD.L	D2,D3
	LSR.L	#1,D3
*		if (guess*guess > n) {	// inverse function of sqrt is square
	MOVE.L	D3,D4
	MULU	D4,D4		; guess^2
	CMP.L	D0,D4
	BLS	.else
*		b = guess;
	MOVE.L	D3,D2
	BRA	.endif
*		} else {
.else:
*		a = guess;
	MOVE.L	D3,D1
*		} //if
.endif:
*		} while ((b-a) > 1);	; Same as until (b-a)<=1 or until (a-b)>=1
	MOVE.L	D2,D4
	SUB.L	D1,D4	; b-a
	UNTIL.L	  D4 <LE> #1 DO.S
*		return (a)	; Result is in D1
*		} //LongSqrt()
	MOVEM.L	(SP)+,D2-D4	; restore saved registers
	RTS
*
* ** ================================================================================ **


** ======================================================================= **
*
**  Prime-number Sieve of Eratosthenes routine using a big bit field for flags  **
*  Enter with D0 = size of sieve (bit array)
*  Prints found primes 10 per line
*  Returns # prime found in D6
*
*   Register usage:
*
*	D0 == n
*	D1 == prime
*	D2 == sqroot
*	D3 == PIndex
*	D4 == CIndex
*	D5 == MaxIndex
*	D6 == PCount
*
*	A0 == PMtx[0]
*
*   On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.
*

**	------------------------	**

GetBit:		** sub-part of Sieve subroutine **
		** Entry: bit # is on TOS
		** Exit: A6 holds the byte number, D7 holds the bit number within the byte
		** Note: Input param is still on TOS after return. Could have passed via a register, but
                **  wanted to practice with stack. :)
*		
	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
	ASR.L	#3,D7	; /8
	MOVEA	D7,A6	; byte #
	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
	AND.L	#$7,D7	; bit #
	RTS

**	------------------------	**

Sieve:
	MOVE	D0,D5
	SUBQ	#1,D5
	JSR	SquareRoot	; sqrt D0 => D1
	MOVE.L	D1,D2
	LEA	PArray,A0
	CLR.L	D3
*
PrimeLoop:
	MOVE.L	D3,D1
	val	D1
	MOVE.L	D3,D4
	ADD.L	D1,D4
*
CxLoop:		; Goes through array marking multiples of d1 as composite numbers
	CMP.L	D5,D4
	BHI	ExitCx
	PUSH	D4	; set D7 as bit # and A6 as byte pointer for D4'th bit of array
	JSR GetBit
	DROP
	BSET	D7,0(A0,A6.L)	; set bit to mark as composite number
	ADD.L	D1,D4	; next number to mark
	BRA	CxLoop
ExitCx:
	CLR.L	D1	; Clear new-prime-found flag
	ADDQ	#1,D3	; Start just past last prime found 
PxLoop:		; Searches for next unmarked (not composite) number
	CMP.L	D2,D3	; no point searching past where first unmarked multiple would be past end of array
	BHI	ExitPx	; if past end of array
	TST.L	D1
	BNE	ExitPx	; if flag set, new prime found
	PUSH D3		; check D3'th bit flag
	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
	DROP		; drop TOS
	BTST	D7,0(A0,A6.L)	; read bit flag
	BNE	IsSet	; If already tagged as composite
	MOVEQ	#-1,D1	; Set flag that we've found a new prime
IsSet:
	ADDQ	#1,D3	; next PIndex
	BRA	PxLoop
ExitPx:
	SUBQ	#1,D3	; back up PIndex
	TST.L	D1	; Did we find a new prime #?
	BNE	PrimeLoop	; If another prime # found, go process it
*
		; fall through to print routine

**	------------------------	**

* Print primes found
*
*	D4 == Column count
*
*	Print header and assumed primes (#1, #2) 
    	PUTS	Header	; Print string @ Header, no CR/LF
	MOVEQ	#2,D6	; Start counter at 2 because #1 and #2 are assumed primes
	MOVEQ	#2,D4
*
	MOVEQ	#0,D3
PrintLoop:
	CMP.L	D5,D3
	BHS	ExitPL
	PUSH	D3
	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
	DROP		; drop TOS
	BTST	D7,0(A0,A6.L)
	BNE		NotPrime
*		printf(" %6d", val(PIndex)
	MOVE.L	D3,D1
	val	D1
	AND.L	#$0000FFFF,D1
	MOVEQ	#6,D2
	MOVEQ	#20,D0	; display signed RJ
	TRAP	#15
	ADDQ	#1,D4
	ADDQ	#1,D6
*	*** Display formatting ***
*		if((PCount % 10) == 0) printf("\n");
	CMP	#10,D4
	BLO	NoLF
	PUTS	CRLF
	MOVEQ	#0,D4
NoLF:
NotPrime:
	ADDQ	#1,D3
	BRA	PrintLoop
ExitPL:
	RTS

** ======================================================================= **

N	EQU	5000	; *** Size of boolean (bit) array ***
SizeInBytes	EQU	(N+7)/8
*
START:                  	; first instruction of program
	MOVE.L	#N,D0	; # to test
	JSR	Sieve
*		printf("\n %d prime numbers found.\n", D6); ***
	PUTS	Summary1,A1
	MOVE	#3,D0	; Display signed number in D1.L in decimal in smallest field.
	MOVE.W	D6,D1
	TRAP	#15
	PUTS	Summary2,A1

	SIMHALT             	; halt simulator

** ======================================================================= **

* Variables and constants here

	ORG	$2000
CR	EQU	13
LF	EQU	10
CRLF	DC.B	CR,LF,$00

PArray:	DCB.B	SizeInBytes,0

Header:	DC.B	CR,LF,LF,' Primes',CR,LF,' ======',CR,LF
		DC.B	'     1     2',$00

Summary1:	DC.B	CR,LF,' ',$00
Summary2:	DC.B	' prime numbers found.',CR,LF,$00

    END    START        	; last line of source

8086 Assembly

MAXPRM:	equ	5000		; Change this value for more primes
	cpu	8086
	bits	16
	org	100h
section	.text
erato:	mov	cx,MAXPRM	; Initialize array (set all items to prime)
	mov	bp,cx		; Keep a copy in BP
	mov	di,sieve
	mov	al,1
	rep	stosb
	;;;	Sieve
	mov	bx,sieve	; Set base register to array
	inc	cx		; CX=1 (CH=0, CL=1); CX was 0 before
	mov	si,cx		; Start at number 2 (1+1)
.next:	inc	si		; Next number 
	cmp	cl,[bx+si]	; Is this number marked as prime?
	jne	.next		; If not, try next number
	mov	ax,si		; Otherwise, calculate square,
	mul	si
	mov	di,ax		; and put it in DI
	cmp	di,bp		; Check array bounds
	ja	output		; We're done when SI*SI>MAXPRM
.mark:	mov	[bx+di],ch	; Mark byte as composite
	add	di,si		; Next composite
	cmp 	di,bp		; While maximum not reached
	jbe	.mark
	jmp	.next
	;;;	Output
output:	mov	si,2		; Start at 2
.test:	dec	byte [bx+si]	; Prime?
	jnz	.next		; If not, try next number
	mov	ax,si		; Otherwise, print number
	call	prax
.next:	inc	si
	cmp	si,MAXPRM
	jbe	.test
	ret
	;;;	Write number in AX to standard output (using MS-DOS)
prax:	push	bx		; Save BX
	mov	bx,numbuf
	mov	bp,10		; Divisor
.loop:	xor	dx,dx		; Divide AX by 10, modulus in DX
	div	bp
	add	dl,'0'		; ASCII digit
	dec	bx
	mov	[bx],dl		; Store ASCII digit
	test	ax,ax		; More digits?
	jnz	.loop
	mov	dx,bx		; Print number
	mov	ah,9		; 9 = MS-DOS syscall to print string
	int	21h
	pop	bx		; Restore BX
	ret
section	.data
	db	'*****'		; Room for number
numbuf:	db	13,10,'$'
section	.bss 
sieve:	resb	MAXPRM

Output:

8th

with: n

\ create a new buffer which will function as a bit vector
: bit-vec SED: n -- b
    dup 3 shr swap 7 band if 1+ then b:new b:clear ;

\ given a buffer, sieving prime, and limit, cross off multiples
\ of the sieving prime.
: +composites SED: b n n -- b
    >r dup sqr rot \ want: -- n index b
    repeat
        over 1- true b:bit!
        >r over + r>
    over r@ > until!
    rdrop nip nip ;

\ SoE algorithm proper
: make-sieve SED: n -- b
    dup>r bit-vec 2
    repeat
        tuck 1- b:bit@ not
        if
            over r@ +composites
        then swap 1+
    dup sqr r@ < while!
    rdrop drop ;

\ traverse the final buffer, creating an array of primes
: sieve>a SED: b n -- a
    >r a:new swap
    ( 1- b:bit@ not if >r I a:push r> then ) 2 r> loop drop ;

;with

: sieve SED: n -- a
    dup make-sieve swap sieve>a ;

Output:

ok> 100 sieve .
 [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok> 1_000_000 sieve a:len . \ count primes up to 1,000,000
78498
ok> -1 a:@ . \ largest prime < 1,000,000
999983

AArch64 Assembly

Works with: as version Raspberry Pi 3B version Buster 64 bits

/* ARM assembly AARCH64 Raspberry PI 3B */
/*  program cribleEras64.s   */

/*******************************************/
/* Constantes file                         */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"

.equ MAXI,      100

/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss  
sZoneConv:                  .skip 24
TablePrime:                 .skip   8 * MAXI 
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main 
main:                               // entry of program 
    ldr x4,qAdrTablePrime           // address prime table
    mov x0,#2                       // prime 2
    bl displayPrime
    mov x1,#2
    mov x2,#1
1:                                  // loop for multiple of 2
    str x2,[x4,x1,lsl #3]           // mark  multiple of 2
    add x1,x1,#2
    cmp x1,#MAXI                    // end ?
    ble 1b                          // no loop
    mov x1,#3                       // begin indice
    mov x3,#1
2:
    ldr x2,[x4,x1,lsl #3]           // load table élément
    cmp x2,#1                       // is prime ?
    beq 4f
    mov x0,x1                       // yes -> display
    bl displayPrime
    mov x2,x1
3:                                  // and loop to mark multiples of this prime
    str x3,[x4,x2,lsl #3]
    add x2,x2,x1                    // add the prime
    cmp x2,#MAXI                    // end ?
    ble 3b                          // no -> loop
4:
    add x1,x1,2                     // other prime in table
    cmp x1,MAXI                     // end table ?
    ble 2b                          // no -> loop

100:                                // standard end of the program 
    mov x0,0                        // return code
    mov x8,EXIT                     // request to exit program
    svc 0                           // perform the system call
qAdrszCarriageReturn:    .quad szCarriageReturn
qAdrsMessResult:         .quad sMessResult
qAdrTablePrime:          .quad TablePrime

/******************************************************************/
/*      Display prime table elements                                */ 
/******************************************************************/
/* x0 contains the prime */
displayPrime:
    stp x1,lr,[sp,-16]!             // save  registers
    ldr x1,qAdrsZoneConv
    bl conversion10                 // call décimal conversion
    ldr x0,qAdrsMessResult
    ldr x1,qAdrsZoneConv            // insert conversion in message
    bl strInsertAtCharInc
    bl affichageMess                // display message
100:
    ldp x1,lr,[sp],16               // restaur  2 registers
    ret                             // return to address lr x30
qAdrsZoneConv:                   .quad sZoneConv  

/********************************************************/
/*        File Include fonctions                        */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"

Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97

ABAP

PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.

AT SELECTION-SCREEN ON p_limit.
  IF p_limit LE 1.
    MESSAGE 'Limit must be higher then 1.' TYPE 'E'.
  ENDIF.

START-OF-SELECTION.
  FIELD-SYMBOLS: <fs_prime> TYPE flag.
  DATA: gt_prime TYPE TABLE OF flag,
        gv_prime TYPE flag,
        gv_i     TYPE i,
        gv_j     TYPE i.

  DO p_limit TIMES.
    IF sy-index > 1.
      gv_prime = abap_true.
    ELSE.
      gv_prime = abap_false.
    ENDIF.

    APPEND gv_prime TO gt_prime.
  ENDDO.

  gv_i = 2.
  WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).
    IF ( gt_prime[ gv_i ] EQ abap_true ).
      gv_j =  gv_i ** 2.
      WHILE ( gv_j <= p_limit ).
        gt_prime[ gv_j ] = abap_false.
        gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).
      ENDWHILE.
    ENDIF.
    gv_i = gv_i + 1.
  ENDWHILE.

  LOOP AT gt_prime INTO gv_prime.
    IF gv_prime = abap_true.
      WRITE: / sy-tabix.
    ENDIF.
  ENDLOOP.

ABC

HOW TO SIEVE UP TO n:
    SHARE sieve
    PUT {} IN sieve
    FOR cand IN {2..n}: PUT 1 IN sieve[cand]
    FOR cand IN {2..floor root n}:
        IF sieve[cand] = 1:
            PUT cand*cand IN comp
            WHILE comp <= n:
                PUT 0 IN sieve[comp]
                PUT comp+cand IN comp

HOW TO REPORT prime n:
    SHARE sieve
    IF n<2: FAIL
    REPORT sieve[n] = 1

SIEVE UP TO 100
FOR n IN {1..100}:
    IF prime n: WRITE n

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ACL2

(defun nats-to-from (n i)
   (declare (xargs :measure (nfix (- n i))))
   (if (zp (- n i))
       nil
       (cons i (nats-to-from n (+ i 1)))))

(defun remove-multiples-up-to-r (factor limit xs i)
   (declare (xargs :measure (nfix (- limit i))))
   (if (or (> i limit)
           (zp (- limit i))
           (zp factor))
       xs
       (remove-multiples-up-to-r
        factor
        limit
        (remove i xs)
        (+ i factor))))

(defun remove-multiples-up-to (factor limit xs)
   (remove-multiples-up-to-r factor limit xs (* factor 2)))

(defun sieve-r (factor limit)
   (declare (xargs :measure (nfix (- limit factor))))
   (if (zp (- limit factor))
       (nats-to-from limit 2)
       (remove-multiples-up-to factor (+ limit 1)
                               (sieve-r (1+ factor) limit))))

(defun sieve (limit)
   (sieve-r 2 limit))

Action!

DEFINE MAX="1000"

PROC Main()
  BYTE ARRAY t(MAX+1)
  INT i,j,k,first

  FOR i=0 TO MAX
  DO
    t(i)=1
  OD

  t(0)=0
  t(1)=0

  i=2 first=1
  WHILE i<=MAX
  DO
    IF t(i)=1 THEN
      IF first=0 THEN
        Print(", ")
      FI
      PrintI(i)
      FOR j=2*i TO MAX STEP i
      DO
        t(j)=0
      OD
      first=0
    FI
    i==+1
  OD
RETURN

Output:

Screenshot from Atari 8-bit computer

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743,
751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883,
887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997

ActionScript

Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)

function eratosthenes(limit:int):Array
{
	var primes:Array = new Array();
	if (limit >= 2) {
		var sqrtlmt:int = int(Math.sqrt(limit) - 2);
		var nums:Array = new Array(); // start with an empty Array...
		for (var i:int = 2; i <= limit; i++) // and
			nums.push(i); // only initialize the Array once...
		for (var j:int = 0; j <= sqrtlmt; j++) {
			var p:int = nums[j]
			if (p)
				for (var t:int = p * p - 2; t < nums.length; t += p)
					nums[t] = 0;
		}
		for (var m:int = 0; m < nums.length; m++) {
			var r:int = nums[m];
			if (r)
				primes.push(r);
		}
	}
	return primes;
}
var e:Array = eratosthenes(1000);
trace(e);

Output:

Output:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997

Ada

with Ada.Text_IO, Ada.Command_Line;

procedure Eratos is
 
   Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));
   Prime: array(1 .. Last) of Boolean := (1 => False, others => True);
   Base: Positive := 2;
   Cnt: Positive;
begin
   while Base * Base <= Last loop
      if Prime(Base) then
         Cnt := Base + Base;
         while Cnt <= Last loop
            Prime(Cnt) := False;
            Cnt := Cnt + Base;
         end loop;
      end if;
      Base := Base + 1;
   end loop;
   Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");
   for Number in Prime'Range loop
      if Prime(Number) then
         Ada.Text_IO.Put(Positive'Image(Number));
      end if;
   end loop;
end Eratos;

Output:

> ./eratos 31
Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31

Agda

-- imports
open import Data.Nat as ℕ     using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec   using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin   using (Fin; suc; zero)
open import Function          using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe        using (Maybe; just; nothing)

-- Without square cutoff optimization
module Simple where
  primes : ∀ n → List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))
    where
    sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
    sieve [] = []
    sieve (nothing ∷ xs) =         sieve xs
    sieve (just x  ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)
      where
      B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n

      remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
      remove _ ys zero    = nothing ∷ ys x
      remove y ys (suc z) = y       ∷ ys z

-- With square cutoff optimization
module SquareOpt where
  primes : ∀ n → List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
    where
    sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
    sieve _ zero = List.mapMaybe id ∘ Vec.toList
    sieve _ (suc _) [] = []
    sieve i (suc l) (nothing ∷ xs) =     sieve (suc i) (l ∸ i ∸ i) xs
    sieve i (suc l) (just x  ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)
      where
      B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n

      remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
      remove _ ys zero    = nothing ∷ ys i
      remove y ys (suc j) = y       ∷ ys j

Agena

Tested with Agena 2.9.5 Win32

# Sieve of Eratosthenes

# generate and return a sequence containing the primes up to sieveSize
sieve := proc( sieveSize :: number ) :: sequence is
    local sieve, result;

    result := seq(); # sequence of primes - initially empty
    create register sieve( sieveSize ); # "vector" to be sieved

    sieve[ 1 ] := false;
    for sPos from 2 to sieveSize do sieve[ sPos ] := true od;

    # sieve the primes
    for sPos from 2 to entier( sqrt( sieveSize ) ) do
        if sieve[ sPos ] then
            for p from sPos * sPos to sieveSize by sPos do
                sieve[ p ] := false
            od
        fi
    od;

    # construct the sequence of primes
    for sPos from 1 to sieveSize do
        if sieve[ sPos ] then insert sPos into result fi
    od

return result
end; # sieve


# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();

Output:

 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ALGOL 60

Based on the 1962 Revised Repport:

comment Sieve of Eratosthenes;
begin
   integer array t[0:1000];
   integer i,j,k;
   for i:=0 step 1 until 1000 do t[i]:=1;
   t[0]:=0; t[1]:=0; i:=0;
   for i:=i while i<1000 do
   begin
       for i:=i while i<1000 and t[i]=0 do i:=i+1;
       if i<1000 then
       begin
           j:=2;
           k:=j*i;
           for k:=k while k<1000 do
           begin
               t[k]:=0;
               j:=j+1;
               k:=j*i
           end;
           i:=i+1
       end
   end;
   for i:=0 step 1 until 999 do
   if t[i]≠0 then print(i,ꞌ is primeꞌ)
end

An 1964 Implementation:

Works with: ALGOL 60 for OS/360

'BEGIN'
    'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
    'INTEGER' I,J,K;
    'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;
    SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);
    'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'
    'BEGIN'
        CANDIDATES(/I/) := 1;
    'END';
    CANDIDATES(/0/) := 0;
    CANDIDATES(/1/) := 0;
    I := 0;
    'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'
    'BEGIN'
        'FOR' I := I 'WHILE' I 'LESS' 1000
          'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'
            I := I+1;
        'IF' I 'LESS' 1000 'THEN'
        'BEGIN'
            J := 2;
            K := J*I;
            'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'
            'BEGIN'
                CANDIDATES(/K/) := 0;
                J := J + 1;
                K := J*I;
            'END';
            I := I+1;
            'END'
        'END';
        'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'
        'IF' CANDIDATES(/I/) 'NOTEQUAL' 0  'THEN'
        'BEGIN'
            OUTINTEGER(1,I);
            OUTSTRING(1,'(' IS PRIME')');
            'COMMENT' NEW LINE;
            SYSACT(1,14,1)
        'END'
    'END'
'END'

ALGOL 68

BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
  [n]BOOL sieve;
  FOR i TO UPB sieve DO sieve[i] := prime OD;
  INT m = ENTIER sqrt(n);
  sieve[1] := non prime;
  FOR i FROM 2 TO m DO
    IF sieve[i] = prime THEN
      FOR j FROM i*i BY i TO n DO
        sieve[j] := non prime
      OD
    FI
  OD;
  sieve
);
 
 print((eratosthenes(80),new line))

Output:

FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF

ALGOL W

Standard, non-optimised sieve

begin

    % implements the sieve of Eratosthenes                                   %
    %     s(i) is set to true if i is prime, false otherwise                 %
    %     algol W doesn't have a upb operator, so we pass the size of the    %
    %     array in n                                                         %
    procedure sieve( logical array s ( * ); integer value n ) ;
    begin

        % start with everything flagged as prime                             % 
        for i := 1 until n do s( i ) := true;

        % sieve out the non-primes                                           %
        s( 1 ) := false;
        for i := 2 until truncate( sqrt( n ) )
        do begin
            if s( i )
            then begin
                for p := i * i step i until n do s( p ) := false
            end if_s_i
        end for_i ;

    end sieve ;

    % test the sieve procedure                                               %

    integer sieveMax;

    sieveMax := 100;
    begin

        logical array s ( 1 :: sieveMax );

        i_w := 2; % set output field width                                   %
        s_w := 1; % and output separator width                               %

        % find and display the primes                                        %
        sieve( s, sieveMax );
        for i := 1 until sieveMax do if s( i ) then writeon( i );

    end

end.

Output:

 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Odd numbers only version

Alternative version that only stores odd numbers greater than 1 in the sieve.

begin
    % implements the sieve of Eratosthenes                                   %
    % only odd numbers appear in the sieve, which starts at 3                %
    % s( i ) is set to true if ( i * 2 ) + 1 is prime                        %
    procedure sieve2( logical array s ( * ); integer value n ) ;
    begin
        % start with everything flagged as prime                             % 
        for i := 1 until n do s( i ) := true;
        % sieve out the non-primes                                           %
        % the subscripts of s are  1  2  3  4  5  6  7  8  9 10 11 12 13...  %
        %      which correspond to 3  5  7  9 11 13 15 17 19 21 23 25 27...  %
        for i := 1 until truncate( sqrt( n ) ) do begin
            if s( i ) then begin
                integer ip;
                ip := ( i * 2 ) + 1;
                for p := i + ip step ip until n do s( p ) := false
            end if_s_i
        end for_i ;
    end sieve2 ;
    % test the sieve2 procedure                                              %
    integer primeMax, arrayMax;
    primeMax := 100;
    arrayMax := ( primeMax div 2 ) - 1;
    begin
        logical array s ( 1 :: arrayMax);
        i_w := 2; % set output field width                                   %
        s_w := 1; % and output separator width                               %
        % find and display the primes                                        %
        sieve2( s, arrayMax );
        write( 2 );
        for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
    end
end.

Output:

Same as the standard version.

ALGOL-M

BEGIN

COMMENT
  FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING 
  CLASSIC SIEVE OF ERATOSTHENES;

% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
    INTEGER R1, R2;
    R1 := N;
    R2 := 1;
    WHILE R1 > R2 DO
        BEGIN
            R1 := (R1+R2) / 2;
            R2 := N / R1;
        END;
    ISQRT := R1;
END;

INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];

LIMIT := 1000;
FALSE := 0;
TRUE := 1;

WRITE("FINDING PRIMES FROM 2 TO",LIMIT);

% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
  FLAGS[I] := TRUE;

% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
  BEGIN
    IF FLAGS[I] = TRUE THEN
        FOR J := (I * I) STEP I UNTIL LIMIT DO
          FLAGS[J] := FALSE;
  END;

% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");  
FOR I := 2 STEP 1 UNTIL LIMIT DO
  BEGIN
    IF FLAGS[I] = TRUE THEN 
      BEGIN
         WRITEON(I);
         COUNT := COUNT + 1;
         COL := COL + 1;
         IF COL > 10 THEN
           BEGIN
             WRITE("");
             COL := 1;
           END;
      END;
  END;

WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");

END

Output:

FINDING PRIMES FROM 2 TO  1000
INTIALIZING ... SIEVING ... PRINTING
    2     3     5     7    11    13    17    19    23    29
   31    37    41    43    47    53    59    61    67    71
                      . . .
  877   881   883   887   907   911   919   929   937   941
  947   953   967   971   977   983   991   997

  168 PRIMES WERE FOUND.

APL

All these versions requires ⎕io←0 (index origin 0).

It would have been better to require a result of the boolean mask rather than the actual list of primes. The list of primes obtains readily from the mask by application of a simple function (here {⍵/⍳⍴⍵}). Other related computations (such as the number of primes < n) obtain readily from the mask, easier than producing the list of primes.

Non-Optimized Version

sieve2←{                          
  b←⍵⍴1             
  b[⍳2⌊⍵]←0         
  2≥⍵:b             
  p←{⍵/⍳⍴⍵}∇⌈⍵*0.5  
  m←1+⌊(⍵-1+p×p)÷p  
  b ⊣ p {b[⍺×⍺+⍳⍵]←0}¨ m
}

primes2←{⍵/⍳⍴⍵}∘sieve2

The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).

=== Optimized Version ===|

sieve←{                           
  b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
  b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1 
  49≥⍵:b                    
  p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5        
  m←1+⌊(⍵-1+p×p)÷2×p        
  b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m      
}

primes←{⍵/⍳⍴⍵}∘sieve

The optimizations are as follows:

Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
Subsequently, only odd multiples of primes > 5 are marked.
Multiples of a prime to be marked start at its square.

Examples

   primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

   primes¨ ⍳14
┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐
││││2│2 3│2 3│2 3 5│2 3 5│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7 11│2 3 5 7 11│
└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘

   sieve 13
0 0 1 1 0 1 0 1 0 0 0 1 0

   +/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534

The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).

AppleScript

on sieveOfEratosthenes(limit)
    script o
        property numberList : {missing value}
    end script
    
    repeat with n from 2 to limit
        set end of o's numberList to n
    end repeat
    repeat with n from 2 to (limit ^ 0.5 div 1)
        if (item n of o's numberList is n) then
            repeat with multiple from (n * n) to limit by n
                set item multiple of o's numberList to missing value
            end repeat
        end if
    end repeat
    
    return o's numberList's numbers
end sieveOfEratosthenes

sieveOfEratosthenes(1000)

Output:

{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}

ARM Assembly

Works with: as version Raspberry Pi

/* ARM assembly Raspberry PI  */
/*  program cribleEras.s   */

 /* REMARK 1 : this program use routines in a include file 
   see task Include a file language arm assembly 
   for the routine affichageMess conversion10 
   see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes                       */
/************************************/
.include "../constantes.inc"

.equ MAXI,      101


/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss  
sZoneConv:                  .skip 24
TablePrime:                 .skip   4 * MAXI 
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main 
main:                               @ entry of program 
    ldr r4,iAdrTablePrime           @ address prime table
    mov r0,#2                       @ prime 2
    bl displayPrime
    mov r1,#2
    mov r2,#1
1:                                  @ loop for multiple of 2
    str r2,[r4,r1,lsl #2]           @ mark  multiple of 2
    add r1,#2
    cmp r1,#MAXI                    @ end ?
    ble 1b                          @ no loop
    mov r1,#3                       @ begin indice
    mov r3,#1
2:
    ldr r2,[r4,r1,lsl #2]           @ load table élément
    cmp r2,#1                       @ is prime ?
    beq 4f
    mov r0,r1                       @ yes -> display
    bl displayPrime
    mov r2,r1
3:                                  @ and loop to mark multiples of this prime
    str r3,[r4,r2,lsl #2]
    add r2,r1                       @ add the prime
    cmp r2,#MAXI              @ end ?
    ble 3b                          @ no -> loop
4:
    add r1,#2                       @ other prime in table
    cmp r1,#MAXI              @ end table ?
    ble 2b                          @ no -> loop

100:                                @ standard end of the program 
    mov r0, #0                      @ return code
    mov r7, #EXIT                   @ request to exit program
    svc #0                          @ perform the system call
iAdrszCarriageReturn:    .int szCarriageReturn
iAdrsMessResult:         .int sMessResult
iAdrTablePrime:          .int TablePrime

/******************************************************************/
/*      Display prime table elements                                */ 
/******************************************************************/
/* r0 contains the prime */
displayPrime:
    push {r1,lr}                    @ save registers
    ldr r1,iAdrsZoneConv
    bl conversion10                 @ call décimal conversion
    ldr r0,iAdrsMessResult
    ldr r1,iAdrsZoneConv            @ insert conversion in message
    bl strInsertAtCharInc
    bl affichageMess                @ display message
100:
    pop {r1,lr}
    bx lr
iAdrsZoneConv:                   .int sZoneConv  
/***************************************************/
/*      ROUTINES INCLUDE                           */
/***************************************************/
.include "../affichage.inc"

Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
Prime  : 101

Arturo

sieve: function [upto][
    composites: array.of: inc upto false
    loop 2..to :integer sqrt upto 'x [
        if not? composites\[x][
            loop range.step: x x^2 upto 'c [
                composites\[c]: true
            ]
        ]
    ]
    result: new []
    loop.with:'i composites 'c [
        unless c -> 'result ++ i
    ]
    return result -- [0,1]
]

print sieve 100

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

AutoHotkey

Search autohotkey.com: of Eratosthenes
Source: AutoHotkey forum by Laszlo

MsgBox % "12345678901234567890`n" Sieve(20) 

Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime 
   Static zero := 48, one := 49 ; Asc("0"), Asc("1") 
   VarSetCapacity(S,n,one) 
   NumPut(zero,S,0,"char") 
   i := 2 
   Loop % sqrt(n)-1 { 
      If (NumGet(S,i-1,"char") = one) 
         Loop % n//i 
            If (A_Index > 1) 
               NumPut(zero,S,A_Index*i-1,"char") 
      i += 1+(i>2) 
   } 
   Return S 
}

Alternative Version

Sieve_of_Eratosthenes(n){
	arr := []
	loop % n-1
		if A_Index>1
			arr[A_Index] := true

	for i, v in arr	{
		if (i>Sqrt(n))
			break
		else if arr[i]
			while ((j := i*2 + (A_Index-1)*i) < n)
				arr.delete(j)
	}
	return Arr
}

Examples:

n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
	output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return

Output:

.	2	3	.	5	.	7	.	.	.
11	.	13	.	.	.	17	.	19	.
.	.	23	.	.	.	.	.	29	.
31	.	.	.	.	.	37	.	.	.
41	.	43	.	.	.	47	.	.	.
.	.	53	.	.	.	.	.	59	.
61	.	.	.	.	.	67	.	.	.
71	.	73	.	.	.	.	.	79	.
.	.	83	.	.	.	.	.	89	.
.	.	.	.	.	.	97	.	.	.

AutoIt

#include <Array.au3>
$M = InputBox("Integer", "Enter biggest Integer")
Global $a[$M], $r[$M], $c = 1
For $i = 2 To $M -1
	If Not $a[$i] Then
		$r[$c] = $i
		$c += 1
		For $k = $i To $M -1 Step $i
			$a[$k] = True
		Next
	EndIf
Next
$r[0] = $c - 1
ReDim $r[$c]
_ArrayDisplay($r)

AWK

An initial array holds all numbers 2..max (which is entered on stdin); then all products of integers are deleted from it; the remaining are displayed in the unsorted appearance of a hash table. Here, the script is entered directly on the commandline, and input entered on stdin:

$ awk '{for(i=2;i<=$1;i++) a[i]=1;
>       for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) delete a[i*j];
>       for(i in a) printf i" "}'
100
71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3

The following variant does not unset non-primes, but sets them to 0, to preserve order in output:

$ awk '{for(i=2;i<=$1;i++) a[i]=1;
>       for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) a[i*j]=0;
>       for(i=2;i<=$1;i++) if(a[i])printf i" "}'
100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Now with the script from a file, input from commandline as well as stdin, and input is checked for valid numbers:

# usage:  gawk  -v n=101  -f sieve.awk

function sieve(n) { # print n,":"
	for(i=2; i<=n;      i++) a[i]=1;
	for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;
	for(i=2; i<=n;      i++) if(a[i]) printf i" "
	print ""
}

BEGIN	{ print "Sieve of Eratosthenes:"
	  if(n>1) sieve(n)
	}

	{ n=$1+0 }
n<2	{ exit }
	{ sieve(n) }

END	{ print "Bye!" }

Here is an alternate version that uses an associative array to record composites with a prime dividing it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints out noncomposites greater than 1.

BEGIN {  ULIMIT=100

for ( n=1 ; (n++) < ULIMIT ; ) 
    if (n in S) {
        p = S[n]
        delete S[n]
        for ( m = n ; (m += p) in S ; )  { }
        S[m] = p 
        }
    else  print ( S[(n+n)] = n )
}

Bash

See solutions at UNIX Shell.

BASIC

Works with: FreeBASIC

Works with: RapidQ

DIM n AS Integer, k AS Integer, limit AS Integer

INPUT "Enter number to search to: "; limit
DIM flags(limit) AS Integer

FOR n = 2 TO SQR(limit)
    IF flags(n) = 0 THEN
        FOR k = n*n TO limit STEP n
            flags(k) = 1
        NEXT k
    END IF
NEXT n

' Display the primes
FOR n = 2 TO limit
    IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n

Applesoft BASIC

10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20  DIM FLAGS(LIMIT)
30  FOR N = 2 TO SQR (LIMIT)
40  IF FLAGS(N) < > 0 GOTO 80
50  FOR K = N * N TO LIMIT STEP N
60  FLAGS(K) = 1
70  NEXT K
80  NEXT N
90  REM  DISPLAY THE PRIMES
100  FOR N = 2 TO LIMIT
110  IF FLAGS(N) = 0 THEN PRINT N;", ";
120  NEXT N

Atari BASIC

Translation of: Commodore BASIC

Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.

100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160  IF N(P)=0 THEN 200
170  FOR I=P*P TO LI STEP P
180    N(I)=0
190  NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230   IF N(I)=0 THEN 260
240   PRINT I,:C=C+1
250   IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT

Output:

  Ready
  RUN
  LIMIT?100
  2         3         5
  7         11        13
  17        19        23
  29        31        37
  41        43        47
  53        59        61
  67        71        73
  79        83        89
  97

Commodore BASIC

Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.

100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 :   N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT

Output:

READY.
RUN
LIMIT? 100
 2         3         5         7
 11        13        17        19
 23        29        31        37
 41        43        47        53
 59        61        67        71
 73        79        83        89
 97

READY.

IS-BASIC

100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140   LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170   IF T(I)<>1 THEN
180     FOR K=I*I TO LIMIT STEP I
190       LET T(K)=1
200     NEXT
210   END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240   IF T(I)=0 THEN PRINT I;
250 NEXT

Locomotive Basic

10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
40 FOR n=2 TO SQR(limit)
50 IF f(n)=1 THEN 90
60 FOR k=n*n TO limit STEP n
70 f(k)=1
80 NEXT k
90 NEXT n
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT

MSX Basic

5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation 
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

Sinclair ZX81 BASIC

If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on FAST and SLOW: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering FAST mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to SLOW mode when we have something to print out.

 10 INPUT L
 20 FAST
 30 DIM N(L)
 40 FOR I=2 TO SQR L
 50 IF N(I) THEN GOTO 90
 60 FOR J=I+I TO L STEP I
 70 LET N(J)=1
 80 NEXT J
 90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I

ZX Spectrum Basic

10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR l
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

QBasic

Works with: QBasic version 1.1

Works with: QuickBasic version 4.5

limit = 120

DIM flags(limit)
FOR n = 2 TO limit
    flags(n) = 1
NEXT n

PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 TO SQR(limit)
    IF flags(n) = 1 THEN
        FOR i = n * n TO limit STEP n
            flags(i) = 0
    NEXT i
    END IF
NEXT n

FOR n = 1 TO limit
    IF flags(n) THEN PRINT USING "####"; n;
NEXT n

Output:

Prime numbers less than or equal to 120 are: 
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113

BASIC256

arraybase 1
limit = 120

dim  flags(limit)
for n = 2 to limit
    flags[n] = True
next n

print "Prime numbers less than or equal to "; limit; " are: "
for n = 2 to sqr(limit)
    if flags[n] then
        for i = n * n to limit step n
            flags[i] = False
        next i
    end if
next n

for n = 1 to limit
    if flags[n] then print rjust(n,4);
next n

Output:

Prime numbers less than or equal to 120 are: 
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113

True BASIC

Translation of: QBasic

LET limit = 120
DIM flags(0)
MAT redim flags(limit)
FOR n = 2 to limit
    LET flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 to sqr(limit)
    IF flags(n) = 1 then
       FOR i = n*n to limit step n
           LET flags(i) = 0
       NEXT i
    END IF
NEXT n
FOR n = 1 to limit
    IF flags(n)<>0 then PRINT  using "####": n;
NEXT n
END

Output:

Same as QBasic entry.

QL SuperBASIC

using 'easy way' to 'add' 2n wheels

Translation of: ZX Spectrum Basic

Sets h$ to 1 for higher multiples of 2 via FILL$, later on sets STEP to 2n; replaces Floating Pt array p(z) with string variable h$(z) to sieve out all primes < z=441 (l=21) in under 1K, so that h$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.

10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h$="10" : h$=h$ & FILL$("01",z)
40      FOR n=3 TO l 
50 IF h$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h$(k)=1 
80      END FOR n   
90 REM Display the primes     
100 FOR n=2 TO z: IF h$(n)=0: PRINT n;", ";

2i wheel emulation of Sinclair ZX81 BASIC

Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the STEP of 2 in line 40 mitigates line 50's inefficiency when going to 90.

 10 INPUT Z
 15 LET L=SQR(Z)
 30 LET H$="10"
 32 FOR J=3 TO Z STEP 2
 34 LET H$=H$ & "01"
 36 NEXT J
 40 FOR I=3 TO L STEP 2
 50 IF H$(I)="1" THEN GOTO 90
 60 FOR J=I*I TO Z STEP I+I
 70 LET H$(J)="1"
 80 NEXT J
 90 NEXT I
110 FOR I=2 TO Z
120 IF H$(I)="0" THEN PRINT I!
130 NEXT I

2i wheel emulation of Sinclair ZX80 BASIC

. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, L (=21 at most on 1K ZX80s for all primes till 439).

Backward-compatible on ZX80s after substituting ** for ^ in line 120.

 10 INPUT L
 15 LET Z=(L+1)*(L- 1)/2
 30 DIM H(Z)
 40 FOR I=3 TO L STEP 2
 50 IF H((I-1)/ 2) THEN GOTO 90
 60 FOR J=I*I TO L*L STEP I+I
 70 LET H((J-1)/ 2)=1
 80 NEXT J
 90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I

Sieve of Sundaram

Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now just a slight transformation away from the Sieve of Sundaram, as transformed as follows: O is the highest value for an Index of succesive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in-between, such that duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent to line 50 in them.

Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.

 10 INPUT O
 15 LET Z=2*O*O+O*2
 30 DIM H(Z)
 40 FOR I=1 TO O
 45 LET A=2*I*I+I*2
 50 REM IF H(A) THEN GOTO 90
 60 FOR J=A TO Z STEP 1+I*2
 65 REM IF H(J) THEN GOTO 80
 70 LET H(J)=1
 80 NEXT J
 90 NEXT I 
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I

Eulerian optimisation

While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of FOR statements (& as their 1K of static RAM is equivalent to L1 cache, even in FAST mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)

 
10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h$=FILL$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h$((2*q+1)*p+q)="1" 
42 GOTO 87
61 IF h$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h$(z%)=" ": PRINT N%(z%)! 
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2

Batch File

:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEEXTENSIONS
rem echo on
set /p n=limit: 
rem set n=100
for /L %%i in (1,1,%n%) do set crible.%%i=1
for /L %%i in (2,1,%n%) do (
  if !crible.%%i! EQU 1 (
    set /A w = %%i * 2
    for /L %%j in (!w!,%%i,%n%) do (
	  set crible.%%j=0
	)
  )
)
for /L %%i in (2,1,%n%) do (
  if !crible.%%i! EQU 1 echo %%i 
)
pause

Output:

BBC BASIC

      limit% = 100000
      DIM sieve% limit%
      
      prime% = 2
      WHILE prime%^2 < limit%
        FOR I% = prime%*2 TO limit% STEP prime%
          sieve%?I% = 1
        NEXT
        REPEAT prime% += 1 : UNTIL sieve%?prime%=0
      ENDWHILE
      
      REM Display the primes:
      FOR I% = 1 TO limit%
        IF sieve%?I% = 0 PRINT I%;
      NEXT

BCPL

get "libhdr"

manifest $( LIMIT = 1000 $)

let sieve(prime,max) be
$(  let i = 2
    0!prime := false
    1!prime := false
    for i = 2 to max do i!prime := true
    while i*i <= max do
    $(  if i!prime do
        $(  let j = i*i
            while j <= max do
            $(  j!prime := false
                j := j + i
            $)
        $)
        i := i + 1
    $)
$)

let start() be
$(  let prime = vec LIMIT
    let col = 0
    sieve(prime, LIMIT)
    for i = 2 to LIMIT do
        if i!prime do
        $(  writef("%I4",i)   
            col := col + 1
            if col rem 20 = 0 then wrch('*N')
        $)
    wrch('*N')
$)

Output:

   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

Odds-only bit packed array version (64 bit)

This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.

GET "libhdr"

LET lowbit(n) =
    0 -> -1,
    VALOF {
        // The table is byte packed to conserve space; therefore we must
        // unpack the structure.
        //
        LET deBruijn64 = TABLE
            #x0001300239311C03, #x3D3A322A261D1104,
            #x3E373B2435332B16, #x2D27211E18120C05,
            #x3F2F381B3C292510, #x362334152C20170B,
            #x2E1A280F22141F0A, #x190E13090D080706

        LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
        RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
    }

LET primes_upto(limit) =
    limit < 3 -> 0,
    VALOF {
        LET bit_sz = (limit + 1) / 2 - 1
        LET bit, p = ?, ?
        LET q, r = bit_sz >> 6, bit_sz & #x3F
        LET sz = q - (r > 0)
        LET sieve = getvec(sz)

        // Initialize the array
        FOR i = 0 TO q - 1 DO
            sieve!i := -1
        IF r > 0 THEN sieve!q := ~(-1 << r)
        sieve!sz := -1 // Sentinel value to mark the end -
              // (after sieving, we'll never have 64 consecutive odd primes.)

        // run the sieve
        bit := 0
        {
            WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
                bit +:= 1
            p := 2*bit + 3
            q := p*p
            IF q > limit THEN RESULTIS sieve
            r := (q - 3) >> 1
            UNTIL r >= bit_sz DO {
                sieve[r >> 6] &:= ~(1 << (r & #x3F))
                r +:= p
            }
            bit +:= 1
        } REPEAT
    }

MANIFEST { // fields in an iterable
    sieve_start; sieve_bits; sieve_ptr
}

LET prime_iter(sieve) = VALOF {
    LET iter = getvec(2)
    iter!sieve_start := 0
    iter!sieve_bits := sieve!0
    iter!sieve_ptr := sieve
    RESULTIS iter
}

LET nextprime(iter) =
    !iter!sieve_ptr = -1 -> 0,  // guard entry if at the end already
    VALOF {
        LET p, x = ?, ?

        // iter!sieve_start is also a flag to yield 2.
        IF iter!sieve_start = 0 {
            iter!sieve_start := 3
            RESULTIS 2
        }
        x := iter!sieve_bits
        {
            TEST x ~= 0
            THEN {
                p := (lowbit(x) << 1) + iter!sieve_start
                x &:= x - 1
                iter!sieve_bits := x
                RESULTIS p
            }
            ELSE {
                iter!sieve_start +:= 128
                iter!sieve_ptr +:= 1
                x := !iter!sieve_ptr
                IF x = -1 RESULTIS 0
            }
        } REPEAT
    }

LET show(sieve) BE {
    LET iter = prime_iter(sieve)
    LET c, p = 0, ?
    {
        p := nextprime(iter)
        IF p = 0 THEN {
            wrch('*n')
            freevec(iter)
            RETURN
        }
        IF c MOD 10 = 0 THEN wrch('*n')
        c +:= 1
        writef("%8d", p)
    } REPEAT
}

LET start() = VALOF {
    LET n = ?
    LET argv = VEC 20
    LET sz = ?
    LET primes = ?

    sz := rdargs("upto/a/n/p", argv, 20)
    IF sz = 0 RESULTIS 1
    n := !argv!0
    primes := primes_upto(n)
    IF primes = 0 RESULTIS 1 // no array allocated because limit too small
    show(primes)
    freevec(primes)
    RESULTIS 0
}

Output:

$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000> 
       2       3       5       7      11      13      17      19      23      29
      31      37      41      43      47      53      59      61      67      71
      73      79      83      89      97     101     103     107     109     113
     127     131     137     139     149     151     157     163     167     173
     179     181     191     193     197     199     211     223     227     229
     233     239     241     251     257     263     269     271     277     281
     283     293     307     311     313     317     331     337     347     349
     353     359     367     373     379     383     389     397     401     409
     419     421     431     433     439     443     449     457     461     463
     467     479     487     491     499     503     509     521     523     541
     547     557     563     569     571     577     587     593     599     601
     607     613     617     619     631     641     643     647     653     659
     661     673     677     683     691     701     709     719     727     733
     739     743     751     757     761     769     773     787     797     809
     811     821     823     827     829     839     853     857     859     863
     877     881     883     887     907     911     919     929     937     941
     947     953     967     971     977     983     991     997
0.005>

Befunge

2>:3g" "-!v\  g30          <
 |!`"O":+1_:.:03p>03g+:"O"`|
 @               ^  p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789

Binary Lambda Calculus

The BLC sieve of Eratosthenes as documented at https://github.com/tromp/AIT/blob/master/characteristic_sequences/primes.lam is the 167 bit program

00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110

The infinitely long output is

001101010001010001010001000001010000010001010001000001000001010000010001010000010001000001000000010001010001010001000000000000010001000001010000000001010000010000010001000001000001010000000001010001010000000000010000000000010001010001000001010000000001000001000001000001010000010001010000000001000000000000010001010001000000000000010000010000000001010001000001000...

BQN

A more efficient sieve (primes below one billion in under a minute) is provided as PrimesTo in bqn-libs primes.bqn.

Primes ← {
  𝕩≤2 ? ↕0 ;             # No primes below 2
  p ← 𝕊⌈√n←𝕩             # Initial primes by recursion
  b ← 2≤↕n               # Initial sieve: no 0 or 1
  E ← {↕∘⌈⌾((𝕩×𝕩+⊢)⁼)n}  # Multiples of 𝕩 under n, starting at 𝕩×𝕩
  / b E⊸{0¨⌾(𝕨⊸⊏)𝕩}´ p   # Cross them out
}

Output:

   Primes 100
⟨ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ⟩
   ≠∘Primes¨ 10⋆↕7  # Number of primes below 1e0, 1e1, ... 1e6
⟨ 0 4 25 168 1229 9592 78498 ⟩

Bracmat

This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.

( ( eratosthenes
  =   n j i
    .   !arg:?n
      & 1:?i
      &   whl
        ' ( (1+!i:?i)^2:~>!n:?j
          & ( !!i
            |   whl
              ' ( !j:~>!n
                & nonprime:?!j
                & !j+!i:?j
                )
            )
          )
      & 1:?i
      &   whl
        ' ( 1+!i:~>!n:?i
          & (!!i|put$(!i " "))
          )
  )
& eratosthenes$100
)

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

C

Plain sieve, without any optimizations:

#include <stdlib.h>
#include <math.h>

char*
eratosthenes(int n, int *c)
{
	char* sieve;
	int i, j, m;

	if(n < 2)
		return NULL;

	*c = n-1;     /* primes count */
	m = (int) sqrt((double) n);

	/* calloc initializes to zero */
	sieve = calloc(n+1,sizeof(char));
	sieve[0] = 1;
	sieve[1] = 1;
	for(i = 2; i <= m; i++)
		if(!sieve[i])
			for (j = i*i; j <= n; j += i)
				if(!sieve[j]){
					sieve[j] = 1; 
					--(*c);
				}
  	return sieve;
}

Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.

Another example:

We first fill ones into an array and assume all numbers are prime. Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples! To understand this better, look at the output of the following example.

To print this back, we look for ones in the array and only print those spots.

#include <stdio.h>
#include <malloc.h>
void sieve(int *, int);

int main(int argc, char *argv)
{
    int *array, n=10;
    array =(int *)malloc((n + 1) * sizeof(int));
    sieve(array,n);
    return 0;
}

void sieve(int *a, int n)
{
    int i=0, j=0;

    for(i=2; i<=n; i++) {
        a[i] = 1;
    }

    for(i=2; i<=n; i++) {
        printf("\ni:%d", i);
        if(a[i] == 1) {
            for(j=i; (i*j)<=n; j++) {
                printf ("\nj:%d", j);
                printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);
                a[(i*j)] = 0;
                printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);
            }
        }
    }

    printf("\nPrimes numbers from 1 to %d are : ", n);
    for(i=2; i<=n; i++) {
        if(a[i] == 1)
            printf("%d, ", i);
    }
    printf("\n\n");
}

Output:

i:2
j:2
Before a[2*2]: 1
After a[2*2]: 0
j:3
Before a[2*3]: 1
After a[2*3]: 0
j:4
Before a[2*4]: 1
After a[2*4]: 0
j:5
Before a[2*5]: 1
After a[2*5]: 0
i:3
j:3
Before a[3*3]: 1
After a[3*3]: 0
i:4
i:5
i:6
i:7
i:8
i:9
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7,

C#

Works with: C# version 2.0+

using System;
using System.Collections;
using System.Collections.Generic;

namespace SieveOfEratosthenes
{
    class Program
    {
        static void Main(string[] args)
        {
            int maxprime = int.Parse(args[0]);
            var primelist = GetAllPrimesLessThan(maxprime);
            foreach (int prime in primelist)
            {
                Console.WriteLine(prime);
            }
            Console.WriteLine("Count = " + primelist.Count);
            Console.ReadLine();
        }

        private static List<int> GetAllPrimesLessThan(int maxPrime)
        {
            var primes = new List<int>();
            var maxSquareRoot = (int)Math.Sqrt(maxPrime);
            var eliminated = new BitArray(maxPrime + 1);

            for (int i = 2; i <= maxPrime; ++i)
            {
                if (!eliminated[i])
                {
                    primes.Add(i);
                    if (i <= maxSquareRoot)
                    {
                        for (int j = i * i; j <= maxPrime; j += i)
                        {
                            eliminated[j] = true;
                        }
                    }
                }
            }
            return primes;
        }
    }
}

Richard Bird Sieve

Translation of: F#

To show that C# code can be written in somewhat functional paradigms, the following in an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesBird : IEnumerable<PrimeT> {
    private struct CIS<T> {
      public T v; public Func<CIS<T>> cont;
      public CIS(T v, Func<CIS<T>> cont) {
        this.v = v; this.cont = cont;
      }
    }
    private CIS<PrimeT> pmlts(PrimeT p) {
      Func<PrimeT, CIS<PrimeT>> fn = null;
      fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));
      return fn(p * p);
    }
    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }
    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
      var x = xs.v; var y = ys.v;
      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
    }
    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }
    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
      var nn = n; var ncs = cs;
      for (; ; ++nn) {
        if (nn >= ncs.v) ncs = ncs.cont();
        else return new CIS<PrimeT>(nn, () => minusat(++nn, ncs));
      }
    }
    private CIS<PrimeT> prms() {
      return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }
    public IEnumerator<PrimeT> GetEnumerator() {
      for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;
    }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

Tree Folding Sieve

Translation of: F#

The above code can easily be converted to "odds-only" and a infinite tree-like folding scheme with the following minor changes:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesTreeFold : IEnumerable<PrimeT> {
    private struct CIS<T> {
      public T v; public Func<CIS<T>> cont;
      public CIS(T v, Func<CIS<T>> cont) {
        this.v = v; this.cont = cont;
      }
    }
    private CIS<PrimeT> pmlts(PrimeT p) {
      var adv = p + p;
      Func<PrimeT, CIS<PrimeT>> fn = null;
      fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));
      return fn(p * p);
    }
    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));
    }
    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
      var x = xs.v; var y = ys.v;
      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
    }
    private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {
      var nxtcss = css.cont();
      return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }
    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));
    }
    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
      var nn = n; var ncs = cs;
      for (; ; nn += 2) {
        if (nn >= ncs.v) ncs = ncs.cont();
        else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));
      }
    }
    private CIS<PrimeT> oddprms() {
      return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));
    }
    public IEnumerator<PrimeT> GetEnumerator() {
      yield return 2;
      for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;
    }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code runs over ten times faster than the original Richard Bird algorithm.

Priority Queue Sieve

Translation of: F#

First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:

namespace PriorityQ {
  using KeyT = System.UInt32;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  class Tuple<K, V> { // for DotNet 3.5 without Tuple's
    public K Item1; public V Item2;
    public Tuple(K k, V v) { Item1 = k; Item2 = v; }
    public override string ToString() {
      return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";
    }
  }
  class MinHeapPQ<V> {
    private struct HeapEntry {
      public KeyT k; public V v;
      public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }
    }
    private List<HeapEntry> pq;
    private MinHeapPQ() { this.pq = new List<HeapEntry>(); }
    private bool mt { get { return pq.Count == 0; } }
    private Tuple<KeyT, V> pkmn {
      get {
        if (pq.Count == 0) return null;
        else {
          var mn = pq[0];
          return new Tuple<KeyT, V>(mn.k, mn.v);
        }
      }
    }
    private void psh(KeyT k, V v) { // add extra very high item if none
      if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));
      var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...
      for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {
        var t = pq[ni - 1];
        if (t.k > k) pq[i - 1] = t; else break;
      }
      pq[i - 1] = new HeapEntry(k, v);
    }
    private void siftdown(KeyT k, V v, int ndx) {
      var cnt = pq.Count - 1; var i = ndx;
      for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {
        var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;
        var nk = k;
        if (k > lk) { i = ni; nk = lk; }
        if (nk > rk) { ni += 1; i = ni; }
        if (i != oi) pq[oi] = pq[i]; else break;
      }
      pq[i] = new HeapEntry(k, v);
    }
    private void rplcmin(KeyT k, V v) {
      if (pq.Count > 1) siftdown(k, v, 0); }
    public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }
    public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }
    public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {
      pq.psh(k, v); return pq; }
    public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
      pq.rplcmin(k, v); return pq; }
}

Restricted Base Primes Queue

The following code implements an improved version of the odds-only O'Neil algorithm, which provides the improvements of only adding base prime composite number streams to the queue when the sieved number reaches the square of the base prime (saving a huge amount of memory and considerable execution time, including not needing as wide a range of a type for the internal prime numbers) as well as minimizing stream processing using fusion:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesPQ : IEnumerable<PrimeT> {
    private IEnumerator<PrimeT> nmrtr() {
      MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;
      PrimeT bp = 3; PrimeT q = 9;
      IEnumerator<PrimeT> bps = null;
      yield return 2; yield return 3;
      for (var n = (PrimeT)5; ; n += 2) {
        if (n >= q) { // always equal or less...
          if (q <= 9) {
            bps = nmrtr();
            bps.MoveNext(); bps.MoveNext(); } // move to 3...
          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
          var adv = bp + bp; bp = nbp;
          pq = MinHeapPQ<PrimeT>.push(n + adv, adv, pq);
        }
        else {
          var pk = MinHeapPQ<PrimeT>.peekMin(pq);
          var ck = (pk == null) ? q : pk.Item1;
          if (n >= ck) {
            do { var adv = pk.Item2;
                  pq = MinHeapPQ<PrimeT>.replaceMin(ck + adv, adv, pq);
                  pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;
            } while (n >= ck);
          }
          else yield return n;
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code is at least about 2.5 times faster than the Tree Folding version.

Dictionary (Hash table) Sieve

The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesDict : IEnumerable<PrimeT> {
    private IEnumerator<PrimeT> nmrtr() {
      Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();
      PrimeT bp = 3; PrimeT q = 9;
      IEnumerator<PrimeT> bps = null;
      yield return 2; yield return 3;
      for (var n = (PrimeT)5; ; n += 2) {
        if (n >= q) { // always equal or less...
          if (q <= 9) {
            bps = nmrtr();
            bps.MoveNext(); bps.MoveNext();
          } // move to 3...
          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
          var adv = bp + bp; bp = nbp;
          dct.Add(n + adv, adv);
        }
        else {
          if (dct.ContainsKey(n)) {
            PrimeT nadv; dct.TryGetValue(n, out nadv); dct.Remove(n); var nc = n + nadv;
            while (dct.ContainsKey(nc)) nc += nadv;
            dct.Add(nc, nadv);
          }
          else yield return n;
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code runs in about three quarters of the time as the above Priority Queue based version for a range of a million primes which will fall even further behind for increasing ranges due to the Dictionary providing O(1) access times as compared to the O(log n) access times for the Priority Queue; the only slight advantage of the PQ based version is at very small ranges where the constant factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".

Best performance: CPU-Cache-Optimized Segmented Sieve

All of the above unbounded versions are really just an intellectual exercise as with very little extra lines of code above the fastest Dictionary based version, one can have an bit-packed page-segmented array based version as follows:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesPgd : IEnumerable<PrimeT> {
    private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes
    private const int BFBTS = PGSZ * 8; // in bits
    private const int BFRNG = BFBTS * 2;
    public IEnumerator<PrimeT> nmrtr() {
      IEnumerator<PrimeT> bps = null;
      List<uint> bpa = new List<uint>();
      uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words
      yield return 2;
      for (var lowi = (PrimeT)0; ; lowi += BFBTS) {
        for (var bi = 0; ; ++bi) {
          if (bi < 1) {
            if (bi < 0) { bi = 0; yield return 2; }
            PrimeT nxt = 3 + lowi + lowi + BFRNG;
            if (lowi <= 0) { // cull very first page
              for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
                if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)
                  for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
                    cbuf[j >> 5] |= 1u << j;
            }
            else { // cull for the rest of the pages
              Array.Clear(cbuf, 0, cbuf.Length);
              if (bpa.Count == 0) { // inite secondar base primes stream
                bps = nmrtr(); bps.MoveNext(); bps.MoveNext();
                bpa.Add((uint)bps.Current); bps.MoveNext();
              } // add 3 to base primes array
              // make sure bpa contains enough base primes...
              for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {
                p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);
              }
              for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {
                var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;
                // adjust start index based on page lower limit...
                if (s >= lowi) s -= lowi;
                else {
                  var r = (lowi - s) % p;
                  s = (r != 0) ? p - r : 0;
                }
                for (var j = (uint)s; j < BFBTS; j += p)
                  cbuf[j >> 5] |= 1u << ((int)j);
              }
            }
          }
          while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;
          if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);
          else break; // outer loop for next page segment...
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code is about 25 times faster than the Dictionary version at computing the first about 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed as compared to a naive "one huge memory array" algorithm to using an array size that is the size of the CPU L1 or L2 caches and using bit-packing to fit more number representations into this limited capacity; in this way RAM memory access times are reduced by a factor of from about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is compensated by a large factor in total execution time.

The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.

In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another about four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).

Note that all of these gains in speed are not due to C# other than it compiles to reasonably efficient machine code, but rather to proper use of the Sieve of Eratosthenes algorithm.

All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):

    static void Main(string[] args) {
      Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
    }

To produce the following output for all tested versions (although some are considerably faster than others):

Output:

15485863

C++

Standard Library

This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>

// Fills the range [start, end) with 1 if the integer corresponding to the index is composite and 0 otherwise.
// requires: I is RandomAccessIterator
template<typename I>
void mark_composites(I start, I end)
{
    std::fill(start, end, 0);

    for (auto it = start + 1; it != end; ++it)
    {
        if (*it == 0)
        {
            auto prime = std::distance(start, it) + 1;
            // mark all multiples of this prime number as composite.
            auto multiple_it = it;
            while (std::distance(multiple_it, end) > prime)
            {
                std::advance(multiple_it, prime);
                *multiple_it = 1;
            }
        }
    }
}

// Fills "out" with the prime numbers in the range 2...N where N = distance(start, end).
// requires: I is a RandomAccessIterator
//           O is an OutputIterator
template <typename I, typename O>
O sieve_primes(I start, I end, O out)
{
    mark_composites(start, end);
    for (auto it = start + 1; it != end; ++it)
    {
        if (*it == 0)
        {
            *out = std::distance(start, it) + 1;
            ++out;
        }
    }
    return out;
}

int main()
{
    std::vector<uint8_t> is_composite(1000);
    sieve_primes(is_composite.begin(), is_composite.end(), std::ostream_iterator<int>(std::cout, " "));

    // Alternative to store in a vector: 
    // std::vector<int> primes;
    // sieve_primes(is_composite.begin(), is_composite.end(), std::back_inserter(primes));
}

Boost

// yield all prime numbers less than limit. 
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
  std::vector<bool> is_prime(limit, true);
  
  const int sqrt_limit = static_cast<int>(std::sqrt(limit));
  for (int n = 2; n <= sqrt_limit; ++n)
    if (is_prime[n]) {
	yield(n);

	for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n) 
      //NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
	  is_prime[k] = false;
    }

  for (int n = sqrt_limit + 1; n < limit; ++n)
    if (is_prime[n])
	yield(n);
}

Full program:

Works with: Boost

/**
   $ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
 */
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>

int main(int argc, char *argv[])
{
  using namespace std;
  using namespace boost::lambda;

  int limit = 10000;
  if (argc == 2) {
    stringstream ss(argv[--argc]);
    ss >> limit;

    if (limit < 1 or ss.fail()) {
      cerr << "USAGE:\n  sieve LIMIT\n\nwhere LIMIT in the range [1, " 
	   << numeric_limits<int>::max() << ")" << endl;
      return 2;
    }
  }

  // print primes less then 100
  primesupto(100, cout << _1 << " ");
  cout << endl;  

  // find number of primes less then limit and their sum
  int count = 0;
  uintmax_t sum = 0;
  primesupto(limit, (var(sum) += _1, var(count) += 1));

  cout << "limit sum pi(n)\n" 
       << limit << " " << sum << " " << count << endl;
}

Chapel

This example is incorrect. Please fix the code and remove this message.

Details: Doesn't compile since at least Chapel version 1.20 to 1.24.1.

This solution uses nested iterators to create new wheels at run time:

// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {

        yield prime;

        var last = prime;
        label candidates for candidate in sieve(prime+1) do {
                for composite in last..candidate by prime do {

                        // candidate is a multiple of this prime
                        if composite == candidate {
                                // remember size of last composite
                                last = composite;
                                // and try the next candidate
                                continue candidates;
                        }
                }

                // candidate cannot need to be removed by this sieve
                // yield to parent sieve for checking
                yield candidate;
        }
}

The topmost sieve needs to be started with 2 (the smallest prime):

config const N = 30;
for p in sieve(2) {
        if p > N {
                writeln();
                break;
        }
        write(" ", p);
}

Alternate Conventional Bit-Packed Implementation

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:

compile with the `--fast` option

use Time;
use BitOps;

type Prime = uint(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
  write("The first 25 primes are:  ");
  for p in primes(100) do write(p, " "); writeln();
  
  var count = 0; for p in primes(1000000) do count += 1;
  writeln("Count of primes to a million is:  ", count, ".");
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in primes(limit) do count += 1;

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
  const szlmt = n / 8;
  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

  for bp in 2 .. n {
    if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
      const s0 = bp * bp;
      if s0 > n then break;
      for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
    }
  }

  for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;

}

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

Alternate Odds-Only Bit-Packed Implementation

use Time;
use BitOps;

type Prime = int(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
  write("The first 25 primes are:  ");
  for p in primes(100) do write(p, " "); writeln();
  
  var count = 0; for p in primes(1000000) do count += 1;
  writeln("Count of primes to a million is:  ", count, ".");
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in primes(limit) do count += 1;

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
  const ndxlmt = (n - 3) / 2;
  const szlmt = ndxlmt / 8;
  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

  for i in 0 .. ndxlmt { // never gets to the end!
    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
      const bp = i + i + 3;
      const s0 = (bp * bp - 3) / 2;
      if s0 > ndxlmt then break;
      for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
    }
  }

  yield 2;
  for i in 0 .. ndxlmt do
    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;

}

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).

Hash Table Based Odds-Only Version

Translation of: Python

code link

Works with: Chapel version 1.25.1

use Time;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
  var n: Prime; var obp: Prime; var q: Prime;
  var bps: owned Primes?;
  var keys: domain(Prime); var dict: [keys] Prime;
  proc next(): Prime { // odd primes!
    if this.n < 5 { this.n = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = this.q + adv;
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;       
        this.keys += key; this.dict[key] = adv;
      }
      else if this.keys.contains(this.n) { // found a composite; advance...
        const adv = this.dict[this.n]; this.keys.remove(this.n);
        var nkey = this.n + adv;
        while this.keys.contains(nkey) do nkey += adv;
        this.keys += nkey; this.dict[nkey] = adv;
      }
      else { const p = this.n; this.n += 2; return p; }
      this.n += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 5761455 primes up to 100000000 in 5195.41 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, this is much slower than the array based versions but much faster than previous Chapel version code as the hashing has been greatly improved.

As an alternate to the use of a built-in library, the following code implements a specialized BasePrimesTable that works similarly to the way the Python associative arrays work as to hashing algorithm used (no hashing, as the hash values for integers are just themselves) and something similar to the Python method of handling hash table collisions is used:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;
 
config const limit = 100000000;
 
type Prime = uint(32);

record BasePrimesTable { // specialized for the use here...
  record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
  var cpcty: int = 8; var sz: int = 0;
  var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
  proc grow() {   
    const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
    bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
    for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);   
  }
  proc find(k: Prime): int { // internal get location of value or -1
    const msk = cpcty - 1; var skey = k: int & msk;
    var perturb = k: int; var loop = 8;
    do {
      if bpa[skey].fullkey == k then return skey;
      perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
      loop -= 1; if perturb > 0 then loop = 8;
    } while loop > 0;
    return -1; // not found!
  }
  proc contains(k: Prime): bool { return find(k) >= 0; }
  proc add(k, v: Prime) { // if exists then replaces else new entry
    const fndi = find(k);
    if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
    else {
      sz += 1; if 2 * sz > cpcty then grow();
      const msk = cpcty - 1; var skey = k: int & msk;
      var perturb = k: int; var loop = 8;
      do {
        if bpa[skey].fullkey == 0 {
          bpa[skey] = new BasePrimeEntry(k, v); return; }
        perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
        loop -= 1; if perturb > 0 then loop = 8;
      } while loop > 0;
    }
  }
  proc remove(k: Prime) { // if doesn't exist does nothing
    const fndi = find(k);
    if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
  }
  proc this(k: Prime): Prime { // returns value or 0 if not found
    const fndi = find(k);
    if fndi < 0 then return 0; else return bpa[fndi].val;
  }
}

class Primes { // needed so we can use next to get successive values
  var n: Prime; var obp: Prime; var q: Prime;
  var bps: shared Primes?; var dict = new BasePrimesTable();
  proc next(): Prime { // odd primes!
    if this.n < 5 { this.n = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = this.q + adv;
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;
        this.dict.add(key, adv);
      }
      else if this.dict.contains(this.n) { // found a composite; advance...
        const adv = this.dict[this.n]; this.dict.remove(this.n);
        var nkey = this.n + adv;
        while this.dict.contains(nkey) do nkey += adv;
        this.dict.add(nkey, adv);
      }
      else { const p = this.n; this.n += 2; return p; }
      this.n += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
 
  var timer: Timer;
  timer.start();
 
  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }
 
  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 5761455 primes up to 100000000 in 2351.79 milliseconds.

This last code is quite usable up to a hundred million (as here) or even a billion in a little over ten times the time, but is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only for the hash table for the base primes of about eight Kilobytes for sieving to a billion compared to over 60 Megabytes for the monolithic odds-only simple version).

Chapel version 1.25.1 provides yet another option as to the form of the code although the algorithm is the same in that one can now override the hashing function for Chapel records so that they can be used as the Key Type for Hash Map's as follows:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;

use Map;
 
config const limit = 100000000;
 
type Prime = uint(32);
 
class Primes { // needed so we can use next to get successive values
  record PrimeR { var prime: Prime; proc hash() { return prime; } }
  var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
  var bps: owned Primes?;
  var dict = new map(PrimeR, Prime);
  proc next(): Prime { // odd primes!
    if this.n.prime < 5 { this.n.prime = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n.prime >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;       
        this.dict.add(key, adv);
      }
      else if this.dict.contains(this.n) { // found a composite; advance...
        const adv = this.dict.getValue(this.n); this.dict.remove(this.n);
        var nkey = new PrimeR(this.n.prime + adv);
        while this.dict.contains(nkey) do nkey.prime += adv;
        this.dict.add(nkey, adv);
      }
      else { const p = this.n.prime;
             this.n.prime += 2; return p; }
      this.n.prime += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}
 
proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
 
  var timer: Timer;
  timer.start();
 
  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }
 
  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

This works in about exactly the same time as the last previous code, but doesn't require special custom adaptations of the associative array so that the standard library Map can be used.

Functional Tree Folding Odds-Only Version

Chapel isn't really a very functional language even though it has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced, iterations and has first class functions (FCF's) and lambdas (anonymous functions), these last can't be closures (capture variable bindings from external scope(s)), nor can the work around of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

Translation of: Nim

code link

Works with: 1.22 version - compile with the --fast compiler command line flag for full optimization

use Time;

type Prime = uint(32);

config const limit = 1000000: Prime;

// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
  var head: Prime;
  proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}

class PrimeMultiples: PrimeCIS {
  var adv: Prime;
  override proc next(): shared PrimeCIS {
    return new shared PrimeMultiples(
      this.head + this.adv, this.adv): shared PrimeCIS; }
}

class PrimeCISCIS { // base stream of prime streams; never used directly...
  var head: shared PrimeCIS;
  proc init() { this.head = new shared PrimeCIS(); }
  proc next(): shared PrimeCISCIS {
    return new shared PrimeCISCIS(); }
}

class AllMultiples: PrimeCISCIS {
  var bps: shared PrimeCIS;
  proc init(bsprms: shared PrimeCIS) {
    const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
    this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;
    this.bps = bsprms;
  }
  override proc next(): shared PrimeCISCIS {
    return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}

class Union: PrimeCIS {
  var feeda, feedb: shared PrimeCIS;
  proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
    const ahd = fda.head; const bhd = fdb.head;
    this.head = if ahd < bhd then ahd else bhd;
    this.feeda = fda; this.feedb = fdb;
  }
  override proc next(): shared PrimeCIS {    
    const ahd = this.feeda.head; const bhd = this.feedb.head;
    if ahd < bhd then
      return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
    if ahd > bhd then
      return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
    return new shared Union(this.feeda.next(),
                            this.feedb.next()): shared PrimeCIS;
  }
}

class Pairs: PrimeCISCIS {
  var feed: shared PrimeCISCIS;
  proc init(fd: shared PrimeCISCIS) {
    const fs = fd.head; const sss = fd.next(); const ss = sss.head;
    this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
  }
  override proc next(): shared PrimeCISCIS {
    return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}

class Composites: PrimeCIS {
  var feed: shared PrimeCISCIS;
  proc init(fd: shared PrimeCISCIS) {
    this.head = fd.head.head; this.feed = fd;
  }
  override proc next(): shared PrimeCIS {
    const fs = this.feed.head.next();
    const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
    const ncs = new shared Composites(prs): shared PrimeCIS;
    return new shared Union(fs, ncs): shared PrimeCIS;
  }
}

class OddPrimesFrom: PrimeCIS {
  var cmpsts: shared PrimeCIS;
  override proc next(): shared PrimeCIS {
    var n = head + 2; var cs = this.cmpsts;
    while true {
      if n < cs.head then
        return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
      n += 2; cs = cs.next();
    }
    return this.cmpsts; // never used; keeps compiler happy!
  }
}

class OddPrimes: PrimeCIS {
  proc init() { this.head = 3; }
  override proc next(): shared PrimeCIS {
    const bps = new shared OddPrimes(): shared PrimeCIS;
    const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
    const cmpsts = new shared Composites(mlts): shared PrimeCIS;
    return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
  }
}

iter primes(): Prime {
  yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
  while true { yield cur.head; cur = cur.next(); }
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits) although doing so is slower than the Hash Table version above and also takes more memory as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has a worst asymptotic performance with an extra `log(n)` factor where `n` is the sieving range; this can be shown by running the above program with `--limit=10000000` run time command line option to sieve to ten million which takes about 4.5 seconds to count the primes up to ten million (a factor of ten higher range, but much higher than the expected increased factor of about 10 per cent extra as per the Hash Table version with about 20 per cent more operations times the factor of ten for this version). Other than for the extra operations, this version is generally slower due to the time to do the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.

A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version

To take advantage of the features that make Chapel shine, we need to use it to do some parallel computations, and to efficiently do that for the Sieve of Eratosthenes, we need to divide the work into page segments where we can assign each largish segment to a separate thread/task; this also improves the speed due to better cache associativity with most memory accesses to values that are already in the cache(s). Once we have divided the work, Chapel offers lots of means to implement the parallelism but to be a true Sieve of Eratosthenes, we need to have the ability to output the results in order; with many of the convenience mechanisms not doing that, the best/simplest option is likely task parallelism with the output results assigned to an rotary indexed array containing the `sync` results. It turns out that, although the Chapel compiler can sometimes optimize the code so the overhead of creating tasks is not onerous, for this case where the actual tasks are somewhat complex, the compiler can't recognize that an automatically generated thread pool(s) are required so we need to generate the thread pool(s) manually. The code that implements the multi-threading of page segments using thread pools is as follows:

Works with: 1.24.1 version - compile with the --fast compiler command line flag for full optimization

use Time; use BitOps; use CPtr;

type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);

config const LIMIT = 1000000000: Prime;

config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
        "L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
        "L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");

const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
                            11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;

// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
                     if L1 == 32 then 23000 else 43390)
                       - WHLPRMS.size + 1; // extra one for marker!

// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;

var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
  const hi = WHLPRMS.size - 1;
  const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
  // contains wheel pattern primes skipping the small wheel prime (2)!...
  // never advances past the first base prime arr as it ends with a huge!...
  for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
                                    else 0x7FFFFFFF): BasePrime; // last huge!
  var whlbpas = new shared BasePrimeArrs(whlhd);
  var whlstrts = new StrtsArr({rng});
  wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
  // eliminate wheel primes from the WHLPTRN buffer!...
  wp.cmpsts[0] = 0xFF: uint(8);
}

// the following two must be classes for compability with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
  var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
  proc init() {}
  proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
  proc deinit() { dom = { 0 .. -1 }; }

  proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
    const sz = cmpsts.size; const mvsz = min(sz, 16384);
    var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
    for i in 0 .. sz - 1 by 16384 {
      c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
               c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);    
      mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
    }
  }

  proc count(btlmt: int) { // count by 64 bits using CPU popcount...
    const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
    const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
    var i = 0; var cnt = (lstwrd * 64 + 64): int;
    while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
    return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;    
  }

  // most of the time is spent doing culling operations as follows!...
  proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
                                        ref strts: StrtsArr) {
    const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
    const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
    const strtssz = strts.strtsarr.size;
    // C pointer for speed magic!...
    const cmpstsp = c_ptrTo(cmpsts[0]);
    const strtsp = c_ptrTo(strts.strtsarr);

    // first fill the strts array with pre-calculated start addresses...
    var i = 0; for bp in bpas {
      // calculate page start address for the given base prime...
      const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
      const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
      if s0 > ndxlmt then {
        if i < strtssz then strtsp[i] = -1: int(32); break; }
      var s = 0: int;
      if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
      else { const r = (lwi: Prime - s0) % bbp;
            if r == 0 then s = 0: int; else s = (bbp - r): int; };
      if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
      if i < strtssz { strtsp[i] = -1; i = strtssz; }
      // cull the full buffer for this given base prime as usual...
      // only works up to limit of int(32)**2!!!!!!!!
      while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
    }

    // cull the smaller sub buffers according to the strts array...
    for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
      i = 0; for bp in bpas { // bp never bigger than uint(32)!
        // cull the sub buffer for this given base prime...
        var s = strtsp[i]: int; if s < 0 then break;
        var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
        if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
          const slmt = s + bpi * 8 - 1;
          while s <= slmt {
            const bmi = s & 7; const msk = bitmsk[bmi];
            var c = s >> 3; const clmt = sbtlmt >> 3;
            while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
            nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
          }
          strtsp[i] = nxt: int(32); i += 1;
        }
        else { while s <= sbtlmt { // standard cull loop...
                 cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
               strtsp[i] = s: int(32); i += 1; }
      }
    }
  }
}

// a generic record that contains a page result generating function;
// allows manual iteration through the use of the next() method;
// multi-threaded through the use of a thread pool...
class PagedResults {
  const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
  var lwi: PrimeNdx; var bsbtsz: int;
  var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
  var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
  var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
  var qi: int = 0;
  var wrkq$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
  var rsltsq$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;

  proc init(cvclsr, li: PrimeNdx, bsz: int) {
    cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }

  proc deinit() { // kill the thread pool when out of scope...
    if bpas == nil then return; // no thread pool!
    for i in wrkq$.domain {
      wrkq$[i].writeEF(-1); while true { const r = rsltsq$[i].readFE();
        if r == nil then break; }
    }
  }
 
  proc next(): cnvrtrclsr(lwi, sbs(0)).type {   
    proc dowrk(ri: int) { // used internally!...
      while true {
        const li = wrkq$[ri].readFE(); // following to kill thread!
        if li < 0 { rsltsq$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
        sbs[ri].fill(li);
        sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
        rsltsq$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
      }
    }
    if this.bpas == nil { // init on first use; avoids data race!
      this.bpas = new BasePrimeArrs();
      if this.bsbtsz < CPUL1CACHE {
        this.sbs = new SieveBuffer(bsbtsz);
        this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
      }
      else {
        this.sbs = new SieveBuffer(CPUL2CACHE);
        this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
      }
      // start threadpool and give it inital work...
      for i in rsltsq$.domain {
        begin with (const in i) dowrk(i);
        this.wrkq$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
      }
    }
    const rslt = this.rsltsq$[qi].readFE();
    this.wrkq$[qi].writeEF(this.lwi);
    this.lwi += this.sbs[qi].cmpsts.size * 8;
    this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
    return rslt;
  }
 
  iter these() { while lwi >= 0 do yield next(); }
}

// the sieve buffer to base prime array converter closure...
record SB2BPArr {  
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
    const bsprm = (lwi + lwi + 3): BasePrime;
    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
    var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
                        arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
                      i += 1; }
    return arr;
  }
}

// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
  var head: shared BasePrimeArr;
  var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
  var lock$: sync bool = true;
  var feed: shared PagedResults(SB2BPArr) =
    new shared PagedResults(new SB2BPArr(), 65536, 65536);

  proc init() { // make our own first array to break data race!
    var sb = new SieveBuffer(256); sb.fill(0);
    const sb2 = new SB2BPArr();
    head = sb2(0, sb): shared BasePrimeArr;
    this.complete(); // fake base primes!
    sb = new SieveBuffer(65536); sb.fill(0);
    // use (completed) self as source of base primes!
    var strts = new StrtsArr({ 0 .. 256 });
    sb.cull(0, 65536, this, strts);
    // replace head with new larger version culled using fake base primes!...
    head = sb2(0, sb): shared BasePrimeArr;
  }

  // for initializing for use by the fillWHLPTRN proc...
  proc init(hd: shared BasePrimeArr) {
    head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
  }

  // for initializing lazily extended list as required...
  proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }

  proc next(): shared BasePrimeArrs {
    if this.tail == nil { // in case other thread slipped through!     
      if this.lock$.readFE() && this.tail == nil { // empty sync -> block others!
        const nhd = this.feed.next(): shared BasePrimeArr;
        this.tail = new shared BasePrimeArrs(nhd , this.feed);
      }
      this.lock$.writeEF(false); // fill the sync so other threads can do nothing!
    }
    return this.tail: shared BasePrimeArrs; // necessary cast!
  }
 
  iter these(): BasePrime {
    for bp in head.bparr do yield bp; var cur = next();
    while true {
      for bp in cur.head.bparr do yield bp; cur = cur.next(); }
  }
}

record SB2PrmArr {
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
    const bsprm = (lwi + lwi + 3): Prime;
    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
    var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
                        arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
                      i += 1; }
    return arr;
  }
}

iter primes(): Prime {
  for p in WHLPRMS do yield p: Prime;
  for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
    for p in pa!.prmarr do yield p;
}

// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }

// a class that emulates a closure and a return value...
record SB2Cnt {
  const nxtlmt: PrimeNdx;
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
    const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
    const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
    return new shared CntNxt(sb.count(btlmt), lstndx);
  }
}

// couut primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
  const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
  for p in WHLPRMS { if p > lmt then break; count += 1; }
  if lmt < FRSTSVPRM then return count;
  for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
    count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
  }
  return count;
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");

write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");
writeln("using ", NUMTHRDS, " threads.");

var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();

write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does implement some functional concepts as in a memoized lazy list of base prime arrays, but as this is used at the page level, the slowish performance doesn't impact the overall execution time much and the code is much more elegant in using this concept such that we compute new pages of base primes as they are required for increasing range.

Some of the most tricky bits due to having thread pools is stopping and de-initializing when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized to some extent in using "loop unpeeling" for smaller base primes to simplify the loops down to simple masking by a constant with eight separate loops for the repeating pattern over bytes and culling by sub buffer CPU L1 cache sizes over the outer sieve buffer size of the CPU L2 cache size in order to make the task work-sized chunks larger for less task context switching overheads and for reduced time lost to culling start address calculations per base prime (which needs to use some integer division that is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits or 1e12 for the one Megabit CPU L2 cache size many mid-range Intel CPU's have currently when used for multi-threading (half of the actual size for Hyper-Threaded - HT - threads as they share both the L1 and the L2 caches over the pairs of Hyper-Threaded (HT) threads per core).

Although this code can be used for much higher sieving ranges, it is not recommended due to not yet being tuned for better efficiency above 1e12; there are no checks limiting the user to this range, but, as well as decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows but these will be for huge sieving ranges taking days -> weeks -> months -> years to execute on common desktop CPU's.

A further optimization used is to create a pre-culled `WHLPTRN` `SieveBuffer` where the odd primes (since we cull odds-only) of 3, 5, 7, 11, 13, and 17 have already been culled and using that to pre-fill the page segment buffers so that no culling by these base prime values is required, this reduces the number of operations by about 45% compared to if it wasn't done but the ratio of better performance is only about 34.5% better as this changes the ratio of (fast) smaller base primes to larger (slower) ones.

All of the improvements to this point allow the shown performance as per the displayed output for the above program; using a command line argument of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion - 1e11 - on this computer, which has cache sizes of that amount and no Hyper-Threading - HT), it can count the primes to 1e11 in about 17.5 seconds using the above mentioned CPU. It will be over two times faster than this using a more modern desktop CPU such as the Intel Core i7-9700K which has twice as many effective cores, a higher CPU clock rate, is about 10% to 15% faster due the a more modern CPU architecture which is three generations newer. Of course using a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core CPU's will only give these speed gains for large sieving ranges such as 1e11 and above since otherwise there aren't enough work chunks to go around for all the threads available!

Incredibly, even run single threaded (argument of `--NUMTHRDS=1`) this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`) with both using the same size of CPU L1 cache buffer of 16 Kilobytes; This implementation does not yet have the level of wheel optimization of the Sieve of Atkin as it has only the limited wheel optimization of Odds-Only plus the use of the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges, and approach the speed of Kim Walisch's "primesieve". However, not having primitive element pointers and pointer operations, there are some optimizations used that Kim Walisch's "primesieve" uses of extreme loop unrolling that mean that it can never quite reach the speed of "primeseive" by about 20% to 30%.

The above code is a fairly formidable benchmark, which I have also written in Fortran as in likely the major computer language that is comparable. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expression this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-on's or just being the ability to call through to OpenMP/MPI.

That said, it also as the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but so much about forms of expression have been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; Most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes now no longer are breaking code changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same; for this particular task they are about the same if one were to implement the same algorithmic optimizations other than that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel as Fortran has some limited form of pointers, although not the full set of pointer operators that C/C++ like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compile and due to the availablity of (limited) pointer operations.

The primary additional optimization available to Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer, with the other major improvement to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.

Clojure

(defn primes< [n]
   (remove (set (mapcat #(range (* % %) n %)
                        (range 2 (Math/sqrt n))))
           (range 2 n)))

The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with (println (time (count (primes< 1000000)))), it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.

It may be written using the into #{} function to run slightly faster due to the set function being concerned with only distinct elements whereas the into #{} only does the conjunction, and even at that doesn't do that much as it does the conjunction to an empty sequence, the code as follows:

(defn primes< [n]
   (remove (into #{}
                 (mapcat #(range (* % %) n %)
                         (range 2 (Math/sqrt n))))
           (range 2 n)))

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses the into #{} transducer but has been slightly wheel-factorized to sieve odds-only:

(defn primes< [n]
   (if (< n 2) ()
     (cons 2 (remove (into #{}
                           (mapcat #(range (* % %) n %)
                                   (range 3 (Math/sqrt n) 2)))
                     (range 3 n 2)))))

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.

The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        cmpsts (boolean-array (inc n)),
        cullp (fn [p]
                (loop [i (* p p)]
                  (if (<= i n)
                    (do (aset cmpsts i true)
                        (recur (+ i p))))))]
    (do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
                                       (range 2 (inc root)))))
        (filter #(not (aget cmpsts %)) (range 2 (inc n))))))

Alternative implementation using Clojure's side-effect oriented list comprehension.

 (defn primes-to
  "Returns a lazy sequence of prime numbers less than lim"
  [lim]
  (let [refs (boolean-array (+ lim 1) true)
        root (int (Math/sqrt lim))]
    (do (doseq [i (range 2 lim)
                :while (<= i root)
                :when (aget refs i)]
          (doseq [j (range (* i i) lim i)]
            (aset refs j false)))
        (filter #(aget refs %) (range 2 lim)))))

Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.

(defn primes-to
  "Returns a lazy sequence of prime numbers less than lim"
  [lim]
  (let [max-i (int (/ (- lim 1) 2))
        refs (boolean-array max-i true)
        root (/ (dec (int (Math/sqrt lim))) 2)]
    (do (doseq [i (range 1 (inc root))
                :when (aget refs i)]
          (doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
            (aset refs j false)))
        (cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))

This implemantation is about twice as fast as the previous one and uses only half the memory. From the index of the array, it calculates the value it represents as (2*i + 1), the step between two indices that represent the multiples of primes to mark as composite is also (2*i + 1). The index of the square of the prime to start composite marking is 2*i*(i+1).

Alternative very slow entirely functional implementation using lazy sequences

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (letfn [(nxtprm [cs] ; current candidates
            (let [p (first cs)]
              (if (> p (Math/sqrt n)) cs
                (cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
                                              set (remove cs) rest)))))))]
    (nxtprm (range 2 (inc n)))))

The reason that the above code is so slow is that it has has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector where only the culled values are referenced, and due to the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.

Version based on immutable Vector's

Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [max-prime]
  (let [sieve (fn [s n]
                (if (<= (* n n) max-prime)
                  (recur (if (s n)
                           (reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))
                           s)
                         (inc n))
                  s))]
    (->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))
             (assoc 0 false)
             (assoc 1 false)
             (sieve 2))
         (map-indexed #(vector %2 %1)) (filter first) (map second))))

The above code is still quite slow due to the cost of the immutable copy-on-modify operations.

Odds only bit packed mutable array based version

The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:

(set! *unchecked-math* true)

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        rootndx (long (/ (- root 3) 2)),
        ndx (long (/ (- n 3) 2)),
        cmpsts (long-array (inc (/ ndx 64))),
        isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))
                               (bit-shift-left 1 (bit-and % 63)))),
        cullp (fn [i]
                (let [p (long (+ i i 3))]
	                (loop [i (bit-shift-right (- (* p p) 3) 1)]
	                  (if (<= i ndx)
	                    (do (let [w (bit-shift-right i 6)]
	                    (aset cmpsts w (bit-or (aget cmpsts w)
	                                           (bit-shift-left 1 (bit-and i 63)))))
	                        (recur (+ i p))))))),
        cull (fn [] (loop [i 0] (if (<= i rootndx)
                                  (do (if (isprm i) (cullp i)) (recur (inc i))))))]
    (letfn [(nxtprm [i] (if (<= i ndx)
                          (cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]
                                                              (if (or (> i ndx) (isprm i)) i
                                                                (recur (inc i)))))))))]
      (if (< n 2) nil
        (cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))

The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization other than the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.

If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead as compared to the actual composite number culling time, would get even higher if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times, most important in the culling operations, and yet higher again if multi-processing were used to share to page segment processing across CPU cores.

The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed come computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):

(defn primes-tox
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        rootndx (long (/ (- root 3) 2)),
        ndx (max (long (/ (- n 3) 2)) 0),
        lmt (quot ndx 64),
        cmpsts (long-array (inc lmt)),
        cullp (fn [i]
                (let [p (long (+ i i 3))]
	                (loop [i (bit-shift-right (- (* p p) 3) 1)]
	                  (if (<= i ndx)
	                    (do (let [w (bit-shift-right i 6)]
                            (aset cmpsts w (bit-or (aget cmpsts w)
                                                   (bit-shift-left 1 (bit-and i 63)))))
                          (recur (+ i p))))))),
        cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)
                                                 (bit-shift-left -2 (bit-and ndx 63))))
                        (loop [i 0]
                          (when (<= i rootndx)
                            (when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))
                                                   (bit-shift-left 1 (bit-and i 63))))
                              (cullp i))
                            (recur (inc i))))))
        numprms (fn []
                  (let [w (dec (alength cmpsts))] ;; fast results count bit counter
                    (loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]
                      (if (> i w) cnt
                        (recur (inc i) 
                               (- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]
    (if (< n 2) nil
      (cons 2 (if (< n 3) nil
                (do (cull)
                    (deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???
                      clojure.lang.ISeq
                        (first [_] (if (nil? cmpsa) nil (+ i i 3)))
                        (next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil
                                                          (OPSeq.
                                                            (loop [j (inc i)]
                                                              (let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))
                                                                                       (bit-shift-left 1 (bit-and j 63))))]
                                                                (if p? j (recur (inc j)))))
                                                            cmpsa ncnt tcnt))))
                        (more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)
                                                             (.next this))))
                        (cons [this o] (clojure.core/cons o this))
                        (empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))
                        (equiv [this o] (if (or (not= (type this) (type o))
                                                (not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))
                                                (not= i (.i ^OPSeq o))) false true))
                        clojure.lang.Counted
                          (count [_] (- tcnt cnt))
                        clojure.lang.Seqable
                          (clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))
                        clojure.lang.IReduce
                          (reduce [_ f v] (let [c (- tcnt cnt)]
                                            (if (<= c 0) nil
                                              (loop [ci i, n c, rslt v]
                                                (if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))
                                                                    (bit-shift-left 1 (bit-and ci 63))))
                                                  (let [rrslt (f rslt (+ ci ci 3)),
                                                        rdcd (reduced? rrslt),
                                                        nrslt (if rdcd @rrslt rrslt)]
                                                    (if (or (<= n 1) rdcd) nrslt
                                                      (recur (inc ci) (dec n) nrslt)))
                                                  (recur (inc ci) n rslt))))))
                          (reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)
                                                              (.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))
                        clojure.lang.Sequential
                        Object
                          (toString [this] (if (= cnt tcnt) "()"
                                             (.toString (seq (map identity this)))))) 
                    (->OPSeq 0 cmpsts 0 (numprms))))))))

'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.

Due to the better efficiency of the custom CIS type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.

Under Clojure 1.7.0, one can use '(time (reduce (fn [] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total with about half the time used for sieving the array and half for computing the sum.

To show how sensitive Clojure is to forms of expression of functions, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long even though it is using the same internal routine for most of the calculation due to the function not having the type coercion's.

Before one considers that this code is suitable for larger ranges, it is still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (produces about a four times speed up), maximal wheel factorization (to make it another about four times faster), and the use of multi-processing (for a further gain of about 4 times for a multi-core desktop CPU such as an Intel i7), will make the sieving/counting code about 50 times faster than this, although there will only be a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.

Unbounded Versions

For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.

The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:

A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)

(defn primes-Bird
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
          (union [xs ys] (let [xv (first xs), yv (first ys)]
                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),
          (mrgmltpls [mltplss] (cons (first (first mltplss))
                                     (lazy-seq (union (next (first mltplss))
                                                      (mrgmltpls (next mltplss)))))),
          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
                                    (if (< n (first cmpsts))
                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
                                      (recur (+ n 2) (next cmpsts)))))]
    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                         (minusStrtAt 5 cmpsts)))))
        (cons 2 (lazy-seq oddprms)))))

The above code is quite slow due to both that the data structure is a linear merging of prime multiples and due to the slowness of the Clojure sequence operations.

A Clojure version of the tree folding sieve using Lazy Sequences

The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:

(defn primes-treeFolding
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
          (union [xs ys] (let [xv (first xs), yv (first ys)]
                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),
          (pairs [mltplss] (let [tl (next mltplss)]
                             (cons (union (first mltplss) (first tl))
                                   (lazy-seq (pairs (next tl)))))),
          (mrgmltpls [mltplss] (cons (first (first mltplss))
                                     (lazy-seq (union (next (first mltplss))
                                                      (mrgmltpls (pairs (next mltplss))))))),
          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
                                    (if (< n (first cmpsts))
                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
                                      (recur (+ n 2) (next cmpsts)))))]
    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                         (minusStrtAt 5 cmpsts)))))
        (cons 2 (lazy-seq oddprms)))))

The above code is still slower than it should be due to the slowness of Clojure's sequence operations.

A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence

The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:

(deftype CIS [v cont]
  clojure.lang.ISeq
    (first [_] v)
    (next [_] (if (nil? cont) nil (cont)))
    (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
    (cons [this o] (clojure.core/cons o this))
    (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
    (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
                                                (if (or (not= (type cis1) (type cis2))
                                                        (not= (.v cis1) (.v ^CIS cis2))
                                                        (and (nil? (.cont cis1))
                                                              (not (nil? (.cont ^CIS cis2))))
                                                        (and (nil? (.cont ^CIS cis2))
                                                              (not (nil? (.cont cis1))))) false
                                                  (if (nil? (.cont cis1)) true
                                                    (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
    (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
                                            (recur ((.cont cis)) (inc cnt)))))
  clojure.lang.Seqable
    (seq [this] (if (and (nil? v) (nil? cont)) nil this))
  clojure.lang.Sequential
  Object
    (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))

(defn primes-treeFoldingx
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (->CIS c (fn [] (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
          (union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
                                     (if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
                                       (if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
                                         (->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
          (pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
                                  (->CIS (union (.v mltplss) (.v tl))
                                         (fn [] (pairs ((.cont tl))))))),
          (mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
                                           (fn [] (union ((.cont ^CIS (.v mltplss)))
                                                         (mrgmltpls (pairs ((.cont mltplss)))))))),
          (minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
                                         (if (< n (.v cmpsts))
                                           (->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
                                           (recur (+ n 2) ((.cont cmpsts))))))]
    (do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                       (minusStrtAt 5 cmpsts)))))
        (->CIS 2 (fn [] oddprms)))))

'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.

The above code is useful for ranges up to about fifteen million primes, which is about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit packed version which can reasonably be used for ranges about 100 times as large.

Incremental Hash Map based unbounded "odds-only" version

The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:

(defn primes-hashmap
  "Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"
  []
  (letfn [(nxtoddprm [c q bsprms cmpsts]
            (if (>= c q) ;; only ever equal
              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
                (recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))
              (if (contains? cmpsts c)
                (recur (+ c 2) q bsprms
                       (let [adv (cmpsts c), ncmps (dissoc cmpsts c)]
                         (assoc ncmps
                                (loop [try (+ c adv)] ;; ensure map entry is unique
                                  (if (contains? ncmps try)
                                    (recur (+ try adv)) try)) adv)))
                (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))

The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation even though it has computational complexity of (n log log n) rather than the worse (n log n log log n) for the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.

There is almost no benefit of converting the above code to use a CIS as most of the time is expended in the hash map functions.

Incremental Priority Queue based unbounded "odds-only" version

In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap algorithm. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions of (emtpy-pq) initialization, (getMin-pq pq) to examinte the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replaace the minimum entry with a key/value pair as given (it is more efficient that if functions were provided to delete and then re-insert entries in the queue; there is therefore no "delete" or other queue functions supplied as the algorithm does not requrie them:

(deftype PQEntry [k, v]
  Object
    (toString [_] (str "<" k "," v ">")))
(deftype PQNode [ntry, lft, rght]
  Object
    (toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))

(defn empty-pq [] nil)

(defn getMin-pq [^PQNode pq]
  (if (nil? pq)
    nil
    (.ntry pq)))

(defn insert-pq [^PQNode opq ok v]
  (loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
    (if (nil? pq)
      (cont (->PQNode kv nil nil))
      (let [k (.k kv),
            ^PQEntry kvn (.ntry pq), kn (.k kvn),
            l (.lft pq), r (.rght pq)]
        (if (<= k kn)
          (recur kvn r #(cont (->PQNode kv % l)))
          (recur kv r #(cont (->PQNode kvn % l))))))))

(defn replaceMinAs-pq [^PQNode opq k v]
  (let [^PQEntry kv (->PQEntry k v)]
    (if (nil? opq) ;; if was empty or just an entry, just use current entry
      (->PQNode kv nil nil)
      (loop [pq opq, cont identity]
        (let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
          (cond ;; if left us empty, right must be too
            (nil? l)
              (cont (->PQNode kv nil nil)),
            (nil? r) ;; we only have a left...
              (let [^PQEntry kvl (.ntry l), kl (.k kvl)]
                    (if (<= k kl)
                      (cont (->PQNode kv l nil))
                      (recur l #(cont (->PQNode kvl % nil))))),
            :else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
                        ^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
                    (if (and (<= k kl) (<= k kr))
                      (cont (->PQNode kv l r))
                      (if (<= kl kr)
                        (recur l #(cont (->PQNode kvl % r)))
                        (recur r #(cont (->PQNode kvr l %))))))))))))

Note that the above code is written partially using continuation passing style so as to leave the "recur" calls in tail call position as required for efficient looping in Clojure; for practical sieving ranges, the algorithm could likely use just raw function recursion as recursion depth is unlikely to be used beyond a depth of about ten or so, but raw recursion is said to be less code efficient.

The actual incremental sieve using the Priority Queue is as follows, which code uses the same optimizations of postponing the addition of prime composite streams to the queue until the square root of the currently sieved number is reached and using a secondary base primes stream to generate the primes composite stream markers in the queue as was used for the Hash Map version:

(defn primes-pq
  "Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"
  []
  (letfn [(nxtoddprm [c q bsprms cmpsts]
            (if (>= c q) ;; only ever equal
              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
                (recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))
              (let [mn (getMin-pq cmpsts)]
                (if (and mn (>= c (.k mn))) ;; never greater than
                  (recur (+ c 2) q bsprms
                         (loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value
                           (let [ncmps (replaceMinAs-pq cmps (+ c adv) adv),
                                 nmn (getMin-pq ncmps)]
                             (if (and nmn (>= c (.k nmn)))
                               (recur (.v nmn) ncmps)
                               ncmps))))
                  (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))

The above code is faster than the Hash Map version up to about a sieving range of fifteen million or so, but gets progressively slower for larger ranges due to having (n log n log log n) computational complexity rather than the (n log log n) for the Hash Map version, which has a higher constant factor overhead that is overtaken by the extra "log n" factor.

It is slower that the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).

Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.

All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the the "odds-only" versions.

It is difficult if not impossible to apply efficient multi-processing to the above versions of the unbounded sieves as the next values of the primes sequence are dependent on previous changes of state for the Bird and Tree Folding versions; however, with the addition of a "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to a given page start state functions, it is possible to do for these letter two algorithms; however, even though it is possible and there is some benefit for these latter two implementations, the benefit is less than using mutable arrays due to that the results must be enumerated into a data structure of some sort in order to be passed out of the page function whereas they can be directly enumerated from the array for the mutable array versions.

Bit packed page segmented array unbounded "odds-only" version

To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:

(set! *unchecked-math* true)

(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGBTS (bit-shift-left PGSZ 3))
(def PGWRDS (bit-shift-right PGBTS 5))
(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes
(def BPBTS (bit-shift-left BPWRDS 5))
(defn- count-pg
  "count primes in the culled page buffer, with test for limit"
  [lmt ^ints pg]
  (let [pgsz (alength pg),
        pgbts (bit-shift-left pgsz 5),
        cntem (fn [lmtw]
                (let [lmtw (long lmtw)]
	          (loop [i (long 0), c (long 0)]
	            (if (>= i lmtw) (- (bit-shift-left lmtw 5) c)
	              (recur (inc i)
	              (+ c (java.lang.Integer/bitCount (aget pg i))))))))]
    (if (< lmt pgbts)
      (let [lmtw (bit-shift-right lmt 5),
            lmtb (bit-and lmt 31)
            msk (bit-shift-left -2 lmtb)]
        (+ (cntem lmtw)
           (- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)
                                                      msk)))))
      (- pgbts
         (areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))))
;;      (cntem pgsz))))
(defn- primes-pages
  "unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."
  []
  (letfn [(make-pg [lowi pgsz bpgs]
            (let [lowi (long lowi),
                  pgbts (long (bit-shift-left pgsz 5)),
                  pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),
                  ^ints pg (int-array pgsz),
                  cull (fn [bpgs']
                         (loop [i (long 0), bpgs' bpgs']
	                         (let [^ints fbpg (first bpgs'),
	                               bpgsz (long (alength fbpg))]
	                           (if (>= i bpgsz)
	                             (recur 0 (next bpgs'))
	                             (let [p (long (aget fbpg i)),
	                                   sqr (long (* p p))]
	                               (if (< sqr pgrng) (do
                   (loop [j (long (let [s (long (bit-shift-right (- sqr 3) 1))]
                                     (if (>= s lowi) (- s lowi)
                                       (let [m (long (rem (- lowi s) p))]
                                         (if (zero? m)
                                           0
                                           (- p m))))))]
                     (if (< j pgbts) ;; fast inner culling loop where most time is spent
                       (do
                         (let [w (bit-shift-right j 5)]
                           (aset pg w (int (bit-or (aget pg w)
                                                   (bit-shift-left 1 (bit-and j 31))))))
                         (recur (+ j p)))))
                     (recur (inc i) bpgs'))))))))]
              (do (if (nil? bpgs)
                    (letfn [(mkbpps [i]
                              (if (zero? (bit-and (aget pg (bit-shift-right i 5))
                                                  (bit-shift-left 1 (bit-and i 31))))
                                (cons (int-array 1 (+ i i 3)) (lazy-seq (mkbpps (inc i))))
                                (recur (inc i))))]
                      (cull (mkbpps 0)))
                    (cull bpgs))
                  pg))),
          (page-seq [lowi pgsz bps]
            (letfn [(next-seq [lwi]
                      (cons (make-pg lwi pgsz bps)
                            (lazy-seq (next-seq (+ lwi (bit-shift-left pgsz 5))))))]
              (next-seq lowi)))
          (pgs->bppgs [ppgs]
            (letfn [(nxt-pg [lowi pgs]
                      (let [^ints pg (first pgs),
                            cnt (count-pg BPBTS pg),
                            npg (int-array cnt)]
                        (do (loop [i 0, j 0]
                              (if (< i BPBTS)
                                (if (zero? (bit-and (aget pg (bit-shift-right i 5))
                                                    (bit-shift-left 1 (bit-and i 31))))
                                  (do (aset npg j (+ (bit-shift-left (+ lowi i) 1) 3))
                                      (recur (inc i) (inc j)))
                                  (recur (inc i) j))))
                            (cons npg (lazy-seq (nxt-pg (+ lowi BPBTS) (next pgs)))))))]
              (nxt-pg 0 ppgs))),
          (make-base-prms-pgs []
            (pgs->bppgs (cons (make-pg 0 BPWRDS nil)
                              (lazy-seq (page-seq BPBTS BPWRDS (make-base-prms-pgs))))))]
    (page-seq 0 PGWRDS (make-base-prms-pgs))))
(defn primes-paged
  "unbounded Sieve of Eratosthenes producing a lazy sequence of primes"
  []
  (do (deftype CIS [v cont]
        clojure.lang.ISeq
          (first [_] v)
          (next [_] (if (nil? cont) nil (cont)))
          (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
          (cons [this o] (clojure.core/cons o this))
          (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
          (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
                                                      (if (or (not= (type cis1) (type cis2))
                                                              (not= (.v cis1) (.v ^CIS cis2))
                                                              (and (nil? (.cont cis1))
                                                                   (not (nil? (.cont ^CIS cis2))))
                                                              (and (nil? (.cont ^CIS cis2))
                                                                   (not (nil? (.cont cis1))))) false
                                                        (if (nil? (.cont cis1)) true
                                                          (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
          (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
                                                  (recur ((.cont cis)) (inc cnt)))))
        clojure.lang.Seqable
          (seq [this] (if (and (nil? v) (nil? cont)) nil this))
        clojure.lang.Sequential
        Object
          (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))
		  (letfn [(next-prm [lowi i pgseq]
		            (let [lowi (long lowi),
                      i (long i),
                      ^ints pg (first pgseq),
		                  pgsz (long (alength pg)),
		                  pgbts (long (bit-shift-left pgsz 5)),
		                  ni (long (loop [j (long i)]
		                             (if (or (>= j pgbts)
		                                     (zero? (bit-and (aget pg (bit-shift-right j 5))
		                                               (bit-shift-left 1 (bit-and j 31)))))
		                               j
		                               (recur (inc j)))))]
		              (if (>= ni pgbts)
		                (recur (+ lowi pgbts) 0 (next pgseq))
		                (->CIS (+ (bit-shift-left (+ lowi ni) 1) 3)
		                       (fn [] (next-prm lowi (inc ni) pgseq))))))]
		    (->CIS 2 (fn [] (next-prm 0 0 (primes-pages)))))))
(defn primes-paged-count-to
  "counts primes generated by page segments by Sieve of Eratosthenes to the top limit"
  [top]
  (cond (< top 2) 0
        (< top 3) 1
        :else (letfn [(nxt-pg [lowi pgseq cnt]
                        (let [topi (bit-shift-right (- top 3) 1)
                              nxti (+ lowi PGBTS),
                              pg (first pgseq)]
                          (if (> nxti topi)
                            (+ cnt (count-pg (- topi lowi) pg))
                            (recur nxti
                                   (next pgseq)
                                   (+ cnt (count-pg PGBTS pg))))))]
                (nxt-pg 0 (primes-pages) 1))))

The above code runs just as fast as other virtual machine languages when run on a 64-bit JVM; however, when run on a 32-bit JVM it runs almost five times slower. This is likely due to Clojure only using 64-bit integers for integer operations and these operations getting JIT compiled to use library functions to simulate those operations using combined 32-bit operations under a 32-bit JVM whereas direct CPU operations can be used on a 64-bit JVM

Clojure does one thing very slowly, just as here: it enumerates extremely slowly as compared to using a more imperative iteration interface; it helps to use a roll-your-own ISeq interface as here, where enumeration of the primes reduces the time from about four times as long as the composite culling operations for those primes to only about one and a half times as long, although one must also write their own sequence handling functions (can't use "take-while" or "count", for instance) in order to enjoy that benefit. That is why the "primes-paged-count-to" function is provided so it takes a negligible percentage of the time to count the primes over a range as compared to the time for the composite culling operations.

The practical range of the above sieve is about 16 million due to the fixed size of the page buffers; in order to extend the range, a larger page buffer could be used up to the size of the CPU L2 or L3 caches. If a 2^20 buffer were used (one Megabyte, as many modern dexktop CPU's easily have in their L3 cache), then the range would be increased up to about 10^14 at a cost of about a factor of two or three in slower memory accesses per composite culling operation loop. The base primes culling page size is already adequate for this range. One could make the culling page size automatically expand with growing range by about the square root of the current prime range with not too many changes to the code.

As for many implementations of unbounded sieves, the base primes less than the square root of the current range are generated by a secondary generated stream of primes; in this case it is done recursively, so another secondary stream generates the base primes for the base primes and so on down to where the innermost generator has only one page in the stream; this only takes one or two recursions for this type of range.

The base primes culling page size is reduced from the page size for the main primes so that there is less overhead for smaller primes ranges; otherwise excess base primes are generated for fairly small sieve ranges.

CLU

% Sieve of Eratosthenes
eratosthenes = proc (n: int) returns (array[bool])
    prime: array[bool] := array[bool]$fill(1, n, true)
    prime[1] := false

    for p: int in int$from_to(2, n/2) do
        if prime[p] then
            for c: int in int$from_to_by(p*p, n, p) do
                prime[c] := false
            end
        end
    end
    return(prime)
end eratosthenes

% Print primes up to 1000 using the sieve
start_up = proc ()
    po: stream := stream$primary_output()
    prime: array[bool] := eratosthenes(1000)
    col: int := 0

    for i: int in array[bool]$indexes(prime) do
        if prime[i] then
            col := col + 1
            stream$putright(po, int$unparse(i), 5)
            if col = 10 then
                col := 0
                stream$putc(po, '\n')
            end
        end
    end
end start_up

Output:

    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

CMake

function(eratosthenes var limit)
  # Check for integer overflow. With CMake using 32-bit signed integer,
  # this check fails when limit > 46340.
  if(NOT limit EQUAL 0)         # Avoid division by zero.
    math(EXPR i "(${limit} * ${limit}) / ${limit}")
    if(NOT limit EQUAL ${i})
      message(FATAL_ERROR "limit is too large, would cause integer overflow")
    endif()
  endif()

  # Use local variables prime_2, prime_3, ..., prime_${limit} as array.
  # Initialize array to y => yes it is prime.
  foreach(i RANGE 2 ${limit})
    set(prime_${i} y)
  endforeach(i)

  # Gather a list of prime numbers.
  set(list)
  foreach(i RANGE 2 ${limit})
    if(prime_${i})
      # Append this prime to list.
      list(APPEND list ${i})

      # For each multiple of i, set n => no it is not prime.
      # Optimization: start at i squared.
      math(EXPR square "${i} * ${i}")
      if(NOT square GREATER ${limit})   # Avoid fatal error.
        foreach(m RANGE ${square} ${limit} ${i})
          set(prime_${m} n)
        endforeach(m)
      endif()
    endif(prime_${i})
  endforeach(i)
  set(${var} ${list} PARENT_SCOPE)
endfunction(eratosthenes)

# Print all prime numbers through 100.
eratosthenes(primes 100)
message(STATUS "${primes}")

COBOL

*> Please ignore the asterisks in the first column of the next comments,
*> which are kludges to get syntax highlighting to work.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. Sieve-Of-Eratosthenes.

       DATA DIVISION.
       WORKING-STORAGE SECTION.

       01  Max-Number       USAGE UNSIGNED-INT.
       01  Max-Prime        USAGE UNSIGNED-INT.

       01  Num-Group.
           03  Num-Table PIC X VALUE "P"
                   OCCURS 1 TO 10000000 TIMES DEPENDING ON Max-Number
                   INDEXED BY Num-Index.
               88  Is-Prime VALUE "P" FALSE "N".
               
       01  Current-Prime    USAGE UNSIGNED-INT.

       01  I                USAGE UNSIGNED-INT.

       PROCEDURE DIVISION.
           DISPLAY "Enter the limit: " WITH NO ADVANCING
           ACCEPT Max-Number
           DIVIDE Max-Number BY 2 GIVING Max-Prime

*          *> Set Is-Prime of all non-prime numbers to false.
           SET Is-Prime (1) TO FALSE
           PERFORM UNTIL Max-Prime < Current-Prime
*              *> Set current-prime to next prime.
               ADD 1 TO Current-Prime
               PERFORM VARYING Num-Index FROM Current-Prime BY 1
                   UNTIL Is-Prime (Num-Index)
               END-PERFORM
               MOVE Num-Index TO Current-Prime

*              *> Set Is-Prime of all multiples of current-prime to
*              *> false, starting from current-prime sqaured.
               COMPUTE Num-Index = Current-Prime ** 2
               PERFORM UNTIL Max-Number < Num-Index
                   SET Is-Prime (Num-Index) TO FALSE
                   SET Num-Index UP BY Current-Prime
               END-PERFORM
           END-PERFORM

*          *> Display the prime numbers.
           PERFORM VARYING Num-Index FROM 1 BY 1
                   UNTIL Max-Number < Num-Index
               IF Is-Prime (Num-Index)
                   DISPLAY Num-Index
               END-IF
           END-PERFORM

           GOBACK
           .

Comal

Translation of: BASIC

// Sieve of Eratosthenes
input "Limit? ": limit
dim sieve(1:limit)
sqrlimit:=sqr(limit)
sieve(1):=1
p:=2
while p<=sqrlimit do
 while sieve(p) and p<sqrlimit do
  p:=p+1
 endwhile
 if p>sqrlimit then goto done
 for i:=p*p to limit step p do
  sieve(i):=1
 endfor i
 p:=p+1
endwhile
done:
print 2,
for i:=3 to limit do
 if sieve(i)=0 then
  print ", ",i,
 endif
endfor i
print

Output:

Limit? 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
79, 83, 89, 97

end

Common Lisp

(defun sieve-of-eratosthenes (maximum)
  (loop
     with sieve = (make-array (1+ maximum)
                              :element-type 'bit
                              :initial-element 0)
     for candidate from 2 to maximum
     when (zerop (bit sieve candidate))
     collect candidate
     and do (loop for composite from (expt candidate 2) 
               to maximum by candidate
               do (setf (bit sieve composite) 1))))

Working with odds only (above twice speedup), and marking composites only for primes up to the square root of the maximum:

(defun sieve-odds (maximum)
  "Prime numbers sieve for odd numbers. 
   Returns a list with all the primes that are less than or equal to maximum."
  (loop :with maxi = (ash (1- maximum) -1)
        :with stop = (ash (isqrt maximum) -1)
        :with sieve = (make-array (1+ maxi) :element-type 'bit :initial-element 0)
        :for i :from 1 :to maxi
        :for odd-number = (1+ (ash i 1))
        :when (zerop (sbit sieve i))
          :collect odd-number :into values
        :when (<= i stop)
          :do (loop :for j :from (* i (1+ i) 2) :to maxi :by odd-number
                    :do (setf (sbit sieve j) 1))
        :finally (return (cons 2 values))))

The indexation scheme used here interprets each index i as standing for the value 2i+1. Bit 0 is unused, a small price to pay for the simpler index calculations compared with the 2i+3 indexation scheme. The multiples of a given odd prime p are enumerated in increments of 2p, which corresponds to the index increment of p on the sieve array. The starting point p*p = (2i+1)(2i+1) = 4i(i+1)+1 corresponds to the index 2i(i+1).

While formally a wheel, odds are uniformly spaced and do not require any special processing except for value translation. Wheels proper aren't uniformly spaced and are thus trickier.

Cowgol

include "cowgol.coh";

# To change the maximum prime, change the size of this array
# Everything else is automatically filled in at compile time
var sieve: uint8[5000];

# Make sure all elements of the sieve are set to zero
MemZero(&sieve as [uint8], @bytesof sieve);

# Generate the sieve
var prime: @indexof sieve := 2;
while prime < @sizeof sieve loop
    if sieve[prime] == 0 then
        var comp: @indexof sieve := prime * prime;
        while comp < @sizeof sieve loop
            sieve[comp] := 1;
            comp := comp + prime;
        end loop;
    end if;
    prime := prime + 1;
end loop;

# Print all primes
var cand: @indexof sieve := 2;
while cand < @sizeof sieve loop
    if sieve[cand] == 0 then
        print_i16(cand as uint16);
        print_nl();
    end if;
    cand := cand + 1;
end loop;

Output:

Craft Basic

define limit = 120

dim flags[limit]

for n = 2 to limit

	let flags[n] = 1

next n

print "prime numbers less than or equal to ", limit ," are:"

for n = 2 to sqrt(limit)

	if flags[n] = 1 then

		for i = n * n to limit step n

			let flags[i] = 0

		next i

	endif

next n

for n = 1 to limit

	if flags[n] then

		print n

	endif

next n

Output:

prime numbers less than or equal to 120 are:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113

Crystal

Basic Version

This implementation uses a `BitArray` so it is automatically bit-packed to use just one bit per number representation:

# compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE
  include Iterator(Prime)
  @bits : BitArray; @bitndx : Int32 = 2

  def initialize(range : Prime)
    if range < 2
      @bits = BitArray.new 0
    else
      @bits = BitArray.new((range + 1).to_i32)
    end
    ba = @bits; ndx = 2
    while true
      wi = ndx * ndx
      break if wi >= ba.size
      if ba[ndx]
        ndx += 1; next
      end
      while wi < ba.size
        ba[wi] = true; wi += ndx
      end
      ndx += 1
    end
  end

  def next
    while @bitndx < @bits.size
      if @bits[@bitndx]
        @bitndx += 1; next
      end
      rslt = @bitndx.to_u64; @bitndx += 1; return rslt
    end
    stop
  end
end

print "Primes up to a hundred:  "
SoE.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million:  "
puts SoE.new(1_000_000).each.size
print "Number of primes to a billion:  "
start_time = Time.monotonic
print SoE.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."

Output:

Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 10219.222539 milliseconds.

This is as run on an Intel SkyLake i5-6500 at 3.6 GHz (automatic boost for single threaded as here).

Odds-Only Version

the non-odds-only version as per the above should never be used because in not using odds-only, it uses twice the memory and over two and a half times the CPU operations as the following odds-only code, which is very little more complex:

# compile with `--release --no-debug` for speed...

require "bit_array"

alias Prime = UInt64

class SoE_Odds
  include Iterator(Prime)
  @bits : BitArray; @bitndx : Int32 = -1

  def initialize(range : Prime)
    if range < 3
      @bits = BitArray.new 0
    else
      @bits = BitArray.new(((range - 1) >> 1).to_i32)
    end
    ba = @bits; ndx = 0
    while true
      wi = (ndx + ndx) * (ndx + 3) + 3 # start cull index calculation
      break if wi >= ba.size
      if ba[ndx]
        ndx += 1; next
      end
      bp = ndx + ndx + 3
      while wi < ba.size
        ba[wi] = true; wi += bp
      end
      ndx += 1
    end
  end

  def next
    while @bitndx < @bits.size
      if @bitndx < 0
        @bitndx += 1; return 2_u64
      elsif @bits[@bitndx]
        @bitndx += 1; next
      end
      rslt = (@bitndx + @bitndx + 3).to_u64; @bitndx += 1; return rslt
    end
    stop
  end
end

print "Primes up to a hundred:  "
SoE_Odds.new(100).each { |p| print " ", p }; puts
print "Number of primes to a million:  "
puts SoE_Odds.new(1_000_000).each.size
print "Number of primes to a billion:  "
start_time = Time.monotonic
print SoE_Odds.new(1_000_000_000).each.size
elpsd = (Time.monotonic - start_time).total_milliseconds
puts " in #{elpsd} milliseconds."

Output:

Primes up to a hundred:   2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Number of primes to a million:  78498
Number of primes to a billion:  50847534 in 4877.829642 milliseconds.

As can be seen, this is over two times faster than the non-odds-only version when run on the same CPU due to reduced pressure on the CPU data cache; however it is only reasonably performant for ranges of a few millions, and above that a page-segmented version of odds-only (or further wheel factorization) should be used plus other techniques for a further reduction of number of CPU clock cycles per culling/marking operation.

Page-Segmented Odds-Only Version

For sieving of ranges larger than a few million efficiently, a page-segmented sieve should always be used to preserve CPU cache associativity by making the page size to be about that of the CPU L1 data cache. The following code implements a page-segmented version that is an extensible sieve (no upper limit needs be specified) using a secondary memoized feed of base prime value arrays which use a smaller page-segment size for efficiency. When the count of the number of primes is desired, the sieve is polymorphic in output and counts the unmarked composite bits by using fast `popcount` instructions taken 64-bits at a time. The code is as follows:

# compile with `--release --no-debug` for speed...

alias Prime = UInt64
alias PrimeNdx = Int64
alias PrimeArr = Array(Prime)
alias SieveBuffer = Pointer(UInt8)
alias BasePrime = UInt32
alias BasePrimeArr = Array(BasePrime)

CPUL1CACHE = 131072 # 16 Kilobytes in nimber of bits

BITMASK = Pointer(UInt8).malloc(8) { |i| 1_u8 << i }

# Count number of non-composite (zero) bits within index range...
# sieve buffer is always evenly divisible by 64-bit words...
private def count_page_to(ndx : Int32, sb : SieveBuffer)
  lstwrdndx = ndx >> 6; mask = (~1_u64) << (ndx & 63)
  cnt = lstwrdndx * 64 + 64; sbw = sb.as(Pointer(UInt64))
  lstwrdndx.times { |i| cnt -= sbw[i].popcount }
  cnt - (sbw[lstwrdndx] | mask).popcount
end

# Cull composite bits from sieve buffer using base prime arrays;
# starting at overall given prime index for given buffer bit size...
private def cull_page(pndx : PrimeNdx, bitsz : Int32,
              bps : Iterator(BasePrimeArr), sb : SieveBuffer)
  bps.each { |bpa|
    bpa.each { |bpu32|
      bp = bpu32.to_i64; bpndx = (bp - 3) >> 1
      swi = (bpndx + bpndx) * (bpndx + 3) + 3 # calculate start prime index
      return if swi >= pndx + bitsz.to_i64
      bpi = bp.to_i32 # calculate buffer start culling index...
      bi = (swi >= pndx) ? (swi - pndx).to_i32 : begin
        r = (pndx - swi) % bp; r == 0 ? 0 : bpi - r.to_i32
      end
      # when base prime is small enough, cull using strided loops to
      # simplify the inner loops at the cost of more loop overhead...
      # allmost all of the work is done by the following loop...
      if bpi < (bitsz >> 4)
        bilmt = bi + (bpi << 3); cplmt = sb + (bitsz >> 3)
        bilmt = CPUL1CACHE if bilmt > CPUL1CACHE
        while bi < bilmt
          cp = sb + (bi >> 3); msk = BITMASK[bi & 7]
          while cp < cplmt # use pointer to save loop overhead
            cp[0] |= msk; cp += bpi
          end
          bi += bpi
        end
      else
        while bi < bitsz # bitsz
          sb[bi >> 3] |= BITMASK[bi & 7]; bi += bpi
        end
      end } }
end

# Iterator over processed prime pages, polymorphic by the converter function...
private class PagedResults(T)
  @bpas : BasePrimeArrays
  @cmpsts : SieveBuffer

  def initialize(@prmndx : PrimeNdx,
                 @cmpstsbitsz : Int32,
                 @cnvrtrfnc : (Int64, Int32, SieveBuffer) -> T)
    @bpas = BasePrimeArrays.new
    @cmpsts = SieveBuffer.malloc(((@cmpstsbitsz + 63) >> 3) & (-8))
  end

  private def dopage
    (@prmndx..).step(@cmpstsbitsz.to_i64).map { |pn|
        @cmpsts.clear(@cmpstsbitsz >> 3)
        cull_page(pn, @cmpstsbitsz, @bpas.each, @cmpsts)
        @cnvrtrfnc.call(pn, @cmpstsbitsz, @cmpsts) }
  end

  def each
    dopage
  end

  def each(& : T -> _) : Nil
    itr = dopage
    while true
      value = itr.next
      break if value.is_a?(Iterator::Stop)
      yield value
    end
  end
end

# Secondary memoized chain of BasePrime arrays (by small page size),
# which is actually a iterable lazy list (memoized) of BasePrimeArr;
# Crystal has closures, so it is easy to implement a LazyList class
# which memoizes the results of the thunk so it is only executed once...
private class BasePrimeArrays
  @baseprmarr : BasePrimeArr # head of lezy list
  @tail : BasePrimeArrays? = nil # tail starts as non-existing

  def initialize # special case for first page of base primes
    # converter of sieve buffer to base primes array...
    sb2bparrprc = -> (pn : PrimeNdx, bl : Int32, sb : SieveBuffer) {
      cnt = count_page_to(bl - 1, sb)
      bparr = BasePrimeArr.new(cnt, 0); j = 0
      bsprm = (pn + pn + 3).to_u32
      bl.times.each { |i|
        next if (sb[i >> 3] & BITMASK[i & 7]) != 0 
        bparr[j] = bsprm + (i + i).to_u32; j += 1 }
      bparr }

    cmpsts = SieveBuffer.malloc 128 # fake bparr for first iter...
    frstbparr = sb2bparrprc.call(0_i64, 1024, cmpsts)
    cull_page(0_i64, 1024, Iterator.of(frstbparr).each, cmpsts)
    @baseprmarr = sb2bparrprc.call(0_i64, 1024, cmpsts)

    # initialization of pages after the first is deferred to avoid data race...
    initbpas = -> { PagedResults.new(1024_i64, 1024, sb2bparrprc).each }
    # recursive LazyList generator function...
    nxtbpa = uninitialized Proc(Iterator(BasePrimeArr), BasePrimeArrays)
    nxtbpa = -> (bppgs : Iterator(BasePrimeArr)) {
      nbparr = bppgs.next
      abort "Unexpectedbase primes end!!!" if nbparr.is_a?(Iterator::Stop)
      BasePrimeArrays.new(nbparr, ->{ nxtbpa.call(bppgs) }) }
    @thunk = ->{ nxtbpa.call(initbpas.call) }
  end
  def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(BasePrimeArrays))
  end
  def initialize(@baseprmarr : BasePrimeArr, @thunk : Proc(Nil))
  end
  def initialize(@baseprmarr : BasePrimeArr, @thunk : Nil)
  end

  def tail # not thread safe without a lock/mutex...
    if thnk = @thunk
      @tail = thnk.call; @thunk = nil
    end
    @tail
  end

  private class BasePrimeArrIter # iterator over BasePrime arrays...
    include Iterator(BasePrimeArr)
    @dbparrs : Proc(BasePrimeArrays?)

    def initialize(fromll : BasePrimeArrays)
      @dbparrs = ->{ fromll.as(BasePrimeArrays?) }
    end

    def next
      if bpas = @dbparrs.call
        rslt = bpas.@baseprmarr; @dbparrs = -> { bpas.tail }; rslt
      else
        abort "Unexpected end of base primes array iteration!!!"
      end
    end
  end
  
  def each
    BasePrimeArrIter.new(self)
  end
end

# An "infinite" extensible iteration of primes,...
def primes
  sb2prms = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
    cnt = count_page_to(bitsz - 1, sb)
    prmarr = PrimeArr.new(cnt, 0); j = 0
    bsprm = (pn + pn + 3).to_u64
    bitsz.times.each { |i|
      next if (sb[i >> 3] & BITMASK[i & 7]) != 0
      prmarr[j] = bsprm + (i + i).to_u64; j += 1 }
    prmarr
  }
  (2_u64..2_u64).each
    .chain PagedResults.new(0, CPUL1CACHE, sb2prms).each.flat_map { |prmspg| prmspg.each }
end

# Counts number of primes to given limit...
def primes_count_to(lmt : Prime)
  if lmt < 3
    lmt < 2 ? return 0 : return 1
  end
  lmtndx = ((lmt - 3) >> 1).to_i64
  sb2cnt = ->(pn : PrimeNdx, bitsz : Int32, sb : SieveBuffer) {
    pglmt = pn + bitsz.to_i64 - 1
    if (pn + CPUL1CACHE.to_i64) > lmtndx
      Tuple.new(count_page_to((lmtndx - pn).to_i32, sb).to_i64, pglmt)
    else
      Tuple.new(count_page_to(bitsz - 1, sb).to_i64, pglmt)
    end
  }
  count = 1
  PagedResults.new(0, CPUL1CACHE, sb2cnt).each { |(cnt, lmt)|
    count += cnt; break if lmt >= lmtndx }
  count
end

print "The primes up to 100 are: "
primes.each.take_while { |p| p <= 100_u64 }.each { |p| print " ", p }
print ".\r\nThe Number of primes up to a million is "
print primes.each.take_while { |p| p <= 1_000_000_u64 }.size
print ".\r\nThe number of primes up to a billion is "
start_time = Time.monotonic
# answr = primes.each.take_while { |p| p <= 1_000_000_000_u64 }.size # slow way
answr = primes_count_to(1_000_000_000) # fast way
elpsd = (Time.monotonic - start_time).total_milliseconds
print "#{answr} in #{elpsd} milliseconds.\r\n"

Output:

The primes up to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97.
The Number of primes up to a million is 78498.
The number of primes up to a billion is 50847534 in 658.466028 milliseconds.

When run on the same machine as the previous version, the code is about seven and a half times as fast as even the above Odds-Only version at about 2.4 CPU clock cycles per culling operation rather than over 17, partly due to better cache associativity (about half the gain) but also due to tuning the inner culling loop for small base prime values to operate by byte pointer strides with a constant mask value to simplify the code generated for these inner loops; as there is some overhead in the eight outer loops that set this up, this technique is only applicable for smaller base primes.

Further gains are possible by using maximum wheel factorization rather than just factorization for odd base primes which can reduce the number of operations by a factor of about four and the number of CPU clock cycles per culling operation can be reduced by an average of a further about 25 percent for sieving to a billion by using extreme loop unrolling techniques for both the dense and sparse culling cases. As well, multi-threading by pages can reduce the wall clock time by a factor of the number of effective cores (non Hyper-Threaded cores).

D

Simpler Version

Prints all numbers less than the limit.

import std.stdio, std.algorithm, std.range, std.functional;

uint[] sieve(in uint limit) nothrow @safe {
    if (limit < 2)
        return [];
    auto composite = new bool[limit];

    foreach (immutable uint n; 2 .. cast(uint)(limit ^^ 0.5) + 1)
        if (!composite[n])
            for (uint k = n * n; k < limit; k += n)
                composite[k] = true;

    //return iota(2, limit).filter!(not!composite).array;
    return iota(2, limit).filter!(i => !composite[i]).array;
}

void main() {
    50.sieve.writeln;
}

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

Faster Version

This version uses an array of bits (instead of booleans, that are represented with one byte), and skips even numbers. The output is the same.

import std.stdio, std.math, std.array;

size_t[] sieve(in size_t m) pure nothrow @safe {
    if (m < 3)
        return null;
    immutable size_t n = m - 1;
    enum size_t bpc = size_t.sizeof * 8;
    auto F = new size_t[((n + 2) / 2) / bpc + 1];
    F[] = size_t.max;

    size_t isSet(in size_t i) nothrow @safe @nogc {
        immutable size_t offset = i / bpc;
        immutable size_t mask = 1 << (i % bpc);
        return F[offset] & mask;
    }

    void resetBit(in size_t i) nothrow @safe @nogc {
        immutable size_t offset = i / bpc;
        immutable size_t mask = 1 << (i % bpc);
        if ((F[offset] & mask) != 0)
            F[offset] = F[offset] ^ mask;
    }

    for (size_t i = 3; i <= sqrt(real(n)); i += 2)
        if (isSet((i - 3) / 2))
            for (size_t j = i * i; j <= n; j += 2 * i)
                resetBit((j - 3) / 2);

    Appender!(size_t[]) result;
    result ~= 2;
    for (size_t i = 3; i <= n; i += 2)
        if (isSet((i - 3) / 2))
            result ~= i;
    return result.data;
}

void main() {
    50.sieve.writeln;
}

Extensible Version

(This version is used in the task Extensible prime generator.)

/// Extensible Sieve of Eratosthenes.
struct Prime {
    uint[] a = [2];

    private void grow() pure nothrow @safe {
        immutable p0 = a[$ - 1] + 1;
        auto b = new bool[p0];

        foreach (immutable di; a) {
            immutable uint i0 = p0 / di * di;
            uint i = (i0 < p0) ? i0 + di - p0 : i0 - p0;
            for (; i < b.length; i += di)
                b[i] = true;
        }

        foreach (immutable uint i, immutable bi; b)
            if (!b[i])
                a ~= p0 + i;
    }

    uint opCall(in uint n) pure nothrow @safe {
        while (n >= a.length)
            grow;
        return a[n];
    }
}

version (sieve_of_eratosthenes3_main) {
    void main() {
        import std.stdio, std.range, std.algorithm;

        Prime prime;
        uint.max.iota.map!prime.until!q{a > 50}.writeln;
    }
}

To see the output (that is the same), compile with -version=sieve_of_eratosthenes3_main.

Dart

// helper function to pretty print an Iterable
String iterableToString(Iterable seq) {
  String str = "[";
  Iterator i = seq.iterator;
  if (i.moveNext()) str += i.current.toString();
  while(i.moveNext()) {
    str += ", " + i.current.toString();
  }
  return str + "]";
}

main() {
  int limit = 1000;
  int strt = new DateTime.now().millisecondsSinceEpoch;
  Set<int> sieve = new Set<int>();
  
  for(int i = 2; i <= limit; i++) {
    sieve.add(i);
  }
  for(int i = 2; i * i <= limit; i++) {
   if(sieve.contains(i)) {
     for(int j = i * i; j <= limit; j += i) {
       sieve.remove(j);
     }
   }
  }
  var sortedValues = new List<int>.from(sieve);
  int elpsd = new DateTime.now().millisecondsSinceEpoch - strt;
  print("Found " + sieve.length.toString() + " primes up to " + limit.toString() +
      " in " + elpsd.toString() + " milliseconds.");
  print(iterableToString(sortedValues)); // expect sieve.length to be 168 up to 1000...
//  Expect.equals(168, sieve.length);
}

Output:

Found 168 primes up to 1000 in 9 milliseconds. [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

Although it has the characteristics of a true Sieve of Eratosthenes, the above code isn't very efficient due to the remove/modify operations on the Set. Due to these, the computational complexity isn't close to linear with increasing range and it is quite slow for larger sieve ranges compared to compiled languages, taking an average of about 22 thousand CPU clock cycles for each of the 664579 primes (about 4 seconds on a 3.6 Gigahertz CPU) just to sieve to ten million.

faster bit-packed array odds-only solution

import 'dart:typed_data';
import 'dart:math';

Iterable<int> soeOdds(int limit) {
  if (limit < 3) return limit < 2 ? Iterable.empty() : [2];
  int lmti = (limit - 3) >> 1;
  int bfsz = (lmti >> 3) + 1;
  int sqrtlmt = (sqrt(limit) - 3).floor() >> 1;
  Uint32List cmpsts = Uint32List(bfsz);
  for (int i = 0; i <= sqrtlmt; ++i)
    if ((cmpsts[i >> 5] & (1 << (i & 31))) == 0) {
      int p = i + i + 3;
      for (int j = (p * p - 3) >> 1; j <= lmti; j += p)
        cmpsts[j >> 5] |= 1 << (j & 31);
    }
  return
    [2].followedBy(
      Iterable.generate(lmti + 1)
      .where((i) => cmpsts[i >> 5] & (1 << (i & 31)) == 0)
      .map((i) => i + i + 3) );
}

void main() {
  final int range = 100000000;
  String s = "( ";
  primesPaged().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${countPrimesTo(1000000)} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = soeOdds(range).length;
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}

Output:

( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 4604 milliseconds.

The above code is somewhat faster at about 1.5 thousand CPU cycles per prime here run on a 1.92 Gigahertz low end Intel x5-Z8350 CPU or about 2.5 seconds on a 3.6 Gigahertz CPU using the Dart VM to sieve to 100 million.

Unbounded infinite iterators/generators of primes

Infinite generator using a (hash) Map (sieves odds-only)

The following code will have about O(n log (log n)) performance due to a hash table having O(1) average performance and is only somewhat slow due to the constant overhead of processing hashes:

Iterable<int> primesMap() {
    Iterable<int> oddprms() sync* {
      yield(3); yield(5); // need at least 2 for initialization
      final Map<int, int> bpmap = {9: 6};
      final Iterator<int> bps = oddprms().iterator;
      bps.moveNext(); bps.moveNext(); // skip past 3 to 5
      int bp = bps.current;
      int n = bp;
      int q = bp * bp;
      while (true) {
        n += 2;
        while (n >= q || bpmap.containsKey(n)) {
          if (n >= q) {
            final int inc = bp << 1;
            bpmap[bp * bp + inc] = inc;
            bps.moveNext(); bp = bps.current; q = bp * bp;
          } else {
            final int inc = bpmap.remove(n);
            int next = n + inc;
            while (bpmap.containsKey(next)) {
              next += inc;
            }
            bpmap[next] = inc;
          }
          n += 2;
        }
        yield(n);
      }
    }
    return [2].followedBy(oddprms());
}

void main() {
  final int range = 100000000;
  String s = "( ";
  primesMap().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${primesMap().takeWhile((p)=>p<=1000000).length} preimes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = primesMap().takeWhile((p)=>p<=range).length;
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}

Output:

( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 preimes to 1000000.
There were 5761455 primes found up to 100000000.
This test bench took 16086 milliseconds.

This takes about 5300 CPU clock cycles per prime or about 8.4 seconds if run on a 3.6 Gigahertz CPU, which is slower than the above fixed bit-packed array version but has the advantage that it runs indefinitely, (at least on 64-bit machines; on 32 bit machines it can only be run up to the 32-bit number range, or just about a factor of 20 above as above).

Due to the constant execution overhead this is only reasonably useful for ranges up to tens of millions anyway.

Fast page segmented array infinite generator (sieves odds-only)

The following code also theoretically has a O(n log (log n)) execution speed performance and the same limited use on 32-bit execution platformas, but won't realize the theoretical execution complexity for larger primes due to the cache size increasing in size beyond its limits; but as the CPU L2 cache size that it automatically grows to use isn't any slower than the basic culling loop speed, it won't slow down much above that limit up to ranges of about 2.56e14, which will take in the order of weeks:

Translation of: Kotlin

import 'dart:typed_data';
import 'dart:math';
import 'dart:collection';

// a lazy list
typedef _LazyList _Thunk();
class _LazyList<T> {
  final T head;
  _Thunk thunk;
  _LazyList<T> _rest;
  _LazyList(T this.head, _Thunk this.thunk);
  _LazyList<T> get rest {
    if (this.thunk != null) {
      this._rest = this.thunk();
      this.thunk = null;
    }
    return this._rest;
  }
}

class _LazyListIterable<T> extends IterableBase<T> {
  _LazyList<T> _first;
  _LazyListIterable(_LazyList<T> this._first);
  @override Iterator<T> get iterator {
    Iterable<T> inner() sync* {
      _LazyList<T> current = this._first;
      while (true) {
        yield(current.head);
        current = current.rest;
      }
    }
    return inner().iterator;
  }
}

// zero bit population count Look Up Table for 16-bit range...
final Uint8List CLUT =
  Uint8List.fromList(
    Iterable.generate(65536)
    .map((i) {
      final int v0 = ~i & 0xFFFF;
      final int v1 = v0 - ((v0 & 0xAAAA) >> 1);
      final int v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2);
      return (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) * 0x0101)) >> 8) & 31;
    })
    .toList());

int _countComposites(Uint8List cmpsts) {
  Uint16List buf = Uint16List.view(cmpsts.buffer);
  int lmt = buf.length;
  int count = 0;
  for (var i = 0; i < lmt; ++i) {
    count += CLUT[buf[i]];
  }
  return count;
} 

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
Uint32List _composites2BasePrimeArray(int low, Uint8List cmpsts) {
  final int lmti = cmpsts.length << 3;
  final int len = _countComposites(cmpsts);
  final Uint32List rslt = Uint32List(len);
  int j = 0;
  for (int i = 0; i < lmti; ++i) {
    if (cmpsts[i >> 3] & 1 << (i & 7) == 0) {
        rslt[j++] = low + i + i;
    }
  }
  return rslt;
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
void _sieveComposites(int low, Uint8List buffer, Iterable<Uint32List> bpas) {
  final int lowi = (low - 3) >> 1;
  final int len = buffer.length;
  final int lmti = len << 3;
  final int nxti = lowi + lmti;
  for (var bpa in bpas) {
    for (var bp in bpa) {
      final int bpi = (bp - 3) >> 1;
      int strti = ((bpi * (bpi + 3)) << 1) + 3;
      if (strti >= nxti) return;
      if (strti >= lowi) strti = strti - lowi;
      else {
        strti = (lowi - strti) % bp;
        if (strti != 0) strti = bp - strti;
      }
      if (bp <= len >> 3 && strti <= lmti - bp << 6) {
        final int slmti = min(lmti, strti + bp << 3);
        for (var s = strti; s < slmti; s += bp) {
          final int msk = 1 << (s & 7);
          for (var c = s >> 3; c < len; c += bp) {
              buffer[c] |= msk;
          }
        }
      }
      else {
        for (var c = strti; c < lmti; c += bp) {
            buffer[c >> 3] |= 1 << (c & 7);
        }
      }
    }
  } 
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
Iterable<Uint32List> _makeBasePrimeArrays() {
  var cmpsts = Uint8List(512);
  _LazyList<Uint32List> _nextelem(int low, Iterable<Uint32List> bpas) {
    // calculate size so that the bit span is at least as big as the
    // maximum culling prime required, rounded up to minsizebits blocks...
    final int rqdsz = 2 + sqrt((1 + low).toDouble()).toInt();
    final sz = (((rqdsz >> 12) + 1) << 9); // size in bytes
    if (sz > cmpsts.length) cmpsts = Uint8List(sz);
    cmpsts.fillRange(0, cmpsts.length, 0);
    _sieveComposites(low, cmpsts, bpas);
    final arr = _composites2BasePrimeArray(low, cmpsts);
    final nxt = low + (cmpsts.length << 4);
    return _LazyList(arr, () => _nextelem(nxt, bpas));
  }
  // pre-seeding breaks recursive race,
  // as only known base primes used for first page...
  final preseedarr = Uint32List.fromList( [ // pre-seed to 100, can sieve to 10,000...
    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
    , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ] );
  return _LazyListIterable(
           _LazyList(preseedarr,
           () => _nextelem(101, _makeBasePrimeArrays()))
         );
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
Iterable<List> _makeSievePages() sync*  {
  final bpas = _makeBasePrimeArrays(); // secondary source of base prime arrays
  int low = 3;
  Uint8List cmpsts = Uint8List(16384);
  _sieveComposites(3, cmpsts, bpas);
  while (true) {
    yield([low, cmpsts]);
    final rqdsz = 2 + sqrt((1 + low).toDouble()).toInt(); // problem with sqrt not exact past about 10^12!!!!!!!!!
    final sz = ((rqdsz >> 17) + 1) << 14; // size iin bytes
    if (sz > cmpsts.length) cmpsts = Uint8List(sz);
    cmpsts.fillRange(0, cmpsts.length, 0);
    low += cmpsts.length << 4;
    _sieveComposites(low, cmpsts, bpas);
  }
}

int countPrimesTo(int range) {
  if (range < 3) { if (range < 2) return 0; else return 1; }
  var count = 1;
  for (var sp in _makeSievePages()) {
    int low = sp[0]; Uint8List cmpsts = sp[1];
    if ((low + (cmpsts.length << 4)) > range) {
      int lsti = (range - low) >> 1;
      var lstw = (lsti >> 4); var lstb = lstw << 1;
      var msk = (-2 << (lsti & 15)) & 0xFFFF;
      var buf = Uint16List.view(cmpsts.buffer, 0, lstw);
      for (var i = 0; i < lstw; ++i)
        count += CLUT[buf[i]];
      count += CLUT[(cmpsts[lstb + 1] << 8) | cmpsts[lstb] | msk];
      break;
    } else {
      count += _countComposites(cmpsts);
    }
  }
  return count;
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
Iterable<int> primesPaged() sync* {
  yield(2);
  for (var sp in _makeSievePages()) {
    int low = sp[0]; Uint8List cmpsts = sp[1];
    var szbts = cmpsts.length << 3;
    for (var i = 0; i < szbts; ++i) {
        if (cmpsts[i >> 3].toInt() & (1 << (i & 7)) != 0) continue;
        yield(low + i + i);
    }
  }
}

void main() {
  final int range = 1000000000;
  String s = "( ";
  primesPaged().take(25).forEach((p)=>s += "$p "); print(s + ")");
  print("There are ${countPrimesTo(1000000)} primes to 1000000.");
  final start = DateTime.now().millisecondsSinceEpoch;
  final answer = countPrimesTo(range); // fast way
//  final answer = primesPaged().takeWhile((p)=>p<=range).length; // slow way using enumeration
  final elapsed = DateTime.now().millisecondsSinceEpoch - start;
  print("There were $answer primes found up to $range.");
  print("This test bench took $elapsed milliseconds.");
}

Output:

( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes to 1000000.
There were 50847534 primes found up to 1000000000.
This test bench took 9385 milliseconds.

This version counts the primes up to one billion in about five seconds at 3.6 Gigahertz (a low end 1.92 Gigahertz CPU used here) or about 350 CPU clock cycles per prime under the Dart Virtual Machine (VM).

Note that it takes about four times as long to do this using the provided primes generator/enumerator as noted in the code, which is normal for all languages that it takes longer to actually enumerate the primes than it does to sieve in culling the composite numbers, but Dart is somewhat slower than most for this.

The algorithm can be sped up by a factor of four by extreme wheel factorization and (likely) about a factor of the effective number of CPU cores by using multi-processing isolates, but there isn't much point if one is to use the prime generator for output. For most purposes, it is better to use custom functions that directly manipulate the culled bit-packed page segments as `countPrimesTo` does here.

dc

[dn[,]n dsx [d 1 r :a lx + d ln!<.] ds.x lx] ds@
[sn 2 [d;a 0=@ 1 + d ln!<#] ds#x] se

100 lex

Output:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,\
97,

Delphi

program erathostenes;

{$APPTYPE CONSOLE}

type
  TSieve = class
  private
    fPrimes: TArray<boolean>;
    procedure InitArray;
    procedure Sieve;
    function getNextPrime(aStart: integer): integer;
    function getPrimeArray: TArray<integer>;
  public
    function getPrimes(aMax: integer): TArray<integer>;
  end;

  { TSieve }

function TSieve.getNextPrime(aStart: integer): integer;
begin
  result := aStart;
  while not fPrimes[result] do
    inc(result);
end;

function TSieve.getPrimeArray: TArray<integer>;
var
  i, n: integer;
begin
  n := 0;
  setlength(result, length(fPrimes)); // init array with maximum elements
  for i := 2 to high(fPrimes) do
  begin
    if fPrimes[i] then
    begin
      result[n] := i;
      inc(n);
    end;
  end;
  setlength(result, n); // reduce array to actual elements
end;

function TSieve.getPrimes(aMax: integer): TArray<integer>;
begin
  setlength(fPrimes, aMax);
  InitArray;
  Sieve;
  result := getPrimeArray;
end;

procedure TSieve.InitArray;
begin
  for i := 2 to high(fPrimes) do
    fPrimes[i] := true;
end;

procedure TSieve.Sieve;
var
  i, n, max: integer;
begin
  max := length(fPrimes);
  i := 2;
  while i < sqrt(max) do
  begin
    n := sqr(i);
    while n < max do
    begin
      fPrimes[n] := false;
      inc(n, i);
    end;
    i := getNextPrime(i + 1);
  end;
end;

var
  i: integer;
  Sieve: TSieve;

begin
  Sieve := TSieve.Create;
  for i in Sieve.getPrimes(100) do
    write(i, ' ');
  Sieve.Free;
  readln;
end.

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Draco

/* Sieve of Eratosthenes - fill a given boolean array */
proc nonrec sieve([*] bool prime) void:
    word p, c, max;
    max := dim(prime,1)-1;
    prime[0] := false;
    prime[1] := false;
    for p from 2 upto max do prime[p] := true od;
    for p from 2 upto max>>1 do
        if prime[p] then
            for c from p*2 by p upto max do
                prime[c] := false
            od
        fi
    od
corp

/* Print primes up to 1000 using the sieve */
proc nonrec main() void:
    word MAX = 1000;
    unsigned MAX i;
    byte c;
    [MAX+1] bool prime;
    sieve(prime);     

    c := 0;
    for i from 0 upto MAX do
        if prime[i] then
            write(i:4);
            c := c + 1;
            if c=10 then c:=0; writeln() fi
        fi
    od
corp

Output:

   2   3   5   7  11  13  17  19  23  29
  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113
 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229
 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349
 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463
 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601
 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733
 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863
 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

DWScript

function Primes(limit : Integer) : array of Integer;
var
   n, k : Integer;
   sieve := new Boolean[limit+1];
begin
   for n := 2 to Round(Sqrt(limit)) do begin
      if not sieve[n] then begin
         for k := n*n to limit step n do
            sieve[k] := True;
      end;
   end;
   
   for k:=2 to limit do
      if not sieve[k] then
         Result.Add(k);
end;

var r := Primes(50);
var i : Integer;
for i:=0 to r.High do
   PrintLn(r[i]);

Dylan

With outer to sqrt and inner to p^2 optimizations:

define method primes(n)
  let limit = floor(n ^ 0.5) + 1;
  let sieve = make(limited(<simple-vector>, of: <boolean>), size: n + 1, fill: #t);
  let last-prime = 2;

  while (last-prime < limit)
    for (x from last-prime ^ 2 to n by last-prime)
      sieve[x] := #f;
    end for;
    block (found-prime)
      for (n from last-prime + 1 below limit)
        if (sieve[n] = #f)
          last-prime := n;
          found-prime()
        end;
      end;
      last-prime := limit;
    end block;
  end while;

  for (x from 2 to n)
    if (sieve[x]) format-out("Prime: %d\n", x); end;
  end;
end;

E

E's standard library doesn't have a step-by-N numeric range, so we'll define one, implementing the standard iteration protocol.

def rangeFromBelowBy(start, limit, step) {
  return def stepper {
    to iterate(f) {
      var i := start
      while (i < limit) {
        f(null, i)
        i += step
      }
    }
  }
}

The sieve itself:

def eratosthenes(limit :(int > 2), output) {
  def composite := [].asSet().diverge()
  for i ? (!composite.contains(i)) in 2..!limit {
    output(i)
    composite.addAll(rangeFromBelowBy(i ** 2, limit, i))
  }
}

Example usage:

? eratosthenes(12, println)
# stdout: 2
#         3
#         5
#         7
#         11

EasyLang

len is_divisible[] 100
max = sqrt len is_divisible[]
for d = 2 to max
   if is_divisible[d] = 0
      for i = d * d step d to len is_divisible[]
         is_divisible[i] = 1
      .
   .
.
for i = 2 to len is_divisible[]
   if is_divisible[i] = 0
      print i
   .
.

eC

This example is incorrect. Please fix the code and remove this message.

Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: this is not a Sieve of Eratosthenes; it is just trial division.

public class FindPrime
{
   Array<int> primeList { [ 2 ], minAllocSize = 64 };
   int index;

   index = 3;

   bool HasPrimeFactor(int x)
   {
      int max = (int)floor(sqrt((double)x));
     
      for(i : primeList)
      {
         if(i > max) break;
         if(x % i == 0) return true;
      }
      return false;
   }

   public int GetPrime(int x)
   {
      if(x > primeList.count - 1)
      {
         for (; primeList.count != x; index += 2)
            if(!HasPrimeFactor(index))
            {
               if(primeList.count >= primeList.minAllocSize) primeList.minAllocSize *= 2;
               primeList.Add(index);
            }
      }
      return primeList[x-1];
   }
}

class PrimeApp : Application
{
   FindPrime fp { };
   void Main()
   {
      int num = argc > 1 ? atoi(argv[1]) : 1;
      PrintLn(fp.GetPrime(num));
   }
}

EchoLisp

Sieve

(require 'types) ;; bit-vector

;; converts sieve->list for integers in [nmin .. nmax[
(define (s-range sieve nmin nmax (base 0))
	(for/list ([ i (in-range nmin nmax)]) #:when (bit-vector-ref sieve i) (+ i base)))
	
;; next prime in sieve > p, or #f
(define (s-next-prime sieve p ) ;; 
		(bit-vector-scan-1 sieve (1+ p)))
		

;; returns a bit-vector - sieve- all numbers in [0..n[
(define (eratosthenes n)
  (define primes (make-bit-vector-1 n ))
  (bit-vector-set! primes 0 #f)
  (bit-vector-set! primes 1 #f)
  (for ([p (1+ (sqrt n))])
  		 #:when (bit-vector-ref primes  p)
         (for ([j (in-range (* p p) n p)])
    (bit-vector-set! primes j #f)))
   primes) 
  
(define s-primes (eratosthenes 10_000_000))

(s-range s-primes 0 100)
   → (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
(s-range s-primes 1_000_000 1_000_100)
   → (1000003 1000033 1000037 1000039 1000081 1000099)
(s-next-prime s-primes 9_000_000)
   → 9000011

Segmented sieve

Allow to extend the basis sieve (n) up to n^2. Memory requirement is O(√n)

;; ref :  http://research.cs.wisc.edu/techreports/1990/TR909.pdf
;; delta multiple of  sqrt(n)
;; segment is [left .. left+delta-1]


(define (segmented sieve left delta  (p 2) (first 0))
	(define segment (make-bit-vector-1 delta))
	(define right (+ left (1- delta)))
	 (define pmax (sqrt right))
	  (while p
	  #:break (> p pmax)
	 (set! first (+ left (modulo (- p (modulo left p)) p )))
			
 	(for   [(q (in-range first (1+ right) p))]
	(bit-vector-set! segment (- q left) #f))	
        (set! p (bit-vector-scan-1 sieve (1+ p))))
	segment)

(define (seg-range nmin delta)
    (s-range (segmented s-primes nmin delta) 0 delta nmin))


(seg-range 10_000_000_000 1000) ;; 15 milli-sec

    → (10000000019 10000000033 10000000061 10000000069 10000000097 10000000103 10000000121 
       10000000141 10000000147 10000000207 10000000259 10000000277 10000000279 10000000319 
       10000000343 10000000391 10000000403 10000000469 10000000501 10000000537 10000000583 
       10000000589 10000000597 10000000601 10000000631 10000000643 10000000649 10000000667 
       10000000679 10000000711 10000000723 10000000741 10000000753 10000000793 10000000799 
       10000000807 10000000877 10000000883 10000000889 10000000949 10000000963 10000000991 
       10000000993 10000000999)

;; 8 msec using the native (prime?) function
(for/list ((p (in-range 1_000_000_000 1_000_001_000))) #:when (prime? p) p)

Wheel

A 2x3 wheel gives a 50% performance gain.

;; 2x3 wheel
(define (weratosthenes n)
  (define primes (make-bit-vector n )) ;; everybody to #f (false)
  (bit-vector-set! primes 2 #t)
  (bit-vector-set! primes 3 #t)
  (bit-vector-set! primes 5 #t)
  
  (for ([i  (in-range 6 n 6) ]) ;; set candidate primes
  		(bit-vector-set! primes (1+ i) #t)
  		(bit-vector-set! primes (+ i 5) #t)
  		)
  		
  (for ([p  (in-range 5 (1+ (sqrt n)) 2 ) ])
  		 #:when (bit-vector-ref primes  p)
         (for ([j (in-range (* p p) n p)])
    (bit-vector-set! primes j #f)))
   primes)

EDSAC order code

This sieve program is based on one by Eiiti Wada, which on 2020-07-05 could be found at https://www.dcs.warwick.ac.uk/~edsac/

The main external change is that the program is not designed to be viewed in the monitor; it just writes as many primes as possible within the limitations imposed by Rosetta Code. Apart from the addition of comments, internal changes include the elimination of one set of masks, and a revised method of switching from one mask to another.

On the EdsacPC simulator (see link above) the printout starts off very slowly, and gradually gets faster.

 [Sieve of Eratosthenes]
 [EDSAC program. Initial Orders 2]

[Memory usage:
   56..87   library subroutine P6, for printing
   88..222  main program
  224..293  mask table: 35 long masks; each has 34 1's and a single 0
  294..1023 array of bits for integers 2, 3, 4, ...,
            where bit is changed from 1 to 0 when integer is crossed out.
  The address of the mask table must be even, and clear of the main program.
  To change it, just change the value after "T47K" below.
  The address of the bit array will then be changed automatically.]
 
[Subroutine M3, prints header, terminated by blank row of tape.
 It's an "interlude", which runs and then gets overwritten.]
      PFGKIFAFRDLFUFOFE@A6FG@E8FEZPF
      @&*SIEVE!OF!ERATOSTHENES!#2020
      @&*BASED!ON!CODE!BY!EIITI!WADA!#2001
      ..PZ

[Subroutine P6, prints strictly positive integer.
 32 locations; working locations 1, 4, 5.]
        T  56 K
  GKA3FT25@H29@VFT4DA3@TFH30@S6@T1FV4DU4DAFG26@TFTF
  O5FA4DF4FS4FL4FT4DA1FS3@G9@EFSFO31@E20@J995FJF!F

  [Store address of mask table in (say) location 47
  (chosen because its code letter M is first letter of "Mask").
  Address must be even and clear of main program.]
        T  47 K
        P 224 F  [<-------- address of mask table]

[Main program]
        T  88 K  [Define load address for main program.
                 Must be even, because of long values at start.]
        G     K [set @ (theta) for relative addressing]

[Long constants]
        T#Z PF TZ [clears sandwich digit between 0 and 1]
    [0] PD PF     [long value 1; also low word = short 1]
        T2#Z PF T2Z [clears sandwich digit between 2 and 3]
    [2] PF K4096F [long value 1000...000 binary;
                   also high word = teleprinter null]

 [Short constants
  The address in the following C order is the (exclusive) end of the bit table.
  Must be even: max = 1024, min = M + 72 where M is address of mask table set up above.
  Usually 1024, but may be reduced, e.g. to make the program run faster.]
    [4] C1024 D [or e.g. C 326 D to make it much faster]
    [5] U     F ['U' = 'T' - 'C']
    [6] K     F ['K' = 'S' - 'C']
    [7] H    #M [H order for start of mask table]
    [8] H  70#M [used to test for end of mask table]
    [9] P   2 F [constant4, or 2 in address field]
   [10] P  70 F [constant 140, or 70 in address field]
   [11] @     F [carriage return]
   [12] &     F [line feed]

 [Short variables]
   [13] P   1 F [p = number under test
                Let p = 35*q + r, where 0 <= r < 35]
   [14] P     F [4*q]
   [15] P   4 F [4*r]

 [Initial values of orders; required only for optional code below.]
   [16] C  70#M [initial value of a variable C order]
   [17] T    #M [initial value of a variable T order]
   [18] T  70#M [initial value of a variable T order]

   [19]
   [Enter with acc = 0]

  [Optional code to do some initializing at run time.
   This code allows the program to run again without being loaded again.]
         A  7 @ [initial values of variable orders]
         T 65 @
         A 16 @
         T 66 @
         A 17 @
         T 44 @
         A 18 @
         T 52 @

  [Initialize variables]
         A    @ [load 1 (short)]
         L    D [shift left 1]
         U 13 @ [p := 2]
         L  1 F [shift left 2]
         T 15 @ [4*r :=  8]
         T 14 @ [4*q :=  0]
  [End of optional code]

 [Make table of 35 masks 111...110,  111...101, ...,  011...111
  Treat the mask 011...111 separately to avoid accumulator overflow.
  Assume acc = 0 here.]
        S    #@ [acc all 1's]
        S  2 #@ [acc := 0111...111]
   [35] T 68 #M [store at high end of mask table]
        S    #@ [acc := -1]        
        L     D [make mask 111...1110]
        G 43  @ [jump to store it
 [Loop shifting the mask right and storing the result in the mask table.
  Uses first entry of bit array as temporary store.]
   [39] T     F [clear acc]
        A 70 #M [load previous mask]
        L     D [shift left]
        A    #@ [add 1]
   [43] U 70 #M [update current mask]
   [44] T    #M [store it in table (order changed at run time)]
        A 44  @ [load preceding T order]
        A  9  @ [inc address by 2]
        U 44  @ [store back]
        S 35  @ [reached high entry yet?]
        G 39  @ [loop back if not]
 [Mask table is now complete]

 [Initialize bit array: no numbers crossed out, so all bits are 1]
   [50] T     F [clear acc]
        S     #@ [subtract long 1, make top 35 bits all 1's]
   [52] T 70 #M [store as long value, both words all 1's  (order changed at run time)]
        A  52 @ [load preceding order]
        A   9 @ [add 2 to address field]
        U  52 @ [and store back]
        S   5 @ [convert to C order with same address (*)]
        S   4 @ [test for end of bit array]
        G  50 @ [loop until stored all 1's in bit table]
 [(*) Done so that end of bit table can be stored at one place only
      in list of constants, i.e. 'C m D' only, not 'T m D' as well.]

 [Start of main loop.]
 [Testing whether number has been crossed out]
   [59] T     F [acc := 0]
        A  66 @ [deriving S order from C order]
        A   6 @
        T  64 @
        S    #@ [acc := -1]
   [64] S     F [acc := 1's complement of bit-table entry (order changed at run time)]
   [65] H    #M [mult reg := start of mask array (order changed at run time)]
   [66] C  70#M [acc := -1 iff p (current number) is crossed out (order changed at run time)]
 [The next order is to avoid accumulator overflow if acc = max positive number]
        E  70 @ [if acc >= 0, jump to process new prime]
       A     #@ [if acc < 0, add 1 to test for -1]
       E 106  @ [if acc now >= 0 number is crossed out, jump to test next]
 [Here if new prime found.
  Send it to the teleprinter]
   [70] O  11 @ [print CR]
        O  12 @ [print LF]
        T     F [clear acc]
        A  13 @ [load prime]
        T     F [store in C(0) for print routine]
        A  75 @ [for subroutine return]
        G 56  F [print prime]

 [Cross out its multiples by setting corresponding bits to 0]
        A  65 @ [load H order above]
        T 102 @ [plant in crossing-out loop]
        A  66 @ [load C order above]
        T1 03 @ [plant in crossing-out loop]

 [Start of crossing-out loop. Here acc must = 0]
   [81] A 102 @ [load H order below]
        A  15 @ [inc address field by 2*r, where p = 35q + r]
        U 102 @ [update H order]
        S   8 @ [compare with 'H 70 #M']
        G  93 @ [skip if not gone beyond end of mask table]
        T     F [wrap mask address and inc address in bit tsble]
        A 102 @ [load H order below]
        S  10 @ [reduce mask address by 70]
        T 102 @ [update H order]
        A 103 @ [load C order below]
        A   9 @ [add 2 to address]
        T 103 @ [update C order]
   [93] T     F [clear acc]
        A 103 @ [load C order below]
        A  14 @ [inc address field by 2*q, where p = 35q + r]
        U 103 @ [update C order]
        S   4 @ [test for end of bit array]
        E 106 @ [if finished crossing out, loop to test next number]
        A  4  @ [restore C order]
        A  5  @ [make T order with same address]
        T 104 @ [store below]

 [Execute the crossing-out orders created above]
  [102] X     F [mult reg := mask (order created at run time)]
  [103] X     F [acc := logical and with bit-table entry (order created at run time)]
  [104] X     F [update entry (order created at run time)]
        E  81 @ [loop back with acc = 0]

  [106] T     F [clear acc]
        A  13 @ [load p = number under test]
        A     @ [add 1 (single)]
        T  13 @ [update]
        A  15 @ [load 4*r, where p = 35q + r]
        A   9 @ [add 4]
        U  15 @ [store back (r inc'd by 1)]
        S  10 @ [is 4*r now >= 140?]
        G 119 @ [no, skip]
        T  15 @ [yes, reduce 4*r by 140]
        A  14 @ [load 4*q]
        A   9 @ [add 4]
        T  14 @ [store back (q inc'd by 1)]
  [119] T     F [clear acc]
        A  65 @ [load 'H ... D' order, which refers to a mask]
        A   9 @ [inc mask address by 2]
        U  65 @ [update order]
        S   8 @ [over end of mask table?]
        G  59 @ [no, skip wrapround code]
        A   7 @ [yes, add constant to wrap round]
        T  65 @ [update H order]
        A  66 @ 
        A   9 @ [inc address by 2]
        U  66 @ [and store back]
        S   4 @ [test for end, as defined by C order at start]
        G  59 @ [loop back if not at end]

[Finished whole thing]
  [132] O   3 @ [output null to flush teleprinter buffer]
        Z     F [stop]
        E  19 Z [address to start execution]
        P     F [acc = 0 at start]

Output:

SIEVE OF ERATOSTHENES 2020
BASED ON CODE BY EIITI WADA 2001
    2
    3
    5
    7
   11
   13
   17
[...]
12703
12713
12721
12739
12743
12757
12763

Eiffel

Works with: EiffelStudio version 6.6 beta (with provisional loop syntax)

class
    APPLICATION
 
create
    make
 
feature
       make
            -- Run application.
        do
            across primes_through (100) as ic loop print (ic.item.out + " ") end
        end
 
    primes_through (a_limit: INTEGER): LINKED_LIST [INTEGER]
            -- Prime numbers through `a_limit'
        require
            valid_upper_limit: a_limit >= 2
        local
            l_tab: ARRAY [BOOLEAN]
        do
            create Result.make
            create l_tab.make_filled (True, 2, a_limit)
            across
                l_tab as ic
            loop
                if ic.item then
                    Result.extend (ic.target_index)
                    across ((ic.target_index * ic.target_index) |..| l_tab.upper).new_cursor.with_step (ic.target_index) as id
                    loop
                        l_tab [id.item] := False
                    end
                end
            end
        end
end

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Elixir

defmodule Prime do
  def eratosthenes(limit \\ 1000) do
    sieve = [false, false | Enum.to_list(2..limit)] |> List.to_tuple
    check_list = [2 | Stream.iterate(3, &(&1+2)) |> Enum.take(round(:math.sqrt(limit)/2))]
    Enum.reduce(check_list, sieve, fn i,tuple ->
      if elem(tuple,i) do
        clear_num = Stream.iterate(i*i, &(&1+i)) |> Enum.take_while(fn x -> x <= limit end)
        clear(tuple, clear_num)
      else
        tuple
      end
    end)
  end
  
  defp clear(sieve, list) do
    Enum.reduce(list, sieve, fn i, acc -> put_elem(acc, i, false) end)
  end
end

limit = 199
sieve = Prime.eratosthenes(limit)
Enum.each(0..limit, fn n ->
  if x=elem(sieve, n), do: :io.format("~3w", [x]), else: :io.format("  .") 
  if rem(n+1, 20)==0, do: IO.puts ""
end)

Output:

  .  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19
  .  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .
  . 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59
  . 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79
  .  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .
  .101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .
  .  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139
  .  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .
  .  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179
  .181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199

Shorter version (but slow):

defmodule Sieve do
  def primes_to(limit), do: sieve(Enum.to_list(2..limit))

  defp sieve([h|t]), do: [h|sieve(t -- for n <- 1..length(t), do: h*n)]
  defp sieve([]), do: []
end

Alternate much faster odds-only version more suitable for immutable data structures using a (hash) Map

The above code has a very limited useful range due to being very slow: for example, to sieve to a million, even changing the algorithm to odds-only, requires over 800 thousand "copy-on-update" operations of the entire saved immutable tuple ("array") of 500 thousand bytes in size, making it very much a "toy" application. The following code overcomes that problem by using a (immutable/hashed) Map to store the record of the current state of the composite number chains resulting from each of the secondary streams of base primes, which are only 167 in number up to this range; it is a functional "incremental" Sieve of Eratosthenes implementation:

defmodule PrimesSoEMap do
  @typep stt :: {integer, integer, integer, Enumerable.integer, %{integer => integer}}

  @spec advance(stt) :: stt
  defp advance {n, bp, q, bps?, map} do
    bps = if bps? === nil do Stream.drop(oddprms(), 1) else bps? end
    nn = n + 2
    if nn >= q do
      inc = bp + bp
      nbps = bps |> Stream.drop(1)
      [nbp] = nbps |> Enum.take(1)
      advance {nn, nbp, nbp * nbp, nbps, map |> Map.put(nn + inc, inc)}
    else if Map.has_key?(map, nn) do
      {inc, rmap} = Map.pop(map, nn)
      [next] =
        Stream.iterate(nn + inc, &(&1 + inc))
          |> Stream.drop_while(&(Map.has_key?(rmap, &1))) |> Enum.take(1)
      advance {nn, bp, q, bps, Map.put(rmap, next, inc)}
    else
      {nn, bp, q, bps, map}
    end end
  end

  @spec oddprms() :: Enumerable.integer
  defp oddprms do # put first base prime cull seq in Map so never empty
    # advance base odd primes to 5 when initialized
    init = {7, 5, 25, nil, %{9 => 6}}
    [3, 5] # to avoid race, preseed with the first 2 elements...
      |> Stream.concat(
            Stream.iterate(init, &(advance &1))
              |> Stream.map(fn {p,_,_,_,_} -> p end))
  end

  @spec primes() :: Enumerable.integer
  def primes do
    Stream.concat([2], oddprms())
  end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoEMap.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
  fn () ->
    ans =
      PrimesSoEMap.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
    ans end
:timer.tc(testfunc)
  |> (fn {t,ans} ->
    IO.puts "There are #{ans} primes up to #{range}."
    IO.puts "This test bench took #{t} microseconds." end).()

Output:

The first 25 primes are:
( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
There are 78498 primes up to 1000000.
This test bench took 3811957 microseconds.

The output time of about 3.81 seconds to one million is on a 1.92 Gigahertz CPU meaning that it takes about 93 thousand CPU clock cycles per prime which is still quite slow compared to mutable data structure implementations but comparable to "functional" implementations in other languages and is slow due to the time to calculate the required hashes. One advantage that it has is that it is O(n log (log n)) asymptotic computational complexity meaning that it takes not much more than ten times as long to sieve a range ten times higher.

This algorithm could be easily changed to use a Priority Queue (preferably Min-Heap based for the least constant factor computational overhead) to save some of the computation time, but then it will have the same computational complexity as the following code and likely about the same execution time.

Alternate faster odds-only version more suitable for immutable data structures using lazy Streams of Co-Inductive Streams

In order to save the computation time of computing the hashes, the following version uses a deferred execution Co-Inductive Stream type (constructed using Tuple's) in an infinite tree folding structure (by the `pairs` function):

defmodule PrimesSoETreeFolding do
  @typep cis :: {integer, (() -> cis)}
  @typep ciss :: {cis, (() -> ciss)}

  @spec merge(cis, cis) :: cis
  defp merge(xs, ys) do
    {x, restxs} = xs; {y, restys} = ys
    cond do
      x < y -> {x, fn () -> merge(restxs.(), ys) end}
      y < x -> {y, fn () -> merge(xs, restys.()) end}
      true -> {x, fn () -> merge(restxs.(), restys.()) end}
    end
  end

  @spec smlt(integer, integer) :: cis
  defp smlt(c, inc) do
    {c, fn () -> smlt(c + inc, inc) end}
  end

  @spec smult(integer) :: cis
  defp smult(p) do
    smlt(p * p, p + p)
  end
P
  @spec allmults(cis) :: ciss
  defp allmults {p, restps} do
    {smult(p), fn () -> allmults(restps.()) end}
  end

  @spec pairs(ciss) :: ciss
  defp pairs {cs0, restcss0} do
    {cs1, restcss1} = restcss0.()
    {merge(cs0, cs1), fn () -> pairs(restcss1.()) end}
  end

  @spec cmpsts(ciss) :: cis
  defp cmpsts {cs, restcss} do
    {c, restcs} = cs
    {c, fn () -> merge(restcs.(), cmpsts(pairs(restcss.()))) end}
  end

  @spec minusat(integer, cis) :: cis
  defp minusat(n, cmps) do
    {c, restcs} = cmps
    if n < c do
      {n, fn () -> minusat(n + 2, cmps) end}
    else
      minusat(n + 2, restcs.())
    end
  end

  @spec oddprms() :: cis
  defp oddprms() do
    {3, fn () ->
      {5, fn () -> minusat(7, cmpsts(allmults(oddprms()))) end}
    end}
  end

  @spec primes() :: Enumerable.t
  def primes do
    [2] |> Stream.concat(
      Stream.iterate(oddprms(), fn {_, restps} -> restps.() end)
        |> Stream.map(fn {p, _} -> p end)
    )
  end

end

range = 1000000
IO.write "The first 25 primes are:\n( "
PrimesSoETreeFolding.primes() |> Stream.take(25) |> Enum.each(&(IO.write "#{&1} "))
IO.puts ")"
testfunc =
  fn () ->
    ans =
      PrimesSoETreeFolding.primes() |> Stream.take_while(&(&1 <= range)) |> Enum.count()
    ans end
:timer.tc(testfunc)
  |> (fn {t,ans} ->
    IO.puts "There are #{ans} primes up to #{range}."
    IO.puts "This test bench took #{t} microseconds." end).()

It's output is identical to the previous version other than the time required is less than half; however, it has a O(n (log n) (log (log n))) asymptotic computation complexity meaning that it gets slower with range faster than the above version. That said, it would take sieving to billions taking hours before the two would take about the same time.

Elm

Elm with immutable arrays

module PrimeArray exposing (main)

import Array exposing (Array, foldr, map, set)
import Html exposing (div, h1, p, text)
import Html.Attributes exposing (style)


{-
The Eratosthenes sieve task in Rosetta Code does not accept the use of modulo function  (allthough Elm functions modBy and remainderBy work always correctly as they require type Int excluding type Float). Thus the solution needs an indexed work array as Elm has no indexes for lists.

In this method we need no division remainder calculations, as we just set the markings of non-primes into the array. We need the indexes that we know, where the marking of the non-primes shall be set.

Because everything is immutable in Elm, every change of array values will create a new array save the original array unchanged. That makes the program running slower or consuming more space of memory than with non-functional imperative languages. All conventional loops (for, while, until) are excluded in Elm because immutability requirement.

   Live: https://ellie-app.com/pTHJyqXcHtpa1
-}


alist =
    List.range 2 150



-- Work array contains integers 2 ... 149


workArray =
    Array.fromList alist


n : Int
n =
    List.length alist



-- The max index of integers used in search for primes
-- limit * limit < n
-- equal: limit <= √n


limit : Int
limit =
    round (0.5 + sqrt (toFloat n))
 

-- Remove zero cells of the array


findZero : Int -> Bool
findZero =
    \el -> el > 0


zeroFree : Array Int
zeroFree =
    Array.filter findZero workResult


nrFoundPrimes =
    Array.length zeroFree


workResult : Array Int
workResult =
    loopI 2 limit workArray



{- As Elm has no loops (for, while, until)
we must use recursion instead!
The search of prime starts allways saving the 
first found value (not setting zero) and continues setting the multiples of prime to zero.
Zero is no integer and may thus be used as marking of non-prime numbers. At the end, only the primes remain in the array and the zeroes are removed from the resulted array to be shown in Html. 
-}

-- The recursion increasing variable i follows:

loopI : Int -> Int -> Array Int -> Array Int
loopI i imax arr =
    if i > imax then
        arr

    else
        let
            arr2 =
                phase i arr
        in
        loopI (i + 1) imax arr2



-- The helper function


phase : Int -> Array Int -> Array Int
phase i =
    arrayMarker i (2 * i - 2) n


lastPrime =
    Maybe.withDefault 0 <| Array.get (nrFoundPrimes - 1) zeroFree


outputArrayInt : Array Int -> String
outputArrayInt arr =
    decorateString <|
        foldr (++) "" <|
            Array.map (\x -> String.fromInt x ++ " ") arr


decorateString : String -> String
decorateString str =
    "[ " ++ str ++ "]"



-- Recursively marking the multiples of p with zero
-- This loop operates with constant p


arrayMarker : Int -> Int -> Int -> Array Int -> Array Int
arrayMarker p min max arr =
    let
        arr2 =
            set min 0 arr

        min2 =
            min + p
    in
    if min < max then
        arrayMarker p min2 max arr2

    else
        arr


main =
    div [ style "margin" "2%" ]
        [ h1 [] [ text "Sieve of Eratosthenes" ]
        , text ("List of integers [2, ... ," ++ String.fromInt n ++ "]")
        , p [] [ text ("Total integers " ++ String.fromInt n) ]
        , p [] [ text ("Max prime of search " ++ String.fromInt limit) ]
        , p [] [ text ("The largest found prime " ++ String.fromInt lastPrime) ]
        , p [ style "color" "blue", style "font-size" "1.5em" ]
            [ text (outputArrayInt zeroFree) ]
        , p [] [ text ("Found " ++ String.fromInt nrFoundPrimes ++ " primes") ]
        ]

Output:

List of integers [2, ... ,149]

Total integers 149

Max prime of search 13

The largest found prime 149

[ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 ]

Found 35 primes

Concise Elm Immutable Array Version

Although functional, the above code is written in quite an imperative style, so the following code is written in a more concise functional style and includes timing information for counting the number of primes to a million:

module Main exposing (main)

import Browser exposing (element)
import Task exposing (Task, succeed, perform, andThen)
import Html exposing (Html, div, text)
import Time exposing (now, posixToMillis)

import Array exposing (repeat, get, set)

cLIMIT : Int
cLIMIT = 1000000

primesArray : Int -> List Int
primesArray n =
  if n < 2 then [] else
  let
    sz = n + 1
    loopbp bp arr =
      let s = bp * bp in
      if s >= sz then arr else
      let tst = get bp arr |> Maybe.withDefault True in
      if tst then loopbp (bp + 1) arr else
      let
        cullc c iarr =
          if c >= sz then iarr else
          cullc (c + bp) (set c True iarr)
      in loopbp (bp + 1) (cullc s arr)
    cmpsts = loopbp 2 (repeat sz False)
    cnvt (i, t) = if t then Nothing else Just i
  in cmpsts |> Array.toIndexedList
      |> List.drop 2 -- skip the values for zero and one
      |> List.filterMap cnvt -- primes are indexes of not composites

type alias Model = List String

type alias Msg = Model

test : (Int -> List Int) -> Int -> Cmd Msg
test primesf lmt =
  let
    to100 = primesf 100 |> List.map String.fromInt |> String.join ", "
    to100str = "The primes to 100 are:  " ++ to100
    timemillis() = now |> andThen (succeed << posixToMillis)
  in timemillis() |> andThen (\ strt ->
       let cnt = primesf lmt |> List.length
       in timemillis() |> andThen (\ stop ->
         let answrstr = "Found " ++ (String.fromInt cnt) ++ " primes to "
                          ++ (String.fromInt cLIMIT) ++ " in "
                          ++ (String.fromInt (stop - strt)) ++ " milliseconds."
         in succeed [to100str, answrstr] ) ) |> perform identity

main : Program () Model Msg
main =
  element { init = \ _ -> ( [], test primesArray cLIMIT )
          , update = \ msg _ -> (msg, Cmd.none)
          , subscriptions = \ _ -> Sub.none
          , view = div [] << List.map (div [] << List.singleton << text) }

Output:

The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 958 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

Concise Elm Immutable Array Odds-Only Version

The following code can replace the `primesArray` function in the above program and called from the testing and display code (two places):

primesArrayOdds : Int -> List Int
primesArrayOdds n =
  if n < 2 then [] else
  let
    sz = (n - 1) // 2
    loopi i arr =
      let s = (i + i) * (i + 3) + 3 in
      if s >= sz then arr else
      let tst = get i arr |> Maybe.withDefault True in
      if tst then loopi (i + 1) arr else
      let
        bp = i + i + 3
        cullc c iarr =
          if c >= sz then iarr else
          cullc (c + bp) (set c True iarr)
      in loopi (i + 1) (cullc s arr)
    cmpsts = loopi 0 (repeat sz False)
    cnvt (i, t) = if t then Nothing else Just <| i + i + 3
    oddprms = cmpsts |> Array.toIndexedList |> List.filterMap cnvt
  in 2 :: oddprms

Output:

The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 371 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

Richard Bird Tree Folding Elm Version

The Elm language doesn't efficiently handle the Sieve of Eratosthenes (SoE) algorithm because it doesn't have directly accessible linear arrays (the Array module used above is based on a persistent tree of sub arrays) and also does Copy On Write (COW) for every write to every location as well as a logarithmic process of updating as a "tree" to minimize the COW operations. Thus, there is better performance implementing the Richard Bird Tree Folding functional algorithm, as follows:

Translation of: Haskell

module Main exposing (main)

import Browser exposing (element)
import Task exposing (Task, succeed, perform, andThen)
import Html exposing (Html, div, text)
import Time exposing (now, posixToMillis)

cLIMIT : Int
cLIMIT = 1000000

type CIS a = CIS a (() -> CIS a)

uptoCIS2List : comparable -> CIS comparable -> List comparable
uptoCIS2List n cis =
  let loop (CIS hd tl) lst =
        if hd > n then List.reverse lst
        else loop (tl()) (hd :: lst)
  in loop cis []

countCISTo : comparable -> CIS comparable -> Int
countCISTo n cis =
  let loop (CIS hd tl) cnt =
        if hd > n then cnt else loop (tl()) (cnt + 1)
  in loop cis 0

primesTreeFolding : () -> CIS Int
primesTreeFolding() =
  let
    merge (CIS x xtl as xs) (CIS y ytl as ys) =
      case compare x y of
        LT -> CIS x <| \ () -> merge (xtl()) ys
        EQ -> CIS x <| \ () -> merge (xtl()) (ytl())
        GT -> CIS y <| \ () -> merge xs (ytl())
    pmult bp =
      let adv = bp + bp
          pmlt p = CIS p <| \ () -> pmlt (p + adv)
      in pmlt (bp * bp)
    allmlts (CIS bp bptl) =
      CIS (pmult bp) <| \ () -> allmlts (bptl())
    pairs (CIS frst tls) =
      let (CIS scnd tlss) = tls()
      in CIS (merge frst scnd) <| \ () -> pairs (tlss())
    cmpsts (CIS (CIS hd tl) tls) =
      CIS hd <| \ () -> merge (tl()) <| cmpsts <| pairs (tls())
    testprm n (CIS hd tl as cs) =
      if n < hd then CIS n <| \ () -> testprm (n + 2) cs
      else testprm (n + 2) (tl())
    oddprms() =
      CIS 3 <| \ () -> testprm 5 <| cmpsts <| allmlts <| oddprms()
  in CIS 2 <| \ () -> oddprms()

type alias Model = List String

type alias Msg = Model

test : (() -> CIS Int) -> Int -> Cmd Msg
test primesf lmt =
  let
    to100 = primesf() |> uptoCIS2List 100
              |> List.map String.fromInt |> String.join ", "
    to100str = "The primes to 100 are:  " ++ to100
    timemillis() = now |> andThen (succeed << posixToMillis)
  in timemillis() |> andThen (\ strt ->
       let cnt = primesf() |> countCISTo lmt
       in timemillis() |> andThen (\ stop ->
         let answrstr = "Found " ++ (String.fromInt cnt) ++ " primes to "
                          ++ (String.fromInt cLIMIT) ++ " in "
                          ++ (String.fromInt (stop - strt)) ++ " milliseconds."
         in succeed [to100str, answrstr] ) ) |> perform identity

main : Program () Model Msg
main =
  element { init = \ _ -> ( [], test primesTreeFolding cLIMIT )
          , update = \ msg _ -> (msg, Cmd.none)
          , subscriptions = \ _ -> Sub.none
          , view = div [] << List.map (div [] << List.singleton << text) }

Output:

The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 201 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

Elm Priority Queue Version

Using a Binary Minimum Heap Priority Queue is a constant factor faster than the above code as the data structure is balanced rather than "heavy to the right" and requires less memory allocations/deallocation in the following code, which implements enough of the Priority Queue for the purpose. Just substitute the following code for `primesTreeFolding` and pass `primesPQ` as an argument to `test` rather than `primesTreeFolding`:

type PriorityQ comparable v =
  Mt
  | Br comparable v (PriorityQ comparable v)
                    (PriorityQ comparable v)

emptyPQ : PriorityQ comparable v
emptyPQ = Mt

peekMinPQ : PriorityQ comparable v -> Maybe (comparable, v)
peekMinPQ  pq = case pq of
                  (Br k v _ _) -> Just (k, v)
                  Mt -> Nothing

pushPQ : comparable -> v -> PriorityQ comparable v
           -> PriorityQ comparable v
pushPQ wk wv pq =
  case pq of
    Mt -> Br wk wv Mt Mt
    (Br vk vv pl pr) -> 
      if wk <= vk then Br wk wv (pushPQ vk vv pr) pl
      else Br vk vv (pushPQ wk wv pr) pl

siftdown : comparable -> v -> PriorityQ comparable v
             -> PriorityQ comparable v -> PriorityQ comparable v
siftdown wk wv pql pqr =
  case pql of
    Mt -> Br wk wv Mt Mt
    (Br vkl vvl pll prl) ->
      case pqr of
        Mt -> if wk <= vkl then Br wk wv pql Mt
              else Br vkl vvl (Br wk wv Mt Mt) Mt
        (Br vkr vvr plr prr) ->
          if wk <= vkl && wk <= vkr then Br wk wv pql pqr
          else if vkl <= vkr then Br vkl vvl (siftdown wk wv pll prl) pqr
               else Br vkr vvr pql (siftdown wk wv plr prr)

replaceMinPQ : comparable -> v -> PriorityQ comparable v
                 -> PriorityQ comparable v
replaceMinPQ wk wv pq = case pq of
                          Mt -> Mt
                          (Br _ _ pl pr) -> siftdown wk wv pl pr

primesPQ : () -> CIS Int
primesPQ() =
  let    
    sieve n pq q (CIS bp bptl as bps) =
      if n >= q then
        let adv = bp + bp in let (CIS nbp _ as nbps) = bptl()
        in sieve (n + 2) (pushPQ (q + adv) adv pq) (nbp * nbp) nbps
      else let
             (nxtc, _) = peekMinPQ pq |> Maybe.withDefault (q, 0) -- default when empty
             adjust tpq =
               let (c, adv) = peekMinPQ tpq |> Maybe.withDefault (0, 0)
               in if c > n then tpq
                  else adjust (replaceMinPQ (c + adv) adv tpq)
           in if n >= nxtc then sieve (n + 2) (adjust pq) q bps
              else CIS n <| \ () -> sieve (n + 2) pq q bps
    oddprms() = CIS 3 <| \ () -> sieve 5 emptyPQ 9 <| oddprms()
  in CIS 2 <| \ () -> oddprms()

Output:

The primes up to 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
Found 78498 primes to 1000000 in 124 milliseconds.

The above output is the contents of the HTML web page as shown with Google Chrome version 1.23 running on an AMD 7840HS CPU at 5.1 GHz (single thread boosted).

Emacs Lisp

Library: cl-lib

(defun sieve-set (limit)
  (let ((xs (make-vector (1+ limit) 0)))
    (cl-loop for i from 2 to limit
             when (zerop (aref xs i))
             collect i
             and do (cl-loop for m from (* i i) to limit by i
                             do (aset xs m 1)))))

Straightforward implementation of sieve of Eratosthenes, 2 times faster:

(defun sieve (limit)
  (let ((xs (vconcat [0 0] (number-sequence 2 limit))))
    (cl-loop for i from 2 to (sqrt limit)
             when (aref xs i)
             do (cl-loop for m from (* i i) to limit by i
                         do (aset xs m 0)))
    (remove 0 xs)))

Erlang

Erlang using Dicts

-module( sieve_of_eratosthenes ).

-export( [primes_upto/1] ).

primes_upto( N ) ->
	Ns = lists:seq( 2, N ),
	Dict = dict:from_list( [{X, potential_prime} || X <- Ns] ),
	{Upto_sqrt_ns, _T} = lists:split( erlang:round(math:sqrt(N)), Ns ),
	{N, Prime_dict} = lists:foldl( fun find_prime/2, {N, Dict}, Upto_sqrt_ns ),
	lists:sort( dict:fetch_keys(Prime_dict) ).



find_prime( N, {Max, Dict} ) -> find_prime( dict:find(N, Dict), N, {Max, Dict} ).

find_prime( error, _N, Acc ) -> Acc;
find_prime( {ok, _Value}, N, {Max, Dict} ) when Max > N*N ->
    {Max, lists:foldl( fun dict:erase/2, Dict, lists:seq(N*N, Max, N))};
find_prime( {ok, _Value}, _, R) -> R.

Output:

35> sieve_of_eratosthenes:primes_upto( 20 ).
[2,3,5,7,11,13,17,19]

Erlang Lists of Tuples, Sloww

A much slower, perverse method, using only lists of tuples. Especially evil is the P = lists:filtermap operation which yields a list for every iteration of the X * M row. Has the virtue of working for any -> N :)

-module( sieve ).                                                                                                    
-export( [main/1,primes/2] ).                                                                                        
                                                                                                                     
main(N) -> io:format("Primes: ~w~n", [ primes(2,N) ]).                                                               
                                                                                                                     
primes(M,N) -> primes(M, N,lists:seq( M, N ),[]).                                                                    
                                                                                                                     
primes(M,N,_Acc,Tuples) when M > N/2-> out(Tuples);                                                                  

primes(M,N,Acc,Tuples) when length(Tuples) < 1 -> 
        primes(M,N,Acc,[{X, X} || X <- Acc]);                              

primes(M,N,Acc,Tuples) ->                                                                                            
        {SqrtN, _T} = lists:split( erlang:round(math:sqrt(N)), Acc ),                                                
        F = Tuples,                                                                                                  
        Ms = lists:filtermap(fun(X) -> if X > 0 -> {true, X * M}; true -> false end end, SqrtN),                     
        P = lists:filtermap(fun(T) -> 
            case lists:keymember(T,1,F) of true -> 
            {true, lists:keyreplace(T,1,F,{T,0})}; 
             _-> false end end,  Ms),                                                                                              
        AA = mergeT(P,lists:last(P),1 ),                                                                             
        primes(M+1,N,Acc,AA).                                                                                        
                                                                                                                     
mergeT(L,M,Acc) when Acc == length(L) -> M;                                                                          
mergeT(L,M,Acc) ->                                                                                                   
        A = lists:nth(Acc,L),                                                                                        
        B = M,                                                                                                       
        Mer = lists:zipwith(fun(X, Y) -> if X < Y -> X; true -> Y end end, A, B),                                    
        mergeT(L,Mer,Acc+1).                                                                                         
                                                                                                                     
out(Tuples) ->                                                                                                       
        Primes = lists:filter( fun({_,Y}) -> Y > 0 end,  Tuples),                                                    
        [ X || {X,_} <- Primes ].

Output:

109> sieve:main(20).
Primes: [2,3,5,7,11,13,17,19]
ok
110> timer:tc(sieve, main, [20]).        
Primes: [2,3,5,7,11,13,17,19]
{129,ok}

Erlang with ordered sets

Since I had written a really odd and slow one, I thought I'd best do a better performer. Inspired by an example from https://github.com/jupp0r

-module(ossieve).
-export([main/1]).

sieve(Candidates,SearchList,Primes,_Maximum) when length(SearchList) == 0 ->
    ordsets:union(Primes,Candidates);
sieve(Candidates,SearchList,Primes,Maximum)  ->
     H = lists:nth(1,string:substr(Candidates,1,1)),
     Reduced1 = ordsets:del_element(H, Candidates),
     {Reduced2, ReducedSearch} = remove_multiples_of(H, Reduced1, SearchList),
     NewPrimes = ordsets:add_element(H,Primes),
     sieve(Reduced2, ReducedSearch, NewPrimes, Maximum).

remove_multiples_of(Number,Candidates,SearchList) ->                                 
    NewSearchList = ordsets:filter( fun(X) -> X >= Number * Number end, SearchList), 
    RemoveList = ordsets:filter( fun(X) -> X rem Number == 0 end, NewSearchList),
    {ordsets:subtract(Candidates, RemoveList), ordsets:subtract(NewSearchList, RemoveList)}.

main(N) ->      
    io:fwrite("Creating Candidates...~n"),
    CandidateList = lists:seq(3,N,2),
    Candidates = ordsets:from_list(CandidateList),
    io:fwrite("Sieving...~n"),
    ResultSet = ordsets:add_element(2,sieve(Candidates,Candidates,ordsets:new(),N)),
    io:fwrite("Sieved... ~w~n",[ResultSet]).

Output:

36> ossieve:main(100).
Creating Candidates...
Sieving...
Sieved... [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok

Erlang Canonical

A pure list comprehension approach.

-module(sieveof).
-export([main/1,primes/1, primes/2]).                 
                                                      
main(X) -> io:format("Primes: ~w~n", [ primes(X) ]).  
                                 
primes(X) -> sieve(range(2, X)).                                         
primes(X, Y) -> remove(primes(X), primes(Y)).                            
                                                                         
range(X, X) -> [X];                                                      
range(X, Y) -> [X | range(X + 1, Y)].                                    
                                                                         
sieve([X]) -> [X];                                                       
sieve([H | T]) -> [H | sieve(remove([H * X || X <-[H | T]], T))].        
                                                                         
remove(_, []) -> [];                                                     
remove([H | X], [H | Y]) -> remove(X, Y);                                
remove(X, [H | Y]) -> [H | remove(X, Y)].

{out}

> timer:tc(sieve, main, [100]). 
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{7350,ok}
61> timer:tc(sieveof, main, [100]). 
Primes: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
{363,ok}

Clearly not only more elegant, but faster :) Thanks to http://stackoverflow.com/users/113644/g-b

Erlang ets + cpu distributed implementation

much faster previous erlang examples

#!/usr/bin/env escript
%% -*- erlang -*-
%%! -smp enable -sname p10_4
% vim:syn=erlang

-mode(compile).

main([N0]) ->
    N = list_to_integer(N0),
    ets:new(comp, [public, named_table, {write_concurrency, true} ]),
    ets:new(prim, [public, named_table, {write_concurrency, true}]),
    composite_mc(N),
    primes_mc(N),
    io:format("Answer: ~p ~n", [lists:sort([X||{X,_}<-ets:tab2list(prim)])]).

primes_mc(N) ->
    case erlang:system_info(schedulers) of
        1 -> primes(N);
        C -> launch_primes(lists:seq(1,C), C, N, N div C)
    end.
launch_primes([1|T], C, N, R) -> P = self(), spawn(fun()-> primes(2,R), P ! {ok, prm} end), launch_primes(T, C, N, R);
launch_primes([H|[]], C, N, R)-> P = self(), spawn(fun()-> primes(R*(H-1)+1,N), P ! {ok, prm} end), wait_primes(C);
launch_primes([H|T], C, N, R) -> P = self(), spawn(fun()-> primes(R*(H-1)+1,R*H), P ! {ok, prm} end), launch_primes(T, C, N, R).

wait_primes(0) -> ok;
wait_primes(C) ->
    receive
        {ok, prm} -> wait_primes(C-1)
    after 1000    -> wait_primes(C)
    end.

primes(N) -> primes(2, N).
primes(I,N) when I =< N ->
    case ets:lookup(comp, I) of
        [] -> ets:insert(prim, {I,1})
        ;_ -> ok
    end,
    primes(I+1, N);
primes(I,N) when I > N -> ok.


composite_mc(N) -> composite_mc(N,2,round(math:sqrt(N)),erlang:system_info(schedulers)).
composite_mc(N,I,M,C) when I =< M, C > 0 ->
    C1 = case ets:lookup(comp, I) of
        [] -> comp_i_mc(I*I, I, N), C-1
        ;_ -> C
    end,
    composite_mc(N,I+1,M,C1);
composite_mc(_,I,M,_) when I > M -> ok;
composite_mc(N,I,M,0) ->
    receive
        {ok, cim} -> composite_mc(N,I,M,1)
    after 1000    -> composite_mc(N,I,M,0)
    end.

comp_i_mc(J, I, N) -> 
    Parent = self(),
    spawn(fun() ->
        comp_i(J, I, N),
        Parent ! {ok, cim}
    end).

comp_i(J, I, N) when J =< N -> ets:insert(comp, {J, 1}), comp_i(J+I, I, N);
comp_i(J, _, N) when J > N -> ok.

Output:

mkh@mkh-xps:~/work/mblog/pr_euler/p10$ ./generator.erl 100
Answer: [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
         97]

another several erlang implementation: http://mijkenator.github.io/2015/11/29/project-euler-problem-10/

ERRE

PROGRAM SIEVE_ORG
  ! --------------------------------------------------
  ! Eratosthenes Sieve Prime Number Program in BASIC
  ! (da 3 a SIZE*2)   from Byte September 1981
  !---------------------------------------------------
  CONST SIZE%=8190

  DIM FLAGS%[SIZE%]

BEGIN
  PRINT("Only 1 iteration")
  COUNT%=0
  FOR I%=0 TO SIZE% DO
     IF FLAGS%[I%]=TRUE THEN
         !$NULL
       ELSE
         PRIME%=I%+I%+3
         K%=I%+PRIME%
         WHILE NOT (K%>SIZE%) DO
            FLAGS%[K%]=TRUE
            K%=K%+PRIME%
         END WHILE
         PRINT(PRIME%;)
         COUNT%=COUNT%+1
     END IF
  END FOR
  PRINT
  PRINT(COUNT%;" PRIMES")
END PROGRAM

Output:

last lines of the output screen

 15749  15761  15767  15773  15787  15791  15797  15803  15809  15817  15823 
 15859  15877  15881  15887  15889  15901  15907  15913  15919  15923  15937 
 15959  15971  15973  15991  16001  16007  16033  16057  16061  16063  16067 
 16069  16073  16087  16091  16097  16103  16111  16127  16139  16141  16183 
 16187  16189  16193  16217  16223  16229  16231  16249  16253  16267  16273 
 16301  16319  16333  16339  16349  16361  16363  16369  16381 
 1899  PRIMES

Euler

The original Euler doesn't have loops built-in. Loops can easily be added by defining and calling suitable procedures with literal procedures as parameters. In this sample, a C-style "for" loop procedure is defined and used to sieve and print the primes.

begin
    new sieve; new for; new prime; new i;

    for   <- ` formal init; formal test; formal incr; formal body;
               begin
                 label again;
                 init;
again:           if test then begin body; incr; goto again end else 0
               end
             '
           ;

    sieve <- ` formal n;
               begin
                 new primes; new i; new i2; new j;
                 primes <- list n;
                 for( ` i <- 1 ', ` i <= n ', ` i <- i + 1 '
                    , ` primes[ i ] <- true '
                    );
                 primes[ 1 ] <- false;
                 for( ` i <- 2 '
                    , ` [ i2 <- i * i ] <= n '
                    , ` i <- i + 1 '
                    , ` if primes[ i ] then
                          for( ` j <- i2 ', ` j <= n ', ` j <- j + i '
                             , ` primes[ j ] <- false '
                             )
                        else 0
                      '
                    );
                 primes
               end
             '
           ;

    prime <- sieve( 30 );
    for( ` i <- 1 ', ` i <= length prime ', ` i <- i + 1 '
       , ` if prime[ i ] then out i else 0 '
       )

end $

Output:

    NUMBER                   2
    NUMBER                   3
    NUMBER                   5
    NUMBER                   7
    NUMBER                  11
    NUMBER                  13
    NUMBER                  17
    NUMBER                  19
    NUMBER                  23
    NUMBER                  29

Euphoria

constant limit = 1000
sequence flags,primes
flags = repeat(1, limit)
for i = 2 to sqrt(limit) do
    if flags[i] then
        for k = i*i to limit by i do
            flags[k] = 0
        end for
    end if
end for

primes = {}
for i = 2 to limit do
    if flags[i] = 1 then
        primes &= i
    end if
end for
? primes

Output:

{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,
97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,
181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,
277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,
383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,
487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,
601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,
709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,
827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,
947,953,967,971,977,983,991,997}

F#

Short with mutable state

let primes max =
    let mutable xs = [|2..max|]
    let limit = max |> float |> sqrt |> int
    for x in [|2..limit|] do
        xs <- xs |> Array.except [|x*x..x..max|]
    xs

Short Sweet Functional and Idiotmatic

Well lists may not be lazy, but if you call it a sequence then it's a lazy list!

(*
  An interesting implementation of The Sieve of Eratosthenes.
  Nigel Galloway April 7th., 2017.
*)
let SofE =
  let rec fn n g = seq{ match n with
                        |1 -> yield false; yield! fn g g 
                        |_ -> yield  true; yield! fn (n - 1) g}
  let rec fg ng = seq {
    let g = (Seq.findIndex(id) ng) + 2 // decreasingly inefficient with range at O(n)!
    yield g; yield! fn (g - 1) g |> Seq.map2 (&&) ng |> Seq.cache |> fg }
  Seq.initInfinite (fun x -> true) |> fg

Output:

> SofE |> Seq.take 10 |> Seq.iter(printfn "%d");;
2
3
5
7
11
13
17
19
23
29

Although interesting intellectually, and although the algorithm is more Sieve of Eratosthenes (SoE) than not in that it uses a progression of composite number representations separated by base prime gaps to cull, it isn't really SoE in performance due to several used functions that aren't linear with range, such as the "findIndex" that scans from the beginning of all primes to find the next un-culled value as the next prime in the sequence and the general slowness and inefficiency of F# nested sequence generation.

It is so slow that it takes in the order of seconds just to find the primes to a thousand!

For practical use, one would be much better served by any of the other functional sieves below, which can sieve to a million in less time than it takes this one to sieve to ten thousand. Those other functional sieves aren't all that many lines of code than this one.

Functional

Richard Bird Sieve

This is the idea behind Richard Bird's unbounded code presented in the Epilogue of M. O'Neill's article in Haskell. It is about twice as much code as the Haskell code because F# does not have a built-in lazy list so that the effect must be constructed using a Co-Inductive Stream (CIS) type since no memoization is required, along with the use of recursive functions in combination with sequences. The type inference needs some help with the new CIS type (including selecting the generic type for speed). Note the use of recursive functions to implement multiple non-sharing delayed generating base primes streams, which along with these being non-memoizing means that the entire primes stream is not held in memory as for the original Bird code:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesBird() =
  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
    if x < y then CIS(x, fun() -> xtlf() ^^ ys)
    elif y < x then CIS(y, fun() -> xs ^^ ytlf())
    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
  let pmltpls p = let rec nxt c = CIS(c, fun() -> nxt (c + p)) in nxt (p * p)
  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
    CIS(c, fun() -> (ctlf()) ^^ (cmpsts (amstlf())))
  let rec minusat n (CIS(c, ctlf) as cs) =
    if n < c then CIS(n, fun() -> minusat (n + 1u) cs)
    else minusat (n + 1u) (ctlf())
  let rec baseprms() = CIS(2u, fun() -> baseprms() |> allmltps |> cmpsts |> minusat 3u)
  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (baseprms())

The above code sieves all numbers of two and up including all even numbers as per the page specification; the following code makes the very minor changes for an odds-only sieve, with a speedup of over a factor of two:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesBirdOdds() =
  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
    if x < y then CIS(x, fun() -> xtlf() ^^ ys)
    elif y < x then CIS(y, fun() -> xs ^^ ytlf())
    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
  let pmltpls p = let adv = p + p
                  let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
    CIS(c, fun() -> ctlf() ^^ cmpsts (amstlf()))
  let rec minusat n (CIS(c, ctlf) as cs) =
    if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
    else minusat (n + 2u) (ctlf())
  let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))

Tree Folding Sieve

The above code is still somewhat inefficient as it operates on a linear right extending structure that deepens linearly with increasing base primes (those up to the square root of the currently sieved number); the following code changes the structure into an infinite binary tree-like folding by combining each pair of prime composite streams before further processing as usual - this decreases the processing by approximately a factor of log n:

type 'a CIS = CIS of 'a * (unit -> 'a CIS) //'Co Inductive Stream for laziness

let primesTreeFold() =
  let rec (^^) (CIS(x, xtlf) as xs) (CIS(y, ytlf) as ys) = // stream merge function
    if x < y then CIS(x, fun() -> xtlf() ^^ ys)
    elif y < x then CIS(y, fun() -> xs ^^ ytlf())
    else CIS(x, fun() -> xtlf() ^^ ytlf()) // no duplication
  let pmltpls p = let adv = p + p
                  let rec nxt c = CIS(c, fun() -> nxt (c + adv)) in nxt (p * p)
  let rec allmltps (CIS(p, ptlf)) = CIS(pmltpls p, fun() -> allmltps (ptlf()))
  let rec pairs (CIS(cs0, cs0tlf)) =
    let (CIS(cs1, cs1tlf)) = cs0tlf() in CIS(cs0 ^^ cs1, fun() -> pairs (cs1tlf()))
  let rec cmpsts (CIS(CIS(c, ctlf), amstlf)) =
    CIS(c, fun() -> ctlf() ^^ (cmpsts << pairs << amstlf)())
  let rec minusat n (CIS(c, ctlf) as cs) =
    if n < c then CIS(n, fun() -> minusat (n + 2u) cs)
    else minusat (n + 2u) (ctlf())
  let rec oddprms() = CIS(3u, fun() -> oddprms() |> allmltps |> cmpsts |> minusat 5u)
  Seq.unfold (fun (CIS(p, ptlf)) -> Some(p, ptlf())) (CIS(2u, fun() -> oddprms()))

The above code is over four times faster than the "BirdOdds" version (at least 10x faster than the first, "primesBird", producing the millionth prime) and is moderately useful for a range of the first million primes or so.

Priority Queue Sieve

In order to investigate Priority Queue Sieves as espoused by O'Neill in the referenced article, one must find an equivalent implementation of a Min Heap Priority Queue as used by her. There is such an purely functional implementation in RosettaCode translated from the Haskell code she used, from which the essential parts are duplicated here (Note that the key value is given an integer type in order to avoid the inefficiency of F# in generic comparison):

[<RequireQualifiedAccess>]
module MinHeap =

  type HeapEntry<'V> = struct val k:uint32 val v:'V new(k,v) = {k=k;v=v} end
  [<CompilationRepresentation(CompilationRepresentationFlags.UseNullAsTrueValue)>]
  [<NoEquality; NoComparison>]
  type PQ<'V> =
         | Mt
         | Br of HeapEntry<'V> * PQ<'V> * PQ<'V>

  let empty = Mt

  let peekMin = function | Br(kv, _, _) -> Some(kv.k, kv.v)
                         | _            -> None

  let rec push wk wv = 
    function | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)
             | Br(vkv, ll, rr) ->
                 if wk <= vkv.k then
                   Br(HeapEntry(wk, wv), push vkv.k vkv.v rr, ll)
                 else Br(vkv, push wk wv rr, ll)

  let private siftdown wk wv pql pqr =
    let rec sift pl pr =
      match pl with
        | Mt -> Br(HeapEntry(wk, wv), Mt, Mt)
        | Br(vkvl, pll, plr) ->
            match pr with
              | Mt -> if wk <= vkvl.k then Br(HeapEntry(wk, wv), pl, Mt)
                      else Br(vkvl, Br(HeapEntry(wk, wv), Mt, Mt), Mt)
              | Br(vkvr, prl, prr) ->
                  if wk <= vkvl.k && wk <= vkvr.k then Br(HeapEntry(wk, wv), pl, pr)
                  elif vkvl.k <= vkvr.k then Br(vkvl, sift pll plr, pr)
                  else Br(vkvr, pl, sift prl prr)
    sift pql pqr                                        

  let replaceMin wk wv = function | Mt -> Mt
                                  | Br(_, ll, rr) -> siftdown wk wv ll rr

Except as noted for any individual code, all of the following codes need the following prefix code in order to implement the non-memoizing Co-Inductive Streams (CIS's) and to set the type of particular constants used in the codes to the same time as the "Prime" type:

type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint32
let frstprm = 2u
let frstoddprm = 3u
let inc1 = 1u
let inc = 2u

The F# equivalent to O'Neill's "odds-only" code is then implemented as follows, which needs the included changed prefix in order to change the primes type to a larger one to prevent overflow (as well the key type for the MinHeap needs to be changed from uint32 to uint64); it is functionally the same as the O'Neill code other than for minor changes to suit the use of CIS streams and the option output of the "peekMin" function:

type CIS<'T> = struct val v: 'T val cont: unit -> CIS<'T> new(v,cont) = {v=v;cont=cont} end
type Prime = uint64
let frstprm = 2UL
let frstoddprm = 3UL
let inc = 2UL

let primesPQ() =
  let pmult p (xs: CIS<Prime>) = // does map (* p) xs
    let rec nxtm (cs: CIS<Prime>) =
      CIS(p * cs.v, fun() -> nxtm (cs.cont())) in nxtm xs
  let insertprime p xs table =
    MinHeap.push (p * p) (pmult p xs) table
  let rec sieve' (ns: CIS<Prime>) table =
    let nextcomposite = match MinHeap.peekMin table with
                          | None -> ns.v // never happens
                          | Some (k, _) -> k
    let rec adjust table =
      let (n, advs) = match MinHeap.peekMin table with
                        | None -> (ns.v, ns.cont()) // never happens
                        | Some kv -> kv
      if n <= ns.v then adjust (MinHeap.replaceMin advs.v (advs.cont()) table)
      else table
    if nextcomposite <= ns.v then sieve' (ns.cont()) (adjust table)
    else let n = ns.v in CIS(n, fun() ->
           let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns table))
  let rec sieve (ns: CIS<Prime>) = let n = ns.v in CIS(n, fun() ->
      let nxtns = ns.cont() in sieve' nxtns (insertprime n nxtns MinHeap.empty))
  let odds = // is the odds CIS from 3 up
    let rec nxto i = CIS(i, fun() -> nxto (i + inc)) in nxto frstoddprm
  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
             (CIS(frstprm, fun() -> (sieve odds)))

However, that algorithm suffers in speed and memory use due to over-eager adding of prime composite streams to the queue such that the queue used is much larger than it needs to be and a much larger range of primes number must be used in order to avoid numeric overflow on the square of the prime added to the queue. The following code corrects that by using a secondary (actually a multiple of) base primes streams which are constrained to be based on a prime that is no larger than the square root of the currently sieved number - this permits the use of much smaller Prime types as per the default prefix:

let primesPQx() =
  let rec nxtprm n pq q (bps: CIS<Prime>) =
    if n >= q then let bp = bps.v in let adv = bp + bp
                   let nbps = bps.cont() in let nbp = nbps.v
                   nxtprm (n + inc) (MinHeap.push (n + adv) adv pq) (nbp * nbp) nbps
    else let ck, cv = match MinHeap.peekMin pq with
                        | None -> (q, inc) // only happens until first insertion
                        | Some kv -> kv
         if n >= ck then let rec adjpq ck cv pq =
                             let npq = MinHeap.replaceMin (ck + cv) cv pq
                             match MinHeap.peekMin npq with
                               | None -> npq // never happens
                               | Some(nk, nv) -> if n >= nk then adjpq nk nv npq
                                                 else npq
                         nxtprm (n + inc) (adjpq ck cv pq) q bps
         else CIS(n, fun() -> nxtprm (n + inc) pq q bps)
  let rec oddprms() = CIS(frstoddprm, fun() ->
      nxtprm (frstoddprm + inc) MinHeap.empty (frstoddprm * frstoddprm) (oddprms()))
  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
             (CIS(frstprm, fun() -> (oddprms())))

The above code is well over five times faster than the previous translated O'Neill version for the given variety of reasons.

Although slightly faster than the Tree Folding code, this latter code is also limited in practical usefulness to about the first one to ten million primes or so.

All of the above codes can be tested in the F# REPL with the following to produce the millionth prime (the "nth" function is zero based):

> primesXXX() |> Seq.nth 999999;;

where primesXXX() is replaced by the given primes generating function to be tested, and which all produce the following output (after a considerable wait in some cases):

Output:

val it : Prime = 15485863u

Imperative

The following code is written in functional style other than it uses a mutable bit array to sieve the composites:

let primes limit =
  let buf = System.Collections.BitArray(int limit + 1, true)
  let cull p = { p * p .. p .. limit } |> Seq.iter (fun c -> buf.[int c] <- false)
  { 2u .. uint32 (sqrt (double limit)) } |> Seq.iter (fun c -> if buf.[int c] then cull c)
  { 2u .. limit } |> Seq.map (fun i -> if buf.[int i] then i else 0u) |> Seq.filter ((<>) 0u)

[<EntryPoint>]
let main argv =
  if argv = null || argv.Length = 0 then failwith "no command line argument for limit!!!"
  printfn "%A" (primes (System.UInt32.Parse argv.[0]) |> Seq.length)
  0 // return an integer exit code

Substituting the following minor changes to the code for the "primes" function will only deal with the odd prime candidates for a speed up of over a factor of two as well as a reduction of the buffer size by a factor of two:

let primes limit =
  let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
  let buf = System.Collections.BitArray(int lmtb + 1, true)
  let cull i = let p = i + i + 3u in let s = p * (i + 1u) + i in
               { s .. p .. lmtb } |> Seq.iter (fun c -> buf.[int c] <- false)
  { 0u .. lmtbsqrt } |> Seq.iter (fun i -> if buf.[int i] then cull i )
  let oddprms = { 0u .. lmtb } |> Seq.map (fun i -> if buf.[int i] then i + i + 3u else 0u)
                |> Seq.filter ((<>) 0u)
  seq { yield 2u; yield! oddprms }

The following code uses other functional forms for the inner culling loops of the "primes function" to reduce the use of inefficient sequences so as to reduce the execution time by another factor of almost three:

let primes limit =
  let lmtb,lmtbsqrt = (limit - 3u) / 2u, (uint32 (sqrt (double limit)) - 3u) / 2u
  let buf = System.Collections.BitArray(int lmtb + 1, true)
  let rec culltest i = if i <= lmtbsqrt then
                         let p = i + i + 3u in let s = p * (i + 1u) + i in
                         let rec cullp c = if c <= lmtb then buf.[int c] <- false; cullp (c + p)
                         (if buf.[int i] then cullp s); culltest (i + 1u) in culltest 0u
  seq {yield 2u; for i = 0u to lmtb do if buf.[int i] then yield i + i + 3u }

Now much of the remaining execution time is just the time to enumerate the primes as can be seen by turning "primes" into a primes counting function by substituting the following for the last line in the above code doing the enumeration; this makes the code run about a further five times faster:

  let rec count i acc =
    if i > int lmtb then acc else if buf.[i] then count (i + 1) (acc + 1) else count (i + 1) acc
  count 0 1

Since the final enumeration of primes is the main remaining bottleneck, it is worth using a "roll-your-own" enumeration implemented as an object expression so as to save many inefficiencies in the use of the built-in seq computational expression by substituting the following code for the last line of the previous codes, which will decrease the execution time by a factor of over three (instead of almost five for the counting-only version, making it almost as fast):

  let nmrtr() =
    let i = ref -2
    let rec nxti() = i:=!i + 1;if !i <= int lmtb && not buf.[!i] then nxti() else !i <= int lmtb
    let inline curr() = if !i < 0 then (if !i= -1 then 2u else failwith "Enumeration not started!!!")
                        else let v = uint32 !i in v + v + 3u
    { new System.Collections.Generic.IEnumerator<_> with
        member this.Current = curr()
      interface System.Collections.IEnumerator with
        member this.Current = box (curr())
        member this.MoveNext() = if !i< -1 then i:=!i+1;true else nxti()
        member this.Reset() = failwith "IEnumerator.Reset() not implemented!!!"a
      interface System.IDisposable with
        member this.Dispose() = () }
  { new System.Collections.Generic.IEnumerable<_> with
      member this.GetEnumerator() = nmrtr()
    interface System.Collections.IEnumerable with
      member this.GetEnumerator() = nmrtr() :> System.Collections.IEnumerator }

The various optimization techniques shown here can be used "jointly and severally" on any of the basic versions for various trade-offs between code complexity and performance. Not shown here are other techniques of making the sieve faster, including extending wheel factorization to much larger wheels such as 2/3/5/7, pre-culling the arrays, page segmentation, and multi-processing.

Almost functional Unbounded

the following odds-only implmentations are written in an almost functional style avoiding the use of mutability except for the contents of the data structures uses to hold the state of the and any mutability necessary to implement a "roll-your-own" IEnumberable iterator interface for speed.

Unbounded Dictionary (Mutable Hash Table) Based Sieve

The following code uses the DotNet Dictionary class instead of the above functional Priority Queue to implement the sieve; as average (amortized) hash table access is O(1) rather than O(log n) as for the priority queue, this implementation is slightly faster than the priority queue version for the first million primes and will always be faster for any range above some low range value:

type Prime = uint32
let frstprm = 2u
let frstoddprm = 3u
let inc = 2u
let primesDict() =
  let dct = System.Collections.Generic.Dictionary()
  let rec nxtprm n q (bps: CIS<Prime>) =
    if n >= q then let bp = bps.v in let adv = bp + bp
                   let nbps = bps.cont() in let nbp = nbps.v
                   dct.Add(n + adv, adv)
                   nxtprm (n + inc) (nbp * nbp) nbps
    else if dct.ContainsKey(n) then
           let adv = dct.[n]
           dct.Remove(n) |> ignore
//           let mutable nn = n + adv // ugly imperative code
//           while dct.ContainsKey(nn) do nn <- nn + adv
//           dct.Add(nn, adv)
           let rec nxtmt k = // advance to next empty spot
             if dct.ContainsKey(k) then nxtmt (k + adv)
             else dct.Add(k, adv) in nxtmt (n + adv)
           nxtprm (n + inc) q bps
         else CIS(n, fun() -> nxtprm (n + inc) q bps)
  let rec oddprms() = CIS(frstoddprm, fun() ->
      nxtprm (frstoddprm + inc) (frstoddprm * frstoddprm) (oddprms()))
  Seq.unfold (fun (cis: CIS<Prime>) -> Some(cis.v, cis.cont()))
             (CIS(frstprm, fun() -> (oddprms())))

The above code uses functional forms of code (with the imperative style commented out to show how it could be done imperatively) and also uses a recursive non-sharing secondary source of base primes just as for the Priority Queue version. As for the functional codes, the Primes type can easily be changed to "uint64" for wider range of sieving.

In spite of having true O(n log log n) Sieve of Eratosthenes computational complexity where n is the range of numbers to be sieved, the above code is still not particularly fast due to the time required to compute the hash values and manipulations of the hash table.

Unbounded Page-Segmented Bit-Packed Odds-Only Mutable Array Sieve

Note that the following code is used for the F# entry Extensible_prime_generator#Unbounded_Mutable_Array_Generator of the Extensible prime generator page.

All of the above unbounded implementations including the above Dictionary based version are quite slow due to their large constant factor computational overheads, making them more of an intellectual exercise than something practical, especially when larger sieving ranges are required. The following code implements an unbounded page segmented version of the sieve in not that many more lines of code, yet runs about 25 times faster than the Dictionary version for larger ranges of sieving such as to one billion; it uses functional forms without mutability other than for the contents of the arrays and the `primes` enumeration generator function that must use mutability for speed:

type Prime = float // use uint64/int64 for regular 64-bit F#
type private PrimeNdx = float // they are slow in JavaScript polyfills

let inline private prime n = float n // match these convenience conversions
let inline private primendx n = float n // with the types above!

let private cPGSZBTS = (1 <<< 14) * 8 // sieve buffer size in bits = CPUL1CACHE

type private SieveBuffer = uint8[]

/// a Co-Inductive Stream (CIS) of an "infinite" non-memoized series...
type private CIS<'T> = CIS of 'T * (unit -> CIS<'T>) //' apostrophe formatting adjustment

/// lazy list (memoized) series of base prime page arrays...
type private BasePrime = uint32
type private BasePrimeArr = BasePrime[]
type private BasePrimeArrs = BasePrimeArrs of BasePrimeArr * Option<Lazy<BasePrimeArrs>>

/// Masking array is faster than bit twiddle bit shifts!
let private cBITMASK = [| 1uy; 2uy; 4uy; 8uy; 16uy; 32uy; 64uy; 128uy |]

let private cullSieveBuffer lwi (bpas: BasePrimeArrs) (sb: SieveBuffer) =
  let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
  let rec loopbp (BasePrimeArrs(bpa, bpatl) as ibpas) i =
    if i >= bpa.Length then
      match bpatl with
      | None -> ()
      | Some lv -> loopbp lv.Value 0 else
    let bp = prime bpa.[i] in let bpndx = primendx ((bp - prime 3) / prime 2)
    let s = (bpndx * primendx 2) * (bpndx + primendx 3) + primendx 3 in let bpint = int bp
    if s <= lmti then
      let s0 = // page cull start address calculation...
        if s >= lwi then int (s - lwi) else
        let r = (lwi - s) % (primendx bp)
        if r = primendx 0 then 0 else int (bp - prime r)
      let slmt = min btlmt (s0 - 1 + (bpint <<< 3))
      let rec loopc c = // loop "unpeeling" is used so
        if c <= slmt then // a constant mask can be used over the inner loop
          let msk = cBITMASK.[c &&& 7]
          let rec loopw w =
            if w < sb.Length then sb.[w] <- sb.[w] ||| msk; loopw (w + bpint)
          loopw (c >>> 3); loopc (c + bpint)
      loopc s0; loopbp ibpas (i + 1) in loopbp bpas 0

/// fast Counting Look Up Table (CLUT) for pop counting...
let private cCLUT =
  let arr = Array.zeroCreate 65536
  let rec popcnt n cnt = if n > 0 then popcnt (n &&& (n - 1)) (cnt + 1) else uint8 cnt
  let rec loop i = if i < 65536 then arr.[i] <- popcnt i 0; loop (i + 1)
  loop 0; arr

let countSieveBuffer ndxlmt (sb: SieveBuffer): int =
  let lstw = (ndxlmt >>> 3) &&& -2
  let msk = (-2 <<< (ndxlmt &&& 15)) &&& 0xFFFF
  let inline cntem i m =
    int cCLUT.[int (((uint32 sb.[i + 1]) <<< 8) + uint32 sb.[i]) ||| m]
  let rec loop i cnt =
    if i >= lstw then cnt - cntem lstw msk else loop (i + 2) (cnt - cntem i 0)
  loop 0 ((lstw <<< 3) + 16)

/// a CIS series of pages from the given start index with the given SieveBuffer size,
/// and provided with a polymorphic converter function to produce
/// and type of result from the culled page parameters...
let rec private makePrimePages strtwi btsz
                               (cnvrtrf: PrimeNdx -> SieveBuffer -> 'T): CIS<'T> =
  let bpas = makeBasePrimes() in let sb = Array.zeroCreate (btsz >>> 3)
  let rec nxtpg lwi =
    Array.fill sb 0 sb.Length 0uy; cullSieveBuffer lwi bpas sb
    CIS(cnvrtrf lwi sb, fun() -> nxtpg (lwi + primendx btsz))
  nxtpg strtwi

/// secondary feed of lazy list of memoized pages of base primes...
and private makeBasePrimes(): BasePrimeArrs =
  let sb2bpa lwi (sb: SieveBuffer) =
    let bsbp = uint32 (primendx 3 + lwi + lwi)
    let arr = Array.zeroCreate <| countSieveBuffer 255 sb
    let rec loop i j =
      if i < 256 then
        if sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy then loop (i + 1) j
        else arr.[j] <- bsbp + uint32 (i + i); loop (i + 1) (j + 1)
    loop 0 0; arr
  // finding the first page as not part of the loop and making succeeding
  // pages lazy breaks the recursive data race!
  let frstsb = Array.zeroCreate 32
  let fkbpas = BasePrimeArrs(sb2bpa (primendx 0) frstsb, None)
  cullSieveBuffer (primendx 0) fkbpas frstsb
  let rec nxtbpas (CIS(bpa, tlf)) = BasePrimeArrs(bpa, Some(lazy (nxtbpas (tlf()))))
  BasePrimeArrs(sb2bpa (primendx 0) frstsb,
                Some(lazy (nxtbpas <| makePrimePages (primendx 256) 256 sb2bpa)))

/// produces a generator of primes; uses mutability for better speed...
let primes(): unit -> Prime =
  let sb2prms lwi (sb: SieveBuffer) = lwi, sb in let mutable ndx = -1
  let (CIS((nlwi, nsb), npgtlf)) = // use page generator function above!
    makePrimePages (primendx 0) cPGSZBTS sb2prms
  let mutable lwi = nlwi in let mutable sb = nsb
  let mutable pgtlf = npgtlf
  let mutable baseprm = prime 3 + prime (lwi + lwi) 
  fun() -> 
    if ndx < 0 then ndx <- 0; prime 2 else
    let inline notprm i = sb.[i >>> 3] &&& cBITMASK.[i &&& 7] <> 0uy
    while ndx < cPGSZBTS && notprm ndx do ndx <- ndx + 1
    if ndx >= cPGSZBTS then // get next page if over
      let (CIS((nlwi, nsb), npgtlf)) = pgtlf() in ndx <- 0
      lwi <- nlwi; sb <- nsb; pgtlf <- npgtlf
      baseprm <- prime 3 + prime (lwi + lwi) 
      while notprm ndx do ndx <- ndx + 1
    let ni = ndx in ndx <- ndx + 1 // ready for next call!
    baseprm + prime (ni + ni)

let countPrimesTo (limit: Prime): int = // much faster!
  if limit < prime 3 then (if limit < prime 2 then 0 else 1) else
  let topndx = (limit - prime 3) / prime 2 |> primendx
  let sb2cnt lwi (sb: SieveBuffer) =
    let btlmt = (sb.Length <<< 3) - 1 in let lmti = lwi + primendx btlmt
    countSieveBuffer
      (if lmti < topndx then btlmt else int (topndx - lwi)) sb, lmti
  let rec loop (CIS((cnt, nxti), tlf)) count =
    if nxti < topndx then loop (tlf()) (count + cnt)
    else count + cnt
  loop <| makePrimePages (primendx 0) cPGSZBTS sb2cnt <| 1

/// sequences are convenient but slow...
let primesSeq() = primes() |> Seq.unfold (fun gen -> Some(gen(), gen))
printfn "The first 25 primes are:  %s"
  ( primesSeq() |> Seq.take 25
      |> Seq.fold (fun s p -> s + string p + " ") "" )
printfn "There are %d primes up to a million." 
  ( primesSeq() |> Seq.takeWhile ((>=) (prime 1000000)) |> Seq.length )

let rec cntto gen lmt cnt = // faster than seq's but still slow
  if gen() > lmt then cnt else cntto gen lmt (cnt + 1)

let limit = prime 1_000_000_000
let start = System.DateTime.Now.Ticks
// let answr = cntto (primes()) limit 0 // slower way!
let answr = countPrimesTo limit // over twice as fast way!
let elpsd = (System.DateTime.Now.Ticks - start) / 10000L
printfn "Found %d primes to %A in %d milliseconds." answr limit elpsd

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes up to a million.
Found 50847534 primes to 1000000000 in 2161 milliseconds.

As with all of the efficient unbounded sieves, the above code uses a secondary enumerator of the base primes less than the square root of the currently culled range, which is this case is a lazy (deferred memoized evaluation) binding by small pages of base primes which also uses the laziness of the deferral of subsequent pages so as to avoid a race condition.

The above code is written to output the "uint64" type for very large ranges of primes since there is little computational cost to doing this for this algorithm when used with 64-bit compilation; however, for the Fable transpiled to JavaScript, the largest contiguous integer that can be represented is the 64-bit floating point mantissa of 52 bits and thus the large numbers can be represented by floats in this case since a 64-bit polyfill is very slow. As written, the practical range for this sieve is about 16 billion, however, it can be extended to about 10^14 (a week or two of execution time) by setting the "PGSZBTS" constant to the size of the CPU L2 cache rather than the L1 cache (L2 is up to about two Megabytes for modern high end desktop CPU's) at a slight loss of efficiency (a factor of up to two or so) per composite number culling operation due to the slower memory access time. When the Fable compilation option is used, execution speed is roughly the same as using F# with DotNet Core.

Even with the custom `primes` enumerator generator (the F#/Fable built-in sequence operators are terribly inefficient), the time to enumerate the resulting primes takes longer than the time to actually cull the composite numbers from the sieving arrays. The time to do the actual culling is thus over 50 times faster than done using the Dictionary version. The slowness of enumeration, no matter what further tweaks are done to improve it (each value enumerated will always take a function calls and a scan loop that will always take something in the order of 100 CPU clock cycles per value), means that further gains in speed using extreme wheel factorization and multi-processing have little point unless the actual work on the resulting primes is done through use of auxiliary functions not using iteration. Such a function is provided here to count the primes by pages using a "pop count" look up table to reduce the counting time to only a small fraction of a second.

Factor

Factor already contains two implementations of the sieve of Eratosthenes in math.primes.erato and math.primes.erato.fast. It is suggested to use one of them for real use, as they use faster types, faster unsafe arithmetic, and/or wheels to speed up the sieve further. Shown here is a more straightforward implementation that adheres to the restrictions given by the task (namely, no wheels).

Factor is pleasantly multiparadigm. Usually, it's natural to write more functional or declarative code in Factor, but this is an instance where it is more natural to write imperative code. Lexical variables are useful here for expressing the necessary mutations in a clean way.

USING: bit-arrays io kernel locals math math.functions
math.ranges prettyprint sequences ;
IN: rosetta-code.sieve-of-erato

<PRIVATE

: init-sieve ( n -- seq )   ! Include 0 and 1 for easy indexing.
    1 - <bit-array> dup set-bits ?{ f f } prepend ;

! Given the sieve and a prime starting index, create a range of
! values to mark composite. Start at the square of the prime.
: to-mark ( seq n -- range )
    [ length 1 - ] [ dup dup * ] bi* -rot <range> ;

! Mark multiples of prime n as composite.
: mark-nths ( seq n -- ) 
    dupd to-mark [ swap [ f ] 2dip set-nth ] with each ;

: next-prime ( index seq -- n ) [ t = ] find-from drop ;

PRIVATE>

:: sieve ( n -- seq )
    n sqrt 2 n init-sieve :> ( limit i! s )
    [ i limit < ]             ! sqrt optimization 
    [ s i mark-nths i 1 + s next-prime i! ] while t s indices ;

: sieve-demo ( -- )
    "Primes up to 120 using sieve of Eratosthenes:" print
    120 sieve . ;

MAIN: sieve-demo

FOCAL

1.1 T "PLEASE ENTER LIMIT"
1.2 A N
1.3 I (2047-N)5.1
1.4 D 2
1.5 Q

2.1 F X=2,FSQT(N); D 3
2.2 F W=2,N; I (SIEVE(W)-2)4.1

3.1 I (-SIEVE(X))3.3
3.2 F Y=X*X,X,N; S SIEVE(Y)=2
3.3 R

4.1 T %4.0,W,!

5.1 T "PLEASE ENTER A NUMBER LESS THAN 2048."!; G 1.1

Note that with the 4k paper tape version of FOCAL, the program will run out of memory for N>190 or so.

Forth

: prime? ( n -- ? ) here + c@ 0= ;
: composite! ( n -- ) here + 1 swap c! ;

: sieve ( n -- )
  here over erase
  2
  begin
    2dup dup * >
  while
    dup prime? if
      2dup dup * do
        i composite!
      dup +loop
    then
    1+
  repeat
  drop
  ." Primes: " 2 do i prime? if i . then loop ;

100 sieve

Output:

Primes: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Alternate Odds-Only, Better Style

The above code is not really very good Forth style as the main initialization, sieving, and output, are all in one `sieve` routine which makes it difficult to understand and refactor; Forth code is normally written in a series of very small routines which makes it easier to understand what is happening on the data stack, since Forth does not have named local re-entrant variable names as most other languages do for local variables (which other languages also normally store local variables on the stack). Also, it uses the `HERE` pointer to user space which points to the next available memory after all compilation is done as a unsized buffer pointer, but as it does not reserve that space for the sieving buffer, it can be changed by other concatenated routines in unexpected ways; better is to allocate the sieving buffer as required from the available space at the time the routines are run and pass that address between concatenated functions until a finalization function frees the memory and clears the stack; this is equivalent to allocating from the "heap" in other languages. The below code demonstrates these ideas:

: prime? ( addr -- ? ) C@ 0= ; \ test composites array for prime

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
   DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
   DO 2 PICK I + TRUE SWAP C! DUP +LOOP DROP ;

\ process for required prime limit; allocate and initialize returned buffer...
: initsieve ( u -- u a-addr)
   3 - DUP 0< IF 0 ELSE
      1 RSHIFT 1+ DUP ALLOCATE 0<> IF ABORT" Memory allocation error!!!"
      ELSE 2DUP SWAP ERASE THEN
   THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
   0 \ initialize test index i -- numv bufa i
   BEGIN \ test prime square index < limit
      DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3
   WHILE \ -- numv bufa sqri i
      2 PICK OVER + prime? IF cullpi! \ -- numv bufa i
      ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
   REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ print primes to given limit...
: .primes ( u a-addr -- )
   OVER 0< IF DROP 2 - 0< IF ( ." No primes!" ) ELSE ( ." Prime:  2" ) THEN
   ELSE ." Primes:  2 " SWAP 0
      DO DUP I + prime? IF I I + 3 + . THEN LOOP FREE DROP THEN ;

\ count number of primes found for number odd numbers within
\ given presumed sieved buffer starting at address...
: countprimes@ ( u a-addr -- )
  SWAP DUP 0< IF 1+ 0< IF DROP 0 ELSE 1 THEN
   ELSE 1 SWAP \ -- bufa cnt numv
      0 DO OVER I + prime? IF 1+ THEN LOOP SWAP FREE DROP
   THEN ;

\ shows counted number of primes to the given limit...
: .countprimesto ( u -- )
   DUP initsieve sieve countprimes@
   CR ." Found " . ." primes Up to the " . ." limit." ;

\ testing the code...
100 initsieve sieve .primes
1000000 .countprimesto

Output:

Primes:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 78498 primes Up to the 1000000 limit.

As well as solving the stated problems making it much easier to understand and refactor, an odds-only sieve takes half the space and less than half the time.

Bit-Packing the Sieve Buffer (Odds-Only)

Although the above version resolves many problems of the first version, it is wasteful of memory as each composite number in the sieve buffer is a byte of eight bits representing a boolean value. The memory required can be reduced eight-fold by bit packing the sieve buffer; this will take more "bit-twiddling" to read and write the bits, but reducing the memory used will give better cache assiciativity to larger ranges such that there will be a net gain in performance. This will make the code more complex and the stack manipulations will be harder to write, debug, and maintain, so ANS Forth 1994 provides a local variable naming facility to make this much easier. The following code implements bit-packing of the sieve buffer using local named variables when required:

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
   0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
    OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, sieve the multiples of said prime...
: cullpi! ( u addr u u0 -- u addr u0 )
   DUP DUP + 3 + ROT 4 PICK SWAP \ -- numv addr i prm numv sqri
   DO I 3 RSHIFT 3 PICK + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
   DROP ;

\ initializes sieve storage and parameters
\ given sieve limit, returns bit limit and buffer address ..
: initsieve ( u -- u a-addr )
   3 - \ test limit...
   DUP 0< IF 0 ELSE \ return if number of bits is <= 0!
      1 RSHIFT 1+ \ finish conbersion to number of bits
      DUP 1- CellShft RSHIFT 1+ \ round up to even number of cells
      CELLS DUP ALLOCATE 0= IF DUP ROT ERASE \ set cells0. to zero
      ELSE ABORT" Memory allocation error!!!"
      THEN
   THEN ;

\ pass through sieving to given index in given buffer address as side effect...
: sieve ( u a-addr -- u a-addr )
   0 \ initialize test index i -- numv bufa i
   BEGIN \ test prime square index < limit
      DUP DUP DUP + SWAP 3 + * 3 + TUCK 4 PICK SWAP > \ sqri = 2*i * (I+3) + 3
   WHILE \ -- numv bufa sqri i
      DUP 3 PICK prime? IF cullpi! \ -- numv bufa i
      ELSE SWAP DROP THEN 1+ \ -- numv bufa ni
   REPEAT 2DROP ; \ -- numv bufa; drop sqri i

\ prints already found primes from sieved array...
: .primes ( u a-addr -- )
   SWAP CR ." Primes to " DUP DUP + 2 + 2 MAX . ." are:  "
   DUP 0< IF 1+ 0< IF ." none." ELSE 2 . THEN DROP \ case no primes or just 2
   ELSE 2 . 0 DO I OVER prime? IF I I + 3 + . THEN LOOP FREE DROP
   THEN ;

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u )   numbts 16 SWAP - ;
: initLUT ( -- )   cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
   0 1 CELlS 1 RSHIFT 0
   DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index-1 in array address;
\ params are number of bits used - bits, negative indicates <2/2 out: 0/1,
\ given address is of the allocated bit buffer - bufa;
\ values used: bmsk is bit mask to limit bit in last cell,
\ lci is cell index of last cell used, cnt is the return value...
\ NOTE. this is for little-endian; big-endian needs a byte swap
\ before the last mask and popcount operation!!!
: primecount@ ( u a-addr -- u )
   LOCALS| bufa numb |
   numb 0< IF numb 1+ 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
   ELSE
      numb 1- TO numb \ numb -= 1
      1 \ initial count
      numb CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
      0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
      SWAP bufa + @ \ -- cnt lstCELL
      -2 numb CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
      popcount@ + \ add popcount of last masked CELL -- cnt
      bufa FREE DROP \ free bufa -- bmsk cnt lastcell@
   THEN ;

: .countprimesto ( u -- u )
   dup initsieve sieve primecount@
   CR ." There are " . ." primes Up to the " . ." limit." ;

100 initsieve sieve .primes
1000000000 .countprimesto

The output of the above code is the same as the previous version, but it takes about two thirds the time while using eight times less memory; it takes about 6.5 seconds on my Intel Skylake i5-6500 at 3.6 GHz (turbo) using swiftForth (32-bit) and about 3.5 seconds on VFX Forth (64-bit), both of which compile to machine code but with the latter much more optimized; gforth-fast is about twice as slow as swiftForth and five times slower then VFX Forth as it just compiles to threaded execution tokens (more like an interpreter).

Page-Segmented Bit-Packed Odds-Only Version

While the above version does greatly reduce the amount of memory used for a given sieving range and thereby also somewhat reduces execution time; any sieve intended for sieving to limits of a hundred million or more should use a page-segmented implementation; page-segmentation means that only storage for a representation of the base primes up to the square root of the limit plus a sieve buffer that should also be at least proportional to the same square root is required; this will again make the execution faster as ranges go up due to better cache associativity with most memory accesses being within the CPU cache sizes. The following Forth code implements a basic version that does this:

\ CPU L1 and L2 cache sizes in bits; power of 2...
1 17 LSHIFT CONSTANT L1CacheBits
L1CacheBits 8 * CONSTANT L2CacheBits

\ produces number of one bits in given word...
: numbts ( u -- u ) \ pop count number of bits...
   0 SWAP BEGIN DUP 0<> WHILE SWAP 1+ SWAP DUP 1- AND REPEAT DROP ;

\ constants for variable 32/64 etc. CELL size...
1 CELLS 3 LSHIFT 1- CONSTANT CellMsk
CellMsk numbts CONSTANT CellShft

CREATE bits 8 ALLOT \ bit position Look Up Table...
: mkbts 8 0 DO 1 I LSHIFT I bits + c! LOOP ; mkbts

\ initializes sieve buffer storage and parameters
\ given sieve buffer bit size (even number of CELLS), returns buffer address ..
: initSieveBuffer ( u -- a-addr )
   CellShft RSHIFT \ even number of cells
   CELLS ALLOCATE 0<> IF ABORT" Memory allocation error!!!" THEN ;

\ test bit index composites array for prime...
: prime? ( u addr -- ? )
    OVER 3 RSHIFT + C@ SWAP 7 AND bits + C@ AND 0= ;

\ given square index and prime index, u0, as sell as bitsz,
\ sieve the multiples of said prime leaving prime index on the stack...
: cullpi! ( u u0 u u addr -- u0 )
   LOCALS| sba bitsz lwi | DUP DUP + 3 + ROT \ -- i prm sqri
   \ culling start incdx address calculation...
   lwi 2DUP > IF - ELSE SWAP - OVER MOD DUP 0<> IF OVER SWAP - THEN
   THEN bitsz SWAP \ -- i prm bitsz strti
   DO I 3 RSHIFT sba + DUP C@ I 7 AND bits + C@ OR SWAP C! DUP +LOOP
   DROP ;

\ cull sieve buffer given base wheel index, bit size, 
\ address base prime sieved buffer and
\ the address of the sieve buffer to be culled of composite bits...
: cullSieveBuffer ( u u a-addr a-addr -- )
   >R >R 2DUP + R> R>  \ -- lwi bitsz rngi bpba sba
   LOCALS| sba bpba rngi bitsz lwi |
   bitsz 1- CellShft RSHIFT 1+ CELLS sba SWAP ERASE \ clear sieve buffer
   0 \ initialize base prime index i -- i
   BEGIN \ test prime square index < limit
      DUP DUP DUP + SWAP 3 + * 3 + TUCK rngi < \ sqri = 2*i * (I+3) + 3
   WHILE \ -- sqri i
      DUP bpba prime? IF lwi bitsz sba cullpi! ELSE SWAP DROP THEN \ -- i     
   1+ REPEAT 2DROP ; \ --

\ pop count style Look Up Table by 16 bits entry;
\ is a 65536 byte array containing number of zero bits for each index...
CREATE cntLUT16 65536 ALLOT
: mkpop ( u -- u )   numbts 16 SWAP - ;
: initLUT ( -- )   cntLUT16 65536 0 DO I mkpop OVER I + C! LOOP DROP ; initLUT
: popcount@ ( u -- u )
   0 1 CELlS 1 RSHIFT 0
   DO OVER 65535 AND cntLUT16 + C@ + SWAP 16 RSHIFT SWAP LOOP SWAP DROP ;

\ count number of zero bits up to given bits index in array address...
: countSieveBuffer@ ( u a-addr -- u )
   LOCALS| bufa lmti |
   0 \ initial count -- cnt
   lmti CellShft RSHIFT CELLS TUCK \ lci = byte index of CELL including numv
   0 ?DO bufa I + @ popcount@ + 1 CELLS +LOOP \ -- lci cnt
   SWAP bufa + @ \ -- cnt lstCELL
   -2 lmti CellMsk AND LSHIFT OR \ bmsk for last CELL -- cnt mskdCELL
   popcount@ + ; \ add popcount of last masked CELL -- cnt

\ prints found primes from series of culled sieve buffers...
: .primes ( u -- )
   DUP CR ." Primes to " . ." are:  "
   DUP 3 - 0< IF DUP 2 - 0< IF ." none." ELSE 2 . THEN \ <2/2 -> 0/1
   ELSE 2 .
      3 - 1 RSHIFT 1+ \ -- rngi
      DUP 1- L2CacheBits / L2CacheBits * 3 RSHIFT \ -- rng rngi pglmtbytes
      L1CacheBits initSieveBuffer \ address of base prime sieve buffer
      L2CacheBits initSieveBuffer \ address of main sieve buffer
      LOCALS| sba bpsba pglmt | \ local variables -- rngi
      0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
      pglmt 0 ?DO
         I L2CacheBits bpsba sba cullSieveBuffer
         I L2CacheBits 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
      L2CacheBits +LOOP \ rngi
      L2CacheBits mod DUP 0> IF \ one more page!
         pglmt DUP L2CacheBits bpsba sba cullSieveBuffer
         SWAP 0 DO I sba prime? IF DUP I + DUP + 3 + . THEN LOOP DROP
      THEN bpsba FREE DROP sba FREE DROP
   THEN ; \ --

\ prints count of found primes from series of culled sieve buffers...
: .countPrimesTo ( u -- )
   DUP 3 - 0< IF 2 - 0< IF 0 ELSE 1 THEN \ < 3 -> <2/2 -> 0/1!
   ELSE
      DUP 3 - 1 RSHIFT 1+
      DUP 1- L2CacheBits / L2CacheBits * \ -- rng rngi pglmtbytes
      L1CacheBits initSieveBuffer \ address of base prime sieve buffer
      L2CacheBits initSieveBuffer \ address of main sieve buffer
      LOCALS| sba bpsba pglmt | \ local variables -- rng rngi
      0 OVER L1CacheBits MIN bpsba bpsba cullSieveBuffer
      1 pglmt 0 ?DO
         I L2CacheBits bpsba sba cullSieveBuffer
         L2CacheBits 1- sba countSieveBuffer@ +
      L2CacheBits +LOOP \ rng rngi cnt
      SWAP L2CacheBits mod DUP 0> IF \ one more page!
         pglmt OVER bpsba sba cullSieveBuffer
         1- sba countSieveBuffer@ + \ partial count!
      THEN
      bpsba FREE DROP sba FREE DROP \ -- range cnt
   THEN CR ." There are " . ." primes Up to the " . ." limit." ;

100 .primes
1000000000 .countPrimesTo

Output:

Primes to 100 are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 50847534 primes Up to the 1000000000 limit.

For simplicity, the base primes array is left as a sieved bit packed array (which takes minimum space) at the cost of having to scan the bit array for base primes on every page-segment culling pass. The page-segment sieve buffer is set as a fixed multiple of this (intended to fit within the CPU L2 cache size) in order to reduce the base prime start index address calculation overhead by this factor at the cost of slightly increased memory access times, which access times are still only about the same as the fastest inner culling time or less anyway. When the cache sizes are set to the 32 Kilobyte/256 Kilobyte size for L1/L2, respectively, by changing 1 18 LSHIFT CONSTANT L1CacheBits) as for my Intel Skylake i5-6500 at 3.6 GHz (single-threaded turbo), it runs in about 1.25 seconds on 64-bit VFX Forth, 3.75 seconds on 32-bit swiftForth, and 12.4 seconds on 64-bit gforth-fast, obviously with the tuned in-lined machine language compiling of VFX Forth much faster than the threaded execution token interpreting of gforth and with swiftForth lacking the machine code inlining of VFX Forth.

VFX Forth is only about 25 % slower than the algorithm as written in the fastest of languages, just as they advertise.

As written, the algorithm works efficiently up to over ten billion (1e10) with 64-bit systems, but could easily be refactored to use floating point or double precision for inputs and outputs as I have done in a StackOverflow answer in JavaScript without costing much in execution time so 32-bit systems would have the much higher limit.

The implementation is efficient up to this range, but with a change so that the base primes array can grow with increasing limit, can sieve to much higher ranges with a loss of efficiency in unused base prime start address calculations that can't be used as the culling spans exceed the fixed sieve buffer size. Again, this can be solved by also making the page-segmentation sieve buffer grow as the square root of the limit.

Further improvements by a factor of almost four in overall execution speed would be gained by implementing maximum wheel-factorization as per my other StackOverflow JavaScript answer, which also effectively increases sieve buffer sizes by a factor of 48 in sieving by modulo residual bit planes.

Finally, multi-processing could be applied to increase the execution speed by about the number of effective cores (non SMT - Hyper Threads) as in four on my Skylake machine; however, neither the 1994 ANS Forth standard nor the 2012 standard has a standard Forth way of implementing this so each of the implementations use their own custom WORDS; since the resulting code would not be cross-implementation, I am not going to do this.

I likely won't even add the Maximum Wheel-Factorized version as in the above linked JavaScript code, since this code is enough to demonstrate what I was going to show: that Forth can be an efficient language, albeit a little hard to code, read, and maintain due to the reliance on anonymous data stack operations; it is a language whose best use is likely in cross-compiling to embedded systems where it can easily be customized and extended as required, and because it doesn't actually require a base operating system, can use its core facilities, functions, and extensions in place of such an OS to result in a minimum memory footprint.

Fortran

Works with: Fortran version 77

      PROGRAM MAIN
      INTEGER LI
      WRITE (6,100)
      READ  (5,110) LI
      call SOE(LI)
 100  FORMAT( 'Limit:' )
 110  FORMAT( I4 )
      STOP
      END
      
C --- SIEVE OF ERATOSTHENES ----------
      SUBROUTINE SOE( LI )
      INTEGER LI
      LOGICAL A(LI)
      INTEGER SL,P,I
      
      DO 10 I=1,LI
         A(I) = .TRUE.
 10   CONTINUE
      
      SL = INT(SQRT(REAL(LI)))
      A(1) = .FALSE.
      DO 30 P=2,SL
         IF ( .NOT. A(P) ) GOTO 30
         DO 20 I=P*P,LI,P
            A(I)=.FALSE.
 20      CONTINUE
 30   CONTINUE

      DO 40 I=2,LI
         IF ( A(I) ) WRITE(6,100) I
 40   CONTINUE

 100  FORMAT(I3)
      RETURN
      END

Works with: Fortran version 90 and later

program sieve

  implicit none
  integer, parameter :: i_max = 100
  integer :: i
  logical, dimension (i_max) :: is_prime

  is_prime = .true.
  is_prime (1) = .false.
  do i = 2, int (sqrt (real (i_max)))
    if (is_prime (i)) is_prime (i * i : i_max : i) = .false.
  end do
  do i = 1, i_max
    if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i
  end do
  write (*, *)

end program sieve

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Because it uses four byte logical's (default size) as elements of the sieve buffer, the above code uses 400 bytes of memory for this trivial task of sieving to 100; it also has 49 + 31 + 16 + 8 = 104 (for the culling by the primes of two, three, five, and seven) culling operations.

Optimised using a pre-computed wheel based on 2:

program sieve_wheel_2

  implicit none
  integer, parameter :: i_max = 100
  integer :: i
  logical, dimension (i_max) :: is_prime

  is_prime = .true.
  is_prime (1) = .false.
  is_prime (4 : i_max : 2) = .false.
  do i = 3, int (sqrt (real (i_max))), 2
    if (is_prime (i)) is_prime (i * i : i_max : 2 * i) = .false.
  end do
  do i = 1, i_max
    if (is_prime (i)) write (*, '(i0, 1x)', advance = 'no') i
  end do
  write (*, *)

end program sieve_wheel_2

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

This so-called "optimized" version still uses 400 bytes of memory but slightly reduces to 74 operations from 104 operations including the initialization of marking all of the even representations as composite due to skipping the re-culling of the even representation, so isn't really much of an optimization at all!

Optimized using a proper implementation of a wheel 2:

The above implementations, especially the second odds-only code, are some of the most inefficient versions of the Sieve of Eratosthenes in any language here as to time and space efficiency, only worse by some naive JavaScript implementations that use eight-byte Number's as logical values; the second claims to be wheel factorized but still uses all the same memory as the first and still culls by the even numbers in the initialization of the sieve buffer. As well, using four bytes (default logical size) to store a boolean value is terribly wasteful if these implementations were to be extended to non-toy ranges. The following code implements proper wheel factorization by two, reducing the space used by a factor of about eight to 49 bytes by using `byte` as the sieve buffer array elements and not requiring the evens initialization, thus reducing the number of operations to 16 + 8 + 4 = 28 (for the culling primes of three, five, and seven) culling operations:

program sieve_wheel_2
 
  implicit none
  integer, parameter :: i_max = 100
  integer, parameter :: i_limit = (i_max - 3) / 2
  integer :: i
  byte, dimension (0:i_limit) :: composites
 
  composites = 0
  do i = 0, (int (sqrt (real (i_max))) - 3) / 2
    if (composites(i) == 0) composites ((i + i) * (i + 3) + 3 : i_limit : i + i + 3) = 1.
  end do
  write (*, '(i0, 1x)', advance = 'no') 2
  do i = 0, i_limit
    if (composites (i) == 0) write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
  end do
  write (*, *)
 
end program sieve_wheel_2

The output is the same as the earlier version.

Optimized using bit packing to reduce the memory use by a further factor of eight:

The above implementation is still space inefficient in effectively only using one bit out of eight; the following version implements bit packing to reduce memory use by a factor of eight by using bits to represent composite numbers rather than bytes:

program sieve_wheel_2
 
  implicit none
  integer, parameter :: i_max = 10000000
  integer, parameter :: i_range = (i_max - 3) / 2
  integer :: i, j, k, cnt
  byte, dimension (0:i_range / 8) :: composites
 
  composites = 0 ! pre-initialized?
  do i = 0, (int (sqrt (real (i_max))) - 3) / 2
    if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
      do j = (i + i) * (i + 3) + 3, i_range, i + i + 3
        k = shiftr(j, 3)
        composites(k) = ior(composites(k), shiftl(1, iand(j, 7)))
      end do
    end if
  end do
!  write (*, '(i0, 1x)', advance = 'no') 2
  cnt = 1
  do i = 0, i_range
    if (iand(composites(shiftr(i, 3)), shiftl(1, iand(i, 7))) == 0) then
!      write (*, '(i0, 1x)', advance = 'no') (i + i + 3)
      cnt = cnt + 1
    end if
  end do
!  write (*, *)
  print '(a, i0, a, i0, a, f0.0, a)', &
        'There are ', cnt, ' primes up to ', i_max, '.'
end program sieve_wheel_2

Output:

There are 664579 primes up to 10000000.

When the lines to print the results are enabled, the output to a maximum value of 100 is still exactly the same as the other versions, and it has exactly the same number of culling operations as the immediately above optimized version for the same range; the only difference is that less memory is used. Although the culling operations are somewhat more complex, for larger ranges the time saved in better cache associativity due to more effective use of the cache more than makes up for it so average culling time is actually reduced, so that this version can count the number of primes to several million (it takes a lot of time to list hundreds of thousands of primes, but counting is faster) in a few tens of milliseconds. For ranges above a few tens of millions, a page-segmented sieve is much more efficient due to further improved use of the CPU caches.

Multi-Threaded Page-Segmented Bit-Packed Odds-Only Version

As well as adding page-segmentation, the following code adds multi-processing which is onc of the capabilities for which modern Fortran is known:

subroutine cullSieveBuffer(lwi, size, bpa, sba)

    implicit none
    integer, intent(in) :: lwi, size
    byte, intent(in) :: bpa(0:size - 1)
    byte, intent(out) :: sba(0:size - 1)
    integer :: i_limit, i_bitlmt, i_bplmt, i, sqri, bp, si, olmt, msk, j
    byte, dimension (0:7) :: bits
    common /twiddling/ bits
    
    i_bitlmt = size * 8 - 1
    i_limit = lwi + i_bitlmt
    i_bplmt = size / 4
    sba = 0
    i = 0
    sqri = (i + i) * (i + 3) + 3
    do while (sqri <= i_limit)
      if (iand(int(bpa(shiftr(i, 3))), shiftl(1, iand(i, 7))) == 0) then
        ! start index address calculation...
        bp = i + i + 3
        if (lwi <= sqri) then
          si = sqri - lwi
        else
          si = mod((lwi - sqri), bp)
          if (si /= 0) si = bp - si
        end if
        if (bp <= i_bplmt) then
          olmt = min(i_bitlmt, si + bp * 8 - 1)
          do while (si <= olmt)
            msk = bits(iand(si, 7))
            do j = shiftr(si, 3), size - 1, bp
              sba(j) = ior(int(sba(j)), msk)
            end do
            si = si + bp
          end do
        else
          do while (si <= i_bitlmt)
            j = shiftr(si, 3)
            sba(j) = ior(sba(j), bits(iand(si, 7)))
            si = si + bp
          end do
        end if
      end if
      i = i + 1
      sqri = (i + i) * (i + 3) + 3
    end do
  
  end subroutine cullSieveBuffer
  
  integer function countSieveBuffer(lmti, almti, sba)
  
    implicit none
    integer, intent(in) :: lmti, almti
    byte, intent(in) :: sba(0:almti)
    integer :: bmsk, lsti, i, cnt
    byte, dimension (0:65535) :: clut
    common /counting/ clut
  
    cnt = 0
    bmsk = iand(shiftl(-2, iand(lmti, 15)), 65535)
    lsti = iand(shiftr(lmti, 3), -2)
    do i = 0, lsti - 1, 2
      cnt = cnt + clut(shiftl(iand(int(sba(i)), 255), 8) + iand(int(sba(i + 1)), 255))
    end do
    countSieveBuffer = cnt + clut(ior(shiftl(iand(int(sba(lsti)), 255), 8) + iand(int(sba(lsti + 1)), 255), bmsk))
    
  end function countSieveBuffer
  
  program sieve_paged
  
    use OMP_LIB
    implicit none
    integer, parameter :: i_max = 1000000000, i_range = (i_max - 3) / 2
    integer, parameter :: i_l1cache_size = 16384, i_l1cache_bitsz = i_l1cache_size * 8
    integer, parameter :: i_l2cache_size = i_l1cache_size * 8, i_l2cache_bitsz = i_l2cache_size * 8
    integer :: cr, c0, c1, i, j, k, cnt
    integer, save :: scnt
    integer :: countSieveBuffer
    integer :: numthrds
    byte, dimension (0:i_l1cache_size - 1) :: bpa
    byte, save, allocatable, dimension (:) :: sba
    byte, dimension (0:7) :: bits = (/ 1, 2, 4, 8, 16, 32, 64, -128 /)
    byte, dimension (0:65535) :: clut
    common /twiddling/ bits
    common /counting/ clut
  
    type heaparr
      byte, allocatable, dimension(:) :: thrdsba
    end type heaparr
    type(heaparr), allocatable, dimension (:) :: sbaa
  
    !$OMP THREADPRIVATE(scnt, sba)
  
    numthrds = 1
    !$ numthrds = OMP_get_max_threads()
    allocate(sbaa(0:numthrds - 1))
    do i = 0, numthrds - 1
      allocate(sbaa(i)%thrdsba(0:i_l2cache_size - 1))
    end do
  
    CALL SYSTEM_CLOCK(count_rate=cr)
    CALL SYSTEM_CLOCK(c0)
    do k = 0, 65535 ! initialize counting Look Up Table
      j = k
      i = 16
      do while (j > 0)
        i = i - 1
        j = iand(j, j - 1)
      end do
      clut(k) = i
    end do
    bpa = 0 ! pre-initialization not guaranteed!
    call cullSieveBuffer(0, i_l1cache_size, bpa, bpa)
  
    cnt = 1
    !$OMP PARALLEL DO ORDERED
      do i = i_l2cache_bitsz, i_range, i_l2cache_bitsz * 8
        scnt = 0
        sba = sbaa(mod(i, numthrds))%thrdsba
        do j = i, min(i_range, i + 8 * i_l2cache_bitsz - 1), i_l2cache_bitsz
          call cullSieveBuffer(j - i_l2cache_bitsz, i_l2cache_size, bpa, sba)
          scnt = scnt + countSieveBuffer(i_l2cache_bitsz - 1, i_l2cache_size, sba)
        end do
        !$OMP ATOMIC
          cnt = cnt + scnt
      end do
    !$OMP END PARALLEL DO
  
    j = i_range / i_l2cache_bitsz * i_l2cache_bitsz
    k = i_range - j
    if (k /= i_l2cache_bitsz - 1) then
      call cullSieveBuffer(j, i_l2cache_size, bpa, sbaa(0)%thrdsba)
      cnt = cnt + countSieveBuffer(k, i_l2cache_size, sbaa(0)%thrdsba)
    end if
  !  write (*, '(i0, 1x)', advance = 'no') 2
  !  do i = 0, i_range
  !    if (iand(sba(shiftr(i, 3)), bits(iand(i, 7))) == 0) write (*, '(i0, 1x)', advance='no') (i + i + 3)
  !  end do
  !  write (*, *)
    CALL SYSTEM_CLOCK(c1)
    print '(a, i0, a, i0, a, f0.0, a)', 'Found ', cnt, ' primes up to ', i_max, &
          ' in ', ((c1 - c0) / real(cr) * 1000), ' milliseconds.'
  
    do i = 0, numthrds - 1
      deallocate(sbaa(i)%thrdsba)
    end do
    deallocate(sbaa)
  
  end program sieve_paged

Output:

Found 50847534 primes up to 1000000000 in 219. milliseconds.

The above output was as compiled with gfortran -O3 -fopenmp using version 11.1.1-1 on my Intel Skylake i5-6500 CPU at 3.2 GHz multithreaded with four cores. There are a few more optimizations that could be made in applying Maximum Wheel-Factorization as per my StackOverflow answer in JavaScript, which will make this almost four times faster yet again. If that optimization were done, sieving to a billion as here is really too trivial to measure and one should sieve at least up to ten billion to start to get a long enough time to be measured accurately. As explained in that answer, the Maximum Wheel-Factorized code will work efficiently up to about a trillion (1e12), when it needs yet another "bucket sieve" optimization to allow it to continue to scale efficiently for increasing range. The final optimization which can speed up the code by almost a factor of two is a very low level loop unrolling technique that I'm not sure will work with the compiler, but as it works in C/C++ and other similar languages including those that compile through LLVM, it ought to.

Free Pascal

Basic version

function Sieve returns a list of primes less than or equal to the given aLimit

program prime_sieve;
{$mode objfpc}{$coperators on}
uses
  SysUtils, GVector;
type
  TPrimeList = specialize TVector<DWord>;
function Sieve(aLimit: DWord): TPrimeList;
var
  IsPrime: array of Boolean;
  I, SqrtBound: DWord;
  J: QWord;
begin
  Result := TPrimeList.Create;
  Inc(aLimit, Ord(aLimit < High(DWord))); //not a problem because High(DWord) is composite
  SetLength(IsPrime, aLimit);
  FillChar(Pointer(IsPrime)^, aLimit, Byte(True));
  SqrtBound := Trunc(Sqrt(aLimit));
  for I := 2 to aLimit do
    if IsPrime[I] then
      begin
        Result.PushBack(I);
        if I <= SqrtBound then
          begin
            J := I * I;
            repeat
              IsPrime[J] := False;
              J += I;
            until J > aLimit;
          end;
      end;
end;

 //usage

var
  Limit: DWord = 0;
function ReadLimit: Boolean;
var
  Lim: Int64;
begin
  if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
    if (Lim >= 0) and (Lim <= High(DWord)) then
      begin
        Limit := DWord(Lim);
        exit(True);
      end;
  Result := False;
end;
procedure PrintUsage;
begin
  WriteLn('Usage: prime_sieve Limit');
  WriteLn('  where Limit in the range [0, ', High(DWord), ']');
  Halt;
end;
procedure PrintPrimes(aList: TPrimeList);
var
  I: DWord;
begin
  if aList.Size <> 0 then begin
    if aList.Size > 1 then
      for I := 0 to aList.Size - 2 do
        Write(aList[I], ', ');
    WriteLn(aList[aList.Size - 1]);
  end;
  aList.Free;
end;
begin
  if not ReadLimit then
    PrintUsage;
  try
    PrintPrimes(Sieve(Limit));
  except
    on e: Exception do
      WriteLn('An exception ', e.ClassName, ' occurred with message: ', e.Message);
  end;
end.

Alternative segmented(odds only) version

function OddSegmentSieve returns a list of primes less than or equal to the given aLimit

program prime_sieve;
{$mode objfpc}{$coperators on}
uses
  SysUtils, Math;
type
  TPrimeList = array of DWord;
function OddSegmentSieve(aLimit: DWord): TPrimeList;
  function EstimatePrimeCount(aLimit: DWord): DWord;
  begin
    case aLimit of
      0..1:   Result := 0;
      2..200: Result := Trunc(1.6 * aLimit/Ln(aLimit)) + 1;
    else
      Result := Trunc(aLimit/(Ln(aLimit) - 2)) + 1;
    end;
  end;
  function Sieve(aLimit: DWord; aNeed2: Boolean): TPrimeList;
  var
    IsPrime: array of Boolean;
    I: DWord = 3;
    J, SqrtBound: DWord;
    Count: Integer = 0;
  begin
    if aLimit < 2 then
      exit(nil);
    SetLength(IsPrime, (aLimit - 1) div 2);
    FillChar(Pointer(IsPrime)^, Length(IsPrime), Byte(True));
    SetLength(Result, EstimatePrimeCount(aLimit));
    SqrtBound := Trunc(Sqrt(aLimit));
    if aNeed2 then
      begin
        Result[0] := 2;
        Inc(Count);
      end;
    for I := 0 to High(IsPrime) do
      if IsPrime[I] then
        begin
          Result[Count] := I * 2 + 3;
          if Result[Count] <= SqrtBound then
            begin
              J := Result[Count] * Result[Count];
              repeat
                IsPrime[(J - 3) div 2] := False;
                J += Result[Count] * 2;
              until J > aLimit;
            end;
          Inc(Count);
        end;
    SetLength(Result, Count);
  end;
const
  PAGE_SIZE = $8000;
var
  IsPrime: array[0..Pred(PAGE_SIZE)] of Boolean; //current page
  SmallPrimes: TPrimeList = nil;
  I: QWord;
  J, PageHigh, Prime: DWord;
  Count: Integer;
begin
  if aLimit < PAGE_SIZE div 4 then
    exit(Sieve(aLimit, True));
  I := Trunc(Sqrt(aLimit));
  SmallPrimes := Sieve(I + 1, False);
  Count := Length(SmallPrimes) + 1;
  I += Ord(not Odd(I));
  SetLength(Result, EstimatePrimeCount(aLimit));
  while I <= aLimit do
    begin
      PageHigh := Min(Pred(PAGE_SIZE * 2), aLimit - I);
      FillChar(IsPrime, PageHigh div 2 + 1, Byte(True));
      for Prime in SmallPrimes do
        begin
          J := DWord(I) mod Prime;
          if J <> 0 then
            J := Prime shl (1 - J and 1) - J;
          while J <= PageHigh do
            begin
              IsPrime[J div 2] := False;
              J += Prime * 2;
            end;
        end;
      for J := 0 to PageHigh div 2 do
        if IsPrime[J] then
          begin
            Result[Count] := J * 2 + I;
            Inc(Count);
          end;
      I += PAGE_SIZE * 2;
    end;
  SetLength(Result, Count);
  Result[0] := 2;
  Move(SmallPrimes[0], Result[1], Length(SmallPrimes) * SizeOf(DWord));
end;

  //usage

var
  Limit: DWord = 0;
function ReadLimit: Boolean;
var
  Lim: Int64;
begin
  if (ParamCount = 1) and Lim.TryParse(ParamStr(1), Lim) then
    if (Lim >= 0) and (Lim <= High(DWord)) then
      begin
        Limit := DWord(Lim);
        exit(True);
      end;
  Result := False;
end;
procedure PrintUsage;
begin
  WriteLn('Usage: prime_sieve Limit');
  WriteLn('  where Limit in the range [0, ', High(DWord), ']');
  Halt;
end;
procedure PrintPrimes(const aList: TPrimeList);
var
  I: DWord;
begin
  for I := 0 to Length(aList) - 2 do
    Write(aList[I], ', ');
  if aList <> nil then
    WriteLn(aList[High(aList)]);
end;
begin
  if not ReadLimit then
    PrintUsage;
  PrintPrimes(OddSegmentSieve(Limit));
end.

FreeBASIC

' FB 1.05.0

Sub sieve(n As Integer)
  If n < 2 Then Return
  Dim a(2 To n) As Integer
  For i As Integer = 2 To n : a(i) = i : Next
  Dim As Integer p = 2, q
  ' mark non-prime numbers by setting the corresponding array element to 0
  Do
    For j As Integer = p * p To n Step p
      a(j) = 0
    Next j
    ' look for next non-zero element in array after 'p'
    q = 0
    For j As Integer = p + 1 To Sqr(n)
      If a(j) <> 0 Then
        q = j
        Exit For
      End If
    Next j    
    If q = 0 Then Exit Do
    p = q
  Loop

  ' print the non-zero numbers remaining i.e. the primes
  For i As Integer = 2 To n
    If a(i) <> 0 Then
      Print Using "####"; a(i);      
    End If
  Next
  Print
End Sub

Print "The primes up to 1000 are :"
Print
sieve(1000)
Print
Print "Press any key to quit"
Sleep

Output:

The primes up to 1000 are :

   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

Frink

n = eval[input["Enter highest number: "]]
results = array[sieve[n]]
println[results]
println[length[results] + " prime numbers less than or equal to " + n]

sieve[n] :=
{
   // Initialize array
   array = array[0 to n]
   array@1 = 0

   for i = 2 to ceil[sqrt[n]]
      if array@i != 0
         for j = i^2 to n step i
            array@j = 0

   return select[array, { |x| x != 0 }]
}

Furor

Note: With benchmark function

tick sto startingtick
#g 100000 sto MAX
@MAX mem !maximize sto primeNumbers
one count
@primeNumbers 0 2 [^]
2 @MAX külső: {||
@count {|
{}§külső {} []@primeNumbers !/ else{<}§külső
|} // @count vége
@primeNumbers @count++ {} [^]
|} // @MAX vége
@primeNumbers free
."Time : " tick @startingtick - print ." tick\n"
."Prímek száma = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }

Peri

Note: With benchmark function

###sysinclude standard.uh
tick sto startingtick
#g 100000 sto MAX
@MAX mem !maximize sto primeNumbers
one count
2 0 sto#s primeNumbers
2 @MAX külső: {{ ,
@count {{
{{}}§külső primeNumbers[{{}}] !/ else {{<}}§külső
}} // @count vége
//{{}} gprintnl  // A talált prímszám kiiratásához kommentezzük ki e sort
{{}} @count++ sto#s primeNumbers
}} // @MAX vége
@primeNumbers inv mem
//."Time : " tick @startingtick - print ." tick\n"
."Prímek száma = " @count printnl
end
{ „MAX” } { „startingtick” } { „primeNumbers” } { „count” }

FutureBasic

Basic sieve of array of booleans

window 1, @"Sieve of Eratosthenes", (0,0,720,300)

begin globals
dynamic gPrimes(1) as Boolean
end globals

local fn SieveOfEratosthenes( n as long )
  long i, j
  
  for i = 2 to  n
    for j = i * i to n step i
      gPrimes(j) = _true
    next
    if gPrimes(i) = 0 then print i,
  next i
  kill gPrimes
end fn

fn SieveOfEratosthenes( 100 )

HandleEvents

Output:

 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Fōrmulæ

Fōrmulæ programs are not textual, visualization/edition of programs is done showing/manipulating structures but not text. Moreover, there can be multiple visual representations of the same program. Even though it is possible to have textual representation —i.e. XML, JSON— they are intended for storage and transfer purposes more than visualization and edition.

Programs in Fōrmulæ are created/edited online in its website.

In this page you can see and run the program(s) related to this task and their results. You can also change either the programs or the parameters they are called with, for experimentation, but remember that these programs were created with the main purpose of showing a clear solution of the task, and they generally lack any kind of validation.

Solution

Test case

GAP

Eratosthenes := function(n)
    local a, i, j;
    a := ListWithIdenticalEntries(n, true);
    if n < 2 then
        return [];
    else
        for i in [2 .. n] do
            if a[i] then
                j := i*i;
                if j > n then
                    return Filtered([2 .. n], i -> a[i]);
                else
                    while j <= n do
                        a[j] := false;
                        j := j + i;
                    od;
                fi;
            fi;
        od;
    fi;
end;

Eratosthenes(100);

[ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]

GLBasic

// Sieve of Eratosthenes (find primes)
// GLBasic implementation


GLOBAL n%, k%, limit%, flags%[]
 
limit = 100			// search primes up to this number

DIM flags[limit+1]		// GLBasic arrays start at 0
 
FOR n = 2 TO SQR(limit)
    IF flags[n] = 0
        FOR k = n*n TO limit STEP n
            flags[k] = 1
        NEXT
    ENDIF
NEXT
 
// Display the primes
FOR n = 2 TO limit
    IF flags[n] = 0 THEN STDOUT n + ", "
NEXT

KEYWAIT

Go

Basic sieve of array of booleans

package main
import "fmt"

func main() {
    const limit = 201 // means sieve numbers < 201

    // sieve
    c := make([]bool, limit) // c for composite.  false means prime candidate
    c[1] = true              // 1 not considered prime
    p := 2
    for {
        // first allowed optimization:  outer loop only goes to sqrt(limit)
        p2 := p * p
        if p2 >= limit {
            break
        }
        // second allowed optimization:  inner loop starts at sqr(p)
        for i := p2; i < limit; i += p {
            c[i] = true // it's a composite

        }
        // scan to get next prime for outer loop
        for {
            p++
            if !c[p] {
                break
            }
        }
    }

    // sieve complete.  now print a representation.
    for n := 1; n < limit; n++ {
        if c[n] {
            fmt.Print("  .")
        } else {
            fmt.Printf("%3d", n)
        }
        if n%20 == 0 {
            fmt.Println("")
        }
    }
}

Output:

  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19  .
  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .  .
 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59  .
 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79  .
  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .  .
101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .  .
  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139  .
  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .  .
  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179  .
181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199  .

Odds-only bit-packed array output-enumerating version

The above version's output is rather specialized; the following version uses a closure function to enumerate over the culled composite number array, which is bit packed. By using this scheme for output, no extra memory is required above that required for the culling array:

package main

import (
	"fmt"
	"math"
)

func primesOdds(top uint) func() uint {
	topndx := int((top - 3) / 2)
	topsqrtndx := (int(math.Sqrt(float64(top))) - 3) / 2
	cmpsts := make([]uint, (topndx/32)+1)
	for i := 0; i <= topsqrtndx; i++ {
		if cmpsts[i>>5]&(uint(1)<<(uint(i)&0x1F)) == 0 {
			p := (i << 1) + 3
			for j := (p*p - 3) >> 1; j <= topndx; j += p {
				cmpsts[j>>5] |= 1 << (uint(j) & 0x1F)
			}
		}
	}
	i := -1
	return func() uint {
		oi := i
		if i <= topndx {
			i++
		}
		for i <= topndx && cmpsts[i>>5]&(1<<(uint(i)&0x1F)) != 0 {
			i++
		}
		if oi < 0 {
			return 2
		} else {
			return (uint(oi) << 1) + 3
		}
	}
}

func main() {
	iter := primesOdds(100)
	for v := iter(); v <= 100; v = iter() {
		print(v, " ")
	}
	iter = primesOdds(1000000)
	count := 0
	for v := iter(); v <= 1000000; v = iter() {
		count++
	}
	fmt.Printf("\r\n%v\r\n", count)
}

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
78498

Sieve Tree

A fairly odd sieve tree method:

package main
import "fmt"

type xint uint64
type xgen func()(xint)

func primes() func()(xint) {
	pp, psq := make([]xint, 0), xint(25)

	var sieve func(xint, xint)xgen
	sieve = func(p, n xint) xgen {
		m, next := xint(0), xgen(nil)
		return func()(r xint) {
			if next == nil {
				r = n
				if r <= psq {
					n += p
					return
				}

				next = sieve(pp[0] * 2, psq) // chain in
				pp = pp[1:]
				psq = pp[0] * pp[0]

				m = next()
			}
			switch {
			case n < m: r, n = n, n + p
			case n > m: r, m = m, next()
			default:    r, n, m = n, n + p, next()
			}
			return
		}
	}

	f := sieve(6, 9)
	n, p := f(), xint(0)

	return func()(xint) {
		switch {
		case p < 2: p = 2
		case p < 3: p = 3
		default:
			for p += 2; p == n; {
				p += 2
				if p > n {
					n = f()
				}
			}
			pp = append(pp, p)
		}
		return p
	}
}

func main() {
	for i, p := 0, primes(); i < 100000; i++ {
		fmt.Println(p())
	}
}

Concurrent Daisy-chain sieve

A concurrent prime sieve adopted from the example in the "Go Playground" window at http://golang.org/

package main
import "fmt"
 
// Send the sequence 2, 3, 4, ... to channel 'out'
func Generate(out chan<- int) {
	for i := 2; ; i++ {
		out <- i                  // Send 'i' to channel 'out'
	}
}
 
// Copy the values from 'in' channel to 'out' channel,
//   removing the multiples of 'prime' by counting.
// 'in' is assumed to send increasing numbers
func Filter(in <-chan int, out chan<- int, prime int) {
        m := prime + prime                // first multiple of prime
	for {
		i := <- in                // Receive value from 'in'
		for i > m {
			m = m + prime     // next multiple of prime
			}
		if i < m {
			out <- i          // Send 'i' to 'out'
			}
	}
}
 
// The prime sieve: Daisy-chain Filter processes
func Sieve(out chan<- int) {
	gen := make(chan int)             // Create a new channel
	go Generate(gen)                  // Launch Generate goroutine
	for  {
		prime := <- gen
		out <- prime
		ft := make(chan int)
		go Filter(gen, ft, prime)
		gen = ft
	}
}

func main() {
	sv := make(chan int)              // Create a new channel
	go Sieve(sv)                      // Launch Sieve goroutine
	for i := 0; i < 1000; i++ {
		prime := <- sv
		if i >= 990 { 
		    fmt.Printf("%4d ", prime) 
		    if (i+1)%20==0 {
			fmt.Println("")
		    }
		}
	}
}

The output:

7841 7853 7867 7873 7877 7879 7883 7901 7907 7919

Runs at ~ n^2.1 empirically, producing up to n=3000 primes in under 5 seconds.

Postponed Concurrent Daisy-chain sieve

Here we postpone the creation of filters until the prime's square is seen in the input, to radically reduce the amount of filter channels in the sieve chain.

package main
import "fmt"
 
// Send the sequence 2, 3, 4, ... to channel 'out'
func Generate(out chan<- int) {
	for i := 2; ; i++ {
		out <- i                  // Send 'i' to channel 'out'
	}
}
 
// Copy the values from 'in' channel to 'out' channel,
//   removing the multiples of 'prime' by counting.
// 'in' is assumed to send increasing numbers
func Filter(in <-chan int, out chan<- int, prime int) {
        m := prime * prime                // start from square of prime
	for {
		i := <- in                // Receive value from 'in'
		for i > m {
			m = m + prime     // next multiple of prime
			}
		if i < m {
			out <- i          // Send 'i' to 'out'
			}
	}
}
 
// The prime sieve: Postponed-creation Daisy-chain of Filters
func Sieve(out chan<- int) {
	gen := make(chan int)             // Create a new channel
	go Generate(gen)                  // Launch Generate goroutine
	p := <- gen
	out <- p
	p = <- gen          // make recursion shallower ---->
	out <- p            // (Go channels are _push_, not _pull_)
	
	base_primes := make(chan int)     // separate primes supply  
	go Sieve(base_primes)             
	bp := <- base_primes              // 2           <---- here
	bq := bp * bp                     // 4

	for  {
		p = <- gen
		if p == bq {                    // square of a base prime
			ft := make(chan int)
			go Filter(gen, ft, bp)  // filter multiples of bp in gen out
			gen = ft
			bp = <- base_primes     // 3
			bq = bp * bp            // 9
		} else {
			out <- p
		}
	}
}

func main() {
	sv := make(chan int)              // Create a new channel
	go Sieve(sv)                      // Launch Sieve goroutine
	lim := 25000              
	for i := 0; i < lim; i++ {
		prime := <- sv
		if i >= (lim-10) { 
		    fmt.Printf("%4d ", prime) 
		    if (i+1)%20==0 {
			fmt.Println("")
		    }
		}
	}
}

The output:

286999 287003 287047 287057 287059 287087 287093 287099 287107 287117

Runs at ~ n^1.2 empirically, producing up to n=25,000 primes on ideone in under 5 seconds.

Incremental Odds-only Sieve

Uses Go's built-in hash tables to store odd composites, and defers adding new known composites until the square is seen.

package main

import "fmt"

func main() {
    primes := make(chan int)
    go PrimeSieve(primes)

    p := <-primes
    for p < 100 {
        fmt.Printf("%d ", p)
        p = <-primes
    }

    fmt.Println()
}

func PrimeSieve(out chan int) {
    out <- 2
    out <- 3

    primes := make(chan int)
    go PrimeSieve(primes)

    var p int
    <-primes
    p = <-primes

    sieve := make(map[int]int)
    q := p * p
    n := p

    for {
        n += 2
        step, isComposite := sieve[n]
        if isComposite {
            delete(sieve, n)
            m := n + step
            for sieve[m] != 0 {
                m += step
            }
            sieve[m] = step

        } else if n < q {
            out <- n

        } else {
            step = p + p
            m := n + step
            for sieve[m] != 0 {
                m += step
            }
            sieve[m] = step
            p = <-primes
            q = p * p
        }
    }
}

The output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Groovy

This solution uses a BitSet for compactness and speed, but in Groovy, BitSet has full List semantics. It also uses both the "square root of the boundary" shortcut and the "square of the prime" shortcut.

def sievePrimes = { bound -> 
    def isPrime  = new BitSet(bound)
    isPrime[0..1] = false
    isPrime[2..bound] = true
    (2..(Math.sqrt(bound))).each { pc ->
        if (isPrime[pc]) {
            ((pc**2)..bound).step(pc) { isPrime[it] = false }
        }
    }
    (0..bound).findAll { isPrime[it] }
}

Test:

println sievePrimes(100)

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

GW-BASIC

10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20  DIM FLAGS(LIMIT)
30  FOR N = 2 TO SQR (LIMIT)
40  IF FLAGS(N) < > 0 GOTO 80
50  FOR K = N * N TO LIMIT STEP N
60  FLAGS(K) = 1
70  NEXT K
80  NEXT N
90  REM  DISPLAY THE PRIMES
100  FOR N = 2 TO LIMIT
110  IF FLAGS(N) = 0 THEN PRINT N;", ";
120  NEXT N

Haskell

Mutable unboxed arrays

Mutable array of unboxed Bools indexed by Ints:

{-# LANGUAGE FlexibleContexts #-} -- too lazy to write contexts...
{-# OPTIONS_GHC -O2 #-}

import Control.Monad.ST ( runST, ST )
import Data.Array.Base ( MArray(newArray, unsafeRead, unsafeWrite),
                         IArray(unsafeAt),
                         STUArray, unsafeFreezeSTUArray, assocs )
import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing...

primesTo :: Int -> [Int] -- generate a list of primes to given limit...
primesTo limit = runST $ do
  let lmt = limit - 2-- raw index of limit!
  cmpsts <- newArray (2, limit) False -- when indexed is true is composite
  cmpstsf <- unsafeFreezeSTUArray cmpsts -- frozen in place!
  let getbpndx bp = (bp, bp * bp - 2) -- bp -> bp, raw index of start cull
      cullcmpst i = unsafeWrite cmpsts i True -- cull composite by raw ndx
      cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
  mapM_ cull4bpndx
        $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                    [ getbpndx bp | (bp, False) <- assocs cmpstsf ]
  return [ p | (p, False) <- assocs cmpstsf ] -- non-raw ndx is prime

-- testing...
main :: IO ()
main = do
  putStrLn $ "The primes up to 100 are " ++ show (primesTo 100)
  putStrLn $ "The number of primes up to a million is " ++
               show (length $ primesTo 1000000)
  let top = 1000000000
  start <- getPOSIXTime
  let answr = length $ primesTo top
  stop <- answr `seq` getPOSIXTime -- force result for timing!
  let elpsd =  round $ 1e3 * (stop - start) :: Int

  putStrLn $ "Found " ++ show answr ++ " to " ++ show top ++
               " in " ++ show elpsd ++ " milliseconds."

The above code chooses conciseness and elegance over speed, but it isn't too slow:

Output:

The primes up to 100 are [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
The number of primes up to a million is 78498
Found 50847534 to 1000000000 in 12435 milliseconds.

Run on an Intel Sky Lake i5-2500 at 3.6 GHZ (single threaded boost). As per the comments in the below, this is greatly sped up by a constant factor by using the raw `unsafeWrite`; use of the "unsafe" versions that avoid run time array bounds checks on every operation is entirely safe here as the indexing is inherently limited to be within the bounds by their use in the loops. There is an additional benefit of about 20 per cent in speed if run with the LLVM back end compiler option (add the "-fllvm" flag) if the right version of LLVM is available to the GHC Haskell compiler. We see the relatively small benefit of using LLVM in that this program spends a relatively small percentage of time in the tight inner culling loop where LLVM can help the most and a high part of the time is spent just enumerating the result list.

Mutable unboxed arrays, odds only

Mutable array of unboxed Bools indexed by Ints, representing odds only:

import Control.Monad (forM_, when)
import Control.Monad.ST
import Data.Array.ST
import Data.Array.Unboxed

sieveUO :: Int -> UArray Int Bool
sieveUO top = runSTUArray $ do
    let m = (top-1) `div` 2
        r = floor . sqrt $ fromIntegral top + 1
    sieve <- newArray (1,m) True          -- :: ST s (STUArray s Int Bool)
    forM_ [1..r `div` 2] $ \i -> do       -- prime(i) = 2i+1
      isPrime <- readArray sieve i        -- ((2i+1)^2-1)`div`2 = 2i(i+1)
      when isPrime $ do                   
        forM_ [2*i*(i+1), 2*i*(i+2)+1..m] $ \j -> do
          writeArray sieve j False
    return sieve

primesToUO :: Int -> [Int]
primesToUO top | top > 1   = 2 : [2*i + 1 | (i,True) <- assocs $ sieveUO top]
               | otherwise = []

This represents odds only in the array. Empirical orders of growth is ~ n^1.2 in n primes produced, and improving for bigger n‍ ‍s. Memory consumption is low (array seems to be packed) and growing about linearly with n. Can further be significantly sped up by re-writing the forM_ loops with direct recursion, and using unsafeRead and unsafeWrite operations.

In light of the performance of the previous and following submissions results, the IDEOne results seem somewhat slow at about 10 seconds over a range of about a third of a billion, likely due to some lazily deferred operations in the processing. See the next submission for expected speeds for odds only.

The measured empirical orders of growth as per the table in the IDEOne link are easily understood if one considers that these slowish run times are primarily limited by the time to lazily enumerate the results and that the number of found primes to enumerate varies as (top / log top) by the Euler relationship. Since the prime density decreases by this relationship, the enumeration has the inverse relationship as it takes longer per prime to find the primes in the sieved buffer. Log of a million is 1.2 times larger than log of a hundred thousand and of course this ratio gets smaller with range: the ratio of the log of a billion as compared to log of a hundred million is 1.125, etc.

Alternate Version of Mutable unboxed arrays, odds only

The reason for this alternate version is to have an accessible version of "odds only" that uses the same optimizations and is written in the same coding style as the basic version. This can be used by just substituting the following code for the function of the same name in the first base example above. Mutable array of unboxed Bools indexed by Ints, representing odds only:

primesTo :: Int -> [Int] -- generate a list of primes to given limit...
primesTo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2 - 1 -- limit index!
      oddcmpsts <- newArray (0, lmt) False -- when indexed is true is composite
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
          cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                        [ getbpndx i | (i, False) <- assocs oddcmpstsf ]
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]

Output:

The primes up to 100 are [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
The number of primes up to a million is 78498
Found 50847534 to 1000000000 in 6085 milliseconds.

A "monolithic buffer" odds only sieve uses half the memory as compared to the basic version.

This is not the expected about 2.5 times faster as the basic version because there are other factors to execution time cost than just the number of culling operations, as follows:

1) Since the amount of memory used to sieve to a billion has been dropped from 125 million bytes to 62.5 million bytes, the cache associativity is slightly better, which should make it faster; however

2) We have eliminated the culling by the very small span of the base prime of two, which means a lesser percentage of the culling span operations will be within a given CPU cache size, which will make it slower, but

3) The primary reason we observe only about a factor of two difference in run times is that we have increased the prime density in the sieving buffer by a factor of two, which means that we have half the work to enumerate the primes. Since enumeration of the found primes is a major contribution of the execution time, the execution time will tend to change more by its cost than any other.

As to "empirical orders of growth", the comments made in the above are valid, but there is a further observation. For smaller ranges of primes up to a few million where the sieving buffer fits within the CPU L2 cache size (generally 256 Kilobytes/2 million bits, representing a range of about four million for this version), the cull times are their fastest and enumeration is a bigger percentage of the time; as ranges increase above that, more and more time is spent waiting on memory at the access times of the next level memory (CPU L3 cache, if present, followed by main memory) so that the controlling factor is a little less that of the enumeration time as range gets larger.

Because of the greatly increasing memory demands and the high execution cost of memory access as ranges exceed the span of the CPU caches, it is not recommended that these simple "monolithic buffer" sieves be used for sieving of ranges above about a hundred million. Rather, one should use a "Paged-Segmented" sieve as per the examples near the end of this Haskell section.

Immutable arrays

Monolithic sieving array. Even numbers above 2 are pre-marked as composite, and sieving is done only by odd multiples of odd primes:

import Data.Array.Unboxed
 
primesToA m = sieve 3 (array (3,m) [(i,odd i) | i<-[3..m]] :: UArray Int Bool)
  where
    sieve p a 
      | p*p > m   = 2 : [i | (i,True) <- assocs a]
      | a!p       = sieve (p+2) $ a//[(i,False) | i <- [p*p, p*p+2*p..m]]
      | otherwise = sieve (p+2) a

Its performance sharply depends on compiler optimizations. Compiled with -O2 flag in the presence of the explicit type signature, it is very fast in producing first few million primes. (//) is an array update operator.

Immutable arrays, by segments

Works by segments between consecutive primes' squares. Should be the fastest of non-monadic code. Evens are entirely ignored:

import Data.Array.Unboxed

primesSA = 2 : prs ()
  where 
    prs () = 3 : sieve 3 [] (prs ())
    sieve x fs (p:ps) = [i*2 + x | (i,True) <- assocs a] 
                        ++ sieve (p*p) fs2 ps
     where
      q     = (p*p-x)`div`2                  
      fs2   = (p,0) : [(s, rem (y-q) s) | (s,y) <- fs]
      a     :: UArray Int Bool
      a     = accumArray (\ b c -> False) True (1,q-1)
                         [(i,()) | (s,y) <- fs, i <- [y+s, y+s+s..q]]

As list comprehension

import Data.Array.Unboxed
import Data.List (tails, inits)

primes = 2 : [ n |
   (r:q:_, px) <- zip (tails (2 : [p*p | p <- primes]))
                      (inits primes),
   (n, True)   <- assocs ( accumArray (\_ _ -> False) True
                     (r+1,q-1)
                     [ (m,()) | p <- px
                              , s <- [ div (r+p) p * p]
                              , m <- [s,s+p..q-1] ] :: UArray Int Bool
                  ) ]

Basic list-based sieve

Straightforward implementation of the sieve of Eratosthenes in its original bounded form. This finds primes in gaps between the composites, and composites as an enumeration of each prime's multiples.

primesTo m = eratos [2..m] where
   eratos (p : xs) 
      | p*p > m   = p : xs
      | otherwise = p : eratos (xs `minus` [p*p, p*p+p..m])
                                    -- map (p*) [p..]  
                                    -- map (p*) (p:xs)   -- (Euler's sieve)
   
minus a@(x:xs) b@(y:ys) = case compare x y of
         LT -> x : minus  xs b
         EQ ->     minus  xs ys
         GT ->     minus  a  ys
minus a        b        = a

Its time complexity is similar to that of optimal trial division because of limitations of Haskell linked lists, where (minus a b) takes time proportional to length(union a b) and not (length b), as achieved in imperative setting with direct-access memory. Uses ordered list representation of sets.

This is reasonably useful up to ranges of fifteen million or about the first million primes.

Unbounded list based sieve

Unbounded, "naive", too eager to subtract (see above for the definition of minus):

primesE  = sieve [2..] 
           where
           sieve (p:xs) = p : sieve (minus xs [p, p+p..])
-- unfoldr (\(p:xs)-> Just (p, minus xs [p, p+p..])) [2..]

This is slow, with complexity increasing as a square law or worse so that it is only moderately useful for the first few thousand primes or so.

The number of active streams can be limited to what's strictly necessary by postponement until the square of a prime is seen, getting a massive complexity improvement to better than ~ n^1.5 so it can get first million primes or so in a tolerable time:

primesPE = 2 : sieve [3..] 4 primesPE
               where
               sieve (x:xs) q (p:t)
                 | x < q     = x : sieve xs q (p:t)
                 | otherwise =     sieve (minus xs [q, q+p..]) (head t^2) t
-- fix $ (2:) . concat 
--     . unfoldr (\(p:ps,xs)-> Just . second ((ps,) . (`minus` [p*p, p*p+p..])) 
--                                  . span (< p*p) $ xs) . (,[3..])

Transposing the workflow, going by segments between the consecutive squares of primes:

import Data.List (inits)

primesSE = 2 : sieve 3 4 (tail primesSE) (inits primesSE) 
               where
               sieve x q ps (fs:ft) =  
                  foldl minus [x..q-1] [[n, n+f..q-1] | f <- fs, let n=div x f * f]
                          -- [i|(i,True) <- assocs ( accumArray (\ b c -> False) 
                          --     True (x,q-1) [(i,()) | f <- fs, let n=div(x+f-1)f*f,
                          --         i <- [n, n+f..q-1]] :: UArray Int Bool )]
                  ++ sieve q (head ps^2) (tail ps) ft

The basic gradually-deepening left-leaning (((a-b)-c)- ... ) workflow of foldl minus a bs above can be rearranged into the right-leaning (a-(b+(c+ ... ))) workflow of minus a (foldr union [] bs). This is the idea behind Richard Bird's unbounded code presented in M. O'Neill's article, equivalent to:

primesB = _Y ( (2:) . minus [3..] . foldr (\p-> (p*p :) . union [p*p+p, p*p+2*p..]) [] )

--      = _Y ( (2:) . minus [3..] . _LU . map(\p-> [p*p, p*p+p..]) )
-- _LU ((x:xs):t) = x : (union xs . _LU) t             -- linear folding big union

_Y g = g (_Y g)  -- = g (g (g ( ... )))      non-sharing multistage fixpoint combinator
--                  = g . g . g . ...            ... = g^inf
--   = let x = g x in g x -- = g (fix g)     two-stage fixpoint combinator 
--   = let x = g x in x   -- = fix g         sharing fixpoint combinator

union a@(x:xs) b@(y:ys) = case compare x y of
         LT -> x : union  xs b
         EQ -> x : union  xs ys
         GT -> y : union  a  ys

Using _Y is meant to guarantee the separate supply of primes to be independently calculated, recursively, instead of the same one being reused, corecursively; thus the memory footprint is drastically reduced. This idea was introduced by M. ONeill as a double-staged production, with a separate primes feed.

The above code is also useful to a range of the first million primes or so. The code can be further optimized by fusing minus [3..] into one function, preventing a space leak with the newer GHC versions, getting the function gaps defined below.

Tree-merging incremental sieve

Linear merging structure can further be replaced with an wiki.haskell.org/Prime_numbers#Tree_merging indefinitely deepening to the right tree-like structure, (a-(b+((c+d)+( ((e+f)+(g+h)) + ... )))).

This merges primes' multiples streams in a tree-like fashion, as a sequence of balanced trees of union nodes, likely achieving theoretical time complexity only a log n factor above the optimal n log n log (log n), for n primes produced. Indeed, empirically it runs at about ~ n^1.2 (for producing first few million primes), similarly to priority-queue–based version of M. O'Neill's, and with very low space complexity too (not counting the produced sequence of course):

primes :: () -> [Int]   
primes() = 2 : _Y ((3:) . gaps 5 . _U . map(\p-> [p*p, p*p+2*p..])) where
  _Y g = g (_Y g)  -- = g (g (g ( ... )))   non-sharing multistage fixpoint combinator
  gaps k s@(c:cs) | k < c     = k : gaps (k+2) s  -- ~= ([k,k+2..] \\ s)
                  | otherwise =     gaps (k+2) cs --   when null(s\\[k,k+2..]) 
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t   -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xt) ys@(y:yt) | x < y     = x : merge xt ys
                            | y < x     = y : merge xs yt
                            | otherwise = x : merge xt yt

Works with odds only, the simplest kind of wheel. Here's the test entry on Ideone.com, and a comparison with more versions.

With Wheel

Using _U defined above,

primesW :: [Int]   
primesW = [2,3,5,7] ++ _Y ( (11:) . gapsW 13 (tail wheel) . _U .
                            map (\p->  
                              map (p*) . dropWhile (< p) $
                                scanl (+) (p - rem (p-11) 210) wheel) )

gapsW k (d:w) s@(c:cs) | k < c     = k : gapsW (k+d) w s    -- set difference
                       | otherwise =     gapsW (k+d) w cs   --   k==c

wheel = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:    -- gaps = (`gapsW` cycle [2])
        4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel
  -- cycle $ zipWith (-) =<< tail $ [i | i <- [11..221], gcd i 210 == 1]

Used here and here.

Improved efficiency Wheels

1. The generation of large wheels such as the 2/3/5/7/11/13/17 wheel, which has 92160 cyclic elements, needs to be done based on sieve culling which is much better as to performance and can be used without inserting the generated table.

2. Improving the means to re-generate the position on the wheel for the recursive base primes without the use of `dropWhile`, etc. The below improved code uses a copy of the place in the wheel for each found base prime for ease of use in generating the composite number to-be-culled chains.

-- autogenerates wheel primes, first sieve prime, and gaps
wheelGen :: Int -> ([Int],Int,[Int])
wheelGen n = loop 1 3 [2] [2] where
  loop i frst wps gps =
    if i >= n then (wps, frst, gps) else
    let nfrst = frst + head gps
        nhts = (length gps) * (frst - 1)
        cmpsts = scanl (\ c g -> c + frst * g)  (frst * frst) (cycle gps)
        cull n (g:gs') cs@(c:cs') og
            | nn >= c = cull nn gs' cs' (og + g) -- n == c; never greater!
            | otherwise = (og + g) : cull nn gs' cs 0 where nn = n + g
    in nfrst `seq` nhts `seq` loop (i + 1) nfrst (wps ++ [frst]) $ take nhts
                                $ cull nfrst (tail $ cycle gps) cmpsts 0

(wheelPrimes, firstSievePrime, gaps) = wheelGen 7

primesTreeFoldingWheeled :: () -> [Int]   
primesTreeFoldingWheeled() =    
    wheelPrimes ++ map fst (
      _Y ( ((firstSievePrime, wheel) :) .
               gapsW (firstSievePrime + head wheel, tail wheel) . _U .
                 map (\ (p,w) ->
                          scanl (\ c m -> c + m * p) (p * p) w ) ) ) where

  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint Y-combinator

  wheel = cycle gaps

  gapsW k@(n,d:w) s@(c:cs) | n < c     = k : gapsW (n + d, w) s  -- set diff
                           | otherwise =     gapsW (n + d, w) cs --   n == c
 
  _U ((x:xs):t) = -- exactly the same as for odds-only!
      x : (union xs . _U . pairs) t where   -- tree-shaped folding big union
    pairs (xs:ys:t) = union xs ys : pairs t --  ~= nub . sort . concat
    union xs@(x:xs') ys@(y:ys')
      | x < y = x : union xs' ys
      | y < x = y : union xs ys'
      | otherwise = x : union xs' ys' -- x and y must be equal!

When compiled with -O2 optimization and -fllvm (the LLVM back end), the above code is over twice as fast as the Odds-Only version as it should be as that is about the ratio of reduced operations minus some slightly increased operation complexity, sieving the primes to a hundred million in about seven seconds on a modern middle range desktop computer. It is almost twice as fast as the "primesW" version due to the increased algorithmic efficiency!

Note that the "wheelGen" code could be used to not need to do further culling at all by continuously generating wheels until the square of the "firstSievePrime" is greater than the range as there are no composites left up to that limit, but this is always slower than a SoE due to the high overhead in generating the wheels - this would take a wheel generation of 1229 (number of primes to the square root of a hundred thousand is ten thousand) to create the required wheel sieved to a hundred million; however, the theoretical (if the time to advance through the lists per element were zero, which of course it is not) asymptotic performance would be O(n) instead of O(n log (log n)) where n is the range sieved. Just another case where theory supports (slightly) reduced number of operations, but practicality means that the overheads to do this are so big as to make it useless for any reasonable range ;-) !

Priority Queue based incremental sieve

The above work is derived from the Epilogue of the Melissa E. O'Neill paper which is much referenced with respect to incremental functional sieves; however, that paper is now dated and her comments comparing list based sieves to her original work leading up to a Priority Queue based implementation is no longer current given more recent work such as the above Tree Merging version. Accordingly, a modern "odd's-only" Priority Queue version is developed here for more current comparisons between the above list based incremental sieves and a continuation of O'Neill's work.

In order to implement a Priority Queue version with Haskell, an efficient Priority Queue, which is not part of the standard Haskell libraries, is required. A Min Heap implementation is likely best suited for this task in providing the most efficient frequently used peeks of the next item in the queue and replacement of the first item in the queue (not using a "pop" followed by a "push) with "pop" operations then not used at all, and "push" operations used relatively infrequently. Judging by O'Neill's use of an efficient "deleteMinAndInsert" operation which she states "(We provide deleteMinAndInsert becausea heap-based implementation can support this operation with considerably less rearrangement than a deleteMin followed by an insert.)", which statement is true for a Min Heap Priority Queue and not others, and her reference to a priority queue by (Paulson, 1996), the queue she used is likely the one as provided as a simple true functional Min Heap implementation on RosettaCode, from which the essential functions are reproduced here:

data PriorityQ k v = Mt
                     | Br !k v !(PriorityQ k v) !(PriorityQ k v)
  deriving (Eq, Ord, Read, Show)

emptyPQ :: PriorityQ k v
emptyPQ = Mt
 
peekMinPQ :: PriorityQ k v -> Maybe (k, v)
peekMinPQ Mt           = Nothing
peekMinPQ (Br k v _ _) = Just (k, v)

pushPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
pushPQ wk wv Mt           = Br wk wv Mt Mt
pushPQ wk wv (Br vk vv pl pr)
             | wk <= vk   = Br wk wv (pushPQ vk vv pr) pl
             | otherwise  = Br vk vv (pushPQ wk wv pr) pl
 
siftdown :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v -> PriorityQ k v
siftdown wk wv Mt _          = Br wk wv Mt Mt
siftdown wk wv (pl @ (Br vk vv _ _)) Mt
    | wk <= vk               = Br wk wv pl Mt
    | otherwise              = Br vk vv (Br wk wv Mt Mt) Mt
siftdown wk wv (pl @ (Br vkl vvl pll plr)) (pr @ (Br vkr vvr prl prr))
    | wk <= vkl && wk <= vkr = Br wk wv pl pr
    | vkl <= vkr             = Br vkl vvl (siftdown wk wv pll plr) pr
    | otherwise              = Br vkr vvr pl (siftdown wk wv prl prr)
 
replaceMinPQ :: Ord k => k -> v -> PriorityQ k v -> PriorityQ k v
replaceMinPQ wk wv Mt             = Mt
replaceMinPQ wk wv (Br _ _ pl pr) = siftdown wk wv pl pr

The "peekMin" function retrieves both the key and value in a tuple so processing is required to access whichever is required for further processing. As well, the output of the peekMin function is a Maybe with the case of an empty queue providing a Nothing output.

The following code is O'Neill's original odds-only code (without wheel factorization) from her paper slightly adjusted as per the requirements of this Min Heap implementation as laid out above; note the `seq` adjustments to the "adjust" function to make the evaluation of the entry tuple more strict for better efficiency:

-- (c) 2006-2007 Melissa O'Neill.  Code may be used freely so long as
-- this copyright message is retained and changed versions of the file
-- are clearly marked.
--   the only changes are the names of the called PQ functions and the
--   included processing for the result of the peek function being a maybe tuple.

primesPQ() = 2 : sieve [3,5..]
  where
    sieve [] = []
    sieve (x:xs) = x : sieve' xs (insertprime x xs emptyPQ)
      where
        insertprime p xs table = pushPQ (p*p) (map (* p) xs) table
        sieve' [] table = []
        sieve' (x:xs) table
            | nextComposite <= x = sieve' xs (adjust table)
            | otherwise = x : sieve' xs (insertprime x xs table)
          where
            nextComposite = case peekMinPQ table of
                              Just (c, _) -> c
            adjust table
                | n <= x = adjust (replaceMinPQ n' ns table)
                | otherwise = table
              where (n, n':ns) = case peekMinPQ table of
                                   Just tpl -> tpl

The above code is almost four times slower than the version of the Tree Merging sieve above for the first million primes although it is about the same speed as the original Richard Bird sieve with the "odds-only" adaptation as above. It is slow and uses a huge amount of memory for primarily one reason: over eagerness in adding prime composite streams to the queue, which are added as the primes are listed rather than when they are required as the output primes stream reaches the square of a given base prime; this over eagerness also means that the processed numbers must have a large range in order to not overflow when squared (as in the default Integer = infinite precision integers as used here and by O'Neill, but Int64's or Word64's would give a practical range) which processing of wide range numbers adds processing and memory requirement overhead. Although O'Neill's code is elegant, it also loses some efficiency due to the extensive use of lazy list processing, not all of which is required even for a wheel factorization implementation.

The following code is adjusted to reduce the amount of lazy list processing and to add a secondary base primes stream (or a succession of streams when the combinator is used) so as to overcome the above problems and reduce memory consumption to only that required for the primes below the square root of the currently sieved number; using this means that 32-bit Int's are sufficient for a reasonable range and memory requirements become relatively negligible:

primesPQx :: () -> [Int]
primesPQx() = 2 : _Y ((3 :) . sieve 5 emptyPQ 9) -- initBasePrms
  where
    _Y g = g (_Y g)        -- non-sharing multi-stage fixpoint combinator OR

    sieve n table q bps@(bp:bps')
        | n >= q = let nbp = head bps' in let ntbl = insertprime bp table in
                   ntbl `seq` sieve (n + 2) ntbl (nbp * nbp) bps'
        | n >= nextComposite = let ntbl = adjust table in
                               ntbl `seq` sieve (n + 2) ntbl q bps
        | otherwise = n : sieve (n + 2) table q bps
      where
        insertprime p table = let adv = 2 * p in let nv = p * p + adv
                              in nv `seq` pushPQ nv adv table
        nextComposite = case peekMinPQ table of
                          Nothing -> q -- at beginning when queue empty!
                          Just (c, _) -> c
        adjust table
            | c <= n = let ntbl = replaceMinPQ (c + adv) adv table
                       in ntbl `seq` adjust ntbl
            | otherwise = table
          where (c, adv) = case peekMinPQ table of Just ct -> ct `seq` ct

The above code is over five times faster than the previous (O'Neill) Priority Queue code half again faster than the Tree-Merging Odds-Only code for a range of a hundred million primes; it is likely faster as the Min Heap is slightly more efficient than Tree Merging due to better tree balancing.

Since the Tree-Folding version above includes the minor changes to work with a factorization wheel, this should have the same minor modifications for comparison purposes, with the code as follows:

-- Note:  this code segment uses the same wheelGen as the Tree-Folding version...

primesPQWheeled :: () -> [Int]
primesPQWheeled() =
    wheelPrimes ++ map fst (
      _Y (((firstSievePrime, wheel) :) .
            sieve (firstSievePrime + head wheel, tail wheel)
                  emptyPQ (firstSievePrime * firstSievePrime)) )
  where
    _Y g = g (_Y g)        -- non-sharing multi-stage fixpoint combinator OR

    wheel = cycle gaps

    sieve npr@(n,(g:gs')) table q bpprs@(bppr:bpprs')
        | n >= q =
            let (nbp,_) = head bpprs' in let ntbl = insertprime bppr table in
            nbp `seq` ntbl `seq` sieve (n + g, gs') ntbl (nbp * nbp) bpprs'
        | n >= nextComposite = let ntbl = adjust table in
                               ntbl `seq` sieve (n + g, gs') ntbl q bpprs
        | otherwise = npr : sieve (n + g, gs') table q bpprs
      where
        insertprime (p,(pg:pgs')) table =
          let nv = p * (p + pg) in nv `seq` pushPQ nv (map (* p) pgs') table
        nextComposite = case peekMinPQ table of
                          Nothing -> q -- at beginning when queue empty!
                          Just (c, _) -> c
        adjust table
            | c <= n = let ntbl = replaceMinPQ (c + a) as' table
                       in ntbl `seq` adjust ntbl
            | otherwise = table
          where (c, (a:as')) = case peekMinPQ table of Just ct -> ct `seq` ct

Compiled with -O2 optimization and -fllvm (the LLVM back end), this code gains about the expected ratio in performance in sieving to a range of a hundred million, sieving to this range in about five seconds on a modern medium range desktop computer. This is likely the fastest purely functional incremental type SoE useful for moderate ranges up to about a hundred million to a billion.

Page Segmented Sieve using a mutable unboxed array

All of the above unbounded sieves are quite limited in practical sieving range due to the large constant factor overheads in computation, making them mostly just interesting intellectual exercises other than for small ranges of up to about the first million to ten million primes; the following "odds-only" page-segmented version using (bit-packed internally) mutable unboxed arrays is about 50 times faster than the fastest of the above algorithms for ranges of about that and higher, making it practical for the first several hundred million primes:

{-# OPTIONS_GHC -O2 -fllvm #-} -- use LLVM for about double speed!

import Data.Int ( Int64 )
import Data.Word ( Word64 )
import Data.Bits ( Bits(shiftR) )
import Data.Array.Base ( IArray(unsafeAt), UArray(UArray),     
                         MArray(unsafeWrite), unsafeFreezeSTUArray ) 
import Control.Monad ( forM_ )
import Data.Array.ST ( MArray(newArray), runSTUArray )

type Prime = Word64

cSieveBufferRange :: Int
cSieveBufferRange = 2^17 * 8 -- CPU L2 cache in bits

primes :: () -> [Prime]
primes() = 2 : _Y (listPagePrms . pagesFrom 0) where
  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint combinator
  szblmt = cSieveBufferRange - 1
  listPagePrms pgs@(hdpg@(UArray lwi _ rng _) : tlpgs) =
    let loop i | i >= fromIntegral rng = listPagePrms tlpgs
               | unsafeAt hdpg i = loop (i + 1)
               | otherwise = let ii = lwi + fromIntegral i in
                             case fromIntegral $ 3 + ii + ii of
                               p -> p `seq` p : loop (i + 1) in loop 0
  makePg lwi bps = runSTUArray $ do
    let limi = lwi + fromIntegral szblmt
        bplmt = floor $ sqrt $ fromIntegral $ limi + limi + 3
        strta bp = let si = fromIntegral $ (bp * bp - 3) `shiftR` 1
                   in if si >= lwi then fromIntegral $ si - lwi else
                   let r = fromIntegral (lwi - si) `mod` bp
                   in if r == 0 then 0 else fromIntegral $ bp - r
    cmpsts <- newArray (lwi, limi) False
    fcmpsts <- unsafeFreezeSTUArray cmpsts
    let cbps = if lwi == 0 then listPagePrms [fcmpsts] else bps
    forM_ (takeWhile (<= bplmt) cbps) $ \ bp ->
      forM_ (let sp = fromIntegral $ strta bp
             in [ sp, sp + fromIntegral bp .. szblmt ]) $ \c ->
        unsafeWrite cmpsts c True
    return cmpsts
  pagesFrom lwi bps = map (`makePg` bps)
                          [ lwi, lwi + fromIntegral szblmt + 1 .. ]

The above code as written has a maximum practical range of about 10^12 or so in about an hour.

The above code takes only a few tens of milliseconds to compute the first million primes and a few seconds to calculate the first 50 million primes up to a billion, with over half of those times expended in just enumerating the result lazy list. A further improvement to reduce the computational cost of repeated list processing across the base pages for every page segment would be to store the required base primes (or base prime gaps) in a lazy list of base prime arrays; in that way the scans across base primes per page segment would just mostly be array accesses which are much faster than list enumeration.

Unlike many other other unbounded examples, this algorithm has the true Sieve of Eratosthenes computational time complexity of O(n log log n) where n is the sieving range with no extra "log n" factor while having a very low computational time cost per composite number cull of less than ten CPU clock cycles per cull (well under as in under 4 clock cycles for the Intel i7 using a page buffer size of the CPU L1 cache size).

There are other ways to make the algorithm faster including high degrees of wheel factorization, which can reduce the number of composite culling operations by a factor of about four for practical ranges, and multi-processing which can reduce the computation time proportionally to the number of available independent CPU cores, but there is little point to these optimizations as long as the lazy list enumeration is the bottleneck as it is starting to be in the above code. To take advantage of those optimizations, functions need to be provided that can compute the desired results without using list processing.

For ranges above about 10^14 where culling spans begin to exceed even an expanded size page array, other techniques need to be adapted such as such as automatically extending the sieving buffer size to the square root of the maximum range currently sieved and sieving by CPU L1/L2 cache sized segments/sections.

However, even with the above code and its limitations for large sieving ranges, the speeds will never come close to as slow as the other "incremental" sieve algorithms, as the time will never exceed about 20 CPU clock cycles per composite number cull, where the fastest of those other algorithms takes many hundreds of CPU clock cycles per cull.

A faster method of counting primes with a similar algorithm

To show the limitations of the individual prime enumeration, the following code has been refactored from the above to provide an alternate very fast method of counting the unset bits in the culled array (the primes = none composite) using a CPU native pop count instruction:

{-# LANGUAGE  FlexibleContexts #-}
{-# OPTIONS_GHC -O2 -fllvm #-} -- use LLVM for about double speed!

import Data.Time.Clock.POSIX ( getPOSIXTime ) -- for timing

import Data.Int ( Int64 )
import Data.Word ( Word64 )
import Data.Bits ( Bits((.&.), (.|.), shiftL, shiftR, popCount) )
import Control.Monad.ST ( ST, runST )
import Data.Array.Base ( IArray(unsafeAt), UArray(UArray), STUArray,
                         MArray(unsafeRead, unsafeWrite), castSTUArray,
                         unsafeThawSTUArray, unsafeFreezeSTUArray ) 
import Control.Monad ( forM_ )
import Data.Array.ST ( MArray(newArray), runSTUArray )

type Prime = Word64

cSieveBufferRange :: Int
cSieveBufferRange = 2^17 * 8 -- CPU L2 cache in bits

type PrimeNdx = Int64
type SieveBuffer = UArray PrimeNdx Bool
cWHLPRMS :: [Prime]
cWHLPRMS = [ 2 ]
cFRSTSVPRM :: Prime
cFRSTSVPRM = 3
primesPages :: () -> [SieveBuffer]
primesPages() = _Y (pagesFrom 0 . listPagePrms) where
  _Y g = g (_Y g) -- non-sharing multi-stage fixpoint Y-combinator
  szblmt = fromIntegral (cSieveBufferRange `shiftR` 1) - 1
  makePg lwi bps = runSTUArray $ do
    let limi = lwi + fromIntegral szblmt
        mxprm = cFRSTSVPRM + fromIntegral (limi + limi)
        bplmt = floor $ sqrt $ fromIntegral mxprm
        strta bp = let si = fromIntegral $ (bp * bp - cFRSTSVPRM) `shiftR` 1
                   in if si >= lwi then fromIntegral $ si - lwi else
                   let r = fromIntegral (lwi - si) `mod` bp
                   in if r == 0 then 0 else fromIntegral $ bp - r
    cmpsts <- newArray (lwi, limi) False
    fcmpsts <- unsafeFreezeSTUArray cmpsts
    let cbps = if lwi == 0 then listPagePrms [fcmpsts] else bps
    forM_ (takeWhile (<= bplmt) cbps) $ \ bp ->
      forM_ (let sp = fromIntegral $ strta bp
             in [ sp, sp + fromIntegral bp .. szblmt ]) $ \c ->
        unsafeWrite cmpsts c True
    return cmpsts
  pagesFrom lwi bps = map (`makePg` bps)
                          [ lwi, lwi + fromIntegral szblmt + 1 .. ]

-- convert a list of sieve buffers to a list of primes...
listPagePrms :: [SieveBuffer] -> [Prime]
listPagePrms pgs@(pg@(UArray lwi _ rng _) : pgstl) = bsprm `seq` loop 0 where
  bsprm = cFRSTSVPRM + fromIntegral (lwi + lwi)
  loop i | i >= rng = listPagePrms pgstl
         | unsafeAt pg i = loop (i + 1)
         | otherwise = case bsprm + fromIntegral (i + i) of
                         p -> p `seq` p : loop (i + 1)
 
primes :: () -> [Prime]
primes() = cWHLPRMS ++ listPagePrms (primesPages())

-- very fast using popCount by words technique...
countSieveBuffer :: Int -> UArray PrimeNdx Bool -> Int64
countSieveBuffer lstndx sb = fromIntegral $ runST $ do
  cmpsts <- unsafeThawSTUArray sb :: ST s (STUArray s PrimeNdx Bool)
  wrdcmpsts <-
    (castSTUArray :: STUArray s PrimeNdx Bool ->
                      ST s (STUArray s PrimeNdx Word64)) cmpsts
  let lstwrd = lstndx `shiftR` 6
      lstmsk = 0xFFFFFFFFFFFFFFFE `shiftL` (lstndx .&. 63) :: Word64
      loop wi cnt
        | wi < lstwrd = do
          v <- unsafeRead wrdcmpsts wi
          case cnt - popCount v of ncnt -> ncnt `seq` loop (wi + 1) ncnt
        | otherwise = do
            v <- unsafeRead wrdcmpsts lstwrd
            return $ fromIntegral (cnt - popCount (v .|. lstmsk))
  loop 0 (lstwrd * 64 + 64)

-- count the remaining un-marked composite bits using very fast popcount...
countPrimesTo :: Prime -> Int64
countPrimesTo limit =
  let lmtndx = fromIntegral $ (limit - 3) `shiftR` 1
      loop (pg@(UArray lwi lmti rng _) : pgstl) cnt
        | lmti >= lmtndx =
          (cnt + countSieveBuffer (fromIntegral $ lmtndx - lwi) pg)
        | otherwise = loop pgstl (cnt + countSieveBuffer (rng - 1) pg)
  in if limit < 3 then if limit < 2 then 0 else 1
     else loop (primesPages()) 1

-- test it...
main :: IO ()
main = do
  let limit = 10^9 :: Prime

  strt <- getPOSIXTime
--  let answr = length $ takeWhile (<= limit) $ primes()-- slow way
  let answr = countPrimesTo limit -- fast way
  stop <- answr `seq` getPOSIXTime -- force evaluation of answr b4 stop time!
  let elpsd = round $ 1e3 * (stop - strt) :: Int64
 
  putStr $ "Found " ++ show answr
  putStr $ " primes up to " ++ show limit
  putStrLn $ " in " ++ show elpsd ++ " milliseconds."

When compiled with the "fast way" commented out and the "slow way enabled, the time to find the number of primes up to one billion is about 3.65 seconds on an Intel Sandy Bridge i3-2100 at 3.1 Ghz; with the "fast way" enabled instead, the time is only about 1.45 seconds for the same range, both compiled with the LLVM back end. This shows that more than half of the time for the "slow way" is spent just producing and enumerating the list of primes!

On a Intel Sky Lake i5-2500 CPU @ 3.6 GHz (turbo boost for single threaded as here) compiled with LLVM and 256 Kilobyte buffer size (CPU L2 sized), using the fast counting method:

takes 1.085 seconds to sieve to 10^9: about 3.81 CPU clocks per cull
takes 126 seconds to sieve to 10^11: about 4.0 CPU clocks per cull

This shows a slight loss of efficiency in clocks per cull due to the average culling span size coming closer to the cull buffer span size, meaning that the loop overhead in address calculation and CPU L1 cache overflows increases just a bit for these relative ranges.

This an extra about 20% faster than using the Sandy Bridge i5-2100 above by more than the ratio of CPU clock speeds likely due to the better Instructions Per Clock of the newer Sky Lake architecture due to improved branch prediction and elision of a correctly predicted branch down to close to zero time.

This is about 25 to 30 per cent faster than not using LLVM for this Sky Lake processor due to the poor register allocation and optimizations by the Native Code Gnerator compared to LLVM.

APL-style

Rolling set subtraction over the rolling element-wise addition on integers. Basic, slow, worse than quadratic in the number of primes produced, empirically:

zipWith (flip (!!)) [0..]    -- or: take n . last . take n ...
     . scanl1 minus 
     . scanl1 (zipWith (+)) $ repeat [2..]

Or, a wee bit faster:

unfoldr (\(a:b:t) -> Just . (head &&& (:t) . (`minus` b)
                                           . tail) $ a)
     . scanl1 (zipWith (+)) $ repeat [2..]

A bit optimized, much faster, with better complexity,

tail . concat 
     . unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
                                 . span (< head b) $ a)
     . scanl1 (zipWith (+) . tail) $ tails [1..]
  -- $ [ [n*n, n*n+n..] | n <- [1..] ]

getting nearer to the functional equivalent of the primesPE above, i.e.

fix ( (2:) . concat 
      . unfoldr (\(a:b:t) -> Just . second ((:t) . (`minus` b))
                                  . span (< head b) $ a)
      . ([3..] :) . map (\p-> [p*p, p*p+p..]) )

An illustration:

> mapM_ (print . take 15) $ take 10 $ scanl1 (zipWith(+)) $ repeat [2..]
[  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16]
[  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
[  6,  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
[  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64]
[ 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
[ 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96]
[ 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84, 91, 98,105,112]
[ 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96,104,112,120,128]
[ 18, 27, 36, 45, 54, 63, 72, 81, 90, 99,108,117,126,135,144]
[ 20, 30, 40, 50, 60, 70, 80, 90,100,110,120,130,140,150,160]

> mapM_ (print . take 15) $ take 10 $ scanl1 (zipWith(+) . tail) $ tails [1..]
[  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15]
[  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
[  9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51]
[ 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72]
[ 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
[ 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, 96,102,108,114,120]
[ 49, 56, 63, 70, 77, 84, 91, 98,105,112,119,126,133,140,147]
[ 64, 72, 80, 88, 96,104,112,120,128,136,144,152,160,168,176]
[ 81, 90, 99,108,117,126,135,144,153,162,171,180,189,198,207]
[100,110,120,130,140,150,160,170,180,190,200,210,220,230,240]

HicEst

REAL :: N=100,  sieve(N)

sieve = $ > 1     ! = 0 1 1 1 1 ...
DO i = 1, N^0.5
  IF( sieve(i) ) THEN
     DO j = i^2, N, i
       sieve(j) = 0
     ENDDO
  ENDIF
ENDDO

DO i = 1, N
  IF( sieve(i) ) WRITE() i
ENDDO

Hoon

::  Find primes by the sieve of Eratosthenes
!:
|=  end=@ud
=/  index  2
=/  primes  `(list @ud)`(gulf 1 end)
|-  ^-  (list @ud)
?:  (gte index (lent primes))  primes
$(index +(index), primes +:(skid primes |=([a=@ud] &((gth a index) =(0 (mod a index))))))

Icon and Unicon

 procedure main()
    sieve(100)
 end

 procedure sieve(n)
    local p,i,j
    p:=list(n, 1)
    every i:=2 to sqrt(n) & j:= i+i to n by i & p[i] == 1
      do p[j] := 0
    every write(i:=2 to n & p[i] == 1 & i)
 end

Alternatively using sets

 procedure main()
     sieve(100)
 end

 procedure sieve(n)
     primes := set()
     every insert(primes,1 to n)
     every member(primes,i := 2 to n) do
         every delete(primes,i + i to n by i)
     delete(primes,1)
     every write(!sort(primes))
end

J

Generally, this task should be accomplished in J using i.&.(p:inv) . Here we take an approach that's more comparable with the other examples on this page.

Implementation:

sieve=: {{
  r=. 0#t=. y# j=.1
  while. y>j=.j+1 do.
    if. j{t do.
      t=. t > y$j{.1
      r=. r, j
    end.
  end.
}}

Example:

   sieve 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

To see into how this works, we can change the definition:

sieve=: {{
  r=. 0#t=. y# j=.1
  while. y>j=.j+1 do.
    if. j{t do.
      echo j;(y$j{.1);t=. t > y$j{.1
      r=. r, j
    end.
  end.
}}

And go:

   sieve 10
┌─┬───────────────────┬───────────────────┐
│2│1 0 1 0 1 0 1 0 1 0│0 1 0 1 0 1 0 1 0 1│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│3│1 0 0 1 0 0 1 0 0 1│0 1 0 0 0 1 0 1 0 0│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│5│1 0 0 0 0 1 0 0 0 0│0 1 0 0 0 0 0 1 0 0│
└─┴───────────────────┴───────────────────┘
┌─┬───────────────────┬───────────────────┐
│7│1 0 0 0 0 0 0 1 0 0│0 1 0 0 0 0 0 0 0 0│
└─┴───────────────────┴───────────────────┘
2 3 5 7

Thus, here, t would select numbers which have not yet been determined to be a multiple of a prime number.

Janet

Simple, all primes below a limit

Janet has a builtin "buffer" type which is used as a mutable byte string. It has builtin utility methods to handle bit strings (see here :)

This is based off the Python version.

(defn primes-before
  "Gives all the primes < limit"
  [limit]
  (assert (int? limit))
  # Janet has a buffer type (mutable string) which has easy methods for use as bitset
  (def buf-size (math/ceil (/ limit 8)))
  (def is-prime (buffer/new-filled buf-size (bnot 0)))
  (print "Size" buf-size "is-prime: " is-prime)
  (buffer/bit-clear is-prime 0)
  (buffer/bit-clear is-prime 1)
  (for n 0 (math/ceil (math/sqrt limit))
    (if (buffer/bit is-prime n) (loop [i :range-to [(* n n) limit n]]
      (buffer/bit-clear is-prime i))))
  (def res @[]) # Result: Mutable array
  (for i 0 limit
    (if (buffer/bit is-prime i)
      (array/push res i)))
  (def res (array/new limit))
  (for i 0 limit
    (if (buffer/bit is-prime i)
      (array/push res i)))
  res)

Java

Works with: Java version 1.5+

import java.util.LinkedList;

public class Sieve{
       public static LinkedList<Integer> sieve(int n){
               if(n < 2) return new LinkedList<Integer>();
               LinkedList<Integer> primes = new LinkedList<Integer>();
               LinkedList<Integer> nums = new LinkedList<Integer>();

               for(int i = 2;i <= n;i++){ //unoptimized
                       nums.add(i);
               }

               while(nums.size() > 0){
                       int nextPrime = nums.remove();
                       for(int i = nextPrime * nextPrime;i <= n;i += nextPrime){
                               nums.removeFirstOccurrence(i);
                       }
                       primes.add(nextPrime);
               }
               return primes;
       }
}

To optimize by testing only odd numbers, replace the loop marked "unoptimized" with these lines:

nums.add(2);
for(int i = 3;i <= n;i += 2){
       nums.add(i);
}

Version using List:

import java.util.ArrayList;
import java.util.List;
 
public class Eratosthenes {
    public List<Integer> sieve(Integer n) {
        List<Integer> primes = new ArrayList<Integer>(n);
        boolean[] isComposite = new boolean[n + 1];
        for(int i = 2; i <= n; i++) {
            if(!isComposite[i]) {
                primes.add(i);
                for(int j = i * i; j <= n; j += i) {
                    isComposite[j] = true;
                }
            }
        }
        return primes;
    }
}

Version using a BitSet:

import java.util.LinkedList;
import java.util.BitSet;

public class Sieve{
    public static LinkedList<Integer> sieve(int n){
        LinkedList<Integer> primes = new LinkedList<Integer>();
        BitSet nonPrimes = new BitSet(n+1);
        
        for (int p = 2; p <= n ; p = nonPrimes.nextClearBit(p+1)) {
            for (int i = p * p; i <= n; i += p)
                nonPrimes.set(i);
            primes.add(p);
        }
        return primes;
    }
}

Version using a TreeSet:

import java.util.Set;
import java.util.TreeSet;

public class Sieve{
    public static Set<Integer> findPrimeNumbers(int limit) {
    int last = 2;
    TreeSet<Integer> nums = new TreeSet<>();

    if(limit < last) return nums;

    for(int i = last; i <= limit; i++){
      nums.add(i);
    }

    return filterList(nums, last, limit);
  }

  private static TreeSet<Integer> filterList(TreeSet<Integer> list, int last, int limit) {
    int squared = last*last;
    if(squared < limit) {
      for(int i=squared; i <= limit; i += last) {
        list.remove(i);
      }
      return filterList(list, list.higher(last), limit);
    } 
    return list; 
  }
}

Infinite iterator

An iterator that will generate primes indefinitely (perhaps until it runs out of memory), but very slowly.

Translation of: Python

Works with: Java version 1.5+

import java.util.Iterator;
import java.util.PriorityQueue;
import java.math.BigInteger;

// generates all prime numbers
public class InfiniteSieve implements Iterator<BigInteger> {

    private static class NonPrimeSequence implements Comparable<NonPrimeSequence> {
	BigInteger currentMultiple;
	BigInteger prime;

	public NonPrimeSequence(BigInteger p) {
	    prime = p;
	    currentMultiple = p.multiply(p); // start at square of prime
	}
	@Override public int compareTo(NonPrimeSequence other) {
	    // sorted by value of current multiple
	    return currentMultiple.compareTo(other.currentMultiple);
	}
    }

    private BigInteger i = BigInteger.valueOf(2);
    // priority queue of the sequences of non-primes
    // the priority queue allows us to get the "next" non-prime quickly
    final PriorityQueue<NonPrimeSequence> nonprimes = new PriorityQueue<NonPrimeSequence>();

    @Override public boolean hasNext() { return true; }
    @Override public BigInteger next() {
	// skip non-prime numbers
	for ( ; !nonprimes.isEmpty() && i.equals(nonprimes.peek().currentMultiple); i = i.add(BigInteger.ONE)) {
            // for each sequence that generates this number,
            // have it go to the next number (simply add the prime)
            // and re-position it in the priority queue
	    while (nonprimes.peek().currentMultiple.equals(i)) {
		NonPrimeSequence x = nonprimes.poll();
		x.currentMultiple = x.currentMultiple.add(x.prime);
		nonprimes.offer(x);
	    }
	}
	// prime
        // insert a NonPrimeSequence object into the priority queue
	nonprimes.offer(new NonPrimeSequence(i));
	BigInteger result = i;
	i = i.add(BigInteger.ONE);
	return result;
    }

    public static void main(String[] args) {
	Iterator<BigInteger> sieve = new InfiniteSieve();
	for (int i = 0; i < 25; i++) {
	    System.out.println(sieve.next());
	}
    }
}

Output:

Infinite iterator with a faster algorithm (sieves odds-only)

The adding of each discovered prime's incremental step information to the mapping should be postponed until the candidate number reaches the primes square, as it is useless before that point. This drastically reduces the space complexity from O(n/log(n)) to O(sqrt(n/log(n))), in n primes produced, and also lowers the run time complexity due to the use of the hash table based HashMap, which is much more efficient for large ranges.

Translation of: Python

Works with: Java version 1.5+

import java.util.Iterator;
import java.util.HashMap;
 
// generates all prime numbers up to about 10 ^ 19 if one can wait 1000's of years or so...
public class SoEInfHashMap implements Iterator<Long> {

  long candidate = 2;
  Iterator<Long> baseprimes = null;
  long basep = 3;
  long basepsqr = 9;
  // HashMap of the sequences of non-primes
  // the hash map allows us to get the "next" non-prime reasonably quickly
  // but further allows re-insertions to take amortized constant time
  final HashMap<Long,Long> nonprimes = new HashMap<>();

  @Override public boolean hasNext() { return true; }
  @Override public Long next() {
    // do the initial primes separately to initialize the base primes sequence
    if (this.candidate <= 5L) if (this.candidate++ == 2L) return 2L; else {
      this.candidate++; if (this.candidate == 5L) return 3L; else {
        this.baseprimes = new SoEInfHashMap();
        this.baseprimes.next(); this.baseprimes.next(); // throw away 2 and 3
        return 5L;
    } }
    // skip non-prime numbers including squares of next base prime
    for ( ; this.candidate >= this.basepsqr || //equals nextbase squared => not prime
              nonprimes.containsKey(this.candidate); candidate += 2) {
      // insert a square root prime sequence into hash map if to limit
      if (candidate >= basepsqr) { // if square of base prime, always equal
        long adv = this.basep << 1;
        nonprimes.put(this.basep * this.basep + adv, adv);
        this.basep = this.baseprimes.next();
        this.basepsqr = this.basep * this.basep;
      }
      // else for each sequence that generates this number,
      // have it go to the next number (simply add the advance)
      // and re-position it in the hash map at an emply slot
      else {
        long adv = nonprimes.remove(this.candidate);
        long nxt = this.candidate + adv;
        while (this.nonprimes.containsKey(nxt)) nxt += adv; //unique keys
        this.nonprimes.put(nxt, adv);
      }
    }
    // prime
    long tmp = candidate; this.candidate += 2; return tmp;
  }

  public static void main(String[] args) {    
    int n = 100000000;    
    long strt = System.currentTimeMillis();    
    SoEInfHashMap sieve = new SoEInfHashMap();
    int count = 0;
    while (sieve.next() <= n) count++;    
    long elpsd = System.currentTimeMillis() - strt;    
    System.out.println("Found " + count + " primes up to " + n + " in " + elpsd + " milliseconds.");
  }
  
}

Output:

Found 5761455 primes up to 100000000 in 4297 milliseconds.

Infinite iterator with a very fast page segmentation algorithm (sieves odds-only)

Although somewhat faster than the previous infinite iterator version, the above code is still over 10 times slower than an infinite iterator based on array paged segmentation as in the following code, where the time to enumerate/iterate over the found primes (common to all the iterators) is now about half of the total execution time:

Translation of: JavaScript

Works with: Java version 1.5+

import java.util.Iterator;
import java.util.ArrayList;

// generates all prime numbers up to about 10 ^ 19 if one can wait 100's of years or so...
// practical range is about 10^14 in a week or so...
public class SoEPagedOdds implements Iterator<Long> {
  private final int BFSZ = 1 << 16;
  private final int BFBTS = BFSZ * 32;
  private final int BFRNG = BFBTS * 2;
  private long bi = -1;
  private long lowi = 0;
  private final ArrayList<Integer> bpa = new ArrayList<>();
  private Iterator<Long> bps;
  private final int[] buf = new int[BFSZ];
  
  @Override public boolean hasNext() { return true; }
  @Override public Long next() {
    if (this.bi < 1) {
      if (this.bi < 0) {
        this.bi = 0;
        return 2L;
      }
      //this.bi muxt be 0
      long nxt = 3 + (this.lowi << 1) + BFRNG;
      if (this.lowi <= 0) { // special culling for first page as no base primes yet:
          for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
              if ((this.buf[i >>> 5] & (1 << (i & 31))) == 0)
                  for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
                      this.buf[j >>> 5] |= 1 << (j & 31);
      }
      else { // after the first page:
        for (int i = 0; i < this.buf.length; i++)
          this.buf[i] = 0; // clear the sieve buffer
        if (this.bpa.isEmpty()) { // if this is the first page after the zero one:
            this.bps = new SoEPagedOdds(); // initialize separate base primes stream:
            this.bps.next(); // advance past the only even prime of two
            this.bpa.add(this.bps.next().intValue()); // get the next prime (3 in this case)
        }
        // get enough base primes for the page range...
        for (long p = this.bpa.get(this.bpa.size() - 1), sqr = p * p; sqr < nxt;
                p = this.bps.next(), this.bpa.add((int)p), sqr = p * p) ;
        for (int i = 0; i < this.bpa.size() - 1; i++) {
          long p = this.bpa.get(i);
          long s = (p * p - 3) >>> 1;
          if (s >= this.lowi) // adjust start index based on page lower limit...
            s -= this.lowi;
          else {
            long r = (this.lowi - s) % p;
            s = (r != 0) ? p - r : 0;
          }
          for (int j = (int)s; j < BFBTS; j += p)
            this.buf[j >>> 5] |= 1 << (j & 31);
        }
      }
    }
    while ((this.bi < BFBTS) &&
           ((this.buf[(int)this.bi >>> 5] & (1 << ((int)this.bi & 31))) != 0))
        this.bi++; // find next marker still with prime status
    if (this.bi < BFBTS) // within buffer: output computed prime
        return 3 + ((this.lowi + this.bi++) << 1);
    else { // beyond buffer range: advance buffer
        this.bi = 0;
        this.lowi += BFBTS;
        return this.next(); // and recursively loop
    }
  }

  public static void main(String[] args) {    
    long n = 1000000000;
    long strt = System.currentTimeMillis();
    Iterator<Long> gen = new SoEPagedOdds();
    int count = 0;
    while (gen.next() <= n) count++;
    long elpsd = System.currentTimeMillis() - strt;
    System.out.println("Found " + count + " primes up to " + n + " in " + elpsd + " milliseconds.");
  }
  
}

Output:

Found 50847534 primes up to 1000000000 in 3201 milliseconds.

JavaScript

function eratosthenes(limit) {
    var primes = [];
    if (limit >= 2) {
        var sqrtlmt = Math.sqrt(limit) - 2;
        var nums = new Array(); // start with an empty Array...
        for (var i = 2; i <= limit; i++) // and
            nums.push(i); // only initialize the Array once...
        for (var i = 0; i <= sqrtlmt; i++) {
            var p = nums[i]
            if (p)
                for (var j = p * p - 2; j < nums.length; j += p)
                    nums[j] = 0;
        }
        for (var i = 0; i < nums.length; i++) {
            var p = nums[i];
            if (p)
                primes.push(p);
        }
    }
    return primes;
}

var primes = eratosthenes(100);

if (typeof print == "undefined")
    print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print(primes);

outputs:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97

Substituting the following code for the function for an odds-only algorithm using bit packing for the array produces code that is many times faster than the above:

function eratosthenes(limit) {
    var prms = [];
    if (limit >= 2) prms = [2];
    if (limit >= 3) {
        var sqrtlmt = (Math.sqrt(limit) - 3) >> 1;
        var lmt = (limit - 3) >> 1;
        var bfsz = (lmt >> 5) + 1
        var buf = [];
        for (var i = 0; i < bfsz; i++)
            buf.push(0);
        for (var i = 0; i <= sqrtlmt; i++)
            if ((buf[i >> 5] & (1 << (i & 31))) == 0) {
                var p = i + i + 3;
                for (var j = (p * p - 3) >> 1; j <= lmt; j += p)
                    buf[j >> 5] |= 1 << (j & 31);
            }
        for (var i = 0; i <= lmt; i++)
            if ((buf[i >> 5] & (1 << (i & 31))) == 0)
                prms.push(i + i + 3);
    }
    return prms;
}

While the above code is quite fast especially using an efficient JavaScript engine such as Google Chrome's V8, it isn't as elegant as it could be using the features of the new EcmaScript6 specification when it comes out about the end of 2014 and when JavaScript engines including those of browsers implement that standard in that we might choose to implement an incremental algorithm iterators or generators similar to as implemented in Python or F# (yield). Meanwhile, we can emulate some of those features by using a simulation of an iterator class (which is easier than using a call-back function) for an "infinite" generator based on an Object dictionary as in the following odds-only code written as a JavaScript class:

var SoEIncClass = (function () {
    function SoEIncClass() {
        this.n = 0;
    }
    SoEIncClass.prototype.next = function () {
        this.n += 2;
        if (this.n < 7) { // initialization of sequence to avoid runaway:
            if (this.n < 3) { // only even of two:
                this.n = 1; // odds from here...
                return 2;
            }
            if (this.n < 5)
                return 3;
            this.dict = {}; // n must be 5...
            this.bps = new SoEIncClass(); // new source of base primes
            this.bps.next(); // advance past the even prime of two...
            this.p = this.bps.next(); // first odd prime (3 in this case)
            this.q = this.p * this.p; // set guard
            return 5;
        } else { // past initialization:
            var s = this.dict[this.n]; // may or may not be defined...
            if (!s) { // not defined:
                if (this.n < this.q) // haven't reached the guard:
                    return this.n; // found a prime
                else { // n === q => not prime but at guard, so:
                    var p2 = this.p << 1; // the span odds-only is twice prime
                    this.dict[this.n + p2] = p2; // add next composite of prime to dict
                    this.p = this.bps.next();
                    this.q = this.p * this.p; // get next base prime guard
                    return this.next(); // not prime so advance...
                }
            } else { // is a found composite of previous base prime => not prime
                delete this.dict[this.n]; // advance to next composite of this prime:
                var nxt = this.n + s;
                while (this.dict[nxt]) nxt += s; // find unique empty slot in dict
                this.dict[nxt] = s; // to put the next composite for this base prime
                return this.next(); // not prime so advance...
            }
        }
    };
    return SoEIncClass;
})();

The above code can be used to find the nth prime (which would require estimating the required range limit using the previous fixed range code) by using the following code:

var gen = new SoEIncClass(); 
for (var i = 1; i < 1000000; i++, gen.next());
var prime = gen.next();
 
if (typeof print == "undefined")
    print = (typeof WScript != "undefined") ? WScript.Echo : alert;
print(prime);

to produce the following output (about five seconds using Google Chrome's V8 JavaScript engine):

15485863

The above code is considerably slower than the fixed range code due to the multiple method calls and the use of an object as a dictionary, which (using a hash table as its basis for most implementations) will have about a constant O(1) amortized time per operation but has quite a high constant overhead to convert the numeric indices to strings which are then hashed to be used as table keys for the look-up operations as compared to doing this more directly in implementations such as the Python dict with Python's built-in hashing functions for every supported type.

This can be implemented as an "infinite" odds-only generator using page segmentation for a considerable speed-up with the alternate JavaScript class code as follows:

var SoEPgClass = (function () {
    function SoEPgClass() {
        this.bi = -1; // constructor resets the enumeration to start...
    }
    SoEPgClass.prototype.next = function () {
        if (this.bi < 1) {
            if (this.bi < 0) {
                this.bi++;
                this.lowi = 0; // other initialization done here...
                this.bpa = [];
                return 2;
            } else { // bi must be zero:
                var nxt = 3 + (this.lowi << 1) + 262144;
                this.buf = new Array();
                for (var i = 0; i < 4096; i++) // faster initialization:
                    this.buf.push(0);
                if (this.lowi <= 0) { // special culling for first page as no base primes yet:
                    for (var i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
                        if ((this.buf[i >> 5] & (1 << (i & 31))) === 0)
                            for (var j = (sqr - 3) >> 1; j < 131072; j += p)
                                this.buf[j >> 5] |= 1 << (j & 31);
                } else { // after the first page:
                    if (!this.bpa.length) { // if this is the first page after the zero one:
                        this.bps = new SoEPgClass(); // initialize separate base primes stream:
                        this.bps.next(); // advance past the only even prime of two
                        this.bpa.push(this.bps.next()); // get the next prime (3 in this case)
                    }
                    // get enough base primes for the page range...
                    for (var p = this.bpa[this.bpa.length - 1], sqr = p * p; sqr < nxt;
                            p = this.bps.next(), this.bpa.push(p), sqr = p * p) ;
                    for (var i = 0; i < this.bpa.length; i++) {
                        var p = this.bpa[i];
                        var s = (p * p - 3) >> 1;
                        if (s >= this.lowi) // adjust start index based on page lower limit...
                            s -= this.lowi;
                        else {
                            var r = (this.lowi - s) % p;
                            s = (r != 0) ? p - r : 0;
                        }
                        for (var j = s; j < 131072; j += p)
                            this.buf[j >> 5] |= 1 << (j & 31);
                    }
                }
            }
        }
        while (this.bi < 131072 && this.buf[this.bi >> 5] & (1 << (this.bi & 31)))
            this.bi++; // find next marker still with prime status
        if (this.bi < 131072) // within buffer: output computed prime
            return 3 + ((this.lowi + this.bi++) << 1);
        else { // beyond buffer range: advance buffer
            this.bi = 0;
            this.lowi += 131072;
            return this.next(); // and recursively loop
        }
    };
    return SoEPgClass;
})();

The above code is about fifty times faster (about five seconds to calculate 50 million primes to about a billion on the Google Chrome V8 JavaScript engine) than the above dictionary based code.

The speed for both of these "infinite" solutions will also respond to further wheel factorization techniques, especially for the dictionary based version where any added overhead to deal with the factorization wheel will be negligible compared to the dictionary overhead. The dictionary version would likely speed up about a factor of three or a little more with maximum wheel factorization applied; the page segmented version probably won't gain more than a factor of two and perhaps less due to the overheads of array look-up operations.

function is copy-pasted from above to produce a webpage version for beginners:

<script>
function eratosthenes(limit) {
    var primes = [];
    if (limit >= 2) {
        var sqrtlmt = Math.sqrt(limit) - 2;
        var nums = new Array(); // start with an empty Array...
        for (var i = 2; i <= limit; i++) // and
            nums.push(i); // only initialize the Array once...
        for (var i = 0; i <= sqrtlmt; i++) {
            var p = nums[i]
            if (p)
                for (var j = p * p - 2; j < nums.length; j += p)
                    nums[j] = 0;
        }
        for (var i = 0; i < nums.length; i++) {
            var p = nums[i];
            if (p)
                primes.push(p);
        }
    }
    return primes;
}
var primes = eratosthenes(100);
	output='';
        for (var i = 0; i < primes.length; i++) {
		output+=primes[i];	
		if (i < primes.length-1) output+=',';
        }
document.write(output);
</script>

JOVIAL

START
FILE MYOUTPUT ... $ ''Insufficient information to complete this declaration''
PROC SIEVEE $
    '' define the sieve data structure ''
    ARRAY CANDIDATES 1000 B $
    FOR I =0,1,999 $
    BEGIN
        '' everything is potentially prime until proven otherwise ''
        CANDIDATES($I$) = 1$
    END
    '' Neither 1 nor 0 is prime, so flag them off ''
    CANDIDATES($0$) = 0$
    CANDIDATES($1$) = 0$
    '' start the sieve with the integer 0 ''
    FOR I = 0$
    BEGIN
        IF I GE 1000$
        GOTO DONE$
        '' advance to the next un-crossed out number. ''
        '' this number must be a prime ''
NEXTI.  IF I LS 1000 AND Candidates($I$) EQ 0 $
        BEGIN
            I = I + 1 $
            GOTO NEXTI $
        END
        '' insure against running off the end of the data structure ''
        IF I LT 1000 $
        BEGIN
            '' cross out all multiples of the prime, starting with 2*p. ''
            FOR J=2 $
            FOR K=0 $
            BEGIN
                K = J * I $
                IF K GT 999 $
                GOTO ADV $
                CANDIDATES($K$) = 0 $
                J = J + 1 $
            END
            '' advance to the next candidate ''
ADV.        I = I + 1 $
        END
    END
    '' all uncrossed-out numbers are prime (and only those numbers) ''
    '' print all primes ''
DONE. OPEN OUTPUT MYOUTPUT $
    FOR I=0,1,999$
    BEGIN
        IF CANDIDATES($I$) NQ 0$
        BEGIN
            OUTPUT MYOUTPUT I $
        END
    END
TERM$

jq

Works with: jq version 1.4

Bare Bones

Short and sweet ...

# Denoting the input by $n, which is assumed to be a positive integer,
# eratosthenes/0 produces an array of primes less than or equal to $n:
def eratosthenes:

  def erase(i):
    if .[i] then
      reduce (range(2*i; length; i)) as $j (.; .[$j] = false) 
    else .
    end;

  (. + 1) as $n
  | (($n|sqrt) / 2) as $s
  | [null, null, range(2; $n)]
  | reduce (2, 1 + (2 * range(1; $s))) as $i (.; erase($i))
  | map(select(.));

Examples:

100 | eratosthenes

Output:

[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]

1e7 | eratosthenes | length

Output:

664579

Enhanced Sieve

Here is a more economical variant that:

produces a stream of primes less than or equal to a given integer;
only records the status of odd integers greater than 3 during the sieving process;
optimizes the inner loop as described in the task description.

def primes:
  # The array we use for the sieve only stores information for the odd integers greater than 1:
  #  index   integer
  #      0         3
  #      k   2*k + 3
  # So if we wish to mark m = 2*k + 3, the relevant index is: m - 3 / 2
  def ix:
    if . % 2 == 0 then null
    else ((. - 3) / 2)
    end;
    
  # erase(i) sets .[i*j] to false for odd integral j > i, and assumes i is odd
  def erase(i):
    ((i - 3) / 2) as $k
    # Consider relevant multiples:
    then (((length * 2 + 3) / i)) as $upper
    # ... only consider odd multiples from i onwards
    | reduce range(i; $upper; 2) as $j (.;
         (((i * $j) - 3) / 2) as $m
         | if .[$m] then .[$m] = false else . end);

  if . < 2 then []
  else (. + 1) as $n
  | (($n|sqrt) / 2) as $s
  | [range(3; $n; 2)|true]
  | reduce (1 + (2 * range(1; $s)) ) as $i (.; erase($i))
  | . as $sieve
  | 2, (range(3; $n; 2) | select($sieve[ix]))
  end ;

def count(s): reduce s as $_ (0; .+1);

count(1e6 | primes)

Output:

Julia

Started with 2 already in the array, and then test only for odd numbers and push the prime ones onto the array.

# Returns an array of positive prime numbers less than or equal to lim
function sieve(lim :: Int)
    if lim < 2 return [] end
    limi :: Int = (lim - 1) ÷ 2 # calculate the required array size
    isprime :: Array{Bool} = trues(limi)
    llimi :: Int = (isqrt(lim) - 1) ÷ 2 # and calculate maximum root prime index
    result :: Array{Int} = [2]  #Initial array
    for i in 1:limi
        if isprime[i]
            p = i + i + 1 # 2i + 1
            if i <= llimi
                for j = (p*p-1)>>>1:p:limi # quick shift/divide in case LLVM doesn't optimize divide by 2 away
                    isprime[j] = false
                end
            end
            push!(result, p)
        end
    end
    return result
end

Alternate version using findall to get all primes at once in the end

function sieve(n::Integer)
    primes = fill(true, n)
    primes[1] = false
    for p in 2:n
        primes[p] || continue
        primes[p .* (2:n÷p)] .= false
    end
    findall(primes)
end

At about 35 seconds for a range of a billion on my Intel Atom i5-Z8350 CPU at 1.92 GHz (single threaded) or about 70 CPU clock cycles per culling operation, the above examples are two of the very slowest ways to compute the Sieve of Eratosthenes over any kind of a reasonable range due to a couple of factors:

The output primes are extracted to a result array which takes time (and memory) to construct.
They use the naive "one huge memory array" method, which has poor memory access speed for larger ranges.

Even though the first uses an odds-only algorithm (not noted in the text as is a requirement of the task) that reduces the number of operations by a factor of about two and a half times, it is not faster than the second, which is not odds-only due to the second being set up to take advantage of the `findall` function to directly output the indices of the remaining true values as the found primes; the second is faster due to the first taking longer to push the found primes singly to the constructed array, whereas internally the second first creates the array to the size of the counted true values and then just fills it.

Also, the first uses more memory than necessary in one byte per `Bool` where using a `BitArray` as in the second reduces this by a factor of eight.

If one is going to "crib" the MatLab algorithm as above, one may as well do it using odds-only as per the MatLab built-in. The following alternate code improves on the "Alternate" example above by making it sieve odds-only and adjusting the result array contents after to suit:

function sieve2(n :: Int)
    ni = (n - 1) ÷ 2
    isprime = trues(ni)
    for i in 1:ni
        if isprime[i]
            j = 2i * (i + 1)
            if j > ni
                m = findall(isprime)
                map!((i::Int) -> 2i + 1, m, m)
                return pushfirst!(m, 2)
            else
                p = 2i + 1
                while j <= ni
                  isprime[j] = false
                  j += p
                end
            end
        end
    end
end

This takes less about 18.5 seconds or 36 CPU cycles per culling operation to find the primes to a billion, but that is still quite slow compared to what can be done below. Note that the result array needs to be created then copied, created by the findall function, then modified in place by the map! function to transform the indices to primes, and finally copied by the pushfirst! function to add the only even prime of two to the beginning, but these operations are quire fast. However, this still consumes a lot of memory, as in about 64 Megabytes for the sieve buffer and over 400 Megabytes for the result (8-byte Int's for 64 bit execution) to sieve to a billion, and culling the huge culling buffer that doesn't fit the CPU cache sizes is what makes it slow.

Iterator Output

The creation of an output results array is not necessary if the purpose is just to scan across the resulting primes once, they can be output using an iterator (from a `BitArray`) as in the following odds-only code:

const Prime = UInt64

struct Primes
    rangei :: Int64
    primebits :: BitArray{1}
    function Primes(n :: Int64)
        if n < 3
          if n < 2 return new(-1, falses(0)) # no elements
          else return new((0, trues(0))) end # n = 2: meaning is 1 element of 2
        end
        limi :: Int = (n - 1) ÷ 2 # calculate the required array size
        isprimes :: BitArray = trues(limi)
        @inbounds(
        for i in 1:limi
            p = i + i + 1
            start = (p * p - 1) >>> 1 # shift/divide if LLVM doesn't optimize
            if start > limi
                return new(limi, isprimes)
            end
            if isprimes[i]
                for j in start:p:limi
                  isprimes[j] = false
                end
            end
        end)
    end
end

Base.eltype(::Type{Primes}) = Prime

function Base.length(P::Primes)::Int64
    if P.rangei < 0 return 0 end
    return 1 + count(P.primebits)
end

function Base.iterate(P::Primes, state::Int = 0)::
                                        Union{Tuple{Prime, Int}, Nothing}
    lmt = P.rangei
    if state > lmt return nothing end
    if state <= 0 return (UInt64(2), 1) end
    let
        prmbts = P.primebits
        i = state
        @inbounds(
        while i <= lmt && !prmbts[i] i += 1 end)
        if i > lmt return nothing end
        return (i + i + 1, i + 1)
    end
end

for which using the following code:

function bench()
  @time length(Primes(100)) # warm up JIT
#  println(@time count(x->true, Primes(1000000000))) # about 1.5 seconds slower counting over iteration
  println(@time length(Primes(1000000000)))
end
bench()

results in the following output:

Output:

  0.000031 seconds (3 allocations: 160 bytes)
 12.214533 seconds (4 allocations: 59.605 MiB, 0.42% gc time)
50847534

This reduces the CPU cycles per culling cycles to about 24.4, but it's still slow due to using the one largish array. Note that counting each iterated prime takes an additional about one and a half seconds, where if all that is required is the count of primes over a range the specialized length function is much faster.

Page Segmented Algorithm

For any kind of reasonably large range such as a billion, a page segmented version should be used with the pages sized to the CPU caches for much better memory access times. As well, the following odds-only example uses a custom bit packing algorithm for a further two times speed-up, also reducing the memory allocation delays by reusing the sieve buffers when possible (usually possible):

const Prime = UInt64
const BasePrime = UInt32
const BasePrimesArray = Array{BasePrime,1}
const SieveBuffer = Array{UInt8,1}

# contains a lazy list of a secondary base primes arrays feed
# NOT thread safe; needs a Mutex gate to make it so...
abstract type BPAS end # stands in for BasePrimesArrays, not defined yet
mutable struct BasePrimesArrays <: BPAS
    thunk :: Union{Nothing,Function} # problem with efficiency - untyped function!!!!!!!!!
    value :: Union{Nothing,Tuple{BasePrimesArray, BPAS}}
    BasePrimesArrays(thunk::Function) = new(thunk)
end
Base.eltype(::Type{BasePrimesArrays}) = BasePrime
Base.IteratorSize(::Type{BasePrimesArrays}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(BPAs::BasePrimesArrays, state::BasePrimesArrays = BPAs)
    if state.thunk !== nothing
        newvalue :: Union{Nothing,Tuple{BasePrimesArray, BasePrimesArrays}} =
            state.thunk() :: Union{Nothing,Tuple{BasePrimesArray
                                                 , BasePrimesArrays}}
        state.value = newvalue
        state.thunk = nothing
        return newvalue
    end
    state.value
end

# count the number of zero bits (primes) in a byte array,
# also works for part arrays/slices, best used as an `@view`...
function countComposites(cmpsts::AbstractArray{UInt8,1})
    foldl((a, b) -> a + count_zeros(b), cmpsts; init = 0)
end

# converts an entire sieved array of bytes into an array of UInt32 primes,
# to be used as a source of base primes...
function composites2BasePrimesArray(low::Prime, cmpsts::SieveBuffer)
    limiti = length(cmpsts) * 8
    len :: Int = countComposites(cmpsts)
    rslt :: BasePrimesArray = BasePrimesArray(undef, len)
    i :: Int = 0
    j :: Int = 1
    @inbounds(
    while i < limiti
        if cmpsts[i >>> 3 + 1] & (1 << (i & 7)) == 0
            rslt[j] = low + i + i
            j += 1
        end
        i += 1
    end)
    rslt
end

# sieving work done, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
function sieveComposites(low::Prime, buffer::Array{UInt8,1},
                                     bpas::BasePrimesArrays)
    lowi :: Int = (low - 3) ÷ 2
    len :: Int = length(buffer)
    limiti :: Int = len * 8 - 1
    nexti :: Int = lowi + limiti
    for bpa::BasePrimesArray in bpas
        for bp::BasePrime in bpa
            bpint :: Int = bp
            bpi :: Int = (bpint - 3) >>> 1
            starti :: Int = 2 * bpi * (bpi + 3) + 3
            starti >= nexti && return
            if starti >= lowi starti -= lowi
            else
                r :: Int = (lowi - starti) % bpint
                starti = r == 0 ? 0 : bpint - r
            end
            lmti :: Int = limiti - 40 * bpint
            @inbounds(
            if bpint <= (len >>> 2) starti <= lmti
                for i in 1:8
                    if starti > limiti break end
                    mask = convert(UInt8,1) << (starti & 7)
                    c = starti >>> 3 + 1
                    while c <= len
                        buffer[c] |= mask
                        c += bpint
                    end
                    starti += bpint
                end
            else
                c = starti
                while c <= limiti
                    buffer[c >>> 3 + 1] |= convert(UInt8,1) << (c & 7)
                    c += bpint
                end
            end)
        end
    end
    return
end

# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
function makeBasePrimesArrays() :: BasePrimesArrays
    cmpsts :: SieveBuffer = Array{UInt8,1}(undef, 512)
    function nextelem(low::Prime, bpas::BasePrimesArrays) ::
                                    Tuple{BasePrimesArray, BasePrimesArrays}
        # calculate size so that the bit span is at least as big as the
        # maximum culling prime required, rounded up to minsizebits blocks...
        reqdsize :: Int = 2 + isqrt(1 + low)
        size :: Int = (reqdsize ÷ 4096 + 1) * 4096 ÷ 8 # size in bytes
        if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
        fill!(cmpsts, 0)
        sieveComposites(low, cmpsts, bpas)
        arr :: BasePrimesArray = composites2BasePrimesArray(low, cmpsts)
        next :: Prime = low + length(cmpsts) * 8 * 2
        arr, BasePrimesArrays(() -> nextelem(next, bpas))
    end
    # pre-seeding breaks recursive race,
    # as only known base primes used for first page...
    preseedarr :: BasePrimesArray = # pre-seed to 100, can sieve to 10,000...
        [ 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
        , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
        ]
    nextfunc :: Function = () ->
        (nextelem(convert(Prime,101), makeBasePrimesArrays()))
    firstfunc :: Function = () -> (preseedarr, BasePrimesArrays(nextfunc))
    BasePrimesArrays(firstfunc)
end

# an iterator over successive sieved buffer composite arrays,
# returning a tuple of the value represented by the lowest possible prime
# in the sieved composites array and the array itself;
# the array has a 16 Kilobytes minimum size (CPU L1 cache), but
# will grow so that the bit span is larger than the
# maximum culling base prime required, possibly making it larger than
# the L1 cache for large ranges, but still reasonably efficient using
# the L2 cache: very efficient up to about 16e9 range;
# reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 week...
struct PrimesPages
    baseprimes :: BasePrimesArrays
    PrimesPages() = new(makeBasePrimesArrays())
end
Base.eltype(::Type{PrimesPages}) = SieveBuffer
Base.IteratorSize(::Type{PrimesPages}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PP::PrimesPages,
                      state :: Tuple{Prime,SieveBuffer} =
                            ( convert(Prime,3), Array{UInt8,1}(undef,16384) ))
    (low, cmpsts) = state
    # calculate size so that the bit span is at least as big as the
    # maximum culling prime required, rounded up to minsizebits blocks...
    reqdsize :: Int = 2 + isqrt(1 + low)
    size :: Int = (reqdsize ÷ 131072 + 1) * 131072 ÷ 8 # size in bytes
    if size > length(cmpsts) cmpsts = Array{UInt8,1}(undef, size) end
    fill!(cmpsts, 0)
    sieveComposites(low, cmpsts, PP.baseprimes)
    newlow :: Prime = low + length(cmpsts) * 8 * 2
    ( low, cmpsts ), ( newlow, cmpsts )
end

function countPrimesTo(range::Prime) :: Int64
    range < 3 && ((range < 2 && return 0) || return 1)
    count :: Int64 = 1
    for ( low, cmpsts ) in PrimesPages() # almost never exits!!!
        if low + length(cmpsts) * 8 * 2 > range
            lasti :: Int = (range - low) ÷ 2
            count += countComposites(@view cmpsts[1:lasti >>> 3])
            count += count_zeros(cmpsts[lasti >>> 3 + 1] |
                                 (0xFE << (lasti & 7)))
            return count
        end
        count += countComposites(cmpsts)
    end
    count
end

# iterator over primes from above page iterator;
# unless doing something special with individual primes, usually unnecessary;
# better to do manipulations based on the composites bit arrays...
# takes at least as long to enumerate the primes as sieve them...
mutable struct PrimesPaged
    primespages :: PrimesPages
    primespageiter :: Tuple{Tuple{Prime,SieveBuffer},Tuple{Prime,SieveBuffer}}
    PrimesPaged() = let PP = PrimesPages(); new(PP, Base.iterate(PP)) end
end
Base.eltype(::Type{PrimesPaged}) = Prime
Base.IteratorSize(::Type{PrimesPaged}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PP::PrimesPaged, state::Int = -1 )
    state < 0 && return Prime(2), 0
    (low, cmpsts) = PP.primespageiter[1]
    len = length(cmpsts) * 8
    @inbounds(
    while state < len && cmpsts[state >>> 3 + 1] &
                         (UInt8(1) << (state & 7)) != 0
        state += 1
    end)
    if state >= len
        PP.primespageiter = Base.iterate(PP.primespages, PP.primespageiter[2])
        return Base.iterate(PP, 0)
    end
    low + state + state, state + 1
end

When tested with the following code:

function bench()
    print("( ")
    for p in PrimesPaged() p > 100 && break; print(p, " ") end
    println(")")
    countPrimesTo(Prime(100)) # warm up JIT
#=
    println(@time let count = 0
                      for p in PrimesPaged()
                          p > 1000000000 && break
                          count += 1
                      end; count end) # much slower counting over iteration
=#
    println(@time countPrimesTo(Prime(1000000000)))
end
bench()

it produces the following:

Output:

( 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 )
  1.947145 seconds (59 allocations: 39.078 KiB)
50847534

Note that "the slow way" as commented out in the code takes an extra about 4.85 seconds to count the primes to a billion, or longer to enumerate the primes than to cull the composites; this makes further work in making this yet faster pointless unless techniques such as the one used here to count the number of found primes by just counting the un-cancelled bit representations in the sieved sieve buffers are used.

This takes about 1.9 seconds to count the primes to a billion (using the fast technique), or about 3.75 clock cycles per culling operation, which is reasonably fast; this is almost 20 times faster the the first naive sieves. As written, the algorithm maintains its efficiency up to about 16 billion and then slows down as the buffer size increases beyond the CPU L1 cache size into the L2 cache size such that it takes about 436.8 seconds to sieve to 100 billion instead of the expected about 300 seconds; however, an extra feature of "double buffered sieving" could be added so that the buffer is sieved in L1 cache slices followed by a final sweep of the entire buffer by the few remaining cull operations that use the larger primes for only a slight reduction in average cycles per cull up to a range of about 2.56e14 (for this CPU). For really large ranges above that, another sieving technique known as the "bucket sieve" that sorts the culling operations by page so that processing time is not expended for values that don't "hit" a given page can be used for only a slight additional reduction in efficiency.

Additionally, maximal wheel factorization can reduce the time by about a factor of four, plus multi-processing where the work is shared across the CPU cores can produce a further speed-up by the factor of the number of cores (only three times on this four-core machine due to the clock speed reducing to 75% of the rate when all cores are used), for an additional about 12 times speed-up for this CPU. These improvements are just slightly too complex to post here.

However, even the version posted shows that the naive "one huge array" implementations should never be used for sieving ranges of over a few million, and that Julia can come very close to the speed of the fastest languages such as C/C++ for the same algorithm.

Functional Algorithm

One of the best simple purely functional Sieve of Eratosthenes algorithms is the infinite tree folding sequence algorithm as implemented in Haskell. As Julia does not have a standard LazyList implementation or library and as a full memoizing lazy list is not required for this algorithm, the following odds-only code implements the rudiments of a Co-Inductive Stream (CIS) in its implementation:

const Thunk = Function # can't define other than as a generalized Function

struct CIS{T}
    head :: T
    tail :: Thunk # produces the next CIS{T}
    CIS{T}(head :: T, tail :: Thunk) where T = new(head, tail)
end
Base.eltype(::Type{CIS{T}}) where T = T
Base.IteratorSize(::Type{CIS{T}}) where T = Base.SizeUnknown()
function Base.iterate(C::CIS{T}, state = C) :: Tuple{T, CIS{T}} where T
    state.head, state.tail()
end

function treefoldingprimes()::CIS{Int}
    function merge(xs::CIS{Int}, ys::CIS{Int})::CIS{Int}
        x = xs.head; y = ys.head
        if x < y CIS{Int}(x, () -> merge(xs.tail(), ys))
        elseif y < x CIS{Int}(y, () -> merge(xs, ys.tail()))
        else CIS{Int}(x, () -> merge(xs.tail(), ys.tail())) end
    end
    function pmultiples(p::Int)::CIS{Int}
        adv :: Int = p + p
        next(c::Int)::CIS{Int} = CIS{Int}(c, () -> next(c + adv)); next(p * p)
    end
    function allmultiples(ps::CIS{Int})::CIS{CIS{Int}}
        CIS{CIS{Int}}(pmultiples(ps.head), () -> allmultiples(ps.tail()))
    end
    function pairs(css :: CIS{CIS{Int}})::CIS{CIS{Int}}
        nextcss = css.tail()
        CIS{CIS{Int}}(merge(css.head, nextcss.head), ()->pairs(nextcss.tail()))
    end
    function composites(css :: CIS{CIS{Int}})::CIS{Int}
        CIS{Int}(css.head.head, ()-> merge(css.head.tail(),
                                            css.tail() |> pairs |> composites))
    end
    function minusat(n::Int, cs::CIS{Int})::CIS{Int}
        if n < cs.head CIS{Int}(n, () -> minusat(n + 2, cs))
        else minusat(n + 2, cs.tail()) end
    end
    oddprimes()::CIS{Int} = CIS{Int}(3, () -> minusat(5, oddprimes()
                                        |> allmultiples |> composites))
    CIS{Int}(2, () -> oddprimes())
end

when tested with the following:

@time let count = 0; for p in treefoldingprimes() p > 1000000 && break; count += 1 end; count end

it outputs the following:

Output:

  1.791058 seconds (10.23 M allocations: 290.862 MiB, 3.64% gc time)
78498

At about 1.8 seconds or 4000 cycles per culling operation to calculate the number of primes up to a million, this is very slow, but that is not the fault of Julia but rather just that purely functional incremental Sieve of Eratosthenes implementations are much slower than those using mutable arrays and are only useful over quite limited ranges of a few million. For one thing, incremental algorithms have O(n log n log log n) asymptotic execution complexity rather than O(n log log n) (an extra log n factor) and for another the constant execution overhead is much larger in creating (and garbage collecting) elements in the sequences.

The time for this algorithm is quite comparable to as implemented in other functional languages such as F# and actually faster than implementing the same algorithm in C/C++, but slower than as implemented in purely functional languages such as Haskell or even in only partly functional languages such as Kotlin by a factor of ten or more; this is due to those languages having specialized memory allocation that is very fast at allocating small amounts of memory per allocation as is often a requirement of functional programming. The majority of the time spent for this algorithm is spent allocating memory, and if future versions of Julia are to be of better use in purely functional programming, improvements need to be made to the memory allocation.

Infinite (Mutable) Iterator Using (Mutable) Dictionary

To gain some extra speed above the purely functional algorithm above, the Python'ish version as a mutable iterator embedding a mutable standard base Dictionary can be used. The following version uses a secondary delayed injection stream of "base" primes defined recursively to provide the successions of composite values in the Dictionary to be used for sieving:

const Prime = UInt64
abstract type PrimesDictAbstract end # used for forward reference
mutable struct PrimesDict <: PrimesDictAbstract
    sieve :: Dict{Prime,Prime}
    baseprimes :: PrimesDictAbstract
    lastbaseprime :: Prime
    q :: Prime
    PrimesDict() = new(Dict())
end
Base.eltype(::Type{PrimesDict}) = Prime
Base.IteratorSize(::Type{PrimesDict}) = Base.SizeUnknown() # "infinite"...
function Base.iterate(PD::PrimesDict, state::Prime = Prime(0) )
    if state < 1
        PD.baseprimes = PrimesDict()
        PD.lastbaseprime = Prime(3)
        PD.q = Prime(9)
        return Prime(2), Prime(1)
    end
    dict = PD.sieve
    while true
        state += 2
        if !haskey(dict, state)
            state < PD.q && return state, state
            p = PD.lastbaseprime # now, state = PD.q in all cases
            adv = p + p # since state is at PD.q, advance to next
            dict[state + adv] = adv # adds base prime composite stream
            # following initializes secondary base strea first time
            p <= 3 && Base.iterate(PD.baseprimes)
            p = Base.iterate(PD.baseprimes, p)[1] # next base prime
            PD.lastbaseprime = p
            PD.q = p * p
        else # advance hit composite in dictionary...
            adv = pop!(dict, state)
            next = state + adv
            while haskey(dict, next) next += adv end
            dict[next] = adv # past other composite hits in dictionary
        end
    end
end

The above version can be used and tested with similar code as for the functional version, but is about ten times faster at about 400 CPU clock cycles per culling operation, meaning it has a practical range ten times larger although it still has a O(n (log n) (log log n)) asymptotic performance complexity; for larger ranges such as sieving to a billion or more, this is still over a hundred times slower than the page segmented version using a page segmented sieving array.

Klingphix

include ..\Utilitys.tlhy

%limit %i
1000 !limit
( 1 $limit ) sequence

( 2 $limit sqrt int ) [ !i $i get [ ( 2 $limit 1 - $i / int ) [ $i * false swap set ] for ] if ] for
( 1 $limit false ) remove
pstack

"Press ENTER to end " input

Kotlin

import kotlin.math.sqrt

fun sieve(max: Int): List<Int> {
    val xs = (2..max).toMutableList()
    val limit = sqrt(max.toDouble()).toInt()
    for (x in 2..limit) xs -= x * x..max step x
    return xs
}

fun main(args: Array<String>) {
    println(sieve(100))
}

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Alternative much faster odds-only version that outputs an enumeration

The above version is quite slow for a lot of reasons: It includes even number culling even though those will be eliminated on the first pass; It uses a list rather than an array to do the composite culling (both of the above reasons also meaning it takes more memory); It uses enumerations (for..in) to implement loops at a execution time cost per loop. It also consumes more memory in the final result output as another list.

The following code overcomes most of those problems: It only culls odd composites; it culls a bit-packed primitive array (also saving memory); It uses tailcall recursive functions for the loops, which are compiled into simple loops. It also outputs the results as an enumeration, which isn't fast but does not consume any more memory than the culling array. In this way, the program is only limited in sieving range by the maximum size limit of the culling array, although as it grows larger than the CPU cache sizes, it loses greatly in speed; however, that doesn't matter so much if just enumerating the results.

fun primesOdds(rng: Int): Iterable<Int> {
    val topi = (rng - 3) shr 1
    val lstw = topi shr 5
    val sqrtndx = (Math.sqrt(rng.toDouble()).toInt() - 3) shr 1
    val cmpsts = IntArray(lstw + 1)

    tailrec fun testloop(i: Int) {
        if (i <= sqrtndx) {
            if (cmpsts[i shr 5] and (1 shl (i and 31)) == 0) {
                val p = i + i + 3
                tailrec fun cullp(j: Int) {
                    if (j <= topi) {
                        cmpsts[j shr 5] = cmpsts[j shr 5] or (1 shl (j and 31))
                        cullp(j + p)
                    }
                }
                cullp((p * p - 3) shr 1)
            }
            testloop(i + 1)
        }
    }

    tailrec fun test(i: Int): Int {
        return if (i <= topi && cmpsts[i shr 5] and (1 shl (i and 31)) != 0) {
            test(i + 1)
        } else {
            i
        }
    }

    testloop(0)

    val iter = object : IntIterator() {
        var i = -1
        override fun nextInt(): Int {
            val oi = i
            i = test(i + 1)
            return if (oi < 0) 2 else oi + oi + 3
        }
        override fun hasNext() = i < topi
    }
    return Iterable { -> iter }
}

fun main(args: Array<String>) {
    primesOdds(100).forEach { print("$it ") }
    println()
    println(primesOdds(1000000).count())
}

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
78498

Concise Functional Versions

Ah, one might say, for such a trivial range one writes for conciseness and not for speed. Well, I say, one can still save memory and some time using odds-only and a bit-packed array, but write very clear and concise (but slower) code using nothing but higher order functions and function calling. The following code using such techniques can use the same "main" function for the same output but is about two times slower, mostly due to the extra time spent making (nested) function calls, including the function calls necessary for enumeration. Note that the effect of using the "(l .. h).forEach { .. }" is the same as the "for i in l .. h { .. }" as both use an iteration across the range but the second is just syntax sugar to make it look more imperative:

fun primesOdds(rng: Int): Iterable<Int> {
    val topi = (rng - 3) / 2 //convert to nearest index
    val size = topi / 32 + 1 //word size to include index
    val sqrtndx = (Math.sqrt(rng.toDouble()).toInt() - 3) / 2
    val cmpsts = IntArray(size)
    fun is_p(i: Int) = cmpsts[i shr 5] and (1 shl (i and 0x1F)) == 0
    fun cull(i: Int) { cmpsts[i shr 5] = cmpsts[i shr 5] or (1 shl (i and 0x1F)) }
    fun cullp(p: Int) = (((p * p - 3) / 2 .. topi).step(p)).forEach { cull(it) }
    (0 .. sqrtndx).filter { is_p(it) }.forEach { cullp(it + it + 3) }
    fun i2p(i: Int) = if (i < 0) 2 else i + i + 3
    val orng = (-1 .. topi).filter { it < 0 || is_p(it) }.map { i2p(it) }
    return Iterable { -> orng.iterator() }
}

The trouble with the above version is that, at least for Kotlin version 1.0, the ".filter" and ".map" extension functions for Iterable<Int> create Java "ArrayList"'s as their output (which are wrapped to return the Kotlin "List<Int>" interface), thus take a considerable amount of memory worse than the first version (using an ArrayList to store the resulting primes), since as the calculations are chained to ".map", require a second ArrayList of up to the same size while the mapping is being done. The following version uses Sequences , which aren't backed by any permanent structure, but it is another small factor slower due to the nested function calls:

fun primesOdds(rng: Int): Iterable<Int> {
    val topi = (rng - 3) / 2 //convert to nearest index
    val size = topi / 32 + 1 //word size to include index
    val sqrtndx = (Math.sqrt(rng.toDouble()).toInt() - 3) / 2
    val cmpsts = IntArray(size)
    fun is_p(i: Int) = cmpsts[i shr 5] and (1 shl (i and 0x1F)) == 0
    fun cull(i: Int) { cmpsts[i shr 5] = cmpsts[i shr 5] or (1 shl (i and 0x1F)) }
    fun iseq(high: Int, low: Int = 0, stp: Int = 1) =
            Sequence { (low .. high step(stp)).iterator() }
    fun cullp(p: Int) = iseq(topi, (p * p - 3) / 2, p).forEach { cull(it) }
    iseq(sqrtndx).filter { is_p(it) }.forEach { cullp(it + it + 3) }
    fun i2p(i: Int) = if (i < 0) 2 else i + i + 3
    val oseq = iseq(topi, -1).filter { it < 0 || is_p(it) }.map { i2p(it) }
    return Iterable { -> oseq.iterator() }
}

Unbounded Versions

An incremental odds-only sieve outputting a sequence (iterator)

The following Sieve of Eratosthenes is not purely functional in that it uses a Mutable HashMap to store the state of succeeding composite numbers to be skipped over, but embodies the principles of an incremental implementation of the Sieve of Eratosthenes sieving odds-only and is faster than most incremental sieves due to using mutability. As with the fastest of this kind of sieve, it uses a delayed secondary primes feed as a source of base primes to generate the composite number progressions. The code as follows:

fun primesHM(): Sequence<Int> = sequence {
    yield(2)
    fun oddprms(): Sequence<Int> = sequence {
        yield(3); yield(5) // need at least 2 for initialization
        val hm = HashMap<Int,Int>()
        hm.put(9, 6)
        val bps = oddprms().iterator(); bps.next(); bps.next() // skip past 5
        yieldAll(generateSequence(SieveState(7, 5, 25)) {
            ss ->
                var n = ss.n; var q = ss.q
                n += 2
                while ( n >= q || hm.containsKey(n)) {
                    if (n >= q) {
                        val inc = ss.bp shl 1
                        hm.put(n + inc, inc)
                        val bp = bps.next(); ss.bp = bp; q = bp * bp
                    }
                    else {
                        val inc = hm.remove(n)!!
                        var next = n + inc
                        while (hm.containsKey(next)) {
                            next += inc
                        }
                        hm.put(next, inc)
                    }
                    n += 2
                }
                ss.n = n; ss.q = q
                ss
        }.map { it.n })
    }
    yieldAll(oddprms())
}

At about 370 clock cycles per culling operation (about 3,800 cycles per prime) on my tablet class Intel CPU, this is not blazing fast but adequate for ranges of a few millions to a hundred million and thus fine for doing things like solving Euler problems. For instance, Euler Problem 10 of summing the primes to two million can be done with the following "one-liner":

primesHM().takeWhile { it <= 2_000_000 }.map { it.toLong() }.sum()

to output the correct answer of the following in about 270 milliseconds for my Intel x5-Z8350 at 1.92 Gigahertz:

Output:

142913828922

A purely functional Incremental Sieve of Eratosthenes that outputs a sequence (iterator)

Following is a Kotlin implementation of the Tree Folding Incremental Sieve of Eratosthenes from an adaptation of the algorithm by Richard Bird. It is based on lazy lists, but in fact the memoization (and cost in execution time) of a lazy list is not required and the following code uses a "roll-your-own" implementation of a Co-Inductive Stream CIS). The final output is as a Sequence for convenience in using it. The code is written as purely function in that no mutation is used:

Translation of: Haskell

data class CIS<T>(val head: T, val tailf: () -> CIS<T>) {
  fun toSequence() = generateSequence(this) { it.tailf() } .map { it.head }
}

fun primes(): Sequence<Int> {
  fun merge(a: CIS<Int>, b: CIS<Int>): CIS<Int> {
    val ahd = a.head; val bhd = b.head
    if (ahd > bhd) return CIS(bhd) { ->merge(a, b.tailf()) }
    if (ahd < bhd) return CIS(ahd) { ->merge(a.tailf(), b) }
    return CIS(ahd) { ->merge(a.tailf(), b.tailf()) }
  }
  fun bpmults(p: Int): CIS<Int> {
    val inc = p + p
    fun mlts(c: Int): CIS<Int> = CIS(c) { ->mlts(c + inc) }
    return mlts(p * p)
  }
  fun allmults(ps: CIS<Int>): CIS<CIS<Int>> = CIS(bpmults(ps.head)) { allmults(ps.tailf()) }
  fun pairs(css: CIS<CIS<Int>>): CIS<CIS<Int>> {
    val xs = css.head; val yss = css.tailf(); val ys = yss.head
    return CIS(merge(xs, ys)) { ->pairs(yss.tailf()) }
  }
  fun union(css: CIS<CIS<Int>>): CIS<Int> {
    val xs = css.head
    return CIS(xs.head) { -> merge(xs.tailf(), union(pairs(css.tailf()))) }
  }
  tailrec fun minus(n: Int, cs: CIS<Int>): CIS<Int> =
    if (n >= cs.head) minus(n + 2, cs.tailf()) else CIS(n) { ->minus(n + 2, cs) }    
  fun oddprms(): CIS<Int> = CIS(3) { -> CIS(5) { ->minus(7, union(allmults(oddprms()))) } }
  return CIS(2) { ->oddprms() } .toSequence()
}

fun main(args: Array<String>) {
  val limit = 1000000
  val strt = System.currentTimeMillis()
  println(primes().takeWhile { it <= limit } .count())
  val stop = System.currentTimeMillis()
  println("Took ${stop - strt} milliseconds.")
}

The code is about five times slower than the more imperative hash table based version immediately above due to the costs of the extra levels of function calls in the functional style. The Haskell version from which this is derived is much faster due to the extensive optimizations it does to do with function/closure "lifting" as well as a Garbage Collector specifically tuned for functional code.

An unbounded Page Segmented Sieve of Eratosthenes that can output a sequence (iterator)

The very fastest implementations of a primes sieve are all based on bit-packed mutable arrays which can be made unbounded by setting them up so that they are a succession of sieved bit-packed arrays that have been culled of composites. The following code is an odds=only implementation that, again, uses a secondary feed of base primes that is only expanded as necessary (in this case memoized by a rudimentary lazy list structure to avoid recalculation for every base primes sweep per page segment):

internal typealias Prime = Long
internal typealias BasePrime = Int
internal typealias BasePrimeArray = IntArray
internal typealias SieveBuffer = ByteArray

// contains a lazy list of a secondary base prime arrays feed
internal data class BasePrimeArrays(val arr: BasePrimeArray,
                                     val rest: Lazy<BasePrimeArrays?>)
                                                : Sequence<BasePrimeArray> {
    override fun iterator() =
        generateSequence(this) { it.rest.value }
            .map { it.arr }.iterator()
}

// count the number of zero bits (primes) in a byte array,
fun countComposites(cmpsts: SieveBuffer): Int {
    var cnt = 0
    for (b in cmpsts) {
        cnt += java.lang.Integer.bitCount(b.toInt().and(0xFF))
    }
    return cmpsts.size.shl(3) - cnt
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
fun composites2BasePrimeArray(low: Int, cmpsts: SieveBuffer)
                                                            : BasePrimeArray {
    val lmti = cmpsts.size.shl(3)
    val len = countComposites(cmpsts)
    val rslt = BasePrimeArray(len)
    var j = 0
    for (i in 0 until lmti) {
        if (cmpsts[i.shr(3)].toInt() and 1.shl(i and 7) == 0) {
            rslt[j] = low + i + i; j++
        }
    }
    return rslt
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
fun sieveComposites(low: Prime, buffer: SieveBuffer,
                             bpas: Sequence<BasePrimeArray>) {
    val lowi = (low - 3L).shr(1)
    val len = buffer.size
    val lmti = len.shl(3)
    val nxti = lowi + lmti.toLong()
    for (bpa in bpas) {
        for (bp in bpa) {
            val bpi = (bp - 3).shr(1).toLong()
            var strti = (bpi * (bpi + 3L)).shl(1) + 3L
            if (strti >= nxti) return
            val s0 =
                if (strti >= lowi) (strti - lowi).toInt()
                else {
                    val r = (lowi - strti) % bp.toLong()
                    if (r.toInt() == 0) 0 else bp - r.toInt()
                }
            if (bp <= len.shr(3) && s0 <= lmti - bp.shl(6)) {
                val slmti = minOf(lmti, s0 + bp.shl(3))
                tailrec fun mods(s: Int) {
                    if (s < slmti) {
                        val msk = 1.shl(s and 7)
                        tailrec fun cull(c: Int) {
                            if (c < len) {
                                buffer[c] = (buffer[c].toInt() or msk).toByte()
                                cull(c + bp)
                            }
                        }
                        cull(s.shr(3)); mods(s + bp)
                    }
                }
                mods(s0)
            }
            else {
                tailrec fun cull(c: Int) {
                    if (c < lmti) {
                        val w = c.shr(3)
                        buffer[w] = (buffer[w].toInt() or 1.shl(c and 7)).toByte()
                        cull(c + bp)
                    }
                }
                cull(s0)
            }
        }
    } 
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
fun makeBasePrimeArrays(): Sequence<BasePrimeArray> {
    var cmpsts = SieveBuffer(512)
    fun nextelem(low: Int, bpas: Sequence<BasePrimeArray>): BasePrimeArrays {
        // calculate size so that the bit span is at least as big as the
        // maximum culling prime required, rounded up to minsizebits blocks...
        val rqdsz = 2 + Math.sqrt((1 + low).toDouble()).toInt()
        val sz = (rqdsz.shr(12) + 1).shl(9) // size iin bytes
        if (sz > cmpsts.size) cmpsts = SieveBuffer(sz)
        cmpsts.fill(0)
        sieveComposites(low.toLong(), cmpsts, bpas)
        val arr = composites2BasePrimeArray(low, cmpsts)
        val nxt = low + cmpsts.size.shl(4)
        return BasePrimeArrays(arr, lazy { ->nextelem(nxt, bpas) })
    }
    // pre-seeding breaks recursive race,
    // as only known base primes used for first page...
    var preseedarr = intArrayOf( // pre-seed to 100, can sieve to 10,000...
        3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
        , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 )
    return BasePrimeArrays(preseedarr, lazy {->nextelem(101, makeBasePrimeArrays())})
}

// a seqence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
fun makeSievePages(): Sequence<Pair<Prime,SieveBuffer>> {
    val bpas = makeBasePrimeArrays() // secondary source of base prime arrays
    fun init(): SieveBuffer {
        val c = SieveBuffer(16384); sieveComposites(3L, c, bpas); return c }
    return generateSequence(Pair(3L, init())) {
        (low, cmpsts) ->
            // calculate size so that the bit span is at least as big as the
            // max culling prime required, rounded up to minsizebits blocks...
            val rqdsz = 2 + Math.sqrt((1 + low).toDouble()).toInt()
            val sz = (rqdsz.shr(17) + 1).shl(14) // size iin bytes
            val ncmpsts = if (sz > cmpsts.size) SieveBuffer(sz) else cmpsts
            ncmpsts.fill(0)
            val nlow = low + ncmpsts.size.toLong().shl(4)
            sieveComposites(nlow, ncmpsts, bpas)
            Pair(nlow, ncmpsts)
    }
}

fun countPrimesTo(range: Prime): Prime {
    if (range < 3) { if (range < 2) return 0 else return 1 }
    var count = 1L
    for ((low,cmpsts) in makeSievePages()) {
        if (low + cmpsts.size.shl(4) > range) {
            val lsti = (range - low).shr(1).toInt()
            val lstw = lsti.shr(3)
            val msk = -2.shl(lsti.and(7))
            count += 32 + lstw.shl(3)
            for (i in 0 until lstw)
                count -= java.lang.Integer.bitCount(cmpsts[i].toInt().and(0xFF))
            count -= java.lang.Integer.bitCount(cmpsts[lstw].toInt().or(msk))
            break
        } else {
            count += countComposites(cmpsts)
        }
    }
    return count
}

// sequence over primes from above page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as sieve them...
fun primesPaged(): Sequence<Prime> = sequence {
    yield(2L)
    for ((low,cmpsts) in makeSievePages()) {
        val szbts = cmpsts.size.shl(3)
        for (i in 0 until szbts) {
            if (cmpsts[i.shr(3)].toInt() and 1.shl(i and 7) != 0) continue
            yield(low + i.shl(1).toLong())
        }
    }
}

For this implementation, counting the primes to a million is trivial at about 15 milliseconds on the same CPU as above, or almost too short to count.

It shows its speed in solving the Euler Problem 10 above about five times faster at about 50 milliseconds to give the same output:

It can sum the primes to 200 million or a hundred times the range in just over three seconds.

It finds the count of primes to a billion in about 16 seconds or just about 1000 times slower than to sum the primes to a range 1000 times less for an almost linear response to range as it should be.

However, much of the time (about two thirds) is spent iterating over the results rather than doing the actual work of sieving; for this sort of problem such as counting, finding the nth prime, finding occurrences of maximum prime gaps, etc., one should really use specialized function that directly manipulate the output sieve arrays. Such a function is provided by the `countPrimeTo` function, which can count the primes to a billion (50847534) in about 5.65 seconds, or about 10.6 clock cycles per culling operation or about 210 cycles per prime.

Kotlin isn't really fast even as compared to other virtual machine languages such as C# and F# on CLI but that is mostly due to limitations of the Java Virtual Machine (JVM) as to speed of generated Just In Time (JIT) compilation, handling of primitive number operations, enforced array bounds checks, etc. It will always be much slower than native code producing compilers and the (experimental) native compiler for Kotlin still isn't up to speed (pun intended), producing code that is many times slower than the code run on the JVM (December 2018).

Lambdatalk

• 1) create an array of natural numbers, [0,1,2,3, ... ,n-1]
• 2) the 3rd number is 2, we set to dots all its composites by steps of 2,
• 3) the 4th number is 3, we set to dots all its composites by steps of 3,
• 4) the 6th number is 5, we set to dots all its composites by steps of 5,
• 5) the remaining numbers are primes and we clean all dots.

For instance:

1: 0 0 0 0 0 0 0 0 0 9 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
2: 0 1 2 3 . 5 . 7 . 9 . 1 . 3 . 5 . 7 . 9 . 1 . 3 . 5 . 7 . 9 .
3: 0 1 2 3 . 5 . 7 . . . 1 . 3 . . . 7 . 9 . . . 3 . 5 . . . 9 .
4: 0 1 2 3 . 5 . 7 . . . 1 . 3 . . . 7 . 9 . . . 3 . . . . . 9 .
       | |   |   |       |   |       |   |       |           |
5:     0 0   0   0       1   1       1   1       2           2
       2 3   5   7       1   3       7   9       3           9


1) recursive version {rsieve n}

{def rsieve

  {def rsieve.mark
   {lambda {:n :a :i :j}
    {if {< :j :n}
     then {rsieve.mark :n 
                       {A.set! :j . :a}
                       :i
                       {+ :i :j}}
     else :a}}}

  {def rsieve.loop
   {lambda {:n :a :i}
    {if {< {* :i :i} :n}
     then {rsieve.loop :n
                       {if {W.equal? {A.get :i :a} .}
                        then :a 
                        else {rsieve.mark :n :a :i {* :i :i}}}
                       {+ :i 1}}
     else :a}}}

 {lambda {:n}
  {S.replace \s by space in 
   {S.replace (\[|\]|\.|,) by space in
    {A.disp
     {A.slice 2 -1 
      {rsieve.loop :n
       {A.new {S.serie 0 :n}} 2}}}}}}}
-> rsieve

{rsieve 1000}
-> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Note: this version doesn't avoid stackoverflow.

2) iterative version {isieve n}

{def isieve

 {def isieve.mark 
  {lambda {:n :a :i}
   {S.map {{lambda {:a :j} {A.set! :j . :a}
           } :a}
          {S.serie {* :i :i} :n :i} }}}

 {lambda {:n}
  {S.replace \s by space in 
   {S.replace (\[|\]|\.|,) by space in
    {A.disp
     {A.slice 2 -1 
      {S.last 
       {S.map {{lambda {:n :a :i} {if {W.equal? {A.get :i :a} .}
                                   then
                                   else {isieve.mark :n :a :i}}
               } :n {A.new {S.serie 0 :n}}}
              {S.serie 2 {sqrt :n} 1}}}}}}}}}
-> isieve

{isieve 1000}
-> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Notes: 
- this version avoids stackoverflow. 
- From 1 to 1000000 there are 78500 primes (computed in ~15000ms) and the last is 999983.

langur

Translation of: D

val .sieve = fn(.limit) {
    if .limit < 2: return []

    var .composite = .limit * [false]
    .composite[1] = true

    for .n in 2 .. trunc(.limit ^/ 2) + 1 {
        if not .composite[.n] {
            for .k = .n^2 ; .k < .limit ; .k += .n {
                .composite[.k] = true
            }
        }
    }

    filter fn(.n) not .composite[.n], series .limit-1
}

writeln .sieve(100)

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

LFE

(defmodule eratosthenes
   (export (sieve 1)))

(defun sieve (limit)
   (sieve limit (lists:seq 2 limit)))

(defun sieve
   ((limit (= l (cons p _))) (when (> (* p p) limit))
      l)
   ((limit (cons p ns))
      (cons p (sieve limit (remove-multiples p (* p p) ns)))))

(defun remove-multiples (p q l)
   (lists:reverse (remove-multiples p q l '())))

(defun remove-multiples
   ((_ _ '() s) s)
   ((p q (cons q ns) s)
      (remove-multiples p q ns s))
   ((p q (= r (cons a _)) s) (when (> a q))
      (remove-multiples p (+ q p) r s))
   ((p q (cons n ns) s)
      (remove-multiples p q ns (cons n s))))

Output:

lfe> (slurp "sieve.lfe")
#(ok eratosthenes)
lfe> (sieve 100)
(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Liberty BASIC

    'Notice that arrays are globally visible to functions.
    'The sieve() function uses the flags() array.
    'This is a Sieve benchmark adapted from BYTE 1985
    ' May, page 286

    size = 7000
    dim flags(7001)
    start = time$("ms")
    print sieve(size); " primes found."
    print "End of iteration.  Elapsed time in milliseconds: "; time$("ms")-start
    end

    function sieve(size)
        for i = 0 to size
            if flags(i) = 0 then
                prime = i + i + 3
                k = i + prime
                while k <= size
                    flags(k) = 1
                    k = k + prime
                wend
                sieve = sieve + 1
            end if
        next i
    end function

Limbo

implement Sieve;

include "sys.m";
	sys: Sys;
	print: import sys;
include "draw.m";
	draw: Draw;

Sieve : module
{
	init : fn(ctxt : ref Draw->Context, args : list of string);
};

init (ctxt: ref Draw->Context, args: list of string)
{
	sys = load Sys Sys->PATH;

	limit := 201;
	sieve : array of int;
	sieve = array [201] of {* => 1};
	(sieve[0], sieve[1]) = (0, 0);

	for (n := 2; n < limit; n++) {
		if (sieve[n]) {
			for (i := n*n; i < limit; i += n) {
				sieve[i] = 0;
			}
		}
	}

	for (n = 1; n < limit; n++) {
		if (sieve[n]) {
			print ("%4d", n);
		} else {
			print("   .");
		};
		if ((n%20) == 0) 
			print("\n\n");
	}	
}

Lingo

-- parent script "sieve"
property _sieve

----------------------------------------
-- @constructor
----------------------------------------
on new (me)
    me._sieve = []
    return me
end

----------------------------------------
-- Returns list of primes <= n
----------------------------------------
on getPrimes (me, limit)
    if me._sieve.count<limit then me._primeSieve(limit)
    primes = []
    repeat with i = 2 to limit
        if me._sieve[i] then primes.add(i)
    end repeat
    return primes
end

----------------------------------------
-- Sieve of Eratosthenes
----------------------------------------
on _primeSieve (me, limit)
    me._sieve = [0]
    repeat with i = 2 to limit
        me._sieve[i] = 1
    end repeat
    c = sqrt(limit)
    repeat with i = 2 to c
        if (me._sieve[i]=0) then next repeat
        j = i*i -- start with square
        repeat while (j<=limit)
            me._sieve[j] = 0
            j = j + i
        end repeat
    end repeat
end

sieve = script("sieve").new()
put sieve.getPrimes(100)

Output:

-- [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

LiveCode

function sieveE int
    set itemdel to comma
    local sieve
    repeat with i = 2 to int
        put i into sieve[i]
    end repeat
    put 2 into n
    repeat while n < int
        repeat with p = n to int step n
            if p = n then 
                next repeat
            else
                put empty into sieve[p]
            end if
        end repeat
        add 1 to n
    end repeat
    combine sieve with comma
    filter items of sieve without empty
    sort items of sieve ascending numeric
    return sieve
end sieveE

Example

put sieveE(121)
--  2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113

# Sieve of Eratosthenes
# calculates prime numbers up to a given number
 
on mouseUp
   put field "maximum" into limit  
   put the ticks into startTicks      # start a timer
   repeat with i = 2 to limit step 1  # load array with zeros
      put 0 into prime_array[i]
   end repeat
   
   repeat with i = 2 to trunc(sqrt(limit)) # truncate square root
      if prime_array[i] = 0 then  
         repeat with k = (i * i) to limit step i
            delete variable prime_array[k] # remove non-primes
         end repeat
      end if
   end repeat
   put the ticks - startTicks into elapsedTicks  # stop timer
   put elapsedTicks / 60 into field "elapsed"    # calculate time
   
   put the keys of prime_array into prime_numbers # array to variable
   put the number of lines of keys of prime_array into field "count"
   sort lines of prime_numbers ascending numeric  
   put prime_numbers into field "primeList"      # show prime numbers
end mouseUp

LiveCode output example

Comments

LiveCode uses a mouse graphical drag and drop. No text code was used to create a button and fields; The user enters a number into the 'maximum' field and then clicks a button to run the code. It runs identically whether in the LiveCode IDE or when compiled to a executable on Mac, Windows, and Linux.

The example was run on an Intel i5 CPU @ 3.29 GHz; all primes found up to 1,000,000 in 3 seconds.

Logo

to sieve :limit
  make "a (array :limit 2)     ; initialized to empty lists
  make "p []
  for [i 2 :limit] [
    if empty? item :i :a [
      queue "p :i
      for [j [:i * :i] :limit :i] [setitem :j :a :i]
    ]
  ]
  output :p
end
print sieve 100   ; 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Logtalk

This example is incorrect. Please fix the code and remove this message.

Details: Not a true Sieve of Eratosthenes but rather a Trial Division Sieve

due to the use of mod (modulo = division) in the filter function.

A coinduction based solution just for fun:

:- object(sieve).

	:- public(primes/2).

	:- coinductive([
		sieve/2, filter/3
	]).

	% computes a coinductive list with all the primes in the 2..N interval
	primes(N, Primes) :-
		generate_infinite_list(N, List),
		sieve(List, Primes).

	% generate a coinductive list with a 2..N repeating patern
	generate_infinite_list(N, List) :-
		sequence(2, N, List, List).

	sequence(Sup, Sup, [Sup| List], List) :-
		!.
	sequence(Inf, Sup, [Inf| List], Tail) :-
		Next is Inf + 1,
		sequence(Next, Sup, List, Tail).

	sieve([H| T], [H| R]) :-
		filter(H, T, F),
		sieve(F, R).

	filter(H, [K| T], L) :-
		(	K > H, K mod H =:= 0 ->
			% throw away the multiple we found
			L = T1
		;	% we must not throw away the integer used for filtering
			% as we must return a filtered coinductive list
			L = [K| T1]
		),
		filter(H, T, T1).

:- end_object.

Example query:

?- sieve::primes(20, P).
P = [2, 3|_S1], % where
    _S1 = [5, 7, 11, 13, 17, 19, 2, 3|_S1] .

LOLCODE

HAI 1.2
CAN HAS STDIO?

HOW IZ I Eratosumthin YR Max
  I HAS A Siv ITZ A BUKKIT
  Siv HAS A SRS 1 ITZ 0
  I HAS A Index ITZ 2
  IM IN YR Inishul UPPIN YR Dummy WILE DIFFRINT Index AN SUM OF Max AN 1
    Siv HAS A SRS Index ITZ 1
    Index R SUM OF Index AN 1
  IM OUTTA YR Inishul

  I HAS A Prime ITZ 2
  IM IN YR MainLoop UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN PRODUKT OF Prime AN Prime
    BOTH SAEM Siv'Z SRS Prime AN 1
    O RLY?
      YA RLY
        Index R SUM OF Prime AN Prime
        IM IN YR MarkMultipulz UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN Index
          Siv'Z SRS Index R 0
          Index R SUM OF Index AN Prime
        IM OUTTA YR MarkMultipulz
    OIC
    Prime R SUM OF Prime AN 1
  IM OUTTA YR MainLoop

  Index R 1
  I HAS A First ITZ WIN
  IM IN YR PrintPrimes UPPIN YR Dummy WILE BOTH SAEM Max AN BIGGR OF Max AN Index
    BOTH SAEM Siv'Z SRS Index AN 1
    O RLY?
      YA RLY
        First
        O RLY?
          YA RLY
            First R FAIL
          NO WAI
            VISIBLE ", "!
        OIC
        VISIBLE Index!
    OIC
    Index R SUM OF Index AN 1
  IM OUTTA YR PrintPrimes
  VISIBLE ""
IF U SAY SO

I IZ Eratosumthin YR 100 MKAY

KTHXBYE

Output:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

Lua

function erato(n)
  if n < 2 then return {} end
  local t = {0} -- clears '1'
  local sqrtlmt = math.sqrt(n)
  for i = 2, n do t[i] = 1 end
  for i = 2, sqrtlmt do if t[i] ~= 0 then for j = i*i, n, i do t[j] = 0 end end end
  local primes = {}
  for i = 2, n do if t[i] ~= 0 then table.insert(primes, i) end end
  return primes
end

The following changes the code to odds-only using the same large array-based algorithm:

function erato2(n)
  if n < 2 then return {} end
  if n < 3 then return {2} end
  local t = {}
  local lmt = (n - 3) / 2
  local sqrtlmt = (math.sqrt(n) - 3) / 2
  for i = 0, lmt do t[i] = 1 end
  for i = 0, sqrtlmt do if t[i] ~= 0 then
    local p = i + i + 3
    for j = (p*p - 3) / 2, lmt, p do t[j] = 0 end end end
  local primes = {2}
  for i = 0, lmt do if t[i] ~= 0 then table.insert(primes, i + i + 3) end end
  return primes
end

The following code implements an odds-only "infinite" generator style using a table as a hash table, including postponing adding base primes to the table:

function newEratoInf()
  local _cand = 0; local _lstbp = 3; local _lstsqr = 9
  local _composites = {}; local _bps = nil
  local _self = {}
  function _self.next()
    if _cand < 9 then if _cand < 1 then _cand = 1; return 2
                     elseif _cand >= 7 then
                       --advance aux source base primes to 3...
                       _bps = newEratoInf()
                       _bps.next(); _bps.next() end end
    _cand = _cand + 2
    if _composites[_cand] == nil then -- may be prime
      if _cand >= _lstsqr then -- if not the next base prime
        local adv = _lstbp + _lstbp -- if next base prime
        _composites[_lstbp * _lstbp + adv] = adv -- add cull seq
        _lstbp = _bps.next(); _lstsqr = _lstbp * _lstbp -- adv next base prime
        return _self.next()
      else return _cand end -- is prime
    else
      local v = _composites[_cand]
      _composites[_cand] = nil
      local nv = _cand + v
      while _composites[nv] ~= nil do nv = nv + v end
      _composites[nv] = v
      return _self.next() end
  end
  return _self
end

gen = newEratoInf()
count = 0
while gen.next() <= 10000000 do count = count + 1 end -- sieves to 10 million
print(count)

which outputs "664579" in about three seconds. As this code uses much less memory for a given range than the previous ones and retains efficiency better with range, it is likely more appropriate for larger sieve ranges.

Lucid

This example is incorrect. Please fix the code and remove this message.

Details: Not a true Sieve of Eratosthenes but rather a Trial Division Sieve

prime
 where
    prime = 2 fby (n whenever isprime(n));
    n = 3 fby n+2;
    isprime(n) = not(divs) asa divs or prime*prime > N
                    where
                      N is current n;
                      divs = N mod prime eq 0;
                    end;
 end

recursive

This example is incorrect. Please fix the code and remove this message.

Details: Not a true Sieve of Eratosthenes but rather a Trial Division Sieve

sieve( N )
   where
    N = 2 fby N + 1;
    sieve( i ) =
      i fby sieve ( i whenever i mod first i ne 0 ) ;
   end

M2000 Interpreter

Module EratosthenesSieve (x) {
      \\ Κόσκινο του Ερατοσθένη
      Profiler
      If x>2000000 Then Exit
      Dim i(x+1): k=2: k2=sqrt(x)
      While k<=k2{if i(k) else for m=k*k to x step k{i(m)=1}
      k++}
      Print str$(timecount/1000,"####0.##")+" s"
      Input "Press enter skip print or a non zero to get results:", a%
      if a% then For i=2to x{If i(i)=0 Then Print i, 
      }
      Print:Print "Done"
}
EratosthenesSieve 1000

M4

define(`lim',100)dnl
define(`for',
   `ifelse($#,0,
      ``$0'',
      `ifelse(eval($2<=$3),1,
         `pushdef(`$1',$2)$5`'popdef(`$1')$0(`$1',eval($2+$4),$3,$4,`$5')')')')dnl
for(`j',2,lim,1,
   `ifdef(a[j],
      `',
      `j for(`k',eval(j*j),lim,j,
         `define(a[k],1)')')')

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

MAD

            NORMAL MODE IS INTEGER
          
          R TO GENERATE MORE PRIMES, CHANGE BOTH THESE NUMBERS          
            BOOLEAN PRIME
            DIMENSION PRIME(10000)
            MAXVAL = 10000
            PRINT FORMAT BEGIN,MAXVAL
            
          R ASSUME ALL ARE PRIMES AT BEGINNING
            THROUGH SET, FOR I=2, 1, I.G.MAXVAL
SET         PRIME(I) = 1B

          R REMOVE ALL PROVEN COMPOSITES
            SQMAX = SQRT.(MAXVAL)
            THROUGH NEXT, FOR P=2, 1, P.G.SQMAX
            WHENEVER PRIME(P)
                THROUGH MARK, FOR I=P*P, P, I.G.MAXVAL
MARK            PRIME(I) = 0B
NEXT        END OF CONDITIONAL

          R PRINT PRIMES
            THROUGH SHOW, FOR P=2, 1, P.G.MAXVAL
SHOW        WHENEVER PRIME(P), PRINT FORMAT NUMFMT, P
            
            VECTOR VALUES BEGIN = $13HPRIMES UP TO ,I9*$
            VECTOR VALUES NUMFMT = $I9*$
            END OF PROGRAM

Output:

PRIMES UP TO     10000
        2
        3
        5
        7
       11
       13
       17
...
     9979
     9983
     9985
     9989
     9991
     9995
     9997

Maple

Eratosthenes := proc(n::posint) 
  local numbers_to_check, i, k; 
  numbers_to_check := [seq(2 .. n)]; 
  for i from 2 to floor(sqrt(n)) do 
      for k from i by i while k <= n do 
          if evalb(k <> i) then
            numbers_to_check[k - 1] := 0; 
          end if; 
      end do; 
  end do; 
  numbers_to_check := remove(x -> evalb(x = 0), numbers_to_check); 
  return numbers_to_check; 
  end proc:

Output:

Eratosthenes(100);
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Mathematica/Wolfram Language

Eratosthenes[n_] := Module[{numbers = Range[n]},
  Do[If[numbers[[i]] != 0, Do[numbers[[i j]] = 0, {j, 2, n/i}]], {i, 
    2, Sqrt[n]}];
  Select[numbers, # > 1 &]]

Eratosthenes[100]

Slightly Optimized Version

The below has been improved to not require so many operations per composite number cull for about two thirds the execution time:

Eratosthenes[n_] := Module[{numbers = Range[n]},
  Do[If[numbers[[i]] != 0, Do[numbers[[j]] = 0, {j,i i,n,i}]],{i,2,Sqrt[n]}];
  Select[numbers, # > 1 &]]

Eratosthenes[100]

Sieving Odds-Only Version

The below has been further improved to only sieve odd numbers for a further reduction in execution time by a factor of over two:

Eratosthenes2[n_] := Module[{numbers = Range[3, n, 2], limit = (n - 1)/2}, 
  Do[c = numbers[[i]]; If[c != 0,
    Do[numbers[[j]] = 0, {j,(c c - 1)/2,limit,c}]], {i,1,(Sqrt[n] - 1)/2}];
  Prepend[Select[numbers, # > 1 &], 2]]

Eratosthenes2[100]

MATLAB / Octave

Somewhat optimized true Sieve of Eratosthenes

function P = erato(x)        % Sieve of Eratosthenes: returns all primes between 2 and x
        
    P = [0 2:x];             % Create vector with all ints between 2 and x where
                             %   position 1 is hard-coded as 0 since 1 is not a prime.

    for n = 2:sqrt(x)        % All primes factors lie between 2 and sqrt(x).
       if P(n)               % If the current value is not 0 (i.e. a prime),
          P(n*n:n:x) = 0;    % then replace all further multiples of it with 0.
       end
    end                      % At this point P is a vector with only primes and zeroes.

    P = P(P ~= 0);           % Remove all zeroes from P, leaving only the primes.
end

The optimization lies in fewer steps in the for loop, use of MATLAB's built-in array operations and no modulo calculation.

Limitation: your machine has to be able to allocate enough memory for an array of length x.

A more efficient Sieve

A more efficient Sieve avoids creating a large double precision vector P, instead using a logical array (which consumes 1/8 the memory of a double array of the same size) and only converting to double those values corresponding to primes.

function P = sieveOfEratosthenes(x)
    ISP = [false true(1, x-1)]; % 1 is not prime, but we start off assuming all numbers between 2 and x are
    for n = 2:sqrt(x)
        if ISP(n)
            ISP(n*n:n:x) = false; % Multiples of n that are greater than n*n are not primes
        end
    end
    % The ISP vector that we have calculated is essentially the output of the ISPRIME function called on 1:x
    P = find(ISP); % Convert the ISPRIME output to the values of the primes by finding the locations 
                   % of the TRUE values in S.
end

You can compare the output of this function against the PRIMES function included in MATLAB, which performs a somewhat more memory-efficient Sieve (by not storing even numbers, at the expense of a more complicated indexing expression inside the IF statement.)

Maxima

sieve(n):=block(
   [a:makelist(true,n),i:1,j],
   a[1]:false,
   do (
      i:i+1,
      unless a[i] do i:i+1,
      if i*i>n then return(sublist_indices(a,identity)),
      for j from i*i step i while j<=n do a[j]:false
   )
)$

MAXScript

fn eratosthenes n =
(
    multiples = #()
    print 2
    for i in 3 to n do
    (
        if (findItem multiples i) == 0 then
        (
            print i
            for j in (i * i) to n by i do
            (
                append multiples j
            )
        )
    )
)

eratosthenes 100

Mercury

:- module sieve.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.
:- implementation.
:- import_module bool, array, int.

main(!IO) :-
    sieve(50, Sieve),
    dump_primes(2, size(Sieve), Sieve, !IO).

:- pred dump_primes(int, int, array(bool), io, io).
:- mode dump_primes(in, in, array_di, di, uo) is det.
dump_primes(N, Limit, !.A, !IO) :-
    ( if N < Limit then
        unsafe_lookup(!.A, N, Prime),
        (
            Prime = yes,
            io.write_line(N, !IO)
        ;
            Prime = no
        ),
        dump_primes(N + 1, Limit, !.A, !IO)
    else
        true
    ).

:- pred sieve(int, array(bool)).
:- mode sieve(in, array_uo) is det.
sieve(N, !:A) :-
    array.init(N, yes, !:A),
    sieve(2, N, !A).

:- pred sieve(int, int, array(bool), array(bool)).
:- mode sieve(in, in, array_di, array_uo) is det.
sieve(N, Limit, !A) :-
    ( if N < Limit then
        unsafe_lookup(!.A, N, Prime),
        (
            Prime = yes,
            sift(N + N, N, Limit, !A),
            sieve(N + 1, Limit, !A)
        ;
            Prime = no,
            sieve(N + 1, Limit, !A)
        )
    else
        true
    ).

:- pred sift(int, int, int, array(bool), array(bool)).
:- mode sift(in, in, in, array_di, array_uo) is det.
sift(I, N, Limit, !A) :-
    ( if I < Limit then
        unsafe_set(I, no, !A),
        sift(I + N, N, Limit, !A)
    else
        true
    ).

Microsoft Small Basic

Translation of: GW-BASIC

TextWindow.Write("Enter number to search to: ")
limit = TextWindow.ReadNumber()
For n = 2 To limit 
  flags[n] = 0
EndFor
For n = 2 To math.SquareRoot(limit)
  If flags[n] = 0 Then
    For K = n * n To limit Step n
      flags[K] = 1
    EndFor
  EndIf
EndFor
' Display the primes
If limit >= 2 Then
  TextWindow.Write(2)
  For n = 3 To limit
    If flags[n] = 0 Then 
      TextWindow.Write(", " + n)
    EndIf  
  EndFor
  TextWindow.WriteLine("")
EndIf

Modula-2

MODULE Erato;
FROM InOut IMPORT WriteCard, WriteLn;
FROM MathLib IMPORT sqrt;

CONST Max = 100; 

VAR prime: ARRAY [2..Max] OF BOOLEAN;
    i: CARDINAL;

PROCEDURE Sieve;
    VAR i, j, sqmax: CARDINAL;
BEGIN
    sqmax := TRUNC(sqrt(FLOAT(Max)));

    FOR i := 2 TO Max DO prime[i] := TRUE; END;
    FOR i := 2 TO sqmax DO
        IF prime[i] THEN
            j := i * 2;
            (* alas, the BY clause in a FOR loop must be a constant *)
            WHILE j <= Max DO 
                prime[j] := FALSE;
                j := j + i;
            END;
        END;
    END;
END Sieve;

BEGIN
    Sieve;
    FOR i := 2 TO Max DO
        IF prime[i] THEN
            WriteCard(i,5);
            WriteLn;
        END;
    END;
END Erato.

Output:

Modula-3

Regular version

This version runs slow because of the way I/O is implemented in the CM3 compiler. Setting ListPrimes = FALSE achieves speed comparable to C on sufficiently high values of LastNum (e.g., 10^6).

MODULE Eratosthenes EXPORTS Main;

IMPORT IO;

FROM Math IMPORT sqrt;

CONST
  LastNum    = 1000;
  ListPrimes = TRUE;

VAR
  a: ARRAY[2..LastNum] OF BOOLEAN;

VAR
  n := LastNum - 2 + 1;

BEGIN

  (* set up *)
  FOR i := FIRST(a) TO LAST(a) DO
    a[i] := TRUE;
  END;

  (* declare a variable local to a block *)
  VAR b := FLOOR(sqrt(FLOAT(LastNum, LONGREAL)));

  (* the block must follow immediately *)
  BEGIN

    (* print primes and mark out composites up to sqrt(LastNum) *)
    FOR i := FIRST(a) TO b DO
      IF a[i] THEN
        IF ListPrimes THEN IO.PutInt(i); IO.Put(" "); END;
        FOR j := i*i TO LAST(a) BY i DO
          IF a[j] THEN
            a[j] := FALSE;
            DEC(n);
          END;
        END;
      END;
    END;

    (* print remaining primes *)
    IF ListPrimes THEN
      FOR i := b + 1 TO LAST(a) DO
        IF a[i] THEN
          IO.PutInt(i); IO.Put(" ");
        END;
      END;
    END;

  END;

  (* report *)
  IO.Put("There are ");         IO.PutInt(n);
  IO.Put(" primes from 2 to "); IO.PutInt(LastNum);
  IO.PutChar('\n');

END Eratosthenes.

Advanced version

This version uses more "advanced" types.

(* From the CM3 examples folder (comments removed). *)

MODULE Sieve EXPORTS Main;

IMPORT IO;

TYPE
  Number = [2..1000];
  Set = SET OF Number;

VAR
  prime: Set := Set {FIRST(Number) .. LAST(Number)};

BEGIN
  FOR i := FIRST(Number) TO LAST(Number) DO
    IF i IN prime THEN
      IO.PutInt(i);
      IO.Put(" ");

      FOR j := i TO LAST(Number) BY i DO
        prime := prime - Set{j};
      END;
    END;
  END;
  IO.Put("\n");
END Sieve.

Mojo

Tested with Mojo version 0.7:

from memory import memset_zero
from memory.unsafe import (DTypePointer)
from time import (now)

alias cLIMIT: Int = 1_000_000_000

struct SoEBasic(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.bool] # because DynamicVector has deep copy bug in mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = limit - 1
        self.sz = limit - 1
        self.ndx = 0
        self.cmpsts = DTypePointer[DType.bool].alloc(limit - 1)
        memset_zero(self.cmpsts, limit - 1)
        for i in range(limit - 1):
            let s = i * (i + 4) + 2
            if s >= limit - 1: break
            if self.cmpsts[i]: continue
            let bp = i + 2
            for c in range(s, limit - 1, bp):
                self.cmpsts[c] = True
        for i in range(limit - 1):
            if self.cmpsts[i]: self.sz -= 1
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        self.cmpsts = DTypePointer[DType.bool].alloc(self.len)
        for i in range(self.len):
            self.cmpsts[i] = existing.cmpsts[i]
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    fn __next__(inout self: Self) -> Int:
        if self.ndx >= self.len: return 0
        while (self.ndx < self.len) and (self.cmpsts[self.ndx]):
            self.ndx += 1
        let rslt = self.ndx + 2; self.sz -= 1; self.ndx += 1
        return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEBasic(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = len(SoEBasic(1_000_000))
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = len(SoEBasic(cLIMIT))
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")

Output:

The primes to 100 are:
2  3  5  7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97  
Found 78498 primes up to 1,000,000 in 1.2642770000000001 milliseconds.
Found 50847534 primes up to 1000000000 in 6034.328751 milliseconds.

as run on an AMD 7840HS CPU at 5.1 GHz.

Note that due to the huge memory array used, when large ranges are selected, the speed is disproportional in speed slow down by about four times.

This solution uses an interator struct which seems to be the Mojo-preferred way to do this, and normally a DynamicVector would have been used the the culling array except that there is a bug in this version of DynamicVector where the array is not properly deep copied when copied to a new location, so the raw pointer type is used.

Odds-Only with Optimizations

This version does three significant improvements to the above code as follows: 1) It is trivial to skip the processing to store representations for and cull the even comosite numbers other than the prime number two, saving half the storage space and reducing the culling time to about 40 percent. 2) There is a repeating pattern of culling composite representations over a bit-packed byte array (which reduces the storage requirement by another eight times) that repeats every eight culling operations, which can be encapsulated by a extreme loop unrolling technique with compiler generated constants as done here. 3) Further, there is a further extreme optimization technique of dense culling for small base prime values whose culling span is less than one register in size where the loaded register is repeatedly culled for different base prime strides before being written out (with such optimization done by the compiler), again using compiler generated modification constants. This technique is usually further optimizated by modern compilers to use efficient autovectorization and the use of SIMD registers available to the architecture to reduce these culling operations to an avererage of a tiny fraction of a CPU clock cycle per cull.

Mojo version 0.7 was tested:

from memory import (memset_zero, memcpy)
from memory.unsafe import (DTypePointer)
from math.bit import ctpop
from time import (now)

alias cLIMIT: Int = 1_000_000_000

alias cBufferSize: Int = 262144 # bytes
alias cBufferBits: Int = cBufferSize * 8

alias UnrollFunc = fn(DTypePointer[DType.uint8], Int, Int, Int) -> None

@always_inline
fn extreme[OFST: Int, BP: Int](pcmps: DTypePointer[DType.uint8], bufsz: Int, s: Int, bp: Int):
  var cp = pcmps + (s >> 3)
  let r1: Int = ((s + bp) >> 3) - (s >> 3)
  let r2: Int = ((s + 2 * bp) >> 3) - (s >> 3)
  let r3: Int = ((s + 3 * bp) >> 3) - (s >> 3)
  let r4: Int = ((s + 4 * bp) >> 3) - (s >> 3)
  let r5: Int = ((s + 5 * bp) >> 3) - (s >> 3)
  let r6: Int = ((s + 6 * bp) >> 3) - (s >> 3)
  let r7: Int = ((s + 7 * bp) >> 3) - (s >> 3)
  let plmt: DTypePointer[DType.uint8] = pcmps + bufsz - r7
  while cp < plmt:
    cp.store(cp.load() | (1 << OFST))
    (cp + r1).store((cp + r1).load() | (1 << ((OFST + BP) & 7)))
    (cp + r2).store((cp + r2).load() | (1 << ((OFST + 2 * BP) & 7)))
    (cp + r3).store((cp + r3).load() | (1 << ((OFST + 3 * BP) & 7)))
    (cp + r4).store((cp + r4).load() | (1 << ((OFST + 4 * BP) & 7)))
    (cp + r5).store((cp + r5).load() | (1 << ((OFST + 5 * BP) & 7)))
    (cp + r6).store((cp + r6).load() | (1 << ((OFST + 6 * BP) & 7)))
    (cp + r7).store((cp + r7).load() | (1 << ((OFST + 7 * BP) & 7)))
    cp += bp
  let eplmt: DTypePointer[DType.uint8] = plmt + r7
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << OFST))
  cp += r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + BP) & 7)))
  cp += r2 - r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 2 * BP) & 7)))
  cp += r3 - r2
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 3 * BP) & 7)))
  cp += r4 - r3
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 4 * BP) & 7)))
  cp += r5 - r4
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 5 * BP) & 7)))
  cp += r6 - r5
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 6 * BP) & 7)))
  cp += r7 - r6
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 7 * BP) & 7)))

fn mkExtrm[CNT: Int](pntr: Pointer[UnrollFunc]):
  @parameter
  if CNT >= 32:
    return
  alias OFST = CNT >> 2
  alias BP = ((CNT & 3) << 1) + 1
  pntr.offset(CNT).store(extreme[OFST, BP])
  mkExtrm[CNT + 1](pntr)

@always_inline
fn mkExtremeFuncs() -> Pointer[UnrollFunc]:
  let jmptbl: Pointer[UnrollFunc] = Pointer[UnrollFunc].alloc(32)
  mkExtrm[0](jmptbl)
  return jmptbl
  
let extremeFuncs = mkExtremeFuncs()

alias DenseFunc = fn(DTypePointer[DType.uint64], Int, Int) -> DTypePointer[DType.uint64]

fn mkDenseCull[N: Int, BP: Int](cp: DTypePointer[DType.uint64]):
  @parameter
  if N >= 64:
    return
  alias MUL = N * BP
  var cop = cp.offset(MUL >> 6)
  cop.store(cop.load() | (1 << (MUL & 63)))
  mkDenseCull[N + 1, BP](cp)

@always_inline
fn denseCullFunc[BP: Int](pcmps: DTypePointer[DType.uint64], bufsz: Int, s: Int) -> DTypePointer[DType.uint64]:
  var cp: DTypePointer[DType.uint64] = pcmps + (s >> 6)
  let plmt = pcmps + (bufsz >> 3) - BP
  while cp < plmt:
    mkDenseCull[0, BP](cp)
    cp += BP
  return cp

fn mkDenseFunc[CNT: Int](pntr: Pointer[DenseFunc]):
  @parameter
  if CNT >= 64:
    return
  alias BP = (CNT << 1) + 3
  pntr.offset(CNT).store(denseCullFunc[BP])
  mkDenseFunc[CNT + 1](pntr)

@always_inline
fn mkDenseFuncs() -> Pointer[DenseFunc]:
  let jmptbl : Pointer[DenseFunc] = Pointer[DenseFunc].alloc(64)
  mkDenseFunc[0](jmptbl)
  return jmptbl

let denseFuncs : Pointer[DenseFunc] = mkDenseFuncs()

@always_inline
fn cullPass(cmpsts: DTypePointer[DType.uint8], bytesz: Int, s: Int, bp: Int):
    if bp <= 129: # dense culling
        var sm = s
        while (sm >> 3) < bytesz and (sm & 63) != 0:
            cmpsts[sm >> 3] |= (1 << (sm & 7))
            sm += bp
        let bcp = denseFuncs[(bp - 3) >> 1](cmpsts.bitcast[DType.uint64](), bytesz, sm)
        var ns = 0
        var ncp = bcp
        let cmpstslmtp = (cmpsts + bytesz).bitcast[DType.uint64]()
        while ncp < cmpstslmtp:
            ncp[0] |= (1 << (ns & 63))
            ns += bp
            ncp = bcp + (ns >> 6)
    else: # extreme loop unrolling culling
        extremeFuncs[((s & 7) << 2) + ((bp & 7) >> 1)](cmpsts, bytesz, s, bp)
#    for c in range(s, self.len, bp): # slow bit twiddling way
#        self.cmpsts[c >> 3] |= (1 << (c & 7))

fn countPagePrimes(ptr: DTypePointer[DType.uint8], bitsz: Int) -> Int:
    let wordsz: Int = (bitsz + 63) // 64  # round up to nearest 64 bit boundary 
    var rslt: Int = wordsz * 64
    let bigcmps = ptr.bitcast[DType.uint64]()        
    for i in range(wordsz - 1):
       rslt -= ctpop(bigcmps[i]).to_int()
    rslt -= ctpop(bigcmps[wordsz - 1] | (-2 << ((bitsz - 1) & 63))).to_int()
    return rslt

struct SoEOdds(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = 0 if limit < 2 else (limit - 3) // 2 + 1
        self.sz = 0 if limit < 2 else self.len + 1 # for the unprocessed only even prime of two
        self.ndx = -1
        let bytesz = 0 if limit < 2 else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memset_zero(self.cmpsts, bytesz)
        for i in range(self.len):
            let s = (i + i) * (i + 3) + 3
            if s >= self.len: break
            if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
            let bp = i + i + 3
            cullPass(self.cmpsts, bytesz, s, bp)
        self.sz = countPagePrimes(self.cmpsts, self.len) + 1 # add one for only even prime of two
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        let bytesz = (self.len + 7) // 8
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int:
        if self.ndx < 0:
            self.ndx = 0; self.sz -= 1; return 2
        while (self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
            self.ndx += 1
        let rslt = (self.ndx << 1) + 3; self.sz -= 1; self.ndx += 1; return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEOdds(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = len(SoEOdds(1_000_000))
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = len(SoEOdds(cLIMIT))
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")

Output:

The primes to 100 are:
2  3  5  7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97  
Found 78498 primes up to 1,000,000 in 0.085067000000000004 milliseconds.
Found 50847534 primes up to 1000000000 in 1204.866606 milliseconds.

This was run on the same computer as the above example; notice that while this is much faster than that version, it is still very slow as the sieving range gets large such that the relative processing time for a range that is 1000 times as large is about ten times slower than as might be expected by simple scaling. This is due to the "one huge sieving buffer" algorithm that gets very large with increasing range (and in fact will eventually limit the sieving range that can be used) to exceed the size of CPU cache buffers and thus greatly slow average memory access times.

Page-Segmented Odds-Only with Optimizations

While the above version performs reasonably well for small sieving ranges that fit within the CPU caches of a few tens of millions, as one can see it gets much slower with larger ranges and as well its huge RAM memory consumption limits the maximum range over which it can be used. This version solves these problems be breaking the huge sieving array into "pages" that each fit within the CPU cache size and processing each "page" sequentially until the target range is reached. This technique also greatly reduces memory requirements to only that required to store the base prime value representations up to the square root of the range limit (about O(n/log n) storage plus a fixed size page buffer. In this case, the storage for the base primes has been reduced by a constant factor by storing them as single byte deltas from the previous value, which works for ranges up to the 64-bit number range where the biggest gap is two times 192 and since we store only for odd base primes, the gap values are all half values to fit in a single byte.

Currently, Mojo has problems with some functions in the standard libraries such as the integer square root function is not accurate nor does it work for the required integer types so a custom integer square root function is supplied. As well, current Mojo does not support recursion for hardly any useful cases (other than compile time global function recursion), so the `SoeOdds` structure from the previous answer had to be kept to generate the base prime representation table (or this would have had to be generated from scratch within the new `SoEOddsPaged` structure). Finally, it didn't seem to be worth using the `Sized` trait for the new structure as this would seem to sometimes require processing the pages twice, one to obtain the size and once if iteration across the prime values is required.

Tested with Mojo version 0.7:

from memory import (memset_zero, memcpy)
from memory.unsafe import (DTypePointer)
from math.bit import ctpop
from time import (now)

alias cLIMIT: Int = 1_000_000_000

alias cBufferSize: Int = 262144 # bytes
alias cBufferBits: Int = cBufferSize * 8

fn intsqrt(n: UInt64) -> UInt64:
  if n < 4:
    if n < 1: return 0 else: return 1
  var x: UInt64 = n; var qn: UInt64 = 0; var r: UInt64 = 0
  while qn < 64 and (1 << qn) <= n:
    qn += 2
  var q: UInt64 = 1 << qn
  while q > 1:
    if qn >= 64:
      q = 1 << (qn - 2); qn = 0
    else:
      q >>= 2
    let t: UInt64 =  r + q
    r >>= 1
    if x >= t:
      x -= t; r += q
  return r

alias UnrollFunc = fn(DTypePointer[DType.uint8], Int, Int, Int) -> None

@always_inline
fn extreme[OFST: Int, BP: Int](pcmps: DTypePointer[DType.uint8], bufsz: Int, s: Int, bp: Int):
  var cp = pcmps + (s >> 3)
  let r1: Int = ((s + bp) >> 3) - (s >> 3)
  let r2: Int = ((s + 2 * bp) >> 3) - (s >> 3)
  let r3: Int = ((s + 3 * bp) >> 3) - (s >> 3)
  let r4: Int = ((s + 4 * bp) >> 3) - (s >> 3)
  let r5: Int = ((s + 5 * bp) >> 3) - (s >> 3)
  let r6: Int = ((s + 6 * bp) >> 3) - (s >> 3)
  let r7: Int = ((s + 7 * bp) >> 3) - (s >> 3)
  let plmt: DTypePointer[DType.uint8] = pcmps + bufsz - r7
  while cp < plmt:
    cp.store(cp.load() | (1 << OFST))
    (cp + r1).store((cp + r1).load() | (1 << ((OFST + BP) & 7)))
    (cp + r2).store((cp + r2).load() | (1 << ((OFST + 2 * BP) & 7)))
    (cp + r3).store((cp + r3).load() | (1 << ((OFST + 3 * BP) & 7)))
    (cp + r4).store((cp + r4).load() | (1 << ((OFST + 4 * BP) & 7)))
    (cp + r5).store((cp + r5).load() | (1 << ((OFST + 5 * BP) & 7)))
    (cp + r6).store((cp + r6).load() | (1 << ((OFST + 6 * BP) & 7)))
    (cp + r7).store((cp + r7).load() | (1 << ((OFST + 7 * BP) & 7)))
    cp += bp
  let eplmt: DTypePointer[DType.uint8] = plmt + r7
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << OFST))
  cp += r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + BP) & 7)))
  cp += r2 - r1
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 2 * BP) & 7)))
  cp += r3 - r2
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 3 * BP) & 7)))
  cp += r4 - r3
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 4 * BP) & 7)))
  cp += r5 - r4
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 5 * BP) & 7)))
  cp += r6 - r5
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 6 * BP) & 7)))
  cp += r7 - r6
  if eplmt == cp or eplmt < cp: return
  cp.store(cp.load() | (1 << ((OFST + 7 * BP) & 7)))

fn mkExtrm[CNT: Int](pntr: Pointer[UnrollFunc]):
  @parameter
  if CNT >= 32:
    return
  alias OFST = CNT >> 2
  alias BP = ((CNT & 3) << 1) + 1
  pntr.offset(CNT).store(extreme[OFST, BP])
  mkExtrm[CNT + 1](pntr)

@always_inline
fn mkExtremeFuncs() -> Pointer[UnrollFunc]:
  let jmptbl: Pointer[UnrollFunc] = Pointer[UnrollFunc].alloc(32)
  mkExtrm[0](jmptbl)
  return jmptbl
  
let extremeFuncs = mkExtremeFuncs()

alias DenseFunc = fn(DTypePointer[DType.uint64], Int, Int) -> DTypePointer[DType.uint64]

fn mkDenseCull[N: Int, BP: Int](cp: DTypePointer[DType.uint64]):
  @parameter
  if N >= 64:
    return
  alias MUL = N * BP
  var cop = cp.offset(MUL >> 6)
  cop.store(cop.load() | (1 << (MUL & 63)))
  mkDenseCull[N + 1, BP](cp)

@always_inline
fn denseCullFunc[BP: Int](pcmps: DTypePointer[DType.uint64], bufsz: Int, s: Int) -> DTypePointer[DType.uint64]:
  var cp: DTypePointer[DType.uint64] = pcmps + (s >> 6)
  let plmt = pcmps + (bufsz >> 3) - BP
  while cp < plmt:
    mkDenseCull[0, BP](cp)
    cp += BP
  return cp

fn mkDenseFunc[CNT: Int](pntr: Pointer[DenseFunc]):
  @parameter
  if CNT >= 64:
    return
  alias BP = (CNT << 1) + 3
  pntr.offset(CNT).store(denseCullFunc[BP])
  mkDenseFunc[CNT + 1](pntr)

@always_inline
fn mkDenseFuncs() -> Pointer[DenseFunc]:
  let jmptbl : Pointer[DenseFunc] = Pointer[DenseFunc].alloc(64)
  mkDenseFunc[0](jmptbl)
  return jmptbl

let denseFuncs : Pointer[DenseFunc] = mkDenseFuncs()

@always_inline
fn cullPass(cmpsts: DTypePointer[DType.uint8], bytesz: Int, s: Int, bp: Int):
    if bp <= 129: # dense culling
        var sm = s
        while (sm >> 3) < bytesz and (sm & 63) != 0:
            cmpsts[sm >> 3] |= (1 << (sm & 7))
            sm += bp
        let bcp = denseFuncs[(bp - 3) >> 1](cmpsts.bitcast[DType.uint64](), bytesz, sm)
        var ns = 0
        var ncp = bcp
        let cmpstslmtp = (cmpsts + bytesz).bitcast[DType.uint64]()
        while ncp < cmpstslmtp:
            ncp[0] |= (1 << (ns & 63))
            ns += bp
            ncp = bcp + (ns >> 6)
    else: # extreme loop unrolling culling
        extremeFuncs[((s & 7) << 2) + ((bp & 7) >> 1)](cmpsts, bytesz, s, bp)
#    for c in range(s, self.len, bp): # slow bit twiddling way
#        self.cmpsts[c >> 3] |= (1 << (c & 7))

fn cullPage(lwi: Int, lmt: Int, cmpsts: DTypePointer[DType.uint8], bsprmrps: DTypePointer[DType.uint8]):
    var bp = 1; var ndx = 0
    while True:
        bp += bsprmrps[ndx].to_int() << 1
        let i = (bp - 3) >> 1
        var s = (i + i) * (i + 3) + 3
        if s >= lmt: break
        if s >= lwi: s -= lwi
        else:
            s = (lwi - s) % bp
            if s != 0: s = bp - s
        cullPass(cmpsts, cBufferSize, s, bp)
        ndx += 1

fn countPagePrimes(ptr: DTypePointer[DType.uint8], bitsz: Int) -> Int:
    let wordsz: Int = (bitsz + 63) // 64  # round up to nearest 64 bit boundary 
    var rslt: Int = wordsz * 64
    let bigcmps = ptr.bitcast[DType.uint64]()        
    for i in range(wordsz - 1):
       rslt -= ctpop(bigcmps[i]).to_int()
    rslt -= ctpop(bigcmps[wordsz - 1] | (-2 << ((bitsz - 1) & 63))).to_int()
    return rslt

struct SoEOdds(Sized):
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int
    var ndx: Int
    fn __init__(inout self, limit: Int):
        self.len = 0 if limit < 2 else (limit - 3) // 2 + 1
        self.sz = 0 if limit < 2 else self.len + 1 # for the unprocessed only even prime of two
        self.ndx = -1
        let bytesz = 0 if limit < 2 else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memset_zero(self.cmpsts, bytesz)
        for i in range(self.len):
            let s = (i + i) * (i + 3) + 3
            if s >= self.len: break
            if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
            let bp = i + i + 3
            cullPass(self.cmpsts, bytesz, s, bp)
        self.sz = countPagePrimes(self.cmpsts, self.len) + 1 # add one for only even prime of two
    fn __del__(owned self):
        self.cmpsts.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        let bytesz = (self.len + 7) // 8
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int:
        if self.ndx < 0:
            self.ndx = 0; self.sz -= 1; return 2
        while (self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
            self.ndx += 1
        let rslt = (self.ndx << 1) + 3; self.sz -= 1; self.ndx += 1; return rslt

struct SoEOddsPaged:
    var len: Int
    var cmpsts: DTypePointer[DType.uint8] # because DynamicVector has deep copy bug in Mojo version 0.7
    var sz: Int # 0 means finished; otherwise contains number of odd base primes
    var ndx: Int
    var lwi: Int
    var bsprmrps: DTypePointer[DType.uint8] # contains deltas between odd base primes starting from zero
    fn __init__(inout self, limit: UInt64):
        self.len = 0 if limit < 2 else ((limit - 3) // 2 + 1).to_int()
        self.sz = 0 if limit < 2 else 1 # means iterate until this is set to zero
        self.ndx = -1 # for unprocessed only even prime of two
        self.lwi = 0
        if self.len < cBufferBits:
            let bytesz = ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
            self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
            self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
        else:
            self.cmpsts = DTypePointer[DType.uint8].alloc(cBufferSize)
            let bsprmitr = SoEOdds(intsqrt(limit).to_int())
            self.sz = len(bsprmitr)
            self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
            var ndx = -1; var oldbp = 1
            for bsprm in bsprmitr:
                if ndx < 0: ndx += 1; continue # skip over the 2 prime
                self.bsprmrps[ndx] = (bsprm - oldbp) >> 1
                oldbp = bsprm; ndx += 1
            self.bsprmrps[ndx] = 255 # one extra value to go beyond the necessary cull space
    fn __del__(owned self):
        self.cmpsts.free(); self.bsprmrps.free()
    fn __copyinit__(inout self, existing: Self):
        self.len = existing.len
        self.sz = existing.sz
        let bytesz = cBufferSize if self.len >= cBufferBits
                     else ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
        self.cmpsts = DTypePointer[DType.uint8].alloc(bytesz)
        memcpy(self.cmpsts, existing.cmpsts, bytesz)
        self.ndx = existing.ndx
        self.lwi = existing.lwi
        self.bsprmrps = DTypePointer[DType.uint8].alloc(self.sz)
        memcpy(self.bsprmrps, existing.bsprmrps, self.sz)
    fn __moveinit__(inout self, owned existing: Self):
        self.len = existing.len
        self.cmpsts = existing.cmpsts
        self.sz = existing.sz
        self.ndx = existing.ndx
        self.lwi = existing.lwi
        self.bsprmrps = existing.bsprmrps
    fn countPrimes(self) -> Int:
        if self.len <= cBufferBits: return len(SoEOdds(2 * self.len + 1))
        var cnt = 1; var lwi = 0
        let cmpsts = DTypePointer[DType.uint8].alloc(cBufferSize)
        memset_zero(cmpsts, cBufferSize)
        cullPage(0, cBufferBits, cmpsts, self.bsprmrps)
        while lwi + cBufferBits <= self.len:
            cnt += countPagePrimes(cmpsts, cBufferBits)
            lwi += cBufferBits
            memset_zero(cmpsts, cBufferSize)
            let lmt = lwi + cBufferBits if lwi + cBufferBits <= self.len else self.len
            cullPage(lwi, lmt, cmpsts, self.bsprmrps)
        cnt += countPagePrimes(cmpsts, self.len - lwi)
        return cnt
    fn __len__(self: Self) -> Int: return self.sz
    fn __iter__(self: Self) -> Self: return self
    @always_inline
    fn __next__(inout self: Self) -> Int: # don't count number of primes by interating - slooow
        if self.ndx < 0:
            self.ndx = 0; self.lwi = 0
            if self.len < 2: self.sz = 0
            elif self.len <= cBufferBits:
                let bytesz = ((self.len + 63) & -64) >> 3 # round up to nearest 64 bit boundary
                memset_zero(self.cmpsts, bytesz)
                for i in range(self.len):
                    let s = (i + i) * (i + 3) + 3
                    if s >= self.len: break
                    if (self.cmpsts[i >> 3] >> (i & 7)) & 1 != 0: continue
                    let bp = i + i + 3
                    cullPass(self.cmpsts, bytesz, s, bp)
            else:
                memset_zero(self.cmpsts, cBufferSize)
                cullPage(0, cBufferBits, self.cmpsts, self.bsprmrps)
            return 2
        let rslt = ((self.lwi + self.ndx) << 1) + 3; self.ndx += 1
        if self.lwi + cBufferBits >= self.len:
            while (self.lwi + self.ndx < self.len) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
                self.ndx += 1
        else:
            while (self.ndx < cBufferBits) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
                self.ndx += 1
            while (self.ndx >= cBufferBits) and (self.lwi + cBufferBits <= self.len):
                self.ndx = 0; self.lwi += cBufferBits; memset_zero(self.cmpsts, cBufferSize)
                let lmt = self.lwi + cBufferBits if self.lwi + cBufferBits <= self.len else self.len
                cullPage(self.lwi, lmt, self.cmpsts, self.bsprmrps)
                let buflmt = cBufferBits if self.lwi + cBufferBits <= self.len else self.len - self.lwi
                while (self.ndx < buflmt) and ((self.cmpsts[self.ndx >> 3] >> (self.ndx & 7)) & 1 != 0):
                    self.ndx += 1
        if self.lwi + self.ndx >= self.len: self.sz = 0
        return rslt

fn main():
    print("The primes to 100 are:")
    for prm in SoEOddsPaged(100): print_no_newline(prm, " ")
    print()
    let strt0 = now()
    let answr0 = SoEOddsPaged(1_000_000).countPrimes()
    let elpsd0 = (now() - strt0) / 1000000
    print("Found", answr0, "primes up to 1,000,000 in", elpsd0, "milliseconds.")
    let strt1 = now()
    let answr1 = SoEOddsPaged(cLIMIT).countPrimes()
    let elpsd1 = (now() - strt1) / 1000000
    print("Found", answr1, "primes up to", cLIMIT, "in", elpsd1, "milliseconds.")

Output:

The primes to 100 are:
2  3  5  7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97  
Found 78498 primes up to 1,000,000 in 0.084122000000000002 milliseconds.
Found 50847534 primes up to 1000000000 in 139.509275 milliseconds.

This was tested on the same computer as the previous Mojo versions. Note that the time now scales quite well with range since there are no longer the huge RAM access time bottleneck's. This version is only about 2.25 times slower than Kim Walich's primesieve program written in C++ and the mostly constant factor difference will be made up if one adds wheel factorization to the same level as he uses (basic wheel factorization ratio of 48/105 plus some other more minor optimizations). This version can count the number of primes to 1e11 in about 21.85 seconds on this machine. It will work reasonably efficiently up to a range of about 1e14 before other optimization techniques such as "bucket sieving" should be used.

For counting the number of primes to a billion (1e9), this version has reduced the time by about a factor of 40 from the original version and over eight times from the odds-only version above. Adding wheel factorization will make it almost two and a half times faster yet for a gain in speed of about a hundred times over the original version.

MUMPS

ERATO1(HI)
 ;performs the Sieve of Erotosethenes up to the number passed in.
 ;This version sets an array containing the primes
 SET HI=HI\1
 KILL ERATO1 ;Don't make it new - we want it to remain after we quit the function
 NEW I,J,P
 FOR I=2:1:(HI**.5)\1 FOR J=I*I:I:HI SET P(J)=1
 FOR I=2:1:HI S:'$DATA(P(I)) ERATO1(I)=I
 KILL I,J,P
 QUIT

Example:

USER>SET MAX=100,C=0 DO ERATO1^ROSETTA(MAX) 
USER>WRITE !,"PRIMES BETWEEN 1 AND ",MAX,! FOR  SET I=$ORDER(ERATO1(I)) Q:+I<1  WRITE I,", "

PRIMES BETWEEN 1 AND 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,79, 83, 89, 97,

Neko

/* The Computer Language Shootout
   http://shootout.alioth.debian.org/

   contributed by Nicolas Cannasse
*/
fmt = function(i) {
        var s = $string(i);
        while( $ssize(s) < 8 )
                s = " "+s;
        return s;
}
nsieve = function(m) {
        var a = $amake(m);
        var count = 0;
        var i = 2;
        while( i < m ) {
                if $not(a[i]) {
                        count += 1;
                        var j = (i << 1);
                        while( j < m ) {
                                if( $not(a[j]) ) a[j] = true;
                                j += i;
                        }
                }
                i += 1;
        }
        $print("Primes up to ",fmt(m)," ",fmt(count),"\n");
}

var n = $int($loader.args[0]);
if( n == null ) n = 2;
var i = 0;
while( i < 3 ) {
        nsieve(10000 << (n - i));
        i += 1;
}

Output:

prompt$ nekoc nsieve.neko
prompt$ time -p neko nsieve.n
Primes up to    40000     4203
Primes up to    20000     2262
Primes up to    10000     1229
real 0.02
user 0.01
sys 0.00

NetRexx

Version 1 (slow)

/* NetRexx */

options replace format comments java crossref savelog symbols binary

parse arg loWatermark hiWatermark .
if loWatermark = '' | loWatermark = '.' then loWatermark = 1
if hiWatermark = '' | hiWatermark = '.' then hiWatermark = 200

do
  if \loWatermark.datatype('w') | \hiWatermark.datatype('w') then -
    signal NumberFormatException('arguments must be whole numbers')
  if loWatermark > hiWatermark then -
    signal IllegalArgumentException('the start value must be less than the end value')

  seive = sieveOfEratosthenes(hiWatermark)
  primes = getPrimes(seive, loWatermark, hiWatermark).strip

  say 'List of prime numbers from' loWatermark 'to' hiWatermark 'via a "Sieve of Eratosthenes" algorithm:'
  say '  'primes.changestr(' ', ',')
  say '  Count of primes:' primes.words
catch ex = Exception
  ex.printStackTrace
end

return

method sieveOfEratosthenes(hn = long) public static binary returns Rexx

  sv = Rexx(isTrue)
  sv[1] = isFalse
  ix = long
  jx = long

  loop ix = 2 while ix * ix <= hn
    if sv[ix] then loop jx = ix * ix by ix while jx <= hn
      sv[jx] = isFalse
      end jx
    end ix

  return sv

method getPrimes(seive = Rexx, lo = long, hi = long) private constant binary returns Rexx

  primes = Rexx('')
  loop p_ = lo to hi
    if \seive[p_] then iterate p_
    primes = primes p_
    end p_

  return primes

method isTrue public constant binary returns boolean
  return 1 == 1

method isFalse public constant binary returns boolean
  return \isTrue

Output

List of prime numbers from 1 to 200 via a "Sieve of Eratosthenes" algorithm:
  2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199
  Count of primes: 46

Version 2 (significantly, i.e. 10 times faster)

/* NetRexx ************************************************************
* Essential improvements:Use boolean instead of Rexx for sv
*                        and remove methods isTrue and isFalse
* 24.07.2012 Walter Pachl courtesy Kermit Kiser
**********************************************************************/

options replace format comments java crossref savelog symbols binary

parse arg loWatermark hiWatermark .
if loWatermark = '' | loWatermark = '.' then loWatermark = 1
if hiWatermark = '' | hiWatermark = '.' then hiWatermark = 200000

startdate=Date Date()
do
  if \loWatermark.datatype('w') | \hiWatermark.datatype('w') then -
    signal NumberFormatException('arguments must be whole numbers')
  if loWatermark > hiWatermark then -
    signal IllegalArgumentException(-
                 'the start value must be less than the end value')
  sieve = sieveOfEratosthenes(hiWatermark)
  primes = getPrimes(sieve, loWatermark, hiWatermark).strip
  if hiWatermark = 200 Then do
    say 'List of prime numbers from' loWatermark 'to' hiWatermark
    say '  'primes.changestr(' ', ',')
  end
catch ex = Exception
  ex.printStackTrace
end
enddate=Date Date()
Numeric Digits 20
say (enddate.getTime-startdate.getTime)/1000 'seconds elapsed'
say '  Count of primes:' primes.words

return

method sieveOfEratosthenes(hn = int) -
                                  public static binary returns boolean[]
  true  = boolean 1
  false = boolean 0
  sv = boolean[hn+1]
  sv[1] = false

  ix = int
  jx = int

  loop ix=2 to hn
    sv[ix]=true
    end ix

  loop ix = 2 while ix * ix <= hn
    if sv[ix] then loop jx = ix * ix by ix while jx <= hn
      sv[jx] = false
      end jx
    end ix

  return sv

method getPrimes(sieve = boolean[], lo = int, hi = int) -
                                    private constant binary Returns Rexx
  p_ = int
  primes = Rexx('')
  loop p_ = lo to hi
    if \sieve[p_] then iterate p_
    primes = primes p_
    end p_

  return primes

newLISP

This example is incorrect. Please fix the code and remove this message.

Details: This version uses rem (division) testing and so is a trial division algorithm, not a sieve of Eratosthenes.

This version is maybe a little different because it no longer stores the primes after they've been generated and sent to the main output. Lisp has very convenient list editing, so we don't really need the Boolean flag arrays you'd tend find in the Algol-like languages. We can just throw away the multiples of each prime from an initial list of integers. The implementation is easier if we delete every multiple, including the prime number itself: the list always contains only the numbers that haven't been processed yet, starting with the next prime, and the program is finished when the list becomes empty.

Note that the lambda expression in the following script does not involve a closure; newLISP has dynamic scope, so it matters that the same variable names will not be reused for some other purpose (at runtime) before the anonymous function is called.

(set 'upper-bound 1000)

; The initial sieve is a list of all the numbers starting at 2.
(set 'sieve (sequence 2 upper-bound))

; Keep working until the list is empty.
(while sieve

	; The first number in the list is always prime
	(set 'new-prime (sieve 0))
	(println new-prime)

	; Filter the list leaving only the non-multiples of each number.
	(set 'sieve
		(filter
			(lambda (each-number)
				(not (zero? (% each-number new-prime))))
			sieve)))

(exit)

Output:

Nial

This example is incorrect. Please fix the code and remove this message.

Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

primes is sublist [ each (2 = sum eachright (0 = mod) [pass,count]), pass ] rest count

Using it

|primes 10
=2 3 5 7

Nim

from math import sqrt
 
iterator primesUpto(limit: int): int =
  let sqrtLimit = int(sqrt(float64(limit)))
  var composites = newSeq[bool](limit + 1)
  for n in 2 .. sqrtLimit: # cull to square root of limit
    if not composites[n]: # if prime -> cull its composites
      for c in countup(n * n, limit, n): # start at ``n`` squared
        composites[c] = true
  for n in 2 .. limit: # separate iteration over results
    if not composites[n]:
      yield n
 
stdout.write "The primes up to 100 are:  "
for x in primesUpto(100):
   stdout.write(x, " ")
echo()
 
var count = 0
for p in primesUpto(1000000):
  count += 1
echo "There are ", count, " primes up to 1000000."

Output:

Primes are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to 1000000.

Alternate odds-only bit-packed version

The above version wastes quite a lot of memory by using a sequence of boolean values to sieve the composite numbers and sieving all numbers when two is the only even prime. The below code uses a bit-packed sequence to save a factor of eight in memory and also sieves only odd primes for another memory saving by a factor of two; it is also over two and a half times faster due to reduced number of culling operations and better use of the CPU cache as a little cache goes a lot further - this better use of cache is more than enough to make up for the extra bit-packing shifting operations:

iterator isoe_upto(top: uint): uint =
  let topndx = int((top - 3) div 2)
  let sqrtndx = (int(sqrt float64(top)) - 3) div 2
  var cmpsts = newSeq[uint32](topndx div 32 + 1)
  for i in 0 .. sqrtndx: # cull composites for primes
    if (cmpsts[i shr 5] and (1u32 shl (i and 31))) == 0:
      let p = i + i + 3
      for j in countup((p * p - 3) div 2, topndx, p):
        cmpsts[j shr 5] = cmpsts[j shr 5] or (1u32 shl (j and 31))
  yield 2 # separate culling above and iteration here
  for i in 0 .. topndx:
    if (cmpsts[i shr 5] and (1u32 shl (i and 31))) == 0:
      yield uint(i + i + 3)

The above code can be used with the same output functions as in the first code, just replacing the name of the iterator "iprimes_upto" with this iterator's name "isoe_upto" in two places. The output will be identical.

Nim Unbounded Versions

For many purposes, one doesn't know the exact upper limit desired to easily use the above versions; in addition, those versions use an amount of memory proportional to the range sieved. In contrast, unbounded versions continuously update their range as they progress and only use memory proportional to the secondary base primes stream, which is only proportional to the square root of the range. One of the most basic functional versions is the TreeFolding sieve which is based on merging lazy streams as per Richard Bird's contribution to incremental sieves in Haskell, but which has a much better asymptotic execution complexity due to the added tree folding. The following code is a version of that in Nim (odds-only):

import sugar
from times import epochTime

type PrimeType = int
iterator primesTreeFolding(): PrimeType {.closure.} =
  # needs a Co Inductive Stream - CIS...
  type
    CIS[T] = ref object
      head: T
      tail: () -> CIS[T]

  proc merge(xs, ys: CIS[PrimeType]): CIS[PrimeType] =
    let x = xs.head;
    let y = ys.head
    if x < y:
      CIS[PrimeType](head: x, tail: () => merge(xs.tail(), ys))
    elif y < x:
      CIS[PrimeType](
        head: y,
        tail: () => merge(xs, ys.tail()))
    else:
      CIS[PrimeType](
        head: x,
        tail: () => merge(xs.tail(), ys.tail()))

  proc pmults(p: PrimeType): CIS[PrimeType] =
    let inc = p + p
    proc mlts(c: PrimeType): CIS[PrimeType] =
      CIS[PrimeType](head: c, tail: () => mlts(c + inc))
    mlts(p * p)

  proc allmults(ps: CIS[PrimeType]): CIS[CIS[PrimeType]] =
    CIS[CIS[PrimeType]](
      head: pmults(ps.head),
      tail: () => allmults(ps.tail()))

  proc pairs(css: CIS[CIS[PrimeType]]): CIS[CIS[PrimeType]] =
    let cs0 = css.head;
    let rest0 = css.tail()
    CIS[CIS[PrimeType]](
      head: merge(cs0, rest0.head),
      tail: () => pairs(rest0.tail()))

  proc cmpsts(css: CIS[CIS[PrimeType]]): CIS[PrimeType] =
    let cs0 = css.head
    CIS[PrimeType](
      head: cs0.head,
      tail: () => merge(cs0.tail(), css.tail().pairs.cmpsts))

  proc minusAt(n: PrimeType, cs: CIS[PrimeType]): CIS[PrimeType] =
    var nn = n;
    var ncs = cs
    while nn >= ncs.head:
      nn += 2;
      ncs = ncs.tail()
    CIS[PrimeType](head: nn, tail: () => minusAt(nn + 2, ncs))

  proc oddprms(): CIS[PrimeType] =
    CIS[PrimeType](
      head: 3.PrimeType,
      tail: () => minusAt(5.PrimeType, oddprms().allmults.cmpsts))

  var prms = CIS[PrimeType](head: 2.PrimeType, tail: () => oddprms())
  while true:
    yield prms.head;
    prms = prms.tail()

stdout.write "The first 25 primes are:  "
var counter = 0
for p in primesTreeFolding():
  if counter >= 25: break
  stdout.write(p, " "); counter += 1
echo()

let start = epochTime()
counter = 0
for p in primesTreeFolding():
  if p > 1000000: break
  else: counter += 1
let elapsed = epochTime() - start
echo "There are ", counter, " primes up to 1000000."
echo "This test took ", elapsed, " seconds."

Output:

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes up to 1000000.
This test took 0.2780287265777588 seconds.

With Nim 1.4, it takes about 0.4s (in release or danger mode) to compute the primes until one million, better than the time needed in previous versions. With option "--gc arc", this time drops to 0.28s on a small laptop. This is still slow compared to bound algorithm which is due to the many small memory allocations/de-allocations required, which is a characteristic of functional forms of code. Is is purely functional in that everything is immutable other than that Nim does not have Tail Call Optimization (TCO) so that we can freely use function recursion with no execution time cost; therefore, where necessary this is implemented with imperative loops, which is what TCO is generally turned into such forms "under the covers". It is also slow due to the algorithm being only O(n (log n) (log (log n))) rather than without the extra "log n" factor as some version have. This slowness makes it only moderately useful for ranges up to a few million.

Since the algorithm does not require the memoization of a full lazy list, it uses an internal Co Inductive Stream of deferred execution states, finally outputting an iterator to enumerate over the lazily computed stream of primes.

A faster alternative using a mutable hash table (odds-only)

To show the cost of functional forms of code, the following code is written embracing mutability, both by using a mutable hash table to store the state of incremental culling by the secondary stream of base primes and by using mutable values to store the state wherever possible, as per the following code:

import tables, times

type PrimeType = int
proc primesHashTable(): iterator(): PrimeType {.closure.} =
  iterator output(): PrimeType {.closure.} =
    # some initial values to avoid race and reduce initializations...
    yield 2.PrimeType; yield 3.PrimeType; yield 5.PrimeType; yield 7.PrimeType
    var h = initTable[PrimeType,PrimeType]()
    var n = 9.PrimeType
    let bps = primesHashTable()
    var bp = bps()  # advance past 2
    bp = bps()
    var q = bp * bp # to initialize with 3
    while true:
      if n >= q:
        let incr = bp + bp
        h[n + incr] = incr
        bp = bps()
        q = bp * bp
      elif h.hasKey(n):
        var incr: PrimeType
        discard h.take(n, incr)
        var nxt = n + incr
        while h.hasKey(nxt):
          nxt += incr # ensure no duplicates
        h[nxt] = incr
      else:
        yield n
      n += 2.PrimeType
  output

stdout.write "The first 25 primes are:  "
var counter = 0
var iter = primesHashTable()
for p in iter():
  if counter >= 25:
    break
  else:
    stdout.write(p, " ")
    counter += 1
echo ""
let start = epochTime()
counter = 0
iter = primesHashTable()
for p in iter():
  if p > 1000000: break
  else: counter += 1
let elapsed = epochTime() - start
echo "The number of primes up to a million is:  ", counter
stdout.write("This test took ", elapsed, " seconds.\n")

Output:

Time for version compiled with “-d:danger” option.

The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
The number of primes up to a million is:  78498
This test took 0.05106830596923828 seconds.

The output is identical to the first unbounded version, other than, in danger mode, it is over about eight times faster sieving to a million. For larger ranges it will continue to pull further ahead of the above version due to only O(n (log (log n))) performance because of the hash table having an average of O(1) access, and it is only so slow due to the large constant overhead of doing the hashing calculations and look-ups.

Very fast Page Segmented version using a bit-packed mutable array (odds-only)

Note: This version is used as a very fast alternative in Extensible_prime_generator#Nim

For the highest speeds, one needs to use page segmented mutable arrays as in the bit-packed version here:

# a Page Segmented Odd-Only Bit-Packed Sieve of Eratosthenes...

from times import epochTime # for testing
from bitops import popCount

type Prime = uint64

let LIMIT = 1_000_000_000.Prime
let CPUL1CACHE = 16384 # in bytes

const FRSTSVPRM = 3.Prime

type
  BasePrime = uint32
  BasePrimeArray = seq[BasePrime]
  SieveBuffer = seq[byte] # byte size gives the most potential efficiency...

# define a general purpose lazy list to use as secondary base prime arrays feed
# NOT thread safe; needs a Mutex gate to make it so, but not threaded (yet)...
type
  BasePrimeArrayLazyList = ref object
    head: BasePrimeArray
    tailf: proc (): BasePrimeArrayLazyList {.closure.}
    tail: BasePrimeArrayLazyList
template makeBasePrimeArrayLazyList(hd: BasePrimeArray;
                      body: untyped): untyped = # factory constructor
  let thnk = proc (): BasePrimeArrayLazyList {.closure.} = body
  BasePrimeArrayLazyList(head: hd, tailf: thnk)
proc rest(lzylst: sink BasePrimeArrayLazyList): BasePrimeArrayLazyList {.inline.} =
  if lzylst.tailf != nil: lzylst.tail = lzylst.tailf(); lzylst.tailf = nil
  return lzylst.tail
iterator items(lzylst: BasePrimeArrayLazyList): BasePrime {.inline.} =
  var ll = lzylst
  while ll != nil:
    for bp in ll.head: yield bp
    ll = ll.rest

# count the number of zero bits (primes) in a SieveBuffer,
# uses native popCount for extreme speed;
# counts up to the bit index of the last bit to be counted...
proc countSieveBuffer(lsti: int; cmpsts: SieveBuffer): int =
  let lstw = (lsti shr 3) and -8; let lstm = lsti and 63 # last word and bit index!
  result = (lstw shl 3) + 64 # preset for all ones!
  let cmpstsa = cast[int](cmpsts[0].unsafeAddr)
  let cmpstslsta = cmpstsa + lstw
  for csa in countup(cmpstsa, cmpstslsta - 1, 8):
    result -= cast[ptr uint64](csa)[].popCount # subtract number of found ones!
  let msk = (0'u64 - 2'u64) shl lstm # mask for the unused bits in last word!
  result -= (cast[ptr uint64](cmpstslsta)[] or msk).popCount
 
# a fast fill SieveBuffer routine using pointers...
proc fillSieveBuffer(sb: var SieveBuffer) = zeroMem(sb[0].unsafeAddr, sb.len)

const BITMASK = [1'u8, 2, 4, 8, 16, 32, 64, 128] # faster than shifting!

# do sieving work, based on low starting value for the given buffer and
# the given lazy list of base prime arrays...
proc cullSieveBuffer(lwi: int; bpas: BasePrimeArrayLazyList;
                               sb: var SieveBuffer) =
  let len = sb.len; let szbits = len shl 3; let nxti = lwi + szbits
  for bp in bpas:
    let bpwi = ((bp.Prime - FRSTSVPRM) shr 1).int
    var s = (bpwi shl 1) * (bpwi + FRSTSVPRM.int) + FRSTSVPRM.int
    if s >= nxti: break
    if s >= lwi: s -= lwi
    else:
      let r = (lwi - s) mod bp.int
      s = (if r == 0: 0 else: bp.int - r)
    let clmt = szbits - (bp.int shl 3)
#    if len == CPUL1CACHE: continue
    if s < clmt:
      let slmt = s + (bp.int shl 3)
      while s < slmt:
        let msk = BITMASK[s and 7]
        for c in countup(s shr 3, len - 1, bp.int):
          sb[c] = sb[c] or msk
        s += bp.int
      continue
    while s < szbits:
      let w = s shr 3; sb[w] = sb[w] or BITMASK[s and 7]; s += bp.int # (1'u8 shl (s and 7))

proc makeBasePrimeArrays(): BasePrimeArrayLazyList # forward reference!

# an iterator over successive sieved buffer composite arrays,
# returning whatever type the cnvrtr produces from
# the low index and the culled SieveBuffer...
proc makePrimePages[T](
    strtwi, sz: int; cnvrtrf: proc (li: int; sb: var SieveBuffer): T {.closure.}
      ): (iterator(): T {.closure.}) =
  var lwi = strtwi; let bpas = makeBasePrimeArrays(); var cmpsts = newSeq[byte](sz)
  return iterator(): T {.closure.} =
    while true:
      fillSieveBuffer(cmpsts); cullSieveBuffer(lwi, bpas, cmpsts)
      yield cnvrtrf(lwi, cmpsts); lwi += cmpsts.len shl 3
 
# starts the secondary base primes feed with minimum size in bits set to 4K...
# thus, for the first buffer primes up to 8293,
# the seeded primes easily cover it as 97 squared is 9409.
proc makeBasePrimeArrays(): BasePrimeArrayLazyList =
  # converts an entire sieved array of bytes into an array of base primes,
  # to be used as a source of base primes as part of the Lazy List...
  proc sb2bpa(li: int; sb: var SieveBuffer): BasePrimeArray =
    let szbits = sb.len shl 3; let len = countSieveBuffer(szbits - 1, sb)
    result = newSeq[BasePrime](len); var j = 0
    for i in 0 ..< szbits:
      if (sb[i shr 3] and BITMASK[i and 7]) == 0'u8:
        result[j] = FRSTSVPRM.BasePrime + ((li + i) shl 1).BasePrime; j.inc
  proc nxtbparr(
      pgen: iterator (): BasePrimeArray {.closure.}): BasePrimeArrayLazyList =
    return makeBasePrimeArrayLazyList(pgen()): nxtbparr(pgen)
  # pre-seeding first array breaks recursive race,
  # dummy primes of all odd numbers starting at FRSTSVPRM (unculled)...
  var cmpsts = newSeq[byte](512)
  let dummybparr = sb2bpa(0, cmpsts)
  let fakebps = makeBasePrimeArrayLazyList(dummybparr): nil # used just once here!
  cullSieveBuffer(0, fakebps, cmpsts)
  return makeBasePrimeArrayLazyList(sb2bpa(0, cmpsts)):
    nxtbparr(makePrimePages(4096, 512, sb2bpa)) # lazy recursive call breaks race!
 
# iterator over primes from above page iterator;
# takes at least as long to enumerate the primes as sieve them...
iterator primesPaged(): Prime {.inline.} =
  yield 2
  proc mkprmarr(li: int; sb: var SieveBuffer): seq[Prime] =
    let szbits = sb.len shl 3; let low = FRSTSVPRM + (li + li).Prime; var j = 0
    let len = countSieveBuffer(szbits - 1, sb); result = newSeq[Prime](len)
    for i in 0 ..< szbits:
      if (sb[i shr 3] and BITMASK[i and 7]) == 0'u8:
        result[j] = low + (i + i).Prime; j.inc
  let gen = makePrimePages(0, CPUL1CACHE, mkprmarr)
  for prmpg in gen():
    for prm in prmpg: yield prm
 
proc countPrimesTo(range: Prime): int64 =
  if range < FRSTSVPRM: return (if range < 2: 0 else: 1)
  result = 1; let rngi = ((range - FRSTSVPRM) shr 1).int
  proc cntr(li: int; sb: var SieveBuffer): (int, int) {.closure.} =
    let szbits = sb.len shl 3; let nxti = li + szbits; result = (0, nxti)
    if nxti <= rngi: result[0] += countSieveBuffer(szbits - 1, sb)
    else: result[0] += countSieveBuffer(rngi - li, sb)
  let gen = makePrimePages(0, CPUL1CACHE, cntr)
  for count, nxti in gen():
    result += count; if nxti > rngi: break

# showing results...
echo "Page Segmented Bit-Packed Odds-Only Sieve of Eratosthenes"
echo "Needs at least ", CPUL1CACHE, " bytes of CPU L1 cache memory.\n"

stdout.write "First 25 primes:  "
var counter0 = 0
for p in primesPaged():
  if counter0 >= 25: break
  stdout.write(p, " "); counter0.inc
echo ""

stdout.write "The number of primes up to a million is:  "
var counter1 = 0
for p in primesPaged():
  if p > 1_000_000.Prime: break else: counter1.inc
stdout.write counter1, " - these both found by (slower) enumeration.\n"

let start = epochTime()
#[ # slow way to count primes takes as long to enumerate as sieve!
var counter = 0
for p in primesPaged():
  if p > LIMIT: break else: counter.inc
# ]#
let counter = countPrimesTo LIMIT # the fast way using native popCount!
let elpsd = epochTime() - start

echo "Found ", counter, " primes up to ", LIMIT, " in ", elpsd, " seconds."

Output:

Time is obtained with Nim 1.4 with options -d:danger --gc:arc.

Page Segmented Bit-Packed Odds-Only Sieve of Eratosthenes
Needs at least 16384 bytes of CPU L1 cache memory.

First 25 primes:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
The number of primes up to a million is:  78498 - these both found by (slower) enumeration.
Found 50847534 primes up to 1000000000 in 0.7931935787200928 seconds.

The above version approaches a hundred times faster than the incremental style versions above due to the high efficiency of direct mutable memory operations in modern CPU's, and is useful for ranges of billions. This version maintains its efficiency using the CPU L1 cache to a range of over 16 billion and then gets a little slower for ranges of a trillion or more using the CPU's L2 cache. It takes an average of only about 3.5 CPU clock cycles per composite number cull, or about 70 CPU clock cycles per prime found.

Note that the fastest performance is realized by using functions that directly manipulate the output "seq" (array) of culled bit number representations such as the `countPrimesTo` function provided, as enumeration using the `primesPaged` iterator takes about as long to enumerate the found primes as it takes to cull the composites.

Many further improvements in speed can be made, as in tuning the medium ranges to more efficiently use the CPU caches for an improvement in the middle ranges of up to a factor of about two, full maximum wheel factorization for a further improvement of about four, extreme loop unrolling for a further improvement of approximately two, multi-threading for an improvement of the factor of effective CPU cores used, etc. However, these improvements are of little point when used with enumeration; for instance, if one successfully reduced the time to sieve the composite numbers to zero, it would still take about a second just to enumerate the resulting primes over a range of a billion.

Niue

This example is incorrect. Please fix the code and remove this message.

Details: It uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

[ dup 2 < ] '<2 ;
[ 1 + 'count ; [ <2 [ , ] when ] count times ] 'fill-stack ;

0 'n ; 0 'v ;

[ .clr 0 'n ; 0 'v ; ] 'reset ;
[ len 1 - n - at 'v ; ] 'set-base ;
[ n 1 + 'n ; ] 'incr-n ;
[ mod 0 = ] 'is-factor ;
[ dup * ] 'sqr ;

[ set-base
  v sqr 2 at > not 
  [ [ dup v = not swap v is-factor and ] remove-if incr-n run ] when ] 'run ;

[ fill-stack run ] 'sieve ;

( tests )

10 sieve .s ( => 2 3 5 7 9 ) reset newline
30 sieve .s ( => 2 3 5 7 11 13 17 19 23 29 )

Oberon-2

MODULE Primes;

   IMPORT Out, Math;

   CONST N = 1000;

   VAR a: ARRAY N OF BOOLEAN;
      i, j, m: INTEGER;

BEGIN
   (* Set all elements of a to TRUE. *)
   FOR i := 1 TO N - 1 DO
      a[i] := TRUE;
   END;

   (* Compute square root of N and convert back to INTEGER. *)
   m := ENTIER(Math.Sqrt(N));

   FOR i := 2 TO m DO
      IF a[i] THEN
         FOR j := 2 TO (N - 1) DIV i DO 
            a[i*j] := FALSE;
         END;
      END;
   END;

   (* Print all the elements of a that are TRUE. *)
   FOR i := 2 TO N - 1 DO
      IF a[i] THEN
         Out.Int(i, 4);
      END;
   END;
   Out.Ln;
END Primes.

OCaml

Imperative

let sieve n =
  let is_prime = Array.create n true in
  let limit = truncate(sqrt (float (n - 1))) in
  for i = 2 to limit do
    if is_prime.(i) then
      let j = ref (i*i) in
      while !j < n do
        is_prime.(!j) <- false;
        j := !j + i;
      done
  done;
  is_prime.(0) <- false;
  is_prime.(1) <- false;
  is_prime

let primes n =
  let primes, _ =
    let sieve = sieve n in
    Array.fold_right
      (fun is_prime (xs, i) -> if is_prime then (i::xs, i-1) else (xs, i-1))
      sieve
      ([], Array.length sieve - 1)
  in
  primes

in the top-level:

# primes 100 ;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97]

Functional

(* first define some iterators *)
let fold_iter f init a b =
  let rec aux acc i =
    if i > b
    then (acc)
    else aux (f acc i) (succ i)
  in
  aux init a
(* val fold_iter : ('a -> int -> 'a) -> 'a -> int -> int -> 'a *)

let fold_step f init a b step =
  let rec aux acc i =
    if i > b
    then (acc)
    else aux (f acc i) (i + step)
  in
  aux init a
(* val fold_step : ('a -> int -> 'a) -> 'a -> int -> int -> int -> 'a *)

(* remove a given value from a list *)
let remove li v =
  let rec aux acc = function
    | hd::tl when hd = v -> (List.rev_append acc tl)
    | hd::tl -> aux (hd::acc) tl
    | [] -> li
  in
  aux [] li
(* val remove : 'a list -> 'a -> 'a list *)

(* the main function *)
let primes n =
  let li =
    (* create a list [from 2; ... until n] *)
    List.rev(fold_iter (fun acc i -> (i::acc)) [] 2 n)
  in
  let limit = truncate(sqrt(float n)) in
  fold_iter (fun li i ->
      if List.mem i li  (* test if (i) is prime *)
      then (fold_step remove li (i*i) n i)
      else li)
    li 2 (pred limit)
(* val primes : int -> int list *)

in the top-level:

# primes 200 ;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
 157; 163; 167; 173; 179; 181; 191; 193; 197; 199]

Another functional version

This uses zero to denote struck-out numbers. It is slightly inefficient as it strikes-out multiples above p rather than p²

let rec strike_nth k n l = match l with
  | [] -> []
  | h :: t ->
    if k = 0 then 0 :: strike_nth (n-1) n t
    else h :: strike_nth (k-1) n t
(* val strike_nth : int -> int -> int list -> int list *)

let primes n =
  let limit = truncate(sqrt(float n)) in
  let rec range a b = if a > b then [] else a :: range (a+1) b in
  let rec sieve_primes l = match l with
    | [] -> []
    | 0 :: t -> sieve_primes t
    | h :: t -> if h > limit then List.filter ((<) 0) l else
        h :: sieve_primes (strike_nth (h-1) h t) in
  sieve_primes (range 2 n)
(* val primes : int -> int list *)

in the top-level:

# primes 200;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
 157; 163; 167; 173; 179; 181; 191; 193; 197; 199]

Oforth

: eratosthenes(n)
| i j |
   ListBuffer newSize(n) dup add(null) seqFrom(2, n) over addAll
   2 n sqrt asInteger for: i [
      dup at(i) ifNotNull: [ i sq n i step: j [ dup put(j, null) ] ]
      ]
   filter(#notNull) ;

Output:

>100 eratosthenes println
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Ol

(define all (iota 999 2))

(print
   (let main ((left '()) (right all))
      (if (null? right)
         (reverse left)
         (unless (car right)
            (main left (cdr right))
            (let loop ((l '()) (r right) (n 0) (every (car right)))
               (if (null? r)
                  (let ((l (reverse l)))
                     (main (cons (car l) left) (cdr l)))
                  (if (eq? n every)
                     (loop (cons #false l) (cdr r) 1 every)
                     (loop (cons (car r) l) (cdr r) (+ n 1) every)))))))
)

Output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997)

ooRexx

/*ooRexx program generates & displays primes via the sieve of Eratosthenes.
*                       derived from first Rexx version
*                       uses an array rather than a stem for the list
*                       uses string methods rather than BIFs
*                       uses new ooRexx keyword LOOP, extended assignment
*                         and line comments
*                       uses meaningful variable names and restructures code
*                         layout for improved understandability
****************************************************************************/
  arg highest                       --get highest number to use.
  if \highest~datatype('W') then
    highest = 200                   --use default value.
  isPrime = .array~new(highest)     --container for all numbers.
  isPrime~fill(1)                   --assume all numbers are prime.
  w = highest~length                --width of the biggest number,
                                    --  it's used for aligned output.
  out1 = 'prime'~right(20)          --first part of output messages.
  np = 0                            --no primes so far.
  loop j = 2 for highest - 1        --all numbers up through highest.
    if isPrime[j] = 1 then do       --found one.
      np += 1                       --bump the prime counter.
      say out1 np~right(w) ' --> ' j~right(w)   --display output.
      loop m = j * j to highest by j
        isPrime[m] = ''             --strike all multiples: not prime.
      end
    end
  end
  say
  say np~right(out1~length + 1 + w) 'primes found up to and including ' highest
  exit

Output:

               prime   1  -->    2
               prime   2  -->    3
               prime   3  -->    5
               prime   4  -->    7
               prime   5  -->   11
               prime   6  -->   13
               prime   7  -->   17
               prime   8  -->   19
               prime   9  -->   23
               prime  10  -->   29
               prime  11  -->   31
               prime  12  -->   37
               prime  13  -->   41
               prime  14  -->   43
               prime  15  -->   47
               prime  16  -->   53
               prime  17  -->   59
               prime  18  -->   61
               prime  19  -->   67
               prime  20  -->   71
               prime  21  -->   73
               prime  22  -->   79
               prime  23  -->   83
               prime  24  -->   89
               prime  25  -->   97
               prime  26  -->  101
               prime  27  -->  103
               prime  28  -->  107
               prime  29  -->  109
               prime  30  -->  113
               prime  31  -->  127
               prime  32  -->  131
               prime  33  -->  137
               prime  34  -->  139
               prime  35  -->  149
               prime  36  -->  151
               prime  37  -->  157
               prime  38  -->  163
               prime  39  -->  167
               prime  40  -->  173
               prime  41  -->  179
               prime  42  -->  181
               prime  43  -->  191
               prime  44  -->  193
               prime  45  -->  197
               prime  46  -->  199

                      46 primes found up to and including  200

Wheel Version

/*ooRexx program generates primes via sieve of Eratosthenes algorithm.
*                       wheel version, 2 handled as special case
*                       loops optimized: outer loop stops at the square root of
*                         the limit, inner loop starts at the square of the
*                         prime just found
*                       use a list rather than an array and remove composites
*                         rather than just mark them
*                       convert list of primes to a list of output messages and
*                         display them with one say statement
*******************************************************************************/
    arg highest                             -- get highest number to use.
    if \highest~datatype('W') then
        highest = 200                       -- use default value.
    w = highest~length                      -- width of the biggest number,
                                            --  it's used for aligned output.
    thePrimes = .list~of(2)                 -- the first prime is 2.
    loop j = 3 to highest by 2              -- populate the list with odd nums.
        thePrimes~append(j)
    end

    j = 3                                   -- first prime (other than 2)
    ix = thePrimes~index(j)                 -- get the index of 3 in the list.
    loop while j*j <= highest               -- strike multiples of odd ints.
                                            --  up to sqrt(highest).
        loop jm = j*j to highest by j+j     -- start at J squared, incr. by 2*J.
            thePrimes~removeItem(jm)        -- delete it since it's composite.
        end
        ix = thePrimes~next(ix)             -- the index of the next prime.
        j = thePrimes[ix]                   -- the next prime.
    end
    np = thePrimes~items                    -- the number of primes since the
                                            --  list is now only primes.
    out1 = '           prime number'        -- first part of output messages.
    out2 = ' --> '                          -- middle part of output messages.
    ix = thePrimes~first
    loop n = 1 to np                        -- change the list of primes
                                            --  to output messages.
        thePrimes[ix] = out1 n~right(w) out2 thePrimes[ix]~right(w)
        ix = thePrimes~next(ix)
    end
    last = np~right(out1~length+1+w) 'primes found up to and including ' highest
    thePrimes~append(.endofline || last)    -- add blank line and summary line.
    say thePrimes~makearray~toString        -- display the output.
    exit

Output:

when using the limit of 100

           prime number    1  -->     2
           prime number    2  -->     3
           prime number    3  -->     5
           prime number    4  -->     7
           prime number    5  -->    11
           prime number    6  -->    13
           prime number    7  -->    17
           prime number    8  -->    19
           prime number    9  -->    23
           prime number   10  -->    29
           prime number   11  -->    31
           prime number   12  -->    37
           prime number   13  -->    41
           prime number   14  -->    43
           prime number   15  -->    47
           prime number   16  -->    53
           prime number   17  -->    59
           prime number   18  -->    61
           prime number   19  -->    67
           prime number   20  -->    71
           prime number   21  -->    73
           prime number   22  -->    79
           prime number   23  -->    83
           prime number   24  -->    89
           prime number   25  -->    97

                          25 primes found up to and including  100

Oz

Translation of: Haskell

declare
  fun {Sieve N}
     S = {Array.new 2 N true}
     M = {Float.toInt {Sqrt {Int.toFloat N}}}
  in
     for I in 2..M do
	if S.I then
	   for J in I*I..N;I do
	      S.J := false
	   end
	end
     end
     S
  end

  fun {Primes N}
     S = {Sieve N}
  in
     for I in 2..N collect:C do
	if S.I then {C I} end
     end
  end
in
  {Show {Primes 30}}

PARI/GP

Eratosthenes(lim)={
  my(v=Vecsmall(lim\1,unused,1));
  forprime(p=2,sqrt(lim),
    forstep(i=p^2,lim,p,
      v[i]=0
    )
  );
  for(i=1,lim,if(v[i],print1(i", ")))
};

An alternate version:

Sieve(n)=
{
v=vector(n,unused,1);
for(i=2,sqrt(n),
    if(v[i],
       forstep(j=i^2,n,i,v[j]=0)));
for(i=2,n,if(v[i],print1(i",")))
};

Pascal

Note: Some Pascal implementations put quite low limits on the size of a set (e.g. Turbo Pascal doesn't allow more than 256 members). To compile on such an implementation, reduce the constant PrimeLimit accordingly.

program primes(output)

const
 PrimeLimit = 1000;

var
 primes: set of 1 .. PrimeLimit;
 n, k: integer;
 needcomma: boolean;

begin
 { calculate the primes }
 primes := [2 .. PrimeLimit];
 for n := 1 to trunc(sqrt(PrimeLimit)) do
  begin
   if n in primes
    then
     begin
      k := n*n;
      while k < PrimeLimit do
       begin
        primes := primes - [k];
        k := k + n
       end
     end
  end;

  { output the primes }
  needcomma := false;
  for n := 1 to PrimeLimit do
   if n in primes
    then
     begin
      if needcomma
       then
        write(', ');
      write(n);
      needcomma := true
     end
end.

alternative using wheel

Using growing wheel to fill array for sieving for minimal unmark operations. Sieving only with possible-prime factors.

program prim(output);
//Sieve of Erathosthenes with fast elimination of multiples of small primes
{$IFNDEF FPC}
  {$APPTYPE CONSOLE}
{$ENDIF}
const
  PrimeLimit = 100*1000*1000;//1;
type
  tLimit = 1..PrimeLimit;
var
  //always initialized with 0 => false at startup
  primes: array [tLimit] of boolean;

function BuildWheel: longInt;
//fill primfield with no multiples of small primes
//returns next sieveprime
//speedup ~1/3
var
  //wheelprimes = 2,3,5,7,11... ;
  //wheelsize = product [i= 0..wpno-1]wheelprimes[i] > Uint64 i> 13
  wheelprimes :array[0..13] of byte;
  wheelSize,wpno,
  pr,pw,i, k: LongWord;
begin
  //the mother of all numbers 1 ;-)
  //the first wheel = generator of numbers
  //not divisible by the small primes first found primes
  pr := 1;
  primes[1]:= true;
  WheelSize := 1;

  wpno := 0;
  repeat
    inc(pr);
    //pw = pr projected in wheel of wheelsize
    pw := pr;
    if pw > wheelsize then
      dec(pw,wheelsize);
    If Primes[pw] then
    begin
//      writeln(pr:10,pw:10,wheelsize:16);
      k := WheelSize+1;
      //turn the wheel (pr-1)-times
      for i := 1 to pr-1 do
      begin
        inc(k,WheelSize);
        if k<primeLimit then
          move(primes[1],primes[k-WheelSize],WheelSize)
        else
        begin
          move(primes[1],primes[k-WheelSize],PrimeLimit-WheelSize*i);
          break;
        end;
      end;
      dec(k);
      IF k > primeLimit then
        k := primeLimit;
      wheelPrimes[wpno] := pr;
      primes[pr] := false;

      inc(wpno);
      //the new wheelsize
      WheelSize := k;

      //sieve multiples of the new found prime
      i:= pr;
      i := i*i;
      while i <= k do
      begin
        primes[i] := false;
        inc(i,pr);
      end;
    end;
  until WheelSize >= PrimeLimit;

  //re-insert wheel-primes
  // 1 still stays prime
  while wpno > 0 do
  begin
    dec(wpno);
    primes[wheelPrimes[wpno]] := true;
  end;
  BuildWheel  := pr+1;
end;

procedure Sieve;
var
  sieveprime,
  fakt : LongWord;
begin
//primes[1] = true is needed to stop for sieveprime = 2
// at //Search next smaller possible prime
  sieveprime := BuildWheel;
//alternative here
  //fillchar(primes,SizeOf(Primes),chr(ord(true)));sieveprime := 2;
  repeat
    if primes[sieveprime] then
    begin
      //eliminate 'possible prime' multiples of sieveprime
      //must go downwards
      //2*2 would unmark 4 -> 4*2 = 8 wouldnt be unmarked
      fakt := PrimeLimit DIV sieveprime;
      IF fakt < sieveprime then
        BREAK;
      repeat
        //Unmark
        primes[sieveprime*fakt] := false;
        //Search next smaller possible prime
        repeat
          dec(fakt);
        until primes[fakt];
      until fakt < sieveprime;
    end;
    inc(sieveprime);
  until false;
  //remove 1
  primes[1] := false;
end;

var
  prCnt,
  i : LongWord;
Begin
  Sieve;
  {count the primes }
  prCnt := 0;
  for i:= 1 to PrimeLimit do
    inc(prCnt,Ord(primes[i]));
  writeln(prCnt,' primes up to ',PrimeLimit);
end.

output: ( i3 4330 Haswell 3.5 Ghz fpc 2.6.4 -O3 )

5761455 primes up to 100000000

real	0m0.204s
user	0m0.193s
sys	0m0.013s

Perl

For highest performance and ease, typically a module would be used, such as Math::Prime::Util, Math::Prime::FastSieve, or Math::Prime::XS.

Classic Sieve

sub sieve {
  my $n = shift;
  my @composite;
  for my $i (2 .. int(sqrt($n))) {
    if (!$composite[$i]) {
      for (my $j = $i*$i; $j <= $n; $j += $i) {
        $composite[$j] = 1;
      }
    }
  }
  my @primes;
  for my $i (2 .. $n) {
    $composite[$i] || push @primes, $i;
  }
  @primes;
}

Odds only (faster)

sub sieve2 {
  my($n) = @_;
  return @{([],[],[2],[2,3],[2,3])[$n]} if $n <= 4;

  my @composite;
  for (my $t = 3;  $t*$t <= $n;  $t += 2) {
     if (!$composite[$t]) {
        for (my $s = $t*$t;  $s <= $n;  $s += $t*2)
           { $composite[$s]++ }
     }
  }
  my @primes = (2);
  for (my $t = 3;  $t <= $n;  $t += 2) { 
     $composite[$t] || push @primes, $t;
  }
  @primes;
}

Odds only, using vectors for lower memory use

sub dj_vector {
  my($end) = @_;
  return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
  $end-- if ($end & 1) == 0; # Ensure end is odd

  my ($sieve, $n, $limit, $s_end) = ( '', 3, int(sqrt($end)), $end >> 1 );
  while ( $n <= $limit ) {
    for (my $s = ($n*$n) >> 1; $s <= $s_end; $s += $n) {
      vec($sieve, $s, 1) = 1;
    }
    do { $n += 2 } while vec($sieve, $n >> 1, 1) != 0;
  }
  my @primes = (2);
  do { push @primes, 2*$_+1 if !vec($sieve,$_,1) } for (1..int(($end-1)/2));
  @primes;
}

Odds only, using strings for best performance

Compared to array versions, about 2x faster (with 5.16.0 or later) and lower memory. Much faster than the experimental versions below. It's possible a mod-6 or mod-30 wheel could give more improvement, though possibly with obfuscation. The best next step for performance and functionality would be segmenting.

sub string_sieve {
  my ($n, $i, $s, $d, @primes) = (shift, 7);

  local $_ = '110010101110101110101110111110' .
             '101111101110101110101110111110' x ($n/30);

  until (($s = $i*$i) > $n) {
    $d = $i<<1;
    do { substr($_, $s, 1, '1') } until ($s += $d) > $n;
    1 while substr($_, $i += 2, 1);
  }
  $_ = substr($_, 1, $n); 
  # For just the count:  return ($_ =~ tr/0//);
  push @primes, pos while m/0/g;
  @primes;
}

This older version uses half the memory, but at the expense of a bit of speed and code complexity:

sub dj_string {
  my($end) = @_;
  return @{([],[],[2],[2,3],[2,3])[$end]} if $end <= 4;
  $end-- if ($end & 1) == 0;
  my $s_end = $end >> 1;

  my $whole = int( ($end>>1) / 15);    # prefill with 3 and 5 marked
  my $sieve = '100010010010110' . '011010010010110' x $whole;
  substr($sieve, ($end>>1)+1) = '';
  my ($n, $limit, $s) = ( 7, int(sqrt($end)), 0 );
  while ( $n <= $limit ) {
    for ($s = ($n*$n) >> 1; $s <= $s_end; $s += $n) {
      substr($sieve, $s, 1) = '1';
    }
    do { $n += 2 } while substr($sieve, $n>>1, 1);
  }
  # If you just want the count, it's very fast:
  #       my $count = 1 + $sieve =~ tr/0//;
  my @primes = (2);
  push @primes, 2*pos($sieve)-1 while $sieve =~ m/0/g;
  @primes;
}

Experimental

These are examples of golfing or unusual styles.

Golfing a bit, at the expense of speed:

sub sieve{ my (@s, $i);
	grep { not $s[ $i  = $_ ] and do
		 { $s[ $i += $_ ]++ while $i <= $_[0]; 1 }
	} 2..$_[0]
}

print join ", " => sieve 100;

Or with bit strings (much slower than the vector version above):

sub sieve{ my ($s, $i);
	grep { not vec $s, $i  = $_, 1 and do 
		{ (vec $s, $i += $_, 1) = 1 while $i <= $_[0]; 1 }
	} 2..$_[0]
}

print join ", " => sieve 100;

A short recursive version:

sub erat {
    my $p = shift;
    return $p, $p**2 > $_[$#_] ? @_ : erat(grep $_%$p, @_)
}

print join ', ' => erat 2..100000;

Regexp (purely an example -- the regex engine limits it to only 32769):

sub sieve {
	my ($s, $p) = "." . ("x" x shift);

	1 while ++$p
		and $s =~ /^(.{$p,}?)x/g
		and $p = length($1)
		and $s =~ s/(.{$p})./${1}./g
		and substr($s, $p, 1) = "x";
	$s
}

print sieve(1000);

Extensible sieves

Here are two incremental versions, which allows one to create a tied array of primes:

use strict;
use warnings;
package Tie::SieveOfEratosthenes;

sub TIEARRAY {
	my $class = shift;
	bless \$class, $class;
}

# If set to true, produces copious output.  Observing this output
# is an excellent way to gain insight into how the algorithm works.
use constant DEBUG => 0;

# If set to true, causes the code to skip over even numbers,
# improving runtime.  It does not alter the output content, only the speed.
use constant WHEEL2 => 0;

BEGIN {

	# This is loosely based on the Python implementation of this task,
	# specifically the "Infinite generator with a faster algorithm"

	my @primes = (2, 3);
	my $ps = WHEEL2 ? 1 : 0;
	my $p = $primes[$ps];
	my $q = $p*$p;
	my $incr = WHEEL2 ? 2 : 1;
	my $candidate = $primes[-1] + $incr;
	my %sieve;
	
	print "Initial: p = $p, q = $q, candidate = $candidate\n" if DEBUG;

	sub FETCH {
		my $n = pop;
		return if $n < 0;
		return $primes[$n] if $n <= $#primes;
		OUTER: while( 1 ) {

			# each key in %sieve is a composite number between
			# p and p-squared.  Each value in %sieve is $incr x the prime
			# which acted as a 'seed' for that key.  We use the value
			# to step through multiples of the seed-prime, until we find
			# an empty slot in %sieve.
			while( my $s = delete $sieve{$candidate} ) {
				print "$candidate a multiple of ".($s/$incr).";\t\t" if DEBUG;
				my $composite = $candidate + $s;
				$composite += $s while exists $sieve{$composite};
				print "The next stored multiple of ".($s/$incr)." is $composite\n" if DEBUG;
				$sieve{$composite} = $s;
				$candidate += $incr;
			}

			print "Candidate $candidate is not in sieve\n" if DEBUG;

			while( $candidate < $q ) {
				print "$candidate is prime\n" if DEBUG;
				push @primes, $candidate;
				$candidate += $incr;
				next OUTER if exists $sieve{$candidate};
			} 

			die "Candidate = $candidate, p = $p, q = $q" if $candidate > $q;
			print "Candidate $candidate is equal to $p squared;\t" if DEBUG;

			# Thus, it is now time to add p to the sieve,
			my $step = $incr * $p;
			my $composite = $q + $step;
			$composite += $step while exists $sieve{$composite};
			print "The next multiple of $p is $composite\n" if DEBUG;
			$sieve{$composite} = $step;
		
			# and fetch out a new value for p from our primes array.
			$p = $primes[++$ps];
			$q = $p * $p;	
			
			# And since $candidate was equal to some prime squared,
			# it's obviously composite, and we need to increment it.
			$candidate += $incr;
			print "p is $p, q is $q, candidate is $candidate\n" if DEBUG;
		} continue {
			return $primes[$n] if $n <= $#primes;
		}
	}

}

if( !caller ) {
	tie my (@prime_list), 'Tie::SieveOfEratosthenes';
	my $limit = $ARGV[0] || 100;
	my $line = "";
	for( my $count = 0; $prime_list[$count] < $limit; ++$count ) {
		$line .= $prime_list[$count]. ", ";
		next if length($line) <= 70;
		if( $line =~ tr/,// > 1 ) {
			$line =~ s/^(.*,) (.*, )/$2/;
			print $1, "\n";
		} else {
			print $line, "\n";
			$line = "";
		}
	}
	$line =~ s/, \z//;
	print $line, "\n" if $line;
}

1;

This one is based on the vector sieve shown earlier, but adds to a list as needed, just sieving in the segment. Slightly faster and half the memory vs. the previous incremental sieve. It uses the same API -- arguably we should be offset by one so $primes[$n] returns the $n'th prime.

use strict;
use warnings;
package Tie::SieveOfEratosthenes;

sub TIEARRAY {
  my $class = shift;
  my @primes = (2,3,5,7);
  return bless \@primes, $class;
}

sub prextend { # Extend the given list of primes using a segment sieve
  my($primes, $to) = @_;
  $to-- unless $to & 1; # Ensure end is odd
  return if $to < $primes->[-1];
  my $sqrtn = int(sqrt($to)+0.001);
  prextend($primes, $sqrtn) if $primes->[-1] < $sqrtn;
  my($segment, $startp) = ('', $primes->[-1]+1);
  my($s_beg, $s_len) = ($startp >> 1, ($to>>1) - ($startp>>1));
  for my $p (@$primes) {
    last if $p > $sqrtn;
    if ($p >= 3) {
      my $p2 = $p*$p;
      if ($p2 < $startp) {   # Bump up to next odd multiple of p >= startp
        my $f = 1+int(($startp-1)/$p);
        $p2 = $p * ($f | 1);
      }
      for (my $s = ($p2>>1)-$s_beg; $s <= $s_len; $s += $p) {
        vec($segment, $s, 1) = 1;   # Mark composites in segment
      }
    }
  }
  # Now add all the primes found in the segment to the list
  do { push @$primes, 1+2*($_+$s_beg) if !vec($segment,$_,1) } for 0 .. $s_len;
}

sub FETCHSIZE { 0x7FFF_FFFF }  # Allows foreach to work
sub FETCH {
  my($primes, $n) = @_;
  return if $n < 0;
  # Keep expanding the list as necessary, 5% larger each time.
  prextend($primes, 1000+int(1.05*$primes->[-1])) while $n > $#$primes;
  return $primes->[$n];
}

if( !caller ) {
  tie my @prime_list, 'Tie::SieveOfEratosthenes';
  my $limit = $ARGV[0] || 100;
  print $prime_list[0];
  my $i = 1;
  while ($prime_list[$i] < $limit) { print " ", $prime_list[$i++]; }
  print "\n";
}

1;

Phix

Translation of: Euphoria

constant limit = 1000
sequence primes = {}
sequence flags = repeat(1, limit)
for i=2 to floor(sqrt(limit)) do
    if flags[i] then
        for k=i*i to limit by i do
            flags[k] = 0
        end for
    end if
end for
for i=2 to limit do
    if flags[i] then
        primes &= i
    end if
end for
pp(primes,{pp_Maxlen,77})

Output:

{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
 101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,
 193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,
 293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,
 409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,
 521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,
 641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,
 757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,
 881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997}

See also Sexy_primes#Phix where the sieve is more useful than a list of primes.
Most applications should use the builtins, eg get_primes(-get_maxprime(1000*1000)) or get_primes_le(1000) both give exactly the same output as above.

Phixmonti

include ..\Utilitys.pmt

def sequence /# ( ini end [step] ) #/
    ( ) swap for 0 put endfor
enddef

1000 var limit

( 1 limit ) sequence

( 2 limit ) for >ps
    ( tps dup * limit tps ) for
        dup limit < if 0 swap set else drop endif
    endfor
    cps
endfor
( 1 limit 0 ) remove
pstack

Another solution

include ..\Utilitys.pmt
   
1000

( "Primes in " over ": " ) lprint

2 swap 2 tolist for >ps
    2
    dup tps < while
        tps over mod 0 == if false else 1 + true endif
        over tps < and
    endwhile
    tps < ps> swap if drop endif
endfor

pstack

Output:

Primes in 1000:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

=== Press any key to exit ===

PHP

function iprimes_upto($limit)
{
    for ($i = 2; $i < $limit; $i++)
    {
	$primes[$i] = true;
    }
    
    for ($n = 2; $n < $limit; $n++)
    {
	if ($primes[$n])
	{
	    for ($i = $n*$n; $i < $limit; $i += $n)
	    {
		$primes[$i] = false;
	    }
	}
    }
    
    return $primes;
}

echo wordwrap(
    'Primes less or equal than 1000 are : ' . PHP_EOL .
    implode(' ', array_keys(iprimes_upto(1000), true, true)),
    100
);

Output:

Primes less or equal than 1000 are : 
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131
137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269
271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421
431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743
751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919
929 937 941 947 953 967 971 977 983 991 997

Picat

The SoE is provided in the standard library, defined as follows:

primes(N) = L =>
    A = new_array(N),
    foreach(I in 2..floor(sqrt(N)))
        if (var(A[I])) then
            foreach(J in I**2..I..N)
                A[J]=0
            end
         end
     end,
     L=[I : I in 2..N, var(A[I])].

Output:

Picat> L = math.primes(100).
L = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
yes

PicoLisp

(de sieve (N)
   (let Sieve (range 1 N)
      (set Sieve)
      (for I (cdr Sieve)
         (when I
            (for (S (nth Sieve (* I I)) S (nth (cdr S) I))
               (set S) ) ) )
      (filter bool Sieve) ) )

Output:

: (sieve 100)
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Alternate Version Using a 2x3x5x7 Wheel

This works by destructively modifying the CDR of the previous cell when it finds a composite number. For sieving large sets (e.g. 1,000,000) it's much faster than the above.

(setq WHEEL-2357
    (2  4  2  4  6  2  6  4
     2  4  6  6  2  6  4  2
     6  4  6  8  4  2  4  2
     4  8  6  4  6  2  4  6
     2  6  6  4  2  4  6  2
     6  4  2  4  2 10  2 10 .))

(de roll2357wheel (Limit)
    (let W WHEEL-2357
        (make
            (for (N 11  (<= N Limit)  (+ N (pop 'W)))
                (link N)))))

(de sqr (X) (* X X))

(de remove-multiples (L)
    (let (N (car L)  M (* N N)  P L  Q (cdr L))
        (while Q
            (let A (car Q)
                (until (>= M A)
                    (setq M (+ M N)))
                (when (= A M)
                    (con P (cdr Q))))
            (setq  P Q  Q (cdr Q)))))


(de sieve (Limit)
    (let Sieve (roll2357wheel Limit)
        (for (P Sieve  (<= (sqr (car P)) Limit)  (cdr P))
            (remove-multiples P))
        (append (2 3 5 7) Sieve)))

Output:

: (sieve 100)
-> (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
: (filter '((N) (> N 900)) (sieve 1000))
-> (907 911 919 929 937 941 947 953 967 971 977 983 991 997)
: (last (sieve 1000000))
-> 999983

PL/I

eratos: proc options (main) reorder;

dcl i  fixed bin (31);
dcl j  fixed bin (31);
dcl n  fixed bin (31);
dcl sn fixed bin (31);

dcl hbound builtin;
dcl sqrt   builtin;

dcl sysin    file;
dcl sysprint file;

get list (n);
sn = sqrt(n);

begin;
  dcl primes(n) bit (1) aligned init ((*)((1)'1'b));

  i = 2;

  do while(i <= sn);
    do j = i ** 2 by i to hbound(primes, 1);
      /* Adding a test would just slow down processing! */
      primes(j) = '0'b; 
     end;

    do i = i + 1 to sn until(primes(i));
    end;
  end;

  do i = 2 to hbound(primes, 1);
    if primes(i) then
      put data(i);
  end;
end;
end eratos;

PL/M

100H:

DECLARE PRIME$MAX LITERALLY '5000';

/* CREATE SIEVE OF GIVEN SIZE */
MAKE$SIEVE: PROCEDURE(START, SIZE);
    DECLARE (START, SIZE, M, N) ADDRESS;
    DECLARE PRIME BASED START BYTE;
    
    PRIME(0)=0; /* 0 AND 1 ARE NOT PRIMES */
    PRIME(1)=0; 
    DO N=2 TO SIZE;
        PRIME(N)=1; /* ASSUME ALL OTHERS ARE PRIME AT BEGINNING */
    END;
    
    DO N=2 TO SIZE;
        IF PRIME(N) THEN DO; /* IF A NUMBER IS PRIME... */
            DO M=N*N TO SIZE BY N;
                PRIME(M) = 0; /* THEN ITS MULTIPLES ARE NOT */
            END;
        END;
    END;
END MAKE$SIEVE;

/* CP/M CALLS */
BDOS: PROCEDURE(FUNC, ARG);
    DECLARE FUNC BYTE, ARG ADDRESS;
    GO TO 5;
END BDOS;

DECLARE BDOS$EXIT  LITERALLY '0',
        BDOS$PRINT LITERALLY '9';

/* PRINT A 16-BIT NUMBER */
PRINT$NUMBER: PROCEDURE(N);
    DECLARE (N, P) ADDRESS;
    DECLARE S (8) BYTE INITIAL ('.....',10,13,'$');
    DECLARE C BASED P BYTE;
    P = .S(5);
DIGIT:
    P = P - 1;
    C = (N MOD 10) + '0';
    N = N / 10;
    IF N > 0 THEN GO TO DIGIT;
    CALL BDOS(BDOS$PRINT, P);
END PRINT$NUMBER;

/* PRINT ALL PRIMES UP TO N */
PRINT$PRIMES: PROCEDURE(N, SIEVE);
    DECLARE (I, N, SIEVE) ADDRESS;
    DECLARE PRIME BASED SIEVE BYTE;
    CALL MAKE$SIEVE(SIEVE, N);
    DO I = 2 TO N;
        IF PRIME(I) THEN CALL PRINT$NUMBER(I);
    END;
END PRINT$PRIMES;

CALL PRINT$PRIMES(PRIME$MAX, .MEMORY);

CALL BDOS(BDOS$EXIT, 0);
EOF

Output:

PL/SQL

create or replace package sieve_of_eratosthenes as
  type array_of_booleans is varray(100000000) of boolean;
  type table_of_integers is table of integer;
  function find_primes (n number) return table_of_integers pipelined;
end sieve_of_eratosthenes;
/

create or replace package body sieve_of_eratosthenes as
  function find_primes (n number) return table_of_integers pipelined is
      flag array_of_booleans;
      ptr  integer;
      i    integer;
  begin
      flag := array_of_booleans(false, true);
      flag.extend(n - 2, 2);
      ptr  := 1;
      << outer_loop >>
      while ptr * ptr <= n loop
          while not flag(ptr) loop
              ptr := ptr + 1;
          end loop;
          i := ptr * ptr;
          while i <= n loop
              flag(i) := false;
              i := i + ptr;
          end loop;
          ptr := ptr + 1;
      end loop outer_loop;
      for i in 1 .. n loop
          if flag(i) then
              pipe row (i);
          end if;
      end loop;
      return;
  end find_primes;
end sieve_of_eratosthenes;
/

Usage:

select column_value as prime_number
from   table(sieve_of_eratosthenes.find_primes(30));

PRIME_NUMBER
------------
	   2
	   3
	   5
	   7
	  11
	  13
	  17
	  19
	  23
	  29

10 rows selected.

Elapsed: 00:00:00.01

select count(*) as number_of_primes, sum(column_value) as sum_of_primes
from   table(sieve_of_eratosthenes.find_primes(1e7));

NUMBER_OF_PRIMES   SUM_OF_PRIMES
---------------- ---------------
          664579   3203324994356

Elapsed: 00:00:02.60

Pony

use "time" // for testing
use "collections"

class Primes is Iterator[U32] // returns an Iterator of found primes...
  let _bitmask: Array[U8] = [ 1; 2; 4; 8; 16; 32; 64; 128 ]
  var _lmt: USize
  let _cmpsts: Array[U8]
  var _ndx: USize = 2
  var _curr: U32 = 2
  
  new create(limit: U32) ? =>
    _lmt = USize.from[U32](limit)
    let sqrtlmt = USize.from[F64](F64.from[U32](limit).sqrt())
    _cmpsts = Array[U8].init(0, (_lmt + 8) / 8) // already zeroed; bit array
    _cmpsts(0)? = 3 // mark 0 and 1 as not prime!
    if sqrtlmt < 2 then return end
    for p in Range[USize](2, sqrtlmt + 1) do
      if (_cmpsts(p >> 3)? and _bitmask(p and 7)?) == 0 then
        var s = p * p // cull start address for p * p!
        let slmt = (s + (p << 3)).min(_lmt + 1)
        while s < slmt do
          let msk = _bitmask(s and 7)?
          var c = s >> 3
          while c < _cmpsts.size() do
            _cmpsts(c)? = _cmpsts(c)? or msk
            c = c + p
          end
          s = s + p
        end
      end
    end

  fun ref has_next(): Bool val => _ndx < (_lmt + 1)
  
  fun ref next(): U32 ? =>
    _curr = U32.from[USize](_ndx); _ndx = _ndx + 1
    while (_ndx <= _lmt) and ((_cmpsts(_ndx >> 3)? and _bitmask(_ndx and 7)?) != 0) do
      _ndx = _ndx + 1
    end
    _curr

actor Main
  new create(env: Env) =>
    let limit: U32 = 1_000_000_000
    try
      env.out.write("Primes to 100:  ")
      for p in Primes(100)? do env.out.write(p.string() + " ") end
      var count: I32 = 0
      for p in Primes(1_000_000)? do count = count + 1 end
      env.out.print("\nThere are " + count.string() + " primes to a million.")
      let t = Time
      let start = t.millis()
      let prms = Primes(limit)?
      let elpsd = t.millis() - start
      count = 0
      for _ in prms do count = count + 1 end
      env.out.print("Found " + count.string() + " primes to " + limit.string() + ".")
      env.out.print("This took " + elpsd.string() + " milliseconds.")
    end

Output:

Primes to 100:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
There are 78498 primes to a million.
Found 50847534 primes to 1000000000.
This took 28123 milliseconds.

Note to users: a naive monolithic sieve (one huge array) isn't really the way to implement this for other than trivial usage in sieving ranges to a few millions as cache locality becomes a very large problem as the size of the array (even bit packed with one bit per number representation as here) limits the maximum range that can be sieved and the "cache thrashing" limits the speed.

For extended ranges, a Page Segmented version should be used. As well, for any extended ranges in the billions, it is a waste of available computer resources to not use the multi-threading available in a modern CPU, at which Pony would do very well with its built-in Actor concurrency model.

These versions use "loop unpeeling" (not full loop unrolling), which recognizes the repeating modulo pattern of masking the bytes by the base primes less than the square root of the limit so that an "unpeeling" by eight loops can cull by a constant bit mask over the whole range. For smaller ranges where the speed is not limited by "cache thrashing", this can provide about a factor-of-two speed-up.

Alternate Odds-Only version of the above

It is a waste not to do the trivial changes to the above code to sieve odds-only, which is about two and a half times faster due to the decreased number of culling operations; it doesn't really do much about the huge array problem though, other than to reduce it by a factor of two.

use "time" // for testing
use "collections"

class Primes is Iterator[U32] // returns an Iterator of found primes...
  let _bitmask: Array[U8] = [ 1; 2; 4; 8; 16; 32; 64; 128 ]
  var _lmti: USize
  let _cmpsts: Array[U8]
  var _ndx: USize = 0
  var _curr: U32 = 0
  
  new create(limit: U32) ? =>
    if limit < 3 then _lmti = 0; _cmpsts = Array[U8](); return end
    _lmti = USize.from[U32]((limit - 3) / 2)
    let sqrtlmti = (USize.from[F64](F64.from[U32](limit).sqrt()) - 3) / 2
    _cmpsts = Array[U8].init(0, (_lmti + 8) / 8) // already zeroed; bit array
    for i in Range[USize](0, sqrtlmti + 1) do
      if (_cmpsts(i >> 3)? and _bitmask(i and 7)?) == 0 then
        let p = i + i + 3
        var s = ((i << 1) * (i + 3)) + 3 // cull start address for p * p!
        let slmt = (s + (p << 3)).min(_lmti + 1)
        while s < slmt do
          let msk = _bitmask(s and 7)?
          var c = s >> 3
          while c < _cmpsts.size() do
            _cmpsts(c)? = _cmpsts(c)? or msk
            c = c + p
          end
          s = s + p
        end
      end
    end

  fun ref has_next(): Bool val => _ndx < (_lmti + 1)
  
  fun ref next(): U32 ? =>
    if _curr < 1 then _curr = 3; if _lmti == 0 then _ndx = 1 end; return 2 end
    _curr = U32.from[USize](_ndx + _ndx + 3); _ndx = _ndx + 1
    while (_ndx <= _lmti) and ((_cmpsts(_ndx >> 3)? and _bitmask(_ndx and 7)?) != 0) do
      _ndx = _ndx + 1
    end
    _curr

actor Main
  new create(env: Env) =>
    let limit: U32 = 1_000_000_000
    try
      env.out.write("Primes to 100:  ")
      for p in Primes(100)? do env.out.write(p.string() + " ") end
      var count: I32 = 0
      for p in Primes(1_000_000)? do count = count + 1 end
      env.out.print("\nThere are " + count.string() + " primes to a million.")
      let t = Time
      let start = t.millis()
      let prms = Primes(limit)?
      let elpsd = t.millis() - start
      count = 0
      for _ in prms do count = count + 1 end
      env.out.print("Found " + count.string() + " primes to " + limit.string() + ".")
      env.out.print("This took " + elpsd.string() + " milliseconds.")
    end

The output is the same as the above except that it is about two and a half times faster due to that many less culling operations.

Pop11

define eratostenes(n);
lvars bits = inits(n), i, j;
for i from 2 to n do
   if bits(i) = 0 then
      printf('' >< i, '%s\n');
      for j from 2*i by i to n do
         1 -> bits(j);
      endfor;
   endif;
endfor;
enddefine;

PowerShell

Basic procedure

It outputs immediately so that the number can be used by the pipeline.

function Sieve ( [int] $num )
{
    $isprime = @{}
    2..$num | Where-Object {
        $isprime[$_] -eq $null } | ForEach-Object {
        $_
        $isprime[$_] = $true
        $i=$_*$_
        for ( ; $i -le $num; $i += $_ )
        { $isprime[$i] = $false }
    }
}

Another implementation

function eratosthenes ($n) {
    if($n -ge 1){
        $prime = @(1..($n+1) | foreach{$true})
        $prime[1] = $false
        $m = [Math]::Floor([Math]::Sqrt($n))
        for($i = 2; $i -le $m; $i++) {
            if($prime[$i]) {
                for($j = $i*$i; $j -le $n; $j += $i) {
                    $prime[$j] = $false
                }
            }
        }
        1..$n | where{$prime[$_]}
    } else {
        Write-Warning "$n is less than 1"
    }
}
"$(eratosthenes 100)"

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Processing

Calculate the primes up to 1000000 with Processing, including a visualisation of the process.

int i=2;
int maxx;
int maxy;
int max;
boolean[] sieve;

void setup() {
  size(1000, 1000);
  // frameRate(2);
  maxx=width;
  maxy=height;
  max=width*height;
  sieve=new boolean[max+1];

  sieve[1]=false;
  plot(0, false);
  plot(1, false);
  for (int i=2; i<=max; i++) {
    sieve[i]=true;
    plot(i, true);
  }
}

void draw() {
  if (!sieve[i]) {
    while (i*i<max && !sieve[i]) {
      i++;
    }
  }
  if (sieve[i]) {
    print(i+" ");
    for (int j=i*i; j<=max; j+=i) {
      if (sieve[j]) {
        sieve[j]=false;
        plot(j, false);
      }
    }
  }
  if (i*i<max) {
    i++;
  } else {
    noLoop();
    println("finished");
  }
}

void plot(int pos, boolean active) {
  set(pos%maxx, pos/maxx, active?#000000:#ffffff);
}

As an additional visual effect, the layout of the pixel could be changed from the line-by-line layout to a spiral-like layout starting in the middle of the screen.

Processing Python mode

from __future__ import print_function

i = 2

def setup():
    size(1000, 1000)
    # frameRate(2)
    global maxx, maxy, max_num, sieve
    maxx = width
    maxy = height
    max_num = width * height
    sieve = [False] * (max_num + 1)

    sieve[1] = False
    plot(0, False)
    plot(1, False)
    for i in range(2, max_num + 1):
        sieve[i] = True
        plot(i, True)


def draw():
    global i
    if not sieve[i]:
        while (i * i < max_num and not sieve[i]):
            i += 1

    if sieve[i]:
        print("{} ".format(i), end = '')
        for j in range(i * i, max_num + 1, i):
            if sieve[j]:
                sieve[j] = False
                plot(j, False)

    if i * i < max_num:
        i += 1
    else:
        noLoop()
        println("finished")


def plot(pos, active):
    set(pos % maxx, pos / maxx, color(0) if active else color(255))

Prolog

Using lists

Basic bounded sieve

primes(N, L) :- numlist(2, N, Xs),
	        sieve(Xs, L).

sieve([H|T], [H|X]) :- H2 is H + H, 
                       filter(H, H2, T, R),
                       sieve(R, X).
sieve([], []).

filter(_, _, [], []).
filter(H, H2, [H1|T], R) :- 
    (   H1 < H2 -> R = [H1|R1], filter(H, H2, T, R1)
    ;   H3 is H2 + H,
        (   H1 =:= H2  ->       filter(H, H3, T, R)
        ;                       filter(H, H3, [H1|T], R) ) ).

Output:

 ?- time(( primes(7920,X), length(X,N) )).
% 1,131,127 inferences, 0.109 CPU in 0.125 seconds (88% CPU, 10358239 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000 .

Basic bounded Euler's sieve

Translation of: Erlang Canonical

This is actually the Euler's variant of the sieve of Eratosthenes, generating (and thus removing) each multiple only once, though a sub-optimal implementation.

primes(X, PS) :- X > 1, range(2, X, R), sieve(R, PS).

range(X, X, [X]) :- !.
range(X, Y, [X | R]) :- X < Y, X1 is X + 1, range(X1, Y, R).

mult(A, B, C) :- C is A*B.

sieve([X], [X]) :- !.
sieve([H | T], [H | S]) :- maplist( mult(H), [H | T], MS), 
                           remove(MS, T, R), sieve(R, S).
 
remove( _,       [],      []     ) :- !.
remove( [H | X], [H | Y], R      ) :- !, remove(X, Y, R).
remove( X,       [H | Y], [H | R]) :- remove(X, Y, R).

Running in SWI Prolog,

Output:

 ?- time(( primes(7920,X), length(X,N) )).
% 2,087,373 inferences, 0.203 CPU in 0.203 seconds (100% CPU, 10297621 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.

Optimized Euler's sieve

We can stop early, with massive improvement in complexity (below ~ n^1.5 inferences, empirically, vs. the ~ n² of the above, in n primes produced; showing only the modified predicates):

primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).

sieve(X, [H | T], [H | T]) :- H*H > X, !.
sieve(X, [H | T], [H | S]) :- maplist( mult(H), [H | T], MS), 
                              remove(MS, T, R), sieve(X, R, S).

Output:

 ?- time(( primes(7920,X), length(X,N) )).
% 174,437 inferences, 0.016 CPU in 0.016 seconds (100% CPU, 11181787 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.

Bounded sieve

Optimized by stopping early, traditional sieve of Eratosthenes generating multiples by iterated addition.

primes(X, PS) :- X > 1, range(2, X, R), sieve(X, R, PS).

range(X, X, [X]) :- !.
range(X, Y, [X | R]) :- X < Y, X1 is X + 1, range(X1, Y, R).

sieve(X, [H | T], [H | T]) :- H*H > X, !.
sieve(X, [H | T], [H | S]) :- mults( H, X, MS), remove(MS, T, R), sieve(X, R, S).

mults( H, Lim, MS):- M is H*H, mults( H, M, Lim, MS).
mults( _, M, Lim, []):- M > Lim, !.
mults( H, M, Lim, [M|MS]):- M2 is M+H, mults( H, M2, Lim, MS).

remove( _,       [],      []     ) :- !.
remove( [H | X], [H | Y], R      ) :- !, remove(X, Y, R).
remove( [H | X], [G | Y], R      ) :- H < G, !, remove(X, [G | Y], R).
remove( X,       [H | Y], [H | R]) :- remove(X, Y, R).

Output:

?- time(( primes(7920,X), length(X,N) )).
% 140,654 inferences, 0.016 CPU in 0.011 seconds (142% CPU, 9016224 Lips)
X = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 1000.

Sift the Two's and Sift the Three's

Another version, based on Cloksin&Mellish p.175, modified to stop early as well as to work with odds only and use addition in the removing predicate, instead of the mod testing as the original was doing:

primes(N,[]):- N < 2, !.
primes(N,[2|R]):- ints(3,N,L), sift(N,L,R).
ints(A,B,[A|C]):- A=<B -> D is A+2, ints(D,B,C).
ints(_,_,[]).
sift(_,[],[]).
sift(N,[A|B],[A|C]):- A*A =< N ->  rmv(A,B,D), sift(N,D,C)
                      ; C=B.
rmv(A,B,D):- M is A*A, rmv(A,M,B,D).
rmv(_,_,[],[]).
rmv(P,M,[A|B],C):- (   M>A ->  C=[A|D], rmv(P,M,B,D)
                   ;   M==A ->  M2 is M+2*P, rmv(P,M2,B,C) 
                   ;   M<A ->  M2 is M+2*P, rmv(P,M2,[A|B],C)
                   ).

Runs at about n^1.4 time empirically, producing 20,000 primes in 1.4 secs on the SWISH platform as of 2021-11-26.

Using lazy lists

In SWI Prolog and others, where freeze/2 is available.

Basic variant

primes(PS):- count(2, 1, NS), sieve(NS, PS).

count(N, D, [N|T]):- freeze(T, (N2 is N+D, count(N2, D, T))).

sieve([N|NS],[N|PS]):- N2 is N*N, count(N2,N,A), remove(A,NS,B), freeze(PS, sieve(B,PS)).

take(N, X, A):- length(A, N), append(A, _, X).

remove([A|T],[B|S],R):- A < B -> remove(T,[B|S],R) ;
                        A=:=B -> remove(T,S,R) ; 
                        R = [B|R2], freeze(R2, remove([A|T], S, R2)).

Output:

 ?- time(( primes(PS), take(1000,PS,R1), length(R,10), append(_,R,R1), writeln(R), false )).
[7841,7853,7867,7873,7877,7879,7883,7901,7907,7919]
% 8,464,518 inferences, 0.702 CPU in 0.697 seconds (101% CPU, 12057641 Lips)
false.

Optimized by postponed removal

Showing only changed predicates.

primes([2|PS]):- 
    freeze(PS, (primes(BPS), count(3, 1, NS), sieve(NS, BPS, 4, PS))). 

sieve([N|NS], BPS, Q, PS):- 
    N < Q -> PS = [N|PS2], freeze(PS2, sieve(NS, BPS, Q, PS2))
    ;  BPS = [BP,BP2|BPS2], Q2 is BP2*BP2, count(Q, BP, MS),
       remove(MS, NS, R), sieve(R, [BP2|BPS2], Q2, PS).

Output:

 ?- time(( primes(PS), take(1000,PS,R1), length(R,10), append(_,R,R1), writeln(R), false )).
[7841,7853,7867,7873,7877,7879,7883,7901,7907,7919]
% 697,727 inferences, 0.078 CPU in 0.078 seconds (100% CPU, 8945161 Lips)
false.       %% odds only: 487,441 inferences

Using facts to record composite numbers

The first two solutions use Prolog "facts" to record the composite (i.e. already-visited) numbers.

Elementary approach: multiplication-free, division-free, mod-free, and cut-free

The basic Eratosthenes sieve depends on nothing more complex than counting. In celebration of this simplicity, the first approach to the problem taken here is free of multiplication and division, as well as Prolog's non-logical "cut".

It defines the predicate between/4 to avoid division, and composite/1 to record integers that are found to be composite.

% %sieve( +N, -Primes ) is true if Primes is the list of consecutive primes
% that are less than or equal to N
sieve( N, [2|Rest]) :-
  retractall( composite(_) ),
  sieve( N, 2, Rest ) -> true.  % only one solution

% sieve P, find the next non-prime, and then recurse:
sieve( N, P, [I|Rest] ) :-
  sieve_once(P, N),
  (P = 2 -> P2 is P+1; P2 is P+2),
  between(P2, N, I), 
  (composite(I) -> fail; sieve( N, I, Rest )).

% It is OK if there are no more primes less than or equal to N:
sieve( N, P, [] ).

sieve_once(P, N) :-
  forall( between(P, N, P, IP),
          (composite(IP) -> true ; assertz( composite(IP) )) ).


% To avoid division, we use the iterator
% between(+Min, +Max, +By, -I) 
% where we assume that By > 0
% This is like "for(I=Min; I <= Max; I+=By)" in C.
between(Min, Max, By, I) :- 
  Min =< Max, 
  A is Min + By, 
  (I = Min; between(A, Max, By, I) ).


% Some Prolog implementations require the dynamic predicates be
%  declared:

:- dynamic( composite/1 ).

The above has been tested with SWI-Prolog and gprolog.

% SWI-Prolog:

?- time( (sieve(100000,P), length(P,N), writeln(N), last(P, LP), writeln(LP) )).
% 1,323,159 inferences, 0.862 CPU in 0.921 seconds (94% CPU, 1534724 Lips)
P = [2, 3, 5, 7, 11, 13, 17, 19, 23|...],
N = 9592,
LP = 99991.

Optimized approach

Works with SWI-Prolog.

sieve(N, [2|PS]) :-       % PS is list of odd primes up to N
    retractall(mult(_)),
    sieve_O(3,N,PS).

sieve_O(I,N,PS) :-        % sieve odds from I up to N to get PS
    I =< N, !, I1 is I+2,
    (   mult(I) -> sieve_O(I1,N,PS)
    ;   (   I =< N / I -> 
            ISq is I*I, DI  is 2*I, add_mults(DI,ISq,N)
        ;   true 
        ),
        PS = [I|T],
        sieve_O(I1,N,T)
    ).
sieve_O(I,N,[]) :- I > N.

add_mults(DI,I,N) :-
    I =< N, !,
    ( mult(I) -> true ; assert(mult(I)) ),
    I1 is I+DI,
    add_mults(DI,I1,N).
add_mults(_,I,N) :- I > N.

main(N) :- current_prolog_flag(verbose,F),
  set_prolog_flag(verbose,normal), 
  time( sieve( N,P)), length(P,Len), last(P, LP), writeln([Len,LP]),
  set_prolog_flag(verbose,F).
 
:- dynamic( mult/1 ).
:- main(100000), main(1000000).

Running it produces

%% stdout copy
[9592, 99991]
[78498, 999983]

%% stderr copy
% 293,176 inferences, 0.14 CPU in 0.14 seconds (101% CPU, 2094114 Lips)
% 3,122,303 inferences, 1.63 CPU in 1.67 seconds (97% CPU, 1915523 Lips)

which indicates ~ N^1.1 empirical orders of growth, which is consistent with the O(N log log N) theoretical runtime complexity.

Using a priority queue

Uses a ariority queue, from the paper "The Genuine Sieve of Eratosthenes" by Melissa O'Neill. Works with YAP (Yet Another Prolog)

?- use_module(library(heaps)).

prime(2).
prime(N) :- prime_heap(N, _).

prime_heap(3, H) :- list_to_heap([9-6], H).
prime_heap(N, H) :-
    prime_heap(M, H0), N0 is M + 2,
    next_prime(N0, H0, N, H).

next_prime(N0, H0, N, H) :-
    \+ min_of_heap(H0, N0, _),
    N = N0, Composite is N*N, Skip is N+N,
    add_to_heap(H0, Composite, Skip, H).
next_prime(N0, H0, N, H) :-
    min_of_heap(H0, N0, _),
    adjust_heap(H0, N0, H1), N1 is N0 + 2,
    next_prime(N1, H1, N, H).

adjust_heap(H0, N, H) :-
    min_of_heap(H0, N, _),
    get_from_heap(H0, N, Skip, H1),
    Composite is N + Skip, add_to_heap(H1, Composite, Skip, H2),
    adjust_heap(H2, N, H).
adjust_heap(H, N, H) :-
    \+ min_of_heap(H, N, _).

PureBasic

Basic procedure

For n=2 To Sqr(lim)
  If Nums(n)=0
    m=n*n
    While m<=lim
      Nums(m)=1
      m+n
    Wend
  EndIf
Next n

Working example

Dim Nums.i(0)
Define l, n, m, lim

If OpenConsole()

  ; Ask for the limit to search, get that input and allocate a Array
  Print("Enter limit for this search: ")
  lim=Val(Input())
  ReDim Nums(lim)
  
  ; Use a basic Sieve of Eratosthenes
  For n=2 To Sqr(lim)
    If Nums(n)=#False
      m=n*n
      While m<=lim
        Nums(m)=#True
        m+n
      Wend
    EndIf
  Next n
  
  ;Present the result to our user
  PrintN(#CRLF$+"The Prims up to "+Str(lim)+" are;")
  m=0: l=Log10(lim)+1
  For n=2 To lim
    If Nums(n)=#False
      Print(RSet(Str(n),l)+" ")
      m+1
      If m>72/(l+1)
        m=0: PrintN("")
      EndIf
    EndIf
  Next
  
  Print(#CRLF$+#CRLF$+"Press ENTER to exit"): Input()
  CloseConsole()
EndIf

Output may look like;

Enter limit for this search: 750

The Prims up to 750 are;
   2    3    5    7   11   13   17   19   23   29   31   37   41   43   47
  53   59   61   67   71   73   79   83   89   97  101  103  107  109  113
 127  131  137  139  149  151  157  163  167  173  179  181  191  193  197
 199  211  223  227  229  233  239  241  251  257  263  269  271  277  281
 283  293  307  311  313  317  331  337  347  349  353  359  367  373  379
 383  389  397  401  409  419  421  431  433  439  443  449  457  461  463
 467  479  487  491  499  503  509  521  523  541  547  557  563  569  571
 577  587  593  599  601  607  613  617  619  631  641  643  647  653  659
 661  673  677  683  691  701  709  719  727  733  739  743

Press ENTER to exit

Python

Note that the examples use range instead of xrange for Python 3 and Python 2 compatability, but when using Python 2 xrange is the nearest equivalent to Python 3's implementation of range and should be substituted for ranges with a very large number of items.

Using set lookup

The version below uses a set to store the multiples. set objects are much faster (usually O(log n)) than lists (O(n)) for checking if a given object is a member. Using the set.update method avoids explicit iteration in the interpreter, giving a further speed improvement.

def eratosthenes2(n):
    multiples = set()
    for i in range(2, n+1):
        if i not in multiples:
            yield i
            multiples.update(range(i*i, n+1, i))

print(list(eratosthenes2(100)))

Using array lookup

The version below uses array lookup to test for primality. The function primes_upto() is a straightforward implementation of Sieve of Eratosthenesalgorithm. It returns prime numbers less than or equal to limit.

def primes_upto(limit):
    is_prime = [False] * 2 + [True] * (limit - 1) 
    for n in range(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
        if is_prime[n]:
            for i in range(n*n, limit+1, n):
                is_prime[i] = False
    return [i for i, prime in enumerate(is_prime) if prime]

Using generator

The following code may be slightly slower than using the array/list as above, but uses no memory for output:

def iprimes_upto(limit):
    is_prime = [False] * 2 + [True] * (limit - 1)
    for n in xrange(int(limit**0.5 + 1.5)): # stop at ``sqrt(limit)``
        if is_prime[n]:
            for i in range(n * n, limit + 1, n): # start at ``n`` squared
                is_prime[i] = False
    for i in xrange(limit + 1):
        if is_prime[i]: yield i

Example:

>>> list(iprimes_upto(15))
[2, 3, 5, 7, 11, 13]

Odds-only version of the array sieve above

The following code is faster than the above array version using only odd composite operations (for a factor of over two) and because it has been optimized to use slice operations for composite number culling to avoid extra work by the interpreter:

def primes2(limit):
    if limit < 2: return []
    if limit < 3: return [2]
    lmtbf = (limit - 3) // 2
    buf = [True] * (lmtbf + 1)
    for i in range((int(limit ** 0.5) - 3) // 2 + 1):
        if buf[i]:
            p = i + i + 3
            s = p * (i + 1) + i
            buf[s::p] = [False] * ((lmtbf - s) // p + 1)
    return [2] + [i + i + 3 for i, v in enumerate(buf) if v]

Note that "range" needs to be changed to "xrange" for maximum speed with Python 2.

Odds-only version of the generator version above

The following code is faster than the above generator version using only odd composite operations (for a factor of over two) and because it has been optimized to use slice operations for composite number culling to avoid extra work by the interpreter:

def iprimes2(limit):
    yield 2
    if limit < 3: return
    lmtbf = (limit - 3) // 2
    buf = [True] * (lmtbf + 1)
    for i in range((int(limit ** 0.5) - 3) // 2 + 1):
        if buf[i]:
            p = i + i + 3
            s = p * (i + 1) + i
            buf[s::p] = [False] * ((lmtbf - s) // p + 1)
    for i in range(lmtbf + 1):
        if buf[i]: yield (i + i + 3)

Note that this version may actually run slightly faster than the equivalent array version with the advantage that the output doesn't require any memory.

Also note that "range" needs to be changed to "xrange" for maximum speed with Python 2.

Factorization wheel235 version of the generator version

This uses a 235 factorial wheel for further reductions in operations; the same techniques can be applied to the array version as well; it runs slightly faster and uses slightly less memory as compared to the odds-only algorithms:

def primes235(limit):
    yield 2; yield 3; yield 5
    if limit < 7: return
    modPrms = [7,11,13,17,19,23,29,31]
    gaps = [4,2,4,2,4,6,2,6,4,2,4,2,4,6,2,6] # 2 loops for overflow
    ndxs = [0,0,0,0,1,1,2,2,2,2,3,3,4,4,4,4,5,5,5,5,5,5,6,6,7,7,7,7,7,7]
    lmtbf = (limit + 23) // 30 * 8 - 1 # integral number of wheels rounded up
    lmtsqrt = (int(limit ** 0.5) - 7)
    lmtsqrt = lmtsqrt // 30 * 8 + ndxs[lmtsqrt % 30] # round down on the wheel
    buf = [True] * (lmtbf + 1)
    for i in range(lmtsqrt + 1):
        if buf[i]:
            ci = i & 7; p = 30 * (i >> 3) + modPrms[ci]
            s = p * p - 7; p8 = p << 3
            for j in range(8):
                c = s // 30 * 8 + ndxs[s % 30]
                buf[c::p8] = [False] * ((lmtbf - c) // p8 + 1)
                s += p * gaps[ci]; ci += 1
    for i in range(lmtbf - 6 + (ndxs[(limit - 7) % 30])): # adjust for extras
        if buf[i]: yield (30 * (i >> 3) + modPrms[i & 7])

Note: Much of the time (almost two thirds for this last case for Python 2.7.6) for any of these array/list or generator algorithms is used in the computation and enumeration of the final output in the last line(s), so any slight changes to those lines can greatly affect execution time. For Python 3 this enumeration is about twice as slow as Python 2 (Python 3.3 slow and 3.4 slower) for an even bigger percentage of time spent just outputting the results. This slow enumeration means that there is little advantage to versions that use even further wheel factorization, as the composite number culling is a small part of the time to enumerate the results.

If just the count of the number of primes over a range is desired, then converting the functions to prime counting functions by changing the final enumeration lines to "return buf.count(True)" will save a lot of time.

Note that "range" needs to be changed to "xrange" for maximum speed with Python 2 where Python 2's "xrange" is a better choice for very large sieve ranges.
Timings were done primarily in Python 2 although source is Python 2/3 compatible (shows range and not xrange).

Using numpy

Library: NumPy

Below code adapted from literateprograms.org using numpy

import numpy
def primes_upto2(limit):
    is_prime = numpy.ones(limit + 1, dtype=numpy.bool)
    for n in xrange(2, int(limit**0.5 + 1.5)): 
        if is_prime[n]:
            is_prime[n*n::n] = 0
    return numpy.nonzero(is_prime)[0][2:]

Performance note: there is no point to add wheels here, due to execution of p[n*n::n] = 0 and nonzero() takes us almost all time.

Also see Prime numbers and Numpy – Python.

Using wheels with numpy

Version with wheel based optimization:

from numpy import array, bool_, multiply, nonzero, ones, put, resize
#
def makepattern(smallprimes):
    pattern = ones(multiply.reduce(smallprimes), dtype=bool_)
    pattern[0] = 0
    for p in smallprimes:
        pattern[p::p] = 0
    return pattern
#
def primes_upto3(limit, smallprimes=(2,3,5,7,11)):    
    sp = array(smallprimes)
    if limit <= sp.max(): return sp[sp <= limit]
    #
    isprime = resize(makepattern(sp), limit + 1) 
    isprime[:2] = 0; put(isprime, sp, 1) 
    #
    for n in range(sp.max() + 2, int(limit**0.5 + 1.5), 2): 
        if isprime[n]:
            isprime[n*n::n] = 0 
    return nonzero(isprime)[0]

Examples:

>>> primes_upto3(10**6, smallprimes=(2,3)) # Wall time: 0.17
array([     2,      3,      5, ..., 999961, 999979, 999983])
>>> primes_upto3(10**7, smallprimes=(2,3))            # Wall time: '''2.13'''
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])
>>> primes_upto3(15)
array([ 2,  3,  5,  7, 11, 13])
>>> primes_upto3(10**7, smallprimes=primes_upto3(15)) # Wall time: '''1.31'''
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])
>>> primes_upto2(10**7)                               # Wall time: '''1.39''' <-- version ''without'' wheels
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])
>>> primes_upto3(10**7)                               # Wall time: '''1.30'''
array([      2,       3,       5, ..., 9999971, 9999973, 9999991])

The above-mentioned examples demonstrate that the given wheel based optimization does not show significant performance gain.

Infinite generator

A generator that will generate primes indefinitely (perhaps until it runs out of memory). Used as a library here.

Works with: Python version 2.6+, 3.x

import heapq

# generates all prime numbers
def sieve():
    # priority queue of the sequences of non-primes
    # the priority queue allows us to get the "next" non-prime quickly
    nonprimes = []
    
    i = 2
    while True:
        if nonprimes and i == nonprimes[0][0]: # non-prime
            while nonprimes[0][0] == i:
                # for each sequence that generates this number,
                # have it go to the next number (simply add the prime)
                # and re-position it in the priority queue
                x = nonprimes[0]
                x[0] += x[1]
                heapq.heapreplace(nonprimes, x)
        
        else: # prime
            # insert a 2-element list into the priority queue:
            # [current multiple, prime]
            # the first element allows sorting by value of current multiple
            # we start with i^2
            heapq.heappush(nonprimes, [i*i, i])
            yield i
        
        i += 1

Example:

>>> foo = sieve()
>>> for i in range(8):
...     print(next(foo))
... 
2
3
5
7
11
13
17
19

Infinite generator with a faster algorithm

The adding of each discovered prime's incremental step info to the mapping should be postponed until the prime's square is seen amongst the candidate numbers, as it is useless before that point. This drastically reduces the space complexity from O(n) to O(sqrt(n/log(n))), in n primes produced, and also lowers the run time complexity quite low (this test entry in Python 2.7 and this test entry in Python 3.x shows about ~ n^1.08 empirical order of growth which is very close to the theoretical value of O(n log(n) log(log(n))), in n primes produced):

Works with: Python version 2.6+, 3.x

def primes():
    yield 2; yield 3; yield 5; yield 7;
    bps = (p for p in primes())             # separate supply of "base" primes (b.p.)
    p = next(bps) and next(bps)             # discard 2, then get 3
    q = p * p                               # 9 - square of next base prime to keep track of,
    sieve = {}                              #                       in the sieve dict
    n = 9                                   # n is the next candidate number
    while True:
        if n not in sieve:                  # n is not a multiple of any of base primes,
            if n < q:                       # below next base prime's square, so
                yield n                     # n is prime
            else:
                p2 = p + p                  # n == p * p: for prime p, add p * p + 2 * p
                sieve[q + p2] = p2          #   to the dict, with 2 * p as the increment step
                p = next(bps); q = p * p    # pull next base prime, and get its square
        else:
            s = sieve.pop(n); nxt = n + s   # n's a multiple of some b.p., find next multiple
            while nxt in sieve: nxt += s    # ensure each entry is unique
            sieve[nxt] = s                  # nxt is next non-marked multiple of this prime
        n += 2                              # work on odds only
 
import itertools
def primes_up_to(limit):
    return list(itertools.takewhile(lambda p: p <= limit, primes()))

Fast infinite generator using a wheel

Although theoretically over three times faster than odds-only, the following code using a 2/3/5/7 wheel is only about 1.5 times faster than the above odds-only code due to the extra overheads in code complexity. The test link for Python 2.7 and test link for Python 3.x show about the same empirical order of growth as the odds-only implementation above once the range grows enough so the dict operations become amortized to a constant factor.

Works with: Python version 2.6+, 3.x

def primes():
    for p in [2,3,5,7]: yield p                 # base wheel primes
    gaps1 = [ 2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8 ]
    gaps = gaps1 + [ 6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10 ] # wheel2357
    def wheel_prime_pairs():
        yield (11,0); bps = wheel_prime_pairs() # additional primes supply
        p, pi = next(bps); q = p * p            # adv to get 11 sqr'd is 121 as next square to put
        sieve = {}; n = 13; ni = 1              #   into sieve dict; init cndidate, wheel ndx
        while True:
            if n not in sieve:                  # is not a multiple of previously recorded primes
                if n < q: yield (n, ni)         # n is prime with wheel modulo index
                else:
                    npi = pi + 1                # advance wheel index
                    if npi > 47: npi = 0
                    sieve[q + p * gaps[pi]] = (p, npi) # n == p * p: put next cull position on wheel
                    p, pi = next(bps); q = p * p  # advance next prime and prime square to put
            else:
                s, si = sieve.pop(n)
                nxt = n + s * gaps[si]          # move current cull position up the wheel
                si = si + 1                     # advance wheel index
                if si > 47: si = 0
                while nxt in sieve:             # ensure each entry is unique by wheel
                    nxt += s * gaps[si]
                    si = si + 1                 # advance wheel index
                    if si > 47: si = 0
                sieve[nxt] = (s, si)            # next non-marked multiple of a prime
            nni = ni + 1                        # advance wheel index
            if nni > 47: nni = 0
            n += gaps[ni]; ni = nni             # advance on the wheel
    for p, pi in wheel_prime_pairs(): yield p   # strip out indexes

Further gains of about 1.5 times in speed can be made using the same code by only changing the tables and a few constants for a further constant factor gain of about 1.5 times in speed by using a 2/3/5/7/11/13/17 wheel (with the gaps list 92160 elements long) computed for a slight constant overhead time as per the test link for Python 2.7 and test link for Python 3.x. Further wheel factorization will not really be worth it as the gains will be small (if any and not losses) and the gaps table huge - it is already too big for efficient use by 32-bit Python 3 and the wheel should likely be stopped at 13:

def primes():
    whlPrms = [2,3,5,7,11,13,17]                # base wheel primes
    for p in whlPrms: yield p
    def makeGaps():
        buf = [True] * (3 * 5 * 7 * 11 * 13 * 17 + 1) # all odds plus extra for o/f
        for p in whlPrms:
            if p < 3:
                continue              # no need to handle evens
            strt = (p * p - 19) >> 1            # start position (divided by 2 using shift)
            while strt < 0: strt += p
            buf[strt::p] = [False] * ((len(buf) - strt - 1) // p + 1) # cull for p
        whlPsns = [i + i for i,v in enumerate(buf) if v]
        return [whlPsns[i + 1] - whlPsns[i] for i in range(len(whlPsns) - 1)]
    gaps = makeGaps()                           # big wheel gaps
    def wheel_prime_pairs():
        yield (19,0); bps = wheel_prime_pairs() # additional primes supply
        p, pi = next(bps); q = p * p            # adv to get 11 sqr'd is 121 as next square to put
        sieve = {}; n = 23; ni = 1              #   into sieve dict; init cndidate, wheel ndx
        while True:
            if n not in sieve:                  # is not a multiple of previously recorded primes
                if n < q: yield (n, ni)         # n is prime with wheel modulo index
                else:
                    npi = pi + 1                # advance wheel index
                    if npi > 92159: npi = 0
                    sieve[q + p * gaps[pi]] = (p, npi) # n == p * p: put next cull position on wheel
                    p, pi = next(bps); q = p * p  # advance next prime and prime square to put
            else:
                s, si = sieve.pop(n)
                nxt = n + s * gaps[si]          # move current cull position up the wheel
                si = si + 1                     # advance wheel index
                if si > 92159: si = 0
                while nxt in sieve:             # ensure each entry is unique by wheel
                    nxt += s * gaps[si]
                    si = si + 1                 # advance wheel index
                    if si > 92159: si = 0
                sieve[nxt] = (s, si)            # next non-marked multiple of a prime
            nni = ni + 1                        # advance wheel index
            if nni > 92159: nni = 0
            n += gaps[ni]; ni = nni             # advance on the wheel
    for p, pi in wheel_prime_pairs(): yield p   # strip out indexes

Iterative sieve on unbounded count from 2

See Extensible prime generator: Iterative sieve on unbounded count from 2

Quackery

  [ dup 1
    [ 2dup > while
      + 1 >>
      2dup / again ]
    drop nip ]                  is sqrt         ( n --> n )
 
  [ stack [ 3 ~ ] constant ]    is primes       (   --> s )
 
  ( If a number is prime, the corresponding bit on the
    number on the primes ancillary stack is set.
    Initially all the bits are set except for 0 and 1,
    which are not prime numbers by definition. 
    "eratosthenes" unsets all bits above those specified
    by it's argument. )
 
  [ bit ~
    primes take & primes put ]  is -composite   ( n -->   )
 
  [ bit primes share & 0 != ]   is isprime      ( n --> b )
 
  [ dup dup sqrt times
      [ i^ 1+
        dup isprime if
          [ dup 2 **
            [ dup -composite
              over +
              rot 2dup >
              dip unrot until ]
            drop ]
        drop ]
     drop 
     1+ bit 1 -
     primes take & 
     primes put ]                 is eratosthenes ( n -->   )
 
  100 eratosthenes
 
  100 times [ i^ isprime if [ i^ echo sp ] ]

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

R

sieve <- function(n) {
  if (n < 2) integer(0)
  else {
    primes <- rep(T, n)
    primes[[1]] <- F
    for(i in seq(sqrt(n))) {
      if(primes[[i]]) {
        primes[seq(i * i, n, i)] <- F
      }
    }
    which(primes)
  }
}

sieve(1000)

Output:

  [1]   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61
 [19]  67  71  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151
 [37] 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251
 [55] 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359
 [73] 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463
 [91] 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593
[109] 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701
[127] 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827
[145] 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953
[163] 967 971 977 983 991 997

Alternate Odds-Only Version

sieve <- function(n) {
  if (n < 2) return(integer(0))
  lmt <- (sqrt(n) - 1) / 2
  sz <- (n - 1) / 2
  buf <- rep(TRUE, sz)
  for(i in seq(lmt)) {
    if (buf[i]) {
      buf[seq((i + i) * (i + 1), sz, by=(i + i + 1))] <- FALSE
    }
  }
  cat(2, sep='')
  for(i in seq(sz)) {
    if (buf[i]) {
      cat(" ", (i + i + 1), sep='')
    }
  }
}

sieve(1000)

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Racket

Imperative versions

Ugly imperative version:

#lang racket

(define (sieve n)
  (define non-primes '())
  (define primes '())
  (for ([i (in-range 2 (add1 n))])
    (unless (member i non-primes)
      (set! primes (cons i primes))
      (for ([j (in-range (* i i) (add1 n) i)])
        (set! non-primes (cons j non-primes)))))
  (reverse primes))

(sieve 100)

A little nicer, but still imperative:

#lang racket
(define (sieve n)
  (define primes (make-vector (add1 n) #t))
  (for* ([i (in-range 2 (add1 n))]
         #:when (vector-ref primes i)
         [j (in-range (* i i) (add1 n) i)])
    (vector-set! primes j #f))
  (for/list ([n (in-range 2 (add1 n))]
             #:when (vector-ref primes n))
    n))
(sieve 100)

Imperative version using a bit vector:

#lang racket
(require data/bit-vector)
;; Returns a list of prime numbers up to natural number limit
(define (eratosthenes limit)
  (define bv (make-bit-vector (+ limit 1) #f))
  (bit-vector-set! bv 0 #t)
  (bit-vector-set! bv 1 #t)
  (for* ([i (in-range (add1 (sqrt limit)))] #:unless (bit-vector-ref bv i)
         [j (in-range (* 2 i) (bit-vector-length bv) i)])
    (bit-vector-set! bv j #t))
  ;; translate to a list of primes
  (for/list ([i (bit-vector-length bv)] #:unless (bit-vector-ref bv i)) i))
(eratosthenes 100)

Output:

'(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Infinite list of primes Using laziness

These examples use infinite lists (streams) to implement the sieve of Eratosthenes in a functional way, and producing all prime numbers. The following functions are used as a prefix for pieces of code that follow:

#lang lazy
(define (ints-from i d) (cons i (ints-from (+ i d) d)))
(define (after n l f)
  (if (< (car l) n) (cons (car l) (after n (cdr l) f)) (f l)))
(define (diff l1 l2)
  (let ([x1 (car l1)] [x2 (car l2)])
    (cond [(< x1 x2) (cons x1 (diff (cdr l1)      l2 ))]
          [(> x1 x2)          (diff      l1  (cdr l2)) ]
          [else               (diff (cdr l1) (cdr l2)) ])))
(define (union l1 l2)        ; union of two lists
  (let ([x1 (car l1)] [x2 (car l2)])
    (cond [(< x1 x2) (cons x1 (union (cdr l1)      l2 ))]
          [(> x1 x2) (cons x2 (union      l1  (cdr l2)))]
          [else      (cons x1 (union (cdr l1) (cdr l2)))])))

Basic sieve

(define (sieve l)
  (define x (car l))
  (cons x (sieve (diff (cdr l) (ints-from (+ x x) x)))))
(define primes (sieve (ints-from 2 1)))
(!! (take 25 primes))

Runs at ~ n^2.1 empirically, for n <= 1500 primes produced.

With merged composites

Note that the first number, 2, and its multiples stream (ints-from 4 2) are handled separately to ensure that the non-primes list is never empty, which simplifies the code for union which assumes non-empty infinite lists.

(define (sieve l non-primes)
  (let ([x (car l)] [np (car non-primes)])
    (cond [(= x np)     (sieve (cdr l) (cdr  non-primes))]    ; else x < np
          [else (cons x (sieve (cdr l) (union (ints-from (* x x) x)  
                                               non-primes)))]))) 
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))

Basic sieve Optimized with postponed processing

Since a prime's multiples that count start from its square, we should only start removing them when we reach that square.

(define (sieve l prs)
  (define p (car prs))
  (define q (* p p))
  (after q l (λ(t) (sieve (diff t (ints-from q p)) (cdr prs)))))
(define primes (cons 2 (sieve (ints-from 3 1) primes)))

Runs at ~ n^1.4 up to n=10,000. The initial 2 in the self-referential primes definition is needed to prevent a "black hole".

Merged composites Optimized with postponed processing

Since prime's multiples that matter start from its square, we should only add them when we reach that square.

(define (composites l q primes)
  (after q l 
    (λ(t) 
      (let ([p (car primes)] [r (cadr primes)])
        (composites (union t (ints-from q p))   ; q = p*p
                    (* r r) (cdr primes))))))
(define primes (cons 2 
                 (diff (ints-from 3 1)
                       (composites (ints-from 4 2) 9 (cdr primes)))))

Implementation of Richard Bird's algorithm

Appears in M.O'Neill's paper. Achieves on its own the proper postponement that is specifically arranged for in the version above (with after), and is yet more efficient, because it folds to the right and so builds the right-leaning structure of merges at run time, where the more frequently-producing streams of multiples appear higher in that structure, so the composite numbers produced by them have less merge nodes to percolate through:

(define primes
  (cons 2 (diff (ints-from 3 1)
                (foldr (λ(p r) (define q (* p p))
                               (cons q (union (ints-from (+ q p) p) r)))
                       '() primes))))

Using threads and channels

Same algorithm as "merged composites" above (without the postponement optimization), but now using threads and channels to produce a channel of all prime numbers (similar to newsqueak). The macro at the top is a convenient wrapper around definitions of channels using a thread that feeds them.

#lang racket
(define-syntax (define-thread-loop stx)
  (syntax-case stx ()
    [(_ (name . args) expr ...)
     (with-syntax ([out! (datum->syntax stx 'out!)])
       #'(define (name . args)
           (define out (make-channel))
           (define (out! x) (channel-put out x))
           (thread (λ() (let loop () expr ... (loop))))
           out))]))
(define-thread-loop (ints-from i d) (out! i) (set! i (+ i d)))
(define-thread-loop (merge c1 c2)
  (let loop ([x1 (channel-get c1)] [x2 (channel-get c2)])
    (cond [(> x1 x2) (out! x2) (loop x1 (channel-get c2))]
          [(< x1 x2) (out! x1) (loop (channel-get c1) x2)]
          [else      (out! x1) (loop (channel-get c1) (channel-get c2))])))
(define-thread-loop (sieve l non-primes)
  (let loop ([x (channel-get l)] [np (channel-get non-primes)])
    (cond [(> x np) (loop x (channel-get non-primes))]
          [(= x np) (loop (channel-get l) (channel-get non-primes))]
          [else     (out! x) 
                    (set! non-primes (merge (ints-from (* x x) x) non-primes))
                    (loop (channel-get l)  np)])))
(define-thread-loop (cons x l)
  (out! x) (let loop () (out! (channel-get l)) (loop)))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(for/list ([i 25] [x (in-producer channel-get eof primes)]) x)

Using generators

Yet another variation of the same algorithm as above, this time using generators.

#lang racket
(require racket/generator)
(define (ints-from i d)
  (generator () (let loop ([i i]) (yield i) (loop (+ i d)))))
(define (merge g1 g2)
  (generator ()
    (let loop ([x1 (g1)] [x2 (g2)])
      (cond [(< x1 x2) (yield x1) (loop (g1) x2)]
            [(> x1 x2) (yield x2) (loop x1 (g2))]
            [else      (yield x1) (loop (g1) (g2))]))))
(define (sieve l non-primes)
  (generator ()
    (let loop ([x (l)] [np (non-primes)])
      (cond [(> x np) (loop x (non-primes))]
            [(= x np) (loop (l) (non-primes))]
            [else (yield x)
                  (set! non-primes (merge (ints-from (* x x) x) non-primes))
                  (loop (l) np)]))))
(define (cons x l) (generator () (yield x) (let loop () (yield (l)) (loop))))
(define primes (cons 2 (sieve (ints-from 3 1) (ints-from 4 2))))
(for/list ([i 25] [x (in-producer primes)]) x)

Raku

(formerly Perl 6)

sub sieve( Int $limit ) {
    my @is-prime = False, False, slip True xx $limit - 1;

    gather for @is-prime.kv -> $number, $is-prime {
        if $is-prime {
            take $number;
            loop (my $s = $number**2; $s <= $limit; $s += $number) {
                @is-prime[$s] = False;
            }
        }
    }
}

(sieve 100).join(",").say;

A set-based approach

More or less the same as the first Python example:

sub eratsieve($n) {
    # Requires n(1 - 1/(log(n-1))) storage
    my $multiples = set();
    gather for 2..$n -> $i {
        unless $i (&) $multiples { # is subset
            take $i;
            $multiples (+)= set($i**2, *+$i ... (* > $n)); # union
        }
    }
}

say flat eratsieve(100);

This gives:

 (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Using a chain of filters

This example is incorrect. Please fix the code and remove this message.

Details: This version uses modulo (division) testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: while this is "incorrect" by a strict interpretation of the rules, it is being left as an interesting example

sub primes ( UInt $n ) {
    gather {
        # create an iterator from 2 to $n (inclusive)
        my $iterator := (2..$n).iterator;

        loop {
            # If it passed all of the filters it must be prime
            my $prime := $iterator.pull-one;
            # unless it is actually the end of the sequence
            last if $prime =:= IterationEnd;

            take $prime; # add the prime to the `gather` sequence

            # filter out the factors of the current prime
            $iterator := Seq.new($iterator).grep(* % $prime).iterator;
            # (2..*).grep(* % 2).grep(* % 3).grep(* % 5).grep(* % 7)…
        }
    }
}

put primes( 100 );

Which prints

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

RATFOR

program prime
#
define(true,1)
define(false,0)
#
integer loop,loop2,limit,k,primes,count
integer isprime(1000)

limit = 1000
count = 0

for (loop=1; loop<=limit; loop=loop+1)
    {
       isprime(loop) = true
    }

isprime(1) = false

for (loop=2; loop<=limit; loop=loop+1)


    {
       if (isprime(loop) == true) 
          {
              count = count + 1
              for (loop2=loop*loop; loop2 <= limit; loop2=loop2+loop)
                 {
                     isprime(loop2) = false
                 }
          }
    }
write(*,*)
write(*,101) count

101 format('There are ',I12,' primes.')

count = 0
for (loop=1; loop<=limit; loop=loop+1)
        if (isprime(loop) == true)
           {
               Count = count + 1
               write(*,'(I6,$)')loop
               if (mod(count,10) == 0) write(*,*)
           }
write(*,*)

end

Red

primes: function [n [integer!]][
   poke prim: make bitset! n 1 true
   r: 2 while [r * r <= n][
      repeat q n / r - 1 [poke prim q + 1 * r true] 
      until [not pick prim r: r + 1]
   ]
   collect [repeat i n [if not prim/:i [keep i]]]
]

primes 100
== [2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

Refal

$ENTRY Go {
    = <Print <Primes 100>>;
};

Primes {
    s.N = <Sieve <Iota 2 s.N>>;
};

Iota {
    s.End s.End = s.End;
    s.Start s.End = s.Start <Iota <+ 1 s.Start> s.End>;
};

Cross {
    s.Step e.List = <Cross (s.Step 1) s.Step e.List>;
    (s.Step s.Skip) = ;
    (s.Step 1) s.Item e.List = X <Cross (s.Step s.Step) e.List>;
    (s.Step s.N) s.Item e.List = s.Item <Cross (s.Step <- s.N 1>) e.List>;
};

Sieve {
    = ;
    X e.List = <Sieve e.List>;
    s.N e.List = s.N <Sieve <Cross s.N e.List>>;
};

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

REXX

no wheel version

The first three REXX versions make use of a sparse stemmed array: [@.].

As the stemmed array gets heavily populated, the number of entries may slow down the REXX interpreter substantially,
depending upon the efficacy of the hashing technique being used for REXX variables (setting/retrieving).

/*REXX program generates and displays primes  via the  sieve of Eratosthenes  algorithm.*/
parse arg H .;   if H=='' | H==","  then H= 200  /*optain optional argument from the CL.*/
w= length(H);    @prime= right('prime', 20)      /*W:   is used for aligning the output.*/
@.=.                                             /*assume all the numbers are  prime.   */
#= 0                                             /*number of primes found  (so far).    */
     do j=2  for H-1;   if @.j==''  then iterate /*all prime integers up to H inclusive.*/
     #= # + 1                                    /*bump the prime number counter.       */
     say  @prime right(#,w)  " ───► " right(j,w) /*display the  prime  to the terminal. */
         do m=j*j  to H  by j;    @.m=;   end    /*strike all multiples as being ¬ prime*/
     end   /*j*/                                 /*       ───                           */
say                                              /*stick a fork in it,  we're all done. */
say  right(#, 1+w+length(@prime) )     'primes found up to and including '       H

output when using the input default of: 200

               prime   1  ───►    2
               prime   2  ───►    3
               prime   3  ───►    5
               prime   4  ───►    7
               prime   5  ───►   11
               prime   6  ───►   13
               prime   7  ───►   17
               prime   8  ───►   19
               prime   9  ───►   23
               prime  10  ───►   29
               prime  11  ───►   31
               prime  12  ───►   37
               prime  13  ───►   41
               prime  14  ───►   43
               prime  15  ───►   47
               prime  16  ───►   53
               prime  17  ───►   59
               prime  18  ───►   61
               prime  19  ───►   67
               prime  20  ───►   71
               prime  21  ───►   73
               prime  22  ───►   79
               prime  23  ───►   83
               prime  24  ───►   89
               prime  25  ───►   97
               prime  26  ───►  101
               prime  27  ───►  103
               prime  28  ───►  107
               prime  29  ───►  109
               prime  30  ───►  113
               prime  31  ───►  127
               prime  32  ───►  131
               prime  33  ───►  137
               prime  34  ───►  139
               prime  35  ───►  149
               prime  36  ───►  151
               prime  37  ───►  157
               prime  38  ───►  163
               prime  39  ───►  167
               prime  40  ───►  173
               prime  41  ───►  179
               prime  42  ───►  181
               prime  43  ───►  191
               prime  44  ───►  193
               prime  45  ───►  197
               prime  46  ───►  199

                      46 primes found up to and including  200

wheel version, optional prime list suppression

This version skips striking the even numbers (as being not prime), 2 is handled as a special case.

Also supported is the suppression of listing the primes if the H (high limit) is negative.

Also added is a final message indicating the number of primes found.

/*REXX program generates primes via a  wheeled  sieve of Eratosthenes  algorithm.       */
parse arg H .;   if H==''  then H=200            /*let the highest number be specified. */
tell=h>0;     H=abs(H);    w=length(H)           /*a negative H suppresses prime listing*/
if 2<=H & tell  then say right(1, w+20)'st prime   ───► '      right(2, w)
@.= '0'x                                         /*assume that  all  numbers are prime. */
cw= length(@.)                                   /*the cell width that holds numbers.   */
#= w<=H                                          /*the number of primes found  (so far).*/
!=0                                              /*skips the top part of sieve marking. */
    do j=3  by 2  for (H-2)%2;  b= j%cw          /*odd integers up to   H   inclusive.  */
    if substr(x2b(c2x(@.b)),j//cw+1,1)  then iterate              /*is  J  composite ?  */
    #= # + 1                                     /*bump the prime number counter.       */
    if tell  then say right(#, w+20)th(#)    'prime   ───► '      right(j, w)
    if !     then iterate                        /*should the top part be skipped ?     */
    jj=j * j                                     /*compute the square of  J.         ___*/
    if jj>H  then !=1                            /*indicates skip top part  if  j > √ H */
      do m=jj  to H  by j+j;   call . m;   end   /* [↑]  strike odd multiples  ¬ prime  */
    end   /*j*/                                  /*             ───                     */

say;             say  right(#, w+20)      'prime's(#)    "found up to and including "  H
exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────────────*/
.: parse arg n; b=n%cw; r=n//cw+1;_=x2b(c2x(@.b));@.b=x2c(b2x(left(_,r-1)'1'substr(_,r+1)));return
s: if arg(1)==1  then return arg(3);  return word(arg(2) 's',1)            /*pluralizer.*/
th: procedure; parse arg x; x=abs(x); return word('th st nd rd',1+x//10*(x//100%10\==1)*(x//10<4))

output when using the input default of: 200

                      1st prime   ───►    2
                      2nd prime   ───►    3
                      3rd prime   ───►    5
                      4th prime   ───►    7
                      5th prime   ───►   11
                      6th prime   ───►   13
                      7th prime   ───►   17
                      8th prime   ───►   19
                      9th prime   ───►   23
                     10th prime   ───►   29
                     11th prime   ───►   31
                     12th prime   ───►   37
                     13th prime   ───►   41
                     14th prime   ───►   43
                     15th prime   ───►   47
                     16th prime   ───►   53
                     17th prime   ───►   59
                     18th prime   ───►   61
                     19th prime   ───►   67
                     20th prime   ───►   71
                     21st prime   ───►   73
                     22nd prime   ───►   79
                     23rd prime   ───►   83
                     24th prime   ───►   89
                     25th prime   ───►   97
                     26th prime   ───►  101
                     27th prime   ───►  103
                     28th prime   ───►  107
                     29th prime   ───►  109
                     30th prime   ───►  113
                     31st prime   ───►  127
                     32nd prime   ───►  131
                     33rd prime   ───►  137
                     34th prime   ───►  139
                     35th prime   ───►  149
                     36th prime   ───►  151
                     37th prime   ───►  157
                     38th prime   ───►  163
                     39th prime   ───►  167
                     40th prime   ───►  173
                     41st prime   ───►  179
                     42nd prime   ───►  181
                     43rd prime   ───►  191
                     44th prime   ───►  193
                     45th prime   ───►  197
                     46th prime   ───►  199

                     46 primes found up to and including  200

output when using the input of: -1000

                     168 primes found up to and including  1000

output when using the input of: -10000

                     1229 primes found up to and including  10000

output when using the input of: -100000

                     9592 primes found up to and including  100000.

output when using the input of: -1000000

                     78498 primes found up to and including  10000000

output when using the input of: -10000000

                     664579 primes found up to and including  10000000

wheel version

This version skips striking the even numbers (as being not prime), 2 is handled as a special case.

It also uses a short-circuit test for striking out composites ≤ √ target

/*REXX pgm generates and displays primes via a wheeled sieve of Eratosthenes algorithm. */
parse arg H .;  if H=='' | H==","  then H= 200   /*obtain the optional argument from CL.*/
w= length(H);       @prime= right('prime', 20)   /*w:  is used for aligning the output. */
if 2<=H  then  say  @prime  right(1, w)       " ───► "       right(2, w)
#= 2<=H                                          /*the number of primes found  (so far).*/
@.=.                                             /*assume all the numbers are  prime.   */
!=0;  do j=3  by 2  for (H-2)%2                  /*the odd integers up to  H  inclusive.*/
      if @.j==''  then iterate                   /*Is composite?  Then skip this number.*/
      #= # + 1                                   /*bump the prime number counter.       */
      say  @prime right(#,w) " ───► " right(j,w) /*display the prime to the terminal.   */
      if !        then iterate                   /*skip the top part of loop?       ___ */
      if j*j>H     then !=1                      /*indicate skip top part  if  J > √ H  */
          do m=j*j  to H  by j+j;   @.m=;   end  /*strike odd multiples as  not  prime. */
      end   /*j*/                                /*       ───                           */
say                                              /*stick a fork in it,  we're all done. */
say right(#,  1 + w + length(@prime) )    'primes found up to and including '    H

output is identical to the first (non-wheel) version; program execution is over twice as fast.

The addition of the short-circuit test (using the REXX variable !) makes it about another 20% faster.

Wheel Version restructured

/*REXX program generates primes via sieve of Eratosthenes algorithm.
* 21.07.2012 Walter Pachl derived from above Rexx version
*                       avoid symbols @ and # (not supported by ooRexx)
*                       avoid negations (think positive)
**********************************************************************/
  highest=200                       /*define highest number to use.  */
  is_prime.=1                       /*assume all numbers are prime.  */
  w=length(highest)                 /*width of the biggest number,   */
                                    /*  it's used for aligned output.*/
  Do j=3 To highest By 2,           /*strike multiples of odd ints.  */
               While j*j<=highest   /* up to sqrt(highest)           */
      If is_prime.j Then Do
        Do jm=j*3 To highest By j+j /*start with next odd mult. of J */
          is_prime.jm=0             /*mark odd mult. of J not prime. */
          End
        End
    End
  np=0                              /*number of primes shown         */
  Call tell 2
  Do n=3 To highest By 2            /*list all the primes found.     */
    If is_prime.n Then Do
      Call tell n
      End
    End
  Exit
tell: Parse Arg prime
      np=np+1
      Say '           prime number' right(np,w) " --> " right(prime,w)
      Return

output is mostly identical to the above versions.

Ring

limit = 100
sieve = list(limit)
for i = 2 to limit
    for k = i*i to limit step i 
        sieve[k] = 1
    next
    if sieve[i] = 0 see "" + i + " " ok
next

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

RPL

This is a direct translation from Wikipedia. The variable i has been renamed ii to avoid confusion with the language constant i=√ -1

Works with: Halcyon Calc version 4.2.8

RPL code

Comment

 ≪ → n
   ≪ { } n + 1 CON 'A' STO
       
      2 n √ FOR ii 
         IF A ii GET THEN
            ii SQ n FOR j
               'A' j 0 PUT ii STEP
         END
      NEXT
      { } 
      2 n FOR ii IF A ii GET THEN ii + END NEXT
      'A' PURGE
≫ ≫ 'SIEVE' STO

SIEVE ( n -- { prime_numbers } )
let A be an array of Boolean values, indexed by 2 to n,
   initially all set to true.   
   for i = 2, 3, 4, ..., not exceeding √n do
       if A[i] is true
           for j = i^2, i^2+i,... not exceeding n do
               set A[j] := false



return all i such that A[i] is true.

100 SIEVE

Output:

1: { 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 }

Latest RPL versions allow to remove some slow FOR..NEXT loops and use local variables only.

Works with: HP version 49

« 'X' DUP 1 4 PICK 1 SEQ DUP → n a seq123
  « 2 n √ FOR ii 
       IF a ii GET THEN
          ii SQ n FOR j
             'a' j 0 PUT ii STEP
       END
    NEXT 
    a seq123 IFT TAIL
» » 'SIEVE' STO

Works with: HP version 49

Ruby

eratosthenes starts with nums = [nil, nil, 2, 3, 4, 5, ..., n], then marks (　the nil setting　) multiples of 2, 3, 5, 7, ... there, then returns all non-nil numbers which are the primes.

def eratosthenes(n)
  nums = [nil, nil, *2..n]
  (2..Math.sqrt(n)).each do |i|
    (i**2..n).step(i){|m| nums[m] = nil}  if nums[i]
  end
  nums.compact
end
 
p eratosthenes(100)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

With a wheel

eratosthenes2 adds more optimizations, but the code is longer.

The array nums only tracks odd numbers (skips multiples of 2).
The array nums holds booleans instead of integers, and every multiple of 3 begins false.
The outer loop skips multiples of 2 and 3.
Both inner loops skip multiples of 2 and 3.

def eratosthenes2(n)
  # For odd i, if i is prime, nums[i >> 1] is true.
  # Set false for all multiples of 3.
  nums = [true, false, true] * ((n + 5) / 6)
  nums[0] = false  # 1 is not prime.
  nums[1] = true   # 3 is prime.

  # Outer loop and both inner loops are skipping multiples of 2 and 3.
  # Outer loop checks i * i > n, same as i > Math.sqrt(n).
  i = 5
  until (m = i * i) > n
    if nums[i >> 1]
      i_times_2 = i << 1
      i_times_4 = i << 2
      while m <= n
        nums[m >> 1] = false
        m += i_times_2
        nums[m >> 1] = false
        m += i_times_4  # When i = 5, skip 45, 75, 105, ...
      end
    end
    i += 2
    if nums[i >> 1]
      m = i * i
      i_times_2 = i << 1
      i_times_4 = i << 2
      while m <= n
        nums[m >> 1] = false
        m += i_times_4  # When i = 7, skip 63, 105, 147, ...
        nums[m >> 1] = false
        m += i_times_2
      end
    end
    i += 4
  end

  primes = [2]
  nums.each_index {|i| primes << (i * 2 + 1) if nums[i]}
  primes.pop while primes.last > n
  primes
end

p eratosthenes2(100)

This simple benchmark compares eratosthenes with eratosthenes2.

require 'benchmark'
Benchmark.bmbm {|x|
  x.report("eratosthenes") { eratosthenes(1_000_000) }
  x.report("eratosthenes2") { eratosthenes2(1_000_000) }
}

eratosthenes2 runs about 4 times faster than eratosthenes.

With the standard library

MRI 1.9.x implements the sieve of Eratosthenes at file prime.rb, class EratosthensesSeive (around line 421). This implementation optimizes for space, by packing the booleans into 16-bit integers. It also hardcodes all primes less than 256.

require 'prime'
p Prime::EratosthenesGenerator.new.take_while {|i| i <= 100}

Run BASIC

input "Gimme the limit:"; limit
dim flags(limit)
for i = 2 to  limit
 for k = i*i to limit step i 
  flags(k) = 1
 next k
if flags(i) = 0 then print i;", ";
next i

Gimme the limit:?100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,

Rust

Unboxed Iterator

A slightly more idiomatic, optimized and modern iterator output example.

fn primes(n: usize) -> impl Iterator<Item = usize> {
    const START: usize = 2;
    if n < START {
        Vec::new()
    } else {
        let mut is_prime = vec![true; n + 1 - START];
        let limit = (n as f64).sqrt() as usize;
        for i in START..limit + 1 {
            let mut it = is_prime[i - START..].iter_mut().step_by(i);
            if let Some(true) = it.next() {
                it.for_each(|x| *x = false);
            }
        }
        is_prime
    }
    .into_iter()
    .enumerate()
    .filter_map(|(e, b)| if b { Some(e + START) } else { None })
}

Notes:

Starting at an offset of 2 means that an n < 2 input requires zero allocations, because Vec::new() doesn't allocate memory until elements are pushed into it.
Using Vec as an output to the if .. {} else {} condition means the output is statically deterministic, avoiding the need for a boxed trait object.
Iterating is_prime with .iter_mut() and then using .step_by(i) makes all the optimizations required, and removes a lot of tediousness.
Returning impl Iterator allows for static dispatching instead of dynamic dispatching, which is possible because the type is now statically known at compile time, making the zero input/output condition an order of magnitude faster.

Sieve of Eratosthenes - No optimization

fn simple_sieve(limit: usize) -> Vec<usize> {

    let mut is_prime = vec![true; limit+1];
    is_prime[0] = false;
    if limit >= 1 { is_prime[1] = false }

    for num in 2..limit+1 {
        if is_prime[num] {
            let mut multiple = num*num;
            while multiple <= limit {
                is_prime[multiple] = false;
                multiple += num;
            }
        }
    }

    is_prime.iter().enumerate()
        .filter_map(|(pr, &is_pr)| if is_pr {Some(pr)} else {None} )
        .collect()
}

fn main() {
    println!("{:?}", simple_sieve(100));
}

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Basic Version slightly optimized, Iterator output

The above code doesn't even do the basic optimizing of only culling composites by primes up to the square root of the range as allowed in the task; it also outputs a vector of resulting primes, which consumes memory. The following code fixes both of those, outputting the results as an Iterator:

use std::iter::{empty, once};
use std::time::Instant;

fn basic_sieve(limit: usize) -> Box<Iterator<Item = usize>> {
    if limit < 2 { return Box::new(empty()) }

    let mut is_prime = vec![true; limit+1];
    is_prime[0] = false;
    if limit >= 1 { is_prime[1] = false }
    let sqrtlmt = (limit as f64).sqrt() as usize + 1; 

    for num in 2..sqrtlmt {
        if is_prime[num] {
            let mut multiple = num * num;
            while multiple <= limit {
                is_prime[multiple] = false;
                multiple += num;
            }
        }
    }

    Box::new(is_prime.into_iter().enumerate()
                .filter_map(|(p, is_prm)| if is_prm { Some(p) } else { None }))

}
 
fn main() {
    let n = 1000000;
    let vrslt = basic_sieve(100).collect::<Vec<_>>();
    println!("{:?}", vrslt);
    let strt = Instant::now();

    // do it 1000 times to get a reasonable execution time span...
    let rslt = (1..1000).map(|_| basic_sieve(n)).last().unwrap();

    let elpsd = strt.elapsed();

    let count = rslt.count();
    println!("{}", count);

    let secs = elpsd.as_secs();
    let millis = (elpsd.subsec_nanos() / 1000000) as u64;
    let dur = secs * 1000 + millis;
    println!("Culling composites took {} milliseconds.", dur);
}

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
78498
Culling composites took 4595 milliseconds.

The sieving operation is run for 1000 loops in order to get a reasonable execution time for comparison.

Odds-only bit-packed array, Iterator output

The following code improves the above code by sieving only odd composite numbers as 2 is the only even prime for a reduction in number of operations by a factor of about two and a half with reduction of memory use by a factor of two, and bit-packs the composite sieving array for a further reduction of memory use by a factor of eight and with some saving in time due to better CPU cache use for a given sieving range; it also demonstrates how to eliminate the redundant array bounds check:

fn optimized_sieve(limit: usize) -> Box<Iterator<Item = usize>> {
    if limit < 3 {
        return if limit < 2 { Box::new(empty()) } else { Box::new(once(2)) }
    }

    let ndxlmt = (limit - 3) / 2 + 1;
    let bfsz = ((limit - 3) / 2) / 32 + 1;
    let mut cmpsts = vec![0u32; bfsz];
    let sqrtndxlmt = ((limit as f64).sqrt() as usize - 3) / 2 + 1;

    for ndx in 0..sqrtndxlmt {
        if (cmpsts[ndx >> 5] & (1u32 << (ndx & 31))) == 0 {
            let p = ndx + ndx + 3;
            let mut cullpos = (p * p - 3) / 2;
            while cullpos < ndxlmt {
                unsafe { // avoids array bounds check, which is already done above
	            let cptr = cmpsts.get_unchecked_mut(cullpos >> 5);
	            *cptr |= 1u32 << (cullpos & 31);
                }
//                cmpsts[cullpos >> 5] |= 1u32 << (cullpos & 31); // with bounds check
                cullpos += p;
            }
        }
    }

    Box::new((-1 .. ndxlmt as isize).into_iter().filter_map(move |i| {
                if i < 0 { Some(2) } else {
                    if cmpsts[i as usize >> 5] & (1u32 << (i & 31)) == 0 {
                        Some((i + i + 3) as usize) } else { None } }
    }))
}

The above function can be used just by substituting "optimized_sieve" for "basic_sieve" in the previous "main" function, and the outputs are the same except that the time is only 1584 milliseconds, or about three times as fast.

Unbounded Page-Segmented bit-packed odds-only version with Iterator

Caution! This implementation is used in the Extensible prime generator task, so be sure not to break that implementation when changing this code.

While that above code is quite fast, as the range increases above the 10's of millions it begins to lose efficiency due to loss of CPU cache associativity as the size of the one-large-array used for culling composites grows beyond the limits of the various CPU caches. Accordingly the following page-segmented code where each culling page can be limited to not larger than the L1 CPU cache is about four times faster than the above for the range of one billion:

use std::iter::{empty, once};
use std::rc::Rc;
use std::cell::RefCell;
use std::time::Instant;

const RANGE: u64 = 1000000000;
const SZ_PAGE_BTS: u64 = (1 << 14) * 8; // this should be the size of the CPU L1 cache
const SZ_BASE_BTS: u64 = (1 << 7) * 8;
static CLUT: [u8; 256] = [
	8, 7, 7, 6, 7, 6, 6, 5, 7, 6, 6, 5, 6, 5, 5, 4, 7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 
	7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 
	7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 
	6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 
	7, 6, 6, 5, 6, 5, 5, 4, 6, 5, 5, 4, 5, 4, 4, 3, 6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 
	6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 
	6, 5, 5, 4, 5, 4, 4, 3, 5, 4, 4, 3, 4, 3, 3, 2, 5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 
	5, 4, 4, 3, 4, 3, 3, 2, 4, 3, 3, 2, 3, 2, 2, 1, 4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0 ];

fn count_page(lmti: usize, pg: &[u32]) -> i64 {
	let pgsz = pg.len(); let pgbts = pgsz * 32;
	let (lmt, icnt) = if lmti >= pgbts { (pgsz, 0) } else {
		let lstw = lmti / 32;
		let msk = 0xFFFFFFFEu32 << (lmti & 31);
		let v = (msk | pg[lstw]) as usize;
		(lstw, (CLUT[v & 0xFF] + CLUT[(v >> 8) & 0xFF]
			+ CLUT[(v >> 16) & 0xFF] + CLUT[v >> 24]) as u32)
	};
	let mut count = 0u32;
	for i in 0 .. lmt {
		let v = pg[i] as usize;
		count += (CLUT[v & 0xFF] + CLUT[(v >> 8) & 0xFF]
					+ CLUT[(v >> 16) & 0xFF] + CLUT[v >> 24]) as u32;
	}
	(icnt + count) as i64
}

fn primes_pages() -> Box<Iterator<Item = (u64, Vec<u32>)>> {
	// a memoized iterable enclosing a Vec that grows as needed from an Iterator...
	type Bpasi = Box<Iterator<Item = (u64, Vec<u32>)>>; // (lwi, base cmpsts page)
	type Bpas = Rc<(RefCell<Bpasi>, RefCell<Vec<Vec<u32>>>)>; // interior mutables
	struct Bps(Bpas); // iterable wrapper for base primes array state
	struct Bpsi<'a>(usize, &'a Bpas); // iterator with current pos, state ref's
	impl<'a> Iterator for Bpsi<'a> {
		type Item = &'a Vec<u32>;
		fn next(&mut self) -> Option<Self::Item> {
			let n = self.0; let bpas = self.1;
			while n >= bpas.1.borrow().len() { // not thread safe
				let nbpg = match bpas.0.borrow_mut().next() {
								Some(v) => v, _ => (0, vec!()) };
				if nbpg.1.is_empty() { return None } // end if no source iter
				bpas.1.borrow_mut().push(cnvrt2bppg(nbpg));
			}
			self.0 += 1; // unsafe pointer extends interior -> exterior lifetime
			// multi-threading might drop following Vec while reading - protect
			let ptr = &bpas.1.borrow()[n] as *const Vec<u32>;
			unsafe { Some(&(*ptr)) }
		}
	}
	impl<'a> IntoIterator for &'a Bps {
		type Item = &'a Vec<u32>;
		type IntoIter = Bpsi<'a>;
		fn into_iter(self) -> Self::IntoIter {
			Bpsi(0, &self.0)
		}
	}
	fn make_page(lwi: u64, szbts: u64, bppgs: &Bpas)
			-> (u64, Vec<u32>) {
		let nxti = lwi + szbts;
		let pbts = szbts as usize;
		let mut cmpsts = vec!(0u32; pbts / 32);
		'outer: for bpg in Bps(bppgs.clone()).into_iter() { // in the inner tight loop...
			let pgsz = bpg.len();
			for i in 0 .. pgsz {
				let p = bpg[i] as u64; let pc = p as usize;
				let s = (p * p - 3) / 2;
				if s >= nxti { break 'outer; } else { // page start address:
					let mut cp = if s >= lwi { (s - lwi) as usize } else {
						let r = ((lwi - s) % p) as usize;
						if r == 0 { 0 } else { pc - r }
					};
					while cp < pbts {
						unsafe { // avoids array bounds check, which is already done above
							let cptr = cmpsts.get_unchecked_mut(cp >> 5);
							*cptr |= 1u32 << (cp & 31); // about as fast as it gets...
						}
//						cmpsts[cp >> 5] |= 1u32 << (cp & 31);
						cp += pc;
					}
				}
			}
		}
		(lwi, cmpsts)
	}
	fn pages_from(lwi: u64, szbts: u64, bpas: Bpas)
			-> Box<Iterator<Item = (u64, Vec<u32>)>> {
		struct Gen(u64,  u64);
		impl Iterator for Gen {
			type Item = (u64, u64);
			#[inline]
			fn next(&mut self) -> Option<(u64, u64)> {
				let v = self.0; let inc = self.1; // calculate variable size here
				self.0 = v + inc;
				Some((v, inc))
			}
		}
		Box::new(Gen(lwi, szbts)
					.map(move |(lwi, szbts)| make_page(lwi, szbts, &bpas)))
	}
	fn cnvrt2bppg(cmpsts: (u64, Vec<u32>)) -> Vec<u32> {
		let (lwi, pg) = cmpsts;
		let pgbts = pg.len() * 32;
		let cnt = count_page(pgbts, &pg) as usize;
		let mut bpv = vec!(0u32; cnt);
		let mut j = 0; let bsp = (lwi + lwi + 3) as usize;
		for i in 0 .. pgbts {
			if (pg[i >> 5] & (1u32 << (i & 0x1F))) == 0u32 {
				bpv[j] = (bsp + i + i) as u32; j += 1;
			}
		}
		bpv
	}
	// recursive Rc/RefCell variable bpas - used only for init, then fixed ...
	// start with just enough base primes to init the first base primes page...
	let base_base_prms = vec!(3u32,5u32,7u32);
	let rcvv = RefCell::new(vec!(base_base_prms));
	let bpas: Bpas = Rc::new((RefCell::new(Box::new(empty())), rcvv));
	let initpg = make_page(0, 32, &bpas); // small base primes page for SZ_BASE_BTS = 2^7 * 8
	*bpas.1.borrow_mut() = vec!(cnvrt2bppg(initpg)); // use for first page
	let frstpg = make_page(0, SZ_BASE_BTS, &bpas); // init bpas for first base prime page
	*bpas.0.borrow_mut() = pages_from(SZ_BASE_BTS, SZ_BASE_BTS, bpas.clone()); // recurse bpas
	*bpas.1.borrow_mut() = vec!(cnvrt2bppg(frstpg)); // fixed for subsequent pages
	pages_from(0, SZ_PAGE_BTS, bpas) // and bpas also used here for main pages
}
 
fn primes_paged() -> Box<Iterator<Item = u64>> {
	fn list_paged_primes(cmpstpgs: Box<Iterator<Item = (u64, Vec<u32>)>>)
			-> Box<Iterator<Item = u64>> {
		Box::new(cmpstpgs.flat_map(move |(lwi, cmpsts)| {
			let pgbts = (cmpsts.len() * 32) as usize;
			(0..pgbts).filter_map(move |i| {
				if cmpsts[i >> 5] & (1u32 << (i & 31)) == 0 {
					Some((lwi + i as u64) * 2 + 3) } else { None } }) }))
	}
	Box::new(once(2u64).chain(list_paged_primes(primes_pages())))
}

fn count_primes_paged(top: u64) -> i64 {
	if top < 3 { if top < 2 { return 0i64 } else { return 1i64 } }
	let topi = (top - 3u64) / 2;
	primes_pages().take_while(|&(lwi, _)| lwi <= topi)
		.map(|(lwi, pg)| { count_page((topi - lwi) as usize, &pg) })
		.sum::<i64>() + 1
}

fn main() {
	let n = 262146;
	let vrslt = primes_paged()
			.take_while(|&p| p <= 100)
			.collect::<Vec<_>>();
	println!("{:?}", vrslt);

	let strt = Instant::now();

//	let count = primes_paged().take_while(|&p| p <= RANGE).count(); // slow way to count
	let count = count_primes_paged(RANGE); // fast way to count

	let elpsd = strt.elapsed();

	println!("{}", count);

	let secs = elpsd.as_secs();
	let millis = (elpsd.subsec_nanos() / 1000000) as u64;
	let dur = secs * 1000 + millis;
	println!("Culling composites took {} milliseconds.", dur);
}

The output is about the same as the previous codes except much faster; as well as cache size improvements mentioned above, it has a population count primes counting function that is able to determine the number of found primes about twice as fast as using the Iterator count() method (commented out and labelled as "the slow way" in the main function).

As listed above, the code maintains its efficiency up to about sixteen billion, and can easily be extended to be useful above that point by having the buffer size dynamically calculated to be proportional to the square root of the current range as commented in the code.

It would also be quite easy to extend the code to use multi-threading per page so that the time would be reduced proportionally to the number of true CPU cores used (not Hyper-Threaded ones) as in four true cores for many common high end desktop CPU's.

Before being extended to truly huge ranges such a 1e14, the code should have maximum wheel factorization added (2;3;5;7 wheels and the culling buffers further pre-culled by the primes (11;13;17; and maybe 19), which would speed it up by another factor of four or so for the range of one billion. It would also be possible to use extreme loop unrolling techniques such as used by "primesieve" written in C/C++ to increase the speed for this range by another factor of two or so.

The above code demonstrates some techniques to work within the limitations of Rust's ownership/borrowing/lifetime memory model as it: 1) uses a recursive secondary base primes Iterator made persistent by using a Vec that uses its own value as a source of its own page stream, 2) this is done by using a recursive variable that accessed as a Rc reference counted heap value with internal mutability by a pair of RefCell's, 3) note that the above secondary stream is not thread safe and needs to have the Rc changed to an Arc, the RefCell's changed to Mutex'es or (probably preferably RwLock's that enclose/lock all reading and writing operations in the secondary stream "Bpsi"'s next() method, and 4) the use of Iterators where their performance doesn't matter (at the page level) while using tight loops at more inner levels.

S-BASIC

comment
   Find primes up to the specified limit (here 1,000) using
   classic Sieve of Eratosthenes
end

$constant limit = 1000
$constant false = 0
$constant true = FFFFH

var i, k, count, col = integer
dim integer flags(limit)

print "Finding primes from 2 to";limit

rem - initialize table
for i = 1 to limit
  flags(i) = true
next i

rem - sieve for primes
for i = 2 to int(sqr(limit))
  if flags(i) = true then
     for k = (i*i) to limit step i
        flags(k) = false
     next k
next i

rem - write out primes 10 per line
count = 0
col = 1
for i = 2 to limit
   if flags(i) = true then
      begin
         print using "#####";i;
         count = count + 1
         col = col + 1
         if col > 10 then
            begin
               print
               col = 1
            end
      end
next i
print
print count; " primes were found."

end

Output:

Finding primes from 2 to 1000
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
                      . . .
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997
168 primes were found.

SAS

The following defines an IML routine to compute the sieve, and as an example stores the primes below 1000 in a dataset.

proc iml;
start sieve(n);
    a = J(n,1);
    a[1] = 0;
    do i = 1 to n;
        if a[i] then do;
            if i*i>n then return(a);
            a[i*(i:int(n/i))] = 0;
        end;
    end;
finish;

a = loc(sieve(1000))`;
create primes from a;
append from a;
close primes;
quit;

SASL

This example is incorrect. Please fix the code and remove this message.

Details: These use REM (division) testing and so are Trial Division algorithms, not Sieve of Eratosthenes.

Copied from SASL manual, top of page 36. This provides an infinite list.

show primes
WHERE
primes = sieve (2...)
sieve (p : x ) = p : sieve {a <- x; a REM p > 0}
?

The limited list for the first 1000 numbers

show primes
WHERE
primes = sieve (2..1000)
sieve (p : x ) = p : sieve {a <- x; a REM p > 0}
?

Scala

Genuine Eratosthenes sieve

import scala.annotation.tailrec
import scala.collection.parallel.mutable
import scala.compat.Platform

object GenuineEratosthenesSieve extends App {
  def sieveOfEratosthenes(limit: Int) = {
    val (primes: mutable.ParSet[Int], sqrtLimit) = (mutable.ParSet.empty ++ (2 to limit), math.sqrt(limit).toInt)
    @tailrec
    def prim(candidate: Int): Unit = {
      if (candidate <= sqrtLimit) {
        if (primes contains candidate) primes --= candidate * candidate to limit by candidate
        prim(candidate + 1)
      }
    }
    prim(2)
    primes
  }
  // BitSet toList is shuffled when using the ParSet version. So it has to be sorted before using it as a sequence.

  assert(sieveOfEratosthenes(15099480).size == 976729)
  println(s"Successfully completed without errors. [total ${Platform.currentTime - executionStart} ms]")
}

Output:

Successfully completed without errors. [total 39807 ms]

Process finished with exit code 0

While concise, the above code is quite slow but a little faster not using the ParSet (take out the '.par' for speed), in which case the sorting ('sorted') is not necessary for an additional small gain in speed; the above code is slow because of all the overhead in processing the bit packed "BitSet" bib-by-bit using complex "higher-order" method calls.

The following '''odds-only''' code is written in a very concise functional style (no mutable state other than the contents of the composites buffer and "higher order functions" for clarity), in this case using a Scala mutable BitSet:

object SoEwithBitSet {
  def makeSoE_PrimesTo(top: Int): Iterator[Int] = {
    val topNdx = (top - 3) / 2 //odds composite BitSet buffer offset down to 3
    val cmpsts = new scala.collection.mutable.BitSet(topNdx + 1) //size includes topNdx
    @inline def cullPrmCmpsts(prmNdx: Int) = {
      val prm = prmNdx + prmNdx + 3; cmpsts ++= ((prm * prm - 3) >>> 1 to topNdx by prm) }
    (0 to (Math.sqrt(top).toInt - 3) / 2).filterNot { cmpsts }.foreach { cullPrmCmpsts }
    Iterator.single(2) ++ (0 to topNdx).filterNot { cmpsts }.map { pi => pi + pi + 3 } }
}

In spite of being very concise, it is very much faster than the above code converted to odds-only due to the use of the BitSet instead of the hash table based Set (or ParSet), taking only a few seconds to enumerate the primes to 100 million as compared to the 10's of seconds to count the primes to above 15 million above.

Using tail recursion

The below '''odds-only''' code using a primitive array (bit packed) and tail recursion to avoid some of the enumeration delays due to nested complex "higher order" function calls is almost eight times faster than the above more functional code:

object SoEwithArray {
  def makeSoE_PrimesTo(top: Int) = {
    import scala.annotation.tailrec
    val topNdx = (top - 3) / 2 + 1 //odds composite BitSet buffer offset down to 3 plus 1 for overflow
    val (cmpsts, sqrtLmtNdx) = (new Array[Int]((topNdx >>> 5) + 1), (Math.sqrt(top).toInt - 3) / 2)

    @inline def isCmpst(ci: Int): Boolean = (cmpsts(ci >>> 5) & (1 << (ci & 31))) != 0

    @inline def setCmpst(ci: Int): Unit = cmpsts(ci >>> 5) |= 1 << (ci & 31)

    @tailrec def forCndNdxsFrom(cndNdx: Int): Unit =
      if (cndNdx <= sqrtLmtNdx) {
        if (!isCmpst(cndNdx)) { //is prime
          val p = cndNdx + cndNdx + 3
          
          @tailrec def cullPrmCmpstsFrom(cmpstNdx: Int): Unit =
            if (cmpstNdx <= topNdx) { setCmpst(cmpstNdx); cullPrmCmpstsFrom(cmpstNdx + p) }
          
          cullPrmCmpstsFrom((p * p - 3) >>> 1) }
        
        forCndNdxsFrom(cndNdx + 1) }; forCndNdxsFrom(0)

    @tailrec def getNxtPrmFrom(cndNdx: Int): Int =
      if ((cndNdx > topNdx) || !isCmpst(cndNdx)) cndNdx + cndNdx + 3 else getNxtPrmFrom(cndNdx + 1)

    Iterator.single(2) ++ Iterator.iterate(3)(p => getNxtPrmFrom(((p + 2) - 3) >>> 1)).takeWhile(_ <= top)
  }
}

It can be tested with the following code:

object Main extends App {
  import SoEwithArray._
  val top_num = 100000000
  val strt = System.nanoTime()
  val count = makeSoE_PrimesTo(top_num).size
  val end = System.nanoTime()
  println(s"Successfully completed without errors. [total ${(end - strt) / 1000000} ms]")
  println(f"Found $count primes up to $top_num" + ".")
  println("Using one large mutable Array and tail recursive loops.")
}

To produce the following output:

Output:

Successfully completed without errors. [total 661 ms]
Found 5761455 primes up to 100000000.
Using one large mutable Array and tail recursive loops.

Odds-only page-segmented "infinite" generator version using tail recursion

The above code still uses an amount of memory proportional to the range of the sieve (although bit-packed as 8 values per byte). As well as only sieving odd candidates, the following code uses a fixed range buffer that is about the size of the CPU L2 cache plus only storage for the base primes up to the square root of the range for a large potential saving in RAM memory used as well as greatly reducing memory access times. The use of innermost tail recursive loops for critical loops where the majority of the execution time is spent rather than "higher order" functions from iterators also greatly reduces execution time, with much of the remaining time used just to enumerate the primes output:

object APFSoEPagedOdds {
  import scala.annotation.tailrec
  
  private val CACHESZ = 1 << 18 //used cache buffer size
  private val PGSZ = CACHESZ / 4 //number of int's in cache
  private val PGBTS = PGSZ * 32 //number of bits in buffer
  
  //processing output type is a tuple of low bit (odds) address,
  // bit range size, and the actual culled page segment buffer.
  private type Chunk = (Long, Int, Array[Int])
  
  //produces an iteration of all the primes from an iteration of Chunks
  private def enumChnkPrms(chnks: Stream[Chunk]): Iterator[Long] = {
    def iterchnk(chnk: Chunk) = { //iterating primes per Chunk
      val (lw, rng, bf) = chnk
      @tailrec def nxtpi(i: Int): Int = { //find next prime index not composite
        if (i < rng && (bf(i >>> 5) & (1 << (i & 31))) != 0) nxtpi(i + 1) else i }
      Iterator.iterate(nxtpi(0))(i => nxtpi(i + 1)).takeWhile { _ < rng }
        .map { i => ((lw + i) << 1) + 3 } } //map from index to prime value
    chnks.toIterator.flatMap { iterchnk } }
  
  //culls the composite number bit representations from the bit-packed page buffer
  //using a given source of a base primes iterator
  private def cullPg(bsprms: Iterator[Long],
                     lowi: Long, buf: Array[Int]): Unit = {
    //cull for all base primes until square >= nxt
    val rng = buf.length * 32; val nxt = lowi + rng
    @tailrec def cull(bps: Iterator[Long]): Unit = {
      //given prime then calculate the base start address for prime squared
      val bp = bps.next(); val s = (bp * bp - 3) / 2
      //almost all of the execution time is spent in the following tight loop
      @tailrec def cullp(j: Int): Unit = { //cull the buffer for given prime
        if (j < rng) { buf(j >>> 5) |= 1 << (j & 31); cullp(j + bp.toInt) } }
      if (s < nxt) { //only cull for primes squared less than max
        //calculate the start address within this given page segment
        val strt = if (s >= lowi) (s - lowi).toInt else {
          val b = (lowi - s) % bp
          if (b == 0) 0 else (bp - b).toInt }
        cullp(strt); if (bps.hasNext) cull(bps) } } //loop for all primes in range
    //for the first page, use own bit pattern as a source of base primes
    //if another source is not given
    if (lowi <= 0 && bsprms.isEmpty)
      cull(enumChnkPrms(Stream((0, buf.length << 5, buf))))
    //otherwise use the given source of base primes
    else if (bsprms.hasNext) cull(bsprms) }
  
  //makes a chunk given a low address in (odds) bits
  private def mkChnk(lwi: Long): Chunk = {
    val rng = PGBTS; val buf = new Array[Int](rng / 32);
    val bps = if (lwi <= 0) Iterator.empty else enumChnkPrms(basePrms)
    cullPg(bps, lwi, buf); (lwi, rng, buf) }
  
  //new independent source of base primes in a stream of packed-bit arrays
  //memoized by converting it to a Stream and retaining a reference here
  private val basePrms: Stream[Chunk] =
    Stream.iterate(mkChnk(0)) { case (lw, rng, bf) => { mkChnk(lw + rng) } }
  
  //produces an infinite iterator over all the chunk results
  private def itrRslts[R](rsltf: Chunk => R): Iterator[R] = {
    def mkrslt(lwi: Long) = { //makes tuple of result and next low index
      val c = mkChnk(lwi); val (_, rng, _) = c; (rsltf(c), lwi + rng) }
    Iterator.iterate(mkrslt(0)) { case (_, nlwi) => mkrslt(nlwi) }
            .map { case (rslt, _) => rslt} } //infinite iteration of results
  
  //iterates across the "infinite" produced output primes
  def enumSoEPrimes(): Iterator[Long] = //use itrRsltsMP to produce Chunks iteration
    Iterator.single(2L) ++ enumChnkPrms(itrRslts { identity }.toStream)
 
  //counts the number of remaining primes in a page segment buffer
  //using a very fast bitcount per integer element
  //with special treatment for the last page
  private def countpgto(top: Long, b: Array[Int], nlwp: Long) = {
    val numbts = b.length * 32; val prng = numbts * 2
    @tailrec def cnt(i: Int, c: Int): Int = { //tight int bitcount loop
      if (i >= b.length) c else cnt (i + 1, c - Integer.bitCount(b(i))) }
    if (nlwp > top) { //for top in page, calculate int address containing top
      val bi = ((top - nlwp + prng) >>> 1).toInt
      val w = bi >>> 5; b(w) |= -2 << (bi & 31) //mark all following as composite
      for (i <- w + 1 until b.length) b(i) = -1 } //for all int's to end of buffer
    cnt(0, numbts) } //counting the entire buffer in every case
  
  //counts all the primes up to a top value
  def countSoEPrimesTo(top: Long): Long = {
    if (top < 2) return 0L else if (top < 3) return 1L //no work necessary
    //count all Chunks using multi-processing
    val gen = itrRslts { case (lwi, rng, bf) =>
      val nlwp = (lwi + rng) * 2 + 3; (countpgto(top, bf, nlwp), nlwp) }
    //a loop to take Chunk's up to including top limit but not past it
    @tailrec def takeUpto(acc: Long): Long = {
      val (cnt, nlwp) = gen.next(); val nacc = acc + cnt
      if (nlwp <= top) takeUpto(nacc) else nacc }; takeUpto(1) }
}

As the above and all following sieves are "infinite", they all require an extra range limiting condition to produce a finite output, such as the addition of ".takeWhile(_ <= topLimit)" where "topLimit" is the specified range as is done in the following code:

object MainSoEPagedOdds extends App {
  import APFSoEPagedOdds._
  countSoEPrimesTo(100)
  val top = 1000000000
  val strt = System.currentTimeMillis()
  val cnt = enumSoEPrimes().takeWhile { _ <= top }.length
//  val cnt = countSoEPrimesTo(top)
  val elpsd = System.currentTimeMillis() - strt
  println(f"Found $cnt primes up to $top in $elpsd milliseconds.")
}

which outputs the following:

Output:

Found 50847534 primes up to 1000000000 in 5867 milliseconds.

While the above code is reasonably fast, much of the execution time is consumed by the use of the built-in functions and iterators for concise code, especially in the use of iterators for primes output. To show this, the code includes a "countSoEPrimesTo" function/method that can be uncommented in the above code (commenting out the "takeWhile" line) to produce the following output:

Output:

Found 50847534 primes up to 1000000000 in 2623 milliseconds.

This shows that it takes somewhat longer to enumerate the primes than it does to actually produce them; this could be improved with a "roll-your-own" enumeration Iterator implementation at considerable increased complexity, but enumeration time will still be a significant portion of the execution time. Further improvements to the code using extreme wheel factorization and multi-processing will make enumeration time an even higher percentage of the total; this is why for large ranges one writes functions/methods similar to "countSoEPrimesTo" to (say) sum the primes, to find the nth prime, etc.

Odds-Only "infinite" generator sieve using Streams and Co-Inductive Streams

The following code uses delayed recursion via Streams to implement the Richard Bird algorithm mentioned in the last part (the Epilogue) of M.O'Neill's paper, which is a true incremental Sieve of Eratosthenes. It is nowhere near as fast as the array based solutions due to the overhead of functionally chasing the merging of the prime multiple streams; this also means that the empirical performance is not according to the usual Sieve of Eratosthenes approximations due to this overhead increasing as the log of the sieved range, but it is much better than the "unfaithful" sieve.

  def birdPrimes() = {
    def oddPrimes: Stream[Int] = {
      def merge(xs: Stream[Int], ys: Stream[Int]): Stream[Int] = {
        val (x, y) = (xs.head, ys.head)
   
        if (y > x) x #:: merge(xs.tail, ys) else if (x > y) y #:: merge(xs, ys.tail) else x #:: merge(xs.tail, ys.tail)
      }
   
      def primeMltpls(p: Int): Stream[Int] = Stream.iterate(p * p)(_ + p + p)
   
      def allMltpls(ps: Stream[Int]): Stream[Stream[Int]] = primeMltpls(ps.head) #:: allMltpls(ps.tail)
   
      def join(ams: Stream[Stream[Int]]): Stream[Int] = ams.head.head #:: merge(ams.head.tail, join(ams.tail))
   
      def oddPrms(n: Int, composites: Stream[Int]): Stream[Int] =
        if (n >= composites.head) oddPrms(n + 2, composites.tail) else n #:: oddPrms(n + 2, composites)
   
      //following uses a new recursive source of odd base primes
      3 #:: oddPrms(5, join(allMltpls(oddPrimes)))
    }
    2 #:: oddPrimes
  }

Now this algorithm doesn't really need the memoization and full laziness as offered by Streams, so an implementation and use of a Co-Inductive Stream (CIS) class is sufficient and reduces execution time by almost a factor of two:

  class CIS[A](val start: A, val continue: () => CIS[A])

  def primesBirdCIS: Iterator[Int] = {
    def merge(xs: CIS[Int], ys: CIS[Int]): CIS[Int] = {
      val (x, y) = (xs.start, ys.start)

      if (y > x) new CIS(x, () => merge(xs.continue(), ys))
      else if (x > y) new CIS(y, () => merge(xs, ys.continue()))
      else new CIS(x, () => merge(xs.continue(), ys.continue()))
    }

    def primeMltpls(p: Int): CIS[Int] = {
      def nextCull(cull: Int): CIS[Int] = new CIS[Int](cull, () => nextCull(cull + 2 * p))
      nextCull(p * p)
    }

    def allMltpls(ps: CIS[Int]): CIS[CIS[Int]] =
      new CIS[CIS[Int]](primeMltpls(ps.start), () => allMltpls(ps.continue()))
    def join(ams: CIS[CIS[Int]]): CIS[Int] = {
      new CIS[Int](ams.start.start, () => merge(ams.start.continue(), join(ams.continue())))
    }

    def oddPrimes(): CIS[Int] = {
      def oddPrms(n: Int, composites: CIS[Int]): CIS[Int] = { //"minua"
        if (n >= composites.start) oddPrms(n + 2, composites.continue())
        else new CIS[Int](n, () => oddPrms(n + 2, composites))
      }
      //following uses a new recursive source of odd base primes
      new CIS(3, () => oddPrms(5, join(allMltpls(oddPrimes()))))
    }

    Iterator.single(2) ++ Iterator.iterate(oddPrimes())(_.continue()).map(_.start)
  }

Further gains in performance for these last two implementations can be had by using further wheel factorization and "tree folding/merging" as per this Haskell implementation.

Odds-Only "infinite" generator sieve using a hash table (HashMap)

As per the "unfaithful sieve" article linked above, the incremental "infinite" Sieve of Eratosthenes can be implemented using a hash table instead of a Priority Queue or Map (Binary Heap) as were used in that article. The following implementation postpones the adding of base prime representations to the hash table until necessary to keep the hash table small:

  def SoEInc: Iterator[Int] = {
    val nextComposites = scala.collection.mutable.HashMap[Int, Int]()
    def oddPrimes: Iterator[Int] = {
      val basePrimes = SoEInc
      basePrimes.next()
      basePrimes.next() // skip the two and three prime factors
      @tailrec def makePrime(state: (Int, Int, Int)): (Int, Int, Int) = {
        val (candidate, nextBasePrime, nextSquare) = state
        if (candidate >= nextSquare) {
          val adv = nextBasePrime << 1
          nextComposites += ((nextSquare + adv) -> adv)
          val np = basePrimes.next()
          makePrime((candidate + 2, np, np * np))
        } else if (nextComposites.contains(candidate)) {
          val adv = nextComposites(candidate)
          nextComposites -= (candidate) += (Iterator.iterate(candidate + adv)(_ + adv)
            .dropWhile(nextComposites.contains(_)).next() -> adv)
          makePrime((candidate + 2, nextBasePrime, nextSquare))
        } else (candidate, nextBasePrime, nextSquare)
      }
      Iterator.iterate((5, 3, 9)) { case (c, p, q) => makePrime((c + 2, p, q)) }
        .map { case (p, _, _) => p }
    }
    List(2, 3).toIterator ++ oddPrimes
  }

The above could be implemented using Streams or Co-Inductive Streams to pass the continuation parameters as passed here in a tuple but there would be no real difference in speed and there is no need to use the implied laziness. As compared to the versions of the Bird (or tree folding) Sieve of Eratosthenes, this has the expected same computational complexity as the array based versions, but is about 20 times slower due to the constant overhead of processing the key value hashing. Memory use is quite low, only being the hash table entries for each of the base prime values less than the square root of the last prime enumerated multiplied by the size of each hash entry (about 12 bytes in this case) plus a "load factor" percentage overhead in hash table size to minimize hash collisions (about twice as large as entries actually used by default on average).

The Scala implementable of a mutable HashMap is slower than the java.util.HashMap one by a factor of almost two, but the Scala version is used here to keep the code more portable (as to CLR). One can also quite easily convert this code to use the immutable Scala HashMap, but the code runs about four times slower due to the required "copy on update" operations for immutable objects.

This algorithm is very responsive to further application of wheel factorization, which can make it run up to about four times faster for the composite number culling operations; however, that is not enough to allow it to catch up to the array based sieves.

Scheme

Tail-recursive solution

Works with: Scheme version R

^{5}

RS

; Tail-recursive solution :
(define (sieve n)
  (define (aux u v)
    (let ((p (car v)))
      (if (> (* p p) n)
        (let rev-append ((u u) (v v))
          (if (null? u) v (rev-append (cdr u) (cons (car u) v))))
        (aux (cons p u)
          (let wheel ((u '()) (v (cdr v)) (a (* p p)))
            (cond ((null? v) (reverse u))
                  ((= (car v) a) (wheel u (cdr v) (+ a p)))
                  ((> (car v) a) (wheel u v (+ a p)))
                  (else (wheel (cons (car v) u) (cdr v) a))))))))
  (aux '(2)
    (let range ((v '()) (k (if (odd? n) n (- n 1))))
      (if (< k 3) v (range (cons k v) (- k 2))))))

; > (sieve 100)
; (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)
; > (length (sieve 10000000))
; 664579

Simpler, non-tail-recursive solution

; Simpler solution, with the penalty that none of 'iota, 'strike or 'sieve is tail-recursive :
(define (iota start stop stride)
  (if (> start stop)
      (list)
      (cons start (iota (+ start stride) stop stride))))

(define (strike lst start stride)
  (cond ((null? lst) lst)
        ((= (car lst) start) (strike (cdr lst) (+ start stride) stride))
        ((> (car lst) start) (strike lst (+ start stride) stride))
        (else (cons (car lst) (strike (cdr lst) start stride)))))

(define (primes limit)
  (let ((stop (sqrt limit)))
    (define (sieve lst)
      (let ((p (car lst)))
        (if (> p stop)
            lst
            (cons p (sieve (strike (cdr lst) (* p p) p))))))
    (sieve (iota 2 limit 1))))

(display (primes 100))
(newline)

Output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Optimised using an odds-wheel

Optimised using a pre-computed wheel based on 2 (i.e. odds only):

(define (primes-wheel-2 limit)
  (let ((stop (sqrt limit)))
    (define (sieve lst)
      (let ((p (car lst)))
        (if (> p stop)
            lst
            (cons p (sieve (strike (cdr lst) (* p p) (* 2 p)))))))
    (cons 2 (sieve (iota 3 limit 2)))))

(display (primes-wheel-2 100))
(newline)

Output:

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

Vector-based

Vector-based (faster), works with R $^{5}$ RS:

; initialize v to vector of sequential integers
(define (initialize! v)
  (define (iter v n) (if (>= n (vector-length v)) 
                         (values) 
                         (begin (vector-set! v n n) (iter v (+ n 1)))))
  (iter v 0))

; set every nth element of vector v to 0,
; starting with element m
(define (strike! v m n)
  (cond ((>= m (vector-length v)) (values))
        (else (begin
                (vector-set! v m 0)
                (strike! v (+ m n) n)))))

; lowest non-zero index of vector v >= n
(define (nextprime v n)
  (if (zero? (vector-ref v n))
      (nextprime v (+ n 1))
      (vector-ref v n)))

; remove elements satisfying pred? from list lst
(define (remove pred? lst)
  (cond 
    ((null? lst) '())
    ((pred? (car lst))(remove pred? (cdr lst)))
    (else (cons (car lst) (remove pred? (cdr lst))))))

; the sieve itself
(define (sieve n)
  (define stop (sqrt n))
  (define (iter v p)
    (cond 
      ((> p stop) v)
      (else 
       (begin
         (strike! v (* p p) p)
         (iter v (nextprime v (+ p 1)))))))
  
  (let ((v (make-vector (+ n 1))))
    (initialize! v)
    (vector-set! v 1 0) ; 1 is not a prime
    (remove zero? (vector->list (iter v 2)))))

SICP-style streams

Using SICP-style head-forced streams. Works with MIT-Scheme, Chez Scheme, – or any other Scheme, if writing out by hand the expansion of the only macro here, s-cons, with explicit lambda. Common functions:

 ;;;; Stream Implementation
 (define (head s) (car s))   
 (define (tail s) ((cdr s)))  
 (define-syntax s-cons
   (syntax-rules () ((s-cons h t) (cons h (lambda () t))))) 

 ;;;; Stream Utility Functions
 (define (from-By x s)
   (s-cons x (from-By (+ x s) s)))
 (define (take n s) 
   (cond 
     ((> n 1) (cons (head s) (take (- n 1) (tail s))))
     ((= n 1) (list (head s)))      ;; don't force it too soon!!
     (else '())))     ;; so (take 4 (s-map / (from-By 4 -1))) works
 (define (drop n s)
   (cond 
     ((> n 0) (drop (- n 1) (tail s)))
     (else s)))
 (define (s-map f s)
   (s-cons (f (head s)) (s-map f (tail s))))
 (define (s-diff s1 s2)
   (let ((h1 (head s1)) (h2 (head s2)))
    (cond
     ((< h1 h2) (s-cons h1 (s-diff  (tail s1)       s2 )))
     ((< h2 h1)            (s-diff        s1  (tail s2)))
     (else                 (s-diff  (tail s1) (tail s2))))))
 (define (s-union s1 s2)
   (let ((h1 (head s1)) (h2 (head s2)))
    (cond
     ((< h1 h2) (s-cons h1 (s-union (tail s1)       s2 )))
     ((< h2 h1) (s-cons h2 (s-union       s1  (tail s2))))
     (else      (s-cons h1 (s-union (tail s1) (tail s2)))))))

The simplest, naive sieve

Very slow, running at ~ n^2.2, empirically, and worsening:

 (define (sieve s) 
	(let ((p (head s))) 
	  (s-cons p 
	          (sieve (s-diff s (from-By p p))))))
 (define primes (sieve (from-By 2 1)))

Bounded, stopping early

Stops at the square root of the upper limit m, running at about ~ n^1.4 in n primes produced, empirically. Returns infinite stream of numbers which is only valid up to m, includes composites above it:

 (define (primes-To m)
   (define (sieve s) 
     (let ((p (head s))) 
       (cond ((> (* p p) m) s) 
             (else (s-cons p 
	             (sieve (s-diff (tail s) 
	                     (from-By (* p p) p))))))))
   (sieve (from-By 2 1)))

Combined multiples sieve

Archetypal, straightforward approach by Richard Bird, presented in Melissa E. O'Neill article. Uses s-linear-join, i.e. right fold, which is less efficient and of worse time complexity than the tree-folding that follows. Does not attempt to conserve space by arranging for the additional inner feedback loop, as is done in the tree-folding variant below.

 (define (primes-stream-ala-Bird)
   (define (mults p) (from-By (* p p) p))
   (define primes                                          ;; primes are 
       (s-cons 2 (s-diff (from-By 3 1)                     ;;  numbers > 1, without 
                  (s-linear-join (s-map mults primes)))))  ;;   multiples of primes
   primes)

 ;;;; join streams using linear structure
 (define (s-linear-join sts)
   (s-cons (head (head sts)) 
           (s-union (tail (head sts)) 
                    (s-linear-join (tail sts)))))

Here is a version of the same sieve, which is self contained with all the requisite functions wrapped in the overall function; optimized further. It works with odd primes only, and arranges for a separate primes feed for the base primes separate from the output stream, calculated recursively by the recursive call to "oddprms" in forming "cmpsts". It also "fuses" two functions, s-diff and from-By, into one, minusstrtat:

(define (birdPrimes)
  (define (mltpls p)
    (define pm2 (* p 2))
    (let nxtmltpl ((cmpst (* p p)))
      (cons cmpst (lambda () (nxtmltpl (+ cmpst pm2))))))
  (define (allmltpls ps)
    (cons (mltpls (car ps)) (lambda () (allmltpls ((cdr ps))))))
  (define (merge xs ys)
    (let ((x (car xs)) (xt (cdr xs)) (y (car ys)) (yt (cdr ys)))
      (cond ((< x y) (cons x (lambda () (merge (xt) ys))))
            ((> x y) (cons y (lambda () (merge xs (yt)))))
            (else (cons x (lambda () (merge (xt) (yt))))))))
  (define (mrgmltpls mltplss)
    (cons (car (car mltplss))
          (lambda () (merge ((cdr (car mltplss)))
                            (mrgmltpls ((cdr mltplss)))))))
  (define (minusstrtat n cmps)
    (if (< n (car cmps))
      (cons n (lambda () (minusstrtat (+ n 2) cmps)))
      (minusstrtat (+ n 2) ((cdr cmps)))))
  (define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
  (define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))  
  (cons 2 (lambda () (oddprms))))

It can be tested with the following code:

(define (nthPrime n)
  (let nxtprm ((cnt 0) (ps (birdPrimes)))
    (if (< cnt n) (nxtprm (+ cnt 1) ((cdr ps))) (car ps))))
(nthPrime 1000000)

Output:

15485863

The same code can easily be modified to perform the folded tree case just by writing and integrating a "pairs" function to do the folding along with the merge, which has been done as an alternate tree folding case below.

Tree-folding

The most efficient. Finds composites as a tree of unions of each prime's multiples.

 ;;;; all primes' multiples are removed, merged through a tree of unions
 ;;;;  runs in ~ n^1.15 run time in producing n = 100K .. 1M primes
 (define (primes-stream)
   (define (mults p) (from-By (* p p) (* 2 p)))
   (define (odd-primes-From from)              ;; odd primes from (odd) f are
       (s-diff (from-By from 2)                ;; all odds from f without the
               (s-tree-join (s-map mults odd-primes))))  ;; multiples of odd primes
   (define odd-primes 
       (s-cons 3 (odd-primes-From 5)))         ;; inner feedback loop
   (s-cons 2 (odd-primes-From 3)))             ;; result stream

 ;;;; join an ordered stream of streams (here, of primes' multiples)
 ;;;; into one ordered stream, via an infinite right-deepening tree
 (define (s-tree-join sts)
   (s-cons (head (head sts))
           (s-union (tail (head sts))
                    (s-tree-join (pairs (tail sts))))))

 (define (pairs sts)                        ;; {a.(b.t)} -> (a+b).{t}
     (s-cons (s-cons (head (head sts)) 
                     (s-union (tail (head sts)) 
                              (head (tail sts))))
             (pairs (tail (tail sts)))))

Print 10 last primes of the first thousand primes:

(display (take 10 (drop 990 (primes-stream)))) 
;
(7841 7853 7867 7873 7877 7879 7883 7901 7907 7919)

This can be also accomplished by the following self contained code which follows the format of the birdPrimes code above with the added "pairs" function integrated into the "mrgmltpls" function:

(define (treemergePrimes)
  (define (mltpls p)
    (define pm2 (* p 2))
    (let nxtmltpl ((cmpst (* p p)))
      (cons cmpst (lambda () (nxtmltpl (+ cmpst pm2))))))
  (define (allmltpls ps)
    (cons (mltpls (car ps)) (lambda () (allmltpls ((cdr ps))))))
  (define (merge xs ys)
    (let ((x (car xs)) (xt (cdr xs)) (y (car ys)) (yt (cdr ys)))
      (cond ((< x y) (cons x (lambda () (merge (xt) ys))))
            ((> x y) (cons y (lambda () (merge xs (yt)))))
            (else (cons x (lambda () (merge (xt) (yt))))))))
  (define (pairs mltplss)
    (let ((tl ((cdr mltplss))))
      (cons (merge (car mltplss) (car tl))
            (lambda () (pairs ((cdr tl)))))))
  (define (mrgmltpls mltplss)
    (cons (car (car mltplss))
          (lambda () (merge ((cdr (car mltplss)))
                            (mrgmltpls (pairs ((cdr mltplss))))))))
  (define (minusstrtat n cmps)
    (if (< n (car cmps))
      (cons n (lambda () (minusstrtat (+ n 2) cmps)))
      (minusstrtat (+ n 2) ((cdr cmps)))))
  (define (cmpsts) (mrgmltpls (allmltpls (oddprms)))) ;; internal define's are mutually recursive
  (define (oddprms) (cons 3 (lambda () (minusstrtat 5 (cmpsts)))))  
  (cons 2 (lambda () (oddprms))))

It can be tested with the same code as the self-contained Richard Bird sieve, just by calling treemergePrimes instead of birdPrimes.

Generators

(define (integers n)
  (lambda ()
    (let ((ans n))
      (set! n (+ n 1))
      ans)))

(define natural-numbers (integers 0)) 

(define (remove-multiples g n)
  (letrec ((m (+ n n))
           (self
              (lambda ()
                 (let loop ((x (g)))
                    (cond ((< x m) x)
                          ((= x m) (set! m (+ m n)) (self))
                          (else (set! m (+ m n)) (loop x)))))))
     self))

(define (sieve g)
  (lambda ()
    (let ((x (g)))
      (set! g (remove-multiples g x))
      x)))

(define primes (sieve (integers 2)))

Scilab

function a = sieve(n)
    a = ~zeros(n, 1)
    a(1) = %f
    for i = 1:n
        if a(i)
            j = i*i
            if j > n
                return
            end
            a(j:i:n) = %f
        end
    end
endfunction

find(sieve(100))
// [2 3 5 ... 97]

sum(sieve(1000))
// 168, the number of primes below 1000

Scratch

when clicked
    broadcast: fill list with zero (0) and wait
    broadcast: put one (1) in list of multiples and wait
    broadcast: fill primes where zero (0 in list

when I receive: fill list with zero (0)
    delete all of primes
    delete all of list
    set i to 0
    set maximum to 25
    repeat maximum
        add 0 to list
        change i by 1
    {end repeat}

when I receive: put ones (1) in list of multiples
    set S to sqrt of maximum
    set i to 2
    set k to 0
    repeat S
        change J by 1
        set i to 2
        repeat until i > 100
            if not (i = J) then
                if item i of list = 0 then
                    set m to (i mod J)
                    if (m = 0) then
                        replace item i of list with 1
        {end repeat until}
        change i by 1
        set k to 1
        delete all of primes
    {end repeat}
    set J to 1

when I receive: fill primes where zeros (0) in list
    repeat maximum
        if (item k of list) = 0 then
            add k to primes
        set k to (k + 1)
    {end repeat}

Comments

Scratch is a graphical drag and drop language designed to teach children an introduction to programming. It has easy to use multimedia and animation features. The code listed above was not entered into the Scratch IDE but faithfully represents the graphical code blocks used to run the sieve algorithm. The actual Scratch graphical code blocks cannot be represented on this web site due to its inability to directly represent graphical code. The actual code and output can be seen or downloaded at an external URL web link:

Scratch Code and Output

Seed7

The program below computes the number of primes between 1 and 10000000:

$ include "seed7_05.s7i";

const func set of integer: eratosthenes (in integer: n) is func
  result
    var set of integer: sieve is EMPTY_SET;
  local
    var integer: i is 0;
    var integer: j is 0;
  begin
    sieve := {2 .. n};
    for i range 2 to sqrt(n) do
      if i in sieve then
        for j range i ** 2 to n step i do
          excl(sieve, j);
        end for;
      end if;
    end for;
  end func;

const proc: main is func
  begin
    writeln(card(eratosthenes(10000000)));
  end func;

Original source: [1]

SETL

program eratosthenes;
    print(sieve 100);

    op sieve(n);
        numbers := [1..n];
        numbers(1) := om;
        loop for i in [2..floor sqrt n] do
            loop for j in [i*i, i*i+i..n] do
                numbers(j) := om;
            end loop;
        end loop;
        return [n : n in numbers | n /= om];
    end op;
end program;

Output:

[2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97]

Sidef

Translation of: Raku

func sieve(limit) {
    var sieve_arr = [false, false, (limit-1).of(true)...]
    gather {
        sieve_arr.each_kv { |number, is_prime|
            if (is_prime) {
                take(number)
                for i in (number**2 .. limit `by` number) {
                    sieve_arr[i] = false
                }
            }
        }
    }
}

say sieve(100).join(",")

Output:

2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97

Alternative implementation:

func sieve(limit) {
    var composite = []
    for n in (2 .. limit.isqrt) {
        for i in (n**2 .. limit `by` n) {
            composite[i] = true
        }
    }
    2..limit -> grep{ !composite[_] }
}

say sieve(100).join(",")

Simula

Works with: Simula-67

BEGIN
    INTEGER ARRAY t(0:1000);
    INTEGER i,j,k;
    FOR i:=0 STEP 1 UNTIL 1000 DO t(i):=1;
    t(0):=0; t(1):=0;
    i:=0;
    FOR i:=i WHILE i<1000 DO
    BEGIN
        FOR i:=i WHILE i<1000 AND t(i)=0 DO i:=i+1;
        IF i<1000 THEN
        BEGIN
            j:=2;
            k:=j*i;
            FOR k:=k WHILE k<1000 DO
            BEGIN
                t(k):=0;
                j:=j+1;
                k:=j*i
            END;
            i:=i+1
        END
    END;
    FOR i:=0 STEP 1 UNTIL 999 DO
       IF t(i)<>0  THEN
       BEGIN
           OutInt(i,5); OutImage
       END
END

Output:

A Concurrent Prime Sieve

! A CONCURRENT PRIME SIEVE ;

BEGIN

BOOLEAN DEBUG;

CLASS FILTER(INPUT, OUTPUT, PRIME); REF(FILTER) INPUT, OUTPUT; INTEGER PRIME;
BEGIN
    INTEGER NUM;
    IF PRIME = 0 AND INPUT == NONE THEN
    BEGIN
        ! SEND THE SEQUENCE 2, 3, 4, ... TO CHANNEL 'CH'. ;
        DETACH;
        NUM := 2;
        WHILE TRUE DO
        BEGIN
            IF OUTPUT == NONE THEN
            BEGIN
                IF DEBUG THEN
                BEGIN
                    OUTTEXT("GENERATE SENDS ");
                    OUTINT(NUM, 0);
                    OUTIMAGE;
                END;
                DETACH; ! SEND 'NUM' ;
            END ELSE
            BEGIN
                IF DEBUG THEN
                BEGIN
                    OUTTEXT("GENERATE SENDS ");
                    OUTINT(NUM, 0);
                    OUTTEXT(" TO FILTER("); OUTINT(OUTPUT.PRIME, 0);
                    OUTTEXT(")");
                    OUTIMAGE;
                END;
                OUTPUT.NUM := NUM;
                RESUME(OUTPUT);
            END;
            NUM := NUM + 1;
        END;
    END ELSE
    BEGIN
        ! COPY THE VALUES FROM CHANNEL 'IN' TO CHANNEL 'OUT', ;
        ! REMOVING THOSE DIVISIBLE BY 'PRIME'. ;
        DETACH;
        ! FILTER ;
        WHILE TRUE DO
        BEGIN
            INTEGER I;
            RESUME(INPUT);
            I := INPUT.NUM; ! RECEIVE VALUE FROM 'INPUT'. ;
            IF DEBUG THEN
            BEGIN
                OUTTEXT("FILTER("); OUTINT(PRIME, 0); OUTTEXT(") RECEIVES ");
                OUTINT(I, 0);
                OUTIMAGE;
            END;
            IF NOT MOD(I, PRIME) = 0 THEN
            BEGIN
                IF OUTPUT == NONE THEN
                BEGIN
                    IF DEBUG THEN
                    BEGIN
                        OUTTEXT("FILTER("); OUTINT(PRIME, 0);
                        OUTTEXT(") SENDS ");
                        OUTINT(I, 0);
                        OUTIMAGE;
                    END;
                    DETACH;
                END ELSE
                BEGIN
                    IF DEBUG THEN
                    BEGIN
                        OUTTEXT("FILTER("); OUTINT(PRIME, 0);
                        OUTTEXT(") SENDS ");
                        OUTINT(I, 0);
                        OUTTEXT(" TO FILTER("); OUTINT(OUTPUT.PRIME, 0);
                        OUTTEXT(")");
                        OUTIMAGE;
                    END;
                    OUTPUT.NUM := I; ! SEND 'I' TO 'OUT'. ;
                    RESUME(OUTPUT);
                END;
            END;
        END;
    END;
END;

! THE PRIME SIEVE: DAISY-CHAIN FILTER PROCESSES. ;
! MAIN BLOCK ;
    REF(FILTER) CH;
    INTEGER I, PRIME;
    DEBUG := TRUE;
    CH :- NEW FILTER(NONE, NONE, 0); ! LAUNCH GENERATE GOROUTINE. ;
    FOR I := 1 STEP 1 UNTIL 5 DO
    BEGIN
        REF(FILTER) CH1;
        RESUME(CH);
        PRIME := CH.NUM;
        IF DEBUG THEN OUTTEXT("MAIN BLOCK RECEIVES ");
        OUTINT(PRIME,0);
        OUTIMAGE;
        CH1 :- NEW FILTER(CH, NONE, PRIME);
        CH.OUTPUT :- CH1;
        CH :- CH1;
    END;
END;

Output:

GENERATE SENDS 2
MAIN BLOCK RECEIVES 2
GENERATE SENDS 3 TO FILTER(2)
FILTER(2) RECEIVES 3
FILTER(2) SENDS 3
MAIN BLOCK RECEIVES 3
GENERATE SENDS 4 TO FILTER(2)
FILTER(2) RECEIVES 4
GENERATE SENDS 5 TO FILTER(2)
FILTER(2) RECEIVES 5
FILTER(2) SENDS 5 TO FILTER(3)
FILTER(3) RECEIVES 5
FILTER(3) SENDS 5
MAIN BLOCK RECEIVES 5
GENERATE SENDS 6 TO FILTER(2)
FILTER(2) RECEIVES 6
GENERATE SENDS 7 TO FILTER(2)
FILTER(2) RECEIVES 7
FILTER(2) SENDS 7 TO FILTER(3)
FILTER(3) RECEIVES 7
FILTER(3) SENDS 7 TO FILTER(5)
FILTER(5) RECEIVES 7
FILTER(5) SENDS 7
MAIN BLOCK RECEIVES 7
GENERATE SENDS 8 TO FILTER(2)
FILTER(2) RECEIVES 8
GENERATE SENDS 9 TO FILTER(2)
FILTER(2) RECEIVES 9
FILTER(2) SENDS 9 TO FILTER(3)
FILTER(3) RECEIVES 9
GENERATE SENDS 10 TO FILTER(2)
FILTER(2) RECEIVES 10
GENERATE SENDS 11 TO FILTER(2)
FILTER(2) RECEIVES 11
FILTER(2) SENDS 11 TO FILTER(3)
FILTER(3) RECEIVES 11
FILTER(3) SENDS 11 TO FILTER(5)
FILTER(5) RECEIVES 11
FILTER(5) SENDS 11 TO FILTER(7)
FILTER(7) RECEIVES 11
FILTER(7) SENDS 11
MAIN BLOCK RECEIVES 11

Smalltalk

A simple implementation that you can run in a workspace. It finds all the prime numbers up to and including limit—for the sake of example, up to and including 100.

| potentialPrimes limit |
limit := 100.
potentialPrimes := Array new: limit.
potentialPrimes atAllPut: true.
2 to: limit sqrt do: [:testNumber |
    (potentialPrimes at: testNumber) ifTrue: [
        (testNumber * 2) to: limit by: testNumber do: [:nonPrime |
            potentialPrimes at: nonPrime put: false
        ]
    ]
].
2 to: limit do: [:testNumber |
    (potentialPrimes at: testNumber) ifTrue: [
        Transcript show: testNumber asString; cr
    ]
]

SNOBOL4

Using strings instead of arrays, and the square/sqrt optimizations.

        define('sieve(n)i,j,k,p,str,res') :(sieve_end)
sieve   i = lt(i,n - 1) i + 1 :f(sv1)
        str = str (i + 1) ' ' :(sieve)
sv1     str break(' ') . j span(' ') = :f(return)
        sieve = sieve j ' '
        sieve = gt(j ^ 2,n) sieve str :s(return) ;* Opt1
        res = ''
        str (arb ' ') @p ((j ^ 2) ' ') ;* Opt2
        str len(p) . res = ;* Opt2
sv2     str break(' ') . k  span(' ') = :f(sv3)
        res = ne(remdr(k,j),0) res k ' ' :(sv2)
sv3     str = res :(sv1)
sieve_end

*       # Test and display        
        output = sieve(100)
end

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

SparForte

As a structured script.

#!/usr/local/bin/spar
pragma annotate( summary, "sieve" );
pragma annotate( description, "The Sieve of Eratosthenes is a simple algorithm that" );
pragma annotate( description, "finds the prime numbers up to a given integer. Implement ");
pragma annotate( description, "this algorithm, with the only allowed optimization that" );
pragma annotate( description, "the outer loop can stop at the square root of the limit," );
pragma annotate( description, "and the inner loop may start at the square of the prime" );
pragma annotate( description, "just found. That means especially that you shouldn't" );
pragma annotate( description, "optimize by using pre-computed wheels, i.e. don't assume" );
pragma annotate( description, "you need only to cross out odd numbers (wheel based on" );
pragma annotate( description, "2), numbers equal to 1 or 5 modulo 6 (wheel based on 2" );
pragma annotate( description, "and 3), or similar wheels based on low primes." );
pragma annotate( see_also, "http://rosettacode.org/wiki/Sieve_of_Eratosthenes" );
pragma annotate( author, "Ken O. Burtch" );
pragma license( unrestricted );

pragma restriction( no_external_commands );

procedure sieve is 
   last_bool : constant positive := 20;
   type bool_array is array(2..last_bool) of boolean;
   a : bool_array;
 
   test_num : positive;  
   -- limit    : positive := positive(numerics.sqrt(float(arrays.last(a))));

   -- n : positive := 2;  
begin
   for i in arrays.first(a)..last_bool loop
     a(i) := true;
   end loop;

   for num in arrays.first(a)..last_bool loop
     if a(num) then
        test_num := num * num;
        while test_num <= last_bool loop
          a(test_num) := false;
          test_num := @ + num;
        end loop;
     end if;
   end loop;
 
   for i in arrays.first(a)..last_bool loop
     if a(i) then
       put_line(i);
     end if;
   end loop;
end sieve;

Standard ML

Works with SML/NJ. This uses BitArrays which are available in SML/NJ. The algorithm is the one on wikipedia, referred to above. Limit: Memory, normally. When more than 20 petabyte of memory available, this code will have its limitation at a maximum integer around 1.44*10E17, due to the maximum list length in SMLNJ. Using two extra loops, however, bit arrays can simply be stored to disk and processed in multiple lists. With a tail recursive wrapper function as well, the upper limit will be determined by available disk space only.

val segmentedSieve =  fn N =>
(* output : list of {segment=bit segment, start=number at startposition segment} *)

let

val NSegmt= 120000000;                                                                                  (* segment size *)


val inf2i = IntInf.toInt ;
val i2inf = IntInf.fromInt ;
val isInt = fn m => m <= IntInf.fromInt (valOf Int.maxInt);

val sweep = fn (bits, step, k, up) =>                                                                   (* strike off bits up to limit *)
       (while (  !k < up  andalso 0 <= !k  ) do
             (  BitArray.clrBit( bits, !k ) ; k:= !k +  step ; ()) ) handle Overflow => ()

val rec nextPrimebit =                                                                                  (* find next 1-bit within segment *)
     fn Bits =>
     fn pos  =>
        if pos+1 >= BitArray.length Bits
	  then    NONE
          else    ( if BitArray.sub ( Bits,pos) then SOME (i2inf pos) else nextPrimebit Bits (pos+1) );


val sieveEratosthenes =  fn n: int =>                                                             (* Eratosthenes sieve , up to+incl n *)

 let
  val nums= BitArray.bits(n,[] );
  val i=ref 2;
  val k=ref (!i * (!i) -1);

 in

  ( BitArray.complement nums;
    BitArray.clrBit( nums, 0 );
    while ( !k <n ) do (  if ( BitArray.sub (nums, !i - 1) ) then  sweep (nums, !i, k, n) else ();
      i:= !i+1;
      k:= !i * (!i) - 1 
    );
    [ { start= i2inf 1, segment=nums } ]                                                                              
  )

 end;



val sieveThroughSegment =

 fn ( primes : { segment:BitArray.array, start:IntInf.int } list, low : IntInf.int, up ) =>
                                                                                                        (* second segment and on *)
 let
  val n      = inf2i (up-low+1)
  val nums   = BitArray.bits(n,[] ); 
  val itdone = low div i2inf NSegmt

  val rec oldprimes = fn c =>  fn                                                                 (* use segment B to sweep current one *)
                 []                       => ()
      | ctlist as {start=st,segment=B}::t =>
       let
       
        val nxt  = nextPrimebit B c ;
        val p    = st +  Option.getOpt( nxt,~10)  
        val modp = ( i2inf NSegmt * itdone ) mod p
        val i    = inf2i ( if( isInt( p - modp ) ) then p - modp else 0 )                               (* i = 0 breaks off *)
        val k    = ref   ( if Option.isSome nxt  then  (i - 1)  else ~2 )
        val step = if (isInt(p)) then inf2i(p) else valOf Int.maxInt                                    (* !k+maxInt > n *)

       in
       
          ( sweep (nums, step, k, n) ;
	    if ( p*p <= up  andalso  Option.isSome nxt )
	       then    oldprimes ( inf2i (p-st+1) ) ctlist
	       else    oldprimes 0 t                                                                    (* next segment B *)
          ) 

       end

 in
  (  BitArray.complement nums;
     oldprimes 0 primes;
     rev ( {start = low, segment = nums } :: rev (primes) )
  )
 end;



val rec workSegmentsDown = fn firstFn =>
    			   fn nextFns =>
			   fn sizeSeg : int =>
			   fn upLimit : IntInf.int =>
 let
   val residual = upLimit mod i2inf sizeSeg
 in

   if ( upLimit <= i2inf sizeSeg ) then firstFn (  inf2i upLimit )
   else
     if ( residual > 0 ) then
           nextFns ( workSegmentsDown firstFn nextFns sizeSeg (upLimit - residual ),     upLimit - residual      + 1, upLimit )
     else
           nextFns ( workSegmentsDown firstFn nextFns sizeSeg (upLimit - i2inf sizeSeg), upLimit - i2inf sizeSeg + 1, upLimit ) 
 end;

in

  workSegmentsDown  sieveEratosthenes  sieveThroughSegment  NSegmt  N

end;

Example, segment size 120 million, prime numbers up to 2.5 billion:

-val writeSegment = fn  L : {segment:BitArray.array, start:IntInf.int} list =>   fn NthSegment =>
		   let
		        val M=List.nth (L , NthSegment - 1 )
		   in
		        List.map (fn x=> x + #start M)  (map IntInf.fromInt (BitArray.getBits ( #segment M)) ) 
		   end;
- val primesInBits = segmentedSieve 2500000000 ;
val primesInBits =
  [{segment=-,start=1},{segment=-,start=120000001},
   {segment=-,start=240000001},{segment=-,start=360000001},
   {segment=-,start=480000001},..  <skipped> ,...]
  : {segment:BitArray.array, start:IntInf.int} list
- writeSegment primesInBits 21 ;
val it =
  [2400000011,2400000017,2400000023,2400000047,2400000061,2400000073,
   2400000091,2400000103,2400000121,2400000133,2400000137,2400000157,...]
  : IntInf.int list
- writeSegment primesInBits 1 ;
val it = [2,3,5,7,11,13,17,19,23,29,31,37,...] : IntInf.int list

Stata

A program to create a dataset with a variable p containing primes up to a given number.

prog def sieve
	args n
	clear
	qui set obs `n'
	gen long p=_n
	gen byte a=_n>1
	forv i=2/`n' {
		if a[`i'] {
			loc j=`i'*`i'
			if `j'>`n' {
				continue, break
			}
			forv k=`j'(`i')`n' {
				qui replace a=0 in `k'
			}
		}
	}
	qui keep if a
	drop a
end

Example call

sieve 100
list in 1/10 // show only the first ten primes

     +----+
     |  p |
     |----|
  1. |  2 |
  2. |  3 |
  3. |  5 |
  4. |  7 |
  5. | 11 |
     |----|
  6. | 13 |
  7. | 17 |
  8. | 19 |
  9. | 23 |
 10. | 29 |
     +----+

Mata

mata
real colvector sieve(real scalar n) {
	real colvector a
	real scalar i, j
	if (n < 2) return(J(0, 1, .))
	a = J(n, 1, 1)
	a[1] = 0
	for (i = 1; i <= n; i++) {
		if (a[i]) {
			j = i*i
			if (j > n) return(select(1::n, a))
			for (; j <= n; j = j+i) a[j] = 0
		}
	}
}

sieve(10)
       1
    +-----+
  1 |  2  |
  2 |  3  |
  3 |  5  |
  4 |  7  |
    +-----+
end

Swift

import Foundation // for sqrt() and Date()

let max = 1_000_000
let maxroot = Int(sqrt(Double(max)))
let startingPoint = Date()

var isprime = [Bool](repeating: true, count: max+1 )
for i in 2...maxroot {
    if isprime[i] {
        for k in stride(from: max/i, through: i, by: -1) {
            if isprime[k] {
                isprime[i*k] = false }
        }
    }
}

var count = 0
for i in 2...max {
    if isprime[i] {
        count += 1
    }
}
print("\(count) primes found under \(max)")

print("\(startingPoint.timeIntervalSinceNow * -1) seconds")

Output:

78498 primes found under 1000000
0.01282501220703125 seconds

iMac 3,2 GHz Intel Core i5

Alternative odds-only version

The most obvious two improvements are to sieve for only odds as two is the only even prime and to make the sieving array bit-packed so that instead of a whole 8-bit byte per number representation there, each is represented by just one bit; these two changes improved memory use by a factor of 16 and the better CPU cache locality more than compensates for the extra code required to implement bit packing as per the following code:

func soePackedOdds(_ n: Int) ->
    LazyMapSequence<UnfoldSequence<Int, (Int?, Bool)>, Int> {
 
  let lmti = (n - 3) / 2
  let size = lmti / 8 + 1
  var sieve = Array<UInt8>(repeating: 0, count: size)
  let sqrtlmti = (Int(sqrt(Double(n))) - 3) / 2
 
  for i in 0...sqrtlmti {
    if sieve[i >> 3] & (1 << (i & 7)) == 0 {
      let p = i + i + 3
      for c in stride(from: (i*(i+3)<<1)+3, through: lmti, by: p) {
        sieve[c >> 3] |= 1 << (c & 7)
      }
    }
  }

  return sequence(first: -1, next: { (i:Int) -> Int? in
      var ni = i + 1
      while ni <= lmti && sieve[ni >> 3] & (1 << (ni & 7)) != 0 { ni += 1}
      return ni > lmti ? nil : ni
    }).lazy.map { $0 < 0 ? 2 : $0 + $0 + 3 }
}

the output for the same testing (with `soePackedOdds` substituted for `primes`) is the same except that it is about 1.5 times faster or only 1200 cycles per prime.

These "one huge sieving array" algorithms are never going to be very fast for extended ranges (past about the CPU L2 cache size for this processor supporting a range of about eight million), and a page segmented approach should be taken as per the last of the unbounded algorithms below.

Unbounded (Odds-Only) Versions

To use Swift's "higher order functions" on the generated `Sequence`'s effectively, one needs unbounded (or only by the numeric range chosen for the implementation) sieves. Many of these are incremental sieves that, instead of buffering a series of culled arrays, records the culling structure of the culling base primes (which should be a secondary stream of primes for efficiency) and produces the primes incrementally through reference and update of that structure. Various structures may be chosen for this, as in a MinHeap Priority Queue, a Hash Dictionary, or a simple List tree structure as used in the following code:

import Foundation

func soeTreeFoldingOdds() -> UnfoldSequence<Int, (Int?, Bool)> {
  class CIS<T> {
    let head: T
    let rest: (() -> CIS<T>)
    init(_ hd: T, _ rst: @escaping (() -> CIS<T>)) {
      self.head = hd; self.rest = rst
    }
  }

  func merge(_ xs: CIS<Int>, _ ys: CIS<Int>) -> CIS<Int> {
    let x = xs.head; let y = ys.head
    if x < y { return CIS(x, {() in merge(xs.rest(), ys) }) }
    else { if y < x { return CIS(y, {() in merge(xs, ys.rest()) }) }
    else { return CIS(x, {() in merge(xs.rest(), ys.rest()) }) } }
  }

  func smults(_ p: Int) -> CIS<Int> {
    let inc = p + p
    func smlts(_ c: Int) -> CIS<Int> {
      return CIS(c, { () in smlts(c + inc) })
    }
    return smlts(p * p)
  }

  func allmults(_ ps: CIS<Int>) -> CIS<CIS<Int>> {
    return CIS(smults(ps.head), { () in allmults(ps.rest()) })
  }

  func pairs(_ css: CIS<CIS<Int>>) -> CIS<CIS<Int>> {
    let cs0 = css.head; let ncss = css.rest()
    return CIS(merge(cs0, ncss.head), { () in pairs(ncss.rest()) })
  }

  func cmpsts(_ css: CIS<CIS<Int>>) -> CIS<Int> {
    let cs0 = css.head
    return CIS(cs0.head, { () in merge(cs0.rest(), cmpsts(pairs(css.rest()))) })
  }

  func minusAt(_ n: Int, _ cs: CIS<Int>) -> CIS<Int> {
    var nn = n; var ncs = cs
    while nn >= ncs.head { nn += 2; ncs = ncs.rest() }
    return CIS(nn, { () in minusAt(nn + 2, ncs) })
  }

  func oddprms() -> CIS<Int> {
    return CIS(3, { () in minusAt(5, cmpsts(allmults(oddprms()))) })
  }

  var odds = oddprms()
  return sequence(first: 2, next: { _ in
    let p = odds.head; odds = odds.rest()
    return p
  })
}

let range = 100000000

print("The primes up to 100 are:")
soeTreeFoldingOdds().prefix(while: { $0 <= 100 })
  .forEach { print($0, "", terminator: "") }
print()

print("Found \(soeTreeFoldingOdds().lazy.prefix(while: { $0 <= 1000000 })
                .reduce(0) { (a, _) in a + 1 }) primes to 1000000.")

let start = NSDate()
let answr = soeTreeFoldingOdds().prefix(while: { $0 <= range })
              .reduce(0) { (a, _) in a + 1 }
let elpsd = -start.timeIntervalSinceNow

print("Found \(answr) primes to \(range).")

print(String(format: "This test took %.3f milliseconds.", elpsd * 1000))

The output is the same as for the above except that it is much slower at about 56,000 CPU clock cycles per prime even just sieving to ten million due to the many memory allocations/de-allocations. It also has a O(n (log n) (log (log n))) asymptotic computational complexity (with the extra "log n" factor) that makes it slower with increasing range. This makes this algorithm only useful up to ranges of a few million although it is adequate to solve trivial problems such as Euler Problem 10 of summing all the primes to two million.

Note that the above code is almost a pure functional version using immutability other than for the use of loops because Swift doesn't support Tail Call Optimization (TCO) in function recursion: the loops do what TCO usually automatically does "under the covers".

Alternative version using a (mutable) Hash Dictionary

As the above code is slow due to memory allocations/de-allocations and the inherent extra "log n" term in the complexity, the following code uses a Hash Dictionary which has an average of O(1) access time (without the "log n" and uses mutability for increased seed so is in no way purely functional:

func soeDictOdds() -> UnfoldSequence<Int, Int> {
  var bp = 5; var q = 25
  var bps: UnfoldSequence<Int, Int>.Iterator? = nil
  var dict = [9: 6] // Dictionary<Int, Int>(9 => 6)
  return sequence(state: 2, next: { n in
    if n < 9 { if n < 3 { n = 3; return 2 }; defer {n += 2}; return n }
    while n >= q || dict[n] != nil {
      if n >= q {
        let inc = bp + bp
        dict[n + inc] = inc
        if bps == nil {
          bps = soeDictOdds().makeIterator()
          bp = (bps?.next())!; bp = (bps?.next())!; bp = (bps?.next())! // skip 2/3/5...
        }
        bp = (bps?.next())!; q = bp * bp // guaranteed never nil
      } else {
        let inc = dict[n] ?? 0
        dict[n] = nil
        var next = n + inc
        while dict[next] != nil { next += inc }
        dict[next] = inc
      }
      n += 2
    }
    defer { n += 2 }; return n
  })
}

It can be substituted in the above code just by substituting the `soeDictOdds` in three places in the testing code with the same output other than it is over four times faster or about 12,500 CPU clock cycles per prime.

Fast Bit-Packed Page-Segmented Version

An unbounded array-based algorithm can be written that combines the excellent cache locality of the second bounded version above but is unbounded by producing a sequence of sieved bit-packed arrays that are CPU cache size as required with a secondary stream of base primes used in culling produced in the same fashion, as in the following code:

import Foundation

typealias Prime = UInt64
typealias BasePrime = UInt32
typealias SieveBuffer = [UInt8]
typealias BasePrimeArray = [UInt32]

// the lazy list decribed problems don't affect its use here as
// it is only used here for its memoization properties and not consumed...
// In fact a consumed deferred list would be better off to use a CIS as above!

// a lazy list to memoize the progression of base prime arrays...
// there is some bug in Swift 4.2 that generating a LazyList<T> with a
// function and immediately using an extension method on it without
// first storing it to a variable results in mem seg fault for large
// ranges in the order of a million; in order to write a consuming
// function, one must write a function passing in a generator thunk, and
// immediately call a `makeIterator()` on it before storing, then doing a
// iteration on the iterator; doing a for on the immediately produced
// LazyList<T> (without storing it) also works, but this means we have to
// implement the "higher order functions" ourselves.
// this bug may have something to do with "move sematics".
class LazyList<T> : LazySequenceProtocol {
  internal typealias Thunk<T> = () -> T
  let head : T
  internal var _thnk: Thunk<LazyList<T>?>?
  lazy var tail: LazyList<T>? = {
    let tl = self._thnk?(); self._thnk = nil
    return tl
  }()
  init(_ hd: T, _ thnk: @escaping Thunk<LazyList<T>?>) {
    self.head = hd; self._thnk = thnk
  }
  struct LLSeqIter : IteratorProtocol, LazySequenceProtocol {
    @usableFromInline
    internal var _isfirst: Bool = true
    @usableFromInline
    internal var _current: LazyList<T>
    @inlinable // ensure that reference is not released by weak reference
    init(_ base: LazyList<T>) { self._current = base }
    @inlinable // can't be called by multiple threads on same LLSeqIter...
    mutating func next() -> T? {
      let curll = self._current
      if (self._isfirst) { self._isfirst = false; return curll.head }
      let ncur = curll.tail
      if (ncur == nil) { return nil }
      self._current = ncur!
      return ncur!.head
    }
    @inlinable
    func makeIterator() -> LLSeqIter {
      return LLSeqIter(self._current)
    }
  }
  @inlinable
  func makeIterator() -> LLSeqIter {
    return LLSeqIter(self)
  }
}

internal func makeCLUT() -> Array<UInt8> {
  var clut = Array(repeating: UInt8(0), count: 65536)
  for i in 0..<65536 {
    let v0 = ~i & 0xFFFF
    let v1 = v0 - ((v0 & 0xAAAA) >> 1)
    let v2 = (v1 & 0x3333) + ((v1 & 0xCCCC) >> 2)
    let v3 = (((((v2 & 0x0F0F) + ((v2 & 0xF0F0) >> 4)) &* 0x0101)) >> 8) & 31
    clut[i] = UInt8(v3)
  }
  return clut
}

internal let CLUT = makeCLUT()

internal func countComposites(_ cmpsts: SieveBuffer) -> Int {
  let len = cmpsts.count >> 1
  let clutp = UnsafePointer(CLUT) // for faster un-bounds checked access
  var bufp = UnsafeRawPointer(UnsafePointer(cmpsts))
                .assumingMemoryBound(to: UInt16.self)
  let plmt = bufp + len
  var count: Int = 0
  while (bufp < plmt) {
    count += Int(clutp[Int(bufp.pointee)])
    bufp += 1
  }
  return count
}

// converts an entire sieved array of bytes into an array of UInt32 primes,
// to be used as a source of base primes...
internal func composites2BasePrimeArray(_ low: BasePrime, _ cmpsts: SieveBuffer)
                                                          -> BasePrimeArray {
  let lmti = cmpsts.count << 3
  let len = countComposites(cmpsts)
  var rslt = BasePrimeArray(repeating: BasePrime(0), count: len)
  var j = 0
  for i in 0..<lmti {
    if (cmpsts[i >> 3] & (1 << (i & 7)) == UInt8(0)) {
      rslt[j] = low + BasePrime(i + i); j += 1
    }
  }
  return rslt
}

// do sieving work based on low starting value for the given buffer and
// the given lazy list of base prime arrays...
// uses pointers to avoid bounds checking for speed, but bounds are checked in code.
// uses an improved algorithm to maximize simple culling loop speed for
// the majority of cases of smaller base primes, only reverting to normal
// bit-packing operations for larger base primes...
// NOTE: a further optimization of maximum loop unrolling can later be
// implemented when warranted after maximum wheel factorization is implemented.
internal func sieveComposites(
      _ low: Prime, _ buf: SieveBuffer,
      _ bpas: LazyList<BasePrimeArray>) {
  let lowi = Int64((low - 3) >> 1)
  let len = buf.count
  let lmti = Int64(len << 3)
  let bufp = UnsafeMutablePointer(mutating: buf)
  let plen = bufp + len
  let nxti = lowi + lmti
  for bpa in bpas {
    for bp in bpa {
      let bp64 = Int64(bp)
      let bpi64 = (bp64 - 3) >> 1
      var strti = (bpi64 * (bpi64 + 3) << 1) + 3
      if (strti >= nxti) { return }
      if (strti >= lowi) { strti -= lowi }
      else {
        let r = (lowi - strti) % bp64
        strti = r == 0 ? 0 : bp64 - r
      }
      if (bp <= UInt32(len >> 3) && strti <= (lmti - 20 * bp64)) {
        let slmti = min(lmti, strti + (bp64 << 3))
        while (strti < slmti) {
          let msk = UInt8(1 << (strti & 7))
          var cp = bufp + Int(strti >> 3)
          while (cp < plen) {
              cp.pointee |= msk; cp += Int(bp64)
          }
          strti &+= bp64
        }
      }
      else {
        var c = strti
        let nbufp = UnsafeMutableRawPointer(bufp)
                      .assumingMemoryBound(to: Int32.self)
        while (c < lmti) {
            nbufp[Int(c >> 5)] |= 1 << (c & 31)
            c &+= bp64
        }
      }
    }
  }
}

// starts the secondary base primes feed with minimum size in bits set to 4K...
// thus, for the first buffer primes up to 8293,
// the seeded primes easily cover it as 97 squared is 9409...
// following used for fast clearing of SieveBuffer of multiple base size...
internal let clrbpseg = SieveBuffer(repeating: UInt8(0), count: 512)
internal func makeBasePrimeArrays() -> LazyList<BasePrimeArray> {
  var cmpsts = SieveBuffer(repeating: UInt8(0), count: 512)
  func nextelem(_ low: BasePrime, _ bpas: LazyList<BasePrimeArray>)
                                                -> LazyList<BasePrimeArray> {
    // calculate size so that the bit span is at least as big as the
    // maximum culling prime required, rounded up to minsizebits blocks...
    let rqdsz = 2 + Int(sqrt(Double(1 + low)))
    let sz = ((rqdsz >> 12) + 1) << 9 // size in bytes, blocks of 512 bytes
    if (sz > cmpsts.count) {
      cmpsts = SieveBuffer(repeating: UInt8(0), count: sz)
    }
    // fast clearing of the SieveBuffer array?
    for i in stride(from: 0, to: cmpsts.count, by: 512) {
      cmpsts.replaceSubrange(i..<i+512, with: clrbpseg)
    }
    sieveComposites(Prime(low), cmpsts, bpas)
    let arr = composites2BasePrimeArray(low, cmpsts)
    let nxt = low + BasePrime(cmpsts.count << 4)
    return LazyList(arr, { nextelem(nxt, bpas) })
  }
  // pre-seeding breaks recursive race,
  // as only known base primes used for first page...
  let preseedarr: [BasePrime] = [
    3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
    , 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97 ]
  return
    LazyList(
      preseedarr,
      { nextelem(BasePrime(101), makeBasePrimeArrays()) })
}

// an iterable sequence over successive sieved buffer composite arrays,
// returning a tuple of the value represented by the lowest possible prime
// in the sieved composites array and the array itself;
// the array has a 16 Kilobytes minimum size (CPU L1 cache), but
// will grow so that the bit span is larger than the
// maximum culling base prime required, possibly making it larger than
// the L1 cache for large ranges, but still reasonably efficient using
// the L2 cache: very efficient up to about 16e9 range;
// reasonably efficient to about 2.56e14 for two Megabyte L2 cache = > 1 day...
internal let clrseg = SieveBuffer(repeating: UInt8(0), count: 16384)
func makeSievePages()
    -> UnfoldSequence<(Prime, SieveBuffer), ((Prime, SieveBuffer)?, Bool)> {
  let bpas = makeBasePrimeArrays()
  let cmpsts = SieveBuffer(repeating: UInt8(0), count: 16384)
  let low = Prime(3)
  sieveComposites(low, cmpsts, bpas)
  return sequence(first: (low, cmpsts), next: { (low, cmpsts) in
    var ncmpsts = cmpsts
    let rqdsz = 2 + Int(sqrt(Double(1 + low))) // problem with sqrt not exact past about 10^12!!!!!!!!!
    let sz = ((rqdsz >> 17) + 1) << 14 // size iin bytes, by chunks of 16384
    if (sz > ncmpsts.count) {
      ncmpsts = SieveBuffer(repeating: UInt8(0), count: sz)
    }
    // fast clearing of the SieveBuffer array?
    for i in stride(from: 0, to: ncmpsts.count, by: 16384) {
      ncmpsts.replaceSubrange(i..<i+16384, with: clrseg)
    }
    let nlow = low + Prime(ncmpsts.count << 4)
    sieveComposites(nlow, ncmpsts, bpas)
    return (nlow, ncmpsts)
  })
}

func countPrimesTo(_ range: Prime) -> Int64 {
  if (range < 3) { if (range < 2) { return Int64(0) }
                   else { return Int64(1) } }
  let rngi = Int64(range - 3) >> 1
  let clutp = UnsafePointer(CLUT) // for faster un-bounds checked access
  var count: Int64 = 1
  for sp in makeSievePages() {
    let (low, cmpsts) = sp; let lowi = Int64(low - 3) >> 1
    if ((lowi + Int64(cmpsts.count << 3)) > rngi) {
      let lsti = Int(rngi - lowi); let lstw = lsti >> 4
      let msk = UInt16(-2 << (lsti & 15))
      var bufp = UnsafeRawPointer(UnsafePointer(cmpsts))
                    .assumingMemoryBound(to: UInt16.self)
      let plmt = bufp + lstw
      while (bufp < plmt) {
        count += Int64(clutp[Int(bufp.pointee)]); bufp += 1
      }
      count += Int64(clutp[Int(bufp.pointee | msk)]);
      break;
    } else {
      count += Int64(countComposites(cmpsts))
    }
  }
  return count
}

// iterator of primes from the generated culled page segments...
struct PagedPrimesSeqIter: LazySequenceProtocol, IteratorProtocol {
  @inlinable
  init() {
    self._pgs = makeSievePages().makeIterator()
    self._cmpstsp = UnsafePointer(self._pgs.next()!.1)
  }
  @usableFromInline
  internal var _pgs: UnfoldSequence<(Prime, SieveBuffer), ((Prime, SieveBuffer)?, Bool)>
  @usableFromInline
  internal var _i = -2
  @usableFromInline
  internal var _low = Prime(3)
  @usableFromInline
  internal var _cmpstsp: UnsafePointer<UInt8>
  @usableFromInline
  internal var _lmt = 131072
  @inlinable
  mutating func next() -> Prime? {
    if self._i < -1 { self._i = -1; return Prime(2) }
    while true {
      repeat { self._i += 1 }
      while self._i < self._lmt &&
              (Int(self._cmpstsp[self._i >> 3]) & (1 << (self._i & 7))) != 0
      if self._i < self._lmt { break }
      let pg = self._pgs.next(); self._low = pg!.0
      let cmpsts = pg!.1; self._lmt = cmpsts.count << 3
      self._cmpstsp = UnsafePointer(cmpsts); self._i = -1
    }
    return self._low + Prime(self._i + self._i)
  }
  @inlinable
  func makeIterator() -> PagedPrimesSeqIter {
    return PagedPrimesSeqIter()
  }
  @inlinable
  var elements: PagedPrimesSeqIter {
    return PagedPrimesSeqIter()
  }
}
 
// sequence over primes using the above prime iterator from page iterator;
// unless doing something special with individual primes, usually unnecessary;
// better to do manipulations based on the composites bit arrays...
// takes at least as long to enumerate the primes as to sieve them...
func primesPaged() -> PagedPrimesSeqIter { return PagedPrimesSeqIter() }

let range = Prime(1000000000)

print("The first 25 primes are:")
primesPaged().prefix(25).forEach { print($0, "", terminator: "") }
print()

let start = NSDate()

let answr =
  countPrimesTo(range) // fast way, following enumeration way is slower...
//  primesPaged().prefix(while: { $0 <= range }).reduce(0, { a, _ in a + 1 })

let elpsd = -start.timeIntervalSinceNow

print("Found \(answr) primes up to \(range).")

print(String(format: "This test took %.3f milliseconds.", elpsd * 1000))

Output:

The first 25 primes are:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 50847534 primes up to 1000000000.
This test took 2004.007 milliseconds.

This produces similar output but is many many times times faster at about 75 CPU clock cycles per prime as used here to count the primes to a billion. If one were to produce the answer by enumeration using the commented out `primesPaged()` function, the time to enumerate the results is about the same as the time to actually do the work of culling, so the example `countPrimesTo` function that does high-speed counting of found packed bits is implemented to eliminate the enumeration. For most problems over larger ranges, this approach is recommended, and could be used for summing the primes, finding first instances of maximum prime gaps, prime pairs, triples, etc.

Further optimizations as in maximum wheel factorization (a further about five times faster), multi-threading (for a further multiple of the effective number of cores used), maximum loop unrolling (about a factor of two for smaller base primes), and other techniques for higher ranges (above 16 billion in this case) can be used with increasing code complexity, but there is little point when using prime enumeration as output: ie. one could reduce the composite number culling time to zero but it would still take about 2.8 seconds to enumerate the results over the billion range in the case of this processor.

Tailspin

templates sieve
  def limit: $;
  @: [ 2..$limit ];
  1 -> #
  $@ !

  when <..$@::length ?($@($) * $@($) <..$limit>)> do
    templates sift
      def prime: $;
      @: $prime * $prime;
      @sieve: [ $@sieve... -> # ];
      when <..~$@> do
        $ !
      when <$@~..> do
        @: $@ + $prime;
        $ -> #
    end sift

    $@($) -> sift !
    $ + 1 -> #
end sieve

1000 -> sieve ...->  '$; ' -> !OUT::write

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Better version using the mutability of the @-state to just update a primality flag

templates sieve
  def limit: $;
  @: [ 1..$limit -> 1 ];
  @(1): 0;
  2..$limit -> #
  $@ -> \[i](<=1> $i !\) !

  when <?($@($) <=1>)> do
    def prime2: $ * $;
    $prime2..$limit:$ -> @sieve($): 0;
end sieve

1000 -> sieve... ->  '$; ' -> !OUT::write

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997

Tcl

package require Tcl 8.5

proc sieve n {
    if {$n < 2} {return {}}
    
    # create a container to hold the sequence of numbers.
    # use a dictionary for its speedy access (like an associative array) 
    # and for its insertion order preservation (like a list)
    set nums [dict create]
    for {set i 2} {$i <= $n} {incr i} {
        # the actual value is never used
        dict set nums $i ""
    }
    
    set primes [list]
    while {[set nextPrime [lindex [dict keys $nums] 0]] <= sqrt($n)} {
        dict unset nums $nextPrime
        for {set i [expr {$nextPrime ** 2}]} {$i <= $n} {incr i $nextPrime} {
            dict unset nums $i
        }
        lappend primes $nextPrime
    }
    return [concat $primes [dict keys $nums]]
}

puts [sieve 100]   ;# 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Summary :/* TI-83 BASIC */

TI-83 BASIC

Input "Limit:",N
N→Dim(L1)
For(I,2,N)
1→L1(I)
End
For(I,2,SQRT(N))
If L1(I)=1
Then
For(J,I*I,N,I)
0→L1(J)
End
End
End
For(I,2,N)
If L1(I)=1
Then
Disp i
End
End
ClrList L1

UNIX Shell

With array

Works with: Bourne Again SHell

Works with: Korn Shell

Works with: Zsh

function primes {
  typeset -a a
  typeset i j
  a[1]=""
  for (( i = 2; i <= $1; i++ )); do
    a[$i]=$i
  done
  for (( i = 2; i * i <= $1; i++ )); do
    if [[ ! -z ${a[$i]} ]]; then
      for (( j = i * i; j <= $1; j += i )); do
        a[$j]=""
      done
    fi
  done
  printf '%s' "${a[2]}"
  printf ' %s' ${a[*]:3}
  printf '\n'
}

primes 1000

Output:

Output is a single long line:

2 3 5 7 11 13 17 19 23 ... 971 977 983 991 997

Using variables as fake array

Bourne Shell and Almquist Shell have no arrays. This script works with bash or dash (standard shell in Ubuntu), but uses no specifics of the shells, so it works with plain Bourne Shell as well.

Works with: Bourne Shell

#! /bin/sh

LIMIT=1000

# As a workaround for missing arrays, we use variables p2, p3, ...,
# p$LIMIT, to represent the primes. Values are true or false.
#   eval p$i=true     # Set value.
#   eval \$p$i        # Run command: true or false.
#
# A previous version of this script created a temporary directory and
# used files named 2, 3, ..., $LIMIT to represent the primes. We now use
# variables so that a killed script does not leave extra files. About
# performance, variables are about as slow as files.

i=2
while [ $i -le $LIMIT ]
do
    eval p$i=true               # was touch $i
    i=`expr $i + 1`
done

i=2
while
    j=`expr $i '*' $i`
    [ $j -le $LIMIT ]
do
    if eval \$p$i               # was if [ -f $i ]
    then
        while [ $j -le $LIMIT ]
        do
            eval p$j=false      # was rm -f $j
            j=`expr $j + $i`
        done
    fi
    i=`expr $i + 1`
done

# was echo `ls|sort -n`
echo `i=2
      while [ $i -le $LIMIT ]; do
          eval \\$p$i && echo $i
          i=\`expr $i + 1\`
      done`

With piping

This example is incorrect. Please fix the code and remove this message.

Details: This version uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

Note: McIlroy misunderstood the Sieve of Eratosthenes as did many of his day including David Turner (1975); see Sieve of Eratosthenes article on Wikipedia

This is an elegant script by M. Douglas McIlroy, one of the founding fathers of UNIX.

This implementation is explained in his paper "Coroutine prime number sieve" (2014).

Works with: Bourne Shell

sourc() {
    seq 2 1000
}

cull() {
    while
        read p || exit
    do
        (($p % $1 != 0)) && echo $p
    done
}

sink() {
    read p || exit
    echo $p
    cull $p | sink &
}

sourc | sink

This version works by piping 1s and 0s through sed. The string of 1s and 0s represents the array of primes.

Works with: Bourne Shell

# Fill $1 characters with $2.
fill () {
	# This pipeline would begin
	#   head -c $1 /dev/zero | ...
	# but some systems have no head -c. Use dd.
	dd if=/dev/zero bs=$1 count=1 2>/dev/null | tr '\0' $2
}

filter () {
	# Use sed to put an 'x' after each multiple of $1, remove
	# first 'x', and mark non-primes with '0'.
	sed -e s/$2/\&x/g -e s/x// -e s/.x/0/g | {
		if expr $1 '*' $1 '<' $3 > /dev/null; then
			filter `expr $1 + 1` .$2 $3
		else
			cat
		fi
	}
}

# Generate a sequence of 1s and 0s indicating primality.
oz () {
	fill $1 1 | sed s/1/0/ | filter 2 .. $1
}

# Echo prime numbers from 2 to $1.
prime () {
	# Escape backslash inside backquotes. sed sees one backslash.
	echo `oz $1 | sed 's/./&\\
/g' | grep -n 1 | sed s/:1//`
}

prime 1000

C Shell

Translation of: CMake

# Sieve of Eratosthenes: Echoes all prime numbers through $limit.
@ limit = 80

if ( ( $limit * $limit ) / $limit != $limit ) then
	echo limit is too large, would cause integer overflow.
	exit 1
endif

# Use $prime[2], $prime[3], ..., $prime[$limit] as array of booleans.
# Initialize values to 1 => yes it is prime.
set prime=( `repeat $limit echo 1` )

# Find and echo prime numbers.
@ i = 2
while ( $i <= $limit )
	if ( $prime[$i] ) then
		echo $i

		# For each multiple of i, set 0 => no it is not prime.
		# Optimization: start at i squared.
		@ m = $i * $i
		while ( $m <= $limit )
			set prime[$m] = 0
			@ m += $i
		end
	endif
	@ i += 1
end

Ursala

This example is incorrect. Please fix the code and remove this message.

Details: It probably (remainder) uses rem testing and so is a trial division algorithm, not a sieve of Eratosthenes.

with no optimizations

#import nat

sieve = ~<{0,1}&& iota; @NttPX ~&r->lx ^/~&rhPlC remainder@rlX~|@r

test program:

#cast %nL

example = sieve 50

Output:

<2,3,5,7,11,13,17,19,23,29,31,37,41,43,47>

Vala

Library: Gee

Without any optimizations:

using Gee;

ArrayList<int> primes(int limit){
	var sieve = new ArrayList<bool>();
	var prime_list = new ArrayList<int>();
	
	for(int i = 0; i <= limit; i++){
		sieve.add(true);
	}
	
	sieve[0] = false;
	sieve[1] = false;
	
	for (int i = 2; i <= limit/2; i++){
		if (sieve[i] != false){
			for (int j = 2; i*j <= limit; j++){
				sieve[i*j] = false;
			}
		}
	}

	for (int i = 0; i < sieve.size; i++){
		if (sieve[i] != false){
			prime_list.add(i);
		}
	}
	
	return prime_list;
} // end primes

public static void main(){
	var prime_list = primes(50);
	
	foreach(var prime in prime_list)
		stdout.printf("%s ", prime.to_string());
	
	stdout.printf("\n");
}

{{out}

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47

VAX Assembly

                           000F4240  0000     1 n=1000*1000
                               0000  0000     2 .entry	main,0
                            7E 7CFD  0002     3 	clro	-(sp)			;result buffer
                            5E   DD  0005     4 	pushl	sp			;pointer to buffer
                            10   DD  0007     5 	pushl	#16			;descriptor -> len of buffer
                                     0009     6 
                            02   DD  0009     7 	pushl	#2			;1st candidate
                                     000B     8 test:
                 09 46'AF   6E   E1  000B     9 	bbc	(sp), b^bits, found	;bc - bit clear
                                     0010    10 next:
           F3 6E   000F4240 8F   F2  0010    11         aoblss  #n, (sp), test		;+1: limit,index
                                 04  0018    12         ret
                                     0019    13 found:
                         04 AE   7F  0019    14 	pushaq	4(sp)			;-> descriptor by ref
                         04 AE   DF  001C    15 	pushal	4(sp)			;-> prime on stack by ref
              00000000'GF   02   FB  001F    16 	calls	#2, g^ots$cvt_l_ti	;convert integer to string
                         04 AE   7F  0026    17 	pushaq	4(sp)			;
              00000000'GF   01   FB  0029    18 	calls	#1, g^lib$put_output	;show result
                                     0030    19 
                       53   6E   D0  0030    20 	movl	(sp), r3
                                     0033    21 mult:
    0002 53   6E   000F4240 8F   F1  0033    22 	acbl    #n, (sp), r3, set_mult	;limit,add,index
                            D1   11  003D    23 	brb	next
                                     003F    24 set_mult:				;set bits for multiples
                 EF 46'AF   53   E2  003F    25 	bbss	r3, b^bits, mult	;branch on bit set & set
                            ED   11  0044    26 	brb	mult
                                     0046    27 
                           0001E892  0046    28 bits:	.blkl	<n+2+31>/32
                                     E892    29 .end	main

VBA

Using Excel

 Sub primes()
'BRRJPA
'Prime calculation for VBA_Excel
'p is the superior limit of the range calculation
'This example calculates from 2 to 100000 and print it
'at the collum A


p = 100000

Dim nprimes(1 To 100000) As Integer
b = Sqr(p)

For n = 2 To b

    For k = n * n To p Step n
        nprimes(k) = 1
        
    Next k
Next n


For a = 2 To p
    If nprimes(a) = 0 Then
      c = c + 1
      Range("A" & c).Value = a
        
    End If
 Next a

End Sub

VBScript

To run in console mode with cscript.

    Dim sieve()
	If WScript.Arguments.Count>=1 Then
	    n = WScript.Arguments(0)
	Else 
	    n = 99
	End If
    ReDim sieve(n)
    For i = 1 To n
        sieve(i) = True
    Next
    For i = 2 To n
        If sieve(i) Then
            For j = i * 2 To n Step i
                sieve(j) = False
            Next
        End If
    Next
    For i = 2 To n
        If sieve(i) Then WScript.Echo i
    Next

Vedit macro language

This implementation uses an edit buffer as an array for flags. After the macro has been run, you can see how the primes are located in the array. Primes are marked with 'P' and non-primes with '-'. The first character position represents number 0.

#10 = Get_Num("Enter number to search to: ", STATLINE)
Buf_Switch(Buf_Free)                    // Use edit buffer as flags array
Ins_Text("--")                          // 0 and 1 are not primes
Ins_Char('P', COUNT, #10-1)             // init rest of the flags to "prime"
for (#1 = 2; #1*#1 < #10; #1++) {
    Goto_Pos(#1)
    if (Cur_Char=='P') {                // this is a prime number
        for (#2 = #1*#1; #2 <= #10; #2 += #1) {
            Goto_Pos(#2)
            Ins_Char('-', OVERWRITE)
        }
    }
}

Sample output showing numbers in range 0 to 599.

--PP-P-P---P-P---P-P---P-----P-P-----P---P-P---P-----P-----P
-P-----P---P-P-----P---P-----P-------P---P-P---P-P---P------
-------P---P-----P-P---------P-P-----P-----P---P-----P-----P
-P---------P-P---P-P-----------P-----------P---P-P---P-----P
-P---------P-----P-----P-----P-P-----P---P-P---------P------
-------P---P-P---P-------------P-----P---------P-P---P-----P
-------P-----P-----P---P-----P-------P---P-------P---------P
-P---------P-P-----P---P-----P-------P---P-P---P-----------P
-------P---P-------P---P-----P-----------P-P----------------
-P-----P---------P-----P-----P-P-----P---------P-----P-----P

Visual Basic

Works with: VB6

Sub Eratost()
    Dim sieve() As Boolean
    Dim n As Integer, i As Integer, j As Integer
    n = InputBox("limit:", n)
    ReDim sieve(n)
    For i = 1 To n
        sieve(i) = True
    Next i
    For i = 2 To n
        If sieve(i) Then
            For j = i * 2 To n Step i
                sieve(j) = False
            Next j
        End If
    Next i
    For i = 2 To n
        If sieve(i) Then Debug.Print i
    Next i
End Sub 'Eratost

Visual Basic .NET

Dim n As Integer, k As Integer, limit As Integer
Console.WriteLine("Enter number to search to: ")
limit = Console.ReadLine
Dim flags(limit) As Integer
For n = 2 To Math.Sqrt(limit)
    If flags(n) = 0 Then
        For k = n * n To limit Step n
            flags(k) = 1
        Next k
    End If
Next n
 
' Display the primes
For n = 2 To limit
    If flags(n) = 0 Then
        Console.WriteLine(n)
    End If
Next n

Alternate

Since the sieves are being removed only above the current iteration, the separate loop for display is unnecessary. And no Math.Sqrt() needed. Also, input is from command line parameter instead of Console.ReadLine(). Consolidated If block with For statement into two Do loops.

Module Module1
    Sub Main(args() As String)
        Dim lmt As Integer = 500, n As Integer = 2, k As Integer
        If args.Count > 0 Then Integer.TryParse(args(0), lmt)
        Dim flags(lmt + 1) As Boolean   ' non-primes are true in this array.
        Do                              ' a prime was found, 
            Console.Write($"{n} ")      ' so show it,
            For k = n * n To lmt Step n ' and eliminate any multiple of it at it's square and beyond.
                flags(k) = True
            Next
            Do                          ' skip over non-primes.
                n += If(n = 2, 1, 2)
            Loop While flags(n)
        Loop while n <= lmt
    End Sub
End Module

Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499

V (Vlang)

Translation of: go

Basic sieve of array of booleans

fn main() {
    limit := 201 // means sieve numbers < 201
 
    // sieve
    mut c := []bool{len: limit} // c for composite.  false means prime candidate
    c[1] = true              // 1 not considered prime
    mut p := 2
    for {
        // first allowed optimization:  outer loop only goes to sqrt(limit)
        p2 := p * p
        if p2 >= limit {
            break
        }
        // second allowed optimization:  inner loop starts at sqr(p)
        for i := p2; i < limit; i += p {
            c[i] = true // it's a composite
        }
        // scan to get next prime for outer loop
        for {
            p++
            if !c[p] {
                break
            }
        }
    }
 
    // sieve complete.  now print a representation.
    for n in 1..limit {
        if c[n] {
            print("  .")
        } else {
            print("${n:3}")
        }
        if n%20 == 0 {
            println("")
        }
    }
}

Output:

  .  2  3  .  5  .  7  .  .  . 11  . 13  .  .  . 17  . 19  .
  .  . 23  .  .  .  .  . 29  . 31  .  .  .  .  . 37  .  .  .
 41  . 43  .  .  . 47  .  .  .  .  . 53  .  .  .  .  . 59  .
 61  .  .  .  .  . 67  .  .  . 71  . 73  .  .  .  .  . 79  .
  .  . 83  .  .  .  .  . 89  .  .  .  .  .  .  . 97  .  .  .
101  .103  .  .  .107  .109  .  .  .113  .  .  .  .  .  .  .
  .  .  .  .  .  .127  .  .  .131  .  .  .  .  .137  .139  .
  .  .  .  .  .  .  .  .149  .151  .  .  .  .  .157  .  .  .
  .  .163  .  .  .167  .  .  .  .  .173  .  .  .  .  .179  .
181  .  .  .  .  .  .  .  .  .191  .193  .  .  .197  .199  .

Vorpal

self.print_primes = method(m){
   p = new()
   p.fill(0, m, 1, true)

   count = 0
   i = 2
   while(i < m){
      if(p[i] == true){
         p.fill(i+i, m, i, false)
         count = count + 1
      }
      i = i + 1
   }
   ('primes: ' + count + ' in ' + m).print()
   for(i = 2, i < m, i = i + 1){
      if(p[i] == true){
         ('' + i + ', ').put()
      }
   }
   ''.print()
}

self.print_primes(100)

Result:

primes: 25 in 100
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,

WebAssembly

(module
 (import "js" "print" (func $print (param i32)))
 (memory 4096)
 
 (func $sieve (export "sieve") (param $n i32)
   (local $i i32)
   (local $j i32)
 
   (set_local $i (i32.const 0))
   (block $endLoop
     (loop $loop
       (br_if $endLoop (i32.ge_s (get_local $i) (get_local $n)))
       (i32.store8 (get_local $i) (i32.const 1))
       (set_local $i (i32.add (get_local $i) (i32.const 1)))
       (br $loop)))
 
   (set_local $i (i32.const 2))
   (block $endLoop
     (loop $loop
       (br_if $endLoop (i32.ge_s (i32.mul (get_local $i) (get_local $i)) 
                                 (get_local $n)))
       (if (i32.eq (i32.load8_s (get_local $i)) (i32.const 1))
         (then
           (set_local $j (i32.mul (get_local $i) (get_local $i)))
           (block $endInnerLoop
             (loop $innerLoop
               (i32.store8 (get_local $j) (i32.const 0))
               (set_local $j (i32.add (get_local $j) (get_local $i)))
               (br_if $endInnerLoop (i32.ge_s (get_local $j) (get_local $n)))
               (br $innerLoop)))))
       (set_local $i (i32.add (get_local $i) (i32.const 1)))
       (br $loop)))
 
   (set_local $i (i32.const 2))
   (block $endLoop
     (loop $loop
       (if (i32.eq (i32.load8_s (get_local $i)) (i32.const 1))
         (then
           (call $print (get_local $i))))
       (set_local $i (i32.add (get_local $i) (i32.const 1)))
       (br_if $endLoop (i32.ge_s (get_local $i) (get_local $n)))
       (br $loop)))))

Xojo

Place the following in the Run event handler of a Console application:

Dim limit, prime, i As Integer
Dim s As String
Dim t As Double
Dim sieve(100000000) As Boolean

REM Get the maximum number
While limit<1 Or limit > 100000000
  Print("Max number? [1 to 100000000]")
  s = Input
  limit = CDbl(s)
Wend

REM Do the calculations
t = Microseconds
prime = 2
While prime^2 < limit
  For i = prime*2 To limit Step prime
    sieve(i) = True
  Next
  Do
    prime = prime+1
  Loop Until Not sieve(prime)
Wend
t = Microseconds-t
Print("Compute time = "+Str(t/1000000)+" sec")
Print("Press Enter...")
s = Input

REM Display the prime numbers
For i = 1 To limit
  If Not sieve(i) Then Print(Str(i))
Next
s = Input

Output:

Max number? [1 to 100000000]
1000
Compute time = 0.0000501 sec
Press Enter...

1
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
...

This version uses a dynamic array and can use (a lot) less memory. It's (a lot) slower too. Since Booleans are manually set to True, the algorithm makes more sense.

Dim limit, prime, i As Integer
Dim s As String
Dim t As Double
Dim sieve() As Boolean

REM Get the maximum number and define array
While limit<1 Or limit > 2147483647
  Print("Max number? [1 to 2147483647]")
  s = Input
  limit = CDbl(s)
Wend
t = Microseconds
For i = 0 To Limit
   Sieve.Append(True)
Next
t = Microseconds-t
Print("Memory allocation time = "+Str(t/1000000)+" sec")

REM Do the calculations
t = Microseconds
prime = 2
While prime^2 < limit
  For i = prime*2 To limit Step prime
    sieve(i) = False
  Next
  Do
    prime = prime+1
  Loop Until sieve(prime)
Wend
t = Microseconds-t
Print("Compute time = "+Str(t/1000000)+" sec")
Print("Press Enter...")
s = Input

REM Display the prime numbers
For i = 1 To limit
  If sieve(i) Then Print(Str(i))
Next
s = Input

Output:

Max number? [1 to 2147483647]
1000
Memory allocation time = 0.0000296 sec
Compute time = 0.0000501 sec
Press Enter...

1
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
...

Woma

(sieve(n = /0 -> int; limit = /0 -> int; is_prime = [/0] -> *)) *
    i<@>range(n*n, limit+1, n)
        is_prime = is_prime[$]i,False
    <*>is_prime

(primes_upto(limit = 4 -> int)) list(int)
    primes = [] -> list
    f = [False, False] -> list(bool)
    t = [True] -> list(bool)
    u = limit - 1 -> int
    tt = t * u -> list(bool)
    is_prime = flatten(f[^]tt) -> list(bool)
    limit_sqrt = limit ** 0.5 -> float
    iter1 = int(limit_sqrt + 1.5) -> int

    n<@>range(iter1)
        is_prime[n]<?>is_prime = sieve(n, limit, is_prime)

    i,prime<@>enumerate(is_prime)
        prime<?>primes = primes[^]i
    <*>primes

Wren

var sieveOfE = Fn.new { |n|
    if (n < 2) return []
    var comp = List.filled(n-1, false)
    var p = 2
    while (true) {
        var p2 = p * p
        if (p2 > n) break
        var i = p2
        while (i <= n) {
            comp[i-2] = true
            i = i + p
        }
        while (true) {
            p = p + 1
            if (!comp[p-2]) break
        }
    }
    var primes = []
    for (i in 0..n-2) {
        if (!comp[i]) primes.add(i+2)
    }
    return primes
}

System.print(sieveOfE.call(100))

Output:

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

XPL0

include c:\cxpl\codes;                  \intrinsic 'code' declarations
int  Size, Prime, I, Kill;
char Flag;
[Size:= IntIn(0);
Flag:= Reserve(Size+1);
for I:= 2 to Size do Flag(I):= true;
for I:= 2 to Size do
    if Flag(I) then                     \found a prime
        [Prime:= I;
        IntOut(0, Prime);  CrLf(0);
        Kill:= Prime + Prime;           \first multiple to kill
        while Kill <= Size do
                [Flag(Kill):= false;    \zero a non-prime
                Kill:= Kill + Prime;    \next multiple
                ];
        ];
]

Example output:

20

2 3 5 7 11 13 17

19

Yabasic

#!/usr/bin/yabasic

// ---------------------------
// Prime Sieve Benchmark --
// "Shootout" Version    --
// ---------------------------
// usage:
//     yabasic sieve8k.yab 90000


SIZE = 8192
ONN = 1 : OFF = 0
dim flags(SIZE)

sub main()
    
    cmd = peek("arguments")
    if cmd = 1 then
       iterations = val(peek$("argument"))
       if iterations = 0 then print "Argument wrong. Done 1000." : iterations = 1000 end if
    else
       print "1000 iterations."
       iterations = 1000
    end if
    
    for iter = 1 to iterations
        count = 0
        for n= 1 to SIZE : flags(n) = ONN: next n
        for i = 2 to SIZE
            if flags(i) = ONN then
               let k = i + i
               if k < SIZE then
                 for k = k to SIZE step i
                    flags(k) = OFF
                 next k
               end if
               count = count + 1                 
            end if
        next i
    next iter
    print "Count: ", count  // 1028
end sub

clear screen

print "Prime Sieve Benchmark\n"

main()

t = val(mid$(time$,10))

print "time: ", t, "\n"
print peek("millisrunning")

Zig

const std = @import("std");
const stdout = std.io.getStdOut().outStream();

pub fn main() !void {
    try sieve(1000);
}

// using a comptime limit ensures that there's no need for dynamic memory.
fn sieve(comptime limit: usize) !void {
    var prime = [_]bool{true} ** limit;
    prime[0] = false;
    prime[1] = false;
    var i: usize = 2;
    while (i*i < limit) : (i += 1) {
        if (prime[i]) {
            var j = i*i;
            while (j < limit) : (j += i)
                prime[j] = false;
        }
    }
    var c: i32 = 0;
    for (prime) |yes, p|
        if (yes) {
            c += 1;
            try stdout.print("{:5}", .{p});
            if (@rem(c, 10) == 0)
                try stdout.print("\n", .{});
        };
    try stdout.print("\n", .{});
}

Output:

$ zig run sieve.zig 
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

Odds-only bit packed version

Translation of: BCPL

Includes the iterator, as with the BCPL Odds-only bit packed sieve. Since it's not much extra code, the sieve object also includes methods for getting the size and testing for membership.

const std = @import("std");
const heap = std.heap;
const mem = std.mem;
const stdout = std.io.getStdOut().writer();

pub fn main() !void {
    const assert = std.debug.assert;

    var buf: [fixed_alloc_sz(1000)]u8 = undefined; // buffer big enough for 1,000 primes.
    var fba = heap.FixedBufferAllocator.init(&buf);

    const sieve = try SoE.init(1000, &fba.allocator);
    defer sieve.deinit(); // not needed for the FBA, but in general you would de-init the sieve

    // test membership functions
    assert(sieve.contains(997));
    assert(!sieve.contains(995));
    assert(!sieve.contains(994));
    assert(!sieve.contains(1009));

    try stdout.print("There are {} primes < 1000\n", .{sieve.size()});
    var c: u32 = 0;
    var iter = sieve.iterator();
    while (iter.next()) |p| {
        try stdout.print("{:5}", .{p});
        c += 1;
        if (c % 10 == 0)
            try stdout.print("\n", .{});
    }
    try stdout.print("\n", .{});
}

// return size to sieve n prmes if using the Fixed Buffer Allocator
//     adds some u64 words for FBA bookkeeping.
pub inline fn fixed_alloc_sz(limit: usize) usize {
    return (2 + limit / 128) * @sizeOf(u64);
}

pub const SoE = struct {
    const all_u64bits_on = 0xFFFF_FFFF_FFFF_FFFF;
    const empty = [_]u64{};

    sieve: []u64,
    alloc: *mem.Allocator,

    pub fn init(limit: u64, allocator: *mem.Allocator) error{OutOfMemory}!SoE {
        if (limit < 3)
            return SoE{
                .sieve = &empty,
                .alloc = allocator,
            };

        var bit_sz = (limit + 1) / 2 - 1;
        var q = bit_sz >> 6;
        var r = bit_sz & 0x3F;
        var sz = q + @boolToInt(r > 0);
        var sieve = try allocator.alloc(u64, sz);

        var i: usize = 0;
        while (i < q) : (i += 1)
            sieve[i] = all_u64bits_on;
        if (r > 0)
            sieve[q] = (@as(u64, 1) << @intCast(u6, r)) - 1;

        var bit: usize = 0;
        while (true) {
            while (sieve[bit >> 6] & @as(u64, 1) << @intCast(u6, bit & 0x3F) == 0)
                bit += 1;

            const p = 2 * bit + 3;
            q = p * p;
            if (q > limit)
                return SoE{
                    .sieve = sieve,
                    .alloc = allocator,
                };

            r = (q - 3) / 2;
            while (r < bit_sz) : (r += p)
                sieve[r >> 6] &= ~((@as(u64, 1)) << @intCast(u6, r & 0x3F));

            bit += 1;
        }
    }

    pub fn deinit(self: SoE) void {
        if (self.sieve.len > 0) {
            self.alloc.free(self.sieve);
        }
    }

    pub fn iterator(self: SoE) SoE_Iterator {
        return SoE_Iterator.init(self.sieve);
    }

    pub fn size(self: SoE) usize {
        var sz: usize = 1; // sieve doesn't include 2.
        for (self.sieve) |bits|
            sz += @popCount(u64, bits);
        return sz;
    }

    pub fn contains(self: SoE, n: u64) bool {
        if (n & 1 == 0)
            return n == 2
        else {
            const bit = (n - 3) / 2;
            const q = bit >> 6;
            const r = @intCast(u6, bit & 0x3F);
            return if (q >= self.sieve.len)
                false
            else
                self.sieve[q] & (@as(u64, 1) << r) != 0;
        }
    }
};

// Create an iterater object to enumerate primes we've generated.
const SoE_Iterator = struct {
    const Self = @This();

    start: u64,
    bits: u64,
    sieve: []const u64,

    pub fn init(sieve: []const u64) Self {
        return Self{
            .start = 0,
            .sieve = sieve,
            .bits = sieve[0],
        };
    }

    pub fn next(self: *Self) ?u64 {
        if (self.sieve.len == 0)
            return null;

        // start = 0 => first time, so yield 2.
        if (self.start == 0) {
            self.start = 3;
            return 2;
        }

        var x = self.bits;
        while (true) {
            if (x != 0) {
                const p = @ctz(u64, x) * 2 + self.start;
                x &= x - 1;
                self.bits = x;
                return p;
            } else {
                self.start += 128;
                self.sieve = self.sieve[1..];
                if (self.sieve.len == 0)
                    return null;
                x = self.sieve[0];
            }
        }
    }
};

Output:

There are 168 primes < 1000
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

Optimized version

const stdout = @import("std").io.getStdOut().writer();

const lim = 1000;
const n = lim - 2;

var primes: [n]?usize = undefined;

pub fn main() anyerror!void {
    var i: usize = 0;
    var m: usize = 0;

    while (i < n) : (i += 1) {
        primes[i] = i + 2;
    }

    i = 0;
    while (i < n) : (i += 1) {
        if (primes[i]) |prime| {
            m += 1;
            try stdout.print("{:5}", .{prime});
            if (m % 10 == 0) try stdout.print("\n", .{});
            var j: usize = i + prime;
            while (j < n) : (j += prime) {
                primes[j] = null;
            }
        }
    }
    try stdout.print("\n", .{});
}

Output:

$ zig run sieve.zig 
    2    3    5    7   11   13   17   19   23   29
   31   37   41   43   47   53   59   61   67   71
   73   79   83   89   97  101  103  107  109  113
  127  131  137  139  149  151  157  163  167  173
  179  181  191  193  197  199  211  223  227  229
  233  239  241  251  257  263  269  271  277  281
  283  293  307  311  313  317  331  337  347  349
  353  359  367  373  379  383  389  397  401  409
  419  421  431  433  439  443  449  457  461  463
  467  479  487  491  499  503  509  521  523  541
  547  557  563  569  571  577  587  593  599  601
  607  613  617  619  631  641  643  647  653  659
  661  673  677  683  691  701  709  719  727  733
  739  743  751  757  761  769  773  787  797  809
  811  821  823  827  829  839  853  857  859  863
  877  881  883  887  907  911  919  929  937  941
  947  953  967  971  977  983  991  997

zkl

fcn sieve(limit){
   composite:=Data(limit+1).fill(1);  // bucket of bytes set to 1 (prime)
   (2).pump(limit.toFloat().sqrt()+1, Void,  // Void==no results, just loop
       composite.get, Void.Filter,	// if prime, zero multiples
      'wrap(n){ [n*n..limit,n].pump(Void,composite.set.fp1(0)) }); //composite[n*p]=0
   (2).filter(limit-1,composite.get); // bytes still 1 are prime
}
sieve(53).println();

The pump method is just a loop, passing results from action to action and collecting the results (ie a minimal state machine). Pumping to Void means don't collect. The Void.Filter action means if result.toBool() is False, skip else get the source input (pre any action) and pass that to the next action. Here, the first filter checks the table if src is prime, if so, the third action take the prime and does some side effects.

Output:

L(2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53)