Sieve of Eratosthenes

From Rosetta Code
Task
Sieve of Eratosthenes
You are encouraged to solve this task according to the task description, using any language you may know.
This task has been clarified. Its programming examples are in need of review to ensure that they still fit the requirements of the task.


The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.


Task

Implement the   Sieve of Eratosthenes   algorithm, with the only allowed optimization that the outer loop can stop at the square root of the limit, and the inner loop may start at the square of the prime just found.

That means especially that you shouldn't optimize by using pre-computed wheels, i.e. don't assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or similar wheels based on low primes.

If there's an easy way to add such a wheel based optimization, implement it as an alternative version.


Note
  • It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.


Related tasks



11l

Translation of: Python
F primes_upto(limit)
   V is_prime = [0B]*2 [+] [1B]*(limit - 1)
   L(n) 0 .< Int(limit ^ 0.5 + 1.5)
      I is_prime[n]
         L(i) (n*n..limit).step(n)
            is_prime[i] = 0B
   R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)

print(primes_upto(100))
Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

360 Assembly

For maximum compatibility, this program uses only the basic instruction set.

*        Sieve of Eratosthenes 
ERATOST  CSECT  
         USING  ERATOST,R12
SAVEAREA B      STM-SAVEAREA(R15)
         DC     17F'0'
         DC     CL8'ERATOST'
STM      STM    R14,R12,12(R13) save calling context
         ST     R13,4(R15)      
         ST     R15,8(R13)
         LR     R12,R15         set addessability
*        ----   CODE
         LA     R4,1            I=1  
         LA     R6,1            increment
         L      R7,N            limit
LOOPI    BXH    R4,R6,ENDLOOPI  do I=2 to N
         LR     R1,R4           R1=I
         BCTR   R1,0             
         LA     R14,CRIBLE(R1)
         CLI    0(R14),X'01'
         BNE    ENDIF           if not CRIBLE(I)
         LR     R5,R4           J=I
         LR     R8,R4
         LR     R9,R7
LOOPJ    BXH    R5,R8,ENDLOOPJ  do J=I*2 to N by I
         LR     R1,R5           R1=J
         BCTR   R1,0
         LA     R14,CRIBLE(R1)
         MVI    0(R14),X'00'    CRIBLE(J)='0'B
         B      LOOPJ
ENDLOOPJ EQU    *
ENDIF    EQU    *
         B      LOOPI
ENDLOOPI EQU    *
         LA     R4,1            I=1  
         LA     R6,1
         L      R7,N
LOOP     BXH    R4,R6,ENDLOOP   do I=1 to N
         LR     R1,R4           R1=I
         BCTR   R1,0
         LA     R14,CRIBLE(R1)
         CLI    0(R14),X'01'
         BNE    NOTPRIME        if not CRIBLE(I) 
         CVD    R4,P            P=I
         UNPK   Z,P             Z=P
         MVC    C,Z             C=Z
         OI     C+L'C-1,X'F0'   zap sign
         MVC    WTOBUF(8),C+8
         WTO    MF=(E,WTOMSG)		  
NOTPRIME EQU    *
         B      LOOP
ENDLOOP  EQU    *
RETURN   EQU    *
         LM     R14,R12,12(R13) restore context
         XR     R15,R15         set return code to 0
         BR     R14             return to caller
*        ----   DATA
I        DS     F
J        DS     F
         DS     0F
P        DS     PL8             packed
Z        DS     ZL16            zoned
C        DS     CL16            character 
WTOMSG   DS     0F
         DC     H'80'           length of WTO buffer
         DC     H'0'            must be binary zeroes
WTOBUF   DC     80C' '
         LTORG  
N        DC     F'100000'
CRIBLE   DC     100000X'01'
         YREGS  
         END    ERATOST
Output:
00000002
00000003
00000005
00000007
00000011
00000013
00000017
00000019
00000023
00000029
00000031
00000037
00000041
00000043
00000047
00000053
00000059
00000061
00000067
...
00099767
00099787
00099793
00099809
00099817
00099823
00099829
00099833
00099839
00099859
00099871
00099877
00099881
00099901
00099907
00099923
00099929
00099961
00099971
00099989
00099991

6502 Assembly

If this subroutine is called with the value of n in the accumulator, it will store an array of the primes less than n beginning at address 1000 hex and return the number of primes it has found in the accumulator.

ERATOS: STA  $D0      ; value of n
        LDA  #$00
        LDX  #$00
SETUP:  STA  $1000,X  ; populate array
        ADC  #$01
        INX
        CPX  $D0
        BPL  SET
        JMP  SETUP
SET:    LDX  #$02
SIEVE:  LDA  $1000,X  ; find non-zero
        INX
        CPX  $D0
        BPL  SIEVED
        CMP  #$00
        BEQ  SIEVE
        STA  $D1      ; current prime
MARK:   CLC
        ADC  $D1
        TAY
        LDA  #$00
        STA  $1000,Y
        TYA
        CMP  $D0
        BPL  SIEVE
        JMP  MARK
SIEVED: LDX  #$01
        LDY  #$00
COPY:   INX
        CPX  $D0
        BPL  COPIED
        LDA  $1000,X
        CMP  #$00
        BEQ  COPY
        STA  $2000,Y
        INY
        JMP  COPY
COPIED: TYA           ; how many found
        RTS

68000 Assembly

Algorithm somewhat optimized: array omits 1, 2, all higher odd numbers. Optimized for storage: uses bit array for prime/composite flags.

Works with: [EASy68K v5.13.00]

Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.

*-----------------------------------------------------------
* Title      : BitSieve
* Written by : G. A. Tippery
* Date       : 2014-Feb-24, 2013-Dec-22
* Description: Prime number sieve
*-----------------------------------------------------------
    	ORG    $1000

**	---- Generic macros ----	**
PUSH	MACRO
	MOVE.L	\1,-(SP)
	ENDM

POP	MACRO
	MOVE.L	(SP)+,\1
	ENDM

DROP	MACRO
	ADDQ	#4,SP
	ENDM
	
PUTS	MACRO
	** Print a null-terminated string w/o CRLF **
	** Usage: PUTS stringaddress
	** Returns with D0, A1 modified
	MOVEQ	#14,D0	; task number 14 (display null string)
	LEA	\1,A1	; address of string
	TRAP	#15	; display it
	ENDM
	
GETN	MACRO
	MOVEQ	#4,D0	; Read a number from the keyboard into D1.L. 
	TRAP	#15
	ENDM

**	---- Application-specific macros ----	**

val	MACRO		; Used by bit sieve. Converts bit address to the number it represents.
	ADD.L	\1,\1	; double it because odd numbers are omitted
	ADDQ	#3,\1	; add offset because initial primes (1, 2) are omitted
	ENDM

* ** ================================================================================ **
* ** Integer square root routine, bisection method **
* ** IN: D0, should be 0<D0<$10000 (65536) -- higher values MAY work, no guarantee
* ** OUT: D1
*
SquareRoot:
*
	MOVEM.L	D2-D4,-(SP)	; save registers needed for local variables
*	DO == n
*	D1 == a
*	D2 == b
*	D3 == guess
*	D4 == temp
*
*		a = 1;
*		b = n;
	MOVEQ	#1,D1
	MOVE.L	D0,D2
*		do {
	REPEAT
*		guess = (a+b)/2;
	MOVE.L	D1,D3
	ADD.L	D2,D3
	LSR.L	#1,D3
*		if (guess*guess > n) {	// inverse function of sqrt is square
	MOVE.L	D3,D4
	MULU	D4,D4		; guess^2
	CMP.L	D0,D4
	BLS	.else
*		b = guess;
	MOVE.L	D3,D2
	BRA	.endif
*		} else {
.else:
*		a = guess;
	MOVE.L	D3,D1
*		} //if
.endif:
*		} while ((b-a) > 1);	; Same as until (b-a)<=1 or until (a-b)>=1
	MOVE.L	D2,D4
	SUB.L	D1,D4	; b-a
	UNTIL.L	  D4 <LE> #1 DO.S
*		return (a)	; Result is in D1
*		} //LongSqrt()
	MOVEM.L	(SP)+,D2-D4	; restore saved registers
	RTS
*
* ** ================================================================================ **


** ======================================================================= **
*
**  Prime-number Sieve of Eratosthenes routine using a big bit field for flags  **
*  Enter with D0 = size of sieve (bit array)
*  Prints found primes 10 per line
*  Returns # prime found in D6
*
*   Register usage:
*
*	D0 == n
*	D1 == prime
*	D2 == sqroot
*	D3 == PIndex
*	D4 == CIndex
*	D5 == MaxIndex
*	D6 == PCount
*
*	A0 == PMtx[0]
*
*   On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.
*

**	------------------------	**

GetBit:		** sub-part of Sieve subroutine **
		** Entry: bit # is on TOS
		** Exit: A6 holds the byte number, D7 holds the bit number within the byte
		** Note: Input param is still on TOS after return. Could have passed via a register, but
                **  wanted to practice with stack. :)
*		
	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
	ASR.L	#3,D7	; /8
	MOVEA	D7,A6	; byte #
	MOVE.L	(4,SP),D7	; get value from (pre-call) TOS
	AND.L	#$7,D7	; bit #
	RTS

**	------------------------	**

Sieve:
	MOVE	D0,D5
	SUBQ	#1,D5
	JSR	SquareRoot	; sqrt D0 => D1
	MOVE.L	D1,D2
	LEA	PArray,A0
	CLR.L	D3
*
PrimeLoop:
	MOVE.L	D3,D1
	val	D1
	MOVE.L	D3,D4
	ADD.L	D1,D4
*
CxLoop:		; Goes through array marking multiples of d1 as composite numbers
	CMP.L	D5,D4
	BHI	ExitCx
	PUSH	D4	; set D7 as bit # and A6 as byte pointer for D4'th bit of array
	JSR GetBit
	DROP
	BSET	D7,0(A0,A6.L)	; set bit to mark as composite number
	ADD.L	D1,D4	; next number to mark
	BRA	CxLoop
ExitCx:
	CLR.L	D1	; Clear new-prime-found flag
	ADDQ	#1,D3	; Start just past last prime found 
PxLoop:		; Searches for next unmarked (not composite) number
	CMP.L	D2,D3	; no point searching past where first unmarked multiple would be past end of array
	BHI	ExitPx	; if past end of array
	TST.L	D1
	BNE	ExitPx	; if flag set, new prime found
	PUSH D3		; check D3'th bit flag
	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
	DROP		; drop TOS
	BTST	D7,0(A0,A6.L)	; read bit flag
	BNE	IsSet	; If already tagged as composite
	MOVEQ	#-1,D1	; Set flag that we've found a new prime
IsSet:
	ADDQ	#1,D3	; next PIndex
	BRA	PxLoop
ExitPx:
	SUBQ	#1,D3	; back up PIndex
	TST.L	D1	; Did we find a new prime #?
	BNE	PrimeLoop	; If another prime # found, go process it
*
		; fall through to print routine

**	------------------------	**

* Print primes found
*
*	D4 == Column count
*
*	Print header and assumed primes (#1, #2) 
    	PUTS	Header	; Print string @ Header, no CR/LF
	MOVEQ	#2,D6	; Start counter at 2 because #1 and #2 are assumed primes
	MOVEQ	#2,D4
*
	MOVEQ	#0,D3
PrintLoop:
	CMP.L	D5,D3
	BHS	ExitPL
	PUSH	D3
	JSR	GetBit	; sets D7 as bit # and A6 as byte pointer
	DROP		; drop TOS
	BTST	D7,0(A0,A6.L)
	BNE		NotPrime
*		printf(" %6d", val(PIndex)
	MOVE.L	D3,D1
	val	D1
	AND.L	#$0000FFFF,D1
	MOVEQ	#6,D2
	MOVEQ	#20,D0	; display signed RJ
	TRAP	#15
	ADDQ	#1,D4
	ADDQ	#1,D6
*	*** Display formatting ***
*		if((PCount % 10) == 0) printf("\n");
	CMP	#10,D4
	BLO	NoLF
	PUTS	CRLF
	MOVEQ	#0,D4
NoLF:
NotPrime:
	ADDQ	#1,D3
	BRA	PrintLoop
ExitPL:
	RTS

** ======================================================================= **

N	EQU	5000	; *** Size of boolean (bit) array ***
SizeInBytes	EQU	(N+7)/8
*
START:                  	; first instruction of program
	MOVE.L	#N,D0	; # to test
	JSR	Sieve
*		printf("\n %d prime numbers found.\n", D6); ***
	PUTS	Summary1,A1
	MOVE	#3,D0	; Display signed number in D1.L in decimal in smallest field.
	MOVE.W	D6,D1
	TRAP	#15
	PUTS	Summary2,A1

	SIMHALT             	; halt simulator

** ======================================================================= **

* Variables and constants here

	ORG	$2000
CR	EQU	13
LF	EQU	10
CRLF	DC.B	CR,LF,$00

PArray:	DCB.B	SizeInBytes,0

Header:	DC.B	CR,LF,LF,' Primes',CR,LF,' ======',CR,LF
		DC.B	'     1     2',$00

Summary1:	DC.B	CR,LF,' ',$00
Summary2:	DC.B	' prime numbers found.',CR,LF,$00

    END    START        	; last line of source


8086 Assembly

MAXPRM:	equ	5000		; Change this value for more primes
	cpu	8086
	bits	16
	org	100h
section	.text
erato:	mov	cx,MAXPRM	; Initialize array (set all items to prime)
	mov	bp,cx		; Keep a copy in BP
	mov	di,sieve
	mov	al,1
	rep	stosb
	;;;	Sieve
	mov	bx,sieve	; Set base register to array
	inc	cx		; CX=1 (CH=0, CL=1); CX was 0 before
	mov	si,cx		; Start at number 2 (1+1)
.next:	inc	si		; Next number 
	cmp	cl,[bx+si]	; Is this number marked as prime?
	jne	.next		; If not, try next number
	mov	ax,si		; Otherwise, calculate square,
	mul	si
	mov	di,ax		; and put it in DI
	cmp	di,bp		; Check array bounds
	ja	output		; We're done when SI*SI>MAXPRM
.mark:	mov	[bx+di],ch	; Mark byte as composite
	add	di,si		; Next composite
	cmp 	di,bp		; While maximum not reached
	jbe	.mark
	jmp	.next
	;;;	Output
output:	mov	si,2		; Start at 2
.test:	dec	byte [bx+si]	; Prime?
	jnz	.next		; If not, try next number
	mov	ax,si		; Otherwise, print number
	call	prax
.next:	inc	si
	cmp	si,MAXPRM
	jbe	.test
	ret
	;;;	Write number in AX to standard output (using MS-DOS)
prax:	push	bx		; Save BX
	mov	bx,numbuf
	mov	bp,10		; Divisor
.loop:	xor	dx,dx		; Divide AX by 10, modulus in DX
	div	bp
	add	dl,'0'		; ASCII digit
	dec	bx
	mov	[bx],dl		; Store ASCII digit
	test	ax,ax		; More digits?
	jnz	.loop
	mov	dx,bx		; Print number
	mov	ah,9		; 9 = MS-DOS syscall to print string
	int	21h
	pop	bx		; Restore BX
	ret
section	.data
	db	'*****'		; Room for number
numbuf:	db	13,10,'$'
section	.bss 
sieve:	resb	MAXPRM
Output:
2
3
5
7
11
...
4969
4973
4987
4993
4999

8th

with: n

\ create a new buffer which will function as a bit vector
: bit-vec SED: n -- b
    dup 3 shr swap 7 band if 1+ then b:new b:clear ;

\ given a buffer, sieving prime, and limit, cross off multiples
\ of the sieving prime.
: +composites SED: b n n -- b
    >r dup sqr rot \ want: -- n index b
    repeat
        over 1- true b:bit!
        >r over + r>
    over r@ > until!
    rdrop nip nip ;

\ SoE algorithm proper
: make-sieve SED: n -- b
    dup>r bit-vec 2
    repeat
        tuck 1- b:bit@ not
        if
            over r@ +composites
        then swap 1+
    dup sqr r@ < while!
    rdrop drop ;

\ traverse the final buffer, creating an array of primes
: sieve>a SED: b n -- a
    >r a:new swap
    ( 1- b:bit@ not if >r I a:push r> then ) 2 r> loop drop ;

;with

: sieve SED: n -- a
    dup make-sieve swap sieve>a ;
Output:
ok> 100 sieve .
 [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]
ok> 1_000_000 sieve a:len . \ count primes up to 1,000,000
78498
ok> -1 a:@ . \ largest prime < 1,000,000
999983

AArch64 Assembly

Works with: as version Raspberry Pi 3B version Buster 64 bits
/* ARM assembly AARCH64 Raspberry PI 3B */
/*  program cribleEras64.s   */

/*******************************************/
/* Constantes file                         */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"

.equ MAXI,      100

/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss  
sZoneConv:                  .skip 24
TablePrime:                 .skip   8 * MAXI 
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main 
main:                               // entry of program 
    ldr x4,qAdrTablePrime           // address prime table
    mov x0,#2                       // prime 2
    bl displayPrime
    mov x1,#2
    mov x2,#1
1:                                  // loop for multiple of 2
    str x2,[x4,x1,lsl #3]           // mark  multiple of 2
    add x1,x1,#2
    cmp x1,#MAXI                    // end ?
    ble 1b                          // no loop
    mov x1,#3                       // begin indice
    mov x3,#1
2:
    ldr x2,[x4,x1,lsl #3]           // load table élément
    cmp x2,#1                       // is prime ?
    beq 4f
    mov x0,x1                       // yes -> display
    bl displayPrime
    mov x2,x1
3:                                  // and loop to mark multiples of this prime
    str x3,[x4,x2,lsl #3]
    add x2,x2,x1                    // add the prime
    cmp x2,#MAXI                    // end ?
    ble 3b                          // no -> loop
4:
    add x1,x1,2                     // other prime in table
    cmp x1,MAXI                     // end table ?
    ble 2b                          // no -> loop

100:                                // standard end of the program 
    mov x0,0                        // return code
    mov x8,EXIT                     // request to exit program
    svc 0                           // perform the system call
qAdrszCarriageReturn:    .quad szCarriageReturn
qAdrsMessResult:         .quad sMessResult
qAdrTablePrime:          .quad TablePrime

/******************************************************************/
/*      Display prime table elements                                */ 
/******************************************************************/
/* x0 contains the prime */
displayPrime:
    stp x1,lr,[sp,-16]!             // save  registers
    ldr x1,qAdrsZoneConv
    bl conversion10                 // call décimal conversion
    ldr x0,qAdrsMessResult
    ldr x1,qAdrsZoneConv            // insert conversion in message
    bl strInsertAtCharInc
    bl affichageMess                // display message
100:
    ldp x1,lr,[sp],16               // restaur  2 registers
    ret                             // return to address lr x30
qAdrsZoneConv:                   .quad sZoneConv  

/********************************************************/
/*        File Include fonctions                        */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"
Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97

ABAP

PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.

AT SELECTION-SCREEN ON p_limit.
  IF p_limit LE 1.
    MESSAGE 'Limit must be higher then 1.' TYPE 'E'.
  ENDIF.

START-OF-SELECTION.
  FIELD-SYMBOLS: <fs_prime> TYPE flag.
  DATA: gt_prime TYPE TABLE OF flag,
        gv_prime TYPE flag,
        gv_i     TYPE i,
        gv_j     TYPE i.

  DO p_limit TIMES.
    IF sy-index > 1.
      gv_prime = abap_true.
    ELSE.
      gv_prime = abap_false.
    ENDIF.

    APPEND gv_prime TO gt_prime.
  ENDDO.

  gv_i = 2.
  WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).
    IF ( gt_prime[ gv_i ] EQ abap_true ).
      gv_j =  gv_i ** 2.
      WHILE ( gv_j <= p_limit ).
        gt_prime[ gv_j ] = abap_false.
        gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).
      ENDWHILE.
    ENDIF.
    gv_i = gv_i + 1.
  ENDWHILE.

  LOOP AT gt_prime INTO gv_prime.
    IF gv_prime = abap_true.
      WRITE: / sy-tabix.
    ENDIF.
  ENDLOOP.

ABC

HOW TO SIEVE UP TO n:
    SHARE sieve
    PUT {} IN sieve
    FOR cand IN {2..n}: PUT 1 IN sieve[cand]
    FOR cand IN {2..floor root n}:
        IF sieve[cand] = 1:
            PUT cand*cand IN comp
            WHILE comp <= n:
                PUT 0 IN sieve[comp]
                PUT comp+cand IN comp

HOW TO REPORT prime n:
    SHARE sieve
    IF n<2: FAIL
    REPORT sieve[n] = 1

SIEVE UP TO 100
FOR n IN {1..100}:
    IF prime n: WRITE n
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ACL2

(defun nats-to-from (n i)
   (declare (xargs :measure (nfix (- n i))))
   (if (zp (- n i))
       nil
       (cons i (nats-to-from n (+ i 1)))))

(defun remove-multiples-up-to-r (factor limit xs i)
   (declare (xargs :measure (nfix (- limit i))))
   (if (or (> i limit)
           (zp (- limit i))
           (zp factor))
       xs
       (remove-multiples-up-to-r
        factor
        limit
        (remove i xs)
        (+ i factor))))

(defun remove-multiples-up-to (factor limit xs)
   (remove-multiples-up-to-r factor limit xs (* factor 2)))

(defun sieve-r (factor limit)
   (declare (xargs :measure (nfix (- limit factor))))
   (if (zp (- limit factor))
       (nats-to-from limit 2)
       (remove-multiples-up-to factor (+ limit 1)
                               (sieve-r (1+ factor) limit))))

(defun sieve (limit)
   (sieve-r 2 limit))

Action!

DEFINE MAX="1000"

PROC Main()
  BYTE ARRAY t(MAX+1)
  INT i,j,k,first

  FOR i=0 TO MAX
  DO
    t(i)=1
  OD

  t(0)=0
  t(1)=0

  i=2 first=1
  WHILE i<=MAX
  DO
    IF t(i)=1 THEN
      IF first=0 THEN
        Print(", ")
      FI
      PrintI(i)
      FOR j=2*i TO MAX STEP i
      DO
        t(j)=0
      OD
      first=0
    FI
    i==+1
  OD
RETURN
Output:

Screenshot from Atari 8-bit computer

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103,
107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223,
227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347,
349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607,
613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743,
751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883,
887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997

ActionScript

Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)

function eratosthenes(limit:int):Array
{
	var primes:Array = new Array();
	if (limit >= 2) {
		var sqrtlmt:int = int(Math.sqrt(limit) - 2);
		var nums:Array = new Array(); // start with an empty Array...
		for (var i:int = 2; i <= limit; i++) // and
			nums.push(i); // only initialize the Array once...
		for (var j:int = 0; j <= sqrtlmt; j++) {
			var p:int = nums[j]
			if (p)
				for (var t:int = p * p - 2; t < nums.length; t += p)
					nums[t] = 0;
		}
		for (var m:int = 0; m < nums.length; m++) {
			var r:int = nums[m];
			if (r)
				primes.push(r);
		}
	}
	return primes;
}
var e:Array = eratosthenes(1000);
trace(e);

Output:

Output:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997

Ada

with Ada.Text_IO, Ada.Command_Line;

procedure Eratos is
 
   Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));
   Prime: array(1 .. Last) of Boolean := (1 => False, others => True);
   Base: Positive := 2;
   Cnt: Positive;
begin
   while Base * Base <= Last loop
      if Prime(Base) then
         Cnt := Base + Base;
         while Cnt <= Last loop
            Prime(Cnt) := False;
            Cnt := Cnt + Base;
         end loop;
      end if;
      Base := Base + 1;
   end loop;
   Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");
   for Number in Prime'Range loop
      if Prime(Number) then
         Ada.Text_IO.Put(Positive'Image(Number));
      end if;
   end loop;
end Eratos;
Output:
> ./eratos 31
Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31

Agda

-- imports
open import Data.Nat as      using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec   using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin   using (Fin; suc; zero)
open import Function          using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe        using (Maybe; just; nothing)

-- Without square cutoff optimization
module Simple where
  primes :  n  List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve (tabulate (just  suc))
    where
    sieve :  {n}  Vec (Maybe (Fin (2 + m))) n  List (Fin (3 + m))
    sieve [] = []
    sieve (nothing  xs) =         sieve xs
    sieve (just x   xs) = suc x  sieve (foldr B remove (const []) xs x)
      where
      B = λ n   {i}  Fin i  Vec (Maybe (Fin (2 + m))) n

      remove :  {n}  Maybe (Fin (2 + m))  B n  B (suc n)
      remove _ ys zero    = nothing  ys x
      remove y ys (suc z) = y        ys z

-- With square cutoff optimization
module SquareOpt where
  primes :  n  List (Fin n)
  primes zero = []
  primes (suc zero) = []
  primes (suc (suc zero)) = []
  primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just  Fin.suc  Fin.suc))
    where
    sieve :  {n}      Vec (Maybe (Fin (3 + m))) n  List (Fin (3 + m))
    sieve _ zero = List.mapMaybe id  Vec.toList
    sieve _ (suc _) [] = []
    sieve i (suc l) (nothing  xs) =     sieve (suc i) (l  i  i) xs
    sieve i (suc l) (just x   xs) = x  sieve (suc i) (l  i  i) (Vec.foldr B remove (const []) xs i)
      where
      B = λ n    Vec (Maybe (Fin (3 + m))) n

      remove :  {i}  Maybe (Fin (3 + m))  B i  B (suc i)
      remove _ ys zero    = nothing  ys i
      remove y ys (suc j) = y        ys j

Agena

Tested with Agena 2.9.5 Win32

# Sieve of Eratosthenes

# generate and return a sequence containing the primes up to sieveSize
sieve := proc( sieveSize :: number ) :: sequence is
    local sieve, result;

    result := seq(); # sequence of primes - initially empty
    create register sieve( sieveSize ); # "vector" to be sieved

    sieve[ 1 ] := false;
    for sPos from 2 to sieveSize do sieve[ sPos ] := true od;

    # sieve the primes
    for sPos from 2 to entier( sqrt( sieveSize ) ) do
        if sieve[ sPos ] then
            for p from sPos * sPos to sieveSize by sPos do
                sieve[ p ] := false
            od
        fi
    od;

    # construct the sequence of primes
    for sPos from 1 to sieveSize do
        if sieve[ sPos ] then insert sPos into result fi
    od

return result
end; # sieve


# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();
Output:
 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

ALGOL 60

Based on the 1962 Revised Repport:

comment Sieve of Eratosthenes;
begin
   integer array t[0:1000];
   integer i,j,k;
   for i:=0 step 1 until 1000 do t[i]:=1;
   t[0]:=0; t[1]:=0; i:=0;
   for i:=i while i<1000 do
   begin
       for i:=i while i<1000 and t[i]=0 do i:=i+1;
       if i<1000 then
       begin
           j:=2;
           k:=j*i;
           for k:=k while k<1000 do
           begin
               t[k]:=0;
               j:=j+1;
               k:=j*i
           end;
           i:=i+1
       end
   end;
   for i:=0 step 1 until 999 do
   if t[i]≠0 then print(i,ꞌ is primeꞌ)
end

An 1964 Implementation:

Works with: ALGOL 60 for OS/360

'BEGIN'
    'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
    'INTEGER' I,J,K;
    'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;
    SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);
    'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'
    'BEGIN'
        CANDIDATES(/I/) := 1;
    'END';
    CANDIDATES(/0/) := 0;
    CANDIDATES(/1/) := 0;
    I := 0;
    'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'
    'BEGIN'
        'FOR' I := I 'WHILE' I 'LESS' 1000
          'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'
            I := I+1;
        'IF' I 'LESS' 1000 'THEN'
        'BEGIN'
            J := 2;
            K := J*I;
            'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'
            'BEGIN'
                CANDIDATES(/K/) := 0;
                J := J + 1;
                K := J*I;
            'END';
            I := I+1;
        'END';
        'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'
        'IF' CANDIDATES(/I/) 'NOTEQUAL' 0  'THEN'
        'BEGIN'
            OUTINTEGER(1,I);
            OUTSTRING(1,'(' IS PRIME')');
            'COMMENT' NEW LINE;
            SYSACT(1,14,1)
        'END'
    'END'
'END'

ALGOL 68

BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
  [n]BOOL sieve;
  FOR i TO UPB sieve DO sieve[i] := prime OD;
  INT m = ENTIER sqrt(n);
  sieve[1] := non prime;
  FOR i FROM 2 TO m DO
    IF sieve[i] = prime THEN
      FOR j FROM i*i BY i TO n DO
        sieve[j] := non prime
      OD
    FI
  OD;
  sieve
);
 
 print((eratosthenes(80),new line))
Output:
FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF

ALGOL W

Standard, non-optimised sieve

begin

    % implements the sieve of Eratosthenes                                   %
    %     s(i) is set to true if i is prime, false otherwise                 %
    %     algol W doesn't have a upb operator, so we pass the size of the    %
    %     array in n                                                         %
    procedure sieve( logical array s ( * ); integer value n ) ;
    begin

        % start with everything flagged as prime                             % 
        for i := 1 until n do s( i ) := true;

        % sieve out the non-primes                                           %
        s( 1 ) := false;
        for i := 2 until truncate( sqrt( n ) )
        do begin
            if s( i )
            then begin
                for p := i * i step i until n do s( p ) := false
            end if_s_i
        end for_i ;

    end sieve ;

    % test the sieve procedure                                               %

    integer sieveMax;

    sieveMax := 100;
    begin

        logical array s ( 1 :: sieveMax );

        i_w := 2; % set output field width                                   %
        s_w := 1; % and output separator width                               %

        % find and display the primes                                        %
        sieve( s, sieveMax );
        for i := 1 until sieveMax do if s( i ) then writeon( i );

    end

end.
Output:
 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 

Odd numbers only version

Alternative version that only stores odd numbers greater than 1 in the sieve.

begin
    % implements the sieve of Eratosthenes                                   %
    % only odd numbers appear in the sieve, which starts at 3                %
    % s( i ) is set to true if ( i * 2 ) + 1 is prime                        %
    procedure sieve2( logical array s ( * ); integer value n ) ;
    begin
        % start with everything flagged as prime                             % 
        for i := 1 until n do s( i ) := true;
        % sieve out the non-primes                                           %
        % the subscripts of s are  1  2  3  4  5  6  7  8  9 10 11 12 13...  %
        %      which correspond to 3  5  7  9 11 13 15 17 19 21 23 25 27...  %
        for i := 1 until truncate( sqrt( n ) ) do begin
            if s( i ) then begin
                integer ip;
                ip := ( i * 2 ) + 1;
                for p := i + ip step ip until n do s( p ) := false
            end if_s_i
        end for_i ;
    end sieve2 ;
    % test the sieve2 procedure                                              %
    integer primeMax, arrayMax;
    primeMax := 100;
    arrayMax := ( primeMax div 2 ) - 1;
    begin
        logical array s ( 1 :: arrayMax);
        i_w := 2; % set output field width                                   %
        s_w := 1; % and output separator width                               %
        % find and display the primes                                        %
        sieve2( s, arrayMax );
        write( 2 );
        for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
    end
end.
Output:

Same as the standard version.

ALGOL-M

BEGIN

COMMENT
  FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING 
  CLASSIC SIEVE OF ERATOSTHENES;

% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
    INTEGER R1, R2;
    R1 := N;
    R2 := 1;
    WHILE R1 > R2 DO
        BEGIN
            R1 := (R1+R2) / 2;
            R2 := N / R1;
        END;
    ISQRT := R1;
END;

INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];

LIMIT := 1000;
FALSE := 0;
TRUE := 1;

WRITE("FINDING PRIMES FROM 2 TO",LIMIT);

% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
  FLAGS[I] := TRUE;

% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
  BEGIN
    IF FLAGS[I] = TRUE THEN
        FOR J := (I * I) STEP I UNTIL LIMIT DO
          FLAGS[J] := FALSE;
  END;

% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");  
FOR I := 2 STEP 1 UNTIL LIMIT DO
  BEGIN
    IF FLAGS[I] = TRUE THEN 
      BEGIN
         WRITEON(I);
         COUNT := COUNT + 1;
         COL := COL + 1;
         IF COL > 10 THEN
           BEGIN
             WRITE("");
             COL := 1;
           END;
      END;
  END;

WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");

END
Output:
FINDING PRIMES FROM 2 TO  1000
INTIALIZING ... SIEVING ... PRINTING
    2     3     5     7    11    13    17    19    23    29
   31    37    41    43    47    53    59    61    67    71
                      . . .
  877   881   883   887   907   911   919   929   937   941
  947   953   967   971   977   983   991   997

  168 PRIMES WERE FOUND.

APL

All these versions requires ⎕io←0 (index origin 0).

It would have been better to require a result of the boolean mask rather than the actual list of primes. The list of primes obtains readily from the mask by application of a simple function (here {⍵/⍳⍴⍵}). Other related computations (such as the number of primes < n) obtain readily from the mask, easier than producing the list of primes.

Non-Optimized Version

sieve2{                          
  b1             
  b[2]0         
  2⍵:b             
  p{/⍳⍴}*0.5  
  m1+⌊(-1+p×p)÷p  
  b  p {b[×+⍳]0}¨ m
}

primes2{/⍳⍴}sieve2

The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).

=== Optimized Version ===|

sieve{                           
  b{(×/)¨~¨1}2 3 5
  b[6](6)0 0 1 1 0 1 
  49⍵:b                    
  p3{/⍳⍴}*0.5        
  m1+⌊(-1+p×p)÷2×p        
  b  p {b[×+2×⍳]0}¨ m      
}

primes{/⍳⍴}sieve

The optimizations are as follows:

  • Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
  • Subsequently, only odd multiples of primes > 5 are marked.
  • Multiples of a prime to be marked start at its square.

Examples

   primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

   primes¨ 14
┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐
││││22 32 32 3 52 3 52 3 5 72 3 5 72 3 5 72 3 5 72 3 5 7 112 3 5 7 11
└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘

   sieve 13
0 0 1 1 0 1 0 1 0 0 0 1 0

   +/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534

The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).

AppleScript

on sieveOfEratosthenes(limit)
    script o
        property numberList : {missing value}
    end script
    
    repeat with n from 2 to limit
        set end of o's numberList to n
    end repeat
    repeat with n from 2 to (limit ^ 0.5 div 1)
        if (item n of o's numberList is n) then
            repeat with multiple from (n * n) to limit by n
                set item multiple of o's numberList to missing value
            end repeat
        end if
    end repeat
    
    return o's numberList's numbers
end sieveOfEratosthenes

sieveOfEratosthenes(1000)
Output:
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}

ARM Assembly

Works with: as version Raspberry Pi
/* ARM assembly Raspberry PI  */
/*  program cribleEras.s   */

 /* REMARK 1 : this program use routines in a include file 
   see task Include a file language arm assembly 
   for the routine affichageMess conversion10 
   see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes                       */
/************************************/
.include "../constantes.inc"

.equ MAXI,      101


/*********************************/
/* Initialized data              */
/*********************************/
.data
sMessResult:        .asciz "Prime  : @ \n"
szCarriageReturn:   .asciz "\n"

/*********************************/
/* UnInitialized data            */
/*********************************/
.bss  
sZoneConv:                  .skip 24
TablePrime:                 .skip   4 * MAXI 
/*********************************/
/*  code section                 */
/*********************************/
.text
.global main 
main:                               @ entry of program 
    ldr r4,iAdrTablePrime           @ address prime table
    mov r0,#2                       @ prime 2
    bl displayPrime
    mov r1,#2
    mov r2,#1
1:                                  @ loop for multiple of 2
    str r2,[r4,r1,lsl #2]           @ mark  multiple of 2
    add r1,#2
    cmp r1,#MAXI                    @ end ?
    ble 1b                          @ no loop
    mov r1,#3                       @ begin indice
    mov r3,#1
2:
    ldr r2,[r4,r1,lsl #2]           @ load table élément
    cmp r2,#1                       @ is prime ?
    beq 4f
    mov r0,r1                       @ yes -> display
    bl displayPrime
    mov r2,r1
3:                                  @ and loop to mark multiples of this prime
    str r3,[r4,r2,lsl #2]
    add r2,r1                       @ add the prime
    cmp r2,#MAXI              @ end ?
    ble 3b                          @ no -> loop
4:
    add r1,#2                       @ other prime in table
    cmp r1,#MAXI              @ end table ?
    ble 2b                          @ no -> loop

100:                                @ standard end of the program 
    mov r0, #0                      @ return code
    mov r7, #EXIT                   @ request to exit program
    svc #0                          @ perform the system call
iAdrszCarriageReturn:    .int szCarriageReturn
iAdrsMessResult:         .int sMessResult
iAdrTablePrime:          .int TablePrime

/******************************************************************/
/*      Display prime table elements                                */ 
/******************************************************************/
/* r0 contains the prime */
displayPrime:
    push {r1,lr}                    @ save registers
    ldr r1,iAdrsZoneConv
    bl conversion10                 @ call décimal conversion
    ldr r0,iAdrsMessResult
    ldr r1,iAdrsZoneConv            @ insert conversion in message
    bl strInsertAtCharInc
    bl affichageMess                @ display message
100:
    pop {r1,lr}
    bx lr
iAdrsZoneConv:                   .int sZoneConv  
/***************************************************/
/*      ROUTINES INCLUDE                           */
/***************************************************/
.include "../affichage.inc"
Prime  : 2
Prime  : 3
Prime  : 5
Prime  : 7
Prime  : 11
Prime  : 13
Prime  : 17
Prime  : 19
Prime  : 23
Prime  : 29
Prime  : 31
Prime  : 37
Prime  : 41
Prime  : 43
Prime  : 47
Prime  : 53
Prime  : 59
Prime  : 61
Prime  : 67
Prime  : 71
Prime  : 73
Prime  : 79
Prime  : 83
Prime  : 89
Prime  : 97
Prime  : 101

Arturo

sieve: function [upto][
    composites: array.of: inc upto false
    loop 2..to :integer sqrt upto 'x [
        if not? composites\[x][
            loop range.step: x x^2 upto 'c [
                composites\[c]: true
            ]
        ]
    ]
    result: new []
    loop.with:'i composites 'c [
        unless c -> 'result ++ i
    ]
    return result -- [0,1]
]

print sieve 100
Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

AutoHotkey

Search autohotkey.com: of Eratosthenes
Source: AutoHotkey forum by Laszlo

MsgBox % "12345678901234567890`n" Sieve(20) 

Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime 
   Static zero := 48, one := 49 ; Asc("0"), Asc("1") 
   VarSetCapacity(S,n,one) 
   NumPut(zero,S,0,"char") 
   i := 2 
   Loop % sqrt(n)-1 { 
      If (NumGet(S,i-1,"char") = one) 
         Loop % n//i 
            If (A_Index > 1) 
               NumPut(zero,S,A_Index*i-1,"char") 
      i += 1+(i>2) 
   } 
   Return S 
}

Alternative Version

Sieve_of_Eratosthenes(n){
	arr := []
	loop % n-1
		if A_Index>1
			arr[A_Index] := true

	for i, v in arr	{
		if (i>Sqrt(n))
			break
		else if arr[i]
			while ((j := i*2 + (A_Index-1)*i) < n)
				arr.delete(j)
	}
	return Arr
}

Examples:

n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
	output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return
Output:
.	2	3	.	5	.	7	.	.	.
11	.	13	.	.	.	17	.	19	.
.	.	23	.	.	.	.	.	29	.
31	.	.	.	.	.	37	.	.	.
41	.	43	.	.	.	47	.	.	.
.	.	53	.	.	.	.	.	59	.
61	.	.	.	.	.	67	.	.	.
71	.	73	.	.	.	.	.	79	.
.	.	83	.	.	.	.	.	89	.
.	.	.	.	.	.	97	.	.	.

AutoIt

#include <Array.au3>
$M = InputBox("Integer", "Enter biggest Integer")
Global $a[$M], $r[$M], $c = 1
For $i = 2 To $M -1
	If Not $a[$i] Then
		$r[$c] = $i
		$c += 1
		For $k = $i To $M -1 Step $i
			$a[$k] = True
		Next
	EndIf
Next
$r[0] = $c - 1
ReDim $r[$c]
_ArrayDisplay($r)

AWK

An initial array holds all numbers 2..max (which is entered on stdin); then all products of integers are deleted from it; the remaining are displayed in the unsorted appearance of a hash table. Here, the script is entered directly on the commandline, and input entered on stdin:

$ awk '{for(i=2;i<=$1;i++) a[i]=1;
>       for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) delete a[i*j];
>       for(i in a) printf i" "}'
100
71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3

The following variant does not unset non-primes, but sets them to 0, to preserve order in output:

$ awk '{for(i=2;i<=$1;i++) a[i]=1;
>       for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) a[i*j]=0;
>       for(i=2;i<=$1;i++) if(a[i])printf i" "}'
100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

Now with the script from a file, input from commandline as well as stdin, and input is checked for valid numbers:

# usage:  gawk  -v n=101  -f sieve.awk

function sieve(n) { # print n,":"
	for(i=2; i<=n;      i++) a[i]=1;
	for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;
	for(i=2; i<=n;      i++) if(a[i]) printf i" "
	print ""
}

BEGIN	{ print "Sieve of Eratosthenes:"
	  if(n>1) sieve(n)
	}

	{ n=$1+0 }
n<2	{ exit }
	{ sieve(n) }

END	{ print "Bye!" }

Here is an alternate version that uses an associative array to record composites with a prime dividing it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints out noncomposites greater than 1.

BEGIN {  ULIMIT=100

for ( n=1 ; (n++) < ULIMIT ; ) 
    if (n in S) {
        p = S[n]
        delete S[n]
        for ( m = n ; (m += p) in S ; )  { }
        S[m] = p 
        }
    else  print ( S[(n+n)] = n )
}

Bash

See solutions at UNIX Shell.

BASIC

Works with: FreeBASIC
Works with: RapidQ
DIM n AS Integer, k AS Integer, limit AS Integer

INPUT "Enter number to search to: "; limit
DIM flags(limit) AS Integer

FOR n = 2 TO SQR(limit)
    IF flags(n) = 0 THEN
        FOR k = n*n TO limit STEP n
            flags(k) = 1
        NEXT k
    END IF
NEXT n

' Display the primes
FOR n = 2 TO limit
    IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n

Applesoft BASIC

10  INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20  DIM FLAGS(LIMIT)
30  FOR N = 2 TO SQR (LIMIT)
40  IF FLAGS(N) < > 0 GOTO 80
50  FOR K = N * N TO LIMIT STEP N
60  FLAGS(K) = 1
70  NEXT K
80  NEXT N
90  REM  DISPLAY THE PRIMES
100  FOR N = 2 TO LIMIT
110  IF FLAGS(N) = 0 THEN PRINT N;", ";
120  NEXT N

Atari BASIC

Translation of: Commodore BASIC

Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.

100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160  IF N(P)=0 THEN 200
170  FOR I=P*P TO LI STEP P
180    N(I)=0
190  NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230   IF N(I)=0 THEN 260
240   PRINT I,:C=C+1
250   IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT
Output:
  Ready
  RUN
  LIMIT?100
  2         3         5
  7         11        13
  17        19        23
  29        31        37
  41        43        47
  53        59        61
  67        71        73
  79        83        89
  97

CBASIC

Works with: CB80
limit% = 1000
true% = -1
false% = 0
dim flags%(limit%)

print "Finding primes from 2 to"; limit%

rem - all numbers are potentially prime
for i% = 2 to limit%
  flags%(i%) = true%
next i%

rem - sift for primes
outer.loop.limit% = int%(sqr(limit%))
for i% = 2 to outer.loop.limit%
  if flags%(i%) = true% then \
     for k% = i% * i% to limit% step i%
       if k% <= limit% then flags%(k%) = false%
     next k%
next i%


rem - write out the primes 12 per line
count% = 0
col% = 1
for i% = 2 to limit%
  if flags%(i%) = true% then \
    print using "### "; i%; : \ 
    count% = count% + 1 : \
    col% = col% + 1 
  if col% > 12 then print : col% = 1
next i%
print
print count%; "were found"

end
Output:
Finding primes from 2 to 1000
  2   3   5   7  11  13  17  19  23  29  31  37
 41  43  47  53  59  61  67  71  73  79  83  84
                      <snip>
829 839 853 857 859 863 877 881 883 887 907 911
919 929 937 941 947 953 967 971 997 983 991 997

 168 were found

Commodore BASIC

Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.

100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 :   N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT
Output:
READY.
RUN
LIMIT? 100
 2         3         5         7
 11        13        17        19
 23        29        31        37
 41        43        47        53
 59        61        67        71
 73        79        83        89
 97

READY.

IS-BASIC

100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140   LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170   IF T(I)<>1 THEN
180     FOR K=I*I TO LIMIT STEP I
190       LET T(K)=1
200     NEXT
210   END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240   IF T(I)=0 THEN PRINT I;
250 NEXT

Locomotive Basic

10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
40 FOR n=2 TO SQR(limit)
50 IF f(n)=1 THEN 90
60 FOR k=n*n TO limit STEP n
70 f(k)=1
80 NEXT k
90 NEXT n
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT

MSX Basic

5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation 
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

Sinclair ZX81 BASIC

If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.

A note on FAST and SLOW: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering FAST mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to SLOW mode when we have something to print out.

 10 INPUT L
 20 FAST
 30 DIM N(L)
 40 FOR I=2 TO SQR L
 50 IF N(I) THEN GOTO 90
 60 FOR J=I+I TO L STEP I
 70 LET N(J)=1
 80 NEXT J
 90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I

ZX Spectrum Basic

10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR l
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n

QBasic

Works with: QBasic version 1.1
Works with: QuickBasic version 4.5
limit = 120

DIM flags(limit)
FOR n = 2 TO limit
    flags(n) = 1
NEXT n

PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 TO SQR(limit)
    IF flags(n) = 1 THEN
        FOR i = n * n TO limit STEP n
            flags(i) = 0
    NEXT i
    END IF
NEXT n

FOR n = 1 TO limit
    IF flags(n) THEN PRINT USING "####"; n;
NEXT n
Output:
Prime numbers less than or equal to 120 are: 
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113

BASIC256

arraybase 1
limit = 120

dim  flags(limit)
for n = 2 to limit
    flags[n] = True
next n

print "Prime numbers less than or equal to "; limit; " are: "
for n = 2 to sqr(limit)
    if flags[n] then
        for i = n * n to limit step n
            flags[i] = False
        next i
    end if
next n

for n = 1 to limit
    if flags[n] then print rjust(n,4);
next n
Output:
Prime numbers less than or equal to 120 are: 
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71  73  79  83  89  97 101 103 107 109 113

True BASIC

Translation of: QBasic
LET limit = 120
DIM flags(0)
MAT redim flags(limit)
FOR n = 2 to limit
    LET flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 to sqr(limit)
    IF flags(n) = 1 then
       FOR i = n*n to limit step n
           LET flags(i) = 0
       NEXT i
    END IF
NEXT n
FOR n = 1 to limit
    IF flags(n)<>0 then PRINT  using "####": n;
NEXT n
END
Output:
Same as QBasic entry.

QL SuperBASIC

using 'easy way' to 'add' 2n wheels

Translation of: ZX Spectrum Basic

Sets h$ to 1 for higher multiples of 2 via FILL$, later on sets STEP to 2n; replaces Floating Pt array p(z) with string variable h$(z) to sieve out all primes < z=441 (l=21) in under 1K, so that h$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.

10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h$="10" : h$=h$ & FILL$("01",z)
40      FOR n=3 TO l 
50 IF h$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h$(k)=1 
80      END FOR n   
90 REM Display the primes     
100 FOR n=2 TO z: IF h$(n)=0: PRINT n;", ";

2i wheel emulation of Sinclair ZX81 BASIC

Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the STEP of 2 in line 40 mitigates line 50's inefficiency when going to 90.

 10 INPUT Z
 15 LET L=SQR(Z)
 30 LET H$="10"
 32 FOR J=3 TO Z STEP 2
 34 LET H$=H$ & "01"
 36 NEXT J
 40 FOR I=3 TO L STEP 2
 50 IF H$(I)="1" THEN GOTO 90
 60 FOR J=I*I TO Z STEP I+I
 70 LET H$(J)="1"
 80 NEXT J
 90 NEXT I
110 FOR I=2 TO Z
120 IF H$(I)="0" THEN PRINT I!
130 NEXT I

2i wheel emulation of Sinclair ZX80 BASIC

. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, L (=21 at most on 1K ZX80s for all primes till 439).

Backward-compatible on ZX80s after substituting ** for ^ in line 120.

 10 INPUT L
 15 LET Z=(L+1)*(L- 1)/2
 30 DIM H(Z)
 40 FOR I=3 TO L STEP 2
 50 IF H((I-1)/ 2) THEN GOTO 90
 60 FOR J=I*I TO L*L STEP I+I
 70 LET H((J-1)/ 2)=1
 80 NEXT J
 90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I

Sieve of Sundaram

Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now just a slight transformation away from the Sieve of Sundaram, as transformed as follows: O is the highest value for an Index of succesive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in-between, such that duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent to line 50 in them.

Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.

 10 INPUT O
 15 LET Z=2*O*O+O*2
 30 DIM H(Z)
 40 FOR I=1 TO O
 45 LET A=2*I*I+I*2
 50 REM IF H(A) THEN GOTO 90
 60 FOR J=A TO Z STEP 1+I*2
 65 REM IF H(J) THEN GOTO 80
 70 LET H(J)=1
 80 NEXT J
 90 NEXT I 
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I

Eulerian optimisation

While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of FOR statements (& as their 1K of static RAM is equivalent to L1 cache, even in FAST mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)

 
10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h$=FILL$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h$((2*q+1)*p+q)="1" 
42 GOTO 87
61 IF h$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h$(z%)=" ": PRINT N%(z%)! 
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2

Batch File

:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEEXTENSIONS
rem echo on
set /p n=limit: 
rem set n=100
for /L %%i in (1,1,%n%) do set crible.%%i=1
for /L %%i in (2,1,%n%) do (
  if !crible.%%i! EQU 1 (
    set /A w = %%i * 2
    for /L %%j in (!w!,%%i,%n%) do (
	  set crible.%%j=0
	)
  )
)
for /L %%i in (2,1,%n%) do (
  if !crible.%%i! EQU 1 echo %%i 
)
pause
Output:
limit: 100
2
3
5
7
11
13
17
19
23
29
31
37
41
43
47
53
59
61
67
71
73
79
83
89
97

BBC BASIC

      limit% = 100000
      DIM sieve% limit%
      
      prime% = 2
      WHILE prime%^2 < limit%
        FOR I% = prime%*2 TO limit% STEP prime%
          sieve%?I% = 1
        NEXT
        REPEAT prime% += 1 : UNTIL sieve%?prime%=0
      ENDWHILE
      
      REM Display the primes:
      FOR I% = 1 TO limit%
        IF sieve%?I% = 0 PRINT I%;
      NEXT

BCPL

get "libhdr"

manifest $( LIMIT = 1000 $)

let sieve(prime,max) be
$(  let i = 2
    0!prime := false
    1!prime := false
    for i = 2 to max do i!prime := true
    while i*i <= max do
    $(  if i!prime do
        $(  let j = i*i
            while j <= max do
            $(  j!prime := false
                j := j + i
            $)
        $)
        i := i + 1
    $)
$)

let start() be
$(  let prime = vec LIMIT
    let col = 0
    sieve(prime, LIMIT)
    for i = 2 to LIMIT do
        if i!prime do
        $(  writef("%I4",i)   
            col := col + 1
            if col rem 20 = 0 then wrch('*N')
        $)
    wrch('*N')
$)
Output:
   2   3   5   7  11  13  17  19  23  29  31  37  41  43  47  53  59  61  67  71
  73  79  83  89  97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173
 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281
 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409
 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541
 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659
 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809
 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941
 947 953 967 971 977 983 991 997

Odds-only bit packed array version (64 bit)

This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.

GET "libhdr"

LET lowbit(n) =
    0 -> -1,
    VALOF {
        // The table is byte packed to conserve space; therefore we must
        // unpack the structure.
        //
        LET deBruijn64 = TABLE
            #x0001300239311C03, #x3D3A322A261D1104,
            #x3E373B2435332B16, #x2D27211E18120C05,
            #x3F2F381B3C292510, #x362334152C20170B,
            #x2E1A280F22141F0A, #x190E13090D080706

        LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
        RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
    }

LET primes_upto(limit) =
    limit < 3 -> 0,
    VALOF {
        LET bit_sz = (limit + 1) / 2 - 1
        LET bit, p = ?, ?
        LET q, r = bit_sz >> 6, bit_sz & #x3F
        LET sz = q - (r > 0)
        LET sieve = getvec(sz)

        // Initialize the array
        FOR i = 0 TO q - 1 DO
            sieve!i := -1
        IF r > 0 THEN sieve!q := ~(-1 << r)
        sieve!sz := -1 // Sentinel value to mark the end -
              // (after sieving, we'll never have 64 consecutive odd primes.)

        // run the sieve
        bit := 0
        {
            WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
                bit +:= 1
            p := 2*bit + 3
            q := p*p
            IF q > limit THEN RESULTIS sieve
            r := (q - 3) >> 1
            UNTIL r >= bit_sz DO {
                sieve[r >> 6] &:= ~(1 << (r & #x3F))
                r +:= p
            }
            bit +:= 1
        } REPEAT
    }

MANIFEST { // fields in an iterable
    sieve_start; sieve_bits; sieve_ptr
}

LET prime_iter(sieve) = VALOF {
    LET iter = getvec(2)
    iter!sieve_start := 0
    iter!sieve_bits := sieve!0
    iter!sieve_ptr := sieve
    RESULTIS iter
}

LET nextprime(iter) =
    !iter!sieve_ptr = -1 -> 0,  // guard entry if at the end already
    VALOF {
        LET p, x = ?, ?

        // iter!sieve_start is also a flag to yield 2.
        IF iter!sieve_start = 0 {
            iter!sieve_start := 3
            RESULTIS 2
        }
        x := iter!sieve_bits
        {
            TEST x ~= 0
            THEN {
                p := (lowbit(x) << 1) + iter!sieve_start
                x &:= x - 1
                iter!sieve_bits := x
                RESULTIS p
            }
            ELSE {
                iter!sieve_start +:= 128
                iter!sieve_ptr +:= 1
                x := !iter!sieve_ptr
                IF x = -1 RESULTIS 0
            }
        } REPEAT
    }

LET show(sieve) BE {
    LET iter = prime_iter(sieve)
    LET c, p = 0, ?
    {
        p := nextprime(iter)
        IF p = 0 THEN {
            wrch('*n')
            freevec(iter)
            RETURN
        }
        IF c MOD 10 = 0 THEN wrch('*n')
        c +:= 1
        writef("%8d", p)
    } REPEAT
}

LET start() = VALOF {
    LET n = ?
    LET argv = VEC 20
    LET sz = ?
    LET primes = ?

    sz := rdargs("upto/a/n/p", argv, 20)
    IF sz = 0 RESULTIS 1
    n := !argv!0
    primes := primes_upto(n)
    IF primes = 0 RESULTIS 1 // no array allocated because limit too small
    show(primes)
    freevec(primes)
    RESULTIS 0
}
Output:
$ ./sieve 1000

BCPL 64-bit Cintcode System (13 Jan 2020)
0.000> 
       2       3       5       7      11      13      17      19      23      29
      31      37      41      43      47      53      59      61      67      71
      73      79      83      89      97     101     103     107     109     113
     127     131     137     139     149     151     157     163     167     173
     179     181     191     193     197     199     211     223     227     229
     233     239     241     251     257     263     269     271     277     281
     283     293     307     311     313     317     331     337     347     349
     353     359     367     373     379     383     389     397     401     409
     419     421     431     433     439     443     449     457     461     463
     467     479     487     491     499     503     509     521     523     541
     547     557     563     569     571     577     587     593     599     601
     607     613     617     619     631     641     643     647     653     659
     661     673     677     683     691     701     709     719     727     733
     739     743     751     757     761     769     773     787     797     809
     811     821     823     827     829     839     853     857     859     863
     877     881     883     887     907     911     919     929     937     941
     947     953     967     971     977     983     991     997
0.005> 

Befunge

2>:3g" "-!v\  g30          <
 |!`"O":+1_:.:03p>03g+:"O"`|
 @               ^  p3\" ":<
2 234567890123456789012345678901234567890123456789012345678901234567890123456789

Binary Lambda Calculus

The BLC sieve of Eratosthenes as documented at https://github.com/tromp/AIT/blob/master/characteristic_sequences/primes.lam is the 167 bit program

00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110

The infinitely long output is

001101010001010001010001000001010000010001010001000001000001010000010001010000010001000001000000010001010001010001000000000000010001000001010000000001010000010000010001000001000001010000000001010001010000000000010000000000010001010001000001010000000001000001000001000001010000010001010000000001000000000000010001010001000000000000010000010000000001010001000001000...

BQN

A more efficient sieve (primes below one billion in around a second) is provided as PrimesTo in bqn-libs primes.bqn.

Primes  {
  𝕩2 ? 0 ;             # No primes below 2
  p  𝕊⌈√n𝕩             # Initial primes by recursion
  b  2≤↕n               # Initial sieve: no 0 or 1
  E  {((𝕩×𝕩+⊢))n}  # Multiples of 𝕩 under n, starting at 𝕩×𝕩
  / b E{0¨(𝕨)𝕩}´ p   # Cross them out
}
Output:
   Primes 100
 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
   Primes¨ 107  # Number of primes below 1e0, 1e1, ... 1e6
 0 4 25 168 1229 9592 78498 

Bracmat

This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.

( ( eratosthenes
  =   n j i
    .   !arg:?n
      & 1:?i
      &   whl
        ' ( (1+!i:?i)^2:~>!n:?j
          & ( !!i
            |   whl
              ' ( !j:~>!n
                & nonprime:?!j
                & !j+!i:?j
                )
            )
          )
      & 1:?i
      &   whl
        ' ( 1+!i:~>!n:?i
          & (!!i|put$(!i " "))
          )
  )
& eratosthenes$100
)
Output:

2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97

C

Plain sieve, without any optimizations:

#include <stdlib.h>
#include <math.h>

char*
eratosthenes(int n, int *c)
{
	char* sieve;
	int i, j, m;

	if(n < 2)
		return NULL;

	*c = n-1;     /* primes count */
	m = (int) sqrt((double) n);

	/* calloc initializes to zero */
	sieve = calloc(n+1,sizeof(char));
	sieve[0] = 1;
	sieve[1] = 1;
	for(i = 2; i <= m; i++)
		if(!sieve[i])
			for (j = i*i; j <= n; j += i)
				if(!sieve[j]){
					sieve[j] = 1; 
					--(*c);
				}
  	return sieve;
}

Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.

Another example:

We first fill ones into an array and assume all numbers are prime. Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples! To understand this better, look at the output of the following example.

To print this back, we look for ones in the array and only print those spots.

#include <stdio.h>
#include <malloc.h>
void sieve(int *, int);

int main(int argc, char *argv)
{
    int *array, n=10;
    array =(int *)malloc((n + 1) * sizeof(int));
    sieve(array,n);
    return 0;
}

void sieve(int *a, int n)
{
    int i=0, j=0;

    for(i=2; i<=n; i++) {
        a[i] = 1;
    }

    for(i=2; i<=n; i++) {
        printf("\ni:%d", i);
        if(a[i] == 1) {
            for(j=i; (i*j)<=n; j++) {
                printf ("\nj:%d", j);
                printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);
                a[(i*j)] = 0;
                printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);
            }
        }
    }

    printf("\nPrimes numbers from 1 to %d are : ", n);
    for(i=2; i<=n; i++) {
        if(a[i] == 1)
            printf("%d, ", i);
    }
    printf("\n\n");
}
Output:
i:2
j:2
Before a[2*2]: 1
After a[2*2]: 0
j:3
Before a[2*3]: 1
After a[2*3]: 0
j:4
Before a[2*4]: 1
After a[2*4]: 0
j:5
Before a[2*5]: 1
After a[2*5]: 0
i:3
j:3
Before a[3*3]: 1
After a[3*3]: 0
i:4
i:5
i:6
i:7
i:8
i:9
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7,

C#

Works with: C# version 2.0+
using System;
using System.Collections;
using System.Collections.Generic;

namespace SieveOfEratosthenes
{
    class Program
    {
        static void Main(string[] args)
        {
            int maxprime = int.Parse(args[0]);
            var primelist = GetAllPrimesLessThan(maxprime);
            foreach (int prime in primelist)
            {
                Console.WriteLine(prime);
            }
            Console.WriteLine("Count = " + primelist.Count);
            Console.ReadLine();
        }

        private static List<int> GetAllPrimesLessThan(int maxPrime)
        {
            var primes = new List<int>();
            var maxSquareRoot = (int)Math.Sqrt(maxPrime);
            var eliminated = new BitArray(maxPrime + 1);

            for (int i = 2; i <= maxPrime; ++i)
            {
                if (!eliminated[i])
                {
                    primes.Add(i);
                    if (i <= maxSquareRoot)
                    {
                        for (int j = i * i; j <= maxPrime; j += i)
                        {
                            eliminated[j] = true;
                        }
                    }
                }
            }
            return primes;
        }
    }
}

Richard Bird Sieve

Translation of: F#

To show that C# code can be written in somewhat functional paradigms, the following in an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesBird : IEnumerable<PrimeT> {
    private struct CIS<T> {
      public T v; public Func<CIS<T>> cont;
      public CIS(T v, Func<CIS<T>> cont) {
        this.v = v; this.cont = cont;
      }
    }
    private CIS<PrimeT> pmlts(PrimeT p) {
      Func<PrimeT, CIS<PrimeT>> fn = null;
      fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));
      return fn(p * p);
    }
    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }
    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
      var x = xs.v; var y = ys.v;
      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
    }
    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }
    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
      var nn = n; var ncs = cs;
      for (; ; ++nn) {
        if (nn >= ncs.v) ncs = ncs.cont();
        else return new CIS<PrimeT>(nn, () => minusat(++nn, ncs));
      }
    }
    private CIS<PrimeT> prms() {
      return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }
    public IEnumerator<PrimeT> GetEnumerator() {
      for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;
    }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

Tree Folding Sieve

Translation of: F#

The above code can easily be converted to "odds-only" and a infinite tree-like folding scheme with the following minor changes:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesTreeFold : IEnumerable<PrimeT> {
    private struct CIS<T> {
      public T v; public Func<CIS<T>> cont;
      public CIS(T v, Func<CIS<T>> cont) {
        this.v = v; this.cont = cont;
      }
    }
    private CIS<PrimeT> pmlts(PrimeT p) {
      var adv = p + p;
      Func<PrimeT, CIS<PrimeT>> fn = null;
      fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));
      return fn(p * p);
    }
    private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
      return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));
    }
    private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
      var x = xs.v; var y = ys.v;
      if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
      else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
      else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
    }
    private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {
      var nxtcss = css.cont();
      return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }
    private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
      return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));
    }
    private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
      var nn = n; var ncs = cs;
      for (; ; nn += 2) {
        if (nn >= ncs.v) ncs = ncs.cont();
        else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));
      }
    }
    private CIS<PrimeT> oddprms() {
      return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));
    }
    public IEnumerator<PrimeT> GetEnumerator() {
      yield return 2;
      for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;
    }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code runs over ten times faster than the original Richard Bird algorithm.

Priority Queue Sieve

Translation of: F#

First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:

namespace PriorityQ {
  using KeyT = System.UInt32;
  using System;
  using System.Collections.Generic;
  using System.Linq;
  class Tuple<K, V> { // for DotNet 3.5 without Tuple's
    public K Item1; public V Item2;
    public Tuple(K k, V v) { Item1 = k; Item2 = v; }
    public override string ToString() {
      return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";
    }
  }
  class MinHeapPQ<V> {
    private struct HeapEntry {
      public KeyT k; public V v;
      public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }
    }
    private List<HeapEntry> pq;
    private MinHeapPQ() { this.pq = new List<HeapEntry>(); }
    private bool mt { get { return pq.Count == 0; } }
    private Tuple<KeyT, V> pkmn {
      get {
        if (pq.Count == 0) return null;
        else {
          var mn = pq[0];
          return new Tuple<KeyT, V>(mn.k, mn.v);
        }
      }
    }
    private void psh(KeyT k, V v) { // add extra very high item if none
      if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));
      var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...
      for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {
        var t = pq[ni - 1];
        if (t.k > k) pq[i - 1] = t; else break;
      }
      pq[i - 1] = new HeapEntry(k, v);
    }
    private void siftdown(KeyT k, V v, int ndx) {
      var cnt = pq.Count - 1; var i = ndx;
      for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {
        var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;
        var nk = k;
        if (k > lk) { i = ni; nk = lk; }
        if (nk > rk) { ni += 1; i = ni; }
        if (i != oi) pq[oi] = pq[i]; else break;
      }
      pq[i] = new HeapEntry(k, v);
    }
    private void rplcmin(KeyT k, V v) {
      if (pq.Count > 1) siftdown(k, v, 0); }
    public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }
    public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }
    public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {
      pq.psh(k, v); return pq; }
    public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
      pq.rplcmin(k, v); return pq; }
}


Restricted Base Primes Queue

The following code implements an improved version of the odds-only O'Neil algorithm, which provides the improvements of only adding base prime composite number streams to the queue when the sieved number reaches the square of the base prime (saving a huge amount of memory and considerable execution time, including not needing as wide a range of a type for the internal prime numbers) as well as minimizing stream processing using fusion:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesPQ : IEnumerable<PrimeT> {
    private IEnumerator<PrimeT> nmrtr() {
      MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;
      PrimeT bp = 3; PrimeT q = 9;
      IEnumerator<PrimeT> bps = null;
      yield return 2; yield return 3;
      for (var n = (PrimeT)5; ; n += 2) {
        if (n >= q) { // always equal or less...
          if (q <= 9) {
            bps = nmrtr();
            bps.MoveNext(); bps.MoveNext(); } // move to 3...
          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
          var adv = bp + bp; bp = nbp;
          pq = MinHeapPQ<PrimeT>.push(n + adv, adv, pq);
        }
        else {
          var pk = MinHeapPQ<PrimeT>.peekMin(pq);
          var ck = (pk == null) ? q : pk.Item1;
          if (n >= ck) {
            do { var adv = pk.Item2;
                  pq = MinHeapPQ<PrimeT>.replaceMin(ck + adv, adv, pq);
                  pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;
            } while (n >= ck);
          }
          else yield return n;
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code is at least about 2.5 times faster than the Tree Folding version.


Dictionary (Hash table) Sieve

The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesDict : IEnumerable<PrimeT> {
    private IEnumerator<PrimeT> nmrtr() {
      Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();
      PrimeT bp = 3; PrimeT q = 9;
      IEnumerator<PrimeT> bps = null;
      yield return 2; yield return 3;
      for (var n = (PrimeT)5; ; n += 2) {
        if (n >= q) { // always equal or less...
          if (q <= 9) {
            bps = nmrtr();
            bps.MoveNext(); bps.MoveNext();
          } // move to 3...
          bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
          var adv = bp + bp; bp = nbp;
          dct.Add(n + adv, adv);
        }
        else {
          if (dct.ContainsKey(n)) {
            PrimeT nadv; dct.TryGetValue(n, out nadv); dct.Remove(n); var nc = n + nadv;
            while (dct.ContainsKey(nc)) nc += nadv;
            dct.Add(nc, nadv);
          }
          else yield return n;
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code runs in about three quarters of the time as the above Priority Queue based version for a range of a million primes which will fall even further behind for increasing ranges due to the Dictionary providing O(1) access times as compared to the O(log n) access times for the Priority Queue; the only slight advantage of the PQ based version is at very small ranges where the constant factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".

Best performance: CPU-Cache-Optimized Segmented Sieve

All of the above unbounded versions are really just an intellectual exercise as with very little extra lines of code above the fastest Dictionary based version, one can have an bit-packed page-segmented array based version as follows:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
  class PrimesPgd : IEnumerable<PrimeT> {
    private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes
    private const int BFBTS = PGSZ * 8; // in bits
    private const int BFRNG = BFBTS * 2;
    public IEnumerator<PrimeT> nmrtr() {
      IEnumerator<PrimeT> bps = null;
      List<uint> bpa = new List<uint>();
      uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words
      yield return 2;
      for (var lowi = (PrimeT)0; ; lowi += BFBTS) {
        for (var bi = 0; ; ++bi) {
          if (bi < 1) {
            if (bi < 0) { bi = 0; yield return 2; }
            PrimeT nxt = 3 + lowi + lowi + BFRNG;
            if (lowi <= 0) { // cull very first page
              for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
                if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)
                  for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
                    cbuf[j >> 5] |= 1u << j;
            }
            else { // cull for the rest of the pages
              Array.Clear(cbuf, 0, cbuf.Length);
              if (bpa.Count == 0) { // inite secondar base primes stream
                bps = nmrtr(); bps.MoveNext(); bps.MoveNext();
                bpa.Add((uint)bps.Current); bps.MoveNext();
              } // add 3 to base primes array
              // make sure bpa contains enough base primes...
              for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {
                p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);
              }
              for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {
                var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;
                // adjust start index based on page lower limit...
                if (s >= lowi) s -= lowi;
                else {
                  var r = (lowi - s) % p;
                  s = (r != 0) ? p - r : 0;
                }
                for (var j = (uint)s; j < BFBTS; j += p)
                  cbuf[j >> 5] |= 1u << ((int)j);
              }
            }
          }
          while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;
          if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);
          else break; // outer loop for next page segment...
        }
      }
    }
    public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
    IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
  }

The above code is about 25 times faster than the Dictionary version at computing the first about 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed as compared to a naive "one huge memory array" algorithm to using an array size that is the size of the CPU L1 or L2 caches and using bit-packing to fit more number representations into this limited capacity; in this way RAM memory access times are reduced by a factor of from about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is compensated by a large factor in total execution time.

The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.

In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another about four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).

Note that all of these gains in speed are not due to C# other than it compiles to reasonably efficient machine code, but rather to proper use of the Sieve of Eratosthenes algorithm.

All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):

    static void Main(string[] args) {
      Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
    }

To produce the following output for all tested versions (although some are considerably faster than others):

Output:
15485863

C++

Standard Library

This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.

#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>

// Fills the range [start, end) with 1 if the integer corresponding to the index is composite and 0 otherwise.
// requires: I is RandomAccessIterator
template<typename I>
void mark_composites(I start, I end)
{
    std::fill(start, end, 0);

    for (auto it = start + 1; it != end; ++it)
    {
        if (*it == 0)
        {
            auto prime = std::distance(start, it) + 1;
            // mark all multiples of this prime number as composite.
            auto multiple_it = it;
            while (std::distance(multiple_it, end) > prime)
            {
                std::advance(multiple_it, prime);
                *multiple_it = 1;
            }
        }
    }
}

// Fills "out" with the prime numbers in the range 2...N where N = distance(start, end).
// requires: I is a RandomAccessIterator
//           O is an OutputIterator
template <typename I, typename O>
O sieve_primes(I start, I end, O out)
{
    mark_composites(start, end);
    for (auto it = start + 1; it != end; ++it)
    {
        if (*it == 0)
        {
            *out = std::distance(start, it) + 1;
            ++out;
        }
    }
    return out;
}

int main()
{
    std::vector<uint8_t> is_composite(1000);
    sieve_primes(is_composite.begin(), is_composite.end(), std::ostream_iterator<int>(std::cout, " "));

    // Alternative to store in a vector: 
    // std::vector<int> primes;
    // sieve_primes(is_composite.begin(), is_composite.end(), std::back_inserter(primes));
}

Boost

// yield all prime numbers less than limit. 
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
  std::vector<bool> is_prime(limit, true);
  
  const int sqrt_limit = static_cast<int>(std::sqrt(limit));
  for (int n = 2; n <= sqrt_limit; ++n)
    if (is_prime[n]) {
	yield(n);

	for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n) 
      //NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
	  is_prime[k] = false;
    }

  for (int n = sqrt_limit + 1; n < limit; ++n)
    if (is_prime[n])
	yield(n);
}

Full program:

Works with: Boost
/**
   $ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
 */
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>

#include <boost/lambda/lambda.hpp>

int main(int argc, char *argv[])
{
  using namespace std;
  using namespace boost::lambda;

  int limit = 10000;
  if (argc == 2) {
    stringstream ss(argv[--argc]);
    ss >> limit;

    if (limit < 1 or ss.fail()) {
      cerr << "USAGE:\n  sieve LIMIT\n\nwhere LIMIT in the range [1, " 
	   << numeric_limits<int>::max() << ")" << endl;
      return 2;
    }
  }

  // print primes less then 100
  primesupto(100, cout << _1 << " ");
  cout << endl;  

  // find number of primes less then limit and their sum
  int count = 0;
  uintmax_t sum = 0;
  primesupto(limit, (var(sum) += _1, var(count) += 1));

  cout << "limit sum pi(n)\n" 
       << limit << " " << sum << " " << count << endl;
}

Chapel

This example is incorrect. Please fix the code and remove this message.

Details: Doesn't compile since at least Chapel version 1.20 to 1.24.1.

This solution uses nested iterators to create new wheels at run time:

// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {

        yield prime;

        var last = prime;
        label candidates for candidate in sieve(prime+1) do {
                for composite in last..candidate by prime do {

                        // candidate is a multiple of this prime
                        if composite == candidate {
                                // remember size of last composite
                                last = composite;
                                // and try the next candidate
                                continue candidates;
                        }
                }

                // candidate cannot need to be removed by this sieve
                // yield to parent sieve for checking
                yield candidate;
        }
}

The topmost sieve needs to be started with 2 (the smallest prime):

config const N = 30;
for p in sieve(2) {
        if p > N {
                writeln();
                break;
        }
        write(" ", p);
}

Alternate Conventional Bit-Packed Implementation

The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:

compile with the `--fast` option

use Time;
use BitOps;

type Prime = uint(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
  write("The first 25 primes are:  ");
  for p in primes(100) do write(p, " "); writeln();
  
  var count = 0; for p in primes(1000000) do count += 1;
  writeln("Count of primes to a million is:  ", count, ".");
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in primes(limit) do count += 1;

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
  const szlmt = n / 8;
  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

  for bp in 2 .. n {
    if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
      const s0 = bp * bp;
      if s0 > n then break;
      for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
    }
  }

  for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;

}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

Alternate Odds-Only Bit-Packed Implementation

use Time;
use BitOps;

type Prime = int(32);

config const limit: Prime = 1000000000; // sieve limit

proc main() {
  write("The first 25 primes are:  ");
  for p in primes(100) do write(p, " "); writeln();
  
  var count = 0; for p in primes(1000000) do count += 1;
  writeln("Count of primes to a million is:  ", count, ".");
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in primes(limit) do count += 1;

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

iter primes(n: Prime): Prime {
  const ndxlmt = (n - 3) / 2;
  const szlmt = ndxlmt / 8;
  var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up

  for i in 0 .. ndxlmt { // never gets to the end!
    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
      const bp = i + i + 3;
      const s0 = (bp * bp - 3) / 2;
      if s0 > ndxlmt then break;
      for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
    }
  }

  yield 2;
  for i in 0 .. ndxlmt do
    if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;

}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Count of primes to a million is:  78498.
Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).

Hash Table Based Odds-Only Version

Translation of: Python

code link

Works with: Chapel version 1.25.1
use Time;

config const limit = 100000000;

type Prime = uint(32);

class Primes { // needed so we can use next to get successive values
  var n: Prime; var obp: Prime; var q: Prime;
  var bps: owned Primes?;
  var keys: domain(Prime); var dict: [keys] Prime;
  proc next(): Prime { // odd primes!
    if this.n < 5 { this.n = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = this.q + adv;
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;       
        this.keys += key; this.dict[key] = adv;
      }
      else if this.keys.contains(this.n) { // found a composite; advance...
        const adv = this.dict[this.n]; this.keys.remove(this.n);
        var nkey = this.n + adv;
        while this.keys.contains(nkey) do nkey += adv;
        this.keys += nkey; this.dict[nkey] = adv;
      }
      else { const p = this.n; this.n += 2; return p; }
      this.n += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
  
  var timer: Timer;
  timer.start();

  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }

  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 5761455 primes up to 100000000 in 5195.41 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

As you can see, this is much slower than the array based versions but much faster than previous Chapel version code as the hashing has been greatly improved.

As an alternate to the use of a built-in library, the following code implements a specialized BasePrimesTable that works similarly to the way the Python associative arrays work as to hashing algorithm used (no hashing, as the hash values for integers are just themselves) and something similar to the Python method of handling hash table collisions is used:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;
 
config const limit = 100000000;
 
type Prime = uint(32);

record BasePrimesTable { // specialized for the use here...
  record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
  var cpcty: int = 8; var sz: int = 0;
  var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
  proc grow() {   
    const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
    bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
    for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);   
  }
  proc find(k: Prime): int { // internal get location of value or -1
    const msk = cpcty - 1; var skey = k: int & msk;
    var perturb = k: int; var loop = 8;
    do {
      if bpa[skey].fullkey == k then return skey;
      perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
      loop -= 1; if perturb > 0 then loop = 8;
    } while loop > 0;
    return -1; // not found!
  }
  proc contains(k: Prime): bool { return find(k) >= 0; }
  proc add(k, v: Prime) { // if exists then replaces else new entry
    const fndi = find(k);
    if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
    else {
      sz += 1; if 2 * sz > cpcty then grow();
      const msk = cpcty - 1; var skey = k: int & msk;
      var perturb = k: int; var loop = 8;
      do {
        if bpa[skey].fullkey == 0 {
          bpa[skey] = new BasePrimeEntry(k, v); return; }
        perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
        loop -= 1; if perturb > 0 then loop = 8;
      } while loop > 0;
    }
  }
  proc remove(k: Prime) { // if doesn't exist does nothing
    const fndi = find(k);
    if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
  }
  proc this(k: Prime): Prime { // returns value or 0 if not found
    const fndi = find(k);
    if fndi < 0 then return 0; else return bpa[fndi].val;
  }
}

class Primes { // needed so we can use next to get successive values
  var n: Prime; var obp: Prime; var q: Prime;
  var bps: shared Primes?; var dict = new BasePrimesTable();
  proc next(): Prime { // odd primes!
    if this.n < 5 { this.n = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = this.q + adv;
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;
        this.dict.add(key, adv);
      }
      else if this.dict.contains(this.n) { // found a composite; advance...
        const adv = this.dict[this.n]; this.dict.remove(this.n);
        var nkey = this.n + adv;
        while this.dict.contains(nkey) do nkey += adv;
        this.dict.add(nkey, adv);
      }
      else { const p = this.n; this.n += 2; return p; }
      this.n += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}

proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
 
  var timer: Timer;
  timer.start();
 
  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }
 
  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 
Found 5761455 primes up to 100000000 in 2351.79 milliseconds.

This last code is quite usable up to a hundred million (as here) or even a billion in a little over ten times the time, but is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only for the hash table for the base primes of about eight Kilobytes for sieving to a billion compared to over 60 Megabytes for the monolithic odds-only simple version).

Chapel version 1.25.1 provides yet another option as to the form of the code although the algorithm is the same in that one can now override the hashing function for Chapel records so that they can be used as the Key Type for Hash Map's as follows:

Works with: Chapel version 1.25.1

Compile with the `--fast` compiler command line option

use Time;

use Map;
 
config const limit = 100000000;
 
type Prime = uint(32);
 
class Primes { // needed so we can use next to get successive values
  record PrimeR { var prime: Prime; proc hash() { return prime; } }
  var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
  var bps: owned Primes?;
  var dict = new map(PrimeR, Prime);
  proc next(): Prime { // odd primes!
    if this.n.prime < 5 { this.n.prime = 5; return 3; }
    if this.bps == nil {
      this.bps = new Primes(); // secondary odd base primes feed
      this.obp = this.bps!.next(); this.q = this.obp * this.obp;
    }
    while true {
      if this.n.prime >= this.q { // advance secondary stream of base primes...
        const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
        this.obp = this.bps!.next(); this.q = this.obp * this.obp;       
        this.dict.add(key, adv);
      }
      else if this.dict.contains(this.n) { // found a composite; advance...
        const adv = this.dict.getValue(this.n); this.dict.remove(this.n);
        var nkey = new PrimeR(this.n.prime + adv);
        while this.dict.contains(nkey) do nkey.prime += adv;
        this.dict.add(nkey, adv);
      }
      else { const p = this.n.prime;
             this.n.prime += 2; return p; }
      this.n.prime += 2;
    }
    return 0; // to keep compiler happy in returning a value!
  }
  iter these(): Prime { yield 2; while true do yield this.next(); }
}
 
proc main() {
  var count = 0;
  write("The first 25 primes are:  ");
  for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
  writeln();
 
  var timer: Timer;
  timer.start();
 
  count = 0;
  for p in new Primes() { if p > limit then break; count += 1; }
 
  timer.stop();
  write("Found ", count, " primes up to ", limit);
  writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}

This works in about exactly the same time as the last previous code, but doesn't require special custom adaptations of the associative array so that the standard library Map can be used.

Functional Tree Folding Odds-Only Version

Chapel isn't really a very functional language even though it has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced, iterations and has first class functions (FCF's) and lambdas (anonymous functions), these last can't be closures (capture variable bindings from external scope(s)), nor can the work around of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:

Translation of: Nim

code link

Works with: 1.22 version - compile with the --fast compiler command line flag for full optimization
use Time;

type Prime = uint(32);

config const limit = 1000000: Prime;

// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
  var head: Prime;
  proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}

class PrimeMultiples: PrimeCIS {
  var adv: Prime;
  override proc next(): shared PrimeCIS {
    return new shared PrimeMultiples(
      this.head + this.adv, this.adv): shared PrimeCIS; }
}

class PrimeCISCIS { // base stream of prime streams; never used directly...
  var head: shared PrimeCIS;
  proc init() { this.head = new shared PrimeCIS(); }
  proc next(): shared PrimeCISCIS {
    return new shared PrimeCISCIS(); }
}

class AllMultiples: PrimeCISCIS {
  var bps: shared PrimeCIS;
  proc init(bsprms: shared PrimeCIS) {
    const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
    this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;
    this.bps = bsprms;
  }
  override proc next(): shared PrimeCISCIS {
    return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}

class Union: PrimeCIS {
  var feeda, feedb: shared PrimeCIS;
  proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
    const ahd = fda.head; const bhd = fdb.head;
    this.head = if ahd < bhd then ahd else bhd;
    this.feeda = fda; this.feedb = fdb;
  }
  override proc next(): shared PrimeCIS {    
    const ahd = this.feeda.head; const bhd = this.feedb.head;
    if ahd < bhd then
      return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
    if ahd > bhd then
      return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
    return new shared Union(this.feeda.next(),
                            this.feedb.next()): shared PrimeCIS;
  }
}

class Pairs: PrimeCISCIS {
  var feed: shared PrimeCISCIS;
  proc init(fd: shared PrimeCISCIS) {
    const fs = fd.head; const sss = fd.next(); const ss = sss.head;
    this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
  }
  override proc next(): shared PrimeCISCIS {
    return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}

class Composites: PrimeCIS {
  var feed: shared PrimeCISCIS;
  proc init(fd: shared PrimeCISCIS) {
    this.head = fd.head.head; this.feed = fd;
  }
  override proc next(): shared PrimeCIS {
    const fs = this.feed.head.next();
    const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
    const ncs = new shared Composites(prs): shared PrimeCIS;
    return new shared Union(fs, ncs): shared PrimeCIS;
  }
}

class OddPrimesFrom: PrimeCIS {
  var cmpsts: shared PrimeCIS;
  override proc next(): shared PrimeCIS {
    var n = head + 2; var cs = this.cmpsts;
    while true {
      if n < cs.head then
        return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
      n += 2; cs = cs.next();
    }
    return this.cmpsts; // never used; keeps compiler happy!
  }
}

class OddPrimes: PrimeCIS {
  proc init() { this.head = 3; }
  override proc next(): shared PrimeCIS {
    const bps = new shared OddPrimes(): shared PrimeCIS;
    const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
    const cmpsts = new shared Composites(mlts): shared PrimeCIS;
    return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
  }
}

iter primes(): Prime {
  yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
  while true { yield cur.head; cur = cur.next(); }
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Found 78498 primes up to 1000000 in 344.859 milliseconds.

Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).

The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits) although doing so is slower than the Hash Table version above and also takes more memory as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has a worst asymptotic performance with an extra `log(n)` factor where `n` is the sieving range; this can be shown by running the above program with `--limit=10000000` run time command line option to sieve to ten million which takes about 4.5 seconds to count the primes up to ten million (a factor of ten higher range, but much higher than the expected increased factor of about 10 per cent extra as per the Hash Table version with about 20 per cent more operations times the factor of ten for this version). Other than for the extra operations, this version is generally slower due to the time to do the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.

A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version

To take advantage of the features that make Chapel shine, we need to use it to do some parallel computations, and to efficiently do that for the Sieve of Eratosthenes, we need to divide the work into page segments where we can assign each largish segment to a separate thread/task; this also improves the speed due to better cache associativity with most memory accesses to values that are already in the cache(s). Once we have divided the work, Chapel offers lots of means to implement the parallelism but to be a true Sieve of Eratosthenes, we need to have the ability to output the results in order; with many of the convenience mechanisms not doing that, the best/simplest option is likely task parallelism with the output results assigned to an rotary indexed array containing the `sync` results. It turns out that, although the Chapel compiler can sometimes optimize the code so the overhead of creating tasks is not onerous, for this case where the actual tasks are somewhat complex, the compiler can't recognize that an automatically generated thread pool(s) are required so we need to generate the thread pool(s) manually. The code that implements the multi-threading of page segments using thread pools is as follows:

Works with: 1.24.1 version - compile with the --fast compiler command line flag for full optimization
use Time; use BitOps; use CPtr;

type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);

config const LIMIT = 1000000000: Prime;

config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
        "L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
        "L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");

const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
                            11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;

// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
                     if L1 == 32 then 23000 else 43390)
                       - WHLPRMS.size + 1; // extra one for marker!

// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;

var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
  const hi = WHLPRMS.size - 1;
  const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
  // contains wheel pattern primes skipping the small wheel prime (2)!...
  // never advances past the first base prime arr as it ends with a huge!...
  for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
                                    else 0x7FFFFFFF): BasePrime; // last huge!
  var whlbpas = new shared BasePrimeArrs(whlhd);
  var whlstrts = new StrtsArr({rng});
  wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
  // eliminate wheel primes from the WHLPTRN buffer!...
  wp.cmpsts[0] = 0xFF: uint(8);
}

// the following two must be classes for compability with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
  var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
  proc init() {}
  proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
  proc deinit() { dom = { 0 .. -1 }; }

  proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
    const sz = cmpsts.size; const mvsz = min(sz, 16384);
    var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
    for i in 0 .. sz - 1 by 16384 {
      c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
               c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);    
      mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
    }
  }

  proc count(btlmt: int) { // count by 64 bits using CPU popcount...
    const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
    const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
    var i = 0; var cnt = (lstwrd * 64 + 64): int;
    while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
    return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;    
  }

  // most of the time is spent doing culling operations as follows!...
  proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
                                        ref strts: StrtsArr) {
    const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
    const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
    const strtssz = strts.strtsarr.size;
    // C pointer for speed magic!...
    const cmpstsp = c_ptrTo(cmpsts[0]);
    const strtsp = c_ptrTo(strts.strtsarr);

    // first fill the strts array with pre-calculated start addresses...
    var i = 0; for bp in bpas {
      // calculate page start address for the given base prime...
      const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
      const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
      if s0 > ndxlmt then {
        if i < strtssz then strtsp[i] = -1: int(32); break; }
      var s = 0: int;
      if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
      else { const r = (lwi: Prime - s0) % bbp;
            if r == 0 then s = 0: int; else s = (bbp - r): int; };
      if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
      if i < strtssz { strtsp[i] = -1; i = strtssz; }
      // cull the full buffer for this given base prime as usual...
      // only works up to limit of int(32)**2!!!!!!!!
      while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
    }

    // cull the smaller sub buffers according to the strts array...
    for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
      i = 0; for bp in bpas { // bp never bigger than uint(32)!
        // cull the sub buffer for this given base prime...
        var s = strtsp[i]: int; if s < 0 then break;
        var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
        if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
          const slmt = s + bpi * 8 - 1;
          while s <= slmt {
            const bmi = s & 7; const msk = bitmsk[bmi];
            var c = s >> 3; const clmt = sbtlmt >> 3;
            while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
            nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
          }
          strtsp[i] = nxt: int(32); i += 1;
        }
        else { while s <= sbtlmt { // standard cull loop...
                 cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
               strtsp[i] = s: int(32); i += 1; }
      }
    }
  }
}

// a generic record that contains a page result generating function;
// allows manual iteration through the use of the next() method;
// multi-threaded through the use of a thread pool...
class PagedResults {
  const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
  var lwi: PrimeNdx; var bsbtsz: int;
  var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
  var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
  var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
  var qi: int = 0;
  var wrkq$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
  var rsltsq$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;

  proc init(cvclsr, li: PrimeNdx, bsz: int) {
    cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }

  proc deinit() { // kill the thread pool when out of scope...
    if bpas == nil then return; // no thread pool!
    for i in wrkq$.domain {
      wrkq$[i].writeEF(-1); while true { const r = rsltsq$[i].readFE();
        if r == nil then break; }
    }
  }
 
  proc next(): cnvrtrclsr(lwi, sbs(0)).type {   
    proc dowrk(ri: int) { // used internally!...
      while true {
        const li = wrkq$[ri].readFE(); // following to kill thread!
        if li < 0 { rsltsq$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
        sbs[ri].fill(li);
        sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
        rsltsq$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
      }
    }
    if this.bpas == nil { // init on first use; avoids data race!
      this.bpas = new BasePrimeArrs();
      if this.bsbtsz < CPUL1CACHE {
        this.sbs = new SieveBuffer(bsbtsz);
        this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
      }
      else {
        this.sbs = new SieveBuffer(CPUL2CACHE);
        this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
      }
      // start threadpool and give it inital work...
      for i in rsltsq$.domain {
        begin with (const in i) dowrk(i);
        this.wrkq$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
      }
    }
    const rslt = this.rsltsq$[qi].readFE();
    this.wrkq$[qi].writeEF(this.lwi);
    this.lwi += this.sbs[qi].cmpsts.size * 8;
    this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
    return rslt;
  }
 
  iter these() { while lwi >= 0 do yield next(); }
}

// the sieve buffer to base prime array converter closure...
record SB2BPArr {  
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
    const bsprm = (lwi + lwi + 3): BasePrime;
    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
    var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
                        arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
                      i += 1; }
    return arr;
  }
}

// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
  var head: shared BasePrimeArr;
  var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
  var lock$: sync bool = true;
  var feed: shared PagedResults(SB2BPArr) =
    new shared PagedResults(new SB2BPArr(), 65536, 65536);

  proc init() { // make our own first array to break data race!
    var sb = new SieveBuffer(256); sb.fill(0);
    const sb2 = new SB2BPArr();
    head = sb2(0, sb): shared BasePrimeArr;
    this.complete(); // fake base primes!
    sb = new SieveBuffer(65536); sb.fill(0);
    // use (completed) self as source of base primes!
    var strts = new StrtsArr({ 0 .. 256 });
    sb.cull(0, 65536, this, strts);
    // replace head with new larger version culled using fake base primes!...
    head = sb2(0, sb): shared BasePrimeArr;
  }

  // for initializing for use by the fillWHLPTRN proc...
  proc init(hd: shared BasePrimeArr) {
    head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
  }

  // for initializing lazily extended list as required...
  proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }

  proc next(): shared BasePrimeArrs {
    if this.tail == nil { // in case other thread slipped through!     
      if this.lock$.readFE() && this.tail == nil { // empty sync -> block others!
        const nhd = this.feed.next(): shared BasePrimeArr;
        this.tail = new shared BasePrimeArrs(nhd , this.feed);
      }
      this.lock$.writeEF(false); // fill the sync so other threads can do nothing!
    }
    return this.tail: shared BasePrimeArrs; // necessary cast!
  }
 
  iter these(): BasePrime {
    for bp in head.bparr do yield bp; var cur = next();
    while true {
      for bp in cur.head.bparr do yield bp; cur = cur.next(); }
  }
}

record SB2PrmArr {
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
    const bsprm = (lwi + lwi + 3): Prime;
    const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
    var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
    while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
                        arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
                      i += 1; }
    return arr;
  }
}

iter primes(): Prime {
  for p in WHLPRMS do yield p: Prime;
  for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
    for p in pa!.prmarr do yield p;
}

// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }

// a class that emulates a closure and a return value...
record SB2Cnt {
  const nxtlmt: PrimeNdx;
  proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
    const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
    const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
    return new shared CntNxt(sb.count(btlmt), lstndx);
  }
}

// couut primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
  const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
  for p in WHLPRMS { if p > lmt then break; count += 1; }
  if lmt < FRSTSVPRM then return count;
  for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
    count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
  }
  return count;
}

// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }

cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");

write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");
writeln("using ", NUMTHRDS, " threads.");

var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();

write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
Output:
The first 25 primes are:  2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
There are 78498 primes up to a million.
Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads.
Found 50847534 primes up to 1000000000 in 128.279 milliseconds.

Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).

Note that the above code does implement some functional concepts as in a memoized lazy list of base prime arrays, but as this is used at the page level, the slowish performance doesn't impact the overall execution time much and the code is much more elegant in using this concept such that we compute new pages of base primes as they are required for increasing range.

Some of the most tricky bits due to having thread pools is stopping and de-initializing when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.

The tight inner loops for culling composite number representations have been optimized to some extent in using "loop unpeeling" for smaller base primes to simplify the loops down to simple masking by a constant with eight separate loops for the repeating pattern over bytes and culling by sub buffer CPU L1 cache sizes over the outer sieve buffer size of the CPU L2 cache size in order to make the task work-sized chunks larger for less task context switching overheads and for reduced time lost to culling start address calculations per base prime (which needs to use some integer division that is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits or 1e12 for the one Megabit CPU L2 cache size many mid-range Intel CPU's have currently when used for multi-threading (half of the actual size for Hyper-Threaded - HT - threads as they share both the L1 and the L2 caches over the pairs of Hyper-Threaded (HT) threads per core).

Although this code can be used for much higher sieving ranges, it is not recommended due to not yet being tuned for better efficiency above 1e12; there are no checks limiting the user to this range, but, as well as decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows but these will be for huge sieving ranges taking days -> weeks -> months -> years to execute on common desktop CPU's.

A further optimization used is to create a pre-culled `WHLPTRN` `SieveBuffer` where the odd primes (since we cull odds-only) of 3, 5, 7, 11, 13, and 17 have already been culled and using that to pre-fill the page segment buffers so that no culling by these base prime values is required, this reduces the number of operations by about 45% compared to if it wasn't done but the ratio of better performance is only about 34.5% better as this changes the ratio of (fast) smaller base primes to larger (slower) ones.

All of the improvements to this point allow the shown performance as per the displayed output for the above program; using a command line argument of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion - 1e11 - on this computer, which has cache sizes of that amount and no Hyper-Threading - HT), it can count the primes to 1e11 in about 17.5 seconds using the above mentioned CPU. It will be over two times faster than this using a more modern desktop CPU such as the Intel Core i7-9700K which has twice as many effective cores, a higher CPU clock rate, is about 10% to 15% faster due the a more modern CPU architecture which is three generations newer. Of course using a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core CPU's will only give these speed gains for large sieving ranges such as 1e11 and above since otherwise there aren't enough work chunks to go around for all the threads available!

Incredibly, even run single threaded (argument of `--NUMTHRDS=1`) this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`) with both using the same size of CPU L1 cache buffer of 16 Kilobytes; This implementation does not yet have the level of wheel optimization of the Sieve of Atkin as it has only the limited wheel optimization of Odds-Only plus the use of the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges, and approach the speed of Kim Walisch's "primesieve". However, not having primitive element pointers and pointer operations, there are some optimizations used that Kim Walisch's "primesieve" uses of extreme loop unrolling that mean that it can never quite reach the speed of "primeseive" by about 20% to 30%.

The above code is a fairly formidable benchmark, which I have also written in Fortran as in likely the major computer language that is comparable. I see that Chapel has the following advantages over Fortran:

1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.

2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expression this paradigm using modules seems awkward by comparison.

3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.

4) It has several modern forms of managing concurrency built in from the beginning rather than being add-on's or just being the ability to call through to OpenMP/MPI.

That said, it also as the following disadvantages, at least as I see it:

1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.

2) It's just my personal opinion, but so much about forms of expression have been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; Most modern languages at least dispense with the latter.

3) Some programming features offered are still being defined, although most evolutionary changes now no longer are breaking code changes.

Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same; for this particular task they are about the same if one were to implement the same algorithmic optimizations other than that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel as Fortran has some limited form of pointers, although not the full set of pointer operators that C/C++ like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compile and due to the availablity of (limited) pointer operations.

The primary additional optimization available to Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer, with the other major improvement to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.

Clojure

(defn primes< [n]
   (remove (set (mapcat #(range (* % %) n %)
                        (range 2 (Math/sqrt n))))
           (range 2 n)))

The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with (println (time (count (primes< 1000000)))), it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.

It may be written using the into #{} function to run slightly faster due to the set function being concerned with only distinct elements whereas the into #{} only does the conjunction, and even at that doesn't do that much as it does the conjunction to an empty sequence, the code as follows:

(defn primes< [n]
   (remove (into #{}
                 (mapcat #(range (* % %) n %)
                         (range 2 (Math/sqrt n))))
           (range 2 n)))

The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.

The following code also uses the into #{} transducer but has been slightly wheel-factorized to sieve odds-only:

(defn primes< [n]
   (if (< n 2) ()
     (cons 2 (remove (into #{}
                           (mapcat #(range (* % %) n %)
                                   (range 3 (Math/sqrt n) 2)))
                     (range 3 n 2)))))

The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.

The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        cmpsts (boolean-array (inc n)),
        cullp (fn [p]
                (loop [i (* p p)]
                  (if (<= i n)
                    (do (aset cmpsts i true)
                        (recur (+ i p))))))]
    (do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
                                       (range 2 (inc root)))))
        (filter #(not (aget cmpsts %)) (range 2 (inc n))))))

Alternative implementation using Clojure's side-effect oriented list comprehension.

 (defn primes-to
  "Returns a lazy sequence of prime numbers less than lim"
  [lim]
  (let [refs (boolean-array (+ lim 1) true)
        root (int (Math/sqrt lim))]
    (do (doseq [i (range 2 lim)
                :while (<= i root)
                :when (aget refs i)]
          (doseq [j (range (* i i) lim i)]
            (aset refs j false)))
        (filter #(aget refs %) (range 2 lim)))))

Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.

(defn primes-to
  "Returns a lazy sequence of prime numbers less than lim"
  [lim]
  (let [max-i (int (/ (- lim 1) 2))
        refs (boolean-array max-i true)
        root (/ (dec (int (Math/sqrt lim))) 2)]
    (do (doseq [i (range 1 (inc root))
                :when (aget refs i)]
          (doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
            (aset refs j false)))
        (cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))

This implemantation is about twice as fast as the previous one and uses only half the memory. From the index of the array, it calculates the value it represents as (2*i + 1), the step between two indices that represent the multiples of primes to mark as composite is also (2*i + 1). The index of the square of the prime to start composite marking is 2*i*(i+1).

Alternative very slow entirely functional implementation using lazy sequences

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (letfn [(nxtprm [cs] ; current candidates
            (let [p (first cs)]
              (if (> p (Math/sqrt n)) cs
                (cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
                                              set (remove cs) rest)))))))]
    (nxtprm (range 2 (inc n)))))

The reason that the above code is so slow is that it has has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector where only the culled values are referenced, and due to the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.

Version based on immutable Vector's

Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [max-prime]
  (let [sieve (fn [s n]
                (if (<= (* n n) max-prime)
                  (recur (if (s n)
                           (reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))
                           s)
                         (inc n))
                  s))]
    (->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))
             (assoc 0 false)
             (assoc 1 false)
             (sieve 2))
         (map-indexed #(vector %2 %1)) (filter first) (map second))))

The above code is still quite slow due to the cost of the immutable copy-on-modify operations.

Odds only bit packed mutable array based version

The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:

(set! *unchecked-math* true)

(defn primes-to
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        rootndx (long (/ (- root 3) 2)),
        ndx (long (/ (- n 3) 2)),
        cmpsts (long-array (inc (/ ndx 64))),
        isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))
                               (bit-shift-left 1 (bit-and % 63)))),
        cullp (fn [i]
                (let [p (long (+ i i 3))]
	                (loop [i (bit-shift-right (- (* p p) 3) 1)]
	                  (if (<= i ndx)
	                    (do (let [w (bit-shift-right i 6)]
	                    (aset cmpsts w (bit-or (aget cmpsts w)
	                                           (bit-shift-left 1 (bit-and i 63)))))
	                        (recur (+ i p))))))),
        cull (fn [] (loop [i 0] (if (<= i rootndx)
                                  (do (if (isprm i) (cullp i)) (recur (inc i))))))]
    (letfn [(nxtprm [i] (if (<= i ndx)
                          (cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]
                                                              (if (or (> i ndx) (isprm i)) i
                                                                (recur (inc i)))))))))]
      (if (< n 2) nil
        (cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))

The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization other than the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.

If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead as compared to the actual composite number culling time, would get even higher if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times, most important in the culling operations, and yet higher again if multi-processing were used to share to page segment processing across CPU cores.

The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed come computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):

(defn primes-tox
  "Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
  [n]
  (let [root (-> n Math/sqrt long),
        rootndx (long (/ (- root 3) 2)),
        ndx (max (long (/ (- n 3) 2)) 0),
        lmt (quot ndx 64),
        cmpsts (long-array (inc lmt)),
        cullp (fn [i]
                (let [p (long (+ i i 3))]
	                (loop [i (bit-shift-right (- (* p p) 3) 1)]
	                  (if (<= i ndx)
	                    (do (let [w (bit-shift-right i 6)]
                            (aset cmpsts w (bit-or (aget cmpsts w)
                                                   (bit-shift-left 1 (bit-and i 63)))))
                          (recur (+ i p))))))),
        cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)
                                                 (bit-shift-left -2 (bit-and ndx 63))))
                        (loop [i 0]
                          (when (<= i rootndx)
                            (when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))
                                                   (bit-shift-left 1 (bit-and i 63))))
                              (cullp i))
                            (recur (inc i))))))
        numprms (fn []
                  (let [w (dec (alength cmpsts))] ;; fast results count bit counter
                    (loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]
                      (if (> i w) cnt
                        (recur (inc i) 
                               (- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]
    (if (< n 2) nil
      (cons 2 (if (< n 3) nil
                (do (cull)
                    (deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???
                      clojure.lang.ISeq
                        (first [_] (if (nil? cmpsa) nil (+ i i 3)))
                        (next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil
                                                          (OPSeq.
                                                            (loop [j (inc i)]
                                                              (let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))
                                                                                       (bit-shift-left 1 (bit-and j 63))))]
                                                                (if p? j (recur (inc j)))))
                                                            cmpsa ncnt tcnt))))
                        (more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)
                                                             (.next this))))
                        (cons [this o] (clojure.core/cons o this))
                        (empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))
                        (equiv [this o] (if (or (not= (type this) (type o))
                                                (not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))
                                                (not= i (.i ^OPSeq o))) false true))
                        clojure.lang.Counted
                          (count [_] (- tcnt cnt))
                        clojure.lang.Seqable
                          (clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))
                        clojure.lang.IReduce
                          (reduce [_ f v] (let [c (- tcnt cnt)]
                                            (if (<= c 0) nil
                                              (loop [ci i, n c, rslt v]
                                                (if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))
                                                                    (bit-shift-left 1 (bit-and ci 63))))
                                                  (let [rrslt (f rslt (+ ci ci 3)),
                                                        rdcd (reduced? rrslt),
                                                        nrslt (if rdcd @rrslt rrslt)]
                                                    (if (or (<= n 1) rdcd) nrslt
                                                      (recur (inc ci) (dec n) nrslt)))
                                                  (recur (inc ci) n rslt))))))
                          (reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)
                                                              (.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))
                        clojure.lang.Sequential
                        Object
                          (toString [this] (if (= cnt tcnt) "()"
                                             (.toString (seq (map identity this)))))) 
                    (->OPSeq 0 cmpsts 0 (numprms))))))))

'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.

Due to the better efficiency of the custom CIS type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.

Under Clojure 1.7.0, one can use '(time (reduce (fn [] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total with about half the time used for sieving the array and half for computing the sum.

To show how sensitive Clojure is to forms of expression of functions, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long even though it is using the same internal routine for most of the calculation due to the function not having the type coercion's.

Before one considers that this code is suitable for larger ranges, it is still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (produces about a four times speed up), maximal wheel factorization (to make it another about four times faster), and the use of multi-processing (for a further gain of about 4 times for a multi-core desktop CPU such as an Intel i7), will make the sieving/counting code about 50 times faster than this, although there will only be a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.

Unbounded Versions

For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.

The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:

A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)

(defn primes-Bird
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
          (union [xs ys] (let [xv (first xs), yv (first ys)]
                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),
          (mrgmltpls [mltplss] (cons (first (first mltplss))
                                     (lazy-seq (union (next (first mltplss))
                                                      (mrgmltpls (next mltplss)))))),
          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
                                    (if (< n (first cmpsts))
                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
                                      (recur (+ n 2) (next cmpsts)))))]
    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                         (minusStrtAt 5 cmpsts)))))
        (cons 2 (lazy-seq oddprms)))))

The above code is quite slow due to both that the data structure is a linear merging of prime multiples and due to the slowness of the Clojure sequence operations.

A Clojure version of the tree folding sieve using Lazy Sequences

The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:

(defn primes-treeFolding
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (cons c (lazy-seq (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
          (union [xs ys] (let [xv (first xs), yv (first ys)]
                           (if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
                             (if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
                               (cons xv (lazy-seq (union (next xs) (next ys)))))))),
          (pairs [mltplss] (let [tl (next mltplss)]
                             (cons (union (first mltplss) (first tl))
                                   (lazy-seq (pairs (next tl)))))),
          (mrgmltpls [mltplss] (cons (first (first mltplss))
                                     (lazy-seq (union (next (first mltplss))
                                                      (mrgmltpls (pairs (next mltplss))))))),
          (minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
                                    (if (< n (first cmpsts))
                                      (cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
                                      (recur (+ n 2) (next cmpsts)))))]
    (do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                         (minusStrtAt 5 cmpsts)))))
        (cons 2 (lazy-seq oddprms)))))

The above code is still slower than it should be due to the slowness of Clojure's sequence operations.

A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence

The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:

(deftype CIS [v cont]
  clojure.lang.ISeq
    (first [_] v)
    (next [_] (if (nil? cont) nil (cont)))
    (more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
    (cons [this o] (clojure.core/cons o this))
    (empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
    (equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
                                                (if (or (not= (type cis1) (type cis2))
                                                        (not= (.v cis1) (.v ^CIS cis2))
                                                        (and (nil? (.cont cis1))
                                                              (not (nil? (.cont ^CIS cis2))))
                                                        (and (nil? (.cont ^CIS cis2))
                                                              (not (nil? (.cont cis1))))) false
                                                  (if (nil? (.cont cis1)) true
                                                    (recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
    (count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
                                            (recur ((.cont cis)) (inc cnt)))))
  clojure.lang.Seqable
    (seq [this] (if (and (nil? v) (nil? cont)) nil this))
  clojure.lang.Sequential
  Object
    (toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))

(defn primes-treeFoldingx
  "Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
  []
  (letfn [(mltpls [p] (let [p2 (* 2 p)]
                        (letfn [(nxtmltpl [c]
                                  (->CIS c (fn [] (nxtmltpl (+ c p2)))))]
                          (nxtmltpl (* p p))))),
          (allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
          (union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
                                     (if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
                                       (if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
                                         (->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
          (pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
                                  (->CIS (union (.v mltplss) (.v tl))
                                         (fn [] (pairs ((.cont tl))))))),
          (mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
                                           (fn [] (union ((.cont ^CIS (.v mltplss)))
                                                         (mrgmltpls (pairs ((.cont mltplss)))))))),
          (minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
                                         (if (< n (.v cmpsts))
                                           (->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
                                           (recur (+ n 2) ((.cont cmpsts))))))]
    (do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
                                       (minusStrtAt 5 cmpsts)))))
        (->CIS 2 (fn [] oddprms)))))

'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.

The above code is useful for ranges up to about fifteen million primes, which is about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit packed version which can reasonably be used for ranges about 100 times as large.

Incremental Hash Map based unbounded "odds-only" version

The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:

(defn primes-hashmap
  "Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"
  []
  (letfn [(nxtoddprm [c q bsprms cmpsts]
            (if (>= c q) ;; only ever equal
              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
                (recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))
              (if (contains? cmpsts c)
                (recur (+ c 2) q bsprms
                       (let [adv (cmpsts c), ncmps (dissoc cmpsts c)]
                         (assoc ncmps
                                (loop [try (+ c adv)] ;; ensure map entry is unique
                                  (if (contains? ncmps try)
                                    (recur (+ try adv)) try)) adv)))
                (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))

The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation even though it has computational complexity of (n log log n) rather than the worse (n log n log log n) for the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.

There is almost no benefit of converting the above code to use a CIS as most of the time is expended in the hash map functions.

Incremental Priority Queue based unbounded "odds-only" version

In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap algorithm. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions of (emtpy-pq) initialization, (getMin-pq pq) to examinte the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replaace the minimum entry with a key/value pair as given (it is more efficient that if functions were provided to delete and then re-insert entries in the queue; there is therefore no "delete" or other queue functions supplied as the algorithm does not requrie them:

(deftype PQEntry [k, v]
  Object
    (toString [_] (str "<" k "," v ">")))
(deftype PQNode [ntry, lft, rght]
  Object
    (toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))

(defn empty-pq [] nil)

(defn getMin-pq [^PQNode pq]
  (if (nil? pq)
    nil
    (.ntry pq)))

(defn insert-pq [^PQNode opq ok v]
  (loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
    (if (nil? pq)
      (cont (->PQNode kv nil nil))
      (let [k (.k kv),
            ^PQEntry kvn (.ntry pq), kn (.k kvn),
            l (.lft pq), r (.rght pq)]
        (if (<= k kn)
          (recur kvn r #(cont (->PQNode kv % l)))
          (recur kv r #(cont (->PQNode kvn % l))))))))

(defn replaceMinAs-pq [^PQNode opq k v]
  (let [^PQEntry kv (->PQEntry k v)]
    (if (nil? opq) ;; if was empty or just an entry, just use current entry
      (->PQNode kv nil nil)
      (loop [pq opq, cont identity]
        (let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
          (cond ;; if left us empty, right must be too
            (nil? l)
              (cont (->PQNode kv nil nil)),
            (nil? r) ;; we only have a left...
              (let [^PQEntry kvl (.ntry l), kl (.k kvl)]
                    (if (<= k kl)
                      (cont (->PQNode kv l nil))
                      (recur l #(cont (->PQNode kvl % nil))))),
            :else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
                        ^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
                    (if (and (<= k kl) (<= k kr))
                      (cont (->PQNode kv l r))
                      (if (<= kl kr)
                        (recur l #(cont (->PQNode kvl % r)))
                        (recur r #(cont (->PQNode kvr l %))))))))))))

Note that the above code is written partially using continuation passing style so as to leave the "recur" calls in tail call position as required for efficient looping in Clojure; for practical sieving ranges, the algorithm could likely use just raw function recursion as recursion depth is unlikely to be used beyond a depth of about ten or so, but raw recursion is said to be less code efficient.

The actual incremental sieve using the Priority Queue is as follows, which code uses the same optimizations of postponing the addition of prime composite streams to the queue until the square root of the currently sieved number is reached and using a secondary base primes stream to generate the primes composite stream markers in the queue as was used for the Hash Map version:

(defn primes-pq
  "Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"
  []
  (letfn [(nxtoddprm [c q bsprms cmpsts]
            (if (>= c q) ;; only ever equal
              (let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
                (recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))
              (let [mn (getMin-pq cmpsts)]
                (if (and mn (>= c (.k mn))) ;; never greater than
                  (recur (+ c 2) q bsprms
                         (loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value
                           (let [ncmps (replaceMinAs-pq cmps (+ c adv) adv),
                                 nmn (getMin-pq ncmps)]
                             (if (and nmn (>= c (.k nmn)))
                               (recur (.v nmn) ncmps)
                               ncmps))))
                  (cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
    (do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
        (cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))

The above code is faster than the Hash Map version up to about a sieving range of fifteen million or so, but gets progressively slower for larger ranges due to having (n log n log log n) computational complexity rather than the (n log log n) for the Hash Map version, which has a higher constant factor overhead that is overtaken by the extra "log n" factor.

It is slower that the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).

Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.

All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the the "odds-only" versions.

It is difficult if not impossible to apply efficient multi-processing to the above versions of the unbounded sieves as the next values of the primes sequence are dependent on previous changes of state for the Bird and Tree Folding versions; however, with the addition of a "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to a given page start state functions, it is possible to do for these letter two algorithms; however, even though it is possible and there is some benefit for these latter two implementations, the benefit is less than using mutable arrays due to that the results must be enumerated into a data structure of some sort in order to be passed out of the page function whereas they can be directly enumerated from the array for the mutable array versions.

Bit packed page segmented array unbounded "odds-only" version

To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:

(set! *unchecked-math* true)

(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGBTS (bit-shift-left PGSZ 3))
(def PGWRDS (bit-shift-right PGBTS 5))
(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes
(def BPBTS (bit-shift-left BPWRDS 5))
(defn- count-pg
  "count primes in the culled page buffer, with test for limit"
  [lmt ^ints pg]
  (let [pgsz (alength pg),
        pgbts (bit-shift-left pgsz 5),
        cntem (fn [lmtw]
                (let [lmtw (long lmtw)]
	          (loop [i (long 0), c (long 0)]
	            (if (>= i lmtw) (- (bit-shift-left lmtw 5) c)
	              (recur (inc i)
	              (+ c (java.lang.Integer/bitCount (aget pg i))))))))]
    (if (< lmt pgbts)
      (let [lmtw (bit-shift-right lmt 5),
            lmtb (bit-and lmt 31)
            msk (bit-shift-left -2 lmtb)]
        (+ (cntem lmtw)
           (- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)
                                                      msk)))))
      (- pgbts
         (areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))))
;;      (cntem pgsz))))
(defn- primes-pages
  "unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."
  []
  (letfn [(make-pg [lowi pgsz bpgs]
            (let [lowi (long lowi),
                  pgbts (long (bit-shift-left pgsz 5)),
                  pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),
                  ^ints pg (int-array pgsz),
                  cull (fn [bpgs']
                         (loop [i (long 0), bpgs' bpgs']
	                         (let [^ints fbpg