Sieve of Eratosthenes
You are encouraged to solve this task according to the task description, using any language you may know.
The Sieve of Eratosthenes is a simple algorithm that finds the prime numbers up to a given integer.
- Task
Implement the Sieve of Eratosthenes algorithm, with the only allowed optimization that the outer loop can stop at the square root of the limit, and the inner loop may start at the square of the prime just found.
That means especially that you shouldn't optimize by using pre-computed wheels, i.e. don't assume you need only to cross out odd numbers (wheel based on 2), numbers equal to 1 or 5 modulo 6 (wheel based on 2 and 3), or similar wheels based on low primes.
If there's an easy way to add such a wheel based optimization, implement it as an alternative version.
- Note
- It is important that the sieve algorithm be the actual algorithm used to find prime numbers for the task.
- Related tasks
- Emirp primes
- count in factors
- prime decomposition
- factors of an integer
- extensible prime generator
- primality by trial division
- factors of a Mersenne number
- trial factoring of a Mersenne number
- partition an integer X into N primes
- sequence of primes by Trial Division
11l
F primes_upto(limit)
V is_prime = [0B]*2 [+] [1B]*(limit - 1)
L(n) 0 .< Int(limit ^ 0.5 + 1.5)
I is_prime[n]
L(i) (n*n..limit).step(n)
is_prime[i] = 0B
R enumerate(is_prime).filter((i, prime) -> prime).map((i, prime) -> i)
print(primes_upto(100))
- Output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
360 Assembly
For maximum compatibility, this program uses only the basic instruction set.
* Sieve of Eratosthenes
ERATOST CSECT
USING ERATOST,R12
SAVEAREA B STM-SAVEAREA(R15)
DC 17F'0'
DC CL8'ERATOST'
STM STM R14,R12,12(R13) save calling context
ST R13,4(R15)
ST R15,8(R13)
LR R12,R15 set addessability
* ---- CODE
LA R4,1 I=1
LA R6,1 increment
L R7,N limit
LOOPI BXH R4,R6,ENDLOOPI do I=2 to N
LR R1,R4 R1=I
BCTR R1,0
LA R14,CRIBLE(R1)
CLI 0(R14),X'01'
BNE ENDIF if not CRIBLE(I)
LR R5,R4 J=I
LR R8,R4
LR R9,R7
LOOPJ BXH R5,R8,ENDLOOPJ do J=I*2 to N by I
LR R1,R5 R1=J
BCTR R1,0
LA R14,CRIBLE(R1)
MVI 0(R14),X'00' CRIBLE(J)='0'B
B LOOPJ
ENDLOOPJ EQU *
ENDIF EQU *
B LOOPI
ENDLOOPI EQU *
LA R4,1 I=1
LA R6,1
L R7,N
LOOP BXH R4,R6,ENDLOOP do I=1 to N
LR R1,R4 R1=I
BCTR R1,0
LA R14,CRIBLE(R1)
CLI 0(R14),X'01'
BNE NOTPRIME if not CRIBLE(I)
CVD R4,P P=I
UNPK Z,P Z=P
MVC C,Z C=Z
OI C+L'C-1,X'F0' zap sign
MVC WTOBUF(8),C+8
WTO MF=(E,WTOMSG)
NOTPRIME EQU *
B LOOP
ENDLOOP EQU *
RETURN EQU *
LM R14,R12,12(R13) restore context
XR R15,R15 set return code to 0
BR R14 return to caller
* ---- DATA
I DS F
J DS F
DS 0F
P DS PL8 packed
Z DS ZL16 zoned
C DS CL16 character
WTOMSG DS 0F
DC H'80' length of WTO buffer
DC H'0' must be binary zeroes
WTOBUF DC 80C' '
LTORG
N DC F'100000'
CRIBLE DC 100000X'01'
YREGS
END ERATOST
- Output:
00000002 00000003 00000005 00000007 00000011 00000013 00000017 00000019 00000023 00000029 00000031 00000037 00000041 00000043 00000047 00000053 00000059 00000061 00000067 ... 00099767 00099787 00099793 00099809 00099817 00099823 00099829 00099833 00099839 00099859 00099871 00099877 00099881 00099901 00099907 00099923 00099929 00099961 00099971 00099989 00099991
6502 Assembly
If this subroutine is called with the value of n in the accumulator, it will store an array of the primes less than n beginning at address 1000 hex and return the number of primes it has found in the accumulator.
ERATOS: STA $D0 ; value of n
LDA #$00
LDX #$00
SETUP: STA $1000,X ; populate array
ADC #$01
INX
CPX $D0
BPL SET
JMP SETUP
SET: LDX #$02
SIEVE: LDA $1000,X ; find non-zero
INX
CPX $D0
BPL SIEVED
CMP #$00
BEQ SIEVE
STA $D1 ; current prime
MARK: CLC
ADC $D1
TAY
LDA #$00
STA $1000,Y
TYA
CMP $D0
BPL SIEVE
JMP MARK
SIEVED: LDX #$01
LDY #$00
COPY: INX
CPX $D0
BPL COPIED
LDA $1000,X
CMP #$00
BEQ COPY
STA $2000,Y
INY
JMP COPY
COPIED: TYA ; how many found
RTS
68000 Assembly
Algorithm somewhat optimized: array omits 1, 2, all higher odd numbers. Optimized for storage: uses bit array for prime/composite flags.
Some of the macro code is derived from the examples included with EASy68K. See 68000 "100 Doors" listing for additional information.
*-----------------------------------------------------------
* Title : BitSieve
* Written by : G. A. Tippery
* Date : 2014-Feb-24, 2013-Dec-22
* Description: Prime number sieve
*-----------------------------------------------------------
ORG $1000
** ---- Generic macros ---- **
PUSH MACRO
MOVE.L \1,-(SP)
ENDM
POP MACRO
MOVE.L (SP)+,\1
ENDM
DROP MACRO
ADDQ #4,SP
ENDM
PUTS MACRO
** Print a null-terminated string w/o CRLF **
** Usage: PUTS stringaddress
** Returns with D0, A1 modified
MOVEQ #14,D0 ; task number 14 (display null string)
LEA \1,A1 ; address of string
TRAP #15 ; display it
ENDM
GETN MACRO
MOVEQ #4,D0 ; Read a number from the keyboard into D1.L.
TRAP #15
ENDM
** ---- Application-specific macros ---- **
val MACRO ; Used by bit sieve. Converts bit address to the number it represents.
ADD.L \1,\1 ; double it because odd numbers are omitted
ADDQ #3,\1 ; add offset because initial primes (1, 2) are omitted
ENDM
* ** ================================================================================ **
* ** Integer square root routine, bisection method **
* ** IN: D0, should be 0<D0<$10000 (65536) -- higher values MAY work, no guarantee
* ** OUT: D1
*
SquareRoot:
*
MOVEM.L D2-D4,-(SP) ; save registers needed for local variables
* DO == n
* D1 == a
* D2 == b
* D3 == guess
* D4 == temp
*
* a = 1;
* b = n;
MOVEQ #1,D1
MOVE.L D0,D2
* do {
REPEAT
* guess = (a+b)/2;
MOVE.L D1,D3
ADD.L D2,D3
LSR.L #1,D3
* if (guess*guess > n) { // inverse function of sqrt is square
MOVE.L D3,D4
MULU D4,D4 ; guess^2
CMP.L D0,D4
BLS .else
* b = guess;
MOVE.L D3,D2
BRA .endif
* } else {
.else:
* a = guess;
MOVE.L D3,D1
* } //if
.endif:
* } while ((b-a) > 1); ; Same as until (b-a)<=1 or until (a-b)>=1
MOVE.L D2,D4
SUB.L D1,D4 ; b-a
UNTIL.L D4 <LE> #1 DO.S
* return (a) ; Result is in D1
* } //LongSqrt()
MOVEM.L (SP)+,D2-D4 ; restore saved registers
RTS
*
* ** ================================================================================ **
** ======================================================================= **
*
** Prime-number Sieve of Eratosthenes routine using a big bit field for flags **
* Enter with D0 = size of sieve (bit array)
* Prints found primes 10 per line
* Returns # prime found in D6
*
* Register usage:
*
* D0 == n
* D1 == prime
* D2 == sqroot
* D3 == PIndex
* D4 == CIndex
* D5 == MaxIndex
* D6 == PCount
*
* A0 == PMtx[0]
*
* On return, all registers above except D0 are modified. Could add MOVEMs to save and restore D2-D6/A0.
*
** ------------------------ **
GetBit: ** sub-part of Sieve subroutine **
** Entry: bit # is on TOS
** Exit: A6 holds the byte number, D7 holds the bit number within the byte
** Note: Input param is still on TOS after return. Could have passed via a register, but
** wanted to practice with stack. :)
*
MOVE.L (4,SP),D7 ; get value from (pre-call) TOS
ASR.L #3,D7 ; /8
MOVEA D7,A6 ; byte #
MOVE.L (4,SP),D7 ; get value from (pre-call) TOS
AND.L #$7,D7 ; bit #
RTS
** ------------------------ **
Sieve:
MOVE D0,D5
SUBQ #1,D5
JSR SquareRoot ; sqrt D0 => D1
MOVE.L D1,D2
LEA PArray,A0
CLR.L D3
*
PrimeLoop:
MOVE.L D3,D1
val D1
MOVE.L D3,D4
ADD.L D1,D4
*
CxLoop: ; Goes through array marking multiples of d1 as composite numbers
CMP.L D5,D4
BHI ExitCx
PUSH D4 ; set D7 as bit # and A6 as byte pointer for D4'th bit of array
JSR GetBit
DROP
BSET D7,0(A0,A6.L) ; set bit to mark as composite number
ADD.L D1,D4 ; next number to mark
BRA CxLoop
ExitCx:
CLR.L D1 ; Clear new-prime-found flag
ADDQ #1,D3 ; Start just past last prime found
PxLoop: ; Searches for next unmarked (not composite) number
CMP.L D2,D3 ; no point searching past where first unmarked multiple would be past end of array
BHI ExitPx ; if past end of array
TST.L D1
BNE ExitPx ; if flag set, new prime found
PUSH D3 ; check D3'th bit flag
JSR GetBit ; sets D7 as bit # and A6 as byte pointer
DROP ; drop TOS
BTST D7,0(A0,A6.L) ; read bit flag
BNE IsSet ; If already tagged as composite
MOVEQ #-1,D1 ; Set flag that we've found a new prime
IsSet:
ADDQ #1,D3 ; next PIndex
BRA PxLoop
ExitPx:
SUBQ #1,D3 ; back up PIndex
TST.L D1 ; Did we find a new prime #?
BNE PrimeLoop ; If another prime # found, go process it
*
; fall through to print routine
** ------------------------ **
* Print primes found
*
* D4 == Column count
*
* Print header and assumed primes (#1, #2)
PUTS Header ; Print string @ Header, no CR/LF
MOVEQ #2,D6 ; Start counter at 2 because #1 and #2 are assumed primes
MOVEQ #2,D4
*
MOVEQ #0,D3
PrintLoop:
CMP.L D5,D3
BHS ExitPL
PUSH D3
JSR GetBit ; sets D7 as bit # and A6 as byte pointer
DROP ; drop TOS
BTST D7,0(A0,A6.L)
BNE NotPrime
* printf(" %6d", val(PIndex)
MOVE.L D3,D1
val D1
AND.L #$0000FFFF,D1
MOVEQ #6,D2
MOVEQ #20,D0 ; display signed RJ
TRAP #15
ADDQ #1,D4
ADDQ #1,D6
* *** Display formatting ***
* if((PCount % 10) == 0) printf("\n");
CMP #10,D4
BLO NoLF
PUTS CRLF
MOVEQ #0,D4
NoLF:
NotPrime:
ADDQ #1,D3
BRA PrintLoop
ExitPL:
RTS
** ======================================================================= **
N EQU 5000 ; *** Size of boolean (bit) array ***
SizeInBytes EQU (N+7)/8
*
START: ; first instruction of program
MOVE.L #N,D0 ; # to test
JSR Sieve
* printf("\n %d prime numbers found.\n", D6); ***
PUTS Summary1,A1
MOVE #3,D0 ; Display signed number in D1.L in decimal in smallest field.
MOVE.W D6,D1
TRAP #15
PUTS Summary2,A1
SIMHALT ; halt simulator
** ======================================================================= **
* Variables and constants here
ORG $2000
CR EQU 13
LF EQU 10
CRLF DC.B CR,LF,$00
PArray: DCB.B SizeInBytes,0
Header: DC.B CR,LF,LF,' Primes',CR,LF,' ======',CR,LF
DC.B ' 1 2',$00
Summary1: DC.B CR,LF,' ',$00
Summary2: DC.B ' prime numbers found.',CR,LF,$00
END START ; last line of source
8086 Assembly
MAXPRM: equ 5000 ; Change this value for more primes
cpu 8086
bits 16
org 100h
section .text
erato: mov cx,MAXPRM ; Initialize array (set all items to prime)
mov bp,cx ; Keep a copy in BP
mov di,sieve
mov al,1
rep stosb
;;; Sieve
mov bx,sieve ; Set base register to array
inc cx ; CX=1 (CH=0, CL=1); CX was 0 before
mov si,cx ; Start at number 2 (1+1)
.next: inc si ; Next number
cmp cl,[bx+si] ; Is this number marked as prime?
jne .next ; If not, try next number
mov ax,si ; Otherwise, calculate square,
mul si
mov di,ax ; and put it in DI
cmp di,bp ; Check array bounds
ja output ; We're done when SI*SI>MAXPRM
.mark: mov [bx+di],ch ; Mark byte as composite
add di,si ; Next composite
cmp di,bp ; While maximum not reached
jbe .mark
jmp .next
;;; Output
output: mov si,2 ; Start at 2
.test: dec byte [bx+si] ; Prime?
jnz .next ; If not, try next number
mov ax,si ; Otherwise, print number
call prax
.next: inc si
cmp si,MAXPRM
jbe .test
ret
;;; Write number in AX to standard output (using MS-DOS)
prax: push bx ; Save BX
mov bx,numbuf
mov bp,10 ; Divisor
.loop: xor dx,dx ; Divide AX by 10, modulus in DX
div bp
add dl,'0' ; ASCII digit
dec bx
mov [bx],dl ; Store ASCII digit
test ax,ax ; More digits?
jnz .loop
mov dx,bx ; Print number
mov ah,9 ; 9 = MS-DOS syscall to print string
int 21h
pop bx ; Restore BX
ret
section .data
db '*****' ; Room for number
numbuf: db 13,10,'$'
section .bss
sieve: resb MAXPRM
- Output:
2 3 5 7 11 ... 4969 4973 4987 4993 4999
8th
with: n
\ create a new buffer which will function as a bit vector
: bit-vec SED: n -- b
dup 3 shr swap 7 band if 1+ then b:new b:clear ;
\ given a buffer, sieving prime, and limit, cross off multiples
\ of the sieving prime.
: +composites SED: b n n -- b
>r dup sqr rot \ want: -- n index b
repeat
over 1- true b:bit!
>r over + r>
over r@ > until!
rdrop nip nip ;
\ SoE algorithm proper
: make-sieve SED: n -- b
dup>r bit-vec 2
repeat
tuck 1- b:bit@ not
if
over r@ +composites
then swap 1+
dup sqr r@ < while!
rdrop drop ;
\ traverse the final buffer, creating an array of primes
: sieve>a SED: b n -- a
>r a:new swap
( 1- b:bit@ not if >r I a:push r> then ) 2 r> loop drop ;
;with
: sieve SED: n -- a
dup make-sieve swap sieve>a ;
- Output:
ok> 100 sieve . [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97] ok> 1_000_000 sieve a:len . \ count primes up to 1,000,000 78498 ok> -1 a:@ . \ largest prime < 1,000,000 999983
AArch64 Assembly
/* ARM assembly AARCH64 Raspberry PI 3B */
/* program cribleEras64.s */
/*******************************************/
/* Constantes file */
/*******************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeConstantesARM64.inc"
.equ MAXI, 100
/*********************************/
/* Initialized data */
/*********************************/
.data
sMessResult: .asciz "Prime : @ \n"
szCarriageReturn: .asciz "\n"
/*********************************/
/* UnInitialized data */
/*********************************/
.bss
sZoneConv: .skip 24
TablePrime: .skip 8 * MAXI
/*********************************/
/* code section */
/*********************************/
.text
.global main
main: // entry of program
ldr x4,qAdrTablePrime // address prime table
mov x0,#2 // prime 2
bl displayPrime
mov x1,#2
mov x2,#1
1: // loop for multiple of 2
str x2,[x4,x1,lsl #3] // mark multiple of 2
add x1,x1,#2
cmp x1,#MAXI // end ?
ble 1b // no loop
mov x1,#3 // begin indice
mov x3,#1
2:
ldr x2,[x4,x1,lsl #3] // load table élément
cmp x2,#1 // is prime ?
beq 4f
mov x0,x1 // yes -> display
bl displayPrime
mov x2,x1
3: // and loop to mark multiples of this prime
str x3,[x4,x2,lsl #3]
add x2,x2,x1 // add the prime
cmp x2,#MAXI // end ?
ble 3b // no -> loop
4:
add x1,x1,2 // other prime in table
cmp x1,MAXI // end table ?
ble 2b // no -> loop
100: // standard end of the program
mov x0,0 // return code
mov x8,EXIT // request to exit program
svc 0 // perform the system call
qAdrszCarriageReturn: .quad szCarriageReturn
qAdrsMessResult: .quad sMessResult
qAdrTablePrime: .quad TablePrime
/******************************************************************/
/* Display prime table elements */
/******************************************************************/
/* x0 contains the prime */
displayPrime:
stp x1,lr,[sp,-16]! // save registers
ldr x1,qAdrsZoneConv
bl conversion10 // call décimal conversion
ldr x0,qAdrsMessResult
ldr x1,qAdrsZoneConv // insert conversion in message
bl strInsertAtCharInc
bl affichageMess // display message
100:
ldp x1,lr,[sp],16 // restaur 2 registers
ret // return to address lr x30
qAdrsZoneConv: .quad sZoneConv
/********************************************************/
/* File Include fonctions */
/********************************************************/
/* for this file see task include a file in language AArch64 assembly */
.include "../includeARM64.inc"
Prime : 2 Prime : 3 Prime : 5 Prime : 7 Prime : 11 Prime : 13 Prime : 17 Prime : 19 Prime : 23 Prime : 29 Prime : 31 Prime : 37 Prime : 41 Prime : 43 Prime : 47 Prime : 53 Prime : 59 Prime : 61 Prime : 67 Prime : 71 Prime : 73 Prime : 79 Prime : 83 Prime : 89 Prime : 97
ABAP
PARAMETERS: p_limit TYPE i OBLIGATORY DEFAULT 100.
AT SELECTION-SCREEN ON p_limit.
IF p_limit LE 1.
MESSAGE 'Limit must be higher then 1.' TYPE 'E'.
ENDIF.
START-OF-SELECTION.
FIELD-SYMBOLS: <fs_prime> TYPE flag.
DATA: gt_prime TYPE TABLE OF flag,
gv_prime TYPE flag,
gv_i TYPE i,
gv_j TYPE i.
DO p_limit TIMES.
IF sy-index > 1.
gv_prime = abap_true.
ELSE.
gv_prime = abap_false.
ENDIF.
APPEND gv_prime TO gt_prime.
ENDDO.
gv_i = 2.
WHILE ( gv_i <= trunc( sqrt( p_limit ) ) ).
IF ( gt_prime[ gv_i ] EQ abap_true ).
gv_j = gv_i ** 2.
WHILE ( gv_j <= p_limit ).
gt_prime[ gv_j ] = abap_false.
gv_j = ( gv_i ** 2 ) + ( sy-index * gv_i ).
ENDWHILE.
ENDIF.
gv_i = gv_i + 1.
ENDWHILE.
LOOP AT gt_prime INTO gv_prime.
IF gv_prime = abap_true.
WRITE: / sy-tabix.
ENDIF.
ENDLOOP.
ABC
HOW TO SIEVE UP TO n:
SHARE sieve
PUT {} IN sieve
FOR cand IN {2..n}: PUT 1 IN sieve[cand]
FOR cand IN {2..floor root n}:
IF sieve[cand] = 1:
PUT cand*cand IN comp
WHILE comp <= n:
PUT 0 IN sieve[comp]
PUT comp+cand IN comp
HOW TO REPORT prime n:
SHARE sieve
IF n<2: FAIL
REPORT sieve[n] = 1
SIEVE UP TO 100
FOR n IN {1..100}:
IF prime n: WRITE n
- Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
ACL2
(defun nats-to-from (n i)
(declare (xargs :measure (nfix (- n i))))
(if (zp (- n i))
nil
(cons i (nats-to-from n (+ i 1)))))
(defun remove-multiples-up-to-r (factor limit xs i)
(declare (xargs :measure (nfix (- limit i))))
(if (or (> i limit)
(zp (- limit i))
(zp factor))
xs
(remove-multiples-up-to-r
factor
limit
(remove i xs)
(+ i factor))))
(defun remove-multiples-up-to (factor limit xs)
(remove-multiples-up-to-r factor limit xs (* factor 2)))
(defun sieve-r (factor limit)
(declare (xargs :measure (nfix (- limit factor))))
(if (zp (- limit factor))
(nats-to-from limit 2)
(remove-multiples-up-to factor (+ limit 1)
(sieve-r (1+ factor) limit))))
(defun sieve (limit)
(sieve-r 2 limit))
Action!
DEFINE MAX="1000"
PROC Main()
BYTE ARRAY t(MAX+1)
INT i,j,k,first
FOR i=0 TO MAX
DO
t(i)=1
OD
t(0)=0
t(1)=0
i=2 first=1
WHILE i<=MAX
DO
IF t(i)=1 THEN
IF first=0 THEN
Print(", ")
FI
PrintI(i)
FOR j=2*i TO MAX STEP i
DO
t(j)=0
OD
first=0
FI
i==+1
OD
RETURN
- Output:
Screenshot from Atari 8-bit computer
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997
ActionScript
Works with ActionScript 3.0 (this is utilizing the actions panel, not a separated class file)
function eratosthenes(limit:int):Array
{
var primes:Array = new Array();
if (limit >= 2) {
var sqrtlmt:int = int(Math.sqrt(limit) - 2);
var nums:Array = new Array(); // start with an empty Array...
for (var i:int = 2; i <= limit; i++) // and
nums.push(i); // only initialize the Array once...
for (var j:int = 0; j <= sqrtlmt; j++) {
var p:int = nums[j]
if (p)
for (var t:int = p * p - 2; t < nums.length; t += p)
nums[t] = 0;
}
for (var m:int = 0; m < nums.length; m++) {
var r:int = nums[m];
if (r)
primes.push(r);
}
}
return primes;
}
var e:Array = eratosthenes(1000);
trace(e);
Output:
- Output:
2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,127,131,137,139,149,151,157,163,167,173,179,181,191,193,197,199,211,223,227,229,233,239,241,251,257,263,269,271,277,281,283,293,307,311,313,317,331,337,347,349,353,359,367,373,379,383,389,397,401,409,419,421,431,433,439,443,449,457,461,463,467,479,487,491,499,503,509,521,523,541,547,557,563,569,571,577,587,593,599,601,607,613,617,619,631,641,643,647,653,659,661,673,677,683,691,701,709,719,727,733,739,743,751,757,761,769,773,787,797,809,811,821,823,827,829,839,853,857,859,863,877,881,883,887,907,911,919,929,937,941,947,953,967,971,977,983,991,997
Ada
with Ada.Text_IO, Ada.Command_Line;
procedure Eratos is
Last: Positive := Positive'Value(Ada.Command_Line.Argument(1));
Prime: array(1 .. Last) of Boolean := (1 => False, others => True);
Base: Positive := 2;
Cnt: Positive;
begin
while Base * Base <= Last loop
if Prime(Base) then
Cnt := Base + Base;
while Cnt <= Last loop
Prime(Cnt) := False;
Cnt := Cnt + Base;
end loop;
end if;
Base := Base + 1;
end loop;
Ada.Text_IO.Put("Primes less or equal" & Positive'Image(Last) &" are:");
for Number in Prime'Range loop
if Prime(Number) then
Ada.Text_IO.Put(Positive'Image(Number));
end if;
end loop;
end Eratos;
- Output:
> ./eratos 31 Primes less or equal 31 are : 2 3 5 7 11 13 17 19 23 29 31
Agda
-- imports
open import Data.Nat as ℕ using (ℕ; suc; zero; _+_; _∸_)
open import Data.Vec as Vec using (Vec; _∷_; []; tabulate; foldr)
open import Data.Fin as Fin using (Fin; suc; zero)
open import Function using (_∘_; const; id)
open import Data.List as List using (List; _∷_; [])
open import Data.Maybe using (Maybe; just; nothing)
-- Without square cutoff optimization
module Simple where
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve (tabulate (just ∘ suc))
where
sieve : ∀ {n} → Vec (Maybe (Fin (2 + m))) n → List (Fin (3 + m))
sieve [] = []
sieve (nothing ∷ xs) = sieve xs
sieve (just x ∷ xs) = suc x ∷ sieve (foldr B remove (const []) xs x)
where
B = λ n → ∀ {i} → Fin i → Vec (Maybe (Fin (2 + m))) n
remove : ∀ {n} → Maybe (Fin (2 + m)) → B n → B (suc n)
remove _ ys zero = nothing ∷ ys x
remove y ys (suc z) = y ∷ ys z
-- With square cutoff optimization
module SquareOpt where
primes : ∀ n → List (Fin n)
primes zero = []
primes (suc zero) = []
primes (suc (suc zero)) = []
primes (suc (suc (suc m))) = sieve 1 m (Vec.tabulate (just ∘ Fin.suc ∘ Fin.suc))
where
sieve : ∀ {n} → ℕ → ℕ → Vec (Maybe (Fin (3 + m))) n → List (Fin (3 + m))
sieve _ zero = List.mapMaybe id ∘ Vec.toList
sieve _ (suc _) [] = []
sieve i (suc l) (nothing ∷ xs) = sieve (suc i) (l ∸ i ∸ i) xs
sieve i (suc l) (just x ∷ xs) = x ∷ sieve (suc i) (l ∸ i ∸ i) (Vec.foldr B remove (const []) xs i)
where
B = λ n → ℕ → Vec (Maybe (Fin (3 + m))) n
remove : ∀ {i} → Maybe (Fin (3 + m)) → B i → B (suc i)
remove _ ys zero = nothing ∷ ys i
remove y ys (suc j) = y ∷ ys j
Agena
Tested with Agena 2.9.5 Win32
# Sieve of Eratosthenes
# generate and return a sequence containing the primes up to sieveSize
sieve := proc( sieveSize :: number ) :: sequence is
local sieve, result;
result := seq(); # sequence of primes - initially empty
create register sieve( sieveSize ); # "vector" to be sieved
sieve[ 1 ] := false;
for sPos from 2 to sieveSize do sieve[ sPos ] := true od;
# sieve the primes
for sPos from 2 to entier( sqrt( sieveSize ) ) do
if sieve[ sPos ] then
for p from sPos * sPos to sieveSize by sPos do
sieve[ p ] := false
od
fi
od;
# construct the sequence of primes
for sPos from 1 to sieveSize do
if sieve[ sPos ] then insert sPos into result fi
od
return result
end; # sieve
# test the sieve proc
for i in sieve( 100 ) do write( " ", i ) od; print();
- Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
ALGOL 60
Based on the 1962 Revised Repport:
comment Sieve of Eratosthenes; begin integer array t[0:1000]; integer i,j,k; for i:=0 step 1 until 1000 do t[i]:=1; t[0]:=0; t[1]:=0; i:=0; for i:=i while i<1000 do begin for i:=i while i<1000 and t[i]=0 do i:=i+1; if i<1000 then begin j:=2; k:=j*i; for k:=k while k<1000 do begin t[k]:=0; j:=j+1; k:=j*i end; i:=i+1 end end; for i:=0 step 1 until 999 do if t[i]≠0 then print(i,ꞌ is primeꞌ) end
An 1964 Implementation:
Works with: ALGOL 60 for OS/360
'BEGIN'
'INTEGER' 'ARRAY' CANDIDATES(/0..1000/);
'INTEGER' I,J,K;
'COMMENT' SET LINE-LENGTH=120,SET LINES-PER-PAGE=62,OPEN;
SYSACT(1,6,120); SYSACT(1,8,62); SYSACT(1,12,1);
'FOR' I := 0 'STEP' 1 'UNTIL' 1000 'DO'
'BEGIN'
CANDIDATES(/I/) := 1;
'END';
CANDIDATES(/0/) := 0;
CANDIDATES(/1/) := 0;
I := 0;
'FOR' I := I 'WHILE' I 'LESS' 1000 'DO'
'BEGIN'
'FOR' I := I 'WHILE' I 'LESS' 1000
'AND' CANDIDATES(/I/) 'EQUAL' 0 'DO'
I := I+1;
'IF' I 'LESS' 1000 'THEN'
'BEGIN'
J := 2;
K := J*I;
'FOR' K := K 'WHILE' K 'LESS' 1000 'DO'
'BEGIN'
CANDIDATES(/K/) := 0;
J := J + 1;
K := J*I;
'END';
I := I+1;
'END';
'FOR' I := 0 'STEP' 1 'UNTIL' 999 'DO'
'IF' CANDIDATES(/I/) 'NOTEQUAL' 0 'THEN'
'BEGIN'
OUTINTEGER(1,I);
OUTSTRING(1,'(' IS PRIME')');
'COMMENT' NEW LINE;
SYSACT(1,14,1)
'END'
'END'
'END'
ALGOL 68
BOOL prime = TRUE, non prime = FALSE;
PROC eratosthenes = (INT n)[]BOOL:
(
[n]BOOL sieve;
FOR i TO UPB sieve DO sieve[i] := prime OD;
INT m = ENTIER sqrt(n);
sieve[1] := non prime;
FOR i FROM 2 TO m DO
IF sieve[i] = prime THEN
FOR j FROM i*i BY i TO n DO
sieve[j] := non prime
OD
FI
OD;
sieve
);
print((eratosthenes(80),new line))
- Output:
FTTFTFTFFFTFTFFFTFTFFFTFFFFFTFTFFFFFTFFFTFTFFFTFFFFFTFFFFFTFTFFFFFTFFFTFTFFFFFTF
ALGOL W
Standard, non-optimised sieve
begin
% implements the sieve of Eratosthenes %
% s(i) is set to true if i is prime, false otherwise %
% algol W doesn't have a upb operator, so we pass the size of the %
% array in n %
procedure sieve( logical array s ( * ); integer value n ) ;
begin
% start with everything flagged as prime %
for i := 1 until n do s( i ) := true;
% sieve out the non-primes %
s( 1 ) := false;
for i := 2 until truncate( sqrt( n ) )
do begin
if s( i )
then begin
for p := i * i step i until n do s( p ) := false
end if_s_i
end for_i ;
end sieve ;
% test the sieve procedure %
integer sieveMax;
sieveMax := 100;
begin
logical array s ( 1 :: sieveMax );
i_w := 2; % set output field width %
s_w := 1; % and output separator width %
% find and display the primes %
sieve( s, sieveMax );
for i := 1 until sieveMax do if s( i ) then writeon( i );
end
end.
- Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Odd numbers only version
Alternative version that only stores odd numbers greater than 1 in the sieve.
begin
% implements the sieve of Eratosthenes %
% only odd numbers appear in the sieve, which starts at 3 %
% s( i ) is set to true if ( i * 2 ) + 1 is prime %
procedure sieve2( logical array s ( * ); integer value n ) ;
begin
% start with everything flagged as prime %
for i := 1 until n do s( i ) := true;
% sieve out the non-primes %
% the subscripts of s are 1 2 3 4 5 6 7 8 9 10 11 12 13... %
% which correspond to 3 5 7 9 11 13 15 17 19 21 23 25 27... %
for i := 1 until truncate( sqrt( n ) ) do begin
if s( i ) then begin
integer ip;
ip := ( i * 2 ) + 1;
for p := i + ip step ip until n do s( p ) := false
end if_s_i
end for_i ;
end sieve2 ;
% test the sieve2 procedure %
integer primeMax, arrayMax;
primeMax := 100;
arrayMax := ( primeMax div 2 ) - 1;
begin
logical array s ( 1 :: arrayMax);
i_w := 2; % set output field width %
s_w := 1; % and output separator width %
% find and display the primes %
sieve2( s, arrayMax );
write( 2 );
for i := 1 until arrayMax do if s( i ) then writeon( ( i * 2 ) + 1 );
end
end.
- Output:
Same as the standard version.
ALGOL-M
BEGIN
COMMENT
FIND PRIMES UP TO THE SPECIFIED LIMIT (HERE 1,000) USING
CLASSIC SIEVE OF ERATOSTHENES;
% CALCULATE INTEGER SQUARE ROOT %
INTEGER FUNCTION ISQRT(N);
INTEGER N;
BEGIN
INTEGER R1, R2;
R1 := N;
R2 := 1;
WHILE R1 > R2 DO
BEGIN
R1 := (R1+R2) / 2;
R2 := N / R1;
END;
ISQRT := R1;
END;
INTEGER LIMIT, I, J, FALSE, TRUE, COL, COUNT;
INTEGER ARRAY FLAGS[1:1000];
LIMIT := 1000;
FALSE := 0;
TRUE := 1;
WRITE("FINDING PRIMES FROM 2 TO",LIMIT);
% INITIALIZE TABLE %
WRITE("INITIALIZING ... ");
FOR I := 1 STEP 1 UNTIL LIMIT DO
FLAGS[I] := TRUE;
% SIEVE FOR PRIMES %
WRITEON("SIEVING ... ");
FOR I := 2 STEP 1 UNTIL ISQRT(LIMIT) DO
BEGIN
IF FLAGS[I] = TRUE THEN
FOR J := (I * I) STEP I UNTIL LIMIT DO
FLAGS[J] := FALSE;
END;
% WRITE OUT THE PRIMES TEN PER LINE %
WRITEON("PRINTING");
COUNT := 0;
COL := 1;
WRITE("");
FOR I := 2 STEP 1 UNTIL LIMIT DO
BEGIN
IF FLAGS[I] = TRUE THEN
BEGIN
WRITEON(I);
COUNT := COUNT + 1;
COL := COL + 1;
IF COL > 10 THEN
BEGIN
WRITE("");
COL := 1;
END;
END;
END;
WRITE("");
WRITE(COUNT, " PRIMES WERE FOUND.");
END
- Output:
FINDING PRIMES FROM 2 TO 1000 INTIALIZING ... SIEVING ... PRINTING 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 . . . 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997 168 PRIMES WERE FOUND.
APL
All these versions requires ⎕io←0 (index origin 0).
It would have been better to require a result of the boolean mask rather than the actual list of primes. The list of primes obtains readily from the mask by application of a simple function (here {⍵/⍳⍴⍵}). Other related computations (such as the number of primes < n) obtain readily from the mask, easier than producing the list of primes.
Non-Optimized Version
sieve2←{
b←⍵⍴1
b[⍳2⌊⍵]←0
2≥⍵:b
p←{⍵/⍳⍴⍵}∇⌈⍵*0.5
m←1+⌊(⍵-1+p×p)÷p
b ⊣ p {b[⍺×⍺+⍳⍵]←0}¨ m
}
primes2←{⍵/⍳⍴⍵}∘sieve2
The required list of prime divisors obtains by recursion ({⍵/⍳⍴⍵}∇⌈⍵*0.5).
=== Optimized Version ===|
sieve←{
b←⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5
b[⍳6⌊⍵]←(6⌊⍵)⍴0 0 1 1 0 1
49≥⍵:b
p←3↓{⍵/⍳⍴⍵}∇⌈⍵*0.5
m←1+⌊(⍵-1+p×p)÷2×p
b ⊣ p {b[⍺×⍺+2×⍳⍵]←0}¨ m
}
primes←{⍵/⍳⍴⍵}∘sieve
The optimizations are as follows:
- Multiples of 2 3 5 are marked by initializing b with ⍵⍴{∧⌿↑(×/⍵)⍴¨~⍵↑¨1}2 3 5 rather than with ⍵⍴1.
- Subsequently, only odd multiples of primes > 5 are marked.
- Multiples of a prime to be marked start at its square.
Examples
primes 100
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
primes¨ ⍳14
┌┬┬┬─┬───┬───┬─────┬─────┬───────┬───────┬───────┬───────┬──────────┬──────────┐
││││2│2 3│2 3│2 3 5│2 3 5│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7│2 3 5 7 11│2 3 5 7 11│
└┴┴┴─┴───┴───┴─────┴─────┴───────┴───────┴───────┴───────┴──────────┴──────────┘
sieve 13
0 0 1 1 0 1 0 1 0 0 0 1 0
+/∘sieve¨ 10*⍳10
0 4 25 168 1229 9592 78498 664579 5761455 50847534
The last expression computes the number of primes < 1e0 1e1 ... 1e9. The last number 50847534 can perhaps be called the anti-Bertelsen's number (http://mathworld.wolfram.com/BertelsensNumber.html).
AppleScript
on sieveOfEratosthenes(limit)
script o
property numberList : {missing value}
end script
repeat with n from 2 to limit
set end of o's numberList to n
end repeat
repeat with n from 2 to (limit ^ 0.5 div 1)
if (item n of o's numberList is n) then
repeat with multiple from (n * n) to limit by n
set item multiple of o's numberList to missing value
end repeat
end if
end repeat
return o's numberList's numbers
end sieveOfEratosthenes
sieveOfEratosthenes(1000)
- Output:
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997}
ARM Assembly
/* ARM assembly Raspberry PI */
/* program cribleEras.s */
/* REMARK 1 : this program use routines in a include file
see task Include a file language arm assembly
for the routine affichageMess conversion10
see at end of this program the instruction include */
/* for constantes see task include a file in arm assembly */
/************************************/
/* Constantes */
/************************************/
.include "../constantes.inc"
.equ MAXI, 101
/*********************************/
/* Initialized data */
/*********************************/
.data
sMessResult: .asciz "Prime : @ \n"
szCarriageReturn: .asciz "\n"
/*********************************/
/* UnInitialized data */
/*********************************/
.bss
sZoneConv: .skip 24
TablePrime: .skip 4 * MAXI
/*********************************/
/* code section */
/*********************************/
.text
.global main
main: @ entry of program
ldr r4,iAdrTablePrime @ address prime table
mov r0,#2 @ prime 2
bl displayPrime
mov r1,#2
mov r2,#1
1: @ loop for multiple of 2
str r2,[r4,r1,lsl #2] @ mark multiple of 2
add r1,#2
cmp r1,#MAXI @ end ?
ble 1b @ no loop
mov r1,#3 @ begin indice
mov r3,#1
2:
ldr r2,[r4,r1,lsl #2] @ load table élément
cmp r2,#1 @ is prime ?
beq 4f
mov r0,r1 @ yes -> display
bl displayPrime
mov r2,r1
3: @ and loop to mark multiples of this prime
str r3,[r4,r2,lsl #2]
add r2,r1 @ add the prime
cmp r2,#MAXI @ end ?
ble 3b @ no -> loop
4:
add r1,#2 @ other prime in table
cmp r1,#MAXI @ end table ?
ble 2b @ no -> loop
100: @ standard end of the program
mov r0, #0 @ return code
mov r7, #EXIT @ request to exit program
svc #0 @ perform the system call
iAdrszCarriageReturn: .int szCarriageReturn
iAdrsMessResult: .int sMessResult
iAdrTablePrime: .int TablePrime
/******************************************************************/
/* Display prime table elements */
/******************************************************************/
/* r0 contains the prime */
displayPrime:
push {r1,lr} @ save registers
ldr r1,iAdrsZoneConv
bl conversion10 @ call décimal conversion
ldr r0,iAdrsMessResult
ldr r1,iAdrsZoneConv @ insert conversion in message
bl strInsertAtCharInc
bl affichageMess @ display message
100:
pop {r1,lr}
bx lr
iAdrsZoneConv: .int sZoneConv
/***************************************************/
/* ROUTINES INCLUDE */
/***************************************************/
.include "../affichage.inc"
Prime : 2 Prime : 3 Prime : 5 Prime : 7 Prime : 11 Prime : 13 Prime : 17 Prime : 19 Prime : 23 Prime : 29 Prime : 31 Prime : 37 Prime : 41 Prime : 43 Prime : 47 Prime : 53 Prime : 59 Prime : 61 Prime : 67 Prime : 71 Prime : 73 Prime : 79 Prime : 83 Prime : 89 Prime : 97 Prime : 101
Arturo
sieve: function [upto][
composites: array.of: inc upto false
loop 2..to :integer sqrt upto 'x [
if not? composites\[x][
loop range.step: x x^2 upto 'c [
composites\[c]: true
]
]
]
result: new []
loop.with:'i composites 'c [
unless c -> 'result ++ i
]
return result -- [0,1]
]
print sieve 100
- Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
AutoHotkey
Search autohotkey.com: of Eratosthenes
Source: AutoHotkey forum by Laszlo
MsgBox % "12345678901234567890`n" Sieve(20)
Sieve(n) { ; Sieve of Eratosthenes => string of 0|1 chars, 1 at position k: k is prime
Static zero := 48, one := 49 ; Asc("0"), Asc("1")
VarSetCapacity(S,n,one)
NumPut(zero,S,0,"char")
i := 2
Loop % sqrt(n)-1 {
If (NumGet(S,i-1,"char") = one)
Loop % n//i
If (A_Index > 1)
NumPut(zero,S,A_Index*i-1,"char")
i += 1+(i>2)
}
Return S
}
Alternative Version
Sieve_of_Eratosthenes(n){
arr := []
loop % n-1
if A_Index>1
arr[A_Index] := true
for i, v in arr {
if (i>Sqrt(n))
break
else if arr[i]
while ((j := i*2 + (A_Index-1)*i) < n)
arr.delete(j)
}
return Arr
}
Examples:
n := 101
Arr := Sieve_of_Eratosthenes(n)
loop, % n-1
output .= (Arr[A_Index] ? A_Index : ".") . (!Mod(A_Index, 10) ? "`n" : "`t")
MsgBox % output
return
- Output:
. 2 3 . 5 . 7 . . . 11 . 13 . . . 17 . 19 . . . 23 . . . . . 29 . 31 . . . . . 37 . . . 41 . 43 . . . 47 . . . . . 53 . . . . . 59 . 61 . . . . . 67 . . . 71 . 73 . . . . . 79 . . . 83 . . . . . 89 . . . . . . . 97 . . .
AutoIt
#include <Array.au3>
$M = InputBox("Integer", "Enter biggest Integer")
Global $a[$M], $r[$M], $c = 1
For $i = 2 To $M -1
If Not $a[$i] Then
$r[$c] = $i
$c += 1
For $k = $i To $M -1 Step $i
$a[$k] = True
Next
EndIf
Next
$r[0] = $c - 1
ReDim $r[$c]
_ArrayDisplay($r)
AWK
An initial array holds all numbers 2..max (which is entered on stdin); then all products of integers are deleted from it; the remaining are displayed in the unsorted appearance of a hash table. Here, the script is entered directly on the commandline, and input entered on stdin:
$ awk '{for(i=2;i<=$1;i++) a[i]=1; > for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) delete a[i*j]; > for(i in a) printf i" "}' 100 71 53 17 5 73 37 19 83 47 29 7 67 59 11 97 79 89 31 13 41 23 2 61 43 3
The following variant does not unset non-primes, but sets them to 0, to preserve order in output:
$ awk '{for(i=2;i<=$1;i++) a[i]=1; > for(i=2;i<=sqrt($1);i++) for(j=2;j<=$1;j++) a[i*j]=0; > for(i=2;i<=$1;i++) if(a[i])printf i" "}' 100 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Now with the script from a file, input from commandline as well as stdin, and input is checked for valid numbers:
# usage: gawk -v n=101 -f sieve.awk
function sieve(n) { # print n,":"
for(i=2; i<=n; i++) a[i]=1;
for(i=2; i<=sqrt(n);i++) for(j=2;j<=n;j++) a[i*j]=0;
for(i=2; i<=n; i++) if(a[i]) printf i" "
print ""
}
BEGIN { print "Sieve of Eratosthenes:"
if(n>1) sieve(n)
}
{ n=$1+0 }
n<2 { exit }
{ sieve(n) }
END { print "Bye!" }
Here is an alternate version that uses an associative array to record composites with a prime dividing it. It can be considered a slow version, as it does not cross out composites until needed. This version assumes enough memory to hold all primes up to ULIMIT. It prints out noncomposites greater than 1.
BEGIN { ULIMIT=100
for ( n=1 ; (n++) < ULIMIT ; )
if (n in S) {
p = S[n]
delete S[n]
for ( m = n ; (m += p) in S ; ) { }
S[m] = p
}
else print ( S[(n+n)] = n )
}
Bash
See solutions at UNIX Shell.
BASIC
DIM n AS Integer, k AS Integer, limit AS Integer
INPUT "Enter number to search to: "; limit
DIM flags(limit) AS Integer
FOR n = 2 TO SQR(limit)
IF flags(n) = 0 THEN
FOR k = n*n TO limit STEP n
flags(k) = 1
NEXT k
END IF
NEXT n
' Display the primes
FOR n = 2 TO limit
IF flags(n) = 0 THEN PRINT n; ", ";
NEXT n
Applesoft BASIC
10 INPUT "ENTER NUMBER TO SEARCH TO: ";LIMIT
20 DIM FLAGS(LIMIT)
30 FOR N = 2 TO SQR (LIMIT)
40 IF FLAGS(N) < > 0 GOTO 80
50 FOR K = N * N TO LIMIT STEP N
60 FLAGS(K) = 1
70 NEXT K
80 NEXT N
90 REM DISPLAY THE PRIMES
100 FOR N = 2 TO LIMIT
110 IF FLAGS(N) = 0 THEN PRINT N;", ";
120 NEXT N
Atari BASIC
Auto-initialization of arrays is not reliable, so we have to do our own. Also, PRINTing with commas doesn't quite format as nicely as one might hope, so we do a little extra work to keep the columns lined up.
100 REM SIEVE OF ERATOSTHENES
110 PRINT "LIMIT";:INPUT LI
120 DIM N(LI):FOR I=0 TO LI:N(I)=1:NEXT I
130 SL = SQR(LI)
140 N(0)=0:N(1)=0
150 FOR P=2 TO SL
160 IF N(P)=0 THEN 200
170 FOR I=P*P TO LI STEP P
180 N(I)=0
190 NEXT I
200 NEXT P
210 C=0
220 FOR I=2 TO LI
230 IF N(I)=0 THEN 260
240 PRINT I,:C=C+1
250 IF C=3 THEN PRINT:C=0
260 NEXT I
270 IF C THEN PRINT
- Output:
Ready RUN LIMIT?100 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
CBASIC
limit% = 1000
true% = -1
false% = 0
dim flags%(limit%)
print "Finding primes from 2 to"; limit%
rem - all numbers are potentially prime
for i% = 2 to limit%
flags%(i%) = true%
next i%
rem - sift for primes
outer.loop.limit% = int%(sqr(limit%))
for i% = 2 to outer.loop.limit%
if flags%(i%) = true% then \
for k% = i% * i% to limit% step i%
if k% <= limit% then flags%(k%) = false%
next k%
next i%
rem - write out the primes 12 per line
count% = 0
col% = 1
for i% = 2 to limit%
if flags%(i%) = true% then \
print using "### "; i%; : \
count% = count% + 1 : \
col% = col% + 1
if col% > 12 then print : col% = 1
next i%
print
print count%; "were found"
end
- Output:
Finding primes from 2 to 1000 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 84 <snip> 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 997 983 991 997 168 were found
Commodore BASIC
Since C= BASIC initializes arrays to all zeroes automatically, we avoid needing our own initialization loop by simply letting 0 mean prime and using 1 for composite.
100 REM SIEVE OF ERATOSTHENES
110 INPUT "LIMIT";LI
120 DIM N(LI)
130 SL = SQR(LI)
140 N(0)=1:N(1)=1
150 FOR P=2 TO SL
160 : IF N(P) THEN 200
170 : FOR I=P*P TO LI STEP P
180 : N(I)=1
190 : NEXT I
200 NEXT P
210 FOR I=2 TO LI
220 : IF N(I)=0 THEN PRINT I,
230 NEXT I
240 PRINT
- Output:
READY. RUN LIMIT? 100 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 READY.
IS-BASIC
100 PROGRAM "Sieve.bas"
110 LET LIMIT=100
120 NUMERIC T(1 TO LIMIT)
130 FOR I=1 TO LIMIT
140 LET T(I)=0
150 NEXT
160 FOR I=2 TO SQR(LIMIT)
170 IF T(I)<>1 THEN
180 FOR K=I*I TO LIMIT STEP I
190 LET T(K)=1
200 NEXT
210 END IF
220 NEXT
230 FOR I=2 TO LIMIT ! Display the primes
240 IF T(I)=0 THEN PRINT I;
250 NEXT
Locomotive Basic
10 DEFINT a-z
20 INPUT "Limit";limit
30 DIM f(limit)
40 FOR n=2 TO SQR(limit)
50 IF f(n)=1 THEN 90
60 FOR k=n*n TO limit STEP n
70 f(k)=1
80 NEXT k
90 NEXT n
100 FOR n=2 TO limit
110 IF f(n)=0 THEN PRINT n;",";
120 NEXT
MSX Basic
5 REM Tested with MSXPen web emulator
6 REM Translated from Rosetta's ZX Spectrum implementation
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR(l)
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n
Sinclair ZX81 BASIC
If you only have 1k of RAM, this program will work—but you will only be able to sieve numbers up to 101. The program is therefore more useful if you have more memory available.
A note on FAST
and SLOW
: under normal circumstances the CPU spends about 3/4 of its time driving the display and only 1/4 doing everything else. Entering FAST
mode blanks the screen (which we do not want to update anyway), resulting in substantially improved performance; we then return to SLOW
mode when we have something to print out.
10 INPUT L
20 FAST
30 DIM N(L)
40 FOR I=2 TO SQR L
50 IF N(I) THEN GOTO 90
60 FOR J=I+I TO L STEP I
70 LET N(J)=1
80 NEXT J
90 NEXT I
100 SLOW
110 FOR I=2 TO L
120 IF NOT N(I) THEN PRINT I;" ";
130 NEXT I
ZX Spectrum Basic
10 INPUT "Enter number to search to: ";l
20 DIM p(l)
30 FOR n=2 TO SQR l
40 IF p(n)<>0 THEN NEXT n
50 FOR k=n*n TO l STEP n
60 LET p(k)=1
70 NEXT k
80 NEXT n
90 REM Display the primes
100 FOR n=2 TO l
110 IF p(n)=0 THEN PRINT n;", ";
120 NEXT n
QBasic
limit = 120
DIM flags(limit)
FOR n = 2 TO limit
flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 TO SQR(limit)
IF flags(n) = 1 THEN
FOR i = n * n TO limit STEP n
flags(i) = 0
NEXT i
END IF
NEXT n
FOR n = 1 TO limit
IF flags(n) THEN PRINT USING "####"; n;
NEXT n
- Output:
Prime numbers less than or equal to 120 are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113
BASIC256
arraybase 1
limit = 120
dim flags(limit)
for n = 2 to limit
flags[n] = True
next n
print "Prime numbers less than or equal to "; limit; " are: "
for n = 2 to sqr(limit)
if flags[n] then
for i = n * n to limit step n
flags[i] = False
next i
end if
next n
for n = 1 to limit
if flags[n] then print rjust(n,4);
next n
- Output:
Prime numbers less than or equal to 120 are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113
True BASIC
LET limit = 120
DIM flags(0)
MAT redim flags(limit)
FOR n = 2 to limit
LET flags(n) = 1
NEXT n
PRINT "Prime numbers less than or equal to "; limit; " are: "
FOR n = 2 to sqr(limit)
IF flags(n) = 1 then
FOR i = n*n to limit step n
LET flags(i) = 0
NEXT i
END IF
NEXT n
FOR n = 1 to limit
IF flags(n)<>0 then PRINT using "####": n;
NEXT n
END
- Output:
Same as QBasic entry.
QL SuperBASIC
using 'easy way' to 'add' 2n wheels
Sets h$ to 1 for higher multiples of 2 via FILL$
, later on sets STEP
to 2n; replaces Floating Pt array p(z) with string variable h$(z) to sieve out all primes < z=441 (l
=21) in under 1K, so that h$ is fillable to its maximum (32766), even on a 48K ZX Spectrum if translated back.
10 INPUT "Enter Stopping Pt for squared factors: ";z
15 LET l=SQR(z)
20 LET h$="10" : h$=h$ & FILL$("01",z)
40 FOR n=3 TO l
50 IF h$(n): NEXT n
60 FOR k=n*n TO z STEP n+n: h$(k)=1
80 END FOR n
90 REM Display the primes
100 FOR n=2 TO z: IF h$(n)=0: PRINT n;", ";
2i wheel emulation of Sinclair ZX81 BASIC
Backward-compatible also on Spectrums, as well as 1K ZX81s for all primes < Z=441. N.B. the STEP
of 2 in line 40 mitigates line 50's inefficiency when going to 90.
10 INPUT Z
15 LET L=SQR(Z)
30 LET H$="10"
32 FOR J=3 TO Z STEP 2
34 LET H$=H$ & "01"
36 NEXT J
40 FOR I=3 TO L STEP 2
50 IF H$(I)="1" THEN GOTO 90
60 FOR J=I*I TO Z STEP I+I
70 LET H$(J)="1"
80 NEXT J
90 NEXT I
110 FOR I=2 TO Z
120 IF H$(I)="0" THEN PRINT I!
130 NEXT I
2i wheel emulation of Sinclair ZX80 BASIC
. . . with 2:1 compression (of 16-bit integer variables on ZX80s) such that it obviates having to account for any multiple of 2; one has to input odd upper limits on factors to be squared, L
(=21 at most on 1K ZX80s for all primes till 439).
Backward-compatible on ZX80s after substituting ** for ^ in line 120.
10 INPUT L
15 LET Z=(L+1)*(L- 1)/2
30 DIM H(Z)
40 FOR I=3 TO L STEP 2
50 IF H((I-1)/ 2) THEN GOTO 90
60 FOR J=I*I TO L*L STEP I+I
70 LET H((J-1)/ 2)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
Sieve of Sundaram
Objections that the latter emulation has strayed far from the given task are obviously justified. Yet not as obvious is that we are now
just a slight transformation away from the Sieve of Sundaram, as transformed as follows: O
is the highest value for an Index of succesive diagonal elements in Sundaram's matrix, for which H(J) also includes the off-diagonal elements in-between, such that
duplicate entries are omitted. Thus, a slightly transformed Sieve of Sundaram is what Eratosthenes' Sieve becomes upon applying all
optimisations incorporated into the prior entries for QL SuperBASIC, except for any equivalent to line 50 in them.
Backward-compatible on 1K ZX80s for all primes < 441 (O=10) after substituting ** for ^ in line 120.
10 INPUT O
15 LET Z=2*O*O+O*2
30 DIM H(Z)
40 FOR I=1 TO O
45 LET A=2*I*I+I*2
50 REM IF H(A) THEN GOTO 90
60 FOR J=A TO Z STEP 1+I*2
65 REM IF H(J) THEN GOTO 80
70 LET H(J)=1
80 NEXT J
90 NEXT I
110 FOR I=0 TO Z
120 IF NOT H(I) THEN PRINT 0^I+1+I*2!
130 NEXT I
Eulerian optimisation
While slower than the optimised Sieve of Eratosthenes before it, the Sieve of Sundaram above has a compatible compression scheme that's more convenient than the conventional one used beforehand. It is therefore applied below along with Euler's alternative optimisation in a reversed implementation that lacks backward-compatibility to ZX80 BASIC. This program is designed around features & limitations of the QL, yet can be rewritten more efficiently for 1K ZX80s, as they allow integer variables to be parameters of FOR
statements (& as their 1K of static RAM is equivalent to L1 cache, even in FAST
mode). That's left as an exercise for ZX80 enthusiasts, who for o%=14 should be able to generate all primes < 841, i.e. 3 orders of (base 2) magnitude above the limit for the program listed under Sinclair ZX81 BASIC. In QL SuperBASIC, o% may at most be 127--generating all primes < 65,025 (almost 2x the upper limit for indices & integer variables used to calculate them ~2x faster than for floating point as used in line 30, after which the integer code mimics an assembly algorithm for the QL's 68008.)
10 INPUT "Enter highest value of diagonal index q%: ";o%
15 LET z%=o%*(2+o%*2) : h$=FILL$(" ",z%+o%) : q%=1 : q=q% : m=z% DIV (2*q%+1)
30 FOR p=m TO q STEP -1: h$((2*q+1)*p+q)="1"
42 GOTO 87
61 IF h$(p%)="1": GOTO 63
62 IF p%<q%: GOTO 87 : ELSE h$((2*q%+1)*p%+q%)="1"
63 LET p%=p%-1 : GOTO 61
87 LET q%=q%+1 : IF h$(q%)="1": GOTO 87
90 LET p%=z% DIV (2*q%+1) : IF q%<=o%: GOTO 61
100 LET z%=z%-1 : IF z%=0: PRINT N%(z%) : STOP
101 IF h$(z%)=" ": PRINT N%(z%)!
110 GOTO 100
127 DEF FN N%(i)=0^i+1+i*2
Batch File
:: Sieve of Eratosthenes for Rosetta Code - PG
@echo off
setlocal ENABLEDELAYEDEXPANSION
setlocal ENABLEEXTENSIONS
rem echo on
set /p n=limit:
rem set n=100
for /L %%i in (1,1,%n%) do set crible.%%i=1
for /L %%i in (2,1,%n%) do (
if !crible.%%i! EQU 1 (
set /A w = %%i * 2
for /L %%j in (!w!,%%i,%n%) do (
set crible.%%j=0
)
)
)
for /L %%i in (2,1,%n%) do (
if !crible.%%i! EQU 1 echo %%i
)
pause
- Output:
limit: 100 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
BBC BASIC
limit% = 100000
DIM sieve% limit%
prime% = 2
WHILE prime%^2 < limit%
FOR I% = prime%*2 TO limit% STEP prime%
sieve%?I% = 1
NEXT
REPEAT prime% += 1 : UNTIL sieve%?prime%=0
ENDWHILE
REM Display the primes:
FOR I% = 1 TO limit%
IF sieve%?I% = 0 PRINT I%;
NEXT
BCPL
get "libhdr"
manifest $( LIMIT = 1000 $)
let sieve(prime,max) be
$( let i = 2
0!prime := false
1!prime := false
for i = 2 to max do i!prime := true
while i*i <= max do
$( if i!prime do
$( let j = i*i
while j <= max do
$( j!prime := false
j := j + i
$)
$)
i := i + 1
$)
$)
let start() be
$( let prime = vec LIMIT
let col = 0
sieve(prime, LIMIT)
for i = 2 to LIMIT do
if i!prime do
$( writef("%I4",i)
col := col + 1
if col rem 20 = 0 then wrch('*N')
$)
wrch('*N')
$)
- Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997
Odds-only bit packed array version (64 bit)
This sieve also uses an iterator structure to enumerate the primes in the sieve. It's inspired by the golang bit packed sieve that returns a closure as an iterator. However, BCPL does not support closures, so the code uses an iterator object.
GET "libhdr"
LET lowbit(n) =
0 -> -1,
VALOF {
// The table is byte packed to conserve space; therefore we must
// unpack the structure.
//
LET deBruijn64 = TABLE
#x0001300239311C03, #x3D3A322A261D1104,
#x3E373B2435332B16, #x2D27211E18120C05,
#x3F2F381B3C292510, #x362334152C20170B,
#x2E1A280F22141F0A, #x190E13090D080706
LET x6 = (n & -n) * #x3F79D71B4CB0A89 >> 58
RESULTIS deBruijn64[x6 >> 3] >> (7 - (x6 & 7) << 3) & #xFF
}
LET primes_upto(limit) =
limit < 3 -> 0,
VALOF {
LET bit_sz = (limit + 1) / 2 - 1
LET bit, p = ?, ?
LET q, r = bit_sz >> 6, bit_sz & #x3F
LET sz = q - (r > 0)
LET sieve = getvec(sz)
// Initialize the array
FOR i = 0 TO q - 1 DO
sieve!i := -1
IF r > 0 THEN sieve!q := ~(-1 << r)
sieve!sz := -1 // Sentinel value to mark the end -
// (after sieving, we'll never have 64 consecutive odd primes.)
// run the sieve
bit := 0
{
WHILE (sieve[bit >> 6] & 1 << (bit & #x3F)) = 0 DO
bit +:= 1
p := 2*bit + 3
q := p*p
IF q > limit THEN RESULTIS sieve
r := (q - 3) >> 1
UNTIL r >= bit_sz DO {
sieve[r >> 6] &:= ~(1 << (r & #x3F))
r +:= p
}
bit +:= 1
} REPEAT
}
MANIFEST { // fields in an iterable
sieve_start; sieve_bits; sieve_ptr
}
LET prime_iter(sieve) = VALOF {
LET iter = getvec(2)
iter!sieve_start := 0
iter!sieve_bits := sieve!0
iter!sieve_ptr := sieve
RESULTIS iter
}
LET nextprime(iter) =
!iter!sieve_ptr = -1 -> 0, // guard entry if at the end already
VALOF {
LET p, x = ?, ?
// iter!sieve_start is also a flag to yield 2.
IF iter!sieve_start = 0 {
iter!sieve_start := 3
RESULTIS 2
}
x := iter!sieve_bits
{
TEST x ~= 0
THEN {
p := (lowbit(x) << 1) + iter!sieve_start
x &:= x - 1
iter!sieve_bits := x
RESULTIS p
}
ELSE {
iter!sieve_start +:= 128
iter!sieve_ptr +:= 1
x := !iter!sieve_ptr
IF x = -1 RESULTIS 0
}
} REPEAT
}
LET show(sieve) BE {
LET iter = prime_iter(sieve)
LET c, p = 0, ?
{
p := nextprime(iter)
IF p = 0 THEN {
wrch('*n')
freevec(iter)
RETURN
}
IF c MOD 10 = 0 THEN wrch('*n')
c +:= 1
writef("%8d", p)
} REPEAT
}
LET start() = VALOF {
LET n = ?
LET argv = VEC 20
LET sz = ?
LET primes = ?
sz := rdargs("upto/a/n/p", argv, 20)
IF sz = 0 RESULTIS 1
n := !argv!0
primes := primes_upto(n)
IF primes = 0 RESULTIS 1 // no array allocated because limit too small
show(primes)
freevec(primes)
RESULTIS 0
}
- Output:
$ ./sieve 1000 BCPL 64-bit Cintcode System (13 Jan 2020) 0.000> 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587 593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709 719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853 857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991 997 0.005>
Befunge
2>:3g" "-!v\ g30 < |!`"O":+1_:.:03p>03g+:"O"`| @ ^ p3\" ":< 2 234567890123456789012345678901234567890123456789012345678901234567890123456789
Binary Lambda Calculus
The BLC sieve of Eratosthenes as documented at https://github.com/tromp/AIT/blob/master/characteristic_sequences/primes.lam is the 167 bit program
00010001100110010100011010000000010110000010010001010111110111101001000110100001110011010000000000101101110011100111111101111000000001111100110111000000101100000110110
The infinitely long output is
001101010001010001010001000001010000010001010001000001000001010000010001010000010001000001000000010001010001010001000000000000010001000001010000000001010000010000010001000001000001010000000001010001010000000000010000000000010001010001000001010000000001000001000001000001010000010001010000000001000000000000010001010001000000000000010000010000000001010001000001000...
BQN
A more efficient sieve (primes below one billion in around a second) is provided as PrimesTo
in bqn-libs primes.bqn.
Primes ← {
𝕩≤2 ? ↕0 ; # No primes below 2
p ← 𝕊⌈√n←𝕩 # Initial primes by recursion
b ← 2≤↕n # Initial sieve: no 0 or 1
E ← {↕∘⌈⌾((𝕩×𝕩+⊢)⁼)n} # Multiples of 𝕩 under n, starting at 𝕩×𝕩
/ b E⊸{0¨⌾(𝕨⊸⊏)𝕩}´ p # Cross them out
}
- Output:
Primes 100
⟨ 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 ⟩
≠∘Primes¨ 10⋆↕7 # Number of primes below 1e0, 1e1, ... 1e6
⟨ 0 4 25 168 1229 9592 78498 ⟩
Bracmat
This solution does not use an array. Instead, numbers themselves are used as variables. The numbers that are not prime are set (to the silly value "nonprime"). Finally all numbers up to the limit are tested for being initialised. The uninitialised (unset) ones must be the primes.
( ( eratosthenes
= n j i
. !arg:?n
& 1:?i
& whl
' ( (1+!i:?i)^2:~>!n:?j
& ( !!i
| whl
' ( !j:~>!n
& nonprime:?!j
& !j+!i:?j
)
)
)
& 1:?i
& whl
' ( 1+!i:~>!n:?i
& (!!i|put$(!i " "))
)
)
& eratosthenes$100
)
- Output:
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
C
Plain sieve, without any optimizations:
#include <stdlib.h>
#include <math.h>
char*
eratosthenes(int n, int *c)
{
char* sieve;
int i, j, m;
if(n < 2)
return NULL;
*c = n-1; /* primes count */
m = (int) sqrt((double) n);
/* calloc initializes to zero */
sieve = calloc(n+1,sizeof(char));
sieve[0] = 1;
sieve[1] = 1;
for(i = 2; i <= m; i++)
if(!sieve[i])
for (j = i*i; j <= n; j += i)
if(!sieve[j]){
sieve[j] = 1;
--(*c);
}
return sieve;
}
Possible optimizations include sieving only odd numbers (or more complex wheels), packing the sieve into bits to improve locality (and allow larger sieves), etc.
Another example:
We first fill ones into an array and assume all numbers are prime. Then, in a loop, fill zeroes into those places where i * j is less than or equal to n (number of primes requested), which means they have multiples! To understand this better, look at the output of the following example.
To print this back, we look for ones in the array and only print those spots.
#include <stdio.h>
#include <malloc.h>
void sieve(int *, int);
int main(int argc, char *argv)
{
int *array, n=10;
array =(int *)malloc((n + 1) * sizeof(int));
sieve(array,n);
return 0;
}
void sieve(int *a, int n)
{
int i=0, j=0;
for(i=2; i<=n; i++) {
a[i] = 1;
}
for(i=2; i<=n; i++) {
printf("\ni:%d", i);
if(a[i] == 1) {
for(j=i; (i*j)<=n; j++) {
printf ("\nj:%d", j);
printf("\nBefore a[%d*%d]: %d", i, j, a[i*j]);
a[(i*j)] = 0;
printf("\nAfter a[%d*%d]: %d", i, j, a[i*j]);
}
}
}
printf("\nPrimes numbers from 1 to %d are : ", n);
for(i=2; i<=n; i++) {
if(a[i] == 1)
printf("%d, ", i);
}
printf("\n\n");
}
- Output:
i:2
j:2
Before a[2*2]: 1
After a[2*2]: 0
j:3
Before a[2*3]: 1
After a[2*3]: 0
j:4
Before a[2*4]: 1
After a[2*4]: 0
j:5
Before a[2*5]: 1
After a[2*5]: 0
i:3
j:3
Before a[3*3]: 1
After a[3*3]: 0
i:4
i:5
i:6
i:7
i:8
i:9
i:10
Primes numbers from 1 to 10 are : 2, 3, 5, 7,
C#
using System;
using System.Collections;
using System.Collections.Generic;
namespace SieveOfEratosthenes
{
class Program
{
static void Main(string[] args)
{
int maxprime = int.Parse(args[0]);
var primelist = GetAllPrimesLessThan(maxprime);
foreach (int prime in primelist)
{
Console.WriteLine(prime);
}
Console.WriteLine("Count = " + primelist.Count);
Console.ReadLine();
}
private static List<int> GetAllPrimesLessThan(int maxPrime)
{
var primes = new List<int>();
var maxSquareRoot = (int)Math.Sqrt(maxPrime);
var eliminated = new BitArray(maxPrime + 1);
for (int i = 2; i <= maxPrime; ++i)
{
if (!eliminated[i])
{
primes.Add(i);
if (i <= maxSquareRoot)
{
for (int j = i * i; j <= maxPrime; j += i)
{
eliminated[j] = true;
}
}
}
}
return primes;
}
}
}
Richard Bird Sieve
To show that C# code can be written in somewhat functional paradigms, the following in an implementation of the Richard Bird sieve from the Epilogue of [Melissa E. O'Neill's definitive article](http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf) in Haskell:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesBird : IEnumerable<PrimeT> {
private struct CIS<T> {
public T v; public Func<CIS<T>> cont;
public CIS(T v, Func<CIS<T>> cont) {
this.v = v; this.cont = cont;
}
}
private CIS<PrimeT> pmlts(PrimeT p) {
Func<PrimeT, CIS<PrimeT>> fn = null;
fn = (c) => new CIS<PrimeT>(c, () => fn(c + p));
return fn(p * p);
}
private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont())); }
private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
var x = xs.v; var y = ys.v;
if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
}
private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(css.cont()))); }
private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
var nn = n; var ncs = cs;
for (; ; ++nn) {
if (nn >= ncs.v) ncs = ncs.cont();
else return new CIS<PrimeT>(nn, () => minusat(++nn, ncs));
}
}
private CIS<PrimeT> prms() {
return new CIS<PrimeT>(2, () => minusat(3, cmpsts(allmlts(prms())))); }
public IEnumerator<PrimeT> GetEnumerator() {
for (var ps = prms(); ; ps = ps.cont()) yield return ps.v;
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
Tree Folding Sieve
The above code can easily be converted to "odds-only" and a infinite tree-like folding scheme with the following minor changes:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesTreeFold : IEnumerable<PrimeT> {
private struct CIS<T> {
public T v; public Func<CIS<T>> cont;
public CIS(T v, Func<CIS<T>> cont) {
this.v = v; this.cont = cont;
}
}
private CIS<PrimeT> pmlts(PrimeT p) {
var adv = p + p;
Func<PrimeT, CIS<PrimeT>> fn = null;
fn = (c) => new CIS<PrimeT>(c, () => fn(c + adv));
return fn(p * p);
}
private CIS<CIS<PrimeT>> allmlts(CIS<PrimeT> ps) {
return new CIS<CIS<PrimeT>>(pmlts(ps.v), () => allmlts(ps.cont()));
}
private CIS<PrimeT> merge(CIS<PrimeT> xs, CIS<PrimeT> ys) {
var x = xs.v; var y = ys.v;
if (x < y) return new CIS<PrimeT>(x, () => merge(xs.cont(), ys));
else if (y < x) return new CIS<PrimeT>(y, () => merge(xs, ys.cont()));
else return new CIS<PrimeT>(x, () => merge(xs.cont(), ys.cont()));
}
private CIS<CIS<PrimeT>> pairs(CIS<CIS<PrimeT>> css) {
var nxtcss = css.cont();
return new CIS<CIS<PrimeT>>(merge(css.v, nxtcss.v), () => pairs(nxtcss.cont())); }
private CIS<PrimeT> cmpsts(CIS<CIS<PrimeT>> css) {
return new CIS<PrimeT>(css.v.v, () => merge(css.v.cont(), cmpsts(pairs(css.cont()))));
}
private CIS<PrimeT> minusat(PrimeT n, CIS<PrimeT> cs) {
var nn = n; var ncs = cs;
for (; ; nn += 2) {
if (nn >= ncs.v) ncs = ncs.cont();
else return new CIS<PrimeT>(nn, () => minusat(nn + 2, ncs));
}
}
private CIS<PrimeT> oddprms() {
return new CIS<PrimeT>(3, () => minusat(5, cmpsts(allmlts(oddprms()))));
}
public IEnumerator<PrimeT> GetEnumerator() {
yield return 2;
for (var ps = oddprms(); ; ps = ps.cont()) yield return ps.v;
}
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
The above code runs over ten times faster than the original Richard Bird algorithm.
Priority Queue Sieve
First, an implementation of a Min Heap Priority Queue is provided as extracted from the entry at RosettaCode, with only the necessary methods duplicated here:
namespace PriorityQ {
using KeyT = System.UInt32;
using System;
using System.Collections.Generic;
using System.Linq;
class Tuple<K, V> { // for DotNet 3.5 without Tuple's
public K Item1; public V Item2;
public Tuple(K k, V v) { Item1 = k; Item2 = v; }
public override string ToString() {
return "(" + Item1.ToString() + ", " + Item2.ToString() + ")";
}
}
class MinHeapPQ<V> {
private struct HeapEntry {
public KeyT k; public V v;
public HeapEntry(KeyT k, V v) { this.k = k; this.v = v; }
}
private List<HeapEntry> pq;
private MinHeapPQ() { this.pq = new List<HeapEntry>(); }
private bool mt { get { return pq.Count == 0; } }
private Tuple<KeyT, V> pkmn {
get {
if (pq.Count == 0) return null;
else {
var mn = pq[0];
return new Tuple<KeyT, V>(mn.k, mn.v);
}
}
}
private void psh(KeyT k, V v) { // add extra very high item if none
if (pq.Count == 0) pq.Add(new HeapEntry(UInt32.MaxValue, v));
var i = pq.Count; pq.Add(pq[i - 1]); // copy bottom item...
for (var ni = i >> 1; ni > 0; i >>= 1, ni >>= 1) {
var t = pq[ni - 1];
if (t.k > k) pq[i - 1] = t; else break;
}
pq[i - 1] = new HeapEntry(k, v);
}
private void siftdown(KeyT k, V v, int ndx) {
var cnt = pq.Count - 1; var i = ndx;
for (var ni = i + i + 1; ni < cnt; ni = ni + ni + 1) {
var oi = i; var lk = pq[ni].k; var rk = pq[ni + 1].k;
var nk = k;
if (k > lk) { i = ni; nk = lk; }
if (nk > rk) { ni += 1; i = ni; }
if (i != oi) pq[oi] = pq[i]; else break;
}
pq[i] = new HeapEntry(k, v);
}
private void rplcmin(KeyT k, V v) {
if (pq.Count > 1) siftdown(k, v, 0); }
public static MinHeapPQ<V> empty { get { return new MinHeapPQ<V>(); } }
public static Tuple<KeyT, V> peekMin(MinHeapPQ<V> pq) { return pq.pkmn; }
public static MinHeapPQ<V> push(KeyT k, V v, MinHeapPQ<V> pq) {
pq.psh(k, v); return pq; }
public static MinHeapPQ<V> replaceMin(KeyT k, V v, MinHeapPQ<V> pq) {
pq.rplcmin(k, v); return pq; }
}
Restricted Base Primes Queue
The following code implements an improved version of the odds-only O'Neil algorithm, which provides the improvements of only adding base prime composite number streams to the queue when the sieved number reaches the square of the base prime (saving a huge amount of memory and considerable execution time, including not needing as wide a range of a type for the internal prime numbers) as well as minimizing stream processing using fusion:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesPQ : IEnumerable<PrimeT> {
private IEnumerator<PrimeT> nmrtr() {
MinHeapPQ<PrimeT> pq = MinHeapPQ<PrimeT>.empty;
PrimeT bp = 3; PrimeT q = 9;
IEnumerator<PrimeT> bps = null;
yield return 2; yield return 3;
for (var n = (PrimeT)5; ; n += 2) {
if (n >= q) { // always equal or less...
if (q <= 9) {
bps = nmrtr();
bps.MoveNext(); bps.MoveNext(); } // move to 3...
bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
var adv = bp + bp; bp = nbp;
pq = MinHeapPQ<PrimeT>.push(n + adv, adv, pq);
}
else {
var pk = MinHeapPQ<PrimeT>.peekMin(pq);
var ck = (pk == null) ? q : pk.Item1;
if (n >= ck) {
do { var adv = pk.Item2;
pq = MinHeapPQ<PrimeT>.replaceMin(ck + adv, adv, pq);
pk = MinHeapPQ<PrimeT>.peekMin(pq); ck = pk.Item1;
} while (n >= ck);
}
else yield return n;
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
The above code is at least about 2.5 times faster than the Tree Folding version.
Dictionary (Hash table) Sieve
The above code adds quite a bit of overhead in having to provide a version of a Priority Queue for little advantage over a Dictionary (hash table based) version as per the code below:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesDict : IEnumerable<PrimeT> {
private IEnumerator<PrimeT> nmrtr() {
Dictionary<PrimeT, PrimeT> dct = new Dictionary<PrimeT, PrimeT>();
PrimeT bp = 3; PrimeT q = 9;
IEnumerator<PrimeT> bps = null;
yield return 2; yield return 3;
for (var n = (PrimeT)5; ; n += 2) {
if (n >= q) { // always equal or less...
if (q <= 9) {
bps = nmrtr();
bps.MoveNext(); bps.MoveNext();
} // move to 3...
bps.MoveNext(); var nbp = bps.Current; q = nbp * nbp;
var adv = bp + bp; bp = nbp;
dct.Add(n + adv, adv);
}
else {
if (dct.ContainsKey(n)) {
PrimeT nadv; dct.TryGetValue(n, out nadv); dct.Remove(n); var nc = n + nadv;
while (dct.ContainsKey(nc)) nc += nadv;
dct.Add(nc, nadv);
}
else yield return n;
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
The above code runs in about three quarters of the time as the above Priority Queue based version for a range of a million primes which will fall even further behind for increasing ranges due to the Dictionary providing O(1) access times as compared to the O(log n) access times for the Priority Queue; the only slight advantage of the PQ based version is at very small ranges where the constant factor overhead of computing the table hashes becomes greater than the "log n" factor for small "n".
Best performance: CPU-Cache-Optimized Segmented Sieve
All of the above unbounded versions are really just an intellectual exercise as with very little extra lines of code above the fastest Dictionary based version, one can have an bit-packed page-segmented array based version as follows:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using PrimeT = System.UInt32;
class PrimesPgd : IEnumerable<PrimeT> {
private const int PGSZ = 1 << 14; // L1 CPU cache size in bytes
private const int BFBTS = PGSZ * 8; // in bits
private const int BFRNG = BFBTS * 2;
public IEnumerator<PrimeT> nmrtr() {
IEnumerator<PrimeT> bps = null;
List<uint> bpa = new List<uint>();
uint[] cbuf = new uint[PGSZ / 4]; // 4 byte words
yield return 2;
for (var lowi = (PrimeT)0; ; lowi += BFBTS) {
for (var bi = 0; ; ++bi) {
if (bi < 1) {
if (bi < 0) { bi = 0; yield return 2; }
PrimeT nxt = 3 + lowi + lowi + BFRNG;
if (lowi <= 0) { // cull very first page
for (int i = 0, p = 3, sqr = 9; sqr < nxt; i++, p += 2, sqr = p * p)
if ((cbuf[i >> 5] & (1 << (i & 31))) == 0)
for (int j = (sqr - 3) >> 1; j < BFBTS; j += p)
cbuf[j >> 5] |= 1u << j;
}
else { // cull for the rest of the pages
Array.Clear(cbuf, 0, cbuf.Length);
if (bpa.Count == 0) { // inite secondar base primes stream
bps = nmrtr(); bps.MoveNext(); bps.MoveNext();
bpa.Add((uint)bps.Current); bps.MoveNext();
} // add 3 to base primes array
// make sure bpa contains enough base primes...
for (PrimeT p = bpa[bpa.Count - 1], sqr = p * p; sqr < nxt; ) {
p = bps.Current; bps.MoveNext(); sqr = p * p; bpa.Add((uint)p);
}
for (int i = 0, lmt = bpa.Count - 1; i < lmt; i++) {
var p = (PrimeT)bpa[i]; var s = (p * p - 3) >> 1;
// adjust start index based on page lower limit...
if (s >= lowi) s -= lowi;
else {
var r = (lowi - s) % p;
s = (r != 0) ? p - r : 0;
}
for (var j = (uint)s; j < BFBTS; j += p)
cbuf[j >> 5] |= 1u << ((int)j);
}
}
}
while (bi < BFBTS && (cbuf[bi >> 5] & (1 << (bi & 31))) != 0) ++bi;
if (bi < BFBTS) yield return 3 + (((PrimeT)bi + lowi) << 1);
else break; // outer loop for next page segment...
}
}
}
public IEnumerator<PrimeT> GetEnumerator() { return nmrtr(); }
IEnumerator IEnumerable.GetEnumerator() { return (IEnumerator)GetEnumerator(); }
}
The above code is about 25 times faster than the Dictionary version at computing the first about 50 million primes (up to a range of one billion), with the actual enumeration of the result sequence now taking longer than the time it takes to cull the composite number representation bits from the arrays, meaning that it is over 50 times faster at actually sieving the primes. The code owes its speed as compared to a naive "one huge memory array" algorithm to using an array size that is the size of the CPU L1 or L2 caches and using bit-packing to fit more number representations into this limited capacity; in this way RAM memory access times are reduced by a factor of from about four to about 10 (depending on CPU and RAM speed) as compared to those naive implementations, and the minor computational cost of the bit manipulations is compensated by a large factor in total execution time.
The time to enumerate the result primes sequence can be reduced somewhat (about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation, but for page segmentation of odds-only, this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.
In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, then further gains can be made by extreme wheel factorization (up to about another about four times gain in speed) and multi-processing (with another gain in speed proportional to the actual independent CPU cores used).
Note that all of these gains in speed are not due to C# other than it compiles to reasonably efficient machine code, but rather to proper use of the Sieve of Eratosthenes algorithm.
All of the above unbounded code can be tested by the following "main" method (replace the name "PrimesXXX" with the name of the class to be tested):
static void Main(string[] args) {
Console.WriteLine(new PrimesXXX().ElementAt(1000000 - 1)); // zero based indexing...
}
To produce the following output for all tested versions (although some are considerably faster than others):
- Output:
15485863
C++
Standard Library
This implementation follows the standard library pattern of std::iota. The start and end iterators are provided for the container. The destination container is used for marking primes and then filled with the primes which are less than the container size. This method requires no memory allocation inside the function.
#include <iostream>
#include <iterator>
#include <algorithm>
#include <vector>
// Fills the range [start, end) with 1 if the integer corresponding to the index is composite and 0 otherwise.
// requires: I is RandomAccessIterator
template<typename I>
void mark_composites(I start, I end)
{
std::fill(start, end, 0);
for (auto it = start + 1; it != end; ++it)
{
if (*it == 0)
{
auto prime = std::distance(start, it) + 1;
// mark all multiples of this prime number as composite.
auto multiple_it = it;
while (std::distance(multiple_it, end) > prime)
{
std::advance(multiple_it, prime);
*multiple_it = 1;
}
}
}
}
// Fills "out" with the prime numbers in the range 2...N where N = distance(start, end).
// requires: I is a RandomAccessIterator
// O is an OutputIterator
template <typename I, typename O>
O sieve_primes(I start, I end, O out)
{
mark_composites(start, end);
for (auto it = start + 1; it != end; ++it)
{
if (*it == 0)
{
*out = std::distance(start, it) + 1;
++out;
}
}
return out;
}
int main()
{
std::vector<uint8_t> is_composite(1000);
sieve_primes(is_composite.begin(), is_composite.end(), std::ostream_iterator<int>(std::cout, " "));
// Alternative to store in a vector:
// std::vector<int> primes;
// sieve_primes(is_composite.begin(), is_composite.end(), std::back_inserter(primes));
}
Boost
// yield all prime numbers less than limit.
template<class UnaryFunction>
void primesupto(int limit, UnaryFunction yield)
{
std::vector<bool> is_prime(limit, true);
const int sqrt_limit = static_cast<int>(std::sqrt(limit));
for (int n = 2; n <= sqrt_limit; ++n)
if (is_prime[n]) {
yield(n);
for (unsigned k = n*n, ulim = static_cast<unsigned>(limit); k < ulim; k += n)
//NOTE: "unsigned" is used to avoid an overflow in `k+=n` for `limit` near INT_MAX
is_prime[k] = false;
}
for (int n = sqrt_limit + 1; n < limit; ++n)
if (is_prime[n])
yield(n);
}
Full program:
/**
$ g++ -I/path/to/boost sieve.cpp -o sieve && sieve 10000000
*/
#include <inttypes.h> // uintmax_t
#include <limits>
#include <cmath>
#include <iostream>
#include <sstream>
#include <vector>
#include <boost/lambda/lambda.hpp>
int main(int argc, char *argv[])
{
using namespace std;
using namespace boost::lambda;
int limit = 10000;
if (argc == 2) {
stringstream ss(argv[--argc]);
ss >> limit;
if (limit < 1 or ss.fail()) {
cerr << "USAGE:\n sieve LIMIT\n\nwhere LIMIT in the range [1, "
<< numeric_limits<int>::max() << ")" << endl;
return 2;
}
}
// print primes less then 100
primesupto(100, cout << _1 << " ");
cout << endl;
// find number of primes less then limit and their sum
int count = 0;
uintmax_t sum = 0;
primesupto(limit, (var(sum) += _1, var(count) += 1));
cout << "limit sum pi(n)\n"
<< limit << " " << sum << " " << count << endl;
}
Chapel
This solution uses nested iterators to create new wheels at run time:
// yield prime and remove all multiples of it from children sieves
iter sieve(prime):int {
yield prime;
var last = prime;
label candidates for candidate in sieve(prime+1) do {
for composite in last..candidate by prime do {
// candidate is a multiple of this prime
if composite == candidate {
// remember size of last composite
last = composite;
// and try the next candidate
continue candidates;
}
}
// candidate cannot need to be removed by this sieve
// yield to parent sieve for checking
yield candidate;
}
}
The topmost sieve needs to be started with 2 (the smallest prime):
config const N = 30;
for p in sieve(2) {
if p > N {
writeln();
break;
}
write(" ", p);
}
Alternate Conventional Bit-Packed Implementation
The following code implements the conventional monolithic (one large array) Sieve of Eratosthenes where the representations of the numbers use only one bit per number, using an iteration for output so as to not require further memory allocation:
compile with the `--fast` option
use Time;
use BitOps;
type Prime = uint(32);
config const limit: Prime = 1000000000; // sieve limit
proc main() {
write("The first 25 primes are: ");
for p in primes(100) do write(p, " "); writeln();
var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is: ", count, ".");
var timer: Timer;
timer.start();
count = 0;
for p in primes(limit) do count += 1;
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
iter primes(n: Prime): Prime {
const szlmt = n / 8;
var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up
for bp in 2 .. n {
if cmpsts[bp >> 3] & (1: uint(8) << (bp & 7)) == 0 {
const s0 = bp * bp;
if s0 > n then break;
for c in s0 .. n by bp { cmpsts[c >> 3] |= 1: uint(8) << (c & 7); }
}
}
for p in 2 .. n do if cmpsts[p >> 3] & (1: uint(8) << (p & 7)) == 0 then yield p;
}
- Output:
The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 Count of primes to a million is: 78498. Found 50847534 primes up to 1000000000 in 7964.05 milliseconds.
Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).
Alternate Odds-Only Bit-Packed Implementation
use Time;
use BitOps;
type Prime = int(32);
config const limit: Prime = 1000000000; // sieve limit
proc main() {
write("The first 25 primes are: ");
for p in primes(100) do write(p, " "); writeln();
var count = 0; for p in primes(1000000) do count += 1;
writeln("Count of primes to a million is: ", count, ".");
var timer: Timer;
timer.start();
count = 0;
for p in primes(limit) do count += 1;
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
iter primes(n: Prime): Prime {
const ndxlmt = (n - 3) / 2;
const szlmt = ndxlmt / 8;
var cmpsts: [0 .. szlmt] uint(8); // even number of byte array rounded up
for i in 0 .. ndxlmt { // never gets to the end!
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 {
const bp = i + i + 3;
const s0 = (bp * bp - 3) / 2;
if s0 > ndxlmt then break;
for s in s0 .. ndxlmt by bp do cmpsts[s >> 3] |= 1: uint(8) << (s & 7);
}
}
yield 2;
for i in 0 .. ndxlmt do
if cmpsts[i >> 3] & (1: uint(8) << (i & 7)) == 0 then yield i + i + 3;
}
- Output:
The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 Count of primes to a million is: 78498. Found 50847534 primes up to 1000000000 in 4008.16 milliseconds.
Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).
As you can see, sieving odds-only is about twice as fast due to the reduced number of operations; it also uses only half the amount of memory. However, this is still not all that fast at about 14.4 CPU clock cycles per sieve culling operation due to the size of the array exceeding the CPU cache size(s).
Hash Table Based Odds-Only Version
use Time;
config const limit = 100000000;
type Prime = uint(32);
class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: owned Primes?;
var keys: domain(Prime); var dict: [keys] Prime;
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.keys += key; this.dict[key] = adv;
}
else if this.keys.contains(this.n) { // found a composite; advance...
const adv = this.dict[this.n]; this.keys.remove(this.n);
var nkey = this.n + adv;
while this.keys.contains(nkey) do nkey += adv;
this.keys += nkey; this.dict[nkey] = adv;
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}
proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();
var timer: Timer;
timer.start();
count = 0;
for p in new Primes() { if p > limit then break; count += 1; }
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
- Output:
The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 Found 5761455 primes up to 100000000 in 5195.41 milliseconds.
Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).
As you can see, this is much slower than the array based versions but much faster than previous Chapel version code as the hashing has been greatly improved.
As an alternate to the use of a built-in library, the following code implements a specialized BasePrimesTable that works similarly to the way the Python associative arrays work as to hashing algorithm used (no hashing, as the hash values for integers are just themselves) and something similar to the Python method of handling hash table collisions is used:
Compile with the `--fast` compiler command line option
use Time;
config const limit = 100000000;
type Prime = uint(32);
record BasePrimesTable { // specialized for the use here...
record BasePrimeEntry { var fullkey: Prime; var val: Prime; }
var cpcty: int = 8; var sz: int = 0;
var dom = { 0 .. cpcty - 1 }; var bpa: [dom] BasePrimeEntry;
proc grow() {
const ndom = dom; var cbpa: [ndom] BasePrimeEntry = bpa[ndom];
bpa = new BasePrimeEntry(); cpcty *= 2; dom = { 0 .. cpcty - 1 };
for kv in cbpa do if kv.fullkey != 0 then add(kv.fullkey, kv.val);
}
proc find(k: Prime): int { // internal get location of value or -1
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == k then return skey;
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
return -1; // not found!
}
proc contains(k: Prime): bool { return find(k) >= 0; }
proc add(k, v: Prime) { // if exists then replaces else new entry
const fndi = find(k);
if fndi >= 0 then bpa[fndi] = new BasePrimeEntry(k, v);
else {
sz += 1; if 2 * sz > cpcty then grow();
const msk = cpcty - 1; var skey = k: int & msk;
var perturb = k: int; var loop = 8;
do {
if bpa[skey].fullkey == 0 {
bpa[skey] = new BasePrimeEntry(k, v); return; }
perturb >>= 5; skey = (5 * skey + 1 + perturb) & msk;
loop -= 1; if perturb > 0 then loop = 8;
} while loop > 0;
}
}
proc remove(k: Prime) { // if doesn't exist does nothing
const fndi = find(k);
if fndi >= 0 { bpa[fndi].fullkey = 0; sz -= 1; }
}
proc this(k: Prime): Prime { // returns value or 0 if not found
const fndi = find(k);
if fndi < 0 then return 0; else return bpa[fndi].val;
}
}
class Primes { // needed so we can use next to get successive values
var n: Prime; var obp: Prime; var q: Prime;
var bps: shared Primes?; var dict = new BasePrimesTable();
proc next(): Prime { // odd primes!
if this.n < 5 { this.n = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = this.q + adv;
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.dict.add(key, adv);
}
else if this.dict.contains(this.n) { // found a composite; advance...
const adv = this.dict[this.n]; this.dict.remove(this.n);
var nkey = this.n + adv;
while this.dict.contains(nkey) do nkey += adv;
this.dict.add(nkey, adv);
}
else { const p = this.n; this.n += 2; return p; }
this.n += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}
proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();
var timer: Timer;
timer.start();
count = 0;
for p in new Primes() { if p > limit then break; count += 1; }
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
- Output:
The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 Found 5761455 primes up to 100000000 in 2351.79 milliseconds.
This last code is quite usable up to a hundred million (as here) or even a billion in a little over ten times the time, but is still slower than the very simple odds-only monolithic array version and is also more complex, although it uses less memory (only for the hash table for the base primes of about eight Kilobytes for sieving to a billion compared to over 60 Megabytes for the monolithic odds-only simple version).
Chapel version 1.25.1 provides yet another option as to the form of the code although the algorithm is the same in that one can now override the hashing function for Chapel records so that they can be used as the Key Type for Hash Map's as follows:
Compile with the `--fast` compiler command line option
use Time;
use Map;
config const limit = 100000000;
type Prime = uint(32);
class Primes { // needed so we can use next to get successive values
record PrimeR { var prime: Prime; proc hash() { return prime; } }
var n: PrimeR = new PrimeR(0); var obp: Prime; var q: Prime;
var bps: owned Primes?;
var dict = new map(PrimeR, Prime);
proc next(): Prime { // odd primes!
if this.n.prime < 5 { this.n.prime = 5; return 3; }
if this.bps == nil {
this.bps = new Primes(); // secondary odd base primes feed
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
}
while true {
if this.n.prime >= this.q { // advance secondary stream of base primes...
const adv = this.obp * 2; const key = new PrimeR(this.q + adv);
this.obp = this.bps!.next(); this.q = this.obp * this.obp;
this.dict.add(key, adv);
}
else if this.dict.contains(this.n) { // found a composite; advance...
const adv = this.dict.getValue(this.n); this.dict.remove(this.n);
var nkey = new PrimeR(this.n.prime + adv);
while this.dict.contains(nkey) do nkey.prime += adv;
this.dict.add(nkey, adv);
}
else { const p = this.n.prime;
this.n.prime += 2; return p; }
this.n.prime += 2;
}
return 0; // to keep compiler happy in returning a value!
}
iter these(): Prime { yield 2; while true do yield this.next(); }
}
proc main() {
var count = 0;
write("The first 25 primes are: ");
for p in new Primes() { if count >= 25 then break; write(p, " "); count += 1; }
writeln();
var timer: Timer;
timer.start();
count = 0;
for p in new Primes() { if p > limit then break; count += 1; }
timer.stop();
write("Found ", count, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
}
This works in about exactly the same time as the last previous code, but doesn't require special custom adaptations of the associative array so that the standard library Map can be used.
Functional Tree Folding Odds-Only Version
Chapel isn't really a very functional language even though it has some functional forms of code in the Higher Order Functions (HOF's) of zippered, scanned, and reduced, iterations and has first class functions (FCF's) and lambdas (anonymous functions), these last can't be closures (capture variable bindings from external scope(s)), nor can the work around of using classes to emulate closures handle recursive (Y-combinator type) variable bindings using reference fields (at least currently with version 1.22). However, the Tree Folding add-on to the Richard Bird lazy list sieve doesn't require any of the things that can't be emulated using classes, so a version is given as follows:
use Time;
type Prime = uint(32);
config const limit = 1000000: Prime;
// Chapel doesn't have closures, so we need to emulate them with classes...
class PrimeCIS { // base prime stream...
var head: Prime;
proc next(): shared PrimeCIS { return new shared PrimeCIS(); }
}
class PrimeMultiples: PrimeCIS {
var adv: Prime;
override proc next(): shared PrimeCIS {
return new shared PrimeMultiples(
this.head + this.adv, this.adv): shared PrimeCIS; }
}
class PrimeCISCIS { // base stream of prime streams; never used directly...
var head: shared PrimeCIS;
proc init() { this.head = new shared PrimeCIS(); }
proc next(): shared PrimeCISCIS {
return new shared PrimeCISCIS(); }
}
class AllMultiples: PrimeCISCIS {
var bps: shared PrimeCIS;
proc init(bsprms: shared PrimeCIS) {
const bp = bsprms.head; const sqr = bp * bp; const adv = bp + bp;
this.head = new shared PrimeMultiples(sqr, adv): PrimeCIS;
this.bps = bsprms;
}
override proc next(): shared PrimeCISCIS {
return new shared AllMultiples(this.bps.next()): PrimeCISCIS; }
}
class Union: PrimeCIS {
var feeda, feedb: shared PrimeCIS;
proc init(fda: shared PrimeCIS, fdb: shared PrimeCIS) {
const ahd = fda.head; const bhd = fdb.head;
this.head = if ahd < bhd then ahd else bhd;
this.feeda = fda; this.feedb = fdb;
}
override proc next(): shared PrimeCIS {
const ahd = this.feeda.head; const bhd = this.feedb.head;
if ahd < bhd then
return new shared Union(this.feeda.next(), this.feedb): shared PrimeCIS;
if ahd > bhd then
return new shared Union(this.feeda, this.feedb.next()): shared PrimeCIS;
return new shared Union(this.feeda.next(),
this.feedb.next()): shared PrimeCIS;
}
}
class Pairs: PrimeCISCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
const fs = fd.head; const sss = fd.next(); const ss = sss.head;
this.head = new shared Union(fs, ss): shared PrimeCIS; this.feed = sss;
}
override proc next(): shared PrimeCISCIS {
return new shared Pairs(this.feed.next()): shared PrimeCISCIS; }
}
class Composites: PrimeCIS {
var feed: shared PrimeCISCIS;
proc init(fd: shared PrimeCISCIS) {
this.head = fd.head.head; this.feed = fd;
}
override proc next(): shared PrimeCIS {
const fs = this.feed.head.next();
const prs = new shared Pairs(this.feed.next()): shared PrimeCISCIS;
const ncs = new shared Composites(prs): shared PrimeCIS;
return new shared Union(fs, ncs): shared PrimeCIS;
}
}
class OddPrimesFrom: PrimeCIS {
var cmpsts: shared PrimeCIS;
override proc next(): shared PrimeCIS {
var n = head + 2; var cs = this.cmpsts;
while true {
if n < cs.head then
return new shared OddPrimesFrom(n, cs): shared PrimeCIS;
n += 2; cs = cs.next();
}
return this.cmpsts; // never used; keeps compiler happy!
}
}
class OddPrimes: PrimeCIS {
proc init() { this.head = 3; }
override proc next(): shared PrimeCIS {
const bps = new shared OddPrimes(): shared PrimeCIS;
const mlts = new shared AllMultiples(bps): shared PrimeCISCIS;
const cmpsts = new shared Composites(mlts): shared PrimeCIS;
return new shared OddPrimesFrom(5, cmpsts): shared PrimeCIS;
}
}
iter primes(): Prime {
yield 2; var cur = new shared OddPrimes(): shared PrimeCIS;
while true { yield cur.head; cur = cur.next(); }
}
// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }
Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).
var timer: Timer; timer.start(); cnt = 0;
for p in primes() { if p > limit then break; cnt += 1; }
timer.stop(); write("\nFound ", cnt, " primes up to ", limit);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
- Output:
The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 Found 78498 primes up to 1000000 in 344.859 milliseconds.
Time as run using Chapel version 24.1 on an Intel Skylake i5-6500 at 3.6 GHz (turbo, single threaded).
The above code is really just a toy example to show that Chapel can handle some tasks functionally (within the above stated limits) although doing so is slower than the Hash Table version above and also takes more memory as the nested lazy list structure consumes more memory in lazy list links and "plumbing" than does the simple implementation of a Hash Table. It also has a worst asymptotic performance with an extra `log(n)` factor where `n` is the sieving range; this can be shown by running the above program with `--limit=10000000` run time command line option to sieve to ten million which takes about 4.5 seconds to count the primes up to ten million (a factor of ten higher range, but much higher than the expected increased factor of about 10 per cent extra as per the Hash Table version with about 20 per cent more operations times the factor of ten for this version). Other than for the extra operations, this version is generally slower due to the time to do the many small allocations/de-allocations of the functional object instances, and this will be highly dependent on the platform on which it is run: cygwin on Windows may be particularly slow due to the extra level of indirection, and some on-line IDE's may also be slow due to their level of virtualization.
A Multi-Threaded Page-Segmented Odds-Only Bit-Packed Version
To take advantage of the features that make Chapel shine, we need to use it to do some parallel computations, and to efficiently do that for the Sieve of Eratosthenes, we need to divide the work into page segments where we can assign each largish segment to a separate thread/task; this also improves the speed due to better cache associativity with most memory accesses to values that are already in the cache(s). Once we have divided the work, Chapel offers lots of means to implement the parallelism but to be a true Sieve of Eratosthenes, we need to have the ability to output the results in order; with many of the convenience mechanisms not doing that, the best/simplest option is likely task parallelism with the output results assigned to an rotary indexed array containing the `sync` results. It turns out that, although the Chapel compiler can sometimes optimize the code so the overhead of creating tasks is not onerous, for this case where the actual tasks are somewhat complex, the compiler can't recognize that an automatically generated thread pool(s) are required so we need to generate the thread pool(s) manually. The code that implements the multi-threading of page segments using thread pools is as follows:
use Time; use BitOps; use CPtr;
type Prime = uint(64);
type PrimeNdx = int(64);
type BasePrime = uint(32);
config const LIMIT = 1000000000: Prime;
config const L1 = 16; // CPU L1 cache size in Kilobytes (1024);
assert (L1 == 16 || L1 == 32 || L1 == 64,
"L1 cache size must be 16, 32, or 64 Kilobytes!");
config const L2 = 128; // CPU L2 cache size in Kilobytes (1024);
assert (L2 == 128 || L2 == 256 || L2 == 512,
"L2 cache size must be 128, 256, or 512 Kilobytes!");
const CPUL1CACHE: int = L1 * 1024 * 8; // size in bits!
const CPUL2CACHE: int = L2 * 1024 * 8; // size in bits!
config const NUMTHRDS = here.maxTaskPar;
assert(NUMTHRDS >= 1, "NUMTHRDS must be at least one!");
const WHLPRMS = [ 2: Prime, 3: Prime, 5: Prime, 7: Prime,
11: Prime, 13: Prime, 17: Prime];
const FRSTSVPRM = 19: Prime; // past the pre-cull primes!
// 2 eliminated as even; 255255 in bytes...
const WHLPTRNSPN = 3 * 5 * 7 * 11 * 13 * 17;
// rounded up to next 64-bit boundary plus a 16 Kilobyte buffer for overflow...
const WHLPTRNBTSZ = ((WHLPTRNSPN * 8 + 63) & (-64)) + 131072;
// number of base primes within small span!
const SZBPSTRTS = 6542 - WHLPRMS.size + 1; // extra one for marker!
// number of base primes for CPU L1 cache buffer!
const SZMNSTRTS = (if L1 == 16 then 12251 else
if L1 == 32 then 23000 else 43390)
- WHLPRMS.size + 1; // extra one for marker!
// using this Look Up Table faster than bit twiddling...
const bitmsk = for i in 0 .. 7 do 1:uint(8) << i;
var WHLPTRN: SieveBuffer = new SieveBuffer(WHLPTRNBTSZ); fillWHLPTRN(WHLPTRN);
proc fillWHLPTRN(ref wp: SieveBuffer) {
const hi = WHLPRMS.size - 1;
const rng = 0 .. hi; var whlhd = new shared BasePrimeArr({rng});
// contains wheel pattern primes skipping the small wheel prime (2)!...
// never advances past the first base prime arr as it ends with a huge!...
for i in rng do whlhd.bparr[i] = (if i != hi then WHLPRMS[i + 1] // skip 2!
else 0x7FFFFFFF): BasePrime; // last huge!
var whlbpas = new shared BasePrimeArrs(whlhd);
var whlstrts = new StrtsArr({rng});
wp.cull(0, WHLPTRNBTSZ, whlbpas, whlstrts);
// eliminate wheel primes from the WHLPTRN buffer!...
wp.cmpsts[0] = 0xFF: uint(8);
}
// the following two must be classes for compability with sync...
class PrimeArr { var dom = { 0 .. -1 }; var prmarr: [dom] Prime; }
class BasePrimeArr { var dom = { 0 .. -1 }; var bparr: [dom] BasePrime; }
record StrtsArr { var dom = { 0 .. -1 }; var strtsarr: [dom] int(32); }
record SieveBuffer {
var dom = { 0 .. -1 }; var cmpsts: [dom] uint(8) = 0;
proc init() {}
proc init(btsz: int) { dom = { 0 .. btsz / 8 - 1 }; }
proc deinit() { dom = { 0 .. -1 }; }
proc fill(lwi: PrimeNdx) { // fill from the WHLPTRN stamp...
const sz = cmpsts.size; const mvsz = min(sz, 16384);
var mdlo = ((lwi / 8) % (WHLPTRNSPN: PrimeNdx)): int;
for i in 0 .. sz - 1 by 16384 {
c_memcpy(c_ptrTo(cmpsts[i]): c_void_ptr,
c_ptrTo(WHLPTRN.cmpsts[mdlo]): c_void_ptr, mvsz);
mdlo += 16384; if mdlo >= WHLPTRNSPN then mdlo -= WHLPTRNSPN;
}
}
proc count(btlmt: int) { // count by 64 bits using CPU popcount...
const lstwrd = btlmt / 64; const lstmsk = (-2):uint(64) << (btlmt & 63);
const cmpstsp = c_ptrTo(cmpsts: [dom] uint(8)): c_ptr(uint(64));
var i = 0; var cnt = (lstwrd * 64 + 64): int;
while i < lstwrd { cnt -= popcount(cmpstsp[i]): int; i += 1; }
return cnt - popcount(cmpstsp[lstwrd] | lstmsk): int;
}
// most of the time is spent doing culling operations as follows!...
proc cull(lwi: PrimeNdx, bsbtsz: int, bpas: BasePrimeArrs,
ref strts: StrtsArr) {
const btlmt = cmpsts.size * 8 - 1; const bplmt = bsbtsz / 32;
const ndxlmt = lwi: Prime + btlmt: Prime; // can't overflow!
const strtssz = strts.strtsarr.size;
// C pointer for speed magic!...
const cmpstsp = c_ptrTo(cmpsts[0]);
const strtsp = c_ptrTo(strts.strtsarr);
// first fill the strts array with pre-calculated start addresses...
var i = 0; for bp in bpas {
// calculate page start address for the given base prime...
const bpi = bp: int; const bbp = bp: Prime; const ndx0 = (bbp - 3) / 2;
const s0 = (ndx0 + ndx0) * (ndx0 + 3) + 3; // can't overflow!
if s0 > ndxlmt then {
if i < strtssz then strtsp[i] = -1: int(32); break; }
var s = 0: int;
if s0 >= lwi: Prime then s = (s0 - lwi: Prime): int;
else { const r = (lwi: Prime - s0) % bbp;
if r == 0 then s = 0: int; else s = (bbp - r): int; };
if i < strtssz - 1 { strtsp[i] = s: int(32); i += 1; continue; }
if i < strtssz { strtsp[i] = -1; i = strtssz; }
// cull the full buffer for this given base prime as usual...
// only works up to limit of int(32)**2!!!!!!!!
while s <= btlmt { cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
}
// cull the smaller sub buffers according to the strts array...
for sbtlmt in bsbtsz - 1 .. btlmt by bsbtsz {
i = 0; for bp in bpas { // bp never bigger than uint(32)!
// cull the sub buffer for this given base prime...
var s = strtsp[i]: int; if s < 0 then break;
var bpi = bp: int; var nxt = 0x7FFFFFFFFFFFFFFF;
if bpi <= bplmt { // use loop "unpeeling" for a small improvement...
const slmt = s + bpi * 8 - 1;
while s <= slmt {
const bmi = s & 7; const msk = bitmsk[bmi];
var c = s >> 3; const clmt = sbtlmt >> 3;
while c <= clmt { cmpstsp[c] |= msk; c += bpi; }
nxt = min(nxt, (c << 3): int(64) | bmi: int(64)); s += bpi;
}
strtsp[i] = nxt: int(32); i += 1;
}
else { while s <= sbtlmt { // standard cull loop...
cmpstsp[s >> 3] |= bitmsk[s & 7]; s += bpi; }
strtsp[i] = s: int(32); i += 1; }
}
}
}
}
// a generic record that contains a page result generating function;
// allows manual iteration through the use of the next() method;
// multi-threaded through the use of a thread pool...
class PagedResults {
const cnvrtrclsr; // output converter closure emulator, (lwi, sba) => output
var lwi: PrimeNdx; var bsbtsz: int;
var bpas: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var sbs: [ 0 .. NUMTHRDS - 1 ] SieveBuffer = new SieveBuffer();
var strts: [ 0 .. NUMTHRDS - 1 ] StrtsArr = new StrtsArr();
var qi: int = 0;
var wrkq$: [ 0 .. NUMTHRDS - 1 ] sync PrimeNdx;
var rsltsq$: [ 0 .. NUMTHRDS - 1 ] sync cnvrtrclsr(lwi, sbs(0)).type;
proc init(cvclsr, li: PrimeNdx, bsz: int) {
cnvrtrclsr = cvclsr; lwi = li; bsbtsz = bsz; }
proc deinit() { // kill the thread pool when out of scope...
if bpas == nil then return; // no thread pool!
for i in wrkq$.domain {
wrkq$[i].writeEF(-1); while true { const r = rsltsq$[i].readFE();
if r == nil then break; }
}
}
proc next(): cnvrtrclsr(lwi, sbs(0)).type {
proc dowrk(ri: int) { // used internally!...
while true {
const li = wrkq$[ri].readFE(); // following to kill thread!
if li < 0 { rsltsq$[ri].writeEF(nil: cnvrtrclsr(li, sbs(ri)).type); break; }
sbs[ri].fill(li);
sbs[ri].cull(li, bsbtsz, bpas!, strts[ri]);
rsltsq$[ri].writeEF(cnvrtrclsr(li, sbs[ri]));
}
}
if this.bpas == nil { // init on first use; avoids data race!
this.bpas = new BasePrimeArrs();
if this.bsbtsz < CPUL1CACHE {
this.sbs = new SieveBuffer(bsbtsz);
this.strts = new StrtsArr({0 .. SZBPSTRTS - 1});
}
else {
this.sbs = new SieveBuffer(CPUL2CACHE);
this.strts = new StrtsArr({0 .. SZMNSTRTS - 1});
}
// start threadpool and give it inital work...
for i in rsltsq$.domain {
begin with (const in i) dowrk(i);
this.wrkq$[i].writeEF(this.lwi); this.lwi += this.sbs[i].cmpsts.size * 8;
}
}
const rslt = this.rsltsq$[qi].readFE();
this.wrkq$[qi].writeEF(this.lwi);
this.lwi += this.sbs[qi].cmpsts.size * 8;
this.qi = if qi >= NUMTHRDS - 1 then 0 else qi + 1;
return rslt;
}
iter these() { while lwi >= 0 do yield next(); }
}
// the sieve buffer to base prime array converter closure...
record SB2BPArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared BasePrimeArr? {
const bsprm = (lwi + lwi + 3): BasePrime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared BasePrimeArr({ 0 .. sb.count(szlmt) - 1 });
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 {
arr.bparr[j] = bsprm + (i + i): BasePrime; j += 1; }
i += 1; }
return arr;
}
}
// a memoizing lazy list of BasePrimeArr's...
class BasePrimeArrs {
var head: shared BasePrimeArr;
var tail: shared BasePrimeArrs? = nil: shared BasePrimeArrs?;
var lock$: sync bool = true;
var feed: shared PagedResults(SB2BPArr) =
new shared PagedResults(new SB2BPArr(), 65536, 65536);
proc init() { // make our own first array to break data race!
var sb = new SieveBuffer(256); sb.fill(0);
const sb2 = new SB2BPArr();
head = sb2(0, sb): shared BasePrimeArr;
this.complete(); // fake base primes!
sb = new SieveBuffer(65536); sb.fill(0);
// use (completed) self as source of base primes!
var strts = new StrtsArr({ 0 .. 256 });
sb.cull(0, 65536, this, strts);
// replace head with new larger version culled using fake base primes!...
head = sb2(0, sb): shared BasePrimeArr;
}
// for initializing for use by the fillWHLPTRN proc...
proc init(hd: shared BasePrimeArr) {
head = hd; feed = new shared PagedResults(new SB2BPArr(), 0, 0);
}
// for initializing lazily extended list as required...
proc init(hd: shared BasePrimeArr, fd: PagedResults) { head = hd; feed = fd; }
proc next(): shared BasePrimeArrs {
if this.tail == nil { // in case other thread slipped through!
if this.lock$.readFE() && this.tail == nil { // empty sync -> block others!
const nhd = this.feed.next(): shared BasePrimeArr;
this.tail = new shared BasePrimeArrs(nhd , this.feed);
}
this.lock$.writeEF(false); // fill the sync so other threads can do nothing!
}
return this.tail: shared BasePrimeArrs; // necessary cast!
}
iter these(): BasePrime {
for bp in head.bparr do yield bp; var cur = next();
while true {
for bp in cur.head.bparr do yield bp; cur = cur.next(); }
}
}
record SB2PrmArr {
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared PrimeArr? {
const bsprm = (lwi + lwi + 3): Prime;
const szlmt = sb.cmpsts.size * 8 - 1; var i, j = 0;
var arr = new shared PrimeArr({0 .. sb.count(szlmt) - 1});
while i <= szlmt { if sb.cmpsts[i >> 3] & bitmsk[i & 7] == 0 then {
arr.prmarr[j] = bsprm + (i + i): Prime; j += 1; }
i += 1; }
return arr;
}
}
iter primes(): Prime {
for p in WHLPRMS do yield p: Prime;
for pa in new shared PagedResults(new SB2PrmArr(), 0, CPUL1CACHE) do
for p in pa!.prmarr do yield p;
}
// use a class so that it can be used as a generic sync value!...
class CntNxt { const cnt: int; const nxt: PrimeNdx; }
// a class that emulates a closure and a return value...
record SB2Cnt {
const nxtlmt: PrimeNdx;
proc this(lwi: PrimeNdx, sb: SieveBuffer): shared CntNxt? {
const btszlmt = sb.cmpsts.size * 8 - 1; const lstndx = lwi + btszlmt: PrimeNdx;
const btlmt = if lstndx > nxtlmt then max(0, (nxtlmt - lwi): int) else btszlmt;
return new shared CntNxt(sb.count(btlmt), lstndx);
}
}
// couut primes to limit, just like it says...
proc countPrimesTo(lmt: Prime): int(64) {
const nxtlmt = ((lmt - 3) / 2): PrimeNdx; var count = 0: int(64);
for p in WHLPRMS { if p > lmt then break; count += 1; }
if lmt < FRSTSVPRM then return count;
for cn in new shared PagedResults(new SB2Cnt(nxtlmt), 0, CPUL1CACHE) {
count += cn!.cnt: int(64); if cn!.nxt >= nxtlmt then break;
}
return count;
}
// test it...
write("The first 25 primes are: "); var cnt = 0;
for p in primes() { if cnt >= 25 then break; cnt += 1; write(" ", p); }
cnt = 0; for p in primes() { if p > 1000000 then break; cnt += 1; }
writeln("\nThere are ", cnt, " primes up to a million.");
write("Sieving to ", LIMIT, " with ");
write("CPU L1/L2 cache sizes of ", L1, "/", L2, " KiloBytes ");
writeln("using ", NUMTHRDS, " threads.");
var timer: Timer; timer.start();
// the slow way!:
// var count = 0; for p in primes() { if p > LIMIT then break; count += 1; }
const count = countPrimesTo(LIMIT); // the fast way!
timer.stop();
write("Found ", count, " primes up to ", LIMIT);
writeln(" in ", timer.elapsed(TimeUnits.milliseconds), " milliseconds.");
- Output:
The first 25 primes are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 There are 78498 primes up to a million. Sieving to 1000000000 with CPU L1/L2 cache sizes of 16/128 KiloBytes using 4 threads. Found 50847534 primes up to 1000000000 in 128.279 milliseconds.
Time as run using Chapel version 1.24.1 on an Intel Skylake i5-6500 at 3.2 GHz (base, multi-threaded).
Note that the above code does implement some functional concepts as in a memoized lazy list of base prime arrays, but as this is used at the page level, the slowish performance doesn't impact the overall execution time much and the code is much more elegant in using this concept such that we compute new pages of base primes as they are required for increasing range.
Some of the most tricky bits due to having thread pools is stopping and de-initializing when they go out of scope; this is done by the `deinit` method of the `PagedResults` generic class, and was necessary to prevent a segmentation fault when the thread pool goes out of scope.
The tight inner loops for culling composite number representations have been optimized to some extent in using "loop unpeeling" for smaller base primes to simplify the loops down to simple masking by a constant with eight separate loops for the repeating pattern over bytes and culling by sub buffer CPU L1 cache sizes over the outer sieve buffer size of the CPU L2 cache size in order to make the task work-sized chunks larger for less task context switching overheads and for reduced time lost to culling start address calculations per base prime (which needs to use some integer division that is always slower than other integer operations). This last optimization allows for reasonably efficient culling up to the square of the CPU L2 cache size in bits or 1e12 for the one Megabit CPU L2 cache size many mid-range Intel CPU's have currently when used for multi-threading (half of the actual size for Hyper-Threaded - HT - threads as they share both the L1 and the L2 caches over the pairs of Hyper-Threaded (HT) threads per core).
Although this code can be used for much higher sieving ranges, it is not recommended due to not yet being tuned for better efficiency above 1e12; there are no checks limiting the user to this range, but, as well as decreasing efficiency for sieving limits much higher than this, at some point there will be errors due to integer overflows but these will be for huge sieving ranges taking days -> weeks -> months -> years to execute on common desktop CPU's.
A further optimization used is to create a pre-culled `WHLPTRN` `SieveBuffer` where the odd primes (since we cull odds-only) of 3, 5, 7, 11, 13, and 17 have already been culled and using that to pre-fill the page segment buffers so that no culling by these base prime values is required, this reduces the number of operations by about 45% compared to if it wasn't done but the ratio of better performance is only about 34.5% better as this changes the ratio of (fast) smaller base primes to larger (slower) ones.
All of the improvements to this point allow the shown performance as per the displayed output for the above program; using a command line argument of `--L1=32 --L2=256 --LIMIT=100000000000` (a hundred billion - 1e11 - on this computer, which has cache sizes of that amount and no Hyper-Threading - HT), it can count the primes to 1e11 in about 17.5 seconds using the above mentioned CPU. It will be over two times faster than this using a more modern desktop CPU such as the Intel Core i7-9700K which has twice as many effective cores, a higher CPU clock rate, is about 10% to 15% faster due the a more modern CPU architecture which is three generations newer. Of course using a top end AMD Threadripper CPU with its 64/128 cores/threads will be almost eight times faster again except that it will lose about 20% due to its slower clock speed when all cores/threads are used; note that high core CPU's will only give these speed gains for large sieving ranges such as 1e11 and above since otherwise there aren't enough work chunks to go around for all the threads available!
Incredibly, even run single threaded (argument of `--NUMTHRDS=1`) this implementation is only about 20% slower than the reference Sieve of Atkin "primegen/primespeed" implementation in counting the number of primes to a billion and is about 20% faster in counting the primes to a hundred billion (arguments of `--LIMIT=100000000000 --NUMTHRDS=1`) with both using the same size of CPU L1 cache buffer of 16 Kilobytes; This implementation does not yet have the level of wheel optimization of the Sieve of Atkin as it has only the limited wheel optimization of Odds-Only plus the use of the pre-cull fill. Maximum wheel factorization will reduce the number of operations for this code to less than about half the current number, making it faster than the Sieve of Atkin for all ranges, and approach the speed of Kim Walisch's "primesieve". However, not having primitive element pointers and pointer operations, there are some optimizations used that Kim Walisch's "primesieve" uses of extreme loop unrolling that mean that it can never quite reach the speed of "primeseive" by about 20% to 30%.
The above code is a fairly formidable benchmark, which I have also written in Fortran as in likely the major computer language that is comparable. I see that Chapel has the following advantages over Fortran:
1) It is somewhat cleaner to read and write code with more modern forms of expression, especially as to declaring variables/constants which can often be inferred as to type.
2) The Object Oriented Programming paradigm has been designed in from the beginning and isn't just an add-on that needs to be careful not to break legacy code; Fortran's method of expression this paradigm using modules seems awkward by comparison.
3) It has some more modern forms of automatic memory management as to type safety and sharing of allocated memory structures.
4) It has several modern forms of managing concurrency built in from the beginning rather than being add-on's or just being the ability to call through to OpenMP/MPI.
That said, it also as the following disadvantages, at least as I see it:
1) One of the worst things about Chapel is the slow compilation speed, which is about ten times slower than GNU gfortran.
2) It's just my personal opinion, but so much about forms of expression have been modernized and improved, it seems very dated to go back to using curly braces to delineate code blocks and semi-colons as line terminators; Most modern languages at least dispense with the latter.
3) Some programming features offered are still being defined, although most evolutionary changes now no longer are breaking code changes.
Speed isn't really an issue with either one, with some types of tasks better suited to one or the other but mostly about the same; for this particular task they are about the same if one were to implement the same algorithmic optimizations other than that one can do some of the extreme loop unrolling optimization with Fortran that can't be done with Chapel as Fortran has some limited form of pointers, although not the full set of pointer operators that C/C++ like languages have. I think that if both were optimized as much as each is capable, Fortran may run about 20% faster, perhaps due to the maturity of its compile and due to the availablity of (limited) pointer operations.
The primary additional optimization available to Chapel code is the addition of Maximum Wheel-Factorization as per my StackOverflow JavaScript Tutorial answer, with the other major improvement to add "bucket sieving" for sieving limits above about 1e12 so as to get reasonable efficiency up to 1e16 and above.
Clojure
(defn primes< [n]
(remove (set (mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))
The above is **not strictly a Sieve of Eratosthenes** as the composite culling ranges (in the mapcat) include all of the multiples of all of the numbers and not just the multiples of primes. When tested with (println (time (count (primes< 1000000))))
, it takes about 5.5 seconds just to find the number of primes up to a million, partly because of the extra work due to the use of the non-primes, and partly because of the constant enumeration using sequences with multiple levels of function calls. Although very short, this code is likely only useful up to about this range of a million.
It may be written using the into #{} function to run slightly faster due to the set function being concerned with only distinct elements whereas the into #{} only does the conjunction, and even at that doesn't do that much as it does the conjunction to an empty sequence, the code as follows:
(defn primes< [n]
(remove (into #{}
(mapcat #(range (* % %) n %)
(range 2 (Math/sqrt n))))
(range 2 n)))
The above code is slightly faster for the reasons given, but is still not strictly a Sieve of Eratosthenes due to sieving by all numbers and not just by the base primes.
The following code also uses the into #{} transducer but has been slightly wheel-factorized to sieve odds-only:
(defn primes< [n]
(if (< n 2) ()
(cons 2 (remove (into #{}
(mapcat #(range (* % %) n %)
(range 3 (Math/sqrt n) 2)))
(range 3 n 2)))))
The above code is a little over twice as fast as the non-odds-only due to the reduced number of operations. It still isn't strictly a Sieve of Eratosthenes as it sieves by all odd base numbers and not only by the base primes.
The following code calculates primes up to and including n using a mutable boolean array but otherwise entirely functional code; it is tens (to a hundred) times faster than the purely functional codes due to the use of mutability in the boolean array:
(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
cmpsts (boolean-array (inc n)),
cullp (fn [p]
(loop [i (* p p)]
(if (<= i n)
(do (aset cmpsts i true)
(recur (+ i p))))))]
(do (dorun (map #(cullp %) (filter #(not (aget cmpsts %))
(range 2 (inc root)))))
(filter #(not (aget cmpsts %)) (range 2 (inc n))))))
Alternative implementation using Clojure's side-effect oriented list comprehension.
(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
[lim]
(let [refs (boolean-array (+ lim 1) true)
root (int (Math/sqrt lim))]
(do (doseq [i (range 2 lim)
:while (<= i root)
:when (aget refs i)]
(doseq [j (range (* i i) lim i)]
(aset refs j false)))
(filter #(aget refs %) (range 2 lim)))))
Alternative implementation using Clojure's side-effect oriented list comprehension. Odds only.
(defn primes-to
"Returns a lazy sequence of prime numbers less than lim"
[lim]
(let [max-i (int (/ (- lim 1) 2))
refs (boolean-array max-i true)
root (/ (dec (int (Math/sqrt lim))) 2)]
(do (doseq [i (range 1 (inc root))
:when (aget refs i)]
(doseq [j (range (* (+ i i) (inc i)) max-i (+ i i 1))]
(aset refs j false)))
(cons 2 (map #(+ % % 1) (filter #(aget refs %) (range 1 max-i)))))))
This implemantation is about twice as fast as the previous one and uses only half the memory. From the index of the array, it calculates the value it represents as (2*i + 1), the step between two indices that represent the multiples of primes to mark as composite is also (2*i + 1). The index of the square of the prime to start composite marking is 2*i*(i+1).
Alternative very slow entirely functional implementation using lazy sequences
(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(letfn [(nxtprm [cs] ; current candidates
(let [p (first cs)]
(if (> p (Math/sqrt n)) cs
(cons p (lazy-seq (nxtprm (-> (range (* p p) (inc n) p)
set (remove cs) rest)))))))]
(nxtprm (range 2 (inc n)))))
The reason that the above code is so slow is that it has has a high constant factor overhead due to using a (hash) set to remove the composites from the future composites stream, each prime composite stream removal requires a scan across all remaining composites (compared to using an array or vector where only the culled values are referenced, and due to the slowness of Clojure sequence operations as compared to iterator/sequence operations in other languages.
Version based on immutable Vector's
Here is an immutable boolean vector based non-lazy sequence version other than for the lazy sequence operations to output the result:
(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[max-prime]
(let [sieve (fn [s n]
(if (<= (* n n) max-prime)
(recur (if (s n)
(reduce #(assoc %1 %2 false) s (range (* n n) (inc max-prime) n))
s)
(inc n))
s))]
(->> (-> (reduce conj (vector-of :boolean) (map #(= % %) (range (inc max-prime))))
(assoc 0 false)
(assoc 1 false)
(sieve 2))
(map-indexed #(vector %2 %1)) (filter first) (map second))))
The above code is still quite slow due to the cost of the immutable copy-on-modify operations.
Odds only bit packed mutable array based version
The following code implements an odds-only sieve using a mutable bit packed long array, only using a lazy sequence for the output of the resulting primes:
(set! *unchecked-math* true)
(defn primes-to
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
rootndx (long (/ (- root 3) 2)),
ndx (long (/ (- n 3) 2)),
cmpsts (long-array (inc (/ ndx 64))),
isprm #(zero? (bit-and (aget cmpsts (bit-shift-right % 6))
(bit-shift-left 1 (bit-and % 63)))),
cullp (fn [i]
(let [p (long (+ i i 3))]
(loop [i (bit-shift-right (- (* p p) 3) 1)]
(if (<= i ndx)
(do (let [w (bit-shift-right i 6)]
(aset cmpsts w (bit-or (aget cmpsts w)
(bit-shift-left 1 (bit-and i 63)))))
(recur (+ i p))))))),
cull (fn [] (loop [i 0] (if (<= i rootndx)
(do (if (isprm i) (cullp i)) (recur (inc i))))))]
(letfn [(nxtprm [i] (if (<= i ndx)
(cons (+ i i 3) (lazy-seq (nxtprm (loop [i (inc i)]
(if (or (> i ndx) (isprm i)) i
(recur (inc i)))))))))]
(if (< n 2) nil
(cons 3 (if (< n 3) nil (do (cull) (lazy-seq (nxtprm 0)))))))))
The above code is about as fast as any "one large sieving array" type of program in any computer language with this level of wheel factorization other than the lazy sequence operations are quite slow: it takes about ten times as long to enumerate the results as it does to do the actual sieving work of culling the composites from the sieving buffer array. The slowness of sequence operations is due to nested function calls, but primarily due to the way Clojure implements closures by "boxing" all arguments (and perhaps return values) as objects in the heap space, which then need to be "un-boxed" as primitives as necessary for integer operations. Some of the facilities provided by lazy sequences are not needed for this algorithm, such as the automatic memoization which means that each element of the sequence is calculated only once; it is not necessary for the sequence values to be retraced for this algorithm.
If further levels of wheel factorization were used, the time to enumerate the resulting primes would be an even higher overhead as compared to the actual composite number culling time, would get even higher if page segmentation were used to limit the buffer size to the size of the CPU L1 cache for many times better memory access times, most important in the culling operations, and yet higher again if multi-processing were used to share to page segment processing across CPU cores.
The following code overcomes many of those limitations by using an internal (OPSeq) "deftype" which implements the ISeq interface as well as the Counted interface to provide immediate count returns (based on a pre-computed total), as well as the IReduce interface which can greatly speed come computations based on the primes sequence (eased greatly using facilities provided by Clojure 1.7.0 and up):
(defn primes-tox
"Computes lazy sequence of prime numbers up to a given number using sieve of Eratosthenes"
[n]
(let [root (-> n Math/sqrt long),
rootndx (long (/ (- root 3) 2)),
ndx (max (long (/ (- n 3) 2)) 0),
lmt (quot ndx 64),
cmpsts (long-array (inc lmt)),
cullp (fn [i]
(let [p (long (+ i i 3))]
(loop [i (bit-shift-right (- (* p p) 3) 1)]
(if (<= i ndx)
(do (let [w (bit-shift-right i 6)]
(aset cmpsts w (bit-or (aget cmpsts w)
(bit-shift-left 1 (bit-and i 63)))))
(recur (+ i p))))))),
cull (fn [] (do (aset cmpsts lmt (bit-or (aget cmpsts lmt)
(bit-shift-left -2 (bit-and ndx 63))))
(loop [i 0]
(when (<= i rootndx)
(when (zero? (bit-and (aget cmpsts (bit-shift-right i 6))
(bit-shift-left 1 (bit-and i 63))))
(cullp i))
(recur (inc i))))))
numprms (fn []
(let [w (dec (alength cmpsts))] ;; fast results count bit counter
(loop [i 0, cnt (bit-shift-left (alength cmpsts) 6)]
(if (> i w) cnt
(recur (inc i)
(- cnt (java.lang.Long/bitCount (aget cmpsts i))))))))]
(if (< n 2) nil
(cons 2 (if (< n 3) nil
(do (cull)
(deftype OPSeq [^long i ^longs cmpsa ^long cnt ^long tcnt] ;; for arrays maybe need to embed the array so that it doesn't get garbage collected???
clojure.lang.ISeq
(first [_] (if (nil? cmpsa) nil (+ i i 3)))
(next [_] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) nil
(OPSeq.
(loop [j (inc i)]
(let [p? (zero? (bit-and (aget cmpsa (bit-shift-right j 6))
(bit-shift-left 1 (bit-and j 63))))]
(if p? j (recur (inc j)))))
cmpsa ncnt tcnt))))
(more [this] (let [ncnt (inc cnt)] (if (>= ncnt tcnt) (OPSeq. 0 nil tcnt tcnt)
(.next this))))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (= cnt tcnt) nil (OPSeq. 0 nil tcnt tcnt)))
(equiv [this o] (if (or (not= (type this) (type o))
(not= cnt (.cnt ^OPSeq o)) (not= tcnt (.tcnt ^OPSeq o))
(not= i (.i ^OPSeq o))) false true))
clojure.lang.Counted
(count [_] (- tcnt cnt))
clojure.lang.Seqable
(clojure.lang.Seqable/seq [this] (if (= cnt tcnt) nil this))
clojure.lang.IReduce
(reduce [_ f v] (let [c (- tcnt cnt)]
(if (<= c 0) nil
(loop [ci i, n c, rslt v]
(if (zero? (bit-and (aget cmpsa (bit-shift-right ci 6))
(bit-shift-left 1 (bit-and ci 63))))
(let [rrslt (f rslt (+ ci ci 3)),
rdcd (reduced? rrslt),
nrslt (if rdcd @rrslt rrslt)]
(if (or (<= n 1) rdcd) nrslt
(recur (inc ci) (dec n) nrslt)))
(recur (inc ci) n rslt))))))
(reduce [this f] (if (nil? i) (f) (if (= (.count this) 1) (+ i i 3)
(.reduce ^clojure.lang.IReduce (.next this) f (+ i i 3)))))
clojure.lang.Sequential
Object
(toString [this] (if (= cnt tcnt) "()"
(.toString (seq (map identity this))))))
(->OPSeq 0 cmpsts 0 (numprms))))))))
'(time (count (primes-tox 10000000)))' takes about 40 milliseconds (compiled) to produce 664579.
Due to the better efficiency of the custom CIS type, the primes to the above range can be enumerated in about the same 40 milliseconds that it takes to cull and count the sieve buffer array.
Under Clojure 1.7.0, one can use '(time (reduce (fn [] (+ (long sum) (long v))) 0 (primes-tox 2000000)))' to find "142913828922" as the sum of the primes to two million as per Euler Problem 10 in about 40 milliseconds total with about half the time used for sieving the array and half for computing the sum.
To show how sensitive Clojure is to forms of expression of functions, the simple form '(time (reduce + (primes-tox 2000000)))' takes about twice as long even though it is using the same internal routine for most of the calculation due to the function not having the type coercion's.
Before one considers that this code is suitable for larger ranges, it is still lacks the improvements of page segmentation with pages about the size of the CPU L1/L2 caches (produces about a four times speed up), maximal wheel factorization (to make it another about four times faster), and the use of multi-processing (for a further gain of about 4 times for a multi-core desktop CPU such as an Intel i7), will make the sieving/counting code about 50 times faster than this, although there will only be a moderate improvement in the time to enumerate/process the resulting primes. Using these techniques, the number of primes to one billion can be counted in a small fraction of a second.
Unbounded Versions
For some types of problems such as finding the nth prime (rather than the sequence of primes up to m), a prime sieve with no upper bound is a better tool.
The following variations on an incremental Sieve of Eratosthenes are based on or derived from the Richard Bird sieve as described in the Epilogue of Melissa E. O'Neill's definitive paper:
A Clojure version of Richard Bird's Sieve using Lazy Sequences (sieves odds only)
(defn primes-Bird
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm by Richard Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(cons c (lazy-seq (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
(union [xs ys] (let [xv (first xs), yv (first ys)]
(if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
(if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
(cons xv (lazy-seq (union (next xs) (next ys)))))))),
(mrgmltpls [mltplss] (cons (first (first mltplss))
(lazy-seq (union (next (first mltplss))
(mrgmltpls (next mltplss)))))),
(minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (first cmpsts))
(cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) (next cmpsts)))))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))
The above code is quite slow due to both that the data structure is a linear merging of prime multiples and due to the slowness of the Clojure sequence operations.
A Clojure version of the tree folding sieve using Lazy Sequences
The following code speeds up the above code by merging the linear sequence of sequences as above by pairs into a right-leaning tree structure:
(defn primes-treeFolding
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(cons c (lazy-seq (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [ps] (cons (mltpls (first ps)) (lazy-seq (allmtpls (next ps))))),
(union [xs ys] (let [xv (first xs), yv (first ys)]
(if (< xv yv) (cons xv (lazy-seq (union (next xs) ys)))
(if (< yv xv) (cons yv (lazy-seq (union xs (next ys))))
(cons xv (lazy-seq (union (next xs) (next ys)))))))),
(pairs [mltplss] (let [tl (next mltplss)]
(cons (union (first mltplss) (first tl))
(lazy-seq (pairs (next tl)))))),
(mrgmltpls [mltplss] (cons (first (first mltplss))
(lazy-seq (union (next (first mltplss))
(mrgmltpls (pairs (next mltplss))))))),
(minusStrtAt [n cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (first cmpsts))
(cons n (lazy-seq (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) (next cmpsts)))))]
(do (def oddprms (cons 3 (lazy-seq (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(cons 2 (lazy-seq oddprms)))))
The above code is still slower than it should be due to the slowness of Clojure's sequence operations.
A Clojure version of the above tree folding sieve using a custom Co Inductive Sequence
The following code uses a custom "deftype" non-memoizing Co Inductive Stream/Sequence (CIS) implementing the ISeq interface to make the sequence operations more efficient and is about four times faster than the above code:
(deftype CIS [v cont]
clojure.lang.ISeq
(first [_] v)
(next [_] (if (nil? cont) nil (cont)))
(more [this] (let [nv (.next this)] (if (nil? nv) (CIS. nil nil) nv)))
(cons [this o] (clojure.core/cons o this))
(empty [_] (if (and (nil? v) (nil? cont)) nil (CIS. nil nil)))
(equiv [this o] (loop [cis1 this, cis2 o] (if (nil? cis1) (if (nil? cis2) true false)
(if (or (not= (type cis1) (type cis2))
(not= (.v cis1) (.v ^CIS cis2))
(and (nil? (.cont cis1))
(not (nil? (.cont ^CIS cis2))))
(and (nil? (.cont ^CIS cis2))
(not (nil? (.cont cis1))))) false
(if (nil? (.cont cis1)) true
(recur ((.cont cis1)) ((.cont ^CIS cis2))))))))
(count [this] (loop [cis this, cnt 0] (if (or (nil? cis) (nil? (.cont cis))) cnt
(recur ((.cont cis)) (inc cnt)))))
clojure.lang.Seqable
(seq [this] (if (and (nil? v) (nil? cont)) nil this))
clojure.lang.Sequential
Object
(toString [this] (if (and (nil? v) (nil? cont)) "()" (.toString (seq (map identity this))))))
(defn primes-treeFoldingx
"Computes the unbounded sequence of primes using a Sieve of Eratosthenes algorithm modified from Bird."
[]
(letfn [(mltpls [p] (let [p2 (* 2 p)]
(letfn [(nxtmltpl [c]
(->CIS c (fn [] (nxtmltpl (+ c p2)))))]
(nxtmltpl (* p p))))),
(allmtpls [^CIS ps] (->CIS (mltpls (.v ps)) (fn [] (allmtpls ((.cont ps)))))),
(union [^CIS xs ^CIS ys] (let [xv (.v xs), yv (.v ys)]
(if (< xv yv) (->CIS xv (fn [] (union ((.cont xs)) ys)))
(if (< yv xv) (->CIS yv (fn [] (union xs ((.cont ys)))))
(->CIS xv (fn [] (union (next xs) ((.cont ys))))))))),
(pairs [^CIS mltplss] (let [^CIS tl ((.cont mltplss))]
(->CIS (union (.v mltplss) (.v tl))
(fn [] (pairs ((.cont tl))))))),
(mrgmltpls [^CIS mltplss] (->CIS (.v ^CIS (.v mltplss))
(fn [] (union ((.cont ^CIS (.v mltplss)))
(mrgmltpls (pairs ((.cont mltplss)))))))),
(minusStrtAt [n ^CIS cmpsts] (loop [n n, cmpsts cmpsts]
(if (< n (.v cmpsts))
(->CIS n (fn [] (minusStrtAt (+ n 2) cmpsts)))
(recur (+ n 2) ((.cont cmpsts))))))]
(do (def oddprms (->CIS 3 (fn [] (let [cmpsts (-> oddprms (allmtpls) (mrgmltpls))]
(minusStrtAt 5 cmpsts)))))
(->CIS 2 (fn [] oddprms)))))
'(time (count (take-while #(<= (long %) 10000000) (primes-treeFoldingx))))' takes about 3.4 seconds for a range of 10 million.
The above code is useful for ranges up to about fifteen million primes, which is about the first million primes; it is comparable in speed to all of the bounded versions except for the fastest bit packed version which can reasonably be used for ranges about 100 times as large.
Incremental Hash Map based unbounded "odds-only" version
The following code is a version of the O'Neill Haskell code but does not use wheel factorization other than for sieving odds only (although it could be easily added) and uses a Hash Map (constant amortized access time) rather than a Priority Queue (log n access time for combined remove-and-insert-anew operations, which are the majority used for this algorithm) with a lazy sequence for output of the resulting primes; the code has the added feature that it uses a secondary base primes sequence generator and only adds prime culling sequences to the composites map when they are necessary, thus saving time and limiting storage to only that required for the map entries for primes up to the square root of the currently sieved number:
(defn primes-hashmap
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Hashmap"
[]
(letfn [(nxtoddprm [c q bsprms cmpsts]
(if (>= c q) ;; only ever equal
(let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
(recur (+ c 2) (* nbp nbp) nbps (assoc cmpsts (+ q p2) p2)))
(if (contains? cmpsts c)
(recur (+ c 2) q bsprms
(let [adv (cmpsts c), ncmps (dissoc cmpsts c)]
(assoc ncmps
(loop [try (+ c adv)] ;; ensure map entry is unique
(if (contains? ncmps try)
(recur (+ try adv)) try)) adv)))
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms {}))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms {}))))))
The above code is slower than the best tree folding version due to the added constant factor overhead of computing the hash functions for every hash map operation even though it has computational complexity of (n log log n) rather than the worse (n log n log log n) for the previous incremental tree folding sieve. It is still about 100 times slower than the sieve based on the bit-packed mutable array due to these constant factor hashing overheads.
There is almost no benefit of converting the above code to use a CIS as most of the time is expended in the hash map functions.
Incremental Priority Queue based unbounded "odds-only" version
In order to implement the O'Neill Priority Queue incremental Sieve of Eratosthenes algorithm, one requires an efficient implementation of a Priority Queue, which is not part of standard Clojure. For this purpose, the most suitable Priority Queue is a binary tree heap based MinHeap algorithm. The following code implements a purely functional (using entirely immutable state) MinHeap Priority Queue providing the required functions of (emtpy-pq) initialization, (getMin-pq pq) to examinte the minimum key/value pair in the queue, (insert-pq pq k v) to add entries to the queue, and (replaceMinAs-pq pq k v) to replaace the minimum entry with a key/value pair as given (it is more efficient that if functions were provided to delete and then re-insert entries in the queue; there is therefore no "delete" or other queue functions supplied as the algorithm does not requrie them:
(deftype PQEntry [k, v]
Object
(toString [_] (str "<" k "," v ">")))
(deftype PQNode [ntry, lft, rght]
Object
(toString [_] (str "<" ntry " left: " (str lft) " right: " (str rght) ">")))
(defn empty-pq [] nil)
(defn getMin-pq [^PQNode pq]
(if (nil? pq)
nil
(.ntry pq)))
(defn insert-pq [^PQNode opq ok v]
(loop [^PQEntry kv (->PQEntry ok v), pq opq, cont identity]
(if (nil? pq)
(cont (->PQNode kv nil nil))
(let [k (.k kv),
^PQEntry kvn (.ntry pq), kn (.k kvn),
l (.lft pq), r (.rght pq)]
(if (<= k kn)
(recur kvn r #(cont (->PQNode kv % l)))
(recur kv r #(cont (->PQNode kvn % l))))))))
(defn replaceMinAs-pq [^PQNode opq k v]
(let [^PQEntry kv (->PQEntry k v)]
(if (nil? opq) ;; if was empty or just an entry, just use current entry
(->PQNode kv nil nil)
(loop [pq opq, cont identity]
(let [^PQNode l (.lft pq), ^PQNode r (.rght pq)]
(cond ;; if left us empty, right must be too
(nil? l)
(cont (->PQNode kv nil nil)),
(nil? r) ;; we only have a left...
(let [^PQEntry kvl (.ntry l), kl (.k kvl)]
(if (<= k kl)
(cont (->PQNode kv l nil))
(recur l #(cont (->PQNode kvl % nil))))),
:else (let [^PQEntry kvl (.ntry l), kl (.k kvl),
^PQEntry kvr (.ntry r), kr (.k kvr)] ;; we have both
(if (and (<= k kl) (<= k kr))
(cont (->PQNode kv l r))
(if (<= kl kr)
(recur l #(cont (->PQNode kvl % r)))
(recur r #(cont (->PQNode kvr l %))))))))))))
Note that the above code is written partially using continuation passing style so as to leave the "recur" calls in tail call position as required for efficient looping in Clojure; for practical sieving ranges, the algorithm could likely use just raw function recursion as recursion depth is unlikely to be used beyond a depth of about ten or so, but raw recursion is said to be less code efficient.
The actual incremental sieve using the Priority Queue is as follows, which code uses the same optimizations of postponing the addition of prime composite streams to the queue until the square root of the currently sieved number is reached and using a secondary base primes stream to generate the primes composite stream markers in the queue as was used for the Hash Map version:
(defn primes-pq
"Infinite sequence of primes using an incremental Sieve or Eratosthenes with a Priority Queue"
[]
(letfn [(nxtoddprm [c q bsprms cmpsts]
(if (>= c q) ;; only ever equal
(let [p2 (* (first bsprms) 2), nbps (next bsprms), nbp (first nbps)]
(recur (+ c 2) (* nbp nbp) nbps (insert-pq cmpsts (+ q p2) p2)))
(let [mn (getMin-pq cmpsts)]
(if (and mn (>= c (.k mn))) ;; never greater than
(recur (+ c 2) q bsprms
(loop [adv (.v mn), cmps cmpsts] ;; advance repeat composites for value
(let [ncmps (replaceMinAs-pq cmps (+ c adv) adv),
nmn (getMin-pq ncmps)]
(if (and nmn (>= c (.k nmn)))
(recur (.v nmn) ncmps)
ncmps))))
(cons c (lazy-seq (nxtoddprm (+ c 2) q bsprms cmpsts)))))))]
(do (def baseoddprms (cons 3 (lazy-seq (nxtoddprm 5 9 baseoddprms (empty-pq)))))
(cons 2 (lazy-seq (nxtoddprm 3 9 baseoddprms (empty-pq)))))))
The above code is faster than the Hash Map version up to about a sieving range of fifteen million or so, but gets progressively slower for larger ranges due to having (n log n log log n) computational complexity rather than the (n log log n) for the Hash Map version, which has a higher constant factor overhead that is overtaken by the extra "log n" factor.
It is slower that the fastest of the tree folding versions (which has the same computational complexity) due to the higher constant factor overhead of the Priority Queue operations (although perhaps a more efficient implementation of the MinHeap Priority Queue could be developed).
Again, these non-mutable array based sieves are about a hundred times slower than even the "one large memory buffer array" version as implemented in the bounded section; a page segmented version of the mutable bit-packed memory array would be several times faster.
All of these algorithms will respond to maximum wheel factorization, getting up to approximately four times faster if this is applied as compared to the the "odds-only" versions.
It is difficult if not impossible to apply efficient multi-processing to the above versions of the unbounded sieves as the next values of the primes sequence are dependent on previous changes of state for the Bird and Tree Folding versions; however, with the addition of a "update the whole Priority Queue (and reheapify)" or "update the Hash Map" to a given page start state functions, it is possible to do for these letter two algorithms; however, even though it is possible and there is some benefit for these latter two implementations, the benefit is less than using mutable arrays due to that the results must be enumerated into a data structure of some sort in order to be passed out of the page function whereas they can be directly enumerated from the array for the mutable array versions.
Bit packed page segmented array unbounded "odds-only" version
To show that Clojure does not need to be particularly slow, the following version runs about twice as fast as the non-segmented unbounded array based version above (extremely fast compared to the non-array based versions) and only a little slower than other equivalent versions running on virtual machines: C# or F# on DotNet or Java and Scala on the JVM:
(set! *unchecked-math* true)
(def PGSZ (bit-shift-left 1 14)) ;; size of CPU cache
(def PGBTS (bit-shift-left PGSZ 3))
(def PGWRDS (bit-shift-right PGBTS 5))
(def BPWRDS (bit-shift-left 1 7)) ;; smaller page buffer for base primes
(def BPBTS (bit-shift-left BPWRDS 5))
(defn- count-pg
"count primes in the culled page buffer, with test for limit"
[lmt ^ints pg]
(let [pgsz (alength pg),
pgbts (bit-shift-left pgsz 5),
cntem (fn [lmtw]
(let [lmtw (long lmtw)]
(loop [i (long 0), c (long 0)]
(if (>= i lmtw) (- (bit-shift-left lmtw 5) c)
(recur (inc i)
(+ c (java.lang.Integer/bitCount (aget pg i))))))))]
(if (< lmt pgbts)
(let [lmtw (bit-shift-right lmt 5),
lmtb (bit-and lmt 31)
msk (bit-shift-left -2 lmtb)]
(+ (cntem lmtw)
(- 32 (java.lang.Integer/bitCount (bit-or (aget pg lmtw)
msk)))))
(- pgbts
(areduce pg i ret (long 0) (+ ret (java.lang.Integer/bitCount (aget pg i))))))))
;; (cntem pgsz))))
(defn- primes-pages
"unbounded Sieve of Eratosthenes producing a lazy sequence of culled page buffers."
[]
(letfn [(make-pg [lowi pgsz bpgs]
(let [lowi (long lowi),
pgbts (long (bit-shift-left pgsz 5)),
pgrng (long (+ (bit-shift-left (+ lowi pgbts) 1) 3)),
^ints pg (int-array pgsz),
cull (fn [bpgs']
(loop [i (long 0), bpgs' bpgs']
(let [^ints fbpg