Sieve of Eratosthenes: Difference between revisions

→‎{{header|Scala}}: corrected code and comments, added a section using HashMap...
(→‎Sub-optimal tail recursive trial division: Scala - used tail recursion to implement a true Sieve of Eratosthenes (page segmented "infinite" iteration)...)
}
// BitSet toList is shuffled when using the ParSet version.
println(sieveOfEratosthenes(100).toList.sorted)
println(sieveOfEratosthenes(100000000).size)</lang>
{{out}}
<pre>List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97)
5761455</pre>
 
While concise, the above code is quite slow ('''take out the '.par' for speed'''), in which case the conversion 'toList' and sorting ('sorted') are not necessary, for an additional small gain in speed; the above code is slow because of all the overhead in processing the bit-packed "BitSet" bit-by-bit using complex "higher-order" method calls.
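To illustrate the sequential alternative mentioned above, here is a minimal sketch (not the article's exact code; `sievePrimes` is an illustrative name) that collects the composites in an immutable BitSet and subtracts them, so neither '.par' nor 'sorted' is needed:

```scala
// Sequential sketch: collect composites in a BitSet, then subtract them
// from the full range; BitSet iterates in ascending order already.
import scala.collection.immutable.BitSet

def sievePrimes(limit: Int): BitSet = {
  val composites = (2 to math.sqrt(limit).toInt)
    .foldLeft(BitSet.empty) { (acc, p) =>
      if (acc(p)) acc // p is already known composite; its multiples are covered
      else acc ++ (p * p to limit by p) // mark all multiples of the prime p
    }
  BitSet(2 to limit: _*) -- composites
}

println(sievePrimes(30).toList) // List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
```

Because a BitSet enumerates its members in ascending order, the 'sorted' call of the parallel version is unnecessary here.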
 
===Using tail recursion===
The below code using a primitive array (bit packed) and tail recursion to avoid some of the enumeration delays is almost three times faster:
 
<lang Scala>import scala.annotation.tailrec
def sieveOfEratosthenes(nTo: Int) = {
  val nonprimes: Array[Int] = new Array((nTo >>> 5) + 1)
  def cullp(p: Int) = { @tailrec def cull(c: Int): Unit = { if (c <= nTo) {
    nonprimes(c >>> 5) |= 1 << (c & 31); cull(c + p) } }; cull(p * p) }
  for { p <- 2 to Math.sqrt(nTo).toInt
        if ((nonprimes(p >>> 5) & (1 << (p & 31))) == 0) } cullp(p)
  def nextprime(i: Int): Int = { if (i > nTo) i else
    if ((nonprimes(i >>> 5) & (1 << (i & 31))) != 0) nextprime(i + 1) else i }
  Iterator.iterate(2)(c => nextprime(c + 1)).takeWhile(_ <= nTo) }</lang>
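The bit-packing arithmetic used above (word index `i >>> 5`, bit mask `1 << (i & 31)`) can be seen in isolation in this small sketch; `mark` and `marked` are illustrative names, not part of the sieve:

```scala
// Each Int word holds 32 flags: flag i lives in word i >>> 5 at bit i & 31.
val words = new Array[Int](4) // room for 4 * 32 = 128 flags
def mark(i: Int): Unit = words(i >>> 5) |= 1 << (i & 31)
def marked(i: Int): Boolean = (words(i >>> 5) & (1 << (i & 31))) != 0

mark(37) // sets bit 5 of word 1, since 37 = 1 * 32 + 5
println(marked(37)) // true
println(marked(38)) // false
```

This is why the sieve only needs `(nTo >>> 5) + 1` words for `nTo + 1` flags, 32 times less memory than one `Int` per candidate.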
 
===Odds-only page-segmented "infinite" generator version using tail recursion===
The above code still uses an amount of memory proportional to the range of the sieve (although bit packed). As well as sieving only odd candidates, the following code uses a fixed-range buffer about the size of the CPU L1 cache, plus storage only for the base primes up to the square root of the range, for a large potential saving in the RAM used. The use of innermost tail recursive loops rather than "higher-order" functions over iterators also greatly reduces execution time, with much of the remaining time used just to enumerate the primes output:
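The odds-only index mapping (the odd value v lives at global bit index (v - 3) >>> 1, and index i represents the value (i << 1) + 3) and the computation of a base prime's first cull position on a page can be checked in isolation; `firstCullIndex` is an illustrative helper mirroring the sieve's arithmetic, not part of the sieve itself:

```scala
// For a base prime p and a page whose first global odd index is `low`,
// find the local index of the first multiple of p to cull on that page.
def firstCullIndex(p: Int, low: Int): Int = {
  val sb = (p * p - 3) >>> 1 // global index of p * p (odd for odd p)
  if (sb >= low) sb - low // p * p falls on or after this page
  else { // else advance to the first odd multiple of p at or past the page start
    val r = (low - sb) % p
    if (r != 0) p - r else 0
  }
}

println(firstCullIndex(3, 0)) // 3, the index of 9 = 3 * 3 on page zero
println(((firstCullIndex(3, 10) + 10) << 1) + 3) // 27, first odd multiple of 3 past 23
```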
 
<lang Scala>import scala.annotation.tailrec
import scala.collection.mutable.ArrayBuffer
def SoEPg(): Iterator[Int] = {
  val basePrimesArr : ArrayBuffer[Int] = ArrayBuffer()
  def makePg(lowp: Int, basePrimes: Iterator[Int]) = {
    val nxtlowp = lowp + 262144; val low = (lowp - 3) >>> 1
    val buf: Array[Int] = new Array(131072 >>> 5)
    def nextPrime(c: Int) = { @tailrec def nexti(i: Int): Int = {
      if ((i < 131072) && ((buf(i >>> 5) & (1 << (i & 31))) != 0)) nexti(i + 1)
      else i }; (nexti((c - lowp) >>> 1) << 1) + lowp }
    def cullp(s: Int, p: Int) = { @tailrec def cull(c: Int): Unit = {
      if (c < 131072) { buf(c >>> 5) |= 1 << (c & 31)
        cull(c + p) } }; cull(s) }
    val bps = if (low <= 0) { //page zero has no base primes...
      Iterator.iterate(3)(p => nextPrime(p + 2))
        .takeWhile(_ <= Math.sqrt(262145).toInt).foreach(p => {
          cullp((p * p - 3) >>> 1, p) }); basePrimes }
    else { val nbps = if (basePrimesArr.length == 0) {
        val nbasePrimes = SoEPg(); nbasePrimes.next() // new source primes > 2
        basePrimesArr += nbasePrimes.next(); nbasePrimes } else basePrimes
      def fillBasePrimes(bp: Int): Int = {
        if ((bp * bp) < nxtlowp) { // add enough base primes
          val np = nbps.next(); basePrimesArr += np
          fillBasePrimes(np) } else bp }
      fillBasePrimes(basePrimesArr(basePrimesArr.length - 1))
      for (p <- basePrimesArr) {
        val sb = (p * p - 3) >>> 1
        val s = if (sb >= low) sb - low
          else { val r = (low - sb) % p; if (r != 0) p - r else 0 }
        cullp(s, p) }; nbps }
    (Iterator.iterate(nextPrime(lowp))(p => nextPrime(p + 2))
      .takeWhile(_ < nxtlowp), nxtlowp, bps) }
  Iterator.single(2) ++
    Iterator.iterate(makePg(3, Iterator.empty))({case (_, nlp, b) => makePg(nlp, b)})
      .map({case (itr, _, _) => itr}).flatten
}</lang>
 
As the above and all following sieves are "infinite", they all require an extra range limiting condition to produce a finite output, such as the addition of ".takeWhile(_ <= topLimit)" where "topLimit" is the specified range.
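For example (with a simple trial division iterator standing in for the sieve generators, only to keep the snippet self-contained):

```scala
// A stand-in "infinite" prime generator; any of the generators above can be
// limited the same way with takeWhile.
def primesIter(): Iterator[Int] =
  Iterator.from(2).filter(n => (2 to math.sqrt(n).toInt).forall(n % _ != 0))

val topLimit = 30
println(primesIter().takeWhile(_ <= topLimit).toList)
// List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
```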
 
While the above code is reasonably fast, much of the execution time is consumed by the built-in functions and iterators used for concise code, especially the iterators enumerating the primes output. A longer version using "roll-your-own" iteration, or a class with built-in combined "higher-order" functions, would be faster for many tasks using primes, such as summing over a range, counting, et cetera.
 
===Odds-Only "infinite" generator sieve using Streams and Co-Inductive Streams===
Using Streams, [http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf the "unfaithful sieve"], i.e. a '''sub-optimal trial division sieve, not a Sieve of Eratosthenes'''.
<lang scala>def sieve(nums: Stream[Int]): Stream[Int] =
  Stream.cons(nums.head, sieve(nums.tail filter (_ % nums.head != 0)))
val primes = sieve(Stream.from(2))
println(primes.take(10).toList)</lang>
{{out}}
<pre>List(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)</pre>
 
The above code is extremely inefficient for larger ranges, both because it tests for primality using computationally expensive divide (modulo) operations and because it sets up deferred tests for division by all of the primes up to each prime candidate, meaning that it has approximately a square law computational complexity with range.
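The growth can be made visible by instrumenting the filter with a counter; `sieveCounted` is an illustrative variant for this measurement, not the article's code:

```scala
// Count the deferred divisibility tests performed by the trial division
// stream; the k-th prime alone must pass k - 1 filter layers.
var divTests = 0L
def sieveCounted(nums: Stream[Int]): Stream[Int] =
  nums.head #:: sieveCounted(nums.tail.filter { n => divTests += 1; n % nums.head != 0 })

val first100 = sieveCounted(Stream.from(2)).take(100).toList
println(first100.last) // 541, the 100th prime
println(divTests) // thousands of tests already, growing roughly quadratically
```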
 
The following code uses delayed recursion via Streams to implement the Richard Bird algorithm mentioned in the last part (the Epilogue) of the above linked article, which is '''a true incremental Sieve of Eratosthenes'''. It is nowhere near as fast as the array based solutions due to the overhead of functionally chasing the merging of the prime multiple streams; this also means that the empirical performance is not according to the usual Sieve of Eratosthenes approximations due to this overhead increasing as the log of the sieved range, but it is much better than the above "unfaithful" sieve.
 
<lang scala>def primesBird(): Stream[Int] = {
  def merge(xs: Stream[Int], ys: Stream[Int]): Stream[Int] = {
    val x = xs.head; val y = ys.head
    if (x < y) x #:: merge(xs.tail, ys)
    else if (y < x) y #:: merge(xs, ys.tail)
    else x #:: merge(xs.tail, ys.tail) }
  def composites(ps: Stream[Int]): Stream[Int] = {
    val p = ps.head // each prime's stream of multiples starts at p * p
    (p * p) #:: merge(Stream.from(p * p + p, p), composites(ps.tail)) }
  def minus(ns: Stream[Int], cs: Stream[Int]): Stream[Int] =
    if (ns.head < cs.head) ns.head #:: minus(ns.tail, cs)
    else minus(ns.tail, cs.tail)
  lazy val ps: Stream[Int] = 2 #:: minus(Stream.from(3), composites(ps))
  ps
}</lang>
 
Further gains in performance for these last two implementations can be had by using further wheel factorization and "tree folding/merging" as per [http://www.haskell.org/haskellwiki/Primes#Tree_merging_with_Wheel this Haskell implementation].
 
===Odds-Only "infinite" generator sieve using a hash table (HashMap)===
As per the "unfaithful sieve" article linked above, the incremental "infinite" Sieve of Eratosthenes can be implemented using a hash table instead of a Priority Queue or Map (Binary Heap) as were used in that article. The following implementation postpones the adding of base prime representations to the hash table until necessary to keep the hash table small:
 
<lang scala>import scala.annotation.tailrec
def SoEInc(): Iterator[Int] = {
val nextComposites = scala.collection.mutable.HashMap[Int,Int]()
def oddPrimes(): Iterator[Int] = {
val basePrimes = SoEInc(); basePrimes.next(); basePrimes.next() // skip the two and three prime factors
@tailrec def makePrime(state: (Int,Int,Int)): (Int,Int,Int) = {
val (candidate, nextBasePrime, nextSquare) = state
if (candidate >= nextSquare) {
val adv = nextBasePrime << 1
nextComposites.+=((nextSquare + adv) -> adv)
val np = basePrimes.next()
makePrime((candidate + 2, np, np * np))
}
else if (nextComposites.contains(candidate)) {
val adv = nextComposites(candidate)
nextComposites.-=(candidate)
.+=(Iterator.iterate(candidate + adv)(_ + adv)
.dropWhile(nextComposites.contains(_)).next() -> adv)
makePrime((candidate + 2, nextBasePrime, nextSquare))
} else (candidate, nextBasePrime, nextSquare)
}
Iterator.iterate((5,3,9))({case (c, p, q) => makePrime((c + 2, p, q)) })
.map({case (p, _, _) => p})
}
List(2,3).toIterator ++ oddPrimes()
}</lang>
 
The above could be implemented using Streams or Co-Inductive Streams to pass the continuation parameters that are here passed in a tuple, but there would be no real difference in speed, and there is no need for the implied laziness. As compared to the versions of the Bird (or tree folding) Sieve of Eratosthenes, this has the same expected computational complexity as the array based versions, but is about 20 times slower due to the constant overhead of processing the key-value hashing. Memory use is quite low: only the hash table entries for the base prime values less than the square root of the last prime enumerated, multiplied by the size of each hash entry (about 12 bytes in this case), plus a "load factor" percentage overhead in hash table size to minimize hash collisions (by default, on average about twice the number of entries actually used).
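As a concrete instance of that memory bound (using a naive, illustrative prime counter, not part of the sieve): enumerating primes up to about one million only ever stores hash entries for the base primes below one thousand:

```scala
// Naive prime counter, used only to size the hash table entry count.
def countPrimesBelow(n: Int): Int =
  (2 until n).count(k => (2 to math.sqrt(k).toInt).forall(k % _ != 0))

println(countPrimesBelow(1000)) // 168 base primes -> about 168 hash entries
```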
 
The Scala implementation of a mutable HashMap is slower than the java.util.HashMap one by a factor of almost two, but the Scala version is used here to keep the code more portable (as to the CLR). One can also quite easily convert this code to use the immutable Scala HashMap, but then the code runs about four times slower due to the required "copy on update" operations for immutable objects.
 
This algorithm is very responsive to further application of wheel factorization, which can make it run up to about four times faster for the composite number culling operations; however, that is not enough to allow it to catch up to the array based sieves.
 
=={{header|Scheme}}==