Sieve of Eratosthenes: Difference between revisions

}</lang>
 
===Unbounded===

===Richard Bird Sieve===
 
{{trans|F#}}
Line 2,344 ⟶ 2,342:
}</lang>
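The C# code above is elided in this extract, but the essence of Bird's sieve, in which the primes are defined corecursively as the numbers remaining after subtracting a lazily merged union of every prime's composite stream, can be sketched with Python generators (an illustrative sketch of the technique, not the article's C# translation from F#; all names are our own):

```python
from itertools import count, islice

def merge(a, b):
    # merge two ascending infinite iterators, dropping duplicates
    x, y = next(a), next(b)
    while True:
        if x < y:
            yield x; x = next(a)
        elif y < x:
            yield y; y = next(b)
        else:
            yield x; x = next(a); y = next(b)

def composites():
    # union of the streams p*p, p*p+p, ... for every prime p; the
    # laziness mirrors Bird's right fold: each stream's head p*p is
    # yielded before the rest of the union is ever inspected
    def union(ps):
        p = next(ps)
        mults = count(p * p, p)
        yield next(mults)
        yield from merge(mults, union(ps))
    yield from union(primes())

def primes():
    # the primes are whatever the composite stream does not contain
    yield 2
    comp = composites()
    c = next(comp)
    for n in count(3):
        while c < n:
            c = next(comp)
        if n != c:
            yield n
```

Each level of the recursion owns its own base-primes stream, which keeps the definition purely corecursive at the cost of some duplicated work, just as in the straightforward Bird formulation.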
 
===Tree Folding Sieve===
 
{{trans|F#}}
Line 2,401 ⟶ 2,399:
The above code runs over ten times faster than the original Richard Bird algorithm.
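The tree-folding idea, merging the composite streams pairwise in a balanced tree so that each composite passes through O(log k) merges rather than the O(k) of a linear fold, can be illustrated with a bounded Python sketch (illustrative only; the article's C# version is unbounded and lazy, and these names are our own):

```python
from math import isqrt

def merge2(xs, ys):
    # two-way merge of ascending lists, dropping duplicates
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            out.append(xs[i]); i += 1
        elif ys[j] < xs[i]:
            out.append(ys[j]); j += 1
        else:
            out.append(xs[i]); i += 1; j += 1
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out

def tree_fold(lists):
    # balanced pairwise folding: each element passes through O(log k)
    # merges instead of the O(k) of a linear right fold
    while len(lists) > 1:
        lists = [merge2(lists[i], lists[i + 1]) if i + 1 < len(lists) else lists[i]
                 for i in range(0, len(lists), 2)]
    return lists[0] if lists else []

def primes_upto(n):
    # base primes by trial division, composites by tree-folded union
    base = [p for p in range(2, isqrt(n) + 1)
            if all(p % d for d in range(2, isqrt(p) + 1))]
    comp = set(tree_fold([list(range(p * p, n + 1, p)) for p in base]))
    return [x for x in range(2, n + 1) if x not in comp]
```

The balanced shape of the fold is what delivers the speedup over Bird's linear chain of merges: composites produced by large base primes no longer have to trickle through one merge per smaller prime.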
 
===Priority Queue Sieve===
 
{{trans|F#}}
Line 2,464 ⟶ 2,462:
pq.rplcmin(k, v); return pq; }
}</lang>
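The minimum-priority-queue operations the hand-rolled C# queue provides above (insert, peek-min, replace-min as in `rplcmin`) have counterparts in Python's heapq module, so the O'Neill queue-based sieve can be sketched compactly (an illustrative sketch, not the article's code):

```python
import heapq
from itertools import count, islice

def pq_primes():
    # heap entries are (next composite of p, p); heapreplace plays the
    # role of the hand-rolled queue's replace-minimum operation
    heap = []
    for n in count(2):
        if not heap or n < heap[0][0]:
            # n is below every pending composite, hence prime;
            # its first interesting composite is n*n
            heapq.heappush(heap, (n * n, n))
            yield n
        else:
            while heap[0][0] == n:
                # advance every composite stream that is due at n
                c, p = heap[0]
                heapq.heapreplace(heap, (c + p, p))
```

Each discovered prime contributes one heap entry, and each candidate costs at most a few O(log n) heap operations.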
 
 
===Restricted Base Primes Queue===
 
The following code implements an improved version of the '''odds-only''' O'Neill algorithm. It adds a base prime's composite-number stream to the queue only when the number being sieved reaches the square of that base prime, which saves a huge amount of memory and considerable execution time (and means a narrower numeric type suffices for the internal base primes), and it minimizes stream processing by using fusion:
 
<lang csharp>using System;
using System.Collections;
Line 2,505 ⟶ 2,507:
The above code is at least 2.5 times faster than the Tree Folding version.
 
 
===Dictionary (Hash table) Sieve===
 
The above code adds quite a bit of overhead in having to provide its own Priority Queue implementation, for little advantage over a Dictionary (hash table based) version, as per the code below:
 
<lang csharp>using System;
using System.Collections;
Line 2,545 ⟶ 2,549:
For a range of a million primes, the above code runs in about three quarters of the time of the Priority Queue based version, and the Priority Queue falls further behind as the range increases because the Dictionary provides O(1) access times compared to the Priority Queue's O(log n). The only slight advantage of the PQ based version is at very small ranges, where the constant factor overhead of computing the table hashes is greater than the "log n" factor for small "n".
 
===Best performance: CPU-Cache-Optimized Segmented Sieve===
 
All of the above unbounded versions are really just an intellectual exercise, as with very few extra lines of code beyond the fastest Dictionary based version one can have a bit-packed, page-segmented array based version, as follows:
Line 2,608 ⟶ 2,612:
The above code is about 25 times faster than the Dictionary version at computing about the first 50 million primes (up to a range of one billion). The actual enumeration of the result sequence now takes longer than culling the composite number representation bits from the arrays, meaning that the code is over 50 times faster at the sieving itself. It owes its speed over a naive "one huge memory array" algorithm to using an array the size of the CPU L1 or L2 cache and to bit-packing more number representations into that limited capacity; in this way RAM access times are reduced by a factor of about four to ten (depending on CPU and RAM speed) compared to those naive implementations, and the minor computational cost of the bit manipulations is repaid many times over in total execution time.
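The page-segmentation scheme can be sketched in Python. This bounded sketch uses one byte per number per segment rather than the C# code's odds-only bit packing, but it shows the cache-sized-segment culling; segment size and all names are illustrative choices of ours:

```python
from math import isqrt

def small_primes(n):
    # plain bounded sieve for the base primes up to n
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b'\x00\x00'
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, f in enumerate(sieve) if f]

def segmented_primes(limit, seg_size=1 << 15):
    # cull one cache-sized segment at a time so the working set stays
    # resident in the CPU's L1/L2 cache
    base = small_primes(isqrt(limit))
    yield from base
    for lo in range(isqrt(limit) + 1, limit + 1, seg_size):
        hi = min(lo + seg_size - 1, limit)
        seg = bytearray([1]) * (hi - lo + 1)   # seg[i] represents lo + i
        for p in base:
            # first multiple of p inside [lo, hi], never below p*p
            start = max(p * p, (lo + p - 1) // p * p)
            seg[start - lo :: p] = bytearray(len(range(start, hi + 1, p)))
        yield from (lo + i for i, f in enumerate(seg) if f)
```

Only the small base-prime list and one segment buffer are live at any moment, which is exactly why the technique scales to ranges far beyond what a single flat array could hold in cache.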
 
The time to enumerate the result primes sequence can be reduced somewhat (by about a second) by removing the automatic iterator "yield return" statements and converting them into a "roll-your-own" IEnumerable<PrimeT> implementation; however, for page segmentation of '''odds-only''', this iteration of the results will still take longer than the time to actually cull the composite numbers from the page arrays.
 
In order to make further gains in speed, custom methods must be used to avoid using iterator sequences. If this is done, further gains can be made by extreme wheel factorization (up to about another four-fold gain in speed) and by multi-processing (with an additional gain in speed proportional to the number of independent CPU cores used).
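The wheel factorization mentioned above rests on the fact that candidates coprime to the first few primes repeat in a fixed pattern; deriving the wheel's gap cycle can be sketched in a few lines of Python (illustrative only, with our own names):

```python
from math import prod

def wheel_gaps(wheel_primes=(2, 3, 5)):
    # positions coprime to the wheel primes within one wheel turn
    circ = prod(wheel_primes)
    spokes = [n for n in range(1, circ + 1)
              if all(n % p for p in wheel_primes)]
    # gaps between consecutive spokes, wrapping around to the next turn
    return [b - a for a, b in zip(spokes, spokes[1:] + [spokes[0] + circ])]
```

For the 2-3-5 wheel there are 8 spokes per turn of 30, so cycling these gaps lets a sieve examine only 8/30 of all numbers; larger wheels (2-3-5-7 and beyond) shrink that fraction further, which is where the additional constant-factor speedup comes from.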