Legendre prime counting function: Difference between revisions

Line 693:

π(10^8) = 5761455

π(10^9) = 50847534</pre>

===Non-Memoized Version===

The above code takes a huge amount of memory to run, especially if using the full size "Naturals" which not using them will severely limit the usable counting range (which of course is limited anyway by the huge memory use by the memoization). As per the comments on the task, memoization is not strictly needed, as the exponential growth in execution time can be limited by a very simple optimization that stops splitting of the recursive "tree" when there are no more contributions to be had by further "leaves", as per the following code:

<syntaxhighlight lang="nim"># compile with: nim c -d:danger -t:-march=native --gc:arc

from std/monotimes import getMonoTime, `-`

from std/times import inMilliseconds

from std/math import sqrt

let masks = [ 1'u8, 2, 4, 8, 16, 32, 64, 128 ] # faster than bit twiddling

let masksp = cast[ptr[UncheckedArray[byte]]](unsafeAddr(masks[0]))

proc countPrimes(n: int64): int64 =

if n < 3:

return if n < 2: 0 else: 1

else:

let rtlmt = n.float64.sqrt.int

let mxndx = (rtlmt - 1) div 2

let sz = (mxndx + 8) div 8

var cmpsts = cast[ptr[UncheckedArray[byte]]](alloc0(sz))

for i in 1 .. mxndx:

if (cmpsts[i shr 3] and masksp[i and 7]) != 0: continue

let sqri = (i + i) * (i + 1)

if sqri > mxndx: break

let bp = i + i + 1

for c in countup(sqri, mxndx, bp):

let w = c shr 3; cmpsts[w] = cmpsts[w] or masksp[c and 7]

var pisqrt = 0'i64

for i in 0 .. mxndx:

if (cmpsts[i shr 3] and masksp[i and 7]) == 0: pisqrt += 1

var primes = cast[ptr[UncheckedArray[uint32]]](alloc(sizeof(uint32) * pisqrt.int))

var j = 0

for i in 0 .. mxndx:

if (cmpsts[i shr 3] and masksp[i and 7]) == 0: primes[j] = (i + i + 1).uint32; j += 1

proc phi(x: int64; a: int): int64 =

if a <= 1:

return if a < 1: x else: x - (x shr 1)

let p = primes[a - 1].int64

if x <= p: return 1 # very simple one-line optimization that limits exponential growth!

return phi(x, a - 1) - phi((x.float64 / p.float64).int64, a - 1)

result = phi(n, pisqrt.int) + pisqrt - 1

cmpsts.dealloc; primes.dealloc

let nstrt = getMonoTime()

var pow = 1'i64

for i in 0 .. 9:

echo "π(10^", i, ") = ", pow.countPrimes; pow *= 10

let nelpsd = (getMonoTime() - nstrt).inMilliseconds

echo "This took ", nelpsd, " milliseconds."

</syntaxhighlight>

<pre>π(10^0) = 0

π(10^1) = 4

π(10^2) = 25

π(10^3) = 168

π(10^4) = 1229

π(10^5) = 9592

π(10^6) = 78498

π(10^7) = 664579

π(10^8) = 5761455

π(10^9) = 50847534

This took 194 milliseconds.</pre>

As shown in the output, the above program runs without memoization about 12 times faster and using a huge amount less memory than the previous memoized algorithm. Other than not requiring memoization, the implementation also substitutes floating point division for integer division so as to be faster (only being of 53-bits accuracy; which limited precision will limit the range over which the primes can be accurately counted, but as this algorithm is only reasonably useful to about ten to the fourteenth power anyway, that shouldn't be a problem.

A further optimization that would greatly reduce the number of operations and thus the run time for these small ranges is to use a pre-computed Look Up Table (LUT) array for the "leaves" where the "degree" is small (the smallest primes used); this would reduce the "width" of the tree at its widest "base" to eliminate many divisions and recursions and thus reduce the execution time greatly, although the effect is more for smaller ranges such as this as for greater ranges.

For larger ranges, a better algorithm such as that of Legarias, Miller, and Odlyzko (LMO) will perform much better than this and use much less memory, although at the cost of greater complexity and more lines of code (about 500 LOC). There is also a variation of the Legendre algorithm which is to this Legendre algorithm as the LMO algorithm is to the Meissel algorithm (which directly followed and improved the Legendre work), which can be implemented in about 60 LOC and is much faster although it still uses a lot of memory.

=={{header|Perl}}==