User talk:Horst.h: Difference between revisions

→‎Your Extensible Prime Generator Pascal contribution: examples showing Free Pascal now has the speed and other comments...
(Sieve of Eratosthenes comments and improvements...)
(→‎Your Extensible Prime Generator Pascal contribution: examples showing Free Pascal now has the speed and other comments...)
Line 22:
 
BTW, a good way to make speeds more comparable in a CPU agnostic way is to post them as CPU clock cycles per cull operation and CPU clock cycles per prime found - that removes much of the variation between comparisons that people using your examples would have although for some algorithms on some CPU's there is some variation be up to a factor of two due to the abilities of the CPU such as simultaneous micro instruction execution (not including hyper-threading or multi-threading).--[[User:GordonBGood|GordonBGood]] ([[User talk:GordonBGood|talk]]) 08:06, 6 January 2019 (UTC)
 
: I think that it may be possible to write a Free Pascal version of the Sieve of Eratosthenes that catches up to Kim Walisch's primesieve with the current fpc compiler:
 
: I run a very simple Sieve of Eratosthenes benchmark that tests raw maximally optimized (other than extreme loop unrolling as used in primesieve) loop speed; this implementation is run over only a 16 Kilobyte buffer of packed bits representing odds-only numbers as every modern CPU will have a L1 cache of at least that big, thus, it sieves for the primes up to 262146 with the answer 23000 (including the known only even prime of two).
 
: On Wandbox with full optimization, [https://wandbox.org/permlink/2rhmyx0j1VbeMzPY a Free Pascal version using the HEAD version 3.3.1 compiler] takes only about 156 milliseoonds, there [https://wandbox.org/permlink/9Ezpc6epuL9ZbDaz a C version using Clang HEAD version 8.0.0] takes about 147 milliseconds where the difference is probably just "line noise" but it also may be that Free Pascal still doesn't fully tune the order of likely the same machine instructions used. The Free Pascal compiler has been greatly improved since the last release as it takes about 250 milliseconds with the last version 3.0.2: I expect that it now combines the bit modification "k^ := k^ or msk;" into a single read modify write instruction rather than the former one instruction to read from memory, another to modify, and another to write the result back to memory. This last is what Google's Go language still does, which is one of the reasons it is fairly slow for this benchmark.
 
: A maximally optimized Sieve of Eratosthenes must use bit packing to make maximum use of the CPU caches, and (when optimized such as here), the cost of any extra operations to do with bit-packing is more than made up for in improved memory access times using the L1 cache. As Pascal has pointers as used here and with the newly optimized compiler, it should be able to take advantage of extreme loop unrolling as used by primesieve to gain a little more, especially for smaller base primes.
 
: It's been a "blast from the past" for me to re-visit Pascal after all these years in order to write this benchmark, but it has also made me realize how far language design has moved on since Turbo Pascal days when we used to rave about it: when compared to a modern language like Kotlin, or Nim, or with modern functional languages like Haskell or even more F#, there is just no going back for me as it feels "clunky". Verbosity in declaring var's separately from the code using them, no type inference whatsoever, begin/end blocks, etc. etc. Even Julia is better, although I don't like dynamic typing there are ways to get around it with some effort.--[[User:GordonBGood|GordonBGood]] ([[User talk:GordonBGood|talk]]) 04:43, 8 January 2019 (UTC)
474

edits