## Finding primes in parallel

Posted by Jonas Elfström Thu, 14 Jan 2010 21:55:00 GMT

Justin Etheredge has been blogging about his challenge to find prime numbers with LINQ. He later used `AsParallel()` (coming in .NET 4) to speed things up and then followed that up with a post about using the Sieve of Eratosthenes.

As you can see in the comments of those posts, I tried to speed the Sieve of Eratosthenes up by using `Parallel.For` in the inner loop. I also tried `AsParallel()` in the LINQ expression, but it made no real difference in either case. At most it got 5% faster. I'm not sure, but because SoE is very memory intensive it could be that we have a scaling issue and maybe also memory bandwidth exhaustion. This is mere speculation.

I then searched for other algorithms and found The Sieve of Atkin. It uses less memory than SoE so I thought I'd give it a try.

I set the limit to 20,000,000 and then benchmarked it. It clocked in at 2.48s, so actually worse than the 2.2s that SoE took. Not good! Then I added `Parallel.For` in the loop that did most of the work and lo and behold, it scaled! I have two cores in my machine (T7200@2.0GHz) and the average runtime went down to 1.26s. That's almost linear and surprisingly good! If you happen to have a quad core (or more) and feel like trying it out then please contact me. It would be interesting to see if it scales further.

```csharp
// Needs: using System; using System.Collections.Generic; using System.Threading.Tasks;
static List<int> FindPrimesBySieveOfAtkins(int max)
{
    // var isPrime = new BitArray((int)max+1, false);
    // Can't use BitArray because of threading issues.
    var isPrime = new bool[max + 1];
    var sqrt = (int)Math.Sqrt(max);

    // Toggle the candidate flag for every solution of the three quadratic forms.
    Parallel.For(1, sqrt, x =>
    {
        var xx = x * x;
        for (int y = 1; y <= sqrt; y++)
        {
            var yy = y * y;
            var n = 4 * xx + yy;
            if (n <= max && (n % 12 == 1 || n % 12 == 5))
                isPrime[n] ^= true;

            n = 3 * xx + yy;
            if (n <= max && n % 12 == 7)
                isPrime[n] ^= true;

            n = 3 * xx - yy;
            if (x > y && n <= max && n % 12 == 11)
                isPrime[n] ^= true;
        }
    });

    // Strike out the multiples of the square of every prime up to sqrt(max).
    var primes = new List<int>() { 2, 3 };
    for (int n = 5; n <= sqrt; n++)
    {
        if (isPrime[n])
        {
            primes.Add(n);
            int nn = n * n;
            for (int k = nn; k <= max; k += nn)
                isPrime[k] = false;
        }
    }

    // Everything still flagged above sqrt(max) is prime.
    for (int n = sqrt + 1; n <= max; n++)
        if (isPrime[n])
            primes.Add(n);

    return primes;
}
```

This code needs C# 4.0 to compile.

Edit 2010-12-14

Dommer found out that the BitArray implementation had some serious threading issues.
I had my worries about the non-thread-safe characteristics of `BitArray`, but I thought that `isPrime[n] ^= true;` was an atomic operation and that, because it doesn't matter in what order the bits are flipped, it would be possible to use it anyway. Not so. I changed it to a boolean array and that seems to do the trick, but of course at a much higher memory cost.

Edit 2010-01-20

Indications are that this does in fact not scale very well on a quad core. It's even worse: it seems to scale well on my old T7200 but not on a dual core E6320. I don't know why, but of course the shared state of the **isPrime** `BitArray` is a huge problem, and maybe differences in CPU architecture (FSB speed, caches and so on) in the E6320 are an explanation. Average execution time on the E6320 was 1290ms in a single thread and 1064ms in two.

If you want to try this in an older version of C# than 4.0 then check out this post.

A reader asked how I timed the executions. Here's how.

```csharp
var steps = new List<long>();
var watch = new Stopwatch();
for (int i = 0; i < 10; i++)
{
    watch.Reset();
    watch.Start();
    var primes = FindPrimesBySieveOfAtkins(20000000);
    watch.Stop();
    Console.WriteLine(watch.ElapsedMilliseconds.ToString());
    steps.Add(watch.ElapsedMilliseconds);
}
Console.WriteLine("Average: " + steps.Average().ToString());
```

Edit 2010-10-24

Tom's code from the comment below

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using System.Text;
using System.Threading.Tasks;

namespace Calculate_Primes
{
    class Program
    {
        private const int _NUMBER_OF_DIGITS = 100;

        static void Main(string[] args)
        {
            BigInteger floor = BigInteger.Parse("1" + string.Empty.PadLeft(_NUMBER_OF_DIGITS - 1, '0'));
            BigInteger ceiling = BigInteger.Parse(string.Empty.PadLeft(_NUMBER_OF_DIGITS, '9'));
            Console.WindowWidth = 150;
            //var primes = Enumerable.Range(floor, ceiling).Where(n => Enumerable.Range(1, n).Where(m => (n / m) * m == n).Count() == 2);
            Console.Clear();
            _calculatePrimes(floor, ceiling, "C:\\100 digit primes.txt");
            Console.Clear();
            _calculatePrimes(floor, ceiling, "C:\\300 digit primes.txt");
        }

        static IEnumerable<BigInteger> Range(BigInteger fromInclusive, BigInteger toInclusive)
        {
            for (BigInteger i = fromInclusive; i <= toInclusive; i++)
                yield return i;
        }

        static void ParallelFor(BigInteger fromInclusive, BigInteger toInclusive, Action<BigInteger> body)
        {
            Parallel.ForEach(Range(fromInclusive, toInclusive), body);
        }

        static void _calculatePrimes(BigInteger floor, BigInteger ceiling, string resultsFileFilepath)
        {
            using (System.IO.FileStream fs = new System.IO.FileStream(resultsFileFilepath, System.IO.FileMode.Create)) { }
            using (System.IO.StreamWriter sw = new System.IO.StreamWriter(resultsFileFilepath))
            {
                ParallelFor(floor, ceiling, i =>
                {
                    if (_isPrime(i))
                    {
                        lock (sw)
                        {
                            sw.Write(i.ToString() + System.Environment.NewLine);
                            sw.Flush();
                        }
                    }
                });
            }
        }

        static bool _isPrime(BigInteger number)
        {
            bool returnValue = true;
            Console.WriteLine("Checking {0} for primality.", number.ToString());
            if ((number < 2) || (number > 2 && number.IsEven) || (number > 2 && number.IsPowerOfTwo))
                returnValue = false;
            else
                for (BigInteger i = 2; i * i <= number; i++)
                {
                    if (number % i == 0)
                        returnValue = false;
                }
            if (returnValue)
                Console.WriteLine(" {0} IS prime.", number.ToString());
            else
                Console.WriteLine(" {0} IS NOT prime.", number.ToString());
            return returnValue;
        }
    }
}
```

Are BitArrays thread-safe, or how does that work?

Is it possible to write a version of this without the `Parallel.For` method? Well, I suppose it is possible of course, but would it be a big mess? :p

Replace the `Parallel.For` with `for (int x = 1; x <= sqrt; x++)` and remove the `);` that closes the `Parallel.For` call (row 24 of the original listing) and you should be good to go.

If I understand http://bit.ly/8lZagW correctly the isPrime[n] = !isPrime[n]; is an atomic operation but I have to investigate the matter of thread safety further. Thanks!

But that would remove the parallelism :p I was wondering if it could be done in a nice way without using Parallel.For, but still have the parallelism. (So do whatever Parallel.For does yourself)

From that link you posted: “Aside from the library functions designed for that purpose, there is no guarantee of atomic read-modify-write, such as in the case of increment or decrement.” – Wouldn’t that mean that it is not thread-safe? or?

That’s for “long, ulong, double, and decimal”. Read/write of booleans is atomic. I’m just not sure that isPrime[n] = !isPrime[n]; is the same as Boolean test = false; test = !test; which would be atomic.

If Holterman is correct then my usage is thread safe: https://stackoverflow.com/questions/1213997/is-there-a-generic-type-safe-bitarray-in-net/1214686#1214686
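To make the distinction in this thread concrete: reading or writing a single `bool` is atomic, but a toggle is a read followed by a write, and another thread can slip in between the two. One way to get a toggle that really is atomic is a compare-and-swap loop. The sketch below is only an illustration under assumptions that are not in the post: the flags are stored as an `int[]` (which costs four bytes per flag), and `AtomicFlags` and `Toggle` are hypothetical names.

```csharp
using System.Threading;

static class AtomicFlags
{
    // Flips flags[index] between 0 and 1 as one atomic step.
    // Hypothetical helper; the post itself uses a plain bool[].
    public static void Toggle(int[] flags, int index)
    {
        int seen;
        do
        {
            // Read the current value, then try to swap in the flipped value.
            // CompareExchange only writes if nobody changed the slot meanwhile;
            // otherwise we loop and retry with the fresh value.
            seen = Volatile.Read(ref flags[index]);
        }
        while (Interlocked.CompareExchange(ref flags[index], seen ^ 1, seen) != seen);
    }
}
```

With this helper an odd number of concurrent toggles on the same slot always leaves it at 1 and an even number leaves it at 0, regardless of how the threads interleave.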

“I was wondering if it could be done in a nice way without using Parallel.” Check out: http://www.codeproject.com/KB/dotnet/PoorMansParallelForEach.aspx

So should maybe use the Set method instead then? Or doesn’t make much difference perhaps…

Thanks for the link. Will check it out :)

http://coding-time.blogspot.com/2008/03/implement-your-own-parallelfor-in-c.html - makes it possible to run FindPrimesBySieveOfAtkins unchanged in C# 2.0-3.5.
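For readers wondering what "doing what `Parallel.For` does yourself" might look like, here is a minimal sketch. It is not the code from either linked article, and it is far simpler than the real `Parallel.For`: it assumes one plain thread per core and hands out indices through a shared counter. `ManualParallel` is a hypothetical name, and an exception thrown inside `body` is not handled.

```csharp
using System;
using System.Threading;

static class ManualParallel
{
    // Runs body(i) for fromInclusive <= i < toExclusive on plain threads.
    // Hypothetical stand-in for Parallel.For(int, int, Action<int>).
    public static void For(int fromInclusive, int toExclusive, Action<int> body)
    {
        int workers = Math.Max(1, Environment.ProcessorCount);
        var threads = new Thread[workers];
        int next = fromInclusive - 1; // last index handed out so far

        for (int w = 0; w < workers; w++)
        {
            threads[w] = new Thread(() =>
            {
                while (true)
                {
                    // Atomically claim the next unprocessed index.
                    int i = Interlocked.Increment(ref next);
                    if (i >= toExclusive) break;
                    body(i);
                }
            });
            threads[w].Start();
        }

        // Block until every worker has run out of indices.
        foreach (var t in threads) t.Join();
    }
}
```

Swapping `Parallel.For(1, sqrt, x => ...)` for `ManualParallel.For(1, sqrt, x => ...)` keeps the same call shape. It does nothing about the shared `isPrime` array, so the thread-safety questions discussed here still apply.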

These lines: isPrime[n] = !isPrime[n];

Should be replaced with isPrime[n] ^= true;

No more atomicity.

Here’s a little something I came up with (except for the commented out LINQ query) using Stephen Toub’s comments on Scott Hanselman’s blog:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using System.Text;
using System.Threading.Tasks;

namespace Calculate_Primes { class Program { private const int _NUMBER_OF_DIGITS = 100;}
```

@Tom

An XOR toggle, nice! I haven’t checked it but I would guess that both that and mine compile to something similar.

Now you lost me. I believe it is and that that’s a good thing.

@Tom Thanks for the code and sorry for the not so fancy commenting function on my blog.

I think that you are right in that brute force prime search scales over multiple CPUs. It could be a problem that it’s so terribly slow in comparison to the Sieve of Eratosthenes and the Sieve of Atkin, I just don’t know. Guess I have to read up on how those gigantic primes that have been found were found.

This implementation of the Sieve of Eratosthenes should take about 500 ms on your PC finding primes up to 20*10^6

Regards,

Peter

@Peter Thanks! Even though Sieve of Atkin in theory should be faster than Sieve of Eratosthenes your implementation of the latter is much faster than mine of the former. I’m not surprised because I just did a naive translation from the pseudo code on Wikipedia to C#.

Also, my implementation can’t handle searching for primes up to 1000000000. It seems to be the line `int n = 4 * xx + yy;` that is the problem: `4*xx+yy` does not fit in an `Int32` for max=1000000000.

FWIW: 20M took 387 ms on my 6 core using the algo in this post. Rockin! Thanks for the C# implementation.
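On the `Int32` overflow mentioned above for max=1000000000: with that limit `x` runs up to about 31622, so `4 * x * x` reaches roughly 4 billion, well past `Int32.MaxValue` (2147483647). A possible fix, sketched here under the assumption that only the arithmetic changes, is to evaluate the three quadratic forms in 64 bits and compare against `max` before casting back to an index. `AtkinForms` and its members are hypothetical names, not part of the post's code.

```csharp
static class AtkinForms
{
    // The three quadratic forms of the sieve, computed as long so that
    // intermediate results cannot wrap around for any int x and y.
    public static long First(int x, int y)  { return 4L * x * x + (long)y * y; }
    public static long Second(int x, int y) { return 3L * x * x + (long)y * y; }
    public static long Third(int x, int y)  { return 3L * x * x - (long)y * y; }

    // The same first form in 32-bit arithmetic, kept only to show the wrap-around.
    public static int FirstInt32(int x, int y)
    {
        return unchecked(4 * x * x + y * y);
    }
}
```

Inside the loop one would then test `n <= max` on the `long` value and only index the array with `(int)n` once that check has passed.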

just wanted to say you can use watch.Restart() instead of watch.Reset(); watch.Start();

You can fix the original posting without using boolean arrays, but you would need to create a bit array for each thread and fill each one separately. After all the threads are finished, simply XOR them all together. This gives the best of both worlds.

@AronM That sounds ingeniously simple! I will have to try that out. Still worried about the XOR. Why wouldn’t it be affected by all other threads?
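A rough sketch of the per-thread idea may help with that question. Each worker toggles bits only in its own private `BitArray`, so no toggle can race with another thread. The private arrays are then merged with XOR under a lock. Because XOR is associative and commutative, a bit ends up set exactly when it was toggled an odd number of times in total, whatever the merge order. The code assumes the `Parallel.For` overload with thread-local state and that `max` is well below `Int32.MaxValue`; `AtkinPerThread` is a hypothetical name and this has not been benchmarked against the post's version.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading.Tasks;

static class AtkinPerThread
{
    public static List<int> FindPrimes(int max)
    {
        var primes = new List<int>();
        if (max >= 2) primes.Add(2);
        if (max >= 3) primes.Add(3);
        if (max < 5) return primes;

        int sqrt = (int)Math.Sqrt(max);
        var isPrime = new BitArray(max + 1, false);
        var mergeLock = new object();

        Parallel.For<BitArray>(1, sqrt + 1,
            () => new BitArray(max + 1, false),   // private array for each worker task
            (x, state, local) =>
            {
                long xx = (long)x * x;            // long avoids the Int32 overflow
                for (int y = 1; y <= sqrt; y++)
                {
                    long yy = (long)y * y;
                    long n = 4 * xx + yy;
                    if (n <= max && (n % 12 == 1 || n % 12 == 5)) Flip(local, (int)n);
                    n = 3 * xx + yy;
                    if (n <= max && n % 12 == 7) Flip(local, (int)n);
                    n = 3 * xx - yy;
                    if (x > y && n <= max && n % 12 == 11) Flip(local, (int)n);
                }
                return local;
            },
            // Merge step: BitArray.Xor updates isPrime in place.
            local => { lock (mergeLock) { isPrime.Xor(local); } });

        // Strike out multiples of squares of the primes found so far.
        for (int n = 5; n <= sqrt; n++)
        {
            if (!isPrime[n]) continue;
            long nn = (long)n * n;
            for (long k = nn; k <= max; k += nn) isPrime[(int)k] = false;
        }

        for (int n = 5; n <= max; n++)
            if (isPrime[n]) primes.Add(n);
        return primes;
    }

    // Toggles one bit; safe here because each BitArray has a single writer.
    static void Flip(BitArray bits, int i) { bits[i] = !bits[i]; }
}
```

The trade-off is memory: every worker task allocates its own array of `max + 1` bits, and the runtime may start more tasks than there are cores.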