Back in the day, processors had a single core and clock speeds increased steadily every year. This meant that although writing multi-threaded code for a single CPU gave some performance improvements (since single-core CPUs usually try to execute multiple threads), it was hardly the top of any developer's list of priorities. In the last few years, clock speeds have reached a plateau and we've all got dual core or quad core processors, with the core-increasing trend set to continue.
Part of the reason for the apathy toward parallelism is that some significant effort must be made to ensure that thread safety is taken care of and issues of deadlocking and race conditions are avoided, with the associated maintenance cost and scope for introducing bugs often outweighing performance gains. In this blog post I will introduce a few of the features that .NET 4.0 brings to help us write more scalable code by taking advantage of the opportunities for parallelism, as we move into a world of multi-core and many-core processors.
Aside: "Many-core" describes a processor containing more cores than the number of running processes, in the realm of hundreds or thousands of cores. It's at this point that parallelism in software is key; the system can't run efficiently otherwise because there are more cores than threads. If you think this number of cores is totally futuristic, think again - NVIDIA have long been driving their parallel processors due to the parallelism demanded by rendering engines. The NVIDIA GTX280 GPU has 240 cores, and the Tesla M20 has 448 cores . Intel has also announced a 50-core CPU called "Knights Corner" .
Frequently, when we do use concurrency in our applications, we developers write code that is targeted towards a specific number of cores. For example, I want to free up my UI thread by performing a long-running calculation in the background. So naturally I run my calculation in a new thread from the thread pool. But what if my processor has four cores? Or 1000? I've hardcoded my software to use two and only two threads. This is the main issue highlighted in a 2006 research report from the University of California, Berkeley titled 'The Landscape of Parallel Computing Research: A View from Berkeley' , which states:
"The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems"
"To be successful, programming models should be independent of the number of processors."
Don't worry, though - these are exactly the problems that .NET 4.0 addresses by separating the developer from the low-level concerns of threading. From now on, the focus is on what runs in parallel, and not how. In this way we can more easily future-proof our software for the many-core scenarios outlined above.
Px (Parallel Extensions) is the umbrella term for the parallelism features of .NET 4.0. It consists of the Task Parallel library and PLINQ. Below I will give a brief outline of each, and as a bonus, I'll introduce Rx - Reactive Extensions - which builds on Px.
Task Parallel Library 
This library contains, among other things, functions for parallelising for and foreach loops, and introduces the Task type which can replace our use of the ThreadPool. The Task type is used to represent an operation that may spend some time waiting, without blocking its thread.
- Parallel.For and Parallel.ForEach: create a Task delegate for the loop body, allowing parallel invocation.
- Parallel.Invoke: allows parallel invocation of multiple statements.
- Task.WaitAll: only blocks the main thread while waiting for all tasks to complete, instead of blocking each individual thread
- Task.Factory.ContinueWhenAll: Creates a task that represents completion of a number of sub-tasks.
Warning - don't get too carried away with converting your delightfully-parallel algorithms to use parallel constructs if your code is simple to start off with. Because of the (small) overhead of synchronisation (for load-balancing of threads) and the delegate invocations involved in running Tasks, you might be worse off.
Parallel LINQ is a set of extension methods which can be used to execute your usual LINQ queries in parallel. Also included are some additional extensions that don't exist in the old-school serial LINQ. The extension methods exist on the ParallelQuery<T> type, which you can get an instance of by calling AsParallel() on your IEnumerable object. I'll introduce here a few of the extensions to give a feel for PLINQ;
- Select, Where, OrderBy, Aggregate, All, etc etc...: parallel versions of the standard operators.
- ForAll: perform an action for every item in a ParallelQuery collection. In parallel, of course.
- AsOrdered: signals that the parallel query should be treated as ordered - so when a parallel operation is subsequently executed, you can be sure the resulting collection comes out in the same order.
- AsUnordered: signals that a ParallelQuery doesn't need to be in order - this is the case by default.
- AsSequential: converts a ParallelQuery to IEnumerable, to force sequential evaluation in subsequent operations.
- WithCancellation: allows a cancellation token to be passed in so that a parallel operation can be cancelled.
- WithExecutionMode: PLINQ sometimes decides that it's more efficient to run a query sequentially. However, if you know better than PLINQ, you can force it to always run in parallel.
- MapReduce: If you're familiar with the popular MapReduce technique used for parallel computation on large datasets, then...here you go.
Warning: when replacing queries with these parallel versions, everything you want to execute in parallel must be independent. Be careful that your lambda expressions don't include any side effects and are actually parallelisable! Please see the further reading :-)
Rx (Reactive Extensions) is like LINQ all over again, but in reverse. Where LINQ is a set of extension methods for IEnumerable, Rx is a set of extension methods for IObservable. This stems from the fact that IObservable is IEnumerable's mathematical dual - where an IEnumerable collection is a pull collection, IObservable is a push collection. They are equal and opposite. Instead of requesting each item from the collection as you would with an IEnumerable, an IObservable gives you a notification every time it receives a new item, and another notification when it reaches the end of the collection.
Hold on - what does all this have to do with parallelism? Well, Rx goes hand in hand with the parallelism constructs available because it allows us to combine asynchronous sources together to compose our application. These asynchronous sources could be the output from some separate, parallel computations - or something more basic like UI events (on a different thread). It handles all of the complexity involved in cross-thread issues for us and allows us to focus on composition. Additionally, Rx has a dependency on Px because it's under-the-hood operations are already implemented using parallel-friendly techniques.
In the following examples, an IObservable is referred to as a source. You then call Subscribe on this source and pass in some Actions for what you want to do with each item, at the end, and on an error.
For example, start by creating the empty collection:
Once you create your source, you must subscribe to it to receive notifications. This is done with the Subscribe method;
With the empty source created above, we'll simply get a single OnCompleted notification.
Obviously empty sources aren't much use - here are some situations where you might want to create an IObservable:
- You can create an IObservable collection from a long-running computation, and write code that asynchronously acts upon each new result.
- You can treat user input as an observable collection. Eg - mouse move events are really just a never-ending collection of items.
- Any other .NET event can be used to instantiate an IObservable by using the Observable.FromEvent method.
A lot of the LINQ extension methods are available - Where, Select, Skip - do the same thing you are used to with LINQ. Here is a fun IObservable source example taken from the Rx Hands On Labs whitepaper  which I recommend you read if you intend to use Rx in your code.
The whitepaper example I mentioned above takes this a lot further - by sending items to a dictionary webservice (there is in-built support for asynchronous web services, too), receiving the responses, calling SelectMany to map from a single input to multiple results, and dumping them into a ListView. All in just a few lines, and avoiding the spaghetti of event handlers we're used to.
The intention of this post is to introduce some of the parallelism features available in .NET 4.0, and I've worked hard to resist going into any significant detail. For example, support is provided for exception handling during a parallel-executing action, explicit specification of how many tasks (threads) are created in the background, and task scheduling, all of which will likely be needed at some point in a real-world application. There's also loads of the more basic features that I've left out.
I hope that you can now appreciate the need for, and availability of, parallelism support in .NET. If any of the features described above tickle your fancy then check out the further reading links below. I'd highly recommend it because I feel that this is probably going to form a significant portion of most .NET developers' future.
1 NVIDIA multi-core GPU - the Tesla specifications http://www.nvidia.com/object/product_tesla_c1060_us.html
2 Intel "Knights Corner" 50-core chip http://www.geek.com/chips/intel-unveils-knights-corner-50-core-server-chip-1260323/
3 'The Landscape of Parallel Computing Research: A View from Berkeley' http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
4 'Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4' http://www.microsoft.com/downloads/en/details.aspx?FamilyID=86B3D32B-AD26-4BB8-A3AE-C1637026C3EE
5 Introduction to PLINQ http://msdn.microsoft.com/en-us/library/dd997425.aspx