Parallel Programming in Maple 16: The Task Programming Model
Maple includes the Task Programming Model, a high level parallel programming library. The Task Programming Model allows users to implement parallel algorithms, algorithms capable of taking advantage of the multiple core computers that are now common. Writing parallel algorithms using the Task Programming Model reduces and removes many of the difficulties associated with standard threaded programming.
Example: Computing a Convex Hull
Maple's New Memory Manager and Parallel Performance
Improvements to Threads Add, Mul, Seq and Map
Example: Magma:-IsAssociative
Given X, a set of points in 2D, the Convex Hull is the minimum set of points that define a polygon containing all the points of X. If you imagine the points as pegs on a board, you can find the Convex Hull by surrounding the pegs by a loop of string and then tightening the string until there is no more slack, the Convex Hull is the pegs in contact with the string.
The following is an example of a Convex Hull of 20 points.
One way to compute a Convex Hull is to use the QuickHull algorithm. Starting with two points on the convex hull (the points with lowest and highest position on the x axis, for example), you create a line which divides the remaining points into two groups. If you find the Convex Hull of these two groups, they can be combined to form the Convex Hull of the entire set. Given the line and the set of points on one side of the line, we find the point in the set furthest from the line. This point will also be on the Convex Hull. Using this point and the two end points of the line, we can define two new lines on which we can recurse. If we find a line with one or no points on the outside of the hull we have constructed so far, then we stop knowing that both end points of the line and the one exterior point (if it exists) are in the Convex Hull.
The algorithm can be parallelized by running the recursive steps in parallel. The following code implements the QuickHull algorithm and a parallel QuickHull using the Task Programming Model.
One might be surprised to see how little extra code is necessary to turn a sequential algorithm into a parallel one. This is because the Task Programming Model takes care of many of the confusing, messy, and bug prone aspects of threaded programming.
We now create some points to try the QuickHull algorithm on:
points ≔ table:while numelems points < 1000 do p ≔ RandomTools:-Generate floatrange=−1..1 , RandomTools:-Generate floatrange=−1..1 ; if p1*p1+p2*p2 < 1 then points numelemspoints ≔ p; end;end: points ≔ entries points, 'nolist' : t ≔ timereal:QuickHull points, plot ; tSeq ≔ timereal−t:
Now try the parallel algorithm on the same set of points:
t ≔ timereal:QuickHull points, plot,parallel ;tPara ≔ timereal−t:
As you can see, the relatively simple modification to add parallelism to the QuickHull algorithm leads to a good speed enhancement.
Maple 16 has a revamped memory management system that allows for better handling of memory when multiple threads are executing. By running the above example with larger numbers of points, the difference between Maple 15 and Maple 16 becomes clear.
As the problem size grows, the Maple 15 memory manager is unable to keep up with the demands of the problem. Maple 16 does a much better job, leading to significantly better performance. The following graph shows memory used when generating the graph above.
In Maple 16 some improvements were made the the Threads:-Add, Threads:-Mul, Threads:-Seq and Threads:-Map routines. These routines are parallel implementations of the corresponding top level Maple routines. As Maple can't predict the cost of executing each element, it has to guess the best way to divide the problem over multiple cores. Up to Maple 15, a very simple heuristic was used. It worked well in many cases, but performed poorly in many others, including a few common ones. Maple 16 uses a smarter heuristic that can exploit parallelism in more cases. In addition, users can now specify how the problem should be divided, which can allow greater parallelism even in the cases where the new heuristic does not work optimally.
For a small number of elements (less than 1000) Maple 15 performs no better than single threaded. For 1000 to 2000 elements, Maple 15 performs better, but still worse that Maple 16. Only for problems with more that about 2500 elements does the Maple 15 performance approach that of Maple 16.
The Maple 16 heuristic is better than Maple 15, but it is not optimal in all cases. One case where it is known to perform poorly is for a very small number of elements where each task takes a long time to execute. In this case the best approach is to allow each element to execute as a single task. Maple 16 users can specify the size of a task using the tasksize option. The effect of this option can be seen below.
Thus for a small number of long running tasks, explicitly specifying a tasksize can improve performance. That said, the heuristic does fairly well once there are enough elements.
The Task Programming Model is the same programming method Maple developers use to add parallelism into its library of routines. The new Magma package uses parallelism to accelerate its IsAssociative function. The following shows how effective parallelism can be in accelerating Maple.
Download Help Document