New Features in Maple 17: Parallelism
 


Maple 17 introduces parallelism into its memory management system, taking advantage of multiple processors to reclaim memory more quickly. This change can reduce running times for all computations, not just parallel algorithms. With no code changes required, computations run about 10% faster on average, and memory-intensive computations run up to 50% faster.

In addition, new programming constructs make it easier to write parallel code. Variables and procedure remember tables can be declared thread local, so that each thread stores its own values. This allows complex algorithms that need to maintain state to be written in a thread-safe manner.

Details and Examples

Parallel Memory Management

One of the core components of Maple's engine is the garbage collector, which is responsible for finding and reclaiming memory that is no longer needed by the evaluation engine. In Maple 17, the garbage collector can take advantage of multiple processors to perform its job more quickly. Because the collector runs continuously as Maple executes, this speed-up benefits all users, not just those running parallel algorithms.

  • The garbage collector in Maple 17 can run multiple threads to take advantage of multiple cores, which can speed up collection significantly. Although the collector accounts for only a fraction of Maple's total running time, this still reduces running times by 10% on average, with memory-intensive computations running up to 50% faster. These speed-ups require no changes to user code.
  • Two kernelopts settings control the parallel collector. gcmaxthreads sets the maximum number of threads the parallel garbage collector will use. gcthreadmemorysize controls how many threads Maple uses for a given collection: the number of bytes allocated is divided by the value of gcthreadmemorysize to determine the number of threads.
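For example (a sketch only; the values shown are arbitrary illustrations, not defaults), both settings can be queried and adjusted through kernelopts:

> kernelopts(gcmaxthreads);                     # query the current maximum
> kernelopts(gcmaxthreads = 4):                 # allow at most 4 collector threads
> kernelopts(gcthreadmemorysize = 128*1024^2):  # roughly one collector thread per 128 MB allocated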

Example

The following example shows the speed up in the garbage collector from using multiple threads.

> restart
 

> if kernelopts(wordsize) = 32 then upperBound := 2*10^7; else upperBound := 10^8; end if:
 

> data := [seq(i, i = 1 .. upperBound)]:
 

> p := proc() local i; for i to 100 do gc() end do end proc:
 

> kernelopts(gcmaxthreads = 1):
> t1[1] := time[real](p());
 

> kernelopts(gcmaxthreads = 2):
> t1[2] := time[real](p());
 

> kernelopts(gcmaxthreads = 4):
> t1[4] := time[real](p());
 

(2-D plot: real times t1[1], t1[2], and t1[4] for 1, 2, and 4 collector threads.)

Of course, the average computation does not require such intensive memory management, so garbage collection accounts for only a small fraction of Maple's total running time. But even in this more typical example, the improvement is significant.

> restart
 

> d := [seq(randpoly([x, y, z], dense, degree = 12), i = 1 .. 10000)]:
 

> f := proc() map(i -> int(i, x), d) end proc:
 

> kernelopts(gcmaxthreads = 1):
 

> t2[1] := time[real](f());
  

> kernelopts(gcmaxthreads = 2):
 

> t2[2] := time[real](f());
  

> kernelopts(gcmaxthreads = 4):
 

> t2[4] := time[real](f());
  

(2-D plot: real times t2[1], t2[2], and t2[4] for 1, 2, and 4 collector threads.)

Thread Local Remember Tables

In Maple 17, remember tables are thread local by default: each thread on which a procedure executes gets its own remember table, so remembered results are only available on the thread that created them. When such a procedure is executed in parallel, modifications to the remember table cannot cause thread-safety issues. This makes it easier to write thread-safe functions that use remember tables, especially when the remember table must be manipulated directly.

A procedure with a shared remember table, one that is shared between all threads, can still be created by giving the procedure option shared. This provides the same behavior as previous versions of Maple.

The following procedures implement a Fibonacci-like sequence, where each term is the sum of the previous two terms, but the user can specify the two initial values. A remember table improves the performance of the recursion, and the initial values are stored in FibsHelper's remember table, where they act as the termination condition.

> FibsHelper := proc (n::nonnegint)
option remember;
FibsHelper(n-1)+FibsHelper(n-2)
 end proc;


> Fibs := proc (n::nonnegint, t1 := 1, t2 := 1)
forget(FibsHelper);  # clear any previously remembered values
FibsHelper(0) := t1;
FibsHelper(1) := t2;
FibsHelper(n)
end proc;


> seq(Fibs(i), i = 0 .. 10);

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

> seq(Fibs(i, 2, 2), i = 0 .. 10);

2, 2, 4, 6, 10, 16, 26, 42, 68, 110, 178

> seq(Fibs(i, 2, 38), i = 0 .. 10);

2, 38, 40, 78, 118, 196, 314, 510, 824, 1334, 2158

This works when run on a single thread, and in Maple 17, it works when run in parallel.

> task := proc (n, t1, t2) local i; [seq(Fibs(i, t1, t2), i = 0 .. n)] end proc;

proc (n, t1, t2) local i; [seq(Fibs(i, t1, t2), i = 0 .. n)] end proc

> Threads:-Task:-Start(passed, Tasks = [task, [10, 1, 1], [10, 2, 2], [10, 2, 38]]);

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89], [2, 2, 4, 6, 10, 16, 26, 42, 68, 110, 178], [2, 38, 40, 78, 118, 196, 314, 510, 824, 1334, 2158]

In previous versions of Maple, this parallel evaluation could fail.

A procedure with a shared remember table (Maple's previous behavior) can still be created in Maple 17 by specifying the shared option. The following procedure works when used in a single thread, but not in parallel.

> SharedFibsHelper := proc (n::nonnegint)
options remember, shared;
SharedFibsHelper(n-1)+SharedFibsHelper(n-2)
end proc;


> SharedFibs := proc (n::nonnegint, t1 := 1, t2 := 1)
forget(SharedFibsHelper);  # clear any previously remembered values
SharedFibsHelper(0) := t1;
SharedFibsHelper(1) := t2;
SharedFibsHelper(n)
end proc;


> seq(SharedFibs(i), i = 0 .. 10);

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

> seq(SharedFibs(i, 2, 2), i = 0 .. 10);

2, 2, 4, 6, 10, 16, 26, 42, 68, 110, 178

> seq(SharedFibs(i, 2, 38), i = 0 .. 10);

2, 38, 40, 78, 118, 196, 314, 510, 824, 1334, 2158

When SharedFibs is called in parallel, incorrect results may be computed.  Note that due to the nature of parallel execution, incorrect results may not occur on every run.

> sharedTask := proc (n, t1, t2)
local i;
[seq(SharedFibs(i, t1, t2), i = 0 .. n)]
end proc;

proc (n, t1, t2) local i; [seq(SharedFibs(i, t1, t2), i = 0 .. n)] end proc

> Threads:-Task:-Start(passed, Tasks = [sharedTask, [10, 1, 1], [10, 2, 2], [10, 2, 38]]);

[1, 1, 2, 3, 5, 8, 13, 21, 34, 650, 89], [2, 2, 4, 6, 10, 16, 26, 42, 68, 110, 178], [2, 38, 40, 78, 118, 196, 314, 510, 34, 1334, 2158]

Various things can go wrong. Clearing the remember table in one thread can remove another thread's initial values, causing SharedFibsHelper to be called with invalid inputs (negative numbers). One thread's initial values can overwrite another's, causing that thread to compute incorrect results. And one thread's remembered values can be used by a different thread for which those values are incorrect.

A procedure with a shared remember table can work correctly in parallel, and computed results can then be shared across threads. However, there are many complexities that require such procedures to be written with care. Maple 17's thread-local remember tables make it easier for procedures written for single-threaded execution to work in parallel.

Thread Local Data

In Maple 17, module local variables can be declared thread local, meaning that they store a different value for each thread that accesses the variable. This allows complex algorithms that need to maintain state to be written in a thread-safe manner.

A module that maintains internal state generally will not work correctly when executed in parallel. Consider the following example: the Mapper module maintains internal state in the variables func and data. The setMapper routine sets these variables, and when ModuleApply is called, it maps func over the given data.

> Mapper := module ()
local func, data;
export setMapper, ModuleApply;
 setMapper := proc (f::procedure, d::anything)
func := f; data := d;
NULL
end proc;
ModuleApply := proc (d)
map(func, d, data)
end proc
end module:

> adder := proc ()
Mapper:-setMapper(proc (x, y) options operator, arrow; x+y end proc, 1);
Mapper([seq(i, i = 1 .. 10)])
end proc:

> adder();

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

> multer := proc ()
Mapper:-setMapper(proc (x, y) options operator, arrow; x*y end proc, 3);
Mapper([seq(i, i = 1 .. 10)])
end proc:

> multer();

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

If you execute the adder and multer commands in parallel, you can get incorrect results. Executing the following statement multiple times will show different results, some of which will be incorrect.

> Threads:-Task:-Start(passed, Task = [adder], Task = [multer]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

> Threads:-Task:-Start(passed, Task = [adder], Task = [multer]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adder], Task = [multer]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adder], Task = [multer]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adder], Task = [multer]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

Incorrect results are computed because the func and data state variables are shared between threads, so when one thread changes func's value, the other thread is also affected. In Maple 17, this can be fixed by declaring the state variables as thread_local. Each thread then maintains its own value for the state variables, and changing a value in one thread does not affect other threads.

> MapperTL := module ()
local func::thread_local, data::thread_local;
export setMapper, ModuleApply;
  setMapper := proc (f::procedure, d::anything)
func := f; data := d;
NULL
end proc;
ModuleApply := proc (d) map(func, d, data)
end proc
end module:

> adderTL := proc ()
MapperTL:-setMapper(proc (x, y) options operator, arrow; x+y end proc, 1);
MapperTL([seq(i, i = 1 .. 10)])
end proc:

> adderTL();

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

> multerTL := proc ()
MapperTL:-setMapper(proc (x, y) options operator, arrow; x*y end proc, 3);
MapperTL([seq(i, i = 1 .. 10)])
end proc:

> multerTL();

[3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

Executing this version, which uses the thread-local variables, returns the expected result every time.

> Threads:-Task:-Start(passed, Task = [adderTL], Task = [multerTL]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adderTL], Task = [multerTL]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adderTL], Task = [multerTL]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adderTL], Task = [multerTL]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]

> Threads:-Task:-Start(passed, Task = [adderTL], Task = [multerTL]);

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11], [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]