Polynomial arithmetic
Over the past couple of releases, the efficiency of polynomial arithmetic in Maple has been improved dramatically. This has been achieved through a combination of various techniques. The improvements in Maple 16 increase the speed-ups even further: Maple 16 is more than twice as fast as Maple 15 for these computations, up to 22 times faster than Maple 14 and earlier, and up to 1800 times faster than Mathematica® 8.
Maple has two commands for polynomial factorization, i.e., the multiplicative decomposition of a multivariate polynomial into irreducible factors that cannot be decomposed any further.
Note that the resulting factors have integer coefficients as well; a factor that is irreducible over the rationals is not split any further. To obtain a more refined factorization, you can use a second argument to specify what to allow in the coefficients.
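As a sketch of both behaviors (the polynomials chosen here are illustrative, not necessarily the original examples):

```maple
# Over the rationals, the factors keep integer coefficients:
factor(x^2 - 1);       # (x - 1)*(x + 1)
factor(x^2 + 1);       # returned unchanged: irreducible over the rationals
# Passing I as a second argument allows Gaussian integer coefficients:
factor(x^2 + 1, I);    # now splits into the linear factors (x - I)*(x + I)
```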
In the most general case, the polynomial can have more than one variable and may already contain non-rational expressions such as radicals, which can even be nested or involve parameters in the coefficients. Such cases are handled by the command evala(Factor(...)). Note that this command generally requires all radicals to be expressed in RootOf form.
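A minimal sketch of the RootOf form (the particular extension, sqrt(2) written as RootOf(_Z^2 - 2), is an illustrative choice):

```maple
# sqrt(2) must be written as a RootOf before factoring over it:
alpha := RootOf(_Z^2 - 2);
# Factor over the field extended by alpha:
evala(Factor(x^2 - 2, alpha));   # splits as (x - alpha)*(x + alpha)
```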
In Maple 16, the computational efficiency of the evala(Factor(...)) command has been improved dramatically, particularly in the presence of multiple and nested RootOfs. The speedup is due to a new and efficient sparse modular algorithm. For example, the execution time for the following example is about 20 times faster than in Maple 15 on the same machine.
The following table shows some benchmarks in comparison to Maple 15.
Polynomial ideal decomposition
A generalization of factoring a single polynomial into its multiplicatively irreducible factors is the prime decomposition of a polynomial ideal generated by a finite set of multivariate polynomials. In addition to many useful commands and tools for manipulating and analyzing polynomial ideals, Maple's PolynomialIdeals package provides the PrimeDecomposition command, as well as the Radical command, which computes the radical of an ideal, the analogue of the squarefree part of a polynomial. The computational efficiency of these two commands has been improved for Maple 16, leading to dramatic speedup factors of more than 700 in some cases.
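A small sketch of these two commands (the ideal chosen here is illustrative):

```maple
with(PolynomialIdeals):
# The ideal generated by x*y and x*z:
J := PolynomialIdeal(x*y, x*z);
PrimeDecomposition(J);   # its prime components, e.g. <x> and <y, z>
Radical(J);              # the radical ideal (here J is already radical)
```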
The following table shows some benchmarks in comparison to Maple 15.
Bernoulli numbers
Bernoulli numbers are a sequence of rational numbers that are important in number theory, in particular for the Euler-Maclaurin formula. Maple is able to compute Bernoulli numbers very efficiently.
Maple 16 is able to take advantage of multi-core machines to compute sequences of Bernoulli numbers in parallel.
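A sketch of both the single-number and the parallel computation (the indices used are illustrative):

```maple
# A single exact Bernoulli number as a rational:
bernoulli(100);
# Compute a whole sequence in parallel across the available cores:
B := [Threads:-Seq(bernoulli(2*i), i = 1 .. 500)]:
```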
Numeric linear algebra
The compiled C version (CLAPACK) of the LAPACK numerical linear algebra library used by and shipped with Maple has been upgraded to version 3.2.1. The algorithms used for nonsymmetric eigenvalue problems have been markedly improved. For example, the time to compute all eigenvectors of a 1000 x 1000 random nonsymmetric hardware-float Matrix has decreased from 30.1 seconds in Maple 15 to 11.6 seconds in Maple 16 (AMD X2 4600+, 64-bit Linux).
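A sketch of such a benchmark, with the size and datatype as described above:

```maple
with(LinearAlgebra):
# 1000 x 1000 random nonsymmetric Matrix of hardware floats:
A := RandomMatrix(1000, 1000, datatype = float[8]):
t0 := time[real]():
vals, vecs := Eigenvectors(A):   # all eigenvalues and eigenvectors
time[real]() - t0;               # elapsed wall-clock time in seconds
```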
Maple offers parallel implementations of the powerful map, seq, add, and mul commands in the Threads package. These commands automatically parallelize each operation over the available cores and yield significant speed-ups with very little user effort. The heuristic used to automatically determine how to split up the computation among the cores has been dramatically improved in Maple 16. The following charts show the behavior of Maple 15 vs. Maple 16 on two example problems, one with a large number of short computations and one with a smaller number of long-running computations. In addition to the better automatic heuristic, Maple 16 also offers the user direct control over how such computations are split up across cores with the new tasksize option.

Memory management
Automatic memory management frees the programmer from the low-level details of allocating and deallocating storage space for the results of their computations. This enables programmers to concentrate on developing correct and efficient high-level algorithms to solve their problems. Significant improvements have been made to both the memory allocator and the garbage collector that together comprise Maple's memory management system. The modernization of the memory management system positions Maple to effectively use the resources available not only on the machines of today but also on those of tomorrow. For example, advances in CPU performance will continue to be driven by the increase in the number of cores per chip. Matching this trend, the advances to both the memory allocator and the garbage collector are primarily centered on simultaneously handling the memory demands of multiple threads running on multi-core machines. Furthermore, Maple 16 is able to make more effective use of the larger amounts of memory available on 64-bit architectures. These memory management enhancements improve the computation time of any large or memory-intensive operation throughout Maple.
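The parallel Threads commands and the new tasksize option described above might be used as follows (the workload and chunk size are illustrative assumptions):

```maple
# Parallel add: each task handles 'tasksize' iterations of the sum.
Threads:-Add(evalf(1/i^2), i = 1 .. 10^6, tasksize = 10000);
# Parallel map over a list, letting the improved heuristic pick the split:
Threads:-Map(isprime, [seq(2^k - 1, k = 2 .. 30)]);
```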
The following charts illustrate the gains derived from the new memory manager on a parallel convex hull example:
Grid computing
Distributed systems offer fantastic gains when it comes to solving large-scale problems. By sharing the computational load, you can solve problems too large for a single computer to handle, or solve problems in a fraction of the time it would take with a single computer. Platform support for the Maple Grid Computing Toolbox has been extended for Maple 16, making it easy to create and test parallel distributed programs on some of the world's largest clusters. MPICH2 is a high-performance implementation of the Message Passing Interface (MPI) standard, distributed by Argonne National Laboratory (http://www.mcs.anl.gov/research/projects/mpich2/). The stated goals of MPICH2 are to provide an MPI implementation that efficiently supports different computation and communication platforms, and to enable cutting-edge research in MPI through an easy-to-extend modular framework.
The Grid Computing Toolbox for Maple 16 now includes the ability to easily set up multi-process computations that interface with MPICH2 to deploy parallel computations across multiple machines or clusters.
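As a sketch, a simple multi-node computation with the Grid package might look like this (the node procedure is illustrative):

```maple
# Each node reports its rank; Launch runs the procedure on every node.
hello := proc()
    printf("node %d of %d\n", Grid:-MyNode(), Grid:-NumNodes());
end proc:
Grid:-Launch(hello);
```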