Statistics Enhancements in Maple 16
Maple 16 includes significant enhancements to the Statistics package. These enhancements include:
Discrete distributions with noninteger values
Parameter estimation is more efficient and handles more cases
Matrix data sets
Live data plots
Variable-width histograms
ScatterPlot3D
Pie chart improvements
The examples below assume that the Statistics package has been loaded:
with(Statistics):
Discrete distributions with noninteger values
Starting with version 16, Maple supports discrete distributions that can have noninteger values.
At the market
At a market stall, a vendor charges $5 for taking part in a game of chance. If you participate, you receive the contents of one of four envelopes filled with small change, each with probability 1/4. The values of the envelopes are $8.24, $4.12, $3.91, and $0.16.
R := RandomVariable(EmpiricalDistribution([8.24, 4.12, 3.91, 0.16])):
We can find the expected amount of money we will receive as Mean(R) = 4.10750000000000. Since this is less than the $5 stake, it sounds like a fairly unappealing deal.
Suppose, furthermore, that we are interested in buying peanuts with this money. Because of overhead, buying a few peanuts costs more per peanut than buying a lot of them. Our peanut supplier will sell us 10 x^2 grams of peanuts for $x (for x < 10), so for $5 we can get 250 g of peanuts. The expected weight of peanuts we will get if we participate in the game is:
ExpectedValue(10*R^2) = 1001857/4000, or ExpectedValue(10*R^2, 'numeric') = 250.4642500.
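As a cross-check, both expectations can be computed by hand from the definition: each of the four envelope values v occurs with probability 1/4, so E[R] is the plain average of the values and E[10 R^2] is the average of 10 v^2. A minimal sketch using only the top-level add command (the names vals, EPayout, and EPeanuts are introduced here for illustration):

```maple
# Envelope values in dollars, each drawn with probability 1/4
vals := [8.24, 4.12, 3.91, 0.16]:
# Expected payout: average of the values (about 4.1075)
EPayout := add(v, v in vals) / 4;
# Expected peanut weight: $x buys 10*x^2 grams (about 250.46 grams)
EPeanuts := add(10*v^2, v in vals) / 4;
# Peanuts bought directly with the $5 stake, for comparison
10*5^2;
```

Note that 10*(E[R])^2 = 10*4.1075^2 is only about 168.7 grams. Because v -> 10 v^2 is convex, averaging after squaring always gives at least as much as squaring the average (Jensen's inequality), and the large $8.24 envelope dominates the sum.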
Because squaring so strongly favors the highest outcome, $8.24, the expected payout in peanuts from playing (about 250.46 g) is essentially equal to the 250 g we could buy without playing.
When the probabilities for different outcomes are not all the same (or all small multiples of a single value), we can use the new probabilities option to EmpiricalDistribution. In a more refined model, the weight or volume of the envelopes might influence how likely each one is to be picked. For example, suppose the probabilities are as follows:
where the top row gives the values in dollars as before and the bottom row gives the probabilities. (This information is tied to the variable Probabilities using the data table feature.) Doing the same computations as above, we now see:
R2 := RandomVariable(EmpiricalDistribution(Probabilities[1], 'probabilities' = Probabilities[2])):
Mean(R2) = 4.68400000000000
The expected outcome is higher, but still falls short of the price. However, the expected weight of peanuts is now ExpectedValue(10*R2^2) = 814979/2500, or ExpectedValue(10*R2^2, 'numeric') = 325.9916000.
We now see that the payout in peanuts is better if we take part in the game!
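With unequal probabilities, each expectation is a weighted sum, E[g(R)] = sum of p[i]*g(v[i]). The sketch below wraps this in a small helper procedure; the name WeightedExpectation is hypothetical and is not a Statistics command. Because the Probabilities table is not reproduced in this text, the usage lines apply the helper to the equal weights of the first model, which should recover the earlier results; for the refined model you would pass convert(Probabilities[1], list) and convert(Probabilities[2], list) instead.

```maple
# Hypothetical helper (not part of Statistics): E[g(R)] = sum of probs[i]*g(vals[i])
WeightedExpectation := proc(g, vals::list, probs::list)
    local i;
    if nops(vals) <> nops(probs) then
        error "values and probabilities must have the same length";
    end if;
    add(probs[i] * g(vals[i]), i = 1 .. nops(vals));
end proc:

vals := [8.24, 4.12, 3.91, 0.16]:
equal := [1/4, 1/4, 1/4, 1/4]:
# Expected payout and expected peanut weight under equal weights
WeightedExpectation(x -> x, vals, equal);
WeightedExpectation(x -> 10*x^2, vals, equal);
```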
Custom distributions
EmpiricalDistribution can be used for any discrete distribution that can assume only finitely many values. The built-in discrete distributions in Maple that can assume infinitely many values support only integer values. Therefore, to use a discrete distribution that can assume infinitely many values, some or all of which are not integers, we need to define it ourselves using the custom distribution feature of the Statistics package.
Consider the distribution that can assume every negative power p of 2, each with probability p. That is, the corresponding random variable is 1/2 with probability 1/2, 1/4 with probability 1/4, and so on. This distribution is represented as follows:
d := Distribution('ProbabilityFunction' = (p -> p), 'DiscreteValueMap' = (n -> 2^(-n)), 'Support' = 1 .. infinity, 'Type' = 'discrete'):
R := RandomVariable(d):
The DiscreteValueMap and Support properties determine which values the probability distribution can assume; the ProbabilityFunction determines the probability with which each value is assumed. For more details, see the DiscreteValueMap help page. We can now compute, for example,
Mean(R) = 1/3, StandardDeviation(R) = sqrt(14)/21, or Cumulant(R, 3) = -2/945.
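These values can be confirmed directly from the definition. The nth value is 2^(-n) and has probability 2^(-n), so every moment is a geometric series: E[R^k] = sum of 2^(-(k+1) n) over n >= 1, which equals 1/(2^(k+1) - 1). A sketch using Maple's symbolic sum (the names v, p, m1, m2, and m3 are introduced only for this illustration):

```maple
# nth value and its probability, n = 1, 2, 3, ...
v := n -> 2^(-n):
p := n -> 2^(-n):
sum(p(n), n = 1 .. infinity);                  # total probability: 1
m1 := sum(v(n) * p(n), n = 1 .. infinity);     # E[R]   = 1/3
m2 := sum(v(n)^2 * p(n), n = 1 .. infinity);   # E[R^2] = 1/7
m3 := sum(v(n)^3 * p(n), n = 1 .. infinity);   # E[R^3] = 1/15
# Standard deviation: sqrt(1/7 - 1/9) = sqrt(2/63), which equals sqrt(14)/21
sqrt(m2 - m1^2);
# Third cumulant = third central moment: 1/15 - 1/7 + 2/27 = -2/945
m3 - 3*m1*m2 + 2*m1^3;
```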
Another distribution can be obtained by taking the probabilities 8*n*(n + 1)*(n + 4)/(117*3^n), for n = 0 .. infinity (a modified negative binomial distribution), and associating the value e^n with the nth probability. This is defined as follows:
d2 := Distribution('DiscreteValueMap' = (n -> exp(n)), 'Support' = 0 .. infinity, 'ProbabilityFunction' = (x -> 8*ln(x)*(ln(x) + 1)*(ln(x) + 4)/(117*x^ln(3))), 'Type' = 'discrete'):
R2 := RandomVariable(d2):
Again, we can compute such quantities as
Mean(R2) = -16*(2*exp(1) - 15)*exp(1)/(13*(exp(1) - 3)^4) and Variance(R2) = infinity.
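The same bookkeeping explains both results. The nth value e^n has probability 8n(n + 1)(n + 4)/(117*3^n), so the terms of the series for E[R2] behave like a polynomial times (e/3)^n, and since e/3 < 1 the mean is finite. The series for E[R2^2] instead contains (e^2/3)^n with e^2/3 > 1, so it diverges and the variance is infinite. A sketch of these checks (the name q is introduced here):

```maple
# Probability attached to the nth value exp(n), n = 0, 1, 2, ...
q := n -> 8*n*(n + 1)*(n + 4)/(117*3^n):
sum(q(n), n = 0 .. infinity);                          # total probability: 1
# Mean: terms shrink like (e/3)^n, so the series converges
evalf(sum(exp(n)*q(n), n = 0 .. infinity));
# Closed form of the mean, for comparison
evalf(-16*(2*exp(1) - 15)*exp(1)/(13*(exp(1) - 3)^4));
# Second moment: terms grow like (e^2/3)^n, and this ratio exceeds 1
evalf(exp(2)/3);
```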
Sampling custom discrete distributions
In previous versions, Maple did not support sampling of custom discrete distributions. This feature was added to Maple 16.
evalf[5]~(Sample(R, 10));
evalf[5]~(Sample(R2, 10));
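A simple way to sanity-check the new sampler is to compare the average of a large sample with the exact value Mean(R) = 1/3 computed earlier. A minimal sketch; the sample size 10^5 and the name S are arbitrary choices, and the sample mean will vary from run to run:

```maple
# Draw a large sample from the custom distribution R
S := Sample(R, 10^5):
# Sample mean; should be close to the exact mean 1/3
Mean(S);
# Typical size of the deviation: standard error = StandardDeviation(R)/sqrt(n)
evalf(StandardDeviation(R)/sqrt(10^5));
```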
Parameter estimation
Maple 16 has much more efficient and robust routines for maximum likelihood parameter estimation for many distributions. The following example was sped up by a factor of about 10.
s := Sample(Normal(-1, exp(1)), 10^4):
CodeTools:-Usage(MaximumLikelihoodEstimate(Normal(-1, sigma), s));
2.70664452698181
Maple can now also estimate multiple parameters at the same time using maximum likelihood estimation.
CodeTools:-Usage(MaximumLikelihoodEstimate(Normal(mu, sigma), s));
μ=−1.09246434540031,σ=2.70506468306018
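For the normal distribution these estimates also have closed forms, which gives a useful cross-check. With μ fixed at −1, the maximum likelihood estimate of σ is the root mean square of s + 1. With both parameters free, the estimate of μ is the sample mean, and the estimate of σ is the root mean square deviation from it; it divides by n rather than n − 1, so it is not the value StandardDeviation(s) returns. Expanding the square gives σ̂(μ = −1)^2 = σ̂^2 + (μ̂ + 1)^2, which is consistent with the two outputs above. A sketch reusing the sample s (the names sigma_fixed, mu_hat, and sigma_hat are introduced here):

```maple
# mu fixed at -1: sigma_hat is the root mean square of (x + 1)
sigma_fixed := sqrt(Mean(map(x -> (x + 1)^2, s)));
# Both parameters free: mu_hat is the sample mean ...
mu_hat := Mean(s);
# ... and sigma_hat divides by n (not n - 1), unlike StandardDeviation
sigma_hat := sqrt(Mean(map(x -> (x - mu_hat)^2, s)));
```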
For more information, see the MaximumLikelihoodEstimate help page.
Matrix data sets
The Statistics package in Maple 16 has been updated to better handle Matrix data sets. In previous releases, some commands in the Statistics package did not accept Matrix data. In Maple 16, these commands work with Matrix data sets, processing each column of the input Matrix separately. In addition, Maple 16 lets you split your data into submatrices based on the value of one column. As a result, you can organize and present your data in different configurations to better observe particular trends. For more details, see the SplitByColumn help page.
As an example, assume the data below represents some housing data. The first column has the number of bedrooms, the second column has the number of square feet, the third has the price in dollars. This data table corresponds to the variable HouseSalesData.
Using the SplitByColumn command we can easily rearrange the data in terms of the number of bedrooms:
PerBedroom := SplitByColumn(HouseSalesData, 1);
If we want to know the average area and price for the three-bedroom houses, we can find them as follows:
ThreeBedrooms := PerBedroom[2];
Mean(ThreeBedrooms);
We see that the average three-bedroom house has an area of about 960 square feet and costs just over $100,100.
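Conceptually, each submatrix returned by SplitByColumn collects the rows that share one value in the chosen column. The same three-bedroom subset can be picked out by hand with select and Matrix row indexing. This is a sketch of the idea, not a description of how SplitByColumn is implemented; it assumes HouseSalesData is a Matrix whose first column holds the bedroom counts, and the names m, rows, and ThreeBedroomsByHand are introduced for illustration:

```maple
# Indices of the rows whose first column (number of bedrooms) is 3;
# compare with a tolerance in case the data are stored as floats
m := LinearAlgebra:-RowDimension(HouseSalesData):
rows := select(i -> abs(HouseSalesData[i, 1] - 3) < 1/2, [seq(i, i = 1 .. m)]):
# Extract those rows, keeping all columns
ThreeBedroomsByHand := HouseSalesData[rows, ..]:
# Column means: bedrooms, area, price
Mean(ThreeBedroomsByHand);
```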
Live data plots
A new palette in Maple 16 makes it easy to create and customize statistical plots, including area charts, histograms, pie charts, and scatter plots.
From the Live Data Plots palette, click a plot type to insert this palette item into your document. To display your dataset, replace the placeholder with your dataset. You can customize the plot by clicking on options. For more information, see Live Data Plots in Maple 16.
Variable-width histograms
Sometimes a phenomenon that lends itself to display using a histogram changes rapidly in one region and slowly in another. In the region where it changes rapidly, you would like to show a very fine-grained histogram, but elsewhere that would be overkill and distracting. In this case, you can use the variable-width bins feature for histograms, new in Maple 16.
For example, suppose you have data that can come from one of two processes: either from a Beta distribution with parameters 3 and 2.5, or from a normal distribution with mean −2 and standard deviation 3. We have 100 elements from each.
s1 := Sample(BetaDistribution(3, 2.5), 10^2):
s2 := Sample(NormalDistribution(-2, 3), 10^2):
s := Join(s1, s2):
The default histogram is too coarse:
Histogram(s);
But a histogram with a much finer bin width shows too many empty bins:
Histogram(s, binwidth = 0.15);
The following command uses wider bins where there are fewer points. (The heights of the rectangles are proportional to the density of points in the given bin, and the total area of all rectangles is 1.)
Histogram(s, binbounds = proportional);
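The normalization in the parenthetical remark can be written out. Suppose bin j covers [b_{j-1}, b_j) and contains c_j of the N data points. Taking the height of each rectangle to be the density of points in its bin makes the total area exactly 1, however the bin widths vary:

```latex
h_j = \frac{c_j}{N\,(b_j - b_{j-1})}, \qquad
\sum_{j=1}^{k} h_j\,(b_j - b_{j-1}) = \sum_{j=1}^{k} \frac{c_j}{N} = \frac{N}{N} = 1 .
```

A bin twice as wide that holds the same number of points is drawn half as tall, so the wide bins in sparse regions do not visually overstate how much data they contain.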
For more information, see Histogram.
ScatterPlot3D
The ScatterPlot3D command plots the points of an m×3 Array or Matrix in three-dimensional space and can also plot a surface through them. The surface is a smoothed approximation, generated using the lowess algorithm. Each row of the data Matrix is treated as a point in x-y-z space: the first two entries of the row give a point in the x-y plane (independent data), and the third entry gives the z-coordinate (dependent data).
The data in the first two entries of each row does not need to form a regular grid in the (x-y) plane.
The following example constructs data by adding noise to a function z of the first two dimensions (x and y).
X := Sample(Uniform(-50,50),175):
Y := Sample(Uniform(-50,50),175):
Zerror := Sample(Normal(0,100),175):
Z := Array(1..175,(i)->-(sin(Y[i]/20)*(X[i]-6)^2+(Y[i]-7)^2+Zerror[i])):
XYZ := Matrix([[X],[Y],[Z]],datatype=float[8])^%T;
ScatterPlot3D(XYZ, axes=box, orientation=[20,0,0]);
ScatterPlot3D(XYZ, lowess, grid=[25,25], axes=box, orientation=[20,70,0]);
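To see how ScatterPlot3D reads its input, the columns of XYZ can be pulled apart: the first two columns are the independent (x, y) locations and the third is the dependent z-data. Plotting the raw points with plots:-pointplot3d shows the unsmoothed data that the lowess surface approximates. A sketch reusing XYZ from above (the names XY and Zcol are introduced here):

```maple
# Independent data: the (x, y) location of each point, one row per point
XY := XYZ[.., 1 .. 2]:
# Dependent data: the z-coordinate of each point
Zcol := XYZ[.., 3]:
# 175 points, 2 independent coordinates each
LinearAlgebra:-Dimension(XY);
# Raw, unsmoothed points for comparison with the lowess surface
plots:-pointplot3d(convert(XYZ, listlist), axes = box, orientation = [20, 70, 0]);
```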
Pie chart improvements
In Maple 16, you can create three-dimensional pie charts and annular pie charts.
Additional improvements include new default coloring of pie charts. When you specify ranges for pie chart colors, the two range endpoint colors are plotted opposite each other in the pie chart. The color gradient for the pie chart changes clockwise. The colors in the pie chart are kept within the same hue.
In addition, labels of pie slices are automatically colored for better contrast.
dataset := ["A" = 5, "B" = 4, "C" = 3, "D" = 2, "E" = 3, "F" = 4, "G" = 5]:
Statistics:-PieChart(dataset, color = "CornflowerBlue" .. "DarkBlue", annular = true, render3d = true);
See Also
Statistics