Student Statistics - Maple Help

Statistics for Students

The Student package is a collection of subpackages designed to assist with the teaching and learning of standard undergraduate mathematics. For Maple 18, we added a new subpackage called Statistics to the Student family. Student[Statistics] provides more detailed explanations, instructions, and demonstrations of the material covered in statistics courses than the standard Statistics package offers.

With the Student Statistics package, students can work with data, visualize statistical distributions, and apply hypothesis tests. Students can even interactively explore the properties of different probability distributions.

There are many ways to interact with this new package. Typically, students will use Student[Statistics] to:

Create Data Samples

Work with Data Samples

Examples

Create Data Samples

There are three types of data samples valid in this package:

1)	A data sample that follows a specific distribution
Data samples can easily be created using random variables with corresponding distributions. For example, to create a Normal random variable, call NormalRandomVariable( $μ$ , $σ$ ). For more information, see the Random Variable overview page.

2)	A data sample stored in a list or a Vector
Each element in a list or Vector data sample represents a single recorded observation. List samples and Vector samples are treated identically; either is valid.

3)	A data sample stored in a Matrix
A Matrix data sample is treated as a collection of several list or Vector samples. Each column of the Matrix represents an individual sample.
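
The three sample types can be sketched side by side as follows. This is a minimal illustration, not taken from the help page: the names X, L, V, and M and all numeric values are arbitrary choices, and the only package commands used are ones that appear elsewhere on this page.

```maple
with(Student[Statistics]):

# 1) A sample that follows a distribution: a Normal random variable
#    (the parameters 0 and 1 are arbitrary illustration values)
X := NormalRandomVariable(0, 1):

# 2) The same observations stored as a list and as a Vector
L := [1, 2, 3, 4]:
V := Vector([1, 2, 3, 4]):

# 3) A Matrix whose two columns are two separate samples
M := Matrix([[1, 10], [2, 20], [3, 30], [4, 40]]):

Mean(X);   # mean of the distribution
Mean(L);   # mean of the list sample
Mean(V);   # mean of the Vector sample
Mean(M);   # one mean per column
```

Because list and Vector samples are interchangeable, Mean(L) and Mean(V) should agree.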

Work with Data Samples

Compute quantities of data samples
There are many commands for computing quantities of data samples, such as the mean, the standard deviation, the skewness, and many more. In addition to returning a symbolic formula or an exact numeric value for a given quantity, these commands can also return a visualization of the result.

Explore distributions
Users can easily explore the important properties of a distribution by using the ExploreRV command. ExploreRV takes an arbitrary statistical distribution and displays an interactive interface for exploring its parameters. The interface reports key quantities, such as the mean and median, and displays visualizations of the CDF and PDF.

Apply hypothesis tests
To test a given hypothesis, there are several hypothesis tests available, including OneSampleTTest, ChiSquareGoodnessOfFitTest, ShapiroWilkWTest, and more. To better explain how and when to use different hypothesis tests, a new command, TestsGuide, is introduced in this package to direct a student through the process of choosing an appropriate test. You can read more on the Hypothesis Tests Overview page.

For more details, read through the Overview of Student Statistics page.

Examples

>	$with (Student [Statistics]):$

Example

We first define a discrete distribution:

>	$Distribution1 := BinomialRandomVariable (7, \frac{1}{2}):$

Then we can study some properties of this distribution:

>	$Mean (Distribution1)$

$\frac{7}{2}$

>	$StandardDeviation (Distribution1)$

$\frac{1}{2} \sqrt{7}$

To return a numeric value, we need to specify the optional parameter numeric or numeric=true.

>	$StandardDeviation (Distribution1, numeric)$

$1.322875656$
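
As a hand check of the values above, the standard binomial formulas with $n = 7$ and $p = \frac{1}{2}$ (the two arguments passed to BinomialRandomVariable) reproduce both results. The symbols $n$, $p$, $μ$, and $σ$ are notation introduced here, not Maple names.

$μ = n p = 7 \cdot \frac{1}{2} = \frac{7}{2}$

$σ = \sqrt{n p (1 - p)} = \sqrt{7 \cdot \frac{1}{2} \cdot \frac{1}{2}} = \frac{1}{2} \sqrt{7} \approx 1.3229$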

We can set the optional parameter output to output=plot to see a plot demonstration.

>	$ProbabilityFunction (Distribution1, x, output = plot)$

>	$CDF (Distribution1, 3, output = plot)$

To get the formula used to compute a given property of a distribution, specify the optional parameter inert or inert=true.

>	$Probability (Distribution1 \leq 4, inert)$

$\sum_{\_t = 0}^{4} \left( \begin{cases} 0 & \_t < 0 \\ binomial (7, \_t) \left(\frac{1}{2}\right)^{\_t} \left(\frac{1}{2}\right)^{7 - \_t} & otherwise \end{cases} \right)$
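
Evaluating this inert sum by hand is a useful check. For every index from 0 to 4 the piecewise expression takes its second branch, and the two powers of $\frac{1}{2}$ combine into $(\frac{1}{2})^{7} = \frac{1}{128}$. Writing $X$ for Distribution1 and $k$ for the summation index (both are notation chosen here):

$P(X \leq 4) = \frac{1}{128} \sum_{k = 0}^{4} \binom{7}{k} = \frac{1 + 7 + 21 + 35 + 35}{128} = \frac{99}{128} \approx 0.7734$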

Next, try a continuous distribution.

>	$Distribution2 := NormalRandomVariable (10, 3):$

>	$Skewness (Distribution2)$

$0$

>	$Kurtosis (Distribution2)$

$3$
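
These two values agree with the standardized-moment definitions of skewness and kurtosis. That the commands use these definitions is an inference from the outputs, not something this page states. For a Normal random variable $X$ with mean $μ = 10$ and standard deviation $σ = 3$:

$E[(\frac{X - μ}{σ})^{3}] = 0$

$E[(\frac{X - μ}{σ})^{4}] = 3$

Both identities hold for every Normal distribution, whatever the values of $μ$ and $σ$. A result of 3 rather than 0 shows that Kurtosis reports kurtosis, not excess kurtosis.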

Example

Say we have observed and recorded some data. We can then put the data into a list or Vector:

>	$Sample1 := [1, 2, 3, 1, 2, 3, 1, 2, 2, 2, 6, 2, 3, 4, 5, 2, 4]:$

Compute the mode and the 30th percentile of this data sample:

>	$Mode (Sample1)$

$\{2\}$

>	$Percentile (Sample1, 30)$

$2$
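
Sorting the 17 observations makes both results easy to verify by hand:

$1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6$

The value 2 occurs seven times, more often than any other value, so it is the mode. It also occupies sorted positions 4 through 10. The 30th percentile falls near position $0.3 \cdot 17 = 5.1$. The exact position depends on the interpolation rule, which this page does not specify, but the common rules all land between positions 4 and 10, so the result is 2.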

We can randomly generate a data sample from a known distribution with a specified sample size.

>	$Sample2 := Sample (ExponentialRandomVariable (5), 1000)$

Compare the generated data sample with the original distribution.

>	$Sample (ExponentialRandomVariable (5), 1000, output = plot)$

>	$InterquartileRange (Sample2)$

$5.21969438283049$

>	$Median (Sample2)$

$3.11068660376637$
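
For reference, these sample statistics can be compared with the exact values for the distribution. Assuming the parameter 5 is the scale (and therefore the mean) of the exponential distribution, which the magnitudes of the sample values suggest, the quantile function is $Q(p) = -5 \ln(1 - p)$. The symbol $Q$ is notation introduced here.

$Q(\frac{1}{2}) = 5 \ln 2 \approx 3.466$ (median)

$Q(\frac{3}{4}) - Q(\frac{1}{4}) = 5 \ln 4 - 5 \ln \frac{4}{3} = 5 \ln 3 \approx 5.493$ (interquartile range)

The sample values differ from these because Sample2 is random; another call to Sample produces different numbers.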

Then, test the sample to see if it follows the exponential distribution with parameter 5.

>	$ChiSquareSuitableModelTest (Sample2, ExponentialRandomVariable (5))$

Chi-Square Test for Suitable Probability Model

----------------------------------------------
Null Hypothesis:
Sample was drawn from specified probability distribution
Alt. Hypothesis:
Sample was not drawn from specified probability distribution
Bins:                    32
Degrees of freedom:      31
Distribution:            ChiSquare(31)
Computed statistic:      30.784
Computed pvalue:         0.477134
Critical value:          44.9853428040743
Result: [Accepted]
There is no statistical evidence against the null hypothesis

$[hypothesis = true, criticalvalue = 44.9853428040742, distribution = ChiSquare (31), pvalue = 0.477134451984691, statistic = 30.78400000]$
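
The quantities in this report are related as follows, assuming the usual Pearson chi-square construction. The statistic formula and the 5% significance level are assumptions made here; the report states only the resulting numbers. With $O_i$ the observed count and $E_i$ the expected count in bin $i$:

$\chi^{2} = \sum_{i = 1}^{32} \frac{(O_i - E_i)^{2}}{E_i} = 30.784$

$degrees\ of\ freedom = bins - 1 = 32 - 1 = 31$

The null hypothesis is accepted because the statistic is below the critical value, $30.784 < 44.985$, or equivalently because the p-value is above the significance level, $0.477 > 0.05$.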

To read more on different hypothesis tests, you can use the command TestsGuide.

Example

Create a Matrix data sample:

>	$Matrix1 := [\begin{array}{ccc} 1 & 2 & 3 \\ 2 & π & 5 \\ 9 & 7 & 3 \\ 5 & 5 & 2 \\ 2 & 8 & 10 \end{array}]:$

Computing the mean of this Matrix data sample means computing the mean of each of the three list or Vector data samples stored in its columns.
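
Worked by hand, the three column means of Matrix1 are:

$\frac{1 + 2 + 9 + 5 + 2}{5} = \frac{19}{5}$

$\frac{2 + π + 7 + 5 + 8}{5} = \frac{22 + π}{5}$

$\frac{3 + 5 + 3 + 2 + 10}{5} = \frac{23}{5}$

A call such as Mean(Matrix1) should return these three values, one per column. The exact form of the returned object is not shown in this section.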