Data Smoothing
The Statistics package provides several functions for data smoothing, the process of extracting identifiable patterns from data while suppressing noise. This functionality includes algorithms that produce smoothed data (MovingAverage, MovingStatistic, ExponentialSmoothing) and algorithms that produce a curve estimating the distribution of the population (kernel density estimation).
1 Data Filters
The Statistics package includes several data filters for smoothing otherwise rough data, including moving average, moving median, moving statistic, a general linear filter, exponential smoothing, and weighted moving average.
1.1 Stock Prices
This example demonstrates the use of data filters in analyzing stock prices.
restart: with(Statistics):
Consider the following procedure, which generates a sample stock path over N time periods. The stock has initial price S0, trend parameter r, and fluctuation parameter sigma.
StockPath := proc(N::posint, S0::realcons, r::realcons, sigma::realcons)
  local h, i, C, R, S;
  h := 1./(N - 1);
  C := evalf(exp(r*h - sigma^2*h/2));
  R := C*exp(sigma*sqrt(h)*RandomVariable(Normal(0, 1)));
  S := Sample(R, N + 1);
  S[1] := S0;
  return CumulativeProduct(S);
end proc:
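For readers who want to see the sampling scheme outside Maple, the following is a minimal Python sketch of a discretized geometric random walk of the kind StockPath appears to implement. It assumes each step multiplies the price by exp(r*h - sigma^2*h/2 + sigma*sqrt(h)*Z), with Z standard normal and h = 1/(N-1). The sqrt(h) scaling, the name stock_path, and the seed argument are assumptions made for illustration.

```python
import math
import random
from itertools import accumulate
from operator import mul


def stock_path(n, s0, r, sigma, seed=None):
    """Return n + 1 prices: s0 followed by n random multiplicative steps."""
    rng = random.Random(seed)
    h = 1.0 / (n - 1)                    # step size, mirroring h := 1/(N-1); needs n > 1
    drift = r * h - sigma ** 2 * h / 2   # log-drift per step
    # The first factor is the initial price; the rest are random growth factors.
    # The sqrt(h) scaling of the noise is an assumption (standard for this model).
    factors = [s0] + [
        math.exp(drift + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0))
        for _ in range(n)
    ]
    return list(accumulate(factors, mul))  # cumulative product


path = stock_path(1000, 100.0, 0.15, 0.2, seed=1)
print(len(path), path[0])
```

Because every factor after the first is an exponential, the simulated prices stay positive regardless of the random draws.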
Generate a sample stock path over 1000 time periods and plot it.
S := StockPath(1000, 100., 0.15, 0.2): LineChart(S, symbolsize = 4, thickness = 2)
The data smoothing functions provided in the Statistics library now give us a means to analyze the overall trend of the data while disregarding small fluctuations. Consider the moving average function, which calculates the average value of a window around each data point.
T := MovingAverage(S, 20): LineChart(T, symbolsize = 4, thickness = 2)
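To make the windowing idea concrete, here is a small Python sketch of a moving average. It is not Maple's implementation: the function name, the trailing window, and the choice to average over fewer points at the start of the series are all assumptions made for illustration, and MovingAverage may position its window or treat the endpoints differently.

```python
def moving_average(data, window):
    """Average each point with up to window - 1 preceding points."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(data):
        running += value
        if i >= window:
            running -= data[i - window]  # drop the value that left the window
        count = min(i + 1, window)       # shorter window near the start
        smoothed.append(running / count)
    return smoothed


noisy = [1.0, 3.0, 2.0, 6.0, 4.0, 8.0]
result = moving_average(noisy, 3)
print(result)
```

A larger window averages over more points and therefore gives a smoother but less responsive curve.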
Exponential smoothing can also be applied. This method weights recent observations more heavily than older ones, which smooths out the rough edges generally caused by cyclic or irregular patterns in the data.
T := ExponentialSmoothing(S, 0.9): LineChart(T, symbolsize = 4, thickness = 2)
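Simple exponential smoothing is commonly defined by the recurrence s[0] = x[0], s[t] = alpha*x[t] + (1 - alpha)*s[t-1]. The Python sketch below implements that textbook form. Whether ExponentialSmoothing uses exactly this initialization, and whether its second argument (0.9 above) plays the role of alpha, is an assumption here.

```python
def exponential_smoothing(data, alpha):
    """Textbook simple exponential smoothing with weight alpha on new data."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not data:
        return []
    smoothed = [data[0]]  # assumed initialization: start at the first observation
    for value in data[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


series = [10.0, 12.0, 11.0, 15.0]
print(exponential_smoothing(series, 0.5))  # [10.0, 11.0, 11.0, 13.0]
```

Under this convention, alpha close to 1 follows the data closely, while alpha close to 0 produces heavier smoothing.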
1.2 Department Store Sales
This example demonstrates the use of data filters in analyzing sales at a department store.
restart; with(Statistics):
Consider the following procedure, which randomly generates the times of N sales at a department store. The rate of sales is represented by the parameter r and the deviation in this rate by the parameter theta.
SaleTimes := proc(N::realcons, r::realcons, theta::realcons)
  local R, S, T, i;
  R := r*RandomVariable(Exponential(theta));
  S := Sample(R, N);
  return CumulativeSum(S);
end proc:
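The same construction, cumulative sums of independent exponential waiting times, can be sketched in Python. The sketch assumes theta is the mean (scale) of the exponential distribution, so the rate passed to random.expovariate is 1/theta. The name sale_times and the seed argument are hypothetical.

```python
import random
from itertools import accumulate


def sale_times(n, r, theta, seed=None):
    """Return n increasing sale times built from exponential gaps."""
    rng = random.Random(seed)
    # expovariate takes a rate, so a scale (mean) of theta means rate 1/theta.
    gaps = [r * rng.expovariate(1.0 / theta) for _ in range(n)]
    return list(accumulate(gaps))  # running total of the waiting times


times = sale_times(100, 0.5, 0.2, seed=1)
print(len(times), times[-1])
```

Since every gap is nonnegative, the resulting times are nondecreasing, which is why the plotted series climbs steadily with small irregularities.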
Consider the first 100 sales with rate parameter 0.5 and deviation parameter 0.2.
S := SaleTimes(100, 0.5, 0.2): LineChart(S, thickness = 2, symbolsize = 4)
The overall trend is readily apparent with the application of the moving average filter.
T := MovingAverage(S, 20): LineChart(T, symbolsize = 4, thickness = 2)
2 Kernel Density Estimation
The Statistics package provides algorithms for computing, plotting, and sampling from kernel density estimates. A kernel density estimate is a continuous probability distribution used to approximate the distribution of the population from which a sample was drawn. It is constructed as a normalized sum of kernel functions, one centered at each data point.
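Written out, the standard form of such an estimate for data points x_1, ..., x_n is shown below, where K is the kernel and h > 0 is the bandwidth. The symbols are the conventional ones and are not taken from the Maple documentation.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad \int_{-\infty}^{\infty} K(u)\,du = 1 .
```

The factor 1/(n h) together with the normalization of K ensures that the estimate itself integrates to 1.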
The following is an example of Maple's kernel density estimation routines in action.
Consider the following data sample, hypothesized to be bimodal because there appear to be two distinct clusters of data: those in the range -1.2 to -0.8 and those in the range 0.7 to 0.9.
A := Array([-1.18, -1.12, -1.06, -1.02, -0.84, 0.72, 0.78, 0.89]): Z := Array([0.]):
By applying kernel density estimation, we can construct a function that estimates the density underlying the data. Since our data sample is relatively small, we can perform exact kernel density estimation. The exact method returns a probability density function, which can then be evaluated at specific points.
F := KernelDensity(A, bandwidth = 0.4, kernel = gaussian, method = exact): evalf([F(-1.0), F(0.0), F(0.5), F(2.0)])
0.5947413597,0.08016057122,0.2829169446,0.004587682613
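As a cross-check, a Gaussian kernel density estimate can be evaluated by hand using the usual formula, the average of Gaussian bumps of width h centered at the data points. The Python sketch below (function names are illustrative) applies it to the same data with h = 0.4. It is an independent computation, not a description of how KernelDensity works internally.

```python
import math

DATA = [-1.18, -1.12, -1.06, -1.02, -0.84, 0.72, 0.78, 0.89]


def gaussian_kde(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, then normalize.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


f = gaussian_kde(DATA, 0.4)
print([round(f(x), 6) for x in (-1.0, 0.0, 0.5, 2.0)])
```

For this data, the printed values should agree with the Maple output above to several decimal places.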
We can convert the kernel density estimate to a distribution using one of the standard RandomVariable constructors.
R := RandomVariable(Distribution(PDF = (x -> F(x)))):
evalf([PDF(R, -1.0), PDF(R, 0.0), PDF(R, 0.5), PDF(R, 2.0)])
evalf([CDF(R, -1.0), CDF(R, 0.0), CDF(R, 0.5), CDF(R, 2.0)])
0.3394631178,0.6303924803,0.7121675015,0.9994260712
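For a Gaussian kernel, the cumulative distribution function of the estimate has a closed form: the average of normal CDFs centered at the data points. The following Python sketch, with illustrative names, evaluates it with math.erf as an independent check on the CDF values above.

```python
import math

DATA = [-1.18, -1.12, -1.06, -1.02, -0.84, 0.72, 0.78, 0.89]


def gaussian_kde_cdf(data, bandwidth):
    """Return a function x -> CDF of the Gaussian kernel density estimate."""

    def normal_cdf(u):
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

    def cdf(x):
        return sum(normal_cdf((x - xi) / bandwidth) for xi in data) / len(data)

    return cdf


G = gaussian_kde_cdf(DATA, 0.4)
print([round(G(x), 6) for x in (-1.0, 0.0, 0.5, 2.0)])
```

The value at 0.0 is a little above 5/8 because five of the eight data points lie in the left cluster and the right cluster contributes a small amount of mass below zero.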
This probability density function can also be plotted, in this case against the cumulative distribution function.
P1 := plot(PDF(R, x), x = -2.5 .. 2.5, thickness = 3):
P2 := plot(CDF(R, x), x = -2.5 .. 2.5, thickness = 3, color = blue):
plots[display](P2, P1)
With the KernelDensitySample function, new data points can be quickly drawn from the kernel density estimate of a data sample.
S := KernelDensitySample(A, 100000, bandwidth = 0.4, kernel = gaussian):
P1 := Histogram(S, averageshifted = 1, binwidth = 0.1, range = -2.5 .. 2.5):
P2 := plot(PDF(R, x), x = -2.5 .. 2.5, thickness = 3, color = red):
plots[display](P1, P2)
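One common way to draw from a Gaussian kernel density estimate is to pick a data point uniformly at random and add normal noise whose standard deviation equals the bandwidth. The Python sketch below follows that recipe. It illustrates the idea only and is not claimed to be the algorithm KernelDensitySample uses; the function name and seed are hypothetical.

```python
import random

DATA = [-1.18, -1.12, -1.06, -1.02, -0.84, 0.72, 0.78, 0.89]


def kde_sample(data, size, bandwidth, seed=None):
    """Draw size values from the Gaussian KDE of data."""
    rng = random.Random(seed)
    # Each draw: choose a kernel center, then perturb it with kernel noise.
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(size)]


draws = kde_sample(DATA, 100000, 0.4, seed=1)
mean = sum(draws) / len(draws)
print(len(draws), round(mean, 2))
```

The mean of the draws should be close to the mean of the original data, since the added noise has mean zero.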
A kernel density estimate can be plotted directly using the KernelDensityPlot function. The following example demonstrates the effect of different choices of bandwidth.
P1 := KernelDensityPlot(A, bandwidth = 0.1, kernel = biweight, method = exact, color = turquoise, thickness = 2, range = -2 .. 2):
P2 := KernelDensityPlot(A, bandwidth = 0.3, kernel = biweight, method = exact, color = blue, thickness = 2, range = -2 .. 2):
P3 := KernelDensityPlot(A, bandwidth = 0.6, kernel = biweight, method = exact, color = navy, thickness = 2, range = -2 .. 2):
plots[display](P1, P2, P3)
In most cases, only a few hundred samples are needed to roughly approximate the original probability distribution with a kernel density estimate.
B := Sample(StudentT(2), 600):
P1 := Histogram(B, range = -5 .. 5):
P2 := DensityPlot(StudentT(2), color = blue, thickness = 3, range = -5 .. 5):
P3 := KernelDensityPlot(B, kernel = gaussian, method = piecewise, color = red, thickness = 3, range = -5 .. 5):
plots[display](P1, P2, P3)
Available Kernels
Kernel density estimation requires the use of a kernel function, a normalized function that is centered at each data point. Five standard kernel functions are available with kernel density estimation.
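The commonly used definitions of these five kernels, each supported on |u| <= 1 except the Gaussian, can be written as short Python functions. These are the textbook forms; Maple's exact scaling conventions for the bounded kernels are not stated here, so treat the constants as an assumption. The final loop checks numerically that each kernel integrates to approximately 1.

```python
import math


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def triangular(u):
    return max(0.0, 1.0 - abs(u))


def rectangular(u):
    return 0.5 if abs(u) <= 1.0 else 0.0


def biweight(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


KERNELS = {
    "gaussian": gaussian,
    "triangular": triangular,
    "rectangular": rectangular,
    "biweight": biweight,
    "epanechnikov": epanechnikov,
}


def integrate(func, lo=-8.0, hi=8.0, steps=16000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))


for name, kernel in KERNELS.items():
    print(name, round(integrate(kernel), 4))
```

In these forms, only the Gaussian kernel is nonzero on the whole real line; the other four vanish outside [-1, 1], which is why estimates built from them have bounded support.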
2.1 Gaussian Kernel
The Gaussian kernel should be used with continuous data that is defined on the whole real line. It possesses the familiar bell shape and is based on the Gaussian probability density function.
KernelDensityPlot(Z, kernel = gaussian, method = exact, thickness = 3); KernelDensityPlot(A, kernel = gaussian, bandwidth = 0.4, method = exact, thickness = 3)
2.2 Triangular Kernel
The triangular kernel is a piecewise function related to the triangular distribution. This kernel generally creates a kernel density estimate that has sharp corners but is otherwise relatively smooth.
KernelDensityPlot(Z, kernel = triangular, method = exact, thickness = 3); KernelDensityPlot(A, kernel = triangular, bandwidth = 0.4, method = exact, thickness = 3)
2.3 Rectangular Kernel
The rectangular kernel is a piecewise function related to the uniform distribution. This kernel creates a kernel density estimate that resembles a staircase function.
KernelDensityPlot(Z, kernel = rectangular, method = exact, thickness = 3); KernelDensityPlot(A, kernel = rectangular, bandwidth = 0.4, method = exact, thickness = 3)
2.4 Biweight Kernel
The biweight kernel is a smooth kernel that, unlike the Gaussian kernel, is defined on a finite interval. It should be used for bounded data that is smooth along the interval on which it is defined.
KernelDensityPlot(Z, kernel = biweight, method = exact, thickness = 3); KernelDensityPlot(A, kernel = biweight, bandwidth = 0.4, method = exact, thickness = 3)
2.5 Epanechnikov Kernel
The Epanechnikov kernel is the standard kernel for kernel density estimation. It generally provides the closest match to a probability density function under most circumstances. The kernel itself is a rounded function similar to the biweight, except it is not differentiable at its boundaries.
KernelDensityPlot(Z, kernel = epanechnikov, method = exact, thickness = 3); KernelDensityPlot(A, kernel = epanechnikov, bandwidth = 0.4, method = exact, thickness = 3)