Statistics
Excise
remove data items based on density
Calling Sequence
Parameters
Options
Description
Examples
Calling Sequence
Excise(p, X, Y, Z, options)
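Parameters
p
fraction of data points to be removed from the sample; a number between -1 and 1 whose sign determines whether the sparsest or the densest points are removed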
X
first data sample
Y
(optional) second data sample
Z
(optional) third data sample
options
(optional) equation(s) of the form option=value, where option is to_plot or return_same; specify options for the Excise function
Options
The options argument can contain one or more of the options shown below.
to_plot = truefalse
Removing points will likely change the range of the data samples. A plot of the remaining data would then be drawn on a different range and scale than a plot of the original data, which makes comparing the two impractical. When this option is set to true, Excise returns one additional item after the data samples: an equation of the form view = [range(s)] that sets the viewing range as though all the data were present. This equation is intended to be used as a plot option by the corresponding plotting command. The default value for this option is false.
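The view equation only needs the extent of each sample before any points are removed. The minimal Python sketch below illustrates that bookkeeping. `view_ranges` is a hypothetical helper, not part of Maple. It returns (low, high) pairs that stand in for Maple ranges such as 1..8.

```python
def view_ranges(*samples):
    """Return one (min, max) pair per sample, computed on the full data.

    Hypothetical helper mirroring the idea behind to_plot: the ranges are
    taken before excision, so a plot of the reduced data can keep the
    original axes.
    """
    ranges = []
    for sample in samples:
        values = list(sample)
        if not values:
            raise ValueError("cannot compute a range for an empty sample")
        ranges.append((min(values), max(values)))
    return ranges


# The full data determines the view, even if the extreme points are later removed.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
print(view_ranges(x, y))  # [(1, 8), (-1.0, 3.0)]
```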
return_same = truefalse
This option specifies whether the remaining data samples are returned as the same data type as the corresponding input samples. When it is false, they are returned as one-dimensional Arrays. The default value for this option is false.
Description
The Excise command calculates how densely clustered each data point is with respect to all the other data points. A number of points, determined by the magnitude of p, is removed from the data. The remaining points are returned as an expression sequence of one-dimensional Arrays. Excise returns as many data samples as are passed to it.
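This page does not state which density measure Excise uses. Purely as an assumption for illustration, the Python sketch below scores each point by its total Euclidean distance to every other point, so a smaller score means a more densely clustered point. For the 1-D data in the Examples section this ranking picks the same points that Excise returns. It should not be taken as Maple's actual algorithm.

```python
import math


def density_scores(points):
    """Total distance from each point to all other points (hypothetical measure).

    `points` is a sequence of coordinate tuples; smaller scores mean the
    point is more densely surrounded by the rest of the data.
    """
    return [sum(math.dist(p, q) for q in points) for p in points]


pts = [(v,) for v in range(1, 9)]  # the values 1..8 as 1-D points
print(density_scores(pts))
# [28.0, 22.0, 18.0, 16.0, 16.0, 18.0, 22.0, 28.0]
```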
If p is a positive number between zero and one, and n is the number of points passed to Excise, then the p*n least densely clustered points are excised and the (1-p)*n most densely clustered points are returned.
If p is a negative number between negative one and zero, and n is the number of points passed to Excise, then the (-p)*n most densely clustered points are excised and the (1+p)*n least densely clustered points are returned.
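The two rules above can be written as a small selection step. The Python sketch below assumes the points have already been ranked from densest to sparsest, for example by some density score. It rounds p*n to the nearest integer when the product is not whole; how Excise itself handles a fractional count is not stated on this page. `split_by_fraction` is a hypothetical name.

```python
def split_by_fraction(ranked, p):
    """Keep part of `ranked`, which is ordered from densest to sparsest.

    p > 0: drop the p*n sparsest points, keep the (1-p)*n densest.
    p < 0: drop the (-p)*n densest points, keep the (1+p)*n sparsest.
    """
    if not -1 <= p <= 1:
        raise ValueError("p must lie between -1 and 1")
    n = len(ranked)
    k = round(abs(p) * n)  # number of points to excise (rounding is an assumption)
    if p >= 0:
        return ranked[: n - k]  # densest points survive
    return ranked[k:]  # sparsest points survive


ranked = [4, 5, 3, 6, 2, 7, 1, 8]  # 1..8 ordered densest first
print(split_by_fraction(ranked, 0.5))   # [4, 5, 3, 6]
print(split_by_fraction(ranked, -0.5))  # [2, 7, 1, 8]
```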
The parameters X, Y, and Z are the data samples to be excised. Each can be given as a Vector, Matrix, Array, or list, and they do not all have to be of the same type. They also do not need to be one-dimensional, but they are treated as though they were. The first data sample, X, is required; the second and third data samples, Y and Z, are optional. All data samples must have the same number of elements.
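One way to picture how up to three samples are combined is to flatten any multidimensional input and then treat the k-th elements of X, Y, and Z as the coordinates of a single point. The Python sketch below follows that reading. The row-by-row flattening order and the helper names are assumptions for illustration, not taken from Maple.

```python
def flatten(sample):
    """Flatten nested lists/tuples into one flat list, row by row (assumed order)."""
    flat = []
    for item in sample:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def to_points(*samples):
    """Pair up the k-th elements of each sample as one point (x_k, y_k, ...)."""
    if not 1 <= len(samples) <= 3:
        raise ValueError("expected one to three data samples")
    flat = [flatten(s) for s in samples]
    if len({len(f) for f in flat}) != 1:
        raise ValueError("all data samples must have the same number of elements")
    return list(zip(*flat))


X = [[1, 2], [3, 4]]  # a 2x2 "matrix" is treated as four values
Y = [10, 20, 30, 40]
print(to_points(X, Y))  # [(1, 10), (2, 20), (3, 30), (4, 40)]
```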
Examples
with(Statistics):
A simple 1-D case. Excise removes the sparsest half of the data and returns the densest half as a 1-D Array. Here, that is the center four points.
data1 := Array([1, 2, 3, 4, 5, 6, 7, 8]):
ret1 := Excise(0.5, data1)
ret1 := [4. 5. 3. 6.]
type(ret1, Array)
true
If a negative fraction is used as the first argument, the returned data are the sparsest points, in this case the outer four points.
Excise(-0.5, data1)
[8. 1. 7. 2.]
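The selections in these two examples can be reproduced outside Maple under an assumed density measure: each value is scored by its total absolute distance to the other values. This page does not document the measure Excise actually uses. In the self-contained Python sketch below, `excise_1d` is a hypothetical helper. Ties are broken by input position, so the order of the returned values need not match Maple's.

```python
def excise_1d(values, p):
    """Hypothetical 1-D analogue of Excise using total absolute distance as density."""
    n = len(values)
    # Smaller total distance = more densely clustered (assumed measure).
    score = [sum(abs(v - w) for w in values) for v in values]
    # Rank indices from densest to sparsest; sorted() is stable, so ties keep input order.
    ranked = sorted(range(n), key=lambda i: score[i])
    k = round(abs(p) * n)  # number of points to remove
    kept = ranked[: n - k] if p >= 0 else ranked[k:]
    return [float(values[i]) for i in kept]


data1 = [1, 2, 3, 4, 5, 6, 7, 8]
print(excise_1d(data1, 0.5))   # [4.0, 5.0, 3.0, 6.0]
print(excise_1d(data1, -0.5))  # [2.0, 7.0, 1.0, 8.0]
```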
If the return_same option is given, Excise returns the remaining data as the same type as the input, here a list.
data2 := [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
ret2 := Excise(2/3, data2, return_same)
ret2 := [14., 12., 10., 16.]
type(ret2, list)
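With 12 values and p = 2/3, p*n = 8 points are removed and 4 remain. The Python sketch below mimics the return_same switch by rebuilding the result with the input's own container type. Without the switch it returns a tuple, used here only as a stand-in for Maple's default Array output. The density measure (total absolute distance to the other values) and the helper name `excise_same` are assumptions.

```python
def excise_same(values, p, return_same=False):
    """Hypothetical 1-D Excise analogue with a return_same-style switch.

    Density is assumed to be the total absolute distance to all other values
    (smaller = denser). Without return_same the result is a tuple, standing in
    for Maple's default Array output; with it, the input's type is reused.
    """
    n = len(values)
    score = [sum(abs(v - w) for w in values) for v in values]
    ranked = sorted(range(n), key=lambda i: score[i])  # densest first
    k = round(abs(p) * n)  # p*n points are removed
    kept = ranked[: n - k] if p >= 0 else ranked[k:]
    result = [float(values[i]) for i in kept]
    return type(values)(result) if return_same else tuple(result)


data2 = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
ret2 = excise_same(data2, 2 / 3, return_same=True)
print(ret2, type(ret2) is list)  # [12.0, 14.0, 10.0, 16.0] True
```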
Excise can be used to trim points from data before passing the remaining points to a plotting command. With the to_plot option, Excise also returns an equation of the form view = [range(s)], which the plotting command uses as an option. The plot then keeps the range of the original data and can be compared directly with a plot of the full data.
A := Sample(RandomVariable(Normal(0, 1)), 500):
B := Sample(RandomVariable(Normal(0, 1)), 500):
Plot the original data.
ScatterPlot(A, B)
Plot the densest half of the data.
ScatterPlot(Excise(0.5, A, B, to_plot))
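For the two-sample case, the Python sketch below pairs A and B into points and keeps the densest half under an assumed measure (total Euclidean distance to all other points). It also records the full-data view ranges that to_plot would preserve. It uses the standard library's random.gauss in place of Maple's Sample and produces data only, not a plot. The helper name `excise_points_with_view` is hypothetical, and the whole block is illustrative rather than a description of Maple's implementation.

```python
import math
import random


def excise_points_with_view(xs, ys, p):
    """Hypothetical 2-D Excise analogue returning (kept_x, kept_y, view).

    Density: total Euclidean distance to all other points (an assumption).
    view: (min, max) of each full sample, computed before any removal.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same number of elements")
    pts = list(zip(xs, ys))
    n = len(pts)
    score = [sum(math.dist(a, b) for b in pts) for a in pts]
    ranked = sorted(range(n), key=lambda i: score[i])  # densest first
    k = round(abs(p) * n)
    kept = ranked[: n - k] if p >= 0 else ranked[k:]
    view = [(min(xs), max(xs)), (min(ys), max(ys))]
    return [xs[i] for i in kept], [ys[i] for i in kept], view


rng = random.Random(1)  # fixed seed so the run is repeatable
A = [rng.gauss(0, 1) for _ in range(500)]
B = [rng.gauss(0, 1) for _ in range(500)]
kx, ky, view = excise_points_with_view(A, B, 0.5)
print(len(kx), view)  # 250 points remain; the view still spans all 500
```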
See Also
Statistics/DataManipulation
Statistics[Remove]
Statistics[RemoveInRange]
Statistics[RemoveNonNumeric]