Regression Commands
The Statistics package provides commands for fitting linear and nonlinear models to data points and for performing regression analysis. The fitting algorithms are based on least-squares methods, which minimize the sum of the squared residuals.
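In symbols, the least-squares criterion can be sketched as follows. The notation is an assumption made for this illustration and is not taken from the Statistics package documentation: the data points are (x_i, y_i) for i = 1, ..., n, f is the model function, and p is the vector of model parameters.

```latex
\[
  \min_{p} \; S(p) = \sum_{i=1}^{n} r_i(p)^2,
  \qquad r_i(p) = y_i - f(x_i; p)
\]
```

Here r_i(p) is the residual at the i-th data point, and S(p) is the residual sum of squares that the regression commands report.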
Available Commands
Linear Fitting
Nonlinear Fitting
Other Commands
Using the Regression Commands
Examples
References
ExponentialFit            fit an exponential function to data
Fit                       fit a model function to data
LeastTrimmedSquares       robust linear regression
LinearFit                 fit a linear model function to data
LogarithmicFit            fit a logarithmic function to data
Lowess                    produce lowess smoothed functions
NonlinearFit              fit a nonlinear model function to data
OneWayANOVA               generate a one-way ANOVA table
PolynomialFit             fit a polynomial to data
PowerFit                  fit a power function to data
PredictiveLeastSquares    fit a predictive linear model function to data
RepeatedMedianEstimator
A number of commands are available for fitting a model function that is linear in the model parameters to given data. For example, the model function b*t^2 + a*t is linear in the parameters a and b, though it is nonlinear in the independent variable t.
The LinearFit command is available for multiple general linear regression. For certain classes of model functions involving only one independent variable, the PolynomialFit, LogarithmicFit, PowerFit, and ExponentialFit commands are available. The PowerFit and ExponentialFit commands use a transformed model function that is linear in the parameters.
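As a minimal sketch of the linear case, the model b*t^2 + a*t can be passed to LinearFit as a list of its component functions. The data vectors T and W below are invented for illustration, and their names are assumptions of this sketch rather than part of the Statistics package.

```maple
with(Statistics):
# Hypothetical data, invented for this sketch
T := Vector([1.0, 2.0, 3.0, 4.0, 5.0]):
W := Vector([1.4, 4.1, 8.3, 13.8, 21.2]):
# Component functions t and t^2; the fitted coefficients play the roles of a and b
LinearFit([t, t^2], T, W, t);
```

Because the model is linear in a and b, no initial values are needed.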
The NonlinearFit command is available for nonlinear fitting. An example model function is a*x + exp(b*y), where a and b are the parameters and x and y are the independent variables.
This command relies on local nonlinear optimization solvers available in the Optimization package. The LSSolve and NLPSolve commands in that package can also be used directly for least-squares and general nonlinear minimization.
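The following sketch shows one way LSSolve might be called directly, by building the list of residuals by hand for the single-variable model a*x + exp(b*x). The data lists U and V, the name res, and the initial point are all assumptions made for this illustration.

```maple
with(Optimization):
# Hypothetical data, invented for this sketch
U := [1.0, 2.0, 3.0, 4.0]:
V := [3.6, 6.8, 10.4, 15.5]:
# One residual per data point for the model a*x + exp(b*x)
res := [seq(a*U[i] + exp(b*U[i]) - V[i], i = 1 .. 4)]:
# The solver is local, so an initial point is supplied; these values are a guess
LSSolve(res, initialpoint = {a = 1, b = 0.5});
```

The objective value that LSSolve reports may be scaled differently from the residual sum of squares reported by the regression commands, so compare the parameter values rather than the objective values.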
The general Fit command allows you to provide either a linear or nonlinear model function. It then determines the appropriate regression solver to use.
The OneWayANOVA command generates the standard ANOVA table for one-way classification, given two or more groups of observations.
Various options can be provided to the regression commands. For example, the weights option allows you to specify weights for the data points and the output option allows you to control the format of the results. The options available for each command are described briefly in the command's help page and in greater detail in the Statistics/Regression/Options help page.
The format of the solutions returned by the regression commands is described in the Statistics/Regression/Solution help page.
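As a brief sketch of how the weights and output options combine, the call below fits a straight line and requests several results at once. The data vectors T and W and the weight vector wts are invented for illustration.

```maple
with(Statistics):
# Hypothetical data, invented for this sketch
T := Vector([1.0, 2.0, 3.0, 4.0, 5.0]):
W := Vector([2.1, 3.9, 6.2, 7.8, 10.4]):
# Give the last observation four times the weight of the others
wts := Vector([1, 1, 1, 1, 4]):
# Request the parameter vector, the residual sum of squares, and the standard errors
LinearFit([1, t], T, W, t, weights = wts,
          output = [parametervector, residualsumofsquares, standarderrors]);
```

When output is given as a list, the results are returned in the order requested.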
Most of the regression commands use methods implemented in a built-in library provided by the Numerical Algorithms Group (NAG). The underlying computation is done in floating-point. Either hardware or software (arbitrary precision) floating-point computation can be specified.
The model function and data sets may be provided in different ways. Full details are available in the Statistics/Regression/InputForms help page. The regression routines work primarily with Vectors and Matrices. In most cases, lists (both flat and nested) and Arrays are also accepted and automatically converted to Vectors or Matrices. Consequently, all output, including error messages, uses these data types.
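The sketch below illustrates two of the input forms: plain lists, and a single Matrix whose last column holds the dependent values. The data and the name XY are assumptions made for this illustration.

```maple
with(Statistics):
# Lists are accepted and converted to Vectors automatically
LinearFit([1, t], [1, 2, 3, 4], [2.0, 4.1, 5.9, 8.2], t);
# The same data as one Matrix: first column independent, last column dependent
XY := Matrix([[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.2]]):
LinearFit([1, t], XY, t);
```

Both calls describe the same fitting problem, so they should produce the same fitted line.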
with(Statistics):
Define Vectors X and Y, containing values of an independent variable x and a dependent variable y.
X := Vector([1.2, 2.1, 3.1, 4.0, 5.7, 6.6, 7.2, 7.9, 9.1, 10.3]):
Y := Vector([4.6, 7.7, 11.5, 15.4, 22.2, 33.1, 48.1, 70.6, 109.0, 168.4]):
Find the values of a and b that minimize the least-squares error when the model function a*x + b*exp(x) is used.
Fit(a*x + b*exp(x), X, Y, x);
6.02861839712210*x + 0.00380375570529786*exp(x)
It is also possible to return a summary of the regression model using the summarize option:
Fit(a*x + b*exp(x), X, Y, x, summarize = embed);
Model: 6.0286184*x + 0.0038037557*exp(x)

Coefficients
     Estimate      Standard Error   t-value   P(>|t|)
a    6.02862       0.761415         7.91765   0.0000470413
b    0.00380376    0.000494423      7.69332   0.0000577943

R-squared: 0.978977
Adjusted R-squared: 0.973721

Residuals
Residual Sum of Squares   Residual Mean Square   Residual Standard Error   Degrees of Freedom
1042.23                   130.279                11.4140                   8

Five Point Summary
Minimum     First Quartile   Median      Third Quartile   Maximum
-13.2999    -8.96906         -5.89077    0.691999         20.0758
Fit a polynomial of degree 3 through this data.
PolynomialFit(3, X, Y, x);
-3.37372868459017 + 9.90059487215674*x - 2.79612412098216*x^2 + 0.336249676048196*x^3
Use the output option to see the residual sum of squares and the standard errors.
PolynomialFit(3, X, Y, x, output = [residualsumofsquares, standarderrors]);
47.8471318673565,
Fit the model function a*x + exp(b*x), which is nonlinear in the parameters.
NonlinearFit(a*x + exp(b*x), X, Y, x);
2.12883148575966*x + exp(0.486510105685615*x)
Consider now an experiment in which the quantities x, y, and z influence a quantity w according to the approximate relationship
w = x^a + b*x^2/y + c*y*z
with unknown parameters a, b, and c. Six data points are given by the following matrix, with respective columns for x, y, z, and w.
ExperimentalData := <<1, 1, 1, 2, 2, 2> | <1, 2, 3, 1, 2, 3> | <1, 2, 3, 4, 5, 6> | <0.531, 0.341, 0.163, 0.641, 0.713, -0.040>>;
ExperimentalData :=
    [1   1   1    0.531]
    [1   2   2    0.341]
    [1   3   3    0.163]
    [2   1   4    0.641]
    [2   2   5    0.713]
    [2   3   6   -0.040]
We take an initial guess that the first term will be approximately quadratic in x, that b will be approximately 1, and for c we don't even know whether it's going to be positive or negative, so we guess c=0. We compute both the model function and the residuals. Also, we select more verbose operation by setting infolevel.
infolevel[Statistics] := 2:
NonlinearFit(x^a + b*x^2/y + c*y*z, ExperimentalData, [x, y, z], initialvalues = [a = 2, b = 1, c = 0], output = [leastsquaresfunction, residuals]);
In NonlinearFit (algebraic form)
x^1.14701973996968 - 0.298041864889394*x^2/y - 0.0982511893429762*y*z,
We note that Maple selected the nonlinear fitting method. Furthermore, the exponent on x is only about 1.15, and the other guesses were not very good either. However, this problem is conditioned well enough that Maple finds a good fit anyway.
Now suppose that the relationship that is used to model the data is altered as follows:
w = a*x + b*x^2/y + c*y*z
We adapt the calling sequence very slightly:
Fit(a*x + b*x^2/y + c*y*z, ExperimentalData, [x, y, z], initialvalues = [a = 2, b = 1, c = 0], output = [leastsquaresfunction, residuals]);
In Fit
In LinearFit (container form)
final value of residual sum of squares: .0537598869493245
Summary:
----------------
Model: .82307292*x-.16791011*x^2/y-.75802268e-1*y*z
----------------
Coefficients:
     Estimate   Std. Error   t-value   P(>|t|)
a     0.8231     0.1898       4.3374    0.0226
b    -0.1679     0.0940      -1.7862    0.1720
c    -0.0758     0.0182      -4.1541    0.0254
----------------
R-squared: 0.9600, Adjusted R-squared: 0.9201
0.823072918385878*x - 0.167910114211606*x^2/y - 0.0758022678386438*y*z,
infolevel[Statistics] := 0:
This time, Maple could select the linear fitting method, because the expression is linear in the parameters. In addition, as the infolevel is greater than 0 and the expression is linear in the parameters, a summary for the regression is displayed. The initial values for the parameters are not used.
Finally, consider a situation where an ordinary differential equation leads to results that need to be fitted. The system is given by
x(0) = -a,  d/dt x(t) = z*x(t)^(-b) + 1
where a and b are parameters that we want to find, z is a variable that we can vary between experiments, and x(t) is a quantity that we can measure at t = 1. We perform 10 experiments at z = 0.1, 0.2, ..., 1.0, and the results are as follows.
Input := [seq(0.1 .. 1, 0.1)];
Input := [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Output := [1.932, 2.092, 2.090, 2.416, 2.544, 2.638, 2.894, 3.188, 3.533, 3.822]:
We now need to set up a procedure that NonlinearFit can call to obtain the value for a given input value z and a given pair of parameters a and b. We do this using dsolve/numeric.
ODE := [x(0) = -a, diff(x(t), t) = z*x(t)^(-b) + 1];
ODE := [x(0) = -a, d/dt x(t) = z*x(t)^(-b) + 1]
ODE_Solution := dsolve(ODE, numeric, parameters = [a, b, z]);
ODE_Solution := proc(x_rkf45) ... end proc
We now have a procedure ODE_Solution that can compute the correct value, but we need to write a wrapper that has the form that NonlinearFit expects. We first need to call ODE_Solution once to set the parameters, then another time to obtain the value of x(t) at t = 1, and then return this value (for more information about how this works, see dsolve/numeric). By hand, we can do this as follows:
ODE_Solution(parameters = [a = -1, b = -0.5, z = 1]);
[a = -1., b = -0.5, z = 1.]
ODE_Solution(1);
[t = 1., x(t) = 3.44630585135012]
ODE_Solution(parameters = [a = 1, b = 1, z = 1]);
[a = 1., b = 1., z = 1.]
Error, (in ODE_Solution) cannot evaluate the solution past the initial point, problem may be complex, initially singular or improperly set up
Note that for some settings of the parameters, we cannot obtain a solution. We need to take care of this in the procedure we create (which we call f), by returning a value that is very far from all output points, leading to a very bad fit for these erroneous parameter values.
f := proc(zValue, aValue, bValue)
    global ODE_Solution, a, b, z, x, t;
    # Set the parameters of the numeric solution.
    ODE_Solution('parameters' = [a = aValue, b = bValue, z = zValue]);
    try
        # Return x(t) evaluated at t = 1.
        return eval(x(t), ODE_Solution(1));
    catch:
        # No solution for these parameter values: return a value far from all data.
        return 100;
    end try;
end proc;
f(1, -1, -0.5);
3.44630585135012
We need to provide an initial estimate for the parameter values, because the fitting procedure is only performed in a local sense. We go with the values that provided a solution above: a = -1, b = -0.5.
NonlinearFit(f, Input, Output, output = parametervector, initialvalues = <-1, -0.5>);
Draper, Norman R., and Smith, Harry. Applied Regression Analysis. 3rd ed. New York: Wiley, 1998.
Applications
Parameter Estimation for an N-Channel Enhancement MOSFET
See Also
CurveFitting
Statistics
Statistics/Computation
Statistics/MaximumLikelihoodEstimate
Statistics/Regression/Options
Statistics/Regression/Solution
TimeSeriesAnalysis