Statistics Toolbox

Regression and ANOVA

Regression

With regression, you can model a continuous response variable as a function of one or more predictors. Statistics Toolbox offers a wide variety of regression algorithms, including linear regression, generalized linear models, nonlinear regression, and mixed-effects models.

Linear Regression

Linear regression is a statistical modeling technique used to describe a continuous response variable as a function of one or more predictor variables. It can help you understand and predict the behavior of complex systems or analyze experimental, financial, and biological data.

The toolbox offers several types of linear regression models and fitting methods, including:

• Simple: model with only one predictor
• Multiple: model with multiple predictors
• Multivariate: model with multiple response variables
• Robust: model in the presence of outliers
• Stepwise: model with automatic variable selection
• Regularized: model that can deal with redundant predictors and prevent overfitting using ridge, lasso, and elastic net algorithms
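
As a rough illustration of several of the model types listed above, the following sketch fits ordinary, robust, stepwise, and lasso-regularized linear models. The data is synthetic and every variable name is invented for this example; it assumes a toolbox release that includes fitlm, stepwiselm, and lasso.

```matlab
% Synthetic data: all values here are invented for illustration.
rng(0);                                  % reproducible random numbers
n = 100;
X = randn(n, 3);                         % three predictors; the third is irrelevant
y = 2 + 1.5*X(:,1) - 0.8*X(:,2) + 0.3*randn(n, 1);

% Multiple linear regression by ordinary least squares
mdl = fitlm(X, y);

% Robust fit: iteratively reweighted least squares downweights outliers
yOut = y;
yOut(1:5) = yOut(1:5) + 20;              % contaminate five observations
mdlRobust = fitlm(X, yOut, 'RobustOpts', 'on');

% Stepwise selection, starting from a constant-only model and
% allowing at most linear terms
mdlStep = stepwiselm(X, y, 'constant', 'Upper', 'linear', 'Verbose', 0);

% Lasso with 5-fold cross-validation to choose the penalty strength
[B, fitInfo] = lasso(X, y, 'CV', 5);
bLasso = B(:, fitInfo.IndexMinMSE);      % coefficients at the minimum-CV-error lambda

disp(mdl.Coefficients)                   % estimates, standard errors, t-statistics, p-values
```

Comparing mdlRobust.Coefficients with a plain fitlm fit on yOut is one way to see how much the contaminated observations pull an ordinary least-squares fit.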

Computational Statistics: Feature Selection, Regularization, and Shrinkage with MATLAB 36:51
In this webinar, you will learn how to use Statistics Toolbox to generate accurate predictive models from data sets that contain large numbers of correlated variables.

Nonlinear Regression

Nonlinear regression is a statistical modeling technique that helps describe nonlinear relationships in experimental data. Nonlinear regression models are generally assumed to be parametric, with the model expressed as a nonlinear equation. Nonparametric nonlinear regression is typically handled with machine learning methods.

The toolbox also offers robust nonlinear fitting to deal with outliers in the data.
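
A minimal sketch of parametric nonlinear fitting, with and without robust weighting, might look like the following. The exponential-decay model, the starting values, and the data are assumptions made for this example; it uses fitnlm with options created by statset.

```matlab
% Synthetic exponential-decay data: values invented for illustration.
rng(1);
x = linspace(0, 5, 50)';
y = 3*exp(-0.7*x) + 0.05*randn(size(x));

% Parametric model: b(1) is the amplitude, b(2) is the decay rate
modelFun = @(b, x) b(1)*exp(-b(2)*x);
beta0 = [1; 1];                          % starting guess for the coefficients

% Ordinary nonlinear least squares
mdl = fitnlm(x, y, modelFun, beta0);

% Robust fit: bisquare weights reduce the influence of outliers
yOut = y;
yOut([10 30]) = yOut([10 30]) + 2;       % contaminate two observations
opts = statset('nlinfit');
opts.RobustWgtFun = 'bisquare';
mdlRobust = fitnlm(x, yOut, modelFun, beta0, 'Options', opts);

disp(mdl.Coefficients)
```

Nonlinear fits depend on the starting values in beta0, so a poor guess can lead to a local minimum or a failure to converge.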

Fitting with MATLAB: Statistics, Optimization, and Curve Fitting 38:37
In this webinar, you will learn applied curve fitting using MathWorks products. MathWorks engineers will present a series of techniques for solving real world challenges.

Generalized Linear Models

Generalized linear models are a special case of nonlinear models that are fit using linear methods. They allow the response variable to have a nonnormal distribution, and they use a link function to describe how the expected value of the response is related to the linear predictor.

Statistics Toolbox supports fitting generalized linear models with the following response distributions:

• Normal (linear regression)
• Binomial (logistic or probit regression)
• Poisson
• Gamma
• Inverse Gaussian

Fitting Data with Generalized Linear Models (Example)
How to fit and evaluate generalized linear models using glmfit and glmval.
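
As a small sketch of the glmfit and glmval workflow mentioned above, the following fits a binomial model to a binary response with a logit link, then with a probit link, and evaluates the fitted probabilities at new points. The data and variable names are invented for this example.

```matlab
% Synthetic binary-response data: values invented for illustration.
rng(2);
n = 200;
x = randn(n, 1);
p = 1 ./ (1 + exp(-(0.5 + 2*x)));        % true success probability
y = double(rand(n, 1) < p);              % 0/1 response

% Logistic regression: binomial distribution with the logit link
[b, dev, stats] = glmfit(x, y, 'binomial', 'link', 'logit');

% Probit regression: same distribution, different link function
bProbit = glmfit(x, y, 'binomial', 'link', 'probit');

% Predicted probabilities at new predictor values, with the lower and
% upper half-widths of the pointwise 95% confidence bounds
xNew = (-2:1:2)';
[pHat, dLo, dHi] = glmval(b, xNew, 'logit', stats);

disp([xNew, pHat - dLo, pHat, pHat + dHi])
```

glmfit adds the intercept term itself, so b(1) is the constant and b(2) is the coefficient of x.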

Mixed-Effects Models

Linear and nonlinear mixed-effects models are generalizations of linear and nonlinear models for data that is collected and summarized in groups. These models describe the relationship between a response variable and independent variables, with coefficients that can vary with respect to one or more grouping variables.

Statistics Toolbox supports fitting multilevel or hierarchical models with nested and/or crossed random effects, which can be used to perform a variety of studies, including:

• Longitudinal analysis/panel analysis
• Repeated measures modeling
• Growth modeling

Plot comparing Gross State Product for three states fitted using a multilevel mixed-effects model (left) and ordinary least-squares (right). The fitlme function in Statistics Toolbox can create models with increased prediction accuracy when data is collected and summarized in groups.
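
To make the grouped-data idea concrete, here is a hedged sketch of a random-intercept model fitted with fitlme and compared with an ordinary least-squares fit that ignores the grouping. The data is synthetic (it is not the Gross State Product data in the plot), and the table and variable names are assumptions made for this example.

```matlab
% Synthetic grouped data: 8 groups with 10 observations each (invented values).
rng(3);
nGroups = 8;
nPer = 10;
group = kron((1:nGroups)', ones(nPer, 1));   % group index for each observation
t = repmat((1:nPer)', nGroups, 1);           % within-group time index
u = 2*randn(nGroups, 1);                     % group-specific intercept offsets
y = 10 + u(group) + 0.5*t + 0.5*randn(nGroups*nPer, 1);

tbl = table(y, t, categorical(group), 'VariableNames', {'y', 't', 'group'});

% Mixed-effects model: fixed slope for t, random intercept for each group
lme = fitlme(tbl, 'y ~ t + (1|group)');

% Ordinary least squares that ignores the grouping, for comparison
lm = fitlm(tbl, 'y ~ t');

beta = fixedEffects(lme);                    % fixed intercept and slope
b = randomEffects(lme);                      % estimated offset for each group
fprintf('Residual std: mixed-effects %.3f, OLS %.3f\n', ...
    sqrt(lme.MSE), lm.RMSE);
```

In this setup the mixed-effects model attributes the between-group spread to the random intercepts, so its residual standard deviation is typically much smaller than that of the pooled least-squares fit.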

Model Assessment

Statistics Toolbox enables you to perform model assessment for regression algorithms using tests for statistical significance and goodness-of-fit measures such as:

• F-statistic and t-statistic
• Cross-validated mean squared error
• Akaike information criterion (AIC) and Bayesian information criterion (BIC)

You can calculate confidence intervals for both regression coefficients and predicted values.
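
The measures listed above can be read from a fitted model roughly as follows. This is a sketch on synthetic data with invented variable names; it assumes a linear model created with fitlm and uses crossval for the cross-validated error.

```matlab
% Synthetic data: values invented for illustration.
rng(4);
n = 120;
X = randn(n, 2);                          % second predictor is irrelevant
y = 1 + 2*X(:,1) + 0.5*randn(n, 1);

mdl = fitlm(X, y);

% t-statistics for individual coefficients and the overall F-test
tStats = mdl.Coefficients.tStat;
fTable = anova(mdl, 'summary');

% Information criteria
aic = mdl.ModelCriterion.AIC;
bic = mdl.ModelCriterion.BIC;

% 95% confidence intervals for the coefficients and for predictions
ci = coefCI(mdl);
[yHat, yCI] = predict(mdl, [0 0; 1 1]);

% 10-fold cross-validated mean squared error
fitAndPredict = @(Xtr, ytr, Xte) predict(fitlm(Xtr, ytr), Xte);
cvMSE = crossval('mse', X, y, 'Predfun', fitAndPredict, 'KFold', 10);

disp(fTable)
fprintf('AIC %.1f, BIC %.1f, CV MSE %.3f\n', aic, bic, cvMSE);
```

The cross-validated error estimates how the model performs on data it was not fitted to, which makes it a useful complement to in-sample criteria such as AIC and BIC.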

Nonparametric Regression

Statistics Toolbox also supports nonparametric regression techniques for generating an accurate fit without specifying a model that describes the relationship between the predictor and the response. Nonparametric regression techniques can be more broadly classified under supervised machine learning for regression and include decision trees as well as boosted and bagged regression trees.

Nonparametric Fitting 4:07
Develop a predictive model without specifying a function that describes the relationship between variables.
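
As an illustration of fitting without a model equation, the sketch below trains a single regression tree, a bagged ensemble, and a boosted ensemble on noisy samples of a sine curve. The data, the ensemble sizes, and the leaf size are arbitrary choices for this example; it assumes a release that provides fitrtree, TreeBagger, and fitensemble.

```matlab
% Synthetic data: noisy samples of a sine curve (invented values).
rng(5);
x = linspace(0, 2*pi, 200)';
y = sin(x) + 0.2*randn(size(x));

% Single regression tree with a minimum leaf size to limit overfitting
tree = fitrtree(x, y, 'MinLeafSize', 10);

% Bagged regression trees (bootstrap aggregation)
bagged = TreeBagger(50, x, y, 'Method', 'regression');

% Boosted regression trees using least-squares boosting
boosted = fitensemble(x, y, 'LSBoost', 100, 'Tree');

% Predict at new points; no functional form was ever specified
xNew = [pi/2; 3*pi/2];
yTree = predict(tree, xNew);
yBag = predict(bagged, xNew);
yBoost = predict(boosted, xNew);

disp([xNew, yTree, yBag, yBoost])
```

Each predictor should return values near sin(xNew), although the exact numbers depend on the random noise and on the resampling inside the ensembles.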

ANOVA

Analysis of variance (ANOVA) enables you to assign sample variance to different sources and determine whether the variation arises within or among different population groups. Statistics Toolbox includes these ANOVA algorithms and related techniques: