# Regression (`regression`)

## Linear Regression

Linear regression is a statistical regression method that predicts the value of a continuous response (class) variable from the values of several predictors. The model assumes that the response variable is a linear combination of the predictors; the task of linear regression is therefore to fit the unknown coefficients.

### Example

```
>>> import Orange
>>> from Orange.regression.linear import LinearRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> learner = LinearRegressionLearner()
>>> model = learner(mpg[40:110])
>>> print(model)
LinearModel LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
>>> mpg[20]
Value('mpg', 25.0)
>>> model(mpg[0])
Value('mpg', 24.6)
```
class `Orange.regression.linear.LinearRegressionLearner`(preprocessors=None, fit_intercept=True)[source]

A wrapper for sklearn.linear_model._base.LinearRegression. The following is its documentation:

Ordinary least squares Linear Regression.

LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

class `Orange.regression.linear.RidgeRegressionLearner`(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', preprocessors=None)[source]

A wrapper for sklearn.linear_model._ridge.Ridge. The following is its documentation:

Linear least squares with l2 regularization.

Minimizes the objective function:

```
||y - Xw||^2_2 + alpha * ||w||^2_2
```

This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).

Read more in the User Guide.
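The objective above has a well-known closed-form minimizer, which can be sketched in plain NumPy on hypothetical toy data (no intercept term here; Orange's wrapper delegates the actual fitting to scikit-learn):

```python
import numpy as np

# Hypothetical toy data: 50 samples, 3 features, known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

alpha = 1.0
# ||y - Xw||^2_2 + alpha * ||w||^2_2 is minimized exactly by
# w = (X^T X + alpha I)^{-1} X^T y.
w = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
```

At the minimizer the gradient `X^T (Xw - y) + alpha * w` vanishes, which is a quick way to sanity-check the solution.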

class `Orange.regression.linear.LassoRegressionLearner`(alpha=1.0, fit_intercept=True, normalize=False, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, preprocessors=None)[source]

A wrapper for sklearn.linear_model._coordinate_descent.Lasso. The following is its documentation:

Linear Model trained with L1 prior as regularizer (aka the Lasso)

The optimization objective for Lasso is:

```
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
```

Technically the Lasso model is optimizing the same objective function as the Elastic Net with `l1_ratio=1.0` (no L2 penalty).

Read more in the User Guide.
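For a single feature, the objective above has a closed-form minimizer via soft-thresholding, which illustrates why the lasso drives small coefficients exactly to zero. A sketch on hypothetical toy data (the real solver uses coordinate descent over many features):

```python
import numpy as np

def soft_threshold(z, a):
    # Proximal operator of a*|w|: shrink z toward zero by a, clip at 0.
    return np.sign(z) * max(abs(z) - a, 0.0)

# Hypothetical toy data with a single feature and true coefficient 0.8.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 0.8 * x + 0.05 * rng.normal(size=n)

alpha = 0.1
rho = x @ y / n   # correlation of the feature with the target
z = x @ x / n     # curvature of the quadratic part
w = soft_threshold(rho, alpha) / z
```

The fitted `w` is shrunk below the true value 0.8 by roughly `alpha`; a correlation smaller in magnitude than `alpha` would be thresholded to exactly zero.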

class `Orange.regression.linear.SGDRegressionLearner`(loss='squared_loss', penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=5, tol=0.001, shuffle=True, epsilon=0.1, n_jobs=1, random_state=None, learning_rate='invscaling', eta0=0.01, power_t=0.25, class_weight=None, warm_start=False, average=False, preprocessors=None)[source]

A wrapper for sklearn.linear_model._stochastic_gradient.SGDRegressor. The following is its documentation:

Linear model fitted by minimizing a regularized empirical loss with SGD

SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

This implementation works with data represented as dense numpy arrays of floating point values for the features.

Read more in the User Guide.
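A bare-bones version of this procedure — squared loss, l2 penalty, and the 'invscaling' learning-rate schedule matching the defaults above — might look like the following, on hypothetical toy data:

```python
import numpy as np

# Hypothetical toy data; the true coefficients are [3.0, -1.0].
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -1.0]) + 0.1 * rng.normal(size=500)

w = np.zeros(2)
alpha = 0.0001                     # l2 penalty strength
eta0, power_t = 0.01, 0.25
t = 1
for epoch in range(5):             # max_iter=5 passes over the data
    for i in rng.permutation(len(X)):
        eta = eta0 / t ** power_t  # 'invscaling' learning-rate schedule
        # Gradient of 0.5*(x_i.w - y_i)^2 + 0.5*alpha*||w||^2 at one sample
        grad = (X[i] @ w - y[i]) * X[i] + alpha * w
        w -= eta * grad
        t += 1
```

Each update touches a single sample, which is what makes the method scale to data too large for batch solvers.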

class `Orange.regression.linear.LinearModel`(skl_model)[source]

## Polynomial

Polynomial model is a wrapper that constructs polynomial features of a specified degree and learns a model on them.

class `Orange.regression.linear.PolynomialLearner`(learner=LinearRegressionLearner(), degree=2, preprocessors=None, include_bias=True)[source]

Generate polynomial features and learn a prediction model

Parameters:

- `learner` (LearnerRegression) – learner to be fitted on the transformed features
- `degree` (int) – degree of the used polynomial
- `preprocessors` (List[Preprocessor]) – preprocessors to be applied on the data before learning
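The idea can be sketched in plain NumPy: construct degree-2 features, then fit ordinary least squares on them (a hypothetical illustration of the transformation PolynomialLearner performs before handing the data to its learner):

```python
import numpy as np

# Hypothetical toy data generated from a degree-2 polynomial.
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.05 * rng.normal(size=100)

degree = 2
# Columns [1, x, x^2] -- the constructed polynomial features
# (include_bias=True corresponds to keeping the constant column).
X_poly = np.vander(x, degree + 1, increasing=True)
coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
```

The recovered `coef` approximates the generating coefficients [1, 2, -3]: the model stays linear in its parameters even though it is nonlinear in `x`.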

## Mean

Mean model predicts the same value (usually the distribution mean) for all data instances. Its accuracy can serve as a baseline for other regression models.

The model learner (`MeanLearner`) computes the mean of the given data or distribution. The model is stored as an instance of `MeanModel`.

### Example

```
>>> from Orange.data import Table
>>> from Orange.regression import MeanLearner
>>> data = Table('auto-mpg')
>>> learner = MeanLearner()
>>> model = learner(data)
>>> print(model)
MeanModel(23.51457286432161)
>>> model(data[:4])
array([ 23.51457286,  23.51457286,  23.51457286,  23.51457286])
```
class `Orange.regression.MeanLearner`(preprocessors=None)[source]

Fit a regression model that returns the average response (class) value.

`fit_storage`(data)[source]

Construct a `MeanModel` by computing the mean value of the given data.

Parameters:

- `data` (Orange.data.Table) – data table

Returns: regression model, which always returns the mean value

Return type: `MeanModel`

## Random Forest

class `Orange.regression.RandomForestRegressionLearner`(n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, preprocessors=None)[source]

A wrapper for sklearn.ensemble._forest.RandomForestRegressor. The following is its documentation:

A random forest regressor.

A random forest is a meta estimator that fits a number of classifying decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.

Read more in the User Guide.
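The bootstrap-and-average idea can be sketched with one-feature decision stumps as a toy stand-in for full trees (a real forest also subsamples features at each split and grows much deeper trees):

```python
import numpy as np

def fit_stump(X, y):
    # Best split on feature 0, minimizing total within-node variance.
    best_sse, best_stump = np.inf, None
    for t in np.unique(X[:, 0]):
        left, right = y[X[:, 0] <= t], y[X[:, 0] > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = left.var() * len(left) + right.var() * len(right)
        if sse < best_sse:
            best_sse, best_stump = sse, (t, left.mean(), right.mean())
    return best_stump

def predict_stump(stump, X):
    t, lo, hi = stump
    return np.where(X[:, 0] <= t, lo, hi)

# Hypothetical toy data: a noisy step function.
rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(200, 1))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.normal(size=200)

# Fit each stump on a bootstrap resample, then average the predictions.
stumps = []
for _ in range(10):
    idx = rng.integers(0, len(X), len(X))
    stumps.append(fit_stump(X[idx], y[idx]))
pred = np.mean([predict_stump(s, X) for s in stumps], axis=0)
```

Averaging over resamples smooths out the variance of any single tree, which is the mechanism behind the improved accuracy and reduced over-fitting described above.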

## Simple Random Forest

class `Orange.regression.SimpleRandomForestLearner`(n_estimators=10, min_instances=2, max_depth=1024, max_majority=1.0, skip_prob='sqrt', seed=42)[source]

A random forest regressor, optimized for speed. Trees in the forest are constructed with `SimpleTreeLearner` classification trees.

Parameters:

- `n_estimators` (int, optional (default = 10)) – Number of trees in the forest.
- `min_instances` (int, optional (default = 2)) – Minimal number of data instances in leaves. When growing the tree, new nodes are not introduced if they would result in leaves with fewer instances than min_instances. Instance count is weighted.
- `max_depth` (int, optional (default = 1024)) – Maximal depth of tree.
- `max_majority` (float, optional (default = 1.0)) – Maximal proportion of majority class. When this is exceeded, induction stops (only used for classification).
- `skip_prob` (string, optional (default = "sqrt")) – Data attribute will be skipped with probability `skip_prob`. If float, then skip attribute with this probability. If "sqrt", then skip_prob = 1 - sqrt(n_features) / n_features. If "log2", then skip_prob = 1 - log2(n_features) / n_features.
- `seed` (int, optional (default = 42)) – Random seed.
`fit_storage`(data)[source]

The default implementation of fit_storage calls fit. Derived classes must define either fit_storage or fit.
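The two named settings of `skip_prob` work out numerically as follows, e.g. for a hypothetical dataset with 64 features:

```python
import math

n_features = 64
skip_sqrt = 1 - math.sqrt(n_features) / n_features   # "sqrt": 1 - 8/64
skip_log2 = 1 - math.log2(n_features) / n_features   # "log2": 1 - 6/64
```

Both settings skip most attributes at each split, leaving roughly sqrt(n_features) or log2(n_features) candidates, which decorrelates the trees in the forest.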

## Regression Tree

Orange includes two implementations of regression trees: a home-grown one, and one from scikit-learn. The former properly handles multinomial and missing values; the latter is faster.

class `Orange.regression.TreeLearner`(*args, binarize=False, min_samples_leaf=1, min_samples_split=2, max_depth=None, **kwargs)[source]

Tree inducer with proper handling of nominal attributes and binarization.

The inducer can handle missing values of attributes and the target. For discrete attributes with more than two possible values, each value can get a separate branch (binarize=False, default), or values can be grouped into two groups (binarize=True).

The tree growth can be limited by the required number of instances for internal nodes and for leaves, and by the maximal depth of the tree.

If the tree is not binary, it can contain zero-branches.

Parameters:

- `binarize` – if True, the inducer will find the optimal split into two subsets for values of discrete attributes. If False (default), each value gets its own branch.
- `min_samples_leaf` – the minimal number of data instances in a leaf
- `min_samples_split` – the minimal number of data instances that is split into subgroups
- `max_depth` – the maximal depth of the tree

Returns: instance of OrangeTreeModel
`fit_storage`(data)[source]

The default implementation of fit_storage calls fit. Derived classes must define either fit_storage or fit.

class `Orange.regression.SklTreeRegressionLearner`(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, random_state=None, max_leaf_nodes=None, preprocessors=None)[source]

A wrapper for sklearn.tree._classes.DecisionTreeRegressor. The following is its documentation:

A decision tree regressor.

Read more in the User Guide.

## Neural Network

class `Orange.regression.NNRegressionLearner`(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, preprocessors=None)[source]

A wrapper for Orange.regression.neural_network.MLPRegressorWCallback. The following is its documentation:

Multi-layer Perceptron regressor.

This model optimizes the squared-loss using LBFGS or stochastic gradient descent.

New in version 0.18.
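The network's forward pass — one hidden relu layer followed by a linear output, matching the defaults hidden_layer_sizes=(100,) and activation='relu' — can be sketched with made-up, untrained weights purely to show the shapes involved:

```python
import numpy as np

def mlp_forward(X, W1, b1, W2, b2):
    # One hidden relu layer followed by a linear output, i.e. the
    # architecture behind the default hidden_layer_sizes=(100,).
    h = np.maximum(X @ W1 + b1, 0.0)   # activation='relu'
    return h @ W2 + b2

# Made-up weights, purely illustrative of the parameter shapes.
rng = np.random.default_rng(5)
X = rng.normal(size=(4, 3))            # 4 instances, 3 features
W1 = 0.1 * rng.normal(size=(3, 100))
b1 = np.zeros(100)
W2 = 0.1 * rng.normal(size=(100, 1))
b2 = np.zeros(1)
y_hat = mlp_forward(X, W1, b1, W2, b2)
```

Training consists of adjusting `W1, b1, W2, b2` to minimize the squared loss over such forward passes, via LBFGS or SGD-family solvers as noted above.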