Regression (regression)¶
Linear Regression¶
Linear regression is a statistical regression method that predicts the value of a continuous response (class) variable from the values of several predictors. The model assumes that the response variable is a linear combination of the predictors, so the task of linear regression is to fit the unknown coefficients.
Example¶
>>> import Orange
>>> from Orange.regression.linear import LinearRegressionLearner
>>> mpg = Orange.data.Table('auto-mpg')
>>> learner = LinearRegressionLearner()
>>> model = learner(mpg[40:110])
>>> print(model)
LinearModel LinearRegression(copy_X=True, fit_intercept=True, normalize=False)
>>> mpg[20]
Value('mpg', 25.0)
>>> model(mpg[0])
Value('mpg', 24.6)

class Orange.regression.linear.LinearRegressionLearner(preprocessors=None, fit_intercept=True)[source]¶
A wrapper for sklearn.linear_model._base.LinearRegression. The following is its documentation:
Ordinary least squares Linear Regression.
LinearRegression fits a linear model with coefficients w = (w1, …, wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
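To make the objective concrete, here is a minimal sketch in plain Python (not the scikit-learn or Orange implementation) that fits a single-predictor model y ≈ w0 + w1 * x with the closed-form least-squares solution and reports the residual sum of squares. The names `fit_ols_1d` and `rss` and the toy data are made up for illustration.

```python
def fit_ols_1d(xs, ys):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rss(xs, ys, intercept, slope):
    """Residual sum of squares between observed and predicted targets."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Toy data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_ols_1d(xs, ys)
print(intercept, slope, rss(xs, ys, intercept, slope))
```

With several predictors the same idea applies, but the coefficients come from solving a linear system rather than from two scalar formulas.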

class Orange.regression.linear.RidgeRegressionLearner(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver='auto', preprocessors=None)[source]¶
A wrapper for sklearn.linear_model._ridge.Ridge. The following is its documentation:
Linear least squares with l2 regularization.
Minimizes the objective function:
||y - Xw||^2_2 + alpha * ||w||^2_2
This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multivariate regression (i.e., when y is a 2d-array of shape (n_samples, n_targets)).
Read more in the User Guide.
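As an illustration of the ridge objective, the following sketch evaluates ||y - Xw||^2_2 + alpha * ||w||^2_2 directly and uses the closed-form minimizer for the special case of one feature and no intercept, w = sum(x*y) / (sum(x*x) + alpha). It is plain Python written for this page, not how the scikit-learn solvers work; `ridge_objective` and `ridge_1d` are hypothetical helper names.

```python
def ridge_objective(X, y, w, alpha):
    """Squared residual norm plus alpha times the squared l2-norm of w.

    X is a list of rows, y a list of targets, w a list of coefficients.
    """
    residuals = [yi - sum(xij * wj for xij, wj in zip(row, w))
                 for row, yi in zip(X, y)]
    return sum(r * r for r in residuals) + alpha * sum(wj * wj for wj in w)


def ridge_1d(xs, ys, alpha):
    """Closed-form ridge coefficient for one feature and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # exactly y = 2x
X = [[x] for x in xs]

w_ols = ridge_1d(xs, ys, alpha=0.0)    # no penalty: plain least squares
w_ridge = ridge_1d(xs, ys, alpha=1.0)  # the penalty shrinks the coefficient
print(w_ols, w_ridge)
print(ridge_objective(X, ys, [w_ols], 1.0), ridge_objective(X, ys, [w_ridge], 1.0))
```

The shrunken coefficient fits the data slightly worse but gives a lower value of the penalized objective, which is the trade-off that alpha controls.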

class Orange.regression.linear.LassoRegressionLearner(alpha=1.0, fit_intercept=True, normalize=False, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, preprocessors=None)[source]¶
A wrapper for sklearn.linear_model._coordinate_descent.Lasso. The following is its documentation:
Linear Model trained with L1 prior as regularizer (aka the Lasso)
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Technically the Lasso model is optimizing the same objective function as the Elastic Net with l1_ratio=1.0 (no L2 penalty).
Read more in the User Guide.
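The effect of the L1 penalty is easiest to see with one feature and no intercept, where the minimizer of (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1 has a closed form based on soft-thresholding. The sketch below is plain Python for illustration only (the wrapped estimator uses coordinate descent over all features); `soft_threshold` and `lasso_1d` are hypothetical names.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_1d(xs, ys, alpha):
    """Lasso coefficient for one feature and no intercept."""
    n = len(xs)
    rho = sum(x * y for x, y in zip(xs, ys)) / n   # scaled correlation with the target
    z = sum(x * x for x in xs) / n                 # scaled squared norm of the feature
    return soft_threshold(rho, alpha) / z


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(lasso_1d(xs, ys, alpha=1.0))    # shrunk below the least-squares value of 2
print(lasso_1d(xs, ys, alpha=10.0))   # penalty large enough to zero the coefficient
```

Unlike the l2 penalty, a sufficiently large alpha sets the coefficient exactly to zero, which is why the Lasso produces sparse models.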

class Orange.regression.linear.SGDRegressionLearner(loss='squared_loss', penalty='l2', alpha=0.0001, l1_ratio=0.15, fit_intercept=True, max_iter=5, tol=0.001, shuffle=True, epsilon=0.1, n_jobs=1, random_state=None, learning_rate='invscaling', eta0=0.01, power_t=0.25, class_weight=None, warm_start=False, average=False, preprocessors=None)[source]¶
A wrapper for sklearn.linear_model._stochastic_gradient.SGDRegressor. The following is its documentation:
Linear model fitted by minimizing a regularized empirical loss with SGD
SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated each sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
This implementation works with data represented as dense numpy arrays of floating point values for the features.
Read more in the User Guide.
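The update rule described above can be sketched for a single feature as follows. This is a simplified illustration, not the scikit-learn implementation: it assumes the squared loss 0.5 * (prediction - y)^2, an L2 penalty on the weight only, no shuffling, and an inverse-scaling schedule eta = eta0 / t^power_t. The function names `sgd_fit` and `mse` are made up for this example.

```python
def sgd_fit(xs, ys, alpha=0.0001, eta0=0.01, power_t=0.25, epochs=200):
    """Fit y ~ w * x + b with per-sample gradient steps."""
    w, b = 0.0, 0.0
    t = 1
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            eta = eta0 / t ** power_t        # decreasing learning rate
            error = (w * x + b) - y          # gradient of the squared loss w.r.t. the prediction
            w -= eta * (error * x + alpha * w)   # loss gradient plus L2 penalty gradient
            b -= eta * error                 # the intercept is not penalized
            t += 1
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the model w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]   # exactly y = 1 + 2x
w, b = sgd_fit(xs, ys)
print(w, b, mse(xs, ys, w, b))
```

Each sample triggers its own update, so the model improves during a pass over the data rather than only after it.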
Polynomial¶
Polynomial model is a wrapper that constructs polynomial features of a specified degree and learns a model on them.

class Orange.regression.linear.PolynomialLearner(learner=LinearRegressionLearner(), degree=2, preprocessors=None, include_bias=True)[source]¶
Generate polynomial features and learn a prediction model.
Parameters:
 learner (LearnerRegression) – learner to be fitted on the transformed features
 degree (int) – degree of the polynomial
 preprocessors (List[Preprocessor]) – preprocessors to be applied to the data before learning
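To show what "polynomial features of a specified degree" means, the sketch below expands one data row into all monomials up to a given degree, with an optional constant term corresponding to include_bias. It is a plain-Python illustration; the source does not say how PolynomialLearner builds its features internally, and `polynomial_features` is a hypothetical helper.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2, include_bias=True):
    """Return all monomials of the values in row with total degree <= degree."""
    start = 0 if include_bias else 1   # degree 0 is the constant (bias) term
    features = []
    for d in range(start, degree + 1):
        for combo in combinations_with_replacement(row, d):
            features.append(prod(combo))   # the empty product is 1
    return features


# For (a, b) and degree 2 the features are 1, a, b, a*a, a*b, b*b.
print(polynomial_features([2.0, 3.0], degree=2))
print(polynomial_features([2.0, 3.0], degree=2, include_bias=False))
```

A linear learner fitted on these expanded rows is linear in the new features but polynomial in the original ones.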
Mean¶
Mean model predicts the same value (usually the distribution mean) for all data instances. Its accuracy can serve as a baseline for other regression models.
The model learner (MeanLearner) computes the mean of the given data or distribution. The model is stored as an instance of MeanModel.
Example¶
>>> from Orange.data import Table
>>> from Orange.regression import MeanLearner
>>> data = Table('auto-mpg')
>>> learner = MeanLearner()
>>> model = learner(data)
>>> print(model)
MeanModel(23.51457286432161)
>>> model(data[:4])
array([ 23.51457286, 23.51457286, 23.51457286, 23.51457286])

class Orange.regression.MeanLearner(preprocessors=None)[source]¶
Fit a regression model that returns the average response (class) value.

fit_storage(data)[source]¶
Construct a MeanModel by computing the mean value of the given data.
Parameters: data (Orange.data.Table) – data table
Returns: regression model, which always returns mean value
Return type: MeanModel

Random Forest¶

class Orange.regression.RandomForestRegressionLearner(n_estimators=10, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, preprocessors=None)[source]¶
A wrapper for sklearn.ensemble._forest.RandomForestRegressor. The following is its documentation:
A random forest regressor.
A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree.
Read more in the User Guide.
Simple Random Forest¶

class Orange.regression.SimpleRandomForestLearner(n_estimators=10, min_instances=2, max_depth=1024, max_majority=1.0, skip_prob='sqrt', seed=42)[source]¶
A random forest regressor, optimized for speed. Trees in the forest are constructed with SimpleTreeLearner.
Parameters:
 n_estimators (int, optional (default = 10)) – Number of trees in the forest.
 min_instances (int, optional (default = 2)) – Minimal number of data instances in leaves. When growing the tree, new nodes are not introduced if they would result in leaves with fewer instances than min_instances. Instance count is weighted.
 max_depth (int, optional (default = 1024)) – Maximal depth of tree.
 max_majority (float, optional (default = 1.0)) – Maximal proportion of majority class. When this is exceeded, induction stops (only used for classification).
 skip_prob (string, optional (default = "sqrt")) – Data attribute will be skipped with probability skip_prob.
  if float, then skip attribute with this probability.
  if "sqrt", then skip_prob = 1 - sqrt(n_features) / n_features
  if "log2", then skip_prob = 1 - log2(n_features) / n_features
 seed (int, optional (default = 42)) – Random seed.
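The three forms of skip_prob can be resolved to a number as in the sketch below, assuming the formulas 1 - sqrt(n_features) / n_features and 1 - log2(n_features) / n_features. The helper `resolve_skip_prob` is hypothetical and only illustrates the arithmetic; it is not part of Orange.

```python
from math import log2, sqrt


def resolve_skip_prob(skip_prob, n_features):
    """Translate a skip_prob setting into a skipping probability."""
    if skip_prob == "sqrt":
        # On average sqrt(n_features) attributes are kept per split.
        return 1 - sqrt(n_features) / n_features
    if skip_prob == "log2":
        # On average log2(n_features) attributes are kept per split.
        return 1 - log2(n_features) / n_features
    return float(skip_prob)   # a number is used as the probability itself


print(resolve_skip_prob("sqrt", 64))
print(resolve_skip_prob("log2", 64))
print(resolve_skip_prob(0.5, 64))
```

With 64 attributes, "sqrt" keeps about 8 of them and "log2" about 6, so "log2" skips attributes more aggressively as the attribute count grows.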
Regression Tree¶
Orange includes two implementations of regression trees: a homegrown one, and one from scikit-learn. The former properly handles nominal attributes with more than two values and missing values, and the latter is faster.

class Orange.regression.TreeLearner(*args, binarize=False, min_samples_leaf=1, min_samples_split=2, max_depth=None, **kwargs)[source]¶
Tree inducer with proper handling of nominal attributes and binarization.
The inducer can handle missing values of attributes and target. For discrete attributes with more than two possible values, each value can get a separate branch (binarize=False, default), or values can be grouped into two groups (binarize=True).
The tree growth can be limited by the required number of instances for internal nodes and for leaves, and by the maximal depth of the tree.
If the tree is not binary, it can contain zero-branches.
Parameters:
 binarize – if True, the inducer will find the optimal split into two subsets for values of discrete attributes. If False (default), each value gets its own branch.
 min_samples_leaf – the minimal number of data instances in a leaf
 min_samples_split – the minimal number of data instances that is split into subgroups
 max_depth – the maximal depth of the tree
Return type: instance of OrangeTreeModel

class Orange.regression.SklTreeRegressionLearner(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, max_features=None, random_state=None, max_leaf_nodes=None, preprocessors=None)[source]¶
A wrapper for sklearn.tree._classes.DecisionTreeRegressor. The following is its documentation:
A decision tree regressor.
Read more in the User Guide.
Neural Network¶

class Orange.regression.NNRegressionLearner(hidden_layer_sizes=(100, ), activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, preprocessors=None)[source]¶
A wrapper for Orange.regression.neural_network.MLPRegressorWCallback. The following is its documentation:
Multilayer Perceptron regressor.
This model optimizes the squared-loss using LBFGS or stochastic gradient descent.
New in version 0.18.