Scoring methods (scoring)

CA

Orange.evaluation.CA(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._classification.accuracy_score. The following is its documentation:

Accuracy classification score.

In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.

Read more in the User Guide.
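
A minimal usage sketch showing the general pattern for all classification scorers in this module (it assumes the bundled iris dataset and the instantiate-then-call CrossValidation API of recent Orange versions; older versions pass the data and learners directly to CrossValidation):

import Orange

data = Orange.data.Table("iris")
learners = [Orange.classification.LogisticRegressionLearner(),
            Orange.classification.MajorityLearner()]

# Recent Orange versions instantiate the validation scheme first, then call it.
cv = Orange.evaluation.CrossValidation(k=5)
results = cv(data, learners)

print(Orange.evaluation.CA(results))  # one accuracy value per learner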

Precision

Orange.evaluation.Precision(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._classification.precision_score. The following is its documentation:

Compute the precision.

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

The best value is 1 and the worst value is 0.

Support beyond binary targets is achieved by treating multiclass and multilabel data as a collection of binary problems, one for each label. For the binary case, setting average='binary' will return precision for pos_label. If average is not 'binary', pos_label is ignored and precision is computed for both classes and then averaged, or both values are returned (when average=None). Similarly, for multiclass and multilabel targets, precision for all labels is either returned or averaged, depending on the average parameter. Use labels to specify the set of labels for which to calculate precision.

Read more in the User Guide.
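
Continuing the cross-validation sketch from the CA section (iris has three classes), a class-averaged precision can be requested through the average keyword. Treat the forwarding of average to sklearn as an assumption based on the **kwargs signature and check your Orange version:

# Assumes `results` from the CA sketch above; `average` is assumed to be
# forwarded to sklearn's precision_score for multiclass problems.
print(Orange.evaluation.Precision(results, average="weighted"))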

Recall

Orange.evaluation.Recall(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._classification.recall_score. The following is its documentation:

Compute the recall.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The best value is 1 and the worst value is 0.

Support beyond binary targets is achieved by treating multiclass and multilabel data as a collection of binary problems, one for each label. For the binary case, setting average='binary' will return recall for pos_label. If average is not 'binary', pos_label is ignored and recall is computed for both classes and then averaged, or both values are returned (when average=None). Similarly, for multiclass and multilabel targets, recall for all labels is either returned or averaged, depending on the average parameter. Use labels to specify the set of labels for which to calculate recall.

Read more in the User Guide.
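
A hedged sketch for a binary problem (it assumes the bundled heart_disease dataset and that, with binary targets, the wrapper's defaults report recall for the positive class):

import Orange

data = Orange.data.Table("heart_disease")
learner = Orange.classification.LogisticRegressionLearner()
cv = Orange.evaluation.CrossValidation(k=5)
results = cv(data, [learner])

# With a binary target, the defaults are assumed to report recall for the
# positive class; pass target/average explicitly if you need another class.
print(Orange.evaluation.Recall(results))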

F1

Orange.evaluation.F1(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._classification.f1_score. The following is its documentation:

Compute the F1 score, also known as balanced F-score or F-measure.

The F1 score can be interpreted as a harmonic mean of the precision and recall, where an F1 score reaches its best value at 1 and its worst score at 0. The relative contribution of precision and recall to the F1 score is equal. The formula for the F1 score is:

\[\text{F1} = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}}\]

Where \(\text{TP}\) is the number of true positives, \(\text{FN}\) is the number of false negatives, and \(\text{FP}\) is the number of false positives. F1 is by default calculated as 0.0 when there are no true positives, false negatives, or false positives.

Support beyond binary targets is achieved by treating multiclass and multilabel data as a collection of binary problems, one for each label. For the binary case, setting average='binary' will return the F1 score for pos_label. If average is not 'binary', pos_label is ignored and the F1 score is computed for both classes and then averaged, or both values are returned (when average=None). Similarly, for multiclass and multilabel targets, the F1 score for all labels is either returned or averaged, depending on the average parameter. Use labels to specify the set of labels for which to calculate the F1 score.

Read more in the User Guide.
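
A small numeric check of the count-based formula above, with counts chosen purely for illustration:

# TP, FP and FN picked arbitrarily to verify that the harmonic-mean form and
# the count-based form of F1 agree.
tp, fp, fn = 8, 2, 4
precision = tp / (tp + fp)                            # 0.8
recall = tp / (tp + fn)                               # 0.666...
f1_harmonic = 2 * precision * recall / (precision + recall)
f1_counts = 2 * tp / (2 * tp + fp + fn)
print(f1_harmonic, f1_counts)                         # both ~0.727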

PrecisionRecallFSupport

Orange.evaluation.PrecisionRecallFSupport(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._classification.precision_recall_fscore_support. The following is its documentation:

Compute precision, recall, F-measure and support for each class.

The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label a negative sample as positive.

The recall is the ratio tp / (tp + fn) where tp is the number of true positives and fn the number of false negatives. The recall is intuitively the ability of the classifier to find all the positive samples.

The F-beta score can be interpreted as a weighted harmonic mean of the precision and recall, where an F-beta score reaches its best value at 1 and worst score at 0.

The F-beta score weights recall more than precision by a factor of beta. beta == 1.0 means recall and precision are equally important.

The support is the number of occurrences of each class in y_true.

Support beyond binary targets is achieved by treating multiclass and multilabel data as a collection of binary problems, one for each label. For the binary case, setting average='binary' will return metrics for pos_label. If average is not 'binary', pos_label is ignored and metrics are computed for both classes and then averaged, or both are returned (when average=None). Similarly, for multiclass and multilabel targets, metrics for all labels are either returned or averaged, depending on the average parameter. Use labels to specify the set of labels for which to calculate metrics.

Read more in the User Guide.
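
Because this scorer reports four quantities per class, one hedged way to inspect them is to call the wrapped sklearn function directly on the predictions stored in the Results object (results.actual and results.predicted are attributes of Orange.evaluation.Results; the indexing and keyword below are illustrative):

from sklearn.metrics import precision_recall_fscore_support

# Assumes `results` from the CA sketch above; index 0 selects the first learner.
precision, recall, fscore, support = precision_recall_fscore_support(
    results.actual, results.predicted[0], average=None)  # per-class arrays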

AUC

Orange.evaluation.AUC(results=None, **kwargs)[source]

Area under the receiver operating characteristic curve (ROC AUC).

Parameters:
  • results (Orange.evaluation.Results) -- Stored predictions and actual data in model testing.

  • target (int, optional (default=None)) -- Value of class to report.
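
Continuing the binary heart_disease sketch from the Recall section, AUC is computed on the same results object; the target parameter above selects the class to report when needed:

# Assumes `results` from the binary sketch in the Recall section.
print(Orange.evaluation.AUC(results))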

Log Loss

Orange.evaluation.LogLoss(results=None, **kwargs)[source]

Logarithmic loss (cross-entropy loss) of the predicted class probabilities.

Parameters:
  • results (Orange.evaluation.Results) -- Stored predictions and actual data in model testing.

  • eps (float) -- Log loss is undefined for p=0 or p=1, so probabilities are clipped to max(eps, min(1 - eps, p)).

  • normalize (bool, optional (default=True)) -- If true, return the mean loss per sample. Otherwise, return the sum of the per-sample losses.

  • sample_weight (array-like of shape = [n_samples], optional) -- Sample weights.

Examples

>>> Orange.evaluation.LogLoss(results)
array([0.3...])
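
The eps clipping described above can be illustrated for the binary case with a short NumPy sketch; this is an illustration of the definition, not Orange's or sklearn's internal code:

import numpy as np

def binary_log_loss(y_true, p, eps=1e-15):
    # Clip probabilities away from 0 and 1 so the logarithm stays finite.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))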

MSE

Orange.evaluation.MSE(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._regression.mean_squared_error. The following is its documentation:

Mean squared error regression loss.

Read more in the User Guide.
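
A minimal regression sketch, analogous to the classification examples above (it assumes the bundled housing dataset and the instantiate-then-call CrossValidation API of recent Orange versions):

import Orange

data = Orange.data.Table("housing")
learners = [Orange.regression.LinearRegressionLearner(),
            Orange.regression.MeanLearner()]
cv = Orange.evaluation.CrossValidation(k=5)
results = cv(data, learners)

print(Orange.evaluation.MSE(results))  # one mean squared error per learner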

MAE

Orange.evaluation.MAE(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._regression.mean_absolute_error. The following is its documentation:

Mean absolute error regression loss.

Read more in the User Guide.
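
As a definition sketch, MAE averages the absolute residuals, so large errors are not amplified the way squaring them in MSE does; the values below are arbitrary illustration data:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
mae = np.mean(np.abs(y_true - y_pred))  # 0.5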

R2

Orange.evaluation.R2(results=None, **kwargs)[source]

A wrapper for sklearn.metrics._regression.r2_score. The following is its documentation:

\(R^2\) (coefficient of determination) regression score function.

Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). In the general case when the true y is non-constant, a constant model that always predicts the average y, disregarding the input features, would get an \(R^2\) score of 0.0.

In the particular case when y_true is constant, the \(R^2\) score is not finite: it is either NaN (perfect predictions) or -Inf (imperfect predictions). To prevent such non-finite numbers from polluting higher-level experiments, such as grid-search cross-validation, these cases are by default replaced with 1.0 (perfect predictions) or 0.0 (imperfect predictions), respectively. You can set force_finite to False to prevent this fix from happening.

Note: when the prediction residuals have zero mean, the \(R^2\) score is identical to the Explained Variance score.

Read more in the User Guide.
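
The score can be written as \(R^2 = 1 - SS_{res} / SS_{tot}\); a hand computation on the same illustrative values as in the MAE sketch makes this concrete:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2 = 1 - ss_res / ss_tot                          # ~0.949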