Sampling procedures for testing models (testing)

class Orange.evaluation.testing.Results(data=None, *, nmethods=None, nrows=None, nclasses=None, domain=None, row_indices=None, folds=None, score_by_folds=True, learners=None, models=None, failed=None, actual=None, predicted=None, probabilities=None, store_data=None, store_models=None, train_time=None, test_time=None)
Class for storing predictions in model testing.

models
A list of induced models.
Type: Optional[List[Model]]

row_indices
Indices of rows in data that were used in testing, stored as a numpy vector of length nrows. Values of actual[i], predicted[:, i] and probabilities[:, i] refer to the i-th test instance: that instance is data[row_indices[i]], its actual class is actual[i], and the prediction by the m-th method is predicted[m, i].
Type: np.ndarray

nrows
The number of test instances (including duplicates); nrows equals the length of row_indices and actual, and the second dimension of predicted and probabilities.
Type: int

actual
True values of the target variable in a vector of length nrows.
Type: np.ndarray

predicted
Predicted values of the target variable in an array of shape (number-of-methods, nrows).
Type: np.ndarray

probabilities
Predicted probabilities (for discrete target variables) in an array of shape (number-of-methods, nrows, number-of-classes).
Type: Optional[np.ndarray]

folds
A list of indices (or slice objects) corresponding to testing data subsets, that is, row_indices[folds[i]] contains row indices used in fold i, so data[row_indices[folds[i]]] is the corresponding testing data.
Type: List[Slice or List[int]]

train_time
Training times of batches.
Type: np.ndarray

test_time
Testing times of batches.
Type: np.ndarray
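
The relationship between row_indices, folds, actual and predicted can be sketched with plain numpy arrays. The values below are toy stand-ins chosen for illustration, not output produced by Orange, and the per-fold accuracy computation is only one possible use of this layout.

```python
import numpy as np

# Toy stand-ins for the attributes described above (not produced by Orange).
nmethods, nrows = 2, 6
row_indices = np.array([4, 0, 2, 5, 1, 3])   # rows of `data`, in test order
folds = [slice(0, 3), slice(3, 6)]           # two folds of three instances each
actual = np.array([1, 0, 1, 0, 0, 1])        # true class of each test instance
predicted = np.array([[1, 0, 0, 0, 0, 1],    # predictions of method 0
                      [1, 1, 1, 0, 1, 1]])   # predictions of method 1

# The second dimension of `predicted` has length nrows.
assert predicted.shape == (nmethods, nrows)

# Rows of the original data that were tested in fold 1.
fold1_rows = row_indices[folds[1]]

# Accuracy of each method within each fold, shape (nmethods, number of folds).
fold_accuracy = np.array([
    [np.mean(predicted[m, fold] == actual[fold]) for fold in folds]
    for m in range(nmethods)
])
```

Here fold1_rows selects rows 5, 1 and 3 of the original data, and fold_accuracy[m, f] is the accuracy of method m on fold f.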

get_augmented_data(model_names, include_attrs=True, include_predictions=True, include_probabilities=True)
Return the test data table augmented with meta attributes containing predictions, probabilities (if the task is classification) and fold indices.
Returns: data augmented with predictions, probabilities and fold indices
Return type: augmented_data (Orange.data.Table)


class Orange.evaluation.testing.CrossValidation(k=10, stratified=True, random_state=0, store_data=False, store_models=False, warnings=None)
K-fold cross validation

random_state
Seed for the random number generator (default: 0). If set to None, a different seed is used each time.
Type: int

stratified
Flag deciding whether to perform stratified cross-validation. If True but the class sizes don't allow it, non-stratified validation is used instead and a list of warning message(s) (warning) is added to the Results.
Type: bool

get_indices(data)
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
Parameters: data (Orange.data.Table) – test data
Returns: a list of arrays of indices into data
Return type: indices (list of np.ndarray)
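
The shape of the value returned by get_indices for k-fold CV can be illustrated with a minimal, unstratified sketch. The helper kfold_test_indices is hypothetical and is not Orange's implementation; it only shows a list of k non-overlapping index arrays of roughly len(data) / k elements each.

```python
import numpy as np

def kfold_test_indices(n_rows, k=10, random_state=0):
    """Hypothetical helper: split range(n_rows) into k disjoint test index arrays."""
    rng = np.random.RandomState(random_state)
    shuffled = rng.permutation(n_rows)   # random order of all row indices
    return np.array_split(shuffled, k)   # k chunks whose sizes differ by at most 1

indices = kfold_test_indices(25, k=10)
sizes = [len(fold) for fold in indices]
```

With 25 rows and k=10, five folds hold three indices and five hold two, and every row appears in exactly one fold.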


class Orange.evaluation.testing.CrossValidationFeature(feature=None, store_data=False, store_models=False, warnings=None)
Cross validation with folds according to values of a feature.

feature
The feature defining the folds.
Type: Orange.data.Variable

get_indices(data)
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
Parameters: data (Orange.data.Table) – test data
Returns: a list of arrays of indices into data
Return type: indices (list of np.ndarray)
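
Folds defined by the values of a feature can be pictured as one index array per distinct value. The following is a minimal numpy sketch under that assumption; feature_fold_indices and the hospital column are hypothetical names used only for illustration, not part of Orange.

```python
import numpy as np

def feature_fold_indices(feature_values):
    """Hypothetical helper: one array of row indices per distinct feature value."""
    feature_values = np.asarray(feature_values)
    return [np.flatnonzero(feature_values == value)
            for value in np.unique(feature_values)]

# Hypothetical column recording which hospital each of seven instances comes from.
hospital = np.array(["A", "B", "A", "C", "B", "A", "C"])
indices = feature_fold_indices(hospital)
```

Each resulting array lists the rows that would serve as test data for one fold, so the number of folds equals the number of distinct feature values.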


class Orange.evaluation.testing.LeaveOneOut(*, store_data=False, store_models=False)
Leave-one-out testing

get_indices(data)
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
Parameters: data (Orange.data.Table) – test data
Returns: a list of arrays of indices into data
Return type: indices (list of np.ndarray)

static prepare_arrays(data, indices)
Prepare folds, row_indices and actual.
The method is used by __call__. While functional, it may be overridden in subclasses for speedups.
Parameters:
 data (Orange.data.Table) – data used for testing
 indices (list of vectors) – indices of data instances in each test sample
Returns:
 folds (np.ndarray): see class documentation
 row_indices (np.ndarray): see class documentation
 actual (np.ndarray): see class documentation
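
How folds, row_indices and actual could be derived from per-fold test indices is sketched below for the leave-one-out case. This is an assumption-laden illustration, not Orange's code: prepare_arrays_sketch is a hypothetical function, it takes a plain target vector y instead of a data table, and it returns folds as a list of slices as described for the folds attribute of Results.

```python
import numpy as np

def prepare_arrays_sketch(y, indices):
    """Hypothetical sketch: build folds, row_indices and actual from test indices."""
    folds, start = [], 0
    for fold in indices:
        # Each fold occupies a contiguous block of the concatenated row_indices.
        folds.append(slice(start, start + len(fold)))
        start += len(fold)
    row_indices = np.concatenate(indices)
    actual = y[row_indices]              # true target values in test order
    return folds, row_indices, actual

y = np.array([0, 1, 1, 0])
# Leave-one-out: every instance forms its own single-element test sample.
loo_indices = [np.array([i]) for i in range(len(y))]
folds, row_indices, actual = prepare_arrays_sketch(y, loo_indices)
```

For leave-one-out every slice has length one, so row_indices is simply 0..n-1 and actual repeats the target values in their original order.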


class Orange.evaluation.testing.ShuffleSplit(n_resamples=10, train_size=None, test_size=0.1, stratified=True, random_state=0, store_data=False, store_models=False)
Test by repeated random sampling

test_size
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.1. The default will change in scikit-learn version 0.21. It will remain 0.1 only if train_size is unspecified, otherwise it will complement the specified train_size. (from documentation of sklearn.model_selection.StratifiedShuffleSplit)
Type: float, int, None

train_size
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. (from documentation of sklearn.model_selection.StratifiedShuffleSplit)
Type: float, int, None (default: None)

random_state
Seed for the random number generator (default: 0). If set to None, a different seed is used each time.
Type: int

get_indices(data)
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k non-overlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
Parameters: data (Orange.data.Table) – test data
Returns: a list of arrays of indices into data
Return type: indices (list of np.ndarray)
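
Repeated random sampling differs from k-fold CV in that test samples are drawn independently and may overlap between resamples. The sketch below illustrates the float versus int interpretation of test_size; shuffle_split_test_indices is a hypothetical, unstratified helper, and rounding a fractional test size up is an assumption of this sketch rather than documented Orange behaviour.

```python
import math
import numpy as np

def shuffle_split_test_indices(n_rows, n_resamples=10, test_size=0.1, random_state=0):
    """Hypothetical helper: draw n_resamples independent random test samples."""
    if isinstance(test_size, float):
        n_test = math.ceil(test_size * n_rows)   # proportion (assumed to round up)
    else:
        n_test = int(test_size)                  # absolute number of test rows
    rng = np.random.RandomState(random_state)
    # Each resample is a fresh permutation, so samples may overlap with each other.
    return [rng.permutation(n_rows)[:n_test] for _ in range(n_resamples)]

by_fraction = shuffle_split_test_indices(40, n_resamples=3, test_size=0.25)
by_count = shuffle_split_test_indices(40, n_resamples=2, test_size=7)
```

Within a single resample the indices are unique, while the same row may appear in the test sample of several resamples.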


class Orange.evaluation.testing.TestOnTestData(*, store_data=False, store_models=False)
Test on separately provided test data
Note that the class has a different signature for __call__.

class Orange.evaluation.testing.TestOnTrainingData(*, store_data=False, store_models=False)
Test on training data

Orange.evaluation.testing.sample(table, n=0.7, stratified=False, replace=False, random_state=None)
Samples data instances from a data table. Returns the sample and a dataset of the instances from the input data table that are not in the sample. Uses several sampling functions from scikit-learn.
 table : data table
 A data table from which to sample.
 n : float, int (default = 0.7)
 If float, should be between 0.0 and 1.0 and represents the proportion of data instances in the resulting sample. If int, n is the number of data instances in the resulting sample.
 stratified : bool, optional (default = False)
 If True, sampling tries to match the distribution of class values in the train and test subsets.
 replace : bool, optional (default = False)
 Sample with replacement.
 random_state : int or RandomState
 Pseudorandom number generator state used for random sampling.
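
The roles of n and replace can be illustrated on row indices alone. The sketch below is a hypothetical, unstratified stand-in, not the Orange function: sample_indices works on index arrays instead of data tables, and rounding a fractional n to the nearest integer is an assumption.

```python
import numpy as np

def sample_indices(n_rows, n=0.7, replace=False, random_state=None):
    """Hypothetical helper: return (chosen, rest) arrays of row indices."""
    rng = np.random.RandomState(random_state)
    # A float n is a proportion of the rows, an int n is an absolute count.
    size = int(round(n * n_rows)) if isinstance(n, float) else int(n)
    chosen = rng.choice(n_rows, size=size, replace=replace)
    rest = np.setdiff1d(np.arange(n_rows), chosen)   # rows not in the sample
    return chosen, rest

chosen, rest = sample_indices(20, n=0.5, random_state=42)
boot, out_of_bag = sample_indices(20, n=20, replace=True, random_state=42)
```

Without replacement, chosen and rest partition the rows. With replacement, chosen may contain duplicates, so rest holds only the rows that were never drawn.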