Sampling procedures for testing models (testing)¶
- class Orange.evaluation.testing.Results(data=None, *, nmethods=None, nrows=None, nclasses=None, domain=None, row_indices=None, folds=None, score_by_folds=True, learners=None, models=None, failed=None, actual=None, predicted=None, probabilities=None, store_data=None, store_models=None, train_time=None, test_time=None)[source]¶
Class for storing predictions in model testing.
- models¶
A list of induced models.
- Type
Optional[List[Model]]
- row_indices¶
Indices of rows in data that were used in testing, stored as a numpy vector of length nrows. Values of actual[i], predicted[:, i] and probabilities[:, i] all refer to the same instance; that is, the i-th test instance is data[row_indices[i]], its actual class is actual[i], and the prediction by the m-th method is predicted[m, i].
- Type
np.ndarray
- nrows¶
The number of test instances (including duplicates); nrows equals the length of row_indices and actual, and the second dimension of predicted and probabilities.
- Type
int
- actual¶
True values of the target variable in a vector of length nrows.
- Type
np.ndarray
- predicted¶
Predicted values of the target variable in an array of shape (number-of-methods, nrows).
- Type
np.ndarray
- probabilities¶
Predicted probabilities (for discrete target variables) in an array of shape (number-of-methods, nrows, number-of-classes).
- Type
Optional[np.ndarray]
- folds¶
A list of indices (or slice objects) corresponding to testing data subsets. That is, row_indices[folds[i]] contains the row indices used in fold i, so data[row_indices[folds[i]]] is the corresponding testing data.
- Type
List[Slice or List[int]]
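The relation between row_indices, folds, actual and predicted can be illustrated with a small sketch. Plain Python lists stand in for the numpy arrays here, and the data, the predictions and the helper names fold_rows and fold_accuracy are hypothetical; none of them belong to Orange.

```python
# Plain lists stand in for the numpy arrays described above.
data = ["a", "b", "c", "d", "e", "f"]   # hypothetical data rows
data_classes = [0, 1, 0, 1, 1, 0]       # hypothetical target value of each row

# Two folds with three test instances each.
row_indices = [4, 0, 2, 5, 1, 3]
folds = [slice(0, 3), slice(3, 6)]
nrows = len(row_indices)

# actual[i] is the target value of data[row_indices[i]].
actual = [data_classes[r] for r in row_indices]

# predicted has shape (number-of-methods, nrows); two hypothetical methods.
predicted = [
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1],
]


def fold_rows(i):
    """Rows of `data` that were tested in fold i."""
    return [data[r] for r in row_indices[folds[i]]]


def fold_accuracy(m, i):
    """Share of correct predictions by method m within fold i."""
    truth = actual[folds[i]]
    guess = predicted[m][folds[i]]
    return sum(t == g for t, g in zip(truth, guess)) / len(truth)
```

With numpy arrays, fold_rows(i) would correspond to the expression data[row_indices[folds[i]]] given above.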
- train_time¶
training times of batches
- Type
np.ndarray
- test_time¶
testing times of batches
- Type
np.ndarray
- get_augmented_data(model_names, include_attrs=True, include_predictions=True, include_probabilities=True)[source]¶
Return the test data table augmented with meta attributes containing predictions, probabilities (if the task is classification) and fold indices.
- Parameters
- Returns
data augmented with predictions, probabilities and fold indices
- Return type
augmented_data (Orange.data.Table)
- class Orange.evaluation.testing.CrossValidation(data=None, learners=None, preprocessor=None, test_data=None, *, callback=None, store_data=False, store_models=False, n_jobs=None, **kwargs)[source]¶
K-fold cross validation
- random_state¶
Seed for the random number generator (default: 0). If set to None, a different seed is used each time.
- Type
- stratified¶
Flag deciding whether to perform stratified cross-validation. If True but the class sizes don't allow it, non-stratified validation is used instead and a list warning with the warning message(s) is added to the Results.
- Type
bool
- get_indices(data)[source]¶
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k nonoverlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
- Parameters
data (Orange.data.Table) -- test data
- Returns
a list of arrays of indices into data
- Return type
indices (list of np.ndarray)
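As a rough illustration of the structure that get_indices returns for k-fold cross validation, the sketch below splits row numbers into k nonoverlapping test folds. It is a minimal stand-in that uses only the standard library, not Orange's implementation; the function name kfold_test_indices is hypothetical and stratification is not modelled.

```python
import random


def kfold_test_indices(n_rows, k, random_state=0):
    """Split range(n_rows) into k nonoverlapping lists of test indices."""
    order = list(range(n_rows))
    # Random(None) seeds from the system, so None gives a new split each time.
    random.Random(random_state).shuffle(order)
    # Striding through the shuffled order yields fold sizes that differ
    # by at most one, i.e. roughly n_rows / k indices per fold.
    return [sorted(order[i::k]) for i in range(k)]


indices = kfold_test_indices(10, 3)
```

Every row appears in exactly one test fold, so the folds together cover the whole data set.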
- class Orange.evaluation.testing.LeaveOneOut(data=None, learners=None, preprocessor=None, test_data=None, *, callback=None, store_data=False, store_models=False, n_jobs=None, **kwargs)[source]¶
Leave-one-out testing
- get_indices(data)[source]¶
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k nonoverlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
- Parameters
data (Orange.data.Table) -- test data
- Returns
a list of arrays of indices into data
- Return type
indices (list of np.ndarray)
- static prepare_arrays(data, indices)[source]¶
Prepare folds, row_indices and actual.
The method is used by __call__. While functional, it may be overridden in subclasses for speed-ups.
- Parameters
data (Orange.data.Table) -- data used for testing
indices (list of vectors) -- indices of data instances in each test sample
- Returns
folds (np.ndarray): see class documentation
row_indices (np.ndarray): see class documentation
actual (np.ndarray): see class documentation
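One way to picture what prepare_arrays produces is the sketch below, which builds folds, row_indices and actual from per-fold test indices. It is an assumption-laden illustration, not the library code: the name prepare_arrays_sketch is hypothetical, plain lists replace numpy arrays, and a list of target values replaces the data table.

```python
def prepare_arrays_sketch(targets, indices):
    """Build (folds, row_indices, actual) from per-fold test indices."""
    folds, row_indices = [], []
    start = 0
    for fold_idx in indices:
        stop = start + len(fold_idx)
        # Each fold is a contiguous slice into row_indices.
        folds.append(slice(start, stop))
        row_indices.extend(fold_idx)
        start = stop
    # actual[i] is the target value of the i-th test instance.
    actual = [targets[r] for r in row_indices]
    return folds, row_indices, actual


# Leave-one-out: every instance forms its own test sample.
targets = [0, 1, 1, 0]  # hypothetical target values
loo_indices = [[i] for i in range(len(targets))]
folds, row_indices, actual = prepare_arrays_sketch(targets, loo_indices)
```

For leave-one-out the result is one single-element fold per instance, which is presumably why a subclass can produce these arrays more directly.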
- class Orange.evaluation.testing.TestOnTrainingData(data=None, learners=None, preprocessor=None, **kwargs)[source]¶
Test on training data
- class Orange.evaluation.testing.ShuffleSplit(data=None, learners=None, preprocessor=None, test_data=None, *, callback=None, store_data=False, store_models=False, n_jobs=None, **kwargs)[source]¶
Test by repeated random sampling
- test_size¶
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.1. The default will change in version 0.21. It will remain 0.1 only if train_size is unspecified, otherwise it will complement the specified train_size. (From the documentation of scikit-learn's StratifiedShuffleSplit.)
- train_size¶
float, int, or None (default: None). If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size. (From the documentation of scikit-learn's StratifiedShuffleSplit.)
- random_state¶
Seed for the random number generator (default: 0). If set to None, a different seed is used each time.
- Type
- get_indices(data)[source]¶
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k nonoverlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
- Parameters
data (Orange.data.Table) -- test data
- Returns
a list of arrays of indices into data
- Return type
indices (list of np.ndarray)
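The interplay of test_size and train_size described for this class can be sketched as follows. This is a simplified stand-in built on the standard library, not the scikit-learn or Orange code: the names resolve_sizes and shuffle_split_indices are hypothetical, proportions are rounded with round() (scikit-learn's own rounding may differ), raising an error when both sizes are None is a choice made for this sketch, and stratification is omitted.

```python
import random


def resolve_sizes(n_rows, test_size=0.1, train_size=None):
    """Turn float/int/None sizes into absolute (n_train, n_test) counts."""
    def to_count(size):
        if size is None:
            return None
        # Floats are proportions of the data; ints are absolute counts.
        return round(size * n_rows) if isinstance(size, float) else size

    n_test, n_train = to_count(test_size), to_count(train_size)
    if n_test is None and n_train is None:
        raise ValueError("test_size and train_size cannot both be None")
    # A missing size is the complement of the one that was given.
    if n_test is None:
        n_test = n_rows - n_train
    if n_train is None:
        n_train = n_rows - n_test
    if n_train + n_test > n_rows:
        raise ValueError("train and test sizes exceed the number of rows")
    return n_train, n_test


def shuffle_split_indices(n_rows, n_resamples, test_size=0.1,
                          train_size=None, random_state=0):
    """Return n_resamples pairs of (train indices, test indices)."""
    rng = random.Random(random_state)
    n_train, n_test = resolve_sizes(n_rows, test_size, train_size)
    splits = []
    for _ in range(n_resamples):
        order = rng.sample(range(n_rows), n_rows)  # a fresh permutation
        splits.append((order[n_test:n_test + n_train], order[:n_test]))
    return splits


splits = shuffle_split_indices(20, 3)
```

Unlike in k-fold cross validation, the test sets of different resamples are drawn independently, so they may overlap.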
- class Orange.evaluation.testing.TestOnTestData(data=None, test_data=None, learners=None, preprocessor=None, **kwargs)[source]¶
Test on separately provided test data
Note that the class has a different signature for __call__.
- Orange.evaluation.testing.sample(table, n=0.7, stratified=False, replace=False, random_state=None)[source]¶
Samples data instances from a data table. Returns the sample and a dataset containing the instances of the input data table that are not in the sample. Uses several sampling functions from scikit-learn.
- table : data table
A data table from which to sample.
- n : float, int (default = 0.7)
If float, should be between 0.0 and 1.0 and represents the proportion of data instances in the resulting sample. If int, n is the number of data instances in the resulting sample.
- stratified : bool, optional (default = False)
If True, sampling will try to consider class values and match distribution of class values in train and test subsets.
- replace : bool, optional (default = False)
Sample with replacement.
- random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
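The behaviour of the n and replace parameters, and the two returned parts, can be mimicked on a plain list as below. This is only a sketch under stated assumptions: sample_sketch is a hypothetical name, a list of rows stands in for the data table, stratification is left out, and the real function relies on scikit-learn rather than the random module.

```python
import random


def sample_sketch(rows, n=0.7, replace=False, random_state=None):
    """Return (sample, remainder) drawn from a list of rows."""
    rng = random.Random(random_state)
    # A float n is a proportion of the rows; an int n is an absolute count.
    size = round(n * len(rows)) if isinstance(n, float) else n
    if replace:
        # With replacement the same row may be picked more than once.
        picked = [rng.randrange(len(rows)) for _ in range(size)]
    else:
        picked = rng.sample(range(len(rows)), size)
    chosen = set(picked)
    sample = [rows[i] for i in picked]
    # The remainder holds every row that was never picked.
    remainder = [rows[i] for i in range(len(rows)) if i not in chosen]
    return sample, remainder


rows = list(range(10))
train, test = sample_sketch(rows, 0.7, random_state=1)
```

Without replacement the two parts partition the input; with replacement the sample may contain duplicates, and the remainder still consists only of rows that were not drawn.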
- class Orange.evaluation.testing.CrossValidationFeature(data=None, learners=None, preprocessor=None, test_data=None, *, callback=None, store_data=False, store_models=False, n_jobs=None, **kwargs)[source]¶
Cross validation with folds according to values of a feature.
- feature¶
the feature defining the folds
- Type
- get_indices(data)[source]¶
Return a list of arrays of indices of test data instances.
For example, in k-fold CV, the result is a list with k elements, each containing approximately len(data) / k nonoverlapping indices into data.
This method is abstract and must be implemented in derived classes unless they provide their own implementation of the __call__ method.
- Parameters
data (Orange.data.Table) -- test data
- Returns
a list of arrays of indices into data
- Return type
indices (list of np.ndarray)
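Folds defined by the values of a feature amount to grouping row indices by value, which the sketch below shows on a plain list. The function name feature_fold_indices and the example labels are hypothetical, and how Orange orders the folds or treats missing values is not covered here.

```python
def feature_fold_indices(feature_values):
    """Group row indices by feature value; each group is one test fold."""
    groups = {}
    for row, value in enumerate(feature_values):
        groups.setdefault(value, []).append(row)
    # Dicts preserve insertion order, so folds follow first appearance.
    return list(groups.values())


fold_feature = ["a", "b", "a", "c", "b", "a"]  # hypothetical fold labels
indices = feature_fold_indices(fold_feature)
```

The number of folds equals the number of distinct feature values, and fold sizes can be uneven.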