Outlier detection (classification)

One Class Support Vector Machines

class Orange.classification.OneClassSVMLearner(kernel='rbf', degree=3, gamma='auto', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, max_iter=-1, preprocessors=None)[source]

A wrapper for sklearn.svm._classes.OneClassSVM. The following is its documentation:

Unsupervised Outlier Detection.

Estimate the support of a high-dimensional distribution.

The implementation is based on libsvm.

Read more in the User Guide.

preprocessors = [HasClass(), Continuize(), RemoveNaNColumns(), SklImpute(), AdaptiveNormalize(zero_based=..., norm_type=..., transform_class=..., normalize_datetime=..., center=..., scale=...)]

A sequence of data preprocessors to apply to the data prior to fitting the model.
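
The learner follows Orange's usual convention: calling the learner on a data table fits the model, and calling the fitted model on data produces predictions. A minimal sketch of that workflow; the exact representation of the returned predictions (e.g. an inlier/outlier value column or an annotated table) depends on the Orange version:

    import Orange

    data = Orange.data.Table("iris")

    # nu upper-bounds the fraction of training errors (points left outside
    # the estimated support) and lower-bounds the fraction of support vectors.
    learner = Orange.classification.OneClassSVMLearner(nu=0.1)
    model = learner(data)      # fit on the table
    predictions = model(data)  # flag each instance as inlier or outlier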

Elliptic Envelope

class Orange.classification.EllipticEnvelopeLearner(store_precision=True, assume_centered=False, support_fraction=None, contamination=0.1, random_state=None, preprocessors=None)[source]

A wrapper for sklearn.covariance._elliptic_envelope.EllipticEnvelope. The following is its documentation:

An object for detecting outliers in a Gaussian distributed dataset.

Read more in the User Guide.
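
Because the envelope assumes the inliers are Gaussian, it works best on roughly elliptically distributed data. A minimal usage sketch under the same learner(data) → model(data) convention:

    import Orange

    data = Orange.data.Table("iris")

    # contamination: the assumed proportion of outliers in the data; it
    # determines where the cut-off on the Mahalanobis distances is placed.
    learner = Orange.classification.EllipticEnvelopeLearner(contamination=0.1)
    model = learner(data)
    predictions = model(data)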

Local Outlier Factor

class Orange.classification.LocalOutlierFactorLearner(n_neighbors=20, algorithm='auto', leaf_size=30, metric='minkowski', p=2, metric_params=None, contamination='auto', novelty=True, n_jobs=None, preprocessors=None)[source]

A wrapper for sklearn.neighbors._lof.LocalOutlierFactor. The following is its documentation:

Unsupervised Outlier Detection using the Local Outlier Factor (LOF).

The anomaly score of each sample is called the Local Outlier Factor. It measures the local deviation of the density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers.

New in version 0.19.
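
Note that the wrapper defaults to novelty=True, so the fitted model can score data that was not seen during fitting. The behaviour of the underlying estimator can be sketched with sklearn.neighbors.LocalOutlierFactor directly, on made-up toy data:

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.RandomState(0)
    X_train = rng.normal(size=(100, 2))          # dense Gaussian blob
    X_new = np.array([[0.0, 0.0], [6.0, 6.0]])   # one inlier, one outlier

    # novelty=True (the wrapper's default) enables predict() on unseen data;
    # with novelty=False only fit_predict() on the training set is available.
    lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
    print(lof.predict(X_new))  # [ 1 -1]: 1 = inlier, -1 = outlier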

Isolation Forest

class Orange.classification.IsolationForestLearner(n_estimators=100, max_samples='auto', contamination='auto', max_features=1.0, bootstrap=False, n_jobs=None, behaviour='deprecated', random_state=None, verbose=0, warm_start=False, preprocessors=None)[source]

A wrapper for sklearn.ensemble._iforest.IsolationForest. The following is its documentation:

Isolation Forest Algorithm.

Return the anomaly score of each sample using the IsolationForest algorithm.

The IsolationForest 'isolates' observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature.

Since recursive partitioning can be represented by a tree structure, the number of splittings required to isolate a sample is equivalent to the path length from the root node to the terminating node.

This path length, averaged over a forest of such random trees, is a measure of normality and our decision function.

Random partitioning produces noticeably shorter paths for anomalies. Hence, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies.

Read more in the User Guide.

New in version 0.18.
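
The averaged path length is exposed through the fitted estimator's decision function: shorter average paths give lower (negative) scores. A small sketch with the wrapped sklearn.ensemble.IsolationForest on made-up toy data:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(size=(100, 2)),
                   [[6.0, 6.0]]])                # append one clear anomaly

    forest = IsolationForest(n_estimators=100, random_state=0).fit(X)

    # decision_function() is derived from the average path length over the
    # forest; negative scores mark anomalies, positive scores inliers.
    scores = forest.decision_function(X)
    print(scores[-1] < 0)  # True: the appended point is isolated quickly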