Projection (`projection`)¶

PCA¶

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Example¶

>>> from Orange.projection import PCA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> pca = PCA()
>>> model = pca(iris)
>>> model.components_    # PCA components
array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
       [ 0.65653988,  0.72971237, -0.1757674 , -0.07470647],
       [-0.58099728,  0.59641809,  0.07252408,  0.54906091],
       [ 0.31725455, -0.32409435, -0.47971899,  0.75112056]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[-2.684, 0.327, -0.022, 0.001 | Iris-setosa],
[-2.715, -0.170, -0.204, 0.100 | Iris-setosa],
[-2.890, -0.137, 0.025, 0.019 | Iris-setosa],
[-2.746, -0.311, 0.038, -0.076 | Iris-setosa],
[-2.729, 0.334, 0.096, -0.063 | Iris-setosa],
...
]

class Orange.projection.pca.PCA(n_components=None, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None, preprocessors=None)[source]¶

A wrapper for sklearn.decomposition._pca.PCA. The following is its documentation:

Principal component analysis (PCA).

Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.
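The centering and SVD steps described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the scikit-learn or Orange implementation: the helper name pca_svd and the toy data are hypothetical, and solver selection, whitening and sign conventions are ignored.

```python
import numpy as np

def pca_svd(X, n_components):
    """Hypothetical helper: PCA through a full SVD of the centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # centered, but not scaled
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # rows are the principal axes
    # Variance captured along each axis (sample variance, n - 1 denominator).
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    scores = Xc @ components.T                   # data projected onto the axes
    return components, explained_variance, scores

# Toy data with correlated columns (an assumption made for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
components, explained_variance, scores = pca_svd(X, n_components=2)
```

The rows of components are orthonormal, and the columns of scores are uncorrelated, which is the property the PCA description at the top of this page refers to.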

It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.

With sparse inputs, the ARPACK implementation of the truncated SVD can be used (i.e. through scipy.sparse.linalg.svds()). Alternatively, one may consider TruncatedSVD where the data are not centered.

Notice that this class only supports sparse inputs for some solvers such as "arpack" and "covariance_eigh". See TruncatedSVD for an alternative with sparse data.

For a usage example, see the scikit-learn gallery example auto_examples/decomposition/plot_pca_iris.py.

Read more in the scikit-learn User Guide.

class Orange.projection.pca.SparsePCA(n_components=None, alpha=1, ridge_alpha=0.01, max_iter=1000, tol=1e-08, method='lars', n_jobs=1, U_init=None, V_init=None, verbose=False, random_state=None, preprocessors=None)[source]¶

A wrapper for sklearn.decomposition._sparse_pca.SparsePCA. The following is its documentation:

Sparse Principal Components Analysis (SparsePCA).

Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
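To see why an L1 penalty produces sparse components, consider the soft-thresholding operator, the elementwise shrinkage step associated with an L1 penalty. The sketch below is not the solver that SparsePCA runs (its default method is 'lars', per the signature above); the function name soft_threshold and the example values are assumptions chosen only to show how a larger alpha sets more entries exactly to zero.

```python
import numpy as np

def soft_threshold(v, alpha):
    """Shrink each entry toward zero by alpha; entries smaller than alpha become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)

# Hypothetical dense loadings for a single component.
loadings = np.array([-3.0, -0.5, 0.2, 2.0])
sparse_loadings = soft_threshold(loadings, alpha=1.0)
```

With alpha=1.0 the two entries whose magnitude is below 1 are zeroed, and the remaining two are shrunk by 1.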

Read more in the scikit-learn User Guide.

class Orange.projection.pca.IncrementalPCA(n_components=None, whiten=False, copy=True, batch_size=None, preprocessors=None)[source]¶

A wrapper for sklearn.decomposition._incremental_pca.IncrementalPCA. The following is its documentation:

Incremental principal components analysis (IPCA).

Linear dimensionality reduction using Singular Value Decomposition of the data, keeping only the most significant singular vectors to project the data to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.

Depending on the size of the input data, this algorithm can be much more memory efficient than a PCA, and allows sparse input.

This algorithm has constant memory complexity, on the order of batch_size * n_features, enabling use of np.memmap files without loading the entire file into memory. For sparse matrices, the input is converted to dense in batches (in order to be able to subtract the mean) which avoids storing the entire dense matrix at any one time.

The computational overhead of each SVD is O(batch_size * n_features ** 2), but only 2 * batch_size samples remain in memory at a time. There will be n_samples / batch_size SVD computations to get the principal components, versus 1 large SVD of complexity O(n_samples * n_features ** 2) for PCA.
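The batch-wise processing described above can be illustrated with a small NumPy sketch that tracks only the running mean, one of the quantities an incremental method has to maintain before it can center each batch. It is not the IncrementalPCA algorithm itself (no SVD is updated here); the helper name batched_mean and the toy sizes are assumptions.

```python
import numpy as np

def batched_mean(X, batch_size):
    """Hypothetical helper: column means computed one batch at a time."""
    n_seen = 0
    n_batches = 0
    mean = np.zeros(X.shape[1])
    for start in range(0, X.shape[0], batch_size):
        # Only this slice needs to be materialized (X could be an np.memmap).
        batch = np.asarray(X[start:start + batch_size], dtype=float)
        total = n_seen + batch.shape[0]
        # Equivalent to (n_seen * mean + batch.sum(axis=0)) / total.
        mean += (batch.sum(axis=0) - batch.shape[0] * mean) / total
        n_seen = total
        n_batches += 1
    return mean, n_batches

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
mean, n_batches = batched_mean(X, batch_size=100)
```

With 1000 samples and batch_size=100 the loop runs n_samples / batch_size = 10 times, matching the count of per-batch SVD computations given in the text.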

For a usage example, see the scikit-learn gallery example auto_examples/decomposition/plot_incremental_pca.py.

Read more in the scikit-learn User Guide.

Added in version 0.16.

FreeViz¶

FreeViz uses a paradigm borrowed from particle physics: points in the same class attract each other, points from different classes repel each other, and the resulting forces are exerted on the anchors of the attributes, that is, on the unit vectors of each dimensional axis. The points themselves cannot move (they are projected into the projection space), but the attribute anchors can. The optimization is therefore a hill-climbing procedure that ends with the anchors placed so that the forces are in equilibrium.

Example¶

>>> from Orange.projection import FreeViz
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> freeviz = FreeViz()
>>> model = freeviz(iris)
>>> model.components_    # FreeViz components
array([[  3.83487853e-01,   1.38777878e-17],
       [ -6.95058218e-01,   7.18953457e-01],
       [  2.16525357e-01,  -2.65741729e-01],
       [  9.50450079e-02,  -4.53211728e-01]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[-0.157, 2.053 | Iris-setosa],
[0.114, 1.694 | Iris-setosa],
[-0.123, 1.864 | Iris-setosa],
[-0.048, 1.740 | Iris-setosa],
[-0.265, 2.125 | Iris-setosa],
...
]

class Orange.projection.freeviz.FreeViz(weights=None, center=True, scale=True, dim=2, p=1, initial=None, maxiter=500, alpha=0.1, gravity=None, atol=1e-05, preprocessors=None)[source]¶

LDA¶

Linear discriminant analysis is another way of finding a linear transformation of data that reduces the number of dimensions required to represent it. It is often used for dimensionality reduction prior to classification, but can also be used as a classification technique itself ([1]).

Example¶

>>> from Orange.projection import LDA
>>> from Orange.data import Table
>>> iris = Table('iris')
>>> lda = LDA()
>>> model = lda(iris)
>>> model.components_    # LDA components
array([[ 0.20490976,  0.38714331, -0.54648218, -0.71378517],
       [ 0.00898234,  0.58899857, -0.25428655,  0.76703217],
       [-0.71507172,  0.43568045,  0.45568731, -0.30200008],
       [ 0.06449913, -0.35780501, -0.42514529,  0.828895  ]])
>>> transformed_data = model(iris)    # transformed data
>>> transformed_data
[[1.492, 1.905 | Iris-setosa],
[1.258, 1.608 | Iris-setosa],
[1.349, 1.750 | Iris-setosa],
[1.180, 1.639 | Iris-setosa],
[1.510, 1.963 | Iris-setosa],
...
]

class Orange.projection.lda.LDA(solver='svd', shrinkage=None, priors=None, n_components=None, store_covariance=False, tol=0.0001, preprocessors=None)[source]¶

A wrapper for sklearn.discriminant_analysis.LinearDiscriminantAnalysis. The following is its documentation:

Linear Discriminant Analysis.

A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.

The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.

The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions, using the transform method.
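Projection onto the most discriminative directions can be sketched with the classic Fisher formulation: build the within-class and between-class scatter matrices and take the leading eigenvectors of Sw^-1 Sb. This is an illustrative sketch only. The scikit-learn default solver ('svd') reaches its projection by a different route, and the helper name lda_directions and the toy data are assumptions.

```python
import numpy as np

def lda_directions(X, y, n_components):
    """Hypothetical helper: Fisher discriminant directions from scatter matrices."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((n_features, n_features))   # within-class scatter
    Sb = np.zeros((n_features, n_features))   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        Sw += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Eigenvectors of Sw^-1 Sb; at most n_classes - 1 eigenvalues are non-zero.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order]

# Three toy classes in three dimensions (assumed data, not iris).
rng = np.random.default_rng(0)
centers = ([0, 0, 0], [4, 0, 0], [0, 4, 0])
X = np.vstack([rng.normal(loc=m, size=(50, 3)) for m in centers])
y = np.repeat([0, 1, 2], 50)
W = lda_directions(X, y, n_components=2)
Z = X @ W   # data projected onto the two discriminant directions
```

With three classes there are at most two informative directions, so n_components=2 keeps all of the class-separating structure in this toy setting.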

Added in version 0.17.

For a comparison between LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis, see the scikit-learn gallery example auto_examples/classification/plot_lda_qda.py.

Read more in the scikit-learn User Guide.
