Test & Score
============

.. figure:: icons/test-learners.png

Tests learning algorithms on data.

Signals
-------

**Inputs**

- **Data**

  Data for training and, if there is no separate test data set, also for testing.

- **Test Data**

  Separate data for testing.

- **Learner**

  One or more learning algorithms.

**Outputs**

- **Evaluation results**

  Results of testing the algorithms.

Description
-----------

The widget tests learning algorithms. Different sampling schemes are available, including using separate test data. The widget does two things. First, it shows a table with different classifier performance measures, such as `classification accuracy `_ and `area under ROC `_. Second, it outputs evaluation results, which can be used by other widgets for analyzing the performance of classifiers, such as :doc:`ROC Analysis <../evaluation/rocanalysis>` or :doc:`Confusion Matrix <../evaluation/confusionmatrix>`.

The *Learner* signal has an uncommon property: more than one learner widget can be connected to it, so that multiple learners are tested with the same procedure.

.. figure:: images/TestLearners-stamped.png

1. The widget supports various sampling methods.

   - `Cross-validation `_ splits the data into a given number of folds (usually 5 or 10). The algorithm is tested by holding out the examples from one fold at a time: the model is induced from the other folds, and the examples from the held-out fold are classified. This is repeated for all the folds.
   - **Leave-one-out** is similar, but it holds out one instance at a time, inducing the model from all the others and then classifying the held-out instance. This method is very stable and reliable, but also very slow.
   - **Random sampling** randomly splits the data into a training and a testing set in the given proportion (e.g. 70:30); the whole procedure is repeated a specified number of times.
   - **Test on train data** uses the whole data set for training and then for testing. Because the model is tested on the same examples it was induced from, this method practically always gives overly optimistic results.
   - **Test on test data** uses a separate data set for testing. The methods above use only the data from the *Data* input. To test on another data set (for instance, from another file or data selected in another widget), connect it to the *Test Data* input and select *Test on test data*.

2. Measures such as precision, recall and F-1 are computed with respect to a target class, e.g. having the disease or being of subvariety *Iris setosa*. When *Target class* is (None), the scores are averaged over all classes. The target class can be selected at the bottom of the widget.
3. Produce a report.
4. The widget computes a number of performance statistics, listed below for classification and regression.

Classification
--------------

.. figure:: images/TestLearners.png

- `Area under ROC `_ is the area under the receiver operating characteristic (ROC) curve.
- `Classification accuracy `_ is the proportion of correctly classified examples.
- `F-1 `_ is the harmonic mean of precision and recall (see below).
- `Precision `_ is the proportion of true positives among instances classified as positive, e.g. the proportion of instances classified as *Iris virginica* that really are *Iris virginica*.
- `Recall `_ is the proportion of true positives among all positive instances in the data, e.g. the proportion of sick patients who were diagnosed as sick.

Regression
----------

.. figure:: images/TestLearners-regression.png

- `MSE `_ measures the average of the squares of the errors, i.e. of the differences between the predicted and the actual values.
- `RMSE `_ is the square root of the mean of the squared errors (the square root of MSE); it measures how poorly the estimator fits the data, in the units of the target variable.
- `MAE `_ is the mean of the absolute differences between predictions and actual outcomes; it measures how close the predictions are to the eventual outcomes.
- `R2 `_ is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variables.
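The classification and regression measures listed above are standard functions of the actual and predicted values. The following pure-Python sketch computes them from scratch to make the definitions concrete. It assumes the classification measures are taken for one target class and that AUC is computed from per-instance scores (e.g. predicted probabilities) for that class. The function names are hypothetical; this is not Orange's implementation.

```python
import math


def classification_scores(y_true, y_pred, target):
    """Classification accuracy, precision, recall and F-1 for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)  # true positives
    fp = sum(1 for t, p in pairs if t != target and p == target)  # false positives
    fn = sum(1 for t, p in pairs if t == target and p != target)  # false negatives
    ca = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"CA": ca, "Precision": precision, "Recall": recall, "F1": f1}


def auc(y_true, scores, target):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == target]
    neg = [s for t, s in zip(y_true, scores) if t != target]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def regression_scores(y_true, y_pred):
    """MSE, RMSE, MAE and R2 for numeric predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1 - ss_res / ss_tot,
    }


# Toy values, made up for illustration.
actual = ["sick", "healthy", "sick", "sick", "healthy"]
predicted = ["sick", "sick", "healthy", "sick", "healthy"]
print(classification_scores(actual, predicted, target="sick"))  # CA 0.6; precision, recall, F-1 all 2/3
print(auc(actual, [0.9, 0.6, 0.4, 0.8, 0.2], target="sick"))    # 5/6, about 0.833
print(regression_scores([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0]))  # MSE 0.5625, RMSE 0.75, MAE 0.625, R2 about 0.847
```

Note how precision and recall differ only in the denominator: false positives for precision, false negatives for recall.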
Example
-------

In a typical use of the widget, we give it a data set and a few learning algorithms, and we observe their performance in the table inside the :doc:`Test & Score <../evaluation/testlearners>` widget and in the :doc:`ROC Analysis <../evaluation/rocanalysis>` widget. The data is often preprocessed before testing; in this case we did some manual feature selection (:doc:`Select Columns <../data/selectcolumns>` widget) on the *Titanic* data set, keeping only the passengers' sex and status (and the class, *survived*) and omitting age.

.. figure:: images/TestLearners-example-classification.png

Another example of using this widget is presented in the documentation for the :doc:`Confusion Matrix <../evaluation/confusionmatrix>` widget.
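In this example several learners are evaluated with the same sampling procedure. The idea behind cross-validation, as described under the sampling methods, can be sketched in pure Python: the folds are generated once, and every learner is induced and tested on the same folds. The learners (`majority_learner`, `nearest_neighbor_learner`) and all function names here are hypothetical stand-ins, not Orange's API; setting `k` to the number of instances gives leave-one-out.

```python
import random
from collections import Counter


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_learner(train_X, train_y):
    """Hypothetical learner: always predicts the most frequent training class."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbor_learner(train_X, train_y):
    """Hypothetical learner: predicts the class of the closest training instance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_X]
        return train_y[dists.index(min(dists))]
    return predict


def cross_validate(X, y, learners, k=5, seed=0):
    """Classification accuracy of each learner, with all learners tested on the same folds."""
    folds = kfold_indices(len(y), k, seed)
    correct = {name: 0 for name in learners}
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        for name, fit in learners.items():
            # Induce the model from the other folds, then classify the held-out fold.
            model = fit([X[i] for i in train], [y[i] for i in train])
            correct[name] += sum(model(X[i]) == y[i] for i in held_out)
    return {name: c / len(y) for name, c in correct.items()}


# Toy data, made up for illustration; k = len(y) gives leave-one-out.
X = [[0.0], [0.1], [5.0], [5.1]]
y = ["a", "a", "b", "b"]
learners = {"majority": majority_learner, "1-NN": nearest_neighbor_learner}
print(cross_validate(X, y, learners, k=len(y)))  # {'majority': 0.0, '1-NN': 1.0}
```

Because every learner is induced and tested on identical splits, the differences between their scores are not caused by differences in sampling. The toy run also shows a known quirk of leave-one-out: on perfectly balanced data the majority learner scores 0, since holding out an instance always leaves its class in the minority.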