Classification and Regression with the BigML Dashboard
7.6 Evaluation Comparison
Evaluation comparison allows you to identify the algorithms and configurations that yield the highest performance. In BigML you can compare two evaluations side by side or multiple evaluations simultaneously. Both options are explained in the following subsections.
7.6.1 Compare Evaluations Side by Side
In BigML, you can easily compare two evaluations side by side by clicking the compare evaluation option in the 1-click action menu from the evaluation view (Figure 7.91). This option is deprecated for the new evaluations that include the evaluation curves explained in Evaluation curves.
Alternatively, you can use the compare evaluations option from the pop-up menu (Figure 7.92).
You can also click the compare evaluations option in the 1-click action menu from the evaluation list view (Figure 7.93).
Using any of these options, you will be redirected to the Compare Evaluations view, where you will be able to select the evaluations you want to compare (Figure 7.94).
Note: you can only compare two evaluations that use the same testing dataset and the same evaluation configuration; otherwise evaluations are not comparable.
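Comparable evaluations can also be created programmatically. The following is a minimal sketch using the BigML Python bindings, assuming the bigml package is installed and BIGML_USERNAME and BIGML_API_KEY are set in the environment; the resource IDs are hypothetical placeholders. The key point is that both evaluations use the same testing dataset, so they are comparable.

```python
# Minimal sketch (hypothetical resource IDs): creating two evaluations
# against the SAME testing dataset so they can be compared in BigML.
from bigml.api import BigML

api = BigML()  # credentials read from BIGML_USERNAME / BIGML_API_KEY

model_id = "model/0123456789abcdef01234567"            # hypothetical ID
ensemble_id = "ensemble/0123456789abcdef01234568"      # hypothetical ID
test_dataset_id = "dataset/0123456789abcdef01234569"   # hypothetical ID

# Evaluate both resources against the same testing dataset.
evaluation_a = api.create_evaluation(model_id, test_dataset_id)
evaluation_b = api.create_evaluation(ensemble_id, test_dataset_id)
api.ok(evaluation_a)  # wait until each evaluation is finished
api.ok(evaluation_b)
```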
The performance measures of your selected evaluations will be displayed side by side. BigML automatically computes the variation of the right-hand-side evaluation compared to the left-hand-side one. You can find those variations next to each performance measure of the right-hand-side evaluation. You can also select other evaluations from the evaluation selectors (see Figure 7.95).
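As a reference for how those variations can be read, the toy sketch below prints each right-hand metric together with its difference from the left-hand one. The metric values are invented, and the Dashboard computes and formats these variations for you automatically.

```python
# Toy sketch: variations of the right-hand evaluation vs. the left-hand one.
# Metric values are invented; shown as plain differences for illustration.
left = {"Accuracy": 0.9123, "Precision": 0.8854, "Recall": 0.8741}
right = {"Accuracy": 0.9347, "Precision": 0.9012, "Recall": 0.8690}

for metric, left_value in left.items():
    variation = right[metric] - left_value
    print(f"{metric}: {right[metric]:.4f} ({variation:+.4f} vs. left)")
```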
7.6.2 Compare Multiple Evaluations
The main goal of comparing multiple evaluations is to analyze the performance of different models by plotting their evaluation results in a chart. BigML evaluation comparison offers a flexible, visual way to compare multiple classification models built using different algorithms (models, ensembles, logistic regressions, deepnets, or fusions) and/or different configurations, as long as the testing dataset is the same. Cross-validation evaluations are not eligible for the Evaluation comparison tool since they use different testing datasets to compute the averaged measures.
You can compare multiple classification evaluations using the corresponding option from the 1-click action menu, as shown in Figure 7.96. Evaluations built before the introduction of the evaluation curves explained in Evaluation curves no longer show this option.
You can also click the compare multiple evaluations option in the 1-click action menu from the evaluation list view (Figure 7.97).
A chart with the curves for the current evaluation will be displayed. You can select any of the four evaluation curves offered by BigML: the Precision-Recall curve, the ROC curve, the Gain curve, and the Lift curve (see Figure 7.98). You can find a detailed explanation of the evaluation curves in Evaluation curves.
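To make the first two curves concrete, the sketch below (illustrative only, not BigML's implementation) computes a ROC curve and a Precision-Recall curve from a handful of invented positive-class probabilities using scikit-learn; BigML's exact interpolation of the PR AUC may differ.

```python
# Illustrative only (not BigML's code): ROC and Precision-Recall curves
# computed from invented positive-class probabilities with scikit-learn.
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                           # actual classes
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]  # P(positive class)

# Points that would be plotted for each curve.
fpr, tpr, _ = roc_curve(y_true, y_score)
precision, recall, _ = precision_recall_curve(y_true, y_score)

print("ROC AUC:", roc_auc_score(y_true, y_score))
print("PR AUC (average precision):", average_precision_score(y_true, y_score))
```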
You can plot different evaluations in the chart by clicking the evaluation selector shown in Figure 7.99. You will only be able to select other comparable evaluations, i.e., other evaluations that were built using the same testing dataset. BigML allows you to compare up to 150 evaluations simultaneously, selecting them in groups of 20 from the selector.
Your selected evaluations will appear in the chart in different colors. The evaluation curves and metrics will be calculated according to the selected positive class. You can choose to sort your evaluations in the legend by any of the available metrics: the ROC AUC, the PR AUC, the K-S statistic, Kendall's tau, or Spearman's rho. In the legend, each evaluation shows its name, the icon of the model used, the sorting metric, and the configuration parameters used to create the model. By hovering over the evaluations in the legend, you can hide or remove them at any time.
Apart from the ROC AUC and PR AUC metrics shown in the legend, you will also be able to see the PR AUCH and ROC AUCH by hovering over the evaluation names. Again, refer to Evaluation curves for a detailed explanation of the AUCH (Area Under the Convex Hull).
If you select the Gain curve, you will also obtain the K-S statistic by hovering over the evaluation name in the legend. (See Evaluation curves for a detailed explanation.)
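The sorting metrics themselves are standard statistics. The sketch below (again illustrative, not BigML's code) computes Kendall's tau, Spearman's rho, and the K-S statistic from the same kind of invented scores; BigML's exact computation over the Gain curve may differ in detail.

```python
# Illustrative only: the ranking metrics offered for sorting evaluations.
from scipy.stats import kendalltau, ks_2samp, spearmanr

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                           # actual classes
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]  # P(positive class)

# Rank correlation between the actual classes and the predicted scores.
tau, _ = kendalltau(y_true, y_score)
rho, _ = spearmanr(y_true, y_score)

# K-S statistic: maximum separation between the score distributions of the
# positive and the negative instances.
positive_scores = [s for s, t in zip(y_score, y_true) if t == 1]
negative_scores = [s for s, t in zip(y_score, y_true) if t == 0]
ks = ks_2samp(positive_scores, negative_scores).statistic

print(f"Kendall's tau: {tau:.3f}  Spearman's rho: {rho:.3f}  K-S: {ks:.3f}")
```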
The panel below the chart contains a table with all the selected evaluations, their metrics, and a set of model parameters to help you identify which algorithm and configuration performs better (see Figure 7.103):
ROC AUC: the Area Under the Curve of the ROC curve.
PR AUC: the Area Under the Curve of the Precision-Recall curve.
K-S: the K-S statistic of the Gain curve.
Tau: the Kendall’s tau coefficient.
Rho: the Spearman’s rho coefficient.
Model Type: whether the resource evaluated is a single model, an ensemble, a logistic regression, a deepnet, or a fusion.
Number of nodes: maximum number of splits set for training the model or ensemble.
Number of models: number of models set for training the ensemble. For models, it will always be 1.
% of Bagging: percentage of the dataset instances used for training the single trees composing the ensemble.
Randomize: whether the single trees composing an ensemble take a random sample of fields per split.
Boosted: whether the ensemble is composed of Boosted trees.
Balanced: whether the model or ensemble has been built by previously balancing the objective field.
Missing splits: whether the model or ensemble has been built taking into account the missing values in the dataset.
Default Numeric: whether the logistic regression has been built replacing the missing values with the field's mean, maximum, minimum, or zero.
Auto-scaled: whether the fields in the logistic regression have been auto-scaled.
Bias: whether the logistic regression includes the bias term.
You can find a detailed explanation of each configuration parameter in the corresponding section, depending on whether the model is a single tree (section 1.4), an ensemble (section 2.4), a logistic regression (section 4.4), or a deepnet (section 5.4).
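If you want to reproduce a similar summary outside the Dashboard, a comparison table like the one described above can be assembled from your own evaluation metrics. The sketch below uses pandas with invented evaluation names and metric values, purely as an illustration of sorting evaluations by a chosen metric.

```python
# Illustrative only: assembling a comparison table of evaluations with pandas.
# All names and metric values are invented.
import pandas as pd

rows = [
    {"Evaluation": "model eval", "Model Type": "model",
     "ROC AUC": 0.91, "PR AUC": 0.87, "K-S": 0.66, "Tau": 0.58, "Rho": 0.63},
    {"Evaluation": "ensemble eval", "Model Type": "ensemble",
     "ROC AUC": 0.95, "PR AUC": 0.92, "K-S": 0.74, "Tau": 0.64, "Rho": 0.70},
    {"Evaluation": "deepnet eval", "Model Type": "deepnet",
     "ROC AUC": 0.93, "PR AUC": 0.90, "K-S": 0.71, "Tau": 0.61, "Rho": 0.67},
]

# Sort by the chosen metric to see which configuration performs better.
table = pd.DataFrame(rows).sort_values("ROC AUC", ascending=False)
print(table.to_string(index=False))
```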
Exporting Evaluation Comparison
You can export the multiple comparison chart in PNG format, with or without legends, by clicking the icon shown in Figure 7.104. The legend includes the evaluation names, the selected sorting metric, and the model configuration options.