Classification and Regression with the BigML Dashboard

7.5 Visualizing Evaluations

After evaluations are created, you can visualize them in the BigML Dashboard.

Different visualizations are provided for classification (see subsection 7.5.2) and regression (see subsection 7.5.3) evaluations. Both types of evaluations share the original resources information at the top of the view (see subsection 7.5.1).

Cross-validation visualizations are very similar to those of single evaluations; the main differences are explained in subsection 7.5.4.

7.5.1 Original Resources Information

For single evaluations, the original model and testing dataset used to create the evaluation will appear at the top of the evaluations view along with their corresponding icons, so you can go back to those resources at any time. (See Figure 7.80.)

\includegraphics[]{images/evaluations/evaluations-resources-info}
Figure 7.80 Example of an ensemble evaluation top view information

The information for each resource includes some of the parameter values used to create it (the sketch after these lists shows how to read the same information through the API):

  • For models, ensembles, logistic regressions and deepnets:

    • Model type (regression or classification).

    • Objective field name.

    • Model size.

    • Number of fields: used to build the model.

    • Number of instances: used to build the model.

    • Description (see section 1.10).

    • Sampling and ordering options (see subsection 1.4.7 and subsection 1.4.8).

      • Instances: number of sampled instances.

      • Sample rate: percentage of the training dataset used to build the model.

      • Range: instances selected to build the model.

      • Sampling: deterministic or random.

      • Replacement: true or false.

      • Out of bag: true or false.

      • Ordering: linear, random or deterministic.

  • Ensembles also include:

    • Type: Decision Forests or Boosted Trees (see subsection 2.4.3).

    • Number of models: total number of single models in the ensemble.

  • For testing datasets:

    • Dataset size.

    • Number of fields: in the testing dataset.

    • Number of instances: in the testing dataset.

    • Description (see subsection 7.9.2).

    • Sampling and ordering options (see subsection 7.4.4).

      • Instances: number of sampled instances.

      • Sample rate: percentage of the testing dataset used to create the evaluation.

      • Range: instances selected to sample the testing dataset.

      • Sampling: deterministic or random.

      • Replacement: true or false.

      • Out of bag: true or false.

      • Ordering: linear, random or deterministic.
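
All of this information is also available outside the Dashboard. The following minimal sketch uses the BigML Python bindings to fetch an evaluation and the resources behind it; the resource ID is a placeholder, and the metadata keys (such as sample_rate or rows) are assumptions based on the API's JSON resource format, so check the fields of your own resources:

    from bigml.api import BigML

    # Credentials are read from the BIGML_USERNAME and BIGML_API_KEY
    # environment variables.
    api = BigML()

    # Placeholder ID: replace it with an evaluation ID from your Dashboard.
    evaluation = api.get_evaluation("evaluation/537dbd27d99402519c000000")

    # The evaluation references the model and the testing dataset it used.
    model = api.get_model(evaluation["object"]["model"])
    dataset = api.get_dataset(evaluation["object"]["dataset"])

    # Assumed metadata keys, following the API's JSON resource format.
    print(model["object"].get("objective_fields"))   # objective field ids
    print(model["object"].get("sample_rate"))        # sampling options
    print(dataset["object"].get("rows"))             # number of instances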

7.5.2 Classification Evaluations

For classification models, BigML provides several views that are accessible by clicking the corresponding icons in the top menu.

General Evaluation

This first visualization provides the general confusion matrix, which contains the correctly classified instances as well as the errors made by the model for each of the objective field classes, without applying any threshold.

In the table rows you can find the predictions for the positive class and the negative classes, and in the columns the actual instances in the testing dataset. You can transpose rows and columns by clicking the switcher shown in Figure 7.81. You can also download the confusion matrix in Excel format by clicking the option shown in Figure 7.81.

\includegraphics[]{images/evaluations/confusion-matrix-view}
Figure 7.81 Confusion matrix view

If you select a positive class in the selector, the confusion matrix cells will be painted according to the True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN). Hovering over a class in the table also shows these colors (see Figure 7.82).

\includegraphics[]{images/evaluations/confusion-matrix-view2}
Figure 7.82 Select a positive class

In the metrics below the confusion matrix you will see the performance measures for the class selected as the positive class, such as Precision, Recall, F-Measure, Phi Coefficient and Accuracy. If you select “All classes”, you will get the averages for the overall model. You can also compare these metrics against two other types of models: one using the mode as its prediction and the other predicting a random class of the objective field. At the very least, you would expect your model to outperform these weak benchmarks. (See Figure 7.83.)

\includegraphics[]{images/evaluations/confusion-matrix-view3}
Figure 7.83 Performance metrics and benchmark models view
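
For reference, all five measures can be derived from the four counts of the confusion matrix. Below is a minimal Python sketch (the function name and the example counts are illustrative, not BigML code) that reproduces them for a chosen positive class:

    import math

    def classification_metrics(tp, fp, tn, fn):
        """Compute the per-class measures shown below the confusion matrix.

        tp, fp, tn, fn are the confusion matrix counts for the chosen
        positive class (all remaining classes count as negatives).
        """
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if (precision + recall) else 0.0)
        accuracy = (tp + tn) / (tp + fp + tn + fn)
        # Phi coefficient: +1 is perfect agreement, 0 is no better than
        # chance, -1 is perfect disagreement with the actual classes.
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        phi = ((tp * tn - fp * fn) / denom) if denom else 0.0
        return {"precision": precision, "recall": recall,
                "f_measure": f_measure, "phi": phi, "accuracy": accuracy}

    print(classification_metrics(tp=80, fp=10, tn=95, fn=15))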

The confusion matrix will display up to five different classes. If your model contains more than five classes in the objective field, you can download the confusion matrix in Excel format by clicking the option shown in Figure 7.84.

\includegraphics[]{images/evaluations/download-confusion-matrix}
Figure 7.84 Download confusion matrix

Evaluation curves

BigML offers four different evaluation curves: the ROC curve, the Precision-Recall curve, the Gain curve and the Lift curve. All of them are explained in detail in Evaluation curves.

You can select the positive class and the threshold for each curve, and you will see the confusion matrix values and the metrics change in real time (see Figure 7.85).

\includegraphics[]{images/evaluations/curves}
Figure 7.85 Select the evaluation curve, the positive class and the threshold

The confusion matrices for the curves always have two columns and two rows, even if there are more than two classes in the objective field. This is because the positive class is the one predicted given a certain threshold; if the threshold is not met, the rest of the classes are aggregated into a single “Negative class” which is predicted instead. Read more about positive and negative classes in Confusion Matrix.
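
The logic behind these two-by-two matrices can be sketched as follows; the class names, probabilities and threshold in this Python snippet are hypothetical:

    def binarize(probabilities, positive_class, threshold):
        """Collapse a multi-class probability map into a binary prediction.

        The positive class is predicted only when its probability meets
        the threshold; otherwise the remaining classes fold into a single
        "Negative class".
        """
        if probabilities[positive_class] >= threshold:
            return positive_class
        return "Negative class"

    # Illustrative per-instance class probabilities and actual classes.
    predictions = [
        ({"setosa": 0.7, "versicolor": 0.2, "virginica": 0.1}, "setosa"),
        ({"setosa": 0.4, "versicolor": 0.5, "virginica": 0.1}, "versicolor"),
        ({"setosa": 0.6, "versicolor": 0.3, "virginica": 0.1}, "versicolor"),
    ]

    # Build the 2x2 confusion matrix for "setosa" at threshold 0.5.
    tp = fp = tn = fn = 0
    for probs, actual in predictions:
        predicted = binarize(probs, "setosa", threshold=0.5)
        actual_positive = (actual == "setosa")
        if predicted == "setosa":
            tp += actual_positive
            fp += not actual_positive
        else:
            fn += actual_positive
            tn += not actual_positive
    print(tp, fp, tn, fn)  # -> 1 1 1 0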

From each curve visualization you can:

  • Download the confusion matrix in Excel format.

  • Download the chart in PNG format with or without legends.

\includegraphics[]{images/evaluations/export-options}
Figure 7.86 Export options for the threshold confusion matrices and the evaluation curves

7.5.3 Regression Evaluations

You can visualize the regression measures in the green boxed histograms: Mean Absolute Error, Mean Squared Error and R Squared. Read more about regression measures in subsection 7.2.2.

\includegraphics[]{images/evaluations/regression_measures}
Figure 7.87 Performance measures for regression models, ensembles, deepnets, and fusions

By default, BigML provides the measures of two other types of models to compare against your model's performance. One of them uses the mean as its prediction and the other predicts a random value in the range of the objective field. At the very least, you would expect your model to outperform these weak benchmarks. You can remove the benchmarks by clicking the icons shown in Figure 7.88.

\includegraphics[]{images/evaluations/compare-model-regression}
Figure 7.88 Compare regression models against random and mean values
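
For reference, the three measures and the mean benchmark can be reproduced with a few lines of Python. The values below are made up, and note that the Dashboard's mean benchmark uses the training mean, while this sketch computes the mean on the same test values (which makes its R Squared exactly 0):

    def regression_measures(actual, predicted):
        """Compute the three regression measures shown in the histograms."""
        n = len(actual)
        mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
        mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
        mean_actual = sum(actual) / n
        ss_total = sum((a - mean_actual) ** 2 for a in actual)
        ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
        r_squared = 1 - ss_residual / ss_total
        return {"mae": mae, "mse": mse, "r_squared": r_squared}

    actual = [3.0, 5.0, 8.0, 10.0]      # hypothetical test values
    predicted = [2.5, 5.5, 7.0, 10.5]   # hypothetical model predictions

    print(regression_measures(actual, predicted))

    # The mean benchmark predicts the mean everywhere; a useful model
    # should beat it on all three measures.
    mean_benchmark = [sum(actual) / len(actual)] * len(actual)
    print(regression_measures(actual, mean_benchmark))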

7.5.4 Cross-Validation Visualization

Since cross-validation measures are calculated by averaging the \(k\) single evaluations, you will not find the confusion matrix and evaluation curves explained in subsection 7.5.2. Instead, the main metrics are plotted in different histograms. Additionally, you can see the standard deviation per measure under the sigma symbol icon. You can find a cross-validation example for a classification model below.

\includegraphics[]{images/evaluations/cross-validation-view}
Figure 7.89 Cross-validation view for classification measures
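
The aggregation itself is simple: each measure is averaged over the \(k\) single evaluations, and the dispersion behind the sigma icon is the standard deviation across them. A minimal Python sketch with made-up fold accuracies (whether BigML uses the sample or population standard deviation is not stated here; the snippet uses the sample variant):

    import statistics

    # Hypothetical accuracies from k = 5 single evaluations.
    fold_accuracies = [0.91, 0.88, 0.93, 0.90, 0.89]

    mean_accuracy = statistics.mean(fold_accuracies)
    sigma = statistics.stdev(fold_accuracies)  # sample standard deviation

    print(f"accuracy: {mean_accuracy:.3f} (sigma {sigma:.3f})")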

At the top of the view, you can find the list of the single evaluations used to compute the cross-validation measures by clicking on the Evaluations panel. (See Figure 7.90.)

\includegraphics[]{images/evaluations/cross-val-sing-eval}
Figure 7.90 Cross-validation single evaluations