Classification and Regression with the BigML Dashboard
5.5 Visualizing Deepnets
If the dataset used to create a deepnet does not contain images, you can analyze your results with the BigML Partial Dependence Plot (PDP) after the deepnet is created. If the dataset contains images, you can analyze your results with the BigML Image Deepnet Page. In either case, you can also inspect the field importances.
5.5.1 Partial Dependence Plot
Partial Dependence Plot (PDP) is a heatmap chart for examining the impact of the input fields on the objective field.
The PDP view is composed of three main parts: the CHART itself, the PREDICTION legend and the INPUT FIELDS form. (See Figure 5.31 .) You can find a detailed explanation of each one below.
The CHART allows you to view the impact of two input fields on the objective class predictions while keeping the rest of the input field values constant. You can select any categorical or numeric input field for each axis, and the class probabilities are represented by different colors in the heatmap. You can switch the axes by clicking the option on top of the chart area. (See Figure 5.32.)
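Conceptually, a PDP is built by evaluating the model over a grid of values for the two selected fields while every other input stays fixed. The sketch below illustrates the idea with a generic prediction function; the field names and the toy scoring function are placeholders for illustration only, not BigML's implementation.

    # Minimal sketch of a two-field partial dependence grid (illustrative only).
    # `predict` stands for any function mapping a complete set of input field
    # values to a class probability.

    def partial_dependence_grid(predict, fixed_inputs, field_x, x_values,
                                field_y, y_values):
        """Return a {(x, y): probability} map, varying only field_x and field_y."""
        grid = {}
        for x in x_values:
            for y in y_values:
                inputs = dict(fixed_inputs)   # all other input fields stay constant
                inputs[field_x] = x
                inputs[field_y] = y
                grid[(x, y)] = predict(inputs)
        return grid

    # Example with a made-up scoring function for a two-class problem:
    toy_predict = lambda row: min(1.0, 0.02 * row["age"] + 0.1 * row["pregnancies"])
    heatmap = partial_dependence_grid(toy_predict,
                                      fixed_inputs={"bmi": 30.0},
                                      field_x="age", x_values=range(20, 70, 10),
                                      field_y="pregnancies", y_values=range(0, 6))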
You can find the values for the fields on the axes in the grey area next to the selector. Freeze the view by pressing the indicated key on your keyboard and release it again by pressing the corresponding key. When the view is frozen, an edit icon appears in this grey area so you can edit the axis values and obtain the prediction for another preferred value. (See Figure 5.32.)
The PREDICTION legend allows you to visualize the objective field classes (classification deepnets) or the predicted value (regression deepnets). For classification, each class is represented by a color; the main probability color bar at the top shows the probability for the predicted class. By default, colors are shaded according to the prediction range shown in the chart area, so that smaller differences in predictions are easier to perceive. However, you can choose to shade the colors according to the total range of possible values for the objective field by clicking the icon next to the prediction bar. (See Figure 5.33.) You can also choose to see only one of the classes using the class selector at the bottom of the legend. Again, you can freeze and release this view with the same keyboard keys.
Below the chart legend, you can find the INPUT FIELDS form. (See Figure 5.34.) You can configure the values for any numeric, categorical, text or items field. As you change the values, you can see the predictions updating in real time. You can also enable or disable the input fields, so they will be treated as missing values. You can sort the fields by their importance for predicting the objective field. (See subsection 5.4.4.)
Moreover, the chart includes a reset option for your input field values and an export option to download your chart in PNG format, both explained below:
After selecting the fields for the axes or configuring the input field values, you can return to the default view by clicking the reset icon highlighted in Figure 5.35.
You can also export your chart in PNG format, with or without legends. Freeze the view from your keyboard as described above and export the chart to get the class percentages in the legend; release the view again afterwards.
Note: there are some limitations on the number of objective field classes and the number of input fields that can be visualized in the chart (explained in subsection 5.8.1).
5.5.2 Image Deepnet Page
As stated in the previous section, when the dataset used to create a deepnet contains images, the deepnet created will be a convolutional neural network (CNN). See the explanation in subsection 5.2.1 .
A CNN deepnet uses images (i.e. raw image pixels) as input fields. If there are image feature fields that were extracted from those same images in the dataset, they are ignored during CNN training. After a CNN deepnet is created, you can analyze the results with BigML Image Deepnet Page.
The top row of the Image Deepnet Page shows the applicable parameters of the deepnet, which may include its number of hidden layers, its algorithm and its optimization option. It also lists the objective field of the deepnet and its number of instances.
Below the top row, the Image Deepnet Page shows the performance of the deepnet on a set of sampled instances. This set, called the holdout set, is used for validation during deepnet training.
Depending on the objective field, the Image Deepnet Page has two variations.
Image Deepnet Page - Classification
If the objective field is categorical, the Image Deepnet Page is the classification variation.
Below the top row, the view is composed of three main sections: the Image Results, the Performance Panel and the Class List. See Figure 5.37 .
As stated above, the Image Deepnet Page mainly shows the performance of the deepnet on a set of sampled instances, called the holdout set. The number of instances in the holdout set is at most 20% of the total instances in the dataset or 1024, whichever is smaller.
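As a quick sketch of that rule (assuming the 20% figure is rounded to a whole number of instances):

    def holdout_size(total_instances):
        # At most 20% of the dataset, capped at 1024 instances (rounding assumed).
        return min(round(0.2 * total_instances), 1024)

    holdout_size(5100)    # -> 1020
    holdout_size(100000)  # -> 1024 (the cap applies for large datasets)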
By default, the Performance Panel shows the overall performance of the deepnet on the holdout set. See Figure 5.38. However, when a class is selected in the Class List, the Performance Panel shows the performance for that specific class.
The performance metrics, for both correctly predicted and incorrectly predicted images, are provided as a percentage and a count. For instance, when the count is 893/1020 as shown in Figure 5.38, that means 893 out of the 1020 images in the holdout set were predicted correctly.
The probability slider allows users to filter the results by selecting the range of probabilities for each image classification. This controls how many images are shown in the Image Results section.
The Image Results section has two subsections, Correctly predicted and Incorrectly predicted, each a paginated list of images from the holdout set. In each list, every image has a caption showing its predicted class, true class and probability. In the Correctly predicted subsection, because the predicted class is the same as the true class, only one is shown in the caption. In the Incorrectly predicted subsection, both the predicted class and the true class are shown for each image. The length of the solid color in a probability bar is proportional to its value. See Figure 5.39.
On the top right of each subsection are the performance metrics by percentage and count. The list of images can be paged through using the pagination arrows at the bottom of each subsection. Each page shows up to six images, scaled to fit the area.
When users mouse over an image, a popup box shows the prediction result for the image: predicted class, true class and probability. In the example shown in Figure 5.39, the popup box has a red background, signaling an incorrect prediction: the predicted class is frog while the true class is bird, and the probability is 70.08%.
The Class List shows all the classes in the holdout set, sorted by their number of occurrences. In other words, the list is ranked by class popularity. See Figure 5.40. The recall of each class is displayed as a percentage and a count at the right side of its class bar. Recall is defined as the number of correctly predicted images for the class divided by the total number of images for the class. The length of the green bar relative to the full class bar corresponds to the recall of the class.
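The recall computation described above can be sketched as follows, assuming the holdout results are available as (true class, predicted class) pairs:

    from collections import defaultdict

    def recall_by_class(results):
        """results: iterable of (true_class, predicted_class) pairs."""
        correct = defaultdict(int)
        total = defaultdict(int)
        for true_class, predicted_class in results:
            total[true_class] += 1
            if predicted_class == true_class:
                correct[true_class] += 1
        # Recall = correctly predicted images of a class / total images of that class
        return {c: correct[c] / total[c] for c in total}

    recall_by_class([("truck", "truck"), ("truck", "car"), ("bird", "bird")])
    # -> {'truck': 0.5, 'bird': 1.0}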
The Class List is scrollable when there is not enough space to show all classes together. There is also a download icon in the section heading that users can use to download the class list as a CSV file.
There are two controls that can change the number of images to show in Image Results section. One is the probability slider in the Performance Panel. Another is a class bar in the Class List.
By default, the Image Results section shows all the images in the holdout set, both correctly predicted and incorrectly predicted. The probability slider displays the lower and upper ends of the probabilities associated with all the classification results in the set. Either end of the slider can be changed by dragging the respective knob; the Image Results section will then only show images with probabilities within the range of the slider. Images with probabilities outside the range are filtered out.
As in Figure 5.41, the lower end of the probability slider was changed to 64.0%. Now any image with a classification probability lower than 64.0% does not show up in the Image Results section, in either the correctly predicted or the incorrectly predicted subsection. The performance metrics change as well, from 893/1020 to 856/928, reflecting the smaller number of images shown.
The probability slider can be reset to the default by dragging its knobs to include all possible probabilities, or by pressing the reset icon in the heading of the Performance Panel.
Both ends of the probability slider are editable text input boxes, so users can enter precise numbers if so desired.
In the Class List, a class can be selected by clicking on its class bar. When this happens, the Image Results section will only show images of that class. The Correctly Predicted subsection shows all images whose true class is the selected class and whose predicted class is that class too. The Incorrectly Predicted subsection shows all images whose true class is the selected class but whose predicted class is a different class. The performance metrics change accordingly.
In Figure 5.42 above, the class truck is selected, as shown by its darkened background. Now the Image Results section shows 89 images in the correctly predicted subsection, all trucks, and 13 images in the incorrectly predicted subsection, all trucks predicted as something else.
Also in Figure 5.42, note that in the Performance Panel not only are the performance metrics changed accordingly, but the title also becomes “Performance for truck” instead of “Performance overall”.
To reset the Image Results to include all classes, use the reset icon in the Performance Panel heading.
The class selection and the probability slider can be combined to show only images of one class which has a selected range of probabilities.
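Conceptually, the class selection and the probability slider compose as a simple conjunction of filters. The following sketch assumes each holdout record carries its true class, predicted class and probability; it is an illustration, not the Dashboard's code:

    def filter_results(records, selected_class=None, prob_range=(0.0, 1.0)):
        """records: iterable of dicts with 'true', 'predicted' and 'probability' keys.
        Returns (correctly_predicted, incorrectly_predicted) after applying both
        the class selection and the probability-slider range."""
        low, high = prob_range
        correct, incorrect = [], []
        for r in records:
            if selected_class is not None and r["true"] != selected_class:
                continue                      # class bar filter: keep the selected true class only
            if not (low <= r["probability"] <= high):
                continue                      # probability slider filter
            (correct if r["predicted"] == r["true"] else incorrect).append(r)
        return correct, incorrect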
Image Deepnet Page - Regression
If the objective field is numeric, the Image Deepnet Page is the regression variation.
Below the top row, the view is composed of two main sections: the Image Results and the Performance Panel. See Figure 5.43 . The example deepnet in the figures of this section was created from a dataset for estimating the number of penguins in images, with its numeric objective field “count”.
As stated above, the Image Deepnet Page mainly shows the performance of the deepnet on a set of sampled instances, called the holdout set. The number of instances in the holdout set is at most 20% of the total instances in the dataset or 1024, whichever is smaller.
The Performance Panel shows the performance of the deepnet on the holdout set in terms of error percentages. All numeric predictions have errors. A percentage value, called the split, is used to divide all predictions in the holdout set into two groups. The default split is 10%, which can be changed by users.
The size of the holdout set is shown right below the panel header. It also shows how many instances are displayed, as this can be changed by a filter.
As seen in Figure 5.44, the split is at 10% by default, so the images are divided into two groups, one “Predicted within 10.00% error”, the other “Predicted outside of 10.00% error”. Both percentages and fractions are used to show the relative sizes of the two groups with regard to the split. In the example, there are 2 images predicted within 10% error, so this group shows 6.67%, or 2/30.
“Average percent error” is a metric showing the average error a prediction has in each group. It is the sum of the absolute values of the percentage errors of all instances, divided by the number of instances.
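As a worked sketch of that definition (assuming each instance's percentage error is measured relative to its true value):

    def average_percent_error(pairs):
        """pairs: iterable of (true_value, predicted_value) for one group of images."""
        errors = [abs(predicted - true) / abs(true) * 100.0
                  for true, predicted in pairs]
        return sum(errors) / len(errors)

    average_percent_error([(6.0, 2.90374), (10.0, 9.5)])  # -> about 28.3 (%)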
There are two controls that can change how the Image Results section appears.
The DISPLAY ERROR slider allows users to filter the results by selecting the range of errors for each prediction. This controls how many images are shown in the Image Results section.
By default, the Image Results section shows all the images in the holdout set, which are divided into two groups by the split. The DISPLAY ERROR slider shows the greatest negative error at its left end and the greatest positive error at its right end. Note: when a prediction is greater than the true value, it produces a positive error; conversely, when it is smaller, a negative error. Either end of the slider can be changed by dragging the respective knob; the Image Results section will then only show images with prediction errors within the range of the slider. Images with prediction errors outside the range are filtered out.
As in Figure 5.45, the lower end of the DISPLAY ERROR slider was changed to -15.0 and the upper end to 0.5. Now any image with a prediction error outside of that range, i.e. either less than -15.0 or greater than 0.5, does not show up in the Image Results section. There are 21 images displayed, instead of 30, as indicated inside the parentheses below the panel heading. The relative sizes of the split groups change as well, from 2/30 and 28/30 to 2/21 and 19/21, respectively, reflecting the smaller number of images displayed.
The DISPLAY ERROR slider can be reset to the default by dragging its knobs to include all possible errors, or by pressing the reset icon in the heading of the Performance Panel.
Both ends of the DISPLAY ERROR slider are editable text input boxes, so users can enter precise numbers if so desired.
The SPLIT ERROR slider allows users to select the split, which is the error percentage used to divide the Image Results section into two groups: one having prediction errors smaller than the split, and another having prediction errors bigger than the split.
As seen in Figure 5.46, moving the knob on the SPLIT ERROR slider changes the split value, which is shown above the slider, anywhere between 0% and the biggest error percentage produced by the holdout set. In addition, the total number of instances displayed is shown below the slider: on the left of the slider is the number of instances in the group with prediction errors smaller than the split, and on the right is the number of instances in the group with prediction errors bigger than the split. The percentages in parentheses give each group's size with respect to the total number of instances. As the split value changes, both group sizes (and percentages) on the left and right sides change accordingly. This gives a direct visualization of how the split affects the partition of the Image Results section.
The SPLIT ERROR slider can be reset to the default value, 10%, by pressing the reset icon in the heading of the Performance Panel.
The split value above the SPLIT ERROR slider is an editable text input box, so users can enter a precise split value if so desired.
The total number of instances below the SPLIT ERROR slider may be affected by the DISPLAY ERROR slider.
The Image Results section shows two subsections, which represent two groups divided by the split value. The default split value is 10%. One group is called “Predicted within 10% error”, which has the images with prediction errors less than 10%. Another group is called “Predicted outside of 10% error”, which has the images with prediction errors greater than 10%. When the split value is changed by the SPLIT ERROR slider in the Performance Panel, the subsection titles and images change accordingly.
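A sketch of how the split divides the holdout images into the two groups, under the same assumption that percentage errors are measured relative to the true values:

    def split_groups(records, split_percent=10.0):
        """records: iterable of dicts with 'true' and 'predicted' keys.
        Returns (within_split, outside_split) using the absolute percent error."""
        within, outside = [], []
        for r in records:
            percent_error = abs(r["predicted"] - r["true"]) / abs(r["true"]) * 100.0
            (within if percent_error <= split_percent else outside).append(r)
        return within, outside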
As seen in Figure 5.47, each subsection in the Image Results section is a paginated list of images from the holdout set. In each list, every image has a caption which shows its predicted value, true value, error and error percentage.
On the top right of each subsection is the relative size of the subsection by percentage and count, with respect to the total number of instances displayed. The list of images can be paged through using the pagination arrows at the bottom of each subsection. Each page shows up to six images, scaled to fit the area.
When users mouse over an image, a popup box shows the prediction result for the image: predicted value, true value and the error, which is defined as (predicted - true). In the example shown in Figure 5.47, the popup box has a red background, signaling a prediction whose error is greater than the split value; its predicted value is 2.90374, its true value is 6.00000 and the error is -3.09626.
The total number of images displayed in the Image Results section may be affected by the DISPLAY ERROR slider in the Performance Panel.
5.5.3 Summary Report
Field Importances
The field importances for deepnets provide a measure of how important an input field is, relative to the others, for predicting the objective field. Each field importance is normalized to take values between 0% and 100%, and all field importances sum to 100%. You can access them by clicking on the Summary Report option shown in Figure 5.48. You can also export the field importances in PNG, CSV and JSON formats.
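For reference, percentages like these can be reproduced from any set of non-negative raw importance scores; the sketch below is illustrative only, not BigML's internal computation:

    def normalize_importances(raw_scores):
        """raw_scores: {field_name: non-negative importance}.
        Returns percentages that sum to 100."""
        total = sum(raw_scores.values())
        return {field: 100.0 * score / total for field, score in raw_scores.items()}

    normalize_importances({"glucose": 2.4, "bmi": 1.2, "age": 0.4})
    # -> {'glucose': 60.0, 'bmi': 30.0, 'age': 10.0}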
Deepnet field importances are based on Shapley values. For more information, please refer to this research paper: A Unified Approach to Interpreting Model Predictions [ 25 ] .
Summary
If you created your deepnet using the Automatic Network Search option (see subsection 5.4.2), you will be able to see a tab called “Summary” next to the field importances tab (see Figure 5.49). Here you can find the configuration of each of the networks composing the deepnet, in JSON format. The parameters shown for each network are explained in section 5.4.