Anomaly Detection with the BigML Dashboard
6.2 Creating Anomaly Scores
BigML provides two different ways to predict scores for new instances using your anomaly detector:
ANOMALY SCORE: to score single instances.
BATCH ANOMALY SCORE: to score multiple instances simultaneously.
6.2.1 Anomaly Score
To score single new instances, BigML provides a form containing the fields used by the anomaly detector, so you can easily configure the input data and get an immediate response.
Follow these steps to create your anomaly score:
Click the Anomaly score option in the 1-click action menu. (See Figure 6.6.)
Alternatively, click Anomaly score in the pop-up menu from the anomaly list view, as shown in Figure 6.7.
You will be redirected to the prediction form, where you will find all the input fields used by the anomaly to compute the anomaly score. (See Figure 6.8.)
Select the input fields and set their values. Depending on the field type, you will need to input the values in different ways. (See Figure 6.8.)
Numeric fields: move the slider or input a specific value in the box.
Categorical fields: select one class from the selector.
Note: text and items fields are not supported for creating anomalies (see section 2.2).
Click the button shown in Figure 6.8 to get the anomaly score at the top of the form. The score is saved automatically so you can find it afterwards in the prediction list view. (See Figure 6.1.)
Note: from the BigML Dashboard, single anomaly scores are only available for anomalies with fewer than 100 fields. To create single anomaly scores for anomalies with a higher number of fields, you can use the BigML API.
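For anomalies that have to be scored through the BigML API, the input described in the steps above (a number for each numeric field, one class for each categorical field) can be sketched as a JSON request body. This is only a minimal sketch: the `anomaly` and `input_data` keys are assumed to match the BigML REST API for anomaly scores, and the resource ID, the field names and the helper `build_anomaly_score_body` are hypothetical. Nothing is sent over the network.

```python
import json

# Placeholder ID for illustration only; it does not refer to a real resource.
ANOMALY_ID = "anomaly/0123456789abcdef01234567"


def build_anomaly_score_body(anomaly_id, input_data):
    """Return the JSON body for one single-instance anomaly score request.

    Assumption: the API expects an "anomaly" resource ID and an
    "input_data" mapping from field name (or field ID) to value.
    """
    if not anomaly_id.startswith("anomaly/"):
        raise ValueError("expected an ID of the form 'anomaly/<id>'")
    for name, value in input_data.items():
        # Numeric fields take a number; categorical fields take one class (a string).
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise TypeError(f"unsupported value for field {name!r}: {value!r}")
    return json.dumps({"anomaly": anomaly_id, "input_data": input_data}, sort_keys=True)


# Hypothetical input: one numeric field and one categorical field.
body = build_anomaly_score_body(ANOMALY_ID, {"duration": 12.5, "protocol": "tcp"})
print(body)
```

The check rejects lists and other structured values because the form described above only accepts one value per field.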
Anomaly Score with Images
BigML anomalies can be trained from images using extracted image features (section 2.4). Because image features are automatically generated numeric fields, creating anomaly scores with images is the same as creating other anomaly scores. The only difference is how the image input fields are filled in.
Note: when the input fields contain images, BigML automatically extracts image features to match those used in the dataset that trained the anomaly.
The anomaly in Figure 6.9, “firetruck resnet18”, was created from a dataset containing image features extracted from a pre-trained CNN, ResNet-18. Creating an anomaly score with this anomaly takes you to the prediction form, which presents all input fields used by the anomaly. One of them is the image field. Because this is a single anomaly score, which is a single prediction, the image is provided as a single image source. Click the input field box to see the available single image sources in the dropdown list. There is also a search box that can be used to locate specific ones.
Single image sources are often used to create a composite source, in which case they become component sources of that composite. An image may also have been uploaded as part of an archive file (zip/tar) that created a composite source. In those cases, the composite source is shown in the dropdown list along with a “List components” icon. In the example in Figure 6.10, anomaly-scores.zip is a composite source; click the icon to show its component sources.
After the component sources of the composite are listed, scroll the dropdown list to find the desired one, then click to select it, as shown in Figure 6.11. There is also a search box to locate specific component sources.
After a new anomaly score is created, as shown in Figure 6.12, the score appears at the top of the form. The anomaly score interface is the same as for scores created from non-image anomalies. Everything described earlier in this section (subsection 6.2.1) applies.
6.2.2 Batch Anomaly Scores
BigML batch anomaly scores allow you to make predictions for multiple instances simultaneously. All you need is the anomaly detector you want to use and a dataset containing the instances for which you want to obtain the scores. BigML will create a score for each instance.
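For readers who script this through the BigML API rather than the Dashboard, the two ingredients named above (an anomaly detector and a dataset) can be sketched as a request body. This is a sketch under assumptions: the `anomaly` and `dataset` keys are assumed to match the BigML REST API for batch anomaly scores, and the resource IDs and the helper `build_batch_anomaly_score_body` are hypothetical. No request is sent.

```python
import json


def build_batch_anomaly_score_body(anomaly_id, dataset_id):
    """Return a JSON body naming the detector and the dataset to score.

    Assumption: the API identifies both resources by IDs of the form
    'anomaly/<id>' and 'dataset/<id>'.
    """
    if not anomaly_id.startswith("anomaly/"):
        raise ValueError("expected an ID of the form 'anomaly/<id>'")
    if not dataset_id.startswith("dataset/"):
        raise ValueError("expected an ID of the form 'dataset/<id>'")
    return json.dumps({"anomaly": anomaly_id, "dataset": dataset_id}, sort_keys=True)


# Placeholder IDs for illustration only; they do not refer to real resources.
batch_body = build_batch_anomaly_score_body(
    "anomaly/0123456789abcdef01234567",
    "dataset/89abcdef0123456789abcdef",
)
print(batch_body)
```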
Follow these steps to create a batch anomaly score:
Click the Batch anomaly score option in the anomaly 1-click action menu. (See Figure 6.13.)
Alternatively, click Create batch anomaly score in the pop-up menu from the anomaly list view, as shown in Figure 6.14.
Select the dataset containing all the instances you want to score. (See Figure 6.15.) The instances should contain the input values for the fields used by the anomaly detector. BigML batch anomaly scores can handle missing data in your dataset, as explained in section 2.2. From this view you can also select another anomaly from the anomaly selector.
After you select the anomaly detector and the dataset, the batch anomaly score configuration options will appear along with a preview of the prediction file. (See Figure 6.16.) The default format is a CSV file that includes all your dataset fields plus an extra column for the anomaly scores. You can configure this file using the output settings explained in section 6.3.
By default, BigML generates an output dataset with your batch anomaly scores, which you can later find in the datasets section of the BigML Dashboard. This dataset can be helpful for analyzing your results afterwards. You can deactivate this option by clicking the icon shown in Figure 6.17.
Finally, click the button to generate your batch anomaly score. When the batch anomaly score is created, you will be able to download the batch score containing all your dataset instances along with a score for each one of them. If you did not disable the option to create a new dataset, you will also be able to access the output dataset from the batch anomaly score view. (See Figure 6.19.)
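Once the batch score file has been downloaded, a common follow-up is to rank the instances by score. The sketch below assumes the default layout described above (your dataset fields plus one extra score column) and that higher scores mean more anomalous instances. The column name `score`, the sample rows and the helper `top_anomalies` are hypothetical; adjust them to the output settings you chose.

```python
import csv
import io

# Stand-in for a downloaded batch anomaly score file.
# The field names and values are made up for illustration.
SAMPLE_CSV = """duration,protocol,score
12.5,tcp,0.41
300.0,udp,0.87
8.0,tcp,0.39
"""


def top_anomalies(csv_file, score_column="score", limit=10):
    """Return up to `limit` rows, sorted by descending anomaly score."""
    rows = list(csv.DictReader(csv_file))
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:limit]


most_anomalous = top_anomalies(io.StringIO(SAMPLE_CSV), limit=2)
for row in most_anomalous:
    print(row["score"], row["duration"], row["protocol"])
```

To use it on a real download, pass an open file instead of the in-memory sample, for example `open("batch_scores.csv", newline="")`, where the file name is whatever you saved the download as.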
Batch Anomaly Scores with Images
BigML anomalies can be trained from images using extracted image features (section 2.4). The input of a batch anomaly score is a dataset, so when creating a batch anomaly score with images, the dataset must have the same image features that were used to train the anomaly, that is, the image features present in the dataset used to create the anomaly.
As shown in Figure 6.20, the input selected for the batch anomaly score is anomaly-scores resnet18, a dataset of six images that contains image features extracted from a pre-trained CNN, ResNet-18.
Image features are configured at the source level. For more information about image features and how to configure them, refer to the Image Analysis section of the Sources with the BigML Dashboard document [22].
For the rest of batch anomaly scores with images, including the batch anomaly score configuration options and output datasets, everything stated earlier in the current section (subsection 6.2.2) applies.