Anomaly Detection with the BigML Dashboard

6.3 Configuring Anomaly Scores

BigML provides several options to configure your anomaly scores, such as the automatic field mapping performed by BigML and the output file settings. Both options are explained below.

6.3.1 Field Mapping

You can specify which fields in the anomaly correspond to which fields in the dataset containing the instances you want to score. By default, BigML matches fields automatically by name, but you can switch to automatic matching by field ID by clicking the green switch shown in Figure 6.21. You can also search for fields manually, or remove any fields that you do not want considered during scoring.

\includegraphics[]{images/an-batch-field-mapping}
Figure 6.21 Field mapping for batch scores

Note: field mapping in the BigML Dashboard is limited to 200 fields. For batch scores with a higher number of fields, you can use the BigML API.
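The difference between matching by name and matching by field ID can be illustrated with a short local sketch. This is not BigML's implementation. The function names, the field IDs, and the field names below are all hypothetical, and the sketch assumes each resource exposes its fields as a simple ID-to-name dictionary.

```python
def map_fields_by_name(model_fields, dataset_fields):
    """Return {model_field_id: dataset_field_id} for fields whose names match.

    If the dataset contains duplicate names, the last one seen wins.
    """
    dataset_ids_by_name = {name: fid for fid, name in dataset_fields.items()}
    return {
        model_id: dataset_ids_by_name[name]
        for model_id, name in model_fields.items()
        if name in dataset_ids_by_name
    }


def map_fields_by_id(model_fields, dataset_fields):
    """Return {model_field_id: dataset_field_id} for IDs present in both."""
    return {fid: fid for fid in model_fields if fid in dataset_fields}


# Hypothetical field IDs and names, for illustration only.
model_fields = {"000000": "duration", "000001": "bytes", "000002": "protocol"}
dataset_fields = {"000000": "bytes", "000001": "duration", "000003": "service"}

by_name = map_fields_by_name(model_fields, dataset_fields)
by_id = map_fields_by_id(model_fields, dataset_fields)

print(by_name)
print(by_id)
```

In this example the two strategies disagree: matching by name pairs "duration" with "duration" even though the IDs differ, while matching by ID pairs fields that share an ID regardless of their names. Fields with no counterpart ("protocol" and "service" here) are left unmapped in both cases.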

6.3.2 Output Settings

Batch anomaly scores return a CSV file containing all your instances and their scores by default. You can tune the following settings to customize your output file:

  • Separator: this option allows you to choose the separator for your output file columns. The default separator is a comma. You can also select a semicolon, tab, or space.

  • New line: this option allows you to set the new line character used as the line break in the generated CSV file: “LF” or “CRLF”.

  • Output fields: you can include or exclude all your dataset fields from your output file. You can also select the fields you want to include or exclude one by one from the preview shown in Figure 6.22.

    Note: a maximum of 100 fields are displayed in the preview, but all your dataset fields are included in the output file by default unless you exclude them.

  • Headers: this option includes or excludes a first row in the output file (and in the output dataset) with the name of each column. By default, BigML includes the headers.

  • Score column name: you can customize the name for your scores column.

  • Field importances: you can include field importances in the output file in addition to the anomaly score. They indicate each field's contribution to the anomaly score. (See section 2.1.)

\includegraphics[]{images/batch_options}
Figure 6.22 Output file settings for batch scores
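To make the effect of the output settings concrete, the following minimal sketch reproduces them locally with Python's standard csv module. It only illustrates how the separator, new line, output fields, headers, and score column name shape the resulting file. It does not call BigML, and the function name, its parameters, and the sample rows are hypothetical.

```python
import csv
import io


def write_scores(rows, output_fields, score_name="score", separator=",",
                 new_line="LF", header=True):
    """Serialize scored rows to CSV text using the given output settings."""
    # Translate the setting label into the actual line terminator.
    line_terminator = {"LF": "\n", "CRLF": "\r\n"}[new_line]
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=separator,
                        lineterminator=line_terminator)
    if header:
        # First row holds the column names, ending with the score column.
        writer.writerow(list(output_fields) + [score_name])
    for row in rows:
        writer.writerow([row[f] for f in output_fields] + [row["score"]])
    return buffer.getvalue()


# Hypothetical scored instances, for illustration only.
rows = [
    {"duration": 12, "bytes": 300, "score": 0.42},
    {"duration": 950, "bytes": 7, "score": 0.87},
]

text = write_scores(rows, ["duration"], score_name="anomaly_score",
                    separator=";", new_line="CRLF")
print(repr(text))
```

Here only the "duration" field is kept, the score column is renamed to "anomaly_score", columns are separated by semicolons, and each line ends with CRLF. Passing header=False would drop the first row.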