Classification and Regression with the BigML Dashboard
6.6 Fusion Predictions
6.6.1 Introduction
The ultimate goal in building a fusion is being able to make predictions with it. On BigML, you can make predictions for single instances or for many instances in batch. Each prediction comes with a measure indicating the prediction confidence. For regression problems, the expected error is provided along with the predicted value; for classification problems, the vector of probabilities per class is returned (each one a percentage ranging from 0% to 100%).
The predictions tab in the main menu of the BigML Dashboard is where all your saved predictions are listed (see Figure 6.29). You can search your predictions by name by clicking on the search option in the top menu. In the predictions list view, you can see the fusion icon used for each prediction, the Name of the prediction, the Objective (objective field name), the Prediction (the prediction result), and the Age (time since the prediction was created).
When you first create an account on BigML, or every time that you start a new Project, your list of predictions will be empty (see Figure 6.30 ).
Fusion predictions are saved under the Classification & Regression option in the menu (see Figure 6.31 ).
Select the list for your single-instance predictions or your batch predictions by clicking on the corresponding icons (see Figure 6.32 and Figure 6.33).
6.6.2 Creating Fusion Predictions
BigML provides two options to predict with your fusions, explained in the following subsections:
predict: to predict a single instance
batch prediction: to predict multiple instances in batch.
Predict
BigML allows you to quickly make predictions for single instances by providing a form containing the input fields used by the fusion, so you can easily set the values and get an immediate response.
Follow these steps to create a single prediction:
Click predict in the fusion 1-click action menu. (See Figure 6.34 .)
Alternatively, click predict in the pop up menu in the list view. (See Figure 6.35 .)
You will be redirected to the prediction form where you will find all the fields used by the fusion as input fields. (See Figure 6.36 .)
Select the fields to be used for your prediction, set the input values for your selected fields, and click the button to predict. Non-selected fields will be considered as missing values during the prediction. See the prediction at the top of the view along with the rest of the class probabilities (see Figure 6.37). BigML predictions are synchronous, i.e., when you send the input data, you get an immediate response.
Hide or display the histogram view containing the rest of your class probabilities. You can download all the probabilities in PNG, CSV or JSON format by clicking on the corresponding icons. (See Single Predictions.)
Your prediction is automatically saved and you can find it in the predictions list view.
For regression fusions, the process is the same, but instead of the predicted classes and the probabilities you get a numeric value for the objective field along with an expected error as the certainty measure.
Note: this option is only available from the BigML Dashboard for fusions with fewer than 100 fields. If you want to perform single instance predictions for a higher number of fields, use the BigML API.
Batch Predictions
BigML batch predictions allow you to make simultaneous predictions for multiple instances. All you need is the fusion you want to use to make predictions and a dataset containing the instances you want to predict. BigML will create a prediction for each instance in the dataset.
Follow these steps to create a batch prediction:
Click on the batch prediction option under the fusion 1-click action menu (see Figure 6.38).
Alternatively, click on create batch prediction in the pop up menu of the list view (Figure 6.39 ).
Select the dataset containing all the instances you want to predict. The instances should contain the input values for the fields used by the fusion as input fields. From this view you can also select another fusion from the selector (see Figure 6.40 ).
After the fusion and the dataset are selected, the batch prediction configuration options will appear along with a preview of the prediction output (a CSV file) (see Figure 6.41 ). The default output format includes all your prediction dataset fields and adds an extra column with the class predicted. See subsection 6.6.3 for a detailed explanation of all configuration options.
By default, BigML generates an output Dataset with your batch predictions that you can later find in your datasets section of the BigML Dashboard. This option is active by default but you can deactivate it by clicking on the icon shown in Figure 6.42 .
After you configure your batch prediction, click on the green button to generate your batch prediction. When the batch prediction is created, you will be able to download the CSV file containing all your dataset instances along with a prediction for each one of them. If you did not disable the previously explained option to create a dataset, you will also be able to access the output dataset from the batch prediction view (see Figure 6.43).
6.6.3 Configuring Fusion Predictions
BigML provides several options to configure your batch predictions such as the missing strategy (see Missing Strategies ), setting a probability threshold (see Probability threshold ), default values for your missing numeric values (see Default Numeric Value ), fields mapping (see Fields Mapping ), and output file settings (see Output Settings ).
Missing Strategies
This option is available only when the fusion contains models and/or ensembles since the missing strategy has no effect for logistic regressions or deepnets.
When you create a new prediction, BigML will automatically navigate through the corresponding model or ensemble to find the leaf node that best classifies the new instance. However, your new data (the instances you want to predict) may not have populated values for all the fields used in building the original model or ensemble. For example, imagine that you are trying to predict diabetes and you have the patient’s glucose level and BMI (Body Mass Index) but not their blood pressure. If the model or ensemble arrives at a node where the blood pressure level is required, BigML can handle this missing value by using one of these two strategies:
Last prediction: it returns the prediction value and probability of the parent node.
Proportional: it combines all predictions beneath the current node based on the data distribution of their child nodes in order to compute the prediction value and probability.
For single predictions you can select either missing strategy by clicking on the icons shown in Figure 6.44.
For batch predictions you can find both options under the configuration panel as shown in Figure 6.45 .
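The difference between the two strategies can be illustrated with a toy decision tree. The sketch below is not BigML's implementation: the `Node` class, the tree, and the class counts are hypothetical, and it assumes each node stores the number of training instances per class. "Last prediction" answers from the last node reached, while "proportional" follows every branch under the unanswerable split and pools the leaf counts it finds.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node holding training instance counts per class."""
    counts: Dict[str, int]
    field: Optional[str] = None     # field tested here (None for a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when value <= threshold
    right: Optional["Node"] = None  # taken when value > threshold


def summarize(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the majority class and its share of the counts."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


def last_prediction(node: Node, inputs: dict) -> Tuple[str, float]:
    """Stop at the node whose split needs a missing value and answer from it."""
    while node.field is not None and inputs.get(node.field) is not None:
        node = node.left if inputs[node.field] <= node.threshold else node.right
    return summarize(node.counts)


def proportional(node: Node, inputs: dict) -> Tuple[str, float]:
    """Follow both branches at a missing value and pool the leaf counts."""
    def leaf_counts(n: Node) -> Dict[str, int]:
        if n.field is None:
            return dict(n.counts)
        value = inputs.get(n.field)
        if value is None:
            merged = leaf_counts(n.left)
            for label, count in leaf_counts(n.right).items():
                merged[label] = merged.get(label, 0) + count
            return merged
        return leaf_counts(n.left if value <= n.threshold else n.right)
    return summarize(leaf_counts(node))


# Toy diabetes tree: glucose, then blood pressure, then BMI.
bmi_node = Node({"no": 2, "yes": 28}, "bmi", 30,
                Node({"no": 2, "yes": 8}), Node({"no": 0, "yes": 20}))
bp_node = Node({"no": 20, "yes": 30}, "blood_pressure", 80,
               Node({"no": 18, "yes": 2}), bmi_node)
root = Node({"no": 60, "yes": 40}, "glucose", 120,
            Node({"no": 40, "yes": 10}), bp_node)

patient = {"glucose": 150, "bmi": 35}  # blood pressure is missing
print(last_prediction(root, patient))
print(proportional(root, patient))
```

In this toy case both strategies predict the same class but with different probabilities, because the proportional strategy can still use the known BMI value below the missing split.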
Probability threshold
Probability thresholds usually make sense when you want to minimize false positives at the cost of false negatives. The positive class will be predicted if its probability is greater than the given threshold; otherwise, the class with the next highest probability will be predicted. This option is only available for classification fusions.
Follow these steps to configure a threshold for your single prediction:
Select the positive class, i.e., the class for which you want to apply the threshold (Figure 6.46 ).
Set a probability threshold using the slider shown in Figure 6.47 and click the button to predict.
If the positive class probability is greater than the given threshold, it will be predicted; otherwise, the class with the next highest probability will be predicted.
You can also find the same options to set a threshold for batch predictions under the configure panel (see Figure 6.49 ).
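The threshold rule described in this subsection can be written in a few lines. This is a minimal sketch of the rule as stated in the text, not BigML code; the function name and the dictionary of class probabilities are assumptions made for illustration.

```python
def predict_with_threshold(probabilities, positive_class, threshold):
    """Apply a probability threshold to one class (illustrative helper).

    probabilities: dict mapping each class name to a probability in [0, 1].
    """
    # The positive class wins only if it is strictly above the threshold.
    if probabilities[positive_class] > threshold:
        return positive_class
    # Otherwise fall back to the most probable of the remaining classes.
    others = {c: p for c, p in probabilities.items() if c != positive_class}
    return max(others, key=others.get)


probabilities = {"bad": 0.55, "good": 0.45}
print(predict_with_threshold(probabilities, "bad", 0.6))
print(predict_with_threshold(probabilities, "bad", 0.5))
```

With a threshold of 0.6 the positive class "bad" is no longer predicted even though it is the most probable class, which is how a threshold trades false positives for false negatives.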
Default Numeric Value
By using the Default numeric value before creating your batch prediction, you can easily replace all the missing numeric values in the dataset by the field’s Mean, Median, Maximum, Minimum or by Zero (see Figure 6.50 ).
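As an illustration of what this replacement does, the sketch below fills the missing values of one numeric field. It is a simplified assumption, not BigML's implementation: the helper name and strategy keys are hypothetical, and the statistic is computed here from the non-missing values of the same column, which may differ from the source of the statistics BigML uses.

```python
from statistics import mean, median

# Hypothetical strategy names mirroring the Dashboard options.
STRATEGIES = {
    "mean": mean,
    "median": median,
    "maximum": max,
    "minimum": min,
    "zero": lambda values: 0,
}


def fill_missing(rows, field, strategy):
    """Return copies of the rows with missing values of `field` replaced."""
    present = [row[field] for row in rows if row.get(field) is not None]
    default = STRATEGIES[strategy](present)
    return [
        {**row, field: default if row.get(field) is None else row[field]}
        for row in rows
    ]


rows = [{"bmi": 20.0}, {"bmi": None}, {"bmi": 30.0}, {"bmi": 70.0}]
print(fill_missing(rows, "bmi", "median"))
```

The input rows are left untouched, so the same data can be filled with a different strategy for comparison.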
Excluded Fields
This option allows you to exclude a set of fields from the prediction calculation but at the same time keep them in the output file and dataset (see Figure 6.51 ).
Fields Mapping
You can specify which input fields of the fusion match the fields in the dataset containing the instances you want to predict. BigML automatically matches fields by name, but you can also set an automatic match by field ID by clicking on the green switcher. Additionally, you can manually search for fields or remove them from the Dataset fields column if you do not want them to be considered during the batch prediction (see Figure 6.52).
Note: Fields mapping from the BigML Dashboard is limited to 200 fields. For batch predictions with a higher number of fields, map your fields using the BigML API.
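Matching by name can be pictured as a lookup between two field lists. The following sketch only illustrates the idea: the function name, the field IDs, and the shape of the returned dictionary are assumptions and do not represent the format expected by the BigML API.

```python
def map_fields_by_name(fusion_fields, dataset_fields):
    """Pair fusion input fields with dataset fields that share a name.

    Both arguments map a field ID to a field name. The result maps each
    fusion field ID to the matching dataset field ID; fusion fields with
    no namesake in the dataset are left out.
    """
    dataset_ids_by_name = {name: fid for fid, name in dataset_fields.items()}
    return {
        fid: dataset_ids_by_name[name]
        for fid, name in fusion_fields.items()
        if name in dataset_ids_by_name
    }


fusion_fields = {"000000": "duration", "000001": "credit_amount", "000002": "age"}
dataset_fields = {"000000": "credit_amount", "000001": "duration", "000003": "purpose"}
print(map_fields_by_name(fusion_fields, dataset_fields))
```

Here the same names sit under different IDs in the two resources, which is the situation where matching by name and matching by ID give different results.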
Output Settings
Batch predictions return a CSV file containing all your instances and the final predictions. Tune the following settings to customize your prediction file (see Figure 6.53 ):
Separator: this option allows you to choose the best separator for your output file columns. The default separator is the comma. You can also select the semicolon, tab, or space.
New line: this option allows you to set the new line character to use as the line break in the generated CSV file: “LF” or “CRLF”.
Output fields: by clicking on the list icon next to the separator selector, you can include or exclude all your dataset fields from your output file. You can also individually select the fields you want to include or exclude using the multiple output fields selector. Note: a maximum of 100 fields can be displayed in this selector, but all your dataset fields will be included in the output file by default unless you exclude them.
Headers: this option includes or excludes a first row in the output file (and in the output dataset) with the names of each column (input field names, prediction column name, probability column name, etc.). By default, BigML includes the headers.
Prediction column name: customize the name for your predictions column. By default, BigML takes the name of the fusion’s objective field.
Probability: this option allows you to include an additional column with the probability for the predicted class. By default it is not included in your output file. For regression fusions, you will find the expected error instead of the probability.
Probability column name: customize the name for the probability column if you include it in the output file. BigML sets “probability” as the default name. For regression fusions, you will find the expected error column name.
Individual model predictions: this includes all the per-model predictions composing the fusion. That will add a column per model, named <prediction_name>_\(n\) where \(n\) is the position of the model in the model list in the fusion, starting at 1.
All class probabilities: this includes all the probabilities of the objective field classes per instance. This option will add \(n\) extra columns, one per class in the objective field. This option does not exist for regression fusions.
Field importances: this option allows you to include a column for each of the field relative importances for the fusion predictions (taking into account the decision trees, ensembles, and deepnets composing the fusion). This option will add a column per field, named "<field_name> importance". If the fusion only contains logistic regressions, these field importances cannot be calculated.
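To make the effect of these settings concrete, the sketch below writes a small prediction file with Python's standard csv module. It only mimics the separator, new line, headers, prediction column name, and probability column settings; the function and its parameter names are hypothetical and are not BigML API arguments.

```python
import csv
import io


def write_predictions(rows, predictions, *, separator=",", newline="\n",
                      header=True, prediction_name="class",
                      probability_name=None):
    """Render input rows plus predictions as CSV text (illustrative only).

    rows: list of dicts sharing the same keys (the dataset fields).
    predictions: list of (predicted_class, probability) pairs, one per row.
    probability_name: when given, a probability column with that name is added.
    """
    columns = list(rows[0].keys()) + [prediction_name]
    if probability_name:
        columns.append(probability_name)
    buffer = io.StringIO(newline="")  # keep line endings exactly as written
    writer = csv.writer(buffer, delimiter=separator, lineterminator=newline)
    if header:
        writer.writerow(columns)
    for row, (label, probability) in zip(rows, predictions):
        record = list(row.values()) + [label]
        if probability_name:
            record.append(probability)
        writer.writerow(record)
    return buffer.getvalue()


rows = [{"duration": 12, "credit_amount": 1000}]
predictions = [("good", 0.8)]
print(write_predictions(rows, predictions, separator=";",
                        probability_name="probability"))
```

Leaving `probability_name` unset reproduces the default described above, where only the prediction column is appended to the dataset fields.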
6.6.4 Visualizing Fusion Predictions
Fusion prediction visualizations change depending on whether you are predicting a single instance (Single Predictions) or multiple instances using the batch predictions option (Batch Prediction).
Single Predictions
For single predictions, find the predicted class given the input fields values at the top of the form along with its probability (see Figure 6.54 ).
Below the prediction, there is a histogram representing the rest of the objective field class probabilities. All the class probabilities must sum to 100%. Show or hide this view by clicking on the icon highlighted in Figure 6.55. You can see up to seven different classes at the same time; if you have more than seven classes, you can see the others by clicking on the arrow icons. Export this view in PNG, CSV, or JSON format by clicking on the corresponding icons (see Figure 6.55).
For regression fusions, instead of the class probabilities you will get the predicted numeric value for the objective field.
Prediction explanation
Prediction explanation helps understand why a fusion makes a certain prediction. This is very useful in many applications, and the reasons behind a fusion’s prediction are often as important as the prediction itself.
BigML prediction explanation is based on Shapley values. For more information, please refer to this research paper: A Unified Approach to Interpreting Model Predictions [ 25 ] .
For any classification or regression fusion, you can request the explanation for the prediction by clicking the corresponding icon and then clicking the button to predict (see Figure 6.56).
The prediction explanation represents the most important factors considered by the fusion in a prediction given the input values. Each input value will yield an associated importance, as you can see in Figure 6.57. The importances across all input fields should sum to 100%.
For some input fields you will see a “+” icon next to the importance. This is because the importance may not be directly associated with the input value, i.e., it can be explained by other reasons. In Figure 6.58, the importance of the field “Pregnancies” is not explained by this field being equal to 10; rather, it is because this field value is greater than 5.
The prediction explanation for fusions is calculated using the results of over a thousand distinct predictions using random perturbations of the input data. For this reason, the calculation of the explanation may take some time.
Note: the input field importances in the prediction explanation are different from the overall field importances of the fusion. A field can be very important for the fusion but insignificant for a given prediction.
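For readers unfamiliar with Shapley values, the toy sketch below computes them exactly for a tiny prediction function by averaging each input's marginal contribution over every ordering of the inputs. This is not how BigML computes explanations (the text above describes an approximation based on random perturbations); the baseline used to stand in for an absent input, the function names, and the normalization of absolute values to percentages are all assumptions made for illustration.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating every ordering of the inputs.

    predict: function taking a list of input values and returning a number.
    instance: the input values being explained.
    baseline: reference values used while an input is considered absent.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]          # reveal input i
            value = predict(current)
            totals[i] += value - previous     # its marginal contribution
            previous = value
    return [total / len(orderings) for total in totals]


def as_percentages(values):
    """Scale absolute contributions so that they sum to 100."""
    total = sum(abs(v) for v in values)
    return [100 * abs(v) / total for v in values]


def toy_model(x):
    # Hypothetical stand-in for a fusion prediction.
    return 2 * x[0] + 3 * x[1]


contributions = shapley_values(toy_model, [1, 1], [0, 0])
print(contributions, as_percentages(contributions))
```

Exhaustive enumeration grows factorially with the number of inputs, which is why practical explainers rely on sampling instead.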
Batch Prediction
After creating your batch prediction, you get a CSV file and, optionally, an output dataset. Both outputs are explained in the following subsections.
Output CSV file
The batch prediction generates a CSV file containing your predictions for each of your dataset instances in the last column (see Figure 6.59 ).
You can configure several options to customize your CSV file. You can find a detailed explanation of those options in Output Settings .
See an output CSV file example in Figure 6.60 . The column class in this example contains the final prediction (it is named by default as your fusion’s Objective Field). In this case, we are predicting whether a person is a good or a bad candidate for holding a credit. This file has been configured to also contain the probability for each prediction.
Output Dataset
By default, BigML automatically creates a dataset out of your batch prediction. You can disable this option by configuring your batch prediction (see Output Settings ). You will find the output dataset in your batch prediction view as shown in Figure 6.61 .
In the output dataset, you can find an additional field (named by default as your fusion’s objective field) containing the class predicted for each one of your instances (see Figure 6.62). If you configured your batch prediction to include the prediction probabilities and all class probabilities, you will be able to find them in the last fields of your output dataset.
6.6.5 Consuming Fusion Predictions
You can fully use single and batch predictions via the BigML API and bindings. The following subsections explain both tools.
Using Fusion Predictions via the BigML API
Fusion predictions have full citizenship in the BigML API which allows you to programmatically create, configure, retrieve, list, update, and delete single and batch predictions.
In the example below, see how to create a single prediction using a fusion and define the input data once you have properly set the BIGML_AUTH environment variable to contain your authentication credentials:
curl "https://bigml.io/prediction?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"fusion": "fusion/50650bdf3c19201b64000020",
          "input_data": {"000001": 3, "000002": 4.5, "000003": 5}}'
For more information on using predictions through the BigML API, please refer to the documentation.
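The same request can be prepared from Python with the standard library alone. The sketch below only builds the request object so that it can be inspected without contacting the service; the helper name is hypothetical, and the endpoint, payload keys, and BIGML_AUTH variable are taken from the curl example in this subsection.

```python
import json
import os
import urllib.request


def build_prediction_request(fusion_id, input_data, auth=None):
    """Build (but do not send) a POST request for a single prediction."""
    # BIGML_AUTH holds the authentication query string, as in the curl example.
    if auth is None:
        auth = os.environ.get("BIGML_AUTH", "")
    body = json.dumps({"fusion": fusion_id, "input_data": input_data})
    return urllib.request.Request(
        f"https://bigml.io/prediction?{auth}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request(
    "fusion/50650bdf3c19201b64000020",
    {"000001": 3, "000002": 4.5, "000003": 5},
)
print(request.get_method(), json.loads(request.data))
# To actually send it: urllib.request.urlopen(request)
```

Keeping the construction separate from the network call makes it easy to check the payload before any prediction is created.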
Using Fusion Predictions via BigML Bindings
You can also create, configure, retrieve, list, update, and delete single and batch predictions via BigML bindings, which are libraries that make it easier to use the BigML API from your language of choice, including Python, Node.js, Java, Swift and Objective-C. See below an example that creates a prediction using a fusion with the Python bindings.
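from bigml.api import BigML

# BigML() reads your credentials from the BIGML_USERNAME and
# BIGML_API_KEY environment variables
api = BigML()
# Create a single prediction from the fusion and the input data
prediction = api.create_prediction(
    "fusion/50650bdf3c19201b64000020",
    {"credit_amount": 5, "duration": 2.5})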
For more information on BigML bindings, please refer to the bindings page.
6.6.6 Descriptive Information
Each fusion prediction has an associated name, description, category, and tags. You can find a brief description of each concept in the following subsections. The More info menu option displays a panel that provides editing options (see Figure 6.63 ).
Name
If you do not specify a name for your predictions, BigML assigns a default name depending on the type of predictions:
Single predictions: the name always follows the structure “<fusion name>”.
Batch predictions: BigML combines your prediction dataset name and the fusion name: “<fusion name> with <dataset name>”.
Prediction names are displayed on the list and also on the top bar of a prediction view. Prediction names are indexed to be used in searches. Rename your predictions at any time from the More info menu.
The name of a prediction cannot be longer than 256 characters. More than one prediction can have the same name even within the same project, but they will always have different identifiers.
Description
Each prediction also has a description that is useful for documenting your Machine Learning projects. Predictions have the same description as the fusion used to create them.
Descriptions can be written using plain text and also markdown. BigML provides a simple markdown editor that accepts a subset of markdown syntax (see Figure 6.64 ).
Descriptions cannot be longer than 8192 characters.
Category
Each prediction has an associated category taken from the fusion used to create it. Categories are useful to classify predictions according to the domain which your data comes from, which helps when you use BigML to solve problems across industries or multiple customers.
A prediction category must be one of the categories listed in Table 4.5.
Tags
A prediction can also have a number of tags associated with it. These tags help to retrieve the prediction via the BigML API or to provide predictions with some extra information. Your prediction inherits the tags from the fusion used to create it. Each tag is limited to a maximum of 128 characters. Each prediction can have up to 32 different tags.
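The limits stated in this section (256 characters for names, 8192 for descriptions, 128 per tag, and 32 tags) can be checked before sending an update. The helper below is a hypothetical client-side check written for illustration; it is not part of the BigML API or bindings.

```python
MAX_NAME_LENGTH = 256
MAX_DESCRIPTION_LENGTH = 8192
MAX_TAG_LENGTH = 128
MAX_TAGS = 32


def metadata_problems(name, description="", tags=()):
    """Return a list of messages for every documented limit that is exceeded."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than 256 characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description is longer than 8192 characters")
    if len(set(tags)) > MAX_TAGS:
        problems.append("more than 32 different tags")
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(f"tag starting with {tag[:20]!r} is longer than 128 characters")
    return problems


print(metadata_problems("Credit risk prediction", tags=["credit", "fusion"]))
```

An empty list means the metadata fits within the documented limits.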
6.6.7 Fusion Predictions Privacy
The link displayed in the Privacy panel is the private URL of your prediction, so only a user logged into your account is able to see it. Neither single predictions nor batch predictions can be shared by using a secret link (see Figure 6.65 ).
6.6.8 Moving Fusion Predictions to Another Project
When you create a prediction, it will be assigned to the same project where the original fusion is located. You cannot move predictions between projects as you do with other resources.
6.6.9 Stopping Fusion Predictions
Single predictions are synchronous resources, so you cannot cancel them during the creation since you get the result immediately.
On the other hand, batch predictions are asynchronous resources, so you can stop their creation before the task is finished. Use the delete batch prediction option from the 1-click action menu (Figure 6.66 ) or from the pop up menu on the list view.
A modal window will be displayed asking you for confirmation. If you stop the prediction during its creation you won’t be able to resume the same task again. So if you want to create the same prediction, you will have to start a new task.
6.6.10 Deleting Fusion Predictions
You can delete your single or batch predictions from the predictions view, using the 1-click action menu (see Figure 6.68 ) or using the pop up menu on the predictions list view (see Figure 6.69 ).
A modal window will be displayed asking you for confirmation. Once a prediction is deleted, it is permanently deleted, and there is no way you (or even the IT folks at BigML) can retrieve it.