Classification and Regression with the BigML Dashboard
1.7 BigML Model Predictions
1.7.1 Introduction
The ultimate goal in building a BigML Model is being able to make predictions for previously unseen instances with an unknown label. In BigML, you can make predictions for single instances or for many instances in a batch. Each prediction comes with a measure indicating its reliability, expressed as a confidence, a probability, or an expected error.
The predictions tab in the main menu of your BigML Dashboard is where all your saved predictions are listed. (See Figure 1.39 .)
Model predictions are saved under the Classification & regression option in the menu. (See Figure 1.40 .)
From this view you can choose to view the list of your single instance predictions or your batch predictions by clicking on the corresponding icons. (See Figure 1.41 and Figure 1.42 .)
In the predictions list view, you can see, for each prediction, the Model or Ensemble icon used for the prediction, the Name of the prediction, the Objective (objective field name), the Prediction (the prediction result), and the Age (time since the prediction was created). (See Figure 1.43 .)
You can also search your predictions by name by clicking the search button on the top right menu.
1.7.2 Creating Model Predictions
As shown in Figure 1.44 , BigML provides three options to make predictions from your models:
predict by question: to predict a single instance answering just the relevant questions required by the model.
predict: to predict a single instance using the prediction form.
batch prediction: to predict multiple instances simultaneously.
Predict by Question
This option asks for the input field values required to make a single instance prediction, one question at a time. In general, only the values of a subset of the input fields are needed to reach a prediction. So although your model may have seven data fields as potential predictors, if it is able to make a prediction with only two answered questions, you will not have to specify the remaining five fields by answering more questions. Follow these steps:
Start the questions by clicking the corresponding button.
Set the value for each question and click the button to continue.
After the model gives you a final answer, you can optionally save the prediction.
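The question-by-question flow can be pictured as a walk down a decision tree that only asks about the fields it meets along the way. The following is a minimal sketch under stated assumptions, not BigML's implementation: the tree layout, the field names, and the `predict_by_question` helper are all hypothetical.

```python
# Hypothetical toy tree: internal nodes test one numeric field against a
# threshold, leaves carry the predicted class.
TREE = {
    "field": "plasma glucose", "threshold": 123.0,
    "left": {"prediction": "false"},
    "right": {
        "field": "bmi", "threshold": 29.0,
        "left": {"prediction": "false"},
        "right": {"prediction": "true"},
    },
}


def predict_by_question(tree, answer):
    """Walk the tree, calling answer(field) only for fields on the path.

    Returns the prediction and the list of fields that were actually asked.
    """
    asked = []
    node = tree
    while "prediction" not in node:
        field = node["field"]
        asked.append(field)
        value = answer(field)
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["prediction"], asked


# Answers are looked up in a dict here; a real interface would prompt the user.
answers = {"plasma glucose": 150.0, "bmi": 31.5, "age": 40, "blood pressure": 70}
prediction, asked = predict_by_question(TREE, answers.__getitem__)
print(prediction, asked)
```

Although four values are available, only the two fields on the path are ever requested.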
Predict
BigML allows you to quickly make predictions for single instances by providing a form containing the fields used by the model, so you can easily set the input data and get an immediate response. This option is only available from the BigML Dashboard for models with fewer than 100 fields. If you want to perform single instance predictions for models with a higher number of fields, you can use the BigML API.
Follow the steps detailed below to create a single prediction:
Choose the predict option under the model 1-click menu. (See Figure 1.48 .)
Alternatively, you can choose the predict option in the pop up menu in the list view as shown in Figure 1.49 .
You will be redirected to the prediction form where you will find all the fields used by the model as predictors ordered by Field importance. The importance percentage is found next to the field name as shown in Figure 1.50 . You may not find all the fields from your original dataset because the model may find them irrelevant or redundant in terms of their predictive impact.
Select the fields you want to be taken into account for your prediction as shown in Figure 1.51 . Non-selected fields will be considered as missing values during the prediction. If your model was trained with Missing splits (see subsection 1.4.4 ), then missing values are considered by the model as any other valid value. If your model was built without missing values then any of the Missing strategies may apply during your prediction (see Missing Strategies .)
Set input values for your selected fields. Depending on the field type, you will need to input the values differently:
Numeric fields: move the slider or input a specific value in the edition box.
Categorical fields: select one class from the selector.
Text fields: write one or several terms in the free text box.
Date-time fields: select the appropriate values from the selector.
Items fields: when you write the first three characters of an item name, several items matching those characters will appear, so you can select the right one. You can input more than one item for a field.
Get the prediction along with the confidence, the probability or expected error on the top of the form. BigML predictions are synchronous, i.e., when you send the input data you get an immediate response. Read more about local predictions in Figure 1.53 .
Optionally, save the prediction so you can access it afterwards from the model predictions list view.
Local Predictions
BigML provides Local predictions from the BigML Dashboard for single instance predictions. Local predictions allow you to get a real-time prediction without consuming any credits or requiring an internet connection. This is possible because your model is saved in the browser’s memory, so when the input values change, BigML immediately traverses your model to obtain the prediction in a matter of microseconds.
Predictions with Images
BigML models can be trained from images using extracted image features (subsection 1.2.8 ). Because image features are automatically generated numeric fields, creating model predictions with images works the same way as with any other model. The only difference is that some of the input fields are images.
Note: When the input fields contain images, BigML automatically extracts the image features needed to create the single prediction, matching those used in the dataset that trained the model.
The model in Figure 1.54 , “grape-strawberry”, was created from a dataset containing the Histogram of gradients image features. This set of image features is extracted by default for all image composite sources. Creating a prediction with the model takes you to the prediction form, which presents all input fields used by the model. One of them is the image field. Because this is a single prediction, the image is supplied as a single image source. Clicking on the input field box shows the available single image sources in a dropdown list. There is also a search box which can be used to locate specific ones.
Single image sources are often used to create a composite source, in which case they become component sources of that composite source. Alternatively, an image may have been uploaded as part of an archive file (zip/tar) which created a composite source. In those cases, the composite source is shown in the dropdown list, along with a “List components” icon. In the example in Figure 1.55 , predict-images.zip is a composite source; click on the icon to show its component sources.
After the component sources of the composite are listed, scroll the dropdown list to find the desired one, then click to select it, as shown in Figure 1.56 . There is also a search box to locate specific component sources.
In addition to images, models may use other fields, which will be in the prediction form too. As shown in Figure 1.57 , all the fields can be selected, and their input values be set by dragging the knobs on the sliders or by entering precise values in their input boxes.
Once all fields are selected, click on the green button to create a prediction.
After a new prediction is created, as shown in Figure 1.58 , the predicted class is at the top of the form along with its probability. The prediction interface is the same as for predictions created by non-image models. Everything described earlier in this section (Predict ) applies.
Batch Predictions
BigML batch predictions allow you to make simultaneous predictions for multiple instances. All you need is the model you want to use to make predictions and a dataset containing the instances for which you want to obtain predictions. BigML will create a prediction for each instance in the dataset. Follow the steps detailed below to create a batch prediction:
Select the batch prediction option under the model 1-click menu (see Figure 1.59 ) or the create batch prediction option in the pop up menu of the list view (see Figure 1.60 .)
Select the dataset containing all the instances you want to create a prediction for. The instances should contain the input values for the fields used by the model as predictors. You can also select a subset of the model fields to be taken into account by configuring your prediction (see Field Mapping .) BigML batch predictions can handle missing data in your prediction dataset (see Missing Strategies .)
Optionally, select the model you want to use for the prediction. BigML pre-selects the model you created the batch prediction from at step 1, but you can change it at any time in the batch prediction view by selecting another model from the model selector displayed in the right pane. You can even switch to an ensemble or logistic regression by selecting the corresponding icon in the top left menu.
After you have selected the model and the dataset, the batch prediction configuration options (see subsection 1.7.3 ) will appear along with a preview of the prediction output, which is formatted as a comma-separated list of values (CSV format). (See Figure 1.62 .) The default output includes all the fields in your prediction’s dataset plus a last column containing the calculated predictions.
Note: BigML does not include the predictions’ confidence, probability or expected error by default so you will have to configure your output file to include that information as explained in Output Settings .
By default, BigML generates an output Dataset containing the batch prediction results. You can find it in the BigML Dashboard’s dataset list view and use it like any other dataset to analyze the batch prediction output afterwards. If you do not want a dataset with all the prediction results to be created, you can deselect the button highlighted in Figure 1.63 .
Once you are done configuring your batch prediction, click the green button to generate it. This process may take some time depending on the size of the input dataset. (See Figure 1.64 .)
After the batch prediction has been created, you will be able to download a CSV file with all the instances found in your input dataset along with the prediction corresponding to each one of them. (See Figure 1.65 .)
If you did not disable the option to create a dataset, as explained above (see step 4), a button will also be available to allow you to jump directly to the output dataset. (See Figure 1.66 .)
Batch Prediction with Images
BigML models can be trained from images using extracted image features (subsection 1.2.8 ). The input of a batch prediction is a dataset. So when creating a batch prediction with images, the dataset has to have the same image features used to train the model. The image features are in the dataset used to create the model.
As shown in Figure 1.67 , the input for the model batch prediction is selected as predict-images, which is a dataset consisting of six images and contains the default set of extracted image features, Histogram of gradients.
Image features are configured at the source level. For more information about the image features and how to configure them, please refer to section Image Analysis of the Sources with the BigML Dashboard [ 22 ] .
For the rest of batch predictions with images, including batch prediction configuration options and output datasets, everything stated earlier in the current section (Batch Predictions ) applies.
1.7.3 Configuring Model Predictions
BigML provides several options to change its default behavior when calculating predictions. For single predictions as well as for batch predictions, you can configure the strategy used for handling missing values (see Missing Strategies ) and, for classification models only, set a probability or confidence threshold for a given class (see Confidence and Probability Threshold ). For batch predictions, you can also set a default numeric value for missing values (Default Numeric Value ), adjust the automatic field mapping performed by BigML (Field Mapping ), and define the output file settings (Output Settings ).
Missing Strategies
When you create a new prediction, BigML will automatically navigate through the corresponding model to find the leaf node that best classifies the new instance.
However, it may just so happen that your new data (the instances you want to predict) does not have populated values for all the fields used in building the original model. For example, imagine that you are trying to predict diabetes and you have the patient’s glucose level and BMI (Body Mass Index) but not his blood pressure. If the model arrives at a node where the blood pressure level is required, BigML can handle this missing value by using one of these two strategies:
Last prediction: it returns the prediction value and confidence or probability of the parent node.
Proportional: it combines all subtrees predictions beneath the current node based on the data distribution of their child nodes in order to compute the prediction value and confidence or probability.
For single predictions you can select either of the two Missing strategies by clicking on the icons shown in Figure 1.68 .
For batch predictions you can find both options under the configuration panel as shown in Figure 1.69 .
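To make the difference between the two strategies concrete, here is a simplified sketch on a toy tree. It is an illustration under stated assumptions, not BigML's actual algorithm: the tree layout, the class counts, and both helper functions are hypothetical, and the proportional strategy is approximated by adding up the class counts of every leaf that remains reachable.

```python
from collections import Counter

# Hypothetical toy tree. Every node stores the class counts of the training
# instances that reached it; internal nodes also split on one numeric field.
TREE = {
    "counts": {"true": 40, "false": 60},
    "field": "blood pressure", "threshold": 80.0,
    "left": {"counts": {"true": 5, "false": 45}},
    "right": {
        "counts": {"true": 35, "false": 15},
        "field": "glucose", "threshold": 120.0,
        "left": {"counts": {"true": 10, "false": 12}},
        "right": {"counts": {"true": 25, "false": 3}},
    },
}


def last_prediction(node, inputs):
    """Stop at the first node whose split field is missing and use its counts."""
    while "field" in node and inputs.get(node["field"]) is not None:
        go_left = inputs[node["field"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return Counter(node["counts"])


def proportional(node, inputs):
    """Follow both branches when a split field is missing and sum leaf counts."""
    if "field" not in node:
        return Counter(node["counts"])
    value = inputs.get(node["field"])
    if value is None:
        return proportional(node["left"], inputs) + proportional(node["right"], inputs)
    branch = node["left"] if value <= node["threshold"] else node["right"]
    return proportional(branch, inputs)


def to_probabilities(counts):
    """Turn class counts into class probabilities."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


inputs = {"glucose": 150.0}  # blood pressure is missing
last = last_prediction(TREE, inputs)
prop = proportional(TREE, inputs)
print(to_probabilities(last))
print(to_probabilities(prop))
```

With the blood pressure missing, the last-prediction sketch stops at the root and reports its distribution, while the proportional sketch still uses the known glucose value deeper in the tree, so the two strategies yield different probabilities.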
Confidence and Probability Threshold
Confidence and probability thresholds are only available for classification models, and they usually make sense when you want to minimize false positives at the cost of false negatives. The positive class will be predicted if its confidence or probability is greater than the given threshold; otherwise, the class with the next highest confidence or probability will be predicted instead.
To configure a threshold for your single predictions follow these steps:
Select the probability or the confidence measure using the buttons shown in Figure 1.70 . To learn more about the differences between model confidences and probabilities refer to subsection 1.2.6 .
Select the positive class, i.e. the class for which you want to apply the confidence or probability threshold:
Set a value for the threshold using the slider. The positive class will only be predicted when the confidence or probability of the prediction is above the established threshold; otherwise, the class with the next highest probability or confidence will be predicted instead.
For batch predictions, you will find the same options under the Configure panel. (See Figure 1.73 .)
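The threshold rule described above can be sketched in a few lines. This is a literal reading of the text rather than BigML's code: the `predict_with_threshold` helper and the example scores are hypothetical.

```python
def predict_with_threshold(scores, positive_class, threshold):
    """Return the positive class only if its score exceeds the threshold.

    `scores` maps each class to its probability (or confidence). Otherwise
    the best-scoring class among the remaining ones is returned.
    """
    if scores[positive_class] > threshold:
        return positive_class
    others = {label: s for label, s in scores.items() if label != positive_class}
    return max(others, key=others.get)


scores = {"Iris-setosa": 0.10, "Iris-versicolor": 0.55, "Iris-virginica": 0.35}
print(predict_with_threshold(scores, "Iris-versicolor", 0.50))
print(predict_with_threshold(scores, "Iris-versicolor", 0.60))
```

Raising the threshold from 0.50 to 0.60 makes the sketch fall back from the positive class to the best of the other classes, which is how a higher threshold trades false positives for false negatives.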
Default Numeric Value
If the dataset used to make the batch prediction contains instances with missing values for numeric fields, you can easily replace them with the field’s Mean, Median, Maximum, Minimum, or with Zero using the Default numeric value option before creating your batch prediction. (See Figure 1.74 .)
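As an illustration of what such a replacement does to a numeric column, consider the sketch below. The `fill_missing` helper is hypothetical, and it computes each statistic from the values passed in; this section does not state which data BigML derives the statistics from, so treat that as an assumption.

```python
import statistics


def fill_missing(values, strategy):
    """Replace None entries with a default derived from the present values."""
    present = [v for v in values if v is not None]
    defaults = {
        "mean": statistics.mean,
        "median": statistics.median,
        "maximum": max,
        "minimum": min,
        "zero": lambda _: 0,
    }
    default = defaults[strategy](present)
    return [default if v is None else v for v in values]


glucose = [90, None, 110, 130, None]
print(fill_missing(glucose, "mean"))
print(fill_missing(glucose, "maximum"))
print(fill_missing(glucose, "zero"))
```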
Field Mapping
By default, BigML maps fields based on their names. If there is a mismatch between the field names in your model and those in the input dataset you selected for the batch prediction, you can specify the right correspondence between the two sets of fields by explicitly assigning to each field appearing in the “Model fields” column its associated input field in the “Dataset fields” column. (See Figure 1.75 .)
If the dataset’s and model’s field names do not match but their IDs do, which happens when corresponding fields appear in the same order, you can tell BigML to use the field ID instead of the field name to map the fields. To this aim, click the green switcher shown in Figure 1.75 .
If you do not want some of the fields to be considered during the prediction, you can also manually search for those fields and remove them from the “Dataset fields” column.
The field mapping in the BigML Dashboard has a limit of 200 fields. For batch predictions with a higher number of fields, use the fields_map argument of the BigML API if you need to map your fields.
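Name-based mapping with a manual override can be sketched as follows. The field IDs, names, and the `map_fields_by_name` helper are hypothetical, and the direction of the resulting dictionary (model field ID to dataset field ID) is an assumption of this sketch; check the API documentation for the exact shape the field-mapping argument expects.

```python
def map_fields_by_name(model_fields, dataset_fields):
    """Pair model field IDs with dataset field IDs that share the same name.

    Both arguments map field ID -> field name. Returns (mapping, unmatched),
    where unmatched lists the model field names with no dataset counterpart.
    """
    dataset_ids_by_name = {name: fid for fid, name in dataset_fields.items()}
    mapping, unmatched = {}, []
    for model_id, name in model_fields.items():
        if name in dataset_ids_by_name:
            mapping[model_id] = dataset_ids_by_name[name]
        else:
            unmatched.append(name)
    return mapping, unmatched


model_fields = {"000000": "sepal length", "000001": "sepal width",
                "000002": "petal length"}
dataset_fields = {"000000": "sepal width", "000001": "sepal length",
                  "000002": "petal_len"}

mapping, unmatched = map_fields_by_name(model_fields, dataset_fields)
print(unmatched)

# An explicit entry resolves the name mismatch for the remaining field.
mapping["000002"] = "000002"
print(mapping)
```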
Output Settings
As mentioned, batch predictions can create a CSV file containing all input instances along with the predictions BigML calculated for each of them. Define the following settings to customize your prediction file:
Separator: this option allows you to choose a separator for your output file values. The default separator is the comma. You can also select the semicolon, the tab, or the space.
New line: this option allows you to set the new line character to use as the line break in the generated csv file: “LF”, “CRLF”.
Output fields: this option allows you to include or exclude any of your dataset fields from the output file from the preview shown in Figure 1.76 .
Note: a maximum of 100 fields are displayed in the preview, but all your dataset fields are included in the output file by default unless you exclude them.
Headers: this option includes or excludes a first row in the output file (and in the output dataset) with the names of each column (input field names, prediction column name, probability and/or confidence column name, field importances column names, etc.). By default, BigML activates the headers.
Prediction column name: this option allows you to customize the name for your predictions column. By default BigML uses the name of the model’s objective field.
Confidence or expected error: this option allows you to include an additional column in the output file with the confidence or expected error per instance. By default, neither the confidence nor the expected error are included.
Confidence column name: this option allows you to customize the name for the confidence (or expected error) column in case you include it in the output file. By default BigML uses “confidence”.
Probability: this option allows you to include an additional column in the output file with the predicted class probability for each instance. By default, it is not included.
Probability column name: this option allows you to customize the name for the probabilities column in case you include it in the output file. By default BigML uses “probability”.
All class confidences: this option allows you to include the confidences for each class in the objective field. There is a column per class, named "<class_name> confidence".
All class probabilities: this option allows you to include the probabilities for each class in the objective field. There is a column per class, named "<class_name> probability".
Importances: this option allows you to include a column for each of the field relative importances for the model predictions. There is a column per field, named "<field_name> importance".
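To show how a few of these settings shape the output file, the sketch below renders a tiny prediction file with Python's standard `csv` module. It only imitates the format: the `write_predictions` helper, its parameter names, and the sample rows are hypothetical and are not BigML API arguments.

```python
import csv
import io


def write_predictions(rows, predictions, confidences, *, separator=",",
                      new_line="\n", header=True, prediction_name="species",
                      include_confidence=False, confidence_name="confidence"):
    """Render input rows plus prediction columns as delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator=new_line)
    field_names = list(rows[0])
    if header:
        columns = field_names + [prediction_name]
        if include_confidence:
            columns.append(confidence_name)
        writer.writerow(columns)
    for row, prediction, confidence in zip(rows, predictions, confidences):
        values = [row[name] for name in field_names] + [prediction]
        if include_confidence:
            values.append(confidence)
        writer.writerow(values)
    return buffer.getvalue()


rows = [{"petal length": 1.4, "petal width": 0.2},
        {"petal length": 5.1, "petal width": 1.8}]
text = write_predictions(rows, ["Iris-setosa", "Iris-virginica"], [0.92, 0.86],
                         separator=";", include_confidence=True)
print(text)
```

Calling the helper with the defaults instead yields a comma-separated file with a header row and no confidence column, mirroring the defaults listed above.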
1.7.4 Visualizing Model Predictions
Model predictions visualization changes depending on whether you are predicting one single instance or you are predicting multiple instances using the batch predictions option. (See Single Predictions .)
Single Predictions
For single predictions you can find the prediction for your objective field at the top of the form along with the performance measure.
For classification models you will find the predicted class of the objective field along with the probability or the confidence, depending on which measure you select. You will also get a histogram of all classes according to the selected measure, i.e., all class probabilities or all class confidences. (See Figure 1.77 .)
For regression models you will get a numeric prediction and the expected error for that prediction as shown in Figure 1.78 .
In either case, you can change the value of the displayed input fields at any time to have your prediction recalculated in real time.
If you have saved your prediction, you can go back to it and visualize it.
Read a detailed explanation of the confidence and probability calculations in subsection 1.2.6 and of the expected error calculation in subsection 1.2.7 .
Prediction explanation
Prediction explanation helps understand why a model makes a certain prediction. This is very useful in many applications, and the reasons behind a model’s prediction are often as important as the prediction itself.
BigML prediction explanation is based on Shapley values. For more information, please refer to this research paper: A Unified Approach to Interpreting Model Predictions [ 25 ] .
For any classification or regression model, you can request the explanation for a prediction by clicking the icon and then the button shown in Figure 1.79 .
The prediction explanation represents the most important factors considered by the model in a prediction given the input values. Each input value yields an associated importance, as you can see in Figure 1.80 . The importances across all input fields should sum to 100%.
For some input fields you will see a “+” icon next to the importance. This is because the importance may not be directly associated with the input value, i.e., it can be explained by other reasons. In Figure 1.81 , the importance of 6.12% for the field “Age” is not explained by this field being equal to 44.93. Rather, it is because this field value is higher than 30.5 and lower than 45.47.
The prediction explanation for models is calculated using the prediction path of the decision tree.
Note: the input field importances in the prediction explanation are different from the overall field importances of the model. A field can be very important for the model but insignificant for a given prediction.
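The range shown behind the “+” icon can be understood as the intersection of every condition on that field along the prediction path. The sketch below illustrates only that idea: the path, the `field_interval` helper, and the choice of `>` and `<=` as the two operators are assumptions, and the code does not compute importances.

```python
import math


def field_interval(path, field):
    """Intersect all path predicates on `field` into one (low, high] interval."""
    low, high = -math.inf, math.inf
    for name, operator, threshold in path:
        if name != field:
            continue
        if operator == ">":
            low = max(low, threshold)
        elif operator == "<=":
            high = min(high, threshold)
    return low, high


# Hypothetical prediction path; the thresholds on "Age" mirror the example above.
path = [("Age", ">", 30.5), ("Plasma glucose", "<=", 123.0), ("Age", "<=", 45.47)]
low, high = field_interval(path, "Age")
print(f"{low} < Age <= {high}")
```

Any age inside that interval, including the 44.93 from the example, follows the same path and therefore reaches the same prediction.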
Batch Predictions
For batch predictions, you always get a CSV file and an optional output dataset.
Output CSV File
From the batch prediction view, you can access the CSV file containing your predictions for each of your dataset instances in the last column. (See Figure 1.82 .) You can configure several options to customize your CSV file, including the separator for the columns, the name of your prediction column, the dataset fields you want to include, and whether you want to include a first row with the names of your columns. You can find a detailed explanation of those options in Output Settings .
Note: by default BigML does not include the prediction’s confidence, probability or expected error in your output file. Again, you will need to select that option in the output settings panel if you want to include it.
See an output CSV file example in Figure 1.83 where the last two columns contain the prediction and the confidence for each instance.
Output Dataset
By default, BigML creates a dataset out of your batch prediction. (See Output Settings .) You can access your output dataset from the batch prediction view by clicking the button shown in Figure 1.85 .
In the output dataset you can find an additional field (named by default as per your model’s objective field) containing the predictions for each one of your instances. If you configured your batch prediction to include the confidence, the probability or expected error you will be able to find it in the last field of your output dataset as shown in Figure 1.85 .
Batch Prediction 1-Click Actions
From the batch prediction view you can perform the following actions (see Figure 1.86 ):
batch prediction again: this option will redirect you to the batch prediction creation view, with the same model and prediction dataset already selected. This option allows you to rapidly recreate the batch prediction using a different configuration.
batch prediction with another dataset: this option allows you to easily create a batch prediction using the same model and a different dataset.
batch prediction using another model: this option allows you to easily create a batch prediction using the same dataset and a different model.
new batch prediction: this option redirects you to the batch prediction creation view where you can select a prediction dataset and a model to create your prediction.
1.7.5 Consuming Model Predictions
BigML provides plenty of means for developers to integrate BigML model predictions within their apps. In the following sections, we will describe how you can use the BigML REST API and the BigML Python bindings to work with model predictions.
Using Model Predictions via the BigML API
Model predictions have full citizenship in the BigML API. This means you can programmatically create, update, list, delete, and use them for predictions. For example, this is how you can create a single prediction using the command line from a given model and defining the input data. This will require properly setting the BIGML_AUTH environment variable to contain your authentication credentials:
curl "https://bigml.io/prediction?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"model": "model/50650bdf3c19201b64000020",
          "input_data": {"000001": 3, "000002": 4.5}}'
For more information on using model predictions through the BigML API, please refer to prediction REST API documentation.
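The same request can also be assembled from Python using only the standard library. This is a minimal sketch: the `build_prediction_request` helper is hypothetical, the credentials are placeholders, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_prediction_request(model_id, input_data, auth):
    """Build (but do not send) a POST request equivalent to the curl call.

    `auth` is the content of BIGML_AUTH, appended as the query string.
    """
    payload = {"model": model_id, "input_data": input_data}
    return urllib.request.Request(
        url=f"https://bigml.io/prediction?{auth}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request(
    "model/50650bdf3c19201b64000020",
    {"000001": 3, "000002": 4.5},
    "username=alice;api_key=PLACEHOLDER",  # placeholder credentials, not real
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```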
Using Model Predictions via the BigML Bindings
BigML bindings provide a convenient way to access BigML REST API from your language of choice. They offer a higher-level view of BigML Machine Learning resources and algorithms in a number of languages, including Python, Node.js, Java, Swift, and Objective-C. For example, this is how you can create a model prediction in Python using BigML bindings:
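from bigml.api import BigML

# Credentials are read from the BIGML_USERNAME and BIGML_API_KEY environment variables.
api = BigML()

# Create a remote prediction from a model ID, the input data, and optional arguments.
prediction = api.create_prediction("model/573d997058a27e0f620038df",
                                   {"sepal length": 5,
                                    "sepal width": 2.5},
                                   {"name": "my prediction"})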
BigML bindings also provide the means to carry through predictions locally, without ever hitting the network, which can greatly improve the latency of predicting from your apps. This is made possible by BigML models being white-box, meaning you can download them and use them independently from BigML. For example, the following code snippet shows how you can download a model and use it for making a local prediction using the BigML bindings for Python:
from bigml.api import BigML
from bigml.model import Model

api = BigML()

# Download the model definition once.
model = api.get_model("model/502fdbff15526876610002615",
                      query_string="only_model=true;limit=-1")

# Wrap it in a local model and predict without further API calls.
local_model = Model(model)
prediction = local_model.predict({"petal length": 3, "petal width": 1})
For more information on using models through the BigML bindings, please refer to BigML bindings documentation.
1.7.6 Descriptive Information
Descriptive information is what allows you to describe a prediction so you can find it later and easily recognize it among other predictions.
Each prediction has an associated name, description, category, and tags. You can find a brief description for each concept in the following sections. In Figure 1.87 , you can see the options that the More info panel gives to edit them.
Name
If you do not specify a name for your predictions, BigML assigns a default name depending on the type of predictions:
Single predictions: the name always follows the structure “Prediction for <objective field name>”.
Batch predictions: BigML combines your prediction dataset name and the model name: “Batch prediction of <model name> with <dataset name>”.
Prediction names are displayed in the list view and also on the top bar of a prediction view. Prediction names are indexed to be used in searches. You can rename your predictions at any time from the More info panel.
The name of a prediction cannot be longer than 256 characters. More than one prediction can have the same name even within the same project, since they are automatically assigned unique internal identifiers.
Description
Each model prediction also has a description that is very useful for documenting your Machine Learning projects. Predictions take the description from the models used to create them.
Descriptions can be written using plain text and also markdown. BigML provides a simple markdown editor that accepts a subset of markdown syntax. (See Figure 1.88 .)
Descriptions cannot be longer than 8192 characters and can use almost any character.
Category
Each prediction has an associated category taken from the model used to create it. Categories are useful to classify predictions according to the domain your data comes from. This is useful when you use BigML to solve problems across industries or for multiple customers.
A prediction category must be one of the categories listed in Table 1.2 .
Category
Aerospace and Defense
Automotive, Engineering and Manufacturing
Banking and Finance
Chemical and Pharmaceutical
Consumer and Retail
Demographics and Surveys
Energy, Oil and Gas
Fraud and Crime
Healthcare
Higher Education and Scientific Research
Human Resources and Psychology
Insurance
Law and Order
Media, Marketing and Advertising
Miscellaneous
Physical, Earth and Life Sciences
Professional Services
Public Sector and Nonprofit
Sports and Games
Technology and Communications
Transportation and Logistics
Travel and Leisure
Uncategorized
Utilities
Tags
A prediction can also have a number of tags associated with it that can help to retrieve it via the BigML API or to provide predictions with some extra information. Your prediction inherits the tags from the model used to create it. Each tag is limited to a maximum of 128 characters. Each prediction can have up to 32 different tags.
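The limits on names, descriptions, and tags stated in this section can be checked before submitting metadata. The sketch below is a hypothetical client-side helper, not part of BigML; it counts Python string characters, which is an assumption about how the limits are measured.

```python
MAX_NAME_LENGTH = 256
MAX_DESCRIPTION_LENGTH = 8192
MAX_TAG_LENGTH = 128
MAX_TAGS = 32


def metadata_problems(name, description="", tags=()):
    """Return a list of human-readable violations of the documented limits."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    if len(set(tags)) > MAX_TAGS:
        problems.append(f"more than {MAX_TAGS} different tags")
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(f"tag starting {tag[:20]!r} exceeds {MAX_TAG_LENGTH} characters")
    return problems


print(metadata_problems("Prediction for species", tags=["iris", "demo"]))
print(metadata_problems("x" * 300, tags=["t" * 200]))
```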
1.7.7 Model Predictions Privacy
The link displayed in the privacy panel is the private URL of your prediction, so only a user logged into your account is able to see it. Neither single predictions nor batch predictions can be shared from your BigML Dashboard by sharing a link, as you can do with other resources.
1.7.8 Moving Model Predictions to Another Project
When you create a prediction it will be assigned to the same project where the original model is located. You cannot move predictions between projects as you do with other resources.
1.7.9 Stopping Model Predictions
Single predictions are synchronous resources, so you cannot cancel them during the creation since you get the result immediately.
Batch predictions are asynchronous resources, so you can stop the creation before the task is finished. You can use the delete option from the 1-click action menu (Figure 1.90 ) or from the pop up menu on the prediction list view. (See Figure 1.91 .) You can see in Figure 1.91 that the objective field column has the label processing to indicate the batch prediction is still in progress. If you stop the prediction during its creation, you will not be able to resume the same task, so if you want to create the same prediction, you will have to start a new task.
1.7.10 Deleting Model Predictions
You can delete your single or batch predictions from the predictions view, using the 1-click action menu (see Figure 1.92 ) or using the pop up menu on the predictions list view (see Figure 1.93 .)
A modal window will be displayed asking you for confirmation. Once a prediction is deleted, it is permanently deleted and there is no way you (or even the IT folks at BigML) can retrieve it.