Classification and Regression with the BigML Dashboard

2.6 BigML Ensemble Predictions

2.6.1 Introduction

The ultimate goal of building a BigML ensemble is to make predictions for previously unseen instances whose label is unknown. In BigML, you can make predictions for single instances or for many instances in a batch. Each prediction comes with a measure indicating its reliability, expressed either as a probability (for classification ensembles), as a Confidence or votes (only for classification Decision Forests), or as an Expected error (only for regression Decision Forests).

The predictions tab in the main menu of your BigML Dashboard is where all your saved predictions are listed. (See Figure 2.41 .)

\includegraphics[]{images/ensemble-predictions/predictions-list-view}
Figure 2.41 Predictions list empty view

Ensemble predictions are saved under the Classification & Regression option in the menu. (See Figure 2.42 .)

\includegraphics[width=15cm]{images/ensemble-predictions/menu-options-predictions-list-view}
Figure 2.42 Menu options of the predictions list view

From this view you can choose to list either your single-instance predictions or your batch predictions by clicking the corresponding icons. (See Figure 2.43 and Figure 2.44 .)

\includegraphics[width=2cm]{images/predictions/predictions-icon}
Figure 2.43 Single predictions icon
\includegraphics[width=2cm]{images/predictions/batchpredictions-icon}
Figure 2.44 Batch predictions icon

In the predictions list view, you can see, for each prediction, the Model, Ensemble or Logistic Regression icon used for the prediction, the Name of the prediction, the Objective (objective field name), the Prediction (the prediction result), and the Age (time since the prediction was created). (See Figure 2.45 .)

\includegraphics[]{images/predictions/predictions-list-view}
Figure 2.45 Predictions list view

You can also search your predictions by name by clicking the search button on the top right menu.

2.6.2 Creating Ensemble Predictions

As shown in Figure 2.46 , BigML provides two options to make predictions from your ensembles:

  1. predict: to predict a single instance using the prediction form.

  2. batch prediction: to predict multiple instances simultaneously.

\includegraphics[]{images/ensemble-predictions/prediction-menu-options}
Figure 2.46 Menu options to create predictions

Predict

BigML allows you to quickly make predictions for single instances by providing a form containing the fields used by the ensemble, so you can easily set the input data and get an immediate response. This option is only available from the BigML Dashboard for ensembles with fewer than 100 fields. If you want to perform single instance predictions for ensembles with more fields, you can use the BigML API.

Follow the steps detailed below to create a single prediction:

  1. Choose the predict option under the ensemble 1-click menu. (See Figure 2.47 .)

    \includegraphics[]{images/ensemble-predictions/ensemble-predict-one-click}
    Figure 2.47 Predict option from ensemble 1-click menu

    Alternatively, you can choose the predict option in the pop up menu in the list view as shown in Figure 2.48 .

    \includegraphics[]{images/ensemble-predictions/ensemble-predict-pop-up}
    Figure 2.48 Predict option from ensemble pop-up menu
  2. You will be redirected to the prediction form where you will find all the fields used by the ensemble as predictors ordered by Field importance. The importance percentage is found next to the field name as shown in Figure 2.49 . You may not find all the fields from your original dataset because the ensemble may find them irrelevant or redundant in terms of their predictive impact.

    \includegraphics[]{images/ensemble-predictions/field-importance}
    Figure 2.49 Single predictions form
  3. Select the fields you want to be taken into account for your prediction as shown in Figure 2.50 . Non-selected fields will be treated as missing values during the prediction. If your ensemble was trained with Missing splits (see Missing Splits ), then the ensemble treats missing values like any other valid value. If your ensemble was built without Missing splits, then either of the Missing strategies may apply during your prediction (see Missing Strategies ).

    \includegraphics[]{images/ensemble-predictions/pred-ensembles-2}
    Figure 2.50 Select fields in the prediction form
  4. Set input values for your selected fields. Depending on the field type, you will need to input the values differently:

    • Numeric fields: move the slider or input a specific value in the text box.

    • Categorical fields: select one class from the selector.

    • Text fields: write one or several terms in the free text box.

    • Date-time fields: select the appropriate values from the selector.

    • Items fields: when you write the first three characters of an item name, several items matching those characters will appear, so you can select the right one. You can input more than one item for a field.

  5. Get the prediction at the top of the form. For classification ensembles you will get the distribution across all classes, and for regression ensembles you will get the predicted value for the objective field. Both types of ensembles will also show a certainty measure along with the prediction:

    • For classification Decision Forests, you can get the probability, the confidence or the votes depending on the option you choose. For regression Decision Forests you get the expected error.

    • For classification Boosted Trees, you get the probability. For regression Boosted Trees the expected error cannot be calculated. Read more about ensemble predictions in Ensemble predictions: confidence, probability and expected error .

    \includegraphics[]{images/ensemble-predictions/single-pred}
    Figure 2.51 Single prediction view

    BigML predictions are synchronous, i.e., when you send the input data you get an immediate response. Read more about local predictions in Local Predictions .

  6. Optionally, Save the prediction so you can access it afterwards from the ensemble predictions list view.

    \includegraphics[]{images/ensemble-predictions/predictions-section}
    Figure 2.52 Single predictions list view

Local Predictions

BigML provides Local predictions from the BigML Dashboard for single instance predictions. Local predictions allow you to get a real-time prediction without consuming any credits or requiring an internet connection. This is possible because your ensemble is saved in the browser’s memory, so when the input values change, BigML immediately evaluates all models, obtaining their predictions and then combining them in a matter of microseconds.

Local predictions are only available for ensembles built with 15 models or fewer for Decision Forests and 30 models or fewer in the case of Boosted Trees. For ensembles with a higher number of models, you can still perform remote single predictions.
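The limits above can be expressed as a small check. This is only an illustrative sketch: the function name and the dictionary keys are hypothetical and are not part of the BigML API.

```python
# Model-count limits for local predictions, as quoted in the text above.
# The keys and the function name are illustrative, not BigML identifiers.
LOCAL_PREDICTION_LIMITS = {"decision_forest": 15, "boosted_trees": 30}


def supports_local_prediction(ensemble_type: str, number_of_models: int) -> bool:
    """Return True if an ensemble of this type and size qualifies for local predictions."""
    return number_of_models <= LOCAL_PREDICTION_LIMITS[ensemble_type]


print(supports_local_prediction("decision_forest", 10))
print(supports_local_prediction("boosted_trees", 31))
```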

Predictions with Images

BigML ensembles can be trained from images using extracted image features (subsection 2.2.3 ). Because image features are automatically generated numeric fields, creating ensemble predictions with images works the same way as creating any other ensemble prediction. The only difference is the image input fields.

Note: When the input fields contain images, BigML automatically extracts the image features needed to create the single prediction, matching those used in the dataset that trained the ensemble.

\includegraphics[]{images/ensemble-predictions/ensemble-predict-image-select-single}
Figure 2.53 Select a single image source in the image input field

The ensemble in Figure 2.53 , “grape-strawberry texture”, was created from a dataset containing the Wavelet subbands image features. Creating a prediction with this ensemble takes you to the prediction form, which presents all the input fields used by the ensemble. One of them is the image field. Because this is a single prediction, the image is supplied as a single image source. Click the input field box to see the available single image sources in the dropdown list. There is also a search box for locating specific ones.

\includegraphics[]{images/ensemble-predictions/ensemble-pred-image-list-components}
Figure 2.54 List the components of a composite source

Single image sources are often used to create a composite source, in which case they become component sources of that composite. Alternatively, an image may have been uploaded as part of an archive file (zip/tar), which also creates a composite source. In those cases, the composite source is shown in the dropdown list along with a “List components” icon. In the example in Figure 2.54 , predict-images.zip is a composite source; click the icon to show its component sources.

\includegraphics[]{images/ensemble-predictions/ensemble-pred-image-select-components}
Figure 2.55 Select a component of a composite source

After the component sources of the composite are listed, scroll the dropdown list to find the desired one, then click to select it, as shown in Figure 2.55 . There is also a search box to locate specific component sources.

\includegraphics[]{images/ensemble-predictions/ensemble-pred-image-select-more-fields}
Figure 2.56 Ensemble prediction form, image field and more

In addition to images, ensembles may use other fields, which will be in the prediction form too. As shown in Figure 2.56 , all the fields can be selected, and their input values be set by dragging the knobs on the sliders or by entering precise values in their input boxes.

Once all fields are selected, click on the green button Predict to create a prediction.

\includegraphics[]{images/ensemble-predictions/ensemble-pred-image-prediction-created}
Figure 2.57 Ensemble single image prediction

After a new prediction is created, as shown in Figure 2.57 , the predicted class is at the top of the form along with its probability. The prediction interface is the same as the one for non-image ensembles. Everything described earlier in this section (Predict ) applies.

Batch Predictions

BigML batch predictions allow you to make simultaneous predictions for multiple instances. All you need is the ensemble you want to use to make predictions and a dataset containing the instances which you want to use as prediction inputs. BigML will create a prediction for each instance in the dataset. Follow the steps detailed below to create a batch prediction:

  1. Select the batch prediction option under the ensemble 1-click menu (see Figure 2.58 ) or the create batch prediction option in the pop up menu of the list view (see Figure 2.59 .)

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-one-click}
    Figure 2.58 Batch predictions option from ensemble 1-click menu
    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-pop-up}
    Figure 2.59 Batch predictions option from ensemble pop up menu
  2. Select the dataset containing all the instances you want to create a prediction for. The instances should contain the input values for the fields used by the ensemble as predictors. You can also select a subset of the ensemble fields to be taken into account by configuring your prediction (see Field Mapping .) BigML batch predictions can handle missing data in your prediction dataset (see Missing Strategies .)

  3. Optionally, select the ensemble you want to use for the prediction. BigML pre-selects the ensemble you created the batch prediction from at step 1, but you can change it at any time in the batch prediction view by selecting another ensemble from the ensemble selector displayed in the right pane. You can even switch to a model or logistic regression by selecting the corresponding icon in the top left menu.

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-1}
    Figure 2.60 Select dataset for batch predictions
  4. After you have selected the ensemble and the dataset, the batch prediction configuration options (see subsection 2.6.3 ) will appear along with a preview of the prediction output, which is formatted as a comma-separated list of values (CSV format). (See Figure 2.61 .) The default output includes all the fields in your prediction’s dataset plus a last column containing the calculated predictions.

    Note: BigML does not include the predictions’ probability, confidence or expected error by default so you will have to configure your output file to include that information as explained in Output Settings .

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-2}
    Figure 2.61 Configuration options displayed and output preview
  5. By default, BigML generates an output Dataset containing the batch prediction results. You can find it in the BigML Dashboard’s dataset list view and use it like any other dataset to analyze the batch prediction output afterwards. If you do not want a dataset with all the prediction results to be created, you can deselect the button highlighted in Figure 2.62 .

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-3}
    Figure 2.62 Create dataset from batch predictions
  6. Once you are done configuring your batch prediction, click the green Predict button to generate it. This process may take some time depending on the size of the input dataset. (See Figure 2.63 .)

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-3-5}
    Figure 2.63 Create batch predictions
  7. After the batch prediction has been created, you will be able to download a file with all the instances found in your input dataset along with the prediction corresponding to each one of them. (See Figure 2.64 .)

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-4}
    Figure 2.64 Download batch prediction output CSV file
  8. If you did not disable the option to create a dataset, as explained above (see step 5), an Output dataset button will also be available to allow you to jump directly to the output dataset. (See Figure 2.65 .)

    \includegraphics[]{images/ensemble-predictions/batchpred-ensembles-5}
    Figure 2.65 View batch predictions output dataset

Batch Prediction with Images

BigML ensembles can be trained from images using extracted image features (subsection 2.2.3 ). The input of a batch prediction is a dataset. So when creating a batch prediction with images, the dataset has to have the same image features used to train the ensemble. The image features are in the dataset used to create the ensemble.

\includegraphics[]{images/ensemble-predictions/ensemble-batchpred-images}
Figure 2.66 Batch prediction using an image dataset

As shown in Figure 2.66 , the input for the ensemble batch prediction is selected as predict-images texture, which is a dataset consisting of six images and contains a set of extracted image features, Wavelet subbands.

Image features are configured at the source level. For more information about the image features and how to configure them, please refer to section Image Analysis of the Sources with the BigML Dashboard [ 22 ] .

For the rest of batch predictions with images, including batch prediction configuration options and output datasets, everything stated earlier in the current section (Batch Predictions ) applies.

2.6.3 Configuring Ensemble Predictions

BigML provides several options to change its default behavior when calculating predictions. For single predictions as well as for batch predictions you can configure the strategy used for handling missing values (see Missing Strategies ). In the case of Decision Forests you can also configure the prediction operating kind, i.e., the criterion used to combine the single tree predictions: confidence, probability or votes (see Combine single tree predictions: probability, confidence or votes ). For classification ensembles you can also use a threshold for your single and batch predictions. For batch predictions you can configure the automatic fields mapping performed by BigML (Field Mapping ), and define the output file settings (Output Settings ).

Missing Strategies

When you create a new prediction, BigML will automatically navigate through the corresponding ensemble to find the leaf node that best classifies the new instance.

However, it may just so happen that your new data (the instances you want to predict) does not have populated values for all the fields used in building the original ensemble. For example, imagine that you are trying to predict diabetes and you have the patient’s glucose level and BMI (Body Mass Index) but not his blood pressure. If the ensemble arrives at a node where the blood pressure level is required, BigML can handle this missing value by using one of these two strategies:

  • Last prediction: it returns the prediction value and confidence of the parent node.

  • Proportional: it combines all subtrees’ predictions beneath the current node based on the data distribution of their child nodes in order to compute the prediction value and confidence.
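The difference between the two strategies can be illustrated on a single toy tree. The sketch below is a simplified illustration, not BigML's implementation: the tree layout, the field names, the class counts and the helper functions are all hypothetical, and the confidence calculation is left out. It only shows how the class distribution used for the prediction changes when the blood pressure value is missing.

```python
from collections import Counter


def leaf(distribution):
    """A leaf holds the class counts of the training instances that reached it."""
    return {"distribution": distribution}


def split(field, threshold, left, right):
    """An internal node; its distribution is the sum of its children's distributions."""
    combined = Counter(left["distribution"]) + Counter(right["distribution"])
    return {"field": field, "threshold": threshold,
            "left": left, "right": right, "distribution": dict(combined)}


def branch_for(node, value):
    return node["left"] if value <= node["threshold"] else node["right"]


def last_prediction(node, inputs):
    """Stop at the first node whose split field is missing and use that node."""
    while "field" in node and inputs.get(node["field"]) is not None:
        node = branch_for(node, inputs[node["field"]])
    return Counter(node["distribution"])


def proportional(node, inputs):
    """At a split on a missing field, follow both branches and add up their results."""
    if "field" not in node:
        return Counter(node["distribution"])
    value = inputs.get(node["field"])
    if value is None:
        return proportional(node["left"], inputs) + proportional(node["right"], inputs)
    return proportional(branch_for(node, value), inputs)


def majority(distribution):
    """Class with the most instances; ties go to the first class alphabetically."""
    return min(distribution, key=lambda c: (-distribution[c], c))


# Hypothetical diabetes tree: glucose, then blood pressure, then BMI.
TREE = split(
    "glucose", 120,
    leaf({"False": 40, "True": 5}),
    split(
        "blood_pressure", 80,
        split("bmi", 30, leaf({"False": 18, "True": 2}), leaf({"False": 2, "True": 8})),
        leaf({"False": 5, "True": 20}),
    ),
)

# Glucose and BMI are known, blood pressure is missing.
patient = {"glucose": 150, "bmi": 25}
print(majority(last_prediction(TREE, patient)))
print(majority(proportional(TREE, patient)))
```

In this toy case the two strategies disagree: last prediction stops at the blood pressure node and uses its overall distribution, while proportional keeps using the known BMI value inside the subtrees before combining them.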

For single predictions you can select either Missing strategy by clicking the icons shown in Figure 2.67 .

\includegraphics[]{images/ensemble-predictions/pred-missing-strategies}
Figure 2.67 Missing strategies for single predictions

For batch predictions you can find both options under the configuration panel as shown in Figure 2.68 .

\includegraphics[]{images/ensemble-predictions/batchpred-missing-strategies}
Figure 2.68 Missing strategies for batch predictions

Combine single tree predictions: probability, confidence or votes

Ensembles are composed of several trees. Each tree may return a different prediction for the same input data. These single predictions need to be combined to get a final prediction for the ensemble. For Boosted Trees there is only one way to combine the single tree predictions, explained in Ensemble predictions: confidence, probability and expected error , so you will not see any options in the prediction view. For Decision Forests there are three different options (called operating kinds in the BigML API) to combine the single tree predictions that compose the ensemble and get a final prediction:

  • Probabilities:

    • For classification ensembles, per-class probabilities are averaged over all the trees composing the ensemble. The class with the highest probability is the winning class. For example, the table below shows a classification ensemble built using three decision trees that predict two classes, “True” and “False”:

      \begin{tabular}{lcc}
      Trees & True & False \\
      \hline
      Tree 1 & 80\% & 20\% \\
      Tree 2 & 40\% & 60\% \\
      Tree 3 & 60\% & 40\% \\
      \hline
      Ensemble & 60\% & 40\% \\
      \end{tabular}

      Table 2.2 Example of classification ensemble using probability

      The predicted class is “True” because it has a higher probability (\([80\% +40\% +60\% ]/3=60\% \)) than the “False” class (\([20\% +60\% +40\% ]/3=40\% \)).

    • For regression ensembles, the probability option averages the predictions of the trees composing the ensemble. For example, consider again three different trees in an ensemble, this time predicting a numeric output:

      \begin{tabular}{lcc}
      Trees & Total Sales & Expected Error \\
      \hline
      Tree 1 & \$200 & \$2.40 \\
      Tree 2 & \$250 & \$2.10 \\
      Tree 3 & \$180 & \$1.45 \\
      \hline
      Ensemble & \$210 & \$1.98 \\
      \end{tabular}

      Table 2.3 Example of regression ensemble using probability

      The final result will be a total sale of $210 (\([\$ 200+\$ 250+\$ 180]/3=\$ 210\)) with an error of $1.98 (\([\$ 2.4+\$ 2.1+\$ 1.45]/3=\$ 1.98\)).

  • Confidences:

    • For classification ensembles per-class confidences are averaged taking into account all trees composing the ensemble. The class with the highest confidence is the winner class. It is calculated in the same way as the prediction in Table 2.2 but using the per-class confidences instead of the probabilities.

    • For regression ensembles, the confidence option averages the predictions of the trees composing the ensemble in the same way as explained for probabilities in Table 2.3 , but weighted by the expected error.

  • Votes:

    • For classification ensembles each tree prediction is considered as one vote. The “votes” of a given class is the percentage of trees in the ensemble that vote for that class. You can find below an example that shows a classification ensemble built using three decision trees to predict two classes, “True” and “False”:

      \begin{tabular}{lc}
      Trees & Predicted class \\
      \hline
      Tree 1 & True \\
      Tree 2 & False \\
      Tree 3 & True \\
      \hline
      Ensemble & True \\
      \end{tabular}

      Table 2.4 Example of classification ensemble using votes

      Since more trees predicted the class “True” (two, versus only one tree that predicted the class “False”), the final prediction is “True” with 66.67% of the votes (\(2 / 3 =0.6667\)).

      Note: if two or more classes have the same number of votes, the first class in alphabetical order will take precedence.

    • For regression ensembles the votes option averages the predictions of the trees composing the ensemble. It gives the same results as the probability (see Table 2.3 ).
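The probability examples in Table 2.2 and Table 2.3 can be reproduced with a few lines of Python. This is a minimal sketch of the averaging described above, using only the numbers from the tables; the function name is illustrative and is not part of the BigML API.

```python
from statistics import mean


def average_class_probabilities(tree_probabilities):
    """Average the per-class probabilities over all trees in the ensemble."""
    classes = tree_probabilities[0].keys()
    return {c: mean(tree[c] for tree in tree_probabilities) for c in classes}


# Table 2.2: three classification trees, two classes.
trees = [
    {"True": 0.80, "False": 0.20},
    {"True": 0.40, "False": 0.60},
    {"True": 0.60, "False": 0.40},
]
ensemble = average_class_probabilities(trees)
winner = max(ensemble, key=ensemble.get)
print(winner, {c: round(p, 2) for c, p in ensemble.items()})

# Table 2.3: three regression trees with their expected errors.
sales = [200, 250, 180]
errors = [2.40, 2.10, 1.45]
ensemble_sales = mean(sales)
ensemble_error = mean(errors)
print(ensemble_sales, round(ensemble_error, 2))
```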

You can choose any option (probabilities, confidences or votes) from the BigML Dashboard to calculate the Decision Forests single predictions (see Figure 2.69 ) or batch predictions (see Figure 2.70 ).

\includegraphics[]{images/ensemble-predictions/operating-kinds}
Figure 2.69 Probabilities, confidences or votes for Decision Forest single predictions
\includegraphics[]{images/ensemble-predictions/operating-kinds-batch}
Figure 2.70 Probabilities, confidences or votes for Decision Forest batch predictions
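The votes option for classification, including the alphabetical tie-break mentioned in the note above, can be sketched as follows. The function name is hypothetical; the input is simply the list of classes predicted by the individual trees.

```python
from collections import Counter


def combine_votes(tree_predictions):
    """Return the winning class and its share of the votes.

    Ties are resolved in favor of the first class in alphabetical order.
    """
    counts = Counter(tree_predictions)
    winner = min(counts, key=lambda c: (-counts[c], c))
    return winner, counts[winner] / len(tree_predictions)


# Table 2.4: two trees vote "True", one votes "False".
predicted_class, share = combine_votes(["True", "False", "True"])
print(predicted_class, f"{share:.2%}")
```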

Probability, confidence, and votes thresholds

Thresholds are only available for classification ensembles. They usually make sense for unbalanced binary classifications, when you want to minimize false positives at the cost of false negatives. The positive class will be predicted only if its probability, confidence or votes are greater than the given threshold; otherwise, the class with the next highest probability, confidence or votes will be predicted instead.
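The threshold rule can be sketched as a small function. This follows the description above literally and is not taken from BigML's code; the function and parameter names are illustrative, and `scores` stands for whichever measure (probability, confidence or votes) is being used.

```python
def predict_with_threshold(scores, positive_class, threshold):
    """Predict the positive class only if its score exceeds the threshold.

    Otherwise fall back to the best-scoring class among the remaining ones.
    """
    if scores[positive_class] > threshold:
        return positive_class
    others = {c: s for c, s in scores.items() if c != positive_class}
    return max(others, key=others.get)


scores = {"True": 0.60, "False": 0.40}
print(predict_with_threshold(scores, "True", 0.50))
print(predict_with_threshold(scores, "True", 0.70))
```

With a threshold of 0.70, the positive class “True” is no longer predicted even though it has the highest probability, which is how a threshold trades false positives for false negatives.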

To configure a threshold for your single predictions follow these steps:

  1. Select the probability, the confidence, or votes measure depending on the criteria you want to use for Decision Forests (see Figure 2.71 ). To learn more about these three ways to combine single tree predictions in the ensemble refer to Combine single tree predictions: probability, confidence or votes . Boosted trees will only have the option to set a probability threshold.

    \includegraphics[]{images/ensemble-predictions/config-threshold0}
    Figure 2.71 Select probability, confidence or votes
  2. Select the positive class, i.e. the class for which you want to apply the threshold:

    \includegraphics[]{images/ensemble-predictions/config-threshold1}
    Figure 2.72 Select the positive class
  3. Set a value for the threshold using the slider. The positive class will only be predicted when the probability, confidence or votes of the prediction are above the established threshold; otherwise, the class with the next highest probability, confidence or votes will be predicted instead.

    \includegraphics[]{images/ensemble-predictions/config-threshold2}
    Figure 2.73 Set a threshold

For batch predictions, you will find the same options under the Configure panel. (See Figure 2.74 .)

\includegraphics[]{images/ensemble-predictions/batchpred-config-threshold}
Figure 2.74 Configure a threshold for batch predictions

Default Numeric Value

If the dataset used to make the batch prediction contains instances with missing values for the numeric fields, you can easily replace them with the field’s Mean, Median, Maximum, Minimum or with Zero using the Default numeric value option before creating your batch prediction. (See Figure 2.75 .)

\includegraphics[]{images/ensemble-predictions/batchpred-default-numeric}
Figure 2.75 Default numeric value for batch predictions
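The effect of the Default numeric value option can be sketched as follows. This is an illustration only: the strategy names mirror the Dashboard labels, and the statistics are computed here from the non-missing values of the same column, which is an assumption made for the example rather than a statement about where BigML takes them from.

```python
import statistics

# Strategy names mirror the Dashboard options; the mapping itself is illustrative.
DEFAULT_NUMERIC_VALUES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "maximum": max,
    "minimum": min,
    "zero": lambda values: 0,
}


def fill_missing(column, strategy):
    """Replace None entries in a numeric column using the chosen strategy."""
    present = [value for value in column if value is not None]
    default = DEFAULT_NUMERIC_VALUES[strategy](present)
    return [default if value is None else value for value in column]


glucose = [148, None, 183, 89]
print(fill_missing(glucose, "median"))
print(fill_missing(glucose, "zero"))
```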

Field Mapping

By default, BigML maps fields based on their names. If there is a mismatch between the field names in your ensemble and those in the input dataset you selected for the batch prediction, you can specify the right correspondence between the two sets of fields by explicitly assigning to each field appearing in the “Ensemble fields” column its associated input field in the “Dataset fields” column. (See Figure 2.76 .)

If the dataset’s and ensemble’s field names do not match but their IDs do, which happens when corresponding fields appear in the same order, you can tell BigML to use the field ID instead of the field name to map the fields. To do this, click the green switcher shown in Figure 2.76 .

If you do not want some of the fields to be considered during the batch prediction, you can also manually search for those fields and remove them from the “Dataset fields” column.

\includegraphics[]{images/ensemble-predictions/fields-mapping}
Figure 2.76 Fields Mapping for batch predictions

The fields mapping from the BigML Dashboard has a limit of 200 fields. For batch predictions with a higher number of fields, use the argument field_map from BigML API if you need to map your fields.
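The default name-based mapping can be sketched in plain Python. Everything here is hypothetical: the field IDs, the field names and the shape of the resulting dictionary are chosen for illustration, so check the BigML API documentation for the exact format the mapping argument expects before using a dictionary like this in a request.

```python
def map_fields_by_name(ensemble_fields, dataset_fields):
    """Match ensemble fields to dataset fields that have the same name.

    Both arguments are dictionaries of field ID -> field name. Returns the
    mapping (ensemble field ID -> dataset field ID) and the names of the
    ensemble fields that found no match.
    """
    dataset_ids_by_name = {name: field_id for field_id, name in dataset_fields.items()}
    mapping, unmatched = {}, []
    for field_id, name in ensemble_fields.items():
        if name in dataset_ids_by_name:
            mapping[field_id] = dataset_ids_by_name[name]
        else:
            unmatched.append(name)
    return mapping, unmatched


ensemble_fields = {"000000": "Glucose", "000001": "BMI"}
dataset_fields = {"000000": "BMI", "000001": "Glucose level"}

mapping, unmatched = map_fields_by_name(ensemble_fields, dataset_fields)
print(mapping, unmatched)

# A name mismatch has to be resolved explicitly, as in the Dashboard's mapping view.
mapping["000000"] = "000001"
print(mapping)
```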

Output Settings

As mentioned, batch predictions can create a file containing all input instances along with the predictions BigML calculated for each of them. Define the following settings to customize your prediction file:

  • Separator: this option allows you to choose a separator for your output file values. The default separator is the comma. You can also select the semicolon, the tab, or the space.

  • New line: this option allows you to set the new line character to use as the line break in the generated CSV file: “LF” or “CRLF”.

  • Output fields: this option allows you to include or exclude any of your dataset fields from the output file from the preview shown in Figure 2.77 .

    Note: a maximum of 100 fields are displayed in the preview, but all your dataset fields are included in the output file by default unless you exclude them.

  • Headers: this option includes or excludes a first row in the output file (and in the output dataset) with the names of each column (input field names, prediction column name, probability and/or confidence column name, field importances column names, single tree predictions column names, etc.). By default, BigML includes the headers.

  • Prediction column name: this option allows you to customize the name for your predictions column. By default BigML uses the name of the ensemble’s objective field.

  • Probability, confidence (or expected error) or votes: this option allows you to include an additional column in the output file with the probability, the confidence (or expected error in the case of regression trees) or the votes per instance. In the case of classification Boosted Trees you will always see the probability option, and in the case of classification Decision Forests it will depend on the option selected to combine the single tree predictions (see Figure 2.70 ). These measures are not included by default in your batch predictions.

    Note: remember that the expected error cannot be calculated for Boosted Trees, therefore this option will be disabled for regression Boosted trees.

  • Probability, confidence (or expected error) or votes column name: this option allows you to customize the name for the probability, confidence (or expected error) or votes column in case you include it in the output file. By default BigML uses “probability”, “confidence” (for both confidence and expected error) or “votes”.

  • Single tree predictions: this option allows you to include a column for each of the individual model predictions only in the case of Decision Forests. That will add a column per model, named <prediction_name>_\(n\) where \(n\) is the position of the model in the model list in the ensemble, starting at 1.

  • All class confidences: this option allows you to include the confidences for each class in the objective field for classification Decision Forests. There is a column per class, named "<class_name> confidence".

  • All class votes: this option allows you to include the votes for each class in the objective field for classification Decision Forests. There is a column per class, named "<class_name> votes".

  • All class probabilities: this option allows you to include the probabilities for each class in the objective field for classification Decision Forests and Boosted trees. There is a column per class, named "<class_name> probability".

  • Importances: this option allows you to include a column for each of the field relative importances for the ensemble predictions (for Boosted trees and Decision Forests). There is a column per field, named "<field_name> importance".

\includegraphics[]{images/ensemble-predictions/output-settings}
Figure 2.77 Output settings for batch predictions
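The separator, new line, headers and prediction column name settings map naturally onto the options of a CSV writer. The sketch below uses Python's standard `csv` module to build an output file in memory; the function and its parameters are hypothetical and only mimic the Dashboard settings described above.

```python
import csv
import io


def write_predictions(rows, predictions, fieldnames, separator=",",
                      new_line="\n", headers=True, prediction_name="prediction"):
    """Write input rows plus a final prediction column, returning the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=separator, lineterminator=new_line)
    if headers:
        writer.writerow(fieldnames + [prediction_name])
    for row, prediction in zip(rows, predictions):
        writer.writerow([row[name] for name in fieldnames] + [prediction])
    return buffer.getvalue()


rows = [{"Glucose": 148, "BMI": 33.6}, {"Glucose": 85, "BMI": 26.6}]
output = write_predictions(
    rows, ["True", "False"], ["Glucose", "BMI"],
    separator=";", new_line="\r\n", prediction_name="Diabetes",
)
print(output)
```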

2.6.4 Visualizing Ensemble Predictions

Visualization of ensemble predictions changes depending on whether you are predicting one single instance or you are predicting multiple instances using the batch predictions option. (See Single Predictions .)

Single Predictions

There are some essential differences between Decision Forests and Boosted Trees predictions, hence the visualizations for both of them differ. These differences are highlighted in the following paragraphs.

For single predictions you can find the prediction for your objective field at the top of the form along with the performance measure.

For classification ensembles, you will get the predicted class at the top and all the objective field class probabilities in the histogram below, as shown in Figure 2.78 .

\includegraphics[]{images/ensemble-predictions/viz-boosted}
Figure 2.78 Single predictions view for classification ensembles

For regression Decision Forests you will get a numeric prediction and the expected error for that prediction, as shown in Figure 2.79 .

\includegraphics[]{images/ensemble-predictions/regress-single-view}
Figure 2.79 Single predictions view for regression ensembles

For regression Boosted trees ensembles you will get a numeric prediction (see Figure 2.80 ) but the expected error for that prediction cannot be calculated as in the case of Decision Forests (see subsection 2.2.1 ).

\includegraphics[]{images/ensemble-predictions/viz-boosted-regress}
Figure 2.80 Single predictions view for regression Boosted Trees

In any of the above-mentioned cases, you can change the value of the displayed input fields at any time to have your prediction recalculated in real time.

If you have saved your prediction, you can go back to it and visualize it.

Read a detailed explanation of confidences, probabilities and expected error calculations in Ensemble predictions: confidence, probability and expected error .

Prediction explanation

Prediction explanation helps understand why an ensemble makes a certain prediction. This is very useful in many applications, and the reasons behind an ensemble’s prediction are often as important as the prediction itself.

BigML prediction explanation is based on Shapley values. For more information, please refer to this research paper: A Unified Approach to Interpreting Model Predictions [ 25 ] .

For any classification or regression ensemble, you can request the explanation for the prediction by clicking the prediction explanation icon and then clicking Predict (see Figure 2.81 ).

\includegraphics[]{images/ensemble-predictions/prediction-explan-ensembles}
Figure 2.81 Explain prediction

The prediction explanation represents the most important factors considered by the ensemble in a prediction given the input values. Each input value will yield an associated importance, as you can see in Figure 2.82 . The importances across all input fields should sum to 100%.

\includegraphics[]{images/ensemble-predictions/prediction-explan-ensembles2}
Figure 2.82 Input field importances

For some input fields you will see a “+” icon next to the importance. This is because the importance may not be directly associated with the input value, i.e., it can be explained by other reasons. In Figure 2.83, the importance of 13.62% for the field “Fare today” is not explained by this field being equal to 24,719. Rather, it is because this field value is not missing (which accounts for an importance of 8.80%) and because it is higher than 17,000 (4.82% of importance).

\includegraphics[]{images/ensemble-predictions/prediction-explan-ensembles3}
Figure 2.83 See the detailed explanation
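The breakdown in this example can be checked with simple arithmetic: the partial importances behind the “+” icon add up to the importance reported for the field. The short Python sketch below only reproduces the numbers quoted in the text; the variable names are illustrative and are not part of any BigML API.

```python
import math

# Values quoted in the text for the field "Fare today" (Figure 2.83).
field_importance = 13.62
partial_importances = {
    "value is not missing": 8.80,
    "value is higher than 17,000": 4.82,
}

# The partial importances should add up to the field's total importance.
total = sum(partial_importances.values())
matches = math.isclose(total, field_importance, abs_tol=1e-9)
print(f"{total:.2f}% explained, matches reported importance: {matches}")
```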

The prediction explanation for ensembles is calculated from the results of over a thousand distinct predictions made on random perturbations of the input data. For this reason, computing the explanation may take some time.
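The idea of deriving per-field contributions from many predictions on perturbed inputs can be illustrated with a generic permutation-sampling estimate of Shapley values. The Python sketch below is not BigML's implementation. It assumes a toy linear function (`toy_model`) standing in for an ensemble, a single hypothetical baseline row, and a normalization of absolute contributions to percentages; all names are made up for illustration.

```python
import random


def shapley_estimate(predict, instance, baseline, n_samples=1000, seed=0):
    """Estimate Shapley values by sampling random field orderings.

    For each ordering, fields are switched one by one from their baseline
    value to the instance value, and the change in the prediction is
    credited to the field that was switched.
    """
    rng = random.Random(seed)
    fields = list(instance)
    contributions = {field: 0.0 for field in fields}
    for _ in range(n_samples):
        order = fields[:]
        rng.shuffle(order)
        current = dict(baseline)
        previous = predict(current)
        for field in order:
            current[field] = instance[field]
            value = predict(current)
            contributions[field] += value - previous
            previous = value
    return {field: total / n_samples for field, total in contributions.items()}


def toy_model(row):
    # Hypothetical stand-in for an ensemble's numeric prediction.
    return 2.0 * row["a"] + 0.5 * row["b"] - 1.0 * row["c"]


instance = {"a": 3.0, "b": 4.0, "c": 1.0}
baseline = {"a": 1.0, "b": 0.0, "c": 1.0}

shap = shapley_estimate(toy_model, instance, baseline)

# Assumed normalization: express absolute contributions as percentages.
total = sum(abs(value) for value in shap.values())
importances = {field: 100.0 * abs(value) / total for field, value in shap.items()}
print(importances)
```

Each sampled ordering requires one prediction per field, which gives a sense of why an explanation built from many perturbed predictions takes longer to compute than a single prediction.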

Note: the input field importances in the prediction explanation are different from the overall field importances of the ensemble. A field can be very important for the ensemble but insignificant for a given prediction.

Batch Predictions

For batch predictions, you always get a file and an optional output dataset.

Output File: From the batch prediction view, you can access the output file, which contains the prediction for each of your dataset instances in the last column. (See Figure 2.84.) You can configure several options to customize your output file, including the separator for the columns, the name of your prediction column, the dataset fields you want to include, and whether to include a first row with the headers for your column names. You can find a detailed explanation of those options in Output Settings.

\includegraphics[]{images/ensemble-predictions/batchpred-ensembles-4}
Figure 2.84 Download batch prediction output file

See an output file example in Figure 2.85, where the last two columns contain the prediction and the confidence for each instance.

Pregnancies,Glucose,Blood pressure,Skinfold,Insulin,BMI,Diabetes,Confidence
8,183,64,0,0,23.3,True,0.6574
5,116,74,0,0,25.6,False,0.845
10,115,0,0,0,35.3,True,0.6469
8,125,96,0,0,0.0,False,0.9356
1,189,60,23,846,30.1,True,0.7574
1,103,30,38,83,43.3,False,0.675
7,103,66,32,0,39.1,False,0.7682
1,101,50,15,36,24.2,False,0.948
0,100,88,60,110,46.8,False,0.5413
Figure 2.85 An example of a batch prediction output file
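Once downloaded, an output file like this one can be processed with any CSV reader. The following Python sketch parses the first rows of the example with the standard `csv` module and keeps only the predictions above a confidence threshold. It assumes a comma separator, a header row, and the column names of this example (`Diabetes` and `Confidence`); the 0.7 threshold is arbitrary, and your own file will differ according to your output settings.

```python
import csv
import io

# First rows of the example output file, embedded here so the sketch is
# self-contained; in practice you would open the downloaded file instead.
sample = """Pregnancies,Glucose,Blood pressure,Skinfold,Insulin,BMI,Diabetes,Confidence
8,183,64,0,0,23.3,True,0.6574
5,116,74,0,0,25.6,False,0.845
10,115,0,0,0,35.3,True,0.6469
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# The last two columns hold the prediction and its confidence.
predictions = [(row["Diabetes"], float(row["Confidence"])) for row in rows]

# Keep only the predictions at or above an arbitrary confidence threshold.
confident = [item for item in predictions if item[1] >= 0.7]
print(confident)
```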

Output Dataset: By default, BigML creates a dataset out of your batch prediction. (See Output Settings.) You can access your output dataset from the batch prediction view by clicking the Output dataset button shown in Figure 2.86.

\includegraphics[]{images/ensemble-predictions/batchpred-ensembles-5}
Figure 2.86 View batch predictions output dataset

In the output dataset you can find an additional field (named by default after your ensemble’s objective field) containing the prediction for each of your instances. If you configured your batch prediction to include the confidence or expected error, you will find it in the last field of your output dataset, as shown in Figure 2.87.

\includegraphics[]{images/ensemble-predictions/batchpred-ensembles-dataset2}
Figure 2.87 Batch predictions output dataset

Batch Prediction 1-Click Actions

From the batch prediction view you can perform the following actions (see Figure 2.88 ):

  • batch prediction again: this option will redirect you to the batch prediction creation view, with the same ensemble and prediction dataset already selected. This option allows you to rapidly recreate the batch prediction using a different configuration.

  • batch prediction with another dataset: this option allows you to easily create a batch prediction using the same ensemble and a different dataset.

  • batch prediction using another ensemble: this option allows you to easily create a batch prediction using the same dataset and a different ensemble.

  • new batch prediction: this option redirects you to the batch prediction creation view where you can select a prediction dataset and an ensemble to create your prediction.

\includegraphics[]{images/ensemble-predictions/batchpred-one-click}
Figure 2.88 Batch prediction 1-click actions

2.6.5 Consuming Ensemble Predictions

BigML provides several ways for developers to integrate BigML ensemble predictions into their apps. In the following sections, we describe how you can use the BigML REST API and the BigML Python bindings to work with ensemble predictions.

Using Ensemble Predictions via the BigML API

Ensemble predictions have full citizenship in the BigML API. This means you can programmatically create, update, list, delete, and use them for predictions. For example, this is how you can create a single prediction using the command line from a given ensemble and defining the input data. This will require properly setting the BIGML_AUTH environment variable to contain your authentication credentials:

curl "https://bigml.io/prediction?$BIGML_AUTH" \
     -X POST \
     -H 'content-type: application/json' \
     -d '{"ensemble": "ensemble/50650bdf3c19201b64000020",
          "input_data": {"000001": 3, "000002": 4.5}}'
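The same request can be sketched in Python using only the standard library. The helper below is hypothetical (it is not part of the BigML bindings) and only builds the POST request. It assumes that `BIGML_AUTH` holds the same credentials string used with the curl command, and that the input data is keyed by field ID as in the example above. Actually sending it requires valid credentials and network access, so that step is left commented out.

```python
import json
import os
import urllib.request

BIGML_PREDICTION_URL = "https://bigml.io/prediction"


def build_prediction_request(ensemble_id, input_data, auth):
    """Build (but do not send) the POST request for a single prediction."""
    payload = {"ensemble": ensemble_id, "input_data": input_data}
    return urllib.request.Request(
        f"{BIGML_PREDICTION_URL}?{auth}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


request = build_prediction_request(
    "ensemble/50650bdf3c19201b64000020",
    {"000001": 3, "000002": 4.5},
    os.environ.get("BIGML_AUTH", ""),
)

# Sending the request needs valid credentials and network access:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```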

For more information on using ensemble predictions through the BigML API, please refer to the prediction REST API documentation.

Using Ensemble Predictions via the BigML Bindings

BigML bindings provide a convenient way to access the BigML REST API from your language of choice. They offer a higher-level view of BigML Machine Learning resources and algorithms in a number of languages, including Python, Node.js, Java, Swift, and Objective-C. For example, this is how you can create an ensemble prediction in Python using BigML bindings:

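from bigml.api import BigML

# With no arguments, credentials are read from the BIGML_USERNAME and
# BIGML_API_KEY environment variables.
api = BigML()

# Arguments: ensemble ID, input data, and optional prediction arguments.
prediction = api.create_prediction("ensemble/573d997058a27e0f620038df",
                                   {"sepal length": 5,
                                    "sepal width": 2.5},
                                   {"name": "my prediction"})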

BigML bindings also let you make predictions locally, without ever hitting the network, which can greatly reduce the latency of predicting from your apps. This is possible because BigML ensembles are white-box, meaning you can download them and use them independently of BigML. For example, the following code snippet shows how you can download an ensemble and use it to make a local prediction using the BigML bindings for Python:

from bigml.ensemble import Ensemble
from bigml.api import BigML

api = BigML()

# Download the ensemble resource from BigML.
ensemble = api.get_ensemble("ensemble/502fdbff15526876610002615",
                      query_string="only_ensemble=true;limit=-1")

# Wrap the downloaded resource in a local Ensemble object.
local_ensemble = Ensemble(ensemble)

# Predict locally from the input data.
prediction = local_ensemble.predict({"petal length": 3, "petal width": 1})

For more information on using ensembles through the BigML bindings, please refer to the BigML bindings documentation.

2.6.6 Descriptive Information

Descriptive information is what allows you to describe a prediction so you can find it later and easily recognize it among other predictions.

Each prediction is associated with a name, a description, a category, and tags. See the following sections for a brief description of each concept. In Figure 2.89, you can see the options that the More info menu gives you to edit them.

\includegraphics[]{images/ensemble-predictions/edit-pred}
Figure 2.89 Edit predictions

Name

If you do not specify a name for your predictions, BigML assigns a default name depending on the type of predictions:

  • Single predictions: the name always follows the structure “Prediction for <objective field name>”.

  • Batch predictions: BigML combines your prediction dataset name and the ensemble name: “Batch prediction of <ensemble name> with <dataset name>”.

Prediction names are displayed on the list view and also on the top bar of a prediction view. Prediction names are indexed to be used in searches. You can rename your predictions at any time from the More info panel. The name of a prediction cannot be longer than 256 characters. More than one prediction can have the same name, even within the same project, since they are automatically assigned unique internal identifiers.

Description

Each ensemble prediction also has a description, which is very useful for documenting your Machine Learning projects. Predictions take the description from the ensembles used to create them.

Descriptions can be written using plain text and also markdown. BigML provides a simple markdown editor that accepts a subset of markdown syntax. (See Figure 2.90 .)

\includegraphics[width=0.5\textwidth ]{images/ensemble-predictions/ensemble-description}
Figure 2.90 Markdown editor for prediction descriptions

Descriptions cannot be longer than 8192 characters and can use almost any character.

Category

Each prediction is associated with a category taken from the ensemble used to create it. Categories are useful to classify predictions according to the domain your data comes from. This is useful when you use BigML to solve problems across industries or for multiple customers.

A prediction category must be one of the categories listed in Table 2.5.

Table 2.5 Categories used to classify predictions by BigML

Category

Aerospace and Defense

Automotive, Engineering and Manufacturing

Banking and Finance

Chemical and Pharmaceutical

Consumer and Retail

Demographics and Surveys

Energy, Oil and Gas

Fraud and Crime

Healthcare

Higher Education and Scientific Research

Human Resources and Psychology

Insurance

Law and Order

Media, Marketing and Advertising

Miscellaneous

Physical, Earth and Life Sciences

Professional Services

Public Sector and Nonprofit

Sports and Games

Technology and Communications

Transportation and Logistics

Travel and Leisure

Uncategorized

Utilities

Tags

A prediction can also have a number of tags associated with it that can help to retrieve it via the BigML API or to provide predictions with some extra information. Your prediction inherits the tags from the ensemble used to create it. Each tag is limited to a maximum of 128 characters. Each prediction can have up to 32 different tags.
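The limits described in this section (256 characters for names, 8192 for descriptions, 32 tags of at most 128 characters each) can be checked on the client side before you update a prediction. The Python sketch below is a hypothetical helper, not part of the BigML API or bindings; it also builds the default batch prediction name following the structure given above, using made-up ensemble and dataset names.

```python
MAX_NAME_LENGTH = 256
MAX_DESCRIPTION_LENGTH = 8192
MAX_TAGS = 32
MAX_TAG_LENGTH = 128


def default_batch_prediction_name(ensemble_name, dataset_name):
    """Default name structure described for batch predictions."""
    return f"Batch prediction of {ensemble_name} with {dataset_name}"


def check_descriptive_info(name, description="", tags=()):
    """Return a list of problems; an empty list means all limits are met."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than 256 characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description is longer than 8192 characters")
    if len(set(tags)) > MAX_TAGS:
        problems.append("more than 32 different tags")
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append("a tag is longer than 128 characters")
    return problems


# Hypothetical resource names, used only for illustration.
name = default_batch_prediction_name("Diabetes ensemble", "Diabetes test set")
ok = check_descriptive_info(name, tags=["diabetes", "demo"])
too_long = check_descriptive_info("x" * 257)
print(ok, too_long)
```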

2.6.7 Ensemble Predictions Privacy

The link displayed in the More info panel is the private URL of your prediction, so only a user logged into your account is able to see it. Neither single predictions nor batch predictions can be shared from your BigML Dashboard by sharing a link, as you can do with other resources.

\includegraphics[]{images/ensemble-predictions/pred-privacy}
Figure 2.91 Private link of a prediction

2.6.8 Moving Ensemble Predictions to Another Project

When you create a prediction, it is assigned to the same project where the original ensemble is located. You cannot move predictions between projects as you do with other resources.

2.6.9 Stopping Ensemble Predictions

Single predictions are synchronous resources, so you cannot cancel them during the creation since you get the result immediately.

Batch predictions are asynchronous resources, so you can stop their creation before the task is finished. You can use the delete option from the 1-click action menu (see Figure 2.92) or from the pop-up menu on the predictions list view (see Figure 2.93). You can see in Figure 2.93 that the objective field column shows the label processing to indicate that the batch prediction is still in progress. If you stop the prediction during its creation, you will not be able to resume the same task, so if you want to create the same prediction, you will have to start a new task.

\includegraphics[]{images/ensemble-predictions/stop-pred-one-click}
Figure 2.92 Stop prediction from the 1-click menu
\includegraphics[]{images/ensemble-predictions/delete-pred-pop-up}
Figure 2.93 Stop prediction from the predictions list view

2.6.10 Deleting Ensemble Predictions

You can delete your single or batch predictions from the predictions view, using the 1-click action menu (see Figure 2.94) or the pop-up menu on the predictions list view (see Figure 2.95).

\includegraphics[]{images/ensemble-predictions/delete-pred-one-click}
Figure 2.94 Delete prediction from the 1-click menu
\includegraphics[]{images/ensemble-predictions/delete-pred-pop-up}
Figure 2.95 Delete prediction from popup menu

A modal window will be displayed asking you for confirmation. Once a prediction is deleted, it is permanently deleted and there is no way you (or even the IT folks at BigML) can retrieve it.

\includegraphics[]{images/ensemble-predictions/delete-pred-confirm}
Figure 2.96 Delete prediction confirmation