Classification and Regression with the BigML Dashboard

4.6 Logistic Regression Predictions

4.6.1 Introduction

The ultimate goal in building a logistic regression is being able to make predictions with it. In BigML, you can make predictions for single instances or for many instances in batch. Each prediction comes with a measure indicating the probability of the predicted class, a percentage ranging from 0% up to 100%. For each prediction, BigML also provides the probabilities for the rest of the classes in the objective field.

The predictions tab in the main menu of the BigML Dashboard is where all your saved predictions are listed. (See Figure 4.81.) You can search your predictions by name by clicking on the search option on the top menu. In the predictions list view, you can see, for each prediction, the logistic regression icon used for the prediction, the Name of the prediction, the Objective (objective field name), the Prediction (the prediction result), and the Age (time since the prediction was created).

\includegraphics[]{images/logisticregression/lr-pred-listings}
Figure 4.81 Predictions list view

When you first create an account at BigML, or every time that you start a new Project, your list of predictions will be empty. (See Figure 4.82 .)

\includegraphics[]{images/logisticregression/lr-pred-listing}
Figure 4.82 Empty predictions list view

Logistic regression predictions are saved under the Classification & Regression option in the menu. (See Figure 4.83.)

\includegraphics[width=15cm]{images/logisticregression/menu-options-predictions-list-view}
Figure 4.83 Menu options of the predictions list view

Select the list of your single-instance predictions or of your batch predictions by clicking on the corresponding icon. (See Figure 4.84 and Figure 4.85.)

\includegraphics[width=2cm]{images/logisticregression/predictions-icon}
Figure 4.84 Single predictions icon
\includegraphics[width=2cm]{images/logisticregression/batchpredictions-icon}
Figure 4.85 Batch predictions icon

4.6.2 Creating Logistic Regression Predictions

BigML provides two options to predict with your logistic regressions, both explained in the following subsections:

  • predict: to predict one single instance.

  • batch prediction: to predict multiple instances in batch.

Predict

BigML allows you to quickly make predictions for single instances by providing a form containing the input fields used by the logistic regression, so you can easily set the values and get an immediate response.

Follow these steps to create a single prediction:

  1. Click predict in the logistic regression 1-click action menu. (See Figure 4.86 .)

    \includegraphics[]{images/logisticregression/lr-predict-one-click}
    Figure 4.86 Predict using the 1-click action menu

    Alternatively, click predict in the pop up menu in the list view. (See Figure 4.87 .)

    \includegraphics[]{images/logisticregression/lr-predict-pop-up}
    Figure 4.87 Predict using the pop up menu
  2. You will be redirected to the prediction form where you will find all the fields used by the logistic regression as input fields. (See Figure 4.88 .)

    \includegraphics[]{images/logisticregression/lr-pred-form}
    Figure 4.88 Logistic regression prediction form
  3. Select the fields to be used for your prediction (Figure 4.89 .) Non-selected fields will be considered as missing values during the prediction.

    \includegraphics[]{images/logisticregression/lr-pred-select-fields}
    Figure 4.89 Logistic regression predictions form

    If your logistic regression was not trained with Missing numerics (see subsection 4.4.5 ) you won’t be able to disable your numeric fields for the prediction and a warning message will appear instead: “This field cannot be disabled because your model has been trained without missing numerics”.

  4. Set input values for your selected fields. BigML supports numeric, categorical, text and items fields as inputs.

  5. Get the prediction at the top of the view along with the predicted class probability. (See Figure 4.90.) BigML predictions are synchronous, i.e., when you send the input data, you get an immediate response. Moreover, single predictions from the BigML Dashboard are performed locally, so unless you save your prediction, it will not consume any credits and it will be updated instantly when you change your input values. Learn more about local predictions in Local Predictions (following Figure 4.116 ).

    \includegraphics[]{images/logisticregression/lr-prediction}
    Figure 4.90 Get the logistic regression prediction
  6. Below the prediction you can see a histogram view containing the rest of your class probabilities (Figure 4.91 ). You can download all the probabilities in PNG format, or as a CSV or JSON file, by clicking on the corresponding icons. (See Single Predictions .)

    \includegraphics[]{images/logisticregression/lr-classes-prob}
    Figure 4.91 All classes probabilities distribution
  7. Optionally, you can Save the logistic regression prediction, so you will find it afterwards in the predictions list view. (See Figure 4.92 .)

    \includegraphics[]{images/logisticregression/lr-save-pred}
    Figure 4.92 Save your logistic regression predictions

Note: this option is only available from the BigML Dashboard for logistic regressions with fewer than 100 fields. If you want to perform single instance predictions for a higher number of fields, use the BigML API.

Logistic Regression Prediction with Images

BigML logistic regressions can be trained from images using extracted image features (subsection 4.2.3 ). Because image features are automatically generated numeric fields, creating logistic regression predictions with images is the same as creating any other logistic regression prediction. The only difference is the image input fields.

Note: When the input fields contain images, BigML automatically extracts the image features needed to create the single prediction, so that they match the features used in the dataset that trained the logistic regression.

\includegraphics[]{images/logisticregression/lr-predict-image-select-single}
Figure 4.93 Select a single image source in the image input field

The logistic regression in Figure 4.93, “grape-strawberry resnet18”, was created from a dataset containing image features extracted with a pre-trained CNN, ResNet-18. When you create a prediction with this logistic regression, you are directed to the prediction form, which presents all the input fields used by the logistic regression. One of them is the image field. Because this is a single prediction, the image is provided as a single image source. Click the input field box to see the available single image sources in the dropdown list. A search box is also available to locate specific sources.

\includegraphics[]{images/logisticregression/lr-predict-image-select-list-components}
Figure 4.94 List the components of a composite source

Single image sources are often used to create a composite source, in which case they become component sources of that composite. An image may also have been uploaded as part of an archive file (zip/tar), which creates a composite source. In those cases, the composite source is shown in the dropdown list along with a “List components” icon. In the example in Figure 4.94, predict-images.zip is a composite source; click the icon to show its component sources.

\includegraphics[]{images/logisticregression/lr-predict-image-select-components}
Figure 4.95 Select a component of a composite source

After the component sources of the composite are listed, scroll the dropdown list to find the desired one, then click to select it, as shown in Figure 4.95 . There is also a search box to locate specific component sources.

\includegraphics[]{images/logisticregression/lr-predict-image-select-more-fields}
Figure 4.96 Logistic regression image prediction form, more fields

In addition to images, logistic regressions may use other fields, which will be in the prediction form too. As shown in Figure 4.96 , all the fields can be selected, and their input values be set by dragging the knobs on the sliders or by entering precise values in their input boxes.

Once all fields are selected, click on the green button Predict to create a prediction.

\includegraphics[]{images/logisticregression/lr-predict-image-prediction-created}
Figure 4.97 Logistic regression image single prediction

After a new prediction is created, as shown in Figure 4.97, the predicted class is at the top of the form along with its probability. The prediction interface is the same as the one for predictions created by non-image logistic regressions. Everything described earlier in this section (Predict ) applies.

Batch Predictions

BigML batch predictions allow you to make simultaneous predictions for multiple instances. All you need is the logistic regression you want to use to make predictions and a dataset containing the instances you want to predict. BigML will create a prediction for each instance in the dataset.

Follow these steps to create a batch prediction:

  1. Click on the batch prediction option under the logistic regression 1-click action menu (Figure 4.98 ).

    \includegraphics[]{images/logisticregression/lr-batchpred-one-click}
    Figure 4.98 Create batch prediction using 1-click action menu

    Alternatively, click on create batch prediction in the pop up menu of the list view (Figure 4.99 ).

    \includegraphics[]{images/logisticregression/lr-batchpred-popup}
    Figure 4.99 Create batch prediction using pop up menu
  2. Select the dataset containing all the instances you want to predict. The instances should contain the input values for the fields used by the logistic regression as input fields. From this view you can also select another logistic regression from the selector or even a model or ensemble by clicking on the icons on the top left menu. (See Figure 4.100 .)

    \includegraphics[]{images/logisticregression/lr-batchpred-select-dataset}
    Figure 4.100 Select dataset for batch prediction
  3. After the logistic regression and the dataset are selected, the batch prediction configuration options will appear along with a preview of the prediction output (a CSV file). (See Figure 4.101.) The default output format includes all your prediction dataset fields and adds an extra column with the class predicted. See subsection 4.6.3 for a detailed explanation of all configuration options.

    \includegraphics[]{images/logisticregression/lr-batchpred-configure}
    Figure 4.101 Configuration options for logistic regression batch prediction
  4. By default, BigML generates an output dataset with your batch predictions, which you can later find in the datasets section of the BigML Dashboard. You can deactivate this option by clicking on the icon shown in Figure 4.102.

    \includegraphics[]{images/logisticregression/lr-batchpred-dataset}
    Figure 4.102 Create a dataset from batch prediction
  5. After you configure your batch prediction, click on the green button Predict to generate your batch prediction. (See Figure 4.103 .)

    \includegraphics[]{images/logisticregression/lr-batchpred-predict}
    Figure 4.103 Create batch prediction
  6. When the batch prediction is created, you will be able to download the CSV file containing all your dataset instances along with a prediction for each one of them. (See Figure 4.104 .)

    \includegraphics[]{images/logisticregression/lr-batchpred-csv}
    Figure 4.104 Download batch prediction CSV file
  7. If you didn’t disable the option to create a dataset explained in step 4, you will also be able to access the output dataset from the batch prediction view. (See Figure 4.105 .)

    \includegraphics[]{images/logisticregression/lr-batchpred-output-dataset}
    Figure 4.105 Batch prediction output dataset

Batch Prediction with Images

BigML logistic regressions can be trained from images using extracted image features (subsection 4.2.3 ). The input of a batch prediction is a dataset, so when creating a batch prediction with images, the dataset must contain the same image features that were used to train the logistic regression, i.e., the image features present in the dataset used to create it.

\includegraphics[]{images/logisticregression/lr-batchpred-images}
Figure 4.106 Batch prediction using an image dataset

As shown in Figure 4.106 , the input for the logistic regression batch prediction is selected as predict-images resnet18, which is a dataset consisting of six images and contains image features extracted from a pre-trained CNN, ResNet-18.

Image features are configured at the source level. For more information about the image features and how to configure them, please refer to section Image Analysis of the Sources with the BigML Dashboard [ 22 ] .

For all other aspects of batch predictions with images, including the batch prediction configuration options and output datasets, everything stated earlier in the current section (Batch Predictions ) applies.

4.6.3 Configuring Logistic Regression Predictions

BigML provides several options to configure your predictions, such as setting a probability threshold (see Probability threshold ), default values for your missing numeric values (see Default Numeric Value ), fields mapping (see Fields Mapping ), and output file settings (see Output Settings ).

Probability threshold

A probability threshold makes sense when you want to trade false positives against false negatives for one class: raising the threshold reduces false positives at the cost of more false negatives, and lowering it does the opposite. The positive class will be predicted if its probability is greater than the given threshold; otherwise, the class with the next highest probability will be predicted instead.

To configure a threshold for your predictions follow these steps:

  1. Use the selector shown in Figure 4.107 to select the positive class, i.e., the class for which you want to apply the probability threshold.

    \includegraphics[]{images/logisticregression/single-lr-threshold}
    Figure 4.107 Select the positive class for single predictions
  2. Then set a threshold value between 0% and 100% using the slider (see Figure 4.108 ). The positive class will be predicted if its probability is greater than the given threshold; otherwise, the class with the next highest probability will be predicted instead.

    \includegraphics[]{images/logisticregression/single-lr-threshold2}
    Figure 4.108 Set the probability threshold for single predictions

For batch predictions you can find the same options under the configure panel:

\includegraphics[]{images/logisticregression/batch-lr-threshold}
Figure 4.109 Set the probability threshold for batch predictions
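
The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this section, not BigML's implementation; the function name predict_with_threshold and the example probabilities are hypothetical, and "the class with the next highest probability" is interpreted here as the most probable of the remaining classes.

```python
def predict_with_threshold(probabilities, positive_class, threshold):
    """Apply a probability threshold to a single prediction.

    probabilities: dict mapping each class name to its probability (0 to 1).
    positive_class: the class the threshold applies to.
    threshold: value between 0 and 1 (the Dashboard slider shows a percentage).
    """
    if probabilities[positive_class] > threshold:
        return positive_class
    # Otherwise fall back to the most probable of the remaining classes.
    others = {c: p for c, p in probabilities.items() if c != positive_class}
    return max(others, key=others.get)


# Hypothetical class probabilities for one instance.
probs = {"good": 0.62, "bad": 0.38}
print(predict_with_threshold(probs, "bad", 0.30))   # bad
print(predict_with_threshold(probs, "bad", 0.50))   # good
print(predict_with_threshold(probs, "good", 0.70))  # bad
```

Note how a low threshold on "bad" makes that class win even though it is not the most probable one, while a high threshold on "good" makes the prediction fall back to "bad".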

Default Numeric Value

If the dataset used to make the batch prediction contains instances with missing values for the numeric fields, the prediction will not be computed for them, unless you built the logistic regression enabling the Missing numerics parameter (see subsection 4.4.5 ).

By setting the Default numeric value option before creating your batch prediction, you can easily replace all the missing numeric values with the field’s Mean, Median, Maximum, or Minimum, or with Zero. (See Figure 4.110.)

\includegraphics[]{images/logisticregression/lr-pred-default-numeric}
Figure 4.110 Configure Default numeric value for batch prediction
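
To make the effect of this option concrete, the following minimal sketch replaces the missing values of one numeric column using each of the five strategies. It is an illustration only: the function name fill_missing_numeric is hypothetical, and the statistic is computed here from the non-missing values of the column itself, which is an assumption (this section does not state which data BigML uses to compute it).

```python
import statistics


def fill_missing_numeric(values, strategy="mean"):
    """Replace None entries in a numeric column with a default value."""
    present = [v for v in values if v is not None]
    strategies = {
        "mean": statistics.mean,
        "median": statistics.median,
        "maximum": max,
        "minimum": min,
        "zero": lambda _: 0,
    }
    default = strategies[strategy](present)
    return [default if v is None else v for v in values]


# Hypothetical "age" column with two missing values.
ages = [26, None, 28, 31, None, 42]
print(fill_missing_numeric(ages, "median"))  # [26, 29.5, 28, 31, 29.5, 42]
print(fill_missing_numeric(ages, "zero"))    # [26, 0, 28, 31, 0, 42]
```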

Fields Mapping

You can specify which input fields of the logistic regression match the fields in the dataset containing the instances you want to predict. BigML automatically matches fields by name, but you can also set an automatic match by field ID by clicking on the green switcher. Additionally, you can manually search for fields or remove them from the Dataset fields column if you do not want them to be considered during the batch prediction. (See Figure 4.111.)

\includegraphics[]{images/logisticregression/lr-fields-mapping}
Figure 4.111 Configure the fields mapping for batch prediction

Note: Fields mapping from the BigML Dashboard is limited to 200 fields. For batch predictions with a higher number of fields, map your fields using the BigML API.
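
The difference between matching by name and matching by field ID can be sketched as follows. The dictionaries below (field ID to field name) and both function names are hypothetical and only illustrate the idea; they are not the structures BigML uses internally.

```python
def map_fields_by_name(model_fields, dataset_fields):
    """Return {model field ID: dataset field ID} for fields with equal names."""
    dataset_ids_by_name = {name: fid for fid, name in dataset_fields.items()}
    return {
        model_id: dataset_ids_by_name[name]
        for model_id, name in model_fields.items()
        if name in dataset_ids_by_name
    }


def map_fields_by_id(model_fields, dataset_fields):
    """Return {model field ID: dataset field ID} for fields sharing an ID."""
    return {fid: fid for fid in model_fields if fid in dataset_fields}


# Hypothetical fields: the prediction dataset has a different column order.
model_fields = {"000000": "duration", "000001": "age", "000002": "amount"}
dataset_fields = {"000000": "age", "000001": "amount", "000003": "purpose"}

print(map_fields_by_name(model_fields, dataset_fields))
# {'000001': '000000', '000002': '000001'}
print(map_fields_by_id(model_fields, dataset_fields))
# {'000000': '000000', '000001': '000001'}
```

In this example, matching by ID would pair the model's "duration" with the dataset's "age", which is why matching by name is the safer choice when the two datasets do not share the same structure.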

Output Settings

Batch predictions return a CSV file containing all your instances and the final predictions. Tune the following settings to customize your prediction file (see Figure 4.112 ):

  • Separator: this option allows you to choose the best separator for your output file columns. The default separator is the comma. You can also select the semicolon, the tab, or the space.

  • New line: this option allows you to set the new line character to use as the line break in the generated CSV file: “LF” or “CRLF”.

  • Output fields: by clicking on the list icon next to the separator selector, you can include or exclude all your dataset fields from your output file. You can also individually select the fields you want to include or exclude using the multiple output fields selector. Note: a maximum of 100 fields can be displayed in this selector, but all your dataset fields will be included in the output file by default unless you exclude them.

  • Headers: this option includes or excludes a first row in the output file (and in the output dataset) with the names of each column (input field names, prediction column name, probability column name, etc.). By default, BigML includes the headers.

  • Prediction column name: customize the name for your predictions column. By default, BigML takes the name of the logistic regression’s objective field.

  • Probability: this option allows you to include an additional column with the probability for the predicted class. By default, it is not included in your output file.

  • Probability column name: customize the name for the probability column if you include it in the output file. BigML sets “probability” as the default name.

  • All class probabilities: this includes all the probabilities of the objective field classes per instance. This option will add \(n\) extra columns, one per class in the objective field.

\includegraphics[]{images/logisticregression/lr-output-settings}
Figure 4.112 Logistic regression output settings for batch predictions
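
As a rough illustration of how these settings shape the output file, the sketch below writes a small prediction file with Python's standard csv module. The function write_predictions and its parameters are hypothetical names chosen to mirror the settings listed above; this is not how BigML generates the file.

```python
import csv
import io


def write_predictions(rows, predictions, probabilities, *, separator=",",
                      newline="\n", header=True, prediction_name="class",
                      probability_name=None):
    """Return CSV text with the input fields plus the prediction columns.

    rows: list of dicts holding the input field values of each instance.
    probability_name: when None, the probability column is left out.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=separator, lineterminator=newline)
    field_names = list(rows[0].keys())
    if header:
        names = field_names + [prediction_name]
        if probability_name is not None:
            names.append(probability_name)
        writer.writerow(names)
    for row, prediction, probability in zip(rows, predictions, probabilities):
        record = [row[name] for name in field_names] + [prediction]
        if probability_name is not None:
            record.append(probability)
        writer.writerow(record)
    return buffer.getvalue()


rows = [{"duration": 24, "age": 26}, {"duration": 36, "age": 42}]
text = write_predictions(rows, ["good", "bad"], [0.88785, 0.55526],
                         separator=";", newline="\r\n",
                         probability_name="probability")
print(text)
```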

4.6.4 Visualizing Logistic Regression Predictions

The visualization of logistic regression predictions depends on whether you are predicting one single instance (Single Predictions ) or multiple instances using the batch predictions option (Batch Prediction ).

Single Predictions

For single predictions, find the predicted class given the input fields values at the top of the form along with its probability. (See Figure 4.113 .)

\includegraphics[]{images/logisticregression/lr-single-pred}
Figure 4.113 Logistic regression single prediction

Below the prediction, there’s a histogram representing the rest of the objective field class probabilities. All the class probabilities must sum to 100%. Show or hide this view by clicking on the icon highlighted in Figure 4.114.

\includegraphics[]{images/logisticregression/lr-classes-prob2}
Figure 4.114 Logistic regression all class probabilities

You can see up to seven different classes at the same time; if you have more than seven classes, arrow icons appear next to the histogram so you can see the rest of the classes.

Export this view in PNG format, in a CSV file, or in a JSON file by clicking on the corresponding icons. (See Figure 4.115 .)

\includegraphics[]{images/logisticregression/lr-classes-prob-export}
Figure 4.115 Logistic regression export all class probabilities histogram

Set a probability threshold for a selected positive class to control the balance between false positives and false negatives: a lower threshold reduces false negatives at the cost of more false positives, and a higher threshold does the opposite. If the probability for the positive class is greater than the established threshold, then the positive class will be predicted. Otherwise, the class with the next highest probability will be predicted instead.

\includegraphics[]{images/logisticregression/lr-prob-threshold}
Figure 4.116 Logistic regression probability threshold

Local Predictions

Local predictions are provided for single instances from the BigML Dashboard; they are computed faster and at no cost. Local predictions allow you to get a real-time prediction without consuming any credits or requiring an internet connection. This is possible because the logistic regression is kept in memory, so when the input values change, BigML can recompute the prediction in microseconds.
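
The reason a local prediction is so fast is that, once the coefficients are in memory, computing it is a handful of arithmetic operations. The sketch below shows this for a binary objective using the standard logistic function. The coefficients, the intercept, and the function name predict_locally are hypothetical values chosen for illustration, and the sketch is not BigML's implementation.

```python
import math

# Hypothetical coefficients for a binary objective field ("good" vs. "bad").
COEFFICIENTS = {"duration": -0.04, "amount": -0.0001}
INTERCEPT = 2.0


def predict_locally(input_data):
    """Return (predicted class, {class: probability}) with no network call."""
    score = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in input_data.items()
    )
    p_good = 1.0 / (1.0 + math.exp(-score))  # logistic function
    probabilities = {"good": p_good, "bad": 1.0 - p_good}
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities


# Changing an input value only requires re-evaluating the formula.
print(predict_locally({"duration": 24, "amount": 5433}))
print(predict_locally({"duration": 48, "amount": 10000}))
```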

Prediction explanation

Prediction explanation helps understand why a logistic regression makes a certain prediction. This is very useful in many applications, and the reasons behind a prediction are often as important as the prediction itself.

BigML prediction explanation is based on Shapley values. For more information, please refer to this research paper: A Unified Approach to Interpreting Model Predictions [ 25 ] .

For any logistic regression, you can request the explanation for the prediction by clicking the prediction explanation icon and then clicking Save (see Figure 4.117 ).

\includegraphics[]{images/logisticregression/prediction-explan-logisticregression}
Figure 4.117 Explain prediction

The prediction explanation represents the most important factors considered by the logistic regression in a prediction given the input values. Each input value will yield an associated importance, as you can see in Figure 4.118. The importances across all input fields should sum to 100%.

\includegraphics[]{images/logisticregression/prediction-explan-logisticregression2}
Figure 4.118 Input field importances

For some input fields you will see a “+” icon next to the importance. This is because the importance may not be only directly associated with the input value, i.e., it can also be explained by other reasons. In Figure 4.119 below, the importance of 28.81% for the field “Class/Dept” is not only explained by this field being equal to “1st Class”. Rather, it is mostly because this field value is not “3rd Class” (which accounts for the majority of the importance, 23.98%) and, to a lesser extent, because it is “1st Class” (4.83% of the importance).

\includegraphics[]{images/logisticregression/prediction-explan-logisticregression3}
Figure 4.119 See the detailed explanation

The prediction explanation for logistic regressions is calculated using the results of over a thousand distinct predictions using random perturbations of the input data. For this reason, the explanation may take some time to compute.
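
To give an intuition of what a Shapley-based explanation computes, the sketch below calculates exact Shapley values for a toy three-field model by averaging each field's marginal contribution over all field orderings. It is not BigML's implementation: the model, its coefficients, the baseline instance, and the normalization of absolute values into percentages are all assumptions made for illustration. With many fields, enumerating every ordering is infeasible, which is why implementations typically rely on sampling instead.

```python
import itertools
import math


def model(inputs):
    """Toy probability model with hypothetical coefficients."""
    score = (-1.0 + 1.5 * inputs["first_class"]
             + 0.8 * inputs["female"] - 0.02 * inputs["age"])
    return 1.0 / (1.0 + math.exp(-score))


def shapley_values(predict, instance, baseline):
    """Average each field's marginal contribution over all orderings."""
    names = list(instance)
    totals = dict.fromkeys(names, 0.0)
    orderings = list(itertools.permutations(names))
    for ordering in orderings:
        current = dict(baseline)        # start from the baseline instance
        previous = predict(current)
        for name in ordering:
            current[name] = instance[name]   # switch one field to its value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


instance = {"first_class": 1, "female": 1, "age": 30}
baseline = {"first_class": 0, "female": 0, "age": 40}  # hypothetical reference
values = shapley_values(model, instance, baseline)

# Assumed normalization: absolute contributions rescaled to sum to 100%.
total = sum(abs(v) for v in values.values())
importances = {name: 100 * abs(v) / total for name, v in values.items()}
print(importances)
```

By construction, the Shapley values add up to the difference between the prediction for the instance and the prediction for the baseline.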

Batch Prediction

After creating your batch prediction, you get a CSV file and, optionally, an output dataset. Both outputs are explained in the following subsections.

Output CSV file

The batch prediction generates a CSV file containing your predictions for each of your dataset instances in the last column. (See Figure 4.120 .)

\includegraphics[]{images/logisticregression/lr-batchpred-csv}
Figure 4.120 Download batch prediction CSV file

You can configure several options to customize your CSV file. You can find a detailed explanation of those options in Output Settings .

See an example of an output CSV file in Figure 4.121. The column class in this example contains the final prediction (by default, it is named after your logistic regression’s objective field). In this case, we are predicting whether a person is a good or a bad candidate for credit. This file has also been configured to include the probability of each prediction.

duration,age,amount,purpose,class,probability
24,26,5433,used car,good,0.88785
36,42,8086,new car,bad,0.55526
24,28,1376,radio/tv,good,0.8385
48,31,6758,radio/tv,bad,0.73576
26,30,7966,used car,good,0.7201
12,42,2577,furniture/equipment,good,0.67644
36,30,4455,business,good,0.52227
18,32,1442,new car,bad,0.75488
9,22,276,new car,good,0.57819
Figure 4.121 An example of a logistic regression batch prediction CSV file
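
Once downloaded, a file like the one in Figure 4.121 can be processed with any CSV reader. The following sketch uses Python's standard library to count the predicted classes and to list the predictions whose probability is below 0.6; the 0.6 cut-off is an arbitrary value chosen for the example.

```python
import csv
import io
from collections import Counter

# Contents of the example file shown in Figure 4.121.
CSV_TEXT = """duration,age,amount,purpose,class,probability
24,26,5433,used car,good,0.88785
36,42,8086,new car,bad,0.55526
24,28,1376,radio/tv,good,0.8385
48,31,6758,radio/tv,bad,0.73576
26,30,7966,used car,good,0.7201
12,42,2577,furniture/equipment,good,0.67644
36,30,4455,business,good,0.52227
18,32,1442,new car,bad,0.75488
9,22,276,new car,good,0.57819
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Number of instances predicted for each class.
counts = Counter(row["class"] for row in rows)
print(counts["good"], counts["bad"])  # 6 3

# Predictions the logistic regression is least sure about.
uncertain = [row for row in rows if float(row["probability"]) < 0.6]
print(len(uncertain))  # 3
```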

Output Dataset

By default BigML automatically creates a dataset out of your batch prediction. You can disable this option by configuring your batch prediction. (See Output Settings .) You will find the output dataset in your batch prediction view as shown in Figure 4.122 .

\includegraphics[]{images/logisticregression/lr-batchpred-output-dataset}
Figure 4.122 Batch prediction output dataset

In the output dataset, you can find an additional field (named by default after your logistic regression’s objective field) containing the class predicted for each one of your instances. If you configured your batch prediction to include the prediction probabilities and all class probabilities, you will find them in the last fields of your output dataset. (See Figure 4.123.)

\includegraphics[]{images/logisticregression/lr-output-dataset}
Figure 4.123 Logistic regression batch prediction output dataset

4.6.5 Consuming Logistic Regression Predictions

You can fully use single and batch predictions via the BigML API and bindings. The following subsections explain both tools.

Using Logistic Regression Predictions via the BigML API

Logistic regression predictions are first-class citizens in the BigML API, which allows you to programmatically create, configure, retrieve, list, update, and delete single and batch predictions.

In the example below, see how to create a single prediction using a logistic regression and defining the input data once you have properly set the BIGML_AUTH environment variable to contain your authentication credentials:

curl "https://bigml.io/prediction?$BIGML_AUTH" \
    -X POST \
    -H 'content-type: application/json' \
    -d '{"logisticregression": "logisticregression/50650bdf3c19201b64000020",
         "input_data": {"000001": 3, "000002": 4.5, "000003": 5}}'

For more information on using logistic regressions through the BigML API, please refer to the documentation.

Using Logistic Regression Predictions via BigML Bindings

You can also create, configure, retrieve, list, update, and delete single and batch predictions via BigML bindings, which are libraries that make it easier to use the BigML API from your language of choice. BigML offers bindings in multiple languages including Python, Node.js, Java, Swift and Objective-C. See below an example that creates a single prediction from a logistic regression with the Python bindings.

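from bigml.api import BigML

# Credentials are read from the BIGML_USERNAME and BIGML_API_KEY environment variables.
api = BigML()
# Create a single prediction from the logistic regression ID and the input data.
prediction = api.create_prediction(
    "logisticregression/50650bdf3c19201b64000020",
    {"credit_amount": 5, "duration": 2.5})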

For more information on BigML bindings, please refer to the bindings page.
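
A batch prediction can be scripted in a similar way. The sketch below wraps the create_batch_prediction, ok, and download_batch_prediction methods of the Python bindings in a small helper. The helper name create_batch_prediction_csv is hypothetical, and the argument names in args (all_fields, header, probability, output_dataset) are assumptions that should be checked against the API documentation before use.

```python
def create_batch_prediction_csv(api, logistic_regression_id, dataset_id,
                                filename):
    """Create a batch prediction, wait for it, and download its CSV file.

    api: a connection object such as the BigML() instance created with
    the Python bindings.
    """
    # Assumed argument names; verify them in the API documentation.
    args = {
        "all_fields": True,      # keep the input fields in the output
        "header": True,          # include a first row with column names
        "probability": True,     # add the predicted class probability
        "output_dataset": True,  # also create a dataset from the output
    }
    batch_prediction = api.create_batch_prediction(
        logistic_regression_id, dataset_id, args)
    # Batch predictions are asynchronous: wait until the task finishes.
    api.ok(batch_prediction)
    api.download_batch_prediction(batch_prediction, filename=filename)
    return batch_prediction


# Usage (requires the bindings and valid credentials; the dataset ID
# below is a placeholder):
#
#   from bigml.api import BigML
#   create_batch_prediction_csv(
#       BigML(),
#       "logisticregression/50650bdf3c19201b64000020",
#       "dataset/<your dataset id>",
#       "predictions.csv")
```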

4.6.6 Descriptive Information

Each logistic regression prediction has an associated name, description, category, and tags. You can find a brief description of each concept in the following subsections. The More info menu option displays a panel that provides editing options. (See Figure 4.124 .)

\includegraphics[]{images/logisticregression/lr-pred-edit}
Figure 4.124 Logistic regression prediction descriptive information

Name

If you do not specify a name for your predictions, BigML assigns a default name depending on the type of predictions:

  • Single predictions: the name always follows the structure “Prediction for <objective field name>”.

  • Batch predictions: BigML combines your prediction dataset name and the logistic regression name: “Batch prediction of <logistic regression name> with <dataset name>”.

Prediction names are displayed on the list and also on the top bar of a prediction view. Prediction names are indexed to be used in searches. Rename your predictions at any time from the More info menu.

The name of a prediction cannot be longer than 256 characters. More than one prediction can have the same name even within the same project, but they will always have different identifiers.

Description

Each prediction also has a description that is very useful for documenting your Machine Learning projects. Predictions take their description from the logistic regression used to create them.

Descriptions can be written using plain text and also markdown. BigML provides a simple markdown editor that accepts a subset of markdown syntax. (See Figure 4.125 .)

\includegraphics[width=0.5\textwidth ]{images/logisticregression/lr-description}
Figure 4.125 Markdown editor for logistic regression descriptions

Descriptions cannot be longer than 8192 characters.

Category

Each prediction has an associated category, taken from the logistic regression used to create it. Categories are useful to classify predictions according to the domain your data comes from, which helps when you use BigML to solve problems across industries or for multiple customers.

A prediction category must be one of the categories listed in Table 4.5.

Tags

A prediction can also have a number of tags associated with it. These tags help to retrieve the prediction via the BigML API or to provide predictions with some extra information. Your prediction inherits the tags from the logistic regression used to create it. Each tag is limited to a maximum of 128 characters. Each prediction can have up to 32 different tags.
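
When you set this descriptive information programmatically, it can be convenient to check the limits stated in this section before sending an update. The following sketch does so in plain Python; the function name validate_metadata is hypothetical, and the limits are the ones given above (256 characters for names, 8192 for descriptions, 128 per tag, and 32 tags).

```python
MAX_NAME_LENGTH = 256
MAX_DESCRIPTION_LENGTH = 8192
MAX_TAG_LENGTH = 128
MAX_TAGS = 32


def validate_metadata(name, description="", tags=()):
    """Return a list of problems; an empty list means all limits are met."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than 256 characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description is longer than 8192 characters")
    if len(set(tags)) > MAX_TAGS:
        problems.append("more than 32 different tags")
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append("tag is longer than 128 characters: " + tag[:20])
    return problems


print(validate_metadata("Prediction for class", tags=["credit", "demo"]))  # []
print(validate_metadata("x" * 300))
```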

4.6.7 Logistic Regression Predictions Privacy

The link displayed in the Privacy panel is the private URL of your prediction, so only a user logged into your account is able to see it. Neither single predictions nor batch predictions can be shared by using a secret link. (See Figure 4.126 .)

\includegraphics[]{images/logisticregression/lr-pred-privacy}
Figure 4.126 Logistic regression predictions privacy

4.6.8 Moving Logistic Regression Predictions to Another Project

When you create a prediction, it will be assigned to the same project where the original logistic regression is located. You cannot move predictions between projects as you do with other resources.

4.6.9 Stopping Logistic Regression Predictions

Single predictions are synchronous resources, so you cannot cancel them during the creation since you get the result immediately.

By contrast, batch predictions are asynchronous resources, so you can stop their creation before the task is finished. Use the delete batch prediction option from the 1-click action menu (Figure 4.127 ) or from the pop up menu on the list view.

\includegraphics[]{images/logisticregression/lr-stop-batch-pred}
Figure 4.127 Stop logistic regression batch prediction from 1-click action menu

A modal window will be displayed asking you for confirmation. If you stop the prediction during its creation you won’t be able to resume the same task again, so if you want to create the same prediction you will have to start a new task.

\includegraphics[]{images/logisticregression/lr-confirm-delete-pred}
Figure 4.128 Logistic regression delete prediction confirmation

4.6.10 Deleting Logistic Regression Predictions

You can delete your single or batch predictions from the predictions view, using the 1-click action menu (see Figure 4.129 ) or using the pop up menu on the predictions list view (see Figure 4.130 ).

\includegraphics[]{images/logisticregression/lr-delete-batchpred}
Figure 4.129 Logistic regression delete prediction from 1-click menu
\includegraphics[]{images/logisticregression/lr-delete-batchpred-popup}
Figure 4.130 Logistic regression delete prediction from pop up menu

A modal window will be displayed asking you for confirmation. Once a prediction is deleted, it is permanently deleted, and there is no way you (or even the IT folks at BigML) can retrieve it.

\includegraphics[]{images/logisticregression/lr-confirm-delete-pred}
Figure 4.131 Logistic regression delete prediction confirmation