Classification and Regression with the BigML Dashboard
3.6 Linear Regression Predictions
3.6.1 Introduction
The ultimate goal in building a linear regression is being able to make predictions with it. In BigML, you can make predictions for single instances or for many instances in batch. Each prediction comes with a measure, the prediction interval, which indicates the 95% confidence range for the predicted value.
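The prediction interval can be illustrated with the textbook formula for a simple (one-input) least-squares regression. The sketch below is an illustration under stated assumptions, not BigML's implementation: it fits the line itself, the function names are hypothetical, and the caller supplies the t critical value (2.776 is the two-sided 95% value for 4 degrees of freedom, i.e., 6 observations).

```python
import math

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def prediction_interval(xs, ys, x_new, t_crit):
    """Point prediction and textbook prediction interval for x_new.

    t_crit is the two-sided t critical value for n - 2 degrees of freedom
    (an input chosen by the caller; this sketch does not compute it).
    """
    n = len(xs)
    b0, b1 = fit_simple_ols(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    y_hat = b0 + b1 * x_new
    # The interval widens as x_new moves away from the mean of the inputs
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x_new - x_mean) ** 2 / sxx)
    return y_hat, (y_hat - half_width, y_hat + half_width)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
y_hat, (low, high) = prediction_interval(xs, ys, 3.5, t_crit=2.776)
print(f"prediction={y_hat:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

The interval is centered on the prediction and, unlike a confidence interval for the mean response, includes the "1 +" term for the noise of an individual new observation.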
The predictions tab in the main menu of the BigML Dashboard is where all your saved predictions are listed. (See Figure 3.67.) You can search your predictions by name by clicking on the search option in the top menu. In the predictions list view, you can see, for each prediction, the linear regression icon used for the prediction, the Name of the prediction, the Objective (objective field name), the Prediction (the prediction result), and the Age (time since the prediction was created).
When you first create an account at BigML, or every time that you start a new Project, your list of predictions will be empty. (See Figure 3.68.)
Linear regression predictions are saved under the Classification & Regression option in the menu. (See Figure 3.69.)
Select the list for your single instance predictions or your batch predictions by clicking on the corresponding icons. (See Figure 3.70 and Figure 3.71.)
3.6.2 Creating Linear Regression Predictions
BigML provides two options to predict with your linear regressions, explained in the following subsections:
Predict: to predict a single instance.
Batch prediction: to predict multiple instances in batch.
Predict
BigML allows you to quickly make predictions for single instances by providing a form containing the input fields used by the linear regression, so you can easily set the values and get an immediate response.
Follow these steps to create a single prediction:
Click predict in the linear regression 1-click action menu. (See Figure 3.72.)
Alternatively, click predict in the pop up menu in the list view. (See Figure 3.73.)
You will be redirected to the prediction form, where you will find all the fields used by the linear regression as input fields. (See Figure 3.74.)
Select the fields to be used for your prediction. (See Figure 3.75.) Non-selected fields will be considered as missing values during the prediction.
Set input values for your selected fields. BigML supports numeric, categorical, text and items fields as inputs.
Get the prediction at the top of the view along with the prediction interval. (See Figure 3.76.) BigML predictions are synchronous, i.e., when you send the input data, you get an immediate response. Moreover, single predictions from the BigML Dashboard are performed locally, so unless you save your prediction, it will not consume any credits and it will be updated instantly when you change your input values. Learn more about local predictions in Local Predictions.
Optionally, you can save the linear regression prediction, so you will find it afterwards in the predictions list view. (See Figure 3.77.)
Note: this option is only available from the BigML Dashboard for linear regressions with fewer than 100 fields. If you want to perform single instance predictions for a higher number of fields, use the BigML API.
Batch Predictions
BigML batch predictions allow you to make simultaneous predictions for multiple instances. All you need is the linear regression you want to use to make predictions and a dataset containing the instances you want to predict. BigML will create a prediction for each instance in the dataset.
Follow these steps to create a batch prediction:
Click on the batch prediction option under the linear regression 1-click action menu. (See Figure 3.78.)
Alternatively, click on create batch prediction in the pop up menu of the list view. (See Figure 3.79.)
Select the dataset containing all the instances you want to predict. The instances should contain the input values for the fields used by the linear regression as input fields. From this view you can also select another linear regression from the selector, or even a model or ensemble by clicking on the icons on the top left menu. (See Figure 3.80.)
After the linear regression and the dataset are selected, the batch prediction configuration options will appear along with a preview of the prediction output (a CSV file). (See Figure 3.81.) The default output format includes all your prediction dataset fields and adds an extra column with the predicted value. See subsection 3.6.3 for a detailed explanation of all configuration options.
By default, BigML generates an output dataset with your batch predictions that you can later find in the datasets section of the BigML Dashboard. You can deactivate this option by clicking on the icon shown in Figure 3.82.
After you configure your batch prediction, click on the green button to generate your batch prediction. (See Figure 3.83.)
When the batch prediction is created, you will be able to download the CSV file containing all your dataset instances along with a prediction for each one of them. (See Figure 3.84.)
If you didn't disable the option to create a dataset explained in step 4, you will also be able to access the output dataset from the batch prediction view. (See Figure 3.85.)
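Conceptually, a batch prediction applies the same model to every row of the input dataset and appends one column with the predicted value. The sketch below mimics that output shape locally with Python's csv module. The coefficients, field names and the `predict` and `batch_predict` helpers are hypothetical; BigML performs the real batch prediction server-side.

```python
import csv
import io

# Hypothetical linear regression: intercept plus one coefficient per input field
INTERCEPT = 1.5
COEFFICIENTS = {"x1": 2.0, "x2": -0.5}

def predict(row):
    """Compute intercept + sum(coefficient * value) for one CSV row."""
    return INTERCEPT + sum(c * float(row[f]) for f, c in COEFFICIENTS.items())

def batch_predict(csv_text, prediction_column="y"):
    """Return CSV text with every input column plus a prediction column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames + [prediction_column],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep all input values and append the prediction for this instance
        row[prediction_column] = predict(row)
        writer.writerow(row)
    return out.getvalue()

print(batch_predict("x1,x2\n1,2\n3,4\n"))
```

Here the prediction column is named after a hypothetical objective field `y`, mirroring the default described in Output Settings.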
3.6.3 Configuring Linear Regression Predictions
BigML provides several options to configure your predictions, such as setting default values for your missing numeric values (see Default Numeric Value), excluding fields (see Excluded Fields), fields mapping (see Fields Mapping), and output file settings (see Output Settings).
Default Numeric Value
By using the Default numeric value option before creating your batch prediction, you can easily replace all the missing numeric values with the field's Mean, Median, Maximum, Minimum or with Zero. (See Figure 3.86.)
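The replacement strategies can be sketched as follows. The function and the strategy labels are hypothetical, and computing the statistic from the non-missing values of the column itself is an assumption made for the sketch; it only shows what each option substitutes for a missing value.

```python
import statistics

def fill_missing(values, strategy):
    """Replace None entries with the column's mean, median, maximum, minimum or zero.

    The statistic is computed from the non-missing values (an assumption of this sketch).
    """
    present = [v for v in values if v is not None]
    replacements = {
        "mean": lambda: statistics.mean(present),
        "median": lambda: statistics.median(present),
        "maximum": lambda: max(present),
        "minimum": lambda: min(present),
        "zero": lambda: 0,
    }
    fill = replacements[strategy]()
    return [fill if v is None else v for v in values]

column = [4.0, None, 10.0, 1.0, None]
print(fill_missing(column, "median"))  # [4.0, 4.0, 10.0, 1.0, 4.0]
```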
Excluded Fields
You can specify which field or fields to exclude from the input data when creating your batch prediction. Search for a field by typing its name, then click on the field found to add it to the exclusion list. (See Figure 3.87.)
Fields Mapping
You can specify which input fields of the linear regression match the fields in the dataset containing the instances you want to predict. BigML automatically matches fields by name, but you can also set an automatic match by field ID by clicking on the green switcher. Additionally, you can manually search for fields or remove them from the Dataset fields column if you do not want them to be considered during the batch prediction. (See Figure 3.88.)
Note: Fields mapping from the BigML Dashboard is limited to 200 fields. For batch predictions with a higher number of fields, map your fields using the BigML API.
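The automatic matching by name or by field ID, plus the removal of dataset fields, can be sketched as a simple lookup. The field dictionaries and the `map_fields` helper are hypothetical and do not reproduce the BigML API's own mapping argument; they only illustrate why a name match and an ID match can give different results.

```python
def map_fields(model_fields, dataset_fields, by="name", removed=()):
    """Pair each model input field with a dataset field.

    model_fields / dataset_fields: lists of {"id": ..., "name": ...} dicts (hypothetical shape).
    by: "name" or "id", the key used for the automatic match.
    removed: dataset field IDs the user excluded from the mapping.
    Returns {model_field_id: dataset_field_id}; unmatched model fields are left out.
    """
    candidates = {f[by]: f["id"] for f in dataset_fields if f["id"] not in removed}
    return {f["id"]: candidates[f[by]] for f in model_fields if f[by] in candidates}

model = [{"id": "000000", "name": "cement"}, {"id": "000001", "name": "age"}]
data = [{"id": "000005", "name": "age"}, {"id": "000002", "name": "cement"}]
print(map_fields(model, data))  # {'000000': '000002', '000001': '000005'}
```

Matching by ID would find nothing here because the two resources use different IDs for the same names, which is exactly the case where matching by name is the better choice.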
Output Settings
Batch predictions return a CSV file containing all your instances and the final predictions. Tune the following settings to customize your prediction file (see Figure 3.89 ):
Separator: this option allows you to choose the best separator for your output file columns. The default separator is the comma. You can also select the semicolon, the tab, or the space.
New line: this option allows you to set the new line character used as the line break in the generated CSV file: “LF” or “CRLF”.
Output fields: by clicking on the list icon next to the separator selector, you can include or exclude all your dataset fields from your output file. You can also individually select the fields you want to include or exclude using the multiple output fields selector. Note: a maximum of 100 fields can be displayed in this selector, but all your dataset fields will be included in the output file by default unless you exclude them.
Headers: this option includes or excludes a first row in the output file (and in the output dataset) with the names of each column (input field names, prediction column name, confidence and prediction interval column names, etc.). By default, BigML includes the headers.
Prediction column name: customize the name for your predictions column. By default, BigML takes the name of the linear regression’s objective field.
Confidence bounds: this option allows you to include two additional columns with the confidence interval and the prediction interval. By default, they are not included in your output file.
Confidence interval column name: customize the name for the confidence interval column if you include it in the output file. BigML sets “confidence interval” as the default name.
Prediction interval column name: customize the name for the prediction interval column if you include it in the output file. BigML sets “prediction interval” as the default name.
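The separator, new line, headers and prediction column name settings map naturally onto Python's csv writer. The sketch below only shows how those choices change the bytes of the output file; the setting labels and the `write_output` helper are hypothetical and are not part of any BigML API.

```python
import csv
import io

# Hypothetical labels mirroring the Dashboard options
SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\t", "space": " "}
NEW_LINES = {"LF": "\n", "CRLF": "\r\n"}

def write_output(rows, fields, prediction_column="prediction",
                 separator="comma", new_line="LF", header=True):
    """Serialize (input values, predicted value) pairs with the chosen settings."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=SEPARATORS[separator],
                        lineterminator=NEW_LINES[new_line])
    if header:
        # First row: input field names followed by the prediction column name
        writer.writerow(fields + [prediction_column])
    for values, predicted in rows:
        writer.writerow(list(values) + [predicted])
    return out.getvalue()

rows = [((1, 2), 2.5), ((3, 4), 5.5)]
print(repr(write_output(rows, ["x1", "x2"], "y", separator="semicolon", new_line="CRLF")))
```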
Prediction explanation
Prediction explanation helps you understand why a linear regression makes a certain prediction. This is very useful in many applications, where the reasons behind a prediction are often as important as the prediction itself.
BigML prediction explanation is based on Shapley values. For more information, please refer to this research paper: A Unified Approach to Interpreting Model Predictions [ 25 ] .
When creating a single linear regression prediction, you can request the explanation for the prediction by clicking the icon shown in Figure 3.90 and then the corresponding option.
The prediction explanation represents the most important factors considered by the linear regression in a prediction given the input values. Each input value yields an associated importance, as you can see in Figure 3.91. The importances across all input fields should sum to 100%.
You can export the prediction explanation to a PNG image file, a CSV file or a JSON file by clicking the corresponding icons at the top right.
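For a linear model whose inputs are assumed independent, the Shapley value of each input has a closed form: its coefficient times the input's deviation from its mean, as described in the cited paper. The sketch below computes those contributions and then rescales their absolute values to percentages that sum to 100%. The coefficients and means are made up, and using this particular normalization for the importances shown in the Dashboard is an assumption.

```python
def linear_shapley(coefficients, means, instance):
    """Exact Shapley values for a linear model, assuming independent inputs."""
    return {f: coefficients[f] * (instance[f] - means[f]) for f in coefficients}

def importance_percentages(contributions):
    """Rescale absolute contributions so they sum to 100% (assumed normalization)."""
    total = sum(abs(v) for v in contributions.values())
    return {f: 100 * abs(v) / total for f, v in contributions.items()}

# Hypothetical model coefficients, field means and input values
coefficients = {"cement": 0.5, "age": 2.0}
means = {"cement": 280.0, "age": 45.0}
instance = {"cement": 300.0, "age": 35.0}

phi = linear_shapley(coefficients, means, instance)
print(phi)  # {'cement': 10.0, 'age': -20.0}
print(importance_percentages(phi))
```

The contributions add up to the difference between this prediction and the prediction at the mean input values, which is what makes Shapley values a consistent way to attribute a prediction to its inputs.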
3.6.4 Consuming Linear Regression Predictions
Local Predictions
Local predictions are provided for single instances from the BigML Dashboard; they are faster and have no cost. Local predictions allow you to get a real-time prediction without consuming any credits or requiring any internet connection. This is possible because the linear regression is held in memory, so when the input values change, BigML is able to compute predictions in microseconds.
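Local predictions are fast because a linear regression reduces to an intercept plus one weighted term per input. The in-memory sketch below is hypothetical: it handles numeric inputs only, and the per-field default used when an input is missing is an assumption of the sketch, not BigML's missing-value handling. The Python bindings also provide a local linear regression class; see the bindings documentation for its actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class LocalLinearRegression:
    """Hypothetical in-memory model: y = intercept + sum(b_i * x_i)."""
    intercept: float
    coefficients: dict = field(default_factory=dict)
    # Value substituted when an input is missing (assumption of this sketch)
    defaults: dict = field(default_factory=dict)

    def predict(self, input_data):
        total = self.intercept
        for name, coef in self.coefficients.items():
            value = input_data.get(name, self.defaults.get(name, 0.0))
            total += coef * value
        return total

model = LocalLinearRegression(1.0, {"a": 2.0, "b": 3.0}, {"b": 0.5})
print(model.predict({"a": 1.0, "b": 2.0}))  # 9.0
print(model.predict({"a": 1.0}))            # 4.5 ("b" falls back to its default)
```

Recomputing after each input change is just a handful of multiplications, which is why no server round trip is needed.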
In addition to the BigML Dashboard, you can fully use single and batch predictions via the BigML API and bindings. The following subsections explain both tools.
Using Linear Regression Predictions via the BigML API
Linear regression predictions have full citizenship in the BigML API which allows you to programmatically create, configure, retrieve, list, update, and delete single and batch predictions.
In the example below, see how to create a single prediction using a linear regression and defining the input data once you have properly set the BIGML_AUTH environment variable to contain your authentication credentials:
curl "https://bigml.io/prediction?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"linearregression": "linearregression/5c79513a983efc522f000009",
       "input_data": {"000003": 0.61, "000004": 1.58, "000005": 1.15,
                      "000007": 0.55}}'
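The same request can be assembled with Python's standard library, which also guarantees that the JSON body is well formed. The sketch builds the request without sending it. The helper name is hypothetical, the URL is copied from the curl example above, and the content of BIGML_AUTH (a `username=...;api_key=...` query string) is an assumption based on common usage; the credentials shown are placeholders.

```python
import json
import os
import urllib.request

def build_prediction_request(linear_regression_id, input_data, auth=None):
    """Build (without sending) the POST request from the curl example."""
    # Assumption: BIGML_AUTH holds "username=<user>;api_key=<key>"
    auth = auth if auth is not None else os.environ.get("BIGML_AUTH", "")
    body = json.dumps({"linearregression": linear_regression_id,
                       "input_data": input_data}).encode("utf-8")
    return urllib.request.Request(
        f"https://bigml.io/prediction?{auth}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_prediction_request(
    "linearregression/5c79513a983efc522f000009",
    {"000003": 0.61, "000004": 1.58, "000005": 1.15, "000007": 0.55},
    auth="username=alice;api_key=YOUR_API_KEY",  # hypothetical credentials
)
# urllib.request.urlopen(request) would send it; omitted here.
print(json.loads(request.data)["input_data"]["000004"])  # 1.58
```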
For more information on using linear regressions through the BigML API, please refer to the documentation.
Using Linear Regression Predictions via BigML Bindings
You can also create, configure, retrieve, list, update, and delete single and batch predictions via BigML bindings, which are libraries aimed to make it easier to use the BigML API from your language of choice. BigML offers bindings in multiple languages, including Python, Node.js, Java, Swift and Objective-C. See below an example of how to create a linear regression prediction with the Python bindings.
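from bigml.api import BigML
# BigML() reads your credentials from the BIGML_USERNAME and BIGML_API_KEY
# environment variables
api = BigML()
# Create a single prediction from the linear regression for these input values
prediction = api.create_prediction(
    "linearregression/5c702c91983efc4cc6000016",
    {"age": 230, "cement": 326.81, "blast_furnace_slag": 205.33,
     "fly_ash": 105.17})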
For more information on BigML bindings, please refer to the bindings page.
3.6.5 Descriptive Information
Each linear regression prediction has an associated name, description, category, and tags. You can find a brief description of each concept in the following subsections. The More info menu option displays a panel that provides editing options. (See Figure 3.92.)
Name
If you do not specify a name for your predictions, BigML assigns a default name depending on the type of predictions:
Single predictions: BigML uses the linear regression name: “<linear regression name>”.
Batch predictions: BigML combines the linear regression name and your prediction dataset name: “<linear regression name> with <dataset name>”.
Prediction names are displayed on the list and also on the top bar of a prediction view. Prediction names are indexed to be used in searches. Rename your predictions any time from the More info menu.
The name of a prediction cannot be longer than 256 characters. More than one prediction can have the same name even within the same project, but they will always have different identifiers.
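The default naming rule and the length limit can be expressed in a few lines. This is an illustrative sketch with hypothetical helper names; it reproduces the documented pattern and the 256-character limit, not BigML's code.

```python
MAX_NAME_LENGTH = 256

def default_prediction_name(linear_regression_name, dataset_name=None):
    """Single predictions reuse the model name; batch ones add ' with <dataset name>'."""
    if dataset_name is None:
        return linear_regression_name
    return f"{linear_regression_name} with {dataset_name}"

def check_name(name):
    """Reject names longer than the documented 256-character limit."""
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}")
    return name

print(default_prediction_name("Concrete LR", "concrete test"))  # Concrete LR with concrete test
```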
Description
Each prediction also has a description that is very useful for documenting your Machine Learning projects. Predictions take their description from the linear regression used to create them.
Descriptions can be written using plain text and also markdown. BigML provides a simple markdown editor that accepts a subset of markdown syntax. (See Figure 3.93.)
Descriptions cannot be longer than 8192 characters.
Category
Each prediction is associated with a category, taken from the linear regression used to create it. Categories are useful to classify predictions according to the domain your data comes from. This is useful when you use BigML to solve problems across industries or for multiple customers.
A prediction category must be one of the categories listed in Table 3.5.
Tags
A prediction can also have a number of tags associated with it. These tags help to retrieve the prediction via the BigML API or to provide predictions with some extra information. Your prediction inherits the tags from the linear regression used to create it. Each tag is limited to a maximum of 128 characters. Each prediction can have up to 32 different tags.
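The documented limits on descriptions and tags can be checked before updating a prediction. The validator below is a hypothetical helper that encodes only the numbers stated in this section: 8192 characters per description, 128 characters per tag, and at most 32 different tags.

```python
MAX_DESCRIPTION_LENGTH = 8192
MAX_TAG_LENGTH = 128
MAX_TAGS = 32

def validate_metadata(description="", tags=()):
    """Return a list of problems with a description and tags (empty when valid)."""
    problems = []
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description longer than 8192 characters")
    if len(set(tags)) > MAX_TAGS:
        problems.append("more than 32 different tags")
    problems.extend(f"tag longer than 128 characters: {t[:20]!r}"
                    for t in tags if len(t) > MAX_TAG_LENGTH)
    return problems

print(validate_metadata("Concrete strength predictions", ["concrete", "2019"]))  # []
```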
3.6.6 Linear Regression Predictions Privacy
The link displayed in the Privacy panel is the private URL of your prediction, so only a user logged into your account is able to see it. Neither single predictions nor batch predictions can be shared by using a secret link. (See Figure 3.94.)
3.6.7 Moving Linear Regression Predictions to Another Project
When you create a prediction, it will be assigned to the same project where the original linear regression is located. You cannot move predictions between projects as you do with other resources.
3.6.8 Stopping Linear Regression Predictions
Single predictions are synchronous resources, so you cannot cancel them during creation, since you get the result immediately.
By contrast, batch predictions are asynchronous resources, so you can stop their creation before the task is finished. Use the delete batch prediction option from the 1-click action menu (see Figure 3.95) or from the pop up menu on the list view.
A modal window will be displayed asking you for confirmation. If you stop the prediction during its creation, you won't be able to resume the same task, so if you want to create the same prediction you will have to start a new task.
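Because batch predictions are asynchronous, a client typically polls the resource until it finishes (or until you decide to stop it). The loop below is a generic sketch with an injected status function, so it runs without network access. The numeric codes (5 for finished, -1 for faulty) are assumed to follow the status codes used by the BigML bindings, and the bindings' own `api.ok(resource)` already performs this kind of waiting for you.

```python
import time

# Assumed status codes, following the BigML bindings' conventions
FINISHED, FAULTY = 5, -1

def wait_for_resource(fetch_status, poll_seconds=1.0, timeout=60.0, sleep=time.sleep):
    """Poll fetch_status() until the resource finishes, fails, or the timeout expires.

    fetch_status is a hypothetical callable returning the current status code.
    """
    waited = 0.0
    while True:
        code = fetch_status()
        if code == FINISHED:
            return True
        if code == FAULTY:
            return False
        if waited >= timeout:
            raise TimeoutError("resource still in progress")
        sleep(poll_seconds)
        waited += poll_seconds

# Simulated batch prediction going through queued (1), in progress (3) and finished (5)
statuses = iter([1, 3, 5])
print(wait_for_resource(lambda: next(statuses), sleep=lambda s: None))  # True
```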
3.6.9 Deleting Linear Regression Predictions
You can delete your single or batch predictions from the predictions view, using the 1-click action menu (see Figure 3.97) or using the pop up menu on the predictions list view (see Figure 3.98).
A modal window will be displayed asking you for confirmation. Once a prediction is deleted, it is permanently deleted, and there is no way you (or even the IT folks at BigML) can retrieve it.