Classification and Regression with the BigML Dashboard

1.1 Introduction

There are multiple Machine Learning problems that require a model to predict an output variable (Objective Field in BigML parlance) given a number of input variables (input Fields in BigML). These problems can be divided into Classification and Regression problems, depending on the data type of the objective field:

  • Classification: when the objective field is categorical. For these problems, a Machine Learning algorithm is used to build a model that predicts a category (label or class) for a new example (instance). That is, it “classifies” new instances into a given set of categories (or discrete values). For example, “true or false”, “fraud or not fraud”, “high risk, low risk or medium risk”, etc. There can be hundreds of different categories.

  • Regression: when the objective field is numeric. For these problems, a Machine Learning algorithm is used to build a model that predicts a continuous value. That is, given the fields that define a new instance the model predicts a real number. For example, “the price of a house”, “the number of units sold for a product”, “the potential revenue of a lead”, “the number of hours until next system failure”, etc.
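
The distinction above depends only on the data type of the objective field. As a rough illustration (not BigML code), the following Python sketch infers the problem type from a list of objective values. The function name `problem_type` is hypothetical, and treating every non-numeric value as categorical is a simplifying assumption.

```python
from typing import Sequence


def problem_type(objective_values: Sequence[object]) -> str:
    """Return 'regression' if every objective value is numeric,
    otherwise 'classification'. Hypothetical helper for illustration."""
    if not objective_values:
        raise ValueError("at least one objective value is required")

    def is_numeric(value: object) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly:
        # "true or false" is a categorical objective.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if all(is_numeric(v) for v in objective_values):
        return "regression"
    return "classification"


# Categorical objective field: a classification problem.
print(problem_type(["fraud", "not fraud", "fraud"]))
# Numeric objective field (e.g., house prices): a regression problem.
print(problem_type([250000.0, 310500.0, 189900.0]))
```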

Both classification and regression problems can be solved using supervised Machine Learning techniques. They are called supervised in the sense that the values of the output variable have been provided either by a human expert (e.g., the patient has been diagnosed with diabetes or not) or by a deterministic automated process (e.g., customers who did not pay their fees in the last three months are labeled as “delinquent”). The objective field values, along with the input fields, need to be collected for each Instance in a structured Dataset that is used to train the model. The algorithms learn a predictive model that maps your input data to a predicted objective field value.
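
To make the idea of a deterministic automated labeling process concrete, here is a minimal Python sketch that builds a small structured dataset in which each instance carries its input fields and an objective field. The field names (`monthly_fee`, `tenure_months`, `status`), the sample records, and the “current” label are assumptions made for illustration; only the “delinquent” rule comes from the example in the text.

```python
from typing import Dict, List


def label_customer(paid_last_3_months: List[bool]) -> str:
    """Deterministic rule: a customer who made no payment in any of
    the last three months is labeled 'delinquent'."""
    return "delinquent" if not any(paid_last_3_months) else "current"


# Hypothetical raw records: input fields plus recent payment history.
raw_records = [
    {"monthly_fee": 30.0, "tenure_months": 24, "paid": [True, True, True]},
    {"monthly_fee": 45.0, "tenure_months": 3, "paid": [False, False, False]},
    {"monthly_fee": 20.0, "tenure_months": 12, "paid": [True, False, False]},
]

# Each training instance has the input fields and the objective field.
dataset: List[Dict[str, object]] = [
    {
        "monthly_fee": r["monthly_fee"],
        "tenure_months": r["tenure_months"],
        "status": label_customer(r["paid"]),  # objective field
    }
    for r in raw_records
]

for instance in dataset:
    print(instance)
```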

A BigML Model uses a proprietary Decision Trees algorithm based on the Classification and Regression Trees (CART) algorithm proposed by Leo Breiman. Section 1.2 explains in detail how BigML models are implemented and interpreted.
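
BigML's algorithm is proprietary, so the following is not its implementation. It is only a minimal Python sketch of the textbook CART idea for classification: choosing the threshold on one numeric input field that minimizes the weighted Gini impurity of the two resulting groups. The function names, the sample data, and the use of observed values as candidate thresholds are all simplifying assumptions.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a group of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(
    values: List[float], labels: List[str]
) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted_gini) for the best binary split
    'value <= threshold'. The threshold is None if no split improves
    on the impurity of the unsplit data."""
    n = len(values)
    best: Tuple[Optional[float], float] = (None, gini(labels))
    # The largest value is skipped: it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: transaction amount versus a fraud label.
amounts = [10, 20, 30, 200, 250, 300]
classes = ["ok", "ok", "ok", "fraud", "fraud", "fraud"]
threshold, impurity = best_split(amounts, classes)
print(threshold, impurity)
```

A full tree repeats this search over every input field and then recursively within each resulting group. For regression, CART typically replaces Gini impurity with a squared-error criterion.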

This chapter contains a comprehensive description of BigML models, including how they can be created with 1-click (section 1.3), all the configuration options available (section 1.4), and the different visualizations provided by BigML (section 1.5). Once you create a model, you can get a report of each field's importance (subsection 1.6.1). See section 1.7 for an explanation of how models can be used to make predictions. You can also export your models in different formats to make faster local predictions at no cost (section 1.8). The process to evaluate your model’s predictive performance in BigML is explained in a different chapter (Chapter 7).

In BigML, the third tab of the main menu of your Dashboard allows you to list all your available models. In the model list view (Figure 1.1), you can see, for each model, the dataset it was created from, the model’s Name, Type (either classification or regression), Objective, Age (time elapsed since it was created), Size, and the number of predictions, batch predictions, and evaluations that have been created using that model. The search menu option in the top right corner of the model list view allows you to search your models by name.

\includegraphics[width=\textwidth ]{images/models/models-list-view}
Figure 1.1 Models list view

When you first create an account at BigML, or every time that you start a new Project, your list view for models will be empty. (See Figure 1.2.)

\includegraphics[width=\textwidth ]{images/models/empty-listing}
Figure 1.2 Empty Dashboard models view

Finally, in Figure 1.3 you can see the icon used to represent a model in BigML.

\includegraphics[width=2cm]{images/models/model-icon}
Figure 1.3 Models icon