Classification and Regression with the BigML Dashboard
1.16 Takeaways
This chapter explained models in detail. We conclude it with a list of key points:
BigML models are human-friendly, unlike many other Machine Learning models. They provide a set of rules organized in a tree structure that is easy for non-experts to understand.
You can use BigML models to solve Classification and Regression problems.
BigML models support any type of field as an input field (categorical, numeric, date and time, text, and items fields).
BigML models are not affected by uninformative or redundant fields.
Normalization is not needed.
To build a BigML model, you just need a dataset. (See Figure 1.119.)
A BigML model can be an input to an evaluation, to a prediction, or to a batch prediction. (See Figure 1.119.)
A BigML model can be the output of a cluster [5]. (See Figure 1.119.)
You can create a BigML model with just one click or configure it as you wish. BigML models are easy to tune without having to configure difficult parameters.
You can also create models using the BigML REST API or the BigML bindings for your language of choice.
If you do not specify any Objective Field, BigML will use the last valid field in your dataset.
To avoid overfitting, you can choose among three pruning strategies when building your BigML model: smart pruning, statistical pruning, or no statistical pruning.
By default, BigML models do not consider missing values when choosing splitting rules, but you can explicitly include them.
You can set the maximum number of nodes for your BigML model. A greater number of nodes grows more complex trees that perform better on the training data at the expense of generalization.
To deal with imbalanced datasets, BigML provides three different options to assign specific weights to your instances: balance objective, objective weights, and weight field.
For classification problems, the Confidence is a measure of the model’s certainty when predicting a class at a certain node.
For regression problems, the Expected error is a measure of the expected error at a node.
You can visualize BigML models as an interactive decision tree or with the Sunburst view.
BigML provides a summarized view of your model, including the data distribution, the prediction distribution, the field importance, and the rules summary.
You can easily see which fields in your dataset have the most impact on predictions by clicking on the model’s summary report.
You need to evaluate your model’s performance using data that the model has not seen before. Evaluating a model is a key step in understanding whether that model is satisfactory or needs training adjustments. In the latter case, you can try different creation options until you achieve the desired evaluation results.
You can download your model in your preferred programming language to use it in your local environment and make faster predictions at no cost.
Once you have a satisfactory model, you can use it to make single or batch predictions.
BigML provides local predictions from the BigML Dashboard for single instance predictions. Local predictions allow you to get a real-time prediction without consuming any credits or requiring an internet connection.
BigML batch predictions allow you to make simultaneous predictions for multiple instances. For batch predictions, you always get a CSV file and an optional output dataset.
You can furnish your model with descriptive information (name, description, tags, and category).
You can clone an existing model from the BigML Gallery.
You can share your model in the BigML Gallery as Black Box or White Box, so other BigML users can clone your model and make predictions with it.
You can stop the model’s creation before the task is finished.
You can permanently delete a model.
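As an illustration of the Confidence takeaway above, the following is a minimal sketch of how a node-level confidence could be computed. It assumes the confidence is the lower bound of a Wilson score interval on the proportion of instances at the node that belong to the predicted class. The function name `wilson_lower_bound`, its parameters, and the default `z = 1.96` (roughly a 95% interval) are illustrative assumptions, not part of the BigML API.

```python
import math


def wilson_lower_bound(class_count: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for class_count / total.

    class_count: instances at the node that belong to the predicted class.
    total: all instances that reach the node.
    z: standard normal quantile (assumed default, about a 95% interval).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= class_count <= total:
        raise ValueError("class_count must be between 0 and total")

    p = class_count / total          # observed proportion at the node
    z2 = z * z
    centre = p + z2 / (2 * total)    # proportion shifted toward 0.5
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)


if __name__ == "__main__":
    # Two pure nodes: the one backed by more instances gets a higher score.
    print(wilson_lower_bound(10, 10))
    print(wilson_lower_bound(100, 100))
```

Under this assumption, the score is always below the raw class proportion at the node and approaches it as more instances reach the node, which is why a pure node with few instances is reported with a lower confidence than a pure node with many.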