Classification and Regression with the BigML Dashboard


4.5 Visualizing Logistic Regressions

After creating your logistic regression, you can analyze the results with BigML's unique visualizations: a 1D and 2D chart, to examine the impact of the input fields on the objective field, and a coefficients table, for more advanced users, to interpret the resulting coefficients for each input field. The following subsections explain both visualizations in detail.

Switch between the chart view and the table view by clicking the icons in the top bar menu of the logistic regression view. (See Figure 4.60.)

\includegraphics[]{images/logisticregression/lr-table-chart} — Figure 4.60 Switch chart and table views

4.5.1 Logistic Regression Chart

The chart view is composed of three main parts: the CHART itself, which you can view in one dimension (1D) or two dimensions (2D); the PREDICTION legend; and the INPUT FIELDS form. (See Figure 4.61.) You can find a detailed explanation of each one below.

\includegraphics[]{images/logisticregression/lr-chart-parts} — Figure 4.61 Logistic regression chart parts

The CHART allows you to view the impact of the input fields on the predictions for each class of the objective field. You can select the 1D chart or the 2D chart by clicking on the green switcher in the top bar menu.

The 1D chart allows you to select one input field for the x-axis. The y-axis represents the probabilities for each of the predicted classes. Only numeric fields can be selected for the x-axis. (See chart limitations in subsection 4.8.1.) You can extend the upper limit of the x-axis by clicking on the plus icon.

$\includegraphics[]{images/logisticregression/lr-chart-selectx}$

Figure 4.62 1D chart
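
To make the idea behind the 1D chart concrete, the following Python sketch sweeps one numeric input field across a range while holding the other inputs fixed, and computes a probability for each class at every point. It is only an illustration: the coefficient values, the field names (`calls`, `tenure`), the class names, and the helper functions are hypothetical, and it assumes that each class probability comes from a per-class logistic function whose outputs are then normalized to sum to 1, which may differ from what BigML computes internally.

```python
import math


def class_probabilities(coefficients, inputs):
    """Return normalized per-class probabilities for one set of input values.

    coefficients: {class_name: {"bias": float, field_name: float, ...}}
    inputs: {field_name: float}
    """
    raw = {}
    for cls, coefs in coefficients.items():
        score = coefs["bias"] + sum(coefs[f] * v for f, v in inputs.items())
        raw[cls] = 1.0 / (1.0 + math.exp(-score))  # logistic function
    total = sum(raw.values())
    # Assumption: per-class outputs are normalized so that they sum to 1.
    return {cls: p / total for cls, p in raw.items()}


def sweep_1d(coefficients, fixed_inputs, x_field, x_min, x_max, steps=5):
    """Vary x_field across [x_min, x_max] while the other inputs stay fixed."""
    points = []
    for i in range(steps):
        x = x_min + (x_max - x_min) * i / (steps - 1)
        inputs = dict(fixed_inputs, **{x_field: x})
        points.append((x, class_probabilities(coefficients, inputs)))
    return points


# Hypothetical two-class model with two numeric input fields.
COEFFICIENTS = {
    "churn": {"bias": -1.0, "calls": 0.8, "tenure": -0.05},
    "stay": {"bias": 1.0, "calls": -0.8, "tenure": 0.05},
}

curve = sweep_1d(COEFFICIENTS, {"tenure": 12.0}, "calls", 0.0, 4.0)
for x, probs in curve:
    print(f"calls={x:.1f}", {c: round(p, 3) for c, p in probs.items()})
```

Each tuple in `curve` corresponds to one x position on the chart, with one y value (probability) per class.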

The 2D chart allows you to select two different input fields, one for each axis, and the class probabilities are represented in different colors in a heat map. You can select numeric or categorical fields for the axes. You can switch the axes by clicking on the option at the top of the chart area.

$\includegraphics[]{images/logisticregression/lr-chart2d}$

Figure 4.63 2D chart

In both charts you can inspect the axis values in the grey area next to the selector. You can freeze the view by pressing Shift and release it again by pressing Escape on your keyboard. When the view is frozen, an edit icon appears that lets you edit the axis values and obtain a prediction for another value. The resulting predicted probabilities are shown in the prediction legend to the right.

To know more about how to interpret the influence of the input fields on predictions, read this post.
The PREDICTION legend allows you to visualize the classes represented in the chart along with their corresponding colors. The main probability color bar at the top shows the probability for the predicted class. By default, in the 2D chart, colors are shaded according to the range of probabilities shown in the chart area. That way, smaller differences in predictions are easier to perceive. However, you can choose to see the color shading according to the total range of class probabilities (from 0% to 100%) by clicking on the Total icon next to the probability bar. (See Figure 4.64.) You can also choose to see only one of the classes using the class selector at the bottom of the legend.

$\includegraphics[]{images/logisticregression/legend-2d}$

Figure 4.64 Prediction legend

Freeze this view by pressing Shift, and release it again by pressing Escape on your keyboard.
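
The difference between the two shading modes of the legend can be sketched in a few lines of Python. This is an assumption about the general idea (linear rescaling of a probability into a color intensity), not BigML's actual rendering code; the function name `shade` and the sample probabilities are hypothetical.

```python
def shade(probability, shown_probabilities, total_range=False):
    """Map a probability to a color intensity between 0 and 1."""
    if total_range:
        # Shade against the full 0%..100% range.
        low, high = 0.0, 1.0
    else:
        # Shade against the range of probabilities visible in the chart area.
        low, high = min(shown_probabilities), max(shown_probabilities)
    if high == low:
        return 1.0
    return (probability - low) / (high - low)


# Hypothetical probabilities for the predicted class across the chart area.
grid = [0.62, 0.64, 0.66, 0.70]
relative = [round(shade(p, grid), 2) for p in grid]
absolute = [round(shade(p, grid, total_range=True), 2) for p in grid]
print(relative)
print(absolute)
```

With the relative range, the small spread between 0.62 and 0.70 is stretched over the whole color scale, whereas with the total range the four values receive nearly the same intensity.
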
Below the chart legend, you can find the INPUT FIELDS form. (See Figure 4.65.) You can configure the values for any numeric, categorical, text or items field. By changing their values, you can see the class probabilities change in real time. You can also enable or disable your input fields; disabled fields are treated as missing values. If you configured your model with the Missing numerics option deactivated, you will not be able to disable your numeric fields. (See subsection 4.4.5.)

$\includegraphics[]{images/logisticregression/lr-chart-input}$

Figure 4.65 Configure the values for other input fields

The chart also includes a reset option for your input field values and an export option to download your chart in PNG format, both explained below:

After selecting the fields for the axes or configuring the input field values, you can return to the default view by clicking the reset icon highlighted in Figure 4.66.

$\includegraphics[]{images/logisticregression/lr-chart-reset}$

Figure 4.66 Reset the values for the input fields
You can also export your chart in PNG format with or without legends. To include the class percentages in the legend, freeze the view by pressing Shift on your keyboard before exporting the chart. Release the view by pressing Escape.

$\includegraphics[]{images/logisticregression/lr-chart-export}$

Figure 4.67 Export chart as image with or without legends

Note: there are limits on the number of objective field classes and the number of input fields for which your logistic regression can be visualized in the chart (explained in subsection 4.8.1).

4.5.2 Coefficient Table

The main goal of the logistic regression algorithm is to learn the coefficients of the logistic function for each of the independent variables, i.e., for each of the input fields. A different set of coefficients is associated with each class of the objective field. See section 4.2 for a detailed explanation of how to interpret logistic regression coefficients.

BigML allows you to inspect the learned coefficients for each of the input fields in the coefficient table. The table columns represent the objective field classes, while the table rows represent the input field variables and the bias (a.k.a. intercept term) of the logistic regression. In the first row you will always find the Bias coefficients. If your objective field has a high number of classes, you may need to scroll horizontally to see all of them. You can sort the table rows by clicking on any of the column labels.

\includegraphics[]{images/logisticregression/lr-table-view} — Figure 4.68 Table view for logistic regression

For numeric fields, there is always one coefficient per field, whereas categorical, text and items fields have one coefficient per value (category, term or item). This is due to the transformations, explained in section 4.2, required to convert categorical, text, and items fields into numeric fields (each single value is mapped to a separate variable in the formula). Missing values also get their own coefficients.

For numeric fields, you always get one coefficient per field. If you train the logistic regression with Missing numerics, you will find an additional coefficient per field for the missing values. (See subsection 4.4.5.)
For categorical fields, you have one coefficient per class and an additional one for missing values per field. (See subsection 4.4.10.) Note: if you configure the field with Contrast or Other coding, there will be just one coefficient for that field (see subsection 4.4.10).
For text fields, there is one coefficient per term and an additional one for missing values per field.
For items fields, you get one coefficient per item and an additional one for missing values per field.
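
The rules above can be summarized with a small Python sketch that lists the variables, and therefore the coefficients per objective class, that each field type produces. The function `expand_field`, the field names, and the category values are hypothetical. The sketch assumes the default one-variable-per-value encoding with missing values enabled, so it does not cover Contrast or Other coding, and the ordering of its output is not meant to match the ordering of the table.

```python
def expand_field(name, optype, values=(), missing=True):
    """List the variables (one coefficient each, per class) a field produces."""
    if optype == "numeric":
        # Numeric fields contribute a single variable.
        variables = [name]
    else:
        # Categorical, text, items: one variable per category, term, or item.
        variables = [f"{name}={value}" for value in values]
    if missing:
        # One extra variable per field for missing values.
        variables.append(f"{name} (missing)")
    return variables


# Hypothetical dataset with one numeric and one categorical input field.
fields = [
    ("temperature", "numeric", ()),
    ("weather", "categorical", ("clear", "fog", "rain")),
]

variables = ["Bias"]
for name, optype, values in fields:
    variables.extend(expand_field(name, optype, values))

print(len(variables), variables)
```

In this example, two input fields expand into six variables plus the bias, so each objective class would have seven coefficients.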

See an example of coefficients for a categorical field in Figure 4.69, where one single field, “Atmospheric condition”, yields twelve different variables associated with different coefficients, one for each of the field’s classes.

\includegraphics[]{images/logisticregression/categorical-field-transformation} — Figure 4.69 Multiple field variables for categorical one-hot encoded fields

Coefficients for missing values are always found at the end of the table. (See Figure 4.70.) For fields without missing values in the original dataset, those coefficients should be zero (see subsection 4.2.2).

\includegraphics[]{images/logisticregression/lr-missing-coeff-table} — Figure 4.70 Missing numeric coefficients at the end of logistic regression table

Additional options for the table include a filtering option and an export option:

You can filter the first column of the table by field name, class, term or item using the search box at the top of the table (see Figure 4.71).

$\includegraphics[]{images/logisticregression/lr-table-search}$

Figure 4.71 Search and filter logistic regression table
You can also export the table to a CSV file by clicking on the icon highlighted in Figure 4.72.

$\includegraphics[]{images/logisticregression/lr-table-export}$

Figure 4.72 Export table in CSV file

Note: there are limits on the number of objective field classes and the number of input fields for which your logistic regression can be visualized in the coefficient table (explained in subsection 4.8.1).

Coefficient Table with Stats Computation

When stats are enabled, BigML displays them in the coefficient table. See a detailed explanation of how to include the stats in your model in subsection 4.4.7.

In the resulting coefficient table you will find three new elements: a new row at the top of the table containing the likelihood ratio, an icon per coefficient indicating its significance and a summary of the stats per coefficient. You can find a detailed explanation below:

The likelihood ratio tests whether the coefficients as a whole have any predictive power. It is the difference in the log likelihood between the fitted model and an intercept-only model. You will find it in the first row of the table. (See Figure 4.73.)

$\includegraphics[]{images/logisticregression/lr-likelihood}$

Figure 4.73 Likelihood ratio

If this difference is significant, i.e., the p-value is lower than the significance level, then the coefficients as a whole have more predictive power than an intercept-only model. The icon shown in Figure 4.74 indicates if the likelihood ratio is significant or not given the selected significance level.

$\includegraphics[]{images/logisticregression/lr-likeli-significant}$

Figure 4.74 Significant likelihood ratio
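
As a rough illustration of how such a test can be evaluated, the Python sketch below uses the textbook form of the likelihood ratio test: the statistic is taken as twice the difference in log likelihood and compared against a chi-square distribution. The factor of two, the choice of degrees of freedom, the function names, and the log likelihood values are all assumptions made for this example; the exact computation BigML performs is not described in this section.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then a recurrence in steps of 2.
    if df % 2 == 1:
        k, sf = 1, math.erfc(math.sqrt(half))
    else:
        k, sf = 2, math.exp(-half)
    while k < df:
        sf += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return sf


def likelihood_ratio_test(ll_model, ll_null, df, significance_level=0.05):
    """Compare a fitted model against an intercept-only model."""
    statistic = 2.0 * (ll_model - ll_null)  # assumed textbook definition
    p_value = chi2_sf(statistic, df)
    return statistic, p_value, p_value < significance_level


# Hypothetical log likelihoods; df assumed to be the number of extra coefficients.
statistic, p_value, significant = likelihood_ratio_test(-120.0, -135.0, df=3)
print(statistic, p_value, significant)
```

The last value returned mirrors the decision rule described above: the ratio is flagged as significant when the p-value is lower than the selected significance level.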

You can select your preferred significance level by using the selector shown in Figure 4.75.

$\includegraphics[]{images/logisticregression/lr-significance}$

Figure 4.75 Select significance level

You can see the p-value associated with the likelihood ratio by mousing over the sigma icon. (See Figure 4.76.)

$\includegraphics[]{images/logisticregression/lr-likel-pvalue}$

Figure 4.76 Likelihood ratio p-value
Next to each coefficient you will find an icon indicating whether it is significant (see Figure 4.77) or non-significant (see Figure 4.78).

$\includegraphics[width=2cm]{images/logisticregression/significant-icon}$

Figure 4.77 Significant icon

$\includegraphics[width=2cm]{images/logisticregression/non-significant-icon}$

Figure 4.78 Non-significant icon

The significance of a coefficient is determined by comparing its p-value against the significance level selected in the top menu (see Figure 4.75). If the p-value is higher than the significance level, the coefficient is non-significant. If the p-value is lower than the significance level, the coefficient is significant. A good practice is to retrain the logistic regression removing the non-significant coefficients. In most cases, however, the model performance should not be affected.

$\includegraphics[]{images/logisticregression/lr-icons-significance}$

Figure 4.79 Significance icons for coefficient estimates
Next to each icon indicating the significance of a coefficient, you will find a $\sigma$ symbol. If you mouse over it, a tooltip displays a summary of the stats for that coefficient. (See Figure 4.80.) First, you will find the p-value from the Wald test. As mentioned in the previous point, this p-value is compared against the selected significance level to determine the coefficient’s significance. Then, alongside the Z score chart, you will find the Z score value, the 95% confidence interval, and the standard error (the estimated standard deviation) of the coefficient estimate.

$\includegraphics[]{images/logisticregression/lr-summary-stats}$

Figure 4.80 Summary of stats per coefficient
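
The relationship between the values in this summary can be sketched with the standard Wald test formulas. The Python example below is an illustration under the usual normal approximation, not BigML's implementation; the function name `wald_summary` and the coefficient and standard error values are hypothetical.

```python
import math


def wald_summary(coefficient, standard_error, significance_level=0.05):
    """Z score, two-sided p-value, and 95% confidence interval for a coefficient."""
    z_score = coefficient / standard_error
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    # 1.959964 is the 97.5th percentile of the standard normal distribution.
    margin = 1.959964 * standard_error
    return {
        "z_score": z_score,
        "p_value": p_value,
        "confidence_interval": (coefficient - margin, coefficient + margin),
        "significant": p_value < significance_level,
    }


# Hypothetical coefficient estimates and standard errors.
strong = wald_summary(0.8, 0.25)
weak = wald_summary(0.1, 0.25)
print(strong)
print(weak)
```

In this sketch a coefficient is non-significant exactly when its 95% confidence interval contains zero, which matches comparing the p-value against a 0.05 significance level.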

Note: remember that, when stats are enabled, your categorical fields will automatically be configured with dummy coding to avoid multicollinearity, and the dummy class selected will be the first one in alphabetical order. Learn more about field codings and stats configuration in subsection 4.4.10 and subsection 4.4.7 respectively.

You can download all the stats information by clicking on the download CSV icon at the right of the top menu. (See subsection 4.7.1.)