Classification and Regression with the BigML Dashboard


3.5 Visualizing Linear Regressions

After creating your linear regression, you can analyze the results with BigML's unique visualizations. A 1D chart and a Partial Dependence Plot (PDP) let you examine the impact of the input fields on the objective field. A coefficient table, aimed at more advanced users, lets you interpret the coefficient learned for each input field. The following subsections explain these visualizations in detail.

Switch among the three views (1D chart, PDP and table) by clicking on the icons in the top bar menu of the linear regression view. (See Figure 3.49.)

\includegraphics[]{images/linearregression/lnr-table-chart}

Figure 3.49 Switch chart, PDP and table views

3.5.1 Linear Regression Chart

The 1D chart and the PDP views are both composed of three main parts: the CHART or PLOT, the PREDICTION legend and the INPUT FIELDS form (see Figure 3.50). Each one is explained in detail below.

\includegraphics[]{images/linearregression/lnr-chart-parts} — Figure 3.50 Linear regression chart parts

Both the 1D chart and the Partial Dependence Plot (PDP) let you see how the input fields affect the predictions for the objective field. You can select the 1D chart or the PDP by clicking on the selection buttons in the top bar menu.

The 1D chart lets you select one input field for the x-axis, while the y-axis represents the objective field. Only numeric fields can be selected for the x-axis (see the chart limitations in subsection 3.8.1). You can extend the upper limit of the x-axis by clicking on the plus icon.

The blue band is formed by the upper and lower bounds of the 95% prediction intervals. This means that, for any given value on the x-axis, the objective value of a new instance is expected to fall within the blue band with 95% probability. You can hide or show the band by clicking on the icon next to the name of the y-axis field.

\includegraphics[]{images/linearregression/lnr-chart-selectx}

Figure 3.51 1D chart
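This section does not say how BigML computes the band. As a rough illustration only, the sketch below computes the classic ordinary-least-squares prediction interval for a model with a single numeric input, assuming normally distributed errors. The function names and data are hypothetical, and the critical value defaults to the normal quantile (about 1.96) as a large-sample stand-in for the Student t quantile.

```python
from statistics import NormalDist, mean


def fit_simple_ols(xs, ys):
    """Ordinary least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def prediction_interval(xs, ys, x0, crit=None):
    """Return (lower, prediction, upper) for a new observation at x0.

    crit defaults to the 97.5% normal quantile (about 1.96), a large-sample
    approximation of the Student t quantile with n - 2 degrees of freedom.
    """
    n = len(xs)
    b0, b1 = fit_simple_ols(xs, ys)
    x_bar = mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = (sse / (n - 2)) ** 0.5
    if crit is None:
        crit = NormalDist().inv_cdf(0.975)
    y_hat = b0 + b1 * x0
    # The "1 +" term accounts for the noise of a single new observation.
    half_width = crit * s * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx) ** 0.5
    return y_hat - half_width, y_hat, y_hat + half_width


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
print(prediction_interval(xs, ys, 3.5))  # narrowest band, at the mean of x
print(prediction_interval(xs, ys, 10))   # wider band, far from the data
```

In this formula the band is narrowest at the mean of the input and widens as x moves away from it, because the (x0 - x_bar)^2 / sxx term grows.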

The PDP lets you select two different input fields, one for each axis, and represents the values of the objective field as differences in a color scale in the heatmap chart. You can select numeric or categorical fields for the axes, and you can swap the axes by clicking on the option on top of the chart area.

\includegraphics[]{images/linearregression/lnr-pdp}

Figure 3.52 Partial Dependence Plot
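The text does not detail how the heatmap values are computed. A minimal sketch, assuming each cell is the model's prediction when the two axis fields take the cell's values and every other input keeps its value from the INPUT FIELDS form. Only numeric inputs are handled, and the field names, coefficients and values are hypothetical.

```python
def predict(bias, coefficients, inputs):
    """Linear prediction: bias plus the sum of coefficient * input value."""
    return bias + sum(coefficients[name] * value for name, value in inputs.items())


def heatmap_grid(bias, coefficients, form_values, x_field, x_values, y_field, y_values):
    """One prediction per (x, y) cell; other inputs stay at their form values."""
    grid = []
    for y in y_values:
        row = []
        for x in x_values:
            inputs = dict(form_values)  # copy, so the form values are untouched
            inputs[x_field] = x
            inputs[y_field] = y
            row.append(predict(bias, coefficients, inputs))
        grid.append(row)
    return grid


# Hypothetical model with three numeric input fields.
coefficients = {"size": 0.5, "age": -2.0, "rooms": 10.0}
form_values = {"size": 100, "age": 10, "rooms": 3}
grid = heatmap_grid(100.0, coefficients, form_values, "size", [100, 200], "age", [0, 10])
print(grid)  # [[180.0, 230.0], [160.0, 210.0]]
```

Because this model is additive, changing a form value for a field that is not on either axis shifts every cell by the same amount, so the differences between cells stay the same.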

In both charts you can inspect the axis values in the gray boxes next to the selector. You can freeze the view by pressing Shift and release it by pressing Esc. When the view is frozen, an edit icon appears, which lets you edit the axis values and obtain a prediction for any other value you prefer. The resulting predicted value is shown in the prediction legend to the right.
The PREDICTION legend shows the predicted values represented in the chart along with their corresponding colors. By default, in the PDP, colors are shaded according to the range of predictions shown in the chart area, which makes smaller differences in predictions easier to perceive. However, you can shade the colors according to the total range of predictions for the objective field by clicking on the Total icon next to the prediction bar. (See Figure 3.53.)

\includegraphics[]{images/linearregression/lnr-prediction-legend}

Figure 3.53 Prediction legend
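The difference between the default shading and the Total option can be pictured as normalizing each prediction against two different ranges. A minimal sketch, assuming a simple linear mapping of values to a 0 to 1 color intensity; the predictions and the total range below are hypothetical.

```python
def shade(value, low, high):
    """Position of value within [low, high], scaled to 0..1 for a color ramp."""
    if high == low:
        return 0.5  # degenerate range: use the middle of the color scale
    return (value - low) / (high - low)


# Hypothetical predictions visible in the chart area.
shown = [180.0, 190.0, 200.0]
# Hypothetical total range of predictions for the objective field.
total_low, total_high = 0.0, 1000.0

local = [shade(v, min(shown), max(shown)) for v in shown]
total = [shade(v, total_low, total_high) for v in shown]
print(local)  # spread over the whole scale: [0.0, 0.5, 1.0]
print(total)  # squeezed into a narrow band of about 0.18 to 0.20
```

With the local range, the same three predictions use the full color scale; against the total range they map to nearly identical shades.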

Again, you can freeze this view by pressing Shift and release it by pressing Esc.
Below the chart legend, you can find the INPUT FIELDS form (see Figure 3.54), where you can configure the values for any numeric, categorical, text or items field. As you change these values, the predictions update in real time.

\includegraphics[]{images/linearregression/lnr-chart-input}

Figure 3.54 Configure the values for other input fields
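Each change in the INPUT FIELDS form re-evaluates the linear function with the new values. A minimal sketch of that evaluation for numeric and categorical inputs, assuming dummy coding in which the dummy class contributes nothing. Text and items fields and missing values are omitted, and all field names and coefficients are hypothetical.

```python
def predict(bias, numeric_coefs, categorical_coefs, inputs):
    """Evaluate a linear regression on one set of input values.

    numeric_coefs: {field: coefficient}
    categorical_coefs: {field: {class: coefficient}}; the dummy class has no
    entry, so selecting it adds 0 to the prediction.
    """
    y = bias
    for field, coef in numeric_coefs.items():
        y += coef * inputs[field]
    for field, class_coefs in categorical_coefs.items():
        y += class_coefs.get(inputs[field], 0.0)
    return y


# Hypothetical model: "sunny" is the dummy class of the "weather" field.
bias = 50.0
numeric_coefs = {"temperature": 1.5}
categorical_coefs = {"weather": {"rain": -10.0, "snow": -25.0}}

form = {"temperature": 20, "weather": "rain"}
print(predict(bias, numeric_coefs, categorical_coefs, form))  # 70.0

form["weather"] = "sunny"  # changing one form value updates the prediction
print(predict(bias, numeric_coefs, categorical_coefs, form))  # 80.0
```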

The chart also includes a reset option for the input field values and an export option to download the chart in PNG format, both explained below:

After selecting the fields for the axes or configuring the input field values, you can restore the default view by clicking the reset icon highlighted in Figure 3.55.

\includegraphics[]{images/linearregression/lnr-chart-reset}

Figure 3.55 Reset the values for the input fields
You can also export your chart in PNG format with or without legends. To include the predicted values in the legend of the exported image, freeze the view by pressing Shift before exporting, then release it by pressing Esc.

\includegraphics[]{images/linearregression/lnr-chart-export}

Figure 3.56 Export chart as image with or without legends

Note: there are some limitations on the number of input fields that can be visualized in the linear regression chart (explained in subsection 3.8.1).

3.5.2 Coefficient Table

The main goal of the linear regression algorithm is to learn the coefficients of the linear function for each of the independent variables, i.e., for each of the input fields. See section 3.2 for a detailed explanation of how to interpret linear regression coefficients.
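As a rough illustration of what learning the coefficients means, the sketch below fits ordinary least squares by solving the normal equations (X^T X) b = X^T y with Gauss-Jordan elimination. It assumes numeric inputs only and a full-rank design. It is not BigML's implementation, and the data are hypothetical.

```python
def fit_ols(rows, ys):
    """Return [bias, b1, ..., bk] minimizing the sum of squared errors.

    rows: list of numeric input vectors; ys: objective values.
    Solves the normal equations (X^T X) b = X^T y.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1.0 -> bias
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data generated exactly by y = 1 + 2*x1 - 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, -2, 0, 2]
print(fit_ols(rows, ys))  # approximately [1.0, 2.0, -3.0]
```

The first returned value plays the role of the Bias (intercept) shown in the first row of the coefficient table.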

BigML lets you inspect the learned coefficients for each of the input fields in the coefficient table. The table columns represent the coefficients and their statistics, while the table rows represent the input field variables and the Bias (a.k.a. intercept term) of the linear regression. The first row always contains the Bias coefficient. You can sort the table rows by clicking on any of the column labels.

\includegraphics[]{images/linearregression/lnr-table-view} — Figure 3.57 Table view for linear regression

For numeric fields, there is always one coefficient per field; however, categorical, text and items fields have one coefficient per value (category, term or item). This is due to the transformations, explained in section 3.2, that are required to convert categorical, text and items fields to numeric fields (each single value is mapped to a separate variable in the formula). Missing values also get their own coefficients.

For numeric fields, you always get one coefficient per field. If a field contains missing values, you will find an additional coefficient for the missing values of that field. (See subsection 3.2.2.)
For categorical fields, you have one coefficient per class and an additional one for missing values per field; with dummy coding, the class set as the dummy class does not get a coefficient of its own. (See subsection 3.2.3 and subsection 3.4.6.)

Note: if you configure the field with Contrast or Other coding there will be just one coefficient for that field (see subsection 3.4.6 ).
For text fields, there is one coefficient per term and an additional one for missing values per field.
For items fields, you get one coefficient per item and an additional one for missing values per field.

Figure 3.58 shows an example of the coefficients for a categorical field, where one single field, “Atmospheric condition”, yields eleven different variables, each with its own coefficient. The field has twelve classes, one of which is set as the dummy class.

\includegraphics[]{images/linearregression/lnr-categorical-field-transformation} — Figure 3.58 Multiple field variables for categorical dummy encoded fields
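The transformation behind this figure can be sketched as dummy encoding: each class except the dummy class becomes a 0/1 variable with its own coefficient, and the dummy class is the case where all of them are 0. A minimal sketch, assuming a separate 0/1 indicator for missing values (represented here as None). The class labels are hypothetical placeholders, not the actual “Atmospheric condition” classes.

```python
def dummy_encode(value, classes, dummy_class):
    """Map one categorical value to 0/1 indicator variables.

    Every class except dummy_class gets its own indicator (and coefficient);
    the dummy class is encoded as all indicators set to 0. The extra
    "(missing)" indicator flags a missing value, given here as None.
    """
    indicators = {c: int(value == c) for c in classes if c != dummy_class}
    indicators["(missing)"] = int(value is None)
    return indicators


# Hypothetical field with twelve classes; the first one is the dummy class.
classes = [f"condition_{i}" for i in range(1, 13)]
encoded = dummy_encode("condition_5", classes, dummy_class="condition_1")
print(len(encoded) - 1)       # 11 class indicators for twelve classes
print(sum(encoded.values()))  # exactly one indicator is set: 1
```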

Coefficients for missing values are always found at the end of the table. (See Figure 3.59.)

\includegraphics[]{images/linearregression/lnr-missing-coeff-table} — Figure 3.59 Missing numeric coefficients at the end of linear regression table

Next to each coefficient you will find an icon indicating whether it is significant (see Figure 3.60) or non-significant (see Figure 3.61).

\includegraphics[width=2cm]{images/linearregression/lnr-significant-icon} — Figure 3.60 Significant icon

\includegraphics[width=2cm]{images/linearregression/lnr-non-significant-icon} — Figure 3.61 Non-significant icon

The significance of a coefficient is determined by comparing its p-value against the significance level selected in the top menu (see Figure 3.62). If the p-value is higher than the significance level, the coefficient is non-significant; if it is lower, the coefficient is significant. A good practice is to retrain the linear regression without the variables whose coefficients are non-significant; in most cases, this should not affect model performance.

\includegraphics[]{images/linearregression/lnr-significance} — Figure 3.62 Select significance level

\includegraphics[]{images/linearregression/lnr-icons-significance} — Figure 3.63 Significance icons for coefficient estimates

Next to each icon indicating the significance of a coefficient, you will find a $\sigma$ symbol. If you mouse over it, a tooltip displays a summary of the statistics for that coefficient (see Figure 3.64). First, you will find the p-value from the Wald test; as mentioned above, this p-value is compared against the selected significance level to determine the coefficient's significance. Then, associated with the Z score chart, you will find the Z score value, the 95% confidence interval and the standard error of the coefficient estimate.

\includegraphics[]{images/linearregression/lnr-summary-stats} — Figure 3.64 Summary of stats per coefficient
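The statistics in the tooltip are related to each other by a short calculation. A minimal sketch, assuming the standard two-sided Wald test with a normal reference distribution, as the Z score suggests. Here z is the estimate divided by its standard error, the p-value comes from the standard normal distribution, and the 95% confidence interval is the estimate plus or minus about 1.96 standard errors. The function name, coefficient and standard error values are hypothetical.

```python
from statistics import NormalDist


def wald_summary(coefficient, std_error, significance_level=0.05):
    """Two-sided Wald test and 95% confidence interval for one coefficient."""
    std_normal = NormalDist()
    z = coefficient / std_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    crit = std_normal.inv_cdf(0.975)  # about 1.96
    ci_95 = (coefficient - crit * std_error, coefficient + crit * std_error)
    return {
        "z": z,
        "p_value": p_value,
        "ci_95": ci_95,
        # A p-value below the chosen level marks the coefficient as significant.
        "significant": p_value < significance_level,
    }


# Hypothetical estimate: p is about 0.046, significant at 0.05 but not at 0.01.
print(wald_summary(2.0, 1.0, significance_level=0.05)["significant"])  # True
print(wald_summary(2.0, 1.0, significance_level=0.01)["significant"])  # False
```

This also shows why the selected significance level matters: the same coefficient can switch between the significant and non-significant icons as the level changes.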

You can download all the statistics by clicking on the download CSV icon in the top menu to the right. (See subsection 3.7.1.)

Additional options for the table include a filtering option and an export option:

You can filter the first column of the table by field name, class, term or item using the search box at the top of the table (see Figure 3.65).

\includegraphics[]{images/linearregression/lnr-table-search}

Figure 3.65 Search and filter linear regression table
You can also export the table as a CSV file by clicking on the icon highlighted in Figure 3.66.

\includegraphics[]{images/linearregression/lnr-table-export}

Figure 3.66 Export table in CSV file